Friday, August 27, 2021

INVESTIGATING DATA SHARING IN SPEECH RECOGNITION FOR AN UNDER-RESOURCED LANGUAGE: THE CASE OF ALGERIAN DIALECT

Author :  Mohamed Amine Menacer

Affiliation :  Université de Lorraine

Country :  France

Category :  Computer Science & Information Technology

Volume, Issue, Month, Year :  11, 03, March, 2021

Abstract :

The Arabic language has many varieties, including its standard form, Modern Standard Arabic (MSA), and its spoken forms, namely the dialects. These dialects are representative examples of under-resourced languages for which automatic speech recognition is considered an unresolved issue. To address this issue, we recorded several hours of spoken Algerian dialect and used them to train a baseline model. We then improved this model by exploiting other languages that influence the dialect, integrating their data into one large corpus and investigating three approaches: multilingual training, multitask learning and transfer learning. The best performance was achieved using a limited and balanced amount of acoustic data from each additional language, relative to the amount of data available for the studied dialect. This approach led to an improvement of 3.8% in terms of word error rate in comparison to the baseline system trained only on the dialect data.
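
For readers unfamiliar with the metric, word error rate (WER) is commonly computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The abstract does not state whether the 3.8% improvement is absolute or relative, so the sketch only shows how a single WER value is obtained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c" involves one substitution and one deletion over four reference words, so the function returns 0.5.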

Keywords :  Automatic speech recognition, Algerian dialect, MSA, multilingual training, multitask learning, transfer learning.

For More Details :  https://aircconline.com/csit/papers/vol11/csit110308.pdf
