Research output: Contribution to journal › Conference article › peer-review
Unveiling the Variance of Uzbek language: A Rule-Based Algorithm for Dialect Recognition. / Mengliev, Davlatyor; Barakhnin, Vladimir; Madirimov, Shohrux et al.
In: AIP Conference Proceedings, Vol. 3244, No. 1, 030012, 27.11.2024.
TY - JOUR
T1 - Unveiling the Variance of Uzbek language: A Rule-Based Algorithm for Dialect Recognition
AU - Mengliev, Davlatyor
AU - Barakhnin, Vladimir
AU - Madirimov, Shohrux
AU - Ibragimov, Bahodir
AU - Eshkulov, Mukhriddin
AU - Saidov, Bobur
PY - 2024/11/27
Y1 - 2024/11/27
N2 - The study presents an algorithm for recognizing dialects of the Uzbek language, a low-resource language. The authors review related work in this field and include the results of a brief comparative analysis in the second subsection of the manuscript. They also develop a rule-based algorithm for dialect detection that uses a 100,000-word dictionary for each dialect in the analysis process. This approach differs from existing solutions in the transparency of its word analysis, which allows each step of the algorithm to be tracked. To improve performance, the authors divided the corpus of each dialect into 28 dictionaries, so that 140 dictionaries were generated in total. Testing showed that the algorithm achieves high accuracy in identifying dialect words and processing mixed-dialect entries.
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85212103757&origin=inward&txGid=d95a6c241859fd3b1f5d68371fc0af84
UR - https://www.mendeley.com/catalogue/04ab98e8-40f9-3f5f-86eb-6ca3673c8747/
U2 - 10.1063/5.0241409
DO - 10.1063/5.0241409
M3 - Conference article
VL - 3244
JO - AIP Conference Proceedings
JF - AIP Conference Proceedings
SN - 0094-243X
IS - 1
M1 - 030012
T2 - 2024 International Scientific Conference on Modern Problems of Applied Science and Engineering
Y2 - 2 May 2024 through 3 May 2024
ER -
ID: 61408074
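
The abstract describes a dictionary-driven, rule-based detector whose per-word decisions can be traced. The following is only a minimal sketch of that general idea, not the authors' implementation. The record does not say how each dialect's corpus was split into 28 dictionaries, so sharding by first letter is an assumption here. The function names (build_shards, classify), the dialect labels, and the word lists are hypothetical placeholders, not real Uzbek dialect data.

```python
from collections import Counter, defaultdict


def build_shards(words):
    """Split one dialect's word list into shards keyed by first letter.

    Assumption: first-letter sharding stands in for the unspecified
    scheme that produced 28 dictionaries per dialect in the paper.
    """
    shards = defaultdict(set)
    for word in words:
        word = word.lower()
        if word:
            shards[word[0]].add(word)
    return dict(shards)


def classify(text, lexicons):
    """Look up every token in every dialect's shards.

    Returns (trace, counts): trace lists each token with the dialects
    that contain it, so every decision can be inspected; counts holds
    the number of matched tokens per dialect.
    """
    trace = []
    counts = Counter()
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        if not token:
            continue
        # Only the shard for the token's first letter is consulted.
        hits = [name for name, shards in lexicons.items()
                if token in shards.get(token[0], ())]
        trace.append((token, hits))
        counts.update(hits)
    return trace, counts


# Invented placeholder entries; a real system would load the dialect corpora.
LEXICONS = {
    "dialect_a": build_shards(["alpha", "amber", "bravo"]),
    "dialect_b": build_shards(["bravo", "delta"]),
}

trace, counts = classify("Alpha bravo, delta echo", LEXICONS)
for token, hits in trace:
    print(token, "->", hits or "no match")
print(dict(counts))
```

Keeping the per-token trace alongside the totals is what makes the result inspectable: an input that matches more than one dialect shows up as several non-zero counts rather than a single forced label.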