Research output: Publications in books, reports, collections, conference proceedings › article in conference proceedings › research › peer-reviewed
Integrating morphological stemming and syntactic parsing for low-resource Uzbek texts. / Mengliev, Davlatyor; Abdurakhmonova, Nilufar; Barkhnin, Vladimir et al.
AIP Conference Proceedings. ed. / Niyetbay Uteuliev; Bakhtiyor Khuzhayorov; Bekzodjion Fayziev. Vol. 3377 American Institute of Physics Inc., 2025. 040003 (AIP Conference Proceedings; Vol. 3377, No. 1).
TY - GEN
T1 - Integrating morphological stemming and syntactic parsing for low-resource Uzbek texts
AU - Mengliev, Davlatyor
AU - Abdurakhmonova, Nilufar
AU - Barkhnin, Vladimir
AU - Ibragimov, Bahodir
AU - Jurakulova, Madina
AU - Urazaliyeva, Mavluda
AU - Islombekov, Bozorboy
N1 - Conference code: 2
PY - 2025/11/7
Y1 - 2025/11/7
N2 - Given the scarcity of language resources for Uzbek, developing comprehensive tools for automatic text processing is particularly relevant. This article proposes a hybrid approach that combines preliminary morphological normalization of word forms with subsequent syntactic analysis of Uzbek sentences. In the first stage, stemming and lemmatization are performed using rules and dictionary resources, which yields canonical word forms and reduces ambiguity. In the next stage, a trained model based on a syntactic parser (spaCy) determines the grammatical relations between words. Experiments on a corpus of multi-genre Uzbek texts showed that morphological normalization improves the quality of syntactic analysis: precision reached 92%, recall about 91%, and F1-measure about 91%. In a comparative analysis, a model without preliminary normalization scored several percentage points lower, which underscores the importance of the morphological stage. The results indicate that the proposed solution is promising and provide a basis for further development of tools for processing Uzbek texts.
AB - Given the scarcity of language resources for Uzbek, developing comprehensive tools for automatic text processing is particularly relevant. This article proposes a hybrid approach that combines preliminary morphological normalization of word forms with subsequent syntactic analysis of Uzbek sentences. In the first stage, stemming and lemmatization are performed using rules and dictionary resources, which yields canonical word forms and reduces ambiguity. In the next stage, a trained model based on a syntactic parser (spaCy) determines the grammatical relations between words. Experiments on a corpus of multi-genre Uzbek texts showed that morphological normalization improves the quality of syntactic analysis: precision reached 92%, recall about 91%, and F1-measure about 91%. In a comparative analysis, a model without preliminary normalization scored several percentage points lower, which underscores the importance of the morphological stage. The results indicate that the proposed solution is promising and provide a basis for further development of tools for processing Uzbek texts.
UR - https://www.scopus.com/pages/publications/105021346124
UR - https://www.mendeley.com/catalogue/ebf97d9e-e85f-3e62-8a4d-a8447a3a53ac/
U2 - 10.1063/5.0299773
DO - 10.1063/5.0299773
M3 - Conference contribution
VL - 3377
T3 - AIP Conference Proceedings
BT - AIP Conference Proceedings
A2 - Uteuliev, Niyetbay
A2 - Khuzhayorov, Bakhtiyor
A2 - Fayziev, Bekzodjion
PB - American Institute of Physics Inc.
T2 - Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology
Y2 - 12 November 2024 through 13 November 2024
ER -
ID: 72347744