Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Research › Peer-reviewed
A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. / Mengliev, Davlatyor; Abdurakhmonova, Nilufar; Shirinova, Raima et al.
AIP Conference Proceedings. ed. / Niyetbay Uteuliev; Bakhtiyor Khuzhayorov; Bekzodjion Fayziev. Vol. 3377 American Institute of Physics Inc., 2025. 070005 (AIP Conference Proceedings; Vol. 3377, No. 1).
TY - GEN
T1 - A comparative study of annotation schemes for named entity recognition in Uzbek legal texts
AU - Mengliev, Davlatyor
AU - Abdurakhmonova, Nilufar
AU - Shirinova, Raima
AU - Ibragimov, Bahodir
AU - Khudayberganova, Dildora
AU - Suyunova, Mohinur
N1 - Conference code: 2
PY - 2025/10/7
Y1 - 2025/10/7
N2 - Developing effective named entity extraction systems for the Uzbek language is complicated by low resource availability and the lack of standard tools. This paper presents a comparative analysis of four different named entity annotation schemes (BIO, BIOES, IO, and BILOU) in the context of Uzbek legislative texts. To assess the impact of annotation on the quality of entity extraction, a corpus of 3,000 sentences in four copies was formed. Each copy was annotated using a separate scheme. Based on the obtained data, four instances of the mBERT model were trained, and their performance was assessed using standard metrics (Precision, Recall, F1-score). The results showed that the choice of annotation scheme significantly affects the quality of the resulting NER models. The best results were achieved using BIOES (∼94% F1-score) and BILOU (∼92% F1-score), while BIO-scheme demonstrated intermediate quality (∼81% F1-score) and IO-scheme was the least effective (∼75% F1-score). Thus, in the context of a custom corpus, more detailed annotation schemes that allow explicit identification of entity boundaries are the preferred solution. This study highlights the importance of careful choice of annotation scheme when developing NER systems for Uzbek texts and provides a basis for further improvements in named entity recognition quality.
UR - https://www.scopus.com/pages/publications/105021307050
UR - https://www.mendeley.com/catalogue/4b12ae6f-5f7b-3298-8492-e4afca8f6962/
U2 - 10.1063/5.0299776
DO - 10.1063/5.0299776
M3 - Conference contribution
VL - 3377
T3 - AIP Conference Proceedings
BT - AIP Conference Proceedings
A2 - Uteuliev, Niyetbay
A2 - Khuzhayorov, Bakhtiyor
A2 - Fayziev, Bekzodjion
PB - American Institute of Physics Inc.
T2 - Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology
Y2 - 12 November 2024 through 13 November 2024
ER -
ID: 72346872
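
Illustrative note: the abstract compares four tagging schemes (BIO, BIOES, IO, BILOU) applied to the same sentences. The sketch below is not the authors' code; it only shows, under stated assumptions, how one BIO-tagged sequence maps onto the other three schemes. The function names (spans_from_bio, convert) and the entity labels ORG and LAW are hypothetical and do not come from the paper.

```python
def spans_from_bio(tags):
    """Return (start, end_exclusive, label) spans found in a BIO tag sequence."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The current entity continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # Open a new entity on "B-", or leniently on a stray "I-".
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def convert(tags, scheme):
    """Re-encode a BIO tag sequence as IO, BIO, BIOES, or BILOU."""
    if scheme not in ("IO", "BIO", "BIOES", "BILOU"):
        raise ValueError(f"unknown scheme: {scheme}")
    out = ["O"] * len(tags)
    for start, end, label in spans_from_bio(tags):
        for i in range(start, end):
            if scheme == "IO":
                # IO marks membership only; entity boundaries are implicit.
                prefix = "I"
            elif scheme == "BIO":
                prefix = "B" if i == start else "I"
            else:
                # BIOES and BILOU differ only in the letters used for
                # single-token entities and for the last token of an entity.
                single, last = ("S", "E") if scheme == "BIOES" else ("U", "L")
                if end - start == 1:
                    prefix = single
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = last
                else:
                    prefix = "I"
            out[i] = f"{prefix}-{label}"
    return out


# Hypothetical five-token sentence: a three-token ORG, one O, a one-token LAW.
bio = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LAW"]
for scheme in ("IO", "BIO", "BIOES", "BILOU"):
    print(scheme, convert(bio, scheme))
```

In this sketch IO cannot separate two adjacent entities of the same type, whereas BIOES and BILOU mark both the first and last token of every entity. That is the sense in which the abstract calls those schemes more detailed about entity boundaries.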