Standard

A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. / Mengliev, Davlatyor; Abdurakhmonova, Nilufar; Shirinova, Raima et al.

AIP Conference Proceedings. ed. / Niyetbay Uteuliev; Bakhtiyor Khuzhayorov; Bekzodjion Fayziev. Vol. 3377. American Institute of Physics Inc., 2025. 070005 (AIP Conference Proceedings; Vol. 3377, No. 1).

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed

Harvard

Mengliev, D, Abdurakhmonova, N, Shirinova, R, Ibragimov, B, Khudayberganova, D & Suyunova, M 2025, A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. in N Uteuliev, B Khuzhayorov & B Fayziev (eds), AIP Conference Proceedings. vol. 3377, 070005, AIP Conference Proceedings, no. 1, vol. 3377, American Institute of Physics Inc., Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology, Nukus, Uzbekistan, 12 November 2024. https://doi.org/10.1063/5.0299776

APA

Mengliev, D., Abdurakhmonova, N., Shirinova, R., Ibragimov, B., Khudayberganova, D., & Suyunova, M. (2025). A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. In N. Uteuliev, B. Khuzhayorov, & B. Fayziev (Eds.), AIP Conference Proceedings (Vol. 3377). [070005] (AIP Conference Proceedings; Vol. 3377, No. 1). American Institute of Physics Inc. https://doi.org/10.1063/5.0299776

Vancouver

Mengliev D, Abdurakhmonova N, Shirinova R, Ibragimov B, Khudayberganova D, Suyunova M. A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. In Uteuliev N, Khuzhayorov B, Fayziev B, editors, AIP Conference Proceedings. Vol. 3377. American Institute of Physics Inc. 2025. 070005. (AIP Conference Proceedings; 1). doi: 10.1063/5.0299776

Author

Mengliev, Davlatyor ; Abdurakhmonova, Nilufar ; Shirinova, Raima et al. / A comparative study of annotation schemes for named entity recognition in Uzbek legal texts. AIP Conference Proceedings. ed. / Niyetbay Uteuliev ; Bakhtiyor Khuzhayorov ; Bekzodjion Fayziev. Vol. 3377. American Institute of Physics Inc., 2025. (AIP Conference Proceedings; 1).

BibTeX

@inproceedings{523b2d5ca9dd4fc9b3a48b2459681791,
title = "A comparative study of annotation schemes for named entity recognition in Uzbek legal texts",
abstract = "Developing effective named entity extraction systems for the Uzbek language is complicated by low resource availability and the lack of standard tools. This paper presents a comparative analysis of four different named entity annotation schemes (BIO, BIOES, IO, and BILOU) in the context of Uzbek legislative texts. To assess the impact of annotation on the quality of entity extraction, a corpus of 3,000 sentences in four copies was formed. Each copy was annotated using a separate scheme. Based on the obtained data, four instances of the mBERT model were trained, and their performance was assessed using standard metrics (Precision, Recall, F1-score). The results showed that the choice of annotation scheme significantly affects the quality of the resulting NER models. The best results were achieved using BIOES (∼94% F1-score) and BILOU (∼92% F1-score), while BIO-scheme demonstrated intermediate quality (∼81% F1-score) and IO-scheme was the least effective (∼75% F1-score). Thus, in the context of a custom corpus, more detailed annotation schemes that allow explicit identification of entity boundaries are the preferred solution. This study highlights the importance of careful choice of annotation scheme when developing NER systems for Uzbek texts and provides a basis for further improvements in named entity recognition quality.",
author = "Davlatyor Mengliev and Nilufar Abdurakhmonova and Raima Shirinova and Bahodir Ibragimov and Dildora Khudayberganova and Mohinur Suyunova",
year = "2025",
month = oct,
day = "7",
doi = "10.1063/5.0299776",
language = "English",
volume = "3377",
series = "AIP Conference Proceedings",
publisher = "American Institute of Physics Inc.",
number = "1",
editor = "Niyetbay Uteuliev and Bakhtiyor Khuzhayorov and Bekzodjion Fayziev",
booktitle = "AIP Conference Proceedings",
address = "United States",
note = "Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology, APMMIT2024 ; Conference date: 12-11-2024 Through 13-11-2024",

}

RIS

TY  - GEN

T1  - A comparative study of annotation schemes for named entity recognition in Uzbek legal texts

AU  - Mengliev, Davlatyor

AU  - Abdurakhmonova, Nilufar

AU  - Shirinova, Raima

AU  - Ibragimov, Bahodir

AU  - Khudayberganova, Dildora

AU  - Suyunova, Mohinur

N1  - Conference code: 2

PY  - 2025/10/7

Y1  - 2025/10/7

N2  - Developing effective named entity extraction systems for the Uzbek language is complicated by low resource availability and the lack of standard tools. This paper presents a comparative analysis of four different named entity annotation schemes (BIO, BIOES, IO, and BILOU) in the context of Uzbek legislative texts. To assess the impact of annotation on the quality of entity extraction, a corpus of 3,000 sentences in four copies was formed. Each copy was annotated using a separate scheme. Based on the obtained data, four instances of the mBERT model were trained, and their performance was assessed using standard metrics (Precision, Recall, F1-score). The results showed that the choice of annotation scheme significantly affects the quality of the resulting NER models. The best results were achieved using BIOES (∼94% F1-score) and BILOU (∼92% F1-score), while BIO-scheme demonstrated intermediate quality (∼81% F1-score) and IO-scheme was the least effective (∼75% F1-score). Thus, in the context of a custom corpus, more detailed annotation schemes that allow explicit identification of entity boundaries are the preferred solution. This study highlights the importance of careful choice of annotation scheme when developing NER systems for Uzbek texts and provides a basis for further improvements in named entity recognition quality.

AB  - Developing effective named entity extraction systems for the Uzbek language is complicated by low resource availability and the lack of standard tools. This paper presents a comparative analysis of four different named entity annotation schemes (BIO, BIOES, IO, and BILOU) in the context of Uzbek legislative texts. To assess the impact of annotation on the quality of entity extraction, a corpus of 3,000 sentences in four copies was formed. Each copy was annotated using a separate scheme. Based on the obtained data, four instances of the mBERT model were trained, and their performance was assessed using standard metrics (Precision, Recall, F1-score). The results showed that the choice of annotation scheme significantly affects the quality of the resulting NER models. The best results were achieved using BIOES (∼94% F1-score) and BILOU (∼92% F1-score), while BIO-scheme demonstrated intermediate quality (∼81% F1-score) and IO-scheme was the least effective (∼75% F1-score). Thus, in the context of a custom corpus, more detailed annotation schemes that allow explicit identification of entity boundaries are the preferred solution. This study highlights the importance of careful choice of annotation scheme when developing NER systems for Uzbek texts and provides a basis for further improvements in named entity recognition quality.

UR  - https://www.scopus.com/pages/publications/105021307050

UR  - https://www.mendeley.com/catalogue/4b12ae6f-5f7b-3298-8492-e4afca8f6962/

U2  - 10.1063/5.0299776

DO  - 10.1063/5.0299776

M3  - Conference contribution

VL  - 3377

T3  - AIP Conference Proceedings

BT  - AIP Conference Proceedings

A2  - Uteuliev, Niyetbay

A2  - Khuzhayorov, Bakhtiyor

A2  - Fayziev, Bekzodjion

PB  - American Institute of Physics Inc.

T2  - Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology

Y2  - 12 November 2024 through 13 November 2024

ER  - 

ID: 72346872
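The abstract compares four token-level tagging schemes (IO, BIO, BIOES, BILOU) and attributes the better scores to schemes that mark entity boundaries explicitly. The sketch below is not taken from the paper. It is a minimal illustration, assuming a well-formed BIO input, of what each scheme encodes. The function name `bio_to_scheme` and the entity labels `ORG`, `LAW` and `PER` are hypothetical: this record does not list the paper's entity types, and it does not say whether the four corpus copies were derived from one another or annotated independently.

```python
from typing import List


def bio_to_scheme(tags: List[str], scheme: str) -> List[str]:
    """Re-encode a BIO tag sequence as IO, BIOES or BILOU (illustrative sketch)."""
    if scheme not in ("IO", "BIOES", "BILOU"):
        raise ValueError(f"unsupported scheme: {scheme}")
    # BIOES marks single-token entities with S- and entity ends with E-;
    # BILOU uses U- (unit) and L- (last) for the same two roles.
    single, last = ("S", "E") if scheme == "BIOES" else ("U", "L")
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, label = tag.split("-", 1)
        if scheme == "IO":
            # IO drops the begin marker, so adjacent same-type entities merge.
            out.append(f"I-{label}")
            continue
        prev = tags[i - 1] if i > 0 else "O"
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # A token opens an entity if it is tagged B-, or if it is an I- tag
        # that does not follow a token of the same entity type.
        starts = prefix == "B" or prev not in (f"B-{label}", f"I-{label}")
        # The entity continues only if the next token is I- with the same type.
        continues = nxt == f"I-{label}"
        if starts and continues:
            out.append(f"B-{label}")
        elif starts:
            out.append(f"{single}-{label}")
        elif continues:
            out.append(f"I-{label}")
        else:
            out.append(f"{last}-{label}")
    return out


if __name__ == "__main__":
    # Hypothetical labels; the paper's entity types are not given in this record.
    bio = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LAW", "O", "B-PER", "B-PER"]
    for scheme in ("IO", "BIOES", "BILOU"):
        print(scheme, bio_to_scheme(bio, scheme))
```

For this input, BIOES gives `B-ORG, I-ORG, E-ORG, O, S-LAW, O, S-PER, S-PER` and BILOU gives the same sequence with `L-`/`U-` in place of `E-`/`S-`. IO gives `I-PER, I-PER` for the last two tokens, so two adjacent one-token entities of the same type can no longer be told apart from a single two-token entity. This is the kind of boundary information the abstract credits for the stronger BIOES and BILOU results.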