Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts

Standard

Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. / Mengliev, Davlatyor B.; Abdurakhmonova, Nilufar Z.; Rahimov, Hasanboy et al.

Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 1050-1053.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Mengliev, DB, Abdurakhmonova, NZ, Rahimov, H, Zolotykh, NY, Ubaydullayev, AA & Ibragimov, BB 2024, Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. in Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024. Institute of Electrical and Electronics Engineers Inc., pp. 1050-1053, 3rd IEEE International Conference on Problems of Informatics, Electronics and Radio Engineering, Novosibirsk, Russian Federation, 15.11.2024. https://doi.org/10.1109/PIERE62470.2024.10804942

APA

Mengliev, D. B., Abdurakhmonova, N. Z., Rahimov, H., Zolotykh, N. Y., Ubaydullayev, A. A., & Ibragimov, B. B. (2024). Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. In Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024 (pp. 1050-1053). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PIERE62470.2024.10804942

Vancouver

Mengliev DB, Abdurakhmonova NZ, Rahimov H, Zolotykh NY, Ubaydullayev AA, Ibragimov BB. Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. In Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024. Institute of Electrical and Electronics Engineers Inc. 2024. p. 1050-1053. doi: 10.1109/PIERE62470.2024.10804942

Author

Mengliev, Davlatyor B. ; Abdurakhmonova, Nilufar Z. ; Rahimov, Hasanboy et al. / Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 1050-1053

BibTeX

@inproceedings{fed11155f1ac40f1ab2a036bbf8048af,

title = "Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts",

abstract = "This study presents the development of a tool for identifying named entities in Uzbek legal texts. In addition to detecting named entities, the authors developed an algorithm that standardizes word forms by replacing detected dialect words (Karluk, Kypchak, and Oghuz) with their formal forms. This helps correct grammatical mistakes that are common among native speakers from different regions of Uzbekistan. The proposed hybrid approach combines a traditional dictionary-based method, used in preprocessing to standardize word forms with a dictionary of more than 10 thousand marked words, with a custom language model for named entity detection, trained on 2000 legal sentences. In testing, the language model detected named entities with an accuracy of 90% and a recall of 94%. The algorithm for standardizing dialect word forms performed even better, with rates ranging from 90% to 100% depending on the dialect.",

keywords = "Karluk, Kypchak, Oghuz, Uzbek language, algorithm development, dialect standardization, legal documents, linguistic diversity, low-resource languages, named entity recognition, text processing",

author = "Mengliev, {Davlatyor B.} and Abdurakhmonova, {Nilufar Z.} and Rahimov, Hasanboy and Zolotykh, {Nikolai Yu} and Ubaydullayev, {Alisher A.} and Ibragimov, {Bahodir B.}",

year = "2024",

doi = "10.1109/PIERE62470.2024.10804942",

language = "English",

isbn = "979-8-3315-1633-8",

pages = "1050--1053",

booktitle = "Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

address = "United States",

note = "3rd IEEE International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024 ; Conference date: 15-11-2024 Through 17-11-2024",

}

RIS

TY - CONF

T1 - Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts

AU - Mengliev, Davlatyor B.

AU - Abdurakhmonova, Nilufar Z.

AU - Rahimov, Hasanboy

AU - Zolotykh, Nikolai Yu

AU - Ubaydullayev, Alisher A.

AU - Ibragimov, Bahodir B.

N1 - Conference code: 3

PY - 2024

Y1 - 2024

N2 - This study presents the development of a tool for identifying named entities in Uzbek legal texts. In addition to detecting named entities, the authors developed an algorithm that standardizes word forms by replacing detected dialect words (Karluk, Kypchak, and Oghuz) with their formal forms. This helps correct grammatical mistakes that are common among native speakers from different regions of Uzbekistan. The proposed hybrid approach combines a traditional dictionary-based method, used in preprocessing to standardize word forms with a dictionary of more than 10 thousand marked words, with a custom language model for named entity detection, trained on 2000 legal sentences. In testing, the language model detected named entities with an accuracy of 90% and a recall of 94%. The algorithm for standardizing dialect word forms performed even better, with rates ranging from 90% to 100% depending on the dialect.

AB - This study presents the development of a tool for identifying named entities in Uzbek legal texts. In addition to detecting named entities, the authors developed an algorithm that standardizes word forms by replacing detected dialect words (Karluk, Kypchak, and Oghuz) with their formal forms. This helps correct grammatical mistakes that are common among native speakers from different regions of Uzbekistan. The proposed hybrid approach combines a traditional dictionary-based method, used in preprocessing to standardize word forms with a dictionary of more than 10 thousand marked words, with a custom language model for named entity detection, trained on 2000 legal sentences. In testing, the language model detected named entities with an accuracy of 90% and a recall of 94%. The algorithm for standardizing dialect word forms performed even better, with rates ranging from 90% to 100% depending on the dialect.

KW - Karluk

KW - Kypchak

KW - Oghuz

KW - Uzbek language

KW - algorithm development

KW - dialect standardization

KW - legal documents

KW - linguistic diversity

KW - low-resource languages

KW - named entity recognition

KW - text processing

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85216560257&origin=inward&txGid=bcd4f6c5c60f2f3afee61beccfbc3aed

UR - https://www.mendeley.com/catalogue/1ac29b63-8f94-33c1-ad2d-4e98457bcdfe/

U2 - 10.1109/PIERE62470.2024.10804942

DO - 10.1109/PIERE62470.2024.10804942

M3 - Conference contribution

SN - 979-8-3315-1633-8

SP - 1050

EP - 1053

BT - Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 3rd IEEE International Conference on Problems of Informatics, Electronics and Radio Engineering

Y2 - 15 November 2024 through 17 November 2024

ER -

ID: 64619149