Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts. / Mengliev, Davlatyor B.; Abdurakhmonova, Nilufar Z.; Rahimov, Hasanboy et al.
Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 1050-1053.
TY - GEN
T1 - Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts
AU - Mengliev, Davlatyor B.
AU - Abdurakhmonova, Nilufar Z.
AU - Rahimov, Hasanboy
AU - Zolotykh, Nikolai Yu
AU - Ubaydullayev, Alisher A.
AU - Ibragimov, Bahodir B.
N1 - Conference code: 3
PY - 2024
Y1 - 2024
N2 - This study presents a tool for identifying named entities in Uzbek legal texts. In addition to detecting named entities, the authors developed an algorithm that standardizes word forms by replacing detected dialect words (Karluk, Kypchak and Oghuz) with their formal equivalents. This helps correct grammatical mistakes that are common among native speakers from different regions of Uzbekistan. The proposed hybrid approach combines a traditional dictionary-based method for preprocessing (standardization of word forms), which relies on a dictionary of more than 10 thousand marked words, with a custom language model for named entity detection, trained on 2000 legal sentences. In testing, the language model detected named entities with an accuracy of 90% and a recall of 94%. The algorithm for standardizing dialect word forms scored even higher, from 90% to 100% depending on the dialect.
KW - Karluk
KW - Kypchak
KW - Oghuz
KW - Uzbek language
KW - algorithm development
KW - dialect standardization
KW - legal documents
KW - linguistic diversity
KW - low-resource languages
KW - named entity recognition
KW - text processing
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85216560257&origin=inward&txGid=bcd4f6c5c60f2f3afee61beccfbc3aed
UR - https://www.mendeley.com/catalogue/1ac29b63-8f94-33c1-ad2d-4e98457bcdfe/
U2 - 10.1109/PIERE62470.2024.10804942
DO - 10.1109/PIERE62470.2024.10804942
M3 - Conference contribution
SN - 979-8-3315-1633-8
SP - 1050
EP - 1053
BT - Proceedings of the IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering, PIERE 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Problems of Informatics, Electronics and Radio Engineering
Y2 - 15 November 2024 through 17 November 2024
ER -
ID: 64619149
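
The dictionary-based preprocessing step described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation. The function name, the token pattern, and the three dictionary entries are assumptions made for illustration and are not taken from the paper's 10-thousand-word dictionary. The sketch also assumes Latin-script input with an ASCII apostrophe.

```python
import re

# Hypothetical mini-dictionary: dialect form -> (standard form, dialect label).
# Illustrative entries only; the paper's dictionary is not reproduced here.
DIALECT_DICT = {
    "jo'q": ("yo'q", "Kypchak"),
    "jaxshi": ("yaxshi", "Kypchak"),
    "galdi": ("keldi", "Oghuz"),
}

# Split text into word tokens, punctuation runs, and whitespace runs,
# so that joining the tokens reproduces the input exactly.
TOKEN_RE = re.compile(r"[A-Za-z']+|[^A-Za-z'\s]+|\s+")


def standardize(text, dictionary=DIALECT_DICT):
    """Replace known dialect words with their standard forms.

    Returns the rewritten text and a list of
    (original token, replacement, dialect label) tuples.
    """
    out = []
    replaced = []
    for token in TOKEN_RE.findall(text):
        entry = dictionary.get(token.lower())
        if entry is None:
            out.append(token)  # not a known dialect form: keep as-is
            continue
        standard, dialect = entry
        if token[0].isupper():
            # Preserve sentence-initial capitalization.
            standard = standard.capitalize()
        out.append(standard)
        replaced.append((token, standard, dialect))
    return "".join(out), replaced


if __name__ == "__main__":
    clean, changes = standardize("Jaxshi, u galdi.")
    print(clean)
    print(changes)
```

In a pipeline like the one the abstract outlines, the standardized text would then be passed to the named entity recognition model. The per-dialect labels in the returned list would allow replacement rates to be reported separately for each dialect.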