Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed
Development of a Legal Document Recognition Algorithm for the Karakalpak Language. / Mengliev, Davlatyor B.; Barakhnin, Vladimir B.; Eshkulov, Mukhriddin O. et al.
2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 323-326 (2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2024).
TY - GEN
T1 - Development of a Legal Document Recognition Algorithm for the Karakalpak Language
AU - Mengliev, Davlatyor B.
AU - Barakhnin, Vladimir B.
AU - Eshkulov, Mukhriddin O.
AU - Allamov, Oybek T.
AU - Ibragimov, Bahodir B.
AU - Khudaybergenov, Timur A.
PY - 2024/11/26
Y1 - 2024/11/26
N2 - In this study, the authors propose an algorithm for recognizing legal documents in Karakalpak texts. To develop such an algorithm, similar scientific works were studied, the relevance of the current work and the problems that need to be solved were identified. The proposed algorithm is developed based on traditional rules, which include the rules of morphology of the Karakalpak language, as well as a dictionary of tagged words used to identify legal words and phrases from texts. The dictionary used contains more than 12,000 tagged words, which include both word roots and other forms concatenated to grammatical affixes. The authors also tested the algorithm, where high accuracy in identifying the necessary words was achieved. In particular, three samples were formed, each of which contained words and phrases of legal terminology in a certain amount. In conclusion, the authors added information regarding the proposed improvements and further prospects for the development of the algorithm.
AB - In this study, the authors propose an algorithm for recognizing legal documents in Karakalpak texts. To develop such an algorithm, similar scientific works were studied, the relevance of the current work and the problems that need to be solved were identified. The proposed algorithm is developed based on traditional rules, which include the rules of morphology of the Karakalpak language, as well as a dictionary of tagged words used to identify legal words and phrases from texts. The dictionary used contains more than 12,000 tagged words, which include both word roots and other forms concatenated to grammatical affixes. The authors also tested the algorithm, where high accuracy in identifying the necessary words was achieved. In particular, three samples were formed, each of which contained words and phrases of legal terminology in a certain amount. In conclusion, the authors added information regarding the proposed improvements and further prospects for the development of the algorithm.
KW - Karakalpak language
KW - NLP
KW - algorithm testing
KW - annotated dictionary
KW - legal document recognition
KW - linguistic resources
KW - low-resource language
KW - morphological analysis
KW - rule-based algorithm
KW - text tokenization
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85212092216&origin=inward&txGid=50a189d008719dc9c313fbd29622cce3
UR - https://www.mendeley.com/catalogue/015c4969-ba88-3319-b8e6-8093da50dff8/
U2 - 10.1109/SIBIRCON63777.2024.10758548
DO - 10.1109/SIBIRCON63777.2024.10758548
M3 - Conference contribution
SN - 9798331532024
T3 - 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2024
SP - 323
EP - 326
BT - 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences
Y2 - 30 September 2024 through 2 November 2024
ER -
ID: 61787771