Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Research › Peer-reviewed
From morphological rules to neural networks: A hybrid framework for medical entity extraction in Karakalpak. / Mengliev, Davlatyor; Abdurakhmonova, Nilufar; Zokirova, Hulkar et al.
AIP Conference Proceedings. ed. / Niyetbay Uteuliev; Bakhtiyor Khuzhayorov; Bekzodjion Fayziev. Vol. 3377 American Institute of Physics Inc., 2025. 070004 (AIP Conference Proceedings; Vol. 3377, No. 1).
TY - GEN
T1 - From morphological rules to neural networks: A hybrid framework for medical entity extraction in Karakalpak
AU - Mengliev, Davlatyor
AU - Abdurakhmonova, Nilufar
AU - Zokirova, Hulkar
AU - Ibragimov, Bahodir
AU - Jurakulova, Madina
AU - Abdunazarova, Maftuna
N1 - Conference code: 2
PY - 2025/11/7
Y1 - 2025/11/7
N2 - This paper presents a hybrid method for extracting named entities from medical texts in the Karakalpak language. The approach is based on a rule-oriented method that preprocesses the text in the form of morphological analysis of word forms in the text. This analysis is based on rules and a base of affixes that allow the stemming process to be carried out in order to identify the root of a word or correct misspelled words. After preprocessing, the named entities in the text are directly identified using the multilingual mBERT model. To train this language model, a sample of 5,000 sentences marked using the BIOES scheme was used. The test results showed that the hybrid approach outperforms both rule-based methods without a neural network and neural network solutions without preprocessing. The high score is supported by digital indicators, where the accuracy and recall of the model reached 90% and 90%, respectively, and the F1-measure was about 91%. In addition, the authors conducted a comparative analysis of existing solutions and provided information on the Karakalpak language.
AB - This paper presents a hybrid method for extracting named entities from medical texts in the Karakalpak language. The approach is based on a rule-oriented method that preprocesses the text in the form of morphological analysis of word forms in the text. This analysis is based on rules and a base of affixes that allow the stemming process to be carried out in order to identify the root of a word or correct misspelled words. After preprocessing, the named entities in the text are directly identified using the multilingual mBERT model. To train this language model, a sample of 5,000 sentences marked using the BIOES scheme was used. The test results showed that the hybrid approach outperforms both rule-based methods without a neural network and neural network solutions without preprocessing. The high score is supported by digital indicators, where the accuracy and recall of the model reached 90% and 90%, respectively, and the F1-measure was about 91%. In addition, the authors conducted a comparative analysis of existing solutions and provided information on the Karakalpak language.
UR - https://www.scopus.com/pages/publications/105021378739
UR - https://www.mendeley.com/catalogue/165312ea-6316-30e7-a87e-ea7406ecb546/
U2 - 10.1063/5.0299775
DO - 10.1063/5.0299775
M3 - Conference contribution
VL - 3377
T3 - AIP Conference Proceedings
BT - AIP Conference Proceedings
A2 - Uteuliev, Niyetbay
A2 - Khuzhayorov, Bakhtiyor
A2 - Fayziev, Bekzodjion
PB - American Institute of Physics Inc.
T2 - Second International Scientific and Practical Conference on Actual Problems of Mathematical Modeling and Information Technology
Y2 - 12 November 2024 through 13 November 2024
ER -
ID: 72346981