Standard

Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition. / Mengliev, Davlatyor; Eshkulov, Mukhriddin; Barakhnin, Vladimir et al.

Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 19-22 (Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024).

Research output: Publications in books, reports, collections, conference proceedings; article in conference proceedings; research; peer-reviewed

Harvard

Mengliev, D, Eshkulov, M, Barakhnin, V, Abdullayev, R, Boltayev, N & Ibragimov, B 2024, Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition. in Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024. Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024, Institute of Electrical and Electronics Engineers Inc., pp. 19-22, 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, Yekaterinburg, Russian Federation, 13.05.2024. https://doi.org/10.1109/USBEREIT61901.2024.10584051

APA

Mengliev, D., Eshkulov, M., Barakhnin, V., Abdullayev, R., Boltayev, N., & Ibragimov, B. (2024). Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition. In Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024 (pp. 19-22). (Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/USBEREIT61901.2024.10584051

Vancouver

Mengliev D, Eshkulov M, Barakhnin V, Abdullayev R, Boltayev N, Ibragimov B. Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition. In Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024. Institute of Electrical and Electronics Engineers Inc. 2024. p. 19-22. (Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024). doi: 10.1109/USBEREIT61901.2024.10584051

Author

Mengliev, Davlatyor ; Eshkulov, Mukhriddin ; Barakhnin, Vladimir et al. / Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition. Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 19-22 (Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024).

BibTeX

@inproceedings{739f4a64454d49ddac11326475b2df43,
title = "Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition",
abstract = "This article discusses an original approach to calculating the TF-IDF metric for Karakalpak language documents. The paper reviews related work, including efforts to automatically extract stop words and apply the TF-IDF metric tailored to the linguistic characteristics of the Karakalpak language, highlighting the importance of morphological preprocessing to improve the accuracy and efficiency of algorithms. Despite the challenges associated with the agglutinative nature of the Karakalpak language, such as the need for extensive morphological pre-processing to accurately identify and analyze word forms, this study proposes a new algorithm that demonstrates significant potential in dealing with the complexity of the language. By carefully adapting the TF-IDF metric to account for the morphological structure of Karakalpak, the proposed algorithm marks a significant advance in the computational analysis of agglutinative languages. Testing of the algorithm was thorough and included a diverse set of words unique to each dialect, as well as words common to multiple dialects and misspelled words. The algorithm has demonstrated high accuracy in identifying dialect-specific words and processing records in mixed dialects. In addition, this study contributes to the broader field of Turkic languages by offering insights into the structural and lexical features of the Uzbek language.",
keywords = "Agglutinative languages, Karakalpak language, TF-IDF, compound word formation, morphological structure, natural language processing, noun cases, suffixation, verb conjugation, vowel harmony",
author = "Davlatyor Mengliev and Mukhriddin Eshkulov and Vladimir Barakhnin and Ruslan Abdullayev and Nodirbek Boltayev and Bahodir Ibragimov",
year = "2024",
doi = "10.1109/USBEREIT61901.2024.10584051",
language = "English",
isbn = "9798350362893",
series = "Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "19--22",
booktitle = "Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024",
address = "United States",
note = "2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024 ; Conference date: 13-05-2024 Through 15-05-2024",
url = "https://usbereit.ieeesiberia.org/",

}

RIS

TY - GEN

T1 - Linguistic Nuances in Text Analysis: TF-IDF Metric's Algorithm Implementation for the Karakalpak Language Recognition

AU - Mengliev, Davlatyor

AU - Eshkulov, Mukhriddin

AU - Barakhnin, Vladimir

AU - Abdullayev, Ruslan

AU - Boltayev, Nodirbek

AU - Ibragimov, Bahodir

PY - 2024

Y1 - 2024

N2 - This article discusses an original approach to calculating the TF-IDF metric for Karakalpak language documents. The paper reviews related work, including efforts to automatically extract stop words and apply the TF-IDF metric tailored to the linguistic characteristics of the Karakalpak language, highlighting the importance of morphological preprocessing to improve the accuracy and efficiency of algorithms. Despite the challenges associated with the agglutinative nature of the Karakalpak language, such as the need for extensive morphological pre-processing to accurately identify and analyze word forms, this study proposes a new algorithm that demonstrates significant potential in dealing with the complexity of the language. By carefully adapting the TF-IDF metric to account for the morphological structure of Karakalpak, the proposed algorithm marks a significant advance in the computational analysis of agglutinative languages. Testing of the algorithm was thorough and included a diverse set of words unique to each dialect, as well as words common to multiple dialects and misspelled words. The algorithm has demonstrated high accuracy in identifying dialect-specific words and processing records in mixed dialects. In addition, this study contributes to the broader field of Turkic languages by offering insights into the structural and lexical features of the Uzbek language.

AB - This article discusses an original approach to calculating the TF-IDF metric for Karakalpak language documents. The paper reviews related work, including efforts to automatically extract stop words and apply the TF-IDF metric tailored to the linguistic characteristics of the Karakalpak language, highlighting the importance of morphological preprocessing to improve the accuracy and efficiency of algorithms. Despite the challenges associated with the agglutinative nature of the Karakalpak language, such as the need for extensive morphological pre-processing to accurately identify and analyze word forms, this study proposes a new algorithm that demonstrates significant potential in dealing with the complexity of the language. By carefully adapting the TF-IDF metric to account for the morphological structure of Karakalpak, the proposed algorithm marks a significant advance in the computational analysis of agglutinative languages. Testing of the algorithm was thorough and included a diverse set of words unique to each dialect, as well as words common to multiple dialects and misspelled words. The algorithm has demonstrated high accuracy in identifying dialect-specific words and processing records in mixed dialects. In addition, this study contributes to the broader field of Turkic languages by offering insights into the structural and lexical features of the Uzbek language.

KW - Agglutinative languages

KW - Karakalpak language

KW - TF-IDF

KW - compound word formation

KW - morphological structure

KW - natural language processing

KW - noun cases

KW - suffixation

KW - verb conjugation

KW - vowel harmony

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85199189313&origin=inward&txGid=b967a67803f95a2d6c82dc5b0761de33

UR - https://www.mendeley.com/catalogue/80e227ce-0e75-3502-8f76-6bac8ee32b4a/

U2 - 10.1109/USBEREIT61901.2024.10584051

DO - 10.1109/USBEREIT61901.2024.10584051

M3 - Conference contribution

SN - 9798350362893

T3 - Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024

SP - 19

EP - 22

BT - Proceedings - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology, USBEREIT 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2024 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology

Y2 - 13 May 2024 through 15 May 2024

ER -
