Standard
Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts. / Mansurova, Madina; Barakhnin, Vladimir B.; Madiyeva, Gulmira et al.
Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers. ed. / Zygmunt Vetulani; Patrick Paroubek; Marek Kubis. Springer Science and Business Media Deutschland GmbH, 2022. p. 129-142 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13212 LNAI).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Harvard
Mansurova, M, Barakhnin, VB, Madiyeva, G, Kadyrbek, N & Dossanov, B 2022, Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts. in Z Vetulani, P Paroubek & M Kubis (eds), Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13212 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 129-142, 9th Language and Technology Conference, LTC 2019, Poznań, Poland, 17.05.2019.
https://doi.org/10.1007/978-3-031-05328-3_9
APA
Mansurova, M., Barakhnin, V. B., Madiyeva, G., Kadyrbek, N., & Dossanov, B. (2022). Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts. In Z. Vetulani, P. Paroubek, & M. Kubis (Eds.), Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers (pp. 129-142). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13212 LNAI). Springer Science and Business Media Deutschland GmbH.
https://doi.org/10.1007/978-3-031-05328-3_9
Vancouver
Mansurova M, Barakhnin VB, Madiyeva G, Kadyrbek N, Dossanov B. Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts. In Vetulani Z, Paroubek P, Kubis M, editors, Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2022. p. 129-142. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-05328-3_9
Author
Mansurova, Madina ; Barakhnin, Vladimir B. ; Madiyeva, Gulmira et al. / Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts. Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers. editor / Zygmunt Vetulani ; Patrick Paroubek ; Marek Kubis. Springer Science and Business Media Deutschland GmbH, 2022. pp. 129-142 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{af9cef28063c4481b9d4398f4f8b497f,
title = "Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts",
abstract = "Nowadays, the Kazakh language belongs to the category of less-resourced languages, as only a small number of resources have been developed and made accessible to a wide range of users, such as text corpora, electronic dictionaries, morphological analyzers, and thesauri, which allow the analysis of text documents. The aim of this work is the design and development of a pipeline of preprocessing tools for the media-corpus of the Kazakh language. The media-corpus is hosted by al-Farabi Kazakh National University and serves linguists as an empirical basis for research on the contemporary written Kazakh language. During the development of the pipeline of preprocessing tools for the media-corpus, the lexical and grammatical features of the Kazakh language were analyzed, on the basis of which the set of fundamental rules for word inflection in the Kazakh language was determined. In the course of the research, tools for the generation and lemmatization of Kazakh word forms were created. The proposed tools can be applied at the stage of morphological analysis in systems for automatic text analysis and in the creation of thesauri and ontologies. To handle homonymy, the template method was used, which reduces the level of homonymy.",
keywords = "Kazakh language, Lemmatization, Morphological model, Pipeline, Preprocessing tools",
author = "Madina Mansurova and Barakhnin, {Vladimir B.} and Gulmira Madiyeva and Nurgali Kadyrbek and Bekzhan Dossanov",
note = "Funding Information: This work was supported in part under grant of Foundation of the Ministry of Education and Science of the Republic of Kazakhstan AP09261344 “Development of methods for automatic extraction of spatial objects from heterogeneous sources for information support of geographic information systems” (2020–2022) and Erasmus+ Project “Development of the interdisciplinary master program on Computational Linguistics at Central Asian universities” (585845-EPP-1-2017-1-ES-EPPKA2-CBHE-JP). Publisher Copyright: {\textcopyright} 2022, Springer Nature Switzerland AG.; 9th Language and Technology Conference, LTC 2019 ; Conference date: 17-05-2019 Through 19-05-2019",
year = "2022",
doi = "10.1007/978-3-031-05328-3_9",
language = "English",
isbn = "9783031053276",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "129--142",
editor = "Zygmunt Vetulani and Patrick Paroubek and Marek Kubis",
booktitle = "Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers",
address = "Germany",
}
RIS
TY - GEN
T1 - Design and Development of Pipeline of Preprocessing Tools for Kazakh Language Texts
AU - Mansurova, Madina
AU - Barakhnin, Vladimir B.
AU - Madiyeva, Gulmira
AU - Kadyrbek, Nurgali
AU - Dossanov, Bekzhan
N1 - Funding Information:
This work was supported in part under grant of Foundation of the Ministry of Education and Science of the Republic of Kazakhstan AP09261344 “Development of methods for automatic extraction of spatial objects from heterogeneous sources for information support of geographic information systems” (2020–2022) and Erasmus+ Project “Development of the interdisciplinary master program on Computational Linguistics at Central Asian universities” (585845-EPP-1-2017-1-ES-EPPKA2-CBHE-JP).
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Nowadays, the Kazakh language belongs to the category of less-resourced languages, as only a small number of resources have been developed and made accessible to a wide range of users, such as text corpora, electronic dictionaries, morphological analyzers, and thesauri, which allow the analysis of text documents. The aim of this work is the design and development of a pipeline of preprocessing tools for the media-corpus of the Kazakh language. The media-corpus is hosted by al-Farabi Kazakh National University and serves linguists as an empirical basis for research on the contemporary written Kazakh language. During the development of the pipeline of preprocessing tools for the media-corpus, the lexical and grammatical features of the Kazakh language were analyzed, on the basis of which the set of fundamental rules for word inflection in the Kazakh language was determined. In the course of the research, tools for the generation and lemmatization of Kazakh word forms were created. The proposed tools can be applied at the stage of morphological analysis in systems for automatic text analysis and in the creation of thesauri and ontologies. To handle homonymy, the template method was used, which reduces the level of homonymy.
AB - Nowadays, the Kazakh language belongs to the category of less-resourced languages, as only a small number of resources have been developed and made accessible to a wide range of users, such as text corpora, electronic dictionaries, morphological analyzers, and thesauri, which allow the analysis of text documents. The aim of this work is the design and development of a pipeline of preprocessing tools for the media-corpus of the Kazakh language. The media-corpus is hosted by al-Farabi Kazakh National University and serves linguists as an empirical basis for research on the contemporary written Kazakh language. During the development of the pipeline of preprocessing tools for the media-corpus, the lexical and grammatical features of the Kazakh language were analyzed, on the basis of which the set of fundamental rules for word inflection in the Kazakh language was determined. In the course of the research, tools for the generation and lemmatization of Kazakh word forms were created. The proposed tools can be applied at the stage of morphological analysis in systems for automatic text analysis and in the creation of thesauri and ontologies. To handle homonymy, the template method was used, which reduces the level of homonymy.
KW - Kazakh language
KW - Lemmatization
KW - Morphological model
KW - Pipeline
KW - Preprocessing tools
UR - http://www.scopus.com/inward/record.url?scp=85132893173&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/5c89a0f2-756b-30df-b5dc-35beab9a7c0e/
U2 - 10.1007/978-3-031-05328-3_9
DO - 10.1007/978-3-031-05328-3_9
M3 - Conference contribution
AN - SCOPUS:85132893173
SN - 9783031053276
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 129
EP - 142
BT - Human Language Technology. Challenges for Computer Science and Linguistics - 9th Language and Technology Conference, LTC 2019, Revised Selected Papers
A2 - Vetulani, Zygmunt
A2 - Paroubek, Patrick
A2 - Kubis, Marek
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th Language and Technology Conference, LTC 2019
Y2 - 17 May 2019 through 19 May 2019
ER -