Standard

Generalization Ability of CNN-Based Morpheme Segmentation. / Garipov, Timur; Morozov, Dmitry; Glazkova, Anna.

Proceedings - Ivannikov ISPRAS Open Conference. ed. / A. Avetisyan. Institute of Electrical and Electronics Engineers Inc., 2023. p. 58-62 (Proceedings - Ivannikov ISPRAS Open Conference).

Research output: Chapter in Book/Report/Conference proceeding, Chapter, Research, peer-review

Harvard

Garipov, T, Morozov, D & Glazkova, A 2023, Generalization Ability of CNN-Based Morpheme Segmentation. in A Avetisyan (ed.), Proceedings - Ivannikov ISPRAS Open Conference. Proceedings - Ivannikov ISPRAS Open Conference, Institute of Electrical and Electronics Engineers Inc., pp. 58-62, 2023 Ivannikov ISPRAS Open Conference, Moscow, Russian Federation, 04.12.2023. https://doi.org/10.1109/ISPRAS60948.2023.10508171

APA

Garipov, T., Morozov, D., & Glazkova, A. (2023). Generalization Ability of CNN-Based Morpheme Segmentation. In A. Avetisyan (Ed.), Proceedings - Ivannikov ISPRAS Open Conference (pp. 58-62). (Proceedings - Ivannikov ISPRAS Open Conference). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISPRAS60948.2023.10508171

Vancouver

Garipov T, Morozov D, Glazkova A. Generalization Ability of CNN-Based Morpheme Segmentation. In Avetisyan A, editor, Proceedings - Ivannikov ISPRAS Open Conference. Institute of Electrical and Electronics Engineers Inc. 2023. p. 58-62. (Proceedings - Ivannikov ISPRAS Open Conference). doi: 10.1109/ISPRAS60948.2023.10508171

Author

Garipov, Timur ; Morozov, Dmitry ; Glazkova, Anna. / Generalization Ability of CNN-Based Morpheme Segmentation. Proceedings - Ivannikov ISPRAS Open Conference. editor / A. Avetisyan. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 58-62 (Proceedings - Ivannikov ISPRAS Open Conference).

BibTeX

@inbook{367e7e9fe06c47a69c20e05e5dbeb475,
title = "Generalization Ability of CNN-Based Morpheme Segmentation",
abstract = "Determining the morphemic structure of a word is a problem that is particularly relevant in teaching the Russian language. Automatic evaluation of this structure is complicated by the lack of agreement among linguists in some complex cases. At the same time, several papers have been published in recent years whose authors use various machine learning methods to solve this problem in applications. The authors of [1] propose an architecture based on convolutional neural networks for Russian lemmas. The proposed algorithm has shown quality sufficient for solving various applied problems. However, the generalization ability of this algorithm on unseen morphemes remains unclear. In this paper, we found that the quality of the algorithm drops by 16-18% in terms of word accuracy when it is tested on words with roots absent from the training sample. Given the significant robustness of the algorithm to a uniform reduction in the training sample, we conclude that the training dataset for the studied model can be small but should be as diverse as possible.",
author = "Timur Garipov and Dmitry Morozov and Anna Glazkova",
year = "2023",
doi = "10.1109/ISPRAS60948.2023.10508171",
language = "English",
isbn = "979-835034999-3",
series = "Proceedings - Ivannikov ISPRAS Open Conference",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "58--62",
editor = "A. Avetisyan",
booktitle = "Proceedings - Ivannikov ISPRAS Open Conference",
address = "United States",
note = "2023 Ivannikov ISPRAS Open Conference, ISPRAS 2023 ; Conference date: 04-12-2023 Through 05-12-2023",
url = "https://www.isprasopen.ru/",
}

RIS

TY - CHAP

T1 - Generalization Ability of CNN-Based Morpheme Segmentation

AU - Garipov, Timur

AU - Morozov, Dmitry

AU - Glazkova, Anna

PY - 2023

Y1 - 2023

N2 - Determining the morphemic structure of a word is a problem that is particularly relevant in teaching the Russian language. Automatic evaluation of this structure is complicated by the lack of agreement among linguists in some complex cases. At the same time, several papers have been published in recent years whose authors use various machine learning methods to solve this problem in applications. The authors of [1] propose an architecture based on convolutional neural networks for Russian lemmas. The proposed algorithm has shown quality sufficient for solving various applied problems. However, the generalization ability of this algorithm on unseen morphemes remains unclear. In this paper, we found that the quality of the algorithm drops by 16-18% in terms of word accuracy when it is tested on words with roots absent from the training sample. Given the significant robustness of the algorithm to a uniform reduction in the training sample, we conclude that the training dataset for the studied model can be small but should be as diverse as possible.

AB - Determining the morphemic structure of a word is a problem that is particularly relevant in teaching the Russian language. Automatic evaluation of this structure is complicated by the lack of agreement among linguists in some complex cases. At the same time, several papers have been published in recent years whose authors use various machine learning methods to solve this problem in applications. The authors of [1] propose an architecture based on convolutional neural networks for Russian lemmas. The proposed algorithm has shown quality sufficient for solving various applied problems. However, the generalization ability of this algorithm on unseen morphemes remains unclear. In this paper, we found that the quality of the algorithm drops by 16-18% in terms of word accuracy when it is tested on words with roots absent from the training sample. Given the significant robustness of the algorithm to a uniform reduction in the training sample, we conclude that the training dataset for the studied model can be small but should be as diverse as possible.

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85192732986&origin=inward&txGid=ed027bf4a387024abac203bdd9655d94

UR - https://www.mendeley.com/catalogue/47ec5a4d-db0f-30b4-a2fe-84e633131a4f/

U2 - 10.1109/ISPRAS60948.2023.10508171

DO - 10.1109/ISPRAS60948.2023.10508171

M3 - Chapter

SN - 979-835034999-3

T3 - Proceedings - Ivannikov ISPRAS Open Conference

SP - 58

EP - 62

BT - Proceedings - Ivannikov ISPRAS Open Conference

A2 - Avetisyan, A.

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 Ivannikov ISPRAS Open Conference

Y2 - 4 December 2023 through 5 December 2023

ER -

ID: 60406199