Standard
BERT-like Models for Slavic Morpheme Segmentation. / Morozov, Dmitry; Astapenka, Lizaveta; Glazkova, Anna et al.
Proceedings of the Annual Meeting of the Association for Computational Linguistics. ed. / Wanxiang Che; Joyce Nabende; Ekaterina Shutova; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. p. 6795-6815 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Research › peer-review
Harvard
Morozov, D, Astapenka, L, Glazkova, A, Garipov, T & Lyashevskaya, O 2025,
BERT-like Models for Slavic Morpheme Segmentation. in W Che, J Nabende, E Shutova & MT Pilehvar (eds),
Proceedings of the Annual Meeting of the Association for Computational Linguistics. Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics, pp. 6795-6815, The 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria,
27.07.2025.
https://doi.org/10.18653/v1/2025.acl-long.337
APA
Morozov, D., Astapenka, L., Glazkova, A., Garipov, T., & Lyashevskaya, O. (2025).
BERT-like Models for Slavic Morpheme Segmentation. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.),
Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6795-6815). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics.
https://doi.org/10.18653/v1/2025.acl-long.337
Vancouver
Morozov D, Astapenka L, Glazkova A, Garipov T, Lyashevskaya O.
BERT-like Models for Slavic Morpheme Segmentation. In Che W, Nabende J, Shutova E, Pilehvar MT, editors, Proceedings of the Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2025. p. 6795-6815. (Proceedings of the Annual Meeting of the Association for Computational Linguistics). doi: 10.18653/v1/2025.acl-long.337
Author
Morozov, Dmitry ; Astapenka, Lizaveta ; Glazkova, Anna et al. /
BERT-like Models for Slavic Morpheme Segmentation. Proceedings of the Annual Meeting of the Association for Computational Linguistics. editor / Wanxiang Che ; Joyce Nabende ; Ekaterina Shutova ; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. pp. 6795-6815 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).
BibTeX
@inbook{cba8e7cc3775487ca87842b6d48adaf7,
title = "BERT-like Models for Slavic Morpheme Segmentation",
abstract = "Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.",
author = "Dmitry Morozov and Lizaveta Astapenka and Anna Glazkova and Timur Garipov and Olga Lyashevskaya",
note = "Dmitry Morozov, Lizaveta Astapenka, Anna Glazkova, Timur Garipov, and Olga Lyashevskaya. 2025. BERT-like Models for Slavic Morpheme Segmentation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6795–6815, Vienna, Austria. Association for Computational Linguistics.; The 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 ; Conference date: 27-07-2025 Through 01-08-2025",
year = "2025",
month = jul,
doi = "10.18653/v1/2025.acl-long.337",
language = "English",
isbn = "9798891762510",
series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics",
pages = "6795--6815",
editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, {Mohammad Taher}",
booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
address = "United States",
url = "https://2025.aclweb.org/",
}
RIS
TY - CHAP
T1 - BERT-like Models for Slavic Morpheme Segmentation
AU - Morozov, Dmitry
AU - Astapenka, Lizaveta
AU - Glazkova, Anna
AU - Garipov, Timur
AU - Lyashevskaya, Olga
N1 - Conference code: 63
PY - 2025/7
Y1 - 2025/7
N2 - Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.
AB - Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.
UR - https://www.scopus.com/pages/publications/105021025720
UR - https://www.mendeley.com/catalogue/4c32a22e-143f-342d-8eae-72bcb93844d4/
U2 - 10.18653/v1/2025.acl-long.337
DO - 10.18653/v1/2025.acl-long.337
M3 - Chapter
SN - 9798891762510
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6795
EP - 6815
BT - Proceedings of the Annual Meeting of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics
T2 - The 63rd Annual Meeting of the Association for Computational Linguistics
Y2 - 27 July 2025 through 1 August 2025
ER -
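The abstract describes fine-tuning BERT-like models for morpheme segmentation. A common way to cast this task for a token-classification model is per-character BIO-style labeling, where each character is tagged as the beginning (`B`) or continuation (`M`) of a morpheme. The sketch below is a hypothetical illustration of that framing, not the authors' released code; the helper names are made up for this example.

```python
# Hypothetical sketch: casting morpheme segmentation as character-level
# sequence labeling, the usual framing when fine-tuning BERT-like models
# for this task (this is NOT the paper's released code).

def segmentation_to_labels(morphemes):
    """Map a list of morphemes to one label per character:
    'B' = first character of a morpheme, 'M' = continuation."""
    labels = []
    for morpheme in morphemes:
        labels.append("B")
        labels.extend("M" * (len(morpheme) - 1))
    return labels

def labels_to_segmentation(word, labels):
    """Invert the encoding: start a new morpheme at every 'B' label."""
    morphemes = []
    for ch, lab in zip(word, labels):
        if lab == "B":
            morphemes.append(ch)
        else:
            morphemes[-1] += ch
    return morphemes

# Example with a Russian word: пере-вод-чик ("translator")
morphemes = ["пере", "вод", "чик"]
word = "".join(morphemes)
labels = segmentation_to_labels(morphemes)
assert labels_to_segmentation(word, labels) == morphemes
```

Word-level accuracy, the metric reported in the abstract, would then count a prediction as correct only if every character label in the word matches the gold labeling.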