Standard

BERT-like Models for Slavic Morpheme Segmentation. / Morozov, Dmitry; Astapenka, Lizaveta; Glazkova, Anna et al.

Proceedings of the Annual Meeting of the Association for Computational Linguistics. ed. / Wanxiang Che; Joyce Nabende; Ekaterina Shutova; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. pp. 6795-6815 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).

Research output: Publications in books, reports, collections, conference proceedings › Chapter/section › Scholarly › Peer-reviewed

Harvard

Morozov, D, Astapenka, L, Glazkova, A, Garipov, T & Lyashevskaya, O 2025, BERT-like Models for Slavic Morpheme Segmentation. in W Che, J Nabende, E Shutova & MT Pilehvar (eds), Proceedings of the Annual Meeting of the Association for Computational Linguistics. Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics, pp. 6795-6815, The 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 27.07.2025. https://doi.org/10.18653/v1/2025.acl-long.337

APA

Morozov, D., Astapenka, L., Glazkova, A., Garipov, T., & Lyashevskaya, O. (2025). BERT-like Models for Slavic Morpheme Segmentation. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6795-6815). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.acl-long.337

Vancouver

Morozov D, Astapenka L, Glazkova A, Garipov T, Lyashevskaya O. BERT-like Models for Slavic Morpheme Segmentation. In: Che W, Nabende J, Shutova E, Pilehvar MT, editors, Proceedings of the Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2025. p. 6795-6815. (Proceedings of the Annual Meeting of the Association for Computational Linguistics). doi: 10.18653/v1/2025.acl-long.337

Author

Morozov, Dmitry ; Astapenka, Lizaveta ; Glazkova, Anna et al. / BERT-like Models for Slavic Morpheme Segmentation. Proceedings of the Annual Meeting of the Association for Computational Linguistics. Editor / Wanxiang Che ; Joyce Nabende ; Ekaterina Shutova ; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. pp. 6795-6815 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

BibTeX

@inbook{cba8e7cc3775487ca87842b6d48adaf7,
title = "BERT-like Models for Slavic Morpheme Segmentation",
abstract = "Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.",
author = "Dmitry Morozov and Lizaveta Astapenka and Anna Glazkova and Timur Garipov and Olga Lyashevskaya",
note = "Dmitry Morozov, Lizaveta Astapenka, Anna Glazkova, Timur Garipov, and Olga Lyashevskaya. 2025. BERT-like Models for Slavic Morpheme Segmentation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6795–6815, Vienna, Austria. Association for Computational Linguistics.; The 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 ; Conference date: 27-07-2025 Through 01-08-2025",
year = "2025",
month = jul,
doi = "10.18653/v1/2025.acl-long.337",
language = "English",
isbn = "9798891762510",
series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics",
pages = "6795--6815",
editor = "Wanxiang Che and Joyce Nabende and Ekaterina Shutova and Pilehvar, {Mohammad Taher}",
booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
address = "United States",
url = "https://2025.aclweb.org/",
}

RIS

TY - CHAP
T1 - BERT-like Models for Slavic Morpheme Segmentation
AU - Morozov, Dmitry
AU - Astapenka, Lizaveta
AU - Glazkova, Anna
AU - Garipov, Timur
AU - Lyashevskaya, Olga
N1 - Conference code: 63
PY - 2025/7
Y1 - 2025/7
N2 - Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.
AB - Automatic morpheme segmentation algorithms are applicable in various tasks, such as building tokenizers and language education. For Slavic languages, the development of such algorithms is complicated by the rich derivational capabilities of these languages. Previous research has shown that, on average, these algorithms have already reached expert-level quality. However, a key unresolved issue is the significant decline in performance when segmenting words containing roots not present in the training data. This problem can be partially addressed by using pre-trained language models to better account for word semantics. In this work, we explored the possibility of fine-tuning BERT-like models for morpheme segmentation using data from Belarusian, Czech, and Russian. We found that for Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5-95.1%. For Belarusian, this task was addressed for the first time. The best-performing approach for Belarusian was an ensemble of convolutional neural networks with word-level accuracy of 90.45%.
UR - https://www.scopus.com/pages/publications/105021025720
UR - https://www.mendeley.com/catalogue/4c32a22e-143f-342d-8eae-72bcb93844d4/
U2 - 10.18653/v1/2025.acl-long.337
DO - 10.18653/v1/2025.acl-long.337
M3 - Chapter
SN - 9798891762510
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6795
EP - 6815
BT - Proceedings of the Annual Meeting of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics
T2 - The 63rd Annual Meeting of the Association for Computational Linguistics
Y2 - 27 July 2025 through 1 August 2025
ER -

ID: 72454885
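
Note: the abstract above describes fine-tuning BERT-like models for morpheme segmentation but does not spell out the task formulation. The sketch below is an illustrative assumption only, not the authors' setup: it frames segmentation as character-level boundary tagging (B/I labels) with a generic multilingual BERT checkpoint via Hugging Face transformers. The checkpoint name, label scheme, helper function, and the example word and its segmentation are all hypothetical.

# Minimal sketch (assumptions labeled): morpheme segmentation as character-level
# token classification with a BERT-like encoder. Not the paper's implementation.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-multilingual-cased"  # placeholder checkpoint, not the authors' model
LABELS = ["B", "I"]  # 0 = "B": first character of a morpheme, 1 = "I": continuation

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

def encode_word(word: str, morphemes: list[str]):
    """Split a word into characters and build B/I labels from a gold segmentation."""
    chars = list(word)
    labels = []
    for m in morphemes:
        labels += [0] + [1] * (len(m) - 1)  # one label per character
    enc = tokenizer(chars, is_split_into_words=True, return_tensors="pt")
    # Align character labels to subword tokens; special tokens get -100 (ignored by the loss).
    aligned = [-100 if i is None else labels[i] for i in enc.word_ids()]
    enc["labels"] = torch.tensor([aligned])
    return enc

# Hypothetical Russian example: "перевозка" segmented as пере-воз-к-а.
batch = encode_word("перевозка", ["пере", "воз", "к", "а"])
loss = model(**batch).loss  # fine-tuning would backpropagate this loss over a labeled corpus

In practice one would train over a full annotated dataset for each language and decode boundaries from the predicted B tags; the label scheme and alignment strategy here are just one common way to pose the task.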