Standard
Rubic2: Ensemble Model for Russian Lemmatization. / Afanasev, Ilia; Glazkova, Anna; Lyashevskaya, Olga et al.
Proceedings of the Annual Meeting of the Association for Computational Linguistics. ed. / Wanxiang Che; Joyce Nabende; Ekaterina Shutova; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. pp. 157-170 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).
Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Research › Peer-reviewed
Harvard
Afanasev, I, Glazkova, A, Lyashevskaya, O, Morozov, D, Smal, I & Vlasova, N 2025, Rubic2: Ensemble Model for Russian Lemmatization. in W Che, J Nabende, E Shutova & MT Pilehvar (eds), Proceedings of the Annual Meeting of the Association for Computational Linguistics. Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics, pp. 157-170, The 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 27.07.2025.
https://doi.org/10.18653/v1/2025.bsnlp-1.18
APA
Afanasev, I., Glazkova, A., Lyashevskaya, O., Morozov, D., Smal, I., & Vlasova, N. (2025). Rubic2: Ensemble Model for Russian Lemmatization. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 157-170). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics.
https://doi.org/10.18653/v1/2025.bsnlp-1.18
Vancouver
Afanasev I, Glazkova A, Lyashevskaya O, Morozov D, Smal I, Vlasova N. Rubic2: Ensemble Model for Russian Lemmatization. In Che W, Nabende J, Shutova E, Pilehvar MT, editors, Proceedings of the Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2025. p. 157-170. (Proceedings of the Annual Meeting of the Association for Computational Linguistics). doi: 10.18653/v1/2025.bsnlp-1.18
Author
Afanasev, Ilia ; Glazkova, Anna ; Lyashevskaya, Olga et al. /
Rubic2: Ensemble Model for Russian Lemmatization. Proceedings of the Annual Meeting of the Association for Computational Linguistics. Editor / Wanxiang Che ; Joyce Nabende ; Ekaterina Shutova ; Mohammad Taher Pilehvar. Association for Computational Linguistics, 2025. pp. 157-170 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).
BibTeX
@inproceedings{4b6d1b4801654b7f848bd32e38f9d5a6,
title = "Rubic2: Ensemble Model for Russian Lemmatization",
abstract = "Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for the Russian language, the errors in which can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language models. Our findings demonstrate that combining generative models with the existing solutions allows achieving performance that surpasses current results for the lemmatization of Russian. This paper also introduces Rubic2, a new ensemble approach that combines the generative BART-base model, fine-tuned on a manually annotated data set of 2.1 million tokens, with the neural model called Rubic which is currently used for morphological annotation and lemmatization in the Russian National Corpus. Extensive experiments show that Rubic2 outperforms current solutions for the lemmatization of Russian, offering superior results across various text domains and contributing to advancements in NLP applications.",
author = "Ilia Afanasev and Anna Glazkova and Olga Lyashevskaya and Dmitry Morozov and Ivan Smal and Natalia Vlasova",
note = "Ilia Afanasev, Anna Glazkova, Olga Lyashevskaya, Dmitry Morozov, Ivan Smal, and Natalia Vlasova. 2025. Rubic2: Ensemble Model for Russian Lemmatization. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 157–170, Vienna, Austria. Association for Computational Linguistics.; The 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 ; Conference date: 27-07-2025 Through 01-08-2025",
year = "2025",
month = jul,
doi = "10.18653/v1/2025.bsnlp-1.18",
language = "English",
isbn = "9798891762510",
series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics",
pages = "157--170",
editor = "Wanxiang Che and Joyce Nabende and Ekaterina Shutova and Pilehvar, {Mohammad Taher}",
booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
address = "United States",
url = "https://2025.aclweb.org/",
}
RIS
TY - GEN
T1 - Rubic2: Ensemble Model for Russian Lemmatization
AU - Afanasev, Ilia
AU - Glazkova, Anna
AU - Lyashevskaya, Olga
AU - Morozov, Dmitry
AU - Smal, Ivan
AU - Vlasova, Natalia
N1 - Conference code: 63
PY - 2025/7
Y1 - 2025/7
AB - Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for the Russian language, the errors in which can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language models. Our findings demonstrate that combining generative models with the existing solutions allows achieving performance that surpasses current results for the lemmatization of Russian. This paper also introduces Rubic2, a new ensemble approach that combines the generative BART-base model, fine-tuned on a manually annotated data set of 2.1 million tokens, with the neural model called Rubic which is currently used for morphological annotation and lemmatization in the Russian National Corpus. Extensive experiments show that Rubic2 outperforms current solutions for the lemmatization of Russian, offering superior results across various text domains and contributing to advancements in NLP applications.
UR - https://www.mendeley.com/catalogue/0abd9ac5-ec84-3630-bf23-7f7a861f6cd1/
U2 - 10.18653/v1/2025.bsnlp-1.18
DO - 10.18653/v1/2025.bsnlp-1.18
M3 - Conference contribution
SN - 9798891762510
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 157
EP - 170
BT - Proceedings of the Annual Meeting of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics
T2 - The 63rd Annual Meeting of the Association for Computational Linguistics
Y2 - 27 July 2025 through 1 August 2025
ER -