Research output: Contribution to journal › Article › peer-review
Discriminative Lemmatization of Abbreviations in the Era of LLMs. / Glazkova, A. V.; Smal, I. A.; Lyashevskaya, O. N. et al.
In: Doklady Mathematics, Vol. 112, No. 1, 08.2025, p. 219-226.
TY - JOUR
T1 - Discriminative Lemmatization of Abbreviations in the Era of LLMs
AU - Glazkova, A. V.
AU - Smal, I. A.
AU - Lyashevskaya, O. N.
AU - Morozov, D. A.
N1 - Glazkova A. V., Smal I., Lyashevskaya O., Morozov D. Discriminative lemmatization of abbreviations in the era of LLMs // Doklady Mathematics. - 2026. - Vol. 112. - No. 1. - P. 219–226. - DOI: 10.1134/S106456242570022X.
PY - 2025/8
Y1 - 2025/8
N2 - This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select an optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we have conducted a comprehensive analysis of four context-aware approaches: (1) masked language model ranking, (2) binary classification, (3) multi-class classification, and (4) prompt-based learning. Special attention is given to cases of contextual ambiguity, where the same abbreviation within a single text fragment corresponds to different lemmas. The results demonstrate that fine-tuned multi-class classification achieves the highest quality (macro-averaged F-score of 97.75–99.92% depending on the abbreviation). However, with limited training data, both prompt-based learning and masked language model ranking show promising results. Moreover, the effectiveness of these approaches increases in cases of contextual ambiguity. The study contributes to the development of Russian text processing methods by providing practical recommendations for selecting architectures for abbreviation lemmatization tasks.
KW - lemmatization
KW - abbreviations
KW - Russian language
KW - discriminative methods
KW - text classification
KW - natural language processing
UR - https://www.scopus.com/pages/publications/105031649591
UR - https://www.mendeley.com/catalogue/855304cb-9e0a-3069-9ac2-5904fb1e232a/
U2 - 10.1134/S106456242570022X
DO - 10.1134/S106456242570022X
M3 - Article
VL - 112
SP - 219
EP - 226
JO - Doklady Mathematics
JF - Doklady Mathematics
SN - 1064-5624
IS - 1
ER -
ID: 75590070