Discriminative Lemmatization of Abbreviations in the Era of LLMs

Standard

Discriminative Lemmatization of Abbreviations in the Era of LLMs. / Glazkova, A. V.; Smal, I. A.; Lyashevskaya, O. N. et al.

In: Doklady Mathematics, Vol. 112, No. 1, 08.2025, pp. 219-226.

Research output: Contribution to journal › Article › Peer-reviewed

Harvard

Glazkova, AV, Smal, IA, Lyashevskaya, ON & Morozov, DA 2025, 'Discriminative Lemmatization of Abbreviations in the Era of LLMs', Doklady Mathematics, vol. 112, no. 1, pp. 219-226. https://doi.org/10.1134/S106456242570022X

APA

Glazkova, A. V., Smal, I. A., Lyashevskaya, O. N., & Morozov, D. A. (2025). Discriminative Lemmatization of Abbreviations in the Era of LLMs. Doklady Mathematics, 112(1), 219-226. https://doi.org/10.1134/S106456242570022X

Vancouver

Glazkova AV, Smal IA, Lyashevskaya ON, Morozov DA. Discriminative Lemmatization of Abbreviations in the Era of LLMs. Doklady Mathematics. 2025 Aug;112(1):219-226. doi: 10.1134/S106456242570022X

Author

Glazkova, A. V. ; Smal, I. A. ; Lyashevskaya, O. N. et al. / Discriminative Lemmatization of Abbreviations in the Era of LLMs. In: Doklady Mathematics. 2025 ; Vol. 112, No. 1. pp. 219-226.

BibTeX

@article{f89dc6314c9d4b96ab469f2436de2360,

title = "Discriminative Lemmatization of Abbreviations in the Era of LLMs",

abstract = "This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select an optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we have conducted a comprehensive analysis of four context-aware approaches: (1) masked language model ranking, (2) binary classification, (3) multi-class classification, and (4) prompt-based learning. Special attention is given to cases of contextual ambiguity, where the same abbreviation within a single text fragment corresponds to different lemmas. The results demonstrate that fine-tuned multi-class classification achieves the highest quality (macro-averaged F-score of 97.75–99.92% depending on the abbreviation). However, with limited training data, both prompt-based learning and masked language model ranking show promising results. Moreover, the effectiveness of these approaches increases in cases of contextual ambiguity. The study contributes to the development of Russian text processing methods by providing practical recommendations for selecting architectures for abbreviation lemmatization tasks.",

keywords = "lemmatization, abbreviations, Russian language, discriminative methods, text classification, natural language processing",

author = "Glazkova, {A. V.} and Smal, {I. A.} and Lyashevskaya, {O. N.} and Morozov, {Dmitry Alekseevich}",

note = "Glazkova A. V., Smal I., Lyashevskaya O., Morozov D. Discriminative lemmatization of abbreviations in the era of LLMs // Doklady Mathematics. — 2025. — Vol. 112. - № 1. — P. 219–226. — DOI: 10.1134/S106456242570022X.",

year = "2025",

month = aug,

doi = "10.1134/S106456242570022X",

language = "English",

volume = "112",

pages = "219--226",

journal = "Doklady Mathematics",

issn = "1064-5624",

publisher = "Maik Nauka-Interperiodica Publishing",

number = "1",

}

RIS

TY - JOUR

T1 - Discriminative Lemmatization of Abbreviations in the Era of LLMs

AU - Glazkova, A. V.

AU - Smal, I. A.

AU - Lyashevskaya, O. N.

AU - Morozov, Dmitry Alekseevich

N1 - Glazkova A. V., Smal I., Lyashevskaya O., Morozov D. Discriminative lemmatization of abbreviations in the era of LLMs // Doklady Mathematics. — 2025. — Vol. 112. - № 1. — P. 219–226. — DOI: 10.1134/S106456242570022X.

PY - 2025/8

Y1 - 2025/8

N2 - This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select an optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we have conducted a comprehensive analysis of four context-aware approaches: (1) masked language model ranking, (2) binary classification, (3) multi-class classification, and (4) prompt-based learning. Special attention is given to cases of contextual ambiguity, where the same abbreviation within a single text fragment corresponds to different lemmas. The results demonstrate that fine-tuned multi-class classification achieves the highest quality (macro-averaged F-score of 97.75–99.92% depending on the abbreviation). However, with limited training data, both prompt-based learning and masked language model ranking show promising results. Moreover, the effectiveness of these approaches increases in cases of contextual ambiguity. The study contributes to the development of Russian text processing methods by providing practical recommendations for selecting architectures for abbreviation lemmatization tasks.

AB - This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select an optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we have conducted a comprehensive analysis of four context-aware approaches: (1) masked language model ranking, (2) binary classification, (3) multi-class classification, and (4) prompt-based learning. Special attention is given to cases of contextual ambiguity, where the same abbreviation within a single text fragment corresponds to different lemmas. The results demonstrate that fine-tuned multi-class classification achieves the highest quality (macro-averaged F-score of 97.75–99.92% depending on the abbreviation). However, with limited training data, both prompt-based learning and masked language model ranking show promising results. Moreover, the effectiveness of these approaches increases in cases of contextual ambiguity. The study contributes to the development of Russian text processing methods by providing practical recommendations for selecting architectures for abbreviation lemmatization tasks.

KW - lemmatization

KW - abbreviations

KW - Russian language

KW - discriminative methods

KW - text classification

KW - natural language processing

UR - https://www.scopus.com/pages/publications/105031649591

UR - https://www.mendeley.com/catalogue/855304cb-9e0a-3069-9ac2-5904fb1e232a/

U2 - 10.1134/S106456242570022X

DO - 10.1134/S106456242570022X

M3 - Article

VL - 112

SP - 219

EP - 226

JO - Doklady Mathematics

JF - Doklady Mathematics

SN - 1064-5624

IS - 1

ER -

ID: 75590070