Research output: Contribution to journal › Article › peer-review
Can LLMs Get to the Roots? Evaluating Russian Morpheme Segmentation Capabilities in Large Language Models. / Morozov, Dmitry A.; Glazkova, Anna V.; Iomdin, Boris L.
In: Supercomputing Frontiers and Innovations, Vol. 12, No. 3, 25.12.2025, p. 63-75.
TY - JOUR
T1 - Can LLMs Get to the Roots? Evaluating Russian Morpheme Segmentation Capabilities in Large Language Models
AU - Morozov, Dmitry A.
AU - Glazkova, Anna V.
AU - Iomdin, Boris L.
PY - 2025/12/25
Y1 - 2025/12/25
N2 - Automatic morpheme segmentation, a crucial task for morphologically rich languages like Russian, is persistently hindered by a significant drop in performance on words containing out-of-vocabulary (OOV) roots. This issue affects even state-of-the-art models, such as fine-tuned BERT models. This study investigates the potential of modern Large Language Models (LLMs) to address this challenge, focusing on the specific task of root identification in Russian. We evaluate a diverse set of eight state-of-the-art LLMs, including proprietary and open-weight models, using a prompt-based, few-shot learning approach. The models' performance is benchmarked against two strong baselines, a fine-tuned RuRoberta model and a CNN ensemble, on a 500-word test set. Our results demonstrate that one model, Gemini 2.5 Pro, surpasses both baselines by approximately 5 percentage points in root identification accuracy. An examination of the model's reasoning capabilities shows that while it can produce logically sound, etymologically informed analyses, it is also highly prone to factual hallucinations. This work highlights that while LLMs show significant promise in overcoming the OOV root problem, the inconsistency of their reasoning remains a major obstacle to their direct application, underscoring the need for further research into improving their factuality and consistency.
AB - Automatic morpheme segmentation, a crucial task for morphologically rich languages like Russian, is persistently hindered by a significant drop in performance on words containing out-of-vocabulary (OOV) roots. This issue affects even state-of-the-art models, such as fine-tuned BERT models. This study investigates the potential of modern Large Language Models (LLMs) to address this challenge, focusing on the specific task of root identification in Russian. We evaluate a diverse set of eight state-of-the-art LLMs, including proprietary and open-weight models, using a prompt-based, few-shot learning approach. The models' performance is benchmarked against two strong baselines, a fine-tuned RuRoberta model and a CNN ensemble, on a 500-word test set. Our results demonstrate that one model, Gemini 2.5 Pro, surpasses both baselines by approximately 5 percentage points in root identification accuracy. An examination of the model's reasoning capabilities shows that while it can produce logically sound, etymologically informed analyses, it is also highly prone to factual hallucinations. This work highlights that while LLMs show significant promise in overcoming the OOV root problem, the inconsistency of their reasoning remains a major obstacle to their direct application, underscoring the need for further research into improving their factuality and consistency.
UR - https://www.scopus.com/pages/publications/105029075142
UR - https://www.mendeley.com/catalogue/c1c52f0b-7250-3fcb-b6cb-062e80ebdfd4/
U2 - 10.14529/jsfi250305
DO - 10.14529/jsfi250305
M3 - Article
VL - 12
SP - 63
EP - 75
JO - Supercomputing Frontiers and Innovations
JF - Supercomputing Frontiers and Innovations
SN - 2409-6008
IS - 3
ER -
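
For readers curious what the prompt-based, few-shot root-identification setup described in the abstract might look like in practice, below is a minimal Python sketch. It is an illustration under assumptions, not the authors' actual prompt or evaluation harness: call_llm is a hypothetical stand-in for any chat-completion client (e.g. a Gemini or open-weight model wrapper), the few-shot examples are made up for illustration, and the paper's 500-word test set and exact prompt wording are not reproduced here.

# Minimal sketch of a few-shot, prompt-based root-identification evaluation,
# in the spirit of the study above. call_llm is a hypothetical placeholder
# for any chat-completion client; the examples below are illustrative only.
from typing import Callable

FEW_SHOT = [
    ("подводный", "вод"),   # "underwater" -> root -вод-
    ("перелесок", "лес"),   # "copse"      -> root -лес-
    ("учитель", "уч"),      # "teacher"    -> root -уч-
]

def build_prompt(word: str) -> str:
    """Assemble a few-shot prompt asking the model for the word's root morpheme."""
    parts = ["Identify the root morpheme of the Russian word. Answer with the root only."]
    for ex_word, ex_root in FEW_SHOT:
        parts.append(f"Word: {ex_word}\nRoot: {ex_root}")
    parts.append(f"Word: {word}\nRoot:")
    return "\n\n".join(parts)

def evaluate(call_llm: Callable[[str], str], test_set: list[tuple[str, str]]) -> float:
    """Exact-match root-identification accuracy over (word, gold_root) pairs."""
    correct = sum(
        call_llm(build_prompt(word)).strip().lower() == gold_root
        for word, gold_root in test_set
    )
    return correct / len(test_set)

Plugging in a real client only requires passing a function that sends the prompt and returns the model's text; scoring is plain exact match on the predicted root, the same accuracy-style metric family the abstract reports.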
ID: 74461329