Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Exploring Fine-Tuned Generative Models for Keyphrase Selection: A Case Study for Russian. / Glazkova, Anna; Morozov, Dmitry.
Data Analytics and Management in Data Intensive Domains. ed. / Panos Pardalos; Eduard Babkin; Nikolay Zolotykh; Sergey Stupnikov. Springer, 2026. p. 98-111 7 (Communications in Computer and Information Science; Vol. 2641 CCIS).
TY - GEN
T1 - Exploring Fine-Tuned Generative Models for Keyphrase Selection: A Case Study for Russian
AU - Glazkova, Anna
AU - Morozov, Dmitry
N1 - Conference code: 26
PY - 2026
Y1 - 2026
AB - Keyphrase selection plays a pivotal role within the domain of scholarly texts, facilitating efficient information retrieval, summarization, and indexing. In this work, we explored how to apply fine-tuned generative transformer-based models to the specific task of keyphrase selection within Russian scientific texts. We experimented with four generative models (ruT5, ruGPT, mT5, and mBART) and evaluated their performance in both in-domain and cross-domain settings. The experiments were conducted on Russian scientific abstracts from four domains: mathematics & computer science, history, medicine, and linguistics. The use of generative models, namely mBART, led to gains in in-domain performance (up to 4.9% in BERTScore, 9.0% in ROUGE-1, and 12.2% in F1-score) over three keyphrase extraction baselines for the Russian language. Although the results for cross-domain usage were significantly lower, they still surpassed the baselines in several cases, underscoring the potential for further exploration and refinement in this research field.
KW - Keyphrase Selection
KW - Keywords
KW - Scholarly Documents
KW - Sequence-to-sequence Models
KW - Text Generation
KW - Text Summarization
KW - mBART
UR - https://www.scopus.com/pages/publications/105021001901
UR - https://www.mendeley.com/catalogue/f0b0b62f-fd49-3584-b4bb-de0b7188a4b0/
U2 - 10.1007/978-3-032-03997-2_7
DO - 10.1007/978-3-032-03997-2_7
M3 - Conference contribution
SN - 978-3-032-03996-5
T3 - Communications in Computer and Information Science
SP - 98
EP - 111
BT - Data Analytics and Management in Data Intensive Domains
A2 - Pardalos, Panos
A2 - Babkin, Eduard
A2 - Zolotykh, Nikolay
A2 - Stupnikov, Sergey
PB - Springer
T2 - 26th International Conference Data Analytics and Management in Data Intensive Domains
Y2 - 23 October 2024 through 25 October 2024
ER -
ID: 72143660