Cross-Domain Robustness of Transformer-Based Keyphrase Generation. / Glazkova, Anna; Morozov, Dmitry.
Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH, 2024. pp. 249-265 (Chapter 19). (Communications in Computer and Information Science; Vol. 2086 CCIS).
Research output: Chapter in book/report/conference proceedings › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - Cross-Domain Robustness of Transformer-Based Keyphrase Generation
AU - Glazkova, Anna
AU - Morozov, Dmitry
N1 - Conference code: 25
PY - 2024
Y1 - 2024
AB - Modern models for text generation show state-of-the-art results in many natural language processing tasks. In this work, we explore the effectiveness of abstractive text summarization models for keyphrase selection. A list of keyphrases is an important element of a text in databases and repositories of electronic documents. In our experiments, abstractive text summarization models fine-tuned for keyphrase generation show quite high results for a target text corpus. However, in most cases, the zero-shot performance on other corpora and domains is significantly lower. We investigate cross-domain limitations of abstractive text summarization models for keyphrase generation. We present an evaluation of the fine-tuned BART models for the keyphrase selection task across six benchmark corpora for keyphrase extraction including scientific texts from two domains and news texts. We explore the role of transfer learning between different domains to improve the BART model performance on small text corpora. Our experiments show that preliminary fine-tuning on out-of-domain corpora can be effective under conditions of a limited number of samples.
KW - BART
KW - Keyphrase extraction
KW - Scholarly document
KW - Text summarization
KW - Transfer learning
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85206363258&origin=inward&txGid=9723b476eda80e86bb31d2c08110c0e4
UR - https://www.mendeley.com/catalogue/4722ccdf-ff68-3981-942a-c448b4f3fd51/
U2 - 10.1007/978-3-031-67826-4_19
DO - 10.1007/978-3-031-67826-4_19
M3 - Conference contribution
SN - 978-3-031-67825-7
T3 - Communications in Computer and Information Science
SP - 249
EP - 265
BT - Communications in Computer and Information Science
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Data Analytics and Management in Data Intensive Domains
Y2 - 24 October 2023 through 27 October 2023
ER -
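For context, the approach described in the abstract, fine-tuning an abstractive summarization model so that it emits a text's keyphrases, can be sketched as a standard BART sequence-to-sequence setup. This is a minimal illustration, not the paper's implementation: the checkpoint, the ";"-delimited target format, the example data, and all hyperparameters are assumptions, since the record above does not specify the exact configuration.

# Minimal sketch: keyphrase generation cast as seq2seq fine-tuning with BART.
# Checkpoint, delimiter format, and hyperparameters are illustrative assumptions.
from transformers import BartForConditionalGeneration, BartTokenizerFast

model_name = "facebook/bart-base"  # assumption: the record does not name a checkpoint
tokenizer = BartTokenizerFast.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# One hypothetical training pair: a source text plus its gold keyphrases,
# serialized into a single delimiter-separated target string.
source = "We study cross-domain robustness of transformer-based keyphrase generation ..."
keyphrases = ["keyphrase generation", "BART", "transfer learning"]
target = "; ".join(keyphrases)

inputs = tokenizer(source, truncation=True, max_length=512, return_tensors="pt")
labels = tokenizer(target, truncation=True, max_length=64, return_tensors="pt").input_ids

# One optimization step; a real run loops over a fine-tuning corpus
# (and, per the abstract, may first pre-fine-tune on an out-of-domain corpus).
outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=labels)
outputs.loss.backward()

# At inference time, keyphrases are decoded with beam search and split on the delimiter.
generated = model.generate(inputs.input_ids, num_beams=4, max_length=64)
predicted = tokenizer.decode(generated[0], skip_special_tokens=True).split("; ")

Under this framing, the zero-shot cross-domain evaluation the abstract refers to amounts to running the generate-and-split step on texts from a corpus the model was never fine-tuned on.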