Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Automatic Aspect Extraction from Scientific Texts. / Marshalova, Anna; Bruches, Elena; Batura, Tatiana.
Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH, 2024. p. 67-80 6 (Communications in Computer and Information Science; Vol. 1905 CCIS).
TY - GEN
T1 - Automatic Aspect Extraction from Scientific Texts
AU - Marshalova, Anna
AU - Bruches, Elena
AU - Batura, Tatiana
N1 - Conference code: 11
PY - 2024
Y1 - 2024
AB - Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.
KW - Aspect extraction
KW - BERT fine-tuning
KW - Dataset annotation
KW - Scientific information extraction
KW - Sequence labelling
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85200994417&origin=inward&txGid=778e7a7107844bf260bae80e395454f1
UR - https://www.mendeley.com/catalogue/9cfadbee-b35d-3491-b019-bd8ec9a68342/
U2 - 10.1007/978-3-031-67008-4_6
DO - 10.1007/978-3-031-67008-4_6
M3 - Conference contribution
SN - 9783031670077
T3 - Communications in Computer and Information Science
SP - 67
EP - 80
BT - Communications in Computer and Information Science
PB - Springer Science and Business Media Deutschland GmbH
T2 - 11th International Conference on Analysis of Images, Social Networks and Texts
Y2 - 28 September 2023 through 30 September 2023
ER -