Standard

Automatic Aspect Extraction from Scientific Texts. / Marshalova, Anna; Bruches, Elena; Batura, Tatiana.

Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH, 2024. p. 67-80 6 (Communications in Computer and Information Science; Vol. 1905 CCIS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Marshalova, A, Bruches, E & Batura, T 2024, Automatic Aspect Extraction from Scientific Texts. in Communications in Computer and Information Science., 6, Communications in Computer and Information Science, vol. 1905 CCIS, Springer Science and Business Media Deutschland GmbH, pp. 67-80, 11th International Conference on Analysis of Images, Social Networks and Texts, Yerevan, Armenia, 28.09.2023. https://doi.org/10.1007/978-3-031-67008-4_6

APA

Marshalova, A., Bruches, E., & Batura, T. (2024). Automatic Aspect Extraction from Scientific Texts. In Communications in Computer and Information Science (pp. 67-80). [6] (Communications in Computer and Information Science; Vol. 1905 CCIS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-67008-4_6

Vancouver

Marshalova A, Bruches E, Batura T. Automatic Aspect Extraction from Scientific Texts. In Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH. 2024. p. 67-80. 6. (Communications in Computer and Information Science). doi: 10.1007/978-3-031-67008-4_6

Author

Marshalova, Anna ; Bruches, Elena ; Batura, Tatiana. / Automatic Aspect Extraction from Scientific Texts. Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH, 2024. pp. 67-80 (Communications in Computer and Information Science).

BibTeX

@inproceedings{967006a612034f3baf471b6886fdd005,
title = "Automatic Aspect Extraction from Scientific Texts",
abstract = "Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.",
keywords = "Aspect extraction, BERT fine-tuning, Dataset annotation, Scientific information extraction, Sequence labelling",
author = "Anna Marshalova and Elena Bruches and Tatiana Batura",
year = "2024",
doi = "10.1007/978-3-031-67008-4_6",
language = "English",
isbn = "9783031670077",
series = "Communications in Computer and Information Science",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "67--80",
booktitle = "Communications in Computer and Information Science",
address = "Germany",
note = "11th International Conference on Analysis of Images, Social Networks and Texts, AIST 2023 ; Conference date: 28-09-2023 Through 30-09-2023",

}

RIS

TY - GEN

T1 - Automatic Aspect Extraction from Scientific Texts

AU - Marshalova, Anna

AU - Bruches, Elena

AU - Batura, Tatiana

N1 - Conference code: 11

PY - 2024

Y1 - 2024

N2 - Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.

AB - Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.

KW - Aspect extraction

KW - BERT fine-tuning

KW - Dataset annotation

KW - Scientific information extraction

KW - Sequence labelling

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85200994417&origin=inward&txGid=778e7a7107844bf260bae80e395454f1

UR - https://www.mendeley.com/catalogue/9cfadbee-b35d-3491-b019-bd8ec9a68342/

U2 - 10.1007/978-3-031-67008-4_6

DO - 10.1007/978-3-031-67008-4_6

M3 - Conference contribution

SN - 9783031670077

T3 - Communications in Computer and Information Science

SP - 67

EP - 80

BT - Communications in Computer and Information Science

PB - Springer Science and Business Media Deutschland GmbH

T2 - 11th International Conference on Analysis of Images, Social Networks and Texts

Y2 - 28 September 2023 through 30 September 2023

ER -

ID: 61236561