Automatic Aspect Extraction from Scientific Texts

Standard

Automatic Aspect Extraction from Scientific Texts. / Marshalova, Anna; Bruches, Elena; Batura, Tatiana.

Communications in Computer and Information Science. Springer, 2024. p. 67-80 6 (Communications in Computer and Information Science; Vol. 1905 CCIS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Marshalova, A, Bruches, E & Batura, T 2024, Automatic Aspect Extraction from Scientific Texts. in Communications in Computer and Information Science., 6, Communications in Computer and Information Science, vol. 1905 CCIS, Springer, pp. 67-80, 11th International Conference on Analysis of Images, Social Networks and Texts, Yerevan, Armenia, 28.09.2023. https://doi.org/10.1007/978-3-031-67008-4_6

APA

Marshalova, A., Bruches, E., & Batura, T. (2024). Automatic Aspect Extraction from Scientific Texts. In Communications in Computer and Information Science (pp. 67-80). [6] (Communications in Computer and Information Science; Vol. 1905 CCIS). Springer. https://doi.org/10.1007/978-3-031-67008-4_6

Vancouver

Marshalova A, Bruches E, Batura T. Automatic Aspect Extraction from Scientific Texts. In Communications in Computer and Information Science. Springer. 2024. p. 67-80. 6. (Communications in Computer and Information Science). doi: 10.1007/978-3-031-67008-4_6

Author

Marshalova, Anna ; Bruches, Elena ; Batura, Tatiana. / Automatic Aspect Extraction from Scientific Texts. Communications in Computer and Information Science. Springer, 2024. pp. 67-80 (Communications in Computer and Information Science).

BibTeX

@inproceedings{967006a612034f3baf471b6886fdd005,
  title = "Automatic Aspect Extraction from Scientific Texts",
  abstract = "Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.",
  keywords = "Aspect extraction, BERT fine-tuning, Dataset annotation, Scientific information extraction, Sequence labelling",
  author = "Anna Marshalova and Elena Bruches and Tatiana Batura",
  year = "2024",
  doi = "10.1007/978-3-031-67008-4_6",
  language = "English",
  isbn = "9783031670077",
  series = "Communications in Computer and Information Science",
  publisher = "Springer",
  pages = "67--80",
  booktitle = "Communications in Computer and Information Science",
  address = "United States",
  note = "11th International Conference on Analysis of Images, Social Networks and Texts, AIST 2023 ; Conference date: 28-09-2023 Through 30-09-2023",
}

RIS

TY  - GEN
T1  - Automatic Aspect Extraction from Scientific Texts
AU  - Marshalova, Anna
AU  - Bruches, Elena
AU  - Batura, Tatiana
N1  - Conference code: 11
PY  - 2024
Y1  - 2024
AB  - Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.
KW  - Aspect extraction
KW  - BERT fine-tuning
KW  - Dataset annotation
KW  - Scientific information extraction
KW  - Sequence labelling
UR  - https://www.scopus.com/record/display.uri?eid=2-s2.0-85200994417&origin=inward&txGid=778e7a7107844bf260bae80e395454f1
UR  - https://www.mendeley.com/catalogue/9cfadbee-b35d-3491-b019-bd8ec9a68342/
U2  - 10.1007/978-3-031-67008-4_6
DO  - 10.1007/978-3-031-67008-4_6
M3  - Conference contribution
SN  - 9783031670077
T3  - Communications in Computer and Information Science
SP  - 67
EP  - 80
BT  - Communications in Computer and Information Science
PB  - Springer
T2  - 11th International Conference on Analysis of Images, Social Networks and Texts
Y2  - 28 September 2023 through 30 September 2023
ER  -

ID: 61236561