Standard

A method for automatic text summarization based on rhetorical analysis and topic modeling. / Batura, Tatiana; Bakiyeva, Aigerim; Charintseva, Maria.

In: International Journal of Computing, Vol. 19, No. 1, 01.01.2020, p. 118-127.

Research output: Contribution to journal › Article › peer-review

Harvard

Batura, T, Bakiyeva, A & Charintseva, M 2020, 'A method for automatic text summarization based on rhetorical analysis and topic modeling', International Journal of Computing, vol. 19, no. 1, pp. 118-127.

APA

Batura, T., Bakiyeva, A., & Charintseva, M. (2020). A method for automatic text summarization based on rhetorical analysis and topic modeling. International Journal of Computing, 19(1), 118-127.

Vancouver

Batura T, Bakiyeva A, Charintseva M. A method for automatic text summarization based on rhetorical analysis and topic modeling. International Journal of Computing. 2020 Jan 1;19(1):118-127.

Author

Batura, Tatiana ; Bakiyeva, Aigerim ; Charintseva, Maria. / A method for automatic text summarization based on rhetorical analysis and topic modeling. In: International Journal of Computing. 2020 ; Vol. 19, No. 1. pp. 118-127.

BibTeX

@article{6e53409b80f64ce0a5c6d2b58ba5e2e7,
title = "A method for automatic text summarization based on rhetorical analysis and topic modeling",
abstract = "This article describes the original method of automatic summarization of scientific and technical texts based on rhetorical analysis and using topic modeling. The proposed method combines the use of a linguistic knowledge base and machine learning. For the detection of key terms, we used topic modeling. First, unigram topic models containing only one-word terms are constructed. Further, these models are extended by adding multiword terms. The most significant fragments of the original document are determined in the process of rhetorical analysis with the help of discursive markers. When evaluating the importance of text fragments, keywords, multiword terms, and scientific lexicon characterizing scientific and technical texts are also taken into account. A linguistic knowledge base has been created to store information about the markers and scientific lexicon. The experiments showed that this method is effective, needs a comparatively small amount of training data and can be adapted to processing texts of different subject fields in other languages.",
keywords = "Additive regularization, Automatic summarization, Discourse markers, Natural language processing, Rhetorical structure theory, Topic modeling",
author = "Tatiana Batura and Aigerim Bakiyeva and Maria Charintseva",
year = "2020",
month = jan,
day = "1",
language = "English",
volume = "19",
pages = "118--127",
journal = "International Journal of Computing",
issn = "1727-6209",
publisher = "Research Institute of Intelligent Computer Systems",
number = "1",
}

RIS

TY - JOUR

T1 - A method for automatic text summarization based on rhetorical analysis and topic modeling

AU - Batura, Tatiana

AU - Bakiyeva, Aigerim

AU - Charintseva, Maria

PY - 2020/1/1

Y1 - 2020/1/1

N2 - This article describes the original method of automatic summarization of scientific and technical texts based on rhetorical analysis and using topic modeling. The proposed method combines the use of a linguistic knowledge base and machine learning. For the detection of key terms, we used topic modeling. First, unigram topic models containing only one-word terms are constructed. Further, these models are extended by adding multiword terms. The most significant fragments of the original document are determined in the process of rhetorical analysis with the help of discursive markers. When evaluating the importance of text fragments, keywords, multiword terms, and scientific lexicon characterizing scientific and technical texts are also taken into account. A linguistic knowledge base has been created to store information about the markers and scientific lexicon. The experiments showed that this method is effective, needs a comparatively small amount of training data and can be adapted to processing texts of different subject fields in other languages.

KW - Additive regularization

KW - Automatic summarization

KW - Discourse markers

KW - Natural language processing

KW - Rhetorical structure theory

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=85086101281&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85086101281

VL - 19

SP - 118

EP - 127

JO - International Journal of Computing

JF - International Journal of Computing

SN - 1727-6209

IS - 1

ER -