Research output: Contribution to journal › Article › Peer-review
A method for automatic text summarization based on rhetorical analysis and topic modeling. / Batura, Tatiana; Bakiyeva, Aigerim; Charintseva, Maria.
In: International Journal of Computing, Vol. 19, No. 1, 01.01.2020, pp. 118-127.
TY - JOUR
T1 - A method for automatic text summarization based on rhetorical analysis and topic modeling
AU - Batura, Tatiana
AU - Bakiyeva, Aigerim
AU - Charintseva, Maria
PY - 2020/1/1
Y1 - 2020/1/1
AB - This article describes an original method for the automatic summarization of scientific and technical texts based on rhetorical analysis and topic modeling. The proposed method combines a linguistic knowledge base with machine learning. Topic modeling is used to detect key terms: first, unigram topic models containing only single-word terms are constructed, and these models are then extended with multiword terms. The most significant fragments of the source document are identified during rhetorical analysis with the help of discourse markers. When the importance of text fragments is evaluated, keywords, multiword terms, and the scientific lexicon characteristic of scientific and technical texts are also taken into account. A linguistic knowledge base was created to store information about the markers and the scientific lexicon. The experiments showed that the method is effective, requires a comparatively small amount of training data, and can be adapted to texts from other subject fields and in other languages.
KW - Additive regularization
KW - Automatic summarization
KW - Discourse markers
KW - Natural language processing
KW - Rhetorical structure theory
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85086101281&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85086101281
VL - 19
SP - 118
EP - 127
JO - International Journal of Computing
JF - International Journal of Computing
SN - 1727-6209
IS - 1
ER -
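The abstract states that key terms are detected with topic models, and the keyword list mentions additive regularization (ARTM). The record gives neither the authors' regularizers nor their parameters, so what follows is only a minimal sketch under that assumption. It runs EM for a topic model in which the M-step adds a constant `beta` to the word-topic counts. This is ARTM with a single smoothing (`beta > 0`) or sparsing (`beta < 0`) regularizer, and with `beta = 0` it reduces to plain PLSA. The function names `plsa` and `top_terms` are hypothetical. The most probable words of each topic serve as candidate single-word key terms.

```python
import random
from collections import Counter


def plsa(docs, num_topics, iters=50, beta=0.0, seed=0):
    """EM for a PLSA-style topic model.

    docs: list of token lists. Returns (phi, theta) where
    phi[t][w] = p(w | t) and theta[d][t] = p(t | d).
    Assumption: beta is a single additive regularizer on the word-topic
    counts (ARTM-style smoothing if beta > 0, sparsing if beta < 0);
    beta = 0 gives plain PLSA.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    # Random positive initialisation of p(w | t), normalized per topic.
    phi = []
    for _ in range(num_topics):
        row = {w: rng.random() + 1e-3 for w in vocab}
        total = sum(row.values())
        phi.append({w: v / total for w, v in row.items()})
    theta = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iters):
        n_wt = [dict.fromkeys(vocab, 0.0) for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n_dw in cnt.items():
                # E-step: p(t | d, w) is proportional to phi[t][w] * theta[d][t].
                p = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(p)
                if z == 0.0:
                    continue
                for t in range(num_topics):
                    r = n_dw * p[t] / z
                    n_wt[t][w] += r
                    n_td[d][t] += r
        # M-step: add the regularizer term, clip at zero, renormalize.
        for t in range(num_topics):
            row = {w: max(v + beta, 0.0) for w, v in n_wt[t].items()}
            total = sum(row.values())
            phi[t] = {w: (v / total if total else 0.0) for w, v in row.items()}
        for d in range(len(docs)):
            total = sum(n_td[d])
            theta[d] = [(v / total if total else 0.0) for v in n_td[d]]
    return phi, theta


def top_terms(phi, k):
    """The k most probable words of each topic (candidate key terms)."""
    return [sorted(row, key=row.get, reverse=True)[:k] for row in phi]
```

With `num_topics=1` the fitted `phi` is simply the normalized word frequencies, which makes a convenient sanity check. The extension to multiword terms described in the abstract would require adding n-gram tokens to the vocabulary, and this sketch does not attempt that.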
ID: 24517288
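According to the abstract, fragments are ranked using discourse markers, keywords, multiword terms and a scientific lexicon stored in a linguistic knowledge base. The sketch below shows one simple way to combine such features into an extractive summary. It is not the published method. The marker list, the lexicon, the weights and the bigram rule for multiword terms are all invented for illustration, and the names `score_sentence`, `multiword_terms` and `summarize` are hypothetical. The sketch also only matches markers on the surface and does not build a rhetorical structure tree.

```python
import re
from collections import Counter

# Hypothetical knowledge-base contents: the record does not list the
# actual markers, lexicon or weights used in the article.
DISCOURSE_MARKERS = {"we propose": 2.0, "in conclusion": 2.0,
                     "the results show": 1.5, "however": 0.5}
SCIENTIFIC_LEXICON = {"method", "approach", "experiment", "evaluation"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def multiword_terms(sentences, key_terms, min_count=2):
    """Hypothetical rule: adjacent pairs of key terms seen >= min_count times."""
    pairs = Counter()
    for s in sentences:
        toks = tokenize(s)
        for a, b in zip(toks, toks[1:]):
            if a in key_terms and b in key_terms:
                pairs[(a, b)] += 1
    return {f"{a} {b}" for (a, b), c in pairs.items() if c >= min_count}


def score_sentence(sentence, key_terms, mw_terms,
                   w_key=1.0, w_mw=2.0, w_lex=0.5):
    """Linear weighted score; all weights are illustrative assumptions."""
    toks = tokenize(sentence)
    padded = f" {' '.join(toks)} "  # surrounding spaces give whole-word matching
    score = sum(w for m, w in DISCOURSE_MARKERS.items() if f" {m} " in padded)
    score += w_key * sum(t in key_terms for t in toks)
    score += w_mw * sum(padded.count(f" {m} ") for m in mw_terms)
    score += w_lex * sum(t in SCIENTIFIC_LEXICON for t in toks)
    return score


def summarize(text, key_terms, k=2):
    """Pick the k highest-scoring sentences, returned in document order."""
    sentences = split_sentences(text)
    mw = multiword_terms(sentences, key_terms)
    scored = [(score_sentence(s, key_terms, mw), i) for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda x: (-x[0], x[1]))[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda x: x[1])]
```

In practice `key_terms` could be taken from the most probable words of a fitted topic model. Ties are broken by position, so earlier sentences win, and the selected sentences are returned in their original order so that the summary reads coherently.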