Hybrid approach to automatic summarization of scientific and technical texts

Standard

Hybrid approach to automatic summarization of scientific and technical texts. / Bakiyeva, Aigerim M.; Batura, Tatiana V.

In: Journal of Theoretical and Applied Information Technology, Vol. 98, No. 4, 29.02.2020, pp. 559-570.

Research output: Contribution to journal › Article › peer-review

Harvard

Bakiyeva, AM & Batura, TV 2020, 'Hybrid approach to automatic summarization of scientific and technical texts', Journal of Theoretical and Applied Information Technology, vol. 98, no. 4, pp. 559-570. <http://www.jatit.org/volumes/Vol98No4/3Vol98No4.pdf>

APA

Bakiyeva, A. M., & Batura, T. V. (2020). Hybrid approach to automatic summarization of scientific and technical texts. Journal of Theoretical and Applied Information Technology, 98(4), 559-570. http://www.jatit.org/volumes/Vol98No4/3Vol98No4.pdf

Vancouver

Bakiyeva AM, Batura TV. Hybrid approach to automatic summarization of scientific and technical texts. Journal of Theoretical and Applied Information Technology. 2020 Feb 29;98(4):559-570.

Author

Bakiyeva, Aigerim M. ; Batura, Tatiana V. / Hybrid approach to automatic summarization of scientific and technical texts. In: Journal of Theoretical and Applied Information Technology. 2020 ; Vol. 98, No. 4. pp. 559-570.

BibTeX

@article{969333a3403d47089d44cb1510df4a21,

title = "Hybrid approach to automatic summarization of scientific and technical texts",

abstract = "The paper is devoted to methods of automatic summarization that represent a text as a graph, and it attempts a formal description of the text transformation in terms of predicate calculus. The proposed method combines a linguistic knowledge base, graph representation of texts, and machine learning. Fragments of a text, such as words, sentences, and paragraphs, are represented as graph nodes, and relations between nodes, for example rhetorical relations, are denoted by edges. Automatic determination of rhetorical relations in the text makes it possible to locate the nucleus and the satellite. To compile a brief annotation, the original text is transformed on the assumption that the nucleus contains the most important part of the statement. The relations between discursive markers in the text define a hierarchy that helps solve various natural language processing problems, including automatically compiling a short abstract of a large text. The summarization process developed by the authors consists of six main steps: preprocessing, topic modeling, rhetorical analysis and transformation, weight evaluation, sentence selection, and smoothing. Topic modeling is used to discover key terms. First, unigram topic models, which contain only one-word terms, are constructed. These models are then expanded by adding multiword terms. The most significant fragments of the source document are determined during rhetorical analysis using discursive markers. Representing texts as graphs helps to demonstrate the text transformations needed to highlight important fragments. The assessment of the importance of text fragments also takes into account keywords, multiword terms, and scientific terms that characterize scientific and technical texts. A linguistic knowledge base was created to store the marker information.
The final step in forming the annotation is smoothing, a text conversion procedure that makes the resulting abstract (annotation) more coherent and consistent. The importance of sentences is determined using discursive markers and connectors. We used additive regularization for topic modeling (ARTM) to extract keywords and discover the topics. The proposed hybrid BigARTM and RAKE method for obtaining topic models, together with abstract generation using RST markers, actions, and templates, proved effective and efficient in testing and in comparison with other methods, as measured by precision, recall, and F-measure calculated in a way similar to [2, 10].",

keywords = "Automatic Text Processing, Discursive Marker, Rhetorical Relationships, Semantics, Text Analysis, Theory of Rhetorical Structures",

author = "Bakiyeva, {Aigerim M.} and Batura, {Tatiana V.}",

year = "2020",

month = feb,

day = "29",

language = "English",

volume = "98",

pages = "559--570",

journal = "Journal of Theoretical and Applied Information Technology",

issn = "1992-8645",

publisher = "Asian Research Publishing Network (ARPN)",

number = "4",

}

RIS

TY - JOUR

T1 - Hybrid approach to automatic summarization of scientific and technical texts

AU - Bakiyeva, Aigerim M.

AU - Batura, Tatiana V.

PY - 2020/2/29

Y1 - 2020/2/29

AB - The paper is devoted to methods of automatic summarization that represent a text as a graph, and it attempts a formal description of the text transformation in terms of predicate calculus. The proposed method combines a linguistic knowledge base, graph representation of texts, and machine learning. Fragments of a text, such as words, sentences, and paragraphs, are represented as graph nodes, and relations between nodes, for example rhetorical relations, are denoted by edges. Automatic determination of rhetorical relations in the text makes it possible to locate the nucleus and the satellite. To compile a brief annotation, the original text is transformed on the assumption that the nucleus contains the most important part of the statement. The relations between discursive markers in the text define a hierarchy that helps solve various natural language processing problems, including automatically compiling a short abstract of a large text. The summarization process developed by the authors consists of six main steps: preprocessing, topic modeling, rhetorical analysis and transformation, weight evaluation, sentence selection, and smoothing. Topic modeling is used to discover key terms. First, unigram topic models, which contain only one-word terms, are constructed. These models are then expanded by adding multiword terms. The most significant fragments of the source document are determined during rhetorical analysis using discursive markers. Representing texts as graphs helps to demonstrate the text transformations needed to highlight important fragments. The assessment of the importance of text fragments also takes into account keywords, multiword terms, and scientific terms that characterize scientific and technical texts. A linguistic knowledge base was created to store the marker information.
The final step in forming the annotation is smoothing, a text conversion procedure that makes the resulting abstract (annotation) more coherent and consistent. The importance of sentences is determined using discursive markers and connectors. We used additive regularization for topic modeling (ARTM) to extract keywords and discover the topics. The proposed hybrid BigARTM and RAKE method for obtaining topic models, together with abstract generation using RST markers, actions, and templates, proved effective and efficient in testing and in comparison with other methods, as measured by precision, recall, and F-measure calculated in a way similar to [2, 10].

KW - Automatic Text Processing

KW - Discursive Marker

KW - Rhetorical Relationships

KW - Semantics

KW - Text Analysis

KW - Theory of Rhetorical Structures

UR - http://www.scopus.com/inward/record.url?scp=85081355627&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85081355627

VL - 98

SP - 559

EP - 570

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 4

ER -

ID: 23758714