Standard

Identification of argumentative sentences in Russian scientific and popular science texts. / Salomatina, N. V.; Pimenov, I. S.; Sidorova, E. A.

In: Journal of Physics: Conference Series, Vol. 2099, No. 1, 012025, 13.12.2021.

Research output: Contribution to journal, Conference article, peer-review

Harvard

Salomatina, NV, Pimenov, IS & Sidorova, EA 2021, 'Identification of argumentative sentences in Russian scientific and popular science texts', Journal of Physics: Conference Series, vol. 2099, no. 1, 012025. https://doi.org/10.1088/1742-6596/2099/1/012025

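APA

Salomatina, N. V., Pimenov, I. S., & Sidorova, E. A. (2021). Identification of argumentative sentences in Russian scientific and popular science texts. Journal of Physics: Conference Series, 2099(1), Article 012025. https://doi.org/10.1088/1742-6596/2099/1/012025
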
Vancouver

Salomatina NV, Pimenov IS, Sidorova EA. Identification of argumentative sentences in Russian scientific and popular science texts. Journal of Physics: Conference Series. 2021 Dec 13;2099(1):012025. doi: 10.1088/1742-6596/2099/1/012025

Author

Salomatina, N. V. ; Pimenov, I. S. ; Sidorova, E. A. / Identification of argumentative sentences in Russian scientific and popular science texts. In: Journal of Physics: Conference Series. 2021 ; Vol. 2099, No. 1.

BibTeX

@article{2d95206ff5ce47afb84fc5066198577f,
title = "Identification of argumentative sentences in Russian scientific and popular science texts",
abstract = "In this study we analyze the applicability of specific machine learning algorithms to the task of detecting sentences containing argumentation in Russian text. We employ a collection of scientific and popular science texts with manually annotated argumentation to evaluate the quality of identifying argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: MNB, SVM, and MLP. The bag of words model is used for representing texts. Lemmas of words in analyzed sentences serve as features for the classification. We perform the automatic selection of informative features in accordance with Variance and $\chi^2$ criteria combined with the weight-based filtration of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, while the test set contains 180. The MNB algorithm demonstrates the highest F-measure and recall scores on almost all feature sets (maximal values reached equal 68.7% and 89% respectively), while the MLP algorithm shows the best precision for about half of feature selection variations (the maximal value is 72.5%).",
author = "Salomatina, {N. V.} and Pimenov, {I. S.} and Sidorova, {E. A.}",
note = "Funding Information: The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0015). Publisher Copyright: {\textcopyright} 2021 Institute of Physics Publishing. All rights reserved. International Conference on Marchuk Scientific Readings 2021, MSR 2021; Conference date: 04-10-2021 through 08-10-2021",
year = "2021",
month = dec,
day = "13",
doi = "10.1088/1742-6596/2099/1/012025",
language = "English",
volume = "2099",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "1",
}

RIS

TY - JOUR

T1 - Identification of argumentative sentences in Russian scientific and popular science texts

AU - Salomatina, N. V.

AU - Pimenov, I. S.

AU - Sidorova, E. A.

N1 - Funding Information: The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0015). Publisher Copyright: © 2021 Institute of Physics Publishing. All rights reserved.

PY - 2021/12/13

Y1 - 2021/12/13

AB - In this study we analyze the applicability of specific machine learning algorithms to the task of detecting sentences containing argumentation in Russian text. We employ a collection of scientific and popular science texts with manually annotated argumentation to evaluate the quality of identifying argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: MNB, SVM, and MLP. The bag of words model is used for representing texts. Lemmas of words in analyzed sentences serve as features for the classification. We perform the automatic selection of informative features in accordance with Variance and χ² criteria combined with the weight-based filtration of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, while the test set contains 180. The MNB algorithm demonstrates the highest F-measure and recall scores on almost all feature sets (maximal values reached equal 68.7% and 89% respectively), while the MLP algorithm shows the best precision for about half of feature selection variations (the maximal value is 72.5%).

UR - http://www.scopus.com/inward/record.url?scp=85123692447&partnerID=8YFLogxK

U2 - 10.1088/1742-6596/2099/1/012025

DO - 10.1088/1742-6596/2099/1/012025

M3 - Conference article

AN - SCOPUS:85123692447

VL - 2099

JO - Journal of Physics: Conference Series

JF - Journal of Physics: Conference Series

SN - 1742-6588

IS - 1

M1 - 012025

T2 - International Conference on Marchuk Scientific Readings 2021, MSR 2021

Y2 - 4 October 2021 through 8 October 2021

ER -

ID: 35379061