Research output: Contribution to journal › Conference article › peer-review
Identification of argumentative sentences in Russian scientific and popular science texts. / Salomatina, N. V.; Pimenov, I. S.; Sidorova, E. A.
In: Journal of Physics: Conference Series, Vol. 2099, No. 1, 012025, 13.12.2021.
TY - JOUR
T1 - Identification of argumentative sentences in Russian scientific and popular science texts
AU - Salomatina, N. V.
AU - Pimenov, I. S.
AU - Sidorova, E. A.
N1 - Funding Information: The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0015). Publisher Copyright: © 2021 Institute of Physics Publishing. All rights reserved.
PY - 2021/12/13
Y1 - 2021/12/13
N2 - In this study we analyze the applicability of specific machine learning algorithms to the task of detecting sentences containing argumentation in Russian text. We employ a collection of scientific and popular science texts with manually annotated argumentation to evaluate the quality of identifying argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: MNB, SVM, and MLP. The bag of words model is used for representing texts. Lemmas of words in analyzed sentences serve as features for the classification. We perform the automatic selection of informative features in accordance with Variance and χ² criteria combined with the weight-based filtration of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, while the test set contains 180. The MNB algorithm demonstrates the highest F-measure and recall scores on almost all feature sets (maximal values reached equal 68.7% and 89% respectively), while the MLP algorithm shows the best precision for about half of feature selection variations (the maximal value is 72.5%).
AB - In this study we analyze the applicability of specific machine learning algorithms to the task of detecting sentences containing argumentation in Russian text. We employ a collection of scientific and popular science texts with manually annotated argumentation to evaluate the quality of identifying argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: MNB, SVM, and MLP. The bag of words model is used for representing texts. Lemmas of words in analyzed sentences serve as features for the classification. We perform the automatic selection of informative features in accordance with Variance and χ² criteria combined with the weight-based filtration of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, while the test set contains 180. The MNB algorithm demonstrates the highest F-measure and recall scores on almost all feature sets (maximal values reached equal 68.7% and 89% respectively), while the MLP algorithm shows the best precision for about half of feature selection variations (the maximal value is 72.5%).
UR - http://www.scopus.com/inward/record.url?scp=85123692447&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2099/1/012025
DO - 10.1088/1742-6596/2099/1/012025
M3 - Conference article
AN - SCOPUS:85123692447
VL - 2099
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
SN - 1742-6588
IS - 1
M1 - 012025
T2 - International Conference on Marchuk Scientific Readings 2021, MSR 2021
Y2 - 4 October 2021 through 8 October 2021
ER -
ID: 35379061