Research output: Contribution to journal › Article › peer-review
Expert, Journal, and Automatic Classification of Full Texts and Annotations of Scientific Articles. / Selivanova, I. V.; Kosyakov, D. V.; Dubovitskii, D. A. et al.
In: Automatic documentation and mathematical linguistics, Vol. 55, No. 4, 07.2021, p. 178-189.
TY - JOUR
T1 - Expert, Journal, and Automatic Classification of Full Texts and Annotations of Scientific Articles
AU - Selivanova, I. V.
AU - Kosyakov, D. V.
AU - Dubovitskii, D. A.
AU - Guskov, A. E.
PY - 2021/7
Y1 - 2021/7
N2 - In this article we consider a fundamentally new information-theoretic approach to the classification of scientific texts based on compression algorithms. An analysis using the example of the comparative classification of full-text documents from arXiv.org and short annotations from Scopus showed that the accuracy of the proposed method is 87-92% and, in general, is not inferior to the existing ones. These conclusions were confirmed by an expert assessment.
AB - In this article we consider a fundamentally new information-theoretic approach to the classification of scientific texts based on compression algorithms. An analysis using the example of the comparative classification of full-text documents from arXiv.org and short annotations from Scopus showed that the accuracy of the proposed method is 87-92% and, in general, is not inferior to the existing ones. These conclusions were confirmed by an expert assessment.
KW - text classification methods
KW - data compression algorithms
KW - scientific texts
KW - arXiv.org
KW - Scopus
KW - k-nearest neighbors
KW - logistic regression
KW - random forests
KW - naive Bayesian classification
KW - support vector machines
KW - K-NEAREST NEIGHBOR
U2 - 10.3103/S0005105521040075
DO - 10.3103/S0005105521040075
M3 - Article
VL - 55
SP - 178
EP - 189
JO - Automatic documentation and mathematical linguistics
JF - Automatic documentation and mathematical linguistics
SN - 0005-1055
IS - 4
ER -
ID: 34690303