Research output: Contribution to journal › Article › peer-review
Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts. / Selivanova, I. V.; Ryabko, B. Y. A.; Guskov, A. E.
In: Automatic documentation and mathematical linguistics, Vol. 51, No. 3, 01.06.2017, p. 120-126.
TY - JOUR
T1 - Classification by Compression
T2 - Application of Information-Theory Methods for the Identification of Themes of Scientific Texts
AU - Selivanova, I. V.
AU - Ryabko, B. Y. A.
AU - Guskov, A. E.
PY - 2017/6/1
Y1 - 2017/6/1
N2 - A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated using data from an archive of scientific texts (arXiv.org) and from the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%, with accuracy depending on the quality of the original data.
AB - A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated using data from an archive of scientific texts (arXiv.org) and from the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%, with accuracy depending on the quality of the original data.
KW - classification
KW - thematic classification of texts
KW - information theory
KW - text compression
KW - arXiv.org
KW - CyberLeninka
U2 - 10.3103/S0005105517030116
DO - 10.3103/S0005105517030116
M3 - Article
VL - 51
SP - 120
EP - 126
JO - Automatic documentation and mathematical linguistics
JF - Automatic documentation and mathematical linguistics
SN - 0005-1055
IS - 3
ER -
ID: 25331515
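The abstract names compression-based classification but does not specify the compressor or decision rule. The sketch below shows one common form of the idea, under stated assumptions: each theme is represented by a single concatenated reference corpus, and a new text is assigned to the theme whose corpus it "compresses best with", i.e. the theme that needs the fewest extra compressed bytes when the text is appended. Python's standard `zlib` is used as a stand-in compressor; the function names `compressed_size` and `classify` and the toy corpora are hypothetical and are not the authors' implementation.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: Mapping[str, str]) -> str:
    """Assign `text` to the theme whose reference corpus it compresses best with.

    Assumption (not from the source): the cost of a theme is the number of
    extra compressed bytes needed when the text is appended to that theme's
    corpus; the theme with the smallest cost wins.
    """
    if not corpora:
        raise ValueError("at least one theme corpus is required")
    query = text.encode("utf-8")
    best_label, best_cost = "", None
    for label, corpus in corpora.items():
        reference = corpus.encode("utf-8")
        # Extra bytes the compressor spends on the query given this corpus.
        cost = compressed_size(reference + query) - compressed_size(reference)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


if __name__ == "__main__":
    # Hypothetical toy corpora: one concatenated training text per theme.
    corpora = {
        "physics": "quantum field theory particle energy " * 50,
        "biology": "cell protein gene dna organism " * 50,
    }
    print(classify("quantum field theory particle energy quantum field theory", corpora))
    print(classify("cell protein gene dna organism cell protein", corpora))
```

One design caveat for this sketch: zlib only looks back 32 KB, so with realistically sized corpora the query can only reuse the last 32 KB of each corpus. A compressor with a larger dictionary (for example Python's `lzma` module) or corpora trimmed to comparable sizes would be needed to keep the comparison fair across themes.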