Standard

Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts. / Selivanova, I. V.; Ryabko, B. Ya.; Guskov, A. E.

In: Automatic Documentation and Mathematical Linguistics, Vol. 51, No. 3, 01.06.2017, p. 120-126.

Research output: Contribution to journal › Article › peer-review

Vancouver

Selivanova IV, Ryabko BYa, Guskov AE. Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts. Automatic Documentation and Mathematical Linguistics. 2017 Jun 1;51(3):120-126. doi: 10.3103/S0005105517030116

Author

Selivanova, I. V.; Ryabko, B. Ya.; Guskov, A. E. / Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts. In: Automatic Documentation and Mathematical Linguistics. 2017; Vol. 51, No. 3. pp. 120-126.

BibTeX

@article{f461d05c43f74244a5a727b382e4bc57,
title = "Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts",
abstract = "A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated based on the data from an archive of scientific texts (arXiv.org) and in the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%; its accuracy depends on the quality of the original data.",
keywords = "classification, thematic classification of texts, information theory, text compression, arXiv.org, CyberLeninka",
author = "Selivanova, {I. V.} and Ryabko, {B. Ya.} and Guskov, {A. E.}",
year = "2017",
month = jun,
day = "1",
doi = "10.3103/S0005105517030116",
language = "English",
volume = "51",
pages = "120--126",
journal = "Automatic Documentation and Mathematical Linguistics",
issn = "0005-1055",
publisher = "Allerton Press Inc.",
number = "3",
}

RIS

TY  - JOUR
T1  - Classification by Compression
T2  - Application of Information-Theory Methods for the Identification of Themes of Scientific Texts
AU  - Selivanova, I. V.
AU  - Ryabko, B. Ya.
AU  - Guskov, A. E.
PY  - 2017/6/1
Y1  - 2017/6/1
N2  - A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated based on the data from an archive of scientific texts (arXiv.org) and in the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%; its accuracy depends on the quality of the original data.
AB  - A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated based on the data from an archive of scientific texts (arXiv.org) and in the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%; its accuracy depends on the quality of the original data.
KW  - classification
KW  - thematic classification of texts
KW  - information theory
KW  - text compression
KW  - arXiv.org
KW  - CyberLeninka
U2  - 10.3103/S0005105517030116
DO  - 10.3103/S0005105517030116
M3  - Article
VL  - 51
SP  - 120
EP  - 126
JO  - Automatic Documentation and Mathematical Linguistics
JF  - Automatic Documentation and Mathematical Linguistics
SN  - 0005-1055
IS  - 3
ER  -
