Parallel text document clustering based on genetic algorithm

Standard

Parallel text document clustering based on genetic algorithm. / Mansurova, Madina; Barakhnin, Vladimir; Aubakirov, Sanzhar et al.

In: CEUR Workshop Proceedings, Vol. 1839, 2017, p. 218-232.

Research output: Contribution to journal › Article › peer-review

Harvard

Mansurova, M, Barakhnin, V, Aubakirov, S, Khibatkhanuly, Y & Mussina, A 2017, 'Parallel text document clustering based on genetic algorithm', CEUR Workshop Proceedings, vol. 1839, pp. 218-232.

APA

Mansurova, M., Barakhnin, V., Aubakirov, S., Khibatkhanuly, Y., & Mussina, A. (2017). Parallel text document clustering based on genetic algorithm. CEUR Workshop Proceedings, 1839, 218-232.

Vancouver

Mansurova M, Barakhnin V, Aubakirov S, Khibatkhanuly Y, Mussina A. Parallel text document clustering based on genetic algorithm. CEUR Workshop Proceedings. 2017;1839:218-232.

Author

Mansurova, Madina ; Barakhnin, Vladimir ; Aubakirov, Sanzhar et al. / Parallel text document clustering based on genetic algorithm. In: CEUR Workshop Proceedings. 2017 ; Vol. 1839. pp. 218-232.

BibTeX

@article{fa8ba098697e464781c72a451d2e92ae,

title = "Parallel text document clustering based on genetic algorithm",

abstract = "This work describes parallel implementation of the text document clustering algorithm. The algorithm is based on evaluation of the similarity between objects in a competitive situation, which leads to the notion of the function of rival similarity. Attributes of bibliographic description of scientific articles were chosen as the scales for determining similarity measure. To find the weighting coefficients which are used in the formula of similarity measure a genetic algorithm is developed. To speed up the performance of the algorithm, parallel computing technologies are used. Parallelization is executed in two stages: in the stage of the genetic algorithm, as well as directly in clustering. The parallel genetic algorithm is implemented with the help of MPJ Express library and the parallel clustering algorithm using the Java 8 Streams library. The results of computational experiments showing benefits of the parallel implementation of the algorithm are presented.",

keywords = "Clustering algorithm, Genetic algorithm, Parallel computing",

author = "Madina Mansurova and Vladimir Barakhnin and Sanzhar Aubakirov and Yerzhan Khibatkhanuly and Aigerim Mussina",

year = "2017",

language = "English",

volume = "1839",

pages = "218--232",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

RIS

TY - JOUR

T1 - Parallel text document clustering based on genetic algorithm

AU - Mansurova, Madina

AU - Barakhnin, Vladimir

AU - Aubakirov, Sanzhar

AU - Khibatkhanuly, Yerzhan

AU - Mussina, Aigerim

PY - 2017

Y1 - 2017

N2 - This work describes parallel implementation of the text document clustering algorithm. The algorithm is based on evaluation of the similarity between objects in a competitive situation, which leads to the notion of the function of rival similarity. Attributes of bibliographic description of scientific articles were chosen as the scales for determining similarity measure. To find the weighting coefficients which are used in the formula of similarity measure a genetic algorithm is developed. To speed up the performance of the algorithm, parallel computing technologies are used. Parallelization is executed in two stages: in the stage of the genetic algorithm, as well as directly in clustering. The parallel genetic algorithm is implemented with the help of MPJ Express library and the parallel clustering algorithm using the Java 8 Streams library. The results of computational experiments showing benefits of the parallel implementation of the algorithm are presented.

KW - Clustering algorithm

KW - Genetic algorithm

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=85020491808&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85020491808

VL - 1839

SP - 218

EP - 232

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -
