Research output: Contribution to journal › Article › peer-review
Parallel text document clustering based on genetic algorithm. / Mansurova, Madina; Barakhnin, Vladimir; Aubakirov, Sanzhar et al.
In: CEUR Workshop Proceedings, Vol. 1839, 2017, p. 218-232.
TY - JOUR
T1 - Parallel text document clustering based on genetic algorithm
AU - Mansurova, Madina
AU - Barakhnin, Vladimir
AU - Aubakirov, Sanzhar
AU - Khibatkhanuly, Yerzhan
AU - Mussina, Aigerim
PY - 2017
Y1 - 2017
N2 - This work describes a parallel implementation of a text document clustering algorithm. The algorithm is based on evaluating the similarity between objects in a competitive situation, which leads to the notion of the function of rival similarity. Attributes of the bibliographic description of scientific articles were chosen as the scales for determining the similarity measure. A genetic algorithm is developed to find the weighting coefficients used in the similarity measure formula. To speed up the algorithm, parallel computing technologies are used. Parallelization is carried out in two stages: in the genetic algorithm and in the clustering itself. The parallel genetic algorithm is implemented with the MPJ Express library, and the parallel clustering algorithm with the Java 8 Streams library. The results of computational experiments showing the benefits of the parallel implementation are presented.
AB - This work describes a parallel implementation of a text document clustering algorithm. The algorithm is based on evaluating the similarity between objects in a competitive situation, which leads to the notion of the function of rival similarity. Attributes of the bibliographic description of scientific articles were chosen as the scales for determining the similarity measure. A genetic algorithm is developed to find the weighting coefficients used in the similarity measure formula. To speed up the algorithm, parallel computing technologies are used. Parallelization is carried out in two stages: in the genetic algorithm and in the clustering itself. The parallel genetic algorithm is implemented with the MPJ Express library, and the parallel clustering algorithm with the Java 8 Streams library. The results of computational experiments showing the benefits of the parallel implementation are presented.
KW - Clustering algorithm
KW - Genetic algorithm
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85020491808&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85020491808
VL - 1839
SP - 218
EP - 232
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
SN - 1613-0073
ER -
ID: 9410924