Standard
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. / Bakarov, Amir; Gureenkova, Olga.
Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers. ed. / WMP VanDerAalst; DI Ignatov; M Khachay; SO Kuznetsov; Lempitsky; IA Lomazova; N Loukachevitch; A Napoli; A Panchenko; PM Pardalos; AV Savchenko; S Wasserman. Springer-Verlag GmbH and Co. KG, 2018. p. 16-21 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10716 LNCS).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Harvard
Bakarov, A & Gureenkova, O 2018,
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. in WMP VanDerAalst, DI Ignatov, M Khachay, SO Kuznetsov, Lempitsky, IA Lomazova, N Loukachevitch, A Napoli, A Panchenko, PM Pardalos, AV Savchenko & S Wasserman (eds),
Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10716 LNCS, Springer-Verlag GmbH and Co. KG, pp. 16-21, 6th International Conference on Analysis of Images, Social Networks and Texts, AIST 2017, Moscow, Russian Federation,
27.07.2017.
https://doi.org/10.1007/978-3-319-73013-4_2
APA
Bakarov, A., & Gureenkova, O. (2018).
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. In W. M. P. VanDerAalst, D. I. Ignatov, M. Khachay, S. O. Kuznetsov, Lempitsky, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Panchenko, P. M. Pardalos, A. V. Savchenko, & S. Wasserman (Eds.),
Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers (pp. 16-21). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10716 LNCS). Springer-Verlag GmbH and Co. KG.
https://doi.org/10.1007/978-3-319-73013-4_2
Vancouver
Bakarov A, Gureenkova O.
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. In VanDerAalst WMP, Ignatov DI, Khachay M, Kuznetsov SO, Lempitsky, Lomazova IA, Loukachevitch N, Napoli A, Panchenko A, Pardalos PM, Savchenko AV, Wasserman S, editors, Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers. Springer-Verlag GmbH and Co. KG. 2018. p. 16-21. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-73013-4_2
Author
Bakarov, Amir ; Gureenkova, Olga. /
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers. editor / WMP VanDerAalst ; DI Ignatov ; M Khachay ; SO Kuznetsov ; Lempitsky ; IA Lomazova ; N Loukachevitch ; A Napoli ; A Panchenko ; PM Pardalos ; AV Savchenko ; S Wasserman. Springer-Verlag GmbH and Co. KG, 2018. pp. 16-21 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{aafdd81284da4db982407a1df933d78d,
title = "Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations",
abstract = "This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task can be solved by training a supervised classifier on the composed word embeddings of the two posts. Because success in this task could be quite sensitive to the choice of word representations, we compare the performance of different word embedding models. We train 7 models (Word2Vec, GloVe, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate the embeddings they produce on a dataset of human judgements, and compare their performance on the task of non-relevant post detection. To make the comparison, we propose a dataset of semantic relatedness with posts from one of the most popular Russian Web forums, the imageboard “2ch”, which has challenging lexical and grammatical features.",
keywords = "2ch, Compositional semantics, Distributional semantics, Imageboard, Semantic relatedness, Word embeddings, Word similarity",
author = "Amir Bakarov and Olga Gureenkova",
year = "2018",
month = jan,
day = "1",
doi = "10.1007/978-3-319-73013-4_2",
language = "English",
isbn = "9783319730127",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag GmbH and Co. KG",
pages = "16--21",
editor = "WMP VanDerAalst and DI Ignatov and M Khachay and SO Kuznetsov and Lempitsky and IA Lomazova and N Loukachevitch and A Napoli and A Panchenko and PM Pardalos and AV Savchenko and S Wasserman",
booktitle = "Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers",
address = "Germany",
note = "6th International Conference on Analysis of Images, Social Networks and Texts, AIST 2017 ; Conference date: 27-07-2017 Through 29-07-2017",
}
RIS
TY - GEN
T1 - Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations
T2 - 6th International Conference on Analysis of Images, Social Networks and Texts, AIST 2017
AU - Bakarov, Amir
AU - Gureenkova, Olga
PY - 2018/1/1
Y1 - 2018/1/1
N2 - This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task can be solved by training a supervised classifier on the composed word embeddings of the two posts. Because success in this task could be quite sensitive to the choice of word representations, we compare the performance of different word embedding models. We train 7 models (Word2Vec, GloVe, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate the embeddings they produce on a dataset of human judgements, and compare their performance on the task of non-relevant post detection. To make the comparison, we propose a dataset of semantic relatedness with posts from one of the most popular Russian Web forums, the imageboard “2ch”, which has challenging lexical and grammatical features.
KW - 2ch
KW - Compositional semantics
KW - Distributional semantics
KW - Imageboard
KW - Semantic relatedness
KW - Word embeddings
KW - Word similarity
UR - http://www.scopus.com/inward/record.url?scp=85039438003&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-73013-4_2
DO - 10.1007/978-3-319-73013-4_2
M3 - Conference contribution
AN - SCOPUS:85039438003
SN - 9783319730127
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 21
BT - Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers
A2 - VanDerAalst, WMP
A2 - Ignatov, DI
A2 - Khachay, M
A2 - Kuznetsov, SO
A2 - Lempitsky
A2 - Lomazova, IA
A2 - Loukachevitch, N
A2 - Napoli, A
A2 - Panchenko, A
A2 - Pardalos, PM
A2 - Savchenko, AV
A2 - Wasserman, S
PB - Springer-Verlag GmbH and Co. KG
Y2 - 27 July 2017 through 29 July 2017
ER -