Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › Peer-reviewed
Automated detection of non-relevant posts on the Russian imageboard “2ch”: Importance of the choice of word representations. / Bakarov, Amir; Gureenkova, Olga.
Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers. ed. / WMP VanDerAalst; DI Ignatov; M Khachay; SO Kuznetsov; Lempitsky; IA Lomazova; N Loukachevitch; A Napoli; A Panchenko; PM Pardalos; AV Savchenko; S Wasserman. Springer, 2018. p. 16-21 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10716 LNCS).
TY - GEN
T1 - Automated detection of non-relevant posts on the Russian imageboard “2ch”
T2 - 6th International Conference on Analysis of Images, Social Networks and Texts, AIST 2017
AU - Bakarov, Amir
AU - Gureenkova, Olga
PY - 2018/1/1
Y1 - 2018/1/1
N2 - This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task can be solved by training a supervised classifier on the composed word embeddings of the two posts. Because success in this task could be quite sensitive to the choice of word representations, we compare the performance of different word embedding models. We train 7 models (Word2Vec, GloVe, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate the embeddings they produce on a dataset of human judgements, and compare their performance on the task of non-relevant post detection. To make the comparison, we propose a dataset of semantic relatedness with posts from one of the most popular Russian Web forums, the imageboard “2ch”, which has challenging lexical and grammatical features.
AB - This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task can be solved by training a supervised classifier on the composed word embeddings of the two posts. Because success in this task could be quite sensitive to the choice of word representations, we compare the performance of different word embedding models. We train 7 models (Word2Vec, GloVe, Word2Vec-f, Wang2Vec, AdaGram, FastText, Swivel), evaluate the embeddings they produce on a dataset of human judgements, and compare their performance on the task of non-relevant post detection. To make the comparison, we propose a dataset of semantic relatedness with posts from one of the most popular Russian Web forums, the imageboard “2ch”, which has challenging lexical and grammatical features.
KW - 2ch
KW - Compositional semantics
KW - Distributional semantics
KW - Imageboard
KW - Semantic relatedness
KW - Word embeddings
KW - Word similarity
UR - http://www.scopus.com/inward/record.url?scp=85039438003&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-73013-4_2
DO - 10.1007/978-3-319-73013-4_2
M3 - Conference contribution
AN - SCOPUS:85039438003
SN - 9783319730127
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 21
BT - Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Revised Selected Papers
A2 - VanDerAalst, WMP
A2 - Ignatov, DI
A2 - Khachay, M
A2 - Kuznetsov, SO
A2 - Lempitsky
A2 - Lomazova, IA
A2 - Loukachevitch, N
A2 - Napoli, A
A2 - Panchenko, A
A2 - Pardalos, PM
A2 - Savchenko, AV
A2 - Wasserman, S
PB - Springer
Y2 - 27 July 2017 through 29 July 2017
ER -
ID: 12099770