Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Automated Classification of Potentially Insulting Speech Acts on Social Network Sites. / Komalova, Liliya; Glazkova, Anna; Morozov, Dmitry et al.
Digital Transformation and Global Society - 6th International Conference, DTGS 2021, Revised Selected Papers. ed. / Daniel A. Alexandrov; Andrei V. Chugunov; Yury Kabanov; Olessia Koltsova; Ilya Musabirov; Sergei Pashakhin; Alexander V. Boukhanovsky. Springer, 2022. pp. 365-374 (Communications in Computer and Information Science; Vol. 1503 CCIS).
TY - GEN
T1 - Automated Classification of Potentially Insulting Speech Acts on Social Network Sites
AU - Komalova, Liliya
AU - Glazkova, Anna
AU - Morozov, Dmitry
AU - Epifanov, Rostislav
AU - Motovskikh, Leonid
AU - Mayorova, Ekaterina
N1 - Funding Information: The research done for this work has been supported by the 1st Workshop at the Mathematical Center in Akademgorodok (project No 26 "Mathematical support for linguistic expertise", 13 July-14 August, 2020) http://mca.nsu.ru/workshopen/. The authors express their sincere gratitude to the students of the Engineering School of Novosibirsk State University, especially to M.V. Fedorova and E.V. Timofeeva, as well as a student of the Higher School of Economics M.O. Maslova, who made an invaluable contribution to the collection of the dataset and acted as annotators. Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Insulting speech acts have become the subject of public discussion in the media, social media, the basis for speculation in political communication, and a working concept in the legal environment. The present research article explores insulting speech acts on the social network site “VKontakte” aiming to develop an algorithm for automatic classification of text data. We conducted semantic analysis of the text of “Article 5.61” of the Code of Administrative Offenses of the Russian Federation, which made it possible to formulate inclusion criteria for formal classification. We used three common word embeddings models (BERT, ELMo, and fastText) on the original Russian language dataset consisting of 4596 annotated messages perceived as insulting speech acts. General findings argue that even in a specialized dataset the share of messages that meet criteria of inclusion is negligible. This indicates a low probability of going to court on the fact of an administrative offense under Article 5.61 based on speech communication on social network sites, even though such communication is public in nature and is automatically recorded in writing. Machine learning text classifier based on BERT model showed best performance.
KW - Annotated dataset
KW - Automated classification
KW - Corpus linguistics
KW - Forensic linguistics
KW - Insulting speech act
KW - Internet language
KW - Linguistic expertise
KW - Social network site
KW - Vector word embedding
UR - http://www.scopus.com/inward/record.url?scp=85124647154&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/69deebb8-ae76-39eb-8aee-7dff4e62f478/
U2 - 10.1007/978-3-030-93715-7_26
DO - 10.1007/978-3-030-93715-7_26
M3 - Conference contribution
AN - SCOPUS:85124647154
SN - 978-3-030-93714-0
T3 - Communications in Computer and Information Science
SP - 365
EP - 374
BT - Digital Transformation and Global Society - 6th International Conference, DTGS 2021, Revised Selected Papers
A2 - Alexandrov, Daniel A.
A2 - Chugunov, Andrei V.
A2 - Kabanov, Yury
A2 - Koltsova, Olessia
A2 - Musabirov, Ilya
A2 - Pashakhin, Sergei
A2 - Boukhanovsky, Alexander V.
PB - Springer
T2 - 6th International Conference on Digital Transformation and Global Society, DTGS 2021
Y2 - 23 June 2021 through 25 June 2021
ER -
ID: 35550346