Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Out-of-Dictionary Meanings Detecting Using Word-Sense Disambiguation Algorithms. / Borodina, Darya; Morozov, Dmitry.
Communications in Computer and Information Science. ed. / Maxim Bakaev; Radomir Bolgov; Anna Chizhik; Andrei Chugunov; Yury Kabanov; Roberto Pereira; Elakkiya R; Wei Zhang. Vol. 2534 Springer, 2026. p. 85-104 8.
TY - GEN
T1 - Out-of-Dictionary Meanings Detecting Using Word-Sense Disambiguation Algorithms
AU - Borodina, Darya
AU - Morozov, Dmitry
N1 - Conference code: 27
PY - 2026
Y1 - 2026
AB - Automatic semantic shift detection can be used in a number of applied problems in linguistics, from compiling explanatory dictionaries to analyzing historical documents. In this paper, we consider the related problem of word-sense disambiguation (WSD) and the feasibility of adapting algorithms based on explanatory dictionaries to detect out-of-dictionary meanings. We selected 50 reference words, for which we collected a dataset of context-gloss pairs based on the National Media subcorpus of the Russian National Corpus, the news feed of the VKontakte social network, and the Big Explanatory Dictionary of the Russian Language. We adapted four algorithms originally developed for the WSD problem. The best quality (F1-score = 0.96) was achieved with the GlossBERT algorithm. We also assessed the generalization ability of this algorithm by applying a train-test split by reference words. In this case, the F1-score dropped to 0.71.
KW - Semantic Shift Detection
KW - Word-Sense Disambiguation
KW - Natural Language Processing
UR - https://www.scopus.com/pages/publications/105019249740
UR - https://www.mendeley.com/catalogue/5d8c8c19-3252-3e8c-a708-91911972c6a0/
U2 - 10.1007/978-3-031-96177-9_8
DO - 10.1007/978-3-031-96177-9_8
M3 - Conference contribution
SN - 978-3-031-96176-2
VL - 2534
SP - 85
EP - 104
BT - Communications in Computer and Information Science
A2 - Bakaev, Maxim
A2 - Bolgov, Radomir
A2 - Chizhik, Anna
A2 - Chugunov, Andrei
A2 - Kabanov, Yury
A2 - Pereira, Roberto
A2 - R, Elakkiya
A2 - Zhang, Wei
PB - Springer
T2 - 27th International Conference "Internet and Modern Society. Human-Computer Communication"
Y2 - 24 June 2024 through 26 June 2024
ER -
ID: 71312786