Research output: Contribution to journal › Conference article › peer-review
A study of machine learning algorithms applied to GIS queries spelling correction. / Fomin, V. V.; Bondarenko, I. Yu.
In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, Vol. 2018-May, No. 17, 01.01.2018, p. 185-199.
TY - JOUR
T1 - A study of machine learning algorithms applied to GIS queries spelling correction
AU - Fomin, V. V.
AU - Bondarenko, I. Yu.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e. they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.
AB - The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e. they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.
KW - Geographic information system
KW - Language model
KW - Local search
KW - Spell checker
KW - Text corpus
UR - http://www.scopus.com/inward/record.url?scp=85051246341&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85051246341
VL - 2018-May
SP - 185
EP - 199
JO - Компьютерная лингвистика и интеллектуальные технологии
JF - Компьютерная лингвистика и интеллектуальные технологии
SN - 2221-7932
IS - 17
T2 - 2018 International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2018
Y2 - 30 May 2018 through 2 June 2018
ER -
ID: 16113183