Research output: Contribution to journal › Article › peer-review
A study of machine learning algorithms applied to GIS queries spelling correction. / Fomin, V. V.; Bondarenko, I. Yu.
In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, No. 17, 2018, p. 200-210.
TY - JOUR
T1 - A study of machine learning algorithms applied to GIS queries spelling correction
AU - Fomin, V. V.
AU - Bondarenko, I. Yu.
N1 - Publisher Copyright: © 2018 Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet. All Rights Reserved. Copyright: Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2018
Y1 - 2018
AB - The problem of spelling correction is crucial for search engines, as misspellings have a negative effect on their performance. The problem gets even harder when search queries relate to a specific area that standard spell checkers do not cover well, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, fitted a logistic regression using an approach suggested in SpellRuEval, and then applied it iteratively to get a better result. We then measured the resulting performance by means of cross-validation, compared it against a baseline, and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.
KW - Geographic information system
KW - Language model
KW - Local search
KW - Spell checker
KW - Text corpus
UR - http://www.scopus.com/inward/record.url?scp=85058037460&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85058037460
SP - 200
EP - 210
JO - Компьютерная лингвистика и интеллектуальные технологии
JF - Компьютерная лингвистика и интеллектуальные технологии
SN - 2221-7932
IS - 17
ER -
ID: 27546984
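
The abstract describes ranking candidate corrections with a logistic regression over extracted features and applying the corrector iteratively. The sketch below is a minimal illustration of that general idea and not the authors' implementation. Everything in it is an assumption made for the example: the toy unigram counts standing in for a trained language model, the two features (edit distance and unigram log-probability), the hand-set weights in place of fitted ones, and the function names.

```python
import math

# Hypothetical unigram counts standing in for a trained language model.
UNIGRAM_COUNTS = {"pharmacy": 50, "cafe": 40, "street": 80, "lenin": 30, "bank": 60}
TOTAL = sum(UNIGRAM_COUNTS.values())

# Hand-set weights for (bias, edit distance, log-probability).
# A real system would fit these on an annotated corpus.
WEIGHTS = [4.0, -2.0, 1.0]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def features(token, candidate):
    """Feature vector for one (token, candidate) pair."""
    dist = edit_distance(token, candidate)
    # Out-of-vocabulary candidates get a small pseudo-count of 0.5.
    log_prob = math.log(UNIGRAM_COUNTS.get(candidate, 0.5) / TOTAL)
    return [1.0, float(dist), log_prob]


def score(token, candidate):
    """Logistic score that the candidate is the intended word."""
    z = sum(w * x for w, x in zip(WEIGHTS, features(token, candidate)))
    return 1.0 / (1.0 + math.exp(-z))


def correct_token(token, max_dist=2):
    """Pick the highest-scoring candidate, including the token itself."""
    candidates = [w for w in UNIGRAM_COUNTS if edit_distance(token, w) <= max_dist]
    candidates.append(token)  # leaving the token unchanged is always an option
    return max(candidates, key=lambda c: score(token, c))


def correct_query(query, max_passes=3):
    """Apply the corrector repeatedly until the query stops changing."""
    tokens = query.lower().split()
    for _ in range(max_passes):
        corrected = [correct_token(t) for t in tokens]
        if corrected == tokens:
            break
        tokens = corrected
    return " ".join(tokens)
```

With these toy counts and weights, `correct_query("farmacy streeet")` selects the in-vocabulary neighbours of both tokens, while a token with no vocabulary word within two edits is left unchanged. Unlike an interactive spell checker, the corrector picks a single candidate automatically, which is the behaviour the abstract motivates.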