Standard

A study of machine learning algorithms applied to GIS queries spelling correction. / Fomin, V. V.; Bondarenko, I. Yu.

In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, No. 17, 2018, pp. 200-210.

Research output: Scientific publications in periodicals; article; peer-reviewed

Harvard

Fomin, VV & Bondarenko, IY 2018, 'A study of machine learning algorithms applied to GIS queries spelling correction', Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, no. 17, pp. 200-210.

APA

Fomin, V. V., & Bondarenko, I. Y. (2018). A study of machine learning algorithms applied to GIS queries spelling correction. Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, (17), 200-210.

Vancouver

Fomin VV, Bondarenko IY. A study of machine learning algorithms applied to GIS queries spelling correction. Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2018;(17):200-210.

Author

Fomin, V. V. ; Bondarenko, I. Yu. / A study of machine learning algorithms applied to GIS queries spelling correction. In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2018 ; No. 17. pp. 200-210.

BibTeX

@article{f6182856d1d24ede8a1eeecfcd6f0321,
title = "A study of machine learning algorithms applied to {GIS} queries spelling correction",
abstract = "The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.",
keywords = "Geographic information system, Language model, Local search, Spell checker, Text corpus",
author = "Fomin, {V. V.} and Bondarenko, {I. Yu.}",
note = "Publisher Copyright: {\textcopyright} 2018 Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet. All Rights Reserved. Copyright: Copyright 2018 Elsevier B.V., All rights reserved.",
year = "2018",
language = "English",
pages = "200--210",
journal = "Компьютерная лингвистика и интеллектуальные технологии",
issn = "2221-7932",
publisher = "Komp'juternaja Lingvistika i Intellektual'nye Tehnologii",
number = "17",

}

RIS

TY - JOUR

T1 - A study of machine learning algorithms applied to GIS queries spelling correction

AU - Fomin, V. V.

AU - Bondarenko, I. Yu.

N1 - Publisher Copyright: © 2018 Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet. All Rights Reserved. Copyright: Copyright 2018 Elsevier B.V., All rights reserved.

PY - 2018

Y1 - 2018

AB - The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.

KW - Geographic information system

KW - Language model

KW - Local search

KW - Spell checker

KW - Text corpus

UR - http://www.scopus.com/inward/record.url?scp=85058037460&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85058037460

SP - 200

EP - 210

JO - Компьютерная лингвистика и интеллектуальные технологии

JF - Компьютерная лингвистика и интеллектуальные технологии

SN - 2221-7932

IS - 17

ER -

ID: 27546984