Standard

A study of machine learning algorithms applied to GIS queries spelling correction. / Fomin, V. V.; Bondarenko, I. Yu.

In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, Vol. 2018-May, No. 17, 01.01.2018, pp. 185-199.

Research output: Contribution to journal; Conference article; Peer-reviewed

Harvard

Fomin, VV & Bondarenko, IY 2018, 'A study of machine learning algorithms applied to GIS queries spelling correction', Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, vol. 2018-May, no. 17, pp. 185-199. <http://www.dialog-21.ru/media/4540/fominvv_bondarenkoiyu.pdf>

APA

Vancouver

Fomin VV, Bondarenko IY. A study of machine learning algorithms applied to GIS queries spelling correction. Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2018 Jan 1;2018-May(17):185-199.

Author

Fomin, V. V. ; Bondarenko, I. Yu. / A study of machine learning algorithms applied to GIS queries spelling correction. In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2018 ; Vol. 2018-May, No. 17. pp. 185-199.

BibTeX

@article{aae741238779428185603800d524d31a,
title = "A study of machine learning algorithms applied to GIS queries spelling correction",
abstract = "The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell-checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.",
keywords = "Geographic information system, Language model, Local search, Spell checker, Text corpus",
author = "Fomin, {V. V.} and Bondarenko, {I. Yu.}",
year = "2018",
month = jan,
day = "1",
language = "English",
volume = "2018-May",
pages = "185--199",
journal = "Компьютерная лингвистика и интеллектуальные технологии",
issn = "2221-7932",
publisher = "Komp'juternaja Lingvistika i Intellektual'nye Tehnologii",
number = "17",
note = "2018 International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2018 ; Conference date: 30-05-2018 Through 02-06-2018",

}

RIS

TY - JOUR

T1 - A study of machine learning algorithms applied to GIS queries spelling correction

AU - Fomin, V. V.

AU - Bondarenko, I. Yu.

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell-checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.

AB - The problem of spelling correction is crucial for search engines as misspellings have a negative effect on their performance. It gets even harder when search queries are related to a specific area not quite covered by standard spell checkers, such as geographic information systems (GIS). Moreover, standard spell-checkers are interactive, i.e., they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we have extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, then fitted a logistic regression using an approach suggested in SpellRuEval, and then used it iteratively to get a better result. We have then measured the resulting performance by means of cross-validation, compared it against a baseline and observed a substantial increase. We also present an interpretation of the result achieved by calculating and discussing the importance of specific features and analyzing the output of the model.

KW - Geographic information system

KW - Language model

KW - Local search

KW - Spell checker

KW - Text corpus

UR - http://www.scopus.com/inward/record.url?scp=85051246341&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85051246341

VL - 2018-May

SP - 185

EP - 199

JO - Компьютерная лингвистика и интеллектуальные технологии

JF - Компьютерная лингвистика и интеллектуальные технологии

SN - 2221-7932

IS - 17

T2 - 2018 International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2018

Y2 - 30 May 2018 through 2 June 2018

ER -

ID: 16113183