Standard

RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. / Ivanin, Vitaly; Artemova, Ekaterina; Batura, Tatiana et al.

Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. ed. / Wil M. van der Aalst; Vladimir Batagelj; Dmitry I. Ignatov; Michael Khachay; Olessia Koltsova; Andrey Kutuzov; Sergei O. Kuznetsov; Irina A. Lomazova; Natalia Loukachevitch; Amedeo Napoli; Alexander Panchenko; Panos M. Pardalos; Marcello Pelillo; Andrey V. Savchenko; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. p. 19-27 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12602 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Ivanin, V, Artemova, E, Batura, T, Ivanov, V, Sarkisyan, V, Tutubalina, E & Smurov, I 2021, RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. in WM van der Aalst, V Batagelj, DI Ignatov, M Khachay, O Koltsova, A Kutuzov, SO Kuznetsov, IA Lomazova, N Loukachevitch, A Napoli, A Panchenko, PM Pardalos, M Pelillo, AV Savchenko & E Tutubalina (eds), Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12602 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 19-27, 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, Moscow, Russian Federation, 15.10.2020. https://doi.org/10.1007/978-3-030-72610-2_2

APA

Ivanin, V., Artemova, E., Batura, T., Ivanov, V., Sarkisyan, V., Tutubalina, E., & Smurov, I. (2021). RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. In W. M. van der Aalst, V. Batagelj, D. I. Ignatov, M. Khachay, O. Koltsova, A. Kutuzov, S. O. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Panchenko, P. M. Pardalos, M. Pelillo, A. V. Savchenko, & E. Tutubalina (Eds.), Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers (pp. 19-27). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12602 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-72610-2_2

Vancouver

Ivanin V, Artemova E, Batura T, Ivanov V, Sarkisyan V, Tutubalina E et al. RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. In van der Aalst WM, Batagelj V, Ignatov DI, Khachay M, Koltsova O, Kutuzov A, Kuznetsov SO, Lomazova IA, Loukachevitch N, Napoli A, Panchenko A, Pardalos PM, Pelillo M, Savchenko AV, Tutubalina E, editors, Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2021. p. 19-27. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-72610-2_2

Author

Ivanin, Vitaly ; Artemova, Ekaterina ; Batura, Tatiana et al. / RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. editor / Wil M. van der Aalst ; Vladimir Batagelj ; Dmitry I. Ignatov ; Michael Khachay ; Olessia Koltsova ; Andrey Kutuzov ; Sergei O. Kuznetsov ; Irina A. Lomazova ; Natalia Loukachevitch ; Amedeo Napoli ; Alexander Panchenko ; Panos M. Pardalos ; Marcello Pelillo ; Andrey V. Savchenko ; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. pp. 19-27 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{5b46be85c08f4a139ebb38fd9adfbeed,
title = "RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain",
abstract = "We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement; thus, we may conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed, from text annotation and baseline development to the design of a shared task intended to improve on the baseline. Eventually, we realize that current NER and RE technologies are far from mature and cannot yet overcome challenges like ours.",
keywords = "Information extraction, Named entity recognition, Relation extraction",
author = "Vitaly Ivanin and Ekaterina Artemova and Tatiana Batura and Vladimir Ivanov and Veronika Sarkisyan and Elena Tutubalina and Ivan Smurov",
note = "Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.; 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020 ; Conference date: 15-10-2020 Through 16-10-2020",
year = "2021",
doi = "10.1007/978-3-030-72610-2_2",
language = "English",
isbn = "9783030726096",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "19--27",
editor = "{van der Aalst}, {Wil M.} and Vladimir Batagelj and Ignatov, {Dmitry I.} and Michael Khachay and Olessia Koltsova and Andrey Kutuzov and Kuznetsov, {Sergei O.} and Lomazova, {Irina A.} and Natalia Loukachevitch and Amedeo Napoli and Alexander Panchenko and Pardalos, {Panos M.} and Marcello Pelillo and Savchenko, {Andrey V.} and Elena Tutubalina",
booktitle = "Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers",
address = "Germany",

}

RIS

TY - GEN

T1 - RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain

AU - Ivanin, Vitaly

AU - Artemova, Ekaterina

AU - Batura, Tatiana

AU - Ivanov, Vladimir

AU - Sarkisyan, Veronika

AU - Tutubalina, Elena

AU - Smurov, Ivan

N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

PY - 2021

Y1 - 2021

N2 - We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement; thus, we may conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed, from text annotation and baseline development to the design of a shared task intended to improve on the baseline. Eventually, we realize that current NER and RE technologies are far from mature and cannot yet overcome challenges like ours.

AB - We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement; thus, we may conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed, from text annotation and baseline development to the design of a shared task intended to improve on the baseline. Eventually, we realize that current NER and RE technologies are far from mature and cannot yet overcome challenges like ours.

KW - Information extraction

KW - Named entity recognition

KW - Relation extraction

UR - http://www.scopus.com/inward/record.url?scp=85104807259&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-72610-2_2

DO - 10.1007/978-3-030-72610-2_2

M3 - Conference contribution

AN - SCOPUS:85104807259

SN - 9783030726096

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 19

EP - 27

BT - Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers

A2 - van der Aalst, Wil M.

A2 - Batagelj, Vladimir

A2 - Ignatov, Dmitry I.

A2 - Khachay, Michael

A2 - Koltsova, Olessia

A2 - Kutuzov, Andrey

A2 - Kuznetsov, Sergei O.

A2 - Lomazova, Irina A.

A2 - Loukachevitch, Natalia

A2 - Napoli, Amedeo

A2 - Panchenko, Alexander

A2 - Pardalos, Panos M.

A2 - Pelillo, Marcello

A2 - Savchenko, Andrey V.

A2 - Tutubalina, Elena

PB - Springer Science and Business Media Deutschland GmbH

T2 - 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020

Y2 - 15 October 2020 through 16 October 2020

ER -
