Standard
RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. / Ivanin, Vitaly; Artemova, Ekaterina; Batura, Tatiana et al.
Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. ed. / Wil M. van der Aalst; Vladimir Batagelj; Dmitry I. Ignatov; Michael Khachay; Olessia Koltsova; Andrey Kutuzov; Sergei O. Kuznetsov; Irina A. Lomazova; Natalia Loukachevitch; Amedeo Napoli; Alexander Panchenko; Panos M. Pardalos; Marcello Pelillo; Andrey V. Savchenko; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. p. 19-27 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12602 LNCS).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Harvard
Ivanin, V, Artemova, E, Batura, T, Ivanov, V, Sarkisyan, V, Tutubalina, E & Smurov, I 2021,
RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. in WM van der Aalst, V Batagelj, DI Ignatov, M Khachay, O Koltsova, A Kutuzov, SO Kuznetsov, IA Lomazova, N Loukachevitch, A Napoli, A Panchenko, PM Pardalos, M Pelillo, AV Savchenko & E Tutubalina (eds),
Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12602 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 19-27, 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, Moscow, Russian Federation,
15.10.2020.
https://doi.org/10.1007/978-3-030-72610-2_2
APA
Ivanin, V., Artemova, E., Batura, T., Ivanov, V., Sarkisyan, V., Tutubalina, E., & Smurov, I. (2021).
RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. In W. M. van der Aalst, V. Batagelj, D. I. Ignatov, M. Khachay, O. Koltsova, A. Kutuzov, S. O. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Panchenko, P. M. Pardalos, M. Pelillo, A. V. Savchenko, & E. Tutubalina (Eds.),
Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers (pp. 19-27). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12602 LNCS). Springer Science and Business Media Deutschland GmbH.
https://doi.org/10.1007/978-3-030-72610-2_2
Vancouver
Ivanin V, Artemova E, Batura T, Ivanov V, Sarkisyan V, Tutubalina E et al.
RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. In van der Aalst WM, Batagelj V, Ignatov DI, Khachay M, Koltsova O, Kutuzov A, Kuznetsov SO, Lomazova IA, Loukachevitch N, Napoli A, Panchenko A, Pardalos PM, Pelillo M, Savchenko AV, Tutubalina E, editors, Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2021. p. 19-27. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-72610-2_2
Author
Ivanin, Vitaly ; Artemova, Ekaterina ; Batura, Tatiana et al. /
RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain. Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers. editor / Wil M. van der Aalst ; Vladimir Batagelj ; Dmitry I. Ignatov ; Michael Khachay ; Olessia Koltsova ; Andrey Kutuzov ; Sergei O. Kuznetsov ; Irina A. Lomazova ; Natalia Loukachevitch ; Amedeo Napoli ; Alexander Panchenko ; Panos M. Pardalos ; Marcello Pelillo ; Andrey V. Savchenko ; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. pp. 19-27 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX
@inproceedings{5b46be85c08f4a139ebb38fd9adfbeed,
title = "RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain",
abstract = "We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether they are approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement, and we therefore conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed: text annotation, baseline development, and the design of a shared task intended to improve on the baseline. Ultimately, we find that current NER and RE technologies are far from mature and cannot yet overcome challenges like ours.",
keywords = "Information extraction, Named entity recognition, Relation extraction",
author = "Vitaly Ivanin and Ekaterina Artemova and Tatiana Batura and Vladimir Ivanov and Veronika Sarkisyan and Elena Tutubalina and Ivan Smurov",
note = "Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.; 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020 ; Conference date: 15-10-2020 Through 16-10-2020",
year = "2021",
doi = "10.1007/978-3-030-72610-2_2",
language = "English",
isbn = "9783030726096",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "12602",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "19--27",
editor = "{van der Aalst}, {Wil M.} and Vladimir Batagelj and Ignatov, {Dmitry I.} and Michael Khachay and Olessia Koltsova and Andrey Kutuzov and Kuznetsov, {Sergei O.} and Lomazova, {Irina A.} and Natalia Loukachevitch and Amedeo Napoli and Alexander Panchenko and Pardalos, {Panos M.} and Marcello Pelillo and Savchenko, {Andrey V.} and Elena Tutubalina",
booktitle = "Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers",
address = "Germany",
}
RIS
TY - GEN
T1 - RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain
AU - Ivanin, Vitaly
AU - Artemova, Ekaterina
AU - Batura, Tatiana
AU - Ivanov, Vladimir
AU - Sarkisyan, Veronika
AU - Tutubalina, Elena
AU - Smurov, Ivan
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021
Y1 - 2021
AB - We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether they are approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement, and we therefore conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed: text annotation, baseline development, and the design of a shared task intended to improve on the baseline. Ultimately, we find that current NER and RE technologies are far from mature and cannot yet overcome challenges like ours.
KW - Information extraction
KW - Named entity recognition
KW - Relation extraction
UR - http://www.scopus.com/inward/record.url?scp=85104807259&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72610-2_2
DO - 10.1007/978-3-030-72610-2_2
M3 - Conference contribution
AN - SCOPUS:85104807259
SN - 9783030726096
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 19
EP - 27
BT - Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers
A2 - van der Aalst, Wil M.
A2 - Batagelj, Vladimir
A2 - Ignatov, Dmitry I.
A2 - Khachay, Michael
A2 - Koltsova, Olessia
A2 - Kutuzov, Andrey
A2 - Kuznetsov, Sergei O.
A2 - Lomazova, Irina A.
A2 - Loukachevitch, Natalia
A2 - Napoli, Amedeo
A2 - Panchenko, Alexander
A2 - Pardalos, Panos M.
A2 - Pelillo, Marcello
A2 - Savchenko, Andrey V.
A2 - Tutubalina, Elena
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020
Y2 - 15 October 2020 through 16 October 2020
ER -