Research output: Scientific publications in periodicals › Article › Peer-reviewed
NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links. / Loukachevitch, Natalia; Artemova, Ekaterina; Batura, Tatiana et al.
In: Language Resources and Evaluation, Vol. 58, No. 2, 06.2024, pp. 547-583.
TY - JOUR
T1 - NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links
AU - Loukachevitch, Natalia
AU - Artemova, Ekaterina
AU - Batura, Tatiana
AU - Braslavski, Pavel
AU - Ivanov, Vladimir
AU - Manandhar, Suresh
AU - Pugachev, Alexander
AU - Rozhkov, Igor
AU - Shelmanov, Artem
AU - Tutubalina, Elena
AU - Yandutov, Alexey
N1 - The work is supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Ivannikov Institute for System Programming of the Russian Academy of Sciences dated November 2, 2021 No. 70-2021-00142. Publication for correction.
PY - 2024/6
Y1 - 2024/6
N2 - This paper describes NEREL—a Russian news dataset suited for three tasks: nested named entity recognition, relation extraction, and entity linking. Compared to flat entities, nested named entities provide a richer and more complete annotation while also increasing the coverage of relations annotation and entity linking. Relations between nested named entities may cross entity boundaries to connect to shorter entities nested within longer ones, which makes it harder to detect such relations. NEREL is currently the largest Russian dataset annotated with entities and relations: it comprises 29 named entity types and 49 relation types. At the time of writing, the dataset contains 56 K named entities and 39 K relations annotated in 933 person-oriented news articles. NEREL is annotated with relations at three levels: (1) within nested named entities, (2) within sentences, and (3) with relations crossing sentence boundaries. We provide benchmark evaluation of current state-of-the-art methods in all three tasks. The dataset is freely available at https://github.com/nerel-ds/NEREL .
AB - This paper describes NEREL—a Russian news dataset suited for three tasks: nested named entity recognition, relation extraction, and entity linking. Compared to flat entities, nested named entities provide a richer and more complete annotation while also increasing the coverage of relations annotation and entity linking. Relations between nested named entities may cross entity boundaries to connect to shorter entities nested within longer ones, which makes it harder to detect such relations. NEREL is currently the largest Russian dataset annotated with entities and relations: it comprises 29 named entity types and 49 relation types. At the time of writing, the dataset contains 56 K named entities and 39 K relations annotated in 933 person-oriented news articles. NEREL is annotated with relations at three levels: (1) within nested named entities, (2) within sentences, and (3) with relations crossing sentence boundaries. We provide benchmark evaluation of current state-of-the-art methods in all three tasks. The dataset is freely available at https://github.com/nerel-ds/NEREL .
KW - Entity linking
KW - Named entity recognition
KW - Nested entities
KW - Nested relations
KW - Relation extraction
KW - 68T35
KW - 68T50
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85171757390&origin=inward&txGid=1b19e870f0d3cb73630957ea4947ec92
UR - https://www.mendeley.com/catalogue/870f1879-7284-36d6-8eba-b3d8c378a277/
U2 - 10.1007/s10579-023-09674-z
DO - 10.1007/s10579-023-09674-z
M3 - Article
VL - 58
SP - 547
EP - 583
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
SN - 1574-0218
IS - 2
ER -
ID: 59174838