Standard

The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. / Ivanisenko, Timofey V.; Demenkov, Pavel S.; Kolchanov, Nikolay A. et al.

In: International Journal of Molecular Sciences, Vol. 23, No. 23, 14934, 12.2022.

Research output: Contribution to journal › Article › peer-review

Vancouver

Ivanisenko TV, Demenkov PS, Kolchanov NA, Ivanisenko VA. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. International Journal of Molecular Sciences. 2022 Dec;23(23):14934. doi: 10.3390/ijms232314934

Author

Ivanisenko, Timofey V. ; Demenkov, Pavel S. ; Kolchanov, Nikolay A. et al. / The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. In: International Journal of Molecular Sciences. 2022 ; Vol. 23, No. 23.

BibTeX

@article{a4eb1af501bb45ee97a8e11c5408de5c,
title = "The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition",
abstract = "The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.",
keywords = "ANDDigest, ANDSystem, machine learning, named entity recognition, PubMedBERT, text-mining, Proteins, Data Mining/methods, Artificial Intelligence, PubMed, Databases, Factual",
author = "Ivanisenko, {Timofey V.} and Demenkov, {Pavel S.} and Kolchanov, {Nikolay A.} and Ivanisenko, {Vladimir A.}",
note = "Funding Information: The study was funded by the Ministry of Science and Higher Education of the Russian Federation project “Kurchatov Center for World-Class Genomic Research” No. 075-15-2019-1662 from 2019-10-31. Publisher Copyright: {\textcopyright} 2022 by the authors.",
year = "2022",
month = dec,
doi = "10.3390/ijms232314934",
language = "English",
volume = "23",
journal = "International Journal of Molecular Sciences",
issn = "1661-6596",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "23",
pages = "14934",
}

RIS

TY - JOUR

T1 - The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition

AU - Ivanisenko, Timofey V.

AU - Demenkov, Pavel S.

AU - Kolchanov, Nikolay A.

AU - Ivanisenko, Vladimir A.

N1 - Funding Information: The study was funded by the Ministry of Science and Higher Education of the Russian Federation project “Kurchatov Center for World-Class Genomic Research” No. 075-15-2019-1662 from 2019-10-31. Publisher Copyright: © 2022 by the authors.

PY - 2022/12

Y1 - 2022/12

N2 - The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.

AB - The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.

KW - ANDDigest

KW - ANDSystem

KW - machine learning

KW - named entity recognition

KW - PubMedBERT

KW - text-mining

KW - Proteins

KW - Data Mining/methods

KW - Artificial Intelligence

KW - PubMed

KW - Databases, Factual

UR - http://www.scopus.com/inward/record.url?scp=85143756543&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/07fa24d3-c69d-34c2-8c9e-0ecd5f5ecd2e/

U2 - 10.3390/ijms232314934

DO - 10.3390/ijms232314934

M3 - Article

C2 - 36499269

AN - SCOPUS:85143756543

VL - 23

JO - International Journal of Molecular Sciences

JF - International Journal of Molecular Sciences

SN - 1661-6596

IS - 23

M1 - 14934

ER -
