Research output: Contribution to journal › Article › peer-review
The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. / Ivanisenko, Timofey V.; Demenkov, Pavel S.; Kolchanov, Nikolay A. et al.
In: International Journal of Molecular Sciences, Vol. 23, No. 23, 14934, 12.2022.
TY - JOUR
T1 - The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition
AU - Ivanisenko, Timofey V.
AU - Demenkov, Pavel S.
AU - Kolchanov, Nikolay A.
AU - Ivanisenko, Vladimir A.
N1 - Funding Information: The study was funded by the Ministry of Science and Higher Education of the Russian Federation project “Kurchatov Center for World-Class Genomic Research” No. 075-15-2019-1662 from 2019-10-31. Publisher Copyright: © 2022 by the authors.
PY - 2022/12
Y1 - 2022/12
N2 - The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts for eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.
AB - The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts for eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.
KW - ANDDigest
KW - ANDSystem
KW - machine learning
KW - named entity recognition
KW - PubMedBERT
KW - text-mining
KW - Proteins
KW - Data Mining/methods
KW - Artificial Intelligence
KW - PubMed
KW - Databases, Factual
UR - http://www.scopus.com/inward/record.url?scp=85143756543&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/07fa24d3-c69d-34c2-8c9e-0ecd5f5ecd2e/
U2 - 10.3390/ijms232314934
DO - 10.3390/ijms232314934
M3 - Article
C2 - 36499269
AN - SCOPUS:85143756543
VL - 23
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
SN - 1661-6596
IS - 23
M1 - 14934
ER -
ID: 40915139