Research output: Contribution to journal › Article › peer-review
Hierarchical Classification of Scientific Articles Using Deep Learning (Using the UDC Hierarchy As an Example). / Мамедов, Валентин Юрьевич; Ковалевский, Данил Анатольевич; Морозов, Дмитрий Алексеевич et al.
In: Automatic Control and Computer Sciences, Vol. 59, No. 7, 12.2025, pp. 1181-1192.
TY - JOUR
T1 - Hierarchical Classification of Scientific Articles Using Deep Learning (Using the UDC Hierarchy As an Example)
AU - Мамедов, Валентин Юрьевич
AU - Ковалевский, Данил Анатольевич
AU - Морозов, Дмитрий Алексеевич
AU - Столяров, Степан Сергеевич
AU - Оспичев, Сергей Сергеевич
N1 - Mamedov, V.Y., Kovalevsky, D.A., Morozov, D.A. et al. Hierarchical Classification of Scientific Articles Using Deep Learning (Using the UDC Hierarchy As an Example). Aut. Control Comp. Sci. 59, 1181–1192 (2025). https://doi.org/10.3103/S0146411625700440
PY - 2025/12
Y1 - 2025/12
N2 - The exponential growth of scientific publications has heightened the need for robust tools to organize and retrieve research effectively. The Universal Decimal Classification (UDC) serves as a valuable framework for categorizing articles by subject area. However, manual assignment of UDC codes is often prone to inaccuracies or oversimplification, limiting its utility. In this study, we present a novel approach for the automated assignment of UDC codes to scientific articles using BERT-based models. Our methodology is trained and evaluated on a dataset comprising over 19 000 articles in mathematics and related disciplines. To address the hierarchical structure of the UDC, we develop two specialized evaluation metrics: hierarchical classification accuracy and hierarchical recommendation accuracy. We also explore multiple strategies for flattening hierarchical labels. Our results demonstrate a hierarchical recommendation accuracy of 0.8220. Furthermore, blind expert evaluation reveals that discrepancies between the reference and predicted labels often stem from errors in the original UDC code assignments by the authors of articles. Our approach demonstrates strong potential for automating the classification of scientific articles and can be extended to other hierarchical classification systems.
KW  - TEXT CLASSIFICATION
KW  - HIERARCHICAL TEXT CLASSIFICATION
KW  - UNIVERSAL DECIMAL CLASSIFIER
KW  - DEEP LEARNING
UR - https://www.scopus.com/pages/publications/105030608969
UR - https://elibrary.ru/item.asp?id=80479012
U2 - 10.3103/S0146411625700440
DO - 10.3103/S0146411625700440
M3 - Article
VL - 59
SP - 1181
EP - 1192
JO - Automatic Control and Computer Sciences
JF - Automatic Control and Computer Sciences
SN - 1558-108X
IS - 7
ER -
ID: 75468441