Research output: Chapter in Book/Report/Conference proceeding › Chapter › Research › peer-review
Development of Folklore Motif Classifier Using Limited Data. / Matveeva, Maria; Malykh, Valentin.
Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH, 2022. p. 40-48 (Communications in Computer and Information Science; Vol. 1731 CCIS).
TY - CHAP
T1 - Development of Folklore Motif Classifier Using Limited Data
AU - Matveeva, Maria
AU - Malykh, Valentin
N1 - Publication for correction.
PY - 2022
Y1 - 2022
N2 - The existence of mythological universals (common or similar folklore images and motifs in different cultures) makes it possible to catalog them and present them in the form of classifications. Attributing folklore texts to certain motifs is part of the work of folklorists, but at the moment only manual labeling is possible. This paper proposes methods for developing a classifier of folklore motifs using the zero-shot approach, which makes it possible to train the classifier on a limited dataset and to predict the motif for any text, even if no text with that motif was present in the training set. Various ways of vectorizing texts and various models were tested. Evaluation of the classifiers' results allows us to assert that the developed classifier can match texts to motifs with sufficient accuracy.
AB - The existence of mythological universals (common or similar folklore images and motifs in different cultures) makes it possible to catalog them and present them in the form of classifications. Attributing folklore texts to certain motifs is part of the work of folklorists, but at the moment only manual labeling is possible. This paper proposes methods for developing a classifier of folklore motifs using the zero-shot approach, which makes it possible to train the classifier on a limited dataset and to predict the motif for any text, even if no text with that motif was present in the training set. Various ways of vectorizing texts and various models were tested. Evaluation of the classifiers' results allows us to assert that the developed classifier can match texts to motifs with sufficient accuracy.
KW - Multi-label classification
KW - Text classification
KW - Zero-shot learning
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85148696505&origin=inward&txGid=79ba96c140cfa4c6c51d64c7622d1313
UR - https://www.mendeley.com/catalogue/1c834160-cbe4-3c00-8fb4-ff2e6f226cea/
U2 - 10.1007/978-3-031-23372-2_4
DO - 10.1007/978-3-031-23372-2_4
M3 - Chapter
SN - 9783031233715
T3 - Communications in Computer and Information Science
SP - 40
EP - 48
BT - Communications in Computer and Information Science
PB - Springer Science and Business Media Deutschland GmbH
ER -
ID: 55720230