Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. / Morozov, Dmitry; Lagutina, Ksenia; Drozhashchikh, Grigory et al.
Proceedings - Ivannikov ISPRAS Open Conference. Institute of Electrical and Electronics Engineers Inc., 2024. (Proceedings - Ivannikov ISPRAS Open Conference).
TY - GEN
T1 - Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts
AU - Morozov, Dmitry
AU - Lagutina, Ksenia
AU - Drozhashchikh, Grigory
AU - Garipov, Timur
AU - Glazkova, Anna
N1 - This project is supported in part by the Yaroslavl State University (Project no. VIP-016).
PY - 2024
Y1 - 2024
N2 - The assessment of text complexity is a significant applied problem with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to various types of text complexity that are weakly correlated. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features in assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We perform both in-domain and cross-domain comparisons of the features. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain context, lexical and morphological features were found to be the most beneficial, whereas in the cross-domain context, syntactic, morphological, and lexical features proved to be the most effective. Conversely, rhythmic and cohesion features did not significantly impact the quality of the assessment algorithms.
AB - The assessment of text complexity is a significant applied problem with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to various types of text complexity that are weakly correlated. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features in assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We perform both in-domain and cross-domain comparisons of the features. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain context, lexical and morphological features were found to be the most beneficial, whereas in the cross-domain context, syntactic, morphological, and lexical features proved to be the most effective. Conversely, rhythmic and cohesion features did not significantly impact the quality of the assessment algorithms.
KW - lexical complexity
KW - natural language processing
KW - rhythmic features
KW - text cohesion
KW - text complexity
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-105000143941&origin=inward&txGid=a70f4261e52d36a44a1b5871b41c8d28
UR - https://www.mendeley.com/catalogue/f40795d8-7941-3b75-98db-3ad97056f0b7/
U2 - 10.1109/ISPRAS64596.2024.10899137
DO - 10.1109/ISPRAS64596.2024.10899137
M3 - Conference contribution
SN - 979-8-3315-2603-0
T3 - Proceedings - Ivannikov ISPRAS Open Conference
BT - Proceedings - Ivannikov ISPRAS Open Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Ivannikov Open Conference
Y2 - 11 December 2024 through 12 December 2024
ER -
ID: 65127067