Standard

Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. / Morozov, Dmitry; Lagutina, Ksenia; Drozhashchikh, Grigory et al.

Proceedings - Ivannikov ISPRAS Open Conference. Institute of Electrical and Electronics Engineers Inc., 2024. (Proceedings - Ivannikov ISPRAS Open Conference).

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

Harvard

Morozov, D, Lagutina, K, Drozhashchikh, G, Garipov, T & Glazkova, A 2024, Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. in Proceedings - Ivannikov ISPRAS Open Conference. Proceedings - Ivannikov ISPRAS Open Conference, Institute of Electrical and Electronics Engineers Inc., 2024 Ivannikov Open Conference, Moscow, Russian Federation, 11.12.2024. https://doi.org/10.1109/ISPRAS64596.2024.10899137

APA

Morozov, D., Lagutina, K., Drozhashchikh, G., Garipov, T., & Glazkova, A. (2024). Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. In Proceedings - Ivannikov ISPRAS Open Conference (Proceedings - Ivannikov ISPRAS Open Conference). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISPRAS64596.2024.10899137

Vancouver

Morozov D, Lagutina K, Drozhashchikh G, Garipov T, Glazkova A. Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. In Proceedings - Ivannikov ISPRAS Open Conference. Institute of Electrical and Electronics Engineers Inc. 2024. (Proceedings - Ivannikov ISPRAS Open Conference). doi: 10.1109/ISPRAS64596.2024.10899137

Author

Morozov, Dmitry ; Lagutina, Ksenia ; Drozhashchikh, Grigory et al. / Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts. Proceedings - Ivannikov ISPRAS Open Conference. Institute of Electrical and Electronics Engineers Inc., 2024. (Proceedings - Ivannikov ISPRAS Open Conference).

BibTeX

@inproceedings{f7d3f8873b0f4a478ad3e85f5cd8f4ff,
title = "Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts",
abstract = "The assessment of text complexity is a significant applied problem with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to various types of text complexity that are weakly correlated. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features in assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We perform both in-domain and cross-domain comparisons of the features. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain context, lexical and morphological features were found to be the most beneficial, whereas in the cross-domain context, syntactic, morphological, and lexical features proved to be the most effective. Conversely, rhythmic and cohesion features did not significantly impact the quality of the assessment algorithms.",
keywords = "lexical complexity, natural language processing, rhythmic features, text cohesion, text complexity",
author = "Dmitry Morozov and Ksenia Lagutina and Grigory Drozhashchikh and Timur Garipov and Anna Glazkova",
note = "This project is supported in part by the Yaroslavl State University (Project no. VIP-016).; 2024 Ivannikov Open Conference, ISPRAS 2024 ; Conference date: 11-12-2024 Through 12-12-2024",
year = "2024",
doi = "10.1109/ISPRAS64596.2024.10899137",
language = "English",
isbn = "979-8-3315-2603-0",
series = "Proceedings - Ivannikov ISPRAS Open Conference",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "Proceedings - Ivannikov ISPRAS Open Conference",
address = "United States",

}

RIS

TY - GEN

T1 - Exploring the Feature Space for Cross-Domain Assessing the Complexity of Russian-Language Texts

AU - Morozov, Dmitry

AU - Lagutina, Ksenia

AU - Drozhashchikh, Grigory

AU - Garipov, Timur

AU - Glazkova, Anna

N1 - This project is supported in part by the Yaroslavl State University (Project no. VIP-016).

PY - 2024

Y1 - 2024

N2 - The assessment of text complexity is a significant applied problem with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to various types of text complexity that are weakly correlated. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features in assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We perform both in-domain and cross-domain comparisons of the features. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain context, lexical and morphological features were found to be the most beneficial, whereas in the cross-domain context, syntactic, morphological, and lexical features proved to be the most effective. Conversely, rhythmic and cohesion features did not significantly impact the quality of the assessment algorithms.

AB - The assessment of text complexity is a significant applied problem with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Different task formulations give rise to various types of text complexity that are weakly correlated. Despite this, researchers typically overlook cross-domain complexity assessment. This study evaluates the applicability of various linguistic features in assessing the complexity of Russian-language texts, adding two new groups of features (rhythmic and cohesion) to those previously studied and introducing a new group of features for lexical complexity. We perform both in-domain and cross-domain comparisons of the features. Our findings indicate that syntactic features are the most significant in terms of Mutual Information. In the in-domain context, lexical and morphological features were found to be the most beneficial, whereas in the cross-domain context, syntactic, morphological, and lexical features proved to be the most effective. Conversely, rhythmic and cohesion features did not significantly impact the quality of the assessment algorithms.

KW - lexical complexity

KW - natural language processing

KW - rhythmic features

KW - text cohesion

KW - text complexity

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-105000143941&origin=inward&txGid=a70f4261e52d36a44a1b5871b41c8d28

UR - https://www.mendeley.com/catalogue/f40795d8-7941-3b75-98db-3ad97056f0b7/

U2 - 10.1109/ISPRAS64596.2024.10899137

DO - 10.1109/ISPRAS64596.2024.10899137

M3 - Conference contribution

SN - 979-8-3315-2603-0

T3 - Proceedings - Ivannikov ISPRAS Open Conference

BT - Proceedings - Ivannikov ISPRAS Open Conference

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2024 Ivannikov Open Conference

Y2 - 11 December 2024 through 12 December 2024

ER -

ID: 65127067