Research output: Publications in books, reports, collections, conference proceedings › article in conference proceedings › research › peer-reviewed
Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. / Gavenko, Olga; Obersht, Sofia.
Data Analytics and Management in Data Intensive Domains. ed. / Panos Pardalos; Eduard Babkin; Nikolay Zolotykh; Sergey Stupnikov. Springer, 2026. pp. 267-278 19 (Communications in Computer and Information Science; Vol. 2641 CCIS).
TY - GEN
T1 - Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data
AU - Gavenko, Olga
AU - Obersht, Sofia
N1 - Conference code: 26
PY - 2026
Y1 - 2026
N2 - Text complexity is a composite concept that comprises difficulty, readability and comprehensibility and describes the structure of a text. Determining text complexity has applied significance for understanding and processing information and knowledge. Subjective parameters of a text include empirical data on the reader’s perception of the text and the individual's physical and cognitive abilities, knowledge and education. Objective parameters are divided into quantitative ones, such as length, frequency of usage or number of tokens, and qualitative ones, which relate to the analysis of linguistic means at the categorical language levels and their implementation. The task becomes more complicated when large text data are used. If a text is defined as a character sequence, a model for estimating complexity can be developed; the choice of objective parameters and of estimation methods can vary, and most of the formulas are universal and based on a linear regression model. The goal of this paper is to develop and implement a software application in Python and to carry out a comparative analysis of the basic formulas for English and their adaptations for Russian. School textbooks on Social Studies for grades 5–11 (Russian Readability Corpus) form the test sample. The experiments with the text corpus data show incorrect results, which is explained by the fact that the models were developed on texts of different genres and styles and by the difference between the languages; in addition, the fact that quantitative parameters may not be sufficient to obtain reliable results should be taken into account when expanding the corpus data.
KW - Readability Estimates
KW - Text Complexity
KW - Text Corpus Data
UR - https://www.scopus.com/pages/publications/105021005988
UR - https://www.mendeley.com/catalogue/38ad3bf4-44ec-323b-ae71-c7518ae2f0c2/
U2 - 10.1007/978-3-032-03997-2_19
DO - 10.1007/978-3-032-03997-2_19
M3 - Conference contribution
SN - 978-3-032-03996-5
T3 - Communications in Computer and Information Science
SP - 267
EP - 278
BT - Data Analytics and Management in Data Intensive Domains
A2 - Pardalos, Panos
A2 - Babkin, Eduard
A2 - Zolotykh, Nikolay
A2 - Stupnikov, Sergey
PB - Springer
T2 - 26th International Conference Data Analytics and Management in Data Intensive Domains
Y2 - 23 October 2024 through 25 October 2024
ER -
ID: 72143523