Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data

Standard

Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. / Gavenko, Olga; Obersht, Sofia.

Data Analytics and Management in Data Intensive Domains. ed. / Panos Pardalos; Eduard Babkin; Nikolay Zolotykh; Sergey Stupnikov. Springer, 2026. p. 267-278 19 (Communications in Computer and Information Science; Vol. 2641 CCIS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Gavenko, O & Obersht, S 2026, Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. in P Pardalos, E Babkin, N Zolotykh & S Stupnikov (eds), Data Analytics and Management in Data Intensive Domains., 19, Communications in Computer and Information Science, vol. 2641 CCIS, Springer, pp. 267-278, 26th International Conference Data Analytics and Management in Data Intensive Domains, Nizhny Novgorod, Russian Federation, 23.10.2024. https://doi.org/10.1007/978-3-032-03997-2_19

APA

Gavenko, O., & Obersht, S. (2026). Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. In P. Pardalos, E. Babkin, N. Zolotykh, & S. Stupnikov (Eds.), Data Analytics and Management in Data Intensive Domains (pp. 267-278). [19] (Communications in Computer and Information Science; Vol. 2641 CCIS). Springer. https://doi.org/10.1007/978-3-032-03997-2_19

Vancouver

Gavenko O, Obersht S. Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. In Pardalos P, Babkin E, Zolotykh N, Stupnikov S, editors, Data Analytics and Management in Data Intensive Domains. Springer. 2026. p. 267-278. 19. (Communications in Computer and Information Science). doi: 10.1007/978-3-032-03997-2_19

Author

Gavenko, Olga ; Obersht, Sofia. / Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. Data Analytics and Management in Data Intensive Domains. editor / Panos Pardalos ; Eduard Babkin ; Nikolay Zolotykh ; Sergey Stupnikov. Springer, 2026. pp. 267-278 (Communications in Computer and Information Science).

BibTeX

@inproceedings{07c2b53e3ba1411dac735568a5bbd095,

title = "Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data",

abstract = "The complexity of text is a complex concept consisting of difficultness, readability and comprehensibility and describing the text structure. The determination of text complexity has applied significance in understanding and processing of information and knowledges. Subjective parameters of text include empirical data on the reader{\textquoteright}s perception of the text, physical and cognitive abilities, knowledge and education of an individual. Objective parameters are divided into quantitative such as length, frequency of usage or number of tokens, and qualitative which are related to the analysis of linguistic means of categorical language levels and their implementation. The task becomes more complicated with the usage of the large text data. Defining text as a character sequence, the estimating model of complexity can be developed, the choice of the objective parameters, as well as methods of complexity estimation can vary; most of the formulas are universal and based on the linear-regression model. The goal of this paper is the development and implementation of software application in Python and the comparative analysis of basic formulas for English and adapted for the Russian. School textbooks on Social Studies, 5–11 classes (Russian Readability Corpus), make the test sample. The experiments with the text corpus data shows incorrect results what is explained by the fact that the model development based on the texts of different genres and styles and the difference in languages; in addition, the fact, that quantitative parameters may not be sufficient to obtain reliable results, should be taken into account when expanding corpus data.",

keywords = "Readability Estimates, Text Complexity, Text Corpus Data",

author = "Olga Gavenko and Sofia Obersht",

note = "Gavenko, O., Obersht, S. (2026). Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data. In: Pardalos, P., Babkin, E., Zolotykh, N., Stupnikov, S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2024. Communications in Computer and Information Science, vol 2641. Springer, Cham. https://doi.org/10.1007/978-3-032-03997-2_19; 26th International Conference Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2024 ; Conference date: 23-10-2024 Through 25-10-2024",

year = "2026",

doi = "10.1007/978-3-032-03997-2_19",

language = "English",

isbn = "978-3-032-03996-5",

series = "Communications in Computer and Information Science",

publisher = "Springer",

pages = "267--278",

editor = "Panos Pardalos and Eduard Babkin and Nikolay Zolotykh and Sergey Stupnikov",

booktitle = "Data Analytics and Management in Data Intensive Domains",

address = "Cham",

}

RIS

TY - GEN

T1 - Development and Implementation of Software Application for Comparative Analysis of the Estimates of the Complexity of Text Data

AU - Gavenko, Olga

AU - Obersht, Sofia

N1 - Conference code: 26

PY - 2026

Y1 - 2026

AB - The complexity of text is a complex concept consisting of difficultness, readability and comprehensibility and describing the text structure. The determination of text complexity has applied significance in understanding and processing of information and knowledges. Subjective parameters of text include empirical data on the reader’s perception of the text, physical and cognitive abilities, knowledge and education of an individual. Objective parameters are divided into quantitative such as length, frequency of usage or number of tokens, and qualitative which are related to the analysis of linguistic means of categorical language levels and their implementation. The task becomes more complicated with the usage of the large text data. Defining text as a character sequence, the estimating model of complexity can be developed, the choice of the objective parameters, as well as methods of complexity estimation can vary; most of the formulas are universal and based on the linear-regression model. The goal of this paper is the development and implementation of software application in Python and the comparative analysis of basic formulas for English and adapted for the Russian. School textbooks on Social Studies, 5–11 classes (Russian Readability Corpus), make the test sample. The experiments with the text corpus data shows incorrect results what is explained by the fact that the model development based on the texts of different genres and styles and the difference in languages; in addition, the fact, that quantitative parameters may not be sufficient to obtain reliable results, should be taken into account when expanding corpus data.

KW - Readability Estimates

KW - Text Complexity

KW - Text Corpus Data

UR - https://www.scopus.com/pages/publications/105021005988

UR - https://www.mendeley.com/catalogue/38ad3bf4-44ec-323b-ae71-c7518ae2f0c2/

U2 - 10.1007/978-3-032-03997-2_19

DO - 10.1007/978-3-032-03997-2_19

M3 - Conference contribution

SN - 978-3-032-03996-5

T3 - Communications in Computer and Information Science

SP - 267

EP - 278

BT - Data Analytics and Management in Data Intensive Domains

A2 - Pardalos, Panos

A2 - Babkin, Eduard

A2 - Zolotykh, Nikolay

A2 - Stupnikov, Sergey

PB - Springer

T2 - 26th International Conference Data Analytics and Management in Data Intensive Domains

Y2 - 23 October 2024 through 25 October 2024

ER -

ID: 72143523