Standard

Statistical tests for text homogeneity: using forward and backward processes of numbers of different words. / Abebe, Berhane; Chebunin, Mikhail; Kovalevskii, Artyom et al.

In: Glottometrics, Vol. 53, 2022, p. 42-58.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{a5dddbe4bba54647bbffd5e208b64067,
title = "Statistical tests for text homogeneity: using forward and backward processes of numbers of different words",
abstract = "The processes of growth in the number of different words in a text, read in the forward and backward directions, are studied in this article. We construct a statistical test based on statistics derived from the difference between these two processes and use it to check text homogeneity. The elementary model assumes that the words of a text are drawn from some dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model using the asymptotic normality of the corresponding statistics. Last but not least, the test is applied to the analysis of the homogeneity of sequences of sonnets.",
keywords = "Gaussian process, Zipf{\textquoteright}s law, statistical test, text homogeneity, urn model, weak convergence",
author = "Berhane Abebe and Mikhail Chebunin and Artyom Kovalevskii and Natalia Zakrevskaya",
note = "Acknowledgments: The work is supported by Mathematical Center in Akademgorodok under agreement No. 075-15-2019-1675 with the Ministry of Science and Higher Education of the Russian Federation.",
year = "2022",
doi = "10.53482/2022_53_401",
language = "English",
volume = "53",
pages = "42--58",
journal = "Glottometrics",
issn = "2625-8226",
publisher = "International Quantitative Linguistics Association",

}

RIS

TY - JOUR

T1 - Statistical tests for text homogeneity: using forward and backward processes of numbers of different words

AU - Abebe, Berhane

AU - Chebunin, Mikhail

AU - Kovalevskii, Artyom

AU - Zakrevskaya, Natalia

N1 - Acknowledgments: The work is supported by Mathematical Center in Akademgorodok under agreement No. 075-15-2019-1675 with the Ministry of Science and Higher Education of the Russian Federation.

PY - 2022

Y1 - 2022

N2 - The processes of growth in the number of different words in a text, read in the forward and backward directions, are studied in this article. We construct a statistical test based on statistics derived from the difference between these two processes and use it to check text homogeneity. The elementary model assumes that the words of a text are drawn from some dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model using the asymptotic normality of the corresponding statistics. Last but not least, the test is applied to the analysis of the homogeneity of sequences of sonnets.

AB - The processes of growth in the number of different words in a text, read in the forward and backward directions, are studied in this article. We construct a statistical test based on statistics derived from the difference between these two processes and use it to check text homogeneity. The elementary model assumes that the words of a text are drawn from some dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model using the asymptotic normality of the corresponding statistics. Last but not least, the test is applied to the analysis of the homogeneity of sequences of sonnets.

KW - Gaussian process

KW - Zipf’s law

KW - statistical test

KW - text homogeneity

KW - urn model

KW - weak convergence

UR - https://www.scopus.com/inward/record.url?eid=2-s2.0-85146477404&partnerID=40&md5=99151fa5b2831d1fceb80c9b03da089f

UR - https://www.mendeley.com/catalogue/8727f656-59ce-31a8-8afc-3dcde65fc5ea/

U2 - 10.53482/2022_53_401

DO - 10.53482/2022_53_401

M3 - Article

VL - 53

SP - 42

EP - 58

JO - Glottometrics

JF - Glottometrics

SN - 2625-8226

ER -

ID: 45765358
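The forward and backward processes described in the abstract above can be illustrated with a minimal Python sketch. For a tokenized text it computes F(k), the number of different words among the first k words, and B(k), the number among the last k words (the text read backward), together with their difference D(k) = F(k) - B(k). The statistic T = max_k |D(k)| and the permutation p-value are hypothetical choices made for this illustration, and so are all function names. The article instead computes p-values from the asymptotic normality of its statistics under the Zipf–Mandelbrot model. This sketch swaps in a Monte Carlo permutation test, which assumes only that word order is exchangeable when the text is homogeneous.

```python
import random
from typing import List, Sequence


def distinct_counts(words: Sequence[str]) -> List[int]:
    """Entry k-1 is the number of different words among the first k words."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def difference_process(words: Sequence[str]) -> List[int]:
    """D(k) = F(k) - B(k) for k = 1..n: forward count minus backward count."""
    forward = distinct_counts(words)
    backward = distinct_counts(list(words)[::-1])  # reading the text backward
    return [f - b for f, b in zip(forward, backward)]


def max_abs_difference(words: Sequence[str]) -> int:
    """Hypothetical test statistic T = max_k |D(k)| (not necessarily the article's)."""
    return max(abs(d) for d in difference_process(words))


def permutation_p_value(words: Sequence[str], n_perm: int = 199, seed: int = 0) -> float:
    """Monte Carlo permutation p-value for T: a swapped-in substitute for the
    article's asymptotic-normality p-values."""
    rng = random.Random(seed)
    observed = max_abs_difference(words)
    pool = list(words)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # a random reordering of the same words (permutation null)
        if max_abs_difference(pool) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    text = "a b a c a b d e a b".split()
    print(difference_process(text))  # [0, 0, -1, -1, -1, -1, -1, 0, 0, 0]

    # Hypothetical heterogeneous text: a lexically rich half followed by a poor one.
    rng = random.Random(1)
    rich = [f"w{rng.randrange(1000)}" for _ in range(200)]
    poor = [f"v{rng.randrange(5)}" for _ in range(200)]
    print(permutation_p_value(rich + poor))
```

Both processes end at the total vocabulary size, so D(n) = 0 for every text. A text whose first part is lexically much richer than its second part pushes D(k) far from zero around the middle, which is the kind of departure from homogeneity this difference is sensitive to.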
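The abstract above describes an elementary model in which the words of a text are drawn independently from a dictionary according to the Zipf–Mandelbrot law. The following Python sketch simulates such a text. It assumes the common parameterization p_i ∝ (i + q)^(-s) over ranks i = 1..N and truncates the dictionary to N words; this truncation is an assumption, and the names s, q, N and all function names are hypothetical. The sketch also evaluates the expected number of different words after k independent draws, E R(k) = Σ_i (1 - (1 - p_i)^k). The first k words and the last k words of such a text are both i.i.d. samples of size k, so the forward and backward processes share the expectation E R(k). For a homogeneous text, their difference is therefore centred at zero.

```python
import random
from typing import List


def zipf_mandelbrot_probs(n_words: int, s: float, q: float) -> List[float]:
    """p_i proportional to (i + q)**(-s) for ranks i = 1..n_words (finite dictionary)."""
    weights = [(i + q) ** (-s) for i in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_text(probs: List[float], length: int, seed: int = 0) -> List[str]:
    """Draw `length` words independently; the word of rank i is labelled 'w<i>'."""
    rng = random.Random(seed)
    ranks = rng.choices(range(1, len(probs) + 1), weights=probs, k=length)
    return [f"w{r}" for r in ranks]


def expected_distinct(probs: List[float], k: int) -> float:
    """Expected number of different words among k independent draws:
    sum over i of P(word i appears at least once) = sum_i (1 - (1 - p_i)**k)."""
    return sum(1.0 - (1.0 - p) ** k for p in probs)


if __name__ == "__main__":
    probs = zipf_mandelbrot_probs(n_words=5000, s=1.1, q=2.7)
    text = sample_text(probs, length=2000, seed=42)
    # Observed vocabulary size of one simulated text vs. its expectation.
    print(len(set(text)), round(expected_distinct(probs, 2000), 1))
```

The parameter values in the usage block are illustrative only and are not taken from the article.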