Research output: Contribution to journal › Article › peer-review
Statistical tests for text homogeneity: using forward and backward processes of numbers of different words. / Abebe, Berhane; Chebunin, Mikhail; Kovalevskii, Artyom et al.
In: Glottometrics, Vol. 53, 2022, p. 42-58.
TY - JOUR
T1 - Statistical tests for text homogeneity: using forward and backward processes of numbers of different words
AU - Abebe, Berhane
AU - Chebunin, Mikhail
AU - Kovalevskii, Artyom
AU - Zakrevskaya, Natalia
N1 - Acknowledgments: The work is supported by the Mathematical Center in Akademgorodok under agreement No. 075-15-2019-1675 with the Ministry of Science and Higher Education of the Russian Federation.
PY - 2022
Y1 - 2022
N2 - This article studies the processes of growth in the number of different words in a text when it is read in the forward and backward directions. Based on statistics derived from the difference between these two processes, we construct a statistical test for checking text homogeneity. The elementary model assumes that the words of a text are drawn from a dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model, using the asymptotic normality of the corresponding statistics. Finally, the test is applied to analyze the homogeneity of sequences of sonnets.
AB - This article studies the processes of growth in the number of different words in a text when it is read in the forward and backward directions. Based on statistics derived from the difference between these two processes, we construct a statistical test for checking text homogeneity. The elementary model assumes that the words of a text are drawn from a dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model, using the asymptotic normality of the corresponding statistics. Finally, the test is applied to analyze the homogeneity of sequences of sonnets.
KW - Gaussian process
KW - Zipf’s law
KW - statistical test
KW - text homogeneity
KW - urn model
KW - weak convergence
UR - https://www.scopus.com/inward/record.url?eid=2-s2.0-85146477404&partnerID=40&md5=99151fa5b2831d1fceb80c9b03da089f
UR - https://www.mendeley.com/catalogue/8727f656-59ce-31a8-8afc-3dcde65fc5ea/
U2 - 10.53482/2022_53_401
DO - 10.53482/2022_53_401
M3 - Article
VL - 53
SP - 42
EP - 58
JO - Glottometrics
JF - Glottometrics
SN - 2625-8226
ER -
ID: 45765358
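The abstract describes the forward and backward processes only in words. The following is a minimal Python sketch under stated assumptions: the forward process at step k is taken to be the number of different words among the first k words, and the backward process the number among the last k words. The statistic (the largest absolute gap between the two processes) and the permutation p-value are illustrative substitutes, not the authors' statistic. The paper instead derives p-values from asymptotic normality under the Zipf–Mandelbrot model. A permutation test is swapped in here because, under the assumption that words are drawn independently, every reordering of the text is equally likely. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def distinct_counts(words: Sequence[str]) -> List[int]:
    """counts[k-1] = number of different words among the first k words."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def forward_backward_difference(words: Sequence[str]) -> List[int]:
    """D(k) = forward count minus backward count, for k = 1..n.

    D(n) is always 0, since both directions end with the full vocabulary.
    """
    fwd = distinct_counts(words)
    bwd = distinct_counts(list(reversed(words)))
    return [f - b for f, b in zip(fwd, bwd)]


def max_abs_difference(words: Sequence[str]) -> int:
    # Hypothetical statistic: the largest gap between the two processes.
    if not words:
        raise ValueError("text must contain at least one word")
    return max(abs(d) for d in forward_backward_difference(words))


def permutation_p_value(words: Sequence[str], n_perm: int = 999, seed: int = 0) -> float:
    # Assumption: under the independence model the word order is exchangeable,
    # so shuffled texts sample the null distribution of the statistic.
    rng = random.Random(seed)
    observed = max_abs_difference(words)
    shuffled = list(words)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_difference(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    text = "a b a b a b c d e f".split()
    print(forward_backward_difference(text))  # [0, 0, -1, -2, -3, -4, -3, -2, -1, 0]
    print(max_abs_difference(text))  # 4
    print(permutation_p_value(text, n_perm=199, seed=1))
```

A text whose opening reuses a small vocabulary while its ending keeps introducing new words, as in the toy example, yields a large negative gap. A homogeneous text should keep D(k) close to zero throughout.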