Standard

Hapax legomena via stochastic processes. / Fayzullayev, Shahzod; Ковалевский, Артем Павлович.

In: Glottometrics, Vol. 56, 30.07.2024, p. 22-39.

Research output: Contribution to journal, Article, peer-review

Harvard

Fayzullayev, S & Ковалевский, АП 2024, 'Hapax legomena via stochastic processes', Glottometrics, vol. 56, pp. 22-39.

APA

Fayzullayev, S., & Ковалевский, А. П. (2024). Hapax legomena via stochastic processes. Glottometrics, 56, 22-39.

Vancouver

Fayzullayev S, Ковалевский АП. Hapax legomena via stochastic processes. Glottometrics. 2024 Jul 30;56:22-39.

Author

Fayzullayev, Shahzod ; Ковалевский, Артем Павлович. / Hapax legomena via stochastic processes. In: Glottometrics. 2024 ; Vol. 56. pp. 22-39.

BibTeX

@article{47c909ccfd2b4492ab75092e34fad207,
title = "Hapax legomena via stochastic processes",
abstract = "We study the number of words that occur exactly once since the beginning of a text. We model it as a stochastic process over the length of the text. The elementary probability model, going back to Bahadur and Karlin, states that the number of words that occur exactly once should grow according to a power law, like the number of different words. The final value of the number of words occurring exactly once is the number of hapaxes of this text. We construct two statistical tests to test Karlin's model under the assumption that the probabilities of words in this model satisfy the generalized Zipf's law. These statistical tests show that some texts fit the model well, but many texts deviate significantly from it. This deviation is that the number of hapaxes is too small relative to the number of different words.",
author = "Shahzod Fayzullayev and Ковалевский, {Артем Павлович}",
note = "Sobolev Institute of Mathematics SB RAS FWNF-2022-0010",
year = "2024",
month = jul,
day = "30",
language = "English",
volume = "56",
pages = "22--39",
journal = "Glottometrics",
issn = "2625-8226",
publisher = "International Quantitative Linguistics Association",

}

RIS

TY - JOUR

T1 - Hapax legomena via stochastic processes

AU - Fayzullayev, Shahzod

AU - Ковалевский, Артем Павлович

N1 - Sobolev Institute of Mathematics SB RAS FWNF-2022-0010

PY - 2024/7/30

Y1 - 2024/7/30

AB - We study the number of words that occur exactly once since the beginning of a text. We model it as a stochastic process over the length of the text. The elementary probability model, going back to Bahadur and Karlin, states that the number of words that occur exactly once should grow according to a power law, like the number of different words. The final value of the number of words occurring exactly once is the number of hapaxes of this text. We construct two statistical tests to test Karlin's model under the assumption that the probabilities of words in this model satisfy the generalized Zipf's law. These statistical tests show that some texts fit the model well, but many texts deviate significantly from it. This deviation is that the number of hapaxes is too small relative to the number of different words.

UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001274055400002

M3 - Article

VL - 56

SP - 22

EP - 39

JO - Glottometrics

JF - Glottometrics

SN - 2625-8226

ER -

ID: 61237572