Hapax legomena via stochastic processes

Standard

Hapax legomena via stochastic processes. / Файзуллаев, Шахзод Шухрат угли; Ковалевский, Артем Павлович.

In: Glottometrics, Vol. 56, 2, 2024, p. 22-39.

Research output: Contribution to journal › Article › peer-review

Harvard

Файзуллаев, ШШУ & Ковалевский, АП 2024, 'Hapax legomena via stochastic processes', Glottometrics, vol. 56, 2, pp. 22-39. https://doi.org/10.53482/2024_56_415

APA

Файзуллаев, Ш. Ш. У., & Ковалевский, А. П. (2024). Hapax legomena via stochastic processes. Glottometrics, 56, 22-39. [2]. https://doi.org/10.53482/2024_56_415

Vancouver

Файзуллаев ШШУ, Ковалевский АП. Hapax legomena via stochastic processes. Glottometrics. 2024;56:22-39. 2. doi: 10.53482/2024_56_415

Author

Файзуллаев, Шахзод Шухрат угли ; Ковалевский, Артем Павлович. / Hapax legomena via stochastic processes. In: Glottometrics. 2024 ; Vol. 56. pp. 22-39.

BibTeX

@article{47c909ccfd2b4492ab75092e34fad207,

title = "Hapax legomena via stochastic processes",

abstract = "We study the number of words that occur exactly once since the beginning of a text. We model it as a stochastic process over the length of the text. The elementary probability model, going back to Bahadur and Karlin, states that the number of words that occur exactly once should grow according to a power law, like the number of different words. The final value of the number of words occurring exactly once is the number of hapaxes of this text. We construct two statistical tests to test Karlin's model under the assumption that the probabilities of words in this model satisfy the generalized Zipf's law. These statistical tests show that some texts fit the model well, but many texts deviate significantly from it. This deviation is that the number of hapaxes is too small relative to the number of different words.",

keywords = "limit theorems, mathematical expectation, statistical test, Zipf{\textquoteright}s law",

author = "Файзуллаев, {Шахзод Шухрат угли} and Ковалевский, {Артем Павлович}",

note = "The research of Shahzod Fayzullaev is supported by the {"}El-yurt umidi{"} foundation under the Cabinet of Ministers of the Republic of Uzbekistan for training specialists abroad and communication with compatriots. The research of Artyom Kovalevskii is supported by the Fundamental scientific research of the SB RAS, project FWNF-2022-0010.",

year = "2024",

doi = "10.53482/2024_56_415",

language = "English",

volume = "56",

pages = "22--39",

journal = "Glottometrics",

issn = "2625-8226",

publisher = "International Quantitative Linguistics Association",

}

RIS

TY - JOUR

T1 - Hapax legomena via stochastic processes

AU - Файзуллаев, Шахзод Шухрат угли

AU - Ковалевский, Артем Павлович

N1 - The research of Shahzod Fayzullaev is supported by the "El-yurt umidi" foundation under the Cabinet of Ministers of the Republic of Uzbekistan for training specialists abroad and communication with compatriots. The research of Artyom Kovalevskii is supported by the Fundamental scientific research of the SB RAS, project FWNF-2022-0010.

PY - 2024

Y1 - 2024

AB - We study the number of words that occur exactly once since the beginning of a text. We model it as a stochastic process over the length of the text. The elementary probability model, going back to Bahadur and Karlin, states that the number of words that occur exactly once should grow according to a power law, like the number of different words. The final value of the number of words occurring exactly once is the number of hapaxes of this text. We construct two statistical tests to test Karlin's model under the assumption that the probabilities of words in this model satisfy the generalized Zipf's law. These statistical tests show that some texts fit the model well, but many texts deviate significantly from it. This deviation is that the number of hapaxes is too small relative to the number of different words.

KW - limit theorems

KW - mathematical expectation

KW - statistical test

KW - Zipf’s law

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85200776162&origin=inward&txGid=d0a0070f66785cee2d6b16ea146d22bd

UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001274055400002

UR - https://elibrary.ru/item.asp?id=68935439

U2 - 10.53482/2024_56_415

DO - 10.53482/2024_56_415

M3 - Article

VL - 56

SP - 22

EP - 39

JO - Glottometrics

JF - Glottometrics

SN - 2625-8226

M1 - 2

ER -

ID: 61237572