Standard

Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward. / Abebe, Berhane; Chebunin, Mikhail; Kovalevskii, Artyom.

In: Journal of Quantitative Linguistics, 2023, pp. 1-18.

Research output: Scientific publications in periodicals, article, peer-reviewed

Vancouver

Abebe B, Chebunin M, Kovalevskii A. Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward. Journal of Quantitative Linguistics. 2023;1-18. doi: 10.1080/09296174.2023.2275342

BibTeX

@article{9125524a8e6a4eeaab4ba366a516dad9,
title = "Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward",
abstract = "The paper develops a new statistical approach to the automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of the processes is based on the assumptions of an elementary probability model with a change point. We prove consistency of our statistical estimate of the point of concatenation in the case when the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. The method can be used in parallel with other text segmentation methods.",
author = "Berhane Abebe and Mikhail Chebunin and Artyom Kovalevskii",
note = "The work is partially supported by the Fundamental Scientific Research of the SB RAS, project FWNF-2022-0010.",
year = "2023",
doi = "10.1080/09296174.2023.2275342",
language = "English",
pages = "1--18",
journal = "Journal of Quantitative Linguistics",
issn = "0929-6174",
publisher = "Taylor and Francis Ltd.",

}

RIS

TY - JOUR

T1 - Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward

AU - Abebe, Berhane

AU - Chebunin, Mikhail

AU - Kovalevskii, Artyom

N1 - The work is partially supported by the Fundamental Scientific Research of the SB RAS, project FWNF-2022-0010.

PY - 2023

Y1 - 2023

N2 - The paper develops a new statistical approach to the automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of the processes is based on the assumptions of an elementary probability model with a change point. We prove consistency of our statistical estimate of the point of concatenation in the case when the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. The method can be used in parallel with other text segmentation methods.

AB - The paper develops a new statistical approach to the automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of the processes is based on the assumptions of an elementary probability model with a change point. We prove consistency of our statistical estimate of the point of concatenation in the case when the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. The method can be used in parallel with other text segmentation methods.

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85176726465&origin=inward&txGid=81ee0a8d108e42aef8e36e5ecce0dd9d

UR - https://www.mendeley.com/catalogue/3cc22079-a7cf-3038-aaa1-33b068c90bde/

U2 - 10.1080/09296174.2023.2275342

DO - 10.1080/09296174.2023.2275342

M3 - Article

SP - 1

EP - 18

JO - Journal of Quantitative Linguistics

JF - Journal of Quantitative Linguistics

SN - 0929-6174

ER -

ID: 59232960