Standard

Evolution of Efficient Symbolic Communication Codes. / Kolonin, Anton.

Studies in Computational Intelligence. Springer Science and Business Media Deutschland GmbH, 2023. p. 3-12 (Studies in Computational Intelligence; Vol. 1120).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Kolonin, A 2023, Evolution of Efficient Symbolic Communication Codes. in Studies in Computational Intelligence. Studies in Computational Intelligence, vol. 1120, Springer Science and Business Media Deutschland GmbH, pp. 3-12. https://doi.org/10.1007/978-3-031-44865-2_1

APA

Kolonin, A. (2023). Evolution of Efficient Symbolic Communication Codes. In Studies in Computational Intelligence (pp. 3-12). (Studies in Computational Intelligence; Vol. 1120). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-44865-2_1

Vancouver

Kolonin A. Evolution of Efficient Symbolic Communication Codes. In Studies in Computational Intelligence. Springer Science and Business Media Deutschland GmbH. 2023. p. 3-12. (Studies in Computational Intelligence). doi: 10.1007/978-3-031-44865-2_1

Author

Kolonin, Anton. / Evolution of Efficient Symbolic Communication Codes. Studies in Computational Intelligence. Springer Science and Business Media Deutschland GmbH, 2023. pp. 3-12 (Studies in Computational Intelligence).

BibTeX

@inproceedings{60dfdef9d36b40f8980a2262822fd93b,
title = "Evolution of Efficient Symbolic Communication Codes",
abstract = "The paper explores how the human natural language structure can be seen as a product of evolution of inter-personal communication code, targeting maximization of such culture-agnostic and cross-lingual metrics such as anti-entropy, compression factor and cross-split F1 score. The exploration is done as part of a larger unsupervised language learning effort, the attempt is made to perform meta-learning in a space of hyper-parameters maximizing F1 score based on the “ground truth” language structure, by means of maximizing the metrics mentioned above. The paper presents preliminary results of cross-lingual word-level segmentation tokenization study for Russian, Chinese and English as well as subword segmentation or morpho-parsing study for English. It is found that language structure form the word-level segmentation or tokenization can be found as driven by all of these metrics, anti-entropy being more relevant to English and Russian while compression factor more specific for Chinese. The study for subword segmentation or morpho-parsing on English lexicon has revealed straight connection between the compression been found to be associated with compression factor, while, surprising, the same connection with anti-entropy has turned to be the inverse.",
keywords = "Communication Code, Compression, Cross-lingual, Entropy, Meta-learning, Natural Language, Subword Segmentation, Tokenization, Unsupervised Language Learning",
author = "Anton Kolonin",
note = "We are grateful to Sergey Terekhov and Nikolay Mikhaylovskiy for valuable questions, critique, recommendations and suggestions during the course of work.",
year = "2023",
doi = "10.1007/978-3-031-44865-2_1",
language = "English",
isbn = "9783031448645",
series = "Studies in Computational Intelligence",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "3--12",
booktitle = "Studies in Computational Intelligence",
address = "Germany",

}

RIS

TY - GEN

T1 - Evolution of Efficient Symbolic Communication Codes

AU - Kolonin, Anton

N1 - We are grateful to Sergey Terekhov and Nikolay Mikhaylovskiy for valuable questions, critique, recommendations and suggestions during the course of work.

PY - 2023

Y1 - 2023

N2 - The paper explores how the human natural language structure can be seen as a product of the evolution of an inter-personal communication code, targeting maximization of culture-agnostic and cross-lingual metrics such as anti-entropy, compression factor and cross-split F1 score. The exploration is done as part of a larger unsupervised language learning effort, in which an attempt is made to perform meta-learning in a space of hyper-parameters, maximizing the F1 score based on the “ground truth” language structure by means of maximizing the metrics mentioned above. The paper presents preliminary results of a cross-lingual word-level segmentation or tokenization study for Russian, Chinese and English, as well as a subword segmentation or morpho-parsing study for English. It is found that, in the case of word-level segmentation or tokenization, language structure can be seen as driven by all of these metrics, with anti-entropy being more relevant to English and Russian while compression factor is more specific to Chinese. The study of subword segmentation or morpho-parsing on the English lexicon has revealed a direct connection between language structure and compression factor, while, surprisingly, the connection with anti-entropy has turned out to be the inverse.

AB - The paper explores how the human natural language structure can be seen as a product of the evolution of an inter-personal communication code, targeting maximization of culture-agnostic and cross-lingual metrics such as anti-entropy, compression factor and cross-split F1 score. The exploration is done as part of a larger unsupervised language learning effort, in which an attempt is made to perform meta-learning in a space of hyper-parameters, maximizing the F1 score based on the “ground truth” language structure by means of maximizing the metrics mentioned above. The paper presents preliminary results of a cross-lingual word-level segmentation or tokenization study for Russian, Chinese and English, as well as a subword segmentation or morpho-parsing study for English. It is found that, in the case of word-level segmentation or tokenization, language structure can be seen as driven by all of these metrics, with anti-entropy being more relevant to English and Russian while compression factor is more specific to Chinese. The study of subword segmentation or morpho-parsing on the English lexicon has revealed a direct connection between language structure and compression factor, while, surprisingly, the connection with anti-entropy has turned out to be the inverse.

KW - Communication Code

KW - Compression

KW - Cross-lingual

KW - Entropy

KW - Meta-learning

KW - Natural Language

KW - Subword Segmentation

KW - Tokenization

KW - Unsupervised Language Learning

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85175815154&origin=inward&txGid=ff8763439a02e4b423dba92fe55cb89a

UR - https://www.mendeley.com/catalogue/f2b6373d-6628-35b8-a066-cd733dec2e03/

U2 - 10.1007/978-3-031-44865-2_1

DO - 10.1007/978-3-031-44865-2_1

M3 - Conference contribution

SN - 9783031448645

T3 - Studies in Computational Intelligence

SP - 3

EP - 12

BT - Studies in Computational Intelligence

PB - Springer Science and Business Media Deutschland GmbH

ER -

ID: 59193519
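
Note on the metrics named in the abstract

For readers of this record, the following is a minimal sketch in Python of the three metrics the abstract names: anti-entropy, compression factor, and F1 score against a "ground truth" segmentation. These are common textbook formulations, not the paper's exact definitions; the function names, the lexicon-plus-usage form of the compression factor, and the boundary-based F1 are illustrative assumptions.

# Minimal sketch (not the paper's reference implementation) of the three
# metrics named in the abstract. Exact definitions in Kolonin (2023) may
# differ; the forms below are common textbook variants, used here only
# to make the terms concrete.

from collections import Counter
from math import log2


def anti_entropy(tokens: list[str]) -> float:
    """Negated Shannon entropy of the token distribution (assumed form).

    Higher values mean a more predictable, lower-entropy code.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return -entropy


def compression_factor(text: str, tokens: list[str]) -> float:
    """Assumed form: raw character count divided by the size of the
    unique-token lexicon plus one symbol per token occurrence."""
    lexicon_chars = sum(len(t) for t in set(tokens))
    return len(text) / (lexicon_chars + len(tokens))


def boundary_f1(reference: list[str], candidate: list[str]) -> float:
    """F1 over segmentation boundary positions, a standard way to score a
    candidate tokenization against a ground-truth one."""

    def boundaries(tokens: list[str]) -> set[int]:
        positions, offset = set(), 0
        for t in tokens[:-1]:
            offset += len(t)
            positions.add(offset)
        return positions

    ref, cand = boundaries(reference), boundaries(candidate)
    if not ref or not cand:
        return 0.0
    tp = len(ref & cand)
    if tp == 0:
        return 0.0
    precision = tp / len(cand)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "thecatsatonthemat"
    reference = ["the", "cat", "sat", "on", "the", "mat"]
    candidate = ["the", "cats", "at", "on", "the", "mat"]
    print(anti_entropy(candidate))            # negated entropy of token counts
    print(compression_factor(text, candidate))  # chars vs. lexicon + usage
    print(boundary_f1(reference, candidate))    # 0.8 for this toy example

In the toy run above, the candidate tokenization shares four of five boundaries with the reference, giving precision and recall of 0.8 each and thus F1 of 0.8; the paper's meta-learning setup, as described in the abstract, searches hyper-parameter space to drive such an F1 score up via the other two metrics.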