Research output: Contribution to book/report/collection/conference proceedings › Conference contribution › Research › Peer-reviewed
Evolution of Efficient Symbolic Communication Codes. / Kolonin, Anton.
Studies in Computational Intelligence. Springer Science and Business Media Deutschland GmbH, 2023. pp. 3-12 (Studies in Computational Intelligence; Vol. 1120).
TY - GEN
T1 - Evolution of Efficient Symbolic Communication Codes
AU - Kolonin, Anton
N1 - We are grateful to Sergey Terekhov and Nikolay Mikhaylovskiy for valuable questions, critique, recommendations and suggestions during the course of work.
PY - 2023
Y1 - 2023
N2 - The paper explores how the structure of human natural language can be seen as a product of the evolution of an inter-personal communication code, targeting maximization of such culture-agnostic and cross-lingual metrics as anti-entropy, compression factor and cross-split F1 score. The exploration is done as part of a larger unsupervised language learning effort; an attempt is made to perform meta-learning in a space of hyper-parameters, maximizing the F1 score based on the “ground truth” language structure by means of maximizing the metrics mentioned above. The paper presents preliminary results of a cross-lingual word-level segmentation (tokenization) study for Russian, Chinese and English, as well as a subword segmentation (morpho-parsing) study for English. It is found that language structure in the form of word-level segmentation, or tokenization, can be seen as driven by all of these metrics, with anti-entropy being more relevant to English and Russian while the compression factor is more specific to Chinese. The study of subword segmentation, or morpho-parsing, on the English lexicon has revealed a direct connection with the compression factor, while, surprisingly, the connection with anti-entropy has turned out to be the inverse.
AB - The paper explores how the structure of human natural language can be seen as a product of the evolution of an inter-personal communication code, targeting maximization of such culture-agnostic and cross-lingual metrics as anti-entropy, compression factor and cross-split F1 score. The exploration is done as part of a larger unsupervised language learning effort; an attempt is made to perform meta-learning in a space of hyper-parameters, maximizing the F1 score based on the “ground truth” language structure by means of maximizing the metrics mentioned above. The paper presents preliminary results of a cross-lingual word-level segmentation (tokenization) study for Russian, Chinese and English, as well as a subword segmentation (morpho-parsing) study for English. It is found that language structure in the form of word-level segmentation, or tokenization, can be seen as driven by all of these metrics, with anti-entropy being more relevant to English and Russian while the compression factor is more specific to Chinese. The study of subword segmentation, or morpho-parsing, on the English lexicon has revealed a direct connection with the compression factor, while, surprisingly, the connection with anti-entropy has turned out to be the inverse.
KW - Communication Code
KW - Compression
KW - Cross-lingual
KW - Entropy
KW - Meta-learning
KW - Natural Language
KW - Subword Segmentation
KW - Tokenization
KW - Unsupervised Language Learning
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85175815154&origin=inward&txGid=ff8763439a02e4b423dba92fe55cb89a
UR - https://www.mendeley.com/catalogue/f2b6373d-6628-35b8-a066-cd733dec2e03/
U2 - 10.1007/978-3-031-44865-2_1
DO - 10.1007/978-3-031-44865-2_1
M3 - Conference contribution
SN - 9783031448645
T3 - Studies in Computational Intelligence
SP - 3
EP - 12
BT - Studies in Computational Intelligence
PB - Springer Science and Business Media Deutschland GmbH
ER -