Research output: Contribution to journal › Article › peer-review
Division of the Standard Set of Amino Acids into Groups According to Their Evolutionary Age. / Efimov, V. M.; Efimov, K. V.; Kovaleva, V. Yu.
In: Molecular Biology, Vol. 59, No. 2, 15.06.2025, p. 263-271.
TY - JOUR
T1 - Division of the Standard Set of Amino Acids into Groups According to Their Evolutionary Age
AU - Efimov, V. M.
AU - Efimov, K. V.
AU - Kovaleva, V. Yu.
N1 - This work was supported by the Program of Fundamental Scientific Research of the State Academies of Sciences FWNR-2022-0019 of the Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, and FWGS-2021-0002 of the Institute of Animal Systematics and Ecology, Siberian Branch, Russian Academy of Sciences.
PY - 2025/6/15
Y1 - 2025/6/15
N2 - Abstract: It is generally accepted that the existing set of proteinogenic amino acids encoded by the standard genetic code was formed step by step in the course of evolution. Most studies name Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val as early amino acids, presumably of extraterrestrial origin. However, other studies have chosen a consensus list of early amino acids in which Ile is replaced by Arg. We compared the differences between early and late amino acids for the lists with Ile and with Arg based on their physicochemical properties (AAindex database). The point-biserial correlation coefficient rpb, Student’s t-statistic, and its significance level (the p-value) were calculated between each of the binary lists (with Ile and with Arg) and each AA index. Since 2×553 p‑values were obtained in total, the problem of multiple comparisons was addressed using the Bonferroni correction and the Benjamini–Hochberg method. Next, we used the 2B-PLS method, which is applied to two different sets of variables related to the same objects, to find information common to both sets. The first set was the binary lists of Trifonov (Arg) and Wong (Ile), and the second set was 553 AA indexes. The maximum correlation with both the list with Ile and the list with Arg (1.0 and 0.8, respectively) was demonstrated by the binary AA index CHAM830108, which characterizes the ability of an amino acid to be a charge donor: late amino acids are capable of being donors, while early ones are not. Apparently, this is due to the differences in the conditions under which the standard set of amino acids evolved: prebiotic and biotic. The results of the 2B-PLS analysis also show that in the list of ten evolutionarily early amino acids, Ile appears preferable to Arg. The allocation of the last six amino acids (Cys, His, Met, Phe, Trp, and Tyr) to a separate, third stage of the evolution of the set of standard amino acids, obtained on the basis of the reduction of the HOMO–LUMO gap, is confirmed.
The 2B-PLS plane reveals a compact arrangement of the physicochemical properties of each of the three groups of amino acids with adenine, thymine, and cytosine, respectively, in the second position of the codons, as well as the maximum dispersion of amino acids with guanine in the second position of the codons.
KW - 2B-PLS analysis
KW - AAindex
KW - Benjamini–Hochberg method
KW - Bonferroni correction
KW - CHAM830108
KW - early and late amino acids
KW - point-biserial correlation coefficient
KW - p‑value
UR - https://www.mendeley.com/catalogue/f352f3c1-d532-3326-812f-3452f7382f42/
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-105008308598&origin=inward&txGid=64cd952e6e647ac80f05d1040767e990
U2 - 10.1134/S0026893324700894
DO - 10.1134/S0026893324700894
M3 - Article
VL - 59
SP - 263
EP - 271
JO - Molecular Biology
JF - Molecular Biology
SN - 0026-8933
IS - 2
ER -
ID: 68148746
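
The point-biserial correlation and the two multiple-comparison corrections named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The two early-amino-acid lists are taken from the abstract; `donor_index` is a hypothetical stand-in built only from the abstract's statement that late amino acids are charge donors and early ones are not, and it is not read from the AAindex database. All function names are illustrative, and late amino acids are coded as 1 by assumption.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes
EARLY_ILE = set("ADEGILPSTV")  # early list with Ile, as named in the abstract
EARLY_ARG = set("ADEGRLPSTV")  # consensus variant: Ile replaced by Arg


def late_indicator(early):
    """Binary list: 1 for late amino acids, 0 for early ones (assumed coding)."""
    return [0 if aa in early else 1 for aa in AMINO_ACIDS]


def point_biserial(binary, values):
    """Point-biserial r, i.e. Pearson correlation with one binary variable."""
    n = len(binary)
    mean_b = sum(binary) / n
    mean_v = sum(values) / n
    cov = sum((b - mean_b) * (v - mean_v) for b, v in zip(binary, values))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in binary))
    sd_v = math.sqrt(sum((v - mean_v) ** 2 for v in values))
    return cov / (sd_b * sd_v)


def bonferroni(pvalues, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= alpha * k / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


# Hypothetical stand-in for a binary charge-donor index (assumption, see above).
donor_index = late_indicator(EARLY_ILE)

r_ile = point_biserial(late_indicator(EARLY_ILE), donor_index)
r_arg = point_biserial(late_indicator(EARLY_ARG), donor_index)
print(round(r_ile, 3), round(r_arg, 3))  # prints 1.0 0.8
```

Under these assumptions the sketch gives correlations of 1.0 and 0.8 for the lists with Ile and with Arg, the values quoted in the abstract. The 0.8 arises because the two lists differ in a single substitution (Ile for Arg). The `bonferroni` and `benjamini_hochberg` helpers take any list of p-values, such as the 2×553 values mentioned in the abstract, and return one reject/keep flag per test.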