Standard

Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses. / Kirichenko, Anastasiya D.; Poroshina, Anastasiya A.; Sherbakov, Dmitry Yu et al.

In: PLoS ONE, Vol. 17, No. 3, e0264640, 03.2022.

Research output: Contribution to journal › Article › peer-review

Vancouver

Kirichenko AD, Poroshina AA, Sherbakov DY, Sadovsky MG, Krutovsky KV. Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses. PLoS ONE. 2022 Mar;17(3):e0264640. doi: 10.1371/journal.pone.0264640

BibTeX

@article{b1fb9db389a5431d8ad8ac6371a58942,
title = "Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses",
abstract = "The SARS-CoV-2 is the third coronavirus in addition to SARS-CoV and MERS-CoV that causes severe respiratory syndrome in humans. All of them likely crossed the interspecific barrier between animals and humans and are of zoonotic origin, respectively. The origin and evolution of viruses and their phylogenetic relationships are of great importance for study of their pathogenicity and development of antiviral drugs and vaccines. The main objective of the presented study was to compare two methods for identifying relationships between coronavirus genomes: phylogenetic one based on the whole genome alignment followed by molecular phylogenetic tree inference and alignment-free clustering of triplet frequencies, respectively, using 69 coronavirus genomes selected from two public databases. Both approaches resulted in well-resolved robust classifications. In general, the clusters identified by the first approach were in good agreement with the classes identified by the second using K-means and the elastic map method, but not always, which still needs to be explained. Both approaches demonstrated also a significant divergence of genomes on a taxonomic level, but there was less correspondence between genomes regarding the types of diseases they caused, which may be due to the individual characteristics of the host. This research showed that alignment-free methods are efficient in combination with alignment-based methods. They have a significant advantage in computational complexity and provide valuable additional alternative information on the genomes relationships.",
keywords = "Chromosome Mapping, Cluster Analysis, Comparative Genomic Hybridization/methods, Coronavirus/classification, Genome, Viral, Humans, Phylogeny, SARS-CoV-2/classification, Sequence Alignment",
author = "Kirichenko, {Anastasiya D.} and Poroshina, {Anastasiya A.} and Sherbakov, {Dmitry Yu} and Sadovsky, {Michael G.} and Krutovsky, {Konstantin V.}",
note = "Funding Information: D.Y.S. and A.A.P. were supported by the basic project № 0279-2021-002 funded by the Ministry of Science and Higher Education of the Russian Federation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: {\textcopyright} 2022 Kirichenko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",
year = "2022",
month = mar,
doi = "10.1371/journal.pone.0264640",
language = "English",
volume = "17",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "3",
pages = "e0264640",
}

RIS

TY - JOUR

T1 - Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses

AU - Kirichenko, Anastasiya D.

AU - Poroshina, Anastasiya A.

AU - Sherbakov, Dmitry Yu

AU - Sadovsky, Michael G.

AU - Krutovsky, Konstantin V.

N1 - Funding Information: D.Y.S. and A.A.P. were supported by the basic project № 0279-2021-002 funded by the Ministry of Science and Higher Education of the Russian Federation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: © 2022 Kirichenko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2022/3

Y1 - 2022/3

AB - The SARS-CoV-2 is the third coronavirus in addition to SARS-CoV and MERS-CoV that causes severe respiratory syndrome in humans. All of them likely crossed the interspecific barrier between animals and humans and are of zoonotic origin, respectively. The origin and evolution of viruses and their phylogenetic relationships are of great importance for study of their pathogenicity and development of antiviral drugs and vaccines. The main objective of the presented study was to compare two methods for identifying relationships between coronavirus genomes: phylogenetic one based on the whole genome alignment followed by molecular phylogenetic tree inference and alignment-free clustering of triplet frequencies, respectively, using 69 coronavirus genomes selected from two public databases. Both approaches resulted in well-resolved robust classifications. In general, the clusters identified by the first approach were in good agreement with the classes identified by the second using K-means and the elastic map method, but not always, which still needs to be explained. Both approaches demonstrated also a significant divergence of genomes on a taxonomic level, but there was less correspondence between genomes regarding the types of diseases they caused, which may be due to the individual characteristics of the host. This research showed that alignment-free methods are efficient in combination with alignment-based methods. They have a significant advantage in computational complexity and provide valuable additional alternative information on the genomes relationships.

KW - Chromosome Mapping

KW - Cluster Analysis

KW - Comparative Genomic Hybridization/methods

KW - Coronavirus/classification

KW - Genome, Viral

KW - Humans

KW - Phylogeny

KW - SARS-CoV-2/classification

KW - Sequence Alignment

UR - http://www.scopus.com/inward/record.url?scp=85126078546&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/72ce7f0d-1b1d-36e1-94e3-070c37174d49/

U2 - 10.1371/journal.pone.0264640

DO - 10.1371/journal.pone.0264640

M3 - Article

C2 - 35259178

AN - SCOPUS:85126078546

VL - 17

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 3

M1 - e0264640

ER -
