GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing

Standard

GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing. / Pronozin, A Y; Salina, E A; Afonnikov, D A.

In: Vavilovskii Zhurnal Genetiki i Selektsii, Vol. 27, No. 7, 12.2023, p. 737-745.

Research output: Contribution to journal › Article › peer-review

Harvard

Pronozin, AY, Salina, EA & Afonnikov, DA 2023, 'GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing', Vavilovskii Zhurnal Genetiki i Selektsii, vol. 27, no. 7, pp. 737-745. https://doi.org/10.18699/VJGB-23-86

APA

Pronozin, A. Y., Salina, E. A., & Afonnikov, D. A. (2023). GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing. Vavilovskii Zhurnal Genetiki i Selektsii, 27(7), 737-745. https://doi.org/10.18699/VJGB-23-86

Vancouver

Pronozin AY, Salina EA, Afonnikov DA. GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing. Vavilovskii Zhurnal Genetiki i Selektsii. 2023 Dec;27(7):737-745. doi: 10.18699/VJGB-23-86

Author

Pronozin, A Y ; Salina, E A ; Afonnikov, D A. / GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing. In: Vavilovskii Zhurnal Genetiki i Selektsii. 2023 ; Vol. 27, No. 7. pp. 737-745.

BibTeX

@article{96fc6b3d30364785be1efd40435440e2,

title = "{GBS-DP}: a bioinformatics pipeline for processing data coming from genotyping by sequencing",

abstract = "The development of next-generation sequencing technologies has provided new opportunities for genotyping various organisms, including plants. Genotyping by sequencing (GBS) is used to identify genetic variability more rapidly, and is more cost-effective than whole-genome sequencing. GBS has demonstrated its reliability and flexibility for a number of plant species and populations. It has been applied to genetic mapping, molecular marker discovery, genomic selection, genetic diversity studies, variety identification, conservation biology and evolutionary studies. However, reduction in sequencing time and cost has led to the need to develop efficient bioinformatics analyses for an ever-expanding amount of sequenced data. Bioinformatics pipelines for GBS data analysis serve the purpose. Due to the similarity of data processing steps, existing pipelines are mainly characterised by a combination of software packages specifically selected either to process data for certain organisms or to process data from any organisms. However, despite the usage of efficient software packages, these pipelines have some disadvantages. For example, there is a lack of process automation (in some pipelines, each step must be started manually), which significantly reduces the performance of the analysis. In the majority of pipelines, there is no possibility of automatic installation of all necessary software packages; for most of them, it is also impossible to switch off unnecessary or completed steps. In the present work, we have developed a GBS-DP bioinformatics pipeline for GBS data analysis. The pipeline can be applied for various species. The pipeline is implemented using the Snakemake workflow engine. This implementation allows fully automating the process of calculation and installation of the necessary software packages. Our pipeline is able to perform analysis of large datasets (more than 400 samples).",

author = "Pronozin, {A Y} and Salina, {E A} and Afonnikov, {D A}",

note = "The work was supported by the budget project FWNR-2022-0020. Copyright {\textcopyright} AUTHORS. Publication pending correction.",

year = "2023",

month = dec,

doi = "10.18699/VJGB-23-86",

language = "English",

volume = "27",

pages = "737--745",

journal = "Вавиловский журнал генетики и селекции",

issn = "2500-0462",

publisher = "Институт цитологии и генетики СО РАН",

number = "7",

}

RIS

TY - JOUR

T1 - GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing

AU - Pronozin, A Y

AU - Salina, E A

AU - Afonnikov, D A

PY - 2023/12

Y1 - 2023/12

N2 - The development of next-generation sequencing technologies has provided new opportunities for genotyping various organisms, including plants. Genotyping by sequencing (GBS) is used to identify genetic variability more rapidly, and is more cost-effective than whole-genome sequencing. GBS has demonstrated its reliability and flexibility for a number of plant species and populations. It has been applied to genetic mapping, molecular marker discovery, genomic selection, genetic diversity studies, variety identification, conservation biology and evolutionary studies. However, reduction in sequencing time and cost has led to the need to develop efficient bioinformatics analyses for an ever-expanding amount of sequenced data. Bioinformatics pipelines for GBS data analysis serve the purpose. Due to the similarity of data processing steps, existing pipelines are mainly characterised by a combination of software packages specifically selected either to process data for certain organisms or to process data from any organisms. However, despite the usage of efficient software packages, these pipelines have some disadvantages. For example, there is a lack of process automation (in some pipelines, each step must be started manually), which significantly reduces the performance of the analysis. In the majority of pipelines, there is no possibility of automatic installation of all necessary software packages; for most of them, it is also impossible to switch off unnecessary or completed steps. In the present work, we have developed a GBS-DP bioinformatics pipeline for GBS data analysis. The pipeline can be applied for various species. The pipeline is implemented using the Snakemake workflow engine. This implementation allows fully automating the process of calculation and installation of the necessary software packages. Our pipeline is able to perform analysis of large datasets (more than 400 samples).

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85181506700&origin=inward&txGid=e7f89040e8fb5747f48ffd0c477d5b1c

U2 - 10.18699/VJGB-23-86

DO - 10.18699/VJGB-23-86

M3 - Article

C2 - 38213704

VL - 27

SP - 737

EP - 745

JO - Вавиловский журнал генетики и селекции

JF - Вавиловский журнал генетики и селекции

SN - 2500-0462

IS - 7

ER -
