Research output: Contribution to journal › Article › peer-review
Acceleration Of Recombinant Viral Sequences Search By 3SEQ Algorithm Via Adding Support Of Multi-Threaded Calculations And Considering Sample Collection Dates. / Devyaterikov, A. P.; Palyanov, A. Y.
In: Mathematical Biology and Bioinformatics, Vol. 19, No. 2, 01.2024, p. 338-353.
TY - JOUR
T1 - Acceleration Of Recombinant Viral Sequences Search By 3SEQ Algorithm Via Adding Support Of Multi-Threaded Calculations And Considering Sample Collection Dates
AU - Devyaterikov, A. P.
AU - Palyanov, A. Y.
PY - 2024/1
Y1 - 2024/1
AB - The article presents an efficient multithreaded implementation of the 3SEQ algorithm for detecting recombinant genetic sequences, tested on viral genomes. The work was carried out as part of a project to create a domestic (Russian) web platform (bioprojects.iis.nsk.su) for solving a wide range of data analysis problems in bioinformatics, virology and epidemiology. A recombinant viral genome emerges when two different variants of a virus of the same species exchange parts of their genomes, which is possible when a host is infected with both variants simultaneously. The emergence of a recombinant is a rare but important event in the context of virus evolution research. 3SEQ is one of the most promising existing algorithms for finding recombinants, but the original implementation runs only in single-threaded mode. We implemented the algorithm with support for multithreaded computing and with sample collection dates taken into account, which significantly increased computing speed. The developed software was used to search for recombinants in samples of influenza A H1N1 (only PB2 segments from 2174 genomes were analyzed), dengue virus (726 genomes), Ebola virus (865 genomes) and in two samples of SARS-CoV-2 coronavirus (776 and 2132 genomes). No recombinants were found for influenza A H1N1 (PB2 segment) or for the first SARS-CoV-2 dataset (variant from Russia), which agrees with the analysis of the same data by the RDP algorithm. For the second SARS-CoV-2 dataset (variants from the Siberian Federal District), the only recombinant present in the dataset was correctly found. In dengue viruses, 725 recombinants were found, with recombination region lengths ranging from 50 to 1000 nucleotides. In Ebola viruses, the recombination regions were shorter: 50 to 100 nucleotides in 572 recombinants and fewer than 50 nucleotides in 249 genomes.
KW - 3SEQ algorithm
KW - acceleration
KW - bioinformatics
KW - computational performance
KW - multithreaded
KW - recombinants detection
KW - software
KW - virology
KW - multithreading
KW - recombinant search
KW - program
KW - acceleration of computations
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85211229505&origin=inward&txGid=0ce9100fb4ebbab62dabcfa7b913f680
UR - https://www.mendeley.com/catalogue/9429fb5d-3f20-3ab3-8fa3-a8094630619a/
U2 - 10.17537/2024.19.338
DO - 10.17537/2024.19.338
M3 - Article
VL - 19
SP - 338
EP - 353
JO - Mathematical Biology and Bioinformatics
JF - Mathematical Biology and Bioinformatics
SN - 1994-6538
IS - 2
ER -
ID: 61295730