Standard

Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. / Vishnevsky, Oleg V.; Bocharnikov, Andrey V.; Kolchanov, Nikolay A.

In: Journal of Bioinformatics and Computational Biology, Vol. 16, No. 1, 1740012, 01.02.2018.

Research output: Contribution to journal › Article › peer-review

Harvard

Vishnevsky, OV, Bocharnikov, AV & Kolchanov, NA 2018, 'Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets', Journal of Bioinformatics and Computational Biology, vol. 16, no. 1, 1740012. https://doi.org/10.1142/S0219720017400121

Vancouver

Vishnevsky OV, Bocharnikov AV, Kolchanov NA. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. Journal of Bioinformatics and Computational Biology. 2018 Feb 1;16(1):1740012. doi: 10.1142/S0219720017400121

Author

Vishnevsky, Oleg V.; Bocharnikov, Andrey V.; Kolchanov, Nikolay A. / Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. In: Journal of Bioinformatics and Computational Biology. 2018; Vol. 16, No. 1.

BibTeX

@article{08676a8e753d4b57b62eb6755c89098a,
title = "Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets",
abstract = "The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top “peak” ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.",
keywords = "ChIP-Seq, Motif discovery, oligonucleotide motif, transcription regulation, SEQUENCE MOTIFS, INFORMATION-CONTENT, FACTOR-BINDING PROFILES, IDENTIFICATION, NUCLEOTIDE, CHIP-SEQ DATA, ELEMENTS, OPEN-ACCESS DATABASE, SITES, TOOL, Chromatin Immunoprecipitation/methods, Databases, Genetic, Computational Biology/methods, Algorithms, Animals, Nucleotide Motifs, Transcription Factors/metabolism, Hepatocyte Nuclear Factor 3-beta/genetics, Mice, Binding Sites",
author = "Vishnevsky, {Oleg V.} and Bocharnikov, {Andrey V.} and Kolchanov, {Nikolay A.}",
note = "Publisher Copyright: {\textcopyright} 2018 World Scientific Publishing Europe Ltd.",
year = "2018",
month = feb,
day = "1",
doi = "10.1142/S0219720017400121",
language = "English",
volume = "16",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "1",

}

RIS

TY - JOUR

T1 - Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets

AU - Vishnevsky, Oleg V.

AU - Bocharnikov, Andrey V.

AU - Kolchanov, Nikolay A.

N1 - Publisher Copyright: © 2018 World Scientific Publishing Europe Ltd.

PY - 2018/2/1

Y1 - 2018/2/1

AB - The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top “peak” ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.

KW - ChIP-Seq

KW - Motif discovery

KW - oligonucleotide motif

KW - transcription regulation

KW - SEQUENCE MOTIFS

KW - INFORMATION-CONTENT

KW - FACTOR-BINDING PROFILES

KW - IDENTIFICATION

KW - NUCLEOTIDE

KW - CHIP-SEQ DATA

KW - ELEMENTS

KW - OPEN-ACCESS DATABASE

KW - SITES

KW - TOOL

KW - Chromatin Immunoprecipitation/methods

KW - Databases, Genetic

KW - Computational Biology/methods

KW - Algorithms

KW - Animals

KW - Nucleotide Motifs

KW - Transcription Factors/metabolism

KW - Hepatocyte Nuclear Factor 3-beta/genetics

KW - Mice

KW - Binding Sites

UR - http://www.scopus.com/inward/record.url?scp=85039543928&partnerID=8YFLogxK

U2 - 10.1142/S0219720017400121

DO - 10.1142/S0219720017400121

M3 - Article

C2 - 29281953

AN - SCOPUS:85039543928

VL - 16

JO - Journal of Bioinformatics and Computational Biology

JF - Journal of Bioinformatics and Computational Biology

SN - 0219-7200

IS - 1

M1 - 1740012

ER -
