Standard

Population size estimation for quality control of ChIP-Seq datasets. / Kolmykov, Semyon K.; Kondrakhin, Yury V.; Yevshin, Ivan S. et al.

In: PLoS ONE, Vol. 14, No. 8, e0221760, 01.08.2019.

Research output: Contribution to journal › Article › peer-review

Harvard

Kolmykov, SK, Kondrakhin, YV, Yevshin, IS, Sharipov, RN, Ryabova, AS & Kolpakov, FA 2019, 'Population size estimation for quality control of ChIP-Seq datasets', PLoS ONE, vol. 14, no. 8, e0221760. https://doi.org/10.1371/journal.pone.0221760

APA

Kolmykov, S. K., Kondrakhin, Y. V., Yevshin, I. S., Sharipov, R. N., Ryabova, A. S., & Kolpakov, F. A. (2019). Population size estimation for quality control of ChIP-Seq datasets. PLoS ONE, 14(8), e0221760. https://doi.org/10.1371/journal.pone.0221760

Vancouver

Kolmykov SK, Kondrakhin YV, Yevshin IS, Sharipov RN, Ryabova AS, Kolpakov FA. Population size estimation for quality control of ChIP-Seq datasets. PLoS ONE. 2019 Aug 1;14(8):e0221760. doi: 10.1371/journal.pone.0221760

Author

Kolmykov, Semyon K.; Kondrakhin, Yury V.; Yevshin, Ivan S. et al. / Population size estimation for quality control of ChIP-Seq datasets. In: PLoS ONE. 2019; Vol. 14, No. 8, e0221760.

BibTeX

@article{638b340e77964cee845723659d98e217,
title = "Population size estimation for quality control of {ChIP-Seq} datasets",
abstract = "Chromatin immunoprecipitation followed by sequencing, i.e. ChIP-Seq, is a widely used experimental technology for the identification of functional protein-DNA interactions. Nowadays, such databases as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is currently indispensable to select the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that allow to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we have adapted well-known population size estimate for determination of unknown number of genuine transcription factor binding regions. Determination of the proposed metrics was based on overlapping distinct binding sites derived from processing one ChIP-Seq experiment by different peak callers. Moreover, the metrics also can be useful for assessing quality of datasets obtained from processing distinct ChIP-Seq experiments by a given peak caller. We also have shown that these metrics appear to be useful not only for dataset selection but also for comparison of peak callers and identification of site motifs based on ChIP-Seq datasets. The developed algorithm for determination of the false positive control metric and false negative control metric for ChIP-Seq datasets was implemented as a plugin for a BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.",
keywords = "FACTOR-BINDING SITES, CAPTURE-RECAPTURE, DATABASE",
author = "Kolmykov, {Semyon K.} and Kondrakhin, {Yury V.} and Yevshin, {Ivan S.} and Sharipov, {Ruslan N.} and Ryabova, {Anna S.} and Kolpakov, {Fedor A.}",
year = "2019",
month = aug,
day = "1",
doi = "10.1371/journal.pone.0221760",
language = "English",
volume = "14",
pages = "e0221760",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "8",
}

RIS

TY  - JOUR
T1  - Population size estimation for quality control of ChIP-Seq datasets
AU  - Kolmykov, Semyon K.
AU  - Kondrakhin, Yury V.
AU  - Yevshin, Ivan S.
AU  - Sharipov, Ruslan N.
AU  - Ryabova, Anna S.
AU  - Kolpakov, Fedor A.
PY  - 2019/8/1
Y1  - 2019/8/1
AB  - Chromatin immunoprecipitation followed by sequencing, i.e. ChIP-Seq, is a widely used experimental technology for the identification of functional protein-DNA interactions. Nowadays, such databases as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is currently indispensable to select the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that allow to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we have adapted well-known population size estimate for determination of unknown number of genuine transcription factor binding regions. Determination of the proposed metrics was based on overlapping distinct binding sites derived from processing one ChIP-Seq experiment by different peak callers. Moreover, the metrics also can be useful for assessing quality of datasets obtained from processing distinct ChIP-Seq experiments by a given peak caller. We also have shown that these metrics appear to be useful not only for dataset selection but also for comparison of peak callers and identification of site motifs based on ChIP-Seq datasets. The developed algorithm for determination of the false positive control metric and false negative control metric for ChIP-Seq datasets was implemented as a plugin for a BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
KW  - FACTOR-BINDING SITES
KW  - CAPTURE-RECAPTURE
KW  - DATABASE
UR  - http://www.scopus.com/inward/record.url?scp=85071401703&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0221760
DO  - 10.1371/journal.pone.0221760
M3  - Article
C2  - 31465497
AN  - SCOPUS:85071401703
VL  - 14
SP  - e0221760
JO  - PLoS ONE
JF  - PLoS ONE
SN  - 1932-6203
IS  - 8
M1  - e0221760
ER  - 
