Research output: Contribution to journal › Article › peer-review
Computer analysis of colocalization of the TFs’ binding sites in the genome according to the ChIP-seq data. / Dergilev, A. I.; Spitsina, A. M.; Chadaeva, I. V. et al.
In: Russian Journal of Genetics: Applied Research, Vol. 7, No. 5, 01.07.2017, p. 513-522.
TY - JOUR
T1 - Computer analysis of colocalization of the TFs’ binding sites in the genome according to the ChIP-seq data
AU - Dergilev, A. I.
AU - Spitsina, A. M.
AU - Chadaeva, I. V.
AU - Svichkarev, A. V.
AU - Naumenko, F. M.
AU - Kulakova, E. V.
AU - Galieva, E. R.
AU - Vityaev, E. E.
AU - Chen, M.
AU - Orlov, Yu L.
N1 - Publisher Copyright: © 2017, Pleiades Publishing, Ltd.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - A computer program for calculating clusters of binding sites of various transcription factors (TFs) according to the genomic coordinates of the ChIP-seq (Chromatin ImmunoPrecipitation-sequencing) profile peaks is developed. The statistical features of the distribution of the transcription factors’ binding sites (TFBSs) in the mouse genome, obtained with the help of ChIP-seq experiments in embryonic stem cells, are considered. Clusters of sites containing at least four binding sites of various TFs in the mouse genome are determined and their localization relative to the regulatory regions of the genes is described. Two types of colocalization of the sites are confirmed: clusters containing binding sites of factors Oct4, Nanog, and Sox2 located in the distal regions and clusters with n-Myc and c-Myc binding sites located mainly in the promoter regions of mouse genes. Analysis of the new ChIP-seq data on the binding of TFs Nr5a2, Tbx3, Cep, SRF, and USF1 in the same cell type confirmed the differentiation of clusters of the TFBSs into two types: those containing pluripotency regulator binding sites (Oct4, Nanog, and Sox2) and those not containing them. A computer program for the statistical processing of the data on the location of the sites in the genes is developed; it uses the experimental data on site localization obtained by ChIP-seq methods in mouse and human genomes. With the help of this program, the localization patterns of the binding sites of various TFs are detected. The distances between the closest binding sites of the TF groups Oct4, Nanog, and Sox2 and the binding sites of other factors in site clusters that serve as a basis for the analysis of the joint binding of protein complexes to DNA are calculated. The fraction of the presence of the known nucleotide motifs of TFBSs in the genomic regions of ChIP-seq is calculated. The weight matrices for such nucleotide motifs are recalculated. The correlation between the presence of motifs and the ChIP-seq binding intensity is shown. The programs implementing the computerized methods for assessing the clustering of binding sites of various TFs for new ChIP-seq data are available upon request from the authors.
KW - binding sites
KW - ChIP-seq
KW - embryonic stem cells
KW - enhancers
KW - nucleotide motifs
KW - search for regularities
UR - http://www.scopus.com/inward/record.url?scp=85028008012&partnerID=8YFLogxK
U2 - 10.1134/S2079059717050057
DO - 10.1134/S2079059717050057
M3 - Article
AN - SCOPUS:85028008012
VL - 7
SP - 513
EP - 522
JO - Russian Journal of Genetics: Applied Research
JF - Russian Journal of Genetics: Applied Research
SN - 2079-0597
IS - 5
ER -
ID: 9046165