Research output: Contribution to journal › Article › peer-review
MetArea: a software package for analysis of the mutually exclusive occurrence in pairs of motifs of transcription factor binding sites based on ChIP-seq data. / Levitsky, V. G.; Tsukanov, A. V.; Merkulova, T. I.
In: Vavilovskii Zhurnal Genetiki i Selektsii, Vol. 28, No. 8, 2024, p. 822-833.
TY - JOUR
T1 - MetArea: a software package for analysis of the mutually exclusive occurrence in pairs of motifs of transcription factor binding sites based on ChIP-seq data
AU - Levitsky, V. G.
AU - Tsukanov, A. V.
AU - Merkulova, T. I.
N1 - The work was supported by the Russian government project No. FWNR-2022-0020, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
PY - 2024
Y1 - 2024
N2 - ChIP-seq technology, which is based on chromatin immunoprecipitation (ChIP), allows mapping a set of genomic loci (peaks) containing binding sites (BS) for the investigated (target) transcription factor (TF). A TF may recognize several structurally different BS motifs. The multiprotein complex mapped in a ChIP-seq experiment includes the target TF and other “partner” TFs linked by protein-protein interactions. Not all these TFs bind to DNA directly. Therefore, both target and partner TFs recognize enriched BS motifs in peaks. A de novo motif search is used to find enriched TF BS motifs in ChIP-seq data. For a pair of enriched TF BS motifs, co-occurrence or mutually exclusive occurrence can be detected from a set of peaks: co-occurrence means that the two motifs are found more often in the same peaks, while mutually exclusive occurrence means that they are found more often in different peaks. We propose the MetArea software package to identify pairs of TF BS motifs with mutually exclusive occurrence in ChIP-seq data. MetArea was designed to predict the structural diversity of BS motifs of the same TF and the functional relation of BS motifs of different TFs. A functional relation between the motifs of two distinct TFs presumes that the TFs are interchangeable as part of a multiprotein complex that uses the BS of these TFs to bind directly to DNA in different peaks. MetArea estimates the recognition performance, pAUPRC (partial area under the Precision–Recall curve), for each of the two input single motifs, identifies the “joint” motif, and computes the performance for it too. The goal of the analysis is to find pairs of single motifs A and B for which the accuracy of the joint A&B motif is higher than that of either single motif.
KW - PR curve
KW - area under curve
KW - cooperative action of transcription factors
KW - de novo motif search
KW - structural variants of transcription factor binding site motifs
UR - https://www.mendeley.com/catalogue/c9a02f0b-fbcc-39ef-af42-fe0f7aa7cc1b/
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85217191627&origin=inward&txGid=57e3dd0aa7db237cf6143e9b28a720cd
U2 - 10.18699/vjgb-24-90
DO - 10.18699/vjgb-24-90
M3 - Article
C2 - 39944799
VL - 28
SP - 822
EP - 833
JO - Vavilovskii Zhurnal Genetiki i Selektsii
JF - Vavilovskii Zhurnal Genetiki i Selektsii
SN - 2500-0462
IS - 8
ER -
ID: 64715644