Standard

MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. / Letiagina, Anna E.; Omelina, Evgeniya S.; Ivankin, Anton V. et al.

In: Frontiers in Genetics, Vol. 12, 618189, 11.05.2021.

Research output: Contribution to journal › Article › peer-review

Harvard

Letiagina, AE, Omelina, ES, Ivankin, AV & Pindyurin, AV 2021, 'MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes', Frontiers in Genetics, vol. 12, 618189. https://doi.org/10.3389/fgene.2021.618189

APA

Letiagina, A. E., Omelina, E. S., Ivankin, A. V., & Pindyurin, A. V. (2021). MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Frontiers in Genetics, 12, Article 618189. https://doi.org/10.3389/fgene.2021.618189

Vancouver

Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Frontiers in Genetics. 2021 May 11;12:618189. doi: 10.3389/fgene.2021.618189

Author

Letiagina, Anna E.; Omelina, Evgeniya S.; Ivankin, Anton V. et al. / MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. In: Frontiers in Genetics. 2021; Vol. 12, 618189.

BibTeX

@article{cb9f75cc3ad24fe08cacf374ce4a1bba,
title = "MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes",
abstract = "Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.",
keywords = "barcodes, massively parallel reporter assay, MPRA, next-generation sequencing, NGS data processing, pipeline, region of interest, reporter constructs",
author = "Letiagina, {Anna E.} and Omelina, {Evgeniya S.} and Ivankin, {Anton V.} and Pindyurin, {Alexey V.}",
note = "Funding Information: We thank Lyubov A. Yarinich and Mikhail O. Lebedev for the generation of the MPRA plasmid library, Lyubov A. Yarinich for critical reading of the manuscript, and Petr P. Laktionov and Daniil A. Maksimov for the assistance with the Illumina DNA sequencing that was performed at the Molecular and Cellular Biology core facility of the Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences. Funding. This work was mainly supported by the Russian Science Foundation Grant 16-14-10288 and in part of the preparation and deposition of the materials to the GitHub repository by the Russian Science Foundation Grant 20-74-00137. Publisher Copyright: Copyright {\textcopyright} 2021 Letiagina, Omelina, Ivankin and Pindyurin. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.",
year = "2021",
month = may,
day = "11",
doi = "10.3389/fgene.2021.618189",
language = "English",
volume = "12",
pages = "618189",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S.A.",

}

RIS

TY - JOUR

T1 - MPRAdecoder

T2 - Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes

AU - Letiagina, Anna E.

AU - Omelina, Evgeniya S.

AU - Ivankin, Anton V.

AU - Pindyurin, Alexey V.

N1 - Funding Information: We thank Lyubov A. Yarinich and Mikhail O. Lebedev for the generation of the MPRA plasmid library, Lyubov A. Yarinich for critical reading of the manuscript, and Petr P. Laktionov and Daniil A. Maksimov for the assistance with the Illumina DNA sequencing that was performed at the Molecular and Cellular Biology core facility of the Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences. Funding. This work was mainly supported by the Russian Science Foundation Grant 16-14-10288 and in part of the preparation and deposition of the materials to the GitHub repository by the Russian Science Foundation Grant 20-74-00137. Publisher Copyright: Copyright © 2021 Letiagina, Omelina, Ivankin and Pindyurin. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

PY - 2021/5/11

Y1 - 2021/5/11

N2 - Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.

AB - Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.

KW - barcodes

KW - massively parallel reporter assay

KW - MPRA

KW - next-generation sequencing

KW - NGS data processing

KW - pipeline

KW - region of interest

KW - reporter constructs

UR - http://www.scopus.com/inward/record.url?scp=85107060194&partnerID=8YFLogxK

U2 - 10.3389/fgene.2021.618189

DO - 10.3389/fgene.2021.618189

M3 - Article

C2 - 34046055

AN - SCOPUS:85107060194

VL - 12

SP - 618189

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

M1 - 618189

ER -
