ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences

Standard

ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences. / Pronozin, Artem Yu; Afonnikov, Dmitry A.

In: Genes, Vol. 14, No. 7, 1331, 24.06.2023.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{d86603dc1e5944988afb46b2bea733e4,

title = "ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences",

abstract = "Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.",

author = "Pronozin, {Artem Yu} and Afonnikov, {Dmitry A}",

note = "Funding: The work was supported by the Budget Project #FWNR-2022-0006 of the Ministry of Science and Higher Education of the Russian Federation (transcriptome analysis) and by the Kurchatov Genomic Centre of the Institute of Cytology and Genetics, SB RAS (No. 075-15-2019-1662) (pipeline development).",

year = "2023",

month = jun,

day = "24",

doi = "10.3390/genes14071331",

language = "English",

volume = "14",

journal = "Genes",

issn = "2073-4425",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "7",

}

RIS

TY - JOUR

T1 - ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences

AU - Pronozin, Artem Yu

AU - Afonnikov, Dmitry A

N1 - Funding: The work was supported by the Budget Project #FWNR-2022-0006 of the Ministry of Science and Higher Education of the Russian Federation (transcriptome analysis) and by the Kurchatov Genomic Centre of the Institute of Cytology and Genetics, SB RAS (No. 075-15-2019-1662) (pipeline development).

PY - 2023/6/24

Y1 - 2023/6/24

N2 - Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.

AB - Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85165954290&origin=inward&txGid=669e4b9266285e36bda2f6eb4d61e1bb

UR - https://www.mendeley.com/catalogue/8cec4a45-0c57-3cbd-9355-f2b409e5def7/

U2 - 10.3390/genes14071331

DO - 10.3390/genes14071331

M3 - Article

C2 - 37510236

VL - 14

JO - Genes

JF - Genes

SN - 2073-4425

IS - 7

M1 - 1331

ER -

ID: 53249397