Research output: Contribution to journal › Article › peer-review
ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences. / Pronozin, Artem Yu; Afonnikov, Dmitry A.
In: Genes, Vol. 14, No. 7, 1331, 24.06.2023.
TY - JOUR
T1 - ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences
AU - Pronozin, Artem Yu
AU - Afonnikov, Dmitry A
N1 - Funding: The work was supported by the Budget Project #FWNR-2022-0006 of the Ministry of Science and Higher Education of the Russian Federation (transcriptome analysis) and by the Kurchatov Genomic Centre of the Institute of Cytology and Genetics, SB RAS (No. 075-15-2019-1662) (pipeline development).
PY - 2023/6/24
Y1 - 2023/6/24
AB - Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that allow for standardised data-processing steps in a mode that does not require user control up until the final result were actively developed recently. These advancements enable wider functionality for lncRNA data identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software for the identification of lncRNAs and allows the adjustment of recognition parameters using genomic data for which lncRNA annotation is available. The pipeline allows the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85165954290&origin=inward&txGid=669e4b9266285e36bda2f6eb4d61e1bb
UR - https://www.mendeley.com/catalogue/8cec4a45-0c57-3cbd-9355-f2b409e5def7/
U2 - 10.3390/genes14071331
DO - 10.3390/genes14071331
M3 - Article
C2 - 37510236
VL - 14
JO - Genes
JF - Genes
SN - 2073-4425
IS - 7
M1 - 1331
ER -
ID: 53249397