Standard

Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome. / Naumenko, Fedor M.; Abnizova, Irina I.; Beka, Nathan et al.

In: BMC Genomics, Vol. 19, No. Suppl 3, 92, 09.02.2018, p. 92.

Research output: Contribution to journal › Article › peer-review

Harvard

Naumenko, FM, Abnizova, II, Beka, N, Genaev, MA & Orlov, YL 2018, 'Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome', BMC Genomics, vol. 19, no. Suppl 3, 92, p. 92. https://doi.org/10.1186/s12864-018-4475-6

APA

Vancouver

Naumenko FM, Abnizova II, Beka N, Genaev MA, Orlov YL. Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome. BMC Genomics. 2018 Feb 9;19(Suppl 3):92. doi: 10.1186/s12864-018-4475-6

Author

Naumenko, Fedor M.; Abnizova, Irina I.; Beka, Nathan et al. / Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome. In: BMC Genomics. 2018; Vol. 19, No. Suppl 3. p. 92.

BibTeX

@article{4fcd805eec3144159e530c1acfadc7fd,
title = "Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome",
abstract = "Background: The use of artificial data to evaluate the performance of aligners and peak callers not only improves its accuracy and reliability, but also makes it possible to reduce the computational time. One of the natural ways to achieve such time reduction is by mapping a single chromosome. Results: We investigated whether a single chromosome mapping causes any artefacts in the alignments' performances. In this paper, we compared the accuracy of the performance of seven aligners on well-controlled simulated benchmark data which was sampled from a single chromosome and also from a whole genome. We found that commonly used statistical methods are insufficient to evaluate an aligner performance, and applied a novel measure of a read density distribution similarity, which allowed to reveal artefacts in aligners' performances. We also calculated some interesting mismatch statistics, and constructed mismatch frequency distributions along the read. Conclusions: The generation of artificial data by mapping of reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method allows to identify the inherent shortcoming of aligners that are not detected by conventional statistical methods, and can affect the quality of alignment of real data.",
keywords = "DNA alignment, Next-generation sequencing, Read density distribution, Artifacts, Chromosome Mapping/methods, Genomics, ALIGNMENT, SEQUENCING DATA, TOOLS",
author = "Naumenko, {Fedor M.} and Abnizova, {Irina I.} and Beka, Nathan and Genaev, {Mikhail A.} and Orlov, {Yuriy L.}",
note = "Publisher Copyright: {\textcopyright} 2018 The Author(s).",
year = "2018",
month = feb,
day = "9",
doi = "10.1186/s12864-018-4475-6",
language = "English",
volume = "19",
pages = "92",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",
number = "Suppl 3",

}

RIS

TY - JOUR

T1 - Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome

AU - Naumenko, Fedor M.

AU - Abnizova, Irina I.

AU - Beka, Nathan

AU - Genaev, Mikhail A.

AU - Orlov, Yuriy L.

N1 - Publisher Copyright: © 2018 The Author(s).

PY - 2018/2/9

Y1 - 2018/2/9

N2 - Background: The use of artificial data to evaluate the performance of aligners and peak callers not only improves its accuracy and reliability, but also makes it possible to reduce the computational time. One of the natural ways to achieve such time reduction is by mapping a single chromosome. Results: We investigated whether a single chromosome mapping causes any artefacts in the alignments' performances. In this paper, we compared the accuracy of the performance of seven aligners on well-controlled simulated benchmark data which was sampled from a single chromosome and also from a whole genome. We found that commonly used statistical methods are insufficient to evaluate an aligner performance, and applied a novel measure of a read density distribution similarity, which allowed to reveal artefacts in aligners' performances. We also calculated some interesting mismatch statistics, and constructed mismatch frequency distributions along the read. Conclusions: The generation of artificial data by mapping of reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method allows to identify the inherent shortcoming of aligners that are not detected by conventional statistical methods, and can affect the quality of alignment of real data.

AB - Background: The use of artificial data to evaluate the performance of aligners and peak callers not only improves its accuracy and reliability, but also makes it possible to reduce the computational time. One of the natural ways to achieve such time reduction is by mapping a single chromosome. Results: We investigated whether a single chromosome mapping causes any artefacts in the alignments' performances. In this paper, we compared the accuracy of the performance of seven aligners on well-controlled simulated benchmark data which was sampled from a single chromosome and also from a whole genome. We found that commonly used statistical methods are insufficient to evaluate an aligner performance, and applied a novel measure of a read density distribution similarity, which allowed to reveal artefacts in aligners' performances. We also calculated some interesting mismatch statistics, and constructed mismatch frequency distributions along the read. Conclusions: The generation of artificial data by mapping of reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method allows to identify the inherent shortcoming of aligners that are not detected by conventional statistical methods, and can affect the quality of alignment of real data.

KW - DNA alignment

KW - Next-generation sequencing

KW - Read density distribution

KW - Artifacts

KW - Chromosome Mapping/methods

KW - Genomics

KW - ALIGNMENT

KW - SEQUENCING DATA

KW - TOOLS

UR - http://www.scopus.com/inward/record.url?scp=85041837055&partnerID=8YFLogxK

U2 - 10.1186/s12864-018-4475-6

DO - 10.1186/s12864-018-4475-6

M3 - Article

C2 - 29504893

AN - SCOPUS:85041837055

VL - 19

SP - 92

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - Suppl 3

M1 - 92

ER -
