Mapping reads to the annotation

Single read mapping

Assigning reads to the annotation, once they have been mapped to genomic locations, is not straightforward. The Flux Capacitor follows a conservative assignment strategy, i.e., each read is assigned uniquely to one genomic region (a "segment" or a "junction"). These regions are defined by the exon-intron structure of each locus; an example is shown in Fig.1.

Fig.1: An example locus with two transcripts $\textrm{I}$ and $\textrm{II}$ (names to the left) that overlap in segments of their exons (green boxes denoted by letters A through E; indices indicate segments of overlapping exons). The Flux Capacitor further distinguishes 5 non-exonic areas. 19 sequencing reads (arrows with heart labels) have been mapped in the area of the locus as shown.

The locus sketched in Fig.1 consists of 8 exons that cluster in 8 segments (A1, A2, $\ldots$, E) separated by 5 non-exonic regions, i.e., the 5' proximal area (F), 3 introns (G, H, J), and the 3' proximal area (K). Additionally, there are junctions between all adjacent regions (e.g., FA1, A1A2, $\ldots$) and between non-adjacent segments that are spliced together (so-called splice-junctions, for instance A2B1). Each read is assigned to the one region, segment or junction, that it falls into completely.

Assigned category    Read ID(s)
FA1                  1
A1A2                 2
A2                   3, 19
G                    18
GB1                  4
B1B2                 5
B1C1                 17
B2H                  6
C1                   7, 16
C1C2                 15
C2                   8
C3                   14
C3J                  9
J                    10
E                    11
EK                   12
none                 13

Note: Given where it maps, read number 13 is not compatible with the annotation and therefore remains unassigned.
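
The assignment rule described above can be illustrated with a minimal Python sketch. This is not the Flux Capacitor's actual implementation: the function names, the toy coordinates, and the rule that a spliced read must end and start exactly at segment boundaries are assumptions made for illustration. The sketch also assumes that the regions are sorted and tile the locus without gaps, so that an unspliced read touching exactly two regions lies on the junction between two adjacent regions.

```python
from typing import List, Set, Tuple

Region = Tuple[str, int, int]  # (name, start, end), 1-based inclusive
Block = Tuple[int, int]        # one aligned block of a read (start, end)

# Hypothetical toy coordinates; only the region names follow Fig.1.
REGIONS: List[Region] = [
    ("F", 1, 10), ("A1", 11, 20), ("A2", 21, 30), ("G", 31, 40), ("B1", 41, 50),
]
SPLICE_JUNCTIONS: Set[str] = {"A2B1"}  # annotated splice-junctions (assumed)


def regions_hit(start: int, end: int, regions: List[Region]) -> List[str]:
    """Names of all regions overlapping [start, end], in genomic order."""
    return [name for name, s, e in regions if s <= end and start <= e]


def assign_read(blocks: List[Block], regions: List[Region],
                splice_junctions: Set[str]) -> str:
    """Return the segment or junction a read is assigned to, or 'none'."""
    if len(blocks) == 1:  # unspliced read
        start, end = blocks[0]
        hit = regions_hit(start, end, regions)
        if len(hit) == 1:
            return hit[0]            # falls completely into one region
        if len(hit) == 2:
            return hit[0] + hit[1]   # spans two adjacent regions
        return "none"                # outside the locus or spans 3+ regions
    if len(blocks) == 2:  # spliced read with one gap
        (s1, e1), (s2, e2) = blocks
        left = regions_hit(s1, e1, regions)
        right = regions_hit(s2, e2, regions)
        if len(left) == 1 and len(right) == 1:
            bounds = {name: (s, e) for name, s, e in regions}
            name = left[0] + right[0]
            # Assumed rule: the gap must coincide with annotated boundaries.
            if (name in splice_junctions
                    and e1 == bounds[left[0]][1]
                    and s2 == bounds[right[0]][0]):
                return name
    return "none"


if __name__ == "__main__":
    print(assign_read([(12, 18)], REGIONS, SPLICE_JUNCTIONS))            # segment
    print(assign_read([(8, 13)], REGIONS, SPLICE_JUNCTIONS))             # junction
    print(assign_read([(25, 30), (41, 45)], REGIONS, SPLICE_JUNCTIONS))  # splice-junction
```

A read that fits none of these cases is reported as "none", which corresponds to the unassigned category in the table above.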

Read pair mapping

A read pair is considered validly mapped if and only if both mates map to a segment or junction and their mapping distance, on at least one of the transcripts that support both mapping locations, falls within the expected range of insert sizes. Fig.2 summarizes how paired reads are counted and how coverage by read pairs is determined.


Fig.2: Examples of exonic structures (green boxes are exons, introns are not drawn to scale) and the distinct possible read mappings, for single reads (above the structure) and paired-end reads (below). The read length is 3 and, for paired-end reads, the insert size is 4 (no variation). For simplicity, junctions are not shown. (A) There are 10 possible mapping locations ("slots") in a mono-exonic transcript of 12 nt. Reads starting at position 11 or 12 fall partially outside of the annotation, as do reads that start before position 1; such reads are not considered to belong to the exon as annotated. Correspondingly, 4 slots can be observed for paired-end reads. (B) Example of a transcript with 2 exons. Disregarding the splice-junction, which is assigned the read mappings starting at position 6 or 7, we observe 8 slots for single reads and 3 slots for paired-end reads. (C) Example of a transcript with 3 exons (splice-junctions disregarded). There are 7 slots for single reads and 2 for paired-end reads.
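
The slot counts of panels (A) and (B) can be reproduced with a short Python sketch. It rests on two assumptions that the caption does not state explicitly: the insert size is taken as the distance from the last position of the first mate to the first position of the second mate, and the two exons in (B) are taken to be 7 nt and 5 nt long, which is inferred from the junction reads starting at position 6 or 7. The function names are illustrative. Panel (C) is not reproduced because its exon lengths are not given in the text.

```python
from typing import List, Tuple


def exon_intervals(exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Transcript coordinates (1-based, inclusive) of consecutive exons."""
    intervals, start = [], 1
    for length in exon_lengths:
        intervals.append((start, start + length - 1))
        start += length
    return intervals


def single_slots(exon_lengths: List[int], read_len: int) -> List[int]:
    """Start positions of reads that fall completely into one exon."""
    exons = exon_intervals(exon_lengths)
    total = sum(exon_lengths)
    return [
        pos for pos in range(1, total - read_len + 2)
        if any(s <= pos and pos + read_len - 1 <= e for s, e in exons)
    ]


def paired_slots(exon_lengths: List[int], read_len: int,
                 insert: int) -> List[Tuple[int, int]]:
    """(mate1_start, mate2_start) pairs where both mates occupy a single-read slot."""
    singles = set(single_slots(exon_lengths, read_len))
    # Assumed convention: mate 2 starts `insert` positions after mate 1 ends.
    offset = read_len - 1 + insert
    return [(pos, pos + offset) for pos in sorted(singles) if pos + offset in singles]


if __name__ == "__main__":
    # Panel (A): one exon of 12 nt
    print(len(single_slots([12], 3)), len(paired_slots([12], 3, 4)))      # 10 4
    # Panel (B): two exons, assumed to be 7 nt and 5 nt long
    print(len(single_slots([7, 5], 3)), len(paired_slots([7, 5], 3, 4)))  # 8 3
```

Under these assumptions, a read pair loses its slot as soon as either mate crosses an exon boundary, which is why (B) has one paired-end slot fewer than (A) although both transcripts are 12 nt long.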
