Flux Mapped Read Descriptor (FMRD)

The Flux Mapped Read Descriptor is adopted by different file formats to condense all relevant information about a mapped read into a single string, usually stored as the read identifier or descriptor. It can comprise the following pipe-separated blocks:

unique read identifier | paired-end information | multi-map information
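
The block layout above can be handled with a plain split on the pipe character. The following is a minimal Python sketch that assumes all three blocks are present, in the order listed, and that no block contains a "|" itself. The names FmrdBlocks, split_fmrd and join_fmrd are hypothetical, and the sample identifier used in the usage note is made up for illustration.

```python
from typing import NamedTuple


class FmrdBlocks(NamedTuple):
    """The three blocks of an FMRD string, in the order listed above."""

    read_id: str
    paired_end: str
    multi_map: str


def split_fmrd(descriptor: str) -> FmrdBlocks:
    """Split an FMRD string on '|' into its three blocks.

    Assumes all three blocks are present and that none of them
    contains a '|' character itself.
    """
    parts = descriptor.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated blocks, got {len(parts)}")
    return FmrdBlocks(*parts)


def join_fmrd(blocks: FmrdBlocks) -> str:
    """Inverse of split_fmrd: concatenate the blocks with '|'."""
    return "|".join(blocks)
```

For example, split_fmrd("chr1:1000-5000W:T1:7:640:650:10:300:50:126|1|1:0:4:1") yields the identifier, the mate label "1" and the multi-map block "1:0:4:1". Descriptors that omit one of the optional blocks would need an extra rule for deciding which block is missing; this page does not specify one.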

multi-map information: in the simplest case, an integer giving the number of different mappings found for the read. Alternatively, the advanced GEM syntax may be used, in which a colon-separated list gives the number of matches for each number of mismatches. For instance, 1:0:4:1 describes a read that matches exactly (with no mismatches) once, 0 times with 1 mismatch, 4 times with 2 mismatches and once with 3 mismatches.
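
Both forms of the multi-map block can be read by one function that distinguishes a single integer from a colon-separated list of per-mismatch counts. This Python sketch is illustrative only; parse_multimap and has_unique_exact_match are hypothetical names, and returning None for the per-mismatch breakdown of a plain integer is a design choice, not part of the format.

```python
from __future__ import annotations

from typing import Optional


def parse_multimap(field: str) -> tuple[int, Optional[list[int]]]:
    """Parse the multi-map block.

    Returns (total_mappings, strata). For a plain integer such as "3",
    strata is None because no per-mismatch breakdown is given. For the
    GEM-style form "1:0:4:1", strata[k] is the number of matches with
    exactly k mismatches and total_mappings is their sum.
    """
    # int() raises ValueError on anything that is not an integer
    counts = [int(part) for part in field.strip().split(":")]
    if any(c < 0 for c in counts):
        raise ValueError(f"negative match count in {field!r}")
    if len(counts) == 1:
        return counts[0], None
    return sum(counts), counts


def has_unique_exact_match(field: str) -> bool:
    """True if the GEM-style counts report exactly one match with 0 mismatches.

    A plain integer carries no mismatch information, so it returns False.
    """
    _, strata = parse_multimap(field)
    return strata is not None and strata[0] == 1
```

With the example above, parse_multimap("1:0:4:1") returns (6, [1, 0, 4, 1]): six mappings in total, one of them exact.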

paired-end information: a unique extension that identifies the two mates of a read pair; by default, 1 and 2 are expected.
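
Since both mates of a pair carry the same unique read identifier and differ only in their paired-end information, mates can be matched up by grouping on the identifier. The Python sketch below assumes the default labels 1 and 2 and records of the form (read_id, paired_end_info, payload); pair_mates and the record layout are hypothetical.

```python
from collections import defaultdict


def pair_mates(records, mate_labels=("1", "2")):
    """Group (read_id, paired_end_info, payload) records into mate pairs.

    Returns (pairs, orphans): pairs maps a read identifier to a
    (first_mate_payload, second_mate_payload) tuple; orphans lists the
    identifiers for which only one mate was seen.
    """
    by_id = defaultdict(dict)
    for read_id, pe, payload in records:
        if pe not in mate_labels:
            raise ValueError(f"unexpected paired-end info {pe!r} for {read_id}")
        if pe in by_id[read_id]:
            raise ValueError(f"mate {pe} seen twice for {read_id}")
        by_id[read_id][pe] = payload

    first, second = mate_labels
    pairs, orphans = {}, []
    for read_id, mates in by_id.items():
        if len(mates) == 2:
            pairs[read_id] = (mates[first], mates[second])
        else:
            orphans.append(read_id)
    return pairs, orphans
```

Reporting orphans instead of raising lets a caller decide whether an unpaired mate is an error, for example after filtering one mate out.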

unique read identifier: a string that distinguishes the read from all other reads in the run, except that both reads of a pair must share the same identifier. In the FMRD generated by the FLUX SIMULATOR, these identifiers are composed of attributes of the simulated experiment:

locus ID : transcript ID : molecule ID : transcript length : fragment start : fragment end : read start : read end
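
Naively splitting this identifier on ":" yields more than eight tokens, because the locus ID (chr:start-end) and the molecule ID (counter:length) each contain a colon themselves. A minimal Python sketch that re-joins those fields, assuming chromosome names and transcript IDs contain no colon; SimulatedReadId and parse_read_id are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedReadId:
    """Fields of a FLUX SIMULATOR read identifier, in the order listed above."""

    locus_id: str        # chr:start-end plus strand character, e.g. "chr1:1000-5000W"
    transcript_id: str
    molecule_id: str     # counter:length
    transcript_length: int
    fragment_start: int  # may be negative or exceed transcript_length
    fragment_end: int
    read_start: int
    read_end: int


def parse_read_id(read_id: str) -> SimulatedReadId:
    """Split an identifier on ':' and re-join the fields that contain a colon.

    The locus ID and the molecule ID each contribute two tokens, so a
    well-formed identifier has 10 tokens. Assumes chromosome names and
    transcript IDs contain no ':'.
    """
    tokens = read_id.split(":")
    if len(tokens) != 10:
        raise ValueError(f"expected 10 colon-separated tokens, got {len(tokens)}")
    chrom, span, transcript_id, counter, mol_length, *numbers = tokens
    length, frag_start, frag_end, r_start, r_end = (int(n) for n in numbers)
    return SimulatedReadId(
        locus_id=f"{chrom}:{span}",
        transcript_id=transcript_id,
        molecule_id=f"{counter}:{mol_length}",
        transcript_length=length,
        fragment_start=frag_start,
        fragment_end=frag_end,
        read_start=r_start,
        read_end=r_end,
    )
```

Counting tokens rather than splitting with a fixed maxsplit makes a malformed identifier fail loudly instead of silently shifting fields.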

where

locus ID is a UCSC browser-compatible string of the format chr:start-end followed by a character W or C; together they denote the genomic location and strand of the splicing locus from which the read was produced.
transcript ID is the transcript_id attribute from the reference annotation of the splice form from which the read was produced.
molecule ID is a generic string describing the cDNA molecule that gave rise to the read. It has the form counter:length, where counter is a unique number assigned to each molecule and length is the length of the cDNA from which the read was derived.
transcript length is the spliced length of the complete transcript sequence.
fragment start is the start of the cDNA fragment from which the read was derived, given relative to the annotated transcript start. Negative positions indicate that, due to transcription start site variation, the fragment begins before the annotated transcription start; positions larger than transcript length fall into the poly-A tail.
fragment end is the end of the cDNA fragment from which the read was derived, likewise given relative to the annotated transcript start. Negative positions indicate that, due to transcription start site variation, the fragment ends before the annotated transcription start; positions larger than transcript length fall into the poly-A tail.
read start gives the start position of the read in the spliced sequence relative to the annotated transcription start site.
read end gives the end position of the read in the spliced sequence relative to the annotated transcription start site.
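
The locus ID format and the coordinate conventions above lend themselves to two small helpers: one that splits a locus ID into chromosome, span and strand character, and one that reports whether a fragment reaches upstream of the annotated start or into the poly-A tail. This Python sketch is hypothetical (Locus, parse_locus and fragment_regions are illustrative names). It keeps W/C as written, since the page does not spell out the mapping, although W and C are commonly read as Watson/Crick, i.e. forward/reverse. Treating 0 to transcript length as the annotated body is an assumption, because the page does not say whether coordinates are 0- or 1-based.

```python
import re
from typing import NamedTuple, Set


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str  # the W/C character exactly as written in the locus ID


_LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[WC])$")


def parse_locus(locus_id: str) -> Locus:
    """Parse 'chr:start-end' followed by W or C, e.g. 'chr1:1000-5000W'."""
    m = _LOCUS_RE.match(locus_id)
    if m is None:
        raise ValueError(f"not a locus ID: {locus_id!r}")
    return Locus(m["chrom"], int(m["start"]), int(m["end"]), m["strand"])


def fragment_regions(fragment_start: int, fragment_end: int, transcript_length: int) -> Set[str]:
    """Report which parts of the expressed molecule a fragment covers.

    Positions < 0 lie upstream of the annotated transcription start
    (TSS variation); positions > transcript_length lie in the poly-A tail.
    Assumes 0..transcript_length is the annotated transcript body.
    """
    regions = set()
    if fragment_start < 0:
        regions.add("tss_variation")
    if fragment_end > transcript_length:
        regions.add("poly_a_tail")
    if fragment_end >= 0 and fragment_start <= transcript_length:
        regions.add("annotated_transcript")
    return regions
```

A fragment such as fragment start -20, fragment end 680 on a transcript of length 650 is reported as touching all three regions.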
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License