The GTF (Gene Transfer Format) has been developed to facilitate the exchange of genome annotations (i.e., transcripts aligned to the genome) in human-readable flat files. A complete description is available, for instance, from Washington University or from UCSC.
The standard format description requires 8 mandatory fields, which are tab-separated. They are followed by a list of optional attributes with the structure
key "value"; key2 "value2"; ...
Attention: file sorting, which may be triggered by the Flux Capacitor or the Flux Simulator, expects a consistent order of the first 8 fields AND of the attribute transcript_id across lines. Concretely, the key transcript_id is always expected in the same field (i.e., column) in which it is found in the first line.
Note: in contrast to the general format description, the Flux Capacitor and the Flux Simulator depend crucially on the attribute transcript_id, which has to be unique on the chromosome on which a transcript is annotated (as in the UCSC standard). The attribute gene_id is not necessary, as both programs perform an intrinsic clustering of transcripts into loci, i.e., spliceforms that overlap on the same strand. Further, each transcript requires at least one exon feature. Additional CDS features are optional and mark the corresponding transcript as coding.
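The requirements above can be illustrated with a minimal sketch, which is not code from the Flux tools. It assumes attributes follow the key "value"; structure exactly; the names parse_gtf_line and transcript_id_position_is_consistent are hypothetical helpers introduced only for this illustration.

```python
import re

# Matches one `key "value";` pair in the attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')

def parse_gtf_line(line):
    """Split a GTF line into its 8 mandatory fields and a list of attribute pairs."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected 8 tab-separated fields followed by attributes")
    fields = cols[:8]
    pairs = ATTR_RE.findall("\t".join(cols[8:]))
    if "transcript_id" not in [key for key, _ in pairs]:
        raise ValueError("missing transcript_id attribute")
    return fields, pairs

def transcript_id_position_is_consistent(lines):
    """Check that transcript_id has the same attribute position as in the first line."""
    positions = set()
    for line in lines:
        _, pairs = parse_gtf_line(line)
        keys = [key for key, _ in pairs]
        positions.add(keys.index("transcript_id"))
    return len(positions) <= 1

# Hypothetical input lines, made up for this example.
good = [
    'chr1\tmyGenes\texon\t100\t200\t.\t+\t.\ttranscript_id "tx1"; gene_id "g1";',
    'chr1\tmyGenes\texon\t300\t400\t.\t+\t.\ttranscript_id "tx1"; gene_id "g1";',
]
bad = good + [
    'chr1\tmyGenes\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
]
print(transcript_id_position_is_consistent(good))
print(transcript_id_position_is_consistent(bad))
```

A check of this kind could be run before handing an annotation to either program, so that a later sort does not trip over a reordered attribute column.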
Flux Capacitor
The Flux Capacitor reads reference annotations in GTF format, considering exclusively exon features and transcript_id attributes. Adjacent exons, i.e., exon pairs with the same transcript_id in which one exon starts at the position directly after the end of the other, are merged into one exon. This may result in a lower number of exon feature lines in the output. Intrinsically, all transcript models are clustered into loci, i.e., transcripts that overlap on the same strand in the genomic region they align to. The attribute gene_id is set to a UCSC Genome Browser compatible string chr:start-end, followed by an additional character, W or C, indicating the strand.
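The merging and clustering steps described above can be sketched as follows. This is an illustrative approximation, not the Flux Capacitor implementation. It assumes 1-based inclusive coordinates, that two transcripts belong to the same locus when their genomic spans overlap on the same chromosome and strand, and that W stands for the "+" strand and C for the "-" strand. The function names are hypothetical.

```python
from collections import defaultdict

def merge_adjacent_exons(exons):
    """Merge exons of one transcript where one starts directly after the other ends.

    `exons` is a list of (start, end) tuples with 1-based inclusive coordinates.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start == merged[-1][1] + 1:
            # Directly adjacent: extend the previous exon.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

def cluster_loci(transcripts):
    """Group transcripts whose spans overlap on the same chromosome and strand.

    `transcripts` maps transcript_id -> (chrom, strand, start, end).
    Returns locus identifier -> list of transcript_ids, where the identifier
    has the form chr:start-end plus W or C (assumed: W for '+', C for '-').
    """
    by_strand = defaultdict(list)
    for tid, (chrom, strand, start, end) in transcripts.items():
        by_strand[(chrom, strand)].append((start, end, tid))

    loci = {}
    for (chrom, strand), spans in by_strand.items():
        suffix = "W" if strand == "+" else "C"
        spans.sort()
        locus_start, locus_end, members = spans[0][0], spans[0][1], [spans[0][2]]
        for start, end, tid in spans[1:]:
            if start <= locus_end:
                # Overlaps the current locus: extend it.
                locus_end = max(locus_end, end)
                members.append(tid)
            else:
                loci[f"{chrom}:{locus_start}-{locus_end}{suffix}"] = members
                locus_start, locus_end, members = start, end, [tid]
        loci[f"{chrom}:{locus_start}-{locus_end}{suffix}"] = members
    return loci

# Hypothetical transcripts, made up for this example.
example = {
    "a": ("chr1", "+", 100, 500),
    "b": ("chr1", "+", 400, 900),
    "c": ("chr1", "-", 450, 600),
    "d": ("chr1", "+", 1000, 1200),
}
print(merge_adjacent_exons([(100, 200), (201, 300), (400, 500)]))
print(cluster_loci(example))
```

Sorting by start position first keeps the clustering to a single pass per strand, since any transcript that overlaps the current locus must start before the locus ends.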
In the output, the Flux Capacitor writes the following features
| Feature | Description |
|---|---|
| exon | Standard GTF-compliant exon feature. |
| as_event | Alternative Splicing events, compliant with the AStalavista definition. For a description of attributes used for as_event features, see the AStalavista GTF format description. |
| transcript | A spliceform as annotated in the reference annotation. |
| locus | Spliceforms that overlap on the same strand form a splicing locus. |
| segment | A segment, (part of) an exon. |
| junction | A junction, joining two adjacent exon segments or a splice-junction joining two exon segments across an intron. |
The Flux Capacitor can output multiple abundance measures, named in a 3-token format with the tokens separated by "_" (underscore). The 3 tokens denote the base, the resolution, and the type of the measurement.
| Base | Description |
|---|---|
| obs | Measurement is based on the number of reads that are observed to map to the feature according to the rules described in mapping. |
| pred | Measurement is based on the number of reads that are predicted for the feature after flow network decomposition. |

| Resolution | Description |
|---|---|
| all | all reads that overlap the area of the feature (i.e., that fall into one of the possible slots) are taken into account for the measurement. |
| split | reads that stem from the spliceforms listed in the transcript_id attribute of the feature are taken into account for the measurement. |
| uniq | exclusively reads that fall into unique regions of the spliceforms listed in the transcript_id attribute are taken into account for the measurement. |

| Measurement | Description |
|---|---|
| freq | the absolute number of read(-pairs) that map to the feature. |
| rfreq | the relative frequency of the feature, i.e., the fraction of read mappings from all read mappings in the experiment. |
| rcov | the relative coverage, i.e., the freq of read mappings divided by the number of distinct read mapping locations in the feature. See the description of the slots attribute below. |

For the calculation of the coverage measures, the Flux Capacitor counts the number of distinct read locations ("slots") in the feature with respect to the corresponding resolution. These numbers are included in the output as the 2-token attributes slots_all, slots_split, and slots_uniq.
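
As a sketch of the naming scheme, the snippet below splits an attribute name into its tokens. It is an illustration only: split_attribute_name is a hypothetical helper, and the measurement token is deliberately not validated, because the example line in this document uses reads where the measurement table lists freq.

```python
BASES = {"obs", "pred"}
RESOLUTIONS = {"all", "split", "uniq"}

def split_attribute_name(name):
    """Classify an attribute name as a slots count or an abundance measure.

    Returns a dict describing the tokens, or None for any other attribute.
    """
    tokens = name.split("_")
    if len(tokens) == 2 and tokens[0] == "slots" and tokens[1] in RESOLUTIONS:
        return {"kind": "slots", "resolution": tokens[1]}
    if len(tokens) == 3 and tokens[0] in BASES and tokens[1] in RESOLUTIONS:
        base, resolution, measurement = tokens
        return {
            "kind": "abundance",
            "base": base,
            "resolution": resolution,
            "measurement": measurement,  # not validated, see the note above
        }
    return None

for name in ("pred_split_rcov", "slots_uniq", "transcript_id"):
    print(name, split_attribute_name(name))
```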
Examples
chrX myGenes transcript 123 455 . + . transcript_id "myTranscript"; slots_all "297"; slots_split "297"; slots_uniq "0"; obs_all_reads "69"; obs_split_reads "31"; obs_uniq_reads "0"; obs_all_rfreq "1.3467e-5"; obs_split_rfreq "6.051e-6"; obs_uniq_rfreq "0"; obs_all_rcov "45.3434"; obs_split_rcov "20.0372"; obs_uniq_rcov "0"; pred_all_reads "73"; pred_split_reads "62"; pred_uniq_reads "3"; pred_all_rfreq "1.4248e-5"; pred_split_rfreq "1.2102e-5"; pred_uniq_rfreq "5.8554e-7"; pred_all_rcov "40.9731"; pred_split_rcov "40.744"; pred_uniq_rcov "1.9715";
| Attribute | Value | Description |
|---|---|---|
| slots_all | 297 | number of different mappings to the feature |
| slots_split | 297 | subset of slots_all that map to the spliceforms listed in transcript_id that fall into the feature |
| slots_uniq | 0 | subset of slots_split that map exclusively to the spliceforms listed in transcript_id |
| obs_all_reads | 69 | read mappings that fall into exonic areas of transcript myTranscript |
| obs_split_reads | 31 | for each segment, the number of obs_all_reads is divided by the number of transcripts there. The sum of these fractions forms the split reads. Use this to assess the difficulty of decomposition. |
| obs_uniq_reads | 0 | reads that align in unique regions of the transcript |
| obs_all_rfreq | 1.3467e-5 | obs_all_reads divided by the number of read mappings |
| obs_split_rfreq | 6.051e-6 | obs_split_reads divided by the number of read mappings |
| obs_uniq_rfreq | 0 | obs_uniq_reads divided by the number of read mappings |
| obs_all_rcov | 45.3434 | obs_all_rfreq divided by slots_all |
| obs_split_rcov | 20.0372 | obs_split_rfreq divided by slots_split |
| obs_uniq_rcov | 0 | obs_uniq_rfreq divided by slots_uniq |
| pred_all_reads | 73 | pred_split_reads plus reads that have been predicted for other transcripts in overlapping segments |
| pred_split_reads | 62 | reads that have been predicted for the transcript. Use this value in combination with obs_split_reads to assess the reliability of the prediction. |
| pred_uniq_reads | 3 | reads that have been predicted in unique segments. Use this value in combination with obs_uniq_reads to assess the reliability of the prediction. |
| pred_all_rfreq | 1.4248e-5 | pred_all_reads divided by the number of read mappings. Use this value in combination with obs_all_rfreq to assess the reliability of the prediction. |
| pred_split_rfreq | 1.2102e-5 | pred_split_reads divided by the number of read mappings. Use this value to compare the same transcript in different experiments. |
| pred_uniq_rfreq | 5.8554e-7 | pred_uniq_reads divided by the number of read mappings. Use this value in combination with obs_uniq_rfreq to assess the reliability of the prediction. |
| pred_all_rcov | 40.9731 | pred_all_rfreq divided by slots_all. |
| pred_split_rcov | 40.744 | pred_split_rfreq divided by slots_split. Use this value to compare different transcripts across different experiments. |
| pred_uniq_rcov | 1.9715 | pred_uniq_rfreq divided by slots_uniq. Use this value in combination with obs_uniq_rcov to assess the reliability of the prediction. |
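
Since each rfreq value is described above as the corresponding reads value divided by the number of read mappings, the example line allows a rough consistency check: dividing reads by rfreq should give about the same total for every base and resolution. The sketch below assumes exactly that relation and copies the reads and rfreq values from the example line; parse_numeric_attributes and implied_total_mappings are hypothetical helper names.

```python
import re

# reads and rfreq attributes copied from the example line above.
EXAMPLE_ATTRIBUTES = (
    'obs_all_reads "69"; obs_split_reads "31"; obs_uniq_reads "0"; '
    'obs_all_rfreq "1.3467e-5"; obs_split_rfreq "6.051e-6"; obs_uniq_rfreq "0"; '
    'pred_all_reads "73"; pred_split_reads "62"; pred_uniq_reads "3"; '
    'pred_all_rfreq "1.4248e-5"; pred_split_rfreq "1.2102e-5"; '
    'pred_uniq_rfreq "5.8554e-7";'
)

def parse_numeric_attributes(text):
    """Parse `key "value";` pairs and convert every value to float."""
    return {key: float(value) for key, value in re.findall(r'(\S+)\s+"([^"]*)"\s*;', text)}

def implied_total_mappings(attrs):
    """For each base_resolution prefix, estimate total mappings as reads / rfreq."""
    totals = {}
    for key, rfreq in attrs.items():
        if key.endswith("_rfreq") and rfreq > 0:
            prefix = key[: -len("_rfreq")]
            totals[prefix] = attrs[prefix + "_reads"] / rfreq
    return totals

attrs = parse_numeric_attributes(EXAMPLE_ATTRIBUTES)
for prefix, total in implied_total_mappings(attrs).items():
    print(f"{prefix}: about {total:,.0f} read mappings")
```

For the example values, all five estimates land close to 5.12 million read mappings; entries with an rfreq of 0 are skipped because they carry no information about the total.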





