Fragmentation and filtering

Next, an optional fragmentation step (nebulization, hydrolysis, adaptive focused acoustics, …) may be included in the simulation. The simulation distinguishes two mechanisms of RNA degradation: a mechanical/physical breaking process (PHYSICAL) and cleavage that is less dependent on physical properties (CHEMICAL). The choice of mechanism influences the distribution of fragment lengths after the simulated fragmentation. Depending on the method adopted, you should also provide a realistic estimate of the maximum molecule length (FRAG_LAMBDA) that is not broken by the applied protocol. For instance, cDNA molecules of ~500 nt are known to be difficult to break with the usual nebulization strategies. Naturally, this value marks the approximate upper limit of the resulting fragment length distribution.

The simulation of fragmentation is an iterative process: in each round, every fragment is assigned a breaking probability. For PHYSICAL fragmentation this probability is

(1)
\begin{align} P_b = 1 - \exp\left(-\frac{\mathrm{length}(\mathrm{cDNA})}{\mathrm{FRAG\_LAMBDA}}\right) \end{align}

and for CHEMICAL fragmentation

(2)
\begin{align} P_b = 1 - \left(\mathrm{length}(\mathrm{cDNA}) - \mathrm{FRAG\_LAMBDA}\right)^{-2} \end{align}
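To get a feeling for how the two breaking probabilities behave, equations (1) and (2) can be evaluated directly. The following is a minimal sketch, not the simulator's own code: the function names are hypothetical, the length threshold in equation (2) is assumed to be the same FRAG_LAMBDA parameter as in equation (1), and the probability for CHEMICAL fragmentation is assumed to be 0 wherever the formula would be undefined or negative (molecules at most 1 nt longer than FRAG_LAMBDA).

```python
import math


def p_break_physical(length, frag_lambda):
    """Equation (1): P_b = 1 - exp(-length / FRAG_LAMBDA)."""
    return 1.0 - math.exp(-length / frag_lambda)


def p_break_chemical(length, frag_lambda):
    """Equation (2): P_b = 1 - (length - FRAG_LAMBDA)^(-2).

    Assumption (not stated in the text): molecules no more than 1 nt
    longer than FRAG_LAMBDA are never broken, because the formula is
    undefined or negative in that range.
    """
    excess = length - frag_lambda
    if excess <= 1:
        return 0.0
    return 1.0 - excess ** -2


# Print both probabilities for a few molecule lengths, FRAG_LAMBDA = 500 nt.
for length in (250, 500, 1000, 2000):
    print(length,
          round(p_break_physical(length, 500), 3),
          round(p_break_chemical(length, 500), 3))
```

Under these assumptions the PHYSICAL probability rises smoothly with molecule length, whereas the CHEMICAL probability is zero up to FRAG_LAMBDA and approaches 1 very quickly beyond it.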

Whether a break occurs is decided by a Bernoulli trial. The location of the breakpoint is normally distributed around the middle of the molecule (PHYSICAL) or uniformly distributed along the molecule (CHEMICAL). You also specify whether the fragmentation step is carried out before or after the reverse transcription from RNA to cDNA. Finally, some protocols include a step after RT and fragmentation that filters the generated cDNA fragments by size (FILTERING). If so, provide the minimum (FILT_MIN) and maximum (FILT_MAX) length of the fragments you want to retain for sequencing.
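The iterative scheme for PHYSICAL fragmentation, followed by size filtering, might be sketched as below. This is an illustration only, not the FLUX SIMULATOR implementation: the function names, the fixed number of rounds, the standard deviation of the breakpoint distribution (one eighth of the molecule length), the clamping of the breakpoint to the molecule, and the inclusive treatment of FILT_MIN and FILT_MAX are all assumptions made for the example.

```python
import math
import random


def fragment_physical(lengths, frag_lambda, rounds=10, rng=None):
    """Break molecules (given by their lengths in nt) over several rounds."""
    if rng is None:
        rng = random.Random()
    for _ in range(rounds):
        next_lengths = []
        for length in lengths:
            # Breaking probability as in equation (1).
            p_break = 1.0 - math.exp(-length / frag_lambda)
            # Bernoulli trial: does this molecule break in this round?
            if length >= 2 and rng.random() < p_break:
                # Breakpoint normally distributed around the middle;
                # the standard deviation length / 8 is an assumption.
                cut = round(rng.gauss(length / 2, length / 8))
                cut = min(max(cut, 1), length - 1)  # keep both parts >= 1 nt
                next_lengths.extend([cut, length - cut])
            else:
                next_lengths.append(length)
        lengths = next_lengths
    return lengths


def size_filter(lengths, filt_min, filt_max):
    """Keep fragments with FILT_MIN <= length <= FILT_MAX (inclusive)."""
    return [n for n in lengths if filt_min <= n <= filt_max]


fragments = fragment_physical([3000, 1500], frag_lambda=500,
                              rounds=5, rng=random.Random(0))
retained = size_filter(fragments, filt_min=100, filt_max=300)
print(len(fragments), "fragments,", len(retained), "retained after filtering")
```

For CHEMICAL fragmentation, the same loop would use equation (2) for the probability and draw the breakpoint uniformly, for example with `rng.randint(1, length - 1)`.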

When started ("Run" in the toolbar), the FLUX SIMULATOR shows the number of initial molecules as well as the number of molecules after each of the steps you programmed. In the gel, you can follow the length distribution of your molecules during the multi-step process. Additionally, the distribution of subsequently generated reads along the original transcript molecules is summarized in a 3x3 grid of nine plots, one for each combination of transcript length class (short: <1500 nt; medium: 1500 nt up to <3000 nt; long: >=3000 nt) and expression level (low: up to ~15 molecules per cell; medium: >15 and up to ~500 molecules per cell; high: above that).
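The assignment of a transcript to one of the nine plots can be written down as a small lookup. This sketch only restates the thresholds given above; the function names are hypothetical, and the approximate limits (~15 and ~500 molecules per cell) are treated as exact cut-offs for the sake of the example.

```python
def length_class(length_nt):
    """Length class of a transcript, using the thresholds from the text."""
    if length_nt < 1500:
        return "short"
    if length_nt < 3000:
        return "medium"
    return "long"


def expression_class(molecules_per_cell):
    """Expression class; the ~15 and ~500 limits are taken as exact here."""
    if molecules_per_cell <= 15:
        return "low"
    if molecules_per_cell <= 500:
        return "medium"
    return "high"


def plot_cell(length_nt, molecules_per_cell):
    """Return the (length class, expression class) pair for a transcript."""
    return (length_class(length_nt), expression_class(molecules_per_cell))


print(plot_cell(2200, 40))
```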

Please note that the final number of molecules you obtain sets an upper limit on your sequencing capacity, because oversampling a small number of molecules does not increase the diversity of the produced reads. For example, if you produce 1000 reads from 10 molecules left after RT/fragmentation, you will find groups of about 100 reads that map to identical locations. Upon termination, the step copies the .frg file from the temporary directory to the project directory and updates columns 7 and 8 of the .PRO file.
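The oversampling effect described above can be reproduced with a few lines of code. This is a simplified sketch under stated assumptions: each read is drawn uniformly at random from the remaining molecules, and all reads from the same molecule are counted as mapping to the same location; the function name is hypothetical.

```python
import random
from collections import Counter


def sample_reads(n_molecules, n_reads, rng=None):
    """Draw n_reads reads, each from a uniformly chosen molecule.

    Returns a Counter mapping molecule index -> number of reads drawn.
    """
    if rng is None:
        rng = random.Random()
    return Counter(rng.randrange(n_molecules) for _ in range(n_reads))


counts = sample_reads(n_molecules=10, n_reads=1000, rng=random.Random(42))
# The number of distinct sources can never exceed the number of molecules,
# no matter how many reads are drawn.
print("distinct molecules sampled:", len(counts))
print("mean reads per sampled molecule:", sum(counts.values()) / len(counts))
```

Increasing `n_reads` in this sketch only makes the groups of identical reads larger; it cannot raise the number of distinct molecules above `n_molecules`.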