The file I am working on was produced for ENCODE, I think in a collaboration between Sarah & Micha. According to Micha (if I understood correctly), it would be better to use the values obtained after flow network decomposition ("pred").
I had doubts about using "uniq" vs "split", so the first thing I did was check which keywords appear in the file. Among the keywords starting with "pred", only "pred_split_freq" and "pred_uniq_freq" are recorded for all transcripts (see "a" below). Moreover, where they are recorded, they are mostly set to "NA" (see "b" below). Were these values not calculated because there was not sufficient information (which would mean nothing can be done about it), or because they were not judged important at the time (which would mean they could be recovered)? And which of the "obsv_*" keywords would be the best workaround here?
Thanks
Hagen
#a:
[htilgner@cel transcribed_genes]$ awk '$3=="transcript"' ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux | wc -l
103619
[htilgner@cel transcribed_genes]$ awk '$3=="transcript"' ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux | awk 'BEGIN{}{for(i=1;i<=NF;i++){if(substr($i,1,4)=="pred" || substr($i,1,3)=="obs"){print $i;}}}' | sort | uniq -c
103619 obsv_split_freq
103619 obsv_split_rfreq
103619 obsv_split_rpkm
103619 obsv_uniq_freq
103619 obsv_uniq_rfreq
103619 obsv_uniq_rpkm
103619 pred_split_freq
210 pred_split_rfreq
210 pred_split_rpkm
103619 pred_uniq_freq
210 pred_uniq_rfreq
210 pred_uniq_rpkm
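The counts above only show whether a keyword is present, not whether its value is usable. A minimal sketch that splits each keyword count into "NA" and non-"NA" values in a single pass could look like this. The function name count_na_per_key is hypothetical, and the sketch assumes whitespace-separated attributes of the form key "value"; as in the commands above.

```shell
# Hypothetical helper: for every pred_*/obsv_* keyword on transcript lines,
# count how often its value is "NA" and how often it is anything else.
# Assumes each keyword is followed by its value as the next field.
count_na_per_key() {
  awk '$3 == "transcript" {
    for (i = 1; i < NF; i++) {
      if ($i ~ /^(pred|obsv)_/) {
        val = $(i + 1)
        gsub(/[";]/, "", val)          # strip quotes and trailing semicolon
        if (val == "NA") na[$i]++; else ok[$i]++
        keys[$i] = 1
      }
    }
  }
  END {
    for (k in keys) printf "%s\tNA=%d\tvalue=%d\n", k, na[k], ok[k]
  }' "$@" | sort
}

# Usage (path as in the commands above):
# count_na_per_key ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux
```

This gives one line per keyword, so the "pred" and "obsv" keywords can be compared directly without running a separate command for each.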
#b:
[htilgner@cel transcribed_genes]$ awk '$3=="transcript"' ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux | awk '{split($10,a,"\""); trID=a[2]; for(i=1;i<=NF;i++){if($i=="pred_uniq_freq"){k=i+1; print $k;}}}' | sort | uniq -c
183 "0.0";
1 "0.7241379";
1 "0.8181818";
25 "1.0";
103409 "NA";
[htilgner@cel transcribed_genes]$ awk '$3=="transcript"' ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux | awk '{split($10,a,"\""); trID=a[2]; for(i=1;i<=NF;i++){if($i=="pred_split_freq"){k=i+1; print $k;}}}' | sort | uniq -c
182 "0.0";
28 "1.0";
103409 "NA";
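In both tabulations 210 transcripts have a value other than "NA" (103619 - 103409), which matches the 210 occurrences of the pred_*_rfreq and pred_*_rpkm keywords in "a". To judge which "obsv" value tracks "pred" best, one option is to list both side by side for just those transcripts. The following is only a sketch: the function name pred_vs_obsv is hypothetical, and it assumes the attributes include a transcript_id keyword followed by the quoted ID.

```shell
# Hypothetical helper: for transcripts whose pred_uniq_freq is not "NA",
# print transcript ID, pred_uniq_freq and obsv_uniq_freq, tab-separated.
# Assumes attributes look like: transcript_id "X"; pred_uniq_freq "0.5"; ...
pred_vs_obsv() {
  awk '$3 == "transcript" {
    id = pred = obsv = ""
    for (i = 1; i < NF; i++) {
      val = $(i + 1)
      gsub(/[";]/, "", val)            # strip quotes and trailing semicolon
      if ($i == "transcript_id")  id = val
      if ($i == "pred_uniq_freq") pred = val
      if ($i == "obsv_uniq_freq") obsv = val
    }
    if (pred != "" && pred != "NA") print id "\t" pred "\t" obsv
  }' "$@"
}

# Usage (path as in the commands above):
# pred_vs_obsv ~sdjebali/U54/RPKM/Helicos/RNAseq_PolyA+_K562_Cy_1_1_hg18.bed.flux
```

Swapping the two keyword names for their "split" counterparts would give the same comparison for "pred_split_freq".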





