Issue
I am dealing with the analysis of multi-column data organized in the following manner:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
[email protected] [email protected] [email protected] 13731 0.6865 2.8609 160.4598
[email protected] [email protected] [email protected] 8320 0.4160 2.8412 150.3665
[email protected] [email protected] [email protected] 1575 0.0788 2.9141 157.3493
[email protected] [email protected] [email protected] 218 0.0109 2.8567 156.0376
[email protected] [email protected] [email protected] 72 0.0036 2.8427 157.3778
[email protected] [email protected] [email protected] 43 0.0022 2.9093 165.3063
[email protected] [email protected] [email protected] 32 0.0016 2.8710 159.8673
[email protected] [email protected] [email protected] 31 0.0015 2.8904 153.0763
[email protected] [email protected] [email protected] 20 0.0010 2.8147 144.6951
[email protected] [email protected] [email protected] 16 0.0008 2.8590 165.3937
[email protected] [email protected] [email protected] 15 0.0008 2.8729 149.1930
[email protected] [email protected] [email protected] 15 0.0008 2.9192 146.2273
[email protected] [email protected] [email protected] 10 0.0005 2.9259 148.8008
[email protected] [email protected] [email protected] 8 0.0004 2.9491 149.1861
[email protected] [email protected] [email protected] 4 0.0002 2.8839 150.1238
[email protected] [email protected] [email protected] 3 0.0001 2.9567 153.7993
[email protected] [email protected] [email protected] 2 0.0001 2.8564 147.7916
[email protected] [email protected] [email protected] 2 0.0001 2.8867 151.6423
[email protected] [email protected] [email protected] 2 0.0001 2.8888 148.3678
[email protected] [email protected] [email protected] 2 0.0001 2.9658 149.2518
[email protected] [email protected] [email protected] 1 0.0001 2.8675 139.9754
[email protected] [email protected] [email protected] 1 0.0001 2.8987 168.1758
[email protected] [email protected] [email protected] 1 0.0001 2.9411 147.0443
From this I need to take the third column (Donor) and the fifth column (Frac) and plot a 2D histogram of the data, considering only rows whose fifth-column value is greater than 0.01. So in the example shown, only the following data should be considered:
#Donor #Frac
[email protected] 0.6865
[email protected] 0.4160
[email protected] 0.0788
[email protected] 0.0109
The 2D histogram should plot Donor on X and Frac on Y (in %).
Previously, I had to add the following lines to the reduced two-column data file so that gracebat would recognize it as a 2D bar plot:
@ title "No title"
@ xaxis label "Donor"
@ yaxis label "Frac"
@s0 line type 0
@TYPE bar
# here is the data in 2 column format
Is it possible to automate this post-processing so that the bar plot is produced on the fly? Alternatively, I would be grateful for a sed solution that edits the data file on the fly, reducing it to two columns and inserting at the beginning the @ lines required for bar-graph plotting, using something like:
sed -i 's/old-text/new-text/g' datafile
Solution
sed isn't meant for this kind of task; you should use awk:
awk '
BEGIN {
print "@ title \"No title\""
print "@ xaxis label \"Donor\""
print "@ yaxis label \"Frac\""
print "@s0 line type 0"
print "@TYPE bar"
}
NR > 1 && $5 > 0.01 { print $3, $5 }
' file.txt
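The question asks for Frac on the Y axis in percent, while the command above prints the fraction unchanged. A possible variation, not part of the original answer, is sketched below. The function name make_bar_input, the optional threshold argument, and the "Frac (%)" axis label are assumptions made for illustration. It also skips any line starting with # instead of relying on NR > 1, which is a design choice for files that carry more than one comment line.

```shell
#!/bin/sh
# Sketch only: make_bar_input is a hypothetical helper name.
# Usage: make_bar_input FILE [MIN_FRAC]   (MIN_FRAC defaults to 0.01)
make_bar_input() {
    awk -v min="${2:-0.01}" '
    BEGIN {
        print "@ title \"No title\""
        print "@ xaxis label \"Donor\""
        print "@ yaxis label \"Frac (%)\""
        print "@s0 line type 0"
        print "@TYPE bar"
    }
    # Skip comment lines (such as the #Acceptor header) and blank lines.
    /^#/ || NF == 0 { next }
    # Keep rows whose fifth column exceeds the threshold; print Frac as a percentage.
    $5 + 0 > min + 0 { printf "%s %.2f\n", $3, $5 * 100 }
    ' "$1"
}
```

Because the function writes to standard output, the input file is never modified: redirect the result to a new file (for example, make_bar_input file.txt > donors.dat) and hand that file to gracebat as before.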
Answered By - Fravadona Answer Checked By - Marie Seifert (WPSolving Admin)