Issue
I have a three-column file like the one below. I want to divide column 3 by column 2 (skipping the header line) and print the result in column 4. I also want to calculate the log2 value of column 4 and print it in column 5, as shown below.
head my_file.txt
this is header
chrX:73829232:-::chrX:73831065:- 76.5382 76.34220209
chrX:73827985:-::chrX:73829067:- 60.0702 62.1887549
chr11:18266979:+::chr11:18269194:+ 15.4004 1558.282058
I am trying with awk, but it gives incomplete output and repeated lines:
awk -v OFS='\t' 'FNR > 1 {$4 = $3 / $2}1' my_file.txt |awk -F"\t" 'FNR > 1{a = log($4)/log(2); print $0"\t" a} OFS="\t"'
awk: cmd. line:1: (FILENAME=my_file.txt FNR=15) fatal: division by zero attempted
this is header
chrX:73829232:-::chrX:73831065:- 76.5382 76.3422020852288 0.997439 -0.00369948
chrX:73829232:-::chrX:73831065:- 76.5382 76.3422020852288 0.997439
chrX:73827985:-::chrX:73829067:- 60.0702 62.1887548960591 1.03527 0.0500071
chrX:73827985:-::chrX:73829067:- 60.0702 62.1887548960591 1.03527
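The two symptoms in this attempt have separate causes. The first awk stops with a fatal error at line 15 (the message format is gawk's), where column 2 evaluates to 0, so rows from that line on never reach the output. The repeated lines come from the second awk: OFS="\t" is written outside the braces, so awk parses it as a pattern rather than as part of an action. The assignment's value is a tab, which is a non-empty string and therefore true, so the default action prints every line a second time, unchanged. A minimal reproduction, using made-up input lines:

```shell
# OFS="\t" sits outside the braces, so awk treats the assignment as a pattern.
# Its value (a tab) is a non-empty string, i.e. true, so the default action
# (print $0) runs for every input line, in addition to the first rule.
printf '%s\n' header row1 row2 |
    awk 'FNR > 1 { print $0 " computed" } OFS="\t"'
# Prints:
# header
# row1 computed
# row1
# row2 computed
# row2
```

Setting the separator with a -v option, as the first awk already does with -v OFS='\t', stops awk from treating it as a pattern.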
This is my desired output.
this is my desired header
chrX:73829232:-::chrX:73831065:- 76.5382 76.34220209 0.9974392145 -0.003699170995
chrX:73827985:-::chrX:73829067:- 60.0702 62.1887549 1.035267985 0.05000426549
chr11:18266979:+::chr11:18269194:+ 15.4004 1558.282058 101.1845185 6.66084476
Solution
You can try the following command:
awk -v OFS='\t' 'FNR==1 {print $0, "col4", "col5"; next} {if ($2 == 0) {$4 = "NaN"; $5 = "NaN"} else {$4 = $3 / $2; $5 = log($4) / log(2)}} 1' my_file.txt
The command above:
- Checks the first row (the header) and appends "col4" and "col5" as column names.
- For the remaining rows, calculates column 3 / column 2 and its log2 if column 2 is not zero; otherwise, it sets columns 4 and 5 to "NaN". This avoids the division-by-zero error and clearly marks the result as not a valid number. Because awk has no built-in log2 function, log2 is computed with the change-of-base identity log2(x) = log(x) / log(2), where awk's log() is the natural logarithm. Assigning to $4 and $5 makes awk rebuild each data line with OFS, so those lines are tab-separated.
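The command above converts columns 4 and 5 to text with awk's default number format, %.6g (six significant digits), so it prints 0.997439 rather than the 0.9974392145 shown in the desired output. Below is a minimal sketch that matches the ten significant digits of the desired output, assuming a POSIX awk such as gawk. It sets CONVFMT, which POSIX awk uses to convert non-integer numbers to strings (as happens when a number is stored in a field and the line is rebuilt), with OFMT set to the same value as a precaution. It also keeps the full-precision ratio in a variable before taking its log and returns "NaN" when the ratio is not positive, since log() is undefined there. The function name add_ratio_log2 is hypothetical, and the sample rows are copied from the question.

```shell
# Hypothetical helper wrapping the one-pass awk approach from the answer.
# Reads the files given as arguments (or stdin if none) and appends
# column 4 = $3 / $2 and column 5 = log2(column 4).
add_ratio_log2() {
    awk -v OFS='\t' -v CONVFMT='%.10g' -v OFMT='%.10g' '
        FNR == 1 { print $0, "col4", "col5"; next }  # header: add column names
        {
            if ($2 == 0) {                 # avoid the division-by-zero error
                $4 = "NaN"; $5 = "NaN"
            } else {
                ratio = $3 / $2            # keep the full-precision ratio
                $4 = ratio                 # stored as text using CONVFMT
                # log() is only defined for positive arguments
                $5 = (ratio > 0) ? log(ratio) / log(2) : "NaN"
            }
        }
        1                                  # print the rebuilt line
    ' "$@"
}

# Usage with the sample rows from the question:
printf '%s\n' \
    'this is header' \
    'chrX:73829232:-::chrX:73831065:- 76.5382 76.34220209' \
    'chrX:73827985:-::chrX:73829067:- 60.0702 62.1887549' \
    'chr11:18266979:+::chr11:18269194:+ 15.4004 1558.282058' |
    add_ratio_log2
```

For the real file, call add_ratio_log2 my_file.txt instead of piping in the sample rows. The header keeps its original spaces; only the appended col4 and col5 are tab-separated from it.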
Answered By - mandy8055
Answer Checked By - Cary Denson (WPSolving Admin)