Issue
I am working on the following dataset and I would like to create an awk script to make an arithmetic operation between with other columns and add the result of each record in a new column that could be called "Survival Percentage" with only 2 decimals.
The operation would be the following:
((Column 1 - Column 2)/Column 3)*100
Below you can see a sample of the dataset:
40462186,177827,7671,4395,190,4.313743132
2872296,273870,3492,95349,1216,1.275057509
45236699,265691,6874,5873,152,2.587215976
77481,40024,153,516565,1975,0.382270638
The code I have tried to implement is as follows but it doesn't even run and it is a shell script and not an awk script as I wish.
awk 'BEGIN { FS=OFS="," } NR == 1 { $11="new" } NR > 1 { $11=(($1-$2)/$3)*100 }1' dataset.csv
From comments: After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? Any idea?
Solution
Considering your samples and shown attempts here is the awk
code which you can try. Which removes Control M characters(which we found out in comments) and this also checks that your 6th, 7th and 4th columns should be integers/floats etc and nothing else, try it out once.
awk '
BEGIN { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new" }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
$11=(($6-$7)/$4)*100
}
1' dataset.csv
OR to get values till 2 decimal points try something like(make use of sprintf
function to save values to $11
itself:
awk '
BEGIN { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new" }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
$11=sprintf("%0.2f",(($6-$7)/$4)*100)
}
1' dataset.csv
Answered By - RavinderSingh13 Answer Checked By - Marilyn (WPSolving Volunteer)