Issue
I have a text file in which blocks of two-column numeric data are separated by a marker string, as follows:
1 23
2 29
3 21
4 18
5 19
6 18
7 19
8 24
Cluster analysis done for this configuration!
1 23
2 22
3 19
4 18
5 23
6 17
7 19
8 31
9 21
10 27
11 19
Cluster analysis done for this configuration!
1 22
2 26
3 27
4 23
5 25
6 32
7 23
8 19
9 19
10 18
11 30
12 21
13 23
14 16
Cluster analysis done for this configuration!
1 23
2 19
3 23
4 27
5 20
6 17
7 15
8 22
9 16
10 23
11 20
12 23
Cluster analysis done for this configuration!
The desired output would be:
1 22.75
2 24.0
3 22.5
4 21.5
5 21.75
6 21.0
7 19.0
8 24.0
9 18.666666666666668
10 22.666666666666668
11 23.0
12 22.0
13 23.0
14 16.0
I would like to get the average of the second-column values for each number in the first column. In this example, the average for ‘1’ would be (23+23+22+23)/4 = 22.75, and so on for ‘2’, ‘3’… Note that the number of rows between the ‘Cluster analysis….’ strings varies from block to block, but that’s fine. For example, the average for ‘14’ would just be 16, since ‘14’ appears only in the third block.
My idea was to collect all the numbers between the ‘Cluster analysis….’ strings, store them in an array, and then compute the averages, but I couldn't implement it in code. Could anyone give me a lead?
I don’t have any preference for the language, as long as it solves the problem. I was thinking of bash/shell, but Python is also welcome.
Solution
awk '/^[0-9]+ +[0-9]+$/ {      # pick only lines with two numbers
    arr[$1] += $2              # accumulate the second column in bins indexed by the first
    n[$1]++                    # keep track of how many numbers are in each bin
}
END {                          # finally,
    for (e in arr)             # for each bin (for-in order is unspecified in awk)
        print e, arr[e]/n[e]   # print the key and its average
}' your_input_file | sort -n   # sort the result numerically by key
Answered By - Enlico
Answer Checked By - Timothy Miller (WPSolving Admin)
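Since the question also welcomes Python, the same binning idea can be sketched there. This is not part of the original answer, just a minimal sketch under a few assumptions: data lines hold exactly two non-negative integers separated by whitespace, every other line (such as the ‘Cluster analysis….’ marker) is ignored, and the names `ROW`, `average_by_key` and `main` are illustrative.

```python
import re
from collections import defaultdict

# Assumed data-line shape: two non-negative integers, e.g. "3 21".
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def average_by_key(lines):
    """Return {key: mean of second-column values}, with keys in ascending order."""
    total = defaultdict(int)   # running sum per first-column key
    count = defaultdict(int)   # number of values seen per key
    for line in lines:
        m = ROW.match(line)
        if m is None:
            continue           # skip marker lines and anything that is not "int int"
        key, value = int(m.group(1)), int(m.group(2))
        total[key] += value
        count[key] += 1
    return {key: total[key] / count[key] for key in sorted(total)}


def main(path):
    """Print 'key mean' for each key found in the file at `path`."""
    with open(path) as fh:
        for key, mean in average_by_key(fh).items():
            print(key, mean)
```

Calling `main("your_input_file")` prints one `key mean` pair per line. Because the keys are sorted before the result is built, no separate sorting step is needed, and a key that appears in only one block (like ‘14’) is simply averaged over that single value.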