Thursday, January 4, 2024

[SOLVED] To make averages from the data of a text file

January 04, 2024 awk, bash, python, shell

Issue

I have a text file like the one below, in which blocks of two-column data are separated by the line ‘Cluster analysis done for this configuration!’:

1   23
2   29
3   21
4   18
5   19
6   18
7   19
8   24
Cluster analysis done for this configuration!

1   23
2   22
3   19
4   18
5   23
6   17
7   19
8   31
9   21
10   27
11   19
Cluster analysis done for this configuration!

1   22
2   26
3   27
4   23
5   25
6   32
7   23
8   19
9   19
10   18
11   30
12   21
13   23
14   16
Cluster analysis done for this configuration!

1   23
2   19
3   23
4   27
5   20
6   17
7   15
8   22
9   16
10   23
11   20
12   23
Cluster analysis done for this configuration!

The desired output would be:

1 22.75
2 24.0
3 22.5
4 21.5
5 21.75
6 21.0
7 19.0
8 24.0
9 18.666666666666668
10 22.666666666666668
11 23.0
12 22.0
13 23.0
14 16.0

I would like to get an average for each of the numbers in the first column. In this example, the average value that corresponds to ‘1’ would be (23+23+22+23)/4 = 22.75, and so on for ‘2’, ‘3’, … Please note that the number of rows between the ‘Cluster analysis…’ strings differs from block to block, but that’s OK. For example, the average value for ‘14’ would just be 16, because ‘14’ appears only in the 3rd block.
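Written as a formula (the symbols are introduced here only for illustration): let $B_k$ be the set of blocks that contain key $k$, and let $v_{b,k}$ be the value paired with $k$ in block $b$. The average then divides by the number of blocks that actually contain the key, not by the total number of blocks:

$$\bar{v}_k = \frac{1}{|B_k|} \sum_{b \in B_k} v_{b,k}, \qquad \text{e.g.}\quad \bar{v}_9 = \frac{21 + 19 + 16}{3} = \frac{56}{3} \approx 18.67$$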

I was thinking along these lines: somehow print all the numbers between the ‘Cluster analysis…’ strings, maybe store them in an array, and then just take the average, but I couldn’t implement it in code. Could anyone give me a lead?

I don’t have any preference for the coding language; it just needs to solve the problem. I was thinking of bash/shell, but Python is also welcome.
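A minimal Python sketch of the block-based idea described above: split the file into blocks at the marker line, keep each block as a dictionary from the first-column number to the second-column number, then average each key over only the blocks that contain it. The function names read_blocks and average_per_key are hypothetical, and the sketch assumes whitespace-separated integer columns and a marker line that matches exactly once surrounding whitespace is stripped.

```python
import sys

# Marker text copied from the question; exact matching on the stripped line is an assumption.
MARKER = "Cluster analysis done for this configuration!"


def read_blocks(lines):
    """Split the input into blocks, storing each block as a {key: value} dict."""
    blocks, current = [], {}
    for raw in lines:
        line = raw.strip()
        if line == MARKER:
            blocks.append(current)  # the marker line closes the current block
            current = {}
        elif line:  # blank lines are skipped
            key, value = line.split()
            current[int(key)] = int(value)
    if current:  # keep a final block that has no trailing marker
        blocks.append(current)
    return blocks


def average_per_key(blocks):
    """Average each key over only the blocks in which it appears, in key order."""
    keys = sorted({key for block in blocks for key in block})
    averages = {}
    for key in keys:
        values = [block[key] for block in blocks if key in block]
        averages[key] = sum(values) / len(values)  # true division always yields a float
    return averages


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for key, avg in average_per_key(read_blocks(f)).items():
            print(key, avg)
```

Run it as: python average_blocks.py your_input_file (the script name is illustrative). Keeping every block in memory makes per-block inspection easy; the awk answer below avoids that by accumulating sums as it reads.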

Solution

awk '/^[0-9]+[ \t]+[0-9]+$/ {  # pick only lines with two numbers
       arr[$1] += $2             # accumulate the numbers in indexed bins
       n[$1]++                   # keep track of how many numbers are in each bin
     }
     END {                       # finally,
       for (e in arr)            # for each bin
         print e, arr[e]/n[e]    # print the key and its average
     }' your_input_file | sort -n  # "for (e in arr)" has no defined order, so sort by key
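For comparison, a minimal Python port of the same one-pass approach as the awk program: a sum and a count per key, then one division per key at the end. The regular expression is an assumption that loosely mirrors the awk pattern (it also tolerates tabs and surrounding whitespace), and the function name averages_by_key is hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumed pattern: two integers separated by spaces or tabs, optional surrounding whitespace.
PAIR = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def averages_by_key(lines):
    """One pass: accumulate a sum and a count per key, then divide at the end."""
    totals = defaultdict(int)  # plays the role of arr[$1] in the awk program
    counts = defaultdict(int)  # plays the role of n[$1]
    for line in lines:
        match = PAIR.match(line)
        if match:  # marker and blank lines do not match and are skipped
            key, value = int(match.group(1)), int(match.group(2))
            totals[key] += value
            counts[key] += 1
    # sorted() yields numeric key order, which awk's "for (e in arr)" does not guarantee
    return [(key, totals[key] / counts[key]) for key in sorted(totals)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for key, avg in averages_by_key(f):
            print(key, avg)
```

Run it as: python average_pairs.py your_input_file (the script name is illustrative). Python's default float formatting prints 56/3 as 18.666666666666668 and 24 as 24.0, matching the desired output, whereas awk's print formats non-integer values with OFMT (default "%.6g"), giving 18.6667, and prints integral results such as 24 without a decimal part.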

Answered By - Enlico

Answer Checked By - Timothy Miller (WPSolving Admin)

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
