Monday, November 1, 2021

[SOLVED] Generating a new file after processing data in Shell script

Issue

The input file which is shown below is generated by performing results of 2 other files i.e awk 'BEGIN{FS=OFS=","} FNR==NR{arr[$0];next} {print $1,$2,$3,$5,($4 in arr)?1:0}' $NGW_REG_RESP_FILE $NGW_REG_REQ_FILE >> $NGW_REG_FILE

$NGW_REG_FILE file contains below data based on that i have to create a new file

2020-12-21,18,1,1,1
2020-12-21,18,1,1,0
2020-12-21,18,1,2,1
2020-12-21,18,1,2,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,0
2020-12-21,18,3,2,1
2020-12-21,18,3,2,1
2020-12-21,18,4,2,0
2020-12-21,18,4,2,1
2020-12-21,18,3,2,0

What this data indicates is:

Date,Hour,Quarter,ReqType,Success/failed
Reqtype there were 2 possibilities: 1-> incoming 2-> outgoing
last field: 1->success 0-> failed
Quarter -> 1,2,3,4

I want to read this file and generate a new file that contains data like below (MY OUTPUT FILE):

2020-12-21,18,1,1,1,1 
2020-12-21,18,1,2,2,0
2020-12-21,18,2,1,2,1
.....

Explanation:
heading: date,hour,quarter,reqType,Success_count,Failure_count  (for reference to understand o/p file)
      Date      H  Q ReqID SuccessCnt Fail Count
     2020-12-21,18,1,1     ,1         ,1  

Explanation: in input file for quarter 1 both reqTypes(1&2) were present
               there will be at max 2 entry in each quarter.

in quarter 1 for reqid 1 there were 2 requests, 1 got success and other got failed
so 1 as success cnt and 1 as failure cnt


    2020-12-21,18,1,2,2,0 
here quarter 1 ,for req ID 2 there were 2 requests both got success so success count be 2 and failure count be 0

**UPDATE The answer which is given in the comment is worked exactly what I was looking for. I have some updates in the sample input file, i.e one more columns gets added before the last column i.e the STATUS CODE which you can see in the below input i,e 200,400,300

2020-12-21,18,1,1,200,1
2020-12-21,18,2,1,400,0
2020-12-21,18,2,1,300,0

The existing code gives the below result in the output file: i.e Total count of success/failed in that quarter. Which is Correct.

What I want to do is add one more column to the output file, next to the total failed count i.e the array holding those status codes.

2020-12-21,18,1,1,1,0,[] //empty array in end bcs there is no failed req,1,success req
2020-12-21,18,2,1,0,2,[400,300] // here 2 failed req,0 success request
<DATE>,<HOUR>,<QUARTER>,<REQ_TYPE>,<SUCCESS_COUNT>,<FAIL_CNT>,<ARRAY_HOLDING_STATUSCODE>

I have added below changes to the code , Bu not getting how to iterate in side the same for loop

`cat $input_file | grep -v Orig | awk -F, '{ 
           if ($NF==1) { 
              map[$1][$2][$3][$4]["success"]++
           }
           else { 
              map[$1][$2][$3][$4]["fail"]++
              harish[$1][$2][$3][$4][$5]++ //ADDED THIS
           } 
          } 
      END { 
            PROCINFO["sorted_in"]="@ind_num_asc";
            for (i in map) { 
             for (j in map[i]) { 
              for (k in map[i][j]) { 
                for (l in map[i][j][k]) { 
                   print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"])
                } 
               } 
              } 
             } 
            }' >> OUTPUT_FILE.txt`

Solution

With awk (GNU awk for array sorting):

awk -F, '{ if ($NF==1) { map[$1][$2][$3][$4]["success"]++ } else { map[$1][$2][$3][$4]["fail"]++ } } END { PROCINFO["sorted_in"]="@ind_num_asc";for (i in map) { for (j in map[i]) { for (k in map[i][j]) { for (l in map[i][j][k]) { print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"]) } } } } }' $NGW_REG_FILE

Explanation:

awk -F, '{ 
           if ($NF==1) { 
              map[$1][$2][$3][$4]["success"]++ # If last field is 1, increment a success index in array map with other fields as further indexes
           }
           else { 
              map[$1][$2][$3][$4]["fail"]++ # Otherwise increment a fail index
           } 
          } 
      END { 
            PROCINFO["sorted_in"]="@ind_num_asc"; # Set the array ordering
            for (i in map) { 
             for (j in map[i]) { 
              for (k in map[i][j]) { 
                for (l in map[i][j][k]) { 
                   print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"]) # Loop through the array and print the data in the format required. If there is no entry in the success or fail index, print 0.
                } 
               } 
              } 
             } 
            }' $NGW_REG_FILE


Answered By - Raman Sailopal