Tuesday, October 4, 2022

[SOLVED] storage in awk when used within shell script

October 04, 2022 awk, grep, shell

Issue

I am writing a shell script that calls an awk script internally. Here is my script:

for FILE in `eval echo{0..$fileIterator}`
{

if(FILE == $fileIterator)
{
    printindicator =1;
}
    grep RECORD FILEARRAY[FILE]| awk 'for(i=1;i<=NF;i++) {if($i ~ XXXX) {XARRAY[$i]++}} END {if(printIndicator==1){for(element in XARRAY){print element >> FILE B}}'

I hope my code is clear. Please let me know if you need any other details.

ISSUE

My goal is to traverse all the files, find the lines that contain "XXXX", and store them in an array. That is what I am doing here. Finally, I need to write the contents of the array to a file. I could write the contents at each step, like this:

{if($i ~ XXXX) {XARRAY[$i]++; print XARRAY[$i] >> FILE B}}

But I want to avoid this approach because it performs an I/O operation every time, which is slow. That is why I keep the data in memory and dump the in-memory array (XARRAY) into the file only at the end.

The problem is that the shell script calls awk on every iteration. The data is stored in the array (XARRAY), but on the next iteration the previous content of XARRAY is gone and awk starts with a new, empty array. Hence, when I finally print the contents, I get only the most recently updated XARRAY and not all the data I expect.

SUGGESTIONS EXPECTED

1) How can I make the awk script treat XARRAY as the existing array rather than a new one each time it is called in an iteration?

2) One alternative is to do I/O every time, but I would like to avoid that. Is there any other alternative? Thank you.

Solution

Ouch, can't tell if it is meant to be real or pseudocode!

You can't make awk preserve state between invocations: each call starts a new process with empty arrays. You would either have to save the state to a temporary file or store it in a shell variable whose contents you pass to later invocations. But this is all too much hassle for what I understand you want to achieve.
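To show why carrying state between invocations is a hassle, here is a minimal sketch of the temporary-file variant. The state file name, its "key count" line format, and the sample input files are all assumptions made up for this illustration; they are not part of the question. Each awk run reloads the previous counts in BEGIN and rewrites them in END, so it still does file I/O on every iteration.

```shell
# Hypothetical scratch directory, state file and sample inputs.
workdir=$(mktemp -d)
state="$workdir/state.txt"      # assumed format: one "key count" pair per line
: > "$state"                    # start with an empty state file
printf 'RECORD aXXXX1\n' > "$workdir/in0.txt"
printf 'RECORD aXXXX1 bXXXX2\n' > "$workdir/in1.txt"

for f in "$workdir/in0.txt" "$workdir/in1.txt"; do
  awk -v state="$state" '
    BEGIN {
      # Reload the counts saved by the previous invocation.
      while ((getline line < state) > 0) {
        split(line, kv, " ")
        XARRAY[kv[1]] = kv[2]
      }
      close(state)
    }
    /RECORD/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /XXXX/) XARRAY[$i]++
    }
    END {
      # Truncate the state file, then save the updated counts.
      printf "" > state
      for (k in XARRAY) print k, XARRAY[k] > state
    }
  ' "$f"
done

final=$(sort "$state")
echo "$final"
```

The counts do survive across the two awk calls, but only by reading and writing a file each time, which is exactly the per-iteration I/O the question wants to avoid.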

I suggest you omit the loop, which will allow you to call awk only once with just some reordering. I assume FILE A is the FILE in the loop and FILE B is something external. The reordering would end up something very roughly like:

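grep -h RECORD "${FILEARRAY[@]:0:fileIterator+1}" | awk '{for(i=1;i<=NF;i++) {if($i ~ /XXXX/) {XARRAY[$i]++}}} END {for(element in XARRAY){print element >> "FILEB"}}'

I moved the filename expansion to the grep call and removed the whole printIndicator check. The -h option stops grep from prefixing each line with the file name when it is given several files, and the slice length is fileIterator+1 so that it covers indices 0 through fileIterator, like the original loop.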

It could all be done even more efficiently (the obvious one being removal of grep), but you provided too little detail to make early optimisation sensible.
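As a rough sketch of the "removal of grep" idea, awk can do the RECORD filtering itself and read all files in one process. The sample files, their contents, and the outfile variable below are hypothetical and only there to make the example self-contained; FILEARRAY and fileIterator follow the names used in the question.

```shell
# Hypothetical sample data; file names and contents are made up.
workdir=$(mktemp -d)
printf 'RECORD aXXXX1 foo\nOTHER bXXXX2\n' > "$workdir/in0.txt"
printf 'RECORD cXXXX3 aXXXX1\n' > "$workdir/in1.txt"
FILEARRAY=("$workdir/in0.txt" "$workdir/in1.txt")
fileIterator=1                  # index of the last file to process
outfile="$workdir/FILEB"        # assumed output location

# A single awk process reads every file. The /RECORD/ pattern replaces grep,
# and XARRAY accumulates across all inputs because awk is never restarted.
awk -v out="$outfile" '
  /RECORD/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /XXXX/) XARRAY[$i]++
  }
  END {
    for (element in XARRAY) print element >> out
  }
' "${FILEARRAY[@]:0:fileIterator+1}"

result=$(sort "$outfile")
echo "$result"
```

The order of a for-in loop over an awk array is unspecified, which is why the sketch sorts the file before using it.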

EDIT: fixed the loop iteration with the info from the update. Here's a loopy solution, which is immune to whitespace issues and overly long command lines:

for FILE in $(seq 0 "$fileIterator"); do
  grep RECORD "${FILEARRAY[$FILE]}"
done |
awk '{for(i=1;i<=NF;i++) {if($i ~ /XXXX/) {XARRAY[$i]++}}} END {for(element in XARRAY){print element >> "FILEB"}}'

It still runs awk only once, constantly feeding it data from the loop.

If you want to load the results into an array UGUGU, do the following as well (requires bash 4):

mapfile -t UGUGU < FILEB

Answered By - lynxlynxlynx

Answer Checked By - Senaida (WPSolving Volunteer)

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
