Monday, April 4, 2022

[SOLVED] Counting occurrences of a pattern in a file, include zero for missing patterns

Issue

I am trying to count occurrences of each pattern listed in pattern.txt in a file (file.txt) using:

grep -o -w -f pattern.txt file.txt | sort | uniq -c > output.txt

This works great, but I would like the output to also include 0 for patterns that do not occur in the file.

How might I accomplish this?


Solution

You could add the patterns to the sort, then subtract 1 after the uniq:

grep -o -w -F -f pattern.txt file.txt |\
sort - pattern.txt |\
uniq -c |\
awk '{ $1 = sprintf("%7d", $1-1) } 1' > output.txt

Note that this only makes sense if the patterns are fixed strings, so I have added the -F option to grep.
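
To see why fixed strings matter, here is a small sketch using made-up sample data (the regex_*.txt file names and their contents are hypothetical, not from the question). With a regex pattern, grep -o prints the text that matched rather than the pattern line, so the merged list never pairs a match with its pattern:

```shell
# Hypothetical sample data: one regex pattern and a one-line input file.
printf '%s\n' 'fo*' > regex_pattern.txt
printf '%s\n' 'foo fo' > regex_file.txt

# Without -F, grep -o prints the text that matched ("foo" and "fo"),
# not the pattern line ("fo*").
grep -o -w -f regex_pattern.txt regex_file.txt > regex_matches.txt

# Merging with the pattern list gives three distinct lines, each seen once,
# so every count drops to 0 after subtracting 1.
sort regex_matches.txt regex_pattern.txt | uniq -c |
awk '{ $1 = sprintf("%7d", $1-1) } 1' > regex_output.txt
```

The pattern matched twice, yet nothing in regex_output.txt reports a non-zero count for it. That is the mismatch -F avoids.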

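Also, this particular awk script will compact whitespace in the patterns: assigning to $1 makes awk rebuild the line from its fields, so each run of spaces or tabs becomes a single space. You'll need more complicated code to avoid that.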
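
One possible way around the whitespace problem, as a sketch rather than the only option, is to skip sort and uniq and do the counting in awk, printing each pattern line untouched. The sample data below is hypothetical. Note that this variant lists patterns in pattern.txt order, not sorted, and prints a duplicated pattern only once:

```shell
# Hypothetical sample data; the second pattern contains two spaces.
printf '%s\n' 'foo' 'bar  baz' 'qux' > pattern.txt
printf '%s\n' 'foo bar  baz foo' 'nothing here' > file.txt

grep -o -w -F -f pattern.txt file.txt |
awk '
    # First input (pattern.txt): remember each pattern once, count starts at 0.
    FILENAME == ARGV[1] {
        if (!($0 in count)) { count[$0] = 0; order[++n] = $0 }
        next
    }
    # Second input (grep output on stdin): tally each whole line.
    { count[$0]++ }
    # Print the count and the pattern exactly as read; no field is reassigned.
    END { for (i = 1; i <= n; i++) printf "%7d %s\n", count[order[i]], order[i] }
' pattern.txt - > output.txt
```

Because the lines are never split and rebuilt, the two spaces in "bar  baz" survive in output.txt, and "qux" is reported with a count of 0.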



Answered By - jhnc
Answer Checked By - Senaida (WPSolving Volunteer)