Monday, January 31, 2022

[SOLVED] How can I make grep do a "word match", but without periods being treated as a word separator?

Issue

I have a file that looks like this:

5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
5.3.236.0.1.20549687.20.93.9.2.234266672113.4455  2
5.3.236.113681.5829104.986.3705653211.119    8
5.3.236.2.01107.50.01.24.48685.30000018053113560818700000112 172

A basic grep returns the following; it includes an additional match that I do not want.

$ grep 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45

I tried grepping for a "fixed string"; it still shows the additional match that I do not want.

$ grep -F 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45

I tried grepping for a whole-word match; it still shows the additional match that I do not want.

$ grep -w 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45

This works, but technically it greps for the string I want plus a trailing space, which feels more like a workaround than a way of targeting exactly what I want.

$ grep "5.3.236.113681.2225191122.986.3705653211.104[[:space:]]" test.txt
5.3.236.113681.2225191122.986.3705653211.104    4

The problem with the one that worked is that the desired string may not have a space at the end; it may have the space at the front instead, like this:

4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402

The command that worked previously won't work on a list formatted a little differently.

I could simply write grep "[[:space:]]5.3.236.113681.2225191122.986.3705653211.104" but I don't want to have to rewrite the grep for each little difference like that.

I would like to be able to grep for that string and show the whole line, regardless of how or where that line shows up in the text.


Solution

Assuming this is your input file:

cat file

5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
5.3.236.0.1.20549687.20.93.9.2.234266672113.4455  2
5.3.236.113681.5829104.986.3705653211.119    8
5.3.236.2.01107.50.01.24.48685.30000018053113560818700000112 172
4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402

If you have GNU grep, you can use this PCRE regex with lookarounds:

grep -P '(?<!\S)5\.3\.236\.113681\.2225191122\.986\.3705653211\.104(?!\S)' file

5.3.236.113681.2225191122.986.3705653211.104    4
4   5.3.236.113681.2225191122.986.3705653211.104

Here:

  • (?<!\S): a negative lookbehind asserting that the character immediately before the current position is not a non-whitespace character (so it is either whitespace or the start of the line)
  • (?!\S): a negative lookahead asserting that the character immediately after the current position is not a non-whitespace character (so it is either whitespace or the end of the line)
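
If your grep lacks -P (BSD grep, for example), roughly the same whitespace-delimited match can be written as a POSIX extended regex. This is only a sketch and not part of the original answer: sample.txt is a hypothetical file name, and the pattern assumes the token is separated from its neighbours by whitespace or by the line boundaries.

```shell
# Build a small sample file (hypothetical name) mirroring the input above.
cat > sample.txt <<'EOF'
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402
EOF

# The token must be preceded by start-of-line or whitespace, and followed by
# whitespace or end-of-line. Dots are escaped so they only match a literal dot.
pattern='(^|[[:space:]])5\.3\.236\.113681\.2225191122\.986\.3705653211\.104([[:space:]]|$)'

result=$(grep -E "$pattern" sample.txt)
printf '%s\n' "$result"
```

On this sample, only the first and third lines should be printed. Unlike the lookaround version, the surrounding whitespace is part of the match here, which matters only if you add -o to print the matched text alone.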

Here is a POSIX-compliant awk solution:

awk -v s='5.3.236.113681.2225191122.986.3705653211.104' '{
  # print the line if any whitespace-separated field equals s exactly
  for (i=1; i<=NF; ++i) if ($i == s) {print; next}
}' file

5.3.236.113681.2225191122.986.3705653211.104    4
4   5.3.236.113681.2225191122.986.3705653211.104
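
To avoid retyping the awk program for every search, the same field-by-field comparison could be wrapped in a small shell function. The sketch below is an illustration only: match_field and sample.txt are hypothetical names, and it assumes the token contains no backslashes (awk processes escape sequences in -v values). It also appends an empty string to the field to force a string comparison. That is not needed for the ID above, but it would matter for tokens that look like plain numbers, where awk would otherwise compare numerically.

```shell
# match_field TOKEN FILE
# Print every line of FILE that contains TOKEN as a complete
# whitespace-delimited field. (match_field is a hypothetical helper name.)
match_field() {
  awk -v s="$1" '{
    for (i = 1; i <= NF; ++i)
      if (($i "") == s) { print; next }   # concatenating "" forces string comparison
  }' "$2"
}

# Small sample file (hypothetical name) mirroring the input above.
cat > sample.txt <<'EOF'
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402
EOF

match_field '5.3.236.113681.2225191122.986.3705653211.104' sample.txt
```

Because the token is compared as a literal string rather than a regex, the dots need no escaping, and the position of the field on the line does not matter.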


Answered By - anubhava
Answer Checked By - Senaida (WPSolving Volunteer)