Thursday, July 28, 2022

[SOLVED] grep - difference between ways of specifying directory and globs

Issue

Say I'm in a project folder and want to search for a keyword using grep -rni. What's the difference between these three commands?

grep -rni . -e "keyword"

grep -rni * -e "keyword"

grep -rni **/* -e "keyword"

I tested this and noticed that the first two commands return the same number of matches, although in a different order. The third one, however, returned significantly more matches than the first two.

Is there ever any reason to use the third one? Is it returning more matches because of duplicates?


Solution

First of all, the difference has nothing to do with the arguments -n and -i.

From the grep man page:

-n, --line-number
  Prefix each line of output with the 1-based line number within its input file.
-i, --ignore-case
  Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.
-r, --recursive
  Read all files under each directory, recursively, following symbolic links only if they are on the command line.  Note that if no file operand is given, grep searches the working directory.  This is equivalent to the -d recurse option.

So the difference actually lies in how the shell interprets the strings * and **/*. The shell expands both of them into a list of file names before grep starts, so grep never sees the glob itself.

With . you pass the current directory as the only file operand. No mystery here: grep itself walks the current working directory.

With * the shell passes every non-hidden entry in the current directory as a separate argument to grep (this includes directories).

Now, suppose you have the following directory structure:

├── file.txt
├── one
│   └── file.txt
└── two
    └── file.txt

Running grep -rni * -e keyword is translated to:

grep -rni file.txt one two -e keyword

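This makes grep process the listed files and directories in exactly that order, recursing into each directory as it reaches it. That is why the output order can differ from the one produced with ., where grep chooses the traversal order itself.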
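
To see the operand list for yourself, you can let the shell expand the glob without running grep at all. The following is a minimal sketch, assuming a POSIX-style shell and a scratch directory created with mktemp -d; the layout mirrors the example tree above and the variable names are only illustrative.

```shell
# Recreate the example tree in a scratch directory (layout is illustrative).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir one two
echo "keyword" > file.txt
echo "keyword" > one/file.txt
echo "keyword" > two/file.txt

# The shell expands * before any command runs; "set --" stores the result
# as the positional parameters so it can be inspected.
set -- *
star_count=$#
star_args="$*"
echo "$star_count operands: $star_args"
```

For this tree it should report three operands (file.txt, one and two), which is the list grep receives in place of *.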

Finally, in a shell where ** matches recursively (zsh, or bash with shopt -s globstar enabled), grep -rni **/* -e keyword expands to this command line:

grep -rni file.txt one one/file.txt two two/file.txt -e keyword
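
Whether **/* really expands like this depends on the shell. As a rough sketch, assuming bash 4 or later is installed (globstar is off by default there) and using the same illustrative tree in a scratch directory, the two behaviours can be compared like this:

```shell
# Recreate the example tree in a scratch directory (layout is illustrative).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir one two
echo "keyword" > file.txt
echo "keyword" > one/file.txt
echo "keyword" > two/file.txt

# globstar off: ** behaves like a plain *, so **/* only matches
# entries exactly one directory level down.
without_globstar=$(bash +O globstar -c 'echo **/*')

# globstar on: ** also matches zero or more directory levels,
# so files and directories at every depth are listed.
with_globstar=$(bash -O globstar -c 'echo **/*')

echo "off: $without_globstar"
echo "on:  $with_globstar"
```

With globstar on, the output should match the operand list shown above; with it off, only one/file.txt and two/file.txt are passed, which would produce fewer matches rather than more.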

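The problem with this last approach is that some files are processed more than once. For instance, one/file.txt is processed twice: once because it appears explicitly in the argument list, and again because it belongs to the directory one, which is also in the argument list and is searched recursively because of -r. These duplicates are why the third command returns more matches. Since -r already handles the recursion, passing . is enough.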
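
The duplication is easy to confirm by counting output lines. This sketch assumes the same illustrative tree, with one matching line per file, and writes the expanded operand list out by hand so that it does not depend on any shell's ** support:

```shell
# Recreate the example tree in a scratch directory (layout is illustrative).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir one two
echo "keyword" > file.txt
echo "keyword" > one/file.txt
echo "keyword" > two/file.txt

# Let grep walk the tree itself: every file is read exactly once.
dot_count=$(grep -rni -e keyword . | wc -l | tr -d ' ')

# Same search with the recursive-glob operand list written out by hand:
# files inside "one" and "two" are read once via the directory and once
# via their explicit path.
glob_count=$(grep -rni -e keyword file.txt one one/file.txt two two/file.txt | wc -l | tr -d ' ')

# Output lines that appear more than once.
dup_lines=$(grep -rni -e keyword file.txt one one/file.txt two two/file.txt | sort | uniq -d)

echo "matches with '.': $dot_count"
echo "matches with expanded glob: $glob_count"
echo "duplicated lines:"
echo "$dup_lines"
```

Here the second count should be higher than the first by exactly the number of files that sit inside a listed directory. Piping the output through sort -u would hide the duplicates, but passing . avoids them in the first place.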



Answered By - ichramm
Answer Checked By - Pedro (WPSolving Volunteer)