Friday, May 27, 2022

[SOLVED] Using the subprocess module to grep with the | character

Issue

Imagine that file.txt contains the following:

line one
line two
line three

Then these calls to subprocess.check_output fail. Python 2.7.5 reports that grep exited with code 1; Python 3.8.5 hangs until the program is stopped with a keyboard interrupt:

# first approach
command = 'grep "one\|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

# second approach
command = 'grep -E "one|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

but this call succeeds (on both versions) and gives the expected output:

# third approach
command = 'grep -e one -e three ./file.txt'
results = subprocess.check_output(command.split())
print(results)

Why is this the case? My only guess is that there is some intricacy in how the subprocess module handles the | character, but I have no idea why that would make the call fail: in the first approach the character is escaped, and in the second the -E flag tells grep that the character does not need escaping. Approaches 1 and 2 also work as expected when typed directly on the command line. Could it be that the subprocess module is interpreting the character as a pipe instead of a regex OR?


Solution

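The list returned by command.split() still contains the double quotes, which are shell syntax and should not reach grep. With no shell to strip them, grep receives them as literal characters in the pattern, so the alternatives become "one and three" (each with a quote attached), neither of which occurs in file.txt, and grep exits with status 1 (no match). Python provides shlex.split for exactly this purpose. Splitting the command by hand is not hard either, as long as you understand the role the quotes play in the shell and that you need to remove them when there is no shell.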

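import shlex
import subprocess

command = r'grep "one\|three" ./file.txt'  # raw string keeps the backslash intact
results1 = subprocess.check_output(['grep', r'one\|three', './file.txt'])  # quotes removed by hand
results2 = subprocess.check_output(shlex.split(command))  # quotes removed by shlex
results3 = subprocess.check_output(command, shell=True) # better avoid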
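
To see the difference, it can help to print the argument lists without running grep at all. This is a minimal sketch using the command from the second approach; the variable names naive and parsed are only illustrative:

```python
import shlex

command = 'grep -E "one|three" ./file.txt'

# str.split only splits on whitespace, so the quotes stay in the pattern
naive = command.split()
# shlex.split follows POSIX shell quoting rules and strips the quotes
parsed = shlex.split(command)

print(naive)   # ['grep', '-E', '"one|three"', './file.txt']
print(parsed)  # ['grep', '-E', 'one|three', './file.txt']
```

The second list is what grep needs to receive: the pattern one|three with no quote characters around it.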

Quotes tell the shell not to perform whitespace tokenization or wildcard expansion on a value. When there is no shell, simply pass a plain string wherever the shell would have allowed, or even required, a quoted one.
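
As for the pipe question: without shell=True, no shell ever looks at the command, so | has no special meaning and every list element reaches the program verbatim. The sketch below illustrates this with a stand-in child process instead of grep. The helper name child_argv is hypothetical, and the child is just the current Python interpreter echoing back the arguments it was given:

```python
import json
import subprocess
import sys

def child_argv(args):
    """Run a child process and return the arguments it actually received."""
    # The child prints its own sys.argv (minus the '-c' placeholder) as JSON.
    script = "import json, sys; print(json.dumps(sys.argv[1:]))"
    out = subprocess.check_output([sys.executable, "-c", script] + list(args))
    return json.loads(out)

# The quotes and the | arrive unchanged; nothing is treated as a pipe.
received = child_argv('"one|three" ./file.txt'.split())
print(received)
```

The first element the child reports still carries both double quotes, which is exactly what grep sees in the first two approaches.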



Answered By - tripleee
Answer Checked By - Cary Denson (WPSolving Admin)