Tuesday, January 30, 2024

[SOLVED] How does the exit code of “test” act as an “if” condition?

Issue

I’m confused about using the test command inside a script. I’m normally experienced with Linux, but I can’t get my head around the following problem:

I want to check whether a directory exists using test -d dir, and echo a different message depending on whether it exists. The if statement looks something like this:

if test -d "$HOME/dir"
then
    echo "$?"    # prints the exit status of test
    echo "The directory exists."
else
    echo "$?"    # prints the exit status of test
    echo "The directory does not exist."
fi

My problem is understanding the exit codes. Why does the if statement enter the true (then) block when the exit code of the test command is 0? I would expect it to be the other way around, especially since I don’t specify a condition but simply use if test ….

I have read the man page for test, but it didn’t make me any wiser. I also tried using an explicit condition in the if statement, like this:

test -d "$HOME/dir"
exitcode=$?    # save test's exit status before another command overwrites it

if [ "$exitcode" -ne 0 ]
then
    echo "exit is $exitcode"
    echo "does not exist"
else
    echo "exit is $exitcode"
    echo "exists"
fi
exit

Both scripts produce the same output, of course, but again: I don’t understand why, when the test command is used with an if statement, the then block runs when the exit code is 0.


Solution

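You are thinking about this slightly backwards. if doesn't test Boolean expressions. (The shell itself doesn't have Boolean expressions to test.) if always executes a command and checks whether its exit status is zero or non-zero: zero means success, so the then branch runs; any non-zero status means failure, so the else branch (if there is one) runs.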
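A minimal sketch of this, assuming a POSIX sh: branch_taken is a hypothetical helper (not part of any shell) that runs its arguments as a command inside an if and reports which branch was taken. Note that true, false and sh -c 'exit 3' are ordinary commands here, not Boolean values.

```shell
#!/bin/sh
# branch_taken (hypothetical helper): run "$@" as the if condition and
# report which branch the exit status selected.
branch_taken() {
    # No Boolean expression is evaluated here; if only sees an exit status.
    if "$@"; then
        echo "then"
    else
        echo "else"
    fi
}

branch_taken true              # true exits 0              -> then
branch_taken false             # false exits 1             -> else
branch_taken sh -c 'exit 3'    # any non-zero exit status  -> else
branch_taken test -d /         # / is a directory, exits 0 -> then
```

The same if works with any command, so test is just one convenient command to put there.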

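[ itself is just a synonym for test (the only difference is that [ requires a closing ] as its last argument). test is a command that uses a mini-language to construct a Boolean expression from its arguments, evaluates that expression, and then exits 0 if the expression is true and 1 if it is false. (A status greater than 1 means an error occurred, such as a malformed expression.)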
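To see these exit statuses directly, here is a small sketch for a POSIX sh. status_of is a hypothetical helper that runs a command the same way if does and prints its exit status, and /no/such/dir/here is an assumed path that should not exist on your system.

```shell
#!/bin/sh
# status_of (hypothetical helper): run a command as an if condition
# and print the exit status that if branches on.
status_of() {
    if "$@"; then
        echo 0
    else
        echo "$?"   # at the start of else, $? still holds the command's status
    fi
}

status_of test -d /                   # true: / is a directory        -> 0
status_of [ -d / ]                    # same test, written with [ ]   -> 0
status_of test -d /no/such/dir/here   # false (assumes path missing)  -> 1
status_of [ 1 -eq x ] 2>/dev/null     # malformed: x is not a number  -> greater than 1
```

The last line shows why "not 0" is not the same as "false": an error is reported with its own status.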

This convention is the opposite of the one used by C and C-like languages, where 0 is false and any non-zero value (typically 1) is true. Because a command can typically succeed in only one way but fail in many different ways, we use the single value 0 to indicate success and different positive values to indicate failure. A further convention is that 1 represents a general error, but many commands use other non-zero exit statuses to indicate specific errors. Sometimes you care about the distinction between 1 and 2; other times you don't. When you do, you can use a case statement to distinguish them. For example,

grep "$some_regex" foo.txt

# grep exits 0 if a line matched, 1 if no line matched, and 2 if an error occurred
case $? in
  0) echo "regex matched" ;;
  1) echo "regex did not match" ;;
  *) echo "some error running grep, we don't know if the regex matches or not" ;;
esac
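A self-contained variant of the grep example, as a sketch for a POSIX sh with mktemp available (standard on Linux, though not required by POSIX). classify_match is a hypothetical helper, -q and -s are standard grep options that suppress normal output and missing-file messages, and /no/such/file/here is an assumed nonexistent path. Running grep as the if condition records its status without aborting a script that uses set -e.

```shell
#!/bin/sh
# classify_match (hypothetical helper): classify grep's exit status for a
# regex ($1) searched in a file ($2).
classify_match() {
    # -q: no output, only an exit status; -s: hide "no such file" messages
    if grep -qs -- "$1" "$2"; then
        rc=0
    else
        rc=$?
    fi
    case $rc in
        0) echo "regex matched" ;;
        1) echo "regex did not match" ;;
        *) echo "some error running grep (status $rc)" ;;
    esac
}

tmpfile=$(mktemp)
printf 'apple\nbanana\n' > "$tmpfile"

classify_match 'ban' "$tmpfile"          # -> regex matched
classify_match 'cherry' "$tmpfile"       # -> regex did not match
classify_match 'ban' /no/such/file/here  # -> some error running grep (status 2)

rm -f "$tmpfile"
```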


Answered By - chepner
Answer Checked By - Senaida (WPSolving Volunteer)