Tuesday, October 4, 2022

[SOLVED] How can I match spaces with a regexp in Bash?

Issue

I expect the code below to echo "yes", but it does not. For some reason it won't match the single quote. Why?

str="{templateUrl: '}"
regexp="templateUrl:[\s]*'"

if [[ $str =~ $regexp ]]; then
  echo "yes"
else
  echo "no"
fi

Solution

Replace:

regexp="templateUrl:[\s]*'"

With:

regexp="templateUrl:[[:space:]]*'"
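Put together, the corrected script might look like the sketch below. It is the script from the question with only the regexp changed; the result variable is an addition made here so the outcome can be inspected, not part of the original.

```shell
#!/usr/bin/env bash
# Script from the question, with [\s] replaced by the POSIX class [[:space:]].
str="{templateUrl: '}"
regexp="templateUrl:[[:space:]]*'"

# Keep $regexp unquoted on the right-hand side of =~ so bash treats it
# as a regular expression rather than as a literal string.
if [[ $str =~ $regexp ]]; then
  result="yes"   # 'result' is a name introduced for this sketch
else
  result="no"
fi
echo "$result"
```

With the string from the question, this should print yes.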

According to man bash, the =~ operator supports "extended regular expressions" as defined in man 3 regex. man 3 regex says it supports the POSIX standard and refers the reader to man 7 regex. The POSIX standard defines [:space:] as the character class for whitespace. A class name is only valid inside a bracket expression, which is why it is written [[:space:]].

The GNU bash manual documents the supported character classes as follows:

Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:

alnum alpha ascii blank cntrl digit graph lower print
punct space upper word xdigit

The only mention of \s that I found in the GNU bash documentation was for an unrelated use in prompts, such as PS1, not in regular expressions.
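As a rough illustration of why the original pattern fails: inside a POSIX bracket expression a backslash has no special meaning, so [\s] should match a literal backslash or the letter s rather than whitespace. The sketch below assumes a POSIX-conforming regex library behind bash; matches is a hypothetical helper name used only here.

```shell
#!/usr/bin/env bash
# The regexp from the question, unchanged.
regexp="templateUrl:[\s]*'"

# Hypothetical helper: prints yes/no depending on whether $1 matches $regexp.
matches() {
  if [[ $1 =~ $regexp ]]; then echo "yes"; else echo "no"; fi
}

matches "templateUrl: '"     # a space between ':' and the quote
matches "templateUrl:sss'"   # literal "s" characters instead of whitespace
```

Under that assumption, the first call prints no and the second prints yes, which shows that the bracket expression is matching the letter s and not a space.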

The Meaning of *

[[:space:]] will match exactly one white space character. [[:space:]]* will match zero or more white space characters.
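A small sketch of the difference, using anchored patterns so that the number of whitespace characters matters. The check helper and the sample strings are made up for this illustration.

```shell
#!/usr/bin/env bash
one='^a[[:space:]]b$'    # exactly one whitespace character between a and b
any='^a[[:space:]]*b$'   # zero or more whitespace characters between a and b

# Hypothetical helper: check STRING REGEX
check() {
  if [[ $1 =~ $2 ]]; then echo "match"; else echo "no match"; fi
}

check "ab"    "$one"   # no whitespace: fails the "exactly one" pattern
check "a b"   "$one"   # one space: matches
check "a   b" "$one"   # three spaces: fails the "exactly one" pattern
check "ab"    "$any"   # zero whitespace characters is allowed by *
check "a   b" "$any"   # so is any larger number
```

Without the ^ and $ anchors, both patterns can match anywhere in a longer string, so the distinction is easiest to see with anchored patterns like these.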

The Difference Between space and blank

POSIX regular expressions offer two classes of whitespace: [[:space:]] and [[:blank:]]:

  • [[:blank:]] means space and tab. This makes it similar to: [ \t].

  • [[:space:]], in addition to space and tab, includes newline, carriage return, form feed, and vertical tab. This makes it similar to: [ \t\n\r\f\v].
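The two classes can be compared directly with a short sketch like the one below. The in_class helper is a hypothetical name, and the test characters are built with bash's $'...' quoting.

```shell
#!/usr/bin/env bash
tab=$'\t'
newline=$'\n'

# Hypothetical helper: in_class CHAR CLASSNAME
# Prints yes if the single character CHAR belongs to [[:CLASSNAME:]].
in_class() {
  local re="[[:$2:]]"
  if [[ $1 =~ $re ]]; then echo "yes"; else echo "no"; fi
}

in_class "$tab" blank       # tab is in both classes
in_class "$tab" space
in_class "$newline" blank   # newline is not a blank character
in_class "$newline" space   # but it is a space character
```

In practice, [[:blank:]] is the safer choice when a match must stay within one line, and [[:space:]] when line breaks should count as whitespace too.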

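A key advantage of using character classes is that they are locale-aware: what counts as whitespace is determined by the current locale, so they behave sensibly with Unicode text and not only with ASCII.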



Answered By - John1024
Answer Checked By - Mildred Charles (WPSolving Admin)