Monday, October 25, 2021

[SOLVED] sed not recognizing matched group

Issue

I'm trying to print out a substring from the name of every PDF file in a directory, but I can't seem to make sed work with it. The regex is correct, but sed gives an error when I use \1.

for old in ./*.pdf; do
    new=$(echo $old | sed -e 's/(\.\/)?\d+_(\w\w\-\d+).+/\1/')
    echo $new
done

I'm using sed (GNU sed) 4.4

The output is:

sed: -e expression #1, char 32: invalid reference \1 on `s' command's RHS

for each file in the directory...

Thanks!


Solution

You may use

sed -E 's/(\.\/)?[0-9]+_[A-Z][A-Z]-[0-9]+.+/\1/'

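Note that sed does not support PCRE syntax, so \d and \w do not behave here the way they do in PCRE: \d is not a digit shorthand (GNU sed reads \dNNN as the character with decimal code NNN), and \w is a GNU extension rather than part of POSIX. To match any letter, use the [:alpha:] POSIX character class inside a bracket expression, that is, [[:alpha:]]; to match only uppercase letters, use [[:upper:]].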

Instead of \d, use [0-9] or [[:digit:]].
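
As a small sketch of the bracket-expression forms mentioned above (the file name is a made-up example, not one from the question), the POSIX classes can be tried like this:

```shell
# Hypothetical file name, used only for illustration.
name='./123_AB-456_report.pdf'

# POSIX classes only work inside a bracket expression:
# [[:digit:]] and [[:upper:]], not bare [:digit:] or [:upper:].
digits=$(printf '%s\n' "$name" | sed -E 's/^[^[:digit:]]*([[:digit:]]+).*/\1/')
upper=$(printf '%s\n' "$name" | sed -E 's/^[^[:upper:]]*([[:upper:]]+).*/\1/')

printf '%s\n' "$digits"   # first run of digits in the name
printf '%s\n' "$upper"    # first run of uppercase letters in the name
```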

In a POSIX BRE pattern, which sed uses by default, unescaped ( and ) match literal parentheses. That is why you got an error saying that the reference \1 is invalid: the pattern defined no capturing group. To create a group in a POSIX BRE pattern, escape the parentheses as \( and \); in a POSIX ERE pattern (sed with the -r or -E option), use them unescaped.

The same goes for the + and ? quantifiers: in a BRE pattern they must be escaped as \+ and \? (both are GNU extensions; the portable POSIX BRE form of + is \{1,\}), while in an ERE pattern they are used unescaped.
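
To make the BRE/ERE difference concrete, here is a minimal sketch that runs equivalent substitutions in both flavors. The file name is again a hypothetical example, and the pattern is shortened to keep the comparison readable:

```shell
# Hypothetical file name, used only for illustration.
name='./123_AB-456_report.pdf'

# BRE (default): unescaped parentheses are literal characters,
# so this replaces the text "(a)" itself.
literal=$(printf '%s\n' '(a)' | sed 's/(a)/X/')

# BRE: escaped parentheses create group 1; \{1,\} is the portable
# "one or more" (GNU sed also accepts \+).
bre=$(printf '%s\n' "$name" | sed 's/^\(\.\/\)[0-9]\{1,\}_.*/\1/')

# ERE (-E): parentheses and + are used unescaped.
ere=$(printf '%s\n' "$name" | sed -E 's/^(\.\/)[0-9]+_.*/\1/')

printf '%s\n' "$literal" "$bre" "$ere"
```

Both the BRE and the ERE command keep only the leading ./ captured by group 1, which shows that the two spellings describe the same pattern.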

Besides, you do not need to use a second capturing group since you are not using \2 in the replacement.
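
If what you actually want to print is the two-letter code and its number rather than the optional ./ prefix (this is an assumption about the intent, since the question does not show sample file names), one possible sketch wraps that part in a group and refers to it as \2. The helper name extract_code and the sample names in the usage note are hypothetical:

```shell
# extract_code is a hypothetical helper, not part of the original answer.
extract_code() {
    # Group 1 is the optional "./" prefix, group 2 is the part to keep,
    # e.g. two uppercase letters, a hyphen, and one or more digits.
    printf '%s\n' "$1" | sed -E 's/^(\.\/)?[0-9]+_([[:upper:]]{2}-[0-9]+).*/\2/'
}

for old in ./*.pdf; do
    [ -e "$old" ] || continue   # skip the unexpanded pattern when no PDFs exist
    new=$(extract_code "$old")
    printf '%s\n' "$new"
done
```

For example, extract_code './123_AB-456_report.pdf' would print AB-456. Names that do not match the pattern are printed unchanged, because sed leaves a line as it is when the substitution does not apply.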



Answered By - Wiktor Stribiżew