Issue
I'm trying to emulate GNU grep -Eo with a standard awk call.
The man page describes the -o option as follows:
-o, --only-matching
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line.
For now I have this code:
#!/bin/sh
regextract() {
    [ "$#" -ge 2 ] || return 1
    __regextract_ere=$1
    shift
    awk -v FS='^$' -v ERE="$__regextract_ere" '
        {
            while ( match($0,ERE) && RLENGTH > 0 ) {
                print substr($0,RSTART,RLENGTH)
                $0 = substr($0,RSTART+1)
            }
        }
    ' "$@"
}
My question is: when the matching part is 0-length, do I need to continue trying to match the rest of the line, or should I move on to the next line (as I already do)? I can't find a sample of input and regex that would need the former, but I feel like one might exist. Any ideas?
Solution
Here's a POSIX awk version that works with a* (or any POSIX awk regex):
echo abcaaaca |
awk -v regex='a*' '
{
    while (match($0, regex)) {
        if (RLENGTH) print substr($0, RSTART, RLENGTH)
        $0 = substr($0, RSTART + (RLENGTH > 0 ? RLENGTH : 1))
        if ($0 == "") break
    }
}'
Prints:
a
aaa
a
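For reference, here is a minimal sketch of how that loop might be folded back into the regextract wrapper from the question. It makes a few assumptions that are not in the original: the argument check is relaxed to -ge 1 so that stdin works when no file is given, and the line is copied into a scratch variable (line, a name chosen here) instead of reassigning $0, which makes the FS='^$' trick unnecessary.

```shell
#!/bin/sh
# Sketch only: the question's wrapper combined with the answer's loop.
regextract() {
    # Assumption: one argument (the ERE) is enough; with no files awk reads stdin.
    [ "$#" -ge 1 ] || return 1
    __regextract_ere=$1
    shift
    awk -v ERE="$__regextract_ere" '
    {
        line = $0   # scratch copy, so $0 is never re-split into fields
        while (line != "" && match(line, ERE)) {
            # Print only non-empty matches, as grep -o does.
            if (RLENGTH > 0) print substr(line, RSTART, RLENGTH)
            # Skip past the match, or past one character after an empty match.
            line = substr(line, RSTART + (RLENGTH > 0 ? RLENGTH : 1))
        }
    }
    ' "$@"
}

printf '%s\n' abcaaaca | regextract 'a*'
```

One limitation to keep in mind: because each pass re-matches against the truncated remainder, an anchored pattern such as ^a can match again on every pass, which should differ from what grep -o prints. This sketch does not try to handle that case.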
POSIX awk and grep -E both use POSIX extended regular expressions, except that awk allows C escapes (like \t) but grep -E does not. If you want strict compatibility, you'd have to deal with that.
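To make the escape difference concrete, here is a small illustration. It only shows the awk side: a value assigned with -v has its C escapes processed, so \t becomes a real tab before the regex is ever used. The helper name tab_lines and the sample input are made up for this example.

```shell
#!/bin/sh
# tab_lines is a hypothetical helper, named here only for illustration.
tab_lines() {
    # -v turns the two characters \t into a real tab in the regex value.
    awk -v regex='a\tb' 'match($0, regex) { print NR }'
}

{
    printf 'a\tb\n'        # line 1: a, a real tab, b
    printf '%s\n' 'a\tb'   # line 2: a, a backslash, t, b
} | tab_lines
# prints 1: only the line containing a real tab matches
```

With grep -E, by contrast, POSIX leaves the meaning of \t in an ERE undefined, so the same pattern string cannot be relied on to match a tab there.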
Answered By - dan Answer Checked By - Willingham (WPSolving Volunteer)