Issue
I'm trying to do some crazy regular expressions in sed, but I'm not allowed to.
Am I just not getting regular expressions, or are regular expressions different in sed?
The file I'm working with looks like this:
46,uie,ieo
39,ieu,tii
44-46,yut,til
45,dkd,ytu
65,dkd,ytu
40-45,dkd,ytu
When I do
cat text.txt | sed s/^4[0-9],//g
I almost get what I want, I get
uie,ieo
39,ieu,tii
44-46,yut,til
dkd,ytu
65,dkd,ytu
40-45,dkd,ytu
But I also want to get rid of the ones that are like 40-45 and 44-46, so I've tried
cat text.txt | sed s/^4[0-9](-4[0-9])?,//g
-bash: syntax error near unexpected token `('
And when I try
cat text.txt | sed s/^4[0-9]-?4?[0-9]?,//g
I just get
46,uie,ieo
39,ieu,tii
44-46,yut,til
45,dkd,ytu
65,dkd,ytu
40-45,dkd,ytu
So nothing is filtered
Thank-you!
Solution
cat text.txt | sed s/^4[0-9](-4[0-9])?,//g
Two problems.
First, you need to quote the argument to sed. It contains metacharacters that are recognized by the shell, such as ( and ?; you need to quote the argument so the shell treats it as just a string and doesn't try to interpret it.
cat text.txt | sed 's/^4[0-9](-4[0-9])?,//g' # this still doesn't work
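To illustrate the quoting point, here is a small sketch that hands the same text to the shell with and without quotes. The variable names unquoted_status and quoted_arg are made up for this example, and sh -c is used only to keep the syntax error inside a child shell.

```shell
# Unquoted: the shell rejects the parentheses while parsing the command,
# so the command (echo here, sed in the question) is never run.
if sh -c 'echo s/^4[0-9](-4[0-9])?,//g' 2>/dev/null; then
    unquoted_status="accepted"
else
    unquoted_status="rejected"
fi

# Single-quoted: the shell passes the text through unchanged as one argument.
quoted_arg=$(printf '%s' 's/^4[0-9](-4[0-9])?,//g')

printf 'unquoted: %s\n' "$unquoted_status"
printf 'quoted:   %s\n' "$quoted_arg"
```

Note that [0-9] and ? are also glob characters, so an unquoted pattern like the first one in the question can be rewritten by the shell if a matching file name happens to exist. Single quotes avoid that as well.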
Second, sed doesn't use extended regular expressions by default. If you're using GNU sed (type sed --version to confirm this), you can use the -E option to enable extended regular expressions:
cat text.txt | sed -E 's/^4[0-9](-4[0-9])?,//g'
or you can use backslashes to let sed recognize the (, ), and ? characters:
cat text.txt | sed 's/^4[0-9]\(-4[0-9]\)\?,//g'
Finally, this is a Useless Use of cat. sed is perfectly capable of reading input either from stdin or from a specified file; you don't need to feed it its input via a pipe from cat:
sed 's/^4[0-9]\(-4[0-9]\)\?,//g' text.txt
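As a self-contained check, the following sketch runs the commands against the sample data from the question, held in a shell variable instead of text.txt. It assumes a sed that accepts -E (GNU and BSD sed do). The basic-regex variant uses \{0,1\} in place of \?, because \? in a basic regular expression is a GNU extension; that substitution is my own and not part of the commands above. The variable names are made up for the example.

```shell
# Sample input copied from the question.
sample='46,uie,ieo
39,ieu,tii
44-46,yut,til
45,dkd,ytu
65,dkd,ytu
40-45,dkd,ytu'

# ERE via -E: ( ) and ? act as operators without backslashes.
ere_out=$(printf '%s\n' "$sample" | sed -E 's/^4[0-9](-4[0-9])?,//')

# BRE using only POSIX syntax: \{0,1\} stands in for the GNU-specific \?.
bre_out=$(printf '%s\n' "$sample" | sed 's/^4[0-9]\(-4[0-9]\)\{0,1\},//')

# BRE with a bare ?: each ? is a literal question mark, so no line matches.
# This is why the third attempt in the question left the file unchanged.
literal_out=$(printf '%s\n' "$sample" | sed 's/^4[0-9]-?4?[0-9]?,//')

printf '%s\n' "$ere_out"
```

This prints:

```
uie,ieo
39,ieu,tii
yut,til
dkd,ytu
65,dkd,ytu
dkd,ytu
```

The g flag is left out here because a pattern anchored with ^ can match at most once per line; keeping it, as in the commands above, is harmless.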
The -E option is specified by POSIX; I think this was a relatively recent addition. GNU sed has supported -E since 2006 (originally for compatibility with BSD sed), but it's not currently documented in any released version. Documentation was added in 2013, but the most recent official release of GNU sed was 4.2.2, in 2012.
UPDATE 2021-11-08: It does not appear that POSIX specifies the -E option (see https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html), though it has been proposed.
The manual for version 4.7 of GNU sed says:
'-E'
'-r'
'--regexp-extended'
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems. *Note
Extended regular expressions: ERE syntax.
The manual links to this entry in the Austin Group Defect Tracker, which lists the issue as "Resolved => Applied" as of 2020-03-18. Perhaps it just hasn't been applied to the opengroup.org web site.
Answered By - Keith Thompson