Issue
I have a CSV file full of values such as this:
0.00145423,3.03795e-05
I wanted to check that all the lines were consistent so I tried to grep for any unexpected characters like so...
grep '[^0-9,e\-\.]' myfile
In my mind it goes like this: find a line with any character ([]) that is not (^) a number (0-9), a comma (,), the letter e (e), a hyphen (\-, which I attempted to escape with \), or a period (\.). However, hyphens still match.
[EDIT] This does not happen in Python, only with bash/grep:
>>> import re
>>> re.search(r"[^0-9,e\-\.]", "0.00145423,3.03795e-05")
>>>
Unsatisfying solution:
If I move the escaped hyphen to the end it works:
grep '[^0-9,e\.\-]' myfile
Putting the escaped hyphen next to the 0-9 range results in grep: Invalid range end.
Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?
bash 4.3.33, grep 2.21
Solution
The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer at: Get final special character with a regular expression.
From POSIX 9.3.5 RE Bracket Expression:
The <hyphen-minus> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression.
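This also suggests why the escaped version fails: inside a POSIX bracket expression a backslash is an ordinary character, so the \-\ in [^0-9,e\-\.] reads as a range from \ to \ and the hyphen never enters the list. A minimal sketch of that reading, assuming a POSIX-style grep such as GNU grep; the variable names and the LC_ALL=C setting are illustrative choices, not part of the original post:

```shell
# Sample line from the question.
line='0.00145423,3.03795e-05'

# LC_ALL=C is an assumption made here to keep range handling locale-independent.
# -o prints only the characters that matched the negated list.
# '|| true' keeps the script going if grep finds nothing (exit status 1).
matched=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[^0-9,e\-\.]' || true)

# The list holds digits, ',', 'e', '\' and '.', but no '-',
# so the hyphen in 3.03795e-05 is reported as unexpected.
printf 'unexpected characters: %s\n' "$matched"
```

The same reading would account for the Invalid range end error in the question: with the escaped hyphen placed right after 0-9, the \ and the following character become the end points of a range that runs backwards.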
Some tools might have additional ways of doing it with some kind of escaping but you're always safe to just put it first or last.
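As a rough check of the first-or-last rule, the sketch below counts lines that contain a character outside the allowed list. It assumes a POSIX-style grep; the variable names, the made-up bad sample, and LC_ALL=C are illustrative assumptions rather than anything from the original post.

```shell
# Sample line from the question.
line='0.00145423,3.03795e-05'

# Hyphen last in the list: treated as a literal hyphen.
last=$(printf '%s\n' "$line" | LC_ALL=C grep -c '[^0-9,e.-]' || true)
# Hyphen first, right after the negating ^: also literal.
first=$(printf '%s\n' "$line" | LC_ALL=C grep -c '[^-0-9,e.]' || true)
printf 'hyphen last: %s, hyphen first: %s\n' "$last" "$first"

# The '[^0-9,e\.\-]' workaround from the question works too, but its
# backslashes are literal list members, so a stray backslash in the
# data is not flagged. 'bad' is a hypothetical malformed value.
bad='0.5\2'
workaround=$(printf '%s\n' "$bad" | LC_ALL=C grep -c '[^0-9,e\.\-]' || true)
plain=$(printf '%s\n' "$bad" | LC_ALL=C grep -c '[^0-9,e.-]' || true)
printf 'workaround: %s, plain: %s\n' "$workaround" "$plain"
```

Both hyphen placements should report zero offending lines for the sample, while only the pattern without backslashes should flag the line containing a backslash. Note that the period needs no escaping inside a bracket expression either.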
Note that - isn't the only character that behaves differently depending on where it shows up in a bracket expression. Consider ] and ^ as well.
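To illustrate the positional rules for all three characters together, here is a small sketch, again assuming a POSIX-style grep; the sample string and variable names are made up for the example. A ] is literal when it comes first, a ^ is literal anywhere except first, and a - is literal when it comes last.

```shell
# Hypothetical sample containing each of the three special characters.
sample='a]b^c-d'

# ']' first, '^' not first, '-' last: all three are literal list members.
# -o prints each match on its own line; tr joins them for display.
found=$(printf '%s\n' "$sample" | LC_ALL=C grep -o '[]^-]' | tr -d '\n' || true)
printf 'literal matches: %s\n' "$found"
```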
Answered By - Ed Morton Answer Checked By - David Goodson (WPSolving Volunteer)