Issue
I have a file in which some lines contain a JSON object on a single line, and I want to extract the value of the window_indicator property.
A normal regular expression is: "window_indicator":\s*([\-\d\.]+)
in which I want the value of the first match group.
Here it is working perfectly well: https://regex101.com/r/w9Iuch/1
I've settled on sed because it seems that grep has to print the whole line and can't limit to the match group value, and perl is overkill.
Unfortunately, sed isn't actually capable of doing this, is it?
# sed 's/("window_indicator:)/\1/' in.txt
sed: -e expression #1, char 26: invalid reference \1 on `s' command's RHS
# sed -E 's/("window_indicator":)/\1/p' in.txt
prints out every line of the file
# sed -rn 's/("window_indicator":)/\1/p' in.txt
prints the whole line
# sed -rn 's/("window_indicator":)/\1/' in.txt
nothing
Solution
With sed, you need to match the whole line, capture what you need, replace the whole match with Group 1 placeholder, and make sure you suppress the default line output and only print the new text after successful substitution:
sed -nE 's/.*"window_indicator":[[:space:]]*([-0-9.]+).*/\1/p' in.txt
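If only the first match is to be retrieved, add q to quit. Run q only on a line where the pattern matched; a bare ;q after the s command would quit after the first input line whether or not it matched. Here the pattern is used as an address, and the empty pattern // in the s command reuses it:
sed -nE '/.*"window_indicator":[[:space:]]*([-0-9.]+).*/{s//\1/p;q;}' in.txt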
Note that \d is not supported in POSIX regex; it is replaced with the 0-9 range in the bracket expression here.
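As a small illustration of the note above, the following sketch compares the 0-9 range with the POSIX [[:digit:]] class, which is another portable way to spell "a digit" inside a bracket expression. The sample line and the variable names are made up for this example.

```shell
# Hypothetical one-line input, made up for illustration.
line='{"id": 7, "window_indicator": 3.14}'

# Digits written as an explicit 0-9 range inside the bracket expression.
with_range=$(printf '%s\n' "$line" |
  sed -nE 's/.*"window_indicator":[[:space:]]*([-0-9.]+).*/\1/p')

# Digits written with the POSIX character class instead.
with_class=$(printf '%s\n' "$line" |
  sed -nE 's/.*"window_indicator":[[:space:]]*([-[:digit:].]+).*/\1/p')

printf '%s\n%s\n' "$with_range" "$with_class"
```

Both commands should print the same captured number for this input.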
Details
n - suppress default line output
E - enables POSIX ERE flavor
.*"window_indicator":[[:space:]]*([-0-9.]+).* - finds
    .* - any text
    "window_indicator": - a fixed string
    [[:space:]]* - zero or more whitespaces (GNU sed supports \s, too)
    ([-0-9.]+) - Group 1: one or more digits, - or .
    .* - any text
\1 - replaces with Group 1 value
p - prints the result upon successful replacement
q - quits processing the stream.
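To see the sed approach end to end, here is a minimal sketch run against a small sample file. The file contents and the variable names are hypothetical, chosen only for this illustration. The second command is one way to stop at the first matching line, assuming the match may not sit on the first input line: q runs only inside a block guarded by the same pattern, and the empty pattern // reuses that address.

```shell
# Hypothetical sample input; the values are made up for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
plain text line without JSON
{"id": 1, "window_indicator": -0.75, "status": "ok"}
another plain line
{"id": 2, "window_indicator":12.5}
EOF

# Print the captured number from every matching line.
all_values=$(sed -nE 's/.*"window_indicator":[[:space:]]*([-0-9.]+).*/\1/p' "$tmp")
printf 'all values:\n%s\n' "$all_values"

# Print only the first captured number, quitting on the first matching line.
first_value=$(sed -nE '/.*"window_indicator":[[:space:]]*([-0-9.]+).*/{s//\1/p;q;}' "$tmp")
printf 'first value: %s\n' "$first_value"

rm -f "$tmp"
```

Lines without the property produce no output, because -n suppresses the default print and p fires only after a successful substitution.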
With GNU grep, it is even easier:
grep -oP '"window_indicator":\s*\K[-\d.]+' in.txt
To get the first match,
grep -oP '"window_indicator":\s*\K[-\d.]+' in.txt | head -1
Here,
o - outputs matched texts only
P - enables the PCRE regex engine
"window_indicator":\s*\K[-\d.]+ - matches
    "window_indicator": - a fixed string
    \s* - zero or more whitespaces
    \K - removes the text matched so far from the match value
    [-\d.]+ - matches one or more -, . or digits.
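The grep variant can be tried the same way. This sketch assumes GNU grep built with PCRE support (the -P option); the sample file and the variable names are hypothetical.

```shell
# Hypothetical sample input; the values are made up for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
plain text line without JSON
{"id": 1, "window_indicator": -0.75, "status": "ok"}
{"id": 2, "window_indicator":12.5}
EOF

# \K drops the property name from the reported match, so only the number is printed.
grep_values=$(grep -oP '"window_indicator":\s*\K[-\d.]+' "$tmp")
printf 'all values:\n%s\n' "$grep_values"

# Keep only the first reported match.
grep_first=$(grep -oP '"window_indicator":\s*\K[-\d.]+' "$tmp" | head -n 1)
printf 'first value: %s\n' "$grep_first"

rm -f "$tmp"
```

If grep reports that -P is unsupported on a given system, the sed command is the more portable choice.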
Answered By - Wiktor Stribiżew Answer Checked By - Marie Seifert (WPSolving Admin)