Wednesday, April 6, 2022

[SOLVED] Regex capture group works in Javascript and regex101, but not in sed

Issue

In regex101: https://regex101.com/r/FM88LA/1

In my browser console:

x='"AbCd123|999"';
"\"AbCd123|999\""
x.match(/[^\""|]+/)
Array [ "AbCd123" ]

Using sed in the shell:

(base) balter@winmac:~/winhome/CancerGraph/TCGA$ echo '"AbCd123|99999"' | sed -En 's/([^\"|]+)/\1/p'
"AbCd123|99999"
(base) balter@winmac:~/winhome/CancerGraph/TCGA$ echo '"AbCd123|99999"' | sed -En 's/\"([^|]+)/\1/p'
AbCd123|99999"

Solution

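That is expected behavior: with the -n option and the p flag, sed prints the whole line after a successful substitution. Only the text the pattern matched is replaced (here, by itself via \1); any text the pattern did not match is left in place and printed along with it.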
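To make this visible, here is a small sketch that wraps the captured group in square brackets instead of replacing it with itself. The variable names input and marked are only illustrative, and the brackets are added purely to show which part of the line the pattern actually matched.

```shell
# Sample line from the question.
input='"AbCd123|99999"'

# Same pattern as the first attempt, but the replacement wraps the capture
# in [ ] so the matched portion stands out. The quotes, the pipe and the
# digits after it are outside the match, so sed leaves them untouched.
marked=$(printf '%s\n' "$input" | sed -En 's/([^"|]+)/[\1]/p')

printf '%s\n' "$marked"
# prints: "[AbCd123]|99999"
```

The brackets land only around AbCd123, which shows that the capture group did work; the rest of the line simply was never part of the match.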

That means that, to print only your "match", the pattern has to consume the entire line:

echo '"AbCd123|99999"' | sed -En 's/["|]*([^"|]+).*/\1/p'

Here, ["|]* skips any leading " and | characters, the ([^"|]+) part captures one or more characters other than " and |, and .* matches the rest of the line.

Everything that was matched but not captured is removed, because the replacement consists only of \1, the Group 1 value (captured with ([^"|]+)).
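As a usage sketch, the command can be wrapped in a shell function and run over several lines. The function name extract_first_field and the extra sample lines are hypothetical, added only for illustration. A line made up solely of " and | characters gives the capture group nothing to match, so the substitution fails and -n keeps that line out of the output.

```shell
# Hypothetical helper: prints the first run of characters other than " and |
# from each input line, and skips lines where no such run exists.
extract_first_field() {
  sed -En 's/["|]*([^"|]+).*/\1/p'
}

# Three sample lines; the middle one contains only pipes and cannot match.
result=$(printf '%s\n' '"AbCd123|99999"' '||' '"xyz|1"' | extract_first_field)

printf '%s\n' "$result"
# prints:
# AbCd123
# xyz
```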



Answered By - Wiktor Stribiżew
Answer Checked By - Gilberto Lyons (WPSolving Admin)