Issue
I need to prefix every line of a pattern range that is selected by its ID. I am using sed, but other languages are welcome!
This is the sample text:
...
==OPEN==
data: blabla
id: class1
moredata: blabla
==CLOSE==
==OPEN==
id: class2
boringdata: blabla
==CLOSE==
...extra info
==OPEN==
id: class8
data: ...
==CLOSE==
...more info
==OPEN==
data: ...
boringdata: ...
id: class10
==CLOSE==
...
If I were to comment out the pattern with id 8, the expected output would be:
...
==OPEN==
data: blabla
id: class1
moredata: blabla
==CLOSE==
==OPEN==
id: class2
boringdata: blabla
==CLOSE==
...extra info
// ==OPEN==
// id: class8
// data: ...
// ==CLOSE==
...more info
==OPEN==
data: ...
boringdata: ...
id: class10
==CLOSE==
...
The closest I have gotten is this, but it requires rewriting the entire range, which is not practical:
sed -e '/==OPEN==/ {:loop; N; /==CLOSE==/! b loop; /id: class8/ {s/.*/NEEDS REWRITE/}}' /example
If I tell it to rewrite the beginning (^), it rewrites only the first line of the range. I think that is because it treats the entire range as one line.
Solution
The closest I have gotten is this, but it requires rewriting the entire range, which is not practical:
sed -e '/==OPEN==/ {:loop; N; /==CLOSE==/! b loop; /id: class8/ {s/.*/NEEDS REWRITE/}}' /example
That's actually not bad.
If I tell it to rewrite the beginning (^), it rewrites only the first line of the range. I think that is because it treats the entire range as one line.
Yes. With POSIX sed, and with GNU sed by default, the ^ matches the beginning of the pattern space only. However, you can match the newlines themselves:
sed -e '/==OPEN==/ {:loop; N; /==CLOSE==/! b loop; /id: class8/ {s,\(^\|\n\),\1// ,g}}' \
/example
Note in particular:
- The text being replaced is expressed as the group \(^\|\n\), meaning either the zero-length substring at the beginning of the pattern space or a newline, which is captured as a group.
- The matched text is echoed back into the replacement via \1. That has no visible effect when it is the ^ alternative that matches, but it avoids eliminating newlines when the other alternative matches.
- The comma (,) is used as the pattern delimiter, so that the slashes (/) in the replacement text don't need to be escaped.
- The g flag is used to cause all appearances of the pattern to be replaced, not just the first.
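The same idea can also be sketched without alternation in the regular expression, which may matter on non-GNU seds: \| inside a basic regular expression is not guaranteed by POSIX, although GNU sed accepts it. The variant below is one possible rewrite, not the command from the answer. It first prefixes the text after each newline, then the start of the pattern space. The sample variable and the comment_class8 function name are hypothetical, and the input is a shortened version of the question's data.

```shell
# Hypothetical sample input, shortened from the question's data.
sample='==OPEN==
id: class2
==CLOSE==
...extra info
==OPEN==
id: class8
data: ...
==CLOSE=='

# Hypothetical helper: reads text on stdin and comments out the class8 block.
comment_class8() {
  sed -e '/==OPEN==/ {
    :loop
    N
    /==CLOSE==/!b loop
    /id: class8/ {
      # Re-emit each newline (&) followed by the comment prefix.
      s,\n,&// ,g
      # Then prefix the very first line of the pattern space.
      s,^,// ,
    }
  }'
}

printf '%s\n' "$sample" | comment_class8
```

Writing the sed commands one per line, rather than separating them with semicolons, is a deliberate choice here: it avoids relying on how a particular sed parses labels and closing braces on a single line.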
If you are willing to rely on GNU extensions, then you can do it a bit more simply:
sed -e '/==OPEN==/ {:loop; N; /==CLOSE==/! b loop; /id: class8/ {s,^,// ,gm}}' \
/example
With GNU sed, the m flag in the s command causes ^ to match both at the beginning of the pattern space and immediately after each newline within. This flag is not specified by POSIX.
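A small demonstration of the difference, assuming GNU sed is installed as sed. N joins two input lines in the pattern space, so the only thing that changes between the two runs is the m flag. The function names without_m and with_m are hypothetical.

```shell
# Assumes GNU sed: the m flag of the s command is a GNU extension.
# N appends the second input line, so the pattern space holds "a", a newline, "b".

without_m() {
  # ^ matches only at the start of the pattern space.
  printf 'a\nb\n' | sed -e 'N; s,^,// ,g'
}

with_m() {
  # With m, ^ also matches right after the embedded newline.
  printf 'a\nb\n' | sed -e 'N; s,^,// ,gm'
}

without_m   # prefixes only the first line
with_m      # prefixes both lines
```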
Answered By - John Bollinger