Issue
I have this file:
~/ % cat t
---
    abc
    def DEF
    ghi GHI
---
123
456
I would like to extract the content between the two lines of three dashes, so I tried this:
sed -En '{N; /^---\s{5}\w+/,/^---/p}' t
That is: three dashes, followed by five whitespace characters (the newline plus the four spaces of indentation), followed by one or more word characters, with the range ending at the next line that begins with another three dashes. This gives me the following output:
~/ % sed -En '{N; /^---\s{5}\w+/,/^---/p}' t
---
    abc
    def DEF
    ghi GHI
---
123
I don't want the line with "123". Why am I getting it, and how do I adjust my expression to get rid of it?
[EDIT]: It is important that the expression matches the four spaces of indentation on the line after the first three dashes.
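Why the 123 line shows up: because N runs on every cycle, sed reads this file two lines at a time. The range /^---\s{5}\w+/,/^---/ opens on the first pair, and without the M flag the ^ in the end address only matches at the start of a whole pattern space. The first later pattern space that begins with --- is the pair formed by the closing --- and 123, and a range prints every pattern space up to and including the one that ends it. The sketch below makes the pairing visible with sed's l command; it recreates the file t with the 4-space indentation the question describes, which is an assumption about the exact original contents.

```shell
#!/bin/sh
# Recreate the sample file (4-space indentation assumed, as described in the question).
printf '%s\n' '---' '    abc' '    def DEF' '    ghi GHI' '---' '123' '456' > t

# N appends the next line on every cycle, so each pattern space is a pair of
# lines. The l command prints each pattern space unambiguously: the embedded
# newline appears as \n and a trailing $ marks the end of the pattern space.
# The last line (456) has no partner, so N quits there and, with -n,
# nothing more is printed.
sed -n 'N;l' t
# Expected output:
# ---\n    abc$
#     def DEF\n    ghi GHI$
# ---\n123$
```

The third pattern space is the first one after the opening pair that begins with ---, so the range closes there and prints it whole, 123 included.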
Solution
This might work for you (GNU sed):
sed -En '/^---/{:a;N;/^ {4}\S/M!D;/\n---/!ba;p}' file
Turn on extended regexps (-E) and turn off implicit printing (-n).
If a line begins with --- and the following line is indented by exactly 4 spaces, gather up the following lines until one begins with --- and print them.
If the following line does not meet these criteria, delete the first line and repeat the check on what remains.
All other lines will pass through unprinted.
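The same script can be spread over several lines with comments, one per step described above. This is a sketch that assumes GNU sed (-E, \S and the M flag are GNU extensions); the function name extract_block and the file names t and u are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every ---...--- block whose first inner line
# is indented by exactly 4 spaces. Requires GNU sed.
extract_block() {
  sed -En '
    # Start only on a line that begins with three dashes.
    /^---/{
      :a
      # Append the next input line to the pattern space.
      N
      # M makes ^ match after each newline. If no line starts with exactly
      # 4 spaces and a non-space, D deletes the first line and restarts the
      # cycle with the rest, without reading new input.
      /^ {4}\S/M!D
      # Keep appending until a line starting with --- has been appended.
      /\n---/!ba
      # Print the gathered block, including both --- lines.
      p
    }
  ' "$1"
}

# Sample file from the question (4-space indentation assumed).
printf '%s\n' '---' '    abc' '    def DEF' '    ghi GHI' '---' '123' '456' > t
extract_block t

# A --- directly followed by another ---: D keeps the second ---,
# which then opens the block.
printf '%s\n' '---' '---' '    x' '---' 'y' > u
extract_block u
```

On the question's file this prints the opening ---, the three indented lines and the closing ---, and nothing else. The second file shows why D is used rather than d: D keeps the second --- in the pattern space, so it can still open a block. Once an indented line is in the pattern space the M test keeps succeeding, so in practice only the line right after the opening --- is checked.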
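N.B. The M flag on the second regexp turns on multiline mode, so ^ also matches right after each embedded newline. Since the first line of the pattern space begins with ---, a match can only come from a following line, which must therefore be indented.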
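A minimal check of what the M flag changes, assuming GNU sed: after N the pattern space holds ---, a newline and an indented line, and the same regexp is tried with and without M.

```shell
#!/bin/sh
# Without M, ^ anchors only at the start of the pattern space ("---"),
# so the regexp fails and nothing is printed.
printf '%s\n' '---' '    abc' | sed -En 'N;/^ {4}\S/p'

# With M (GNU extension), ^ also matches just after the embedded newline,
# so the indented second line matches and both lines are printed.
printf '%s\n' '---' '    abc' | sed -En 'N;/^ {4}\S/Mp'
```

The first command prints nothing; the second prints both lines, because with M the indented line alone is enough to satisfy ^ {4}\S.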
Answered By - potong