Issue
I have a SAS log file and I want to list only those lines that are between two words: data and run.
The file can contain many such words on many lines, for example:
MPRINT: data xxxxx;
yyyyy
xxxxxx
MPRINT: run;
fffff
yyyyy
data fff;
fffff
run;
I would like to get lines 1-4 and 7-9.
I tried something like
egrep -iz file -e '\sdata\s+\S*\s+(.|\s)*\srun\s'
but this expression lists all lines between the first data and the last run, because (.|\s)* is greedy ((.|\s) is there to match the newline character).
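For illustration, the greedy/lazy difference can be reproduced on the first example. POSIX ERE, which egrep uses, has no lazy quantifier, so this minimal sketch assumes GNU grep built with PCRE support (-P); sample.log is a hypothetical scratch file name. With -z the whole file is a single record, so the greedy .* keeps going to the last run, while the lazy .*? stops at the nearest one.

```shell
#!/bin/sh
# Hypothetical scratch file holding the first example from the question.
printf '%s\n' 'MPRINT: data xxxxx;' 'yyyyy' 'xxxxxx' 'MPRINT: run;' \
    'fffff' 'yyyyy' 'data fff;' 'fffff' 'run;' > sample.log

# -z makes the whole file one record; -o prints each match followed by a NUL byte,
# which tr turns into a newline.
# Greedy .* : one single match from the first "data" to the last "run".
echo 'greedy:'
grep -ozP '(?s)\bdata\b.*\brun\b[^\n]*' sample.log | tr '\0' '\n'

# Lazy .*? : each match stops at the nearest "run", so every block is separate.
echo 'lazy:'
grep -ozP '(?s)\bdata\b.*?\brun\b[^\n]*' sample.log | tr '\0' '\n'
```

Both variants start their matches at data itself, so the MPRINT: prefix of the first line is not part of the output.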
I may also want to require additional words between data and run, for example:
MPRINT: data xxx;
fffff
NOTE: ffdd
set fff;
xxxxxx
MPRINT: run;
data fff;
yyyyyy
run;
In some cases I would like to list only the lines between data and run when some line in that block contains the word set.
I know there are many similar threads, but I didn't find any that handles keywords repeating multiple times.
I'm not familiar with awk or sed, but if they can help I can use them too.
[Edit]
Note that data and run are not necessarily at the beginning of the line (I updated the example). Also, there can't be another data between a data and its run.
[Edit2]
As Tom noted, every line I was looking for started with MPRINT(...):, so I filtered for those lines first.
The answer by anubhava helped me the most with my final solution, so I marked it as the answer.
The final expression looked like this:
grep -o path -e 'MPRINT.*' | cut -f '2-' -d ' '|
grep -iozP '(?ms) data [^\(;\s]+.*?(set|infile).*?run[^\n]*\n'
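To see what each stage of that pipeline does, here is a sketch on a small log excerpt. The excerpt and the file name mprint.log are hypothetical, and it assumes the MPRINT(...): prefix is followed by several spaces: cut keeps the extra spaces, which is what lets the leading space in ' data ' match (with only one space after the colon, nothing would match). It also assumes GNU grep with -P; tr -d '\0' only removes the NUL byte that -z puts after each match.

```shell
#!/bin/sh
# Hypothetical SAS log excerpt: only the MPRINT(...) lines carry the generated code.
cat > mprint.log <<'EOF'
MPRINT(DEMO):   data work.a;
MPRINT(DEMO):   set sashelp.class;
MPRINT(DEMO):   run;
NOTE: some note mentioning data and run
MPRINT(DEMO):   data work.b;
MPRINT(DEMO):   x = 1;
MPRINT(DEMO):   run;
EOF

# 1) grep -o keeps only the MPRINT part of each line (the NOTE line is dropped)
# 2) cut drops the first space-separated field, i.e. the "MPRINT(DEMO):" prefix
# 3) grep -z treats the rest as one record and extracts data ... run blocks
#    that contain set or infile
grep -o 'MPRINT.*' mprint.log | cut -f '2-' -d ' ' |
    grep -iozP '(?ms) data [^\(;\s]+.*?(set|infile).*?run[^\n]*\n' | tr -d '\0'
```

Only the work.a block is printed, because the work.b block contains neither set nor infile.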
Solution
You may use this GNU grep command with the -P (PCRE) option:
grep -ozP '(?ms).*?data .*?run[^\n]*\n' file
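On the first example from the question, the leading .*? in this pattern means the second match starts right where the first one ended, so the lines between the blocks (fffff and yyyyy) are printed as well. Anchoring each match at the start of the line that contains data avoids this while still printing that whole line. A sketch, assuming GNU grep with -P and a hypothetical file name sample.log:

```shell
#!/bin/sh
# Hypothetical scratch file holding the first example from the question.
printf '%s\n' 'MPRINT: data xxxxx;' 'yyyyy' 'xxxxxx' 'MPRINT: run;' \
    'fffff' 'yyyyy' 'data fff;' 'fffff' 'run;' > sample.log

# (?m): ^ matches at every line start; (?s): . also matches newlines.
# ^[^\n]*\bdata  : start at the beginning of the line that contains "data "
# .*?\brun\b     : lazily stop at the nearest whole word "run"
# [^\n]*\n       : include the rest of that line
# tr -d '\0'     : remove the NUL byte that -z appends to each match
grep -ozP '(?ms)^[^\n]*\bdata .*?\brun\b[^\n]*\n' sample.log | tr -d '\0'
```

On this input it prints lines 1-4 and 7-9 of the example, which is the output the question asks for.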
If you only want to print blocks that contain a line starting with set, then use:
grep -ozP '(?ms).*?data .*?^set.*?run[^\n]*\n' file
MPRINT: data xxxxx;
yyyyy
set fff;
xxxxxx
MLOGIC: run;
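One caveat with this pattern: the lazy .*?^set is still allowed to run past a run line, so when a block without set comes before a block with set, both blocks end up in one match. A sketch that forbids crossing run uses a tempered token, (?:(?!\brun\b).)*?, which consumes a character only where the whole word run does not start there. It assumes GNU grep with -P; sample2.log is a hypothetical file holding the second example from the question with its two blocks swapped.

```shell
#!/bin/sh
# Hypothetical file: the second example from the question with its blocks swapped,
# so the block without "set" comes first.
printf '%s\n' 'data fff;' 'yyyyyy' 'run;' 'MPRINT: data xxx;' 'fffff' \
    'NOTE: ffdd' 'set fff;' 'xxxxxx' 'MPRINT: run;' > sample2.log

# (?:(?!\brun\b).)*? lazily consumes characters but never the start of the word
# "run", so a block matches only if its own "set" line comes before its own "run".
grep -ozP '(?ms)^[^\n]*\bdata (?:(?!\brun\b).)*?^set\b.*?\brun\b[^\n]*\n' sample2.log |
    tr -d '\0'
```

Only the MPRINT: data xxx; block is printed; the data fff; block is skipped because no set line appears before its run.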
You may use this awk command to print the lines between the two keywords only when the block contains a line starting with set:
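awk '
/data / { p = 1 }             # a "data " line opens a block
p && !y {                     # inside a block, before any "set" line
    if (/^set/)
        y = 1                 # the block qualifies from this line on
    else
        buf = buf $0 ORS      # hold earlier lines until we know
}
y {                           # qualifying block: flush held lines, then print
    if (buf != "")
        printf "%s", buf
    buf = ""
    print
}
/run/ { p = y = 0; buf = "" } # a "run" line closes the block and drops held lines
' file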
MPRINT: data xxxxx;
yyyyy
set fff;
xxxxxx
MLOGIC: run;
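A slightly different awk sketch collects each whole block and decides at the run line whether to print it, so the lines of a block without a matching line can never leak into the next block. The pattern is passed in with -v, so the (set|infile) alternation from the question's final solution can be used; the variable name want and the file sample3.log are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: one block with neither set nor infile, then one of each.
printf '%s\n' 'data a;' 'x = 1;' 'run;' 'MPRINT: data b;' 'infile foo;' 'run;' \
    'data c;' 'set d;' 'MPRINT: run;' > sample3.log

# want: some line of the block must match this ERE for the block to be printed.
awk -v want='^(set|infile)' '
/data / { inblock = 1; buf = ""; hit = 0 }  # a "data " line starts a new block
inblock {
    buf = buf $0 ORS                        # keep every line of the current block
    if ($0 ~ want) hit = 1                  # remember that a wanted line was seen
}
inblock && /run/ {                          # a "run" line closes the block
    if (hit) printf "%s", buf               # print the block only if it qualified
    inblock = 0
}
' sample3.log
```

Buffering the whole block costs memory proportional to the longest block, which is usually negligible for SAS data steps.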
If you just want to print everything between the two keywords in awk, it is as simple as:
awk '/data /,/run/' file
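Since sed was mentioned in the question, the same inclusive range can also be written with sed -n and the p command. A minimal sketch on the first example, with sample.log as a hypothetical file name:

```shell
#!/bin/sh
# Hypothetical scratch file holding the first example from the question.
printf '%s\n' 'MPRINT: data xxxxx;' 'yyyyy' 'xxxxxx' 'MPRINT: run;' \
    'fffff' 'yyyyy' 'data fff;' 'fffff' 'run;' > sample.log

# awk: a range pattern prints every line from a "data " line through the next "run" line.
awk '/data /,/run/' sample.log

# sed: same range; -n turns off automatic printing and p prints the lines in range.
sed -n '/data /,/run/p' sample.log
```

One difference: sed only starts looking for the closing run on the line after the data line, whereas awk also checks the data line itself, so a one-line block such as data x; run; ends immediately in awk but not in sed.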
Answered By - anubhava