Wednesday, January 12, 2022

[SOLVED] List lines between 2 keywords using grep/sed/awk

January 12, 2022 grep, linux, regex

Issue

I have a SAS log file and I want to list only the lines that lie between two words: data and run.

The file can contain these words many times, on many lines, for example:

MPRINT: data xxxxx;
yyyyy
xxxxxx
MPRINT: run;

fffff
yyyyy

data fff;
fffff
run;

I would like to have lines 1-4 and 8-10.

I tried something like egrep -iz file -e '\sdata\s+\S*\s+(.|\s)*\srun\s', but this expression lists everything between the first data and the last run (the (.|\s) is there to match newline characters).

I may also want to require additional words between data and run, like:

MPRINT: data xxx;
fffff
NOTE: ffdd
set fff;
xxxxxx
MPRINT: run;

data fff;
yyyyyy
run;

In some cases I would like to list only the lines between data and run where the word set appears on some line.

I know there are many similar threads, but I didn't find one where the keywords can repeat multiple times. I'm not familiar with awk or sed, but if they can help I can use them too.

[Edit]
Note that data and run are not necessarily at the beginning of the line (I updated the example). Also, there can't be another data between data and run.

[Edit2]
As Tom noted, every line that I was looking for started with MPRINT(...):, so I filtered those lines first.
Anubhava's answer helped me the most with my final solution, so I marked it as the answer.
The final expression looked like this:

grep -o path -e 'MPRINT.*' | cut -f '2-' -d ' ' |
grep -iozP '(?ms) data [^\(;\s]+.*?(set|infile).*?run[^\n]*\n'
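To make the first stage of that pipeline concrete, here is a small sketch of what grep -o followed by cut does to the log. The sample lines, the macro name MYMACRO, and the three spaces after the colon are assumptions made for illustration only; real logs may differ.

```shell
# Hypothetical log excerpt; MYMACRO and the spacing after the colon are
# assumptions about how the MPRINT lines look.
log=$(mktemp)
cat > "$log" <<'EOF'
MPRINT(MYMACRO):   data xxx;
NOTE: ffdd
MPRINT(MYMACRO):   set fff;
MPRINT(MYMACRO):   run;
EOF

# grep -o keeps only the part of each line from MPRINT onwards and drops
# lines without it; cut then removes the first space-delimited field.
stripped=$(grep -o 'MPRINT.*' "$log" | cut -d ' ' -f 2-)
printf '%s\n' "$stripped"
rm -f "$log"
```

With this input the NOTE line is dropped and the remaining statements keep some leading spaces, which would be consistent with the second grep looking for data preceded by a space.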

Solution

You may use this GNU grep command with the -P (PCRE) option:

grep -ozP '(?ms).*?data .*?run[^\n]*\n' file
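One thing to watch for with the command above: because (?s) lets . match newlines, the leading .*? can pull in lines that sit between one run and the next data. The sketch below is one possible variant, not part of the original answer, that anchors each match at the start of the line containing data. The sample file is made up for illustration, and GNU grep built with PCRE support is assumed.

```shell
# Hypothetical sample log, written to a temp file for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
MPRINT: data xxxxx;
yyyyy
xxxxxx
MPRINT: run;

fffff
yyyyy

data fff;
fffff
run;
EOF

# (?m) makes ^ match at every line start; [^\n]* keeps the prefix on the
# same line as "data ", so text between blocks is not pulled in.
# With -z each match is followed by a NUL, which tr turns into a newline.
blocks=$(grep -ozP '(?ms)^[^\n]*data .*?run[^\n]*\n' "$log" | tr '\0' '\n')
printf '%s\n' "$blocks"
rm -f "$log"
```

On this sample the output should be the two data ... run blocks separated by a blank line, without the fffff and yyyyy lines that lie between them.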

If you only want to print blocks that contain a line starting with set, use:

grep -ozP '(?ms).*?data .*?^set.*?run[^\n]*\n' file

MPRINT: data xxxxx;
yyyyy
set fff;
xxxxxx
MLOGIC: run;

You may use this awk command to print the blocks between the two keywords that contain a line starting with set:

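awk '/data / {
   p = 1                    # entered a data ... run block
}
p && !y {
   if (/^set/)
      y = 1                 # block contains a line starting with set
   else
      buf = buf $0 ORS      # hold lines until set is seen
}
y {
   if (buf != "")
      printf "%s", buf      # flush the held lines once
   buf = ""
   print
}
/run/ {
   p = y = 0
   buf = ""                 # discard a block that never had set
}' file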

MPRINT: data xxxxx;
yyyyy
set fff;
xxxxxx
MLOGIC: run;
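If set does not start at the beginning of a line (for example when it sits behind an MPRINT prefix), one alternative is to buffer the whole block and decide at run whether to print it. The following is a minimal sketch under that assumption, not part of the original answer; the sample file and the variable names inblock, keep, and buf are illustrative.

```shell
# Hypothetical sample log where set is not at the start of its line.
log=$(mktemp)
cat > "$log" <<'EOF'
MPRINT: data xxx;
fffff
NOTE: ffdd
MPRINT: set fff;
xxxxxx
MPRINT: run;

data fff;
yyyyyy
run;
EOF

with_set=$(awk '
/data / { inblock = 1; keep = 0; buf = "" }   # a block starts: reset state
inblock {
    buf = buf $0 ORS                          # collect every line of the block
    if ($0 ~ /(^|[^A-Za-z0-9_])set([^A-Za-z0-9_]|$)/)
        keep = 1                              # "set" seen as a whole word
}
inblock && /run/ {
    if (keep) printf "%s", buf                # print only blocks containing set
    inblock = 0
}' "$log")
printf '%s\n' "$with_set"
rm -f "$log"
```

On this sample only the first block is printed, since the second one has no set. Extending the word test to (set|infile) would mirror the asker's final expression.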

If you just want to print everything between the two keywords, awk makes it simple:

awk '/data /,/run/' file
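Since the title also mentions sed, the same range idea can be sketched there as well. This is an illustrative equivalent rather than part of the original answer, and the sample file is made up.

```shell
# Hypothetical sample log, written to a temp file for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
MPRINT: data xxxxx;
yyyyy
xxxxxx
MPRINT: run;

fffff
yyyyy

data fff;
fffff
run;
EOF

# -n suppresses automatic printing; p prints only lines inside a range that
# opens on a line matching "data " and closes on the next line matching "run".
sed_out=$(sed -n '/data /,/run/p' "$log")

# The awk range pattern selects the same lines on this sample.
awk_out=$(awk '/data /,/run/' "$log")

printf '%s\n' "$sed_out"
rm -f "$log"
```

Both commands print the lines of each block back to back with no separator. In either tool, a data that is never followed by a run keeps the range open until the end of the file.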

Answered By - anubhava

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
