Sunday, March 13, 2022

[SOLVED] How to use sed/awk to extract text between two patterns when a specific string must exist in the text block

Issue

I have found several answers on how to extract text between two patterns with sed/awk, but I also need to keep only the text blocks that contain a specific string.

Text example:

<requirement        id = "blabla.1"
                slogan = "Handling of blabla"
          work-package = "bla444.2"
          logical-node = "BLA-C"
                 level = "System"
>
Bla bla.
</requirement>
<requirement        id = "bla.2"
                slogan = "Reporting of blabla"
          work-package = "bla444.1"
          logical-node = "BLA-C"
                 level = "System"
>
Bla bla bla.
</requirement>

So the goal is to get only the text block between <requirement and </requirement> that has bla444.1 in its work-package. In the example, this should return only the last text block. Of course, the real file contains more requirements, and several of them have the needed work-package, so the result will not be limited to the last text block.

sed -e 's/<requirement\(.*\)<\/requirement/\1/' file

The above sed line will give all the text blocks (requirements).
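
A note on why this happens: by default sed applies a substitution to one line at a time, so the pattern can match only if <requirement and </requirement appear on the same line. In the example they never do, so nothing is substituted and the whole file is printed. The following is a minimal sketch of that behavior; the file name sample_req.txt and the shortened block are assumptions made for illustration.

```shell
# Hypothetical sample file holding a single, shortened requirement block.
cat > sample_req.txt <<'EOF'
<requirement        id = "bla.2"
          work-package = "bla444.1"
>
Bla bla bla.
</requirement>
EOF

# sed tries the substitution on each line separately, so the pattern
# never sees "<requirement" and "</requirement" together.
sed_out=$(sed -e 's/<requirement\(.*\)<\/requirement/\1/' sample_req.txt)

# The output equals the input: no line was changed.
if [ "$sed_out" = "$(cat sample_req.txt)" ]; then
  echo "sed left the file unchanged"
fi
```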

Note that the text blocks have no fixed line count, but every block has a work-package line.


Solution

Could you please try the following.

awk '
/^<requirement/{
  if(found && value){
    print value
  }
  found=value=""
}
{
  value=(value?value ORS:"")$0
}
/work-package.*bla444\.1\"$/{
  found=1
}
END{
  if(found && value){
    print value
  }
}
'  Input_file
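
As a quick check, the program can be run against the sample from the question. The sketch below assumes the sample is saved as requirements.txt (a hypothetical file name) and stores the result in a shell variable named out, also chosen only for illustration. The awk logic is the same as above, with the dot in bla444.1 escaped so that it matches only a literal dot.

```shell
# Hypothetical input file containing the two blocks from the question.
cat > requirements.txt <<'EOF'
<requirement        id = "blabla.1"
                slogan = "Handling of blabla"
          work-package = "bla444.2"
          logical-node = "BLA-C"
                 level = "System"
>
Bla bla.
</requirement>
<requirement        id = "bla.2"
                slogan = "Reporting of blabla"
          work-package = "bla444.1"
          logical-node = "BLA-C"
                 level = "System"
>
Bla bla bla.
</requirement>
EOF

# Collect each block in "value" and print it only if "found" was set.
out=$(awk '
/^<requirement/{
  if(found && value){
    print value
  }
  found=value=""
}
{
  value=(value?value ORS:"")$0
}
/work-package.*bla444\.1\"$/{
  found=1
}
END{
  if(found && value){
    print value
  }
}
' requirements.txt)

printf '%s\n' "$out"
```

With this input, only the second block (id "bla.2") is printed, from its <requirement line through </requirement>.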

Explanation: a detailed, line-by-line explanation of the above code.

awk '                           ##Start of the awk program.
/^<requirement/{                ##If the line starts with <requirement, a new block begins, so do the following.
  if(found && value){           ##If the previous block was flagged (found) and value is not empty, do the following.
    print value                 ##Print value, which holds all lines of the previous block.
  }
  found=value=""                ##Reset found and value for the new block.
}
{
  value=(value?value ORS:"")$0  ##Append the current line to value, separated by ORS (a newline by default).
}
/work-package.*bla444\.1\"$/{   ##If the line contains work-package and ends with bla444.1", do the following.
  found=1                       ##Set found to 1 as a flag that the current block should be printed.
}
END{                            ##Start of the END block, which runs after the last line has been read.
  if(found && value){           ##If the last block was flagged and value is not empty, do the following.
    print value                 ##Print the last block.
  }
}
'  Input_file                   ##Name of the input file.
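
If the work-package changes from run to run, it may be more convenient to pass it in as an awk variable. The following is only a sketch of one possible variant, not part of the original answer: it assumes every block is closed by a </requirement> line, and the names wp, block, keep, matches and requirements_multi.txt are hypothetical. It compares with index(), which looks for a literal substring, so the dot in the work-package needs no escaping.

```shell
# Hypothetical input with three short blocks; two of them use bla444.1.
cat > requirements_multi.txt <<'EOF'
<requirement        id = "a.1"
          work-package = "bla444.1"
>
First.
</requirement>
<requirement        id = "a.2"
          work-package = "bla444.2"
>
Second.
</requirement>
<requirement        id = "a.3"
          work-package = "bla444.1"
>
Third.
</requirement>
EOF

matches=$(awk -v wp='bla444.1' '
# A new block starts: forget the previous block and its flag.
/^<requirement/ { block = ""; keep = 0 }
# Collect every line of the current block.
{ block = (block ? block ORS : "") $0 }
# Flag the block if its work-package line contains wp in double quotes.
/work-package/ && index($0, "\"" wp "\"") { keep = 1 }
# The block is complete: print it if flagged, then reset.
/^<\/requirement>/ { if (keep) print block; block = ""; keep = 0 }
' requirements_multi.txt)

printf '%s\n' "$matches"
```

With this input, the blocks with id "a.1" and "a.3" are printed and the block with id "a.2" is skipped. Printing at the closing tag instead of at the next opening tag means no END block is needed, at the cost of relying on the closing line being present.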


Answered By - RavinderSingh13
Answer Checked By - Clifford M. (WPSolving Volunteer)