Issue
I have a text file written in a markup language (similar to Wikipedia articles):
cat test.txt
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
I need to select only the word "having:", because it is part of the regular text. I tried:
grep -v '[*:*]' test.txt
This drops every line that contains a colon (and with it the tags), but it does not select the expected word.
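To reproduce the attempt, the sketch below recreates test.txt from the sample above and runs the same grep. In '[*:*]' the brackets form a single bracket expression matching any line that contains "*" or ":", so -v removes the whole first line. The second command, grep -o with '[^ ]*:', is an addition that is not from the question: it prints only the matching words, but a plain regex cannot tell regular text from text inside [[...]], so it also returns "double:".

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '%s\n' \
  'This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.' \
  'and second line with no [brackets] colon in it.' > test.txt

# [*:*] matches any line containing "*" or ":", so -v keeps only line 2.
grep -v '[*:*]' test.txt

# -o prints each word ending in ":", including the one inside the brackets.
grep -o '[^ ]*:' test.txt
```

The first command prints only "and second line with no [brackets] colon in it."; the second prints "having:" and "double:".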
Solution
A combined solution using sed and awk:
sed 's/ /\n/g' test.txt | gawk 'i==0 && $0~/:$/{ print $0 }/\[/{ i++} /\]/ {i--}'
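- sed will change every space to a newline, so each word ends up on its own line.
- awk (or gawk) will output every line matching $0~/:$/ (ending with a colon), as long as i equals zero.
- The last part of the awk program keeps a count of the bracket levels: i is incremented for each word containing [ and decremented for each word containing ].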
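The same pipeline can be wrapped in a reusable function. This is a sketch under two assumptions: the function name is hypothetical, and tr replaces sed 's/ /\n/g' for the split, because a newline in the replacement is a GNU sed extension. The awk program uses only POSIX features, so plain awk works in place of gawk.

```shell
#!/bin/sh
# Hypothetical helper: print words ending in ":" that are not inside [...].
colon_words_sed_awk() {
  tr ' ' '\n' < "$1" |
    awk '
      i == 0 && /:$/ { print }   # a word ending in ":" outside any brackets
      /\[/ { i++ }               # a word containing "[" opens a level
      /\]/ { i-- }               # a word containing "]" closes a level
    '
}
```

Run as colon_words_sed_awk test.txt, it prints "having:" for the sample file. Note that the counter moves once per word, not once per bracket, which is enough for [[...]] groups that open and close on separate words.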
Another solution using sed and grep:
sed -r -e 's/\[.*\]+//g' -e 's/ /\n/g' test.txt | grep ':$'
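- 's/\[.*\]+//g' will remove the text between brackets, including the brackets. Because .* is greedy, it removes everything from the first [ to the last ] on a line.
- 's/ /\n/g' will replace each space with a newline.
- grep will only find lines ending with a colon (:).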
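Because '\[.*\]+' is greedy, a line such as "[a] keep: [b]" loses "keep:" as well. A minimal sketch of a per-group variant, assuming hypothetical function names and using tr for the split: '[^]]*' cannot run past a "]", so each bracket group is removed on its own. -E is used instead of -r because both GNU and BSD sed accept it.

```shell
#!/bin/sh
# Original expression: removes from the first "[" to the last "]" on a line.
colon_words_greedy() {
  sed -E 's/\[.*\]+//g' "$1" | tr ' ' '\n' | grep ':$'
}

# Variant: removes each [...] group separately, keeping the text between them.
# Limitation: brackets nested inside text, e.g. "[[a [b] c]]", leave "c]]" behind.
colon_words_per_group() {
  sed -E 's/\[[^]]*\]+//g' "$1" | tr ' ' '\n' | grep ':$'
}
```

Both print "having:" for the sample file; on "[a] keep: [b]" the greedy version prints nothing, while the per-group version prints "keep:".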
A third one using only awk:
gawk '{
  for (t = 1; t <= NF; t++) {
    if (i == 0 && $t ~ /:$/) print $t;
    i = i + gsub(/\[/, "", $t) - gsub(/\]/, "", $t)
  }
}' test.txt
- gsub returns the number of replacements it made.
- The variable i is used to count the level of brackets. On every [ it is incremented by 1, and on every ] it is decremented by 1. This works because gsub(/\[/,"",$t) returns the number of replaced characters: for a token like [[][ the count is increased by 3 - 1 = 2.
- When a token has brackets AND a colon, my code will fail: if the token ends with a :, it is matched before the brackets in it are counted.
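One way around the failure case described above (a token such as "[[foo:" being printed before its brackets are counted) is to skip the token split entirely and scan each line character by character, so every bracket is counted before a word is tested. This is a sketch, not the answer's method; the function name is hypothetical. Like i in the original, depth carries over between lines. Bracket characters are dropped from words, so an odd token like "foo[bar]:" would come out as "foo:".

```shell
#!/bin/sh
# Hypothetical helper: character-level scan that ignores everything inside [...].
colon_words_scan() {
  awk '
    {
      word = ""
      for (c = 1; c <= length($0); c++) {
        ch = substr($0, c, 1)
        if (ch == "[") { depth++; continue }
        if (ch == "]") { if (depth > 0) depth--; continue }
        if (depth > 0) continue                  # inside brackets: skip
        if (ch == " " || ch == "\t") {           # word boundary
          if (word ~ /:$/) print word
          word = ""
          continue
        }
        word = word ch
      }
      if (word ~ /:$/) print word                # last word on the line
    }
  ' "$1"
}
```

For the sample file it prints "having:", and for the line "see [[foo: bar]] and real:" it prints only "real:", where the token-based version would also print "[[foo:".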
Answered By - Luuk Answer Checked By - Clifford M. (WPSolving Volunteer)