Friday, April 15, 2022

[SOLVED] How to format text with multiple separators

Issue

I'd like to extract/format some text with awk.

The source text looks like this:

Section 1:
  main_command some_command1 some_subcommand1      # comment 1
  main_command some_command1 some_subcommand2      # comment 2

Section 2:
  main_command some_command2 some_subcommand3      # comment 3
  main_command some_command2 some_subcommand4      # comment 4

Section 3:
  main_command some_command3 some_subcommand5      # comment 5
  main_command some_command3 some_subcommand6      # comment 6

I want to know how to:

  1. filter to the indented lines under Section 2;
  2. specify which column I want (2 or 3); and
  3. extract the comments (after # ).

For example, if I chose column 2 the output would be:

some_command2<tab>'comment 3'
some_command2<tab>'comment 4'

I've used awk to achieve 1 and 2:

  awk -v RS='\n\n' '/^Section 2:/' "$path" | awk '/^  main_command/ {print $2}'

... but I suspect there's a better way to do it all in a single command, without piping. I'm open to using other tools (e.g. sed).


Solution

You may use this awk solution, which should work with any POSIX-compliant awk:

awk -v sq="'" -v OFS='\t' -v n=1 '
$1 == "Section" {
   p = ($2 == "2:")
   next
}
NF && p {
   s = $0
   sub(/^[^#]*#[[:blank:]]*/, "", s)
   print $n, sq s sq
}' file

main_command    'comment 3'
main_command    'comment 4'
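
To make both the section and the column selectable, the same flag-based logic can be wrapped in a small shell function. This is only a sketch: the function name extract_section and the awk variable want are made up for illustration, the sample file is recreated from the question, and [ \t] is used in place of [[:blank:]] on the assumption that some older awks lack POSIX character classes.

```shell
# extract_section FILE SECTION COLUMN
# Hypothetical helper: prints the chosen column and the quoted comment
# for every non-blank line under "Section SECTION:".
extract_section() {
  awk -v sq="'" -v OFS='\t' -v want="$2:" -v n="$3" '
    $1 == "Section" {            # header line: switch printing on or off
      p = ($2 == want)
      next
    }
    NF && p {                    # non-blank line inside the wanted section
      s = $0
      sub(/^[^#]*#[ \t]*/, "", s)   # drop everything up to and including "# "
      print $n, sq s sq
    }' "$1"
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Section 1:
  main_command some_command1 some_subcommand1      # comment 1
  main_command some_command1 some_subcommand2      # comment 2

Section 2:
  main_command some_command2 some_subcommand3      # comment 3
  main_command some_command2 some_subcommand4      # comment 4

Section 3:
  main_command some_command3 some_subcommand5      # comment 5
  main_command some_command3 some_subcommand6      # comment 6
EOF

# Column 3 of Section 2.
result=$(extract_section "$sample" 2 3)
printf '%s\n' "$result"
rm -f "$sample"
```

Passing the header text through -v keeps the awk program itself single-quoted, so the shell never touches $1, $2 or $n inside it.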

Using n=2 for printing column 2:

awk -v sq="'" -v OFS='\t' -v n=2 '$1 == "Section" {p = ($2 == "2:"); next} NF && p {split($0, a, /#[[:blank:]]*/); print $n, sq a[2] sq}' file

some_command2   'comment 3'
some_command2   'comment 4'
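
Since the attempt in the question splits the input on blank lines, here is a single-process variant of that idea using awk's paragraph mode (RS set to the empty string). Treat it as a sketch that assumes every section is separated by a blank line and that the "Section N:" header is the first line of its block. The array w and the variable c are illustrative names only.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Section 1:
  main_command some_command1 some_subcommand1      # comment 1
  main_command some_command1 some_subcommand2      # comment 2

Section 2:
  main_command some_command2 some_subcommand3      # comment 3
  main_command some_command2 some_subcommand4      # comment 4

Section 3:
  main_command some_command3 some_subcommand5      # comment 5
  main_command some_command3 some_subcommand6      # comment 6
EOF

# RS='' makes each blank-line-separated block one record;
# FS='\n' makes each line of that block one field.
result=$(awk -v RS='' -v FS='\n' -v sq="'" -v n=2 '
  $1 ~ /^Section 2:/ {
    for (i = 2; i <= NF; i++) {       # fields 2..NF are the indented lines
      split($i, w, " ")               # " " splits on runs of whitespace
      c = $i
      sub(/^[^#]*#[ \t]*/, "", c)     # keep only the text after "# "
      print w[n] "\t" sq c sq
    }
  }' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Compared with the flag-based version, this reads a whole section at once, which is convenient for small files but relies on the blank-line layout staying consistent.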


Answered By - anubhava
Answer Checked By - David Marino (WPSolving Volunteer)