Saturday, February 19, 2022

[SOLVED] How to do multiple multi-line replacements using text from the pattern match?

Issue

I'm implementing an annotation feature in Bash and am looking for either an awk or sed solution for some text manipulation.

I'd like to transform text in a file from:

^version 10.2 tag1 tag2
^audit arg1 arg2
f()
{
...
}
g()
{
...
}
^version 10.2
h() { ... }
^version 10.2

i() { ... } # Not annotated: doesn't immediately follow an annotation

to:

annotate f^1 version 10.2 tag1 tag2
annotate f^1 audit arg1 arg2
f^1()
{
...
}
g()
{
...
}
annotate h^2 version 10.2
h^2() { ... }

i() { ... } # Not annotated: doesn't immediately follow an annotation

Replacements are done as follows:

  • each line beginning with ^ is replaced by annotate, a space, the name of the function that follows the annotation lines, a ^, an index, a space, and the rest of the line (without its leading ^)
  • the function name is suffixed with a ^ and the same index; the index is then incremented
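As a small illustration of the two rules on single lines, here is a sketch in plain Bash. The helper names annotate_line and suffix_function are hypothetical, and it assumes the function name and index are already known:

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the two rewrite rules on single lines.

# Rule 1: "^rest" becomes "annotate NAME^INDEX rest".
annotate_line() {    # args: annotation-line function-name index
    local line=$1 name=$2 index=$3
    printf 'annotate %s^%s %s\n' "$name" "$index" "${line#\^}"
}

# Rule 2: "NAME(...)" becomes "NAME^INDEX(...)".
suffix_function() {  # args: definition-line index
    local def=$1 index=$2
    local name=${def%%\(*}    # text before the first "("
    printf '%s^%s%s\n' "$name" "$index" "${def#"$name"}"
}

annotate_line '^version 10.2 tag1 tag2' f 1   # annotate f^1 version 10.2 tag1 tag2
suffix_function 'h() { ... }' 2               # h^2() { ... }
```

This only shows the string rewriting; deciding which function an annotation belongs to still needs a pass over the whole file.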

Function names begin in column 1 and are Bash function names, which need not be POSIX-compliant identifiers (see the Bash source code: builtins/declare.def notes that shell function names don't have to be valid identifiers, and in parse.y a function name is a WORD). It was hard to work out the exact rule from reading the source code, so I'll upvote answers that offer a better regex even if they don't address the bigger question. An acceptably imperfect regex for the function part of the pattern is:

^[^'"()]\+\s*(\s*)
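The pattern above is written in sed-style basic regex syntax (\+ for repetition, bare parentheses as literals, and the GNU \s shorthand). awk uses extended regular expressions, so an equivalent there would look roughly like the sketch below. The names func_re_prog and is_func_line are hypothetical, and [ \t] stands in for \s so the check does not rely on GNU extensions:

```shell
#!/usr/bin/env bash
# The awk program lives in a quoted heredoc so that the ' and " inside the
# bracket expression need no shell escaping.
func_re_prog=$(cat <<'AWK'
/^[^'"()]+[ \t]*\([ \t]*\)/ { found = 1 }
END { exit !found }
AWK
)

# Hypothetical helper: succeeds if the line looks like a function definition.
is_func_line() {
    printf '%s\n' "$1" | awk "$func_re_prog"
}

is_func_line 'f()'          && echo 'f(): function'
is_func_line 'my-func () {' && echo 'my-func () {: function'
is_func_line 'echo "x" ()'  || echo 'echo "x" (): not a function'
```

Like the original, this pattern does not reject leading whitespace, so it is no stricter about column 1 than the regex in the question.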

Note that an annotation applies only to the function that immediately follows it. If a function does not immediately follow the annotation lines, the annotations should not be emitted at all.

The solution should be general and not include strings found in the example above (version, audit, f, g, h, etc.).

Solutions must not require utilities/packages that are not found in CentOS 7 Minimal. So, unfortunately, Perl cannot be considered. I would prefer an awk solution.

Your answer will be used to improve the code for an open-source Bash project: Eggsh.


Solution

Try something like this:

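# Queue each annotation line, minus its leading ^.
/^\^/ { acc[++ann] = substr($0, 2); next }
# A function definition directly after annotations: emit them, then rename it.
ann && /^[^'"()]+[ \t]*\([ \t]*\)/ {
    fname = substr($0, 1, index($0, "(") - 1)
    sub(/[ \t]+$/, "", fname)    # drop whitespace between the name and "("
    count++                      # the index advances only when it is used
    for (i = 1; i <= ann; i++)
        print "annotate " fname "^" count " " acc[i]
    print fname "^" count substr($0, length(fname) + 1)
    ann = 0
    next
}
# Any other line discards pending annotations and is printed unchanged.
{ ann = 0; print }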

Note that I have not bothered to do the research necessary to find a better function name regexp.
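To try the approach end to end from Bash, something along these lines should work. It is a sketch, not a drop-in: annotate_prog and annotate_stream are hypothetical names, and the embedded program is a variant of the script above that uses the question's regex translated to awk's extended syntax and advances the index only when a function is actually annotated.

```shell
#!/usr/bin/env bash
# Sketch only: annotate_prog and annotate_stream are hypothetical names.
annotate_prog=$(cat <<'AWK'
# Queue each annotation line, minus its leading ^.
/^\^/ { acc[++ann] = substr($0, 2); next }
# A function definition directly after annotations: emit them, then rename it.
ann && /^[^'"()]+[ \t]*\([ \t]*\)/ {
    fname = substr($0, 1, index($0, "(") - 1)
    sub(/[ \t]+$/, "", fname)    # drop whitespace between the name and "("
    count++
    for (i = 1; i <= ann; i++)
        print "annotate " fname "^" count " " acc[i]
    print fname "^" count substr($0, length(fname) + 1)
    ann = 0
    next
}
# Any other line discards pending annotations and is printed unchanged.
{ ann = 0; print }
AWK
)

# Reads the text to transform on stdin and writes the result to stdout.
annotate_stream() { awk "$annotate_prog"; }

annotate_stream <<'EOF'
^version 10.2
h() { ... }
^version 10.2

i() { ... }
EOF
```

With that input, the first annotation is attached to h (as h^1), while the second one is dropped because a blank line separates it from i().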



Answered By - Michael Vehrs
Answer Checked By - Katrina (WPSolving Volunteer)