Tuesday, October 25, 2022

[SOLVED] Regex only inside multiline match

Issue

I have an old app that generates something like:

USERLIST (
    "jasonr"
    "jameso"
    "tommyx"
)
ROLELIST (
    "op"
    "admin"
    "ro"
)

I need some form of regex that changes ONLY the USERLIST section to USERLIST("jasonr", "jameso", "tommyx") and leaves the rest of the text intact:

USERLIST("jasonr", "jameso", "tommyx")
ROLELIST (
    "op"
    "admin"
    "ro"
)

In addition to the multiline issue, I don't know how to handle the replacement in only part of the string. I've tried perl (-0pe) and sed but can't find a solution. I don't want to write an app to do this; surely there is a way...


Solution

perl -0777 -wpe '
    s{USERLIST\s*\(\K ([^)]+) }{ join ", ", $1 =~ /("[^"]+")/g }ex' file

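On the shown input file this prints the desired output, except that the space between USERLIST and ( is kept, since everything matched before \K is left untouched. The command is broken over lines for easier viewing.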

With the -0777 switch the whole file is read at once into a string ("slurped") and is thus in $_, so a pattern can match across line breaks. With the /x modifier literal whitespace in the pattern is ignored, so it can be used for readability.

Explanation

  • Capture what follows USERLIST (, up to the first closing parenthesis. This assumes there is no such paren inside USERLIST( ... ). The \K construct ("keep", often used in place of a lookbehind) leaves everything matched before it in the string (it is not "consumed") and excludes it from $&, so we don't have to re-enter it on the replacement side

  • The replacement side is evaluated as code, courtesy of the /e modifier. In it we capture all double-quoted substrings from the initial $1 capture (assuming no nested quotes) and join that list with ", " (a comma and a space). The resulting string then replaces what was in the parentheses following USERLIST



Answered By - zdim
Answer Checked By - Katrina (WPSolving Volunteer)