Tuesday, July 26, 2022

[SOLVED] Excluding a string in a regex matching, for sed processing

Issue

I need to match this for a substitute command:

whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever

I am trying with:

sed -e 's/__\(.*\)__/\{{\1}}/g' myfile

But this matches greedily, taking everything from the first __ to the last one (__MATCH_THIS__whateverwhatever__AND_THIS__) and producing:

whatever{{MATCH_THIS__whateverwhatever__AND_THIS}}whateverwhatever

But I wanted:

whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever

How can I specify a string to exclude, in the matching part? I know how to exclude one character (for example [^a]) but not how to exclude a string.


Solution

What you need is a non-greedy regex, but unfortunately sed's regular expressions don't have non-greedy quantifiers. However, it can be done in perl.

perl -pe 's|__(.*?)__|{{$1}}|g' <myfile

The question mark after the asterisk makes the quantifier non-greedy, so instead of taking the longest string that matches, it takes the shortest.

Hope that helps.

If you wanted to put this in a perl script rather than run on the command line, then something like this will do the job:

#! /usr/bin/perl -w
use strict; # Habit of mine
use 5.010;  # So we can use 'say'

# Save the matching expression in a variable. 
# qr// tells us it's a regex-like quote (http://perldoc.perl.org/functions/qr.html)
my $regex = qr/__(.*?)__/;

# Ordinarily, I'd write this in a way I consider to be less perl-y and more readable.
# What it's doing is reading each line of the file named on the command line (or of
# STDIN if no file is given) into $_. Then it runs the substitution on the line,
# before printing out the result.
while (<>) {
  chomp;  # Drop the trailing newline, because 'say' adds one back
  $_ =~ s/$regex/{{$1}}/g;
  say $_;
}

Usage is simple:

./regex myfile
whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever

It's Perl, there are a million and one ways to do it!
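If you have to stay with sed, one possible workaround is to spell out "anything that doesn't contain __" using only character classes: a run of non-underscores, optionally followed by single underscores that are each followed by at least one non-underscore. This is only a sketch. It assumes the text between the markers never starts or ends with an underscore and never contains two underscores in a row, and the function name to_braces is made up for the example.

```shell
# to_braces: hypothetical helper that rewrites __NAME__ as {{NAME}}.
# [^_]*               a run of non-underscore characters
# \(_[^_][^_]*\)*     then any number of "single underscore + non-underscores"
# Since the inner part can never contain "__", the match has to stop at the
# nearest closing "__" even though sed's quantifiers are greedy.
to_braces() {
  sed 's/__\([^_]*\(_[^_][^_]*\)*\)__/{{\1}}/g'
}

printf '%s\n' 'whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever' | to_braces
# whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever
```

To run it on a file, use to_braces <myfile.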



Answered By - chooban