Thursday, January 4, 2024

[SOLVED] Better regular expression to get a value in parenthesis

January 04, 2024 awk, grep, regex

Issue

I have an M3U playlist that looks something like this:

#EXTM3U
#EXTINF:-1 tvg-id="wsoc.us" tvg-name="ABC 9 (Something) (WSOC)" tvg-logo="" group-title="US Locals",ABC 9 (Something) WSOC (WSOC) 
http://some.url/1
#EXTINF:-1 tvg-id="wbtv.us" tvg-name="CBS 3 WBTV (WBTV)" tvg-logo="" group-title="US Locals",CBS 3 WBTV (WBTV)
http://some.url/2
#EXTINF:-1 tvg-id="wcnc.us" tvg-name="NBC (Hey) 36 WCNC (WCNC)" tvg-logo="" group-title="US Locals (Something here)",NBC 36 (Hey) WCNC (WCNC)
http://some.url/3
#EXTINF:-1 tvg-id="wjzy.us" tvg-name="FOX 46 WJZY (Shout Out) (WJZY)" tvg-logo="" group-title="US Locals",FOX 46 WJZY (Shout Out) (WJZY)
http://some.url/4

I'm looking to get the last parenthesized entry in the tvg-name field, without the parentheses: for example, WSOC, WBTV, WCNC, etc.

This works:

grep -Po 'tvg-name=\".*?\"'  Playlist.m3u | awk -F'(' '{print $NF}' | cut -f1 -d")" | sort -u

But I know there has got to be a better way than chaining grep, awk, and cut. It's been driving me nuts.

Solution

Using just a regex with `GNU` `grep`:

grep -oP 'tvg-name.*\(\K\w+(?=\))' /tmp/file.m3u

The regular expression matches as follows:

Node	Explanation
`tvg-name`	'tvg-name'
`.*`	any character except \n (0 or more times (matching the most amount possible))
`\(`	(
`\K`	resets the start of the match (what is `K`ept), a shorter alternative to a look-behind assertion
`\w+`	word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
`(?=`	look ahead to see if there is:
`\)`	)
`)`	end of look-ahead
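
Because `.*` is greedy, the pattern above actually returns the last parenthesized word on the whole line; in the sample this happens to equal the tvg-name value because the title after the comma ends with the same tag. The following is a minimal sketch of a stricter variant that stays inside the quoted field. It assumes GNU grep built with `-P` support and assumes the wanted value is always immediately followed by `)"`; the function name `last_tvg_tag` is hypothetical, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last parenthesized word of each tvg-name="..." field.
# Assumes GNU grep with PCRE support (-P).
last_tvg_tag() {
    # [^"]*     cannot cross the closing quote, so the match stays inside the field
    # \(\K      matches "(" and then drops everything matched so far from the output
    # \w+       the value to print
    # (?=\)")   requires ")" followed directly by the field's closing quote
    grep -oP 'tvg-name="[^"]*\(\K\w+(?=\)")' "$@" | sort -u
}

# Usage (the path is only an example):
#   last_tvg_tag /tmp/file.m3u
```

The `sort -u` mirrors the de-duplication step from the question; drop it to keep playlist order.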

Or using a proper m3u parser:

First install the CPAN module:

cpan Parse::M3U::Extended

#!/usr/bin/env perl

use strict; use warnings;

use Parse::M3U::Extended qw(m3u_parser);
use File::Slurp;
use feature 'say';
my $m3u = read_file('/tmp/file.m3u');
my @items = m3u_parser($m3u);

foreach my $item (@items) {
    if ($item->{type} eq "directive" and $item->{tag} eq "EXTINF") {
        $_ = $item->{value};
        s/.*\((\w+)\)/$1/;
        say;
    }
}

This has the advantage of being reliably reusable for other use cases, which is not the case with ad hoc awk or sed one-liners.

Output:

WSOC 
WBTV
WCNC
WJZY

Answered By - Gilles Quénot

Answer Checked By - Gilberto Lyons (WPSolving Admin)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
