Issue
I have an M3U playlist that looks something like this:
#EXTM3U
#EXTINF:-1 tvg-id="wsoc.us" tvg-name="ABC 9 (Something) (WSOC)" tvg-logo="" group-title="US Locals",ABC 9 (Something) WSOC (WSOC)
http://some.url/1
#EXTINF:-1 tvg-id="wbtv.us" tvg-name="CBS 3 WBTV (WBTV)" tvg-logo="" group-title="US Locals",CBS 3 WBTV (WBTV)
http://some.url/2
#EXTINF:-1 tvg-id="wcnc.us" tvg-name="NBC (Hey) 36 WCNC (WCNC)" tvg-logo="" group-title="US Locals (Something here)",NBC 36 (Hey) WCNC (WCNC)
http://some.url/3
#EXTINF:-1 tvg-id="wjzy.us" tvg-name="FOX 46 WJZY (Shout Out) (WJZY)" tvg-logo="" group-title="US Locals",FOX 46 WJZY (Shout Out) (WJZY)
http://some.url/4
I'm looking to get the last parenthesized entry in the tvg-name field, without the parentheses: for example, WSOC, WBTV, WCNC, etc.
This works:
grep -Po 'tvg-name=\".*?\"' Playlist.m3u | awk -F'(' '{print $NF}' | cut -f1 -d")" | sort -u
But I know there has got to be a better way than chaining grep, awk, and cut. It's been driving me nuts.
Solution
Using just a regex with GNU grep:
grep -oP 'tvg-name.*\(\K\w+(?=\))' /tmp/file.m3u
The regular expression matches as follows:
Node | Explanation |
---|---|
tvg-name | 'tvg-name' |
.* | any character except \n (0 or more times (matching the most amount possible)) |
\( | ( |
\K | resets the start of the match (what is Kept) as a shorter alternative to using a look-behind assertion: look arounds and Support of \K in regex |
\w+ | word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible)) |
(?= | look ahead to see if there is: |
\) | ) |
) | end of look-ahead |
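Because `.*` is greedy, the pattern above actually captures the last parenthesized word on the whole line, which in the sample happens to match the one inside tvg-name. As a hedged sketch only, the variant below restricts the match to the tvg-name value by using `[^"]*` instead of `.*`. The function name `last_tvg_name_tag` and the temporary sample file are assumptions made for illustration, and GNU grep with PCRE support (`-P`) is required.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer): print the last
# parenthesized word found inside tvg-name="...". Needs GNU grep with -P.
last_tvg_name_tag() {
    # [^"]* cannot cross the closing quote, so the match stays inside the
    # tvg-name value; the look-ahead requires the word to end with )".
    grep -oP 'tvg-name="[^"]*\(\K\w+(?=\)")' "$@"
}

# Two entries copied from the playlist in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#EXTM3U
#EXTINF:-1 tvg-id="wsoc.us" tvg-name="ABC 9 (Something) (WSOC)" tvg-logo="" group-title="US Locals",ABC 9 (Something) WSOC (WSOC)
http://some.url/1
#EXTINF:-1 tvg-id="wcnc.us" tvg-name="NBC (Hey) 36 WCNC (WCNC)" tvg-logo="" group-title="US Locals (Something here)",NBC 36 (Hey) WCNC (WCNC)
http://some.url/3
EOF

last_tvg_name_tag "$sample"   # prints WSOC and WCNC, one per line
rm -f "$sample"
```

With no file argument the helper reads standard input, so it can also sit at the end of a pipeline.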
Or using a proper m3u parser:
This requires installing the CPAN module first:
cpan Parse::M3U::Extended
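#!/usr/bin/env perl
use strict; use warnings;
use Parse::M3U::Extended qw(m3u_parser);
use File::Slurp;
use feature 'say';

# Read the whole playlist and parse it into a list of items.
my $m3u = read_file('/tmp/file.m3u');
my @items = m3u_parser($m3u);

foreach my $item (@items) {
    # Only the #EXTINF directives carry the tvg-name metadata.
    if ($item->{type} eq "directive" and $item->{tag} eq "EXTINF") {
        $_ = $item->{value};
        # Replace everything up to the last "(word)" with just the word, e.g. WSOC.
        s/.*\((\w+)\)/$1/;
        say;
    }
}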
This has the advantage of being reliably reusable for other use cases, which is not the case with ad hoc awk, sed, etc.
Output:
WSOC
WBTV
WCNC
WJZY
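The pipeline in the question ended with `sort -u`, while the grep one-liner prints one result per entry. If duplicates matter, a minimal sketch would be to pipe the same regex through `sort -u`. The function name `unique_tags` is a hypothetical wrapper introduced here, and GNU grep with `-P` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (an assumption, not from the original answer):
# run the answer's regex, then drop duplicates as the question's pipeline did.
unique_tags() {
    # sort -u sorts the extracted words and keeps one copy of each.
    grep -oP 'tvg-name.*\(\K\w+(?=\))' "$@" | sort -u
}

# Example call, using the path from the answer:
#   unique_tags /tmp/file.m3u
```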
Answered By - Gilles Quénot Answer Checked By - Gilberto Lyons (WPSolving Admin)