Issue
This question is very similar to this one, except that I want to do it as fast as possible, making only a single pass over the (unfortunately gzip-compressed) file.
Given the pattern CAPTURE and the input
1:.........
...........
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
...........
1000:......
Print:
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
Can this be accomplished with a regular expression?
I vaguely remember that this kind of grammar cannot be captured by a regular expression, but I'm not quite sure, as regular expressions these days provide lookaheads, etc.
Solution
You can buffer the lines until you see a line that contains CAPTURE, treating the first occurrence of the pattern specially.
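#!/usr/bin/env perl
use warnings;
use strict;

my $first = 1;   # true until the first CAPTURE line has been seen
my @buf;         # lines read since the last CAPTURE line

while ( my $line = <> ) {
    # Before the first match there is nothing worth keeping.
    push @buf, $line unless $first;
    if ( $line =~ /CAPTURE/ ) {
        if ($first) {
            @buf   = ($line);   # output starts at the first matching line
            $first = 0;
        }
        # Flush everything up to and including this match.
        print @buf;
        @buf = ();
    }
}
# Lines buffered after the last match are never printed.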
Feed the input into this program via zcat file.gz | perl script.pl.
This can, of course, be jammed into a one-liner if need be:
zcat file.gz | perl -ne '$x&&push@b,$_;if(/CAPTURE/){$x||=@b=$_;print@b;@b=()}'
"Can this be accomplished with a regular expression?"
You mean in a single pass, in a single regex? If you don't mind reading the entire file into memory, sure... but this is obviously not a good idea for large files.
zcat file.gz | perl -0777ne '/((^.*CAPTURE.*$)(?s:.*)(?2)(?:\z|\n))/m and print $1'
Answered By - haukex