Thursday, February 17, 2022

[SOLVED] Parsing only first regex match in a line with several matches

Issue

Is it possible to write a regex that extracts only a1bcdea1 from the line a1bcdea1ABCa1DEFa1?

This grep command does not work:

$ cat txtfile
a1bcdea1ABCa1DEFa1
$ grep -oE "[A-Z,a-z]1.*?[A-Z,a-z]1" txtfile
a1bcdea1ABCa1DEFa1

I want the output of grep to be only a1bcdea1.

EDIT:

It is obvious that I can just use grep -o "a1bcdea1" for the above line, but consider a file with several thousand lines, where the goal is to match only the FIRST [A-Z,a-z]1.*?[A-Z,a-z]1 on each line.


Solution

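How about using a ^ start anchor and restricting the character set allowed between the two markers (note that the comma in [A-Z,a-z] matches a literal comma, so [A-Za-z] is used here):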

grep -o '^[A-Za-z]1[A-Za-z]*1'
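
As a minimal sketch, the anchored pattern can be tried on the sample line from the question. The function name first_anchored_match and the second input line are assumptions made for illustration only; they are not part of the original answer.

```shell
# Hypothetical helper: for each input line, print the leading
# "<letter>1<letters>1" prefix, if the line starts with one.
first_anchored_match() {
    # ^ pins the match to the start of the line; [A-Za-z]* cannot cross a
    # digit, so the match ends at the first "1" that follows the letters.
    grep -o '^[A-Za-z]1[A-Za-z]*1'
}

# The first line is the sample from the question; the second is a made-up
# line that does not start with the pattern.
printf '%s\n' 'a1bcdea1ABCa1DEFa1' '99a1bc1' | first_anchored_match
```

This should print a1bcdea1 for the first line and nothing for the second, because the second line does not begin with a letter followed by 1.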

See this Bash demo or Regex Pattern at regex101

If you expect digits or other characters in between, go with this:

grep -oP '^[A-Za-z]1.*?[A-Za-z]1'

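Lazy matching with .*? requires Perl-compatible mode (-P). With -E the ? does not make the quantifier lazy, which is why the command in the question matched the whole line. If the match does not start at the beginning of the line, go with this: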

grep -oP '^.*?\K[A-Za-z]1.*?[A-Za-z]1'

\K resets the start of the reported match, so whatever ^.*? consumed before it is dropped from the output. Like lazy quantifiers, it is a PCRE feature.
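
A short sketch of the \K variant on two inputs, assuming GNU grep built with PCRE support (the -P option is not available in every grep). The function name first_match_anywhere and the second input line are hypothetical and only serve as an illustration.

```shell
# Hypothetical helper: for each input line, print only the first
# "<letter>1 ... <letter>1" match, wherever it starts in the line.
# Assumption: GNU grep with -P (PCRE) support.
first_match_anywhere() {
    # ^.*? lazily skips to the first possible start, \K discards the skipped
    # text from the reported match, and the second .*? stops at the nearest
    # following "<letter>1".
    grep -oP '^.*?\K[A-Za-z]1.*?[A-Za-z]1'
}

# The first line is the sample from the question; the second is a made-up
# line where the match starts in the middle.
printf '%s\n' 'a1bcdea1ABCa1DEFa1' 'id=7;x1--y1;z1' | first_match_anywhere
```

With GNU grep this should print a1bcdea1 and x1--y1, one per line. Because the pattern is anchored with ^, it can match at most once per line, so the later candidates are never reported.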



Answered By - bobble bubble
Answer Checked By - Candace Johnson (WPSolving Volunteer)