Sunday, January 9, 2022

[SOLVED] How do I filter lines in a text file that start with a capital letter and end with a positive integer with regex on the command line in Linux?

Issue

I am trying to use a regex with the grep command in the Linux terminal to filter lines in a text file that start with a capital letter and end with a positive integer. Is there a way to modify my command so that it does this in one line with a single call to grep instead of two? I am using Windows Subsystem for Linux with Ubuntu from the Microsoft Store.

Text File:

C line 1
c line 2
B line 3
d line 4
E line five

The command that I have gotten to work:

grep '^[A-Z]' cap* | grep '[0-9]$'

The Output

C line 1
B line 3

This works, but I feel like the two regexes could be combined somehow. However,

grep ^[A-Z][0-9]$ 

does not yield the same result as the command above.


Solution

You need to use either of the following:

grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'

The regex matches:

  • ^ - start of the line
  • [A-Z] / [[:upper:]] - an uppercase letter
  • .* - zero or more of any characters (by comparison, [^0-9]* matches zero or more non-digit chars)
  • [0-9] - a digit
  • $ - end of the line.
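As a quick check, the pattern above can be run against the sample text from the question. This is only a sketch: the temporary directory and the file name sample.txt are assumptions made for illustration, not part of the original question. The pattern is single-quoted so that the shell passes it to grep unchanged instead of treating [A-Z] as a glob.

```shell
# Recreate the sample file from the question.
# (The temp directory and the name sample.txt are assumptions for this sketch.)
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.txt" <<'EOF'
C line 1
c line 2
B line 3
d line 4
E line five
EOF

# A single grep call: uppercase first character, anything in between, digit last.
upper_digit_lines=$(grep '^[[:upper:]].*[0-9]$' "$demo_dir/sample.txt")
printf '%s\n' "$upper_digit_lines"
# Prints:
# C line 1
# B line 3
```

The attempt from the question, ^[A-Z][0-9]$, only matches lines that are exactly two characters long (one uppercase letter followed by one digit), which is why the .* in the middle is needed.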

Also, to make sure there is no - before the number at the end of the line (so that only positive integers match), use a negated bracket expression, like

grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$'

Here, the POSIX ERE regex (enabled by the -E option) matches

  • ^[[:upper:]] - an uppercase letter at the start and then
  • (.*[^-0-9])? - an optional occurrence of any text followed by any char other than a digit or -
  • [1-9] - a non-zero digit
  • [0-9]* - zero or more digits
  • $ - end of the line.
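To see how the stricter pattern behaves, here is a small sketch with made-up input. The lines in numbers.txt and the file name itself are hypothetical test data chosen for illustration; they do not come from the question.

```shell
# Hypothetical test data: trailing negative numbers, zero, and leading zeros
# should be rejected; only lines ending in a positive integer should remain.
demo_dir2=$(mktemp -d)
cat > "$demo_dir2/numbers.txt" <<'EOF'
A value 42
B value -7
C value 0
D value 10
e value 5
F value 007
G7
EOF

# -E enables POSIX ERE, so ( ) and ? act as grouping and "optional".
positive_lines=$(grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$' "$demo_dir2/numbers.txt")
printf '%s\n' "$positive_lines"
# Prints:
# A value 42
# D value 10
# G7
```

The group is optional so that a line such as G7, where the number directly follows the leading uppercase letter, still matches.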


Answered By - Wiktor Stribiżew