Monday, November 1, 2021

[SOLVED] GREP using a regex expression group

Issue

I am trying to parse an Apache access.log and extract the user agent. A line looks like this:

54.183.192.175 - - [27/Nov/2015:16:52:37 +0000] "GET / HTTP/1.0" 200 329 "-" "Mozilla/5.0 (Windows NT 6.3; rv:36.0 Gecko/20100101 Firefox/36.0"

I went to the regex101 site and ended up with the expression .*".*".*".*"(.*)" which, on the site, matches the user agent perfectly. Then I tried to use that regex in a grep command, and it simply does not return anything.

I tried single quotes and escaping the double quotes, without success. Could someone point out how I should do it?

grep -o '.*".*".*".*"(.*)"' access.log   -- no results at all

grep -o .*\".*\".*\".*\"(.*)\" access.log   -- error: bash: syntax error near unexpected token `('


Solution

To extract the string in the last pair of double quotes, awk would be the simplest solution:

awk -F '"' '{print $(NF-1)}' httpd.log
Mozilla/5.0 (Windows NT 6.3; rv:36.0 Gecko/20100101 Firefox/36.0

How it works:

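  • -F '"' makes " the field separator.
  • $(NF-1) is the second-to-last field. Because each line ends with ", the last field is empty and the user agent is the field just before it.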
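
To see the splitting in action, here is a minimal sketch that runs the same awk program on the sample line from the question instead of on a file. The variable names line, nf and ua are only illustrative and are not part of the original answer.

```shell
# Sample line copied from the question (the unbalanced parenthesis is in the original log).
line='54.183.192.175 - - [27/Nov/2015:16:52:37 +0000] "GET / HTTP/1.0" 200 329 "-" "Mozilla/5.0 (Windows NT 6.3; rv:36.0 Gecko/20100101 Firefox/36.0"'

# Number of fields when the line is split on the double quote character.
nf=$(printf '%s\n' "$line" | awk -F '"' '{print NF}')

# The text after the closing quote is empty, so the user agent sits in field NF-1.
ua=$(printf '%s\n' "$line" | awk -F '"' '{print $(NF-1)}')

printf 'fields: %s\nuser agent: %s\n' "$nf" "$ua"
```

The line contains six double quotes, so awk sees seven fields; the seventh is the empty string after the final quote, which is why $(NF-1) rather than $NF holds the user agent.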


Answered By - anubhava