Monday, April 11, 2022

[SOLVED] Grep and extract specific data in multiple log files

Issue

I have multiple log files in a directory and am trying to extract just the timestamp and one part of each log line, namely the value of the fulltext query parameter. The query parameters in a request are separated by ampersands (&), as shown below.

Input

30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&

31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&

Intended Output

30/Mar/2022:00:27:36 -> 798

31/Mar/2022:00:27:36 -> Dyson+V7

I have this command to search recursively through all the files in the directory:

grep -rn "/libs/granite/omnisearch" ~/Downloads/ReqLogs/ > output.txt

This prints the entire log line, prefixed with the file path and line number, like so:

/Users/****/Downloads/ReqLogs/logfile1_2022-03-31.log:6020:31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%4

How can I modify this to get the intended output?


Solution

grep can print the whole matching line or (with -o) only the string that matched. To extract a different piece of data from the matching lines, turn to sed or Awk.
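To make the limitation concrete, here is a small sketch that feeds the two sample lines from the question to grep -o (available in GNU and BSD grep, though not required by POSIX). The function name sample_lines is a hypothetical helper for the demo. Only the matched text comes back, so the value is there but the timestamp is gone:

```shell
#!/bin/sh
# Hypothetical helper: emit the two sample log lines from the question.
sample_lines() {
  printf '%s\n' \
    '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' \
    '31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&'
}

# -o prints only the matched text, one match per line; the timestamp is lost.
sample_lines | grep -o 'fulltext=[^&]*'
# prints:
# fulltext=798
# fulltext=Dyson+V7
```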

awk -v search="/libs/granite/omnisearch" '$0 ~ search { s = $0; sub(/.*fulltext=/, "", s); sub(/&.*/, "", s); print $1 " -> " s }' ~/Downloads/ReqLogs/*
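Spelled out over several lines with comments, the Awk program reads as follows. This is a sketch that wraps it in a hypothetical shell function, extract_fulltext_awk, which takes log files as arguments or reads standard input when given none; it prints the " -> " separator shown in the intended output.

```shell
#!/bin/sh
# Hypothetical wrapper around the Awk one-liner; pass log files as arguments,
# or nothing to read standard input.
extract_fulltext_awk() {
  awk -v search="/libs/granite/omnisearch" '
    $0 ~ search {                 # only lines for the omnisearch endpoint
      s = $0
      sub(/.*fulltext=/, "", s)   # drop everything up to and including "fulltext="
      sub(/&.*/, "", s)           # drop everything from the next "&" onward
      print $1 " -> " s           # $1 is the first whitespace-separated field: the timestamp
    }' "$@"
}

printf '%s\n' \
  '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' |
  extract_fulltext_awk
# prints: 30/Mar/2022:00:27:36 -> 798
```

If a line can match the search without carrying a fulltext= parameter, the first sub() finds nothing to remove and a mangled fragment of the line gets printed; adding && /fulltext=/ to the pattern would skip such lines.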

or

sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' ~/Downloads/ReqLogs/*

The sed version is more succinct, but also somewhat more oblique.

\%...% uses the alternate delimiter % so that we can use literal slashes in our search expression.

The s/ .../ -> \1/p command then replaces everything from the first space to the end of each matching line with " -> " followed by the captured substring (whatever lies between fulltext= and the next &), and the p flag prints the resulting line.

The -n flag turns off the default printing action, so the only output is what the p flag prints: lines that matched the search expression and on which the substitution succeeded.
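Here is the sed command wrapped in a hypothetical function, extract_fulltext_sed, with the replacement written as " -> \1" so the output matches the format requested in the question. It is run on the two sample lines from the question; the comments restate the pieces explained above.

```shell
#!/bin/sh
# Hypothetical wrapper: log files as arguments, or standard input if none.
extract_fulltext_sed() {
  # \%...%  address with % as delimiter, so the slashes in the path are literal
  # s/ .*fulltext=\([^&]*\)&.*/ -> \1/  replace from the first space to the end
  #   of the line with " -> " plus the captured fulltext value
  # -n with the p flag: print only lines where the substitution happened
  sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' "$@"
}

printf '%s\n' \
  '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' \
  '31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&' |
  extract_fulltext_sed
# prints:
# 30/Mar/2022:00:27:36 -> 798
# 31/Mar/2022:00:27:36 -> Dyson+V7
```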

The wildcard ~/Downloads/ReqLogs/* matches all files in that directory; if you really need to traverse subdirectories, too, perhaps add find to the mix.

find ~/Downloads/ReqLogs -type f -exec sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' {} +

Or do the same with the Awk command after -exec. The placeholder {} tells find where to insert the names of the found files, and + says to pass as many of them as possible to a single command invocation, rather than running a separate command for each found file. (If you do want one invocation per file, use \; instead of +.)
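As an end-to-end sketch of the Awk variant under find, the following builds a throwaway directory tree with mktemp -d (standing in for ~/Downloads/ReqLogs, including one subdirectory) and runs find -exec awk ... {} + over it. The function name extract_tree and the file names are hypothetical, and the extra && /fulltext=/ guard is an addition that is not in the one-liner above.

```shell
#!/bin/sh
# Hypothetical helper: print "timestamp -> fulltext" for every file under $1,
# descending into subdirectories (like grep -r in the question).
extract_tree() {
  find "$1" -type f -exec awk -v search="/libs/granite/omnisearch" '
    $0 ~ search && /fulltext=/ {  # skip lines without a fulltext parameter
      s = $0
      sub(/.*fulltext=/, "", s)
      sub(/&.*/, "", s)
      print $1 " -> " s
    }' {} +
}

# Throwaway tree standing in for ~/Downloads/ReqLogs.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '%s\n' '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' > "$dir/logfile1.log"
printf '%s\n' '31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&' > "$dir/sub/logfile2.log"

extract_tree "$dir"
rm -r "$dir"   # clean up the demo tree
```

find decides the traversal order, so the lines from different files may come out in any order; pipe the result through sort if that matters.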



Answered By - tripleee
Answer Checked By - Katrina (WPSolving Volunteer)