Issue
It may be easiest to explain with an example. I am writing my code in Python and calling bash commands for the grep part.
I have a few files in which I need to grep for a pattern, say "INFO". The files can sit in two different directory structures, type1 and type2:
- /home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
- /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
- /home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)
Contents of the files:
a.log :
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
b.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
c.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.
So I need to know which files contain the string "INFO". For each file that does, I need to get the following:
filename : a.log / b.log
filepath : /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching
immediate string after search string : Subject1 / Subject2
So I tried grep with -r to find all the files that contain "INFO":
$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$
I store the grep output above in a Python variable and need to extract those items from it.
I first split the grep output on "\n", which gives two separate rows:
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
Then I split each row on ":".
First row: the split works, because every ":" is in the expected place.
file_with_path : /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))
immediate str after search word : "Subject1"
Second row: this is where I need help. The path contains "DB:1", and its ":" breaks the split. If I split, I get:
file_with_path : /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
when it should be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log
I cannot find a split that works for both cases.
Can you please help me with this? Any command that does this in bash or Python would be very helpful. Thank you in advance. Let me know if you need more information from me.
My code is below:
# main dir
patch_log_home = '/home/user1/logs/MAIN_JOB'
cmd = "grep -r 'INFO' {0}"
patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))
# if no occurrence reported continue
if len(patch_bug_inc) == 0:
    return
if patch_bug_inc:
    patch_bug_inc = patch_bug_inc.split("\n")
    for inc in patch_bug_inc:
        print("_________________________________________________")
        inc = inc.split(":")
        # to get subject part
        patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
        inc_name = inc[patch_bug_str_index+1]
        # file name
        log_file_name = os.path.basename(inc[0])
        # get file path
        log_path = os.path.split(inc[0])
        print("log_path :", log_path)
        full_path = log_path[0]
        print("FULL PATH: ", full_path)
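
The colon problem described above can be reproduced on the two sample rows alone. The sketch below first shows the failing split, then one possible workaround: cutting each row at the first ":[" instead of the first ":". It assumes every log line starts with a "[" timestamp, as in the samples, and that no path contains ":[". The names GREP_OUTPUT and parse_grep_row are made up for illustration.

```python
import posixpath  # the sample paths are POSIX-style, so parse them as such

# Sample rows copied from the grep output shown in the question.
GREP_OUTPUT = (
    "/home/user1/logs/MAIN_JOB/121/patching/a.log:"
    "[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.\n"
    "/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:"
    "[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2."
)


def parse_grep_row(row, keyword="INFO"):
    """Split one 'path:content' row of `grep -r` output into its parts.

    Assumption: the log content starts with '[' and no path contains ':['.
    Returns None when the row does not match that layout.
    """
    file_with_path, sep, content = row.partition(":[")
    if not sep:
        return None
    # Fields after the timestamp are separated by ': ' (colon plus space),
    # so the colons inside '21:05:00' do not cause a split.
    fields = ("[" + content).split(": ")
    if keyword not in fields:
        return None
    idx = fields.index(keyword)
    if idx + 1 >= len(fields):
        return None
    return {
        "filename": posixpath.basename(file_with_path),
        "filepath": posixpath.dirname(file_with_path),
        "subject": fields[idx + 1],
    }


if __name__ == "__main__":
    for row in GREP_OUTPUT.split("\n"):
        # Naive split: the second row is cut at the ':' inside 'DB:1'.
        print("naive :", row.split(":")[0])
        print("parsed:", parse_grep_row(row))
```

If GNU grep is available, its -Z (--null) option writes a NUL byte after the file name instead of ":", which would remove the ambiguity at the source; the parsing above is only needed when the output format cannot be changed.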
Solution
Here's one way you could achieve this without calling out to grep, which, as I said in my comment, may not be portable. It prints the path of every .log file that contains "INFO:":
import os
import sys

for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
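
The loop above prints only the matching path. A possible extension, sketched here, also returns the three items the question asks for: file name, directory, and the string right after "INFO". Because the path comes from os.walk rather than from parsing grep output, the ":" in "DB:1" never has to be split. The function name find_keyword_subjects is hypothetical, and the sketch assumes the keyword appears as ": INFO: " followed by the subject, as in the sample log lines.

```python
import os
import sys


def find_keyword_subjects(log_root, keyword="INFO"):
    """Return (filename, directory, subject) for the first keyword line
    of each .log file under log_root.

    Assumption: matching lines look like '...: INFO: Subject1: ...'.
    """
    marker = f": {keyword}: "
    results = []
    for root, _, files in os.walk(log_root):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, errors="replace") as infile:
                    for line in infile:
                        _, found, rest = line.partition(marker)
                        if found:
                            # The subject is the text up to the next colon.
                            subject = rest.split(":", 1)[0].strip()
                            results.append((name, root, subject))
                            break  # one hit per file, as in the loop above
            except OSError:
                print(f"Unable to process {path}", file=sys.stderr)
    return results


if __name__ == "__main__":
    for name, directory, subject in find_keyword_subjects("/home/user1/logs/MAIN_JOB"):
        print(f"filename: {name}  filepath: {directory}  subject: {subject}")
```

Like the original loop, this stops at the first matching line in each file; dropping the break would collect every match instead.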
Answered By - Olvin Roght
Answer Checked By - Marie Seifert (WPSolving Admin)