Issue
I need to extract n characters that follow a specific word in a text file on Linux. The tricky part is that those n characters are spread across two lines. Almost all of the solutions given for similar scenarios only perform this kind of string extraction within a single line. For example, I have entries like the following in a text file:
I. 2023/06/02 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/02 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/02 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
From this I would like to extract only the value after "local_qid", i.e. 0000000000000000000000000000 000000000000000000000000000000721fda00360005, and display it on a single line. Extracting it into a new variable or into a text file as a single line is also fine.
Could someone please shed some light on whether this is possible?
P.S.: It is also possible that the whole set of digits after local_qid is on the same line.
Thanks in advance to all the experts in the group!! :)
I have tried sed, awk and grep, and almost all of them only process the line on which they find the given word (local_qid). The output is either 000000000000000000000000000 or, if I use a sed/awk command to exclude local_qid, 000000000000000000000000000000721fda00360005.
Solution
One potential option using awk:
cat file.txt
I. 2023/06/02 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/02 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/02 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
awk 'BEGIN{RS="\n\n"} {for (i=1; i<=NF; i++) {if ($i == "local_qid") {print $(i + 2) ($(i + 3) ~ /[0-9]{1,}/ ? $(i + 3) : "")}}}' file.txt
0000000000000000000000000000000000000000000000000000000000721fda00360005
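This changes the record separator (RS) from a single newline ("\n", i.e. read and process each line one by one) to two newlines (i.e. read all of the lines until "\n\n" is seen). Because the sample contains no blank line, the whole file becomes one record, and since awk's default field splitting treats newlines like any other whitespace, the wrapped part of the value is simply the next field. The loop locates the field "local_qid", skips the "=" that follows it, and prints the field after that (the actual local_qid value). The next field is appended if it contains at least one digit, which is the case when the value continues on the next line. Note that the regex is not anchored, so any following word that contains a digit would also be appended, and that a multi-character RS is an extension (supported by gawk and mawk, for example) whose behaviour POSIX leaves unspecified.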
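A slightly stricter variant of the same idea could look like the sketch below. It is not the original answer's code: the helper name extract_local_qid and the shortened sample log are made up for illustration, and it assumes the value consists only of hexadecimal digits and that the word following it is never purely hexadecimal. It uses RS="" (POSIX paragraph mode, where newlines always separate fields) instead of a multi-character RS, and it anchors the regex so the next field is only appended when it looks like a continuation of the value.

```shell
# Hypothetical helper (name not from the original answer): prints the value
# that follows "local_qid =" in FILE, joined onto a single line.
extract_local_qid() {
    awk 'BEGIN { RS = "" }   # paragraph mode: newlines also act as field separators
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "local_qid" && $(i + 1) == "=") {
                value = $(i + 2)
                # Assumption: a continuation consists only of hex digits,
                # so an ordinary word such as "was" is not appended.
                if (i + 3 <= NF && $(i + 3) ~ /^[0-9a-fA-F]+$/)
                    value = value $(i + 3)
                print value
            }
        }
    }' "$1"
}

# Sample input, shortened from the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
EOF

# Capture the result in a variable, as asked in the question, then print it.
qid=$(extract_local_qid "$log")
rm -f "$log"
echo "$qid"
```

The same function should also cover the case from the P.S.: when the whole value sits on one line, the field after it is an ordinary word, fails the hex check, and is left out. To write the result to a file instead, redirect the call, e.g. extract_local_qid file.txt > qid.txt.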
Answered By - jared_mamrot