Friday, July 29, 2022

[SOLVED] How to get the text between the last occurrence of a pair of strings

Issue

I need to extract the text between the last occurrence of the line "-----BEGIN CERTIFICATE-----" and the last occurrence of the line "-----END CERTIFICATE-----".

Input:

some other data above this line
-----BEGIN CERTIFICATE-----
a
b
c
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
d
e
f
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
g
h
i
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
j
k
l
-----END CERTIFICATE-----
some other data below this

Expected output:

-----BEGIN CERTIFICATE-----
j
k
l
-----END CERTIFICATE-----

The output should include the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines.

My approach:

I tried the command below, but its output omits the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines:

openssl s_client -showcerts -connect localhost:443 </dev/null | sed -n 'H; /^-----BEGIN CERTIFICATE-----/h; ${g;p;}' |sed -e '1d' -e '/-----END CERTIFICATE-----/q' |sed '$ d' > mycertfile.pem

Output from the above command:

j
k
l
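Tracing the pipeline shows why the markers disappear: the first sed already leaves the last block intact (from the last BEGIN line through the end of the input), but `-e '1d'` then deletes the BEGIN line, and the final `sed '$ d'` deletes the END line that the `q` command printed. The sketch below simply drops those two deletions. It assumes GNU or POSIX sed; `sample_input` is a hypothetical stand-in for the `openssl s_client` output (a trimmed copy of the input above), and `last_cert_fixed` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical stand-in for `openssl s_client -showcerts ...` output,
# trimmed to two certificates.
sample_input() {
  printf '%s\n' \
    'some other data above this line' \
    '-----BEGIN CERTIFICATE-----' a b c '-----END CERTIFICATE-----' \
    '-----BEGIN CERTIFICATE-----' j k l '-----END CERTIFICATE-----' \
    'some other data below this'
}

# Stage 1 (unchanged from the question): H appends every line to the hold
# space, h resets the hold space at each BEGIN line, and on the last line ($)
# g copies the hold space (last BEGIN through end of input) back and p prints it.
# Stage 2: q prints the first END line and stops, cutting off trailing data.
# The original `-e '1d'` and `| sed '$ d'` are removed so both markers survive.
last_cert_fixed() {
  sed -n 'H; /^-----BEGIN CERTIFICATE-----/h; ${g;p;}' |
    sed '/-----END CERTIFICATE-----/q'
}

sample_input | last_cert_fixed
```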

Solution

When I hear "get the last XYZ", I think "reverse the file and take the first XYZ": reverse the input with tac, have sed quit at the first "BEGIN CERT" line it sees, then reverse the result back:

openssl ... | tac | sed '/BEGIN CERT/q' | tac
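On the sample input from the question, this first version returns the last certificate but also the trailing "some other data below this" line: in the reversed stream that line comes before the END marker, and sed prints everything up to the first BEGIN line. A minimal demonstration, assuming GNU `tac` (on BSD/macOS, `tail -r` reverses lines in a similar way); `sample_input` and `last_cert_naive` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical stand-in for the openssl output, trimmed to two certificates.
sample_input() {
  printf '%s\n' \
    'some other data above this line' \
    '-----BEGIN CERTIFICATE-----' a b c '-----END CERTIFICATE-----' \
    '-----BEGIN CERTIFICATE-----' j k l '-----END CERTIFICATE-----' \
    'some other data below this'
}

# Reverse, print lines until (and including) the first BEGIN line, reverse back.
# Any lines after the last END marker are carried along into the result.
last_cert_naive() {
  tac | sed '/BEGIN CERT/q' | tac
}

sample_input | last_cert_naive
```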

Because there is other data after the last ---END line, the command above also prints that trailing data. Try this variation instead:

openssl ... | tac | sed -n '/---END/,/---BEGIN/ p; /---BEGIN/ q' | tac

This prints only the line ranges bounded by the certificate markers and quits after the first complete range, which in the reversed stream is the last certificate.
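Running the variation against the same trimmed sample shows that only the marker-bounded block survives. As before, `sample_input` is a hypothetical stand-in for the openssl output and `last_cert` is an illustrative function name; GNU `tac` is assumed.

```shell
#!/bin/sh
# Hypothetical stand-in for the openssl output, trimmed to two certificates.
sample_input() {
  printf '%s\n' \
    'some other data above this line' \
    '-----BEGIN CERTIFICATE-----' a b c '-----END CERTIFICATE-----' \
    '-----BEGIN CERTIFICATE-----' j k l '-----END CERTIFICATE-----' \
    'some other data below this'
}

# tac reverses the lines. sed -n prints each range from an END line to the
# next BEGIN line, and q stops at the first BEGIN line (after it is printed),
# so only the last certificate is emitted. The second tac restores the order.
last_cert() {
  tac | sed -n '/---END/,/---BEGIN/ p; /---BEGIN/ q' | tac
}

sample_input | last_cert
```

The patterns match `---END` and `---BEGIN` anywhere in a line, so any other PEM marker (for example a `-----BEGIN RSA PRIVATE KEY-----` line) would also count as a boundary.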


Another approach: use Perl to read the entire input at once (-0777), capture every BEGIN/END block with a regex, and print the last one.

openssl ... | perl -0777 -nE '
  @certs = m/^-----BEGIN .+?^-----END .+?$/gms;
  say $certs[-1];
'
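If neither Perl nor `tac` is available, a single pass in POSIX awk can do the same job: reset a buffer at each BEGIN line, copy it to `last` at each END line, and print `last` at the end. This is an alternative sketch, not part of the original answer; `last_cert_awk` and `sample_input` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical stand-in for the openssl output, trimmed to two certificates.
sample_input() {
  printf '%s\n' \
    'some other data above this line' \
    '-----BEGIN CERTIFICATE-----' a b c '-----END CERTIFICATE-----' \
    '-----BEGIN CERTIFICATE-----' j k l '-----END CERTIFICATE-----' \
    'some other data below this'
}

# Rules run in order for every line:
#   1. a BEGIN line starts a fresh buffer and sets the "inside" flag;
#   2. while inside a block, the line is appended to the buffer;
#   3. an END line inside a block saves the buffer as the latest complete block.
# At end of input, only the most recently completed block is printed.
last_cert_awk() {
  awk '
    /^-----BEGIN CERTIFICATE-----/         { buf = ""; inside = 1 }
    inside                                 { buf = buf $0 "\n" }
    /^-----END CERTIFICATE-----/ && inside { last = buf; inside = 0 }
    END                                    { printf "%s", last }
  '
}

sample_input | last_cert_awk
```

Unlike the tac versions, this reads the input once and keeps only the current and most recent blocks in memory.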


Answered By - glenn jackman
Answer Checked By - Willingham (WPSolving Volunteer)