Friday, December 17, 2021

[SOLVED] awk replace string with another with new lines ( one time ) after finding another string

December 17, 2021 awk, sed

Issue

I want to replace ___SIGNATURE___ with an HTML signature after the first occurrence of "text/html", and replace only a single ___SIGNATURE___ tag. Any remaining ___SIGNATURE___ tags should simply be removed.

I am processing an email message whose header declares a multipart boundary, so there are two body parts, one text/plain and one text/html, and the ___SIGNATURE___ tag exists in both.

The relevant part of my script looks like this:

awk -v signature="$(cat $disclaimer_file)" '/text\/html/ {html=1;} html==1 && !swap {swap=sub(/___SIGNATURE___/, signature);}1' in.$$ > temp.mail && mv temp.mail in.$$
sed -i "s/charset=us-ascii/charset=utf-8/1;s/___SIGNATURE___//" in.$$

It works, but is it the optimal solution?

I have used altermime before, but it was not a good solution for my case.

Solution

Without access to sample messages, it's hard to predict what exactly will work, and whether we need to properly parse the MIME structures or if we can just blindly treat the message as text.

In the latter case, refactoring to something like

awk 'NR==FNR { signature = signature ORS $0; next }
    { sub(/charset="?[Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]"?/, "charset=\"utf-8\"") }
    /text\/html/ { html = 1 }
    /text\/plain/ { html = 0 }
    /___SIGNATURE___/ {
        if (html && signature) {
            # substr() skips the leading ORS (a single newline by default)
            sub(/___SIGNATURE___/, substr(signature, 2))
            signature = ""
        } else
            sub(/___SIGNATURE___/, "")
    } 1' "$disclaimer_file" "in.$$"

would avoid invoking both Awk and sed (and cat, and the quite pesky temporary file), where just Awk can reasonably and quite comfortably do all the work.

If you need a proper MIME parser, I would look into writing a simple Python script. The email library in Python 3.6+ is quite easy to use and flexible (but avoid copy/pasting old code which uses raw MIMEMultipart etc; you want to use the (no longer very) new EmailMessage class).
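
As a rough illustration of that approach, here is a minimal sketch using EmailMessage objects from the standard email package. The function name apply_signature and the TAG constant are made up for this example, and the sketch assumes the message parses cleanly and that the tag only needs handling in non-attachment text parts. It has not been run against real-world mail.

```python
import email
from email import policy

TAG = "___SIGNATURE___"  # placeholder tag from the question


def apply_signature(raw_message: bytes, signature_html: str) -> bytes:
    """Insert the signature once, in the first text/html part that has the tag.

    Every other occurrence of the tag is removed. Hypothetical helper,
    not part of the email library.
    """
    # policy.default makes the parser return EmailMessage objects
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    inserted = False
    for part in msg.walk():
        # Only look at inline text parts; skip containers and attachments
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        body = part.get_content()
        if TAG not in body:
            continue
        if part.get_content_type() == "text/html" and not inserted:
            body = body.replace(TAG, signature_html, 1)
            inserted = True
        # Drop any tags that are left over in this part
        body = body.replace(TAG, "")
        # set_content() regenerates the Content-* headers of this part,
        # so the charset becomes utf-8 as in the sed step of the question
        part.set_content(
            body, subtype=part.get_content_subtype(), charset="utf-8"
        )
    return msg.as_bytes()
```

You would read the message and the disclaimer file yourself and pass their contents in, then write the returned bytes back out. One difference from the Awk version: str.replace() inserts the signature literally, whereas in Awk's sub() an & in the replacement text stands for the matched string, which matters if the HTML signature contains entities such as &nbsp;.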

Answered By - tripleee

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
