Issue
I want to replace ___SIGNATURE___ with an HTML signature after the first occurrence of "text/html", replacing only one ___SIGNATURE___ string. Any remaining ___SIGNATURE___ tags should simply be removed.
I am processing an email message whose header declares a multipart boundary, so there are two body parts, one with text/plain and another with text/html, and the ___SIGNATURE___ tag exists in both.
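To reproduce this setup, a hypothetical minimal message with that structure can be written to a file. The file name sample.eml and the boundary string BOUNDARY are made up for illustration; real messages carry more headers and may use transfer encodings that this sketch ignores.

```shell
#!/bin/sh
# Hypothetical test fixture: a multipart/alternative message with a
# text/plain part and a text/html part, each containing the tag.
# "sample.eml" and "BOUNDARY" are illustrative names, not from the question.
cat > sample.eml <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset=us-ascii

Hello,
___SIGNATURE___

--BOUNDARY
Content-Type: text/html; charset=us-ascii

<html><body><p>Hello,</p>
___SIGNATURE___
</body></html>

--BOUNDARY--
EOF

# Count the lines holding the tag: one per body part.
grep -c '___SIGNATURE___' sample.eml
```

The closing delimiter "--BOUNDARY--" (with the trailing dashes) marks the end of the last part.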
The relevant part of my script looks like this:
awk -v signature="$(cat "$disclaimer_file")" '/text\/html/ {html=1;} html==1 && !swap{swap=sub(/___SIGNATURE___/, signature);}1' in.$$ > temp.mail && mv temp.mail in.$$
sed -i "s/charset=us-ascii/charset=utf-8/1;s/___SIGNATURE___//" in.$$
It works, but is it the optimal solution?
I have used altermime before, but it was not a good solution for my case.
Solution
Without access to sample messages, it's hard to predict what exactly will work, and whether we need to properly parse the MIME structures or if we can just blindly treat the message as text.
In the latter case, refactoring to something like
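awk 'NR==FNR { signature = signature ORS $0; next }
{ sub(/charset="?[Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]"?/, "charset=\"utf-8\"") }
/text\/html/ { html = 1 }
/text\/plain/ { html = 0 }
/___SIGNATURE___/ {
    # splice with index()/substr() rather than sub(): in sub(), an "&" in
    # the replacement means "the matched text", which would mangle &nbsp; etc.
    i = index($0, "___SIGNATURE___")
    rest = substr($0, i + 15)   # 15 = length of "___SIGNATURE___"
    if (html && signature) {
        # substr because there is an ORS before the text
        $0 = substr($0, 1, i - 1) substr(signature, 2) rest
        signature = ""
    } else
        $0 = substr($0, 1, i - 1) rest
} 1' "$disclaimer_file" "in.$$"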
would avoid invoking both Awk and sed (and cat, and the quite pesky temporary file), since Awk alone can reasonably and quite comfortably do all the work.
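One way to sanity-check the refactored script is to run it on a tiny fixture. This is a sketch under assumptions: the file names disclaimer.html, in.eml and out.eml and the message content are hypothetical. The script is written in a form that splices the signature in with index() and substr() rather than sub(), because in sub() an unescaped & in the replacement stands for the matched text and would turn &nbsp; into ___SIGNATURE___nbsp;.

```shell
#!/bin/sh
# Hypothetical fixtures: disclaimer.html and in.eml are illustrative names.
# The signature deliberately contains "&" to check it is copied literally.
printf '%s\n' '<p>--&nbsp;Jane Doe</p>' > disclaimer.html

cat > in.eml <<'EOF'
Content-Type: multipart/alternative; boundary="B"

--B
Content-Type: text/plain; charset=us-ascii

Hi ___SIGNATURE___
--B
Content-Type: text/html; charset="us-ascii"

<p>Hi</p>___SIGNATURE___
--B--
EOF

# First file: accumulate the signature. Second file: rewrite charset,
# insert the signature once in the HTML part, drop every other tag.
awk 'NR==FNR { signature = signature ORS $0; next }
{ sub(/charset="?[Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]"?/, "charset=\"utf-8\"") }
/text\/html/ { html = 1 }
/text\/plain/ { html = 0 }
/___SIGNATURE___/ {
    i = index($0, "___SIGNATURE___")
    rest = substr($0, i + 15)
    if (html && signature) {
        $0 = substr($0, 1, i - 1) substr(signature, 2) rest
        signature = ""
    } else
        $0 = substr($0, 1, i - 1) rest
} 1' disclaimer.html in.eml > out.eml

cat out.eml
```

After the run, out.eml should have both charset parameters rewritten to "utf-8", the tag removed from the plain-text part, and the signature, with &nbsp; intact, inserted exactly once into the HTML part.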
If you need a proper MIME parser, I would look into writing a simple Python script. The email library in Python 3.6+ is quite easy to use and flexible (but avoid copy/pasting old code which uses raw MIMEMultipart etc.; you want to use the (no longer very) new EmailMessage class).
Answered By - tripleee