Saturday, November 13, 2021

[SOLVED] Using sed to replace all occurrences of the characters '&', '<', '>' in an input text file with their HTML entities

Issue

This is my input text file:

< > & * ^ % $ # @ ! ) ( ) < > < > > > < 

This is the sed shell script that I am using.

sed 's/&/&amp;/g ; s/</&lt;/g ; s/>/&gt;/g' html_file.txt > new_file.txt

This is the output file:

<lt; >gt; &amp; * ^ % $ # @ ! ) ( ) <lt; >gt; <lt; >gt; >gt; >gt; <lt; 

I can't understand why the output still contains < and > signs where I expected &.


Solution

From info sed:

3.3 The 's' Command
[...]
The 's' command (as in substitute) is probably the most important in
'sed' [...]. The syntax of the 's' command is 's/REGEXP/REPLACEMENT/FLAGS'.
[...]
The REPLACEMENT can contain [...] unescaped '&' characters which reference the
whole matched portion of the pattern space.
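
As a small illustration of the quoted behaviour (the sample input 'a < b' is made up for this sketch and is not from the question), an unescaped & in the replacement is expanded to whatever the pattern matched, which reproduces the symptom seen in the output file:

```shell
# Unescaped '&' in the replacement expands to the matched text, here '<'.
printf '%s\n' 'a < b' | sed 's/</[&]/'
# prints: a [<] b

# So '&lt;' in the replacement becomes '<' followed by 'lt;'.
printf '%s\n' 'a < b' | sed 's/</&lt;/'
# prints: a <lt; b
```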

Escape each & in the replacement with a backslash (\&) so that sed inserts a literal & instead of the matched text.
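
A minimal sketch of the command from the question with that escaping applied. The function name escape_html is only an illustrative wrapper and does not appear in the original post:

```shell
# Hypothetical helper: reads text on stdin, writes HTML-escaped text to stdout.
# Each '&' in a replacement is written as '\&' so sed emits a literal ampersand.
# The '&' substitution runs first so the ampersands introduced by the
# '<' and '>' substitutions are not escaped a second time.
escape_html() {
  sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'
}

printf '%s\n' '< > & * ^' | escape_html
# prints: &lt; &gt; &amp; * ^
```

With the file names from the question, this could be run as: escape_html < html_file.txt > new_file.txt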



Answered By - Cyrus