Friday, July 29, 2022

[SOLVED] sed or other - remove specific html tag text from file

Issue

I am trying to remove specific HTML closing tags (</body> and </html>) from a file.

Question:

  1. How do I get the desired result?
  2. Should I be using sed to get the desired result?

file: test1.txt

Hello World
</body>
</html>

sed command tried

sed -e 's/<\/body>\\n<\/html>\\n//' test1.txt > test2.txt

Desired result in test2.txt

Hello World

Actual result in test2.txt

Hello World
</body>
</html>

Solution

Should I be using sed to get the desired result?

grep is actually a better fit here:

grep -Ev '</(body|html)>' file

Hello World
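
To try this end to end, here is a minimal sketch that recreates the sample file from the question and writes the filtered output to test2.txt. The scratch directory is only an assumption for the demo. Keep in mind that grep -v drops every line containing either closing tag, wherever it appears in the file, including lines that carry other text next to the tag.

```shell
# Work in a scratch directory (an assumption for this demo).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate test1.txt from the question.
printf '%s\n' 'Hello World' '</body>' '</html>' > test1.txt

# -E: extended regex; -v: print only the lines that do NOT match.
grep -Ev '</(body|html)>' test1.txt > test2.txt

cat test2.txt
```

If the tags can share a line with content you want to keep, this line-based filter is probably too coarse and a substitution would be the safer choice.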

If you want to remove only the specific </body>\n</html>\n string, use this sed command, which should work with any version of sed:

sed '/<\/body>/{N; /<\/html>/ {N; s~</body>\n</html>\n~~;};}' file

Hello World
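
One caveat worth checking: sed reads input one line at a time, so a newline only appears in the pattern space after N has appended the next line (which is why the single-line substitution in the question matched nothing). The command above therefore needs a line after </html> for its second N to succeed, and what N does on the last line of input differs between implementations: GNU sed prints the pattern space and exits, while POSIX specifies quitting without printing it. The following is a sketch of a variant, not part of the original answer, that avoids relying on that behaviour. It uses $!N (append the next line only when not on the last line) and deletes the pair when the pattern space is exactly the two tags. It assumes each tag sits alone on its own line and that the two lines are consecutive.

```shell
# Work in a scratch directory (an assumption for this demo).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate test1.txt from the question; </html> is the last line.
printf '%s\n' 'Hello World' '</body>' '</html>' > test1.txt

# On a line containing </body>: append the next line unless this is the
# last line, then delete both lines when the pattern space is exactly
# "</body>", a newline, and "</html>".
sed '/<\/body>/{$!N;/^<\/body>\n<\/html>$/d;}' test1.txt > test2.txt

cat test2.txt
```

Because the match is anchored with ^ and $, a </body> line followed by anything other than a bare </html> line is left untouched.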


Answered By - anubhava
Answer Checked By - Pedro (WPSolving Volunteer)