Issue
If I write it like this:
sed '/html/d;/title/d;/body/d;/table/d' somefile.html
then sed deletes all matching lines, including those at the end of the file. But if I add :a;N;$!ba, the matching lines at the end of the file are no longer deleted. Why?
sed '/html/d;/title/d;/body/d;/table/d
:a;N;$!ba' somefile.html
I need the :a;N;$!ba loop because the regular expressions that follow must operate on the whole file at once.
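A minimal reproduction of the effect, using a made-up four-line input (the strings keep1, keep2 and drop are illustrative only; GNU sed is assumed). Within one cycle, /drop/d tests only the line that opened the cycle. Once that line survives, :a;N;$!ba reads every remaining line into the same pattern space and the script ends, so the deletion test never runs on those lines. Running the deletion in a separate sed before the join avoids this.

```shell
#!/bin/sh
# Illustrative input (hypothetical data): two lines to keep, two to delete.
input='keep1
drop
keep2
drop'

# Delete and join in ONE script: /drop/d only ever sees "keep1",
# then the :a;N;$!ba loop pulls in the rest of the input unchecked.
one_script=$(printf '%s\n' "$input" | sed '/drop/d
:a;N;$!ba')

# Delete in a first sed, join in a second: every line is tested by /drop/d.
two_scripts=$(printf '%s\n' "$input" | sed '/drop/d' | sed ':a;N;$!ba;s/\n/,/g')

printf '%s\n---\n%s\n' "$one_script" "$two_scripts"
```

This prints keep1, drop, keep2, drop on separate lines, then ---, then keep1,keep2: the one-script version keeps both drop lines, while the two-pass version removes them.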
cat > /dev/shm/ex01.html <<++++++++++
<html><title>Some report 01 2021 4</title><body background=White;>
<table border=1 width=90% align=center bgcolor=#f7f7e7>
<tr><td width=80%>
(02) some text
</td><td>
541
</td></tr>
<tr><td>
(03) some text
</td><td>
541
</td></tr>
<tr><td>
(11) some text
</td><td>
82
</td></tr>
</table>
</body><html>
++++++++++
## #1
echo ======= 1
sed '/html/d;/title/d;/body/d;/table/d' /dev/shm/ex01.html
## #2
echo ======= 2
sed '/html/d;/title/d;/body/d;/table/d
:a;N;$!ba' /dev/shm/ex01.html
## #3
echo ======= 3
echo Convert /dev/shm/ex01.html to CSV
sed '/title/d
:a;N;$!ba
s/ \{2,\}//g
s#\s*</td>\s*</tr>\s*<tr>\s*<td>\s*#\n#g
s#\s*</td>\s*<td>\s*#;#g
s/<[^>]\+>//g
s/\n\{2,\}//g' /dev/shm/ex01.html
See the example at https://www.onlinegdb.com/EPB6SRJaU
I would like to rewrite the last command (#3).
Solution
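Because the d commands only test the line that starts a cycle, and :a;N;$!ba then reads the rest of the file within that same cycle, the deletions cannot be combined with the joining loop in one sed script. In my opinion, the most correct solution is therefore to join the lines first and strip the header with a substitution: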
sed ':a;N;$!ba
s/<html.\+<table[^>]\+>//Ig
s#\s*</td>\s*</tr>\s*<tr>\s*<td>\s*#\n#Ig
s#\s*</td>\s*<td>\s*#;#Ig
s/<[^>]\+>//g;s/\s\{2,\}//g' report.html
See the demo at https://onlinegdb.com/BGAhn3c8U. The script relies on GNU sed extensions such as \s, \+ and the I flag.
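An alternative that keeps the original d commands is to run them in their own sed invocation and pipe the result into a second sed that does the joining. This is a different technique from the substitution above, sketched here assuming GNU sed; the function name html_report_to_csv is hypothetical. Because the wrapper lines are already gone before the join, only a single newline is left at each end of the pattern space, so the final cleanup trims leading and trailing whitespace with ^\s\+ and \s\+$ rather than deleting runs of two or more whitespace characters.

```shell
#!/bin/sh
# Sketch, assuming GNU sed (\s, \+, the I flag, \n in the replacement).
# html_report_to_csv is a hypothetical helper: it reads the files given as
# arguments, or standard input when called without arguments.
# Pass 1 deletes the wrapper lines line by line with plain d commands.
# Pass 2 joins the remaining lines, converts cell/row boundaries to ; and
# newlines, strips the leftover tags, then trims whitespace at both ends.
html_report_to_csv() {
  sed '/html/d;/title/d;/body/d;/table/d' "$@" |
  sed ':a;N;$!ba
s#\s*</td>\s*</tr>\s*<tr>\s*<td>\s*#\n#Ig
s#\s*</td>\s*<td>\s*#;#Ig
s/<[^>]\+>//g
s/^\s\+//;s/\s\+$//'
}
```

For example, html_report_to_csv /dev/shm/ex01.html should print the three rows (02) some text;541, (03) some text;541 and (11) some text;82. The cost of this design is one extra sed process; the benefit is that each script stays simple and the d commands behave exactly as in command #1.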
Answered By - Sergey Zakharov Answer Checked By - Terry (WPSolving Volunteer)