Issue
I'm trying to extract a list of names from a website using sed, but I'm not sure how to go about replacing the tab characters separating them. This code:
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | html2text
gives me the names for September 12th, but they are separated by a tab character:
Åsa Åslög
If I change the sed script to replace the tab with a comma and a space, like this:
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text
it works as expected:
Åsa, Åslög
However, if I try on a day that only has one name, such as September 13th:
curl -s "https://namnidag.se/?year=2022&month=9&day=13" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text
I get no output; the first sed script without the tab replacement works fine in this case though. What am I doing wrong here? I'm using GNU sed 4.8, if that helps.
Thanks!
Solution
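You need to remove the p from the tab substitution. With -n, sed prints only what a p explicitly asks for, and the p flag on an s command fires only when that substitution actually replaces something. On a day with a single name there is no tab to replace, so nothing is printed. Keep the p on the first substitution and do the tab replacement in a separate sed without -n: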
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | sed -e 's/\t/, /'
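If you would rather keep everything in one sed call, the following is a minimal sketch of the same idea that runs offline. The sample lines are an assumed stand-in for the page's markup (one div per line, names separated by a tab), the single name "Sture" is only a placeholder, and the function names broken and fixed are made up for illustration. It assumes GNU sed, since \t in a regex is a GNU extension.

```shell
#!/bin/sh
# Sample lines standing in for the downloaded page (assumed layout, not fetched).
two=$(printf "<div class='names'>Åsa\tÅslög</div>")
one=$(printf "<div class='names'>Sture</div>")   # placeholder single-name day

# Variant from the question: under -n, the only p is attached to the tab
# substitution, so a line without a tab is never printed.
broken() {
  sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p'
}

# Grouped variant: on lines containing the div, strip the markup, replace
# every tab (g flag), then print unconditionally.
fixed() {
  sed -nE "/<div class='names'>/{ s#<div class='names'>([^<]*)</div>#\1#; s/\t/, /g; p; }"
}

broken_one=$(printf '%s\n' "$one" | broken)
fixed_one=$(printf '%s\n' "$one" | fixed)
fixed_two=$(printf '%s\n' "$two" | fixed)

printf 'broken, one name:  [%s]\n' "$broken_one"
printf 'fixed, one name:   [%s]\n' "$fixed_one"
printf 'fixed, two names:  [%s]\n' "$fixed_two"
```

The g flag is there in case a day has more than two names; without it only the first tab on the line is replaced. You can still pipe the result through html2text as in the question.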
Answered By - WeDBA
Answer Checked By - Senaida (WPSolving Volunteer)