Issue
As the question title states, I use cURL for web scraping, and the response I get includes all the HTML elements but without proper indentation.
curl somewebsite.com/somepage > scrape.html   # or scrape.txt
After this command the data is saved to scrape.txt or scrape.html, but the contents look very messy and are mostly on a single line.
The content of the file looks like this:
<!DOCTYPE html><html lang="en"><head><script src="/cdn-cgi/apps/head/a2ff1ftsK3yTu21p1BeEN2BZsnA.js"></script><link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;700&family=DM+Sans:wght@400&display=swap" rel="stylesheet" media="print" onload="if(!window._isAppPrerendering)this.removeAttribute("media");"><link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;700&family=DM+Sans:wght@400&display=swap" rel="preload" as="style"><link href="https://fonts.gstatic.com" rel="preconnect" crossorigin="true"><meta charset="utf-8">
As you can see above, it's all on one line, and it goes on like that until </html>.
Is there any option in curl, or any other easy way, to get the output of a scraped web page with proper indentation?
I am OK with a solution in PHP, JavaScript, or Node.js.
Thank you in advance.
Solution
I couldn't find a solution to this problem, and no one answered either.
My solution is to use a beautifying tool such as:
https://beautifytools.com/html-beautifier.php#
This tool actually works well for large websites with large scripts and styles.
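If you would rather do the same thing locally, here is a minimal sketch in Node.js of a naive indenter. It is not what the online tool does internally, and it is not a real HTML parser: `indentHtml` and `VOID_TAGS` are hypothetical names made up for this example, and the function assumes that no attribute value contains `>` and that inline scripts do not contain `<`. It also rewrites whitespace everywhere, so content inside `<pre>` would change.

```javascript
// Tags that never have a closing tag, so they must not increase the depth.
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: puts every tag and text run on its own line,
// indented according to how many elements are currently open.
function indentHtml(html, indentUnit = '  ') {
  // Split the input into comments, tags, and text runs.
  const tokens = html.match(/<!--[\s\S]*?-->|<[^>]+>|[^<]+/g) || [];
  const lines = [];
  let depth = 0;

  for (const raw of tokens) {
    const token = raw.trim();
    if (token === '') continue; // skip whitespace-only text between tags

    const isClosing = /^<\//.test(token);
    const isOpening = /^<[a-zA-Z]/.test(token);

    // A closing tag is printed one level shallower than its content.
    if (isClosing) depth = Math.max(depth - 1, 0);

    lines.push(indentUnit.repeat(depth) + token);

    if (isOpening) {
      const name = token.match(/^<([a-zA-Z0-9-]+)/)[1].toLowerCase();
      const selfClosing = token.endsWith('/>');
      // Only non-void, non-self-closing elements open a new level.
      if (!VOID_TAGS.has(name) && !selfClosing) depth += 1;
    }
  }
  return lines.join('\n');
}

// Example with a short inline string.
console.log(indentHtml('<html><head><meta charset="utf-8"></head><body><p>Hi</p></body></html>'));
```

To apply it to the file written by curl, you could read the file with `fs.readFileSync('scrape.html', 'utf8')`, pass the string to `indentHtml`, and write the result back with `fs.writeFileSync`. For real-world pages a proper HTML parser or formatter will be more robust than this sketch.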
Answered By - Mohammed Khurram