Issue
I'm attempting to write a small shell script that fetches information from the C++ reference website, and I want to crawl through the site's different pages. To do this I've written the following:
curl -s http://www.cplusplus.com/reference/ | grep -oP '(?<=<a href= ").*?(?=">)'
This successfully fetches the child pages of a given page. However, inputting all the links manually is tedious, so I attempted this:
URL="algorithm" # <-- will be turned into an array
cr=$"\r"
URL=${URL%$cr}
#for loop:
curl "http://www.cplusplus.com/reference/${URL}"
#done
But when I attempt to run this, all I get back is a blank line. How would one go about implementing this?
Solution
To curl all the returned child pages iteratively, store them in a variable, split that variable into an array, and run a for loop over the array, using each entry to build the URL passed to curl:
# capture the list of child links, one per line
childs=$(curl -s http://www.cplusplus.com/reference/ | grep -oP '(?<=<a href= ").*?(?=">)')
declare -a array=($childs)   # deliberately unquoted: word splitting builds the array

for url in "${array[@]}"
do
    curl "http://www.cplusplus.com/reference/$url"
done
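If the link list might contain stray carriage returns, as the question anticipated, a slightly more defensive variant is possible. This is a minimal sketch, assuming bash 4+ for mapfile; it splits the output on newlines instead of relying on word splitting, and trims a trailing CR from each entry:

# read the links straight into an array, one entry per line (bash 4+)
mapfile -t array < <(curl -s http://www.cplusplus.com/reference/ | grep -oP '(?<=<a href= ").*?(?=">)')

for url in "${array[@]}"
do
    url=${url%$'\r'}    # strip a trailing carriage return, if any
    curl -s "http://www.cplusplus.com/reference/${url}"
done

mapfile avoids the globbing pitfalls of an unquoted expansion: if any captured link ever contained a * or ?, declare -a array=($childs) would expand it against the current directory, whereas mapfile stores each line verbatim.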
EDIT: using an array, as noted by Charles
Answered By - Matt