Wednesday, December 29, 2021

[SOLVED] Calling curl with a variable in the link returns a blank line

Issue

I'm attempting to write a small shell script to fetch information from the C++ reference website, and I want to crawl through the different pages of the site. To do this I've written the following code:

curl -s http://www.cplusplus.com/reference/ | grep -oP '(?<=<a href= ").*?(?=">)'

This successfully fetches the child pages of a given page. However, entering all the links manually is tedious, so I attempted to do this:

URL="algorithm" # <-- will be turned into an array
cr=$"\r"
URL=${URL%$cr}
#for loop:
curl "http://www.cplusplus.com/reference/${URL}"
#done

But when I run this, the script prints only a blank line. How would one go about implementing this so that it works?


Solution

To curl each of the returned child pages in turn, store the output of the first command in a variable, convert it to an array, and loop over the array. Inside the loop, append each entry to the base URL and pass the result to curl.

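# Collect the href values of the child pages (one per line)
childs=$(curl -s http://www.cplusplus.com/reference/ | grep -oP '(?<=<a href= ").*?(?=">)')
# Split the list on whitespace into an array
declare -a array=($childs)
# Fetch each child page in turn
for url in "${array[@]}"
do
    curl "http://www.cplusplus.com/reference/$url"
done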

EDIT: using array as noted by Charles
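
The loop above leaves a few details open, so here is a slightly more defensive sketch of the same idea. It is not part of the original answer, and it rests on some assumptions: the helper names strip_cr, join_url and crawl_children are made up for illustration, the extracted links may be either root-relative (starting with "/") or relative to /reference/, and bash 4 or newer is available for mapfile. It also addresses the question's snippet: in bash, $"\r" is a locale-translated string containing a backslash and the letter r, whereas $'\r' (ANSI-C quoting) is an actual carriage return. If the server answers a request with a redirect, curl may print nothing useful unless -L is given, which might explain the blank output, though that has not been checked against the site.

```shell
#!/usr/bin/env bash
# Sketch only: strip_cr, join_url and crawl_children are hypothetical
# helper names, not something defined by the original answer.

BASE="http://www.cplusplus.com"

# Remove one trailing carriage return, if present.
# $'\r' is a real CR character; $"\r" would be a backslash followed by "r".
strip_cr() {
    local s=$1
    s=${s%$'\r'}
    printf '%s\n' "$s"
}

# Build a full URL from an href that may be absolute, root-relative,
# or relative to /reference/ (the last two cases are assumptions).
join_url() {
    local href=$1
    case $href in
        http://*|https://*) printf '%s\n' "$href" ;;
        /*)                 printf '%s\n' "$BASE$href" ;;
        *)                  printf '%s\n' "$BASE/reference/$href" ;;
    esac
}

# Read hrefs from stdin, one per line, and fetch each page.
crawl_children() {
    local -a links
    local href
    # mapfile (bash 4+) fills the array line by line,
    # without word splitting or glob expansion.
    mapfile -t links
    for href in "${links[@]}"; do
        href=$(strip_cr "$href")
        [ -n "$href" ] || continue
        # -s hides the progress meter, -L follows redirects
        curl -sL "$(join_url "$href")"
    done
}

# Example (needs network access and a grep that supports -P):
# curl -s "$BASE/reference/" | grep -oP '(?<=<a href= ").*?(?=">)' | crawl_children
```

Keeping the URL handling in join_url makes it easy to print the generated URLs first and check them before any request is sent.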



Answered By - Matt