Monday, July 25, 2022

[SOLVED] Set curl options to improve readability of progress in R download.file()

Issue

I am using R's download.file(..., method="curl") to download various text files. The status updates from curl do not have a "\n" after each update, so what comes out looks like this, with no line breaks:

> url1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
> q1f <- "wk3q1f.csv"
> download.file(url1,q1f,method="curl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0 4147k    0 18404    0     0  14232      0  0:04:58  0:00:01  0:04:57 14233  2 4147k    2  114k    0     0  51344      0  0:01:22  0:00:02  0:01:20 51341

Versions used: libcurl 7.30.0, R 3.1.0 for OS X.

Is there a curl option that adds line breaks, so that the progress report looks like this:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  
  0 4147k    0 18404    0     0  14232      0  0:04:58  0:00:01  0:04:57 14233  
  2 4147k    2  114k    0     0  51344      0  0:01:22  0:00:02  0:01:20 51341 

I looked at curl-config and didn't see anything.


Solution

As far as I know, curl has no option to emit \n instead of \r in its progress meter, but you can do the conversion yourself. This answer is specific to OS X but can be adapted for Linux. Using Homebrew, run brew install coreutils to get gstdbuf, which lets us run commands with unbuffered output.

Next, write a small shell script (I called it mycurl) with one line:

gstdbuf -i0 -o0 -e0 curl "$1" -o "$2" 2>&1 | gstdbuf -i0 -o0 -e0 tr '\r' '\n'

Make sure it is executable (chmod 755 mycurl).
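
The tr stage is what turns curl's single, repeatedly overwritten line into separate lines. A minimal sketch that shows this without any network access follows; fake_progress and cr_to_lf are hypothetical helper names invented for illustration, and the sample numbers are abbreviated from the output in the question.

```shell
# Simulate a curl-style progress stream: each update ends in \r, not \n,
# so a terminal would keep overwriting the same line.
fake_progress() {
    printf '  0 4147k\r  2 4147k\r100 4147k\n'
}

# Translate every carriage return into a newline so that each
# update lands on its own line.
cr_to_lf() {
    tr '\r' '\n'
}

fake_progress | cr_to_lf
```

Each of the three simulated updates should come out on its own line, which is the same effect the mycurl script has on real curl output.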

download.file just does the following if method="curl":

else if (method == "curl") {
    if (quiet) 
        extra <- c(extra, "-s -S")
    if (!cacheOK) 
        extra <- c(extra, "-H 'Pragma: no-cache'")
    status <- system(paste("curl", paste(extra, collapse = " "), 
        shQuote(url), " -o", shQuote(path.expand(destfile))))

So, we can mimic it with:

status <- system(paste("/path/to/mycurl", shQuote(url1), shQuote(path.expand(q1f))))

This prints the download progress with one update per line.

Linux users can just use stdbuf instead of gstdbuf, since the Homebrew coreutils package prepends a g to the command names.
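
If the same script has to run on both platforms, one possible approach is to detect which helper is installed. The sketch below is only an illustration under that assumption: pick_stdbuf and mycurl_portable are hypothetical names, not part of curl, coreutils, or R, and the fallback branch simply skips the unbuffering step.

```shell
# Print the name of whichever unbuffering helper is on PATH:
# stdbuf (typical on Linux) or gstdbuf (Homebrew coreutils on OS X).
# Prints an empty line if neither is found.
pick_stdbuf() {
    if command -v stdbuf >/dev/null 2>&1; then
        echo stdbuf
    elif command -v gstdbuf >/dev/null 2>&1; then
        echo gstdbuf
    else
        echo ""
    fi
}

# Download URL ($1) to file ($2), printing each progress update on its own line.
mycurl_portable() {
    buf=$(pick_stdbuf)
    if [ -n "$buf" ]; then
        "$buf" -i0 -o0 -e0 curl "$1" -o "$2" 2>&1 | "$buf" -i0 -o0 -e0 tr '\r' '\n'
    else
        # No helper available: still convert \r to \n, but output may arrive in chunks.
        curl "$1" -o "$2" 2>&1 | tr '\r' '\n'
    fi
}
```

To use it as the script that R calls, the file would end with a line such as mycurl_portable "$1" "$2". Note that in a plain sh pipeline the exit status is that of tr, not curl, so the status returned to R's system() would not reflect a failed download.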

Alternatively, you could use GET() from the httr package with write_disk() and progress() to get a more R-like progress meter:

library(httr)

status <- GET(url1, write_disk(path.expand(q1f), overwrite=TRUE), progress("down"))
|==================================================== (etc. to 100%)|


Answered By - hrbrmstr
Answer Checked By - Mary Flores (WPSolving Volunteer)