Monday, July 25, 2022

[SOLVED] How to use cURL to stream file to a server without buffering?

Issue

I'm writing a Node.js PUT endpoint to allow users to upload large files. As I test the server via a cURL command, I'm finding that the entire file is 'uploaded' before my Node.js request handler fires:

cURL command

cat ./resource.tif \
  | curl \
    --progress-bar \
    -X PUT \
    --data-binary @- \
    -H "Content-Type: application/octet-stream" \
    https://server.com/path/to/uploaded/resource.tif \
      | cat

Testing, I know that https://server.com/path/to/uploaded/resource.tif already exists. In my Node.js code I test for this and respond with 409:

if (exists) {
  const msg = 'Conflict. Upload path already exists'
  res.writeHead(409, msg)
  res.write(msg)
  res.end()
  return
}
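
For reference, the exists flag can come from a check along these lines (a sketch only; the upload directory and the URL-to-path mapping are assumptions, not the actual code):

const fs = require('fs')
const path = require('path')

const UPLOAD_DIR = '/tmp/uploads' // hypothetical destination directory

// Map the request URL to a file on disk and test whether it is already there
function uploadTargetExists (req) {
  const target = path.join(UPLOAD_DIR, path.basename(req.url))
  return fs.existsSync(target)
}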

I'm finding that the response is only sent after the entire file has been uploaded. But I'm not sure if the file is buffering on the client side (i.e. cURL), or on the server side.

In any case... How do I configure cURL to pass the file stream to Node.js without buffering?

Other questions/answers that I have seen - for example this one (use pipe for curl data) - use the same approach: piping the output of cat, or something similar, as the argument to --data-binary. But this still results in the whole file being processed before I see the conflict error.

Using mbuffer, as mentioned in https://stackoverflow.com/a/48351812/3114742:

mbuffer \
  -i ./myfile.tif \
  -r 2M \
    | curl \
      --progress-bar \
      --verbose \
      -X PUT \
      --data-binary @- \
      -H "Content-Type: application/octet-stream" \
      http://server.com/path/to/myfile.tif \
        | cat

This clearly shows that cURL only executes the request once the entire file contents have been read into memory on the local machine.


Solution

In my testing at least, curl will exit as soon as it receives the 409 response and the response is ended.

What allows curl to start the upload is that the request includes the header Expect: 100-continue, which causes Node's http(s) server to use the default checkContinue handling. That responds to the client with an HTTP/1.1 100 Continue, and curl continues with the upload.

To stop a client from starting the upload, handle a request with Expect: 100-continue via the checkContinue event:

server.on('checkContinue', (req, res) => {
  console.log('checkContinue', req.method, req.url)
  res.writeHead(409, {'Content-Type':'text/plain'})
  res.end('Nope')
})
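
As a fuller illustration (not from the original answer), here is a minimal sketch that combines the checkContinue rejection with an upload handler; the upload directory, the fs.existsSync check, and the port are assumptions:

const http = require('http')
const fs = require('fs')
const path = require('path')

const UPLOAD_DIR = '/tmp/uploads' // hypothetical destination directory

// Plain requests (no Expect: 100-continue header) arrive here as usual
function handleUpload (req, res) {
  const target = path.join(UPLOAD_DIR, path.basename(req.url))
  const file = fs.createWriteStream(target)
  req.pipe(file)
  file.on('finish', () => {
    res.writeHead(201)
    res.end('Created')
  })
}

const server = http.createServer(handleUpload)

// Requests that sent Expect: 100-continue land here instead of 'request'
server.on('checkContinue', (req, res) => {
  const target = path.join(UPLOAD_DIR, path.basename(req.url))
  if (fs.existsSync(target)) {
    // Reject before the client starts sending the body
    res.writeHead(409, { 'Content-Type': 'text/plain' })
    res.end('Conflict. Upload path already exists')
    return
  }
  // Tell the client to start sending the body, then reuse the normal handler;
  // 'request' is not emitted for requests handled via 'checkContinue'
  res.writeContinue()
  handleUpload(req, res)
})

server.listen(8080)

With this in place, a PUT to an existing path is refused before any body bytes are sent, which is the behaviour the question is after.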

nginx

The flow you want from nginx can be achieved with proxy_request_buffering off; (a minimal config sketch follows the flow below):

1   client > proxy  : PUT /blah
2a  proxy  > client : 100 continue
2b  proxy  > app    : PUT /blah

3a  client > proxy  : start PUT chunks
3b  app    > proxy  : 409/close

4  proxy  > client : 409/close  
5  client bails with error

In normal operation, the 409/close to the client should arrive only milliseconds behind the 100 continue (or whatever the normal latency of this app's responses is).
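
For reference, a minimal sketch of where that directive might go; the server name, paths, and upstream address are assumptions rather than part of the original answer:

server {
    listen 443 ssl;                    # certificate directives omitted
    server_name server.com;

    location /path/to/uploaded/ {
        # Pass body bytes to the app as they arrive instead of
        # spooling the whole upload on the proxy first
        proxy_request_buffering off;
        proxy_http_version 1.1;        # use HTTP/1.1 to the upstream
        proxy_pass http://127.0.0.1:8080;
    }
}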

The flow nginx provides with request buffering is:

1  client > proxy  : PUT /blah
2  proxy  > client : 100 continue
3  client > proxy  : PUT _all_ chunks
4  proxy  > app    : PUT /blah with all chunks
5  app    > proxy  : 409/close
6  proxy  > client : 409/close  
7  client completes with error


Answered By - Matt
Answer Checked By - Terry (WPSolving Volunteer)