Challenges in Building a Reliable Upload Feature

Sep 01, 2023 10 min


Uploading files is one of the most basic features of web development. However, it is not without its complexities. Designing a reliable and fast upload feature for a large number of files is still challenging. The naive upload process sends bytes from a client to a server, which receives those bytes and writes them into a file. Essentially, this remains the same regardless of the number or size of the files being uploaded. You start to run into resource overload, namely CPU, RAM, and network, as the number and size of the files increase.

Challenges

The front end makes all the requests using the client’s resources. Sequential requests for all the files would be very slow, whereas firing simultaneous (concurrent) requests for everything at once would overwhelm the client’s resources. The same applies to file I/O: reading files one by one is safe but slow, whereas loading too many into memory at once can crash the client.

The same goes for the server. Sequentially acknowledging uploads is slow, whereas receiving too many simultaneous upload requests can crash the server as well. It is more capable than a client’s machine, but there is still a limit.

Then there is the expectation around upload speed. Say you have 100 Mbps of network bandwidth and need to upload 100 files of 10 MB each. Ideally that would take (100 × 10 × 8) / 100 = 80 seconds. But in reality, each file requires an I/O operation plus request and response handling, all of which take CPU time. This inflates the total time to upload a single file considerably (it can take up to a full minute per file). The numbers vary, but as much as 40–60% of the time can be spent on CPU work, with only the rest spent on the network. The network bandwidth is only partially utilized, and this gets worse as the file size and the number of files grow.
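To make the back-of-the-envelope numbers concrete, here is a small sketch of that calculation. The 50% CPU share is an illustrative assumption in line with the 40–60% range above, not a measured value:

```typescript
// Ideal transfer time for the example above: 100 files of 10 MB over a 100 Mbps link.
const fileCount = 100;
const fileSizeMB = 10;
const bandwidthMbps = 100;

// Convert megabytes to megabits (×8), then divide by the link speed in Mbps.
const idealSeconds = (fileCount * fileSizeMB * 8) / bandwidthMbps; // 80 seconds

// In practice, per-file CPU work (disk I/O, request/response handling) is added on top.
// Assuming, say, half the wall-clock time goes to CPU rather than the network,
// effective throughput is roughly halved.
const cpuShare = 0.5; // illustrative assumption, not a measured value
const roughRealSeconds = idealSeconds / (1 - cpuShare); // ~160 seconds

console.log({ idealSeconds, roughRealSeconds });
```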

We also cannot ignore the possibility of a network failure, an out-of-memory error, or a similar issue interrupting the upload. An upload feature for large data must provide reliability as well as speed, since the process takes time anyway. Re-doing a large upload is usually not a good option.

Finally, there is the maximum payload size a single request can carry. If your requests are routed through Cloudflare, for example, the limit is 100 MB.

Solutions

The most obvious way to improve upload speed is to stop sending requests sequentially and send them concurrently from the UI. The CPU usage per file may look like a lot of overhead, but improving network utilization through concurrent requests shortens upload times considerably. Another addition is to chunk the files into a manageable payload size (e.g. 1 MB). While this increases the number of requests and responses, which may seem counterintuitive given the discussion above, it offers two main benefits.
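Before getting to those benefits, here is a minimal sketch of the chunking step itself in the browser, using the standard Blob.slice API; the 1 MB chunk size is just the example figure above:

```typescript
// Split a File (which is a Blob) into fixed-size chunks using the standard Blob.slice API.
const CHUNK_SIZE = 1 * 1024 * 1024; // 1 MB, matching the example payload size above

function chunkFile(file: File, chunkSize: number = CHUNK_SIZE): Blob[] {
  const chunks: Blob[] = [];
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    // slice() does not read the bytes yet; data is only pulled into memory when a chunk is sent.
    chunks.push(file.slice(offset, offset + chunkSize));
  }
  return chunks;
}
```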

The first benefit is a clear view of upload progress, which can be reliably tracked through acknowledgments for each chunk. The second is that, with a much smaller payload, the CPU processing time for each request and response drops substantially, and chunks can be sent far more frequently than one large request. This isn’t a new solution; it is commonly referred to as a multipart upload process. Concurrently sending file chunks turns out to be both reliable and very fast. On the client side, simply managing the maximum number of concurrent requests and file I/Os prevents freezes and crashes. Even a naive approach, such as a hard limit on the number of in-flight requests or files loaded into memory, works well.
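Below is a minimal sketch of that client-side idea, assuming a hypothetical PUT /upload endpoint that acknowledges each chunk. The endpoint, its query parameters, and the concurrency limit of 4 are illustrative, not a prescribed API:

```typescript
// Upload chunks with a hard cap on concurrent requests, tracking per-chunk acknowledgments.
// The endpoint and its query parameters are hypothetical; adapt them to your backend.
async function uploadChunks(
  fileId: string,
  chunks: Blob[],
  maxConcurrent = 4,
): Promise<void> {
  let next = 0;
  const acked = new Set<number>(); // indices of chunks the server has acknowledged

  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const index = next++;
      const response = await fetch(
        `/upload?fileId=${encodeURIComponent(fileId)}&chunk=${index}`,
        { method: "PUT", body: chunks[index] },
      );
      if (!response.ok) {
        throw new Error(`Chunk ${index} failed with status ${response.status}`);
      }
      acked.add(index);
      // Progress can be reported here, e.g. acked.size / chunks.length.
    }
  }

  // Start a fixed number of workers; each pulls the next chunk index when it finishes one.
  const workers = Array.from(
    { length: Math.min(maxConcurrent, chunks.length) },
    () => worker(),
  );
  await Promise.all(workers);
}
```

A fixed pool of workers pulling the next chunk index is a simple way to cap in-flight requests without any external queueing library; retrying a failed chunk only requires re-sending the indices that were never acknowledged.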

Software development · File upload · Large files · Web development
