• This section describes common file upload methods.
  • A common problem with large file uploads is that requests easily time out. To address this, we introduce file fragmentation (chunked upload) and resumable upload (breakpoint continuation).

File upload

Upload after encoding

Image to base64 upload

The front end base64 encodes the images to be uploaded and submits them to the server.

var img = new Image();
img.src = URL.createObjectURL(file);
img.onload = function () {
  var canvas = document.getElementById('canvas');
  var ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  // Get the image encoding; the image is passed to the server as a long base64 string
  var data = canvas.toDataURL("image/jpeg", 0.5); // 0.5 is the JPEG quality
};

The server receives the file, decodes it in Base64, and saves it.

Base64 encoding is generally recommended only when the image is small, because the encoded text is larger than the original binary: every three bytes become four characters, so upload and parsing time increase for larger files.

Read the file to binary format upload

The front end reads the file content directly and uploads it in binary format.

// Convert a binary string to a Uint8Array
function readBinary(text) {
  var data = new ArrayBuffer(text.length);
  var ui8a = new Uint8Array(data, 0);
  for (var i = 0; i < text.length; i++) {
    ui8a[i] = text.charCodeAt(i) & 0xff;
  }
  return ui8a;
}

var reader = new FileReader();
reader.onload = function () {
  var bytes = readBinary(this.result); // Convert the result, or upload it directly
};

// Read the file selected in the input; the content is placed in the reader's result field
reader.readAsBinaryString(file);

(Note: `readAsBinaryString` is deprecated; `readAsArrayBuffer` is the modern equivalent.)

Form upload

Use the form tag, specify enctype="multipart/form-data" to indicate that the form uploads binary data, and set method="POST".

<form action="http://localhost:8080" method="POST" enctype="multipart/form-data">
  <input type="file" name="file1">
  <input type="submit">
</form>

Uploading a file via type=submit has an obvious usability drawback: the page refreshes after the upload, so page data and state are lost. Early implementations therefore embedded an iframe in the page and targeted the form at it, so that only the iframe refreshed after submission, achieving a partial refresh.

Using XHR, the front end can also upload files asynchronously without a page refresh.

FormData upload

A FormData object is used to package the form data, which is then submitted asynchronously.

let file = e.target.files[0]; // Get the File object from the input
let formData = new FormData(); // Construct the FormData object
formData.append('file', file);
axios.post(url, formData);

Large file upload

The above transfer methods work well for small files, but can be problematic for large files, such as a video file with hundreds or thousands of megabytes:

Uploading too much data in a single request can cause the connection to time out, or exceed the maximum payload the server accepts; and if the upload fails, the entire file must be retransmitted.

File sharding: to solve the problem of uploading large files, the key technique is to split the large file into small files and upload them in parallel.

Fragment restoration: after receiving all fragments, the receiver splices the small files back together in order.

Breakpoint continuation: when the transfer is unexpectedly interrupted, we do not want to retransmit the whole file, only the parts that failed to upload.

File fragmentation

Blob object: represents an immutable, file-like object of raw data. It has an important method, slice(), which lets us split the binary data.

File objects: A File object is a special type of Blob that inherits the methods of the Blob interface.

Key steps for file fragmentation:

  • The front end splits the large file into smaller files
  • The front end sends the shard files in parallel
  • The server receives the shard files
  • After all shards are sent, the front end sends a completion request
  • After receiving the completion flag, the server merges the shards
  • After the merge, the shard files are deleted

Examples of sharding code:

function sliceInPieces(file, size = 2 * 1024 * 1024) {
  const totalSize = file.size; // Total file size
  const chunks = [];
  let start = 0;
  let end = 0;

  while (start < totalSize) {
    end = start + size;
    
    const blob = file.slice(start, end); // Invoke the slice method on the object
    chunks.push(blob);

    start = end;
  }
  return chunks;
}
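For example, splitting a 5 MB blob with the default 2 MB chunk size yields three chunks (the function below repeats the sharding logic so the snippet is self-contained):

```javascript
// Split a Blob into fixed-size chunks (same logic as sliceInPieces above)
function sliceInPieces(file, size = 2 * 1024 * 1024) {
  const chunks = [];
  for (let start = 0; start < file.size; start += size) {
    chunks.push(file.slice(start, start + size)); // slice clamps past the end
  }
  return chunks;
}

const file = new Blob([new Uint8Array(5 * 1024 * 1024)]); // a 5 MB blob
const chunks = sliceInPieces(file);

console.log(chunks.length);  // 3 (2 MB + 2 MB + 1 MB)
console.log(chunks[2].size); // 1048576 (the last chunk is smaller)
```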

Example of uploading the shards (if there are many shards, the number of concurrent requests should be limited):

const file = document.querySelector("[name=file]").files[0]; // Read the file

const chunks = sliceInPieces(file); // Shard the file

const context = uuid(); // Unique identifier for the file

const promiseList = [];

chunks.forEach((chunk, index) => {
  let fd = new FormData();
  fd.append("file", chunk);
  fd.append("context", context); // Tag the shard with the file identifier
  fd.append("index", index); // Record the shard's position
  promiseList.push(axios.post(url, fd));
});

// After all shards are uploaded, notify the receiving end
Promise.all(promiseList).then((res) => {
  let fd = new FormData();
  fd.append("context", context);
  fd.append("chunks", chunks.length);
  axios.post(doneUrl, fd).then((res) => { /* ... */ });
});

Restore the shard

The receiving end needs to pay attention to the following problems when processing sharding:

  • How to identify shard files from the same source file?
  • How to restore multiple shards to a single file?

Identifying shards from the same source file: generate a unique identifier (the context parameter) from the source file to indicate which file each shard belongs to. The context is appended to every shard request, and to the final notification request, so the receiver can confirm that the received shards all belong to the same file.

This context value is the file's unique identifier. Some ways to generate it:

  1. Use the file name as the identifier; to avoid collisions between different users uploading files with the same name, add user information such as a UID to ensure uniqueness.
  2. Compute a hash (e.g. MD5) of the file content and use it as the unique identifier.

Triggering the merge

After all shards are uploaded, an additional request notifies the receiver to splice them. Using the context value in that request, the receiver finds all shards carrying the same context, determines their order (from the index parameter added to each shard request, or from an index suffix appended to the context), and merges the shard files in that order.

Resumable upload (breakpoint continuation)

We have seen how to upload a large file: it is split into shards, uploaded, and the receiver merges the shards back into the original file. However, while the shards are uploading, the process can still fail unexpectedly, for example due to a network drop or the page being closed or refreshed. Because not all shards were uploaded successfully, the receiver cannot be notified to restore the file; yet re-uploading everything would waste the shards that did succeed. This is where resumable upload comes in.

Resumable upload: only shards that failed to upload are retransmitted. The key is how the client learns which shards have already been uploaded successfully. When the front end triggers a retransmission, the successfully transmitted shards are filtered out and only the failed ones are uploaded; once everything is complete, the merge request is sent as before.

So how does the front end sense the uploaded shard information?

  • Client-side records: store the uploaded-shard list on the client, e.g. in localStorage.
    • Advantages: easy to implement, no server involvement. Disadvantages: not reliable on the client; the record is lost if the user clears the cache.
  • The server provides an extra query interface for the front end to call.
    • Advantages: the server reports which shards it has already received, and the client calls this interface before retransmitting; the record is not easily lost. Disadvantages: the overhead of an additional interface.
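Whichever record source is used, the retransmission itself is just a filter over the shard list. A minimal sketch, with stand-in strings for the Blob shards and an assumed `uploadedIndexes` list from the server or localStorage:

```javascript
// Given the full shard list and the indexes already uploaded,
// keep only the shards that still need to be sent.
function pendingChunks(chunks, uploadedIndexes) {
  const uploaded = new Set(uploadedIndexes);
  return chunks
    .map((chunk, index) => ({ chunk, index })) // keep the original position
    .filter(({ index }) => !uploaded.has(index));
}

const chunks = ['c0', 'c1', 'c2', 'c3']; // stand-ins for Blob shards
const todo = pendingChunks(chunks, [0, 2]);
console.log(todo.map((t) => t.index)); // [1, 3]
```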

Shard expiration: the last step of the sharding flow deletes the shards after the merge. But if the client never calls the completion interface, those shards stay on disk forever, which is clearly unreasonable. Shards therefore need to carry an expiration time, and expired shards must be cleaned up. Expiration also needs to be considered when resuming an upload.

Finally

Search for Eval Studio on WeChat for more updates.