Recently I needed to support uploading files as large as 100MB. I looked into the chunked upload features of Qiniu and Tencent Cloud, and sorted out how the relevant large file upload functionality can be implemented on the front end.

In some services, uploading large files is an important interaction scenario, for example uploading large Excel tables, or video and audio files. If the file is large or the network is poor, the upload takes a long time (more packets must be transmitted, so the probability of packet loss and retransmission is higher), and the user cannot refresh the page and can only wait for the request to complete.

Starting from the basic file upload methods, this article sorts out the ideas behind large file upload and gives example code. Because PHP has convenient built-in functions for splitting and concatenating files, the server-side code is written in PHP.

Several ways to upload files

First let’s take a look at several ways to upload files.

Normal form upload

Plain form upload is easy to demonstrate with PHP. First build a form for file upload and set enctype="multipart/form-data" as the form's submission content type, indicating that the form will upload binary data.

<form action="/index.php" method="POST" enctype="multipart/form-data">
  <input type="file" name="myfile">
  <input type="submit">
</form>

On the server, PHP moves the received file into place with move_uploaded_file:

$imgName = 'IMG' . time() . '.' . str_replace('image/', '', $_FILES["myfile"]['type']);
$fileName = 'upload/' . $imgName;
// Move the uploaded file to the specified upload folder
if (move_uploaded_file($_FILES['myfile']['tmp_name'], $fileName)) {
    echo $fileName;
} else {
    echo "error";
}

It is easy to run into server timeouts when uploading large files through a plain form. With XHR, the front end can also upload files asynchronously; generally there are two approaches.

File encoding upload

The first idea is to encode the file on the client and then decode it on the server. The main implementation principle is to convert the image to Base64 for transmission:

var imgURL = URL.createObjectURL(file);
var img = new Image();
img.onload = function () {
    // Draw the image onto a canvas, then export it as a Base64 data URL
    ctx.drawImage(img, 0, 0);
    var data = canvas.toDataURL("image/jpeg", 0.5);
};
img.src = imgURL;

On the server side, what you need to do is decode the Base64 and then save the image:

$imgData = $_REQUEST['imgData'];
$base64 = explode(',', $imgData)[1];
$img = base64_decode($base64);
$url = './test.jpg';
if (file_put_contents($url, $img)) {
    exit(json_encode(array(
        'url' => $url
    )));
}

The disadvantage of Base64 encoding is that it is larger than the original image (because Base64 converts three bytes into four bytes, the encoded text is about a third larger than the original text), and the upload and parsing time is significantly increased for very large files.
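To make the three-bytes-to-four-characters overhead concrete, here is a quick check with the browser's built-in btoa (just an illustration, not part of the upload flow):

// btoa encodes every 3 input bytes as 4 output characters
console.log(btoa("abc"));           // "YWJj": 3 bytes become 4 characters
console.log(btoa("abcdef").length); // 8 characters for 6 bytes, about a third larger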

In addition to Base64 encoding, the front end can also read the file directly and upload it in binary format

function readBinary(text) {
    var data = new ArrayBuffer(text.length);
    var ui8a = new Uint8Array(data, 0);
    for (var i = 0; i < text.length; i++) {
        ui8a[i] = text.charCodeAt(i) & 0xff;
    }
    console.log(ui8a);
}

var reader = new FileReader();
reader.onload = function () {
    // The file content is put into the reader's result field
    readBinary(this.result);
};
reader.readAsBinaryString(file);

FormData asynchronous upload

The FormData object is primarily used to assemble a set of key/value pairs that send requests using XMLHttpRequest, giving you more flexibility in sending Ajax requests. You can use FormData to simulate form submission.

let files = e.target.files; // Get the file list from the input element
let file = files[0];
let formData = new FormData();
formData.append('file', file);
axios.post(url, formData);

The server handles the request in much the same way as a direct form request.

Iframe upload without refreshing the page

On older browsers such as Internet Explorer, XHR does not support uploading FormData directly, so files can only be uploaded with a form. A form submission normally navigates the page; where it navigates is controlled by the form's target attribute, whose value can be

  • _self, the default, opens the response page in the same window
  • _blank, opens in a new window
  • _parent, opens in the parent window
  • _top, opens in the topmost window
  • framename, opens in the iframe with the specified name

If you want uploads to feel asynchronous to the user, you can use framename to point the form at a hidden iframe. With the form's target attribute set to an invisible iframe, the returned data is received by that iframe, so only the iframe is refreshed, and the result can be obtained by parsing the text inside the iframe.

function upload(){
    var now = +new Date()
    var id = 'frame' + now
    $("body").append(`<iframe style="display:none;" name="${id}" id="${id}" />`);

    var $form = $("#myForm")
    $form.attr({
        "action": '/index.php',
        "method": "post",
        "enctype": "multipart/form-data",
        "encoding": "multipart/form-data",
        "target": id
    }).submit()

    $("#"+id).on("load", function(){
        var content = $(this).contents().find("body").text()
        try{
            var data = JSON.parse(content)
        }catch(e){
            console.log(e)
        }
    })
}

Large file upload

Now let's take a look at why timeouts occur with large file uploads in the upload modes mentioned above:

  • Form uploads and no-refresh iframe uploads both upload files through the form tag; the whole request is handled entirely by the browser, and when uploading large files the request may time out
  • FormData actually wraps a set of request parameters in XHR to simulate a form request, and cannot avoid large file upload timeouts either
  • With encoded uploads, we can control the uploaded content more flexibly

The main problem with large file uploads is that a large amount of data has to be uploaded in a single request, resulting in a lengthy process and the need to start all over again after a failure. Imagine splitting that request into multiple requests: each one takes less time, and if one fails we only need to resend that request instead of starting from scratch. Wouldn't this solve the problem of large file uploads?

Combining the above problems, it seems that large file upload needs to meet the following requirements:

  • Support split upload requests (i.e. slices)
  • Support breakpoint continuation
  • Display upload progress and pause upload

Let’s implement each of these functions in turn, and it looks like the main one is slicing.

File slicing

In the front end, we only need to get the binary content of the file first, then split its content, and finally upload each slice to the server.

In JavaScript, the File object is a subclass of the Blob object, which contains an important method, slice, through which binary files can be split.

Here is an example of splitting a file

function slice(file, piece = 1024 * 1024 * 5) {
    let totalSize = file.size; // total size of the file
    let start = 0;             // start byte of each slice
    let end = start + piece;   // end byte of each slice
    let chunks = [];
    while (start < totalSize) {
        // File inherits from Blob, so it includes the slice method
        let blob = file.slice(start, end);
        chunks.push(blob);
        start = end;
        end = start + piece;
    }
    return chunks;
}

Split the file into pieces of size piece, then upload only one piece per request:

let file = document.querySelector("[name=file]").files[0];
const LENGTH = 1024 * 1024 * 0.1;
let chunks = slice(file, LENGTH); // slice the file first

// Upload each chunk
chunks.forEach(chunk => {
    let fd = new FormData();
    fd.append("file", chunk);
    post('/mkblk.php', fd);
});

Once the server receives these slices, it can concatenate them. Here is an example of PHP concatenating slices

$filename = './upload/' . $_POST['filename'];
// On the first upload the file does not exist yet, so create it;
// for later uploads, just append the data to this file
if (!file_exists($filename)) {
    move_uploaded_file($_FILES['file']['tmp_name'], $filename);
} else {
    file_put_contents($filename, file_get_contents($_FILES['file']['tmp_name']), FILE_APPEND);
}
echo $filename;

Remember to adjust the nginx server configuration when testing, otherwise large files may trigger a 413 Request Entity Too Large error.

server {
	# ...
	client_max_body_size 50m;
}

There are some problems with this approach

  • There is no way to identify which file a slice belongs to, and appending to the file will go wrong when multiple requests arrive simultaneously
  • The slice upload interface is asynchronous, so there is no guarantee that the server splices the slices in the order in which they were requested

So let’s look at how slices should be restored on the server side.

Restoring slices

The back end needs to restore the multiple slices of the same file into a single file. The above way of handling slices has the following problems:

  • How to recognize that multiple slices come from the same file: the same context parameter can be passed with each slice request
  • How to restore multiple slices into a single file
    • Confirm that all slices have been uploaded: the client can call an mkfile interface after all slices are uploaded to notify the server to concatenate them
    • Find all slices in the same context and confirm the order of each slice: mark each slice with a positional index value
    • Splice the slices in order and restore them into the file

There is an important parameter here, context, which we need to obtain as a unique identifier for the file. It can be generated in either of the following ways:

  • Use file metadata such as the file name and size; to prevent collisions when multiple users upload the same file, extra user information such as a uid can be added to ensure uniqueness
  • Compute a hash of the file from its binary content, so that files with different content get different identifiers; the disadvantage is that the computation is expensive (see the sketch below)
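For the hash approach, here is a minimal sketch assuming the standard Web Crypto API with SHA-256 (for very large files, incremental hashing libraries such as spark-md5 are often used instead):

function hashFile(file) {
    // Read the file's binary content, then digest it with SHA-256
    return file.arrayBuffer()
        .then(buffer => crypto.subtle.digest("SHA-256", buffer))
        // Convert the ArrayBuffer digest into a hex string identifier
        .then(digest => Array.from(new Uint8Array(digest))
            .map(b => b.toString(16).padStart(2, "0"))
            .join(""));
}

// Usage: hashFile(file).then(hash => console.log(hash));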

Modify the upload code to add the related parameters:

// Get the context as a unique identifier for the file; here we simply use
// the file name plus its size (file.size, since File has no length property)
function createContext(file) {
    return file.name + file.size;
}

let file = document.querySelector("[name=file]").files[0];
const LENGTH = 1024 * 1024 * 0.1;
let chunks = slice(file, LENGTH);

// All slices of the same file share the same context
let context = createContext(file);

let tasks = [];
chunks.forEach((chunk, index) => {
    let fd = new FormData();
    fd.append("file", chunk);
    // Pass the context
    fd.append("context", context);
    // Pass the slice index
    fd.append("chunk", index + 1);
    tasks.push(post("/mkblk.php", fd));
});

// After all slices are uploaded, notify the server to concatenate them
Promise.all(tasks).then(res => {
    let fd = new FormData();
    fd.append("context", context);
    fd.append("chunks", chunks.length);
    post("/mkfile.php", fd).then(res => {
        console.log(res);
    });
});

In the mkblk.php interface, we use context to save slices associated with the same file

// mkblk.php
$context = $_POST['context'];
$path = './upload/' . $context;
// Group slices of the same file into one directory per context
if (!is_dir($path)) {
    mkdir($path);
}
// Save each slice under its index
$filename = $path . '/' . $_POST['chunk'];
$res = move_uploaded_file($_FILES['file']['tmp_name'], $filename);

In addition to the above simple method of classifying slices by directory, slice information can also be stored in the database for indexing. Next is the implementation of the mkfile.php interface, which is called after all slices have been uploaded

// mkfile.php
$context = $_POST['context'];
$chunks = (int)$_POST['chunks'];

// Concatenate the slices in index order
$filename = './upload/' . $context . '/file.jpg';
for ($i = 1; $i <= $chunks; ++$i) {
    $file = './upload/' . $context . '/' . $i;
    $content = file_get_contents($file);
    if (!file_exists($filename)) {
        $fd = fopen($filename, "w+");
    } else {
        $fd = fopen($filename, "a");
    }
    fwrite($fd, $content);
    fclose($fd);
}
echo $filename;

This solves the above two problems:

  • Identification of slice origin
  • Ensure the splicing sequence of slices

Resumable upload

Even after large files are split into slices and uploaded, we still need to wait for all slices to finish uploading. During that wait, a series of situations may cause some slice uploads to fail, such as network failures or the page being closed. Since the slices were not all uploaded, the server cannot be notified to compose the file. This can be handled with resumable upload.

Resumable upload means that you can continue to upload unfinished parts from the uploaded part without having to start from the beginning to save upload time.

Since the whole uploading process is carried out per slice, and the mkfile interface is actively called by the client after all slices are uploaded, implementing resumable upload is also very simple:

  • After a slice is uploaded successfully, save the information of the uploaded slice
  • When the same file is uploaded next time, traverse the slice list and select only the slices that have not been uploaded yet
  • After all slices are uploaded, call the mkfile interface to notify the server to merge the files

Therefore, the question falls on how to save the information about uploaded slices. Generally there are two saving strategies:

  • It can be saved in the browser with localStorage, which is independent of the server and easy to implement; the disadvantage is that the uploaded record is lost if the user clears the browser's local data
  • The server itself knows which slices have been uploaded, so it can provide an interface that queries the uploaded slices by the file's context, which the client calls before uploading the file

Let's implement resumable upload by saving the uploaded slice record locally:

// Get the record of uploaded slices for a file
function getUploadSliceRecord(context) {
    let record = localStorage.getItem(context)
    if (!record) {
        return []
    } else {
        try {
            return JSON.parse(record)
        } catch (e) {
            return []
        }
    }
}

// Save an uploaded slice to the record
function saveUploadSliceRecord(context, sliceIndex) {
    let list = getUploadSliceRecord(context)
    list.push(sliceIndex)
    localStorage.setItem(context, JSON.stringify(list))
}

Then the upload logic is slightly modified, mainly adding a check before uploading and saving the record after a successful upload:

let context = createContext(file);
// Check the upload record before uploading
let record = getUploadSliceRecord(context);
let tasks = [];
chunks.forEach((chunk, index) => {
    // Skip slices that have already been uploaded
    if (record.includes(index)) {
        return
    }
    let fd = new FormData();
    fd.append("file", chunk);
    fd.append("context", context);
    fd.append("chunk", index + 1);
    // Save the record after the slice uploads successfully
    let task = post("/mkblk.php", fd).then(res => {
        saveUploadSliceRecord(context, index)
        record.push(index)
    })
    tasks.push(task);
});

Now, if the page is refreshed or the browser is closed while uploading, the previously uploaded slices will not be uploaded again when the same file is uploaded next time.

The logic of server-side resumable upload is basically similar: getUploadSliceRecord just needs to call a query interface on the server to obtain the record of uploaded slices for the file's context. A minimal sketch of the client side of such a query follows.
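// /chunklist.php is an assumed interface: it is expected to return
// the uploaded slice indexes for the given context, e.g. [0, 1, 4]
function getUploadSliceRecordFromServer(context) {
    let fd = new FormData();
    fd.append("context", context);
    return post("/chunklist.php", fd).then(res => JSON.parse(res));
}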

In addition, we also need to consider slice expiry: once the mkfile interface has been called, the slices on disk can be deleted, but if the client never calls mkfile, it is clearly unreliable to keep these slices on disk forever. Generally, uploaded slices have a validity period, after which they are deleted. For this reason, the resumable upload logic must also stay in sync with the slice expiry logic, as sketched below.
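As a minimal sketch, assuming slices expire on the server after 24 hours, the local record can carry a timestamp so that records older than the validity period are discarded and their slices re-uploaded:

const EXPIRE_MS = 24 * 60 * 60 * 1000; // assumed 24-hour validity period

// Variant of getUploadSliceRecord that ignores expired records
function getValidUploadSliceRecord(context) {
    let raw = localStorage.getItem(context);
    if (!raw) return [];
    try {
        let { time, list } = JSON.parse(raw);
        // Past the validity period the server may have deleted the
        // slices already, so discard the record and start over
        return Date.now() - time < EXPIRE_MS ? list : [];
    } catch (e) {
        return [];
    }
}

// saveUploadSliceRecord would then store { time: Date.now(), list: [...] }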

Upload progress and pause

The progress event on xhr.upload can be used to monitor the upload progress of each slice.

xhr.abort() can cancel the upload of unfinished slices to achieve the effect of pausing the upload. Resuming is similar to resumable upload: obtain the list of uploaded slices and resend only the slices that have not been uploaded. A minimal sketch of both follows.
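function uploadChunk(url, fd, onProgress) {
    let xhr = new XMLHttpRequest();
    // The progress event reports how much of this slice has been sent
    xhr.upload.onprogress = function (e) {
        if (e.lengthComputable) {
            onProgress(e.loaded / e.total);
        }
    };
    xhr.open("POST", url);
    xhr.send(fd);
    // Keep a reference to the xhr so that xhr.abort() can pause the upload
    return xhr;
}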

Due to space constraints, a full implementation of the upload progress and pause features is not given here.

Summary

There are already some mature large file upload solutions in the community, such as the Qiniu SDK and the Tencent Cloud SDK; we may never need to hand-roll a large file upload library, but it is still worth understanding the principles behind one.

This article first sorted out several ways of uploading files from the front end, then discussed several scenarios of large file upload and the features it requires:

  • Split the file into slices with the slice method of the Blob object
  • Sorted out the conditions and parameters the server needs to restore the file, and demonstrated restoring the slices into a file with PHP
  • Implemented resumable upload by saving the records of uploaded slices

There are still some issues, such as avoiding memory overflow when merging files, the retry policy for failed slices, and the upload progress and pause features, which we have not gone into in depth or implemented one by one. Keep learning ~