This article is an updated take on multi-file chunked upload, resumable upload (breakpoint continuation), instant upload, and the retry mechanism. For the original implementation, please see the earlier article.

Know what it is, and know why it is so

Most of us have run into file uploads, and with large files the pain is familiar: the upload takes a long time, it fails often, and after every failure you have to start over from scratch. Very annoying. Let's first understand why uploads fail.

As far as I know, there are probably the following reasons:

  1. Server configuration: for example, PHP's default upload limit is 8M (post_max_size = 8M). If a request body carries more than 8M, an exception occurs.
  2. Request timeout: if the interface timeout is set to 10s and a large upload takes longer than that to respond, the request fails.
  3. Network fluctuation: an uncontrollable and fairly common factor.

For these reasons, someone clever came up with the idea of splitting a file into several small pieces and uploading them one by one, which is called chunked upload. Network fluctuation really is uncontrollable; a gust of wind and the connection drops. Since disconnects can't be prevented, the next best thing is to skip the parts of the file that have already been uploaded, which greatly speeds up re-uploading. Hence the term "breakpoint continuation", or resumable upload. At this point someone in the crowd chimes in: some files I have already uploaded before, so why upload them again and waste my traffic and time? Fair enough, and easy to fix: before each upload, check whether the file already exists on the server; if it does, skip the upload entirely. That is instant upload. Together, these "three brothers" rule the file-upload world.

Note that the code in this article is not the final production code; please go to GitHub to see the latest code: github.com/pseudo-god.


Chunked upload

HTML

The native input element is ugly, so here we overlay it on a styled button.

  <div class="btns">
    <el-button-group>
      <el-button :disabled="changeDisabled">
        <i class="el-icon-upload2 el-icon--left" size="mini"></i>Select the file<input
          v-if=! "" changeDisabled"
          type="file"
          :multiple="multiple"
          class="select-file-input"
          :accept="accept"
          @change="handleFileChange"
        />
      </el-button>
      <el-button :disabled="uploadDisabled" @click="handleUpload()"><i class="el-icon-upload el-icon--left" size="mini"></i>upload</el-button>
      <el-button :disabled="pauseDisabled" @click="handlePause"><i class="el-icon-video-pause el-icon--left" size="mini"></i>suspended</el-button>
      <el-button :disabled="resumeDisabled" @click="handleResume"><i class="el-icon-video-play el-icon--left" size="mini"></i>restore</el-button>
      <el-button :disabled="clearDisabled" @click="clearFiles"><i class="el-icon-video-play el-icon--left" size="mini"></i>empty</el-button>
    </el-button-group>
    <slot 
    
 //datadatavar chunkSize = 10 * 1024 * 1024; // Slice sizevar fileIndex = 0;// The subscript of the file currently being traverseddata:() = >({container: {files: null}, tempFilesArr: [], // Store the files information cancels: [], // Store the request tempThreads: 3 to cancel, // Default status status: Status.wait }),Copy the code

With that, a slightly nicer UI comes out.

Selecting files

During file selection, several hooks are exposed; anyone familiar with Element UI will recognize them, since they behave essentially the same way. onExceed: called when the number of files exceeds the limit; beforeUpload: called before a file is uploaded.

fileIndex matters because this is a multi-file upload: we need a way to locate the file currently being uploaded.

handleFileChange(e) {
  const files = e.target.files;
  if (!files) return;
  Object.assign(this.$data, this.$options.data()); // Reset all of data

  fileIndex = 0; // Reset the file index
  this.container.files = files;
  // Check the number of selected files
  if (this.limit && this.container.files.length > this.limit) {
    this.onExceed && this.onExceed(files);
    return;
  }

  // Copy the FileList object, because a FileList is not editable
  var index = 0; // Index within the selection; keeps the original list and the temp list matched up after a file is deleted
  for (const key in this.container.files) {
    if (this.container.files.hasOwnProperty(key)) {
      const file = this.container.files[key];

      if (this.beforeUpload) {
        const before = this.beforeUpload(file);
        if (before) {
          this.pushTempFile(file, index);
        }
      }
      if (!this.beforeUpload) {
        this.pushTempFile(file, index);
      }
      index++;
    }
  }
},

// Store into tempFilesArr; split out of the hook above
pushTempFile(file, index) {
  // Attach initial values
  const obj = {
    status: fileStatus.wait,
    chunkList: [],
    uploadProgress: 0,
    hashProgress: 0,
    index
  };
  for (const k in file) {
    obj[k] = file[k];
  }
  console.log('pushTempFile -> obj', obj);
  this.tempFilesArr.push(obj);
}

Creating and uploading chunks

  • Create the chunks: loop over the file, slicing off one piece at a time.
  createFileChunk(file, size = chunkSize) {
    const fileChunkList = [];
    var count = 0;
    while (count < file.size) {
      fileChunkList.push({
        file: file.slice(count, count + size)
      });
      count += size;
    }
    return fileChunkList;
  }
  • Loop to create chunks: since we support multiple files, we loop over them, creating the chunks for each file in turn and then uploading them.
async handleUpload(resume) {
  if (!this.container.files) return;
  this.status = Status.uploading;
  const filesArr = this.container.files;
  var tempFilesArr = this.tempFilesArr;

  for (let i = 0; i < tempFilesArr.length; i++) {
    fileIndex = i;
    // Create the chunks
    const fileChunkList = this.createFileChunk(
      filesArr[tempFilesArr[i].index]
    );

    tempFilesArr[i].fileHash = 'xxxx'; // Placeholder
    tempFilesArr[i].chunkList = fileChunkList.map(({ file }, index) => ({
      fileHash: tempFilesArr[i].hash,
      fileName: tempFilesArr[i].name,
      index,
      hash: tempFilesArr[i].hash + '-' + index,
      chunk: file,
      size: file.size,
      uploaded: false,
      progress: 0, // Upload progress of each chunk
      status: 'wait' // Upload status, used to display progress
    }));

    // Upload the chunks
    await this.uploadChunks(this.tempFilesArr[i]);
  }
}
  • The uploadChunks method is only responsible for constructing the data passed to the back end; the core upload logic lives in the sendRequest method.
async uploadChunks(data) {
  var chunkData = data.chunkList;
  const requestDataList = chunkData
    .map(({ fileHash, chunk, fileName, index }) => {
      const formData = new FormData();
      formData.append('md5', fileHash);
      formData.append('file', chunk);
      formData.append('fileName', index); // The file name uses the chunk's index
      return { formData, index, fileName };
    });

  try {
    await this.sendRequest(requestDataList, chunkData);
  } catch (error) {
    // The upload was rejected
    this.$message.error('Upload failed, consider retrying. ' + error);
    return;
  }

  // Merge the chunks
  const isUpload = chunkData.some(item => item.uploaded === false);
  console.log('created -> isUpload', isUpload);
  if (isUpload) {
    alert('Some chunks failed to upload');
  } else {
    // Perform the merge
    await this.mergeRequest(data);
  }
}
  • sendRequest does the real uploading, and it is also where failures happen. If we have, say, 10 chunks and simply fire all 10 requests at once, we can easily hit the browser's bottleneck, so the requests must be processed with controlled concurrency.

    • Concurrency: here a for loop kicks off the initial concurrent requests, and the handler function then calls itself to keep the concurrency level topped up. Inside handler, Array.prototype.shift simulates a queue, taking the next chunk to upload.

    • Retry: the retryArr array accumulates the retry count for each chunk request. For example, [1, 0, 2] means chunk 0 has errored once and chunk 2 twice. const index = formInfo.index makes sure each count stays attached to the right chunk: the index is taken from the request data itself, not from loop position. When a request fails, the failed chunk is pushed back onto the queue.

      • I once wrote a small demo of the concurrency and retry pattern; if it isn't clear, you can study it yourself. The file lives at github.com/pseudo-god. The retry demo itself seems to have been lost; a reconstructed sketch follows the sendRequest code below.
// Concurrency control
sendRequest(forms, chunkData) {
  var finished = 0;
  const total = forms.length;
  const that = this;
  const retryArr = []; // Retry count per chunk; e.g. [1, 0, 2] means chunk 0 errored once and chunk 2 twice

  return new Promise((resolve, reject) => {
    const handler = () => {
      if (forms.length) {
        // Dequeue
        const formInfo = forms.shift();

        const formData = formInfo.formData;
        const index = formInfo.index;

        instance.post('fileChunk', formData, {
          onUploadProgress: that.createProgresshandler(chunkData[index]),
          cancelToken: new CancelToken(c => this.cancels.push(c)),
          timeout: 0
        }).then(res => {
          console.log('handler -> res', res);
          // Update the state
          chunkData[index].uploaded = true;
          chunkData[index].status = 'success';

          finished++;
          handler();
        })
          .catch(e => {
            // If paused, retrying is disallowed
            if (this.status === Status.pause) return;
            if (typeof retryArr[index] !== 'number') {
              retryArr[index] = 0;
            }

            // Update the status
            chunkData[index].status = 'warning';

            // Accumulate the error count
            retryArr[index]++;

            // Give up after the retry limit
            if (retryArr[index] >= this.chunkRetry) {
              return reject('Retry failed'); // retryArr holds the per-chunk counts
            }

            this.tempThreads++; // Release the currently occupied slot

            // Put the failed request back on the queue
            forms.push(formInfo);
            handler();
          });
      }

      if (finished >= total) {
        resolve('done');
      }
    };

    // Kick off the initial concurrency
    for (let i = 0; i < this.tempThreads; i++) {
      handler();
    }
  });
}
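Since the retry demo was lost, here is a minimal, self-contained reconstruction of the same pattern: a task queue drained by N workers, with a per-task retry budget. The runQueue helper and fakeRequest below are illustrative names and assumptions, not the project's actual code.

// Standalone sketch of the concurrency + retry pattern used above
function runQueue(tasks, threads = 3, maxRetry = 3) {
  const queue = tasks.slice(); // work on a copy of the task list
  const retryArr = []; // retry count per task index
  let finished = 0;
  const total = queue.length;

  return new Promise((resolve, reject) => {
    const handler = () => {
      if (queue.length) {
        const task = queue.shift(); // dequeue, like forms.shift() above
        task.run()
          .then(() => {
            finished++;
            handler();
          })
          .catch(() => {
            retryArr[task.index] = (retryArr[task.index] || 0) + 1;
            if (retryArr[task.index] >= maxRetry) {
              return reject(new Error('task ' + task.index + ' failed after retries'));
            }
            queue.push(task); // requeue the failed task
            handler();
          });
      } else if (finished >= total) {
        resolve('done');
      }
    };
    for (let i = 0; i < threads; i++) handler(); // initial concurrency
  });
}

// Usage: ten fake "uploads", each failing about 30% of the time
const fakeRequest = i => () =>
  new Promise((res, rej) => setTimeout(Math.random() < 0.7 ? res : rej, 100));
const tasks = Array.from({ length: 10 }, (_, i) => ({ index: i, run: fakeRequest(i) }));
runQueue(tasks).then(console.log).catch(console.error);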
  • The per-chunk upload progress is tracked through axios's onUploadProgress event, combined with the createProgresshandler method.
// Chunk upload progress
createProgresshandler(item) {
  return p => {
    item.progress = parseInt(String((p.loaded / p.total) * 100));
    this.fileProgress();
  };
}
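The fileProgress method called here is not shown in this excerpt; a plausible sketch, assuming the file-level uploadProgress is the size-weighted average of its chunks' progress, might look like this.

// Hypothetical sketch of fileProgress (the real one lives in the repo):
// roll the per-chunk progress up into the file's uploadProgress
fileProgress() {
  const file = this.tempFilesArr[fileIndex];
  if (!file || !file.chunkList.length) return;
  const loaded = file.chunkList
    .map(item => item.size * (item.progress / 100))
    .reduce((acc, cur) => acc + cur, 0);
  file.uploadProgress = parseInt(String((loaded / file.size) * 100));
}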

Hash computation

In essence this means computing the file's MD5 value; MD5 is used in several places throughout the project.

  • Instant upload: the MD5 value is sent to the server to check whether the file already exists.
  • Resumable upload: the MD5 value serves as the key under which uploaded chunks are recorded.

This project computes the hash in a web worker, which greatly improves performance and responsiveness. Since there are multiple files, the hash progress has to be reflected on each file individually; that is what the global fileIndex variable is for: it locates the file currently being processed.

// Generate file hash (web-worker)
calculateHash(fileChunkList) {
  return new Promise(resolve => {
    this.container.worker = new Worker('./hash.js');
    this.container.worker.postMessage({ fileChunkList });
    this.container.worker.onmessage = e => {
      const { percentage, hash } = e.data;
      if (this.tempFilesArr[fileIndex]) {
        this.tempFilesArr[fileIndex].hashProgress = Number(percentage.toFixed(0));
      }
      if (hash) {
        resolve(hash);
      }
    };
  });
}

Because the code runs in a worker, we cannot simply import the npm md5 package; download the spark-md5.js file and pull it in with importScripts.

// hash.js

self.importScripts('/spark-md5.min.js'); // Import the script

// Generate the file hash
self.onmessage = e => {
  const { fileChunkList } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let percentage = 0;
  let count = 0;
  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(fileChunkList[index].file);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);
      if (count === fileChunkList.length) {
        self.postMessage({
          percentage: 100,
          hash: spark.end()
        });
        self.close();
      } else {
        percentage += 100 / fileChunkList.length;
        self.postMessage({ percentage });
        loadNext(count);
      }
    };
  };
  loadNext(0);
};

File merging

Once all the chunks have been uploaded, the file needs to be merged; on the front end this is just a request to the merge interface.

mergeRequest(data) {
  const obj = {
    md5: data.fileHash,
    fileName: data.name,
    fileChunkNum: data.chunkList.length
  };

  instance.post('fileChunk/merge', obj, {
    timeout: 0
  })
    .then(res => {
      this.$message.success('Upload successful');
    });
}

Done: chunked upload is now complete.

Resumable upload (breakpoint continuation)

As the name implies, resuming means picking up from where the upload broke off; the idea is straightforward. There are two ways to do it: either the server tells us where to resume from, or the browser side keeps track itself. Each scheme has pros and cons; this project uses the second.

The file's hash is used as the key; after each chunk uploads successfully, the fact is recorded, and when resuming, chunks that already have a record are simply skipped. This project uses localStorage for the records, wrapped in two pre-packaged methods: addChunkStorage and getChunkStorage.

Data stored in storage
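The addChunkStorage and getChunkStorage wrappers are pre-packaged in the project and not shown in this article; a minimal localStorage sketch of what they might look like, assuming one entry per file hash holding the uploaded chunk indexes, could be:

// Sketch of the storage helpers (the key scheme is an assumption)
addChunkStorage(fileHash, index) {
  const stored = this.getChunkStorage(fileHash) || [];
  if (!stored.includes(index)) {
    stored.push(index);
  }
  localStorage.setItem(fileHash, JSON.stringify(stored));
},
getChunkStorage(fileHash) {
  const raw = localStorage.getItem(fileHash);
  return raw ? JSON.parse(raw) : null;
}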

Cache handling

In the axios success callback for the chunk upload, the index of the successfully uploaded chunk is stored:

 instance.post('fileChunk', formData)
  .then(res => {
    // Record the index of the uploaded chunk
+   this.addChunkStorage(chunkData[index].fileHash, index);
    handler();
  })

Before uploading the chunks, check localStorage and adjust the chunk state accordingly:

async handleUpload(resume) {
+ const getChunkStorage = this.getChunkStorage(tempFilesArr[i].hash);
  tempFilesArr[i].chunkList = fileChunkList.map(({ file }, index) => ({
+   uploaded: getChunkStorage && getChunkStorage.includes(index), // Flag: already uploaded
+   progress: getChunkStorage && getChunkStorage.includes(index) ? 100 : 0,
+   status: getChunkStorage && getChunkStorage.includes(index) ? 'success' : 'wait' // Upload status, used to display progress
  }));
}

Then, when building the requests, chunks already marked as uploaded are filtered out:

async uploadChunks(data) {
  var chunkData = data.chunkList;
  const requestDataList = chunkData
+   .filter(({ uploaded }) => !uploaded)
    .map(({ fileHash, chunk, fileName, index }) => {
      const formData = new FormData();
      formData.append('md5', fileHash);
      formData.append('file', chunk);
      formData.append('fileName', index); // The file name uses the chunk's index
      return { formData, index, fileName };
    });
}

Garbage file cleanup

As uploads accumulate, so do junk files: for example, an upload abandoned halfway through, or a failed upload, leaves orphaned chunk files behind. So far I have two solutions in mind:

  • The front end stores a cache timestamp in localStorage; once it expires, it sends a request asking the backend to clean up the chunk files, and clears its own cache at the same time.
  • Front end and back end agree that any cache lives for at most 12 hours after creation and is cleared automatically afterwards.

Both schemes above have some problems: clock differences between front end and back end could cause chunks to be cleaned up while an upload is still in progress. A proper solution is pending; I'll update this later.
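As a rough starting point for the first scheme, each record could carry a creation timestamp. The sketch below assumes the stored record is extended to { createdAt, chunks } and that a hypothetical fileChunk/clear endpoint exists on the backend; neither is part of the current project.

// Sketch only: expire chunk records older than 12 hours and ask the backend
// to remove the orphaned chunk files ('fileChunk/clear' is hypothetical)
const EXPIRE_MS = 12 * 60 * 60 * 1000;

function clearExpiredChunkStorage() {
  for (let i = localStorage.length - 1; i >= 0; i--) {
    const key = localStorage.key(i);
    let record;
    try {
      record = JSON.parse(localStorage.getItem(key));
    } catch (e) {
      continue; // not one of our records
    }
    if (record && record.createdAt && Date.now() - record.createdAt > EXPIRE_MS) {
      instance.post('fileChunk/clear', { md5: key }); // backend deletes the chunks
      localStorage.removeItem(key);
    }
  }
}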

Done: resumable upload is complete.


Instant upload

It's the simplest of the three, but it sounds the most impressive. The principle: compute the hash of the entire file, and before uploading, send a request carrying the MD5 value. The backend looks the file up; if it already exists on the server, no further work is needed and the upload completes instantly.

async handleUpload(resume) {
  if (!this.container.files) return;
  const filesArr = this.container.files;
  var tempFilesArr = this.tempFilesArr;

  for (let i = 0; i < tempFilesArr.length; i++) {
    const fileChunkList = this.createFileChunk(
      filesArr[tempFilesArr[i].index]
    );

    // Hash check: can the file be uploaded instantly?
+   tempFilesArr[i].hash = await this.calculateHash(fileChunkList);
+   const verifyRes = await this.verifyUpload(
+     tempFilesArr[i].name,
+     tempFilesArr[i].hash
+   );
+   if (verifyRes.data.presence) {
+     tempFilesArr[i].status = fileStatus.secondPass;
+     tempFilesArr[i].uploadProgress = 100;
+   } else {
      console.log('Starting chunk upload ----', tempFilesArr[i].name);
      await this.uploadChunks(this.tempFilesArr[i]);
    }
  }
}
// Pre-upload check: does the file already exist on the server?
verifyUpload(fileName, fileHash) {
  return new Promise(resolve => {
    const obj = {
      md5: fileHash,
      fileName,
      ...this.uploadArguments // Pass any extra parameters
    };
    instance
      .post('fileChunk/presence', obj)
      .then(res => {
        resolve(res.data);
      })
      .catch(err => {
        console.log('verifyUpload -> err', err);
      });
  });
}

Done: instant upload is complete.

Back-end processing

This article is already getting long, so I won't post the back-end code for now; leave a comment if you'd like it and I'll update when I have time.

Node version

Please go to github.com/pseudo-god… to view it.
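For orientation only, here is a stripped-down Express sketch of what the merge endpoint might look like. The field names follow the front-end mergeRequest above; the storage directory, the chunk naming scheme md5-index, and the response shape are all assumptions, so check the repository for the real implementation.

// Hypothetical Node/Express merge endpoint, not the repo's actual code
const fs = require('fs');
const path = require('path');
const express = require('express');

const app = express();
app.use(express.json());

const UPLOAD_DIR = path.resolve(__dirname, 'uploads'); // assumed chunk directory

app.post('/fileChunk/merge', (req, res) => {
  const { md5, fileName, fileChunkNum } = req.body;
  const target = path.resolve(UPLOAD_DIR, fileName);
  const ws = fs.createWriteStream(target);

  // Append chunks 0..fileChunkNum-1 in order; assume each chunk was saved as `${md5}-${index}`
  for (let i = 0; i < fileChunkNum; i++) {
    const chunkPath = path.resolve(UPLOAD_DIR, `${md5}-${i}`);
    ws.write(fs.readFileSync(chunkPath));
    fs.unlinkSync(chunkPath); // delete the chunk once it has been appended
  }
  ws.end(() => res.json({ code: 0, message: 'merged' }));
});

app.listen(3000);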

Java version

Should be updated next week.

PHP version

I haven't written PHP in over a year; I'll fill this in when I have time.

To be improved

  • Chunk size: this should be computed dynamically later, choosing an appropriate chunk size based on the size of the file being uploaded, to avoid producing too many chunks (a sketch follows this list).
  • File appending: files cannot currently be added to the queue while an upload is in progress. (I have no idea yet how to handle this.)
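For the first item, here is a sketch of what the dynamic calculation might look like; the target of roughly 100 chunks per file and the 1 MB / 50 MB clamps are arbitrary assumptions, not values from the project.

// Sketch: pick a chunk size from the file size instead of a fixed 10 MB
function calcChunkSize(fileSize) {
  const MIN = 1 * 1024 * 1024;  // never smaller than 1 MB
  const MAX = 50 * 1024 * 1024; // never larger than 50 MB
  const ideal = Math.ceil(fileSize / 100); // aim for about 100 chunks
  return Math.min(MAX, Math.max(MIN, ideal));
}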

Update record

The component has been running for a while now, and testing surfaced several problems during that time. I had thought it was bug-free; apparently the bugs are quite serious.

Bug-1: uploading multiple files that have the same content but different file names at the same time causes upload failures.

Expected result: after the first file uploads successfully, subsequent identical files go through instant upload.

Actual result: after the first file uploads successfully, the other identical files fail with an error about an incorrect number of chunks.

Cause: as soon as the first file's chunks finish uploading, the loop moves straight on to the next file, so the "already uploaded" (instant upload) status cannot be fetched in time, and the upload fails.

Solution: only start the next loop iteration once the current file's chunks have uploaded and the merge interface has returned.

Concretely, make the sub-calls awaited: the mergeRequest and uploadChunks methods.
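In code terms the fix amounts to something like the sketch below: uploadChunks awaits the merge, and mergeRequest must return its promise, so the for loop in handleUpload cannot move on early. This is a simplification of the methods shown earlier, not the repo's exact diff.

// Sketch of the Bug-1 fix
async uploadChunks(data) {
  // ... build requestDataList and send the chunks ...
  await this.sendRequest(requestDataList, chunkData);

  const hasFailed = chunkData.some(item => item.uploaded === false);
  if (!hasFailed) {
    // Awaited, so the next file's hash check sees the finished upload;
    // mergeRequest has to `return instance.post(...)` for this to work
    await this.mergeRequest(data);
  }
}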

Bug-2: selecting the same file twice in a row does not re-trigger the input's change event, so the beforeUpload hook never runs and the whole flow breaks.

Cause: the input's previously selected data is not cleared after each selection, and the input's change event does not fire when the same data is chosen again.

Solution: clear the input's data on every click. I also optimized some other code along the way; see the commit record for details.

<input v-if="! changeDisabled" type="file" :multiple="multiple" class="select-file-input" :accept="accept" + &western nclick = "f.o uterHTML = f.o uterHTML" @ change = "handleFileChange" / >Copy the code

I also rewrote the pause and resume functionality; essentially this meant adding pause and resume states.

The previous handling was too crude and had many problems. The state now lives on each individual file, so that when an upload resumes, files already handled can be skipped.
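The full rework is in the repository; the core idea, sketched against the data shape used above (the cancels array holds axios cancel functions, and Status comes from the earlier snippets), is roughly:

// Sketch of pause/resume: pausing cancels in-flight requests, resuming
// re-enters handleUpload, where cached chunks are skipped
handlePause() {
  this.status = Status.pause;
  while (this.cancels.length > 0) {
    this.cancels.pop()('paused by user'); // invoke each stored cancel function
  }
},
handleResume() {
  this.status = Status.uploading;
  this.handleUpload(true); // resume flag: chunk lists are rebuilt from localStorage
}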

Packaging the component

That was a lot of code, and you don't actually need to copy it, because I've packaged it as a component. You can download it from GitHub, where there are usage examples; if you find it useful, please remember to leave a star, thank you!

I'll be a little lazy and not list the packaging code itself; download the files and have a look, and leave a comment if anything is unclear.

Component documentation

Attributes

| Parameter | Type | Description | Default | Note |
| --- | --- | --- | --- | --- |
| headers | Object | Sets the request headers | | |
| before-upload | Function | Hook run before a file is uploaded; returning false stops the upload | | |
| accept | String | Accepted upload file types | | |
| upload-arguments | Object | Extra parameters carried with the upload | | |
| with-credentials | Boolean | Whether to send cookies | false | |
| limit | Number | Maximum number of files allowed | 0 | 0 means no limit |
| on-exceed | Function | Hook run when the file count exceeds the limit | | |
| multiple | Boolean | Whether multi-select is enabled | true | |
| base-url | String | The component has axios built in; configure the base path here if you need to go through a proxy | | |
| chunk-size | Number | Size of each chunk | 10M | |
| threads | Number | Number of concurrent requests | 3 | Higher concurrency puts more load on the server; prefer the default |
| chunk-retry | Number | Error retry count | 3 | Number of retries for a failed chunk request |

Slot

| Name | Description | Parameters | Note |
| --- | --- | --- | --- |
| header | Button area | none | |
| tip | Prompt text | none | |
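For orientation, here is a hypothetical usage from a parent component. The tag name big-file-upload and the handler names are assumptions, not the component's confirmed API; check the repository's use cases for the real one.

<!-- Hypothetical usage sketch; see the repo's use cases for the real API -->
<big-file-upload
  :headers="{ Authorization: token }"
  :before-upload="checkFile"
  accept=".zip,.mp4"
  :limit="5"
  :on-exceed="handleExceed"
  :chunk-size="10 * 1024 * 1024"
  :threads="3"
  :chunk-retry="3"
>
  <template v-slot:tip>Files of several GB are supported</template>
</big-file-upload>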

Back-end interface documentation: implement against the document.

Code address: github.com/pseudo-god…

Interface document address: docs.apipost.cn/view/…