I recently read several articles by experienced front-end developers on large file upload and found the feature fascinating, so I tried it myself and added some ideas of my own on top of their code. This article is my own summary, and I hope it also helps anyone who wants to build this feature.

The functions are summarized as follows:

  1. Uploading large files in slices
  2. Resumable upload (breakpoint resume)
  3. Concurrency control
  4. Error retry

Reference articles (highly recommended):

  1. Various file upload strategies for front-end beginners, from small images to resumable large file uploads
  2. ByteDance interviewer: please implement large file upload with breakpoint resume
  3. ByteDance interviewer, I also implemented large file upload with breakpoint resume

Project address: Gitee

Without further ado, let’s begin

Tech stack used in this article:

Front end: Vue, Element-UI, Web Worker, Axios, spark-md5, and a few file upload APIs

Back end: Express + multiparty

Sorting out the requirements

Before we get started, let’s take a look at some of the features to implement:

  1. The file to be uploaded is too large to send in a single request, so it must be cut into slices. The slices are uploaded one after another, and the server finally merges them to rebuild the file
  2. Large files take a long time to upload and produce many requests after slicing. The user may lose the connection midway, so the upload must support resuming from a breakpoint (think of how the Thunder download manager resumes downloads).
  3. With 1 MB slices, a 200 MB file yields 200 slices, i.e. 200 requests. These cannot all be sent at once, so the number of concurrent requests must be limited.
  4. As mentioned above, the connection may drop during upload, or a slice may get lost; every failed request should be retried to improve the user experience.

Now that the features are clear, let’s walk through the process and some technical details

  1. An input of type "file" is used to receive the uploaded file. When the user selects a file, the change event fires; we read the file’s information and show the user a local preview

  2. When the user clicks upload, the file is sliced using the Blob.prototype.slice method

  3. The file hash is used as the file name for back-end storage, so the hash must be computed before uploading. Hashing is a time-consuming operation, so we run it in a Worker thread to keep the page from freezing.

  4. Consider resumable upload: if the user loses the connection during upload, only the parts not yet uploaded should be sent when they upload again. There are two ways to achieve this:

    1. On the client side, record which slices have been uploaded, for example in localStorage
    2. Before uploading, send a verify request to the server; the server looks in the folder for that file, collects the slices already uploaded, and returns the list to the client

    Let’s use the second option.

  5. Filter out the parts that have not been uploaded and start uploading

  6. When the user clicks pause, I don’t use the native XHR abort to cancel requests already in flight; instead I suspend the requests that have not been sent yet. This is a slightly unusual choice, and we’ll see how it’s done later.

  7. The user clicks Resume to resume the suspended requests

  8. If some slices still fail after the retry limit, the user can click the retry button to re-upload the failed part

  9. When all slices are sent, request the server to merge the slices

The flowchart is as follows:

The basic structure

Two configuration items:

const CHUNK_SIZE = 1024 * 1024 * 0.5 // Size of each slice, in bytes
const CONCURRENCY_LIMIT = 4 // Concurrency limit

Define the upload status

const UPLOAD_STATUS = {
  calculatingHash: "calculatingHash", // Calculating the hash
  waiting: "waiting",                 // Waiting for the user to start the upload
  uploading: "uploading",             // Uploading
  abort: "abort",                     // Upload paused
  success: "success",                 // Also set once the back end has merged the file
  fail: "fail"                        // Upload failed
}

The properties in data:

data() {
  return {
    showPreview: false,   // Becomes true once the local file preview has loaded
    file: null,           // The file
    hash: "",             // The file hash
    worker: null,         // The worker thread
    hashPercent: 0,       // Hash calculation progress, drives the hash progress bar
    uploadedLen: 0,       // Number of slices already uploaded
    chunksLen: 0,         // Total number of slices
    fileChunks: [],       // All the slices
    Scheduler: null,      // The concurrent task scheduler; every request goes through it, and it implements error retry
    uploadStatus: UPLOAD_STATUS,
    curStatus: UPLOAD_STATUS.waiting // The current upload status
  }
}

The methods defined:

// Handles the input's change event; used to get the file
handleFileChange(e) {}
// To preview the file
preview() {}
// Triggered when the user clicks upload; calls the slicing, hashing, verify and slice upload methods
async handleUpload() {}
// When the upload fails, the user clicks the retry button
async handleRetry() {}
// File slice method
createFileChunks() {}
// Computes the hash of the file
getFileHash() {}
// Ask the server which slices have been uploaded
async verifyUpload(filename, fileHash) {}
// Upload the slice
async uploadChunks(chunksNeedUpload) {}
// Ask the server to merge slices
async mergeRequest() {}
// Triggered when the user clicks pause
handlePause() {}
// Triggered when the user clicks the resume button
handleResume() {}

The HTML structure is as follows, consisting mainly of a few buttons, a hash progress bar, an upload progress bar, and a video tag for previewing

<input type="file" @change="handleFileChange" />
<el-button
  type="primary"
  @click="handleUpload"
  v-show="curStatus === uploadStatus.waiting"
  >Upload</el-button
>
<el-button
  type="primary"
  @click="handleRetry"
  v-show="curStatus === uploadStatus.fail"
  >Retry</el-button
>
<el-button
  type="warning"
  @click="handlePause"
  v-show="curStatus === uploadStatus.uploading"
  >Pause</el-button
>
<el-button
  type="primary"
  @click="handleResume"
  v-show="curStatus === uploadStatus.abort"
  >Resume</el-button
>
<br />
<el-progress
  type="circle"
  :percentage="hashPercent"
  class="hash-progress"
  :status="hashProgressStatus"
></el-progress>
<el-progress
  :text-inside="true"
  :stroke-width="20"
  :percentage="uploadPercent"
  :status="uploadProgressStatus"
  class="file-progress"
></el-progress>
<div class="preview-container" v-show="showPreview">
  <video ref="preview" controls></video>
</div>

Several computed properties for the progress bar:

// Status of the hash progress bar, used to switch its color
hashProgressStatus({ hashPercent }) {
  return hashPercent >= 100 ? "success" : null
},
// Status of the upload progress bar, used to switch its color
uploadProgressStatus({ curStatus, uploadStatus, uploadPercent }) {
  if (curStatus === uploadStatus.fail) return "exception"
  return uploadPercent >= 100 ? "success" : null
},
// Upload progress
uploadPercent({ uploadedLen, chunksLen, curStatus, uploadStatus }) {
  if (curStatus === uploadStatus.waiting) return 0
  if (curStatus === uploadStatus.success) return 100
  return Math.floor((100 * uploadedLen) / chunksLen)
}

The file preview

First, the user selects the file, triggering the change event, which is handled by handleFileChange

handleFileChange(e) {
  this.file = e.target.files[0] // Get the file
  this.preview() // Preview the file
}

preview() {
  // URL.createObjectURL creates a reference URL to the local file
  const URLobj = window.URL.createObjectURL(this.file)
  const preview = this.$refs.preview
  preview.src = URLobj
  preview.oncanplay = () => {
    this.showPreview = true
  }
}

Note that for a video file we cannot release the reference with URL.revokeObjectURL(), otherwise the video will not play. For images, you can use this method to release the reference.
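For instance, a minimal sketch for an image preview (the img element and the file variable here are hypothetical):

// Minimal sketch: for an image, the object URL can be released once it has loaded
const url = window.URL.createObjectURL(file)
img.src = url
img.onload = () => window.URL.revokeObjectURL(url) // fine for images, not for a playing video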

Preparations before uploading

File slicing

As you can see from the previous flowchart, we first slice the file

async handleUpload() {
  if (!this.file) {
    this.$message({
      type: "error".message: "Please select the file you want to upload"
    })
    return
  }
  // Slice the file
  this.createFileChunks()
}

The slicing method is simple: just call slice repeatedly to cut the file

createFileChunks() {
  const chunkList = []
  let cur = 0
  const file = this.file
  const size = file.size
  while (cur < size) {
    chunkList.push(file.slice(cur, cur + CHUNK_SIZE))
    cur += CHUNK_SIZE
  }
  this.chunksLen = chunkList.length
  this.fileChunks = chunkList
}

Calculate the hash

We hash the file in a worker thread, so create hash.js under /public. The worker receives the file slices as a parameter and computes the hash of the whole file

// hash.js
importScripts('./spark-md5.min.js')
self.onmessage = function(e) {
  const { fileChunks } = e.data
  const spark = new SparkMD5.ArrayBuffer()
  const fileReader = new FileReader()
  const len = fileChunks.length
  let curChunk = 0
  fileReader.onload = (e) => {
    spark.append(e.target.result)
    curChunk++
    if (curChunk >= len) {
      const hash = spark.end()
      self.postMessage({ hash, percent: 100 }) // When parsing is complete, the hash is passed
      self.close()
    } else {
      fileReader.readAsArrayBuffer(fileChunks[curChunk])
      self.postMessage({ percent: 100 * curChunk / len }) // When parsing is not complete, only the parsing progress is passed
    }
  }
  fileReader.readAsArrayBuffer(fileChunks[curChunk])
}

As the Spark-MD5 documentation mentions, incremental hashing performs much better; see the official docs for the code

Incremental md5 performs a lot better for hashing large amounts of data, such as files. One could read files in chunks, using the FileReader & Blob’s, and append each chunk for md5 hashing while keeping memory usage low.

The logic of communication between the main thread and the worker thread is as follows

function getFileHash() {
  this.curStatus = this.uploadStatus.calculatingHash
  return new Promise((resolve) => {
    this.worker = new Worker('/hash.js')
    this.worker.postMessage({ fileChunks: this.fileChunks })
    this.worker.onmessage = (e) => {
      const { hash, percent } = e.data
      this.hashPercent = Math.ceil(percent)
      if (hash) {
        this.hash = hash
        resolve()
      }
    }
  })
}

Verify whether an upload is needed

The next step is to send a request to the server to get the list of slices that have already been uploaded, so we can filter out which slices still need to be sent

A word about the server-side logic: the server creates a temporary folder named after the file hash and puts every slice into it. When it receives the merge request (which means all slices have been uploaded), it merges all the slices in that folder into a single file at the storage path, then deletes the temporary folder.

On verify, the server looks for the file by the original file name (used to extract the file extension) and the file hash. If the file already exists (note: the file itself, not the temporary folder), the file does not need to be uploaded at all, i.e. instant upload. Otherwise the server looks in the temporary folder, collects the slices already present, and returns the list of uploaded slices.
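As a rough illustration of that verify logic, here is a minimal Express sketch; the real code is in the repo, and app, UPLOAD_DIR and the exact path layout are my assumptions:

const path = require('path')
const fse = require('fs-extra')

app.post('/verify', express.json(), async (req, res) => {
  const { filename, fileHash } = req.body
  const suffix = path.extname(filename) // e.g. ".mp4"
  const filePath = path.resolve(UPLOAD_DIR, `${fileHash}${suffix}`) // the merged file
  if (fse.existsSync(filePath)) {
    // The complete file already exists: instant upload
    res.json({ shouldUpload: false, uploadedList: null })
  } else {
    const chunkDir = path.resolve(UPLOAD_DIR, fileHash) // the temporary slice folder
    const uploadedList = fse.existsSync(chunkDir)
      ? await fse.readdir(chunkDir) // slice names already received
      : []
    res.json({ shouldUpload: true, uploadedList })
  }
})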

async handleUpload() {
  if (!this.file) {
    this.$message({
      type: "error",
      message: "Please select the file you want to upload"
    })
    return
  }
  // Slice the file
  this.createFileChunks()
  await this.getFileHash() // added: verify needs the hash
  const { shouldUpload, uploadedList } = await this.verifyUpload() // added
  // The file already exists on the server in full
  if (!shouldUpload) {
    this.$message({
      type: "success",
      message: "File uploaded instantly"
    })
    this.curStatus = this.uploadStatus.success
    return
  }
}

async verifyUpload() {
  const { data } = await axios({
    url: '/verify',
    method: 'post',
    data: { filename: this.file.name, fileHash: this.hash }
  })
  return data
}

File upload

Add tasks

Next, filter out the slices that are already uploaded and upload the rest

async handleUpload() {
  // ...
  const chunksNeedUpload = this.fileChunks
    .map((chunk, index) => ({
      chunk,
      fileHash: this.hash, // Hash of the entire file
      hash: `${this.hash}_${index}` // Hash of each slice: the file hash plus the slice index, separated by an underscore; the back end splits on this
    }))
    .filter(({ hash }) => !uploadedList.includes(hash))
  this.uploadedLen = uploadedList.length
  this.uploadChunks(chunksNeedUpload)
}

The next method, uploadChunks, is the core. First, since the file is uploaded with the multipart/form-data MIME type, we build the request body with FormData

async uploadChunks(chunksNeedUpload) {
  this.curStatus = this.uploadStatus.uploading
  chunksNeedUpload
    .map(({ chunk, fileHash, hash }) => {
      const formData = new FormData()
      formData.append('chunk', chunk)
      formData.append('hash', hash)                // slice hash
      formData.append('fileHash', fileHash)        // file hash
      formData.append('filename', this.file.name) // original file name, required by the back end
      return formData
    })
}

Next, build the request list: wrap each request in a function and run them through the concurrency scheduler

async uploadChunks(chunksNeedUpload) {
  this.curStatus = this.uploadStatus.uploading
  this.Scheduler = new Scheduler(CONCURRENCY_LIMIT) // added
  chunksNeedUpload
    .map(({ chunk, fileHash, hash }) => {
      const formData = new FormData()
      formData.append('chunk', chunk)
      formData.append('hash', hash)                // slice hash
      formData.append('fileHash', fileHash)        // file hash
      formData.append('filename', this.file.name) // original file name
      return formData
    })
    .forEach((formData, index) => {
      // taskFn returns the promise so the Scheduler can track it
      const taskFn = () =>
        axios({
          url: '/',
          method: 'post',
          headers: { 'Content-Type': 'multipart/form-data' },
          data: formData
        }).then(() => this.uploadedLen++) // drives the progress bar
      this.Scheduler.append(taskFn, index) // added
    })
  const { status } = await this.Scheduler.done() // added
  if (status === 'success') {
    this.mergeRequest()
  } else {
    this.curStatus = this.uploadStatus.fail // show the retry button
    this.$message({
      type: 'error',
      message: 'File upload failed, please try again'
    })
  }
}

The key piece here is the Scheduler class. Tasks are added by calling the append method; calling done starts the scheduling and returns a promise that resolves with two values. status is the overall scheduling result: 'success' means every task succeeded, 'fail' means some task still failed after all retries. The second value, res, is an array of results with the same shape as the return value of Promise.allSettled.
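One thing the walkthrough skips is mergeRequest itself; here is a minimal sketch matching the /merge interface described in the back-end section below (deriving the suffix from the file name is my assumption):

// Minimal sketch of mergeRequest; see the /merge interface described later
async mergeRequest() {
  const suffix = this.file.name.slice(this.file.name.lastIndexOf('.')) // e.g. ".mp4"
  await axios({
    url: '/merge',
    method: 'post',
    data: { hash: this.hash, suffix }
  })
  this.curStatus = this.uploadStatus.success
}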

Concurrency control and error retry for task scheduling

The next step is to see how the asynchronous task scheduler is implemented. See /utils/scheduler.js

The task scheduler’s job is: cap the number of concurrent asynchronous tasks, retry each task when it fails, and mark a task as completely failed once it exceeds the retry limit.

The state of each task is defined first:

const STATUS = {
  waiting: 'waiting', // Waiting to run
  running: 'running', // Running
  error: 'error',     // Failed, but can be retried
  fail: 'fail',       // Still failing after the retry limit
  success: 'success'  // Succeeded
}

The overall result of the task scheduling:

const PENDING = 'pending'
const SUCCESS = 'success' // All tasks succeeded
const FAIL = 'fail'       // Some task still failed after multiple retries

Now let’s look at the constructor

constructor(max = 4, retryTime = 3) {
  this.status = PENDING
  this.max = max             // Maximum concurrency
  this.tasks = []            // The task array
  this.promises = []         // Promise array of task results, in order corresponding to the order in which the task was added
  this.settledCount = 0      // The number of tasks that have results, successful or failed after multiple retries
  this.abort = false         // Whether to suspend execution
  this.retryTime = retryTime // Retry times
}

The append method is used to add tasks and is relatively simple

append(handler, index) {
  // Store the handler along with the task's status, retry count so far, and index
  this.tasks.push({ handler, status: STATUS.waiting, retryTime: 0, index })
}

The caller invokes the done method to start everything

done() {
  // The run method starts scheduling the tasks
  return this.run().then(() =>
    // this.promises holds the result promise of each task, in the order the tasks were added.
    // allSettled is redundant for this demo, but it makes the class more generic
    Promise.allSettled(this.promises).then((res) => ({
      status: this.status,
      res
    }))
  )
}

Next comes the core run method

run() {
  return new Promise((resolve) => {
    const start = async () => {
      // ... (shown below)
    }
    for (let i = 0; i < this.max; i++) {
      start()
    }
  })
}

this.max tasks are kicked off in one go, and inside start, start is called recursively whenever a task settles. The start method follows:

const start = async () => {
  const index = this.tasks.findIndex(
    ({ status }) => status === STATUS.waiting || status === STATUS.error
  )
  if (index === -1) return // Note: there is a pitfall here, explained below
  const task = this.tasks[index]
  task.status = STATUS.running
  const promise = task.handler()
  this.promises[task.index] = promise
  promise
    .then(() => {
      task.status = STATUS.success
      this.settledCount += 1
      if (this.settledCount >= this.tasks.length) {
        if (this.status === PENDING) {
          this.status = SUCCESS
        }
        resolve()
      } else {
        start()
      }
    })
    .catch(() => {
      // Once the retry limit is exceeded, the task fails for good
      if (task.retryTime >= this.retryTime) {
        task.status = STATUS.fail
        this.settledCount += 1
        this.status = FAIL // One completely failed task fails the whole schedule
      } else {
        task.status = STATUS.error
        task.retryTime += 1
      }
      if (this.settledCount >= this.tasks.length) {
        resolve()
      } else {
        start()
      }
    })
}

This piece implements both features: concurrency control and error retry. Notice that the recursive call to start exists only inside the then and catch callbacks, which guarantees the next task starts only after a previous one settles; since this.max tasks are kicked off in one go initially, the number of concurrent tasks is capped. Next, error retry

const index = this.tasks.findIndex(
  ({ status }) => status === STATUS.waiting || status === STATUS.error
)

This line finds the next task to run, targeting tasks in the waiting or error state. The then callback sets a task’s status to success; in the failure callback, a task that can still be retried is set to error, and a task that has completely failed is set to fail. A retriable failed task is therefore picked up again by the next search, which completes the retry.

if (index === -1) return // Note: there is a pitfall here

The pitfall: suppose the concurrency is 4 and the last 4 tasks are running. Task A finishes first and enters its callback; three tasks are still in flight, so settledCount < this.tasks.length and start() is called again. But no task is left in the waiting or error state, so findIndex returns -1; without the early return, this.tasks[-1] would be undefined and calling its handler would throw.

The then and catch callbacks close over task, so each callback always operates on its own task.
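To see the class on its own, here is a hypothetical standalone usage, unrelated to uploads:

// Hypothetical standalone usage of the Scheduler
const scheduler = new Scheduler(2, 3) // concurrency of 2, up to 3 retries per task
const delays = [300, 100, 200]
delays.forEach((ms, index) => {
  // Each task is a function that returns a promise
  scheduler.append(() => new Promise((resolve) => setTimeout(resolve, ms)), index)
})
scheduler.done().then(({ status, res }) => {
  console.log(status) // 'success' once every task settles successfully
  console.log(res)    // per-task results, shaped like Promise.allSettled's
})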

Suspending and resuming task scheduling

The next feature is pausing and resuming tasks. The reason for not using the native XHR abort is, again, generality: the scheduler can then throttle things other than network requests.

Pausing a task is also easy if we control a promise’s state from the outside (the deferred pattern), a trick also used in the Promises/A+ test suite. The idea is to store the promise’s resolve function externally, so we can settle the promise whenever we want.

setDeferred() {
  let deferred, resolveFn
  deferred = new Promise((resolve) => {
    resolveFn = resolve
  })
  this.deferred = deferred
  this.resolve = resolveFn
}

Call this method in the constructor

constructor() {
  // ...
  this.setDeferred()
}

When this.resolve is executed, this.deferred becomes resolved.
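In isolation, the pattern looks like this toy demo (not part of the project code):

// Toy demo of the deferred pattern
let release
const gate = new Promise((resolve) => {
  release = resolve // keep the resolver outside the promise
})
gate.then(() => console.log('resumed'))
release() // settles the pending promise, releasing everything awaiting it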

Add a judgment to the start function:

const start = async () => {
  if (this.abort) await this.deferred
  // ...
}

Initially this.abort is false; when a pause is needed, the pause method is called

pause() {
  this.abort = true
}

Because this.deferred stays pending, all subsequent tasks have to wait on that promise, and the scheduling is suspended.

Resuming is just as easy

resume() {
  this.abort = false
  this.resolve()     // release the pending deferred
  this.setDeferred() // reset the deferred for the next pause
}

Thus, an asynchronous task scheduler is completed. When the user clicks the pause and resume buttons, the handlePause and handleResume methods are triggered, which call the Scheduler.pause and Scheduler.resume methods, respectively

handlePause() {
  this.Scheduler.pause()
  this.curStatus = this.uploadStatus.abort
}
handleResume() {
  this.Scheduler.resume()
  this.curStatus = this.uploadStatus.uploading
}

Back-end API description

Those familiar with Node.js can skip this section

The back end is based on @yeyan1996’s code (as a front end with zero years of experience, I really don’t know Node), slightly modified to use Express, which makes it look a bit simpler. This section describes the back-end API, mainly for those who, like me, don’t know Node.

Since I don’t write Node.js, the parameters passed around are a bit messy; those who can are welcome to restructure them.

Request URL: /verify. Method: POST

Field      Required   Description
fileHash   yes        Hash of the whole file
filename   yes        Original file name

Response data type: JSON

Field          Description
shouldUpload   false if the file already exists, true otherwise
uploadedList   List of slices already uploaded; null when shouldUpload is false

Request URL: /. Method: POST. Data type: form-data

FormData field   Required   Description
chunk            yes        The file slice
hash             yes        Slice hash, i.e. fileHash_sliceIndex
fileHash         yes        Hash of the whole file
filename         yes        Original file name

This endpoint uses a random number to simulate upload errors.
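As a hedged sketch, the simulation might look roughly like this (the actual code is in the repo):

// Hypothetical sketch: fail a share of chunk uploads to exercise the retry logic
app.post('/', (req, res) => {
  if (Math.random() < 0.2) {
    res.status(500).end() // simulated failure; the Scheduler will retry this slice
    return
  }
  // ... parse the multipart body with multiparty and persist the slice
})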

Request URL: /merge. Method: POST

Field    Required   Description
hash     yes        Hash of the whole file
suffix   yes        The file extension

Response data type: JSON

Field   Description
code    0
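For reference, a minimal sketch of what /merge might do on the server, assuming fs-extra and the same hypothetical UPLOAD_DIR as above (the real implementation is in the repo):

const path = require('path')
const fse = require('fs-extra')

app.post('/merge', express.json(), async (req, res) => {
  const { hash, suffix } = req.body
  const chunkDir = path.resolve(UPLOAD_DIR, hash) // temporary folder named after the hash
  const filePath = path.resolve(UPLOAD_DIR, `${hash}${suffix}`)
  const chunkNames = await fse.readdir(chunkDir)
  // Sort by the slice index after the underscore so the order is preserved
  chunkNames.sort((a, b) => a.split('_')[1] - b.split('_')[1])
  // Append the slices one by one; simple, at the cost of buffering each slice
  for (const name of chunkNames) {
    await fse.appendFile(filePath, await fse.readFile(path.resolve(chunkDir, name)))
  }
  await fse.remove(chunkDir) // delete the temporary folder
  res.json({ code: 0 })
})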

Conclusion

Resumable upload of a large file breaks down into the following steps

  1. Get the file and slice it
  2. Compute the file hash from the slices
  3. Verify whether the file has already been uploaded
  4. Upload the slices
  5. Ask the server to merge the slices

These basic steps have been covered in many articles; the demo here builds on them by adding breakpoint resume, concurrency control, and error retransmission, which I personally consider the core of large file upload. The Scheduler class implemented in this article is a fairly general asynchronous task scheduler that is not limited to Ajax requests.

The source code

gitee

Limitations and outlook

This article only implements the basic features of resumable large file upload; there are many places it could be extended, for example:

  1. Use WebSocket so the server can actively push the merge progress
  2. Start computing the hash as soon as the user selects the file, before they click upload: a kind of precalculation
  3. The slice size and concurrency are fixed; they could adapt to the user’s network speed. For example, on a fast connection, several slices could be merged into one upload via writable streams to improve speed
  4. Type checks before uploading, and so on

You can take the shortcomings above, or your own needs, and extend the demo further. I hope this article helps you!

If you found the article helpful, please give it a like 👍🏻!!