MD5 (Message Digest Algorithm 5) is a hash algorithm. Its core idea is to map raw data of arbitrary length to a fixed-length, 128-bit value.

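In practice, different raw data produces different MD5 values, so an MD5 value can be used to tell files apart. Collisions are possible and can be constructed deliberately, so MD5 is suitable for identifying files but not for security purposes.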
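As a quick illustration of the fixed-length property, the sketch below hashes inputs of very different lengths. It assumes a Node.js environment and uses the built-in node:crypto module as a stand-in for spark-md5 (which this article uses in the browser); the helper name md5Hex is made up for this example.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: returns the MD5 digest of a string or Buffer as hex.
function md5Hex(data) {
  return createHash("md5").update(data).digest("hex");
}

const shortDigest = md5Hex("hello");
const longDigest = md5Hex("hello".repeat(100000));

console.log(shortDigest); // 5d41402abc4b2a76b9719d911017c592
// 128 bits are always written as 32 hex characters, whatever the input size.
console.log(shortDigest.length, longDigest.length); // 32 32
console.log(shortDigest === longDigest); // false
```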

In file-upload scenarios, MD5 is commonly used to implement resumable uploads and instant uploads (skipping the transfer when the server already has a file with the same MD5).

This article briefly describes how to generate a file's MD5 in the browser using the spark-md5 library.

TL;DR

  • To keep things simple, for files under 10 MB, use Method 1 directly
  • To keep things simple, for files under 30 MB, use Method 2 directly
  • For anything larger, use Method 3
  • Method 3 also works in all of the scenarios above

Method 1: Generate MD5 of the file

The basic idea for generating a file's MD5 is as follows:

  • Create a FileReader instance and read the file
  • Once the read succeeds, pass the result directly to SparkMD5.hashBinary

The full code is at the end of the article.

Drawbacks

With this approach, MD5 generation is slow for larger files. For example, generating the MD5 of a 40 MB file takes about 1 s. While the hash is being computed, other interactions on the page are blocked and the page appears frozen.

For example, add a button with a click handler. Select a file and then click the button immediately: the larger the file, the longer the alert box takes to appear.

<input id="upload" type="file" />
<!-- Add a button here and click it right after selecting a file -->
<button onclick="alert(1)">Test thread blocking</button>
<script>
  const upload = document.querySelector("#upload");
  upload.onchange = async (e) => {
    const file = e.target.files[0];
    console.time("timeCreateMd5");
    // createFileMd5 is the Method 1 function (full code at the end of the article)
    const md5 = await createFileMd5(file);
    console.log(file.size);
    // Prints something like: timeCreateMd5: 959.31396484375 ms
    console.timeEnd("timeCreateMd5");
  };
</script>

Method 2: Generate MD5 in a worker

Therefore, we use a Web Worker to compute the hash in a worker thread, so that the user can still interact with the page normally without it freezing. The page is modified to pass the file to the worker; the full page code is at the end of the article.

The new hash.js file, executed in the worker, is as follows:

self.importScripts("https://unpkg.com/spark-md5/spark-md5.min.js");

// Generate the file hash
self.onmessage = async (e) => {
  const file = e.data;
  const md5 = await createFileMd5(file);
  self.postMessage(md5);
};

function createFileMd5(file) {
  // ...
  // Same as in Method 1, except that SparkMD5 is accessed through self
  isSuccess
    ? resolve(self.SparkMD5.hashBinary(result))
    : reject(new Error("Reading error"));
}


Generating the MD5 of a large file still takes a long time, but at least it no longer blocks the page's main thread.

This also has a drawback

As you can see, the MD5 is computed by reading the entire file at once, which consumes a great deal of memory when the file is large. The file therefore needs to be read in chunks to generate the MD5.

Method 3: Generate MD5 by reading the file in chunks

spark-md5 also recommends reading the file in chunks, similar to streams in Node.js, so that a large amount of memory is not required.

First, divide the File into chunks of a fixed size. Here, File.slice is used directly.
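To make the slicing step concrete, here is a minimal sketch that splits a Blob into fixed-size chunks with slice. File inherits slice from Blob, so the same call works on a File taken from an input element. The example imports Blob from node:buffer only so that it can run under Node.js (in the browser, Blob and File are globals); the name createChunks and the 4-byte chunk size are chosen purely for illustration.

```javascript
const { Blob } = require("node:buffer");

// Split a Blob (or File) into chunks of at most `size` bytes.
function createChunks(blob, size) {
  const chunks = [];
  for (let start = 0; start < blob.size; start += size) {
    // slice(start, end) clamps `end` to blob.size, so the last chunk may be shorter.
    chunks.push(blob.slice(start, start + size));
  }
  return chunks;
}

const blob = new Blob(["0123456789"]); // 10 bytes
const chunkSizes = createChunks(blob, 4).map((chunk) => chunk.size);
console.log(chunkSizes); // [ 4, 4, 2 ]
```

Slicing does not copy or read the file's bytes; each chunk is only read when a reader is pointed at it.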

The chunks are then passed to another thread, which computes the MD5. Large files may need a progress bar, so the worker also reports progress; use it as needed.
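The sketch below illustrates why chunked reading gives the same result as hashing the whole file: each chunk is appended to a running hash, and only one chunk is read into memory at a time. It assumes Node.js and uses the incremental update/digest calls of node:crypto as a stand-in for spark-md5's append/end in the browser; hashInChunks and onProgress are hypothetical names.

```javascript
const { createHash } = require("node:crypto");
const { Blob } = require("node:buffer");

// Hash a Blob chunk by chunk, reporting progress (0-100) after each chunk.
async function hashInChunks(blob, chunkSize, onProgress = () => {}) {
  const hash = createHash("md5");
  const total = Math.ceil(blob.size / chunkSize);
  for (let index = 0; index < total; index++) {
    const start = index * chunkSize;
    const chunk = blob.slice(start, start + chunkSize);
    // Only this chunk's bytes are read into memory.
    hash.update(Buffer.from(await chunk.arrayBuffer()));
    onProgress(((index + 1) / total) * 100);
  }
  return hash.digest("hex");
}

const data = "hello world";
const wholeDigest = createHash("md5").update(data).digest("hex");

hashInChunks(new Blob([data]), 4, (p) => console.log(`${p.toFixed(0)}%`)).then(
  (chunkedDigest) => console.log(chunkedDigest === wholeDigest) // true
);
```

The progress value here is simply the share of chunks processed so far, which is the same quantity the worker in this article reports back to the page.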

Appendix: Code

Code: generate MD5 for a small file

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <input id="upload" type="file" />
    <script src="https://unpkg.com/spark-md5/spark-md5.min.js"></script>
    <script>
      const upload = document.querySelector("#upload");
      upload.onchange = async (e) => {
        const file = e.target.files[0];
        const md5 = await createFileMd5(file);
        console.log(md5);
      };
      function createFileMd5(file) {
        return new Promise((resolve, reject) => {
          // Create a FileReader instance
          const fileReader = new FileReader();
          // Start reading the file
          fileReader.readAsBinaryString(file);
          // The load event fires once the file has been read
          fileReader.onload = (e) => {
            // e.target is the fileReader instance
            console.log(e.target);
            // result is the content that fileReader read
            const result = e.target.result;
            // The read succeeded if the length read equals the file size
            const isSuccess = file.size === result.length;
            // On success, resolve with the MD5; otherwise reject with an error
            isSuccess
              ? resolve(SparkMD5.hashBinary(result))
              : reject(new Error("Reading error"));
          };
          // An error occurred while reading
          fileReader.onerror = () => reject(new Error("Reading error"));
        });
      }
    </script>
  </body>
</html>

Code: generate MD5 in a worker

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <input id="upload" type="file" />
    <button onclick="alert(1)">Test thread blocking</button>
    <script>
      const upload = document.querySelector("#upload");
      upload.onchange = async (e) => {
        const file = e.target.files[0];
        console.time("timeCreateMd5");
        const md5 = await createFileMd5InWorker(file);
        console.log(file.size);
        console.timeEnd("timeCreateMd5");
      };
      // Generate the file's MD5 (Web Worker)
      function createFileMd5InWorker(file) {
        return new Promise((resolve, reject) => {
          // Create a new worker thread that runs hash.js
          const worker = new Worker("./hash.js");
          // Pass the file to the worker
          worker.postMessage(file);
          // Receive the message the worker sends back
          worker.onmessage = (e) => {
            const md5 = e.data;
            md5 && resolve(md5);
          };
          // Reject if the worker raises an error
          worker.onerror = (err) => reject(err);
        });
      }
    </script>
  </body>
</html>

// hash.js
self.importScripts("https://unpkg.com/spark-md5/spark-md5.min.js");

// Generate the file's MD5
self.onmessage = async (e) => {
  const file = e.data;
  const md5 = await createFileMd5(file);
  self.postMessage(md5);
  self.close();
};

function createFileMd5(file) {
  return new Promise((resolve, reject) => {
    // Create a FileReader instance
    const fileReader = new FileReader();
    // Start reading the file
    fileReader.readAsBinaryString(file);
    // The load event fires once the file has been read
    fileReader.onload = (e) => {
      // e.target is the fileReader instance
      console.log(e.target);
      // result is the content that fileReader read
      const result = e.target.result;
      // The read succeeded if the length read equals the file size
      const isSuccess = file.size === result.length;
      // On success, resolve with the MD5; otherwise reject with an error
      isSuccess
        ? resolve(self.SparkMD5.hashBinary(result))
        : reject(new Error("Reading error"));
    };
    // An error occurred while reading
    fileReader.onerror = () => reject(new Error("Reading error"));
  });
}


Code: generate MD5 by reading the file in chunks

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <input id="upload" type="file" />
    <button onclick="alert(1)">Test thread blocking</button>
    <script>
      const upload = document.querySelector("#upload");
      upload.onchange = async (e) => {
        const file = e.target.files[0];
        const chunks = createFileChunk(file);
        console.time("timeCreateMd5");
        // createFileMd5InWorker resolves with the hash string itself
        const md5 = await createFileMd5InWorker(chunks);
        console.log(file.size);
        console.timeEnd("timeCreateMd5");
      };

      // Split the file into chunks (4 MB each by default)
      function createFileChunk(file, size = 4 * 1024 * 1024) {
        const chunks = [];
        let cur = 0;
        while (cur < file.size) {
          chunks.push(file.slice(cur, cur + size));
          cur += size;
        }
        return chunks;
      }
      // Generate the file hash (Web Worker)
      function createFileMd5InWorker(fileChunks) {
        return new Promise((resolve) => {
          const worker = new Worker("./hash.js");
          worker.postMessage({ fileChunks });
          worker.onmessage = (e) => {
            // A progress bar can be updated here
            const { percentage, hash } = e.data;
            console.log(percentage);
            // Once the hash has been computed, resolve with it
            hash && resolve(hash);
          };
        });
      }
    </script>
  </body>
</html>

// hash.js, copied directly from https://juejin.cn/post/6844904046436843527#heading-17
self.importScripts("./js/lib/spark-md5.min.js"); // Import the script

// Generate the file hash
self.onmessage = (e) => {
  const { fileChunks } = e.data;
  console.log(fileChunks);
  const spark = new self.SparkMD5.ArrayBuffer();
  let percentage = 0;
  let count = 0;
  // Read one chunk at a time and append it to the running hash
  const loadNext = (index) => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(fileChunks[index]);
    reader.onload = (e) => {
      count++;
      spark.append(e.target.result);
      if (count === fileChunks.length) {
        // All chunks have been read: send the final hash and stop the worker
        self.postMessage({
          percentage: 100,
          hash: spark.end(),
        });
        self.close();
      } else {
        // Report progress, then read the next chunk
        percentage += 100 / fileChunks.length;
        self.postMessage({ percentage });
        loadNext(count);
      }
    };
  };
  loadNext(0);
};



References

  • ByteDance interviewer: please implement large file upload with resumable upload
  • Converting an input file to a Blob and a byte stream in JS
  • Learn MD5 in 3 minutes