background

Yes, you read that right: front-end multithreading, not Node. This exploration started with a recent development task, where a requirement related to video streaming surfaced a special status code, and its name is 206~

To keep this article from getting dry, let's start with the results (using a 3.7 MB image as the example).

Animation comparison (Single thread – left vs. 10 threads – right)

Time comparison (single thread vs. 10 threads)

If that gets you excited, stay with me and let's dig into how it all works.

GET /360_0388.jpg HTTP/1.1
Host: limit.qiufeng.com
Connection: keep-alive
...
Range: bytes=0-102399

HTTP/1.1 206 Partial Content
Server: OpenResty/1.13.6.2
Date: Sat, 19 Sep 2020 06:31:11 GMT
Content-Type: image/jpeg
Content-Length: 102400
...
Content-Range: bytes 0-102399/3670627
...
(the file body follows here)

The request carries Range: bytes=0-102399, the response carries Content-Range: bytes 0-102399/3670627, and the status code returned is 206.

So what exactly is Range? I wrote an article about file downloading a few days ago that mentioned how to download large files and something called Range. That article was a broad overview of file downloading, though, so it did not introduce Range in detail.

All of the following codes are available at github.com/hua1995116/…

Basic introduction to Range

The origin of the Range

Range is a request header introduced in HTTP/1.1, and it is the core mechanism behind multi-threaded downloading and resumable (breakpoint) downloads in tools such as Thunder. (Introductory copy, excerpted.)

First, the client sends a request. If the server supports range requests, it adds Accept-Ranges: bytes to the response header to advertise that support. The client may then send requests carrying Range: bytes=0-xxx.

The server uses the Range: bytes=0-xxx request header to decide whether to perform range processing. If the value exists and is valid, only the requested part of the file is sent back, with response status 206 (Partial Content). If the range is invalid, it returns 416 (Range Not Satisfiable). If there is no Range header at all, the server responds normally and does not set Content-Range.

Status code   Description
206           Partial Content
416           Range Not Satisfiable
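To see this handshake in action, here is a minimal sketch of issuing a range request from the browser with fetch. It assumes the Node endpoint set up later in this article; any server that honors Range will do.

// A quick check: request the first 1 KB of a file and inspect the response.
fetch('http://localhost:8888/api/rangeFile?filename=360_0388.jpg', {
    headers: { Range: 'bytes=0-1023' }
})
.then(res => {
    console.log(res.status);                       // 206 when the server honors the range
    console.log(res.headers.get('Content-Range')); // e.g. "bytes 0-1023/3670627"
    return res.arrayBuffer();
})
.then(buf => console.log(buf.byteLength));         // 1024 bytes of partial content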

The format of Range is:

Range: <unit>=<first byte pos>-<last byte pos>

That is: Range: bytes=<start byte position>-<end byte position>, where bytes is the unit.

For example, suppose we enable multi-threaded downloading and need to download a 5000-byte file across four threads.

  • Range: bytes=0-1199 (the first 1200 bytes)
  • Range: bytes=1200-2399 (the second 1200 bytes)
  • Range: bytes=2400-3599 (the third 1200 bytes)
  • Range: bytes=3600-4999 (the last 1400 bytes)

The server responds:

The first response

  • Content-Length: 1200
  • Content-Range: bytes 0-1199/5000

The second response

  • Content-Length: 1200
  • Content-Range: bytes 1200-2399/5000

The third response

  • Content-Length: 1200
  • Content-Range: bytes 2400-3599/5000

The fourth response

  • Content-Length: 1400
  • Content-Range: bytes 3600-4999/5000

If each request succeeds, the server returns a Content-Range field in the response header. Content-Range tells the client which part of the entity was sent, describing both the range of this response and the overall entity length. The general format is:

Content-Range: <unit> <first byte pos>-<last byte pos>/<entity length>, i.e. start byte position-end byte position/file size.
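To make the slicing arithmetic concrete, here is a small illustrative helper (not from the original repo; buildRanges is a made-up name) that reproduces the ranges in the example above: fixed-size chunks, with the last chunk absorbing the remainder.

// Split a file of `size` bytes into chunks of `m` bytes; the last chunk takes the remainder.
function buildRanges(size, m) {
    const count = Math.floor(size / m);
    const ranges = [];
    for (let i = 0; i < count; i++) {
        const start = i * m;
        const end = i === count - 1 ? size - 1 : start + m - 1; // byte positions are inclusive
        ranges.push({ start, end });
    }
    return ranges;
}

// buildRanges(5000, 1200) =>
// [ { start: 0, end: 1199 }, { start: 1200, end: 2399 },
//   { start: 2400, end: 3599 }, { start: 3600, end: 4999 } ]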

Browser Support

All major browsers currently support this feature.

Server support

Nginx

Nginx supports this automatically by default after version 1.9.8 (together with ngx_http_slice_module). You can set max_ranges to 0 to turn it off.

Node

Node does not provide Range method handling by default, so you need to write your own code to handle it.

router.get('/api/rangeFile', async (ctx) => {
    const { filename } = ctx.query;
    const { size } = fs.statSync(path.join(__dirname, './static/', filename));
    const range = ctx.headers['range'];
    if (!range) {
        // No Range header: respond with the whole file and advertise range support.
        ctx.set('Accept-Ranges', 'bytes');
        ctx.body = fs.readFileSync(path.join(__dirname, './static/', filename));
        return;
    }
    // getRange parses "bytes=start-end" from the Range header (helper from the repo).
    const { start, end } = getRange(range);
    if (start >= size || end >= size) {
        // Requested range lies outside the file: 416 Range Not Satisfiable.
        ctx.response.status = 416;
        ctx.body = '';
        return;
    }
    ctx.response.status = 206;
    ctx.set('Accept-Ranges', 'bytes');
    ctx.set('Content-Range', `bytes ${start}-${end ? end : size - 1}/${size}`);
    ctx.body = fs.createReadStream(path.join(__dirname, './static/', filename), { start, end });
})
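The getRange helper used above is not shown in the snippet. A minimal sketch of what such a parser might look like (hypothetical; the version in the repo may differ) is:

// Hypothetical sketch of getRange: parse "bytes=start-end" from the Range header.
// A real implementation may also handle suffix ranges and multiple ranges.
function getRange(range) {
    const match = /bytes=(\d*)-(\d*)/.exec(range) || [];
    const start = match[1] ? parseInt(match[1], 10) : 0;
    const end = match[2] ? parseInt(match[2], 10) : undefined; // open-ended: stream to end of file
    return { start, end };
}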

Or you can use the koa-send library.

Github.com/pillarjs/se…

Practice Range

The architecture overview

Let's start with an overview of the architecture. The single-threaded case is simple: just a normal download (see my previous article if this is unfamiliar). The multi-threaded case is more involved: the file has to be downloaded in pieces, and once every piece has arrived they must be merged before the final download is triggered. (For blob and other download methods, see the previous article.)
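Since the diagram itself doesn't survive in this text, here is a rough outline of the two flows as comments. It is only a summary of the code that follows, not new functionality.

// Single-threaded flow:
//   GET /file  ->  Blob  ->  URL.createObjectURL(blob)  ->  <a download>.click()
//
// Multi-threaded flow:
//   HEAD /file                      -> read Content-Length
//   split [0, size) into N ranges   -> one GET with Range: bytes=start-end per slice
//   Promise.all(slices)             -> sort by slice index, merge buffers into one Uint8Array
//   new Blob([merged])              -> URL.createObjectURL -> <a download>.click()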

Server code

It is very simple: the same Range-compatible handler shown earlier.

router.get('/api/rangeFile', async (ctx) => {
    const { filename } = ctx.query;
    const { size } = fs.statSync(path.join(__dirname, './static/', filename));
    const range = ctx.headers['range'];
    if (!range) {
        ctx.set('Accept-Ranges', 'bytes');
        ctx.body = fs.readFileSync(path.join(__dirname, './static/', filename));
        return;
    }
    const { start, end } = getRange(range);
    if (start >= size || end >= size) {
        ctx.response.status = 416;
        ctx.body = '';
        return;
    }
    ctx.response.status = 206;
    ctx.set('Accept-Ranges', 'bytes');
    ctx.set('Content-Range', `bytes ${start}-${end ? end : size - 1}/${size}`);
    ctx.body = fs.createReadStream(path.join(__dirname, './static/', filename), { start, end });
})

html

Next comes the HTML. There is not much to say about it: just two buttons for the demo.

<!-- html -->
<button id="download1">Serial download</button>
<button id="download2">Multithreaded download</button>
<script src="https://cdn.bootcss.com/axios/0.19.2/axios.min.js"></script>

Shared JS parameters

const m = 1024 * 520;  // The size of each fragment
const url = 'http://localhost:8888/api/rangeFile?filename=360_0388.jpg'; // The address to download

Single threaded part

For the single-threaded download, the code simply requests the file as a blob and then triggers the download via a blob URL.

download1.onclick = () => {
    console.time("Direct download");
    function download(url) {
        const req = new XMLHttpRequest();
        req.open("GET", url, true);
        req.responseType = "blob";
        req.onload = function (oEvent) {
            const content = req.response;
            const aTag = document.createElement('a');
            aTag.download = '360_0388.jpg';
            const blob = new Blob([content]);
            const blobUrl = URL.createObjectURL(blob);
            aTag.href = blobUrl;
            aTag.click();
            URL.revokeObjectURL(blobUrl); // revoke the object URL, not the blob itself
            console.timeEnd("Direct download");
        };
        req.send();
    }
    download(url);
}

Multithreaded part

A HEAD request is sent first to get the size of the file, and the offset of each fragment is computed from that length and the configured fragment size. In the Promise.all callback, the fragment buffers are merged into a single blob using concatenate and downloaded via a blob URL.

// script
function downloadRange(url, start, end, i) {
    return new Promise((resolve, reject) => {
        const req = new XMLHttpRequest();
        req.open("GET", url, true);
        req.setRequestHeader('range', `bytes=${start}-${end}`);
        req.responseType = "blob";
        req.onload = function (oEvent) {
            req.response.arrayBuffer().then(res => {
                resolve({
                    i,
                    buffer: res
                });
            });
        };
        req.send();
    })
}

// Merge the fragment buffers into a single typed array.
function concatenate(resultConstructor, arrays) {
    let totalLength = 0;
    for (let arr of arrays) {
        totalLength += arr.length;
    }
    let result = new resultConstructor(totalLength);
    let offset = 0;
    for (let arr of arrays) {
        result.set(arr, offset);
        offset += arr.length;
    }
    return result;
}

download2.onclick = () => {
    axios({
        url,
        method: 'head',
    }).then((res) => {
        // Get the length so the file can be split into blocks.
        console.time("Concurrent download");
        const size = Number(res.headers['content-length']);
        const length = parseInt(size / m);
        const arr = [];
        for (let i = 0; i < length; i++) {
            let start = i * m;
            let end = (i == length - 1) ? size - 1 : (i + 1) * m - 1;
            arr.push(downloadRange(url, start, end, i));
        }
        Promise.all(arr).then(res => {
            // Sort fragments by index before merging.
            const arrBufferList = res.sort((a, b) => a.i - b.i).map(item => new Uint8Array(item.buffer));
            const allBuffer = concatenate(Uint8Array, arrBufferList);
            const blob = new Blob([allBuffer], { type: 'image/jpeg' });
            const blobUrl = URL.createObjectURL(blob);
            const aTag = document.createElement('a');
            aTag.download = '360_0388.jpg';
            aTag.href = blobUrl;
            aTag.click();
            URL.revokeObjectURL(blobUrl); // revoke the object URL, not the blob
            console.timeEnd("Concurrent download");
        });
    });
}

Complete sample

Github.com/hua1995116/…

// cd into the directory
cd file-download
// start the server
node server.js
// open http://localhost:8888/example/download-multiple/index.html

Since Google Chrome has restrictions on a single domain name in HTTP/1.1, the maximum number of concurrent requests for a single domain name is 6.

This is reflected in the Chromium source code and the official discussions.

Discussion thread

Bugs.chromium.org/p/chromium/…

Chromium source

// https://source.chromium.org/chromium/chromium/src/+/refs/tags/87.0.4268.1:net/socket/client_socket_pool_manager.cc;l=47
// Default to allow up to 6 connections per host. Experiment and tuning may
// try other values (greater than 0). Too large may cause many problems, such
// as home routers blocking the connections!?!? See http://crbug.com/12066.
//
// WebSocket connections are long-lived, and should be treated differently
// than normal other connections. Use a limit of 255, so the limit for wss will
// be the same as the limit for ws. Also note that Firefox uses a limit of 200.
// See http://crbug.com/486800
int g_max_sockets_per_group[] = {
    6,   // NORMAL_SOCKET_POOL
    255  // WEBSOCKET_SOCKET_POOL
};

So, to work with this constraint, I split the file into six fragments of 520 KB each (yes, 520, because every piece of code needs a love number), which opens six connections for the download.
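As an aside, if you ever want more slices than the per-host connection limit allows under HTTP/1.1, one option is to cap the number of in-flight requests yourself. Here is a minimal pool sketch (not from the original repo; runWithLimit is a made-up helper) that reuses the downloadRange function above:

// Minimal concurrency pool (illustrative): run at most `limit` download tasks at once.
// `tasks` is an array of () => Promise factories.
async function runWithLimit(tasks, limit) {
    const results = new Array(tasks.length);
    let next = 0;
    async function worker() {
        while (next < tasks.length) {
            const i = next++;           // safe: JS is single-threaded between awaits
            results[i] = await tasks[i]();
        }
    }
    await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
    return results;
}

// Usage sketch (hypothetical): wrap each slice in a factory and cap concurrency at 6.
// const tasks = sliceRanges.map(({ start, end }, i) => () => downloadRange(url, start, end, i));
// runWithLimit(tasks, 6).then(parts => { /* merge as in download2.onclick above */ });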

I ran the single-threaded and multi-threaded downloads six times each, and they seemed to be about the same speed. Why is that different from what we expected?

Explore the cause of failure

I started comparing the two kinds of requests carefully, watching the speed of each.

Six concurrent threads

A single thread

Going by the single-threaded figure (3.7 MB in 82 ms), about 46 KB is downloaded per millisecond. In the concurrent case, however, each fragment of roughly 533 KB takes about 20 ms on average to download (connection time excluded, pure content download time).

So I looked it up and learned about something called downlink speed and uplink speed.

The actual transmission speed of a network is split into an uplink speed and a downlink speed. The uplink speed is the speed at which data is sent, and the downlink speed is the speed at which data is received. ADSL, which ordinary home Internet access is based on, is a transmission mode designed around the habit of sending relatively little data while downloading a lot. For a 4M broadband line, the theoretical maximum download speed is 512 KB/s; that is the downlink speed. (Baidu Encyclopedia)

So where do we stand now?

Comparing the server to a big water pipe, let me illustrate single-threaded and multi-threaded downloads. The server is on the left and the client is on the right. (All of the following cases are idealized, just to illustrate the process; contention from other programs is not considered.)

Single thread

Multithreading

Right: since our server is a big water pipe with a constant flow rate, and there is no limit on the client side, a single thread already runs at the user's maximum speed. With multiple threads, say three, each thread runs at a third of the original speed, so the combined speed is no different from a single thread.

Below I break this down into a few cases: under which circumstances does multi-threading actually take effect?

Server bandwidth greater than user bandwidth, with no per-connection limit

This is essentially the situation we just ran into.

Server bandwidth much higher than user bandwidth, with a per-connection speed limit

If the server limits the download speed of each single connection (which is the most common case; with Baidu Cloud, for example, you may clearly have a 10M broadband line yet only see 100 KB/s), then multi-threaded downloading helps, because the limit is usually applied per TCP connection. Of course, production environments do not let users open an unlimited number of connections; there is a cap on the number of TCP connections per client IP. At that point the effective ceiling is usually the user's own maximum speed: as in the example above, once 10 threads are already saturating your inbound bandwidth, the threads just compete with each other and adding more will not help.

The improved scheme

Since I couldn't find an easy way to throttle download speed in Node, I introduced Nginx.

We cap the speed of each TCP connection at 1 MB/s.

Added limit_rate 1M;

The preparatory work

1. nginx conf

server {
    listen 80;
    server_name limit.qiufeng.com;
    access_log  /opt/logs/wwwlogs/limitqiufeng.access.log;
    error_log  /opt/logs/wwwlogs/limitqiufeng.error.log;

    add_header Cache-Control max-age=60;
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods 'GET, OPTIONS';
    add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,Range,If-Range';
    if ($request_method = 'OPTIONS') {
        return 204;
    }
    limit_rate 1M;
    location / {
        root  <your static directory>;
        index index.html;
    }
}

2. Configure the local host

127.0.0.1 limit.qiufeng.com
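Before comparing the two modes, you can sanity-check the limit_rate cap from the browser console with a rough timing snippet. This is only a sketch, assuming the limit.qiufeng.com host and the 360_0388.jpg file from above; the CORS headers in the config allow it to run from any page.

// Rough sanity check of the 1 MB/s cap: time a full download and compute KB/s.
const t0 = performance.now();
fetch('http://limit.qiufeng.com/360_0388.jpg')
    .then(res => res.blob())
    .then(blob => {
        const seconds = (performance.now() - t0) / 1000;
        console.log(`${(blob.size / 1024 / seconds).toFixed(0)} KB/s over ${seconds.toFixed(2)} s`);
    });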

Looking at the result, the speed is basically as expected: the multi-threaded download is faster than the single-threaded one, roughly 5 to 6 times as fast. I also noticed that if you click download several times in quick succession, the Range-based download gets faster and faster.

Change the download address in the code from

const url = 'http://localhost:8888/api/rangeFile?filename=360_0388.jpg';

to

const url = 'http://limit.qiufeng.com/360_0388.jpg';

Test download speed

Remember that under HTTP/1.1 only 6 requests can be sent concurrently to a site; any additional requests wait for the next batch. HTTP/2 is not subject to this limitation, because multiplexing replaces HTTP/1.x's ordering and blocking mechanisms. Let's test it by upgrading to HTTP/2.

A certificate needs to be generated locally. (Method for generating a certificate: juejin.cn/post/684490…)

server {
    listen 443 ssl http2;
    ssl on;
    ssl_certificate /usr/local/openresty/nginx/conf/ssl/server.crt;
    ssl_certificate_key /usr/local/openresty/nginx/conf/ssl/server.key;
    ssl_session_cache shared:le_nginx_SSL:1m;
    ssl_session_timeout 1440m;

    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers RC4:HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    server_name limit.qiufeng.com;
 
    access_log  /opt/logs/wwwlogs/limitqiufeng2.access.log;
    error_log  /opt/logs/wwwlogs/limitqiufeng2.error.log;

    add_header Cache-Control max-age=60;
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods 'GET, OPTIONS';
    add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,Range,If-Range';
    if ($request_method = 'OPTIONS') {
        return 204;
    }
    limit_rate 1M;
    location / {
        root  /node-demo/file-download/;
        index index.html;
    }
}

10 threads

const m = 1024 * 400;

12 threads

24 threads

Of course, more threads is not always better: testing showed that once the number of threads passes a certain point, the speed actually drops. Below is a screenshot of 36 concurrent requests.

Practical application exploration

So what is the point of all these multi-threaded downloads? As mentioned at the beginning, this chunking mechanism is the core mechanism of download software such as Thunder.

Netease Cloud Classroom

Study.163.com/course/cour…

We opened the console and easily found the download URL: a bare MP4 download address.

Paste our test script into the console.

// The test script is a bit long; if you read the article above carefully you should be able to write it yourself. It is also available here:
// https://github.com/hua1995116/node-demo/blob/master/file-download/example/download-multiple/script.js

Direct download

Multithreaded download

As you can see, since NetEase Cloud Classroom does not limit the download speed of a single TCP connection, the improvement is not that obvious.

Baidu cloud

Let’s test the web version of Baidu Cloud.

Take a 16.6 MB file as an example.

Open the web version of the Baidu Cloud disk interface and click download.

Click pause, then open Chrome -> More -> Downloads -> right-click to copy the download link.

We still use the NetEase Cloud Classroom download script above; you only have to change the parameters.

// Change url to the Baidu Cloud download link.
// Change m to 1024 * 1024 * 2

Direct download

Baidu Cloud's single-TCP-connection speed limit is truly brutal: the direct download took a full 217 seconds!! That is a lot of suffering for a 17 MB file. (VIP users excepted.)

Multithreaded download

Since it is served over HTTP/1.1, we open 6 threads for the download (more would just queue behind the per-host limit). Here is the multi-threaded result: approximately 46 seconds.

Let’s use this diagram to get a feel for the speed difference again.

Really sweet: it's free and relies only on our front-end code to achieve this. Seriously sweet. Why not hurry up and try it?

Drawbacks of this approach

1. There is an upper limit on file size

Since blobs have size limits in the major browsers, this approach has some drawbacks for very large files.

Browser             | Constructs as | Filenames | Max Blob Size | Dependencies
Firefox 20+         | Blob          | Yes       | 800 MiB       | None
Firefox < 20        | data: URI     | No        | n/a           | Blob.js
Chrome              | Blob          | Yes       | 2 GB          | None
Chrome for Android  | Blob          | Yes       | RAM/5         | None
Edge                | Blob          | Yes       | ?             | None
IE 10+              | Blob          | Yes       | 600 MiB       | None
Opera 15+           | Blob          | Yes       | 500 MiB       | None
Opera < 15          | data: URI     | No        | n/a           | Blob.js
Safari 6.1+ *       | Blob          | No        | ?             | None
Safari < 6          | data: URI     | No        | n/a           | Blob.js
Safari 10.1+        | Blob          | Yes       | n/a           | None
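One possible guard (a sketch with an arbitrary conservative cap; the real ceiling varies by browser as the table shows) is to check the size from the HEAD request before committing to the in-memory merge:

// Illustrative guard: skip the in-memory Blob merge for very large files.
// The 500 MiB cap is an arbitrary conservative choice, not a browser constant.
const MAX_BLOB_BYTES = 500 * 1024 * 1024;

async function canMergeInMemory(url) {
    const res = await axios({ url, method: 'head' });
    const size = Number(res.headers['content-length']);
    return size > 0 && size <= MAX_BLOB_BYTES;
}

// if (!(await canMergeInMemory(url))) { /* fall back to a plain <a href> download */ }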

2. The server limits the speed of a single TCP connection

Generally speaking such a limit exists, so in the end the achievable speed comes down to the user's own broadband speed.

At the end

This article was written in a bit of a hurry, so some wording may not be entirely accurate. If you spot a mistake, feel free to point it out.

While investigating, I found there is no acceleration plugin for the web version of Baidu Cloud. Maybe I should go build a web-based Baidu Cloud download plugin ~

reference

Nginx bandwidth control: blog.huoding.com/2015/03/20/…

OpenResty: deploying HTTPS and enabling HTTP/2 support: www.gryen.com/articles/sh…

Talking about HTTP Range: dabing1022.github.io/2016/12/24/…

❤️ thank you

If you find this article helpful:

  1. Please give it a like so that more people can see this content; your likes are my biggest encouragement.
  2. Follow the public account "Notes on the Autumn Wind" to contact the author and explore interesting topics together.
  3. More articles in the upload/download series:
    • This article takes you layer by layer through the secrets of file download
    • Understand the whole process of file uploading (an 18,000-word deep dive, essential for advancing)