Multithreaded large file download in Java, 30 times faster! Want to learn? I'll teach you!

Preface

In the previous article, “the interview officer does not speak wudu”, which covered the HTTP protocol, we mentioned large file downloads and resumable downloads. In this article we develop a multithreaded file downloader, and at the end we use it to break the download speed limit of Baidu cloud disk.

Some of you may think the title is a bit of clickbait, so let’s look at the final test results first:

Test: downloading a 46M file from Baidu cloud disk; the local maximum download speed is 2M

1. Single-thread download, total time: 603s

2. Multithreaded download, 50 threads, total time: 13s

The result is 46 times faster, so saying 30 times faster was actually modest; I think that deserves some applause.

Source code address: gitee.com/silently952…

If you like it, please remember to star the project.

HTTP Range request header

The Range header is for requests that only need part of a resource. By specifying Range, the client tells the server which byte range of the resource it wants. Format: Range: bytes=start-end

For example, to get bytes 5001-10000:

Range: bytes=5001-10000

You can also specify only the start position and omit the end position, which requests all data from the start position onward:

Range: bytes=5001-

The server receives a request with Range and returns a 206 Partial Content response after processing the request.

Based on Range, we can implement multithreaded file downloads and resumable downloads.
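As a small illustration of the header format described above, here is a minimal sketch that builds Range header values. The class and method names (RangeHeaders, closed, from, length) are hypothetical and are not taken from the article's source code.

```java
/** Builds values for the HTTP Range request header (hypothetical helper). */
class RangeHeaders {

    /** Closed range, e.g. bytes=5001-10000; both ends are inclusive. */
    static String closed(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + "-" + end);
        }
        return "bytes=" + start + "-" + end;
    }

    /** Open-ended range, e.g. bytes=5001-; everything from start to the end of the resource. */
    static String from(long start) {
        if (start < 0) {
            throw new IllegalArgumentException("invalid start: " + start);
        }
        return "bytes=" + start + "-";
    }

    /** Number of bytes covered by a closed range; both ends count. */
    static long length(long start, long end) {
        return end - start + 1;
    }
}
```

With the JDK's HttpURLConnection, such a value could be set with setRequestProperty("Range", RangeHeaders.closed(5001, 10000)). Because both ends are inclusive, that range covers 5000 bytes.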

Preparation

In this article we use RestTemplate from Spring MVC. Since the Baidu cloud link is HTTPS, we need to configure RestTemplate to bypass certificate verification.

pom.xml

Write a builder for RestTemplate that bypasses HTTPS certificate validation
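The author's builder is not reproduced here. As a rough idea of what bypassing certificate validation means at the JDK level, the sketch below creates an SSLContext whose trust manager accepts every certificate. The class name TrustAllSslContext is hypothetical, and wiring the context into RestTemplate is left out because it depends on the HTTP client library in use. It assumes a throwaway test setup like the one in this article.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: an SSLContext that skips certificate validation (testing only). */
class TrustAllSslContext {

    static SSLContext create() {
        // A trust manager that never rejects a certificate chain.
        TrustManager trustAll = new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // accept everything
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // accept everything
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] {trustAll}, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not create SSLContext", e);
        }
    }
}
```

This only disables certificate chain checks; hostname verification is configured separately on the HTTP client.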

During the download process, we need to know what the current download speed is, so we need to define an interface that displays the download speed

To calculate the download speed, we need to know how many bytes are transferred per second. To monitor the data transfer, we need to understand the ResponseExtractor interface in Spring MVC.

The interface has only one method, extractData, which is called once the connection between client and server is established; we can monitor the download speed inside this method.

AbstractDisplayDownloadSpeedResponseExtractor, an abstract implementation of the DisplayDownloadSpeed interface

Class diagram of the main classes in the project
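The author's classes are not shown here. To illustrate counting bytes per second while a response body is being read, here is a JDK-only sketch. The SpeedListener interface and the SpeedMonitoringCopy class are hypothetical stand-ins, and the real DisplayDownloadSpeed interface may look different.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Copies a stream and reports the transfer speed roughly once per second. */
class SpeedMonitoringCopy {

    /** Hypothetical callback that receives the measured speed. */
    interface SpeedListener {
        void onSpeed(long bytesPerSecond);
    }

    /** Returns the total number of bytes copied from in to out. */
    static long copy(InputStream in, OutputStream out, SpeedListener listener) {
        byte[] buffer = new byte[8192];
        long total = 0;
        long windowBytes = 0;                  // bytes seen in the current window
        long windowStart = System.nanoTime();  // start of the current window
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
                windowBytes += read;
                long elapsed = System.nanoTime() - windowStart;
                if (elapsed >= 1_000_000_000L) {
                    // Scale to bytes per second, then start a new window.
                    listener.onSpeed(windowBytes * 1_000_000_000L / elapsed);
                    windowBytes = 0;
                    windowStart = System.nanoTime();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

In a ResponseExtractor, the input stream would be the response body. A transfer that finishes in under a second never triggers the listener in this sketch.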

Simple file downloader

Here we call restTemplate.execute to fetch the file as a byte array and then write it directly to the target file.

Note that this approach holds the whole file's byte array in memory, which consumes a lot of memory. Let's see how it is implemented.

Create the ByteArrayResponseExtractor class, which extends AbstractDisplayDownloadSpeedResponseExtractor

Call restTemplate.execute to perform the download and save the byte data to the file

Test: downloading IDEA (819M)

After running for a while, we can see that about 800M of memory has been used, so this approach is only suitable for small files; if we download a file of several gigabytes, there will certainly not be enough memory. As for the download time, the file was so large that I ended the program before the download finished.

Single-threaded large file download

The approach above can only download small files, so how should we download large ones? We can write the stream to a file instead of to memory. Next we implement large file download.

Create the FileResponseExtractor class, which extends AbstractDisplayDownloadSpeedResponseExtractor and writes the stream to a file

The downloader first writes the stream to a temporary download file (xxxxx.download) and renames it after the download completes.

Test: downloading IDEA (819M)

After running for a while, we check the memory usage and find that this approach consumes less memory, which is the result we wanted. Download time: 199s
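The idea of streaming to a temporary file and renaming it afterwards can be sketched with the JDK alone. The TempFileDownload class and its save method are hypothetical and are not the author's FileResponseExtractor. The sketch assumes the response body is available as an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Streams a body to "name.download" and renames it to the final name when done. */
class TempFileDownload {

    static Path save(InputStream body, Path target) {
        // The temporary file sits next to the target, e.g. idea.zip.download
        Path temp = target.resolveSibling(target.getFileName() + ".download");
        try {
            // Files.copy streams through a small buffer, so the file never sits fully in memory.
            Files.copy(body, temp, StandardCopyOption.REPLACE_EXISTING);
            // Rename only after the whole body has been written.
            return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the download is interrupted, only the .download file is left behind, so a half-written file is never mistaken for a finished one.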

Multithreaded file download

If the server does not limit speed, a single thread can usually saturate the local bandwidth, and single-threaded download is enough. But if the server limits the speed so that the download is far slower than the local bandwidth, multithreaded download is worth considering. For multithreading we use CompletableFuture (see the article “CompletableFuture keeps your code from suffering from blocking”).

The basic process of multithreaded file download:

First, get the total size of the file with an HTTP HEAD request
Then divide the file size evenly by the configured number of threads, and calculate the start and end byte positions for each thread
Start the threads, set the Range request header on each request, and download the data to temporary files
After the downloads finish, merge the temporary files from all threads into a single file
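The steps above can be sketched as follows. This is a simplified illustration rather than the author's implementation. The HEAD request is replaced by a totalSize parameter, the ranged GET is hidden behind a hypothetical RangeFetcher interface, and the parts are merged in memory for brevity, whereas the article writes each part to a temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MultiThreadDownloadSketch {

    /** Inclusive byte range, as used in "Range: bytes=start-end". */
    static final class ByteRange {
        final long start;
        final long end;

        ByteRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for "send a GET with a Range header and return the body". */
    interface RangeFetcher {
        byte[] fetch(ByteRange range);
    }

    /** Splits [0, totalSize) evenly; the last range absorbs the remainder. */
    static List<ByteRange> split(long totalSize, int threads) {
        if (totalSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("totalSize and threads must be positive");
        }
        int parts = (int) Math.min(threads, totalSize);
        long chunk = totalSize / parts;
        List<ByteRange> ranges = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            long start = i * chunk;
            long end = (i == parts - 1) ? totalSize - 1 : start + chunk - 1;
            ranges.add(new ByteRange(start, end));
        }
        return ranges;
    }

    /** Fetches all ranges in parallel and concatenates the parts in range order. */
    static byte[] download(long totalSize, int threads, RangeFetcher fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<byte[]>> futures = new ArrayList<>();
            for (ByteRange range : split(totalSize, threads)) {
                futures.add(CompletableFuture.supplyAsync(() -> fetcher.fetch(range), pool));
            }
            ByteArrayOutputStream merged = new ByteArrayOutputStream();
            for (CompletableFuture<byte[]> future : futures) {
                byte[] part = future.join(); // wait for this part; order is preserved
                merged.write(part, 0, part.length);
            }
            return merged.toByteArray();
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, split(10, 3) yields the ranges 0-2, 3-5 and 6-9. Joining the futures in list order keeps the merged bytes in the right sequence even when later ranges finish first.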

The complete code is as follows:

Test: downloading IDEA (819M) with 30 threads

Because 30 threads download at the same time, memory consumption is higher than with a single thread, but still within an acceptable range. Download time: 81s, about 2.5 times faster. The IDEA download server has no speed limit, so multithreading here only squeezes more out of the local bandwidth, and the improvement is modest.

Single-threaded versus multithreaded download

Baidu cloud disk limits the download speed of a single thread to about 100KB per second, so we use a Baidu cloud disk download link to compare multithreaded and single-threaded download speed.

Test: download speed for a 46M file on Baidu cloud disk; the local maximum download speed is 2M
Get the download address of the file

Note: a link copied from the browser must be URL-decoded first (URLDecode), otherwise the download will fail. The download link of a Baidu cloud disk file is also time-limited and cannot be used after it expires, in which case you need to generate a new download link.
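Decoding a copied link can be done with the JDK's java.net.URLDecoder, as in the minimal sketch below. The DownloadLinkDecoder class and the example.com link in the usage note are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical helper that URL-decodes a link copied from the browser. */
class DownloadLinkDecoder {

    static String decode(String copiedLink) {
        try {
            // Turns percent-escapes such as %20 or %3D back into the original characters.
            return URLDecoder.decode(copiedLink, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, decode("https://example.com/file?fn=idea%20test.zip&sign=a%3Db") returns "https://example.com/file?fn=idea test.zip&sign=a=b". URLDecoder also turns a literal + into a space, which matters if the link contains unescaped plus signs.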

Test single-threaded file download

The result shows that Baidu cloud's limit on single-thread download speed is really harsh; downloading the 46M file took 600s

Test multithreaded file download

To make full use of the network speed and find the most suitable number of threads, we tested the download time with different numbers of threads

Number of threads	Total download time
10	60s
20	30s
30	21s
40	15s
50	13s

Based on the test results, about 30 threads is an appropriate setting in my environment

Closing words (follow me so you don’t get lost)

The article may contain shortcomings and mistakes; suggestions and opinions are welcome in the comments.

Finally, writing takes effort, so please don't just freeload; I hope you can like, comment, and follow, because that is what motivates me to keep sharing 🙏

How would you implement resumable downloads? You are welcome to share your ideas in the comments.

This article is purely for learning purposes