Preface
In the previous article, “The interviewer has no martial ethics and grills a junior Java programmer on the HTTP protocol”, we mentioned large file download and resumable download. In this article we will build a multithreaded file downloader, and finally use it to break the download speed limit of Baidu cloud disk.
For those of you who think this sounds like clickbait, let’s look at the final test results first:
Test: download a 46M file from Baidu cloud disk; the local maximum download speed is 2M
1. Single-threaded download, total time: 603s
2. Multithreaded download, 50 threads, total time: 13s
By these results the multithreaded download is about 46 times faster, so claiming 30 times faster is, if anything, too modest. We think that deserves some applause.
Source code address: gitee.com/silently952…
If you find it useful, please remember to give it a star.
HTTP Range request header
The Range header is for requests that only need part of a resource. It tells the server which byte range of the resource to return. Format: Range: bytes=start-end
For example, to request bytes 5001 through 10000:
Range: bytes=5001-10000
You can also specify only the start position and omit the end position, which requests all data from the start position to the end of the resource:
Range: bytes=5001-
When the server receives a request with a Range header and supports range requests, it processes the request and returns a 206 Partial Content response.
Based on this behavior of Range, we can implement both multithreaded file download and resumable download.
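To make the request and response flow concrete, here is a minimal sketch using only JDK classes rather than the RestTemplate used later in the article. The class name RangeRequestSketch, its methods, and the toy local server are all assumptions made for illustration; the server exists only so the example can run without an external download link.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;

public class RangeRequestSketch {

    /** Builds a Range header value; pass end < 0 for an open-ended range. */
    static String rangeValue(long start, long end) {
        return end < 0 ? "bytes=" + start + "-" : "bytes=" + start + "-" + end;
    }

    /** Requests bytes start..end (inclusive) and expects 206 Partial Content. */
    static byte[] fetchRange(URL url, long start, long end) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        conn.setRequestProperty("Range", rangeValue(start, end));
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Expected 206 but got " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Toy local server (demo only) that understands "bytes=start-end" and "bytes=start-". */
    static HttpServer startDemoServer(byte[] content) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/file", exchange -> {
            String range = exchange.getRequestHeaders().getFirst("Range");
            int start = 0;
            int end = content.length - 1;
            int status = 200; // no Range header: return the whole resource
            if (range != null && range.startsWith("bytes=")) {
                String[] bounds = range.substring("bytes=".length()).split("-", 2);
                start = Integer.parseInt(bounds[0]);
                if (!bounds[1].isEmpty()) {
                    end = Integer.parseInt(bounds[1]);
                }
                status = 206;
                exchange.getResponseHeaders().set(
                        "Content-Range", "bytes " + start + "-" + end + "/" + content.length);
            }
            byte[] body = Arrays.copyOfRange(content, start, end + 1);
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The client deliberately fails when the status is not 206, because a server that ignores Range answers 200 with the full body, and treating that as a partial result would corrupt a multi-part download.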
Preparation
In this article we use Spring's RestTemplate. Since the Baidu cloud link uses HTTPS, we need to configure the RestTemplate to bypass certificate verification.
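As a rough idea of what bypassing certificate verification involves, the sketch below builds a trust-all SSLContext with plain JDK classes. It is not the article's RestTemplate configuration; the class and method names are hypothetical, and a context like this disables an important security check, so it should only be used for experiments such as this one.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TrustAllSslSketch {

    /** Returns an SSLContext whose trust manager accepts every certificate chain. */
    static SSLContext trustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = {
            new X509TrustManager() {
                @Override
                public void checkClientTrusted(X509Certificate[] chain, String authType) {
                    // intentionally empty: nothing is rejected
                }

                @Override
                public void checkServerTrusted(X509Certificate[] chain, String authType) {
                    // intentionally empty: nothing is rejected
                }

                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[0];
                }
            }
        };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context;
    }
}
```

The resulting context can be handed to whichever HTTP client is in use, for example through HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory()). Hostname verification is a separate check and is not affected by this sketch.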
- pom.xml
- Write the code that builds the RestTemplate and bypasses HTTPS certificate validation
- During the download we need to know the current download speed, so we define an interface that displays the download speed
To calculate the download speed, we need to know how many bytes are transferred per second. To monitor the data transfer, we need to look at Spring's ResponseExtractor interface.
The interface has only one method, which is called once the connection between client and server has been established; we can monitor the download speed inside this method.
- An abstract implementation of the DisplayDownloadSpeed interface: AbstractDisplayDownloadSpeedResponseExtractor
- Class diagram of the main classes in the project
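The speed monitoring idea described above can be sketched independently of Spring: count the bytes as they are copied, and let a timer read and reset the counter once per second. The class DownloadSpeedSketch and its method names are assumptions for illustration, not the article's DisplayDownloadSpeed implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DownloadSpeedSketch {

    /** Copies in to out and adds every chunk's size to the shared counter. Returns total bytes copied. */
    static long copyAndCount(InputStream in, OutputStream out, AtomicLong counter) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            counter.addAndGet(read);
            total += read;
        }
        return total;
    }

    /** Formats the number of bytes seen during one second as KB/s. */
    static String formatSpeed(long bytesInLastSecond) {
        return String.format(Locale.ROOT, "%.1f KB/s", bytesInLastSecond / 1024.0);
    }

    /** Prints and resets the counter once per second; the caller must shut the executor down. */
    static ScheduledExecutorService startReporter(AtomicLong counter) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(formatSpeed(counter.getAndSet(0))),
                1, 1, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the counter is an AtomicLong, several download threads can share one counter and one reporter, which is convenient once the download is split across threads.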
Simple file downloader
The simple downloader calls restTemplate.execute to fetch the file as a byte array and writes it directly to the target file.
The important thing to note is that this approach holds the whole file in memory as a byte array, which consumes a lot of resources. Let's see how it is done.
- Create the ByteArrayResponseExtractor class, which extends AbstractDisplayDownloadSpeedResponseExtractor
- Call restTemplate.execute to perform the download and save the byte data to the file
- Test: download the 819M IDEA package
After a while, memory usage had reached about 800M, so this approach is only suitable for small files; for files of several gigabytes, memory would certainly run out. As for the download time, the file was too large, so we ended the program without waiting for the download to complete.
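The memory behavior is easy to see in a stripped-down version of the same idea. The sketch below is an assumption-laden stand-in for ByteArrayResponseExtractor: it takes any InputStream as the response body and uses only JDK calls, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemoryDownloadSketch {

    /** Reads the entire body into one byte array, then writes it to the target file. */
    static long saveViaMemory(InputStream body, Path target) throws IOException {
        byte[] all = body.readAllBytes(); // heap usage grows with the size of the file
        Files.write(target, all);
        return all.length;
    }
}
```

The readAllBytes call is the reason an 819M download needs roughly that much heap: nothing reaches the disk until the last byte has arrived.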
Single threaded large file download
The approach above can only handle small files, so how should we download large files? We can write the response stream to a file instead of holding it in memory. Next we implement large file download.
- Create the FileResponseExtractor class, which extends AbstractDisplayDownloadSpeedResponseExtractor and writes the stream to a file
- During the download, the stream is first written to a temporary download file (xxxxx.download); once the download completes, the file is renamed
- Test: download the 819M IDEA package
After running for a while, we check the memory usage and find that this approach consumes much less memory, which is the result we wanted. Download time: 199s
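A minimal JDK-only sketch of the same pattern follows: stream the body into a ".download" file next to the target, then rename it when the copy finishes. It is not the article's FileResponseExtractor; the class and method names are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingDownloadSketch {

    /** Streams the body to "<name>.download" and renames it to the target when complete. */
    static long saveViaTempFile(InputStream body, Path target) throws IOException {
        Path temp = target.resolveSibling(target.getFileName() + ".download");
        // Files.copy moves data through a small internal buffer, so memory use stays flat
        long copied = Files.copy(body, temp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        return copied;
    }
}
```

Writing to a temporary name first means a file with the final name only ever appears once the download has completed, so an interrupted run cannot be mistaken for a finished one.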
Multithreaded file download
If the server does not limit the speed, a single connection can usually saturate the local bandwidth, and a single-threaded download is enough. If the server does limit the speed, so that the download is far slower than the local bandwidth allows, multithreaded download is worth considering. For multithreading we use CompletableFuture (see the article “CompletableFuture keeps your code from suffering from blocking”).
The basic process of multithreaded file download:
- First, get the total size of the file with an HTTP HEAD request
- Then divide the file size evenly by the configured number of threads, and calculate the start and end byte positions for each thread
- Start the threads, set the Range request header on each HTTP request, and download the data to temporary files
- After all downloads finish, merge the temporary files from each thread into a single file
The complete code is available at the source code address given above.
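The steps above can be sketched as follows. This is not the article's implementation: the RangeFetcher interface is a hypothetical stand-in for "send a request with Range: bytes=start-end and return the body", the total size is passed in rather than obtained with HEAD, and all names are assumptions. It does use CompletableFuture as the article describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiThreadDownloadSketch {

    /** Hypothetical abstraction: opens a stream over bytes start..end (inclusive). */
    interface RangeFetcher {
        InputStream open(long start, long end) throws IOException;
    }

    /** Splits the bytes 0..totalSize-1 into at most `parts` inclusive {start, end} ranges. */
    static List<long[]> splitRanges(long totalSize, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long chunk = (totalSize + parts - 1) / parts; // ceiling division
        for (long start = 0; start < totalSize; start += chunk) {
            long end = Math.min(start + chunk, totalSize) - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    /** Downloads every range on its own thread into a part file, then merges the parts in order. */
    static void download(RangeFetcher fetcher, long totalSize, int threads, Path target)
            throws IOException {
        List<long[]> ranges = splitRanges(totalSize, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Path>> futures = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                long[] range = ranges.get(i);
                Path part = Path.of(target + ".part" + i);
                futures.add(CompletableFuture.supplyAsync(() -> {
                    try (InputStream in = fetcher.open(range[0], range[1])) {
                        Files.copy(in, part, StandardCopyOption.REPLACE_EXISTING);
                        return part;
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, pool));
            }
            try (OutputStream out = Files.newOutputStream(target)) {
                for (CompletableFuture<Path> future : futures) {
                    Path finished = future.join(); // waits for this part to complete
                    Files.copy(finished, out);
                    Files.delete(finished);
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, splitRanges(10, 3) yields the ranges 0-3, 4-7 and 8-9. Joining the futures in list order keeps the merge sequential and correct even when later parts finish downloading first.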
- Test: download the 819M IDEA package with 30 threads
Because 30 threads download at the same time, memory consumption is higher than in the single-threaded case, but still within an acceptable range. Download time: 81s, about 2.5 times faster. The IDEA download server has no speed limit, so multithreading here only squeezes more out of the local bandwidth, and the improvement is limited.
Single-threaded versus multithreaded download
Baidu cloud disk limits the download speed of a single thread to about 100KB/s, so we use a Baidu cloud disk download link to compare multithreaded and single-threaded download speeds.
- Test: download a 46M file from Baidu cloud disk; the local maximum download speed is 2M
- Get the download address of the file
Note: a link copied from the browser must be URL-decoded first, otherwise the download will fail. Baidu cloud disk download links are also time-limited and stop working after they expire, at which point you need to generate a new download link.
Test single-threaded download
The results show that Baidu cloud throttles single-threaded downloads very aggressively. 46M file, download time: 600s
Test multithreaded download
To squeeze the most out of the network and find the most appropriate number of threads, we tested the download time with different thread counts.
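Decoding the copied link takes one call to the JDK's URLDecoder. In the sketch below the encoded string is a made-up placeholder, not a real Baidu cloud disk link, and the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeLinkSketch {

    /** Decodes percent-encoded characters in a link copied from the browser. */
    static String decode(String copiedLink) {
        return URLDecoder.decode(copiedLink, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder link for illustration only
        String copied = "https%3A%2F%2Fexample.com%2Ffile%3Fsign%3Dabc%26time%3D123";
        System.out.println(decode(copied));
        // prints "https://example.com/file?sign=abc&time=123"
    }
}
```

URLDecoder also turns a literal "+" into a space, which matters if the link contains plus signs that should be kept.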
Number of threads | Total download time |
---|---|
10 | 60s |
20 | 30s |
30 | 21s |
40 | 15s |
50 | 13s |
Based on these test results, about 30 threads is an appropriate setting for our environment.
Final words (follow along so you don't get lost)
The article may contain shortcomings and mistakes; suggestions and opinions are welcome in the comments.
Finally, writing this took real effort, so please don't just read and leave. I hope you will like, comment, and follow, because that is what motivates me to keep sharing 🙏
How would you implement resumable download? You are welcome to share your ideas in the comments.
This article is purely for learning purposes.