Article source: official account IT ranch
The requirement: the front end uploads 10 photos, and after processing them the back end needs to compress them into a single archive and send it back over the network. I had never worked with file compression in Java before, so I found an example online and adapted it. It worked, but as the images coming from the front end grew larger, the time cost rose sharply; in the end, compressing 20 MB of files took about 30 seconds. The code for compressing the files is as follows.
public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // Start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp;
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
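The snippets in this article rely on a few constants (ZIP_FILE, JPG_FILE, JPG_FILE_PATH, FILE_NAME, SUFFIX_FILE, FILE_SIZE) and a printInfo helper that are defined elsewhere in the class. For reference, a rough sketch of that surrounding class is shown below; the class name, paths, and values are placeholders chosen for illustration, not the article's actual ones.

import java.io.File;

// Sketch of the surrounding class: assumed constants and the printInfo helper.
// Paths and values are illustrative placeholders.
public class ZipPerfTest {

    private static final String ZIP_FILE = "/tmp/photos.zip";       // output archive
    private static final String JPG_FILE_PATH = "/tmp/photo.jpg";   // path of the roughly 2 MB source image
    private static final File JPG_FILE = new File(JPG_FILE_PATH);   // the source image file
    private static final String FILE_NAME = "photo";                // entry name prefix (versions 1 and 2)
    private static final String SUFFIX_FILE = ".jpg";               // entry name suffix (versions 3 to 5)
    private static final long FILE_SIZE = JPG_FILE.length();        // size of the source image in bytes

    private static void printInfo(long beginTime) {
        // Print the total size of the input data and the elapsed time in milliseconds.
        System.out.println("fileSize:" + (FILE_SIZE * 10 / 1024 / 1024) + "M");
        System.out.println("consum time: " + (System.currentTimeMillis() - beginTime));
    }
}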
Here we take an image of about 2 MB and add it to the archive ten times in a loop. The printed result is as follows; the time is about 30 seconds.
fileSize:20M
consum time: 29599
First optimization process – from 30 seconds to 2 seconds
The first optimization that comes to mind is to use a buffer, BufferedInputStream. The read() method of FileInputStream reads only one byte at a time, as its source documentation also notes.
/**
* Reads a byte of data from this input stream. This method blocks
* if no input is yet available.
*
* @return the next byte of data, or <code>-1</code> if the end of the
* file is reached.
* @exception IOException if an I/O error occurs.
*/
public native int read() throws IOException;
This is a native method that interacts with the underlying operating system to read data from disk, and making that native call for every single byte is expensive. Suppose we have 30,000 bytes of data: with a plain FileInputStream we need 30,000 native calls to fetch it all, whereas with a buffer (assuming the buffer is large enough to hold all 30,000 bytes) we need only one, because the buffer reads a whole block from disk into memory on the first read() call and then returns the bytes one by one from memory.
BufferedInputStream wraps an internal byte array to hold the data; its default size is 8192 bytes.
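To make the idea concrete, here is a simplified sketch of what a buffered input stream does internally: one bulk read fills an in-memory array, and subsequent read() calls are served from that array. This is purely illustrative and not the actual JDK BufferedInputStream implementation.

import java.io.IOException;
import java.io.InputStream;

// Simplified illustration of the buffering idea (not the real BufferedInputStream code).
public class SimpleBufferedInput {
    private final InputStream in;
    private final byte[] buf = new byte[8192]; // same default capacity as BufferedInputStream
    private int pos = 0;   // index of the next byte to hand out
    private int count = 0; // number of valid bytes currently in buf

    public SimpleBufferedInput(InputStream in) {
        this.in = in;
    }

    public int read() throws IOException {
        if (pos >= count) {
            // Buffer exhausted: refill it with a single bulk read from the underlying stream.
            count = in.read(buf, 0, buf.length);
            pos = 0;
            if (count == -1) {
                return -1; // end of stream
            }
        }
        return buf[pos++] & 0xFF; // serve the next byte from memory
    }
}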
The optimized code looks like this
public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        // Start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
                // Flush the buffered bytes into the current entry before the next putNextEntry call,
                // otherwise leftover bytes would spill into the following entry.
                bufferedOutputStream.flush();
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The output
------Buffer
fileSize:20M
consum time: 1808
You can see how much the efficiency has improved compared with the plain FileInputStream version.
Second optimization process – from 2 seconds to 1 second
Using a buffer already meets my requirement, but in order to apply what I have been learning, I decided to optimize further with what NIO offers.
Using a Channel
Why a Channel? Because NIO introduces Channels and ByteBuffers, whose structure is closer to the way the operating system actually performs I/O, so they are noticeably faster than traditional IO. A Channel is like a mine containing coal, and a ByteBuffer is the cart that carries the coal out; in other words, all of our interaction with the data goes through the ByteBuffer.
Three classes can produce a FileChannel: FileInputStream, FileOutputStream, and RandomAccessFile, the last of which can be opened for both reading and writing.
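As a minimal illustration of the Channel and ByteBuffer pairing (a generic sketch, not part of the article's zip code), copying a file through an explicit ByteBuffer looks roughly like this:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopyDemo {
    // Copy a file by shuttling data through a ByteBuffer between two FileChannels.
    public static void copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (in.read(buffer) != -1) { // fill the buffer from the source channel
                buffer.flip();              // switch the buffer from writing mode to reading mode
                while (buffer.hasRemaining()) {
                    out.write(buffer);      // drain the buffer into the target channel
                }
                buffer.clear();             // reset the buffer for the next read
            }
        }
    }
}

The zip code below skips this explicit ByteBuffer loop and uses transferTo instead.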
The source code is as follows
public static void zipFileChannel() {
    // Start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Notice that instead of shuttling data through a ByteBuffer ourselves, this version uses the transferTo method, which effectively connects the two channels directly.
This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.
That is the description from the JDK source: using transferTo is more efficient than looping to read from one Channel and write to another, because the operating system can move bytes directly from the file system cache to the target Channel without an actual copy step.
The copy step being avoided here is the transfer of data between kernel space and user space.
You can see that the speed has improved somewhat compared to using buffers.
------Channel
fileSize:20M
consum time: 1416
Kernel space and user space
So why is the transition between kernel space and user space slow? First we need to understand what kernel space and user space are. To protect the core resources of the system, common operating systems divide execution into four privilege rings, Ring0 through Ring3, where a lower ring number means greater privilege. Ring0 is called kernel space and is used to access critical resources; Ring3 is called user space.
User mode and kernel mode: a thread running in kernel space is said to be in kernel mode, and a thread running in user space is said to be in user mode.
So what happens when an application (which runs in user mode) needs to access core resources? It has to go through the interfaces the kernel exposes; such a call is known as a system call. For example, when our application needs to access a file on disk, it issues the open system call, the kernel accesses the file on disk, and the file contents are returned to the application.
Direct and indirect buffers
Going through the kernel every time we want to read a file from disk is a lot of overhead. Is there a way for our application to operate on disk data more directly, without the kernel relaying every transfer? Yes: create a direct buffer.
- Indirect buffer: an indirect buffer is what we described above, with kernel space acting as the middleman; every transfer has to be relayed through the kernel.
- Direct buffer: a direct buffer does not need kernel space to relay and copy the data. Instead, it requests a region of physical memory that is mapped into both the kernel address space and the user address space, and the application and the disk exchange data through this directly allocated physical memory.
Since direct buffers are so fast, why don't we use them everywhere? Because they have the following disadvantages:
- They are less safe to use.
- They are more expensive to allocate, because the memory is not taken from the JVM heap; reclaiming it depends on garbage collection of the buffer object, and garbage collection is not under our control.
- Once data is written into the physical memory buffer, the program loses control over it: when the data is finally written to disk is decided by the operating system, not by the application.
To sum up, the transferTo method lets the data take this direct path, which is why its performance is noticeably better.
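For reference, Java exposes the two kinds of buffer explicitly: ByteBuffer.allocate creates a heap (indirect) buffer and ByteBuffer.allocateDirect creates a direct buffer. A minimal sketch:

import java.nio.ByteBuffer;

public class BufferKindsDemo {
    public static void main(String[] args) {
        // Indirect (heap) buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heapBuffer = ByteBuffer.allocate(8192);

        // Direct buffer: memory allocated outside the JVM heap, which channels
        // can hand to the operating system without an extra user-space copy.
        ByteBuffer directBuffer = ByteBuffer.allocateDirect(8192);

        System.out.println("heap buffer isDirect:   " + heapBuffer.isDirect());   // false
        System.out.println("direct buffer isDirect: " + directBuffer.isDirect()); // true
    }
}

Methods like transferTo and the memory-mapped files in the next section get the same benefit without us having to manage such a buffer by hand.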
Use memory-mapped files
Another new feature in NIO is memory-mapped files. Why are memory-mapped files fast? The reason is the same as above: a direct buffer is created in memory and the data is accessed through it directly. The source code is as follows.
//Version 4 uses a memory-mapped file
public static void zipFileMap() {
    // Start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // Map the file into memory (the mapping stays valid after the channel is closed)
            try (FileChannel fileChannel = new RandomAccessFile(JPG_FILE_PATH, "r").getChannel()) {
                MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
                writableByteChannel.write(mappedByteBuffer);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Print the following
---------Map
fileSize:20M
consum time: 1305
You can see that the speed is about the same as when using a Channel.
Use Pipe
A Java NIO Pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: the source channel is used to read data and the sink channel to write it. A writing thread may block until a reading thread consumes data from the channel, and if there is no data to read, the reading thread blocks until the writing thread writes some; this continues until the channel is closed.
Whether or not a thread writing bytes to a pipe will block until another thread reads those bytes
This is what I’m looking for. The source code is as follows
//Version 5 uses Pipe
public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // Asynchronous task: compress the files into the pipe's sink channel
        CompletableFuture.runAsync(() -> runTask(pipe));
        // Get the read channel
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate(((int) FILE_SIZE) * 10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    printInfo(beginTime);
}

// Asynchronous task
public static void runTask(Pipe pipe) {
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
         WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            try (FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel()) {
                jpgChannel.transferTo(0, FILE_SIZE, out);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
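For completeness, a simple driver to run all five versions back to back and compare the printed timings might look like this (a sketch; it assumes the methods and constants above live in the same class):

public static void main(String[] args) {
    // Run each version in turn; each one prints its own size and elapsed time.
    zipFileNoBuffer();
    zipFileBuffer();
    zipFileChannel();
    zipFileMap();
    zipFilePip();
}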
Conclusion
- Life is all about learning, and sometimes a simple optimization can lead you to study very different things in depth. So when you learn something, don't just know that it works; understand why it works.
- Unity of knowledge and action: after learning something, try to apply it at least once. That is how you really remember it.
The source address