Java performance optimization covers three areas: code performance, memory reclamation, and application configuration (garbage collection is the main source of performance problems in Java programs).

Code-level optimization: avoid deeply nested loops, excessive calls, and overly complex logic.



Tuning content:

1. Increase the maximum number of connections

2. Adjust the working mode

3. Enable gzip compression

4. Resize the JVM memory

5. As a web server, integrate with Apache or Nginx

6. Choose a proper garbage collection algorithm

7. Use newer JDK versions whenever possible



Production configuration example:


<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="1000"
           minSpareThreads="100"
           maxSpareThreads="200"
           acceptCount="900"
           disableUploadTimeout="true"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           enableLookups="false"
           redirectPort="8443"
           compression="on"
           compressionMinSize="1024"
           compressableMimeType="text/html,text/xml,text/css,text/javascript"/>

Parameter Description:

org.apache.coyote.http11.Http11NioProtocol: switches the working mode to NIO.

There are three working modes: BIO, NIO, and APR.

BIO (Blocking I/O): the default working mode; no optimization is applied, and performance is the lowest.

NIO (New I/O, non-blocking): non-blocking I/O operations; provides better concurrent processing performance than BIO.

APR (Apache Portable Runtime): the preferred working mode. APR is a support library that gives upper-layer applications a low-level interface usable across multiple operating system platforms.

Tomcat uses the APR-based tomcat-native library for operating-system-level control, gaining optimized, non-blocking I/O that significantly improves concurrent processing capability. However, the apr and tomcat-native libraries must be installed.

Background: network I/O models:

Blocking I/O model: when the application issues a recv system call and the data it wants has not yet arrived in the kernel buffer, the application blocks and cannot accept other requests. Once data is in the kernel buffer, the kernel copies it into user space, the call unblocks, and the application continues with the next request. (Kernel space (buffer) -> user space (system call).)

Non-blocking I/O model: the application sets non-blocking mode. If the data has not yet arrived in the kernel buffer, the recv system call returns an error immediately; the application polls, repeatedly checking whether the data is ready, and receives it once it is in the buffer. The I/O call does not block the application process, which can keep handling new requests in the meantime.

I/O multiplexing model: blocking happens on the select/poll/epoll system call rather than on the actual I/O call. select/epoll can monitor many descriptors at once and check which operations are ready; when data is ready, the actual I/O call copies it into the application's buffer.

Asynchronous I/O model: the application asks the kernel to start an I/O operation, and the kernel notifies the application only after the entire operation (including copying the data into the application buffer) has completed; new requests are processed in the meantime.

I/O operations have two phases: the first waits for the data to become ready; the second copies the data from kernel space to user space.

The first three models differ in the first phase: blocking I/O blocks on the I/O call itself, non-blocking I/O polls, and I/O multiplexing blocks on select/poll/epoll. The second phase (the blocking copy) is the same for all three. Asynchronous I/O blocks the process in neither phase.
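
To make the multiplexed, non-blocking model concrete, here is a minimal sketch of a Java NIO echo server built on a Selector (the port and buffer size are arbitrary illustrative choices):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);                 // non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                           // phase 1: block here, not on read()
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);            // phase 2: copy kernel -> user space
                    if (n == -1) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);                   // echo back (a sketch; a real server would handle partial writes)
                }
            }
        }
    }
}

This is broadly the pattern behind Tomcat's NIO connector: one selector thread watches many connections, so threads block on select() rather than on individual reads.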




maxThreads: maximum number of worker threads (default 150). Increasing it prevents requests from piling up in the queue and slowing responses.

minSpareThreads: minimum number of idle threads to keep alive.

maxSpareThreads: maximum number of idle threads; threads beyond this value are shut down.

acceptCount: maximum queue length for incoming requests when all worker threads are busy; requests beyond this are refused.

disableUploadTimeout: disables the separate upload timeout.

connectionTimeout: connection timeout in milliseconds; 0 means no limit.

URIEncoding: sets the URI encoding to UTF-8.

enableLookups: set to false to disable DNS lookups and improve response time.

compression: enables gzip compression.

compressionMinSize: minimum response size to compress, in bytes.

compressableMimeType: MIME types eligible for compression.

 

Resize JVM memory:

Add JAVA_OPTS='-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m' to catalina.sh

-Xms: initial (minimum) heap size of the JVM; defaults to 1/64 of physical memory.

-Xmx: maximum heap size the JVM may use; defaults to 1/4 of physical memory.

-XX:PermSize: initial non-heap (permanent generation) size allocated by the JVM.

-XX:MaxPermSize: maximum non-heap (permanent generation) size the JVM may allocate.
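
Note: on JDK 8 and later the permanent generation was removed, so the PermSize flags are ignored; the class-metadata equivalents (sizes illustrative) would be:

JAVA_OPTS='-Xms512m -Xmx1024m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m'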

Background: heap and non-heap memory:

Heap memory stores objects: class instances, arrays, and so on. JVM-managed memory outside the heap is called non-heap memory, which can be thought of as memory the JVM reserves for its own use.

Objects are stored on the heap; primitive values and references to heap objects are stored on the stack. On a 32-bit JVM, each object reference on the stack occupies 4 bytes.

The stack is the unit of execution; the heap is the unit of storage.
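
A minimal illustration (the names are arbitrary) of where values live:

public class HeapVsStack {
    public static void main(String[] args) {
        int count = 3;                // primitive value, stored on the stack
        int[] data = new int[count];  // 'data' is a stack reference; the array object lives on the heap
        String s = new String("hi");  // 's' on the stack points to a String object on the heap
        // When main() returns, its stack frame (count, data, s) is popped;
        // the heap objects become unreachable and are eligible for garbage collection.
    }
}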

 

Gzip compression saves server bandwidth and speeds up page loads: after a client requests a resource, the server compresses the file and returns it, and the client's browser decompresses it for display.
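
To see the exchange from the client side, here is a minimal sketch, assuming a local Tomcat at http://localhost:8080/ with compression enabled (the URL is illustrative):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipClient {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8080/").openConnection();  // assumed test URL
        conn.setRequestProperty("Accept-Encoding", "gzip");          // tell the server we accept gzip
        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);                            // transparently decompress
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = r.readLine()) != null) System.out.println(line);
        }
    }
}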




Background: garbage collection:

Garbage collection algorithms:

1. Mark-sweep

There are two phases: marking and sweeping. Starting from the root nodes, the collector marks all reachable (referenced) objects, then sweeps away the unmarked ones. The reclaimed space is non-contiguous. The drawbacks are that the entire application is paused and that fragmentation is produced. (A toy sketch of the mark phase follows this list.)

2. Copying algorithm

Memory is divided into two halves, only one of which is in use at a time. During collection, marked (live) objects are copied into the other half, and the used half is then cleared completely. The copied space is contiguous. The drawback is that twice the memory is needed.

3. Mark-compact algorithm

Combines the advantages of mark-sweep and copying. It also has two phases: the first marks all reachable objects from the roots; the second traverses the heap, removing unmarked objects and compacting the survivors into one end of the heap, in order. This avoids both the fragmentation of mark-sweep and the space cost of the copying algorithm.
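
A toy sketch (illustrative only, with made-up object names) of the mark phase, treating reachability from the roots as a graph traversal:

import java.util.*;

public class MarkPhaseDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark everything reachable from the roots (depth-first).
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (marked.add(o)) stack.addAll(o.refs);
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);  // a -> b is reachable from the root; c is garbage
        Set<Obj> live = mark(Arrays.asList(a));
        System.out.println("c live? " + live.contains(c));  // false: the sweep phase would reclaim c
    }
}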

Garbage collectors:

1. Serial collection

Uses a single thread for all garbage collection work. It is simple and efficient, but cannot take advantage of multiple processors, so it suits single-processor servers with small data volumes (around 100 MB) and no strict response-time requirements.

2. Parallel collection

Uses multiple threads for garbage collection; fast and efficient. In theory, the more processors, the better the performance. Suitable for large data volumes where response time is not critical.

3. Concurrent Collection (CMS)

The first two collectors must pause the entire application during collection, for a time that depends on heap size. CMS instead uses multiple threads to scan the heap and mark objects to be reclaimed, then cleans up the marked objects, pausing the application only in certain phases. Suitable for large data volumes, multiple processors, and strict response-time requirements.

4. G1 GC

Divides the heap into many independent regions and garbage-collects them region by region. After freeing memory, G1 can also compact the free heap space.


The choice of garbage collector depends on the application scenario, hardware resources, and throughput requirements. CMS is commonly used.

Specifying the garbage collector (via JAVA_OPTS in catalina.sh):

-XX:+UseSerialGC: serial garbage collector

-XX:+UseParallelGC: parallel garbage collector

-XX:+UseConcMarkSweepGC: concurrent mark-sweep (CMS) garbage collector

-XX:ParallelCMSThreads=<n>: number of threads the CMS collector uses

Print garbage collection information:

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Xloggc:filename
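
Putting the memory, collector, and logging flags together, a catalina.sh setting for a CMS configuration might look like this (the sizes and log path are illustrative assumptions, not recommendations):

JAVA_OPTS='-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/local/tomcat/logs/gc.log'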


Integrating Apache with Tomcat: Tomcat handles static files far less efficiently than Apache, so letting Apache serve static files while Tomcat handles dynamic JSP requests effectively improves processing speed. This raises another question: with multiple back-end Tomcat instances, how do you preserve sessions?

Tomcat session ID persistence:

Session stickiness: bind the session ID to a browser cookie and, in sticky mode, route requests belonging to the same session to the same Tomcat instance.

Session replication: Tomcat broadcasts session changes to the other Tomcat nodes. On Linux, the broadcast (multicast) address must be enabled manually. Not suitable when there are many back-end nodes.

Session sharing (memcached, Redis, or a database): store session data in a shared store keyed by session ID, so any node can load it (a hedged sketch follows).
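
A sketch of the shared-store idea using the Jedis client for Redis; the key prefix, address, timeout, and string serialization are assumptions for illustration, and real deployments would normally use a ready-made session manager rather than hand-rolled code:

import redis.clients.jedis.Jedis;

public class RedisSessionStore {
    private static final int SESSION_TTL_SECONDS = 1800;  // assumed 30-minute session timeout

    public static void save(String sessionId, String serializedSession) {
        try (Jedis jedis = new Jedis("localhost", 6379)) { // assumed Redis address
            // Key the session data by its ID so any Tomcat node can load it.
            jedis.setex("session:" + sessionId, SESSION_TTL_SECONDS, serializedSession);
        }
    }

    public static String load(String sessionId) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            return jedis.get("session:" + sessionId);
        }
    }
}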

 

Why do memory overflow (OutOfMemoryError) messages often appear?

The JVM throws an OutOfMemoryError (GC overhead limit exceeded) when both conditions hold: the JVM spends more than 98% of its time on garbage collection, and each collection reclaims less than 2% of the heap. Common causes:

1. Too many objects are created and not released in time, so heap memory runs out (heap size is limited by physical and virtual memory).

2. The code is poorly designed and holds on to references, so occupied memory cannot be reclaimed in time.
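
A minimal sketch that reproduces cause 1, assuming you run it with a small heap (e.g. -Xmx32m):

import java.util.ArrayList;
import java.util.List;

public class OomDemo {
    public static void main(String[] args) {
        List<byte[]> leak = new ArrayList<>();  // strong references prevent reclamation
        while (true) {
            leak.add(new byte[1024 * 1024]);    // keep allocating 1 MB blocks
        }                                       // eventually: java.lang.OutOfMemoryError: Java heap space
    }
}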