Java 10 introduced specific optimizations for Docker. Java's support for running in Docker containers has long been awkward: Docker uses cgroups underneath for process-level isolation, so although we set the container's resource limits through Docker, the JVM cannot actually perceive them. For example, the host may have 8 cores and 16 GB of memory while the container is limited to 2 cores and 4 GB, yet the resources read inside the container may still be 8 cores and 16 GB. Programs often read machine resources to tune performance, for example to set the core and maximum thread counts of a thread pool, so wrong values can cost performance for programs running in Docker. Fortunately, Java 10 added support for this, and there are plans to make it available for JDK 8 as well.
While optimizing a program at work recently, I noticed that availableProcessors seemed to carry a significant performance cost, so I took a closer look at it and ran some tests.
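To make the point about reading machine resources concrete, here is a minimal sketch of a program that sizes a thread pool from what the JVM reports. The class name `ContainerResources` and the helper `poolSize` are hypothetical names chosen for illustration; only `Runtime.availableProcessors()`, `Runtime.maxMemory()` and `Executors.newFixedThreadPool` are standard APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContainerResources {
    // Hypothetical sizing rule: one thread per reported processor, never fewer than one.
    static int poolSize(int processors) {
        return Math.max(1, processors);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int processors = rt.availableProcessors();       // what the JVM believes it can use
        long maxHeapMb = rt.maxMemory() / (1024 * 1024); // maximum heap the JVM will attempt to use

        System.out.println("processors seen by JVM: " + processors);
        System.out.println("max heap (MB): " + maxHeapMb);

        // If the JVM reports the host's cores instead of the container's limit,
        // this pool ends up larger than the container can actually serve.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(processors));
        pool.shutdown();
    }
}
```

Running this both on the host and inside a CPU-limited container is a quick way to see whether a given JVM version picks up the container limits.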
What does availableProcessors offer?
/**
* Returns the number of processors available to the Java virtual machine.
*
* <p> This value may change during a particular invocation of the virtual
* machine. Applications that are sensitive to the number of available
* processors should therefore occasionally poll this property and adjust
* their resource usage appropriately. </p>
*
* @return the maximum number of processors available to the virtual
* machine; never smaller than one
* @since 1.4
*/
public native int availableProcessors();
It returns the number of processors available to the JVM. The Javadoc also notes that this value may change during a particular invocation of the virtual machine. I had always assumed that this function returns the number of CPUs on the machine, which should be a constant, so apparently I had misunderstood it. Two questions arise:
- What is the number of cores available to the JVM?
- Why is the return value variable, and how is it implemented?
Number of cores available for the JVM
This is easy to understand: as the name implies, it is the number of CPU cores the JVM can use to do its work. A multi-core server may run several applications, of which the JVM is only one, and some of the CPUs may be in use by the others.
Why is the return value variable? How does it work?
Since multiple applications on a multi-core server share the same CPUs, the number available to the JVM can of course differ from moment to moment. So how is this implemented in Java? Reading the JDK 8 source code shows that the Linux and Windows implementations differ considerably.
Linux implementation
int os::active_processor_count() {
  // Linux doesn't yet have a (official) notion of processor sets,
  // so just return the number of online processors.
  int online_cpus = ::sysconf(_SC_NPROCESSORS_ONLN);
  assert(online_cpus > 0 && online_cpus <= processor_count(), "sanity check");
  return online_cpus;
}
The Linux implementation is the lazy one: it reads the system parameter _SC_NPROCESSORS_ONLN directly through sysconf.
Windows implementation
int os::active_processor_count() {
  DWORD_PTR lpProcessAffinityMask = 0;
  DWORD_PTR lpSystemAffinityMask = 0;
  int proc_count = processor_count();
  if (proc_count <= sizeof(UINT_PTR) * BitsPerByte &&
      GetProcessAffinityMask(GetCurrentProcess(), &lpProcessAffinityMask, &lpSystemAffinityMask)) {
    // Nof active processors is number of bits in process affinity mask
    int bitcount = 0;
    while (lpProcessAffinityMask != 0) {
      lpProcessAffinityMask = lpProcessAffinityMask & (lpProcessAffinityMask-1);
      bitcount++;
    }
    return bitcount;
  } else {
    return proc_count;
  }
}
On Windows the implementation is more involved. It does not simply count the CPUs that are present; it counts the CPUs the process is allowed to run on according to its CPU affinity. A while loop counts the set bits in the process affinity mask, so this is a computation on the CPU rather than a simple lookup of a constant.
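The bit-counting loop in the Windows code can be sketched in Java as follows. This is only an illustration of the technique (clearing the lowest set bit on each iteration), not the JVM's own code; the class name `AffinityBitCount` and the sample mask are made up for the example.

```java
public class AffinityBitCount {
    // Counts the set bits in a mask by repeatedly clearing the lowest set bit,
    // mirroring the while loop in the Windows implementation.
    static int countSetBits(long mask) {
        int bitcount = 0;
        while (mask != 0) {
            mask = mask & (mask - 1); // clears the lowest set bit
            bitcount++;
        }
        return bitcount;
    }

    public static void main(String[] args) {
        long mask = 0b1011L; // hypothetical affinity mask: CPUs 0, 1 and 3 allowed
        System.out.println(countSetBits(mask));  // prints 3
        System.out.println(Long.bitCount(mask)); // the JDK's built-in equivalent, also 3
    }
}
```

The loop runs once per set bit, so for a 64-bit mask it iterates at most 64 times.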
Performance test
From the analysis above, this operation appears to be CPU-sensitive, so how does it perform on different operating systems? I tested the function both under normal conditions and with the CPU fully loaded. Each run makes 1 million calls; 10 runs are timed and their average is taken. The code is as follows:
import java.util.Arrays;

public class RuntimeDemo {
    private static final int EXEC_TIMES = 1_000_000; // calls per run
    private static final int TEST_TIME = 10;         // number of runs

    public static void main(String[] args) throws Exception {
        int[] arr = new int[TEST_TIME];
        for (int i = 0; i < TEST_TIME; i++) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < EXEC_TIMES; j++) {
                Runtime.getRuntime().availableProcessors();
            }
            long end = System.currentTimeMillis();
            arr[i] = (int) (end - start); // elapsed time of this run in ms
        }
        double avg = Arrays.stream(arr).average().orElse(0);
        System.out.println("avg spend time:" + avg + "ms");
    }
}
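As a variation on the test above, here is a sketch that measures with `System.nanoTime()` and accumulates the return values, in case the JIT would otherwise treat the unused result as removable. The class name `AvailableProcessorsTiming`, the method `averageNanosPerCall` and the warm-up count are my own assumptions, not part of the original test, and this is not a substitute for a proper benchmark harness.

```java
public class AvailableProcessorsTiming {
    // Returns the average cost of one availableProcessors() call in nanoseconds.
    static double averageNanosPerCall(int calls) {
        Runtime rt = Runtime.getRuntime();
        long sink = 0; // accumulate results so every call's value is actually used
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += rt.availableProcessors();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < calls) { // each call returns at least 1, so this should never trigger
            throw new IllegalStateException("unexpected processor count");
        }
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        averageNanosPerCall(100_000); // warm-up run, result discarded
        System.out.println("avg ns per call: " + averageNanosPerCall(1_000_000));
    }
}
```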
The code used to load the CPU is as follows:
public class CpuIntensive {
    private static final int THREAD_COUNT = 16;
    public static void main(String[] args) {
        // Start busy-loop threads to keep every core occupied
        for (int i = 0; i < THREAD_COUNT; i++) {
            new Thread(() -> {
                long count = 100_000_000_000L;
                long index = 0;
                long sum = 0;
                while (index < count) {
                    sum = sum + index;
                    index++;
                }
            }).start();
        }
    }
}
System | Configuration | Test condition | Result (average per 1 million calls) |
---|---|---|---|
Windows | 2 cores, 8 GB | Normal | 1425.2 ms |
Windows | 2 cores, 8 GB | CPU fully loaded | 6113.1 ms |
macOS | 4 cores, 8 GB | Normal | 69.4 ms |
macOS | 4 cores, 8 GB | CPU fully loaded | 322.8 ms |
Because the two machines are configured very differently, the absolute numbers are not directly comparable, but the following conclusions can still be drawn from the test:
- The performance gap between Windows and Unix-like systems comes down to the implementation.
- CPU-intensive load has a significant impact on the performance of this function.
- Overall, the performance of this function is acceptable: the slowest case, Windows under full CPU load, is only about 6 µs per call, and on Unix-like systems (macOS in this test) it drops to the nanosecond level.
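The per-call figures follow from dividing each measured average by the 1 million calls in a run. A small sketch of that arithmetic, using the numbers from the table (the class and method names are hypothetical):

```java
public class PerCallLatency {
    // Converts the total time of a run (in milliseconds) into nanoseconds per call.
    static double nanosPerCall(double totalMillis, long calls) {
        return totalMillis * 1_000_000.0 / calls; // 1 ms = 1,000,000 ns
    }

    public static void main(String[] args) {
        long calls = 1_000_000L; // calls per run in the test
        System.out.printf("Windows, normal: %.1f ns%n", nanosPerCall(1425.2, calls));
        System.out.printf("Windows, loaded: %.1f ns%n", nanosPerCall(6113.1, calls));
        System.out.printf("macOS, normal:   %.1f ns%n", nanosPerCall(69.4, calls));
        System.out.printf("macOS, loaded:   %.1f ns%n", nanosPerCall(322.8, calls));
    }
}
```

With 1 million calls per run, a total in milliseconds equals the per-call time in nanoseconds, so 6113.1 ms works out to roughly 6.1 µs per call and 69.4 ms to roughly 69 ns per call.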
Conclusion
- In day-to-day work, you do not need to worry much about the performance overhead of calling this function.
- If you need the value, it is normally enough to store it in a static variable; for CPU-sensitive programs, you can use a cache-like strategy that refreshes the value periodically.
- A performance problem at work is probably not caused by this function but by something else.
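The cache-like strategy mentioned above could look roughly like this. It is a minimal sketch under my own assumptions: the class name `CachedProcessors` and the 10-second refresh interval are made up for illustration, and the refresh check is deliberately simple (concurrent callers may occasionally refresh twice, which is harmless here).

```java
import java.util.concurrent.TimeUnit;

public final class CachedProcessors {
    // Hypothetical refresh interval; tune it to how quickly the value may change.
    private static final long REFRESH_NANOS = TimeUnit.SECONDS.toNanos(10);

    private static volatile int cached = Runtime.getRuntime().availableProcessors();
    private static volatile long lastRefresh = System.nanoTime();

    private CachedProcessors() {}

    // Returns the cached value, re-reading it from the JVM once the interval has elapsed.
    public static int get() {
        long now = System.nanoTime();
        if (now - lastRefresh >= REFRESH_NANOS) {
            cached = Runtime.getRuntime().availableProcessors();
            lastRefresh = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println("processors: " + get());
    }
}
```

Callers use `CachedProcessors.get()` wherever they would otherwise call `Runtime.getRuntime().availableProcessors()` in a hot path; the native call then happens at most once per interval.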