Running applications as root in a server environment is dangerous: an attacker who gets a shell through the application owns the whole machine, which then ends up as a zombie in someone's botnet. So any security-conscious team creates an ordinary, low-privileged user to run its Java programs.

A low-privileged account, though, gets treated a bit like a stepchild, especially when resources are tight.

Symptoms

The problem occurred on a shared test-environment machine and did not reproduce in production. Dozens of services are deployed on this server, and the deployment account was recently switched from root to bot.

After running for a while, the server started failing frequently. At first, a large number of connections were stuck in the CLOSE_WAIT state, which looked like a passive-close problem. It was not.

netstat -antp | grep CLOSE | awk '{print $7}'  | sort | uniq -c
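
To check at a glance whether CLOSE_WAIT really dominates, the connections can also be tallied by state. This is a minimal sketch and not part of the original investigation: count_by_state is a hypothetical helper name, and it assumes the usual netstat -ant layout, where data lines start with "tcp" and the state is the sixth column.

```shell
#!/bin/sh
# Tally TCP connections by state from `netstat -ant`-style text on stdin.
# Assumption: data lines start with "tcp" and the state is column 6.
count_by_state() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Usage:
#   netstat -ant | count_by_state
```

The most frequent state is printed first, so a CLOSE_WAIT pile-up shows up on the first line.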

Strangely enough, logging in as root or as any other account works fine. Switching to the bot account, however, produces the following errors:

# sudo su - bot
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: retry: No child processes
bash: fork: Resource temporarily unavailable

The above is a system-level error message. In this situation the JVM reports errors as well, but you will not get a chance to see them from the bot account (log in as another system user to view them).

- Cannot create GC thread. Out of system resources  
- java.lang.OutOfMemoryError: unable to create new native thread

Cause

The cause is insufficient resources, specifically process resources.

On Linux a thread is really a process, and that holds for Java threads too. More precisely, it is a "lightweight process" (LWP).

An LWP shares all (or most) of its logical address space and system resources with other processes, and one process can create multiple LWPs so that they share most resources. Each LWP has its own process identifier and has parent-child relationships with other processes. LWPs are managed by the kernel and scheduled like normal processes, which is why every thread counts against a user's process limit.
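
Because every thread is an LWP with its own identifier, Linux lists each one as a directory under /proc/<pid>/task. The sketch below counts them for one process; count_threads is a hypothetical helper name, and it assumes /proc is mounted.

```shell
#!/bin/sh
# Count the threads (LWPs) of one process: each thread has a directory
# under /proc/<pid>/task named after its LWP id.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

# Thread count of the current shell (normally 1).
count_threads "$$"
```

Pointing it at the pid of a Java service shows how many process slots that single JVM consumes.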

Use the following command to see how many process slots a user is consuming (if ps shows the numeric uid instead of the user name, grep for the uid):

ps -eLf | grep bot | wc -l

Use the following command to see how many threads each process has started (bot may be replaced by its uid):

ps -o nlwp,pid,lwp,args -u bot | sort -n
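
Counting with grep can also match processes of other users whose command line happens to contain "bot". A hedged alternative is to let ps select the user and sum the NLWP column. sum_threads is a hypothetical helper name, and the sketch assumes a procps-style ps that accepts -o nlwp= (a column with the header suppressed).

```shell
#!/bin/sh
# Sum a column of per-process thread counts (NLWP) read from stdin.
sum_threads() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage (bot is the account from this article):
#   ps -o nlwp= -u bot | sum_threads
```

The result is the number that counts against the user's nproc limit.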

Solution

Following Linux's "everything is a file" rule, the first thing that comes to mind is raising the open-file ulimit, but that is not the culprit here: it is already large enough. The relevant setting is nproc, which you may remember having to configure when installing Elasticsearch.

Relevant configuration file: /etc/security/limits.conf

There are also small differences between systems and versions. For example, files under /etc/security/limits.d/ are read after limits.conf and override its settings. If your configuration does not take effect, check there.

For this reason, you can comment out the settings under limits.d and configure everything in limits.conf.
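
To find out which file actually sets nproc, the active rules can be listed together with the file and line number they come from. This is a minimal sketch: find_nproc_rules is a hypothetical helper name, and the paths in the usage line are the ones mentioned above.

```shell
#!/bin/sh
# Print every active (non-comment) nproc rule in the given files,
# prefixed with the file name and line number.
find_nproc_rules() {
    grep -Hn -E '^[^#]*[[:space:]]nproc[[:space:]]' "$@"
}

# Usage:
#   find_nproc_rules /etc/security/limits.conf /etc/security/limits.d/*.conf
```

If a rule shows up under limits.d, that is the one to comment out or adjust.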

Here is the original configuration

*          soft    nproc     4096
root       soft    nproc     unlimited

Change 4096 to a larger number, or simply to unlimited.
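
Limits from limits.conf are applied when a session starts, so processes that are already running keep their old value until they are restarted. The sketch below reads the limit a process is actually running with from /proc/<pid>/limits; soft_nproc_limit is a hypothetical helper name, and it assumes a Linux /proc.

```shell
#!/bin/sh
# Print the soft "Max processes" limit a process is running with.
# In /proc/<pid>/limits the soft value is the third field of that line.
soft_nproc_limit() {
    awk '/^Max processes/ { print $3 }' "/proc/$1/limits"
}

# Limit of the current shell; pass a service's pid to check that service.
soft_nproc_limit "$$"
```

After logging in again as bot, the output should show the new number, or unlimited.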

Configure system parameters for Elasticsearch

Since ES came up, let's look at the system settings that an ES installation needs changed. The lessons are general and apply to other services by analogy.

www.elastic.co/guide/en/el…

Disable swap

Swap is a performance killer that ES cannot tolerate, so just turn it off.

sudo swapoff -a

Alternatively, add the following parameter to the ES configuration file so that the JVM locks its memory and it cannot be swapped out to the swap partition.

bootstrap.memory_lock: true
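
To confirm that swapoff took effect, the active swap areas can be counted from /proc/swaps, which holds one header line followed by one line per swap area. This is a small sketch; swap_device_count is a hypothetical helper name, and it assumes a Linux /proc.

```shell
#!/bin/sh
# Count active swap areas: /proc/swaps has a header line followed by
# one line per swap device or file.
swap_device_count() {
    awk 'NR > 1' /proc/swaps | wc -l
}

# Prints 0 when no swap is active.
swap_device_count
```

Note that swapoff -a only lasts until the next reboot if swap entries remain in /etc/fstab.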

Virtual memory

ES uses mmapfs to map some of its data into memory, but the default system limit on memory mappings is too low for it and needs to be raised as well.

sysctl -w vm.max_map_count=262144

To make the change permanent, set the same value in /etc/sysctl.conf.
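
Before editing /etc/sysctl.conf it helps to see whether the key is already set there. The sketch below extracts the value for a key from a sysctl.conf-style file. sysctl_conf_value is a hypothetical helper name, and it assumes the simple key = value line format with # or ; comments.

```shell
#!/bin/sh
# Print the value assigned to a key in a sysctl.conf-style file.
# Comment lines are skipped; if the key appears twice, the last one wins.
sysctl_conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v) }
        END { if (v != "") print v }
    ' "$2"
}

# Usage:
#   sysctl_conf_value vm.max_map_count /etc/sysctl.conf
```

If nothing is printed, add the line vm.max_map_count=262144 to the file and reload it with sysctl -p.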

File handles

ulimit

Linux limits the number of file descriptors a process may open. If your application works with many small files at the same time, you need to raise this limit.

sudo su  
ulimit -n 65536 
su elasticsearch

/etc/security/limits.conf

This is the file we just edited. To make the setting above permanent, add an entry here.

elasticsearch  -  nofile  65536
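
To judge how close a process is to this ceiling, its open descriptors can be compared with the limit it is running under. This is a minimal sketch: both helper names are hypothetical, and it assumes a Linux /proc and permission to read the target process's entries.

```shell
#!/bin/sh
# Number of file descriptors a process currently has open.
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Soft "Max open files" limit of a process (fourth field of that line).
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example for the current shell; pass the ES pid to check Elasticsearch.
echo "open: $(count_open_fds "$$") limit: $(soft_nofile_limit "$$")"
```

After the new nofile entry is in place and the service has been restarted, the limit for the ES process should read 65536.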

Number of threads

So do not spawn threads carelessly: besides adding scheduling overhead, it is easy to hit the system's ceiling.

Under the von Neumann architecture, isn't all software the same? It shares the same fate, struggling but unable to escape.
