Preface

Performance tuning is a huge topic and an important part of many projects. It is notoriously difficult because it has so many different facets. Here I will outline the four kinds of tuning most commonly needed in enterprise work: JVM tuning, MySQL tuning, Nginx tuning, and Tomcat tuning. These are just my own views; if anything is wrong, corrections and additions are welcome.

Space is limited, so rather than covering performance tuning completely and in detail, this article only picks out some of the important parts to analyze.

Without further ado, let's get started!

1. JVM Performance Tuning

1. The JVM class loading mechanism in detail

The JVM class loading mechanism is divided into five phases: loading, verification, preparation, resolution, and initialization. Let's walk through each of the five.

1.1 Loading

During the load phase, the virtual machine needs to do three things:

1) Obtain the binary byte stream that defines a class by its fully qualified name. Note that the binary byte stream does not have to come from a Class file: it can be read from a ZIP package (such as a JAR or WAR), fetched over the network, computed at run time (dynamic proxies), or generated from other files (such as compiling a JSP into the corresponding class).

2) Convert the static storage structure represented by the byte stream into the runtime data structure of the method area.

3) Generate a java.lang.Class object in the Java heap to represent the class, and use it as the access entry for the class's data in the method area.

Compared to the other stages of class loading, the loading phase (specifically, the action of obtaining the binary stream of a class) is the most controllable for developers, because it can be completed either by the system-provided class loaders or by user-defined class loaders. Developers can control how the byte stream is obtained by writing their own class loader.

1.2 Verification

The main purpose of this stage is to ensure that the information contained in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the virtual machine's security.

1.3 Preparation

The preparation phase is where memory is formally allocated for class variables (static variables) and their initial values are set; this memory lives in the method area. Note what "initial value" means here. For example, given a class variable defined as:

public static int v = 8080;

In fact, the initial value of v after the preparation phase is 0, not 8080. The putstatic instruction that assigns 8080 to v is placed by the compiler into the class constructor <clinit>() method, which we will explain later.

But note that if declared as:

public static final int v = 8080;

then a ConstantValue attribute is generated for v at compile time, and the VM assigns 8080 to v already during the preparation phase based on that attribute.
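A minimal sketch contrasting the two cases (class and field names are mine, purely for illustration):

public class PrepareDemo {
    // Prepare phase: v is allocated and set to the default 0; the putstatic
    // that stores 8080 runs later, in the compiler-generated <clinit>() method.
    public static int v = 8080;

    // A static final compile-time constant gets a ConstantValue attribute,
    // so C already holds 8080 by the end of the prepare phase.
    public static final int C = 8080;
}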

1.4 Resolution

During the resolution phase, the virtual machine replaces symbolic references in the constant pool with direct references. Symbolic references appear in the Class file as:

  • CONSTANT_Class_info
  • CONSTANT_Field_info
  • CONSTANT_Method_info

And so on.

Let’s explain the concepts of symbolic and direct references:

  • Symbolic references are independent of the memory layout of the virtual machine implementation, and the target of the reference does not have to be loaded into memory yet. The memory layouts of different virtual machine implementations can vary, but the symbolic references they accept must be consistent, because the literal form of symbolic references is clearly defined in the Class file format of the Java Virtual Machine Specification.

  • A direct reference can be a pointer to the target, a relative offset, or a handle that indirectly locates the target. If there is a direct reference, the target of the reference must already exist in memory.

1.5 Initialization

Initialization is the last phase of class loading. In all the preceding phases, the JVM is fully in charge, except that a custom class loader can participate during loading. Only at the initialization stage does the JVM actually begin executing Java code defined in the class.

The initialization phase executes the class constructor <clinit>() method. The compiler automatically generates <clinit>() by collecting all assignments to class variables and all statements in static blocks. The virtual machine guarantees that the parent class's <clinit>() has finished before a class's own <clinit>() executes. P.S.: the compiler may not generate a <clinit>() method at all for a class that has no class-variable assignments and no static blocks.
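A short runnable example helps here (the class names are mine, purely illustrative). The output demonstrates both rules: assignments and static blocks are merged into <clinit>(), and the parent's <clinit>() runs first:

class Parent {
    static int x;
    static {
        x = 1;
        System.out.println("Parent <clinit> runs");
    }
}

class Child extends Parent {
    static int y = 2; // this assignment is collected into Child's <clinit>()
    static {
        System.out.println("Child <clinit> runs");
    }
}

public class ClinitDemo {
    public static void main(String[] args) {
        new Child(); // prints "Parent <clinit> runs", then "Child <clinit> runs"
    }
}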

Note that class initialization is not triggered in the following cases (see the sketch after this list):

  • Referencing a static field of a parent class through a child class initializes only the parent class, not the child class.
  • Defining an array of a class's objects does not trigger initialization of that class.
  • Compile-time constants are copied into the constant pool of the calling class at compile time; there is no direct reference to the class that defines the constant, so its initialization is not triggered.
  • Obtaining a Class object via a class literal does not trigger class initialization.
  • If Class.forName is called with the initialize parameter set to false, loading the class will not trigger its initialization; this parameter tells the VM whether to initialize the class.
  • The default loadClass method of a ClassLoader does not trigger initialization.
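The following sketch (class names are my own) exercises several of these cases; none of the lines in main initializes the class it appears to touch, except the parent class in the first case:

class Super {
    static int value = 42;
    static { System.out.println("Super initialized"); }
}

class Sub extends Super {
    static { System.out.println("Sub initialized"); }
}

class Consts {
    static final String HELLO = "hello"; // compile-time constant
    static { System.out.println("Consts initialized"); }
}

public class PassiveRefDemo {
    public static void main(String[] args) throws Exception {
        System.out.println(Sub.value);    // prints "Super initialized" only; Sub is untouched
        Super[] arr = new Super[10];      // defining an object array: no initialization
        System.out.println(Consts.HELLO); // constant was copied at compile time: no initialization
        PassiveRefDemo.class.getClassLoader()
                .loadClass("Consts");     // loadClass alone loads but does not initialize
    }
}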

1.6 Class loaders

The virtual machine design team placed the action of "obtaining a binary stream describing a class by its fully qualified name" outside the Java virtual machine, so that applications can decide how to obtain the classes they need. The code module that implements this action is called a "class loader."

For any class, its uniqueness within the Java virtual machine is established jointly by the class loader that loaded it and by the class itself. Put plainly, comparing two classes for "equality" only makes sense if they were loaded by the same class loader; otherwise, even if they come from the same Class file, they are not equal once loaded by different class loaders. "Equality" here covers the equals(), isAssignableFrom(), and isInstance() methods of the Class object, as well as the instanceof operator. If you don't keep class loaders in mind, the results in some situations can be misleading.

The JVM provides three kinds of loaders:

  • Bootstrap ClassLoader: responsible for loading classes in the JAVA_HOME\lib directory, or in a path specified by the -Xbootclasspath parameter, that the virtual machine recognizes (by file name, such as rt.jar).
  • Extension ClassLoader: responsible for loading libraries in the JAVA_HOME\lib\ext directory, or in paths specified by the java.ext.dirs system variable.
  • Application ClassLoader: responsible for loading libraries on the user classpath.

The JVM loads classes using the parent delegation model, but we can also implement custom class loaders by extending java.lang.ClassLoader.

The parent delegation model requires that every class loader except the top-level bootstrap class loader has a parent. The parent-child relationship between class loaders is usually implemented not by inheritance but by composition, with the child reusing the parent loader's code.

The parent delegation model works as follows: when a class loader receives a class loading request, it does not try to load the class itself first; instead, it delegates the request to its parent. This happens at every level, so all loading requests should eventually reach the top-level bootstrap class loader. Only when the parent reports that it cannot complete the request (it did not find the class within its search scope) does the child loader attempt to load the class itself.

One advantage of parent delegation: take the java.lang.Object class in rt.jar. Whichever loader is asked to load it, the request is ultimately delegated up to the bootstrap class loader, guaranteeing that all class loaders end up with the very same Object class.

In some cases we may need to implement a class loader ourselves. Since that topic is extensive, I'll cover it in a separate article, but let's take a quick look now. Here is the loadClass implementation from the JDK's ClassLoader:

protected synchronized Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException {
    // First, check if the class has already been loaded
    Class c = findLoadedClass(name);
    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClass0(name);
            }
        } catch (ClassNotFoundException e) {
            // If still not found, then invoke findClass in order
            // to find the class.
            c = findClass(name);
        }
    }
    if (resolve) {
        resolveClass(c);
    }
    return c;
}
  • First, Class c = findLoadedClass(name); checks whether the class has already been loaded.
  • If it has not (c == null), the code follows the parent delegation model: it recurses up through the parent loaders until it reaches the Bootstrap ClassLoader.
  • Based on the resolve flag, it decides whether the class needs to be resolved.

Meanwhile, the findClass() implementation below simply throws an exception, and the method is protected: it is clearly left for developers to implement. A separate article will cover overriding findClass to build your own class loader, but a minimal sketch follows the JDK snippet below.

protected Class<?> findClass(String name) throws ClassNotFoundException {
    throw new ClassNotFoundException(name);
}
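As a preview, here is a minimal custom loader sketch following the usual pattern: override findClass() (so the parent-delegation logic in loadClass() stays intact), read the class bytes from wherever you choose — the classpath here, purely for simplicity — and hand them to defineClass():

import java.io.IOException;
import java.io.InputStream;

public class MyClassLoader extends ClassLoader {

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes(); // JDK 9+
            // defineClass converts the raw byte stream into a Class object
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

In a real loader you would read the bytes from a custom location (a directory, a network source, an encrypted JAR) rather than the classpath, since classpath classes would be found by the parent loader anyway.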

2. JVM memory model

2.1 Functions of each part

The main storage areas are the stack and the heap. So what goes on the stack, and what goes on the heap? Simply put, the stack holds primitive values and object references, while the heap holds the object instances themselves.

Why are the heap and the stack designed as separate areas?

  • The stack stores processing logic while the heap stores concrete data, which makes the separation of concerns clearer
  • Separating the heap from the stack allows the heap to be shared by multiple stacks (threads)
  • The stack holds context information and grows and shrinks in strict order, while the heap is allocated dynamically

Stack size can be set with -Xss; if a thread's call stack exceeds it, a java.lang.StackOverflowError is thrown.
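A quick way to see -Xss in action (the class name and depth counter are mine, for illustration only):

public class StackDepthDemo {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse(); // each call pushes another stack frame until the -Xss limit is hit
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack overflowed at depth " + depth);
        }
    }
}

Running it with java -Xss256k StackDepthDemo and again with -Xss2m should show a clearly larger depth in the second run.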

The stack area

The stack is thread-private and has the same life cycle as its thread. Each time a method executes, a stack frame is created to hold the local variable table, operand stack, dynamic link, and method exit information.

The heap

This is where object instances live; memory for all objects is allocated here, and this is where garbage collection does its work.

  • Initial heap size is specified by -Xms and defaults to 1/64 of physical memory. Maximum heap size is specified by -Xmx and defaults to 1/4 of physical memory.
  • When free heap memory falls below 40%, the heap grows, up to the -Xmx limit. The exact ratio can be set with -XX:MinHeapFreeRatio.
  • When free heap memory exceeds 70%, the heap shrinks, down to the -Xms size. The ratio is set with -XX:MaxHeapFreeRatio.

Therefore, in production it is generally recommended to set the two parameters to the same value, to prevent the JVM from constantly resizing the heap.
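You can verify the effect of these flags from inside a program; a small sketch (names are mine):

public class HeapSettingsDemo {
    public static void main(String[] args) {
        long mb = 1024 * 1024;
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (~ -Xmx): " + rt.maxMemory() / mb + " MB");
        System.out.println("total heap (committed, starts near -Xms): " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap: " + rt.freeMemory() / mb + " MB");
    }
}

Launched with java -Xms512m -Xmx512m HeapSettingsDemo, max and total should report roughly the same value, which is exactly the "no resizing" configuration recommended above.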

2.2 Program counter

The program counter records the line number of the bytecode instruction the thread is currently executing; branches, loops, jumps, exception handling, thread recovery, and so on all rely on it.

2.3 Method area

It stores type information, field information, method information, and other class-level data.


3. Garbage collection in detail

3.1 How to Define Garbage

There are two approaches: one is reference counting (which cannot handle circular references); the other is reachability analysis.

Situations in which an object becomes eligible for reclamation (a sketch of the circular-reference case follows this list):

  • Its reference is explicitly set to null or pointed at another object
  • It is referenced only by a local variable whose method has finished
  • It is reachable only through a weak reference
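Here is the classic circular-reference case mentioned above, as a runnable sketch (the class name is mine). Under pure reference counting, a and b would keep each other alive forever; the JVM's reachability analysis reclaims both:

public class CycleDemo {
    Object ref;

    public static void main(String[] args) {
        CycleDemo a = new CycleDemo();
        CycleDemo b = new CycleDemo();
        a.ref = b;   // a and b now reference each other
        b.ref = a;
        a = null;    // drop the external references; the cycle remains,
        b = null;    // but neither object is reachable from any GC root
        System.gc(); // only a hint, but the cycle is now eligible for collection
    }
}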

3.2 Methods of Garbage Collection

3.2.1 Mark-sweep algorithm

This approach has the advantage of short pause times, but the disadvantage of leaving memory fragmentation.

3.2.2 Copying algorithm

This method does not sweep dead objects in place; it simply copies the live objects from one region to another. It therefore suits scenarios where most objects are reclaimed, such as young-generation collection.

3.2.3 Mark-compact

This approach solves the memory fragmentation problem, but increases pause time.

3.2.4 Generational Collection

This last approach combines the previous ones and is the main approach JVMs take today. The idea is to divide the heap into different regions and use a different garbage collection algorithm in each region.

The heap is divided into two regions:

  • Young generation: stores newly created objects. Objects are copied back and forth between S0 and S1, and after surviving a certain number of collections they are promoted to the old generation. Garbage collection here is called minor GC.
  • Old generation: these objects are collected less frequently, using mark-compact. Garbage collection here is called major GC.

The young generation's copying collection process deserves a closer look.

The young generation has three zones: Eden, from-survivor, and to-survivor.

  • When a minor GC is triggered, live objects in Eden are first copied into the survivor space;
  • Objects in from-survivor are handled next: if an object is old enough for the old generation, it is promoted there; otherwise it is copied into to-survivor, and if to-survivor is full, it is also promoted to the old generation.
  • Finally the from-survivor and to-survivor roles are swapped, guaranteeing that to-survivor is empty before each collection, ready to receive copied objects.

3.3 Garbage collector

3.3.1 Serial collector

This collector works single-threaded, and no other threads run while garbage is being collected.

3.3.2 Parallel collector

Collects using multiple threads.

3.3.3 Concurrent Mark Sweep collector (CMS)

The general process: initial mark → concurrent mark → remark → concurrent sweep

3.3.4 G1 (Garbage First) collector

The general process: initial mark → concurrent mark → final mark → evacuation (screened reclamation)

Due to space limitations, I won't cover class file bytecode, tuning tools, or GC log analysis here.

2. MySQL Performance Tuning

1. How SQL execution works

1.1 Components of SQL Server

1.1.1 Relational engine: Optimizes and executes queries.

Contains three major components:

(1) Command parser: check syntax and transform query tree.

(2) Query optimizer: responsible for optimizing queries.

(3) Query executor: responsible for executing queries.

1.1.2 Storage engine: manages all data and the related I/O

Contains three major components:

(1) Transaction manager: manages data and maintains the ACID properties of transactions through locks.

(2) Data access methods: handle I/O requests for rows, indexes, pages, row versions, space allocation, and so on.

(3) Buffer manager: manages the Buffer Pool, SQL Server's main memory-consuming component.

1.1.3 Buffer Pool

Contains all of SQL Server's caches, such as the plan cache and the data cache.

1.1.4 Transaction logs

Records all changes made by transactions; it is an important component in guaranteeing the ACID properties of transactions.

1.1.5 Data files

The physical storage file of the database.

1.1.6 SQL Server network interface: the protocol layer for network connections between the client and the server

1.2 Underlying Principles of Query

1.2.1 When a client sends a T-SQL statement to the SQL Server server, the statement first reaches the server's network interface; between the network interface and the client sits a protocol layer.

1.2.2 A connection is established between the client and the network interface. The communication data is formatted in a Microsoft format called Tabular Data Stream (TDS) packets.

1.2.3 The client sends TDS packets to the protocol layer. After receiving a TDS packet, the protocol layer unpacks it and analyzes what requests it contains.

1.2.4 The command parser parses the T-SQL statement. It does several things:

(1) Checks the syntax. Syntax errors are returned to the client, and the following steps are skipped.

(2) Checks whether the Buffer Pool holds a cached execution plan for the T-SQL statement.

(3) If a cached execution plan is found, it is read directly from the plan cache and handed to the query executor to run.

(4) If no cached plan is found, the statement goes to the query optimizer, which generates an optimized execution plan and stores it in the Buffer Pool.

1.2.5 Query Optimizer Optimizes SQL statements

If no execution plan for the SQL statement exists in the Buffer Pool, the statement is passed to the query optimizer, which analyzes it and, via certain algorithms, generates one or more candidate execution plans. The plan with the lowest estimated cost is chosen as the final plan and passed to the query executor.

1.2.6 Query Executor Executes queries

The query executor passes the execution plan to the data access method of the storage engine through the OLE DB interface.

1.2.7 Data access methods generate execution code

The data access methods turn the execution plan into code that operates on SQL Server's data, but do not actually execute it; they pass it to the buffer manager for execution.

1.2.8 Buffer manager reads data.

The buffer manager looks for the data in the buffer pool's data cache; if present, the result is returned to the storage engine's data access methods. If not, the data is read from disk (the data files), put into the data cache, and then returned to the data access methods.

1.2.9 For data being read, a shared lock is applied; the transaction manager assigns the shared lock to the read operation.

1.2.10 The storage engine's data access methods return the query results to the relational engine's query executor.

1.2.11 The query executor returns the results to the protocol layer.

1.2.12 The protocol layer encapsulates the data into TDS packets and passes them back to the client.

2. Index internals

2.1 Why indexes?

In a typical application the read/write ratio is around 10:1, and inserts and ordinary updates rarely cause performance problems. The problems we encounter most in production come from complex queries, so query optimization is clearly the top priority. And when it comes to speeding up queries, nothing beats an index.

2.2 What is an index?

In MySQL an index is also called a "key" (primary key, unique key, and index key): a data structure the storage engine uses to find records quickly. Indexes are critical for good performance, and as the data in a table grows they matter more and more, reducing I/O and speeding up queries. The three kinds of keys differ in their constraints: a primary key is unique and NOT NULL; a unique key enforces uniqueness but allows NULL; an ordinary index key imposes no uniqueness constraint and exists purely to accelerate lookups.

Index optimization is probably the most effective way to improve query performance; an index can easily improve a query by several orders of magnitude. An index is to a table what the pinyin lookup table is to a dictionary: without it, finding a word means leafing through hundreds of pages one by one.

Note: once an index has been created on a table, queries should ideally consult the index first and then use the location the index provides to fetch the actual data.
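To make "consult the index first, then fetch the data" concrete, here is a toy model in Java (not how MySQL is implemented — InnoDB uses B+ trees on disk, while TreeMap is an in-memory red-black tree — but the lookup pattern is the same):

import java.util.TreeMap;

public class IndexLookupDemo {
    public static void main(String[] args) {
        // The map plays the role of the index: key -> location of the row.
        TreeMap<Integer, Long> index = new TreeMap<>();
        for (int id = 0; id < 1_000_000; id++) {
            index.put(id, (long) id * 128); // hypothetical byte offset of the row
        }
        // O(log n) walk down the index instead of scanning every row,
        // then use the located offset to read the actual record.
        Long offset = index.get(123_456);
        System.out.println("row 123456 stored at offset " + offset);
    }
}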

2.3 Index Principle

The purpose of an index is to make queries more efficient, much the way we use a book's table of contents: first find the chapter, then the section under it, then the page number. Looking up a dictionary, a train number, or a flight works the same way. And notice: the table of contents itself takes up pages, and those pages live on disk, so an index costs disk space.

Consider also: is it faster to build the index before any data exists, or after a large amount of data is already present? Before, obviously, because building an index over existing data means traversing all of it first. And is it faster to add data with or without an index? Without, because every new row forces the index to be updated as well. So while an index speeds up queries, it slows down writes and costs space.

2.4 Data structure of the index

We've covered the basics of indexes; now let's see how an index reduces I/O and speeds up queries. No data structure arises out of nowhere; each has its background and usage scenarios. What do we need here? Simply this: keep the number of disk I/Os per lookup very small, ideally a small constant. A height-controlled multiway search tree fits that requirement exactly, and that is how the B+ tree was born. For example, with a fanout of around 1,000 keys per node, a B+ tree of height 3 can index on the order of a billion rows while touching only two or three pages per lookup.

3. MySQL locking and transaction isolation levels

3.1 Why learn database locking?

Even if we know nothing about these locks, our programs generally run fine, because the database adds them for us implicitly:

  • For UPDATE, DELETE, and INSERT statements, InnoDB automatically adds an exclusive lock (X) to the data sets involved
  • Before executing a SELECT, MyISAM automatically adds a read lock to all tables involved; for update operations (UPDATE, DELETE, INSERT, etc.) it automatically adds a write lock to the tables involved

Manual locking is only needed in certain scenarios. Knowing how database locking works lets us:

  • Use it deliberately when the situation calls for it
  • Have better control over the programs we write
  • Hold our own when discussing database technology with others
  • Build out our own knowledge system (it really does help in interviews)

3.2 Introduction to Table Locking

First, by granularity, locks fall into two categories:

  • Table locks
    • Low overhead, fast to acquire; no deadlocks; coarse granularity, so lock conflicts are more likely and concurrency is lowest
  • Row locks
    • Higher overhead, slower to acquire; deadlocks can occur; fine granularity, so lock conflicts are less likely and concurrency is higher

Different storage engines support different lock granularity:

  • InnoDB supports both row locks and table locks!
  • MyISAM only supports table locking!

InnoDB uses row-level locks only when the data is accessed through an index condition; otherwise, InnoDB uses table locks

  • In other words, InnoDB's row locks are attached to indexes!

Table locking is divided into two modes:

  • Table read lock
  • Table write lock
  • For table read and write locks: read-read does not block, read-write blocks, write-write blocks!
    • Read-read does not block: while one user is reading data, other users can read the same data without waiting on a lock
    • Read-write blocks: while one user is reading data, other users cannot modify the data being read; they must wait on the lock!
    • Write-write blocks: while one user is modifying data, other users cannot modify the same data; they must wait on the lock!
    • A write lock is incompatible with every other lock; only read locks are compatible with each other

As you can see, read locks and write locks are mutually exclusive, so reads and writes are serialized.

  • If one process wants a read lock while another wants a write lock, MySQL gives the write lock priority over the read lock!
  • Write/read lock priority can be adjusted with the parameters max_write_lock_count and low-priority-updates

It is worth noting that:

  • MyISAM does support concurrent queries and inserts. The system variable concurrent_insert selects the mode: if a MyISAM table has no holes (no deleted rows in the middle), MyISAM allows one process to insert records at the end of the table while other processes are reading it.
  • The InnoDB storage engine does not support this mechanism!

3.3 MVCC and transaction isolation levels

Database transactions have different isolation levels; different isolation levels use locks differently, and how the locks are applied ultimately produces the different isolation levels.

Multi-Version Concurrency Control (MVCC) can loosely be thought of as a variant (an upgrade) of row-level locking.

  • The transaction isolation level is implemented through locking, but the locking details are hidden

Where table locking blocks both reads and writes, MVCC, to improve concurrency, generally does not let writes block reads (so MVCC avoids locking in many cases).

  • MVCC achieves non-blocking reads just as its name suggests: multi-version concurrency control generates a snapshot of the data at the moment of the request and uses that snapshot to provide consistent reads at a certain level (statement level or transaction level). From the user's perspective, the database appears to hold multiple versions of the same data.

There are two levels of snapshot:

  • Statement level

Corresponds to the Read Committed isolation level

  • Transaction level

Corresponds to the Repeatable Read isolation level

Back when we first learned about transactions, we saw that there are four isolation levels:

  • Read Uncommitted

Dirty reads, non-repeatable reads, and phantom reads can all occur

  • Read Committed

Non-repeatable reads and phantom reads can occur

  • Repeatable Read

Phantom reads can occur (but not in MySQL's Repeatable Read, which uses gap locks!)

  • Serializable

Fully serial; avoids all of the above!

Dirty read: one transaction reads data another transaction has not yet committed.

  • Example: A transfers money to B. A executes the transfer statement but has not yet committed the transaction. B reads the data, sees the extra money in his account, and tells A the money has arrived. A then rolls back the transaction, and B's extra money disappears.
  • The essence of a dirty read is that the lock is released immediately after the data is modified, letting others read data that may turn out to be useless or wrong.

The way Read Committed avoids dirty reads is simple:

  • The lock release point is moved to after the transaction commits; until then, other processes cannot read the row at all, including any changes made before the commit

Non-repeatable read: within one transaction, a read returns data that another transaction has committed in the meantime; in other words, a transaction can see changes made by other transactions.

  • Example: A queries the database, B then modifies the data, and A's subsequent queries return different results. [Harm: every result A reads is affected by B, so the information A queries loses its meaning.]

Read Committed corresponds to statement-level snapshots: each read sees the latest committed version at the time of that statement!

Repeatable Read prevents non-repeatable reads with transaction-level snapshots! Every read within the transaction sees the transaction's own version; even if the data is modified elsewhere, the transaction keeps reading its own version.

Hmm... if that's still not clear, let's look at how InnoDB implements MVCC.

InnoDB implements MVCC by keeping two hidden columns on every row: one holds the row's creation "time", the other its expiration (deletion) "time". These store not actual timestamps but system version numbers. Each time a new transaction starts, the system version number is incremented; the version number at the start of a transaction serves as the transaction's version and is compared against the version numbers of every row it examines.

For a SELECT, InnoDB checks each row against two criteria: (a) InnoDB only returns rows whose creation version is no later than the current transaction's version, which guarantees the transaction reads rows that either existed before it started or were inserted/updated by the transaction itself; (b) the row's deletion version must be either undefined or greater than the current transaction's version, which guarantees the rows a transaction reads were not already deleted before it began.
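Those two criteria are easy to express in code. A deliberately simplified model (real InnoDB works with undo logs and read views, so treat this purely as an illustration of the rule above):

class RowVersion {
    long createdVersion;  // system version number when the row was written
    Long deletedVersion;  // null means the row has not been deleted

    // A row is visible to a transaction if it was created no later than the
    // transaction's version (a) and not deleted before the transaction began (b).
    boolean visibleTo(long txnVersion) {
        boolean createdEarlyEnough = createdVersion <= txnVersion;  // (a)
        boolean notDeletedYet = deletedVersion == null
                || deletedVersion > txnVersion;                     // (b)
        return createdEarlyEnough && notDeletedYet;
    }
}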

As for phantom reads: a phantom read happens when a transaction reads rows inserted by another transaction, so consecutive reads within the transaction disagree.

  • Note: similar to a non-repeatable read, but a phantom read specifically reads rows newly inserted by other transactions, causing before-and-after reads to be inconsistent
  • MySQL's Repeatable Read isolation level, with gap locks, already handles phantom reads.

3. Nginx Tuning

1. What Nginx is

Nginx is often used to serve static content and as a reverse proxy, as well as a high-concurrency server in front of the pages. It is well suited for load balancing, forwarding external requests directly to the application services behind it (Tomcat and the like).

2. Nginx core configuration

2.1 Global Configuration Blocks

user root;                  # account the worker processes run as; nobody by default
worker_processes 7;         # number of worker processes

error_log logs/error.log;          # error log location
error_log logs/error.log notice;   # log messages at notice level and above

pid logs/nginx.pid;         # where the nginx master process pid is saved; this is the default

worker_rlimit_nofile 65535; # maximum number of file descriptors a single worker process may open

worker_processes:

In practice this is usually set close to the number of CPU threads; for example, with an 8-thread CPU it is generally set to 6 or 7.

For our own development, 1 or 2 is enough; anything more just eats resources.

worker_rlimit_nofile:

"rlimit" stands for resource limit: a single worker process may open at most the specified number of files, and cannot read more beyond that. Each opened file consumes one file descriptor.

This setting prevents a single worker process from consuming excessive system resources.

Query the nginx processes with ps -ef | grep nginx:

No matter how many worker processes are configured, there is only one master process (the one started by running sbin/nginx).

The master process runs as the account currently logged in to Linux, while worker processes run as the account specified by the user directive. The first numeric column in the output is the process PID.

Both the nginx master and worker processes are ordinary Linux processes, but the master (parent) controls starting and stopping the workers (children).

Think of the master process as the boss and the worker processes as the workers.

2.2 events block

events {
    accept_mutex on;           # prevent the thundering herd
    multi_accept on;           # allow a worker process to accept multiple new connections at once; default off
    use epoll;                 # event-processing model
    worker_connections 1024;   # maximum number of connections a worker process may hold; default 1024
}

accept_mutex:

The thundering-herd problem: when a network connection arrives, all sleeping worker processes are woken, but only one handles the connection and the rest go back to sleep.

Set to on: only some workers are woken rather than all. The default is on.

Set to off: all workers are woken. Waking a worker costs resources, so needless wakeups hurt performance.

use:

Specifies nginx's event-processing model. Available values are select, poll, kqueue, epoll, rtsig, and /dev/poll.

select and poll are the standard models; kqueue and epoll are the efficient ones.

kqueue is available on BSD, epoll on Linux. (BSD is a branch of Unix; Linux is a Unix-like system.)

worker_processes in the global block and worker_connections in the events block are the key to nginx's high-concurrency support: the maximum number of connections nginx can hold is the product of the two.

Each connection occupies a file descriptor, so the per-worker maximum set by worker_connections is capped by worker_rlimit_nofile in the global block, which in turn is capped by the maximum number of file descriptors Linux allows a single process to open.

By default, a Linux process may open at most 1024 file descriptors. To raise the limit for a single process, change the Linux resource limit:

ulimit -n 65536

The ulimit command limits the size and number of system resources a single process may use, including memory, buffers, sockets, stack, queues, CPU time, and so on.

Run ulimit --help to see the options.

2.3 HTTP block

http {
    # http global block
    # server blocks
}

You can have multiple server blocks.

(1) HTTP global blocks

The HTTP global block configuration applies to the entire HTTP block (all server blocks within the HTTP block).

include mime.types;                     # supported MIME types
default_type application/octet-stream; # default MIME type (a binary stream); types missing from mime.types are treated as this

#log_format main '$remote_addr - $remote_user [$time_local] "$request" '
#                '$status $body_bytes_sent "$http_referer" '
#                '"$http_user_agent" "$http_x_forwarded_for"';

access_log logs/access.log;       # stores client request info: client address, browser and kernel version, requested URL, time, method, response status, etc.
#access_log logs/access.log main; # use the "main" log format defined above; default location is logs/access.log

sendfile on;     # efficient file transfer mode; default off
#tcp_nopush on;  # send a large response to the client in one batch rather than in pieces, preventing network congestion
#tcp_nodelay on; # flush small responses to the client immediately instead of waiting for the buffer to fill

keepalive_timeout 65; # if the client sends nothing within this period, nginx closes the connection

gzip on; # gzip-compress responses: the response gets smaller and transfers faster, saving bandwidth,
         # but compression (nginx) and decompression (client) cost extra time and resources

upstream servers {              # define a load balancer; the name must not contain "_"; here it is "servers"
    server 192.168.1.7:8080;    # a Tomcat node
    server 192.168.1.8:8081;
    #server 192.168.1.7:8080 down;   # down marks a node as unavailable
    #server 192.168.1.8:8081 backup; # backup marks a standby node, used only when the others are too busy or have failed
    #server 192.168.1.8:8081 max_fails=3 fail_timeout=60s; # after 3 failed requests, skip this node for 60 s, then try it again
}

Common log-format variables:

  • $remote_addr: the client's IP address
  • $time_local: access time and time zone
  • $request: the requested URL and HTTP protocol
  • $status: the response status; 200 on success
  • $http_referer: the page the request was linked from
  • $http_user_agent: information about the client's browser

(2) Server block

server {
    # server global block
    listen 80;             # port to listen on
    server_name localhost; # virtual host (domain name); only valid if registered in DNS, otherwise use
                           # localhost; multiple hosts are separated by spaces
    charset utf-8;         # character set to use

    #access_log logs/host.access.log main; # logs can be set in the http global block or here;
                                           # the http global block already logs, so this is unnecessary

    error_page 404 /404.html;             # page returned to the client on 404
    error_page 500 502 503 504 /50x.html; # page returned on server errors
    location = /50x.html {
        root html;
    }

    # serve static resources with nginx itself
    location ~* \.(html|css|js|gif|jpg|png|mp4)$ {
        # regex match on the URL: requests for these file types are handled here
        root static; # directory containing the static resources; create "static" under
                     # the nginx home directory and put the files there
        expires 30d; # client-side cache duration
        #proxy_pass http://192.168.1.10:80; # use proxy_pass instead if another machine serves the
                                            # static resources, or a load balancer for a cluster
    }

    # default location: used when no other location matches the URL
    location / {
        root html;                  # root directory for requests nginx handles itself; e.g. login.jsp
                                    # would be looked up under this directory
        index index.html index.htm; # nginx's home page; root and index are both required
        proxy_pass http://servers;  # forward to a node of the "servers" load balancer; without this,
                                    # nginx acts as the web server and looks for files under root
    }
}

The error_page settings apply when the problem occurs on nginx itself acting as a web server (serving static resources), for example when a static file cannot be found on nginx.

If the problem occurs on Tomcat, for example xxx.jsp is missing on Tomcat, the error page returned is Tomcat's, not nginx's.

If you want nginx itself to act as the web server and handle client requests directly (static resources, say), set user in the global block to the account nginx is running under (i.e., the account currently logged in to Linux).

Otherwise the worker processes (running as the default nobody account) have no permission to read static resources owned by the account running the nginx master process, and clients get 403 Forbidden.

You can filter client IP addresses with regular expressions, or write the filtering rules in a file and include it.

3. Nginx load-balancing algorithms

(1) Round robin

Requests are handed to the servers in the list one after another, cycling back to the front after the end.

Round robin suits servers of similar capability; it is the default, so nothing needs to be configured.

(2) Weighted polling

upstream servers {
    server 192.168.1.7:8080 weight=1;
    server 192.168.1.8:8081 weight=2;
}

Weights make the more capable nodes proportionally more likely to be chosen; suitable when server performance differs noticeably.

(3) ip_hash

upstream  servers{
    ip_hash;
    server  192.168.1.7:8080;
    server  192.168.1.8:8081;
}

Requests are forwarded based on a hash of the client IP address: requests from the same client (IP) always go to the same server, which solves the session problem.
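The idea behind ip_hash can be sketched in a few lines of Java (a simplified model — nginx's actual implementation differs, e.g. it hashes only part of an IPv4 address — but the stickiness property is the same):

import java.util.List;

public class IpHashBalancer {
    private final List<String> nodes; // the servers from the upstream block

    public IpHashBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // The same client IP always hashes to the same index, so the same
    // client is always forwarded to the same node: sticky sessions.
    public String pick(String clientIp) {
        int h = clientIp.hashCode() & 0x7fffffff; // force non-negative
        return nodes.get(h % nodes.size());
    }
}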

(4) url_hash (third party)

upstream  servers{
    hash $request_uri;
    server  192.168.1.7:8080;
    server  192.168.1.8:8081;
}

Requests are distributed by a hash of the requested URL: identical URLs are forwarded to the same server.

Since a given server keeps handling the same URL, it will usually have that URL cached and can answer straight from cache, cutting response time.

(5) Fair (third party)

upstream  servers{
    fair;
    server  192.168.1.7:8080;
    server  192.168.1.8:8081;
}

Requests are distributed according to server response time: servers that respond faster receive more requests.

fair: nginx tracks each node's average response time. A short response time means the node is lightly loaded (idle), so more requests should go to it; a long response time means the node is heavily loaded, so fewer should.

Both ip_hash and url_hash pin specific requests to specific nodes. If a node goes down, nginx removes the unavailable node and forwards those requests to other nodes. For url_hash the impact is small, but with ip_hash the previous session data is lost.

4. Tomcat Tuning

1. Set basic parameters

Configure in server.xml:

  • maxThreads: Tomcat handles each incoming request on a thread; this value is the maximum number of request-processing threads Tomcat may create.
  • acceptCount: the number of requests that may queue while all processing threads are busy; requests beyond this number are not processed.
  • connectionTimeout: network connection timeout in milliseconds. 0 means never time out, which is risky; 30000 ms is a typical value.
  • minSpareThreads: the number of threads created when Tomcat starts.
  • maxSpareThreads: once the number of idle threads exceeds this value, Tomcat closes the socket threads that are no longer needed.

2. Comparing four Tomcat connector configurations

Tomcat's default mode for handling HTTP requests is BIO (blocking I/O; the second configuration below), which opens a new thread for each request. The four configurations are:

<Connector port="8081" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000" redirectPort="8443"/>

<Connector port="8081" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8443"/>

<Connector executor="tomcatThreadPool" port="8081" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8443"/>

<Connector executor="tomcatThreadPool" port="8081" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000" redirectPort="8443"/>

Call the four connectors, in order, NIO, HTTP, POOL, and NIOP. Performance was compared by measuring requests processed per second; higher is better:

NIO    HTTP   POOL   NIOP
281    65     208    365
666    66     110    398
692    65     66     263
256    63     94     459
440    67     145    363

Overall NIOP > NIO > POOL > HTTP. Although Tomcat's default HTTP connector is the least efficient, the numbers show it is the most stable. And this is only a simple page test; results will fluctuate with page complexity.

Configuration reference: the default maximum number of threads per process is about 1000 on Linux and 2000 on Windows. The right value depends on the server's memory and on how many Tomcat instances you run.

<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
              maxThreads="500" minSpareThreads="25" maxSpareThreads="250"
              enableLookups="false" redirectPort="8443" acceptCount="300" connectionTimeout="20000" disableUploadTimeout="true"/>  

3. Tomcat clustering

Should you deploy a single Tomcat (hosting multiple projects) per server, or multiple Tomcats (each hosting 1 to N projects)? On multi-core machines, multiple Tomcats should be standard: the same multi-threaded thinking that drives microservices.

4. Set Tomcat memory

Modify bin/catalina.sh and add the following settings:

JAVA_OPTS='-Xms<initial heap size> -Xmx<maximum heap size>'

Increase these two values according to the server's memory. For example:

JAVA_OPTS='-Xms1024m -Xmx2048m'

Here the server has 8 GB of RAM and runs 3 Tomcat services, so each is allocated 2 GB because other processes need memory too.


That's about it for this article. Of course, plenty of things didn't make it in, but space is limited.

Finally, writing all this up was no small effort, so a like and a bookmark would be much appreciated, brothers!


end