Abstract: One aspect of system optimization is the systematic analysis and optimization of each link on the IT system or trading chain; the other is bottleneck analysis and tuning of a single system. The objective of optimization is to improve the response speed and throughput of the system and reduce the coupling between layers, so as to cope with a flexible, fast-changing market.

1. What is system optimization

One aspect of system optimization is to systematically analyze and optimize every link on the IT system or trading chain; the other is to analyze and tune the bottleneck points of a single system. The objective is basically the same in either case: improve the response speed and throughput of the system and reduce the coupling between layers, so as to cope with a flexible, fast-changing market.

There are three levels of system optimization: IT architecture governance layer, system layer and infrastructure layer.

  • IT architecture governance layer: the purpose of optimization is not only performance, but also application architecture optimization (e.g. application layering, service governance) to adapt to changes in the business architecture.

  • System layer: the purpose of optimization includes business process optimization and data process optimization (e.g. improving the load the system can carry, reducing system overhead).

  • Infrastructure layer: the purpose of optimization is to improve the capabilities of the IaaS platform (for example, building elastic, horizontally scalable clusters that support rapid scaling and migration of resources).

2. Methodology and thinking of system optimization

My personal understanding of a methodology: something that sounds awesome, that people who have actually done the work think is crap, but that points the way to action and continuous improvement.

2.1 Common Methodology

(1) Do not access unnecessary data — reduce unnecessary links on the trading line, reduce failure points and maintenance points.

(2) Nearest load/cache is king — reduce unnecessary access.

(3) Fault isolation — don’t overwhelm the entire trading platform because of one system bottleneck.

(4) Have good expansion ability — rational use of resources, improve processing efficiency and avoid single point of failure.

(5) Optimize the transaction chain to improve throughput — asynchronous/less serial, reasonable split (vertical/horizontal split), rule front-loading.

(6) Performance and function are equally important: if each of the five links on the transaction chain reaches only 90% of its performance target at the design stage, the overall chain delivers just 0.9^5 ≈ 59%.
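Point (5), going asynchronous and reducing serial steps, can be illustrated with a minimal sketch: two independent downstream calls issued in parallel with Java's CompletableFuture instead of one after another. The service names here are hypothetical stand-ins, not part of any system described in this article.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncChain {
    // Simulated downstream calls (hypothetical services)
    static String queryCustomer(String id) { return "customer:" + id; }
    static String queryOrders(String id)   { return "orders:" + id; }

    // Serial version: total latency is the sum of both calls
    public static String serial(String id) {
        return queryCustomer(id) + "|" + queryOrders(id);
    }

    // Parallel version: the two independent calls run concurrently,
    // so total latency approaches that of the slower call
    public static String parallel(String id) {
        CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> queryCustomer(id));
        CompletableFuture<String> o = CompletableFuture.supplyAsync(() -> queryOrders(id));
        return c.join() + "|" + o.join();
    }
}
```

The parallel version only helps when the calls are truly independent; ordering or transactional constraints force serial execution.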

2.2 General idea of optimization



2.3 Principles of optimization

  • Performance should always be considered in the design and development of application systems.

  • Defining clear performance goals is key.

  • Performance tuning is accompanied by the whole project cycle, and it is best to set goals in stages. After the expected performance goals are achieved, the work in this stage can be summarized and knowledge transfer into the next stage of tuning.

  • You must ensure that the tuned program runs correctly.

  • Performance depends more on good design than tuning techniques.

  • The tuning process is iterative and progressive, and the results of each tuning should be fed back into the subsequent code development.

  • Performance tuning should not come at the expense of code readability and maintainability.

3. Performance tuning

3.1 Common Performance Problems

3.1.1 Common Client Performance Problems

  • Slow loading: The first startup or reload is slow.

  • No response: the page freezes after an event is triggered;

  • Seriously affected by network bandwidth: pages load poorly in regions with bad network conditions because a large number of resource files must be downloaded;

  • JS memory overflow: Frequent operations on object attributes cause excessive memory usage and overflow.

3.1.2 Common J2EE System Performance Problems

  • Memory leak: memory that is continuously occupied during operation and can never be reclaimed. Memory usage increases linearly over time or with increasing load, and the system's processing efficiency decreases correspondingly, until the memory allocated to the JVM is exhausted and the system crashes, recovering only for a short time after each restart.

  • Resource leak: a resource is opened but never closed, or fails to close. Such resources include data source connections, file streams, and so on. When resources are frequently opened but not successfully closed, leaks result. Database connection leakage is the most common resource leak.

  • Overload: Excessive use of a system that exceeds the capacity of the system.

  • Internal resource bottleneck: Resource bottlenecks are caused by overuse or underallocation of resources.

  • Thread blocking, thread deadlock: a thread blocks indefinitely at a synchronization point that is never released, or threads wait on each other in a cycle.

  • Slow response of the application system: The response time is long due to improper application or SQL.

  • The application system is unstable, sometimes fast and sometimes slow.

  • Various exceptions occur in the application system: some thrown by the middleware server, others by the database side.
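The memory-leak pattern described first typically comes from a long-lived collection that is only ever written to, such as a static cache with no eviction. A hypothetical sketch (the class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // A static map that is only ever written to: every entry stays
    // reachable forever, so the GC can never reclaim it. Memory use
    // grows linearly with load, the signature described above.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void handleRequest(String requestId) {
        CACHE.put(requestId, new byte[1024]); // never evicted
    }

    public static int cachedEntries() { return CACHE.size(); }
}
```

The fix is an eviction policy (bounded size, TTL) or scoping the map to the request rather than the class.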

3.1.3 Common Database Problems

  • Deadlock: a table or row deadlock caused by requests holding locks that are not released in time (for example, due to inefficient execution) or by a circular wait;

  • I/O busy: a large number of I/O waits caused by poor SQL or improper business logic design;

  • High CPU usage: high concurrency or cache penetration drives database CPU usage up.

3.2 Specific work of tuning

To be fast, the most important thing is to improve the response time of the system (response time = service time + queuing time). As the classic response time curve shows, what we need to do is reduce the service time through program optimization and reduce the queuing time by improving system throughput.

Response time curves (from Oracle Performance Forecasting)

The vertical axis is response time. Response time is the sum of service time and queue time. The horizontal axis is the rate of arrival. As the number of transactions entering the system increases per unit of time, the curve slides to the right. As arrival rates continue to increase, at some point queue times will rise steeply. When this happens, response times skyrocket, performance drops, and users get very frustrated.
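The shape of this curve can be approximated with the classic single-queue formula: response time = service time / (1 − utilization), where utilization = arrival rate × service time. This is a simplified M/M/1-style sketch for intuition, not the book's exact model:

```java
public class ResponseTimeCurve {
    // Single-queue approximation: response time explodes as utilization -> 1.
    // serviceTime in seconds, arrivalRate in transactions per second.
    public static double responseTime(double serviceTime, double arrivalRate) {
        double utilization = arrivalRate * serviceTime;
        if (utilization >= 1.0) return Double.POSITIVE_INFINITY; // saturated
        return serviceTime / (1.0 - utilization);
    }
}
```

With a 0.1 s service time, 5 tx/s gives 50% utilization and a 0.2 s response; at 10 tx/s the queue never drains, which is the "skyrocket" region of the curve.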

Examples from previous projects are used to analyze the specific work of performance optimization.

3.2.1 Trading line optimization

The transaction line starts from the consumers of a service and looks at the functions the transaction must accomplish at each level and the relationships between function points. The relationships between function points are represented as a directed path:



Principles of trading line optimization:

  • Shortest path: reduce unnecessary links, avoid failure points;

  • Transaction integrity: ensure the consistency of transactions at all stages of the transaction line through rectification or compensating transactions;

  • Fault isolation and fast location: shield normal transactions from the impact of abnormal conditions, and make it possible to quickly locate problems via transaction codes or error codes;

  • Traffic control principle: You can control the traffic of service channels and set priorities for higher-level services.

  • Timeout control funnel principle: The timeout setting of the front-end system should be larger than that of the back-end system.

[Case] With the evolution of architecture, systems that were once built as silos have gradually developed into independent units that can be assembled flexibly from services:



In the process of service governance, the original core business system is broken into independent business components. Middle-layer platform systems gradually build business services and process services on top of these components, providing support for the rapid construction of front-end applications. In this process, service identification and construction is the foundation, and the transaction line specification is the guarantee: the specification determines what service governance should and should not do. As software versions iterate, few individuals can hold all the details of the system in their heads, so govern by rules, not by people.

Suppose we develop an order query function A. Services B and C of the service integration platform can both complete the function, but B adds extra, unnecessary verification on top of C. By the shortest-path principle, A should call service C directly.



When service provider D has insufficient processing capacity, it should notify service consumer C in time, or discard part of the access channel requests according to priority. When the front-end consumer receives a back-end flow control error code, it notifies users promptly. This prevents all users from being denied service when the system reaches its capacity limit. One purpose of flow control is to keep each system healthy and stable. Counters are generally used to track the number of concurrent transactions by transaction type, with a separate counter per type: when a transaction request arrives, the counter is incremented by 1; when the request responds or times out, the counter is decremented by 1.
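The counter scheme described above can be sketched with one AtomicInteger per transaction type (the transaction-type names and limits here are illustrative, not from any system in this article):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FlowControl {
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final Map<String, Integer> limits = new ConcurrentHashMap<>();

    public void setLimit(String txnType, int limit) { limits.put(txnType, limit); }

    // On request arrival: increment the counter; reject if over the limit
    public boolean tryAcquire(String txnType) {
        AtomicInteger c = counters.computeIfAbsent(txnType, k -> new AtomicInteger());
        if (c.incrementAndGet() > limits.getOrDefault(txnType, Integer.MAX_VALUE)) {
            c.decrementAndGet(); // roll back; caller returns a flow-control error code
            return false;
        }
        return true;
    }

    // On response or timeout: decrement the counter
    public void release(String txnType) {
        AtomicInteger c = counters.get(txnType);
        if (c != null) c.decrementAndGet();
    }
}
```

Priority handling would layer on top of this: lower-priority channels get smaller limits, so high-priority traffic keeps flowing as capacity tightens.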

3.2.2 Client Optimization

The primary goal of client-side optimization is to speed up page rendering, followed by reducing calls to the server.

Common solutions:

  • Analyze bottleneck points and optimize them accordingly;

  • Caching is king: cache static data on the client to improve page response time;

  • Reduce the client network download traffic by GZIP compression;

  • Use compression tools to compress JS, reduce js file size;

  • Merge scripts, stylesheets, and images to reduce the number of GET requests;

  • Load JS without blocking;

  • Preloading (images, CSS styles, JS scripts);

  • Load JS scripts on demand;

  • Optimize JS processing methods to improve page processing speed.
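The GZIP bullet above can be verified server-side with the JDK's built-in java.util.zip support; a minimal sketch of compressing a response body (the class name is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a response body with GZIP; repetitive text such as HTML/JS
    // shrinks dramatically, which is why it cuts download traffic so well
    public static byte[] gzip(String body) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

In practice the servlet container or reverse proxy usually does this for you; the point is just that text assets compress far below their original size.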

WEB request sequence diagram:



[Case] The following is the HTTP request monitoring record of an enterprise internal application system client:



In the figure above, you can see that 25 requests were sent (21 hit the cache and 4 interacted with the server).



According to the statistics, the total request time was 5.645 seconds, with four network interactions, 5.9KB of data received and 110.25KB sent; GZIP compression saved 8KB of data.

Later, the page response time was brought down to about 2 seconds by optimizing back-end requests, merging and compressing JS/JSP files, and so on.

PS: for front-end optimization, it is best to understand how browsers and HTTP work.

3.2.3 Server Optimization



[Case] A record of a resource leak, manifested as ResultSets not being closed: monitoring statistics showed ResultSet objects that were never closed.

The stack trace log showed that the application code closed only the Connection, not the Statement and ResultSet.

Does closing a Connection automatically close its Statements and ResultSets and release the resources they occupy? The JDBC and JDK specifications describe it as follows:

JDBC processing specification

JDBC 3.0 Specification — 13.1.3 Closing Statement Objects

An application calls the method Statement.close to indicate that it has finished processing a statement. All Statement objects will be closed when the connection that created them is closed. However, it is good coding practice for applications to close statements as soon as they have finished processing them. This allows any external resources that the statement is using to be released immediately.

Closing a Statement object will close and invalidate any instances of ResultSet produced by that Statement object. The resources held by the ResultSet object may not be released until garbage collection runs again, so it is a good practice to explicitly close ResultSet objects when they are no longer needed.

These comments about closing Statement objects apply to PreparedStatement and CallableStatement objects as well.

JDBC 4.0 Specification — 13.1.4 Closing Statement Objects

An application calls the method Statement.close to indicate that it has finished processing a statement. All Statement objects will be closed when the connection that created them is closed. However, it is good coding practice for applications to close statements as soon as they have finished processing them. This allows any external resources that the statement is using to be released immediately.

Closing a Statement object will close and invalidate any instances of ResultSet produced by that Statement object. The resources held by the ResultSet object may not be released until garbage collection runs again, so it is a good practice to explicitly close ResultSet objects when they are no longer needed.

Once a Statement has been closed, any attempt to access any of its methods with the exception of the isClosed or close methods will result in a SQLException being thrown.

These comments about closing Statement objects apply to PreparedStatement and CallableStatement objects as well.

So Statement.close() automatically closes and invalidates its ResultSet, but note that only the ResultSet object becomes invalid; the resources it occupies may not be released immediately. You should therefore still explicitly call close() on the Connection, Statement, and ResultSet. This matters especially when using a connection pool: connection.close() does not close the physical connection, so failing to close the ResultSet can leak even more resources.

JDK processing specification:

JDK1.4

Note: A ResultSet object is automatically closed by the Statement object that generated it when that Statement object is closed, re-executed, or is used to retrieve the next result from a sequence of multiple results. A ResultSet object is also automatically closed when it is garbage collected.

Note: A Statement object is automatically closed when it is garbage collected. When a Statement object is closed, its current ResultSet object, if one exists, is also closed.

Note: A Connection object is automatically closed when it is garbage collected. Certain fatal errors also close a Connection object.

JDK1.5 

Releases this ResultSet object's database and JDBC resources immediately instead of waiting for this to happen when it is automatically closed.

Note: A ResultSet object is automatically closed by the Statement object that generated it when that Statement object is closed, re-executed, or is used to retrieve the next result from a sequence of multiple results. A ResultSet object is also automatically closed when it is garbage collected.

The specifications can be summarized as follows:

1. The garbage collection mechanism can close them automatically;

2. If Statement is closed, the ResultSet is closed.

3. If a Connection is closed, its Statements may not be closed.

Closing a Connection is not a physical closure, but a return to the connection pool. A Statement or ResultSet may therefore still be held, actually occupying cursor resources in the database. If the program runs long enough, it may report the error "the cursor exceeds the maximum allowed value of the database", leaving the program unable to access the database properly.

Suggestions for this problem:

(1) Explicitly close database resources, especially when using a Connection Pool;

(2) The optimal experience is to execute close in the order of ResultSet, Statement and Connection;

(3) After rs.close() and stmt.close(), set rs = null and stmt = null, and handle exceptions properly;

(4) If a ResultSet must be passed around, use a RowSet instead; RowSets can be independent of the Connection and Statement.
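With Java 7+, the recommended close order (ResultSet, then Statement, then Connection) falls out automatically from try-with-resources, which closes resources in reverse declaration order. The sketch below uses a tracking stand-in for the JDBC types, since no real database is assumed here:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    static final List<String> closed = new ArrayList<>();

    // Stand-in for Connection/Statement/ResultSet that records when it closes
    static class Tracked implements AutoCloseable {
        final String name;
        Tracked(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    public static List<String> run() {
        closed.clear();
        // Mirrors: try (Connection conn = ...; Statement stmt = ...; ResultSet rs = ...)
        try (Tracked conn = new Tracked("connection");
             Tracked stmt = new Tracked("statement");
             Tracked rs   = new Tracked("resultset")) {
            // ... process rows here ...
        }
        // try-with-resources closes in reverse declaration order:
        // resultset, then statement, then connection
        return closed;
    }
}
```

The same pattern with real java.sql types also guarantees closure when an exception is thrown mid-processing, which manual close() calls often miss.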

3.2.4 JVM Optimization

JVM tuning parameters need to be handled carefully. Common JVM parameters:

Heap parameter settings

-server -Xmx1g -Xms1g -Xmn512m -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseCompressedOops

-server: selects the "server" VM (as opposed to -client) and must be the first argument; the choice also affects the default values of other JVM parameters. HotSpot consists of an interpreter and two compilers (client and server, or one of the two) in a mixed interpretation/compilation mode that starts with interpreted execution by default. The server VM starts more slowly and uses more memory, but executes more efficiently, making it suitable for server-side applications; since JDK 1.6 it is enabled by default in 64-bit-capable JDK environments. The client VM uses less memory and starts faster than the server VM, and does not perform dynamic compilation by default; it is intended for client application development and debugging or desktop applications.

PS: it has been reported that some HotSpot versions have stability issues in server mode, so whether the JVM should adopt server or client mode needs to be evaluated through long-term system monitoring.

Garbage collection parameter Settings

-XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled

-XX:+DisableExplicitGC disables System.gc(), preventing programmers from accidentally triggering full GCs and hurting performance;

PS: Based on historical experience, if the garbage collection time is less than 2%, it is considered to have little impact on performance.

Log class parameters

-XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:log/gc.log

-XX:+ShowMessageBoxOnError -XX:+HeapDumpOnOutOfMemoryError -XX:+HeapDumpOnCtrlBreak

Set log parameters such as -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:log/gc.log, so that you can check GC frequency in gc.log and assess its impact on performance.

To generate heap dump files for debugging when the system goes down abnormally, set -XX:+ShowMessageBoxOnError -XX:+HeapDumpOnOutOfMemoryError -XX:+HeapDumpOnCtrlBreak. This lets you see what the system was doing when it crashed.

Set performance monitoring parameters

-Djava.rmi.server.hostname=<server IP> -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

With the above parameters added, you can monitor the execution of the remote JVM through VisualVM or JConsole.

JVM parameter tuning

Adjusting heap parameters and garbage collection parameters requires comprehensive analysis through pressure testing and monitoring records:

(Table template: for each parameter combination, i.e. heap parameters plus GC parameters, record the transaction response time, throughput, and passed transactions measured under load.)
[Case] An application server was running with millions or tens of millions of object instances. Using IBM HeapAnalyzer to analyze the heapdump file generated on memory overflow showed that 89.1% of the space was occupied by the underlying objects (caused by loading a large number of records from the database):



Using JProfiler monitoring, we found a large number of unfreed VchBaseVo objects:



The query used Hibernate's list() method. list() queries the cache first; if that misses, it retrieves the data from the database, and Hibernate fills the first- and second-level caches accordingly. This is how Hibernate's caching is meant to work, but here it caused a performance problem: clear() had to be called to free the memory held by the first-level (session) cache.

3.2.5 Database Optimization

[Case] The database CPU usage of an enterprise's internal core service system was high at peak times, with problems such as large query volumes, query performance degradation caused by multi-table joins, and unreasonable table indexes. Using the methods below, CPU usage at peak times was brought under 30%:

Execute the following statement under SQL*PLUS:

SQL> set line 1000 -- set the line width to 1000 characters

SQL> set autotrace traceonly -- display execution plans and statistics without the query output

SQL statement with low execution efficiency:

select variablein0_.TOKENVARIABLEMAP_ as TOKENVAR7_1_
  from JBPM_VARIABLEINSTANCE variablein0_
 where variablein0_.TOKENVARIABLEMAP_ = '4888804'

View the execution plan before optimization:

The execution plan

———————————————————-

Plan hash value:  3971367966

——————————————————————————————-

| Id | Operation  | Name | Rows | Bytes | Cost (%CPU)| Time|

——————————————————————————————-

|   0 | SELECT STATEMENT  |                       |    12 |    612 | 12408   (2)| 00:02:29 |

|*  1 |  TABLE ACCESS FULL| JBPM_VARIABLEINSTANCE  |    12 |   612 | 12408   (2)| 00:02:29 |

——————————————————————————————-

Predicate  Information (identified by operation id):

—————————————————

   1 - filter("VARIABLEIN0_"."TOKENVARIABLEMAP_"=4888804)


statistics

———————————————————-

          1   recursive calls

          1   db block gets

      48995   consistent gets

      48982   physical reads

          0   redo size

       1531   bytes sent via SQL*Net to client

        248   bytes received via SQL*Net from client

          2   SQL*Net roundtrips to/from client

          0   sorts (memory)

          0   sorts (disk)

          9   rows processed

The execution plan shows that the missing index on this statement forces a full table scan. Total consistent gets: 48995; average consistent gets per row: 48995/9 ≈ 5444; physical reads: 48982, far from normal performance requirements. After creating the index, the statistics of the optimized execution plan:

statistics

———————————————————-

           1  recursive calls

           0  db block gets

           6  consistent gets

           4  physical reads

           0  redo size

        1530  bytes sent via SQL*Net to  client

         248  bytes received via SQL*Net  from client

           2  SQL*Net roundtrips to/from  client

           0  sorts (memory)

           0  sorts (disk)

           9  rows processed

According to these statistics, the total consistent gets of this statement is 6, the average consistent gets per row is 6/9 ≈ 0.67, and physical reads are 4, which makes this relatively efficient SQL.

As a rough rule of thumb, SQL averaging more than 100 consistent gets per row is considered inefficient, while SQL averaging fewer than 10 per row is considered efficient.

According to previous optimization practices, the problems causing SQL inefficiency are mainly concentrated in the following aspects:

(1) Access path: mainly cases where, due to a missing index or an index invalidated by data migration, the SQL cannot use an index scan and is forced into a full table scan. The solution is to create the missing index or rebuild the invalid one.

(2) Excessive use of subqueries. When joining multiple tables, business logic often pushes us toward subqueries. If the statement logic becomes too complex, Oracle cannot automatically transform the subqueries into multi-table joins, may choose the wrong execution path as a result, and statement performance drops sharply. Therefore, use join queries instead of subqueries wherever possible; this helps the Oracle query optimizer choose the most efficient execution plan based on data distribution and index design (the proper join order, join method, and table access method).

(3) Bind variables. Their advantage is avoiding hard parsing, which we will not discuss here; the disadvantage is that the wrong execution plan may be chosen, causing a sharp performance decline. Oracle 10g addressed this by introducing a bind variable classification mechanism, and 11g maintains new execution plans by creating new child cursors. Under 11g we can use bind variables boldly.

3.2.6 Load Balancing Optimization

Load balancing distributes access traffic and improves system scalability while avoiding single points of failure. The following are load balancing problem analyses and optimization ideas from one project team:



Load balancing algorithm:

  1. Random: Randomly selects one IP address from the pool. Advantages: Simple algorithm, high performance, and the backend is basically balanced when the request time difference is not large. Disadvantages: The back-end machine is prone to imbalance if the request time varies greatly.

  2. Round-robin: Selects the IP addresses based on the pool list in sequence. Advantages: Simple algorithm and high performance. Disadvantages: The same as randomization.

  3. By weight: You can assign weights to hosts in the pool, and then allocate requests based on the weights. Benefits: This algorithm is useful, especially when hosts with different configurations have accumulated in the production environment for many years. However, this problem has been resolved in the IAAS layer with virtualization.

  4. Hash: The request information is hashed and sent to a machine in the pool (generally used for loading static resources). Benefits: Increased cache hit ratio. Disadvantages: Consumes more CPU resources because you need to read the request information and do the hash.

  5. By response time: allocates requests according to measured response time. Benefits: requests go to hosts that are performing well. Disadvantages: the back end can become unbalanced if request times vary greatly.

  6. Based on the minimum number of connections: allocates requests according to each host's connection count. Benefits: balances request load. Disadvantages: when a new server is added or a server is restarted, the surge of requests suddenly routed to it may cause performance problems.
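A simple way to combine round-robin with weights (algorithms 2 and 3 above) is to expand each host into the rotation according to its weight. This is a minimal sketch for illustration, not how any particular load balancer implements it:

```java
import java.util.ArrayList;
import java.util.List;

public class WeightedRoundRobin {
    private final List<String> expanded = new ArrayList<>();
    private int next = 0;

    // Each host appears 'weight' times in the expanded rotation list
    public void addHost(String host, int weight) {
        for (int i = 0; i < weight; i++) expanded.add(host);
    }

    // Pick hosts in sequence; higher-weight hosts are picked proportionally more often
    public synchronized String pick() {
        String host = expanded.get(next);
        next = (next + 1) % expanded.size();
        return host;
    }
}
```

Real implementations use smooth weighted round-robin to avoid sending a weight-heavy host long consecutive bursts, but the proportional behavior is the same.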

Session persistence:

  1. No Sticky session: Each request is considered as a new request and is re-allocated to the back-end host based on the load balancing algorithm. Benefits: simple, high performance; Disadvantages: Stateless processing is required for back-end services;

  2. Source IP-based persistence: after the first request from an IP is allocated by the load balancing algorithm, subsequent requests from that IP go to the same host. Disadvantages: users sharing the same egress IP (e.g. behind NAT) all land on the same server.

  3. Cookie-based persistence: the load balancer inserts a cookie on the first request, and subsequent requests are routed to the same host based on the cookie carried in the HTTP request header. Benefits: relatively stable, and hosts can be switched flexibly. Disadvantages: occasional session loss when cookies are cleared.

Health check:

  1. Based on TCP ports: checks whether the port is listening; if not, the host is removed from the pool. Advantages: simple. Disadvantages: requests may be sent when the container has started but the application has not.

  2. Based on HTTP GET/TCP requests: periodically sends a request to the server and checks whether the returned string matches the expected one; if not, the host is removed from the pool. Advantages: accurately determines whether the application has started properly, and can dynamically control whether a service is online.

Author: Kong Qinglong

This article is reprinted from the wechat official account freshmanTechnology