
preface

Because ORM frameworks are now so mature and widely used, many developers have only a weak grasp of JDBC, yet under the hood every ORM framework operates the database through JDBC

JDBC (Java Database Connectivity) is Java's database connectivity API; to put it bluntly, it is how you operate a database from the Java language

SUN defined a standard set of APIs for database access, along with the corresponding protocol standards for connecting to databases, and each vendor then provides its own driver that implements those interfaces for its database
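
As a quick refresher, a minimal raw JDBC round trip looks roughly like this (the URL, user, and password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcQuickStart {
    public static void main(String[] args) throws Exception {
        // Placeholders: point these at a real MySQL instance
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}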

1 MySQL JDBC Large Data Operations

This article takes large-data operations as its theme and draws out the relevant knowledge points through requirements that arise during development

  • Migrating data
  • Exporting data
  • Batch processing data

Generally speaking, in a Java web application, anywhere from hundreds of thousands to tens of millions of rows can be considered a large amount of data, and a Java (web) application is not particularly well suited to processing it

For example, suppose the business system needs to read 500W (5 million) rows from the MySQL database for processing. What should be done?

  • Regular query: read the 500W rows into JVM memory at once, or page by page
  • Streaming query: establish a long-lived connection and, using a server-side cursor, load one row at a time into JVM memory
  • Cursor query: like streaming, but use the fetchSize parameter to control how many rows are read per fetch

1.1 General Query

By default, the complete result set is retrieved and stored in memory. In most cases this is the most efficient way to operate, and thanks to the design of the MySQL network protocol it is also the easiest to implement

Assuming 500W rows in a single table, nobody would load them into memory in one go; the usual approach is paging

@SneakyThrows
@Override
public void pageQuery() {
    @Cleanup Connection conn = dataSource.getConnection();
    @Cleanup Statement stmt = conn.createStatement();
    long start = System.currentTimeMillis();
    long offset = 0;
    int size = 100;
    while (true) {
        String sql = String.format("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE LIMIT %s, %s", offset, size);
        @Cleanup ResultSet rs = stmt.executeQuery(sql);
        long count = loopResultSet(rs);
        if (count == 0) break;
        offset += size;
    }
    log.info("Paging query time :: {}", System.currentTimeMillis() - start);
}

The approach above is simple, but without a deep-paging optimization for LIMIT, the online database server will be brought to its knees, and retrieving all the data can take days
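
A common deep-paging optimization is keyset (seek) pagination: instead of making MySQL scan and discard `offset` rows on every page, filter on the last seen primary key. A sketch in the same style as the example above, assuming YOU_TABLE has an auto-increment primary key named ID (hypothetical column name):

@SneakyThrows
public void keysetPageQuery() {
    @Cleanup Connection conn = dataSource.getConnection();
    @Cleanup Statement stmt = conn.createStatement();
    long lastId = 0;
    int size = 100;
    while (true) {
        // Seek past the last seen ID instead of scanning `offset` rows each time
        String sql = String.format(
                "SELECT ID, COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE WHERE ID > %s ORDER BY ID LIMIT %s",
                lastId, size);
        @Cleanup ResultSet rs = stmt.executeQuery(sql);
        long count = 0;
        while (rs.next()) {
            lastId = rs.getLong("ID");
            count++; // Business operations per row
        }
        if (count == 0) break;
    }
}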

1.2 Streaming Query

If you are working with a ResultSet that has a very large number of rows and cannot allocate the heap space it would require in the JVM, you can tell the driver to stream the results back one row at a time

One caveat of streaming queries is that all rows in the result set must be read (or the result set closed) before any other query can be issued on the same connection, or an exception is thrown
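
A minimal sketch of that failure mode, assuming the same dataSource field as in the other examples (MySQL Connector/J rejects the second query while the streaming result set is still open):

@SneakyThrows
public void streamingConnectionIsBusy() {
    @Cleanup Connection conn = dataSource.getConnection();
    @Cleanup Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE); // streaming mode
    @Cleanup ResultSet rs = stmt.executeQuery("SELECT COLUMN_A FROM YOU_TABLE");
    rs.next(); // result set only partially consumed
    // Expected to throw SQLException: no other statement may run on this
    // connection until rs is fully read or closed
    conn.createStatement().executeQuery("SELECT 1");
}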

A streaming query monopolizes the connection, so it must be processed as quickly as possible if concurrent access to the tables referenced by the statement that produced the result set is to be maintained

@SneakyThrows
public void streamQuery() {
    @Cleanup Connection conn = dataSource.getConnection();
    @Cleanup Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    long start = System.currentTimeMillis();
    @Cleanup ResultSet rs = stmt.executeQuery("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE");
    loopResultSet(rs);
    log.info("Stream query time :: {}", (System.currentTimeMillis() - start) / 1000);
}

Streaming query over a 500W-row table, single call time: ≈ 6s

1.3 Cursor Query

In Spring Boot 2.x the default connection pool is Hikari and the connection object is a HikariProxyConnection, so the following cast-based setting is not feasible

((JDBC4Connection) conn).setUseCursorFetch(true);

Instead, first set useCursorFetch=true in the database connection string (a sketch follows); second, set how many rows the Statement reads per fetch, for example 1000
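
With Hikari the property goes on the JDBC URL rather than on the connection object. A sketch (host, schema, and credentials are placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class CursorFetchDataSourceFactory {
    public static HikariDataSource build() {
        HikariConfig config = new HikariConfig();
        // useCursorFetch is a standard MySQL Connector/J connection property
        config.setJdbcUrl("jdbc:mysql://localhost:3306/test?useCursorFetch=true");
        config.setUsername("root");
        config.setPassword("password");
        return new HikariDataSource(config);
    }
}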

@SneakyThrows
public void cursorQuery() {
    @Cleanup Connection conn = dataSource.getConnection();
    @Cleanup Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(1000);
    long start = System.currentTimeMillis();
    @Cleanup ResultSet rs = stmt.executeQuery("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE");
    loopResultSet(rs);
    log.info("Cursor query time :: {}", (System.currentTimeMillis() - start) / 1000);
}

Cursor query over the same 500W-row table, single call time: ≈ 18s

1.4 JDBC RowData

All the methods above call loopResultSet, whose body is just a while loop. The core difference between general, streaming, and cursor queries lies in the next() method

@SneakyThrows
private Long loopResultSet(ResultSet rs) {
    long count = 0;
    while (rs.next()) {
        // Business operations
        count++;
    }
    return count;
}

The logic of ResultSet.next() lives in the ResultSetImpl class, which retrieves the next row from a RowData object on each call. RowData is an interface, and its implementation hierarchy is shown below

By default, the ResultSet uses a RowDataStatic instance. When the RowDataStatic object is created, all records in the result set are read into memory, and next() then reads them from memory one by one

RowDataCursor fetches rows in batches and buffers them internally. The process is as follows:

  • It first checks whether its internal buffer still holds unreturned rows, and if so returns the next one
  • Once the buffer is exhausted, it issues a new request to the MySQL server to fetch the next fetchSize rows
  • The returned rows are stored in the internal buffer, and the first of them is returned

When streaming, the ResultSet uses a RowDataDynamic object, whose next() issues an IO read for a single row on every call

In summary: the default RowDataStatic reads all data into client memory, i.e. our JVM; RowDataCursor reads fetchSize rows at a time and issues a new request once they are consumed; RowDataDynamic reads one row per IO call

1.5 JDBC Communication Principles

(1) General query

The JDBC client interacts with the MySQL server through a Socket. In network-programming terms, MySQL can be treated as a SocketServer, so a complete request link is:

JDBC client -> client Socket -> MySQL -> data retrieved -> MySQL kernel Socket Buffer -> network -> client kernel Socket Buffer -> JDBC client

When querying large amounts of data in general query mode, the JVM can blow up for the following reasons:

  • MySQL Server writes the retrieved SQL result set through its output stream into the corresponding kernel Socket Buffer
  • The kernel buffer sends the data back over the TCP link initiated by JDBC; the data first lands in the kernel buffer of the machine where the JDBC client runs
  • After JDBC issues the SQL, the program blocks on the input stream's read; when the buffer has data, the program wakes up and reads the buffered data into JVM memory
  • MySQL Server keeps sending data and JDBC keeps reading it into Java memory. Even though the data is already local to the JDBC program, the execute call has not returned, because it only returns once all the corresponding data has been read
  • The downside is obvious: query too much data and memory keeps climbing, triggering GC after GC until memory runs out

(2) Cursor query

From the above we know that cursors can solve the memory-overflow problem of general queries over large data sets, but this raises a question:

If MySQL does not know when the client program will finish reading, what should happen when another DML write hits the same table?

In fact, when we use a cursor query, MySQL creates a temporary space to hold the data to be read, so cursor reads do not conflict with DML writes

However, cursor queries cause the following symptoms:

  • IOPS spikes, because the returned data must first be written into the temporary space, generating a large volume of I/O reads and writes; this process can cause write jitter for other services
  • Disk usage surges, because the data written to the temporary space lives outside the original table; if the table is large enough, in extreme cases the database disk can fill up while network output stays flat. The data in the temporary space is reclaimed by MySQL when the read completes or when the client issues ResultSet#close
  • After the JDBC client issues the SQL, it may wait a long time for a response; during this time the server is preparing the data. The wait looks like a general query's, but the underlying cause differs: the former is blocked reading data from the network buffer with no response to the business layer, while the latter is MySQL preparing the temporary data space with no response to JDBC
  • Once the data is prepared and transfer begins, network response time soars, and IOPS shifts from reads-and-writes to reads only

Cursor queries also communicate inefficiently: after the client consumes fetchSize rows it must send another request to the server. And during the data-preparation phase IOPS is very high, costing a lot of disk space and performance

(3) Streaming query

When the client connects to MySQL Server and the query begins, MySQL Server writes the SQL result set as an output stream into its local kernel's Socket Buffer, and the kernel then sends that data over the TCP link to the kernel buffer of the machine the JDBC client runs on

  • JDBC reads the kernel buffer via the input stream's read method; since streaming is enabled, the business program receives one row at a time
  • The MySQL server keeps pushing data toward the client kernel until the client's socket buffer is full, at which point the server blocks
  • For the JDBC client, each read pulls data from the local kernel buffer, so performance is fast; you generally need not worry about the local kernel running out of data to consume (unless the MySQL server has no more data, or the client does no business logic at all and discards each row as soon as it arrives, only then can client consumption outpace server production)

It seems that streaming is better than the cursor approach, but things are often not as simple as they seem

  • Streaming holds the database connection longer than a cursor query, so it affects the database for a longer time
  • Streaming queries also depend on the network; if the network is congested, that window grows even longer

2 Streaming vs. Cursor Memory Analysis

Table data volume: 500W

Memory viewing tool: jvisualvm, bundled with the JDK

JVM parameters: -Xmx512m -Xms512m

2.1 Memory usage for a single call

The memory performance report for streaming query is as follows

The cursor query memory performance report is as follows

In terms of memory usage, both cursor queries and streaming queries guard well against OOM

2.2 Memory usage for concurrent calls

Concurrent calls: JMeter, 10 concurrent threads fired in one second

The memory performance report for streaming query is as follows

Concurrent calls are also fine for memory usage; there is no stacking

Average concurrent invocation time of streaming query: ≈ 55s

The cursor query memory performance report is as follows

Average concurrent invocation time of cursor query: ≈ 83s

Due to equipment limitations, and because some of these situations occur only in extreme cases, multi-environment verification across production and testing has not been carried out; interested readers can test it themselves

3 How does MyBatis use streaming queries

MyBatis, as an ORM framework, also encapsulates streaming queries

The ResultHandler interface contains only the handleResult method, through which you receive each converted Java entity

@Slf4j
@Service
public class MyBatisStreamService {
    @Resource
    private MyBatisStreamMapper myBatisStreamMapper;

    public void mybatisStreamQuery() {
        long start = System.currentTimeMillis();
        myBatisStreamMapper.mybatisStreamQuery(new ResultHandler<YOU_TABLE_DO>() {
            @Override
            public void handleResult(ResultContext<? extends YOU_TABLE_DO> resultContext) {
                // Business operations on each streamed row
            }
        });
        log.info("MyBatis query time :: {}", System.currentTimeMillis() - start);
    }
}

In addition to the annotation-based usage below, you can also use the .xml file format (a sketch follows the annotated mapper)

@Mapper
public interface MyBatisStreamMapper {
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = Integer.MIN_VALUE)
    @ResultType(YOU_TABLE_DO.class)
    @Select("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE")
    void mybatisStreamQuery(ResultHandler<YOU_TABLE_DO> handler);
}
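
A rough mapper-XML equivalent of the annotated method above might look like this (the surrounding mapper file and namespace are assumed; -2147483648 is Integer.MIN_VALUE):

<select id="mybatisStreamQuery"
        resultType="YOU_TABLE_DO"
        resultSetType="FORWARD_ONLY"
        fetchSize="-2147483648">
    SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE
</select>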

MyBatis streaming query call time: ≈ 18s

Comparison of JDBC streaming and MyBatis encapsulated streaming read

  • MyBatis is still a bit slower than native streaming, but the performance is acceptable considering the underlying encapsulation
  • In terms of memory, the two fluctuate similarly
  • MyBatis is more convenient than native JDBC because it encapsulates features such as callbacks and serialized objects

Which to use can be decided according to the actual situation of the project; there is no best, only the most suitable.

Final words

Streaming and cursor queries both avoid OOM, so for large data volumes either scheme is worth considering. However, both occupy a database connection and do not release it while in use, so when cursors or streaming are used in online services with large data volumes, concurrency control is a must (a sketch follows)
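
A minimal sketch of such control, using a semaphore to cap how many streaming or cursor reads may hold connections at once (the limit of 3 is arbitrary):

import java.util.concurrent.Semaphore;

public class StreamQueryGuard {
    // Cap concurrent long-running reads so they cannot exhaust the connection pool
    private static final Semaphore PERMITS = new Semaphore(3);

    public static void run(Runnable streamingQuery) throws InterruptedException {
        PERMITS.acquire();
        try {
            streamingQuery.run(); // e.g. the streamQuery() shown earlier
        } finally {
            PERMITS.release();
        }
    }
}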

As for JDBC's native streaming query, MyBatis wraps it as well; it is somewhat slower, but the functionality and the code are much cleaner