Preface

This article looks at how to defend against exceptions caused by large-volume data reads and writes over JDBC.

Read operations

Selecting a large amount of data into memory at once most commonly causes an OOM exception. You can guard against this along two dimensions: limiting the amount of data and limiting the time.

Limiting the amount of data

1. Paging query

For everyday features, paging is both necessary and the simplest fix for this problem: estimate the production data volume in advance and choose an appropriate page size when implementing the feature.
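As a sketch, assuming a hypothetical demo_user table and a plain JDBC Connection, a paged read with LIMIT/OFFSET might look like this (the offsetFor helper converts a 1-based page number into an offset):

```java
import java.sql.*;
import java.util.*;

class PagedQuery {
    // Pure helper: offset for a 1-based page number.
    static int offsetFor(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // Fetch one page of names; demo_user is a hypothetical table.
    static List<String> fetchPage(Connection conn, int page, int pageSize) throws SQLException {
        String sql = "SELECT name FROM demo_user ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, offsetFor(page, pageSize));
            try (ResultSet rs = ps.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
                return names; // at most pageSize rows ever held in memory
            }
        }
    }
}
```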

2. maxRows

In JDBC you can set maxRows on a statement to limit the maximum number of rows the statement can pull. For example, the PG JDBC driver compares maxRows with fetchSize and uses the smaller value as the limit for the query.

This parameter is hard to set as one general value across different SQL: a blanket limit can be too blunt, since some queries return few columns or use little memory per row, so it may need per-statement tuning. In practice functions are mostly written with JPA or MyBatis, so it depends on whether JPA or MyBatis exposes this parameter for you to set. For generic SQL services, though, setting maxRows (for example, no more than 20,000) is worthwhile as a backstop.
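A minimal sketch of capping rows per statement with plain JDBC; the effectiveLimit helper mirrors the min(maxRows, fetchSize) behavior described above for the PG driver, with 0 meaning unlimited per JDBC convention:

```java
import java.sql.*;

class MaxRowsDemo {
    // Cap any query at maxRows rows; 0 means unlimited (JDBC convention).
    static void queryWithCap(Connection conn, String sql, int maxRows) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.setMaxRows(maxRows); // the driver silently drops rows beyond the cap
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // process each row here
                }
            }
        }
    }

    // Mirrors how the PG driver combines maxRows and fetchSize:
    // the effective batch limit is the smaller non-zero value.
    static int effectiveLimit(int maxRows, int fetchSize) {
        if (maxRows == 0) return fetchSize;
        if (fetchSize == 0) return maxRows;
        return Math.min(maxRows, fetchSize);
    }
}
```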

3. fetchSize

JDBC provides the fetchSize parameter so each query pulls results in batches of fetchSize rows. Different databases' JDBC drivers implement it differently.

For example, MySQL requires useCursorFetch=true on the URL and fetchSize set on the statement for batched fetching to take effect; otherwise a full fetch is performed. In fetch mode, MySQL's executeQuery does not pull the first batch of data; the fetching happens in the result set's next method.

For PG, executeQuery pulls the first fetchSize rows by default and returns them, and the result set's next() fetches further batches as needed.

The limitation of using fetchSize to avoid OOM is that you must process the data while iterating over the result set. If instead you loop over the result set only to copy it into a list and return it, as is typical in non-reactive programming, fetchSize will not help: you still accumulate the entire data set in memory and therefore still risk OOM.
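A sketch of the correct streaming pattern, processing rows during iteration instead of collecting them; the big_table name is hypothetical, and for MySQL this additionally requires useCursorFetch=true on the URL as noted above:

```java
import java.sql.*;

class StreamingRead {
    // Stream rows in batches instead of loading everything at once.
    static long countLargeTable(Connection conn) throws SQLException {
        long rows = 0;
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(1000); // each ResultSet refill pulls at most 1000 rows
            try (ResultSet rs = st.executeQuery("SELECT id FROM big_table")) {
                while (rs.next()) {
                    rows++; // process each row here; do NOT collect rows into a List
                }
            }
        }
        return rows;
    }

    // Pure helper: number of round trips for a given row count and fetch size.
    static long batches(long totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize;
    }
}
```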

Limiting query time

Time limits can be imposed at several levels:

1. Connection socketTimeout

This is the socket timeout of the JDBC connection. It prevents the connection from blocking indefinitely when the database restarts, the network fails, or the database itself hangs. Setting it is essential, and it is generally done on the connection URL.

For example, MySQL:

jdbc:mysql://localhost:3306/ag_admin?useUnicode=true&characterEncoding=UTF8&connectTimeout=60000&socketTimeout=60000

PG differs from MySQL here: MySQL's values are in milliseconds, PG's are in seconds.

jdbc:postgresql://localhost/test?user=fred&password=secret&connectTimeout=60&socketTimeout=60

In practice a database connection pool is usually in place, in which case these values can instead be set through the connection pool's data source properties.
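The same driver timeouts can also be passed programmatically as driver properties rather than URL parameters. A sketch using the PG driver's property names (values in seconds for PG; MySQL's equivalents are in milliseconds):

```java
import java.sql.*;
import java.util.Properties;

class TimeoutProps {
    // Build driver properties with timeouts; names follow the PG JDBC driver.
    static Properties pgTimeoutProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("connectTimeout", "60"); // seconds for the PG driver
        props.setProperty("socketTimeout", "60");  // seconds for the PG driver
        return props;
    }

    static Connection connect(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }
}
```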

2. Statement queryTimeout

This timeout is set on the statement. It covers the span from when executeQuery sends the query until the first batch of data is received.

However, JDBC driver implementations vary by database. For example, in fetch mode MySQL's executeQuery does not pull the first batch of data, while PG pulls the first batch before returning. The parameter's semantics are only really consistent outside fetch mode, where all data is queried at once. In fetch mode this timeout does not cover the fetches of subsequent batches, which can only rely on the connection's socketTimeout.

MyBatis can set this value with the defaultStatementTimeout parameter. JPA can set it with a query hint:

@QueryHints(@QueryHint(name = org.hibernate.jpa.QueryHints.SPEC_HINT_TIMEOUT, value = "1000" /* ms */))
    List<DemoUser> findAll();

JdbcTemplate can be configured directly:

    @Bean(name = "pgJdbcTemplate")
    public JdbcTemplate pgJdbcTemplate(
            @Qualifier("pgDataSource") DataSource dataSource) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.setQueryTimeout(10);   // seconds, not milliseconds
        jdbcTemplate.setMaxRows(10 * 1000); // rows
        return jdbcTemplate;
    }

3. Transaction timeout

In real code, a single business function may run multiple statement queries inside one transaction. A transaction timeout limits the total elapsed time of those operations at the transaction level.

You can set a global default timeout:

    @Bean
    @Qualifier("pgTransactionManager")
    PlatformTransactionManager pgTransactionManager() {
        JpaTransactionManager transactionManager = new JpaTransactionManager(pgEntityManagerFactory().getObject());
        transactionManager.setDefaultTimeout(60 /*seconds*/);
        return transactionManager;
    }

It can also be set per method on the @Transactional annotation, for example:

@Transactional(timeout = 5) /* 5 seconds */
public List<DemoUser> findAll();

4. Connection borrow timeout

When a connection pool is used for database operations, the pool generally provides connection validation, such as verifying that a connection is still OK at the moment it is borrowed.

Pools also provide suspect and abandoned-connection handling to detect connection leaks. If none of the timeouts above is set, or the values are too large (as by default), this detection acts as a backstop alongside socketTimeout: a connection that has been borrowed and not returned within the configured time is judged leaked and forcibly abandoned, i.e. closed. This is brutal but very useful, because it keeps threads from blocking forever on database operations and causing the service to return 504s or 502s.
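As an illustration, assuming Spring Boot with the Tomcat JDBC pool, the validation and abandoned-connection settings described above map to pool properties like these (removeAbandonedTimeout is in seconds; names follow Tomcat JDBC's PoolProperties):

```properties
spring.datasource.tomcat.test-on-borrow=true
spring.datasource.tomcat.validation-query=SELECT 1
spring.datasource.tomcat.remove-abandoned=true
spring.datasource.tomcat.remove-abandoned-timeout=60
spring.datasource.tomcat.log-abandoned=true
```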

Write operations

Similar to fetchSize on the read side, JDBC provides batch methods for inserting or updating large amounts of data. For large-scale writes, watch how much data accumulates in memory and remember to execute and release in batches. Native JDBC is preferable here: JPA's save path still holds large numbers of objects in memory and only batches and releases them on flush.
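A sketch of the batched-insert pattern with plain JDBC, flushing every 1000 rows so pending parameters do not pile up in memory (the demo_user table is hypothetical):

```java
import java.sql.*;
import java.util.List;

class BatchInsert {
    static final int BATCH_SIZE = 1000;

    // Flush every BATCH_SIZE rows; executeBatch sends and clears the pending batch.
    static void insertAll(Connection conn, List<String> names) throws SQLException {
        String sql = "INSERT INTO demo_user(name) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
        }
    }

    // Pure helper: how many executeBatch calls a given row count produces.
    static int flushCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }
}
```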

Summary

For JDBC operations that read or write large amounts of data, pay extra attention to object accumulation in memory to avoid OOM, and to the timeout settings of database operations to keep server threads from blocking and rendering the service unavailable.

| operation | category | parameter | note |
| --- | --- | --- | --- |
| read | data volume | pageSize | paging query |
| read | data volume | maxRows | caps the total rows a statement returns, whether fetched at once or in batches |
| read | data volume | fetchSize | limits the size of each batch pulled by the statement query and result set next |
| read | time | connection socketTimeout | read timeout of the underlying socket connection |
| read | time | statement queryTimeout | limits the statement query timeout |
| read | time | transaction timeout | limits the transaction execution timeout |
| read | time | connection remove/abandon timeout | limits how long a connection can be borrowed |
| write | data volume | batch execute | execute in batches |
