1. How did the problem occur
1.1. Application function Introduction
- The system is a dual data source dual write single service. (The two data sources are different stores, so master/slave replication mode cannot be used, which is a transition state of switching storage media).
- Update xx set a=b where m=n But there are more than a billion records in this table. Problems occur in scenarios where there are many records that need to be updated. So this is SQL optimized. The logic used is to query the id of the record that needs to be updated and then update it in pages.
1.2. Key codes
Dual data source operation
private Object runSql(List<String> sqlSessionFactotyBeanNameList, MethodInvocation invocation) throws InvocationTargetException, IllegalAccessException { List<SqlSession> sqlSessionList = Lists.newArrayList(); Object result = null; try { for (String sessionFactotyBeanName : sqlSessionFactotyBeanNameList) { SqlSessionFactory sqlSessionFactory = RgApplicationContextUtil.getBean( sessionFactotyBeanName, SqlSessionFactory.class); SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH); Object mapper = sqlSession.getMapper(invocation.getMethod().getDeclaringClass()); Object[] param = invocation.getArguments(); result = invocation.getMethod().invoke(mapper, param); sqlSessionList.add(sqlSession); // Problem code, note!!!! sqlSession.commit(); } } catch (Exception ex) { sqlSessionList.stream() .forEach( x -> { x.rollback(); }); } finally { sqlSessionList.stream() .forEach( x -> { x.close(); }); } return result; }Copy the code
Problems of SQL
<select id="getBatchIdWithLimit" resultType="java.lang.Long">
SELECT x.id FROM context x WHERE x.oid = #{oid} ORDER BY id ASC
LIMIT #{offset}, #{limit}
</select>
Copy the code
Key configuration
MaxWait Indicates the maximum time to wait for a connection, in milliseconds. After maxWait is configured, fair lock is enabled by default, reducing the concurrency efficiency. If necessary, you can set useUnfairLock to true to use an unfair lock.
This parameter is not configured in the current system. Therefore, the system waits indefinitely and uses a fair lock
1.3 Steps for the occurrence of problems
- The length of some data exceeds the maximum value of Integer (2147483647).
- The dual data source code is buggy. List code combined with the add position is too backward, resulting in reflection abnormal. When a secondary SqlSession is not associated with the collection to be processed, it is not rollback or close. The link is leaked.
- Bug in code combining dual data sources when problematic data occurs. Will cause the List to be empty, so the release operation is not performed.
- The current maximum number of connections on the system is 100, and after 100 such occurrences, the service will wait endlessly to get the state in the link.
1.4. Symptoms of the problem
2. How to reproduce the problem
2.1 Recurrence of problem data
- Set the maximum number of database connections to 1 and leave maxWaitTime unset
- Construct a piece of data whose ID is greater than 2147483647
- Use the API to trigger calls to this logic
- The result is that the first call gives an error, and the second call gives the timeout set by the client of the card.
2.2 Database connection exception reappears
There is also a path where the code is fine, but the database is locked due to high concurrency. Mybatis allows you to set the duration of SQL execution. Once that happens. Problems can arise.
But this kind of scene is difficult to reproduce, so is there a way to effectively forge this scene?
Prepare knowledge
Set the autocommit = 0; SELECT * FROM XXX a WHERE a.id='111' for update; Commit; // Commit the transactionCopy the code
Data is committed automatically by default, so set autocommit=0; This operation do not forget to step on the pit several times. Execute commit after completion. Unlock the account.
Set autocommit=1; To restore the automatic transaction commit feature of the database.
- Prepare a piece of data for the interface test
- Execute SQL select.. For UPDATE locks row records
- The interface call requests with the same ID. Because records were locked, the API update failed, successfully forging high concurrency and creating SQL problems caused by row locks
3. Problem summary
- Database protection configuration: Both maxActive and maxWait are configured, which is equivalent to fusing protection
- Mybatis object mapping needs to focus on the scope of the data
- Use select for UPDATE to create row locks to fake high concurrency data problems