Life is too short to have a dog
First, the problem scene
Taking advantage of the holidays these days, I have reviewed some historical issues left over last year, among which there is such a small problem about the database that I can’t help but turn it out and savor it. Let’s taste it together.
First, take a look at the problem site. The execution process of operating the database is as follows:
Here, the original business logic is simplified, and the simplified code is implemented as follows:
public void finishSubTask(SubTask subTask){
// Update the subtask status
subTaskService.updateFinish(subTask);
// Query the running tasks based on the id of the primary task
int runningSubTaskNum = subTaskService.countRunning(subTask.getTaskId);
// If the current running task is empty, the main task is terminated
if (runningSubTaskNum == 0) { taskService.updateFinish(subTask.getTaskId); }}Copy the code
At first glance, the logic and code seem to be fine, but sometimes during the actual run, the query will find that the result set is the same as before the update, as if the update did not take effect or “lost”, resulting in the failure to terminate the corresponding main task.
2. Go back to the source
The MySQL version is 5.6 (ali Cloud high availability version, that is, one master and one standby, transaction isolation level is read committed), and the server uses SpringBoot and MyBatis framework.
1. On-site inspection
The first time I encountered a problem was to check whether the database was updated, but after the query, it was found that the data was indeed updated, and then I checked the network problem of the machine at that time, and did not report any problems such as abnormal database connection. At this point, code exceptions and network jitter issues can be eliminated.
2. Try to recreate
When I couldn’t get any more clues from the problem scene, I started to try to reproduce the scene locally, but after many attempts, I found that the scene could not be reproduced locally, and the above process always worked correctly.
3. Reason carefully
After the failure of local replication, combined with the initial onsite troubleshooting, the problem is generally found in the SQL execution process of ORM framework (Mybatis here) in the system or the SQL execution process of MySQL server.
SQL execution process of MyBatis
Suppose the problem occurs in the SQL execution process of MyBatis, then there are two possible causes:
- Caused by reading local cache;
- Due to transaction isolation;
Since the configuration of MyBatis in the current project is basically default, MyBatis only uses the level 1 cache and only shares it at the session level. In order to determine the lifecycle of the sqlSession, let’s take a look at Mybatis’s logic for executing the agent method (Spring framework version) :
private class SqlSessionInterceptor implements InvocationHandler {
@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
// Obtain the sqlSession where the actual database operation is performed
SqlSession sqlSession = getSqlSession(SqlSessionTemplate.this.sqlSessionFactory,
SqlSessionTemplate.this.executorType, SqlSessionTemplate.this.exceptionTranslator);
try {
// The proxy method executes
Object result = method.invoke(sqlSession, args);
if(! isSqlSessionTransactional(sqlSession, SqlSessionTemplate.this.sqlSessionFactory)) {
// force commit even on non-dirty sessions because some databases require
// a commit/rollback before calling close()
// Commit the transaction manually if the transaction is not hosted to Spring
sqlSession.commit(true);
}
return result;
} catch (Throwable t) {
// Exception handling, omit
} finally {
if(sqlSession ! =null) {
// If sqlSession is not empty, the session is closed
closeSqlSession(sqlSession, SqlSessionTemplate.this.sqlSessionFactory);
}
}
}
}
}
Copy the code
GetSqlSession getSqlSession getSqlSession getSqlSession getSqlSession
public static SqlSession getSqlSession(SqlSessionFactory sessionFactory, ExecutorType executorType, PersistenceExceptionTranslator exceptionTranslator) {
notNull(sessionFactory, NO_SQL_SESSION_FACTORY_SPECIFIED);
notNull(executorType, NO_EXECUTOR_TYPE_SPECIFIED);
// We cannot register the newly created sqlSession with the sessionHolder set because we did not host the transaction to Spring. Therefore, we can obtain null sqlSessionHolder each time
SqlSessionHolder holder = (SqlSessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);
SqlSession session = sessionHolder(executorType, holder);
if(session ! =null) {
return session;
}
LOGGER.debug(() -> "Creating a new SqlSession");
session = sessionFactory.openSession(executorType);
// Only when the transaction is managed by Spring will the newly created session be registered in the sessionHolder collection, which is stored using ThreadLocal for thread isolation
registerSessionHolder(sessionFactory, executorType, exceptionTranslator, session);
return session;
}
Copy the code
Looking at the original code again, there is no Spring-related transaction management method, that is, there is no transaction managed to Spring, so every time you get a new sqlSession in the getSqlSession method, This also means that there is no query statement and the query result is reserved in the sqlSession cache of the last query, that is, the problem is not caused by the level 1 cache of Mybatis.
Is it due to transaction isolation? As mentioned above, the project retains almost all the default configuration, where Spring’s transaction isolation level is consistent with the database (although Spring transaction management is not used). Observing the above sqlSession execution method, since the transaction is not managed to Spring, after the proxy method, sqlSession will actively force a COMMIT, regardless of whether there are dirty pages. According to the execution order above, the query transaction starts after the update transaction commits, so theoretically there should be no data before the update transaction commits.
Analysis of MyBatis execution process is fruitless, can only look at the internal execution process of MySQL server.
MySQL SQL execution procedure
Within MySQL services, an SQL request from a client passes through the connector, the query cache, the profiler, the optimizer, and finally the executor that performs the actual execution. Here’s how the executor executes an UPDATE statement:
During the execution of the update statement, the cache associated with the target table is cleared. In the request order above, the query statement will not be queried into the cache.
Again, the problem is transaction isolation. Is it really transaction isolation?
Inference & verification
Check the last step in the MyBatis again commit the transaction MIT () method of sqlSession.com, found that the internal just call the method JdbcTransaction/SpringManagedTransaction commit method, However, this method does not return a value, which means that the call here does not indicate that the committed transaction has actually been committed. It is possible that the updated transaction has not been committed yet, and the query transaction has been executed. In this case, the query can only update the result set before the transaction was committed.
With this in mind, I once again took a look at the interval between update requests and query requests for a few of the problem data, and the interval was indeed very short, averaging in the 10s of milliseconds (and some even shorter).
To further verify this guess, I used the thread.sleep () method in my test environment, and let the Thread sleep for 500 milliseconds after the update statement is executed before executing the following query statement. After a few days of stress testing, it was found that there was indeed no failure to execute the end of the primary task, and it was almost certain that transaction isolation was the cause.
Third, solutions
According to the above analysis, I finally designed three solutions:
Thread.sleep
The MySQL service update is not complete, so let the thread rest and let the update “fly” for a while.Use Spring transaction management
The main reason for the above problem is that the two statements are executed in two transactions. If the two statements are executed in one transaction, there is no transaction isolation problem, so you can choose to add this method@Transaction
Annotations, using one transaction to manage two statements;Use scheduled tasks for compensation processing
: Use the scheduled task to scan the main task table periodically and update the main task that has no neutron task running in the main task table to complete. This approach is for the current business logic, if the business logic is not the same, do not use;
Four, stern said
As the old saying goes: one bug changes a day. It is not that complicated, but the fun of learning (the fun of fixing bugs) does not lie in the process of exploring the root of the problem and finding a solution.
Finally, on the occasion of the Spring Festival, I wish everyone a happy New Year, good health and early wealth, ha ha ha ~