There are three main ways to process batch data:

  1. Execute a single insert statement repeatedly
  2. foreachJoining togethersql
  3. The batch

First, preliminary preparation

Based on Spring Boot + Mysql, lombok is used to omit get/set, see Pom.xml.

1.1 table structure

The ID uses the database increment.

DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
                           `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
                           `user_name` varchar(100) NOT NULL COMMENT 'Account Name',
                           `pass_word` varchar(100) NOT NULL COMMENT 'Login password',
                           `nick_name` varchar(30) NOT NULL COMMENT 'nickname',
                           `mobile` varchar(30) NOT NULL COMMENT 'Mobile phone Number',
                           `email` varchar(100) DEFAULT NULL COMMENT 'Email address',
                           `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
                           `gmt_update` timestamp NULL DEFAULT NULL COMMENT 'Update Time'.PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';
Copy the code

1.2 Project Configuration File

RewriteBatchedStatements =true rewriteBatchedStatements=true Not so fast, we’ll talk about it later.

# database configuration spring: a datasource: url: JDBC: mysql: / / 47.111.118.152:3306 / mybatis? rewriteBatchedStatements=true username: mybatis password: password driver-class-name: com.mysql.cj.jdbc.Driver # mybatis mybatis: mapper-locations: classpath:mapper/*.xml type-aliases-package: cn.van.mybatis.batch.entityCopy the code

1.3 entity class

@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;

    private String userName;

    private String passWord;

    private String nickName;

    private String mobile;

    private String email;

    private LocalDateTime gmtCreate;

    private LocalDateTime gmtUpdate;
}
Copy the code

1.4 UserInfoBatchMapper

public interface UserInfoBatchMapper {

    /** Single insert *@param info
     * @return* /
    int insert(UserInfoBatchDO info);

    /** * foreach inserts *@param list
     * @return* /
    int batchInsert(List<UserInfoBatchDO> list);
}
Copy the code

1.5 UserInfoBatchMapper.xml


      
<! DOCTYPEmapper PUBLIC "- / / mybatis.org//DTD Mapper / 3.0 / EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="cn.van.mybatis.batch.mapper.UserInfoBatchMapper">

  <insert id="insert" parameterType="cn.van.mybatis.batch.entity.UserInfoBatchDO">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR},#{nickName,jdbcType=VARCHAR}, #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR}, #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
  </insert>

  <insert id="batchInsert">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values
    <foreach collection="list" item="item" separator=",">
      (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR}, #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR}, #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})
    </foreach>
  </insert>
</mapper>
Copy the code

1.6 Preparing Data

To facilitate testing, several variables were removed and pre-loaded.

private List<UserInfoBatchDO> list = new ArrayList<>(); private List<UserInfoBatchDO> lessList = new ArrayList<>(); private List<UserInfoBatchDO> lageList = new ArrayList<>(); private List<UserInfoBatchDO> warmList = new ArrayList<>(); Private StopWatch sw = new StopWatch();Copy the code
  • To facilitate the assembly of the data, a common method is extracted.
private List<UserInfoBatchDO> assemblyData(int count){ List<UserInfoBatchDO> list = new ArrayList<>(); UserInfoBatchDO userInfoDO; for (int i = 0; i < count; I++){userInfoDO = new UserInfoBatchDO().setusername ("Van").setnickname (" dust blog ").setmobile ("17098705205") .setPassWord("password") .setGmtUpdate(LocalDateTime.now()); list.add(userInfoDO); } return list; }Copy the code
  • Preheat data
    @Before
    public void assemblyData() {
        list = assemblyData(200000);
        lessList = assemblyData(2000);
        lageList = assemblyData(1000000);
        warmList = assemblyData(5);
    }
Copy the code

Execute a single insert statement repeatedly

Perhaps a ‘lazy’ programmer would do this simply by nesting a for loop on top of the original single INSERT statement.

2.1 Mapper interface

int insert(UserInfoBatchDO info);
Copy the code

2.2 Test Method

Because this method was too slow, the data was reduced to 2000

@test public void insert() {log.info(" log.info "); for (UserInfoBatchDO userInfoBatchDO : warmList) { userInfoBatchMapper.insert(userInfoBatchDO); } log.info(" [warm-up end] "); Sw.start (" execute single insert statement repeatedly "); For (UserInfoBatchDO UserInfoBatchDO: lessList) { userInfoBatchMapper.insert(userInfoBatchDO); } sw.stop(); log.info("all cost info:{}",sw.prettyPrint()); }Copy the code

2.3 Execution Time

  • For the first time,
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 59887 100% repeatedly perform a single insert statementCopy the code
  • The second time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 64853 100% repeatedly perform a single insert statementCopy the code
  • The third time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 58235 100% repeatedly perform a single insert statementCopy the code

This method inserts 2000 pieces of data, the average time of three times: 60991 ms.

Three,foreachJoining togetherSQL

3.1 Mapper interface

int batchInsert(List<UserInfoBatchDO> list);
Copy the code

3.2 Test Method

This method and the next method both use 20W data test.

@test public void batchInsert() {log.info(); for (UserInfoBatchDO userInfoBatchDO : warmList) { userInfoBatchMapper.insert(userInfoBatchDO); } log.info(" [warm-up end] "); Sw. Start (" foreach splicing SQL "); userInfoBatchMapper.batchInsert(list); sw.stop(); log.info("all cost info:{}",sw.prettyPrint()); }Copy the code

3.3 Execution Time

  • For the first time,
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 18835 100% foreach splicing  sqlCopy the code
  • The second time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 17895 100% foreach splicing  sqlCopy the code
  • The third time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 19827 100% foreach splicing  sqlCopy the code

This method inserts 20W data, and the average execution time of three times is 18852 ms.

Four, batch processing

Mapper and XML reuse 2.1 in this approach.

4.1 rewriteBatchedStatementsparameter

At the beginning of the test, I found that the method changed to Mybatis Batch submission did not work. In fact, the insertion was still one record at a time, and the speed was far inferior to the original method of foreach splice SQL, which was very unscientific.

Later it was discovered that to batch execute words, connectURLWe need to add a new parameter to the string:rewriteBatchedStatements=true

  • rewriteBatchedStatementsParameter is introduced

RewriteBatchedStatements must be added to the URL of the MySql JDBC connection and the driver version 5.1.13 or later is required to implement high-performance batch inserts. By default, the MySql JDBC driver ignores the executeBatch() statement and sends a batch of SQL statements to the MySql database one by one. Batch inserts are actually single inserts, resulting in poor performance. If rewriteBatchedStatements is set to true, the driver will execute SQL in batches for you. This option is valid for INSERT/UPDATE/DELETE.

4.2 Batch Preparation

  • Manual injectionSqlSessionFactory
    @Resource
    private SqlSessionFactory sqlSessionFactory;
Copy the code
  • The test code
@test public void processInsert() {log.info(" log.info "); for (UserInfoBatchDO userInfoBatchDO : warmList) { userInfoBatchMapper.insert(userInfoBatchDO); } log.info(" [warm-up end] "); Sw.start (" batch execute insert "); / / open the BATCH SqlSession session. = sqlSessionFactory openSession (ExecutorType. BATCH); UserInfoBatchMapper mapper = session.getMapper(UserInfoBatchMapper.class); for (int i = 0,length = list.size(); i < length; i++) { mapper.insert(list.get(i)); If (I %20000==19999){session.com MIT (); session.clearCache(); } } session.commit(); session.clearCache(); sw.stop(); log.info("all cost info:{}",sw.prettyPrint()); }Copy the code

4.3 Execution Time

  • For the first time,
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 09346 100% batch insertCopy the code
  • The second time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 08890 100% batch insertCopy the code
  • The third time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 09042 100% batch insertCopy the code

This method inserts 20w pieces of data, and the average time for three times is 9092 ms.

4.4 If the data is larger

When I scaled up to 100W, foreach concatenated SQL was no longer able to do the insert, so I had to test the batch insert time.

During the test, just cut the list in the [4.2] test code into lageList test.

  • For the first time,
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 32419 100% batch insertCopy the code
  • The second time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 31935 100% batch insertCopy the code
  • The third time
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - ms % Task name -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 33048 100% batch insertCopy the code

In this mode, 100w data is inserted, and the average execution time for three times is 32467 ms.

Five, the summary

Batch insertion mode The amount of data The average time of three executions
Cyclic insertion of a single piece of data 2000 60991 ms
foreachJoining togethersql 20w 18852 ms
The batch 20w 9092 ms
The batch 100w 32467 ms
  1. Cyclic insertion of a single piece of data is extremely inefficient, but the amount of code is very small. It can be used when the amount of data is small, but it is forbidden to use when the amount of data is large, so the efficiency is too low.
  2. foreachJoining togethersql, when using a large section of XML and SQL statements to write, very prone to error, although the efficiency is acceptable, but really deal with a large amount of data, still can not use, so not recommended;
  3. Batch execution is recommended for large data inserts and is easier to use.

[Example code of this article]

Copy the code