Background

Recently, I ran into a scenario in a project that required inserting data in batches. With about 200,000 (20W) records, the insert took nearly 36s (with indexes in place).

Using MyBatis dynamic SQL, the code looked like this:

<insert id="batchInsert" useGeneratedKeys="true" keyProperty="id">
    INSERT INTO sample(X,X,X)
    VALUES
    <foreach collection="list" item="item" separator=",">
        (#{X},#{X},#{X})
    </foreach>
</insert>

The final concatenated SQL looks like this:

INSERT INTO sample(X,X,X) VALUES (X,X,X),(X,X,X),(X,X,X)...

If this were query optimization, there would be room to work on indexes, but for batch inserts the SQL statement itself leaves little to tune. The only idea so far was to split the data into batches and keep the length of each executed SQL statement under control. Is that the limit?

At my colleague's suggestion, I used JProfiler to inspect the call tree timings and found that, of the roughly 36s total, about 10s was spent in PreparedStatement.setXxx() parameter-assignment calls. To try it out, I used a StringBuilder to concatenate the SQL directly and executed it: 11s, roughly 3 times faster. Some overhead is expected when MyBatis handles dynamic SQL, but the gap is still huge.
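
For context, when executing this kind of multi-row insert MyBatis ultimately binds every placeholder through the JDBC PreparedStatement API. Below is a minimal sketch of that per-row assignment work, using a hand-rolled three-column statement purely for illustration; the method, connection handling, and SQL here are assumptions, not the project's actual code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

// Illustrative only: build a multi-row INSERT with placeholders, then bind every
// value with setString(). With 10 columns and 10,000 rows per batch that means
// 100,000 setString() calls per statement, which is where the setXxx() time
// reported by JProfiler accumulates.
void insertBatch(Connection conn, List<String[]> rows) throws Exception {
    StringBuilder sql = new StringBuilder("INSERT INTO sample(col1,col2,col3) VALUES ");
    for (int i = 0; i < rows.size(); i++) {
        sql.append(i == 0 ? "(?,?,?)" : ",(?,?,?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
        int idx = 1;
        for (String[] row : rows) {
            ps.setString(idx++, row[0]);
            ps.setString(idx++, row[1]);
            ps.setString(idx++, row[2]);
        }
        ps.executeUpdate();
    }
}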

I isolated this scenario and ran an experiment; here is how it went.

Reproducing the scenario

Suppose each record to insert has 10 String fields, each field holds the value "abcdefghijklmnopqrstuvwxyz", the records are inserted in batches of 10,000 (1W), and 500,000 (50W) records are inserted in total.

Reproduced under a SpringBootTest:
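
The test snippets below are shown as fragments; a minimal test class skeleton they could live in might look like this (the class name and the JUnit 4 runner are assumptions, since the original does not show them):

import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class BatchInsertTest {
    // the @Autowired fields and @Test methods from the following sections go here
}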

Table structure and Java objects

DROP TABLE IF EXISTS `sample`;
CREATE TABLE `sample` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col4` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col5` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col6` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col7` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col8` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col9` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `col10` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class Sample implements Serializable {

    private Long id;
    private String col1;
    private String col2;
    private String col3;
    private String col4;
    private String col5;
    private String col6;
    private String col7;
    private String col8;
    private String col9;
    private String col10;
}

Test data generation

public List<Sample> buildData() {
    List<Sample> samples = new ArrayList<>();
    String col = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i < 500000; i++) {
        Sample sample = Sample.builder()
                .col1(col)
                .col2(col)
                .col3(col)
                .col4(col)
                .col5(col)
                .col6(col)
                .col7(col)
                .col8(col)
                .col9(col)
                .col10(col)
                .build();
        samples.add(sample);
    }
    return samples;
}

MyBatis test

Dynamic SQL is as follows:

<insert id="batchInsertSamples" parameterType="java.util.List" useGeneratedKeys="true" keyProperty="id">
    INSERT INTO sample(col1,col2,col3,col4,col5,col6,col7,col8,col9,col10)
    VALUES
    <foreach collection="samples" item="item" separator=",">
        (#{item.col1},#{item.col2},#{item.col3},#{item.col4},#{item.col5},
        #{item.col6},#{item.col7},#{item.col8},#{item.col9},#{item.col10})
    </foreach>
</insert>
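
The original does not show the mapper interface this statement binds to; a sketch of what it could look like, assuming the parameter is annotated with @Param("samples") so that it matches collection="samples" in the foreach:

import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;

import java.util.List;

@Mapper
public interface SampleMapper {
    // "samples" must match the collection attribute of the <foreach> above
    int batchInsertSamples(@Param("samples") List<Sample> samples);
}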

Test code:

@Autowired
private SampleMapper sampleMapper;

@Test
public void testMyBatis() {
    List<Sample> samples = buildData();
    StopWatch stopWatch = new StopWatch();
    System.out.println("Start using MyBatis for bulk inserts");
  
    stopWatch.start();
    List<List<Sample>> batch = Lists.partition(samples,10000);
    for(List<Sample> part :batch){
        sampleMapper.batchInsertSamples(part);
    }
    stopWatch.stop();

    System.out.println("Mybatis batch insert completed, time:" + stopWatch.getTotalTimeMillis());
}

The time was 26.439s

Append test

@Autowired
private JdbcTemplate jdbcTemplate;

@Test
public void testAppend() {
    List<Sample> samples = buildData();
    String prefix = "INSERT INTO sample(col1,col2,col3,col4,col5,col6,col7,col8,col9,col10) VALUES";

    StopWatch stopWatch = new StopWatch();
    System.out.println("Start direct concatenation SQL insert");
    
    stopWatch.start();
    List<List<Sample>> batch = Lists.partition(samples,10000);
    for(List<Sample> part :batch){
        StringBuilder sb = new StringBuilder();
        for(Sample sample :part){
            sb.append("(");
            sb.append("\" "+sample.getCol1()+"\" ").append(",");
            sb.append("\" "+sample.getCol2()+"\" ").append(",");
            sb.append("\" "+sample.getCol3()+"\" ").append(",");
            sb.append("\" "+sample.getCol4()+"\" ").append(",");
            sb.append("\" "+sample.getCol5()+"\" ").append(",");
            sb.append("\" "+sample.getCol6()+"\" ").append(",");
            sb.append("\" "+sample.getCol7()+"\" ").append(",");
            sb.append("\" "+sample.getCol8()+"\" ").append(",");
            sb.append("\" "+sample.getCol9()+"\" ").append(",");
            sb.append("\" "+sample.getCol10()+"\" ");
            sb.append(")");
            sb.append(",");
        }
        // drop the trailing comma before prepending the INSERT prefix
        String sql = prefix + sb.replace(sb.length() - 1, sb.length(), "");
        jdbcTemplate.execute(sql);
    }
    stopWatch.stop();

    System.out.println("Stitching SQL batch insert completed, time:" + stopWatch.getTotalTimeMillis());
}

It took 13.473 seconds

The test results

26.439s versus 13.473s is roughly a 2x difference, which is still a lot.

I never expected that simple parameter-assignment operations would add up to such a large difference once the data volume grows.