Consider the old and learn the new

The business requirement is simple: you need to bulk insert some data, either from a table in another database or from an external Excel import.

So the question is: do I have to query the table before every insert, or sift through the data in code to filter out the duplicates myself?

When inserting a value into a table that already holds a large amount of data, how do you check whether the value is a duplicate before inserting, and how do you do this efficiently?

It seems I’m not the only one struggling with this problem.

There are many ways to solve this problem, and different scenarios call for different solutions. With a small amount of data you can do whatever you like, but with a large amount of data it is no longer a simple problem.

With millions of rows, it is simply not feasible to go through them one by one and deduplicate them before inserting.

Here are the solutions I found by googling.

1. insert ignore into

When inserting data, if an error such as a duplicate record occurs, no error is returned; it only comes back as a warning. When using IGNORE, make sure the statement itself is correct, because any other problem will be ignored as well. For example:

INSERT IGNORE INTO user (name) VALUES ('telami') 

This method is simple, but it carries a risk: if an insert fails for some reason other than duplicate data, that error is ignored as well.
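
To see what actually happened, the skipped duplicate can be inspected afterwards. A minimal sketch, assuming the name column already has a unique key (that constraint is an assumption, not shown in the article):

-- First insert succeeds; the second hits the assumed unique key on name
INSERT IGNORE INTO user (name) VALUES ('telami');
INSERT IGNORE INTO user (name) VALUES ('telami');
-- The duplicate comes back as a warning (code 1062), not an error
SHOW WARNINGS;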

2. on duplicate key update

If the insert would duplicate a primary key or unique value, the UPDATE clause is executed instead. If that UPDATE is a no-op, such as id = id, the effect is the same as method 1, except that errors other than the duplicate are still reported.

For example, to insert a row with a duplicate name without raising an error, use the following statement:

INSERT INTO user (name) VALUES ('telami') ON duplicate KEY UPDATE id = id 

This method requires the column being checked to be constrained by a primary key or a unique index (for this business case, the relevant field should be given a unique constraint, i.e. a unique key).
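
A minimal sketch of adding such a constraint to an existing table; the index name uk_name is my own choice, not something from the article:

-- Only succeeds if the existing data contains no duplicate names
ALTER TABLE user ADD UNIQUE KEY uk_name (name);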

3. insert ... select ... where not exists

Whether the row gets inserted depends on the SELECT's condition, so the check is not limited to primary key or unique columns; any condition can be used. For example:

INSERT INTO user (name) SELECT 'telami' FROM dual WHERE NOT EXISTS (SELECT id FROM user WHERE id = 1) 

This method essentially uses a MySQL temporary table, and because it involves a subquery, efficiency takes a small hit. If one of the methods above works for your case, prefer it over this one.
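
For the bulk-import scenario described at the start, the same pattern can read from a staging table instead of dual. The user_import table below is a hypothetical staging table holding the imported Excel rows, not part of the original article:

-- Insert only the names that are not already present in user;
-- duplicates inside user_import itself are not filtered here
INSERT INTO user (name)
SELECT i.name
FROM user_import i
WHERE NOT EXISTS (
    SELECT 1 FROM user u WHERE u.name = i.name
);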

4. replace into

If a record with the same primary key or unique value already exists, it is deleted first and then the new record is inserted.

REPLACE INTO user SELECT 1, 'telami' FROM books 

This method deletes any existing record with the same key and then inserts the new one, regardless of whether a matching record was there in the first place.
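
A minimal VALUES-based sketch of the same idea (the column list is an assumption). Because REPLACE is a delete followed by an insert, any columns not listed end up with their default values:

-- If a row with id = 1 (or the same unique name) exists, it is removed first
REPLACE INTO user (id, name) VALUES (1, 'telami');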

In practice, I chose the second approach.

<insert id="batchSaveUser" parameterType="list">
    insert into user (id,username,mobile_number)
    values
    <foreach collection="list" item="item" index="index" separator=",">
        (
            #{item.id},
            #{item.username},
            #{item.mobileNumber}
        )
    </foreach>
    ON duplicate KEY UPDATE id = id
</insert>

This is a MyBatis batch insert, with a unique constraint added on mobile_number. This way, during batch insertion, rows whose phone numbers already exist are simply not inserted.
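
For two rows, the statement this mapper produces looks roughly like the sketch below; the values are made up for illustration, and it assumes the unique key on mobile_number described above:

INSERT INTO user (id, username, mobile_number)
VALUES
    (1, 'telami', '13800000001'),
    (2, 'telami2', '13800000002')
ON DUPLICATE KEY UPDATE id = id;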

Source: www.telami.cn/2018/when-m…