This article introduces MongoDB's bulkWrite() by comparing batch operations with their non-batch equivalents, and along the way compares MongoDB's efficiency with that of the relational database MySQL. If you just want to learn how to use bulkWrite(), read Part 1.

Test environment: Windows 7 Ultimate, 16 GB RAM, Intel i3 processor, MongoDB 3.0.2, MySQL 5.0

Part 1: MongoDB batch operations

MongoDB operations on data are divided into read operations and write operations. Read operations include queries; write operations include delete, insert, replace, and update. MongoDB allows clients to perform write operations in bulk. In the Java driver this corresponds to the bulkWrite() method of MongoCollection. First, look at the method's signature:

BulkWriteResult  com.mongodb.client.MongoCollection.bulkWrite(List<? extends WriteModel<? extends Document>> requests)

This method takes a List whose elements extend WriteModel, the base class for models that can be written in bulk. Its subclasses are DeleteManyModel, DeleteOneModel, InsertOneModel, ReplaceOneModel, UpdateManyModel, and UpdateOneModel; as the names suggest, they correspond to delete, insert, replace, and update operations. The method returns a BulkWriteResult object, which represents the result of a successful bulk write and encapsulates status information such as the number of inserted, updated, and deleted records.
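For example, here is a minimal sketch of reading those counters from the returned BulkWriteResult (collection and requests are assumed to be set up as in the examples below; the getter names are from the Java driver's BulkWriteResult API):

BulkWriteResult result = collection.bulkWrite(requests);
// Counters reported by the bulk write
System.out.println("inserted: " + result.getInsertedCount());
System.out.println("matched:  " + result.getMatchedCount());
System.out.println("modified: " + result.getModifiedCount());
System.out.println("deleted:  " + result.getDeletedCount());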

1. Insert operation

(1) Batch insertion

The method below takes a List of the Document objects to insert. It iterates over the list, wrapping each Document in an InsertOneModel (each InsertOneModel instance represents the insertion of a single Document), adds each model to a request List, and finally calls bulkWrite() with the List holding all the insert operations to complete the batch insert.

public void bulkWriteInsert(List<Document> documents){
    List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
    for (Document document : documents) {
        // Each InsertOneModel represents the insertion of a single Document
        InsertOneModel<Document> iom = new InsertOneModel<Document>(document);
        requests.add(iom);
    }
    BulkWriteResult bulkWriteResult = collection.bulkWrite(requests);
    System.out.println(bulkWriteResult.toString());
}

Test: drive it from a main method. First construct 100,000 Product entity objects, convert them to JSON strings with a utility class, parse the strings into Document objects, and save them into a list; then time the insertion of the 100,000 objects by calling the method written above.
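The Product class and the JSON utility are never shown in the article; here is a minimal sketch of the entity, with field names inferred from the constructor calls and from the MySQL code in Part 2:

// A hypothetical Product entity, consistent with how the tests use it.
public class Product {
    private int productId;
    private String category;
    private String name;
    private double price;

    public Product(int productId, String category, String name, double price) {
        this.productId = productId;
        this.category = category;
        this.name = name;
        this.price = price;
    }

    public int getProductId() { return productId; }
    public String getCategory() { return category; }
    public String getName() { return name; }
    public double getPrice() { return price; }
}

The test code: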

TestMongoDB instance = TestMongoDB.getInstance();
ArrayList<Document> documents = new ArrayList<Document>();
for (int i = 0; i < 100000; i++) {
    Product product = new Product(i, "book", "The Kite Runner", 22.5);
    // Convert the Java object into a JSON string
    String jsonProduct = JsonParseUtil.getJsonString4JavaPOJO(product);
    // Parse the JSON string into a Document
    Document docProduct = Document.parse(jsonProduct);
    documents.add(docProduct);
}
System.out.println("Start inserting data...");
long startInsert = System.currentTimeMillis();
instance.bulkWriteInsert(documents);
System.out.println("Insert complete, total time: " + (System.currentTimeMillis() - startInsert) + " milliseconds");

Result: 1560 milliseconds; repeated tests consistently take about 1.5 seconds.

(2) Insert one by one

For comparison, the following method inserts 100,000 records one at a time (non-batch):

public void insertOneByOne(List<Document> documents) throws ParseException{
    for (Document document : documents){
        collection.insertOne(document);
    }
}

Test: 100,000 records

System.out.println(" Start inserting data... ); long startInsert = System.currentTimeMillis(); instance.insertOneByOne(documents); System.out.println(" Insert data complete, total time: "+(system.currentTimemillis () -startinsert)+" milliseconds ");Copy the code

Result: 12,068 milliseconds, a huge difference. Clearly, MongoDB's batch insert is far more efficient than inserting documents one at a time.

Supplement:

MongoCollection's insertMany() method is equivalent to bulkWrite() for pure inserts; in testing the times are almost identical.

public void insertMany(List<Document> documents) throws ParseException{
    // This is equivalent to bulkWrite() for pure inserts
    collection.insertMany(documents);
}

2. Delete operations

(1) Batch delete

Once batch insert is mastered, batch delete follows the same pattern. Constructing a DeleteOneModel requires a Bson-typed argument representing one delete operation; here we use the Bson subclass Document. Note that document.get("_id") is null before the document has been inserted into the database, so matching on _id only works after insertion. If you delete by another condition instead, such as productId, add an index on the productId field after the documents are inserted into the collection:

collection.createIndex(new Document("productId", 1));

As the collection grows, lookups take longer; the index improves lookup efficiency and therefore speeds up deletion. It is also worth noting that DeleteOneModel deletes at most one record matching the condition, while DeleteManyModel deletes all records matching the condition. To prevent multiple records from being deleted at once, use DeleteOneModel to guarantee that one operation deletes at most one record. Of course, when matching on _id it is impossible to match more than one record, since _id is unique.
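A minimal sketch of the two models side by side (the category filter is made up for illustration):

// Deletes at most one document matching the filter
DeleteOneModel<Document> one = new DeleteOneModel<Document>(new Document("category", "book"));
// Deletes every document matching the filter
DeleteManyModel<Document> many = new DeleteManyModel<Document>(new Document("category", "book"));

The batch delete itself matches each document by _id: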

public void bulkWriteDelete(List<Document> documents){
    List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
    for (Document document : documents) {
        // Match each document by its unique _id
        Document queryDocument = new Document("_id", document.get("_id"));
        DeleteOneModel<Document> dom = new DeleteOneModel<Document>(queryDocument);
        requests.add(dom);
    }
    BulkWriteResult bulkWriteResult = collection.bulkWrite(requests);
    System.out.println(bulkWriteResult.toString());
}

Test: 100,000 records

System.out.println(" Start deleting data... ") ); long startDelete = System.currentTimeMillis(); instance.bulkWriteDelete(documents); System.out.println(" Delete data completed, total time: "+(system.currentTimemillis () -startDelete)+" milliseconds ");Copy the code

Result: 2251 milliseconds

(2) Delete one by one

Now look at deletion without batching:

public void deleteOneByOne(List<Document> documents){
    for (Document document : documents) {
        Document queryDocument = new Document("_id", document.get("_id"));
        DeleteResult deleteResult = collection.deleteOne(queryDocument);
    }
}

Test: 100,000 records

System.out.println(" Start deleting data... ") ); long startDelete = System.currentTimeMillis(); instance.deleteOneByOne(documents); System.out.println(" Delete data completed, total time: "+(system.currentTimemillis () -startDelete)+" milliseconds ");Copy the code

Result: 12,765 milliseconds, far less efficient than the batch delete.

3. Update operation

(1) Batch update

UpdateOneModel updates one record matching the condition, while UpdateManyModel updates all records matching the condition. ReplaceOneModel is a replace operation, which also counts as an update, so we take UpdateOneModel as the example. The UpdateOneModel constructor accepts three parameters: the first is the query condition, the second is the update content, and the third is an optional UpdateOptions object, which is created automatically if omitted. Its upsert property defaults to false, in which case nothing happens when no record matches the query; with true, a new Document is inserted into the database, built by combining the query document and the update document (for a replacement, the new Document is simply the replacement Document). You might ask: isn't that the same result as a replacement performed after the query condition matches? The difference is the _id field: when the query condition does not match and an upsert inserts a new Document, its _id is new, whereas a replacement that succeeds keeps the original _id.
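ReplaceOneModel itself does not appear in the examples below, so here is a minimal sketch of a replacement (the filter and replacement values are made up for illustration); on a successful match the replaced document keeps its original _id:

// Replace the whole matched document; _id is preserved when the filter matches.
ReplaceOneModel<Document> rom = new ReplaceOneModel<Document>(
        new Document("productId", 1),      // query condition
        new Document("productId", 1)       // replacement document
                .append("category", "book")
                .append("name", "The Kite Runner")
                .append("price", 30.6));
requests.add(rom);

Back to UpdateOneModel, the batch update method: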

public void bulkWriteUpdate(List<Document> documents){
    List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
    for (Document document : documents) {
        // Query condition: match by _id
        Document queryDocument = new Document("_id", document.get("_id"));
        // Update content: set the price field
        Document updateDocument = new Document("$set", new Document("price", "30.6"));
        // Build the single-document update model. upsert defaults to false:
        // if the query matches nothing, do nothing; true would insert a new
        // Document combining the query part and the update part.
        UpdateOneModel<Document> uom = new UpdateOneModel<Document>(queryDocument, updateDocument,
                new UpdateOptions().upsert(false));
        requests.add(uom);
    }
    BulkWriteResult bulkWriteResult = collection.bulkWrite(requests);
    System.out.println(bulkWriteResult.toString());
}

Test: 100,000 records

System.out.println(" Start updating data... ") ); long startUpdate = System.currentTimeMillis(); instance.bulkWriteUpdate(documents); System.out.println(" Update completed, total time: "+(system.currentTimemillis () -startupDate)+" milliseconds ");Copy the code

Result: 3198 milliseconds

(2) Update item by item

For comparison, update records one at a time without batching:

public void updateOneByOne(List<Document> documents){
    for (Document document : documents) {
        Document queryDocument = new Document("_id", document.get("_id"));
        Document updateDocument = new Document("$set", new Document("price", "30.6"));
        UpdateResult updateResult = collection.updateOne(queryDocument, updateDocument);
    }
}

Test: 100,000 records

System.out.println(" Start updating data... ") ); long startUpdate = System.currentTimeMillis(); instance.updateOneByOne(documents); System.out.println(" Update completed, total time: "+(system.currentTimemillis () -startupDate)+" milliseconds ");Copy the code

Result: 13,979 milliseconds, far less efficient than the batch update.

4. Mixed batch operation

The bulkWrite() method can process different types of write operations in batches. The code is as follows:

public void bulkWriteMix(){
  List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
   InsertOneModel<Document>  iom = new InsertOneModel<Document>(new Document("name","kobe"));
   UpdateManyModel<Document> umm = new UpdateManyModel<Document>(new Document("name","kobe"), 
     new Document("$set",new Document("name","James")),new UpdateOptions().upsert(true));
   DeleteManyModel<Document>  dmm = new DeleteManyModel<Document>(new Document("name","James"));
   requests.add(iom);
   requests.add(umm);
   requests.add(dmm);
   BulkWriteResult bulkWriteResult = collection.bulkWrite(requests);
   System.out.println(bulkWriteResult.toString());
 }

Note that the updateMany() and deleteMany() methods are different from insertMany(): they are not batch operations, but single commands that update (or delete) all data matching a condition.
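For contrast, a minimal sketch of calling those methods directly on the collection, reusing the values from the mixed example above:

// One command that updates every document matching the filter
collection.updateMany(new Document("name", "kobe"),
        new Document("$set", new Document("name", "James")));
// One command that deletes every document matching the filter
collection.deleteMany(new Document("name", "James"));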

Part 2: Performance comparison with MySQL

1. Insert operation

(1) Batch insert

As with MongoDB, insert the Product entity objects. The code is as follows:

public void insertBatch(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    try {
        PreparedStatement pst = conn.prepareStatement("insert into t_product values(?,?,?,?)");
        int count = 1;
        for (Product product : list) {
            pst.setInt(1, product.getProductId());
            pst.setString(2, product.getCategory());
            pst.setString(3, product.getName());
            pst.setDouble(4, product.getPrice());
            pst.addBatch();
            if (count % 1000 == 0) {
                pst.executeBatch();
                pst.clearBatch(); // clear the batched parameters from the PreparedStatement
            }
            count++;
        }
        pst.executeBatch(); // flush any remaining statements in the batch
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

JDBC commits transactions automatically by default; remember to add the following line after obtaining the connection to turn off auto-commit:

connection.setAutoCommit(false);
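DBUtil is the author's own helper and is never shown; here is a minimal sketch of what it might look like (the JDBC URL, user, and password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// A hypothetical DBUtil, consistent with how the tests use it.
public class DBUtil {
    private static final String URL = "jdbc:mysql://localhost:3306/test"; // placeholder
    private static final String USER = "root";       // placeholder
    private static final String PASSWORD = "123456"; // placeholder

    public static Connection getConnection() throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // pre-JDBC4 drivers need explicit loading
        Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
        conn.setAutoCommit(false); // turn off auto-commit so a whole batch shares one transaction
        return conn;
    }

    public static void closeConnection(Connection conn) {
        try {
            if (conn != null) conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}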

Test: 100,000 records

public static void main(String[] args) throws Exception {
    TestMysql test = new TestMysql();
    ArrayList<Product> list = new ArrayList<Product>();
    for (int i = 0; i < 100000; i++) {
        Product product = new Product(i, "book", "The Kite Runner", 20.5);
        list.add(product);
    }
    System.out.println("MYSQL starts inserting data...");
    long insertStart = System.currentTimeMillis();
    test.insertBatch(list);
    System.out.println("MYSQL insert complete, total time: " + (System.currentTimeMillis() - insertStart) + " milliseconds");
}

Result: 7389 milliseconds; repeated tests take about 7 seconds.

(2) Insert one by one

Insert records into MySQL one at a time:

public void insertOneByOne(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    try {
        for (Product product : list) {
            PreparedStatement pst = conn.prepareStatement("insert into t_product values(?,?,?,?)");
            pst.setInt(1, product.getProductId());
            pst.setString(2, product.getCategory());
            pst.setString(3, product.getName());
            pst.setDouble(4, product.getPrice());
            pst.executeUpdate();
            //conn.commit(); // committing here, per statement, would be far slower
        }
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

Test: 100,000 records

System.out.println("MYSQL starts to insert data... ); long insertStart = System.currentTimeMillis(); test.insertOneByOne(list); +(system.currentTimemillis () -insertstart)+" milliseconds ");Copy the code

Result: 8921 milliseconds, more than a second slower than the batch insert.

2. Delete operations

(1) Batch deletion

MySQL deletes records by matching on the id field, using each Product's productId. Remember to declare it as the primary key, which carries an index by default, so records can be matched faster.
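The article never shows the table definition; here is a plausible one-time setup via JDBC (the column names and types are inferred from the insert code and are assumptions):

// Hypothetical table setup: the primary key on id carries an index by default.
Connection conn = DBUtil.getConnection();
conn.createStatement().executeUpdate(
        "create table if not exists t_product (" +
        "  id int primary key," +
        "  category varchar(50)," +
        "  name varchar(100)," +
        "  price double" +
        ")");
conn.commit();
DBUtil.closeConnection(conn);

With the table in place, the batch delete method follows: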

public void deleteBatch(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    try {
        PreparedStatement pst = conn.prepareStatement("delete from t_product where id = ?");
        int count = 1;
        for (Product product : list) {
            pst.setInt(1, product.getProductId());
            pst.addBatch();
            if (count % 1000 == 0) {
                pst.executeBatch();
                pst.clearBatch();
            }
            count++;
        }
        pst.executeBatch(); // flush any remaining statements in the batch
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

Test: 100,000 records

System.out.println("MYSQL starts deleting data... ); long deleteStart = System.currentTimeMillis(); test.deleteBatch(list); System.out.println("MYSQL delete data completed, total time: "+(system.currentTimemillis () -deletestart)+");Copy the code

Result: 7936 milliseconds

(2) Delete one by one

The code is as follows:

public void deleteOneByOne(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    PreparedStatement pst = null;
    try {
        for (Product product : list) {
            pst = conn.prepareStatement("delete from t_product where id = ?");
            pst.setInt(1, product.getProductId());
            pst.executeUpdate();
            //conn.commit(); // committing here, per statement, would be far slower
        }
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

Test: 100,000 records

System.out.println("MYSQL starts deleting data... ); long deleteStart = System.currentTimeMillis(); test.deleteOneByOne(list); System.out.println("MYSQL delete data completed, total time: "+(system.currentTimemillis () -deletestart)+");Copy the code

Result: 8752 milliseconds, about a second slower than batch deletion

3. Update operation

(1) Batch update

The code is as follows:

public void updateBatch(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    try {
        PreparedStatement pst = conn.prepareStatement("update t_product set price=31.5 where id=?");
        int count = 1;
        for (Product product : list) {
            pst.setInt(1, product.getProductId());
            pst.addBatch();
            if (count % 1000 == 0) {
                pst.executeBatch();
                pst.clearBatch(); // clear the batched parameters from the PreparedStatement
            }
            count++;
        }
        pst.executeBatch(); // flush any remaining statements in the batch
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

Test: 100,000 records

System.out.println("MYSQL starts updating data... ); long updateStart = System.currentTimeMillis(); test.updateBatch(list); System.out.println("MYSQL update completed, total time: "+(system.currentTimemillis () -updatestart)+");Copy the code

Result: 8611 milliseconds

(2) Update item by item

The code is as follows:

public void updateOneByOne(ArrayList<Product> list) throws Exception{
    Connection conn = DBUtil.getConnection();
    try {
        for (Product product : list) {
            PreparedStatement pst = conn.prepareStatement("update t_product set price=30.5 where id=?");
            pst.setInt(1, product.getProductId());
            pst.executeUpdate();
            //conn.commit(); // committing here, per statement, would be far slower
        }
        conn.commit();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    DBUtil.closeConnection(conn);
}

Test: 100,000 records

System.out.println("MYSQL starts updating data... ); long updateStart = System.currentTimeMillis(); test.updateOneByOne(list); System.out.println("MYSQL update completed, total time: "+(system.currentTimemillis () -updatestart)+");Copy the code

The result: 9430 milliseconds, about a second slower than the batch update

Part 3: Summary

This article mainly introduced the use of bulkWrite(), MongoDB's batch write operation. The experiments show that writing large amounts of data with bulkWrite() is far more efficient than the conventional one-at-a-time approach. The article also compared batch and non-batch writes in MySQL: its batch processing is likewise faster than non-batch, but the gap is not as dramatic as with MongoDB.

Comparing MongoDB with MySQL: in batch mode, MongoDB inserts, deletes, and updates faster than MySQL, while in non-batch mode it is slower than MySQL. Of course, this is only a preliminary conclusion; the article did not compare different data volumes such as 100, 1,000, or 10,000 records, nor monitor CPU and memory usage. Interested readers can try that themselves.