preface

Today I introduce the idea of MongoDB index design in the work, for your reference, there is an article in front of the MongoDB query syntax and table storage scheme, do not understand the friends can click the following link to learn.

Learn Mongo operation syntax quickly with single table SQL

MongoDB stores query schemes in separate tables

The body of the

We will directly into the topic, combined with the actual business table design and index design

Business background

We are a monitoring system. The data uploaded by the equipment every minute is analyzed and stored in MongoDB, and the data will have the number and time information of the equipment (timestamp) and some monitoring data (meteorological related).

  • Precise query (Query the data of a specified device at a specified point in time)
  • Range query (Query the data of a specified device in a specified time range)
  • The data structure
# original single data content {"deviceCode": "xxx"."ts": 1632450060000."temp": 27.1."humidity": 0.77."windDirection": 180."windSpeed": 10
}
Copy the code
field describe unit
deviceCode Equipment coding There is no
ts time ms
temp The temperature
humidity humidity %
windDirection The direction of the wind °
windSpeed The wind speed m/s

Before optimization

Indexes are created based on query requirements. The query speed is guaranteed, but the number of indexes costs a lot of memory and increases quickly

Create a collection db.createcollection ("wather_data") // 2 Db.weather_data.createindex ({"deviceCode":1,"ts":1}) // 3 Initialization 1 million fake data. Here I use code generation import cn hutool. Core. Util. RandomUtil; import com.mongodb.BasicDBObject; import com.mongodb.DBObject; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.mongodb.core.MongoTemplate; import org.springframework.web.bind.annotation.GetMapping; import org.springframework.web.bind.annotation.RestController; import java.math.RoundingMode; import java.util.ArrayList; import java.util.Calendar; import java.util.List; @RestController public class Simulator { @Autowired private MongoTemplate mongoTemplate; @getMapping ("simulator") public void execute() {List<String> deviceCodeList = new ArrayList<>(1000); for (int i = 1; i <= 100; i++) { deviceCodeList.add("code_" + i); } Calendar calendar = Calendar.getInstance(); For (int I = 0; i < 10000; i++) { List<DBObject> dataList = new ArrayList<>(1000); for (String deviceCode : deviceCodeList) { DBObject data = new BasicDBObject(); // Device encoding data.put("deviceCode", deviceCode); Data. put("ts", calendar.gettimeinmillis ()); // Data. put("ts", calendar.gettimeinmillis ()); // data. Put ("temp", RandomUtil. RandomDouble (20, 40, 2, roundingmode.down)); data.put("humidity", RandomUtil.randomDouble(10, 80, 2, RoundingMode.DOWN)); data.put("windDirection", RandomUtil.randomDouble(0, 360, 2, RoundingMode.DOWN)); data.put("windSpeed", RandomUtil.randomDouble(0, 30, 2, RoundingMode.DOWN)); dataList.add(data); } / / bulk storage of 100 points in a minute data mongoTemplate. GetCollection (" weather_data "). The insert (dataList); Calendar. add(calendar. MINUTE, -1); }}}Copy the code

Query speed

Precise query (query data at a specified point for a specified minute)

Db.getcollection ('weather_data').find({'deviceCode':"code_88", "ts": 1632464271085}) with index: 1 to 2ms without index: 200 to 300msCopy the code

Range query (query one hour of data at a specified point)

db.getCollection('weather_data').find({'deviceCode':"code_88", "ts": {'$gte':1632460545000,'$lte':1632464271085}}); With indexes: 1 to 3ms without indexes: 300 to 1000msCopy the code

Disk usage and index memory usage (unit: M)

db.getCollection('weather_data').stats(1024*1024); Memory overhead of the index deviceCode_1_ts_1 13M disk size used by the set storageSize 57MCopy the code

For 100W pieces of data, the combined query index is built according to deviceCode and TS. The memory occupied by the query index of a single set reaches 13M. If the amount of data reaches tens of millions, the memory occupied by the index of the set will reach hundreds of megabytes. For the monitoring system, the data is increasing over time, even if the table processing, the overall index memory usage is the same, the index optimization scheme is introduced below.

Optimization scheme

Optimization purposes

Ensure query performance and reduce index memory consumption

Data changes

Format the TS timestamp before putting it into the library, adding an HOUR field

{
   "deviceCode": "xxx",
   "ts": 1632450060000,
   "hour": "2021-09-24 10",
   "temp": 27.1,
   "humidity": 0.77,
   "windDirection": 180,
   "windSpeed": 10
}
Copy the code

The index

In the Mongo collection, create indexes according to deviceCode and hours

db.weather_data_01.createIndex({"deviceCode":1,"hour":1})
Copy the code

Disk usage and index memory usage (unit: M)

db.getCollection('weather_data_01').stats(1024*1024); Memory overhead of the index deviceCode_1_ts_1 5M disk size used by the collection storageSize 45MCopy the code

With the new index design, the memory usage of the index and the disk usage of the data are greatly reduced. The memory usage is less than 40% of the previous

The query implementation

Precise query

Db.getcollection ('weather_data_01').find({"deviceCode":"code_88","hour":"2021-09-24 16","ts":1632470878256}) 1 to 2ms no index time: 200 to 300msCopy the code

Range query (query one hour of data at a specified point)

Db.getcollection ('weather_data_01'). Find ({"deviceCode":"code_88","hour":"2021-09-24 15"}) time: 1 to 2msCopy the code

At this time, because we add an HOUR field to our data and build indexes according to deviceCode and HOUR, we can query all data of the specified hour without ts range query

Range query (spanning two hours)

Query the data between 2021-09-24 15:10 and 2021-09-24 16:10 on deviceCode code_88 db.getCollection('weather_data_01').find({"deviceCode":"code_88","hour":{"$gte":"2021-09-24 15","$lte":"2021-09-24 16"},  "ts":{"$gte":1632467400000,"$lte":1632471000000}}); Time: 1 ~ 3 msCopy the code

After optimization, not only the memory occupation is significantly reduced, but also we are more flexible in the query business. If we are involved in the query of a day’s minute data business, we can format TS into Day, store a date field and index when entering the library. When we design the index, we can also create fields out of thin air for the index. For example, when our data involves the mobile phone number and relevant precise query, we can generate a field for the first three digits of the mobile phone number and create the index.

conclusion

Select * from mongo where id = ‘UUID’; select * from mongo where id = ‘UUID’; select * from mongo where id = ‘UUID’; In the primary key query service, you can use the _id to query. Please correct any errors.