1. Background — Company Internet of Things project

Massive devices access the cloud through the Internet of Things and report their own data (hereinafter referred to as dynamic data) every 30 seconds. The Internet of Things service forwards the data reported by the device to the data processing gateway, and the data entry gateway inserts the data into the database in batches. The general technical framework of the project is as follows:

Problem 2.

When a large number of devices are added, a large amount of dynamic data is reported, resulting in slow query of a single table.

Assuming 10,000 devices report dynamic data every 30 seconds, that’s 20,000 pieces of data per minute, 28.8 million pieces of data per day, or more than 10 billion pieces of data per year.

With such a large amount of data, it is totally unacceptable to delay operations such as single table query database analysis, so we need to find a solution.

3. Technical background

Table 3.1

The split table refers to splitting a certain number of devices into different tables based on their serial numbers to reduce the data magnitude of a single table.

3.2 partition

The data in the MySql database is stored on disk as a file, under/MySql /data by default (this can be viewed by datadir in my.cnf).

A table mainly corresponds to three files, one is FRM store table structure, one is MYD store table data, one is MYI store table index. If the amount of data in a table is too large, mydMyI becomes too large, and finding data becomes slow.

The partition function of MySql physically divides the three files corresponding to this table into many small pieces. In this way, when searching for a piece of data, you do not need to search all of them. You only need to know where the data is and then look for it.

MySql5.1 and above supports partitioning. MySql > alter table partition

3.2.1 Range Partition:

A range partition is a continuous partition based on a certain field, for example, one partition whose ID is less than 3, one partition whose ID is less than 6, and one partition whose ID is less than 100.

3.2.2 List Partition:

A list partition is based on a field that belongs to a list range partition, such as a partition with ids 1,3,5,7, and 2,4,6,8.

3.2.3 Hash Partition:

Hash partitioning ensures that data is evenly distributed among a preset number of partitions. For example, if the preset number of partitions is 3, all data is evenly distributed among the three partitions.

3.2.4 Key Partitioning

Partitioning by KEY is similar to partitioning by HASH, except that HASH partitioning uses user-defined expressions, and the HASH function for KEY partitioning is provided by the MySQL server.

3.2.5 Sub-Partition:

A subpartition is a repartition of each partition in a partitioned table. Subpartitions can use either HASH or KEY partitions. This is also known as composite partitioning. Subpartitions must comply with the following rules:

  • If subpartitions are created in one partition, the other partitions must also have subpartitions
  • If a partition is created, the number of subpartitions in each partition must be the same
  • Subpartitions in the same partition may have different names and have the same subpartition names (5.1.50 Not applicable).

4. Solutions

4.1 Sub-table design

Designed as a table for every 1000 devices, the table name is T_data_ serial number.

If there are 10,000 devices, data is stored in t_DATA_1 to T_DATA_10 tables based on device serial numbers.

At the same time, a device-dynamic data relationship table (table name T_DEVICe_table_map) is added to store the relationship between devices and dynamic data tables, so that the corresponding table can be found when adding, deleting, modifying, and querying device data. The structure of t_DEVICe_table_map is as follows:

When data processing starts, load t_DEVICe_TABLE_map table data into its own memory, and then read from its own memory which dynamic data table the device belongs to before the data reported by the device is stored in the database, and then assemble Sql to perform the database operation.

4.2 Zone Design

Because device data is continuously reported, consider using a Range partition.

Partition design is based on data collection time, one partition per week, with each table preset for 10 years. If each device reports one piece of data every 30 seconds, each partition contains about 1,002,460 x 2 = 2,880,000 pieces of data.

The construction sentences are as follows:

CREATE TABLE `t_data_1` (
  `i_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `i_status` bit(1) DEFAULT NULL,
  `c_device_sequence` varchar(32) DEFAULT NULL COMMENT 'Equipment Serial number',
  `t_collect_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'Data Collection Time'. PRIMARY KEY (`i_id`,`t_collect_time`), KEY `index_c_device_sequence` (`c_device_sequence`,`t_collect_time`) USING BTREE ) ENGINE=MyISAM AUTO_INCREMENT=398404 DEFAULT CHARSET=utf8 /*! 50500 PARTITION BY RANGE COLUMNS(t_collect_time) (PARTITION p20171224 VALUES LESS THAN ('2017-12-24 00:00:00') ENGINE = MyISAM,
 PARTITION p20171231 VALUES LESS THAN ('2017-12-31 00:00:00') ENGINE = MyISAM,
 PARTITION p20180107 VALUES LESS THAN ('2018-01-07 00:00:00') ENGINE = MyISAM,
 ...
 PARTITION p20271212 VALUES LESS THAN ('2027-12-12 00:00:00') ENGINE = MyISAM,
 PARTITION p20271219 VALUES LESS THAN ('2027-12-19 00:00:00') ENGINE = MyISAM) */; / *! 40101 SET character_set_client = @saved_cs_client */;Copy the code

5. Test

1.2 million pieces of data were tested, and the query time of sub-table (10 pieces) partition was about 0.1 seconds, as shown in the following figure:

The performance gains from table partitioning are significant.

6. Think about

Isn’t the more partitions the better? Definitely not.

Because MySQL performs a query operation in the first time to retrieve the query scope in which partition, partition too many, this part of the operation time increases. In addition, too many partitions may lead to a high memory footprint.

How to partition, how many areas is the most appropriate, still need long-term observation and a large number of data experiments.