Why split databases and tables?
The answer is simple: a database performance bottleneck can manifest itself in several ways:
- A large number of requests are blocked: in high-concurrency scenarios, many requests operate on the database at the same time, connections run out, and requests block.
- SQL operations slow down: if a table in the database holds hundreds of millions of rows, a single SQL statement that misses the index scans the whole table, and the query takes a very long time.
- Storage comes under pressure: as business volume grows rapidly, the amount of data in a single database keeps increasing and puts great pressure on storage.
Database related optimization plan
There are many database optimization schemes, falling into two broad categories: software and hardware. Software-level measures include SQL tuning, table structure optimization, read/write separation, database clustering, splitting databases and tables, and so on. The hardware-level measure is mainly increasing machine performance.
SQL tuning
SQL tuning is often the first step in solving a database problem, and it gives a lot of bang for relatively little effort. The main goal of SQL tuning is to make slow SQL fast, usually simply by making the execution hit an index. The explain command shows the execution plan of a SQL statement; from its output it is easy to tell whether the statement does a full table scan or matches an index. Look at the type column: its possible values are ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, performance goes from worst to best). ALL means the statement scans the whole table and needs optimization; generally speaking, range level or above is required.
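As a hedged sketch of that workflow, suppose the landpage table discussed later has a slow query on a company_name column (both names stand in for a real case):

EXPLAIN SELECT * FROM landpage WHERE company_name = 'acme';
-- type: ALL -> full table scan; the statement misses every index

ALTER TABLE landpage ADD INDEX idx_company_name (company_name);

EXPLAIN SELECT * FROM landpage WHERE company_name = 'acme';
-- type: ref -> the query now hits idx_company_name, well above range level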
Table structure optimization
Take our landing page table as an example. The front end needs landing page information that includes the page details, the corresponding product category, the company name, and so on. How do we get that data? The landpage table holds a lot of rows, which makes joins difficult: as the data volume grows into the tens of millions, joining the landing page table just to fetch a few extra fields will certainly slow queries down dramatically. Two optimizations:
First query the landpage table for the main information, then query the other tables for the remaining fields based on that batch of rows, and combine the results at the Java level.
Alternatively, add the required fields from the other tables to the landpage table directly, a practice often called redundant fields. The advantage is that fetching landing page information no longer requires joining the product category table, the company name table, and so on.
The drawback of redundant fields is that updating such a field means updating multiple tables at the same time, so when choosing redundant fields, prefer ones that are rarely updated.
Structure optimization
When a single database instance can no longer keep up, we can add instances to form a cluster that serves the outside world. When read requests clearly outnumber write requests, let the master instance handle writes while the slave instances serve reads. If read pressure is still high, add a cache such as Redis in front of the database so that requests hit cached data first and database access drops. If the database is still the bottleneck after the cache has taken part of the load, it is time to consider splitting databases and tables.
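A minimal cache-aside sketch of that "requests hit the cache first" idea; the cache client and companyMapper here are hypothetical stand-ins, not the project's real API:

// Try the cache first; fall back to the database only on a miss.
public String findCompanyName(long id) {
    String key = "company:name:" + id;
    String cached = cache.get(key);                  // hypothetical Redis-style client
    if (cached != null) {
        return cached;                               // cache hit: no database access
    }
    String name = companyMapper.selectNameById(id);  // cache miss: query the database
    if (name != null) {
        cache.set(key, name, 600);                   // repopulate the cache with a 600s TTL
    }
    return name;
}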
Hardware optimization
Hardware costs are very high, so upgrading hardware is generally not the answer to a database performance bottleneck. When the service volume is small in the early stage, a hardware upgrade can greatly improve performance; at a later stage, the benefit of each upgrade becomes much less obvious.
Project database evolution
Multiple applications and single database
When I first joined the project team, we had two clients: Portal (front end) and Management (back end). The two projects shared one database.
Multiple applications and multiple databases
As the project continued to iterate, we needed to develop a new module: the creative workshop. To keep the database from becoming more chaotic, we built a new database to store the creative workshop's data. This is, in effect, "splitting the database". A single database can support only a limited amount of concurrency, so spreading services across multiple databases removes contention and improves performance. Splitting only the application without splitting the database does not solve the fundamental problem, and the whole system still reaches a bottleneck easily.
As data volume continues to grow, read/write separation is also worth considering. If the load keeps increasing, we can add a caching layer between the application and the database. If the database is still the bottleneck after the cache has taken part of the load, it is time to consider splitting databases and tables.
Splitting tables
We have talked about splitting databases; when do we split tables? Take our product table and material table as examples: the crawler stores new data in them every day, and once the data grows past a certain point, query efficiency drops noticeably. So when a single table holds too much data, we can consider splitting it.

How do we split a table? There are two ways: vertical splitting and horizontal splitting.

Vertical splitting divides by columns. Take a user table with the fields id, name, age, sex, nickname, and description: if nickname and description are rarely used, we can move them into a separate user details table. One user table is thus split into a basic-information table plus a details table, two independent tables with different structures. From this angle, though, vertical splitting does not fundamentally solve the problem of a single table holding too many rows, so we still need horizontal splitting.

Horizontal splitting divides by rows. For example, if the table has 10,000 rows, we can split it in two: rows with odd ids (1, 3, 5, 7…) go into user1 and rows with even ids (2, 4, 6, 8…) go into user2. Besides splitting by id as above, there are many other horizontal schemes, such as splitting along the time dimension: an order table, for instance, can be split by day, by month, and so on:
- Daily table: stores only the current day's data.
- Monthly table: a scheduled task migrates the previous day's data into the monthly table.
- History table: a scheduled task likewise migrates data older than 30 days into the history table (a sketch of such a job follows this list).
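A hedged sketch of such a scheduled migration job; order_monthly, order_history and create_time are assumed names, not tables from this project:

-- Run nightly from a scheduled task:
INSERT INTO order_history
SELECT * FROM order_monthly
WHERE create_time < NOW() - INTERVAL 30 DAY;

DELETE FROM order_monthly
WHERE create_time < NOW() - INTERVAL 30 DAY;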
Given our current daily and monthly data volumes, there is no need for us to split along the time dimension for now. To summarize the two approaches:
- Vertical splitting: splits by table or by field; the resulting table structures differ.
- Horizontal splitting: splits by row; the table structures stay the same but the data differs.
Summary: table splitting mainly reduces the size of a single table, solving the performance problems caused by too much data in one table.
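A tiny sketch of the odd/even horizontal routing rule from the example above (the user1/user2 table names follow the text):

// Route a row to its physical table by id parity.
static String routeUserTable(long id) {
    return (id % 2 == 1) ? "user1" : "user2";
}
// routeUserTable(7) -> "user1"; routeUserTable(8) -> "user2"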
The complexity introduced by splitting databases and tables
Cross-database join queries
Before splitting, a single database lets us easily join multiple tables in one query; after splitting databases and tables, two tables may no longer be in the same database, so how do we join? There are several solutions:
- Field redundancy: put the fields to be joined into the main table to avoid the join operation.
- Data abstraction: aggregate the data through ETL or similar processes to generate new tables.
- Global tables: some basic tables, for example, can be kept in every database.
- Application layer assembly: query the basic data separately and assemble it in the application (a sketch follows this list).
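A rough sketch of application layer assembly replacing a cross-database join; Order, User and the two mappers are hypothetical names for illustration:

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// orderMapper and userMapper are hypothetical mappers bound to two different databases.
List<Order> assembleOrders() {
    List<Order> orders = orderMapper.findRecent(100);              // from the order database
    Set<Long> userIds = orders.stream()
            .map(Order::getUserId)
            .collect(Collectors.toSet());
    Map<Long, User> users = userMapper.findByIds(userIds).stream() // from the user database
            .collect(Collectors.toMap(User::getId, u -> u));
    for (Order o : orders) {
        o.setUserName(users.get(o.getUserId()).getName());         // assemble at the application layer
    }
    return orders;
}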
Distributed transaction
A single database can rely on local transactions; once there are multiple databases, only distributed transactions will do. Common solutions include reliable-message (MQ) based schemes, two-phase commit, flexible transactions, and so on.
Sorting, paging, function calculation problems
When SQL uses keywords such as ORDER BY and LIMIT, they need special handling. The general idea follows the sharding pattern: execute the corresponding operation on every shard, then gather the per-shard result sets, compute over them again, and finally return the result, as the sketch below shows.
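A minimal sketch of that idea for a LIMIT offset, size page, assuming every shard was asked for its first offset + size rows already sorted (all names are illustrative):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Merge the pre-sorted per-shard results, sort again globally, then cut out the page.
static List<Long> pageAcrossShards(List<List<Long>> shardResults, int offset, int size) {
    List<Long> merged = new ArrayList<>();
    for (List<Long> rows : shardResults) {
        merged.addAll(rows);                    // each shard contributed at most offset + size rows
    }
    merged.sort(Comparator.naturalOrder());     // compute over the union again
    int from = Math.min(offset, merged.size());
    int to = Math.min(offset + size, merged.size());
    return merged.subList(from, to);            // the requested page of the global result
}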
Distributed IDs
After splitting, each shard generates its own auto-increment primary keys, so the same id (say id = 1) can exist in several databases at once and no longer identifies a row globally. A globally unique distributed ID is therefore needed. Common distributed ID solutions are:
- UUID
- Maintain a separate ID table based on database increment
- Snowflake algorithm (a minimal sketch follows this list)
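A minimal snowflake-style generator sketch, assuming the common 41-bit timestamp / 10-bit worker / 12-bit sequence layout; the custom epoch is an arbitrary assumption, and clock-rollback handling is omitted:

// 64-bit id = (timestamp - epoch) << 22 | workerId << 12 | sequence
public class SnowflakeId {
    private static final long EPOCH = 1609459200000L; // 2021-01-01, assumed custom epoch
    private final long workerId;                      // 0..1023 (10 bits)
    private long lastTimestamp = -1L;
    private long sequence = 0L;                       // 0..4095 (12 bits)

    public SnowflakeId(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;        // same millisecond: bump the sequence
            if (sequence == 0) {                      // sequence exhausted: spin until the next ms
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}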
Multiple data sources
After splitting databases and tables, the application may need to fetch data from multiple databases or sub-tables. The usual solutions are client-side adaptation and proxy-layer adaptation. Middleware commonly used in the industry includes:
- ShardingSphere (formerly Sharding-JDBC)
- Mycat
ShardingJdbc overview
What is shardingJdbc
The official website explains:
Sharding-JDBC
Positioned as a lightweight Java framework that provides additional services at Java's JDBC layer. The client connects directly to the database and the service is provided as a jar package, with no extra deployment or dependencies. It can be understood as an enhanced version of the JDBC driver, fully compatible with JDBC and all kinds of ORM frameworks.
- Works with any JDBC-based ORM framework, such as JPA, Hibernate, Mybatis, Spring JDBC Template or directly using JDBC.
- Support any third party database connection pool, such as DBCP, C3P0, BoneCP, Druid, HikariCP, etc.
- Support for any database that implements the JDBC specification. Currently supports MySQL, Oracle, SQLServer, PostgreSQL and any database that complies with the SQL92 standard.
What does shardingJdbc do
- Data sharding:
  - Read/write separation
  - Splitting databases and tables
  - Distributed primary keys
- Distributed transactions:
  - XA strongly consistent transactions
  - Flexible transactions
- Database governance:
  - Dynamic configuration
  - Circuit breaking & disabling
  - Call-chain tracing
Architecture diagram (image omitted): the registry center stores the database structures and the sharding rules.
Master-slave replication
Principle
- When the master node performs insert, update, or delete operations, they are written to its binary log (binlog) in sequence.
- When a slave connects to the master, the master creates a binlog dump thread for that slave.
- When the master's binlog changes, the binlog dump threads notify the slave nodes and push the new binlog content to them.
- The slave's I/O thread receives the binlog content and writes it to the local relay log.
- The slave's SQL thread reads the relay log written by the I/O thread and replays it against the slave database.
Knowing how master/slave replication works, we can also see clearly that in read/write separation the write operations must go to the master node, because the slave nodes reach a consistent state by replaying the master's binlog. If we wrote on a slave and read from the master, the two copies of the data would never converge, and master/slave replication would lose its meaning.
Master instance setup
- Modify the mysql configuration file /etc/my.cnf:
[mysqld]
## Set server_id; it must be unique on the same LAN
server_id=11
## Enable binary logging
log-bin=mall-mysql-bin
## Set the amount of memory the binary log cache may use (per transaction)
binlog_cache_size=1M
## Set the binary log format to use (mixed, statement, row)
binlog_format=mixed
## Days after which binary logs expire; the default 0 means no automatic clearing
expire_logs_days=7
## Skip the specified error types encountered during replication to avoid interruption on the slave
slave_skip_errors=1062
- After modifying the configuration, restart the instance:
service mysqld restart
- Create a data synchronization user:
# Connect to the database
mysql -uroot -proot

-- Create a synchronization account for slave1
CREATE USER 'slave1'@'192.168.200.12' IDENTIFIED BY '123456';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'slave1'@'192.168.200.12';

-- Create a synchronization account for slave2
CREATE USER 'slave2'@'192.168.200.13' IDENTIFIED BY '123456';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'slave2'@'192.168.200.13';
Slave instance setup
- Modify the mysql configuration file /etc/my.cnf:
[mysqld]
## Set server_id; it must be unique on the same LAN
server_id=102
## Specify a database name that does not need to be synchronized
binlog-ignore-db=mysql
## Enable binary logging in case this slave later acts as master for other instances
log-bin=mall-mysql-slave1-bin
## Set the amount of memory the binary log cache may use (per transaction)
binlog_cache_size=1M
## Set the binary log format to use (mixed, statement, row)
binlog_format=mixed
## Days after which binary logs expire; the default 0 means no automatic clearing
expire_logs_days=7
## Skip the specified error types encountered during replication to avoid interruption on the slave
## (error 1032 is caused by data inconsistency between master and slave)
slave_skip_errors=1062
## relay_log configures the relay log
relay_log=mall-mysql-relay-bin
## log_slave_updates makes the slave write replicated events to its own binary log
log_slave_updates=1
## Make the slave read-only (except for users with the SUPER privilege)
read_only=1
- After modifying the configuration, restart the instance:
service mysqld restart
Connect the master and slave databases
- Connect to the mysql client of the master database and check the master status:
show master status;
- The output shows the master's current status (screenshot omitted); note the File and Position values, which are needed below.
- Connect to the mysql client of the slave database:
mysql -uroot -proot
- Configure master/slave replication in the slave database
change master to
  master_host='192.168.200.11',
  master_user='slave1',
  master_password='123456',
  master_port=3306,
  master_log_file='mall-mysql-bin.000001',
  master_log_pos=645,
  master_connect_retry=30;
Parameter description of the change master command:
- master_host: IP address of the master database.
- master_port: port the master database runs on.
- master_user: the user created on the master database for data synchronization.
- master_password: the password of that synchronization user.
- master_log_file: the binlog file to start copying from; take it from the File column of show master status.
- master_log_pos: the position in that file to start copying from; take it from the Position column of show master status.
- master_connect_retry: the retry interval after a connection failure, in seconds.
- View the master/slave synchronization status:
show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.200.11
Master_User: slave
Master_Port: 3306
Connect_Retry: 30
Master_Log_File: mall-mysql-bin.000001
Read_Master_Log_Pos: 645
Relay_Log_File: mall-mysql-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mall-mysql-bin.000001
Slave_IO_Running: No    # No: synchronization has not started yet
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 645
Relay_Log_Space: 154
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID:
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
- Enable master/slave synchronization:
start slave;
- Check the master/slave synchronization status again:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.200.11
Master_User: slave1
Master_Port: 3306
Connect_Retry: 30
Master_Log_File: mall-mysql-bin.000001
Read_Master_Log_Pos: 645
Relay_Log_File: mall-mysql-relay-bin.000002
Relay_Log_Pos: 325
Relay_Master_Log_File: mall-mysql-bin.000001
Slave_IO_Running:
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 645
Relay_Log_Space: 537
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 11
Master_UUID: cabb05d9-d404-11eb-9530-000c299fd1be
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
Master/slave replication test
- Create a mall database on the master instance.
- Looking at the slave instance, we find a mall database there as well, which confirms that master/slave replication is working.
Read/write separation
After master/slave replication is in place, we still need read/write separation: the master writes data and the slaves read it. How do we do that?
There are two kinds of read/write separation solutions: the application layer solution and the middleware solution. Before comparing them, create a test table on the master (replication carries it to the slaves):
CREATE TABLE `tb_commodity_info` (
`id` varchar(32) NOT NULL,
`commodity_name` varchar(512) DEFAULT NULL COMMENT 'Trade Name',
`commodity_price` varchar(36) DEFAULT '0' COMMENT 'Commodity price',
`number` int(10) DEFAULT '0' COMMENT 'Quantity of goods',
`description` varchar(2048) DEFAULT '' COMMENT 'Product Description',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Commodity Information Sheet';
Application Layer Solution
Advantages:
- Switching between multiple data sources is convenient and handled automatically by the program.
- No middleware needs to be introduced.
- In theory, any database is supported.
Disadvantages:
- It is implemented by the programmers; operations staff cannot take part.
- Data sources cannot be added dynamically.
Middleware solution
Advantages:
- Read/write separation is achieved without any change to the source program.
- Data sources can be added dynamically without restarting the program.
Disadvantages:
- The program depends on the middleware, which makes switching databases more difficult.
- Requests pass through the middleware proxy, so performance drops somewhat.
At the application layer we can use AOP to judge by method name: methods starting with get, select, or query connect to the slave database, and everything else connects to the master. But implementing this with AOP is somewhat cumbersome. Is there a ready-made framework? There is.

Apache ShardingSphere is an ecosystem of open source distributed database middleware solutions, composed of JDBC and Proxy components. ShardingSphere-JDBC is positioned as a lightweight Java framework that provides additional services at Java's JDBC layer: the client connects directly to the database and the service ships as a jar package, with no extra deployment or dependencies. It can be understood as an enhanced version of the JDBC driver, fully compatible with JDBC and all kinds of ORM frameworks. Read/write separation can be implemented with ShardingSphere-JDBC.
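For comparison, a rough sketch of the hand-rolled AOP approach mentioned above, using Spring AOP plus a thread-local key that an AbstractRoutingDataSource would read; the pointcut package and the DbContext name are assumptions:

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

// Thread-local lookup key; determineCurrentLookupKey() of a routing data source would return DbContext.get().
class DbContext {
    private static final ThreadLocal<String> KEY = new ThreadLocal<>();
    static void set(String key) { KEY.set(key); }
    static String get() { return KEY.get(); }
    static void clear() { KEY.remove(); }
}

@Aspect
@Component
public class ReadWriteRoutingAspect {

    @Around("execution(* com.example.service..*(..))") // assumed service package
    public Object route(ProceedingJoinPoint pjp) throws Throwable {
        String method = pjp.getSignature().getName();
        boolean read = method.startsWith("get")
                || method.startsWith("select")
                || method.startsWith("query");
        DbContext.set(read ? "slave" : "master");      // reads -> slave, everything else -> master
        try {
            return pjp.proceed();
        } finally {
            DbContext.clear();                         // restore, or the key leaks across pooled threads
        }
    }
}

ShardingSphere-JDBC spares us all of this boilerplate.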
SpringBoot + Mybatis + Druid + ShardingSphere-JDBC code implementation
Project configuration. Version description:
SpringBoot: 2.5.3
Druid: 1.1.20
mybatis-spring-boot-starter: 1.3.2
sharding-jdbc-spring-boot-starter: 4.0.0-RC1
Add the sharding-jdbc Maven dependency:
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>sharding-jdbc-spring-boot-starter</artifactId>
<version>4.0.0-RC1</version>
</dependency>
Then add the configuration in application.yml:
spring:
  main:
    # allow bean definition overriding, needed for the Druid connection pool
    allow-bean-definition-overriding: true
  shardingsphere:
    datasource:
      names: master,slave0,slave1
      master:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://192.168.200.11:3306/mall?useUnicode=true&characterEncoding=utf8&tinyInt1isBit=false&useSSL=false&serverTimezone=GMT
        username: root
        password: root
      slave0:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://192.168.200.12:3306/mall?useUnicode=true&characterEncoding=utf8&tinyInt1isBit=false&useSSL=false&serverTimezone=GMT
        username: root
        password: root
      slave1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://192.168.200.13:3306/mall?useUnicode=true&characterEncoding=utf8&tinyInt1isBit=false&useSSL=false&serverTimezone=GMT
        username: root
        password: root
    props:
      sql.show: true
    masterslave:
      load-balance-algorithm-type: round_robin  # round_robin or random
    sharding:
      master-slave-rules:
        master:
          master-data-source-name: master
          slave-data-source-names: slave1,slave0
sharding.master-slave-rules specifies the master and slave libraries; do not mix them up, or data will not be synchronized to the slaves. load-balance-algorithm-type is the read routing policy: round_robin polls the slaves in turn, random picks one at random. When the project starts, the log shows the data sources being initialized, which indicates that the configuration succeeded.
Write the entity class:
import lombok.Data;
/**
 * @author wangpeixu
 * @date 2021/8/12 13:41
 */
@Data
public class Commodity {
private String id;
private String commodityName;
private String commodityPrice;
private Integer number;
private String description;
}
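The post never shows the mapper itself; here is a minimal sketch consistent with the addCommodity and query calls made in the test class below, written in annotation-style Mybatis as an assumption:

import java.util.List;
import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;

@Mapper
public interface CommodityMapper {

    // A write: ShardingSphere routes it to the master data source.
    @Insert("INSERT INTO tb_commodity_info (id, commodity_name, commodity_price, number, description) "
            + "VALUES (#{id}, #{commodityName}, #{commodityPrice}, #{number}, #{description})")
    int addCommodity(Commodity commodity);

    // A read: routed to a slave according to the load-balance policy.
    @Select("SELECT * FROM tb_commodity_info")
    List<Commodity> query();
}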
Ready to test! Write the test class:
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import wang.cai2.shardingSphere.dao.CommodityMapper;
import wang.cai2.shardingSphere.entity.Commodity;
import java.util.List;
import java.util.Random;
/**
 * @author wangpeixu
 * @date 2021/8/12 13:56
 */
@SpringBootTest
public class masterTest {
@Autowired
private CommodityMapper commodityMapper;
@Test
void masterT() {
Commodity commodity = new Commodity();
commodity.setCommodityName("Gourd");
commodity.setCommodityPrice("6");
commodity.setNumber(10000);
commodity.setDescription("Sell wax gourd");
commodity.setId(String.valueOf(new Random().nextInt(1000)));
commodityMapper.addCommodity(commodity);
}
@Test
void queryTest() {
for (int i = 0; i < 10; i++) {
List<Commodity> query = commodityMapper.query();
System.out.println("-- -- -- -- -- -- --"); }}}Copy the code
Insert data with masterT(), then query it with queryTest(): both succeed. With sql.show enabled, the log confirms that the insert goes to the master while the reads rotate between the slaves.
Sharding-Jdbc: splitting databases and tables
The shardingJdbc configuration file is as follows:
spring:
  main:
    # allow bean definition overriding
    allow-bean-definition-overriding: true
  shardingsphere:
    # print the routed SQL
    props:
      sql.show: true
    datasource:
      names: db1,db2
      # first data source
      db1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://localhost:3306/db_1?serverTimezone=UTC
        username: root
        password: root
      # second data source
      db2:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://localhost:3306/db_2?serverTimezone=UTC
        username: root
        password: root
    sharding:
      # database sharding strategy
      default-database-strategy:
        inline:
          sharding-column: user_id
          algorithm-expression: db$->{user_id % 2 + 1}
      tables:
        course:
          actual-data-nodes: db$->{1..2}.course_$->{1..2}
          # table sharding strategy
          table-strategy:
            inline:
              sharding-column: cid
              algorithm-expression: course_$->{cid % 2 + 1}
          # cid generation strategy
          key-generator:
            column: cid
            type: SNOWFLAKE  # snowflake algorithm
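With this configuration, a row with user_id = 5 whose generated cid happens to be odd, say 7, lands in database db$->{5 % 2 + 1} = db2 and table course_$->{7 % 2 + 1} = course_2; the cid value itself comes from the snowflake algorithm.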
Test code:
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import wang.cai2.shardingSphere.dao.CourseMapper;
import wang.cai2.shardingSphere.entity.Course;
import java.util.List;

@SpringBootTest
class ShardingSphereApplicationTests {

    @Autowired
    private CourseMapper courseMapper;

    @Test
    void saveCourse() {
        for (int i = 1; i <= 10; i++) {
            Course course = new Course();
            course.setCname("java" + i);
            course.setUserId(Long.valueOf(i));
            course.setCstatus("Normal" + i);
            courseMapper.saveCourse(course);
        }
    }

    @Test
    void findAll() {
        List<Course> all = courseMapper.findAll();
        for (Course course : all) {
            System.out.println(course);
        }
    }
}
Checking the two databases, we can see that the rows were inserted into db_1/db_2 and course_1/course_2 according to our configuration.