MySQL high-concurrency solutions

Preface

With the spread of distributed systems in recent years, the amount of data in our systems keeps growing. To improve overall system performance, we now place "distributed" requirements on relational databases such as MySQL as well.

For getting started with MyCat, refer to my blog: MyCat Get Started

For high availability of MyCat using HAProxy, refer to my blog: High Availability of Mycat using HaProxy

Main content

MySQL high-concurrency solutions

Performance limits of a single MySQL node:

  • MySQL's default maximum number of connections (max_connections) is 151, with a hard upper limit of 100,000.
  • If you rely on InnoDB row-level locks to implement "deduct inventory": InnoDB's row lock makes update operations on the same row execute serially, so until one thread releases the lock, all the other threads block in a queue waiting for it. The higher the concurrency, the more threads wait, which severely hurts database TPS and causes response time (RT) to rise linearly; eventually this can trigger an avalanche across the system.
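
The contention described above can be sketched with a toy simulation (plain Python threads standing in for database sessions; this illustrates lock queueing, not MySQL internals):

```python
import threading
import time

# A toy model of why InnoDB row-level locks serialize "deduct inventory"
# updates: every update of the same hot row must take the same lock, so
# concurrent threads queue up and execute strictly one at a time.

stock = 100
row_lock = threading.Lock()  # stands in for the row lock on the hot row

def deduct():
    global stock
    with row_lock:            # all threads serialize here
        current = stock
        time.sleep(0.001)     # simulated work while the lock is held
        stock = current - 1

threads = [threading.Thread(target=deduct) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stock)  # 50: correct, but the 50 updates ran one after another
```

The result is correct, but throughput is bounded by how long each thread holds the lock, which is exactly the TPS ceiling described above.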

Because of the performance bottleneck of a single MySQL node, concurrency must be increased further by adding nodes. Master/slave replication can therefore be adopted to achieve read/write separation and relieve the I/O pressure on a single node.
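
Read/write separation itself is simple to sketch: writes go to the master, reads are spread over the replicas. The DSN strings and the statement-prefix routing below are illustrative assumptions; real proxies such as MyCat parse SQL far more carefully:

```python
import random

# Hypothetical connection strings; a real deployment would use its own DSNs.
MASTER = "mysql://master:3306"
REPLICAS = ["mysql://replica1:3306", "mysql://replica2:3306"]

def route(sql: str) -> str:
    """Send writes to the master, spread reads across replicas."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in ("SELECT", "SHOW"):
        return random.choice(REPLICAS)
    return MASTER  # INSERT/UPDATE/DELETE/DDL go to the master

print(route("SELECT * FROM member"))
print(route("UPDATE member SET name = 'x' WHERE id = 1"))
```

Note that this naive router ignores replication lag: a read issued right after a write may hit a replica that has not yet replayed it.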

MySQL primary/secondary replication implements read/write separation

MySQL uses an internal two-phase commit to keep the binlog and the InnoDB redo log consistent; this consistency is what makes both crash recovery and primary/secondary replication work.

  • Phase one: the redo log is written and the InnoDB transaction enters the prepare state;
  • Phase two: if the prepare succeeded, the binlog is written to disk; once the binlog is persisted, the InnoDB transaction enters the commit state.

At the end of each transaction in the binlog, an XID event is recorded to mark that the transaction committed successfully.
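
The recovery rule this enables can be sketched as follows (the log structures are simplified stand-ins): after a crash, a transaction found in the prepare state is committed only if its XID event made it into the binlog, otherwise it is rolled back:

```python
# Simplified model of MySQL's internal two-phase commit between the InnoDB
# redo log and the binlog, and the crash-recovery decision it enables.

redo_log = []   # (xid, state) entries
binlog = []     # xids whose XID event was written

def commit(xid):
    # Phase 1: write redo log, transaction enters the "prepare" state
    redo_log.append((xid, "prepare"))
    # Phase 2: write the binlog (with its XID event), then mark commit
    binlog.append(xid)
    redo_log.append((xid, "commit"))

def recover():
    """After a crash: a prepared tx commits iff its XID is in the binlog."""
    prepared = {x for x, s in redo_log if s == "prepare"}
    committed = {x for x, s in redo_log if s == "commit"}
    decisions = {}
    for xid in prepared - committed:
        decisions[xid] = "commit" if xid in binlog else "rollback"
    return decisions

# Simulate a crash after phase 1 of tx 2 (redo prepared, binlog not written)
commit(1)
redo_log.append((2, "prepare"))
print(recover())  # {2: 'rollback'}
```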

Primary/secondary replication of MySQL databases

Primary/secondary replication is based on the binary log (binlog). There are four replication modes:

  • Asynchronous replication (default): MySQL's default replication policy. While processing a transaction, the Master writes the binlog, notifies the dump thread to ship it, and then completes the commit, regardless of whether the events were successfully delivered to any slave.
  • Semi-synchronous replication: when committing a transaction, the Master must wait until at least one Slave has received the binlog, written it to its relay log, and returned an ack, before it continues processing the user's transactions (commit before synchronization).
  • Enhanced semi-synchronous replication: the problem with semi-synchronous replication is that the ACK wait happens after the commit. At that point the Master's data change is already complete and users can see the latest data; if the binlog has not yet been synchronized to any Slave when a master/slave switchover occurs, the new primary is missing the latest data and users suddenly see old data. Enhanced semi-synchronous replication moves the ACK wait to before the commit (synchronize before commit): the data is not yet committed, so the outside world cannot see the change; if a master/slave switchover happens at this point, the new primary still holds the old data and there is no inconsistency.
  • Group replication (MGR): after the engine layer finishes the Prepare step and writes the redo log, MySQL's default hook intercepts the transaction into the MGR layer. The MGR layer packages the transaction information and sends it to all nodes through the Paxos protocol; once more than half of the cluster nodes respond with an ACK (a majority write succeeds), every node is told the packet was synchronized successfully, and each node starts its own certification (certify). If certification passes, the node writes the binlog and commits the transaction (or writes the relay log) so data stays in sync; if certification fails, the transaction is rolled back.
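
The difference between the two semi-synchronous modes is only where the ACK wait sits relative to the engine commit. In MySQL this is controlled by the rpl_semi_sync_master_wait_point variable (AFTER_COMMIT for classic semi-sync, AFTER_SYNC for the enhanced "lossless" mode); the sketch below is a toy model of just that ordering:

```python
# Toy model of the semi-sync ACK wait point. Everything except the variable
# names AFTER_COMMIT / AFTER_SYNC is an illustrative simplification.

AFTER_COMMIT = "after_commit"  # classic semi-synchronous replication
AFTER_SYNC = "after_sync"      # enhanced semi-synchronous replication

def commit_tx(wait_point, engine_commit, wait_for_slave_ack):
    if wait_point == AFTER_SYNC:
        wait_for_slave_ack()  # ACK first: nobody can see uncommitted data,
        engine_commit()       # so a failover here never shows "lost" data
    else:
        engine_commit()       # data is visible to users first...
        wait_for_slave_ack()  # ...a failover in this window shows old data

trace = []
commit_tx(AFTER_SYNC,
          engine_commit=lambda: trace.append("commit"),
          wait_for_slave_ack=lambda: trace.append("ack"))
print(trace)  # ['ack', 'commit']
```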

MySQL master-slave replication involves three steps:

  1. Data recording: the Master logs data changes to the binary log (enabled with the log-bin option in the configuration file); these records are called binary log events;
  2. Data transfer: the Slave reads the Master's binary log events through an I/O thread and writes them to its relay log;
  3. Data replay: the Slave redoes the events in the relay log, executing them locally one by one and storing the data locally, so that the Master's changes are reflected in its own data.
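
The three steps above can be sketched as a tiny pipeline (lists and a queue standing in for the binlog, relay log, and the two slave threads):

```python
from queue import Queue

# Toy model of master-slave replication: the master's binlog events are
# copied into a relay log by the I/O thread, then replayed by the SQL thread.

master_binlog = ["insert id=1", "update id=1", "insert id=2"]
relay_log = Queue()
slave_data = []

def io_thread():
    # Step 2: read binary log events from the master into the relay log
    for event in master_binlog:
        relay_log.put(event)

def sql_thread():
    # Step 3: replay relay-log events locally, one by one
    while not relay_log.empty():
        slave_data.append(relay_log.get())

io_thread()
sql_thread()
print(slave_data == master_binlog)  # True
```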

Splitting tables

Characteristics of horizontal table splitting:

  • A table is divided into multiple tables with the same structure (they share the same FRM structure file, but each child table is stored on disk as a separate MYD data file and a separate MYI index file; an MRG file records the subtable information);
  • The main table serves as the query interface (it has no data file of its own), while table 1 and table 2 are the actual tables that store the data;
  • Deciding which actual table a row goes into is usually done by taking the ID modulo (mod) the number of subtables, or by hashing the business primary key, which keeps related data in the same subtable;
  • Splitting tables with MERGE is the simplest way to approximate partitioning;

For example, taking the ID modulo:

    -- insert the even ids into subtable 1
    insert into tb_member1(id,name,sex) select id,name,sex from member where id%2=0;
    -- insert the odd ids into subtable 2
    insert into tb_member2(id,name,sex) select id,name,sex from member where id%2=1;
    -- query through the main (merge) table
    select * from tb_member;
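
The same id % 2 rule can also live in application code as a routing helper, so queries are sent straight to the right subtable. The function below is a hypothetical sketch that mirrors the table names in the example:

```python
# Route a member id to its physical subtable: tb_member1 holds even ids,
# tb_member2 holds odd ids, matching the SQL example above.

def subtable_for(member_id: int) -> str:
    return "tb_member1" if member_id % 2 == 0 else "tb_member2"

def build_query(member_id: int) -> str:
    # Illustrative only: real code would use parameterized queries.
    return (f"SELECT id, name, sex FROM {subtable_for(member_id)} "
            f"WHERE id = {member_id}")

print(build_query(42))
```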

Partitioning

Characteristics of horizontal partitioning:

  • Partitioning divides the data in one table into N blocks, which can be on the same disk or on different disks, but it is still a single table; the MYD and MYI files are each cut into many pieces, and a PAR file records the partition information;
  • Partitioning with PARTITION BY is relatively easy to implement; horizontal partitioning supports RANGE, LIST, HASH, and other partition types;
  • Which partitioning method to use depends on the business. Taking an equity center as an example: generally we want the same user's data to land in the same physical partition, so we can partition by a hash of the mobile phone number;

Inserting data requires no awareness of the partitions:

    insert into member(id,name,sex) values (1, 'luo', 'male');
    select * from member;
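
The phone-number hash rule mentioned above can be sketched as follows. CRC32 is used here only because Python's built-in hash() is salted per process; the partition count is an arbitrary assumption:

```python
import zlib

# Map a phone number to a partition so that the same user's rows always
# land in the same partition. NUM_PARTITIONS is an illustrative choice.

NUM_PARTITIONS = 4

def partition_for(phone: str) -> int:
    return zlib.crc32(phone.encode()) % NUM_PARTITIONS

p = partition_for("13800138000")
print(0 <= p < NUM_PARTITIONS)           # True
print(partition_for("13800138000") == p)  # True: stable for the same user
```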

Splitting tables and databases

Splitting tables and databases combines table splitting with database splitting:

  • Data that was stored in one database is spread across multiple databases;
  • Data that was stored in one table is split across multiple tables;
  • Splitting databases solves the problem of a single server whose performance is insufficient or too costly to upgrade;
  • The split can be horizontal (by rows) or vertical (by fields);

Common problems when splitting tables and databases

Cross-database joins

When data is split across different databases, cross-database joins are generally forbidden; the usual workarounds are:

  • Global tables: global tables are tables that all modules in the system may depend on, similar to what we understand as a data dictionary. To avoid cross-database join queries, we can keep a copy of this type of table in every database;
  • Field redundancy: for example, while saving the seller's Id in the order table, we also redundantly store the seller's Name field, so that querying order details does not require a lookup in the seller user table;
  • Data synchronization: use an ETL (data migration) tool to synchronize table data on a schedule. If table tab_a in database A is associated with table tbl_b in database B, the specified tables can be synchronized periodically.

Cross-database transaction

Transaction management is difficult because the data is stored in different databases. The core of a cross-database transaction is consistency, and there are two main approaches:

  • Distributed transactions: distributed transactions maximize the atomicity of database operations, but the cost is too high, so they are generally not used;
  • Eventual consistency: local transaction, local landing, compensating send.

If we relied on the database's own distributed transaction management to execute transactions, we would pay a high performance price; in practice, therefore, we only split historical data into separate tables and databases.
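
The "local transaction, local landing, compensating send" pattern is often implemented with a local message (outbox) table: the business row and the outgoing message commit in one local transaction, and a background job delivers the message with retries. A minimal sketch, using SQLite as a stand-in and illustrative table/function names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT,"
             " sent INTEGER DEFAULT 0)")

def create_order(order_id, amount):
    # ONE local transaction: the order and its message commit or roll back
    # together, so the message can never be lost or orphaned.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f"order_created:{order_id}",))

def deliver_outbox(send):
    # Compensating sender: retried until every message is acknowledged.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for mid, payload in rows:
        if send(payload):
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (mid,))
    conn.commit()

create_order(1, 100)
delivered = []
deliver_outbox(lambda p: delivered.append(p) or True)
print(delivered)  # ['order_created:1']
```

The other database reaches the same state eventually, once delivery succeeds; consistency is eventual rather than atomic.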

Cross-database paging, sorting, and function issues

Problems such as LIMIT paging and ORDER BY sorting:

  • When the sorting field is the sharding field, the sharding rule makes it easy to locate the specified shard;
  • When the sorting field is not the sharding field, each shard must sort and return its own data; the result sets returned by the different shards are then merged, sorted again, and finally returned to the user.
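
The merge-and-re-sort step can be sketched with a k-way merge, assuming each shard has already returned its rows in sorted order (the data here is made up):

```python
import heapq
from itertools import islice

# Cross-shard "ORDER BY score DESC LIMIT n": each shard returns its rows
# already sorted, and the router merge-sorts the streams up to the limit.

shard1 = [{"id": 1, "score": 90}, {"id": 4, "score": 70}]  # sorted desc
shard2 = [{"id": 2, "score": 85}, {"id": 3, "score": 60}]  # sorted desc

def merged_top(shards, limit):
    merged = heapq.merge(*shards, key=lambda r: r["score"], reverse=True)
    return list(islice(merged, limit))

print([r["id"] for r in merged_top([shard1, shard2], 3)])  # [1, 2, 4]
```

Note that for page N, every shard must return its first offset + limit rows before the merge, which is why deep pagination across shards gets expensive.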

Unique global primary key problem

In a split-database, split-table environment, the data of one logical table is stored in multiple databases at the same time, so auto-increment primary keys no longer work: an ID generated by a single shard cannot be guaranteed to be globally unique. There are two common schemes:

  • UUID: the simplest primary key solution. It is generated locally, performs well, and costs no network round trip. But the disadvantages are also obvious: a UUID is very long and takes up a lot of storage space; in addition, there are performance problems when it is used as an indexed primary key and queried through the index, because under InnoDB the disorder of UUIDs makes data page locations change frequently, leading to page splits.
  • Snowflake algorithm: solves a distributed system's need to generate globally unique IDs by producing a 64-bit Long-type number; on the whole, the generated IDs trend upward with time;
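
A minimal Snowflake-style generator might look like this (the epoch, bit widths, and class name are illustrative choices; it also assumes the clock never moves backwards):

```python
import threading
import time

# Snowflake-style 64-bit id: milliseconds since a custom epoch, a 10-bit
# worker id, and a 12-bit per-millisecond sequence.

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (2020-09-13)

class Snowflake:
    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000 - EPOCH_MS

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
                if self.seq == 0:                  # sequence exhausted:
                    while now <= self.last_ms:     # spin into the next ms
                        now = self._now_ms()
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.seq

gen = Snowflake(worker_id=1)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids))  # True: ids trend upward with time
```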

Business scenarios for splitting tables and databases

Principles for deciding when to split tables and databases:

  • Don't shard if you can avoid it: do not resort to splitting tables and databases until absolutely necessary; avoid "over-design" and "premature optimization".
  • The amount of data is large enough to affect service access: a single table that is too large causes heavy disk I/O (for example during backups) and network I/O.
  • Business development requires vertically splitting out certain fields: for example, we sometimes separate the login-related part of the user table into its own login table.
  • The amount of data is growing rapidly: if performance is approaching a bottleneck as data volume grows, horizontal sharding needs to be considered, and a sharding rule appropriate to the business should be chosen, for example taking the ID modulo or hashing the phone number.
  • Security and availability: sharding data across different databases reduces the impact of a database outage on the service.

MyCat implements read/write separation for MySQL; in essence it is a practical application of splitting tables and databases. For details, please refer to my blog: MyCat Introduction