Kingshard has been open source for a while, and enthusiastic users have emailed kingshard with questions about its design and implementation. Therefore, I took some time to write an article introducing kingshard’s architecture and function realization over the weekend, hoping to make users have a deeper understanding of Kingshard through this article. The following modules will introduce the design and implementation of kingshard’s core components.

1. Overall architecture

Kingshard is developed with Go, taking full advantage of the concurrency features of Go. Go encapsulates concurrency, which greatly simplifies kingshard development. The overall workflow of Kingshard is as follows:

  1. Read the configuration file and start. The listening port set in the configuration file listens for client requests.
  2. Upon receipt of a client connection request, a Goroutine is launched to process the request separately.
  3. Login authentication is preferred. The authentication process is compatible with the MySQL authentication protocol. The user name and password have been set in the configuration file, so you can use the information to verify connection requests. If both the user name and password are correct, go to the following steps; otherwise, an error message is returned to the client.
  4. After the authentication succeeds, the client sends the SQL statement.
  5. Kingshard conducts lexical and semantic analysis on THE SQL statements sent from the client to identify the type of SQL and the routing plan to generate SQL. If necessary, the SQL is overwritten and forwarded to the appropriate DB. It is also possible to forward directly to the appropriate back-end DB without doing lexical and semantic analysis. If the forward SQL is split table and spans multiple DB, each DB starts a Goroutine to send THE SQL and receive the results returned by that DB.
  6. The results are received and merged, and then forwarded to the client.

The overall working process of Kingshard can be referred to the picture below.The overall architecture of Kingshard is shown below

2. Lexical and semantic analysis

To make Kingshard powerful enough, you have to do lexical and semantic analysis of SQL. Lexical analysis of SQL statements refers to the segmentation of SQL statements into keywords. SQL statement: select name from STu where id < 13 {” select “, “name”, “from”, “stu”, “where”, “id”, “<“, “13”} < code = “” >. The main purpose of this is to generate an abstract syntax tree, also known as AST(Abstract Syntax Tree), on which semantic analysis is performed. The purpose of semantic analysis is as follows:

  1. Read/write separation: The read/write separation can be performed only after the type of THE SQL statement is identified.
  2. Data sharding, parsing out table names and query conditions, routing SQL to the correct DB.
  3. SQL blacklist, through lexical and semantic analysis, can also quickly identify SQL statements that need to be masked. For example, if you detect that the DELETE statement does not have a WHERE operation, you can directly block the forwarding of the SQL.

Kingshard does not consider full compatibility with all MySQL syntax because full compatibility with MySQL syntax would make lexical and semantic analysis modules extremely complex and inefficient. There is no need to parse the DDL statement, as long as it can be correctly forwarded to the corresponding DB on the back end.

Kingshard only for part of DML statements (select, update, insert, delete, replace) the resolution, so that we can meet most of the operating table. For other statements, Kingshard sends them to a default DB, or to the specified DB in a kingshard-specific way, such as: /*node2*/insert into stu (id,name) values(12,’xiaoming’);

3. Load balancing

Users use Mysql Proxy to reduce the load on a single DB server and distribute the read load to multiple DB servers. Kingshard supports multiple slaves. Each slave can be configured with different read weights. A larger read weight shares more read requests. Kingshard read request load balancing adopts weighted polling scheduling algorithm.

When most systems use this algorithm, they dynamically calculate the serial number of the selected DB when forwarding SQL statements. The SQL statement for the read request is then sent to the DB. On closer analysis, this is not necessary. Because DB weights are relatively fixed and do not change very often, it is perfectly possible to calculate a fixed polling sequence and store that sequence in an array. So you don’t have to do this dynamically, you just read the array every time. For example, configure the Slave option in the Kingshard node configuration item: Slave :192.168.0.12@2,192.168.0.13@3 when kingshard reads configuration information to initialize the system, it generates a polling array :[0,0,1,1,1]. In kingshard, this array is shuffled into: [0,1,1,0,1]. In this way, the problem of dynamically calculating DB subscripts is avoided and the performance is improved to some extent.

4. Sharding implementation

First, two concepts need to be introduced:

  1. Subtables, in Kingshard a logically large table consists of several small subtables. For example, divide the STU table into STU_0000, STU_0001, STU_0002, STU_0003. The STU table does not exist in the database, it is just a logical table. There are only four subtables in the database (STU_0000, STU_0001, STU_0002, STU_0003). When you send an SQL statement, Kingshard recognizes the SQL statement that needs to be split into tables and overwrites the SQL. For example, the client sends the following SQL statement: SELECT name from STu where ID =10; After receiving the SQL statement, kingshard identifies the table as a Hash table from the configuration information. Select name from STU_2 where id =10; select name from STU_2 where id =10; Then send it to the corresponding DB.

  2. Node. Sub-tables are distributed on each Node. Each Node contains a Maser Server and several slave servers (the number of slaves can be 0). Write requests are sent to the master server and read requests are sent to the slave Server.

Sharding of Kingshard adopted the idea of two-level mapping. Firstly, the sub-table of the SQL statement was calculated according to the sub-table conditions of the SQL statement, and then the node of the sub-table was found according to the configuration information. Using the idea of two-level mapping, the expansion and reduction of MySQL can be easily supported. Currently, Kingshard Sharding supports INSERT, DELETE, SELECT, UPDATE, and REPLACE statements, and all five types of operations are supported across subtables. However, write operations can only be performed across subtables on a single node. Select operations can be performed across nodes and subtables.

For tables that do not have sub-tables, SQL statements that operate the tables are sent to the Default node. Or the user can choose to add a comment in front of the SQL statement to specify the node to which the SQL is to be sent. After receiving the statement, kingshard identifies the node specified in the comment and sends the SQL to the appropriate DB in the corresponding node. For example, if you send /*node1*/select * from member where id=100,kingshard will send the SQL to salve on node1. This allows Kingshard to be well compatible with a variety of scenarios for both partitioned and partitioned tables.

5. Transaction implementation

All proxies that support shard face a question: do they support distributed transactions? For performance and availability, Kingshard only supports transactions on a single DB, not across DB. Kingshard handles transactions on a single DB as follows:

  1. The user sends the BEGIN statement.
  2. After receiving the SQL statement, kingshard sets the state of the connection to transaction.
  3. The user sends a DML statement, and Kingshard identifies the DB to which the statement needs to be sent. Kingshard then creates a connection to the back-end DB and sends the statement using this connection.
  4. After receiving the result of the SQL statement, put the connection back.
  5. Kingshard receives the next SQL statement and determines whether the SQL statement is sent to the same DB. If not, an error is reported. If it is to the same DB, the statement is sent using this connection.
  6. A COMMIT statement from the user is received, the state of the connection is set to non-transaction, and the transaction ends.

6. Back-end DB survival check

Kingshard Each node starts a Goroutine to check the status of the back-end master and slave. When the goroutine fails to ping the back-end DB for a period of time (set by down_after_noalive in the configuration file), the state of the DB is set to Down, and the Kingshard will not send SQL statements to the DB.

7. Whitelist mechanism on the client

Sometimes users want to allow only a few servers to connect to Kingshard for security reasons. Kingshard has a parameter in its configuration file: allow_ips, which implements the client whitelist mechanism. If this parameter is set, only IP addresses specified by allow_ips can connect to kingshard, and other IP addresses are rejected by Kingshard. If this parameter is not set, the clients connected to Kingshard are not restricted.

8. Design and implementation of management terminal

Kingshard’s administrative ports reuse the working ports, identified by a specific keyword (admin). Kingshard is used to parse SQL statements into a command that kingshard can recognize through lexical and semantic analysis of management-specific SQL. Currently, you can smoothly line up and down the master and slave, and view the Kingshard configuration and back-end DB status. The next step is to integrate web pages into the management side so that users can operate on web pages instead of typing command lines. Significantly lower the threshold for users to use Kingshard.

All the modules mentioned above are core modules in Kingshard. Through the introduction of this article, I think readers should have a preliminary understanding of kingshard’s architecture and implementation. The design and realization of many functions are gradually explored and practiced by the author. If any readers are interested in the design or implementation of Kingshard or have different ideas about the above design, please email me.