The authors introduce
Zhao Feixiang, now engaged in database related work in competitive world, Oracle 10G OCP, 11G OCM, Oracle YEP young expert, 8 years of experience in database operation and architecture, I have rich practical experience in operation and maintenance of MySQL, Oracle, PostgreSQL, Greenplum, MongoDB and other common databases, and master the back-end architecture and DevOps implementation technology related to databases. Good at database architecture design, maintenance and optimization, data flow, Shell and Python development; I am interested in technical communication and have done a lot of technical summary and thinking sharing under the online name Yumushui.
Currently, MySQL database is widely used in many OLTP scenarios and can be used in many different ways. From database business requirements, architecture design, operation and maintenance, to expansion and migration, different MySQL architectures have different characteristics and adapt to certain business scenarios or solve certain business problems.
As the design, implementation and maintenance personnel of database architecture, DBA should not only be very familiar with various MySQL architectures, but also understand the business, have certain division and understanding of different businesses, and choose and use MySQL reasonably according to the business characteristics and architecture characteristics to meet business requirements.
This article from the MySQL common architecture, the classification of the business environment, business and architecture combined with the principle of three aspects discusses the MySQL database and the business scenario and instructions, let everybody respectively with MySQL architecture and business classification, then the two together, can make business with MySQL architecture design in confusing, Enabling users to address supporting business needs with appropriate technologies.
MySQL database architecture
In order to have a clear understanding of the common architecture of MySQL database, three general infrastructure, five special requirements architecture, architecture combination and comprehensive use of MySQL are explained below.
There are three common MySQL infrastructures
(1) MySQL single-instance architecture
MySQL single instance is to deploy a MySQL instance on the server to provide external services. This is the first way to contact the MySQL database, and it is also a common way to learn and study the use of MySQL database.
MySQL singleton is the first phase of MySQL database usage, usually on the same server as the application.
The main benefit of this approach is that it is easy to deploy and use, directly through compilation installation, or binary package decompression installation, you will soon have a MySQL database environment ready to use. In this way, you do not need to rely on third-party tools or software to maintain and locate faults easily.
Familiarity with and mastery of the MySQL single-instance environment is also the foundation for maintaining other MySQL architectures.
Note that the MySQL single instance can be used in the learning and development environment, but this method has poor availability and disaster recovery. If the database is used as a business system, try not to use this method.
(2) MySQL master-slave architecture
MySQL master-slave is a master/slave environment that backs up the MySQL database and restores one or more MySQL instances. Run the change master command to specify the newly restored MySQL instances. Read the change log from that MySQL node and apply it locally to make the newly restored instance consistent with the original MySQL instance data.
So, the instance where the original data changes consistently is called the master node; The node that obtains logs from the master node and applies them locally to keep the data consistent with the master node is called the slave node. Such an architectural environment is called a master-slave environment.
Master-slave is a characteristic function of MySQL database and a common architecture of MySQL database application in production environment.
Through master-slave architecture, the online database can have multiple copies of data, playing a certain data backup function. Data changes from the Slave database are implemented only through the application of logs. Generally, data write is not actively generated, but external data read services can be provided. In this way, the read performance and throughput of services can be greatly improved by adding several Slave Slave databases to enable the services that only read data to be queried on the Slave Slave database. In the Internet industry, most data read operations are far higher than data write operations. Data is written on the master node and read on the slave node. The architecture of read/write separation meets service requirements.
Of course, the master-slave architecture of MySQL can be implemented flexibly. One master node can have one or more slave nodes. A slave node can also be used as a slave node of another node. If a slave node is followed by other nodes as slave nodes of this node, it is called cascading replication.
Master-save architecture of MySQL improves the performance, availability and scalability of MySQL database on the basis of MySQL single instance architecture, and also provides a basic guarantee for MySQL database to pursue higher availability.
(3) MySQL MHA high availability architecture
Although the master-salve master-slave architecture of MySQL database makes the database have multiple copies, these maste master nodes and slave slave nodes are still relatively independent, especially if the master master node fails, it still cannot provide database services. In order to achieve higher availability of database in case of various failures and special situations, it is necessary to realize higher availability through other components on the basis of master-slave.
There are many high availability schemes for MySQL, but at present, the mainstream and mature scheme is MySQL + MHA high availability architecture.
Simply put, in order to achieve higher availability, based on the master-slave environment, the IP address of the service connection to the master, including the actual IP address of the master host, is changed into a virtual VIP or domain name. The application program accesses the database through the VIP to read and write data. In normal cases, services read and write data on the master. If the master node fails, the HA component detects the fault and switches the VIP to the slave slave library. Meanwhile, it transfers and applies logs from the slave slave library to ensure that the data on the slave node is as consistent as possible as the data before the master node fails. In this way, the new slave node can still provide database services.
Of course, for the specific implementation, in addition to the master-slave structure of MySQL, the VIP and database log equalization scheme also has a variety of implementation methods, but also can be customized, and some companies do not use open source software, directly develop their own to achieve various functions of high availability components.
At present, MySQL + MHA is the mainstream and mature implementation of high availability. There may be some other more perfect implementation of high availability in the future, but they can be categorized into the implementation of improved availability.
summary
The MySQL single-instance architecture, mysqlmaster-slave architecture, and MySQL+MHA high-availability architecture can meet the various universal requirements of the MySQL database. In other words, the MySQL database can meet the requirements of the single instance architecture, MySQLmaster-slave architecture, and MySQL+MHA high-availability architecture. These are the three infrastructures that you must use and master in the MySQL database architecture.
Five special business requirements architectures
With these three common MySQL infrastructures, most MySQL database scenarios and problems can be well met and solved. But some special scenarios, or some special problems, can also use a database other than MySQL database, a specific class or several class of problem solutions. In view of these special business requirements, I will first describe and explain the problems to be solved, and then propose corresponding solutions.
(1) MySQL + distributed Proxy horizontal extension architecture
Question: if further expand business scale, reading and writing level especially writing level near the point of very large, such as data is written to hundreds of thousands of every second, and even millions of, the amount of data every day there are hundreds of millions or even billions of scale, the reading and writing are far from a master node can support, and it has to be expanded.
Generally speaking, the expansion of MySQL can be divided into two ways: vertical expansion by vertical split according to different services, and horizontal expansion of library table by a certain strategy of library and table. These two methods can be used separately or in combination. However, when dealing with large amounts of data and high concurrent reads and writes, the main approach is MySQL horizontal expansion.
MySQL horizontal extension idea
The database database and table table on a server are always limited by the resources of a server. Even if all aspects of the hardware of the server have reached the maximum configuration, there will still be bottlenecks. For business access, if there is a database and a table in a Proxy layer or middleware layer, the Proxy layer can map and transform them into multiple databases and tables on multiple servers at the bottom according to certain rules. In this way, it is equivalent to multiple servers jointly supporting one business. The capacity that can be supported is related to the number of underlying servers. In the case of no Proxy bottleneck, the more underlying servers, the more performance and capacity of the entire horizontal scaling cluster, almost linear scaling. In this way, the problem of massive data storage and concurrency can be solved.
Specifically, in addition to MySQL database, horizontal extension cluster requires a distributed Proxy middleware. There are many kinds of horizontal extension middleware, which are developed by MySQL officials and some large companies. We use MyCAT middleware a lot. For low-level shards, there can be dozens, hundreds, or even thousands.
Of course, horizontal expansion can solve the problem of large data volume, which requires a sharding strategy. Accordingly, there will be restrictions on the use of this strategy, such as the selection of chip keys, cross-node access, distributed transactions and other problems. Only after corresponding combination and consideration with business can it be well used.
(2) TokuDB/MyRocks/InnoDB high-performance write architecture
Problem: MySQL database horizontal split, can be for large data read and write linear expansion, but the corresponding number of underlying servers also need to be more; However, high-performance write architectures are more suitable for large data writes, small data reads, and large data volumes.
When the amount of service data written and read is very large, data insert writing performance and data compression efficiency are required. Such special writing requirements require special optimization and design for data writing, and good compression efficiency and algorithms to compress large amounts of data and save space. This write architecture is usually seen as a special storage engine for MySQL databases.
In terms of implementation, MySQL’s high-performance write cluster can use the TokuDB storage engine. In recent years Facebook has also opened source its internal implementation of MyRocks as a high-performance write storage engine. MySQL’s default InnoDB storage engine, optimized in the new 5.7 and later versions for better write performance and compression performance, can also be used as an option for data writing.
(3) MySQL + cache (Memcached, Redis, etc.) high concurrent read architecture
Problem: If services access some small amount of hot data in MySQL, the concurrency of access is very high, the timeliness of access and the consistency of data are very high. At this time, the MySQL database itself supports data reading, which may lead to bottlenecks in concurrency and timeliness. At this point, you can use a caching system in conjunction with MySQL.
The cache system stores a small amount of hot data from the MySQL database into the memory. Since the I/O efficiency of the memory is much higher than that of the hard disk, the corresponding CPU consumption is also much lower. Therefore, the time required for the cache system to respond to a service request is much shorter than the time required to directly access the MySQL database. Therefore, the cache system can support high concurrent access of hot data. Data is written or written to the MySQL database. Data read operation has priority to read from the cache. If there is one in the cache, the result is returned directly from the cache; If not in the cache, data is read from the MySQL database and returned to the application, and the data results are put into the cache.
In terms of implementation, Memcached and Redis are commonly used technical architectures for caching systems. Memcached is a classic caching system, previously used in conjunction with LAMP and LNMP popular architectures. Redis is an emerging key-value NoSQL database. In addition to being used as a cache, Redis can also be persisted as a key-value database.
(4) MySQL + small file system (MongoDB, Ceph, etc.) large field access architecture
Problem: the MySQL database usually stores small fields that conform to the principles of a relational database, such as numeric and character data. But in a real environment, in addition to these common fields, there will be some big fields, such as user’s head the image files, audio and video files to upload, post content such as text fields, there are some JSON, XML files, etc., these can be stored in the MySQL database in binary, but read and management will be more troublesome. At this point, you can use a small file system in conjunction with MySQL.
Small file systems are systems that can store and quickly access structured data. For large fields such as pictures, audio, video, TXT files, JSON files and XML files, only simple read and write operations are generally performed. These fields are stored in the small file system, and the corresponding access links are stored in the tables of the MySQL database. In this way, the database table can be used to quickly read and write file location information. In a small file system, the file location information can be used to quickly read and write large fields.
In terms of specific implementation, small file system also has many technical software, such as MongoDB document-type NoSQL database and Ceph distributed small file system.
(5) MySQL + Inforbright/Greenplum statistical analysis architecture
Problem: Queries that respond to business needs in real time on MySQL databases, usually OLTP services, but for data that has been generated, there is usually a need for summary results and statistical analysis after the next day. Such OLAP requirements are usually executed infrequently, but consume a lot of resources per execution. If they are run on the same system as OLTP, the two types of services will interact. This is where the MySQL database combined with OLAP statistical business classification architecture can be used.
After service data is generated by MySQL, statistics, aggregation, and analysis of the data of the previous day are usually performed on the next day to reflect the operation of the service. This is to enable MySQL to support online OLTP service, through the data flow program, the data generated every day to the offline data warehouse system, in the data warehouse system, statistical analysis of various data, summary of results, and data statistics and then transferred to the results display library. In this way, online OLTP and offline OLAP can be used and executed together.
In terms of concrete implementation, for OLAP data warehouse architecture that MySQL database can combine, Inforbright data warehouse or Greenplum distributed MPP database warehouse can be selected. Inforbright data warehouse is relatively lightweight and similar to MySQL; Greenplum distributed MPP data warehouse can support statistical analysis of massive data, with stronger functions, performance, capacity and cost than Inforbright.
summary
The five special business architectures of MySQL can be said to be targeted solutions to special business scenarios and problems encountered on the basis of the three common and general architectures of MySQL.
For large data reads and writes, a horizontal scaling architecture can be used; For a large number of data writing requirements, you can use MySQL high-performance write architecture; For the demand of high concurrency and fast response of hot data, MySQL+ cache architecture can be adopted. For special large field access requirements, MySQL+ small file system architecture can be used. For offline statistical analysis requirements, MySQL+ statistical analysis architecture can be used.
Architecture composition and synthesis
The three common infrastructure architectures and the five special requirements architectures of MySQL can be used independently, in combination, or together.
(1) Architecture combination
For architectures with only one or two special cases, a simple combination of infrastructure and special architectures is fine. The types of architecture combinations available in production are:
-
MySQL+MHA high availability architecture combined with MySQL distributed Proxy horizontal extension architecture
-
MySQL+MHA high availability architecture combined with MySQL small file system large field access architecture
-
MySQL+MHA high availability architecture combined with MySQL cache high concurrent read architecture
-
MySQL distributed Proxy horizontal extension architecture combined with MySQL small file system large field access architecture
-
MySQL distributed Proxy horizontal scaling architecture combined with MySQL cache high concurrent read architecture
-
MySQL high-performance write architecture combined with MySQL Inforbright/Greenplum statistical analysis architecture
(2) Architecture synthesis
For more complex business scenarios, several specific database architectures can be used together:
MySQL+MHA high availability architecture, MySQL distributed Proxy horizontal expansion architecture, MySQL cache high concurrent read architecture, MySQL small file system large field access architecture, MySQL Inforbright/Greenplum statistical analysis architecture.
Ii. Classification of business environment
The first part of the MySQL architecture is explained, this is the understanding of the MySQL database itself, counted as “confidant”. All database systems provide services to the object is the business system, so THE DBA to understand the business system, the characteristics of the business and the suitable scenario, do know, can be regarded as “know the other”. If you know yourself and your enemy, you’ll be able to understand them better.
Derive data usage classification from database usage
From the perspective of database operations, the operations of a business system on a database can be divided into two categories: “read data” and “write data”. In terms of expansion, data writing can be divided into insert data, update data modification, and DELETE data deletion. Therefore, the use of database can also be subdivided into insert, delete delete, change update, check select four situations.
Business systems can be classified according to their operation classification of data:
(1) Read-only business system
Read-only is a condition where there is only a query select and no data is modified. In this case, there are already some data in the database, and these data are only used for query or display, there will be no data changes, such as the data in the cache, archived data, historical result data, statistical result data, etc., can only be queried and displayed, there will be no data changes.
(2) Read-write business system
According to the specific conditions of write operations, read-write service systems can be divided into three categories:
-
A read-write business system that inserts inserts without Update or Delete
In this case, the data in the index data table will only increase, and the data in the table can not be changed, there will be no modification or deletion operations; Such as operation record form, state change record form, message record form and so on.
-
Insert and Update, and no Delete can read and write business systems
In this case, the data in the index table can be added and modified, but once produced, the data can be changed but not modified. This is also a common database design idea, in which a table can be invalidated, but deleted effectively, not actually deleted from the table, but changes the column values of the status bits in the table, so that the data is always traceable.
-
Insert, Update, Delete all have read-write business system
In this case, the data in the index table can be operated on, can be queried, can also be changed, and can be added, deleted, or removed.
2
Common service table classification
Tables are classified from a business perspective, and although they are used differently by different applications, they can be abstracted into several broad categories of tables.
(1) Configuration table
This type of table usually stores some basic configuration information or dictionary information of the business. The amount of data in a table is generally small, and the modification operation is not frequent. It is usually a Select query operation.
(2) Status table
This kind of table is usually stored in the business system entity read image state information, the common user information table, order information table, etc.. The amount of data in this table is directly related to the size of entity read images, such as how many registered users an APP has and how many records the user table of this APP usually has. The status table changes frequently. Insert, Update, and Select operations occur. The existence of Delete operations depends on service requirements.
(3) Log table
This kind of table is usually used to record the state information of some entity in the business system, such as user login table, recharge information record table and so on. The data size of the table is usually large. If the service status changes frequently and a lot of change information is recorded, the data volume and insert performance of the table are required to be high. Log table operations are mainly Insert operations, and individual services will query the log table. High performance write architecture in MySQL five special requirements architecture is mainly the application of this table requirements.
(4) Archiving form
This table is used to archive or separate the data of the above three OLTP service tables. On the one hand, the data scale of the online business table can be controlled to ensure the performance of the business table. On the other hand, archiving can be used to better reflect and support archiving historical data. The amount of data in an archiving table is related to the size of the online table and the archiving period. Archiving table operations, in addition to the archiving process of data loading, mainly is the Select query operation, archiving is read-only table.
(5) Statistical data table
Statistical data table refers to the original data table of various on-line tables and archived tables that needs to be transferred to the on-line OLAP statistical analysis system through ETL process when the business needs off-line statistical analysis. Such tables are usually very large, and an OLAP statistical analysis platform aggregates data from multiple online business systems for statistical analysis. In addition to data flow, the operation of statistical data table is mainly the access calculation of various statistical analysis programs.
(6) Statistical results table
The statistical result table is the result data that various statistical analysis processes access in the statistical data table and perform statistical analysis according to certain logic when services have offline statistical analysis requirements. This kind of statistical result data, usually the data amount is relatively small. The operation of the statistical result table, besides processing the result flow action, is mainly for accessing the interface for Select query.
By sorting out the types of business tables, we can make a general division of all business systems and get a good idea.
DBA’s grasp of the business
The business system can be divided into four types by the way of data use, and then the common business table types can be divided to have an overall understanding of the general business use database. However, you need to confirm and consider specific business scenarios based on the actual situation of each company.
In most cases, a specific business can be classified into a certain type of business according to the situation, but the specific magnitude of each business will be different. You need to first understand the magnitude of the specific business environment, and then according to the use of the MySQL architecture, the corresponding is ok.
If it is confirmed that there is a situation in the actual environment that is not in the existing classification, the existing thinking can be used to carry out new type classification and architecture correspondence.
Principles for the combination of MySQL services and architecture
The previous two parts explain the various MySQL architectures and classify the business environment to give you a certain understanding of the MySQL architecture and business environment. Next, I will explain the combination of MySQL business and architecture by using the principle of the combination of the two in architecture design and operation and maintenance.
Applicability principle
The principle of applicability is that when considering which or several business scenarios to use for a specific business scenario, we should try to use the appropriate technical architecture to solve the appropriate problems.
(1) Requirements and scenarios
The three common MySQL infrastructures are applicable to more scenarios. However, when the data magnitude, access scale, and read/write mode of a common service scenario change greatly, it becomes a scenario with special requirements. You can use the MySQL architecture technology corresponding to a specific scenario to ensure the applicability as much as possible.
On the other hand, if the actual service does not meet special scenarios in terms of magnitude, scale, and read/write mode, use a common infrastructure to meet service requirements and reduce system complexity and risks.
(2) Whole and parts
No matter for a business system or MySQL database architecture, we should look at and consider both the whole and the part.
A business system, first of all, is a whole, from the overall view of various business needs and use, grasp the whole, and then consider the specific needs; If there are no special requirements, design and consider common scenarios; If a certain part has special requirements, it can be separated out and the corresponding architecture design can be carried out.
Multiple common and special architectures are combined to complete an architectural ensemble that supports business systems.
(3) Stability and upgrade
Under normal circumstances, the business system first uses the common architecture to support data. When the common architecture is applicable, the business system can also run stably. In the process of continuous operation of the business system, when new business scenarios are required, the steps and stages to ensure the stability of the existing business and upgrade the business system to the new architecture should be considered comprehensively.
Generally, do not upgrade all at once. It is recommended to test first, then go online, and gradually upgrade in batches.
Stage principle
There are stages in the development of business systems, and there are stages in the development of MySQL database architecture. Different phases focus on different information and main processing ideas. Considering phases from different dimensions is also an important principle for using architecture and business.
(1) Quantity stage
Quantity is an obvious indicator of stage judgment. Business systems often have metrics such as DAU, UV, and PV to help determine the size of the business system. Database system, QPS, TPS, the amount of data in a table, the number of tables in a library, the number of libraries in an instance, the total number of instances, and the number of servers are all closely related to the architecture.
Take table data volume as an example: If a table runs for a year and the number is less than 100,000, it can be considered as a small table. If the data volume is more than 100,000 to 10 million, it can be considered as a medium table. The amount of data in more than 10 million, can be considered as a large table, then need to consider archiving or horizontal split; If the amount of data is more than 100 million, it must be processed separately with a special architecture.
(2) Unified organization
When the business scale and data scale are small, it can be maintained if there are many different architectures. But as the number of database instances and business modules grows, it becomes important to unify one or a few data architectures. Unifying the architecture organization makes business systems and architectures easier to control and maintain.
(3) Scale control
When the business develops to a certain scale, the database in the underlying architecture must control the scale. An instance and a table should not be too large. If the scale exceeds the agreed size, the strength splitting or table splitting is required to keep the instance and library table in the unified set scale.
Extensibility principle
As application services change over time, the underlying MySQL architecture also needs to be extensible to a certain extent and maintain room for change and expansion to support the development of services.
(1) Opening up between architectures
It can be seen from the three basic architectures of MySQL, namely, MySQL single-instance architecture →MySQL master-slave master-slave architecture →MySQL MHA high-availability architecture, which are gradually evolving and directly dependent on each other. The subsequent InnoDB Cluster architecture launched by Oracle is also directly related to the evolution of these infrastructures.
The architecture of the other five special requirements can be adjusted from one special architecture to another according to the changes in the special situation as the business classification changes.
(2) OLTP and OLAP
Database systems are generally divided into OLTP and OLAP, but these two requirements need to be supported in the current business system and architecture design. As long as a relatively stable and reliable data flow system is established and the two are connected, the communication between OLTP and OLAP can be easily realized. The business data of OLTP is transmitted to OLAP for statistics, and the statistical results of OLAP are returned to OLTP for display.
(3) Use of new architecture
In addition to the common three infrastructures and five special architectures in MySQL architecture, there are some new technologies and trends to try to improve and solve some problems of the existing architecture, such as InnoDB Cluster technology, which will have a better solution to the expansion and high availability of MySQL.
Although these new technologies are not yet fully stable and mature, it is easy to extend the existing architecture into a new technology architecture to better solve business problems once the new technology architecture is stable and mature.
Afterword.
This paper tries to explain MySQL architecture and business from three aspects: MySQL architecture, business environment classification, and the principle of combining MySQL business and architecture. It hopes to make everyone understand the architecture and business from the perspective of architecture, and to be able to face and solve various business problems at a deeper level by analogy. Some of the details that are not architecturally relevant are not fully explored in this article and will be explained in a future article.