The outbreak delayed the start of the school term across the country. To keep students from falling behind, the Ministry of Education launched a “classes suspended, learning continues” policy that encourages teachers and students to move to online teaching.
In response to the government’s call, Shanghai Yishan Mu Education Technology Co., Ltd., the company behind our site, offered free online teaching materials to students from grade one to grade three in Hubei province, and later extended the offer to the whole country. This public welfare campaign drew a strong response: user registrations and page traffic on the website rose to several times their usual levels.
Although only four days passed between planning and launch, our technical team, through sound decision-making and timely, precise execution, completed the business code changes, architecture evaluation, capacity upgrades, and SQL optimization. Working closely with the UCloud UDB team, we also restructured and expanded the database. As a result, we handled a roughly 20-fold increase in QPS and completed the technical support task.
Here we share some details and lessons from this process as a reference for companies that need to scale quickly.
Business pain points
Since its founding in 2005, the company has focused on K12 online education. During the planning phase of this public welfare campaign, the business side drew on more than ten years of experience in Internet and offline education to quickly design a workable plan. It attended to many details, such as accelerated access channels, a flow in which learning starts in only two steps, and QR codes that parents could share.
The task of the technical team was to anticipate the rapid growth of back-end traffic, prepare contingency plans ahead of time, find and close gaps, and keep the campaign running smoothly. To this end, we carefully sorted out the weak points of the existing architecture and drew up a targeted expansion plan.
Our earlier growth had been slow and steady, but this time we expected a surge in traffic, so the first task was to analyze the capabilities and weaknesses of the existing architecture.
The main application architecture was ULB load balancing + two UHost instances running a monolithic application + a distributed cache + a single high availability UDB instance, with heavy use of UCloud cloud components. We reviewed each layer of the architecture thoroughly and evaluated its capacity. ULB can carry large volumes of traffic, so we focused on the back end. UHost runs the application service. We measured how many concurrent requests a main-site module deployed on an 8-core, 16 GB UHost could withstand, compared vertical scaling with horizontal scaling of the application modules, and decided to scale horizontally. Because traffic to the peripheral single-purpose services is driven by the main application's traffic, we adjusted their expansion plans accordingly. For the cache layer, we added distributed cache capacity and optimized how the application accesses the cache. The greatest pressure falls on the database, which could become a serious performance bottleneck.
For MySQL, PostgreSQL, and MongoDB alike, we use UCloud’s managed UDB service, including for our core business database. Compared with self-built databases, UDB provides a robust, highly available, maintenance-free service that lets us focus with confidence on building the upper-layer business logic. For data security, UDB provides comprehensive backup and recovery mechanisms to guard against accidental data loss.
Three decisions for rapid expansion
1. Online vertical upgrade
We initially chose a high availability UDB deployment for our core business database. During the stable period, the UDB instance was modestly configured for cost-effectiveness, with an estimated maximum load of under 3,000 QPS. On the eve of the traffic surge, we first upgraded the UDB instance vertically, significantly increasing its memory and disk. This operation is fast and has little impact on business access (only an interruption of a few seconds during the disk upgrade), and the instance’s processing performance improves right away.
2. High-performance read/write separation
However, as the service grew rapidly, we expected database load to exceed what a single UDB instance could bear. Upgrading the database from a single UDB instance to a cluster became imperative.
The UDB team helped us design the solution. An analysis of our SQL requests showed a read/write ratio of about 50:1, typical of a read-heavy, write-light service. The team therefore recommended a database cluster of one master and multiple slaves behind a read/write separation Proxy.
In this architecture, data is synchronized from the master (a high availability UDB) to the slaves (single-node UDBs) through asynchronous replication. The database IP address used by the service is changed to the IP address of the read/write Proxy. The Proxy identifies whether each SQL statement is a read or a write, forwards writes and reads inside transactions to the master, and distributes ordinary reads across the slaves in proportion. The forwarding ratio can be configured freely in the console. In addition, the read/write Proxy is almost 100% compatible with MySQL syntax and protocols, so services can connect to it without modification.
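The routing rule described above can be illustrated with a minimal Python sketch. This is not the UDB Proxy's implementation: the class name `ReadWriteRouter`, the node names, and the weights are all hypothetical, and classifying a statement by its first keyword is a simplifying assumption (a real proxy parses the MySQL protocol).

```python
import random
import re

# Assumption: a statement is a read if it starts with one of these keywords.
_READ_PREFIX = re.compile(r"^\s*(select|show|describe|explain)\b", re.IGNORECASE)
_FOR_UPDATE = re.compile(r"\bfor\s+update\b", re.IGNORECASE)


class ReadWriteRouter:
    """Toy router with hypothetical names; not the UDB Proxy's actual code."""

    def __init__(self, master, slaves, rng=None):
        # `slaves` maps a node name to its read weight, e.g. {"slave-1": 1}.
        self.master = master
        self.slaves = dict(slaves)
        self.rng = rng or random.Random()

    def route(self, sql, in_transaction=False):
        """Return the name of the node that should execute `sql`."""
        is_read = bool(_READ_PREFIX.match(sql))
        # SELECT ... FOR UPDATE takes row locks, so it must run on the master.
        locking_read = is_read and bool(_FOR_UPDATE.search(sql))
        if not is_read or in_transaction or locking_read or not self.slaves:
            return self.master
        names = list(self.slaves)
        weights = [self.slaves[name] for name in names]
        # Weighted random choice spreads ordinary reads in proportion.
        return self.rng.choices(names, weights=weights, k=1)[0]


router = ReadWriteRouter("master", {"slave-1": 1, "slave-2": 3})
print(router.route("UPDATE users SET name = 'a' WHERE id = 1"))  # master
print(router.route("SELECT * FROM users", in_transaction=True))  # master
```

With the weights above, about three quarters of ordinary reads go to `slave-2`. Because replication is asynchronous, a read sent to a slave may briefly lag behind the master, which is one reason reads inside a transaction stay on the master.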
This architecture looks like a perfect answer to the heavy database traffic of a read-heavy, write-light workload, but some crucial technical details still had to be worked out.
One of them is the forwarding performance of the read/write separation middleware. In theory, the read performance of a database cluster improves linearly as slave nodes are added horizontally. But as slave nodes are added, will the read/write separation middleware become a new bottleneck?
We took this question seriously. As a first step, we estimated a series of indicators: the performance ceiling of the existing UDB instance, the performance ceiling of UDB under the existing access pattern after the vertical upgrade, and how UDB’s read and write throughput would relate to the number of slave nodes after the upgrade to a master-slave cluster.
The second step was to work with the UDB team on a stress test of the read/write separation Proxy.
We used Sysbench on two physical machines to measure the read performance of the whole read/write separation cluster as the underlying UDB nodes grew from 1 master to 1 master plus 6 slaves. The results showed that, thanks to the Proxy’s full use of multi-core CPUs and several rounds of code-level performance tuning, the read/write Proxy forwards reliably and is not a performance bottleneck for the cluster. The read performance of the cluster can therefore grow almost linearly with the number of slave nodes.
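The bottleneck question can be framed as a simple two-tier model: read throughput is capped by whichever is smaller, the combined capacity of the slaves or the combined capacity of the Proxy nodes. The Python sketch below illustrates that reasoning only. The per-slave figure of 30,000 QPS is a hypothetical placeholder, and treating 200,000 QPS as a per-Proxy ceiling is an assumption made for illustration, not a result of the stress test.

```python
def cluster_read_qps(slave_count, per_slave_qps, proxy_count, per_proxy_qps):
    """Read throughput is bounded by the slower of the two tiers."""
    database_tier = slave_count * per_slave_qps
    proxy_tier = proxy_count * per_proxy_qps
    return min(database_tier, proxy_tier)


def saturating_slave_count(per_slave_qps, proxy_count, per_proxy_qps):
    """Smallest slave count at which the proxy tier caps read throughput."""
    proxy_tier = proxy_count * per_proxy_qps
    return -(-proxy_tier // per_slave_qps)  # integer ceiling division


# Hypothetical capacities, chosen only to illustrate the shape of the curve.
PER_SLAVE_QPS = 30_000
PER_PROXY_QPS = 200_000

for slaves in range(1, 9):
    print(slaves, cluster_read_qps(slaves, PER_SLAVE_QPS, 1, PER_PROXY_QPS))
```

Under these assumed numbers, throughput grows linearly up to 6 slaves and flattens from the 7th onward. Adding a second Proxy node doubles the proxy tier and moves that point further out.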
The data was reassuring, and the UDB team helped us draw up the full UDB expansion plan. We adjusted our business code accordingly, and the whole solution could be rolled out with users of the site barely noticing. From then on, the expansion of back-end services proceeded in an orderly way. Each expansion was carried out in the early hours of the morning, when traffic was at its lowest. First we expanded the main-site application cluster, then upgraded the database to the read/write separation cluster, and then expanded the peripheral single-purpose services layer by layer.
Operation
In the 10 days from February 1 to February 10, average daily registrations on our site rose from 8,300 to 30,000, and the highest number of registrations in a single day reached 58,000. We also provided free online services such as micro-lessons, question bank exercises, and homework groups to more than 1 million students in different regions of China, contributing to “classes suspended, learning continues” and to making high-quality educational resources publicly available.
During this period, the throughput (QPS) of our service database increased nearly 20-fold, and the peak QPS of a single online read/write separation Proxy instance exceeded 200,000. After the upgrade to the read/write separation cluster, a configuration of 1 master, 6 slave nodes, and 2 read/write separation Proxy nodes in active-active mode handled the rapid growth of service traffic. Our technical architecture has grown into that of a medium-sized Internet service.
3. New high availability architecture: raising the connection limit
After the first online peak on February 10, the business kept growing rapidly. The master-slave read/write separation cluster had no trouble with the high QPS, but as business modules were added, the number of database connections opened by the business during the morning peak on February 12 reached the high availability UDB product’s cap of 6,000.
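One way to see such a cap coming is to add up the worst-case connections each module can open: the number of application instances multiplied by the maximum connection pool size per instance. The Python sketch below shows that arithmetic. Only the 6,000 cap comes from the text; the module list and pool sizes are hypothetical.

```python
UDB_HA_CONNECTION_CAP = 6000  # connection cap cited in the text


def total_connections(pools):
    """Sum worst-case connections over (instance_count, pool_size) pairs."""
    return sum(instances * pool_size for instances, pool_size in pools)


def headroom(pools, cap=UDB_HA_CONNECTION_CAP):
    """Connections left before the cap; negative means the cap is exceeded."""
    return cap - total_connections(pools)


# Hypothetical modules: (application instances, max pool size per instance).
modules = [(20, 100), (10, 150), (8, 250)]
print(headroom(modules))               # 500
print(headroom(modules + [(4, 200)]))  # -300
```

In this made-up example, a single new module is enough to push the total past the cap. Horizontal scaling of the application tier multiplies connections in the same way, even when QPS per connection is low.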
To solve this problem at its root, the UDB R&D team proposed upgrading the back-end architecture of the high availability UDB from the traditional VIP + agent + DB design to a new floating VIP + dual-master DB design. They worked out a transparent upgrade plan overnight that could move from the old architecture to the new one within 2 minutes without migrating any data. We approved the plan, and it was carried out in the early morning so that the next day’s business could run normally.
The new high availability UDB architecture relies on UCloud’s stable virtual network VIP management service and simplifies the design to a plain floating VIP + dual-master DB. This removes one forwarding hop from the data path, eliminates a potential performance bottleneck, and simplifies the control module, which reduces the number of things that can go wrong. The new architecture is also more compatible with the native databases (MySQL and PostgreSQL).
Slow query optimization
Advance planning and rapid expansion gave us more headroom to deal quickly with other hidden problems exposed by the traffic surge. The most typical example was slow database queries.
Our back-end code uses an ORM framework to connect to MySQL. Because the ORM layer hides the details of the underlying MySQL databases and tables, a small part of the code that accesses MySQL did not take into account how MySQL would actually execute it, which produced too many slow queries. Slow queries had no obvious impact under the small, slowly growing traffic of normal times, but they could not be ignored under the heavy traffic of the epidemic period.
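A query that looks harmless at the ORM level can still force a full table scan when the filtered column has no index. The Python sketch below shows how inspecting the execution plan exposes this. It uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere; it is not MySQL, and the `homework` table and index name are hypothetical. In MySQL, the analogous check is to run `EXPLAIN` on the SQL that the ORM generates.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homework (id INTEGER PRIMARY KEY, student_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO homework (student_id, title) VALUES (?, ?)",
    [(i % 1000, f"task {i}") for i in range(10000)],
)

# The kind of lookup an ORM filter on student_id would typically generate.
lookup = "SELECT id, title FROM homework WHERE student_id = ?"

plan_before = query_plan(conn, lookup, (42,))  # full scan: no usable index
conn.execute("CREATE INDEX idx_homework_student ON homework (student_id)")
plan_after = query_plan(conn, lookup, (42,))   # now searches via the index

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table, while the second reports a search that uses `idx_homework_student`. The exact wording of the plan text varies between SQLite versions.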
The UCloud DBA team helped us a great deal during this period; they have rich experience in locating and resolving slow queries. During the outbreak they overcame the distractions of working from home and the inconvenience of remote communication, staying in touch with us at all times through WeChat and remote meetings. Combining their expertise with our understanding of the business logic, we located the problems together, reviewed the database tables and indexes that had been added as the business evolved, and in the end resolved the slow queries effectively.
Final thoughts
This rapid response to an unexpected surge in demand, from plan to implementation, was an effective exercise in a cloud user and a cloud vendor working together and playing to their respective strengths. Our extensive use of UDB reflects our confidence in its two core capabilities: elasticity and fully managed operation. Through this collaboration, we also came to appreciate its rapid response, contingency planning, and 24/7 online support.
The UCloud UDB team is reportedly about to launch a new product, kuaijie UDB. It is based on an architecture that separates compute from storage and, combined with the layered hybrid storage design behind UCloud Data Ark, can quickly back up database data and restore it to any point in time with second-level granularity. This avoids the data loss and slow recovery that follow an accidental database deletion.