Traditionally, a standard blog website is built from three parts: Nginx as a reverse proxy, an application service, and a MySQL database.

The blog is adding more than 3,000 articles every day, and it is already very slow. If I build an app later, the data volume will certainly grow. How can I keep access fast then?

If the site is already slow at 3,000+ new articles per day, start with modest architectural improvements.

Optimization of traditional architecture

So, can running several application services behind a load balancer improve response times under concurrent requests? And should Redis be added to improve read performance? Of course. That is the path you have to take.

Nginx acts as the load balancer. Requests and responses between the client and the web containers are stateless, so Nginx distributes load to the web containers in IP-hash mode, mainly to keep each client's web session on the same container. If the web containers come under pressure, add servers. Often, though, the pressure is not there; it comes from the database. The next step is therefore read/write splitting, commonly with one write node and two read nodes. This spreads the read and write load that would otherwise fall on a single database.
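The IP-mode stickiness described above can be sketched in a few lines. This is only an illustration, not Nginx's actual algorithm: the backend names are hypothetical, and CRC32 stands in for whatever hash the proxy really uses. Like Nginx's ip_hash, the sketch keys on the first three octets of an IPv4 address, so every client in the same /24 range lands on the same container.

```python
import zlib

# Hypothetical web containers behind the proxy.
BACKENDS = ["web-1", "web-2"]


def pick_backend(client_ip):
    """Pick a backend from the first three octets of an IPv4 address."""
    prefix = ".".join(client_ip.split(".")[:3])
    # CRC32 is an assumption made for this sketch, not Nginx's hash.
    return BACKENDS[zlib.crc32(prefix.encode()) % len(BACKENDS)]
```

Calling pick_backend("10.0.0.1") and pick_backend("10.0.0.200") returns the same container, which keeps the session in one place but also means a busy address range can load one container more than the other.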

This is done by splitting the database into a master and slaves as described above. Database queries and updates have to be changed to target the right node, and intercepting annotations at the Service layer keeps the amount of code change small. In the figure, the red arrows are writes to the master and replication to the slaves, and the gray arrows show how the web containers spread query load across the slaves.
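The idea of routing at the Service layer through annotations can be sketched with Python decorators. This is a minimal sketch under assumptions: the node names, the decorator names, and the two service functions are all hypothetical, and the functions return strings instead of talking to a real database.

```python
import itertools
from functools import wraps

# Hypothetical node names; a real setup would hold connection pools here.
MASTER = "mysql-master"
SLAVES = ["mysql-slave-1", "mysql-slave-2"]

# Round-robin over the read nodes (one write node, two read nodes).
_slave_cycle = itertools.cycle(SLAVES)


def read_only(func):
    """Route the decorated service function to the next slave."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(next(_slave_cycle), *args, **kwargs)
    return wrapper


def writes(func):
    """Route the decorated service function to the master."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(MASTER, *args, **kwargs)
    return wrapper


@read_only
def get_article(node, article_id):
    return f"SELECT on {node} for article {article_id}"


@writes
def save_article(node, article_id, body):
    return f"INSERT on {node} for article {article_id}"
```

Callers use get_article(1) and save_article(1, "text") without naming a node, so the routing decision stays in one place and the business code barely changes.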

When reads still face heavy concurrency, Redis can be added as a query cache to improve read performance further.

As shown in the figure above, Redis serves two purposes. First, it can act as a shared session pool for the two web containers, decoupling sessions from the containers in a distributed environment. The biggest benefit is that the Nginx proxy no longer needs IP hash, since binding by IP is prone to skewed traffic. Second, Redis can work with a data access framework such as MyBatis as a second-level cache for read operations, which makes the most of database read performance.

Once this step is done, reads can basically be scaled horizontally. The bigger problem is database writes: if you have reached this point, I expect writes have become a bottleneck too. As the saying goes, it never rains but it pours.
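The read-cache role of Redis can be sketched as a cache-aside lookup. To stay runnable without a server, the sketch uses a small in-memory stand-in that offers only get and setex (the redis-py client has methods with these names); FakeRedis, load_article_from_db, and the key format are assumptions made for illustration.

```python
import time


class FakeRedis:
    """In-memory stand-in for a Redis client (get and setex only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry expired, drop it
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


DB_CALLS = []  # records every query that reaches the database


def load_article_from_db(article_id):
    """Hypothetical query against a read node."""
    DB_CALLS.append(article_id)
    return f"article body {article_id}"


def get_article_cached(cache, article_id, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    body = load_article_from_db(article_id)
    cache.setex(key, ttl_seconds, body)
    return body
```

Two consecutive calls for the same article reach the database only once. A write path would also need to delete or refresh the cached key, which this sketch leaves out.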

For MySQL, optimizing writes is much harder than optimizing reads, and it often involves restructuring the data. The usual approach is splitting databases and tables (sharding). For typical dynamic data, for example, a table is split by range as the data grows, the resulting tables are stored across distributed MySQL databases, and a centralized routing table handles registration and discovery of the distributed databases and tables. A write therefore has to look up the target database address in the routing table first. Don't use this mode unless you have to, because it complicates things more than anything else! At the very least, read/write splitting has to be replanned from scratch, and cross-table aggregation becomes coupled to the application layer.
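The routing-table lookup that precedes every write can be sketched as a range search. The ranges, database addresses, and table names below are invented for illustration; a real routing table would live in its own store and be updated as new tables are registered.

```python
import bisect

# Hypothetical routing table, sorted by the first article id of each shard.
ROUTING_TABLE = [
    (1, "mysql://db-a/articles_0"),
    (1_000_001, "mysql://db-b/articles_1"),
    (2_000_001, "mysql://db-c/articles_2"),
]
_LOWER_BOUNDS = [lower for lower, _ in ROUTING_TABLE]


def route(article_id):
    """Return the database address whose id range contains article_id."""
    index = bisect.bisect_right(_LOWER_BOUNDS, article_id) - 1
    if index < 0:
        raise ValueError(f"no shard registered for id {article_id}")
    return ROUTING_TABLE[index][1]
```

Even in this toy form, every write depends on the routing table being correct and available, which is part of why the mode adds so much complexity.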

MySQL partitioning offers RANGE, LIST, HASH, and composite partitioning. Note that partitions must be planned around business needs. For example, articles are written with a clear time pattern, so RANGE partitioning by date works well. Some articles, however, are hot and see very frequent interaction; those articles and their interactions should be tagged with a heat label. The heat label then serves as the partitioning key of a LIST partition, and composite partitioning moves the hot interaction data into the hot partition.

Optimization of hybrid architecture

If the pressure on a single database is still huge, a relational RDBMS alone is no longer enough as the storage form, and you need to introduce a key-value (K-V) NoSQL store to absorb writes.

Here is a hack: on the MySQL master, swap the InnoDB storage engine for MyRocks. This requires strict business compatibility testing.

MyRocks is essentially RocksDB underneath, and RocksDB writes dozens of times faster than traditional databases (ideally on SSD storage). See another of my answers: Why do distributed databases use KV Store so much? Analyzing the underlying data structures explains why K-V storage is so strong at writes. Range lookups are not as good as in a traditional RDBMS, but read/write splitting makes up for that. The biggest uncertainty with this trick is the stability of replication between the write and read nodes, which must be tested, tested, and tested again!

OK, let's speculate about how Baidu and Zhihu cope with writing hundreds of millions or even billions of records every day.

At that scale, writing to a single MySQL database is basically impossible; the I/O of one machine cannot sustain it. The solution is a hybrid of distributed NoSQL and a relational database cluster: a distributed database with a K-V storage model handles the frequent inserts and updates, while the integrity of business relationships is ultimately settled in the relational database cluster, since relational tables remain the better fit for complex, dense business relationships.

As for Baidu and Zhihu, my guess is that operations with high real-time demands, such as continuously editing an article, read and write K-V data on a distributed big data platform, where the interim work is done without rushing to update the dense business relationship tables. After the article is formally submitted, there is presumably a delayed queuing process before the RDBMS maintains the transactional integrity of the relational tables.

The above is only a guess by inference and is not necessarily accurate; it is for reference only.

As the figure shows, the Hadoop big data platform is introduced to take advantage of HBase's very high K-V read and write performance, especially for draft editing of article content, which is essentially a near-real-time operation. If thousands of people did this online against MySQL, database writes would collapse! So whatever K-V data can make up a document should go into HBase's sparse tables as far as possible. An update is really just a new version of the content. HBase does not have to worry about complex relationships and only deals with article editing.
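The point that a draft update is just a new version of the content can be sketched with a toy versioned K-V table. This is not the HBase API: DraftStore, its method names, and the version limit are assumptions that only model the idea of a sparse table keeping a few recent versions per cell.

```python
from collections import defaultdict


class DraftStore:
    """Toy stand-in for a sparse, versioned K-V table (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> column -> list of (version, value), newest last
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value):
        """Store a new version of a cell and return its version number."""
        versions = self._rows[row_key][column]
        next_version = versions[-1][0] + 1 if versions else 1
        versions.append((next_version, value))
        # Keep only the newest max_versions entries.
        del versions[:-self.max_versions]
        return next_version

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it was never set."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def history(self, row_key, column):
        """Return the retained (version, value) pairs, oldest first."""
        return list(self._rows.get(row_key, {}).get(column, []))
```

Each save of a draft is a single put on one row, with no joins or relationship tables involved, which is the property the paragraph above relies on.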

When the author considers the article finished and submits it, it enters a review state. The review process can make full use of a messaging system: text review events are generated, sensitive-word filtering, adult-content detection, and real-time stream processing are applied, and subscriptions drive the pipeline that delivers the data to the relational database cluster, where the complete transactional relationships are formed. The problem of high-concurrency writes is thus turned into a problem of queue-driven computation over massive data. Once articles are queried from the relational database cluster, building a caching mechanism and a distributed query system on top is much easier!
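The submit, review, and persist flow described above can be sketched with an in-process queue. Everything here is a simplification under assumptions: queue.Queue stands in for a real messaging system, a dict stands in for the relational database cluster, and the sensitive-word list and function names are invented.

```python
import queue

SENSITIVE_WORDS = {"forbidden"}  # hypothetical filter list

review_queue = queue.Queue()  # stand-in for the messaging system
relational_store = {}         # stand-in for the relational database cluster
rejected = []                 # ids of articles that failed review


def submit_article(article_id, text):
    """Publish a review event instead of writing to the database directly."""
    review_queue.put({"id": article_id, "text": text})


def passes_review(text):
    """Very rough sensitive-word filter."""
    return not any(word in SENSITIVE_WORDS for word in text.lower().split())


def drain_review_queue():
    """Consume pending review events and persist the approved articles."""
    while True:
        try:
            event = review_queue.get_nowait()
        except queue.Empty:
            break
        if passes_review(event["text"]):
            relational_store[event["id"]] = event["text"]
        else:
            rejected.append(event["id"])
```

The submit call returns as soon as the event is queued, so a burst of submissions becomes a backlog for the consumer to work through rather than a burst of database writes.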

Finally

I have said a lot, but the distributed data processing at the giants is certainly far more complicated than my reasoning. I am only offering a direction to think in, from the standpoint of what is technically reasonable. When building a high-concurrency, massive-data website, follow a process: spend as little as you can, go deeper step by step, and avoid over-engineering at the start. In short, do not use database and table sharding on a relational database unless absolutely necessary, and be careful when you do! Once it is in place it is hard to reverse, and operations gets overwhelmed by the complexity of maintaining the data.

This is an original article from the public account "Read Bytes". If you repost it, please credit the source.