With the rise of Internet applications and cloud computing, the focus of architecture design and software technology has shifted from how to implement complex business logic to how to handle highly concurrent access requests from huge numbers of users.

Even a simple computing process becomes a completely different technical challenge once it faces massive user access: the software development methods, the organization of the technical team, and the software process management all change completely.

Take Sina Weibo as an example. At the beginning, Sina Weibo had only two engineers, one front end and one back end, and together they developed Sina Weibo within a week. Many years later, Sina Weibo has a technical team of thousands of people, who must deal with the technical challenges of increasingly complex features on the one hand, and the pressure of highly concurrent access from an ever-growing user base on the other.

Such challenges and pressures are much the same for all large Internet systems. Although Taobao, Baidu, and WeChat provide different functions, they all face the same pressure of highly concurrent user access requests. After all, the technical architecture that serves a function for a few people is completely different from the one that serves the same function for hundreds of millions.

As more and more users access the system at the same time, the system consumes more and more computing resources: more CPU and memory to process users’ computing requests, more network bandwidth to transmit users’ data, and more disk space to store it. When this consumption exceeds what the server can provide, the server crashes and the whole system becomes unavailable.

So how do we solve the problem of highly concurrent user requests?

Vertical and horizontal scaling

To cope with the resource consumption caused by highly concurrent user access, one solution is vertical scaling. Vertical scaling means improving the processing capability of a single server: giving it more CPUs with more cores, larger memory, faster network cards, and more disks. As each server becomes more capable, the processing capability of the system increases.

Before large-scale Internet applications emerged, the software systems of traditional industries such as banking and telecommunications relied mainly on vertical scaling: upgrading the server hardware to improve system capability. When the business grew, the users multiplied, and the server’s computing capacity could no longer meet requirements, a more powerful computer was bought: a faster CPU and network card, larger memory and disks, upgrading from a server to a minicomputer, from a minicomputer to a midrange computer, and from a midrange computer to a mainframe. The servers became more and more powerful and their processing capability stronger and stronger, but of course the price also became more and more expensive and the operation and maintenance more and more complex.

The cost of vertical scaling does not grow linearly with the server’s processing power; that is, you cannot keep buying the same amount of computing power for the same amount of money. The more computing power a machine already has, the more each additional increment costs: a server with twice the performance may cost far more than twice as much.

At the same time, limited by the state of computer hardware technology, the computing capacity of a single server cannot grow indefinitely, while the computing demands of the Internet, and especially of the Internet of Things, are almost unlimited.

Therefore, Internet and Internet of Things systems do not rely on vertical scaling, but on horizontal scaling.

So-called horizontal scaling means using more servers rather than improving the processing capability of a single machine with more expensive, faster, more powerful hardware. These servers form a distributed cluster that provides a unified service to the outside world, thereby improving the overall processing capability of the system.

However, to make many servers work as a whole, the architecture must be designed so that each server becomes part of the overall system: the servers have to be organized effectively so that together they raise the system’s processing capability. This is the distributed architecture commonly used in Internet applications and cloud computing.

Evolution of Internet distributed architecture

Distributed architecture is a technical architecture that Internet companies developed gradually as their services grew rapidly. It includes a series of distributed technical solutions: distributed caching, load balancing, reverse proxies and CDNs, distributed message queues, distributed databases, NoSQL databases, distributed file systems, search engines, microservices, and so on, together with the distributed architecture solutions that integrate these technologies.

These distributed technologies and architecture solutions evolved step by step to meet the ever-increasing computing and storage demands of highly concurrent users as Internet applications kept growing. It is fair to say that almost all of these technologies were driven directly by application requirements.

Let’s look at the development history of a typical Internet application to see how an Internet system gradually adopts one distributed technology after another and grows into a complex, huge distributed system.

In the earliest days, the system has only a few users, like the Weibo just mentioned. A single application accessing a database and a file system on its own server constitutes a stand-alone system, which is enough to meet the needs of a small number of users.

If the system proves to be commercially viable and valuable, the number of users grows rapidly. For example, Sina Weibo invited celebrities to open accounts, which quickly attracted large numbers of their fans. At that point, the server can no longer withstand the access pressure and needs its first upgrade: separating the database from the application.

In the previous stand-alone scheme, the database and the application were deployed together. In this first separation, the application, the database, and the file system are deployed on different servers; going from one server to three roughly triples the system’s processing capability.

This separation costs almost nothing technically: the database and the file system just have to be deployed remotely and accessed over the network.
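In practice, the change can be as small as pointing the application’s configuration at remote hosts instead of the local machine. Here is a minimal sketch in Python; the host names and connection strings are hypothetical, purely for illustration:

# Stand-alone deployment: application, database, and files on one server.
standalone = {
    "db_dsn":   "postgresql://127.0.0.1:5432/weibo",
    "file_url": "file:///var/data/weibo",
}

# After the first split: three servers, reached over the network.
separated = {
    "db_dsn":   "postgresql://db.internal:5432/weibo",
    "file_url": "nfs://files.internal/var/data/weibo",
}

def database_address(settings: dict) -> str:
    # The application code is identical either way; it only reads the
    # address from configuration, so moving the database to another
    # server is purely a deployment change.
    return settings["db_dsn"]

print(database_address(standalone))
print(database_address(separated))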

As users increase further and more fans join Weibo, the three servers can no longer bear the pressure, and caching is needed to improve performance.

So-called caching means keeping a copy of the data the application needs to read, so that the application reads the data from the cache instead of from the database. There are two kinds of cache: the distributed cache and the local cache. A distributed cache combines multiple servers into a cluster, storing more cached data and providing stronger caching services to the applications.

Caching helps in several ways. On the one hand, the application no longer needs to access the database for cached reads: database data is stored on disk, which takes more time to access, while cached data is stored in memory, which takes much less. On the other hand, data in the database is kept in raw form, while data in the cache is usually kept in result form, for example as an already constructed object; the cache stores the object itself, so it does not need to be computed again, which saves computation time and reduces the pressure on the CPU. Most importantly, reading from the cache reduces the load on the database, which is usually the bottleneck of the whole system; reducing the database load improves the processing capability of the entire system.
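The most common way an application uses a cache is the cache-aside pattern: try the cache first and fall back to the database on a miss. Below is a minimal sketch in which the cache and the database are stubbed out as in-memory dictionaries; in a real system the cache would be a distributed cache such as Redis or Memcached, and the key and function names here are hypothetical:

import time

cache = {}    # stands in for a distributed cache cluster
database = {"user:1": {"name": "celebrity", "followers": 1000000}}

CACHE_TTL = 60    # seconds a cached entry stays fresh

def get_user(key: str) -> dict:
    # 1. Try the cache first: a hit avoids the database entirely.
    entry = cache.get(key)
    if entry and entry["expires"] > time.time():
        return entry["value"]

    # 2. On a miss, read the raw data from the database ...
    value = database[key]

    # 3. ... and store the result so that later reads are served
    #    from memory instead of from disk.
    cache[key] = {"value": value, "expires": time.time() + CACHE_TTL}
    return value

print(get_user("user:1"))   # miss: reads the database and fills the cache
print(get_user("user:1"))   # hit: served from memory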

As users increase further, with more celebrities joining in and bringing more fans, the application server can become a bottleneck again, since it has to maintain connections to a large number of concurrent users; the application server therefore needs an upgrade. By placing a load balancing server in front, the application servers are deployed as a cluster, and more application servers can be added to handle user access.
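The core idea of load balancing is simply to spread incoming requests across the cluster. The sketch below shows round-robin distribution, the simplest strategy; a real deployment would use a dedicated load balancer such as Nginx or LVS, and the server names here are hypothetical:

import itertools

# The application server cluster behind the load balancer.
app_servers = ["app-1:8080", "app-2:8080", "app-3:8080"]

# Round-robin: each request goes to the next server in turn,
# so the load is spread evenly across the cluster.
next_server = itertools.cycle(app_servers)

def route(request_id: int) -> str:
    server = next(next_server)
    return f"request {request_id} -> {server}"

for i in range(5):
    print(route(i))

Notice that adding capacity is now just a matter of appending another server to the list, which is exactly what makes horizontal scaling work.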

On Weibo, the main user operation is browsing microblogs, that is, reading them. If only the celebrities posted and the fans merely read, the database would not be under much pressure, because the microblog data could be served from the cache. But in fact the fans post microblogs too, and posting means writing data; once again the database becomes the bottleneck of the whole system. A single database cannot handle that much access pressure.

The solution at this point is database read/write splitting: through data replication, one database is split into two. The primary database is responsible for write operations, and every write is replicated to the secondary database, ensuring that the data in the secondary stays consistent with the primary, while the secondary database mainly serves read operations.

In this way, one database server is scaled horizontally into two database servers, providing more data processing capability.
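At the application layer, read/write splitting often amounts to a small router that sends writes to the primary and reads to the secondary. A minimal sketch, with the two servers stubbed out as dictionaries and replication simulated in-process (real systems replicate asynchronously over the network):

# Stand-ins for the two database servers.
primary = {}
secondary = {}

def write(key: str, value: str) -> None:
    # All writes go to the primary database ...
    primary[key] = value
    # ... and are replicated to the secondary so the two stay consistent.
    secondary[key] = value

def read(key: str) -> str:
    # Reads are served by the secondary, taking load off the primary.
    return secondary[key]

write("post:42", "a fan's microblog")
print(read("post:42"))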

For most Internet applications, such a distributed architecture is already enough to handle the concurrent access pressure of their users. But large-scale Internet applications such as Sina Weibo must also solve massive data storage and query, along with the network bandwidth pressure and access latency they bring. In addition, as the business keeps growing more complex, how to develop and deploy the system in a low-coupling, modular way becomes an important technical challenge.

Massive data storage is solved mainly by distributed databases, distributed file systems, and NoSQL databases. When querying the database directly can no longer meet the performance requirements, an independent search engine needs to be deployed to provide the query service. At the same time, to reduce the network bandwidth pressure on the data center and give users lower access latency, CDNs and reverse proxies provide pre-caching, returning static file resources to users as early as possible.
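A reverse proxy or CDN node sits in front of the application and answers with cached static resources whenever it can, so those requests never reach the data center. A minimal sketch of that decision; the paths and contents are hypothetical:

edge_cache = {"/static/logo.png": b"image bytes"}

def handle(path: str) -> bytes:
    # Static resources are answered directly at the edge.
    if path in edge_cache:
        return edge_cache[path]
    # Dynamic requests still travel on to the application servers.
    return forward_to_origin(path)

def forward_to_origin(path: str) -> bytes:
    return f"origin response for {path}".encode()

print(handle("/static/logo.png"))
print(handle("/api/feed"))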

To make each subsystem more flexible and easier to extend, distributed message queues are used to decouple related subsystems, which collaborate through message publishing and subscription. With a microservice architecture, logically independent modules are deployed and maintained independently at the physical level; application systems complete their business logic by combining multiple microservices, achieving higher-level reuse of modules, so the system can be developed and maintained more quickly.
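The decoupling that a message queue provides comes from the fact that a publisher never calls its subscribers directly. Here is a minimal in-process publish/subscribe sketch; a real system would use a message broker such as Kafka or RabbitMQ, and the topic and handler names are hypothetical:

from collections import defaultdict

# topic -> list of subscriber callbacks; stands in for a message broker.
subscribers = defaultdict(list)

def subscribe(topic: str, handler) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, message: dict) -> None:
    # The publisher knows only the topic, not who consumes the message;
    # this is what decouples the subsystems from one another.
    for handler in subscribers[topic]:
        handler(message)

# Two independent subsystems react to the same event.
subscribe("post.created", lambda m: print("search index updates post", m["id"]))
subscribe("post.created", lambda m: print("fan feeds refresh for post", m["id"]))

publish("post.created", {"id": 42, "text": "a new microblog"})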

When microservices, message queues, NoSQL databases, and other distributed technologies first emerged, they were relatively difficult to use, with a high barrier to entry, and were applied only in fairly large Internet systems. But as these technologies have matured in recent years, and especially as cloud computing has become widespread, the barrier to entry has fallen, and many small and medium-sized systems now widely use these distributed technology architectures to design their own Internet systems.

Summary

As the Internet becomes more and more popular, more and more enterprises carry out their business in an Internet-oriented way. A traditional IT system has a limited and well-defined number of users: the users of a supermarket system are mainly the supermarket’s cashiers, and the users of a banking system are mainly the bank’s tellers. But if those supermarkets and banks use the Internet to carry out their business, the number of users of their application systems may increase by a factor of thousands.

When these massive numbers of users access the enterprise’s back-end systems, they generate highly concurrent access pressure and consume huge computing resources. How to add computing resources to withstand that highly concurrent user access pressure is the core driving force of Internet architecture technology.

The means of adding them are mainly distributed technologies, and I will explain some typical distributed technology architectures in the following sections.