The architecture of a large website is generally very different from that of a small one, and the technical concerns that have to be weighed differ as well.

This article is also published on my personal WeChat official account, “Front-end Hours”. Feel free to follow it!

01 Preface


Recently I became interested in the architecture of large websites, so I read a book on the subject and wrote down my thoughts.

We know that the software design behind Taobao, Weibo, 12306, and similar sites must differ from that of the software we ordinarily build, because those sites store huge volumes of data, serve huge numbers of users, and face high concurrency (many requests arriving at the same instant). If any one link in the chain is done poorly, overall performance is bound to suffer, because the weakest link limits the whole system (the “short board” effect).

Events such as timed ticket rushes, Double 11 shopping, and trending topics on Weibo can bring servers down and overwhelm the network. Network congestion may be part of the cause, but what matters more is whether the site’s architecture can sustain high concurrency and high availability (7×24 service). Let’s look at how the architecture of a large site is built up step by step.

02 Website Features


On the surface we see two characteristics: a large volume of visits and high concurrency. These are the two most pressing problems an ordinary user would think of, and users care about little else. The engineers developing the site, however, have many more factors to consider. Broadly, they are the following:

  • High concurrency

How much simultaneous traffic must the website withstand? During Double 11 shopping, for example, concurrency can reach the order of 100 million, a load that ordinary websites simply cannot bear. Handling it takes not only more servers but also careful design of how those servers work together, among other factors.

  • High availability

At first I didn’t know what high availability meant; put simply, it means providing service 7×24. Since you can’t rule out users visiting your site in the middle of the night, the service has to stay up and running at all times. By contrast, some small websites or systems are updated at midnight (0:00), with access restricted during the update.

  • Massive data storage

Large websites have enormous numbers of ordinary users, so we need to think about how to store user data and browsing records. Take WeChat, which we use every day: the Moments posts and chat messages produced daily add up to a massive volume of data, stored in Tencent’s dedicated server clusters (many servers working together).

  • High security requirements

There is no denying that we make financial transactions every day, such as WeChat or Alipay transfers. If you think about it, a change in your money is just a change in a number, which is a little worrying. Your Alipay balance, for example, is just a number sitting there; the money itself has been put to other uses by Alipay, yet it has to be available again when you withdraw it or perform other operations. Every one of these processes has to be secure.

  • Frequently changing requirements

The back end collects information about users in order to improve product features and the product experience, and users themselves ask for new features. Both create demand to update the product’s functionality. Software is first developed to deliver some function and is then updated continuously through iterative development.

  • Incremental development

However big a website is, it started small, just as the tallest building is laid brick by brick. Incremental development differs from traditional software design: it does not expect the software to be complete or the full feature set to be known in advance. Instead, the site improves itself through continuous development, and through continuous operation of the product it meets users’ needs and keeps up with the times.

03 Design Evolution


You may have heard of the term “LAMP.” It was the standard setup in the early days of website design; it suited small sites, and on its own it does not meet the needs of today’s large sites. Because there was little data at the beginning, a single server was enough to run a real website: the operating system was Linux, the web server Apache, the database MySQL, and the development language PHP.

As the business grows, a website is improved and evolves continuously, and a set of technical approaches with recognizable patterns has emerged. Every stage of that evolution is driven by the business: if your site doesn’t have the requirement, the engineers won’t take on a major redesign. As the book says, it is business that makes the technology, and business that makes the people.

Initial development stage

Business demand is low, and free, open source software can be used to build systems with simple configurations.

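Separating application services from data services

As the business grows, the performance of a site running on a single server inevitably degrades, so this is the point at which to separate the services: the application and the data (such as the database) move onto their own servers.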

Use the cache

By the 80/20 rule, roughly 80% of user accesses go to about 20% of the site’s functions and data. If we serve that most-requested portion well, most users are served well, so we can use caching to return the resources users need quickly.
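To make the caching idea concrete, here is a minimal sketch in Python. It is not taken from the book: the class name `HotDataCache` and the function `load_product_from_database` are hypothetical, and the "database" is just a function standing in for a slow query. The point is only that repeated requests for the same popular item are answered from memory.

```python
from typing import Callable, Dict


class HotDataCache:
    """In-memory cache placed in front of a slower data source (illustrative only)."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                    # hypothetical slow lookup, e.g. a database query
        self._entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._entries:
            self.hits += 1                   # served from memory, the data source is not touched
            return self._entries[key]
        self.misses += 1
        value = self._load(key)              # first request for this key pays the full cost
        self._entries[key] = value
        return value


def load_product_from_database(product_id: str) -> str:
    # Stand-in for a real query; an assumption made for this example.
    return f"product page for {product_id}"


cache = HotDataCache(load_product_from_database)
for product_id in ["p1", "p1", "p2", "p1"]:
    cache.get(product_id)
print(cache.hits, cache.misses)  # 2 2
```

A real cache would also need an eviction policy and a way to invalidate stale entries; both are left out here to keep the sketch short.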

Application Server Cluster

As business volume grows and features keep being added, a single server may no longer cope, so we run several servers that handle requests at the same time. It is like sharing user requests among several people: overall throughput improves.
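One simple way to share requests among the servers in a cluster is round-robin, sketched below. This is only one of several possible strategies, and the server names `app-1` to `app-3` and the class `RoundRobinBalancer` are made up for the example.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands each incoming request to the next server in turn (illustrative only)."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle: Iterator[str] = itertools.cycle(servers)

    def pick(self) -> str:
        # Each call returns the next server, wrapping around at the end of the list.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Round-robin assumes the servers are roughly equal and that requests cost about the same; other strategies weigh servers by capacity or by current load.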

Database read/write separation

Beyond the application layer, how data is handled also matters. Every data operation is either a read or a write, and users generally issue far more reads. So we separate reads from writes: one database accepts writes, another serves reads to users, and data is synchronized between them (master/slave replication).
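The routing side of read/write separation might look like the sketch below. Everything in it is an assumption for illustration: `DatabaseNode` is an in-memory stand-in for a database server, and replication is done by copying synchronously inside `write`, whereas real master/slave replication is typically asynchronous, so a replica can briefly lag behind.

```python
from typing import Dict, List, Optional, Tuple


class DatabaseNode:
    """In-memory stand-in for one database server (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across the replicas."""

    def __init__(self, primary: DatabaseNode, replicas: List[DatabaseNode]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._primary = primary
        self._replicas = replicas
        self._next_replica = 0

    def write(self, key: str, value: str) -> str:
        self._primary.rows[key] = value
        # Simplification: copy to replicas immediately instead of replicating in the background.
        for replica in self._replicas:
            replica.rows[key] = value
        return self._primary.name

    def read(self, key: str) -> Tuple[str, Optional[str]]:
        # Rotate through the replicas so that read traffic never reaches the primary.
        replica = self._replicas[self._next_replica % len(self._replicas)]
        self._next_replica += 1
        return replica.name, replica.rows.get(key)


router = ReadWriteRouter(
    DatabaseNode("primary"),
    [DatabaseNode("replica-1"), DatabaseNode("replica-2")],
)
written_to = router.write("user:1", "Alice")
first_read = router.read("user:1")
second_read = router.read("user:1")
print(written_to, first_read, second_read)
```

Because reads usually outnumber writes, adding replicas is a cheap way to scale the read side while the primary stays dedicated to writes.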

Load balancing and CDN

A CDN is needed for websites with large traffic whose users are spread across the country or even the world. A user in the south who accesses a server in the north experiences some delay, and a user in the United States who accesses a site hosted in China experiences even more. A CDN is a content delivery network: the server closest to the user returns the data directly, which is much faster.

Then there is load balancing. When content cached in the CDN is missing or has expired, the user’s request goes on to the data center, where it first reaches the load balancing server (in practice often a reverse proxy), which also keeps a cache. In short, both the CDN and the load balancing server described here rely on caching.
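The expire-then-fall-back behaviour described above can be sketched as a cache with a time-to-live (TTL). The names `EdgeCache` and `fetch_from_data_center`, the 30-second TTL, and the path `/logo.png` are all assumptions for this example; the clock is passed in so the example can control time instead of sleeping.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """One hypothetical CDN edge node that caches responses for a fixed time."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float],
    ) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._ttl_seconds = ttl_seconds
        self._clock = clock                                # injected so time can be simulated
        self._store: Dict[str, Tuple[float, str]] = {}     # path -> (expires_at, body)

    def get(self, path: str) -> Tuple[str, str]:
        """Return (body, source), where source is "edge" or "origin"."""
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now < cached[0]:
            return cached[1], "edge"                       # still fresh, answered near the user
        body = self._fetch_from_origin(path)               # missing or expired, go to the data center
        self._store[path] = (now + self._ttl_seconds, body)
        return body, "origin"


def fetch_from_data_center(path: str) -> str:
    # Stand-in for a request that travels all the way to the origin site.
    return f"content of {path}"


current_time = [0.0]
edge = EdgeCache(fetch_from_data_center, ttl_seconds=30.0, clock=lambda: current_time[0])

sources: List[str] = []
for t in (0.0, 10.0, 31.0):
    current_time[0] = t
    _, source = edge.get("/logo.png")
    sources.append(source)
print(sources)  # ['origin', 'edge', 'origin']
```

The same pattern applies one level down: a caching server in front of the application servers would answer from its own cache before passing the request further in.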

Distributed file systems and distributed database systems

Distributed storage here means splitting the data so that different businesses’ data lives on different servers, which reduces the pressure on each one. For example, you can store users’ order data on database server A and users’ profile information on database server B.
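The order/user example can be written down as a small routing table. This is a sketch under assumptions: `BusinessDataRouter` is a hypothetical name, the two servers are plain dictionaries rather than real databases, and the record contents are invented.

```python
from typing import Dict, List


class BusinessDataRouter:
    """Stores each kind of business data on its own server (illustrative only)."""

    def __init__(self, server_by_business: Dict[str, str]) -> None:
        self._server_by_business = dict(server_by_business)
        # One dictionary per server stands in for that server's database.
        self._storage: Dict[str, Dict[str, dict]] = {
            server: {} for server in set(self._server_by_business.values())
        }

    def server_for(self, business: str) -> str:
        try:
            return self._server_by_business[business]
        except KeyError:
            raise ValueError(f"no server configured for business {business!r}") from None

    def save(self, business: str, record_id: str, record: dict) -> str:
        server = self.server_for(business)
        self._storage[server][f"{business}:{record_id}"] = record
        return server

    def records_on(self, server: str) -> List[str]:
        return sorted(self._storage[server])


router = BusinessDataRouter({
    "orders": "database-server-A",   # order data goes to server A
    "users": "database-server-B",    # user information goes to server B
})
order_server = router.save("orders", "1001", {"item": "book"})
user_server = router.save("users", "42", {"name": "Alice"})
print(order_server, user_server)
```

Splitting by business keeps each server’s load and schema focused on one concern; the cost is that queries spanning orders and users now have to be combined in the application.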

NoSQL and search engines

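As storage and retrieval needs grow more complex, non-relational (NoSQL) databases and dedicated search engines are introduced. The search engine serves the website’s search function.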
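The article does not say how a site search engine works internally. A common building block is an inverted index, which maps each word to the documents that contain it; the sketch below shows that idea with made-up product titles and is not a description of any particular search product.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every word to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical product titles used only for this example.
products = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red wool jacket",
}
index = build_index(products)
print(search(index, "running"))     # [1, 2]
print(search(index, "red jacket"))  # [3]
```

Looking words up in the index avoids scanning every record for each query, which is why search is usually handed to a dedicated system rather than to the main database.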

Business splitting

Divide a website into several applications, each deployed and maintained independently. For example, you can pull one feature out and embed it in the site through an interface it exposes, while its logic runs on a different server.

Distributed service

Common business functions are extracted and deployed independently as shared services; applications then complete specific business operations by calling these shared services through distributed service calls.

04 Summary

Look at today’s Internet companies: only a few are at the BAT (Baidu, Alibaba, Tencent) level. Most are small companies gradually growing their own business, and they do not have the time or energy to research and develop every field. Most companies do have their own specialty, though. Developing your own business and serving your users well is enough; building lots of fancy things often brings no benefit.

“What small websites most need to do is create value by providing good service to users, win users’ recognition, survive, and grow wildly.” (Li Zhihui, Technical Architecture of Large Websites)

Some companies want to change their architecture simply because many new technologies have come out recently. That is technology for technology’s sake. The intentions may be good, but the results can be bad if the change does not follow where the business is going. Likewise, do not blindly copy the technical solutions of big companies; build the technology that suits your own business.

In general, every company now has its own technical solution, and whether to refactor it or change how servers are distributed depends on whether you actually need to. Internet technology keeps maturing, and some resources can simply be bought: many companies offer cloud services that are stable and of good quality, since they are run by big companies. If you want more high-quality resources, it comes down to how much you pay. It is simple and worry-free, because the provider has already solved the problem for you.

Finally, I would like to recommend the book “Technical Architecture of Large Websites: Core Principles and Case Analysis”. The author has strong technical skills and offers distinctive analysis, and the book is well worth reading in depth.

References

  • Li Zhihui, Technical Architecture of Large Websites: Core Principles and Case Analysis