Evolution of the technical architecture of large web sites

I recently read parts of “large website technology architecture”. These notes record how a website's architecture evolves, along with some of my own thoughts.

Evolution history:

1. Initial site architecture

In the early days of a site, a single server plays every role: the application, the database, and the files are all deployed on it. Many individual developers probably do the same thing, sometimes even storing media files in the database. Traffic is low, so there is no pressure and there are no problems.

2. Separate application services from data services

As a website grows, I think this step is inevitable. The book also points out that application servers and data servers have different hardware requirements. Application servers do a lot of computation and must provide stable service, so they need a high-performance CPU. I think they may also need plenty of memory, since the intermediate results of computations are held in memory. A data server needs large disks, and as the data grows it also needs a lot of memory. File servers are similar to data servers. In some cases, data and file services also need a GC-like mechanism that periodically cleans up expired data, so that the service keeps running steadily while the data keeps growing.

3. Use caching

Cached data usually starts out in the application server's own memory. With a lot of data, though, keeping a large cache puts pressure on the memory the application server needs to serve requests, and the local caches on different servers are not synchronized with each other. When a server restarts or goes down, its cached data is lost and the load hits the database all at once. So in addition to a local cache, a remote distributed cache is needed. With a cache cluster, the failure of one cache server does not send the whole site's traffic straight to the database. This brings its own technical difficulties: how to keep cached data in sync, and how to avoid a cache avalanche.
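As a rough illustration of the caching idea, here is a minimal cache-aside sketch in Python. It is not from the book. The names TTLCache, get_with_cache and load_from_db are made up for this example, and the randomised expiry time is just one common way to reduce the risk of a cache avalanche, under the assumption that many keys are written at about the same moment.

```python
import random
import time


class TTLCache:
    """In-memory cache whose entries expire after a jittered time-to-live."""

    def __init__(self, base_ttl=60.0, jitter=0.2, clock=time.monotonic):
        self.base_ttl = base_ttl
        self.jitter = jitter      # fraction of base_ttl added or removed at random
        self.clock = clock        # injectable so the cache can be tested without sleeping
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        # Randomise the TTL so keys written together do not all expire together.
        spread = self.base_ttl * self.jitter
        ttl = self.base_ttl + random.uniform(-spread, spread)
        self._store[key] = (value, self.clock() + ttl)


def get_with_cache(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:             # note: a stored None is also treated as a miss
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

A distributed cache would replace the dictionary with calls to a cache cluster, but the read path stays the same: check the cache, and only go to the database on a miss.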

4. Application server cluster

As traffic grows, a single application server can no longer handle it, and several servers are needed to share the load. Here I think two modestly configured servers are better than one highly configured server. The higher the configuration, the higher the price, and even a well-configured server can still fail. With a cluster, the service as a whole stays highly available: a problem on one server does not make the site inaccessible or cause transactions to fail, because traffic can be sent to the other servers. Another point that comes to mind is that the load balancing layer can be added in front of the application servers early on. When more servers are needed later, only the load balancer's configuration has to change, so the site can be expanded quickly and flexibly. One drawback is that checking logs becomes less convenient: you may need to search several machines to find which one holds the error log.
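To make the idea of a load balancing layer concrete, here is a small round-robin sketch in Python. It is only an illustration: the class RoundRobinBalancer and the server names are hypothetical, and a real site would normally use an existing load balancer rather than code like this.

```python
class RoundRobinBalancer:
    """Hands out application servers in turn, skipping ones marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0            # position of the next server to try

    def add_server(self, server):
        # Scaling out only touches the balancer's own configuration.
        if server not in self.servers:
            self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application servers")
```

For example, after mark_down("app1") every call to pick() returns one of the remaining servers, and add_server("app3") brings a new machine into rotation without touching the application code.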

5. Separate database reads and writes

With multiple application servers, the application tier may no longer be under much pressure, but the database still is: many servers hold connections to it and send requests constantly. At this point reads and writes can be separated. For reads, as long as there are no slow SQL queries, even frequently run queries should not cause much trouble. After the split, the site can also decide whether it is read-heavy or write-heavy and give the read and write servers different hardware to relieve the pressure. One problem that may arise is master-slave synchronization, that is, whether the data stays consistent between the two.
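The routing decision behind read/write separation can be sketched in a few lines of Python. This is a simplified illustration, not how any particular database driver works: ReadWriteRouter and the require_fresh flag are names invented here, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the master and spreads reads across the slaves."""

    READ_PREFIXES = ("select", "show")

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Cycle through the slaves in turn; None means there are no slaves.
        self._slave_cycle = itertools.cycle(self.slaves) if self.slaves else None

    def route(self, sql, require_fresh=False):
        """Return the server that should run this statement."""
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        # Reads that must see the latest write go to the master, because a
        # slave may lag behind during master-slave synchronization.
        if is_read and self._slave_cycle is not None and not require_fresh:
            return next(self._slave_cycle)
        return self.master
```

The require_fresh flag is one simple way to live with replication lag: most reads tolerate slightly stale data, and the few that do not are sent to the master.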

6. Use reverse proxy and CDN to speed access

By the time a site reaches this stage, I feel it is already fairly large, serving users across provinces at the very least. A reverse proxy server that caches a site's static resources can reduce the load on the servers behind it. A CDN speeds things up further. Physical distance is real, after all: a request from Hainan cannot reach a server in Xinjiang instantly. So servers are placed in locations closer to users, and users read data from a nearby one, which makes network latency much lower.

7. Use distributed file systems and distributed databases

I suspect not many websites are large enough to reach this stage. The reasoning for databases and file systems is the same as for application servers above: if one server cannot cope, add more, and then possibly split the data across them according to the business. Solving problems through the combined strength of many machines seems to be a fairly common idea in architecture.
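One simple way to spread data over several database servers is to pick a shard from a hash of the key. The Python sketch below assumes this hash-based approach purely for illustration. The book does not prescribe it, and shard_for and ShardedStore are hypothetical names, with plain dictionaries standing in for real database servers.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

A limitation of this simple scheme is that changing the number of shards moves most keys to a different shard, which is one reason splitting by business area is often tried first.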

8. Use NoSQL and search engines

NoSQL and search engines come in when different kinds of data have different requirements. Personally, I am not sure what scenarios and data volumes make these technologies necessary, since an ordinary search can often be done directly in the database. A search engine probably becomes worthwhile when you need full-text search and want faster queries.
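The reason a search engine can answer full-text queries faster than scanning every row is the inverted index. Here is a toy version in Python to show the idea. It is a sketch only: InvertedIndex and tokenize are names made up for this example, and real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the ids of the documents that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

A lookup touches only the entries for the query's terms, while a database query such as LIKE '%word%' generally has to examine every row.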

9. Split applications

Actually, I do not think this should be the last stage. Splitting by business should happen somewhere in the middle of a site's development: each project is responsible for its own module, and the modules communicate through a distributed framework or a message queue. You do not want to put the split off too long, because once the code is tightly coupled it is very hard to break apart.
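Communication between split modules through a message queue can be sketched with Python's standard library. In this illustration an in-process queue.Queue stands in for a real message broker, and the order and inventory modules, the event format, and the stock numbers are all invented for the example.

```python
import queue
import threading


def order_service(events, order_id, item):
    """Order module: publishes an event and does not call inventory directly."""
    events.put({"type": "order_created", "order_id": order_id, "item": item})


def inventory_worker(events, stock):
    """Inventory module: consumes events and updates its own data."""
    while True:
        event = events.get()
        if event is None:         # sentinel value: stop the worker
            break
        if event["type"] == "order_created":
            stock[event["item"]] -= 1


def run_demo():
    events = queue.Queue()        # stand-in for a real message queue
    stock = {"book": 3}
    worker = threading.Thread(target=inventory_worker, args=(events, stock))
    worker.start()
    order_service(events, 1, "book")
    order_service(events, 2, "book")
    events.put(None)              # tell the worker to finish
    worker.join()
    return stock                  # two orders were placed, so one book is left
```

The order module only knows the event format, not the inventory module's code, which is what keeps the two loosely coupled and easier to split into separate projects.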

Reflections:

1. I feel that the pitfalls mentioned in the book are all very relevant and offer good guidance. For example, we should not blindly pursue the solutions of large companies. A big company's scheme is not necessarily suitable for us. Copied wholesale, it may simply fail to fit and add to the engineers' rework burden. Instead, we should learn from the strengths of large companies' architectures and then build a model that suits our own company's characteristics.

2. We should not use technology for technology's sake. This is also well said. Using a technology just for the sake of using it, rather than for the business, is putting the cart before the horse.

3. Technology serves the business. A company that sets out to do business first has to solve the problem of survival. You can still learn about the latest technology in advance to prepare for the future architecture.

4. Not all problems can be solved by technology. The example cited in the book is buying tickets on 12306, where a lot of people ended up unable to get tickets. With over a billion people, even if only a few hundred million are going home for Spring Festival, buying tickets on the same day creates huge pressure. In this case tickets should be released in advance and in many batches, so that the peak traffic comes down and is spread across different time periods. Add a queuing mechanism and release tickets at a rate the servers can withstand, and the result is more reasonable. Rather than treating this purely as a technical problem, the business side should create better conditions for the technology. Putting all the pressure on technology, and blaming the technical staff when the problem is not solved, is wrong.

5. There are no absolute stages of architecture evolution. A site is not limited to the technologies of whichever stage it happens to be at. The book only offers concrete architectural ideas and directions. Specific problems still have to be analyzed according to the actual situation, and each site has to find the architecture that suits it.
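To see why the batched ticket release in point 4 helps, here is a tiny simulation in Python. The numbers are invented for illustration and have nothing to do with real 12306 traffic. simulate_backlog is a hypothetical helper that tracks how many requests are waiting when the server can handle a fixed number per time slot.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the largest number of waiting requests seen in any time slot."""
    backlog = 0
    worst = 0
    for arrivals in arrivals_per_tick:
        backlog += arrivals                    # new requests join the queue
        worst = max(worst, backlog)
        backlog = max(0, backlog - capacity_per_tick)  # server works off the queue
    return worst


# Same total demand (1000 requests), same server capacity (300 per slot).
all_at_once = simulate_backlog([1000, 0, 0, 0], capacity_per_tick=300)
in_batches = simulate_backlog([250, 250, 250, 250], capacity_per_tick=300)
```

With these made-up numbers, releasing everything at once leaves 1000 requests waiting at the peak, while four batches never leave more than 250 waiting, even though the servers are exactly the same. The improvement comes from the business rule, not from the technology.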