About the author: Huang Qingbing, chief technical evangelist of NetEase Honeycomb, holds a master's degree from Zhejiang University and works on cloud computing, Docker, Go and related development and technical evangelism. He likes open source and enjoys sharing: he has built open source gadgets, produced Docker courses, and spoken at Gopher Meetup. Technical discussion is welcome. Personal homepage: bingohuang.com/

The following is the text:

The Internet is changing, and so is its architecture; changes in architecture are, in turn, changes in the Internet. So it is worth talking about Internet architecture and how it evolves.

What is architecture? On a large scale, the universe and society have structure; on a small scale, buildings and software have structure. In the abstract, architecture arises from division of labor and returns to the whole; in practice, the core of architecture is solving problems, both business problems and people problems.

In the Internet industry, architecture usually means technical architecture: more specifically system architecture or software architecture, and most commonly website architecture. This article discusses how technical architecture has evolved in the Internet era, along with the advantages and disadvantages of each stage. If you find shortcomings, corrections and discussion are welcome.

For the sake of presentation, I divide the evolution of Internet architecture into three eras: the stand-alone era, the cluster era and the distributed era. These eras do not follow strict historical chronology; they correspond to the stage that a team or product is in.

Stand-alone era

In the early days of the Internet, just as when a product team at the Hangzhou Research Institute is first founded, resources are limited and manpower is short. To develop a product or launch a website quickly, a single machine is often a good choice. At this stage the application, the file service and the database service are all concentrated on one server. The application is usually packaged and deployed as a whole, in a format that depends on its language and framework, such as a Java WAR file or a Rails directory tree. This is commonly called a monolithic architecture.

Monolithic architecture

A system architecture diagram would look something like this:

Figure 1: Single machine era – ALL IN ONE

  • Advantages: simple and fast; easy to develop, test and deploy
  • Disadvantages: equally significant; only suitable for early-stage projects, hard to maintain once it grows, has a single point of failure, and upgrades require downtime

Layered architecture

A discerning eye soon notices that the application architecture at this stage is haphazard. This was common in early Web development, for example with JSP+JDBC or ASP+ADO, but it is clearly not a friendly, standard architecture. Layered architecture came into being as a result, as shown in the figure below. It is generally divided into a presentation layer, a business layer, a persistence layer and a database layer. This is essentially the familiar MVC architecture.

Figure 2: Single machine era – Software Architecture – Layered architecture

The system architecture diagram after transformation is as follows:

Figure 3: Single machine era – Layered architecture

  • Advantages: simple structure, clear division of labor, each layer can be tested separately; a reasonable default if you do not know which software architecture to use
  • Disadvantages: poor scalability, slow iterative development, and too many layers can make the processing flow complex
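
As a rough illustration of the four layers described above, here is a minimal Python sketch. All class and function names (`UserRepository`, `UserService`, `render_greeting`) are hypothetical, and a plain dict stands in for the database layer; a real project would use a web framework and a real database.

```python
class UserRepository:
    """Persistence layer: the only layer that touches the database."""

    def __init__(self, database):
        self._db = database  # assumption: a dict stands in for the database layer

    def find_name(self, user_id):
        return self._db.get(user_id)


class UserService:
    """Business layer: business rules live here, not in the view or the repository."""

    def __init__(self, repository):
        self._repo = repository

    def greeting(self, user_id):
        name = self._repo.find_name(user_id)
        if name is None:
            raise LookupError(f"unknown user {user_id}")
        return f"Hello, {name}!"


def render_greeting(service, user_id):
    """Presentation layer: turns business results into output for the client."""
    try:
        return f"<h1>{service.greeting(user_id)}</h1>"
    except LookupError:
        return "<h1>User not found</h1>"


database = {1: "Alice"}
service = UserService(UserRepository(database))
page = render_greeting(service, 1)
```

Each layer only talks to the layer directly below it, which is what makes layer-by-layer testing possible: `UserService` can be tested with a fake repository, for instance.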

Data separation

With layered architecture in place, the application is in better shape and the team develops more efficiently. As business volume grows and the product gains a real user base, it gradually becomes clear that the application and the data services are competing fiercely for resources on one host. Each service has different hardware requirements: application servers need faster CPUs, file servers need larger disks, and database servers need more memory and larger disks. The application and data services are therefore separated, forming the following architecture:

Figure 4: Single-machine era – Data separation

  • Advantages: resources are spread out, hardware is better matched to each service, and maintenance is easier
  • Disadvantages: higher resource consumption and network overhead, and each service is still a single point of failure

Introducing a cache

The product has gained some reputation, the number of users keeps growing, and requests become frequent. To improve access speed, a cache becomes essential and makes its debut.

Figure 5: Single machine era – cache debut

Server-side caches can be divided into local caches and remote caches, each with its own advantages and disadvantages. A local cache is fast to access, but it holds a limited amount of data and is inconvenient to share once the application is clustered. A remote cache can be shared and clustered, and its capacity is not bounded by the memory of a single application server, but cache updates need to be handled with care.

  • Advantages: simple and effective; reduces database queries
  • Disadvantages: adds branching logic to the application, is not suitable for storing large objects, and this architecture still has a single point of failure
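
The "increased logical judgment" mentioned above is the check-cache-first logic the application has to carry. A minimal sketch of this pattern (often called cache-aside) follows; the `CachedReader` class and its `db_queries` counter are hypothetical names used only for illustration, and an in-process dict stands in for the cache.

```python
class CachedReader:
    """Reads through a cache and falls back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # assumption: any callable that queries the DB
        self._cache = {}           # a dict stands in for a local or remote cache
        self.db_queries = 0        # counts how often the database is actually hit

    def get(self, key):
        if key in self._cache:        # cache hit: no database query
            return self._cache[key]
        self.db_queries += 1          # cache miss: query the database ...
        value = self._load(key)
        self._cache[key] = value      # ... and remember the result
        return value

    def invalidate(self, key):
        """Call after the underlying data changes, or readers see stale values."""
        self._cache.pop(key, None)


db = {"user:1": "Alice"}
reader = CachedReader(db.get)
first = reader.get("user:1")
second = reader.get("user:1")   # served from the cache
```

Forgetting to call `invalidate` after a write is exactly the cache update problem the text warns about.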

Read/write separation

The market responds well and the business keeps growing, but performance declines. Analyzing the whole architecture shows that the database is read and written very frequently, and in some services reads far outnumber writes. The single database server has become the bottleneck. At this point we can try read/write separation with master/slave replication.

Figure 6: Single-machine era – Read/write separation

  • Advantages: reduces the load on any single database; the number of slave machines can be adjusted flexibly
  • Disadvantages: the architecture becomes more complex and harder to maintain
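
To make the idea concrete, here is a small sketch of how an application might route statements once reads and writes are separated. It assumes a simplified rule (statements starting with SELECT go to a slave, everything else goes to the master); the `ReadWriteRouter` class and the server names are hypothetical, and real setups usually rely on a database driver or proxy for this.

```python
import itertools


class ReadWriteRouter:
    """Sends reads to slaves in rotation and all writes to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # rotate through the slaves

    def route(self, sql):
        # Simplifying assumption: only plain SELECT statements count as reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("select name FROM users WHERE id = 1"),
    router.route("UPDATE users SET name = 'Bob' WHERE id = 1"),
    router.route("SELECT 1"),
]
```

Because replication takes time, a read sent to a slave immediately after a write may return the old value; this is part of the added complexity listed above.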

With this, the architecture of the stand-alone era has taken shape. As the saying goes, "small as the sparrow is, it has all the vital organs": it can support the business well in the early stage. But as the business grows, each module may become a bottleneck. The biggest problem of the stand-alone era is that the whole architecture is full of single points of failure, which the cluster era sets out to solve.

Cluster era

In the stand-alone era, many measures were taken to relieve pressure on the database layer, including server separation, the introduction of a cache, and data separation. But as traffic increases, the demand for high availability grows as well. Reducing pressure on the application layer and eliminating single points of failure become urgent, and that is the job of the cluster era.

Load balancing

Code is the foundation of the architecture, but reworking code at an early stage is a large effort, and the risk is even higher when staff change frequently. So the usual first step toward better server performance is to cluster the application and put load balancing in front of it.

Figure 7: Cluster age – Load balancing

  • Advantages: removes the single point of failure at the application layer; availability is ensured and performance improves
  • Disadvantages: consistency between application instances needs attention, for example cache access and Session storage
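
The sketch below illustrates two common scheduling ideas under stated assumptions: plain round-robin, and hashing the session ID so that the same session always lands on the same server. The second is one possible way to sidestep the Session storage problem mentioned above (a shared session store is another). The `LoadBalancer` class and the server names are hypothetical.

```python
import hashlib
import itertools


class LoadBalancer:
    """Toy scheduler showing round-robin and session-hash strategies."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._round_robin = itertools.cycle(self._servers)

    def next_server(self):
        """Round-robin: each call returns the next server in turn."""
        return next(self._round_robin)

    def server_for_session(self, session_id):
        """Session affinity: the same session ID always maps to the same server."""
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._servers)
        return self._servers[index]


balancer = LoadBalancer(["app-1", "app-2", "app-3"])
rotation = [balancer.next_server() for _ in range(4)]
sticky = balancer.server_for_session("session-abc")
```

Note that simple modulo hashing remaps most sessions whenever a server is added or removed, so it is only a starting point.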

Dynamic and static separation

To further reduce pressure on the application servers, dynamic/static separation can be used.

Dynamic/static separation means dividing the resources of a dynamic website, according to certain rules, into those that rarely change and those that change often. Once dynamic and static resources are split, static resources can be cached according to their characteristics to speed up responses.

At the Hangzhou Research Institute, the common practice is to separate the front end from the back end. The back-end application exposes APIs that handle requests from the front end and return the results in JSON format.

Figure 8: Cluster age – Static and dynamic separation

  • Advantages: reduces pressure on the application servers; static files can be cached for faster responses; with the front end and back end separated, development can proceed in parallel
  • Disadvantages: cached static files can go stale when a cache update fails; communication costs are higher
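
One simple rule for telling static from dynamic requests is the file extension of the requested path. The sketch below assumes that rule; the `STATIC_EXTENSIONS` set and the `classify` function are hypothetical, and in practice the rule usually lives in the web server or reverse proxy configuration rather than in application code.

```python
import posixpath

# Assumption: these extensions mark rarely changing, cacheable resources.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico", ".html"}


def classify(path):
    """Return "static" or "dynamic" for a request path."""
    path_only = path.split("?", 1)[0]                  # drop the query string
    extension = posixpath.splitext(path_only)[1].lower()
    return "static" if extension in STATIC_EXTENSIONS else "dynamic"


kinds = [classify("/assets/app.js?v=3"), classify("/api/users/1")]
```

Requests classified as static can be served from a cache or a dedicated static server, while dynamic ones go to the back-end API.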

CDN acceleration

A Content Delivery Network (CDN) can further speed up the website's response. It works by synchronizing the origin content to edge nodes across the country and, with the help of a precise scheduling system, directing each user's request to the node best suited to that user, so that users get the content they need as quickly as possible.

Figure 9: Cluster age – CDN acceleration

  • Advantages: copes with limited network bandwidth, heavy user traffic and unevenly distributed networks; improves response speed for users and relieves load on the application
  • Disadvantages: costs clearly go up, since CDN services are generally billed by traffic; cached static files can also go stale when an update fails
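
How a real CDN scheduler picks "the node best suited to the user" is vendor-specific and not described here. As a rough sketch only, assume the scheduler knows a measured latency per edge node and which nodes are healthy, and falls back to the origin when no edge node is available. The function name, the node names and the `"origin"` fallback are all hypothetical.

```python
def pick_edge_node(latency_ms_by_node, healthy_nodes):
    """Pick the healthy edge node with the lowest latency, or fall back to origin."""
    candidates = {
        node: latency
        for node, latency in latency_ms_by_node.items()
        if node in healthy_nodes
    }
    if not candidates:
        return "origin"  # assumption: no usable edge node means serving from origin
    return min(candidates, key=candidates.get)


latencies = {"hangzhou": 8, "beijing": 35, "guangzhou": 28}
chosen = pick_edge_node(latencies, {"beijing", "guangzhou"})
```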

Redundant cluster

With the steps above, the architecture of a medium-sized website is basically in place. As a medium-sized site grows into a large one, the ultimate goal is the "three highs": high concurrency, high performance and high availability. The architecture so far can basically meet the performance requirements, so the focus shifts to high availability, that is, to ensuring there is no single point of failure.

At this point, critical services need to be deployed as redundant, load-balanced clusters.

Ideally, we would cluster the following services/applications:

  • Database service cluster
  • File service cluster
  • Cache service cluster
  • Application service cluster
  • Load balancing scheduler cluster
  • Static content service cluster
  • CDN server cluster

Figure 10: Cluster age – Redundant cluster

  • Advantages: removes single points of failure; high availability
  • Disadvantages: stateful data and data consistency become problems; resource costs and maintenance labor costs go up

At this point the architecture of a large website is basically in place, and every place where we could "add machines" has been used up. Is this the end? Of course not! With the arrival of the DT/distributed era, high-traffic and big-data scenarios place higher demands on applications, so the applications themselves now have to be reworked.

Distributed era

Application splitting

So far we have only layered the application, which is still a good choice in the early stages of a startup or product. Although the application has been clustered and load balanced, its architecture is still "centralized". As the business becomes more complex and the website gains more features, splitting the application becomes imperative.

  • Advantages: applications are decoupled, team responsibilities are split, divide and conquer
  • Disadvantages: the architecture becomes more complex

After the application is split, the resulting applications still depend on each other and share common modules, in particular the same logic or functional code. At this point we can extract these common parts into services that are deployed independently, governed in a unified way and reused more widely. This is service-oriented architecture (SOA).

Message queue

Even after applications are split and services are deployed independently, some communication and dependency problems remain. Message queues can be introduced here to decouple the parties and improve throughput.

  • Advantages: asynchronous processing, decoupling, improved throughput
  • Disadvantages: delays in message consumption, among others
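
The following sketch shows the asynchronous, decoupled style in miniature, using Python's standard `queue` and `threading` modules as a stand-in for a real message broker. The `run_demo` function, the order names and the `None` stop signal are assumptions made for illustration.

```python
import queue
import threading


def run_demo(orders):
    """Producer puts orders on a queue; a consumer thread processes them later."""
    messages = queue.Queue()
    processed = []

    def consumer():
        while True:
            order = messages.get()
            if order is None:      # assumption: None is the agreed stop signal
                break
            processed.append(f"shipped:{order}")

    worker = threading.Thread(target=consumer)
    worker.start()
    for order in orders:
        messages.put(order)        # the producer returns without waiting for processing
    messages.put(None)
    worker.join()                  # wait only so the demo can return the results
    return processed


result = run_demo(["order-1", "order-2"])
```

The producer only knows about the queue, not about the consumer, which is the decoupling. The price is that an order is not processed the moment it is submitted, which is the consumption delay listed above.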

Database splitting

If multiple applications all connect to a single database, its connection count, QPS, TPS and I/O capacity quickly become limiting factors. In that case the data can be split across several databases.

  • Advantages: spreads the load across databases; reduces coupling
  • Disadvantages: data access modules become redundant and more complex
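
One way to picture database splitting is a routing table from business module to database, so that each application connects only to its own database. The sketch below assumes such a split by module; the module names and database names are hypothetical.

```python
# Assumption: each business module owns exactly one database.
DATABASE_BY_MODULE = {
    "user": "user_db",
    "order": "order_db",
    "product": "product_db",
}


def database_for(module):
    """Return the database a module should connect to."""
    try:
        return DATABASE_BY_MODULE[module]
    except KeyError:
        raise ValueError(f"no database configured for module {module!r}") from None


order_database = database_for("order")
```

A query that previously joined user and order tables in one database now has to be assembled in application code, which is one source of the extra complexity in the data access modules.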

Mention database splitting and many people will think of table splitting. I have not practiced that myself, so I will not write much about it. It does, however, introduce more complex data architecture and data consistency problems, and there is no mature open source solution for splitting databases and tables on the market, which makes it a deep pit of uncertainty. To split or not to split is also a question worth thinking about.

Microservices architecture

Microservices architecture has been a hot topic, frequently mentioned in articles, blogs and conference talks. Microservices did not come out of nowhere. Some say they are an upgrade of service-oriented architecture (SOA), which was itself preceded by centralized architecture, distributed architecture and so on. Here is an abstract diagram comparing some common architectures:

Figure 11: Distributed age – Microservices Architecture – Abstract comparison

A microservices architecture consists of multiple microservices, each of which is an independently deployable unit or component. They are distributed and decoupled from each other, interact through lightweight communication protocols (such as REST), can use different databases, and are language independent. They are independent, small, lightweight and loosely coupled, and can easily be combined and recombined, just like the microbots in Big Hero 6: simple individually, but powerful in combination.

Figure 12: Distributed age – Microservices architecture

  • Advantages: good scalability, low coupling between services, services are independent of each other, easy to deploy and develop, and each service is convenient to test
  • Disadvantages: it is easy to focus too much on service size and split too finely, so that the system depends on a large number of microservices; communication between services becomes complicated, system integration becomes more complex, and atomic operations are hard to achieve

Another reason microservices are so popular is the emergence of Docker, which gives microservices a very well-suited runtime environment. Docker's independence and fine granularity match the microservices concept closely, and its good performance and rich management tooling give people a degree of confidence in microservices. Broadly speaking, four properties make Docker a good fit for microservices:

  • Independence: a container is a complete execution environment that does not depend on anything outside it.
  • Fine granularity: one physical machine can run hundreds or thousands of containers at the same time, so the unit of computation is small enough.
  • Rapid creation and destruction: containers can be created and destroyed in seconds, which is ideal for quickly building and recombining services.
  • Comprehensive management tools: a large number of container orchestration and management tools make it quick to compose and schedule services.

Of course, good architecture and technology only prove their worth when they are put into practice and recognized by users, which calls for rich scenario-based applications built on microservices architecture and Docker. NetEase Honeycomb is actively exploring microservices architecture and scenario-based services on its container cloud platform. You are welcome to join us.

This concludes the introduction to the three eras of architectural change. In general, architecture is never static: times keep changing, and progress never stops.

Afterword

I have some hands-on experience with architecture, but it is still shallow. I wanted to discuss this topic but did not know where to start, so I thought about it for a long time and drew on earlier explorations of architecture, such as O'Reilly's free booklet Software Architecture Patterns.