
Author: huashiou

Source: t.cn/Ai98XycJ

Foreword

Taking Taobao as an example, this article walks through the evolution of a server-side architecture from one hundred to ten million concurrent requests, listing the relevant technologies encountered at each stage, so that readers gain an overall picture of architecture evolution. It closes with a summary of some principles of architecture design.

Before introducing the architecture, and for readers who may be unfamiliar with some concepts in architecture design, here are the most basic ones:

  • Distributed: When multiple modules of a system are deployed on different servers, the system is called a distributed system. For example, Tomcat and the database deployed on different servers, or two Tomcats with the same function deployed on different servers

  • High availability: When some nodes of a system fail, other nodes can take over and continue to provide services; the system is then considered highly available

  • Cluster: Software in a particular domain is deployed on multiple servers and provides one class of service as a whole; such a whole is called a cluster.

    For example, Zookeeper's Master and Slave nodes are deployed on multiple servers and together provide a centralized configuration service.

    In a typical cluster, clients can connect to any node to obtain the service. When one node in the cluster goes offline, other nodes automatically take over its work, which means the cluster is highly available

  • Load balancing: When requests sent to the system are evenly distributed across multiple nodes, so that each node handles an even share of the request load, the system is considered load balanced

  • Forward proxy and reverse proxy: When an internal system accesses the external network through a proxy server that forwards its requests, the external network sees the proxy server as the initiator of the access; the proxy server then implements a forward proxy

    When an external request enters the system, a proxy server forwards it to some server inside the system; to the outside world, only the proxy server is visible. The proxy server then implements a reverse proxy.

    In short, a forward proxy is a proxy server accessing the external network on behalf of the internal system, while a reverse proxy is a proxy server forwarding external requests to servers inside the system.

Single machine architecture

Take Taobao as an example. In the early days of the website, the number of applications and users was small, so Tomcat and the database could be deployed on the same server.

When a browser sends a request to www.taobao.com, the DNS server converts the domain name to an actual IP address, and the browser accesses Tomcat with the IP address. As shown below:

**New technical challenges:** As the number of users increases, Tomcat and the database compete for resources, and single-machine performance is no longer enough to support the business; architecture evolution is inevitable.

First Evolution: Tomcat and database are deployed separately

The first evolution was nothing special: with Tomcat and the database each monopolizing its own server's resources, the performance of both improved significantly. As shown below:

**New technical challenges:** As the number of users grows, concurrent reads and writes to the database become the bottleneck

Second evolution: Local cache and distributed cache are introduced

The second architecture evolution introduced caches, adding local caches on the Tomcat server and distributed caches externally to cache popular product information or HTML pages of popular products.

Caching can intercept most requests before they reach the database, greatly reducing database pressure. The techniques involved include: using memcached as a local cache and Redis as a distributed cache, as well as cache consistency, cache penetration/breakdown, cache avalanche, and the concentrated expiry of hot data.
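To make the read-through caching flow concrete, here is a minimal Python sketch (the class and names are illustrative, not from the article): reads check the cache first and fall back to the database, and a sentinel is cached for missing keys to blunt cache penetration. A real system would use Redis or memcached rather than an in-process dict.

```python
import time

class CacheAside:
    """Minimal cache-aside sketch. Reads go to the cache first and fall
    back to the database; a sentinel is cached for keys absent from the
    DB so repeated lookups of missing keys don't hit the database."""

    _MISSING = object()  # sentinel cached for keys absent from the DB

    def __init__(self, db, ttl=60):
        self.db = db          # any dict-like backing store
        self.ttl = ttl
        self._store = {}      # key -> (value, expires_at)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            value = hit[0]
            return None if value is self._MISSING else value
        value = self.db.get(key)  # cache miss: read the database
        cached = self._MISSING if value is None else value
        self._store[key] = (cached, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value            # write through to the DB...
        self._store.pop(key, None)      # ...and invalidate, don't update
```

Invalidating on write (rather than updating the cached value) is a common choice because it sidesteps some write-ordering consistency hazards.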

The architecture after this evolution is shown in the following figure:

**New technical challenges:** The cache absorbs most access requests, but as the number of users increases, the concurrency pressure falls mainly on the single-machine Tomcat, and responses gradually slow down

Third evolution: Reverse proxy is introduced to implement load balancing

Tomcat is deployed on multiple servers and requests are evenly distributed to each Tomcat using reverse proxy software (Nginx).

The assumption here is that Tomcat supports up to 100 concurrent requests and Nginx supports up to 50,000 concurrent requests, so in theory Nginx can handle 50,000 concurrent requests by distributing requests to 500 Tomcats.

The technologies involved include: Nginx and HAProxy, both of which are reverse proxy software working at layer 7 of the network stack and mainly supporting the HTTP protocol. Session sharing and file upload/download are also involved.
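A minimal Nginx configuration sketch of this setup (the upstream name, server addresses, and domain are hypothetical):

```nginx
# Hypothetical upstream pool of Tomcat instances.
upstream tomcat_pool {
    least_conn;                      # or ip_hash for sticky sessions
    server 10.0.0.11:8080 weight=1;
    server 10.0.0.12:8080 weight=1;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://tomcat_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

With `ip_hash` instead of `least_conn`, requests from one client stick to one Tomcat, which is one (coarse) way to sidestep the session-sharing problem mentioned above.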

Take a look at the architecture diagram using reverse proxies:

**New technical challenges:** Reverse proxies greatly increase the concurrency the application layer can support, but higher concurrency also means more requests reach the database, and the single-machine database eventually becomes the bottleneck

Fourth evolution: Read and write separation of databases

The database is divided into read and write libraries. There can be more than one read library. The data of the write library is synchronized to the read library through the synchronization mechanism.

For scenarios that need to query the most recently written data, an extra copy can be written to the cache so that the latest data can be read from the cache.

The technologies involved include: Mycat, a piece of database middleware that can organize the database's read/write separation and sharding (splitting databases and tables); clients access the underlying databases through it. Data synchronization and data consistency are also involved.
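The routing idea can be sketched in Python (a toy stand-in for what middleware like Mycat does; class and names are illustrative, and replication is simulated synchronously here, whereas real setups replicate asynchronously):

```python
import random

class ReadWriteRouter:
    """Toy read/write splitter: writes go to the primary (write library),
    reads are spread across the replicas (read libraries)."""

    def __init__(self, primary, replicas):
        self.primary = primary    # dict standing in for the write library
        self.replicas = replicas  # dicts standing in for read libraries

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # stand-in for the sync mechanism
            replica[key] = value

    def read(self, key):
        # pick any replica; real middleware also considers health/load
        return random.choice(self.replicas).get(key)
```

Because real replication lags, a read issued right after a write may see stale data, which is exactly why the text suggests writing an extra copy to the cache for read-your-own-write scenarios.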

Architecture diagram after read/write separation:

**New technical challenges:** As the number of businesses grows, traffic differs greatly between businesses; different businesses compete directly for the same database and affect each other's performance

Fifth evolution: The database is separated by service

Databases are separated by services to store data of different services in different databases, reducing resource competition among services. For services with heavy traffic, more servers can be deployed to support them.

As a result, cross-business tables can no longer be joined directly for analysis, which has to be solved by other means. That is not the focus of this article; interested readers can search for solutions on their own.

The architecture diagram after repository separation is as follows:

**New technical challenges:** As the number of users grows, the single-machine write library gradually reaches its performance bottleneck

Sixth evolution: Split large tables into small tables

For example, comment data can be hashed based on product ids and routed to corresponding tables for storage.

For payment records, tables can be created on an hourly basis, and each hourly table continues to be split into smaller tables, using user ids or record numbers to route the data.

As long as the amount of real-time table data is small enough and requests are distributed evenly across small tables on multiple servers, the database can improve performance through horizontal scaling. Mycat, mentioned earlier, also supports access control when large tables are split into smaller tables.
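The hash-and-route idea for comment tables can be sketched as follows (the function and table naming are made up for illustration):

```python
import hashlib

def route_comment_table(product_id: str, num_tables: int = 16) -> str:
    """Pick the comment table for a product id by hashing it and taking
    the remainder modulo the table count (sketch of hash-based routing)."""
    digest = hashlib.md5(product_id.encode()).hexdigest()
    return f"comment_{int(digest, 16) % num_tables:02d}"
```

Note that `num_tables` effectively becomes fixed: changing it re-routes existing keys, so resharding requires data migration (or a scheme like consistent hashing).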

This approach significantly increases the difficulty of database operation and maintenance, and has higher requirements for DBAs. When a database is designed to this structure, it can already be called a distributed database.

But this is only a logical whole of a database: different parts of it are implemented separately by different components.

For example, sharding management and request distribution are handled by Mycat, SQL parsing by the single-machine databases, read/write separation possibly by a gateway and message queue, and the aggregation of query results possibly by a database interface layer, and so on. This architecture is in fact one implementation of the MPP (massively parallel processing) architecture.

At present there are many MPP databases, both open source and commercial. Popular open-source ones include Greenplum, TiDB, PostgreSQL-XC, and HAWQ; commercial ones include GBase from NanDa General (General Data), SnowballDB from Ruifan Technology, Huawei's LibrA, and so on.

Different MPP databases have different emphases. For example, TiDB is more focused on distributed OLTP scenarios, while Greenplum is more focused on distributed OLAP scenarios.

These MPP databases generally provide SQL support comparable to PostgreSQL, Oracle, and MySQL: a query is parsed into a distributed execution plan and distributed to the machines for parallel execution, and the database itself aggregates the results and returns them.

They also provide capabilities such as permission management, sharding, transactions, and data replicas, and most can support clusters of more than 100 nodes, which greatly reduces database operation and maintenance costs and lets the database scale horizontally.

Let’s look at the architecture diagram after splitting small tables:

**New technical challenges:** Both the database and Tomcat can now scale horizontally, and the supported concurrency rises significantly. However, as users grow, the single-machine Nginx eventually becomes the bottleneck

Evolution 7: Use LVS or F5 to load balance multiple Nginx

Because the bottleneck is in Nginx itself, load balancing across multiple Nginx instances cannot be achieved simply with another layer of Nginx.

LVS and F5 are load-balancing solutions that work at layer 4 of the network stack. LVS is software that runs in the operating system's kernel mode and forwards TCP requests or higher-level network protocols; it therefore supports a richer set of protocols and delivers far higher performance than Nginx. A single LVS instance can be assumed to forward several hundred thousand concurrent requests.

F5 is load-balancing hardware with capabilities similar to LVS and higher performance, but it comes at an expensive price.

LVS is standalone software. If the LVS server is down, the entire backend system cannot be accessed. Therefore, a standby node is required.

Keepalived software can be used to provide a virtual IP and bind it to multiple LVS servers.

When the primary LVS server goes down, keepalived automatically fails the virtual IP over to another healthy LVS server, thus making the LVS layer highly available.
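A hypothetical keepalived VRRP fragment for the primary LVS node might look like this (the interface name, router id, password, and VIP are all illustrative):

```
# Primary node; the standby node uses state BACKUP and a lower priority.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100            # standby would use e.g. 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret1
    }
    virtual_ipaddress {
        10.0.0.100          # the VIP that clients resolve to
    }
}
```

If the master stops sending VRRP advertisements, the backup node takes over the VIP, so clients keep using the same address throughout a failover.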

It is important to note here that the drawing from the Nginx layer to the Tomcat layer does not mean that all Nginx forwards requests to all Tomcat.

In practice, several Nginx instances may be connected to one group of Tomcats, with keepalived providing high availability among them, while other Nginx instances connect to another group of Tomcats; in this way the number of Tomcats that can be connected multiplies.

**New technical challenges:** Since LVS is also a single-machine component, LVS servers eventually reach their bottleneck as concurrency grows into the hundreds of thousands.

By that point the number of users reaches tens of millions or even hundreds of millions. Users are distributed across different regions, at different distances from the server room, which leads to significantly different access latencies

Eighth evolution: Load balancing between data centers is implemented using DNS round-robin

On the DNS server, a domain name can be configured to correspond to multiple IP addresses, each of which is a virtual IP in a different data center.

When a user accesses www.taobao.com, the DNS server uses a round-robin or other policy to select an IP address for the user to access. This implements load balancing between data centers

At this point, the system can scale at the level of whole data centers: concurrency from tens of millions to hundreds of millions of users can be handled by adding data centers, and concurrency at the system entrance is no longer a problem.
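The round-robin selection a DNS server performs can be sketched as follows (a toy model; real DNS servers rotate the A records in their responses, and clients cache results per TTL):

```python
from itertools import cycle

class RoundRobinDNS:
    """Toy DNS round-robin resolver: each lookup of a name returns the
    next data-center VIP in rotation."""

    def __init__(self, records):
        # records: name -> list of IPs; cycle() rotates through them
        self._records = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name):
        return next(self._records[name])
```

Because clients and intermediate resolvers cache answers, DNS round-robin gives only coarse balancing and slow failover, which is why it is used between data centers rather than between individual servers.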

The architecture diagram after evolution is as follows:

**New technical challenges:** As data and the business grow richer, requirements such as search and analysis become increasingly varied, and the database alone can no longer satisfy them

Evolution 9: NoSQL database and search engine technologies are introduced

When the amount of data in the database reaches a certain scale, the database is no longer suitable for complex queries and can usually only satisfy common query scenarios.

In statistical reporting scenarios, results may fail to be produced when the data volume is large, and running complex queries may slow down other queries

For scenarios such as full-text retrieval, variable data structures, databases are inherently unsuitable. Therefore, appropriate solutions need to be introduced for specific scenarios.

For massive file storage, HDFS can be used; for key-value data, HBase and Redis; for full-text search, ElasticSearch; and for multidimensional analysis, solutions such as Kylin or Druid.

Of course, the introduction of more components will also increase the complexity of the system, the data stored by different components need to be synchronized, the problem of consistency needs to be considered, and more means of operation and maintenance need to manage these components.

Architecture diagram for introducing NoSQL and search engines:

**New technical challenges:** With more components introduced to satisfy the rich requirements, the business dimensions expand greatly; a single application ends up containing too much business code, and iterating and upgrading the business becomes difficult

Tenth evolution: The large application is split into smaller applications

Application code is split along business lines, so that each application's responsibilities are clearer and each can be upgraded and iterated independently.

In this case, some common configurations may be involved between applications. You can use the Zookeeper distributed configuration center to solve this problem.

The architecture diagram is as follows:

**New technical challenges:** When modules are shared between different applications, multiple copies of the same code exist; upgrading a common function then requires upgrading every application's code

Evolution 11: Reusable functions are extracted as microservices

If functions such as user management, orders, payment, and authentication exist in multiple applications, the code for these functions can be extracted into separately managed services: the so-called microservices

Common services are accessed between applications and services through HTTP, TCP, or RPC requests, and each individual service can be managed by a separate team.

In addition, Dubbo, SpringCloud and other frameworks can realize service governance, traffic limiting, circuit breaker, downgrade and other functions to improve the stability and availability of services.
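As one example of these stability mechanisms, here is a minimal circuit-breaker sketch in Python (thresholds and behavior are illustrative; Dubbo and Spring Cloud ship production-grade implementations):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures
    the circuit opens and calls fail fast, protecting the caller and the
    struggling downstream service, until `reset_timeout` elapses."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0          # success resets the failure count
        return result
```

Failing fast is what turns a slow, failing dependency into an immediate, cheap error instead of a cascade of timed-out threads.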

**New technical challenges:** Different services expose different interface access methods, so application code must adapt to multiple access methods to use them.

In addition, applications access services and services access each other; the invocation chain becomes very complex and the logic chaotic

Evolution 12: The ESB, an enterprise service bus, is introduced to mask service interface access differences

The ESB implements unified access protocol transformation. Applications access back-end services through ESB, and services invoke each other through ESB to reduce the coupling degree of the system.

The so-called SOA (Service-Oriented Architecture), in which a single application is split into multiple applications, common services are isolated for separate management, and an enterprise service bus decouples services from each other, is easily confused with microservice architecture because the two look similar.

In my view, microservice architecture refers to the idea of extracting common services from a system for independent operation and management, while SOA architecture refers to the idea of splitting services and unifying service interface access; SOA thus contains the idea of microservices.

Take a look at the evolving architecture diagram:

**New technical challenges:** As the business keeps developing, the number of applications and services keeps growing and their deployment becomes complicated. Multiple services deployed on the same server must also resolve conflicts between their runtime environments

In addition, in scenarios requiring dynamic scaling, such as big promotions, service capacity must be expanded horizontally, which means preparing the runtime environment and deploying services on newly added machines. This makes operations very difficult

Evolution 13: Container technology is introduced to achieve operational environment isolation and dynamic service management

At present, the most popular containerization technology is Docker, and the most popular container management service is Kubernetes(K8S). Applications/services can be packaged as Docker images, which can be dynamically distributed and deployed through K8S.

A Docker image can be understood as a minimal operating system able to run your application/service: it contains the application/service's runnable code, with the runtime environment configured according to actual needs.

After the whole “operating system” is packaged as an image, it can be distributed to the machines where relevant services need to be deployed. Directly starting the Docker image can get the service up, making the deployment, operation and maintenance of the service simple.
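A hypothetical Dockerfile for one such service (the base image, jar name, and port are assumptions for illustration, not from the article):

```dockerfile
# Package a Java service and its runtime into one image.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY target/order-service.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Once built and pushed to a registry, the same image starts identically on any machine, which is exactly the "whole operating system packaged as an image" idea described above.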

Before a big promotion, servers in the existing cluster can be partitioned to start Docker containers and boost service capacity; after the promotion, the containers can be shut down without affecting other services on those machines

**New technical challenges:** Containerization solves dynamic service scaling, but the machines themselves still have to be managed by the company.

Outside promotion periods, a large amount of machine resources must still sit idle to handle the peaks; hardware and operations costs are very high and resource utilization is low.

Fourteenth evolution: The system is hosted on a cloud platform

The system can be deployed on the public cloud to solve the problem of dynamic hardware resources by utilizing the massive machine resources of the public cloud.

During a big promotion, additional resources can be requested temporarily from the cloud platform, and services can be deployed rapidly with Docker and K8S.

After the promotion ends, the resources are released: true pay-as-you-go, which greatly improves resource utilization and sharply reduces operations costs.

The so-called cloud platform is to abstract massive machine resources into a whole resource through unified resource management.

On the cloud platform, hardware resources (such as CPU, memory, network, etc.) can be applied dynamically on demand. In addition, the cloud platform provides a general operating system, common technical components (such as Hadoop technology stack, MPP database, etc.) for users to use, and even provides developed applications.

Users can address their needs (such as audio/video transcoding, mail services, personal blogs, etc.) without caring what technology is used inside the application.

The following concepts are involved in cloud platforms:

  • **IaaS:** Infrastructure as a Service. Corresponds to the unified pool of machine resources mentioned above, at the level where hardware resources can be requested dynamically;

  • **PaaS:** Platform as a Service. Corresponds to providing common technical components to ease system development and maintenance;

  • **SaaS:** Software as a Service. Corresponds to providing ready-made applications or services, paid for by functionality or performance requirements.

At this point, the problems mentioned above, from handling high concurrent access down to the service architecture and system implementation levels, each have their own solutions.

At the same time, be aware that the discussion above intentionally omits practical issues such as cross-data-center data synchronization and distributed transaction implementation, which will be discussed separately later.

Summary & Reflection

Next, let’s discuss some architectural design issues:

  • Must the changes to the architecture follow the evolutionary path described above? No! The sequence of architecture evolution mentioned above is a separate improvement for a certain side. In actual scenarios, there may be several problems to be solved at the same time, or another aspect may reach the bottleneck first. In this case, it should be solved according to the actual problems.

    For example, in government-type projects, concurrency may be small but the business requirements very rich; in that case, high concurrency is not the priority, and solutions for the rich requirements should come first.

  • How far should the architecture be designed for the system to be implemented? For single-implementation systems with well-defined performance metrics, it is sufficient that the architecture is designed to support the performance metrics of the system, but with interfaces to extend the architecture in case it is needed.

    Evolving systems, such as e-commerce platforms, should be designed to meet the requirements of the next phase of user volume and performance indicators, and the architecture should be upgraded iteratively based on the growth of the business to support higher concurrency and richer business.

  • What is the difference between server architecture and big data architecture? In fact, the so-called “big data” is a general term for mass data collection, cleaning, transformation, data storage, data analysis, data services and other scenarios. Each scenario contains a variety of optional technologies.

    For example, data collection includes Flume, Sqoop, and Kettle; data storage includes the distributed file systems HDFS and FastDFS, and the NoSQL databases HBase and MongoDB; data analysis includes the Spark technology stack and machine learning algorithms.

    In general, big data architecture is an architecture that integrates various big data components based on business requirements and generally provides distributed storage, distributed computing, multidimensional analysis, data warehouse, machine learning algorithm and other capabilities. On the other hand, server-side architecture refers more to the architecture at the application organization level, and the underlying capabilities are often provided by big data architecture.

  • Are there any architectural design principles?

  • N + 1 design. Every component in the system should be free of single points of failure;

  • Roll back the design. Ensure that the system is forward compatible and there should be a way to roll back versions when the system is upgraded;

  • Disable design. A configuration should be provided to control the availability of specific functions and to be able to quickly go offline in the event of a system failure;

  • Monitoring design. Monitoring must be considered at the design stage;

  • Multi-active data center design. If the system needs extremely high availability, consider running active-active (or multi-active) data centers in multiple locations, so the system remains usable even when one data center loses power;

  • Use proven technology. New or open source technologies tend to have many hidden bugs, and failure without commercial support can be a disaster.

  • Resource isolation design. Avoid a single business taking up all resources;

  • The architecture should scale horizontally. Only when the system can be extended horizontally can bottleneck problems be avoided effectively.

  • Buy non-core capabilities. If non-core functions would consume large amounts of R&D resources to build, consider buying mature products;

  • Use commercial hardware. Commercial hardware can effectively reduce the probability of hardware failure;

  • Iterate quickly. The system should develop small functional modules quickly and go online for verification as soon as possible, so as to find problems early and greatly reduce the risk of system delivery.

  • Stateless design. The service interface should be stateless, with access to the current interface independent of the state of the interface’s last access.

END
