• Source: Geek Time course “Learn Architecture from Scratch”

Methods

Service granularity

  • The “Three Musketeers” principle: each microservice is developed by a team of three people
  • In terms of system scale: when three people are responsible for one system, its complexity is just right. Everyone can fully understand the whole system, and the work can still be divided among them. With two people, the system is not complex enough, and developers may feel they have no chance to show their technical strength. With four or more, the system becomes too complex for each developer to understand its details deeply
  • In terms of team management: three people form a stable backup. Even if one person goes on vacation or moves to another system, the remaining two can keep it running. With two people, the pressure is too high; with one person, the team is a single point of failure

Splitting methods

  • The “Three Musketeers” principle lets you estimate roughly how many services a split should produce
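As a rough illustration, the estimate is the team size divided by three, rounded up. The sketch below assumes exactly three developers per service, and the function name is hypothetical. The article's own figures (about 4 services for 10 people, up to about 40 for 100) are rounded approximations, so treat the result as an order of magnitude, not a target.

```python
import math


def estimate_service_count(team_size: int, people_per_service: int = 3) -> int:
    """Rough service count under the "Three Musketeers" rule (about 3 developers per service)."""
    if team_size <= 0 or people_per_service <= 0:
        raise ValueError("team_size and people_per_service must be positive")
    return math.ceil(team_size / people_per_service)


for team in (10, 100, 1000):
    print(team, "people ->", estimate_service_count(team), "services")
```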
1. Split based on business logic
  • Identify the business modules in the system by their responsibilities, and split each module into an independent service
  • The difficulty is that people's understanding of a “sphere of responsibility” varies widely. For example, an e-commerce system can be split into three services (“product”, “transaction”, and “user”) or into six (“product”, “order”, “payment”, “delivery”, “seller”, and “buyer”). Which split is more reasonable?
  • The confusion arises because, judged purely by business logic, both the coarse split and the fine split are valid. The way out is to stop judging granularity by business logic alone and instead use the “Three Musketeers” principle to estimate the approximate scope of each service
  • For example, by this principle a team of 10 people should have about 4 services, so “login, registration, and user information management” can all fall within the scope of one “user service”. If 100 people support the business, the number of services can reach about 40, and “user login” alone can be a service. If 1,000 people support the business, “user connection management” might become a standalone service
2. Split based on extensibility
  • Sort the modules in the system by stability. Split mature, rarely changing modules into stable services, and split constantly changing, frequently iterated modules into variable services
  • Stable services can be coarse-grained. Even logically unrelated modules can share one subsystem; for example, the log service and the upgrade service can be placed together. Variable services can be finer-grained, but not too fine: always keep the total number of services under control
  • The main purpose of this split is to speed up iteration and to keep development work from accidentally breaking existing mature features and causing production incidents
3. Split based on reliability
  • Prioritize the modules in the system and separate core services with high reliability requirements from non-core services with lower requirements. Then focus on keeping the core services highly available
Benefits
  • Non-core service failures cannot affect core services

For example, log reporting is usually a non-core service, but some scenarios produce a large volume of log reports. Without a split, this traffic could bring down core services. After the split, core services are unaffected no matter how many logs are reported

  • The high-availability solution for core services can be simpler

Core services have simpler functional logic, may store less data, and use fewer components, so designing a high-availability solution for them is much easier than for the unsplit system

  • High-availability costs are lower

Once split out, core services consume far fewer machines and far less bandwidth than the whole unsplit system. A high-availability solution that covers only the core services therefore saves substantial machine and bandwidth costs

4. Split based on performance
  • Separate out modules with high performance requirements or heavy load so that high-load services cannot affect other services
  • Performance splits usually follow specific bottlenecks, such as web services, databases, and caches
  • For example, during an e-commerce flash sale, the queuing function at the entrance is under the heaviest load, so it can be split out into its own service
The split methods above can be freely combined to suit the actual situation
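One way to picture combining the methods is to tag each module with the reliability and extensibility criteria and group modules that share both traits. The sketch below assumes that each module's "core" and "stable" traits are already known booleans. The module names are hypothetical, and the performance criterion is not modelled. The resulting groups are only candidates; you would still check them against the team-size budget above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Module:
    name: str
    core: bool    # reliability split: does an outage directly hurt the main business?
    stable: bool  # extensibility split: is the module mature and rarely changed?


def group_modules(modules: Iterable[Module]) -> Dict[str, List[str]]:
    """Modules sharing both traits become candidates for the same service."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for m in modules:
        key = ("core" if m.core else "non-core") + "/" + ("stable" if m.stable else "variable")
        groups[key].append(m.name)
    return dict(groups)


print(group_modules([
    Module("order", core=True, stable=False),
    Module("log-report", core=False, stable=True),
    Module("upgrade", core=False, stable=True),
]))
```

As in the extensibility example, the stable, non-core log and upgrade modules end up in the same group.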

Infrastructure

  • “Automation” is essential. If the supporting infrastructure is immature, microservices become a tar pit that traps R&D, testing, and operations staff in all kinds of problems
  • The microservice infrastructure is shown below

Implementing microservices
  • A whole family of open-source microservice infrastructure exists, such as the Spring Cloud project, which covers service discovery, service routing, gateways, configuration centers, and more
  • If the number of microservices is small, not every piece of infrastructure is necessary
Infrastructure priorities
    1. Service discovery, service routing, and service fault tolerance: the most basic microservice infrastructure
    2. Interface framework and API gateway: mainly to improve development efficiency. The interface framework speeds up development of internal services, and the API gateway speeds up integration with external systems
    3. Automated deployment, automated testing, and configuration center: mainly to improve testing and O&M efficiency
    4. Service monitoring, service tracking, and service security: mainly to further improve O&M efficiency
  • Categories 3 and 4 become increasingly important as the number of microservice nodes grows. When there are few nodes, however, they can be handled manually. This is not very efficient, but it is workable

Infrastructure

Automated testing

  • Microservices split what used to be one unified system into many independent “micro” services, greatly increasing the number of interfaces between them. Microservices also advocate fast delivery, with short release cycles and frequent version updates
  • If every update relied on manual regression testing of the whole system, the workload would be heavy, efficiency would be low, and the goal of “fast delivery” would be out of reach. An automated test system must therefore handle most of the regression testing
  • Automated testing covers code-level unit tests, system-level integration tests, and interface tests between systems. Ideally, every type of test is automated
  • If team size or manpower rules out full coverage, at least automate the interface tests
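As a minimal sketch of an automated interface test, the code below starts a stub HTTP service in-process and checks that an endpoint honours an agreed JSON contract. The `/users/42` endpoint, the `{"code", "data"}` body, and the stub handler are hypothetical stand-ins. A real suite would target a deployed test instance of the service. It uses only the Python standard library (`unittest`, `http.server`, `urllib`).

```python
import json
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubUserHandler(BaseHTTPRequestHandler):
    """Stand-in for a hypothetical user service."""

    def do_GET(self):
        if self.path == "/users/42":
            status, body = 200, {"code": 0, "data": {"id": 42, "name": "alice"}}
        else:
            status, body = 404, {"code": 404, "message": "not found"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging during tests


class UserServiceInterfaceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.server = HTTPServer(("127.0.0.1", 0), StubUserHandler)  # port 0: OS picks a free port
        cls.base_url = "http://127.0.0.1:%d" % cls.server.server_address[1]
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_existing_user_follows_contract(self):
        with urllib.request.urlopen(self.base_url + "/users/42", timeout=5) as resp:
            self.assertEqual(resp.status, 200)
            self.assertEqual(resp.headers["Content-Type"], "application/json")
            body = json.loads(resp.read())
        self.assertEqual(body["code"], 0)
        self.assertEqual(body["data"]["id"], 42)

    def test_unknown_user_returns_404(self):
        with self.assertRaises(urllib.error.HTTPError) as ctx:
            urllib.request.urlopen(self.base_url + "/users/999", timeout=5)
        self.assertEqual(ctx.exception.code, 404)
        ctx.exception.close()


def run_interface_tests() -> bool:
    """Entry point a CI job could call after every build or deployment."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserServiceInterfaceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Wiring `run_interface_tests()` into the deployment pipeline is what turns these checks into the automated regression that the frequent releases require.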

Automated deployment

  • Compared with a unified system, microservices multiply the number of deployed nodes several times or even more than tenfold. They are also deployed much more often (for example, our business systems have deployments on 70% of working days). Taken together, microservices require dozens of times as many deployments as a unified system
  • Handling that many deployments manually takes a lot of manpower and is error-prone, so an automated deployment system is required
  • An automated deployment system provides version management, resource management (such as physical-machine and VM management), deployment, and rollback
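The deployment-plus-rollback idea can be sketched as a rolling upgrade. Nodes are upgraded one at a time, and every touched node is reverted if a health check fails. This is an in-memory model only. The `Node` type and the `health_check` callback are hypothetical, and a real system would push packages and restart processes where the comment indicates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    version: str


def rolling_deploy(nodes: List[Node], new_version: str,
                   health_check: Callable[[Node], bool]) -> bool:
    """Upgrade nodes one by one; on the first failed health check, roll back
    every node touched so far and report failure."""
    upgraded: List[Tuple[Node, str]] = []  # (node, previous_version) for rollback
    for node in nodes:
        upgraded.append((node, node.version))
        node.version = new_version  # real system: push package, restart, wait for readiness
        if not health_check(node):
            for touched, previous in reversed(upgraded):
                touched.version = previous
            return False
    return True
```

Upgrading one node at a time limits the blast radius of a bad version, which matters when deployments happen as often as described above.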

Configuration center

  • Microservices have a very large number of nodes, and logging in to each machine to change configuration by hand is inefficient and error-prone
  • This is especially true during deployment or troubleshooting, when configuration must be added, deleted, modified, and queried quickly. Manual operation clearly cannot keep up
  • Some runtime settings must be changed dynamically and take effect immediately on all nodes, which cannot be done manually
  • In short, microservices need a unified configuration center to manage the configuration of all their nodes
  • A configuration center provides these functions: configuration version management, adding, deleting, modifying, and querying configuration, per-node configuration, configuration synchronization, and configuration push. For example, one microservice may have 10 nodes serving China Mobile users and 20 nodes serving China Unicom users, with the same configuration items but different values
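A minimal in-memory sketch of these functions follows. It keeps a version history per node group (for example "mobile" and "unicom" nodes of the same service, sharing keys but not values), supports add/modify/delete and rollback, and pushes every change to subscribed nodes. The class, method names, and group names are all hypothetical. Real configuration centers persist data and push over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Listener = Callable[[int, Dict[str, str]], None]


class ConfigCenter:
    def __init__(self) -> None:
        # group -> list of snapshots; index = version number, version 0 is empty
        self._history: Dict[str, List[Dict[str, str]]] = defaultdict(lambda: [{}])
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def get(self, group: str) -> Tuple[int, Dict[str, str]]:
        versions = self._history[group]
        return len(versions) - 1, dict(versions[-1])

    def set(self, group: str, key: str, value: str) -> None:
        _, current = self.get(group)
        current[key] = value
        self._commit(group, current)

    def delete(self, group: str, key: str) -> None:
        _, current = self.get(group)
        current.pop(key, None)
        self._commit(group, current)

    def rollback(self, group: str, version: int) -> None:
        # Rolling back creates a new version whose content equals an old one.
        self._commit(group, dict(self._history[group][version]))

    def subscribe(self, group: str, listener: Listener) -> None:
        self._listeners[group].append(listener)
        listener(*self.get(group))  # initial sync when a node starts

    def _commit(self, group: str, config: Dict[str, str]) -> None:
        self._history[group].append(config)
        version, snapshot = self.get(group)
        for listener in self._listeners[group]:
            listener(version, dict(snapshot))  # push the change to every subscribed node
```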

Interface framework

  • Microservices advocate lightweight communication, generally through a unified interface protocol such as HTTP/REST or RPC
  • In practice, however, unifying the interface protocol alone is not enough. The format of the data carried over the interface must be unified as well
  • For example, specifying HTTP/REST as the interface protocol is not enough. We also need to specify that the data format is JSON and that the JSON data follows an agreed specification
  • Suppose we specified only HTTP/REST, without JSON or a JSON data specification. Things would get messy: some microservices would use XML, some JSON, and some key-value pairs, and even the JSON ones would use different data formats. Each microservice would then have to adapt to several or even dozens of interface formats, effectively taking over the adaptation work that an ESB used to do. That is clearly too inefficient, so a unified interface framework is needed
  • The interface framework is not a runnable system. It is usually provided as a library or package that every microservice calls. For example, for the JSON specification above, an underlying technical team could provide parsing packages in multiple languages (a Java package, a Python package, a C library, and so on)
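The original JSON specification sample is not reproduced here. The sketch below therefore assumes a hypothetical envelope of the form `{"code": int, "message": str, "data": ...}` to show what one language's package of an interface framework might contain. It encodes responses in one canonical form and rejects payloads that break the agreed format, so individual services never hand-parse XML or ad hoc key-value formats.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Response:
    code: int       # 0 means success in this hypothetical convention
    message: str
    data: Any = None


class InterfaceFormatError(ValueError):
    """Raised when a payload does not follow the agreed envelope."""


def encode(resp: Response) -> str:
    # One canonical serialization so every service emits the same shape.
    return json.dumps({"code": resp.code, "message": resp.message, "data": resp.data},
                      ensure_ascii=False, separators=(",", ":"))


def decode(payload: str) -> Response:
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise InterfaceFormatError(f"not JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise InterfaceFormatError("envelope must be a JSON object")
    code, message = obj.get("code"), obj.get("message")
    if not isinstance(code, int) or isinstance(code, bool) or not isinstance(message, str):
        raise InterfaceFormatError("envelope needs an integer 'code' and a string 'message'")
    return Response(code, message, obj.get("data"))
```

Packages in other languages would implement the same two functions against the same envelope definition.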

API gateway

  • After the system is split into microservices, the internal microservices are interconnected and call each other point to point
  • If an external system also called the system's functions point to point, it would face a major headache
  • An external system neither needs to nor can understand the responsibilities and boundaries of so many microservices. It cares only about the capabilities it needs, not about which microservice provides them
  • External access to the system is also subject to security and permission restrictions. If external systems accessed microservices directly, every microservice would have to implement its own security and permission checks, which is both heavy and repetitive work
  • For these reasons, microservices need a unified API gateway responsible for handling access from external systems
  • The API gateway is the access point for external systems: all external access goes through it. Its functions include access authentication (is access allowed?), permission control (which functions may be accessed?), transmission encryption, request routing, and traffic control
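The gateway's per-request pipeline can be sketched as a chain of checks. The chain covers authentication by API key, longest-prefix request routing, per-client permission control, and traffic control with a token bucket. Transmission encryption (typically TLS terminated at the gateway) is not modelled. The keys, client IDs, and route prefixes are hypothetical, and the token bucket is one common choice of traffic-control technique, not necessarily the one the article has in mind.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple

Handler = Callable[[str, dict], str]


class TokenBucket:
    """Traffic control: refills `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ApiGateway:
    def __init__(self, api_keys: Dict[str, str], permissions: Dict[str, Set[str]],
                 routes: Dict[str, Handler], rate: float, burst: float) -> None:
        self.api_keys = api_keys        # API key -> client id (authentication)
        self.permissions = permissions  # client id -> route prefixes it may call
        self.routes = routes            # route prefix -> internal service handler
        self.buckets = {c: TokenBucket(rate, burst) for c in set(api_keys.values())}

    def handle(self, api_key: str, path: str, request: dict) -> Tuple[int, str]:
        client = self.api_keys.get(api_key)
        if client is None:
            return 401, "unauthenticated"
        prefix: Optional[str] = max((p for p in self.routes if path.startswith(p)),
                                    key=len, default=None)  # longest-prefix routing
        if prefix is None:
            return 404, "no route"
        if prefix not in self.permissions.get(client, set()):
            return 403, "forbidden"
        if not self.buckets[client].allow():
            return 429, "rate limited"
        return 200, self.routes[prefix](path, request)
```

External callers see a single entry point and never learn which internal microservice served the request, and each microservice is spared from re-implementing these checks.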

Service discovery

  • Microservices come in many types and large numbers. Configuring all this information on every node by hand causes two problems. First, the workload is huge: a configuration file may have hundreds or even thousands of lines, and across dozens of nodes that adds up to tens or hundreds of thousands of lines. Maintaining so many configuration items by hand is a disaster
  • Second, microservice nodes change frequently. Nodes are added during capacity expansion, some nodes are isolated during fault handling, and in a gray (canary) release some nodes are upgraded to the new version while old and new versions run side by side
  • In every case, we want node changes to propagate promptly to all dependent microservices. Manual configuration cannot make such changes take effect in real time
  • Therefore, a service discovery system is needed to support automatic registration and discovery of microservices
Service discovery can be implemented in two main ways: self-managed and proxy-based
1. Self-managed
  • The self-managed structure is as follows:

  • In the self-managed architecture, each microservice performs its own service discovery. For example, in the figure, SERVICE INSTANCE A queries the SERVICE REGISTRY for service registration information and then accesses SERVICE INSTANCE B directly
  • Self-managed service discovery is relatively simple to implement, because this functionality is usually provided to every microservice as a unified library or package rather than re-implemented by each one. And because each microservice handles its own discovery, the access load is spread across all microservice nodes, with no significant performance or availability pressure or risk
2. Proxy-based
  • The proxy-based structure is as follows:

  • In the proxy-based architecture, a load balancing system sits between the microservices and performs service discovery on their behalf
  • The proxy-based approach looks cleaner and makes the microservices themselves much simpler to implement, but it is actually riskier
  • The first risk is availability: if the LOAD BALANCER system fails, all calls between microservices may be affected
  • The second risk is performance. All call traffic between microservices passes through the LOAD BALANCER system, and its load grows with the number of microservices and the volume of traffic until it eventually becomes a performance bottleneck
  • The LOAD BALANCER system therefore has to be designed as a cluster, but implementing a LOAD BALANCER cluster adds complexity of its own
  • Whether self-managed or proxy-based, the core of service discovery is the service registry, which records the configuration and status of all service nodes. Each microservice registers its own information with the registry after it starts. The microservices (or the LOAD BALANCER system) then query the registry for available services
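The registry itself can be sketched in memory. Nodes register on startup and renew via heartbeats, and a node whose last heartbeat is older than a TTL is treated as unavailable. The TTL and heartbeat mechanism are assumptions, a common way registries detect dead nodes, not something the article specifies. In the self-managed model a client library would call `lookup`. In the proxy-based model the LOAD BALANCER would.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    def __init__(self, ttl: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._last_seen: Dict[Tuple[str, str], float] = {}  # (service, address) -> time

    def register(self, service: str, address: str) -> None:
        self._last_seen[(service, address)] = self.clock()

    def heartbeat(self, service: str, address: str) -> None:
        self.register(service, address)  # renewing a lease is the same as re-registering

    def deregister(self, service: str, address: str) -> None:
        self._last_seen.pop((service, address), None)

    def lookup(self, service: str) -> List[str]:
        """Addresses of the service's nodes whose heartbeat is within the TTL."""
        now = self.clock()
        return sorted(addr for (svc, addr), seen in self._last_seen.items()
                      if svc == service and now - seen <= self.ttl)
```

Because expiry is automatic, a node isolated during fault handling drops out of `lookup` results without anyone editing configuration files.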

Service routing

  • Service discovery makes it easy for microservices to obtain each other's configuration information. When an actual call is made, however, one specific node must be chosen from all eligible, available nodes to receive the request. This is the job of service routing
  • Service routing is closely tied to service discovery. It is usually not designed as a separate system but implemented together with service discovery
  • With self-managed service discovery, service routing is implemented inside each microservice. With proxy-based service discovery, it is implemented by the LOAD BALANCER system
  • Wherever it is implemented, the core of service routing is the routing algorithm. Common routing algorithms include random, round-robin, least-load, and least-connections routing
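Three of these algorithms fit in a few lines each. The sketch below picks from a fixed node list. In practice the list would come from the service registry and be refreshed as nodes change. Least-load routing is omitted because it needs a load metric (CPU, queue length, and so on) that the article does not define. Class names are illustrative.

```python
import itertools
import random
from typing import Dict, Iterable, Optional


class RandomRouter:
    def __init__(self, nodes: Iterable[str], rng: Optional[random.Random] = None) -> None:
        self.nodes = list(nodes)
        self.rng = rng or random.Random()

    def pick(self) -> str:
        return self.rng.choice(self.nodes)


class RoundRobinRouter:
    def __init__(self, nodes: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(nodes))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsRouter:
    """Tracks in-flight requests per node; callers must release() when a request ends."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.active: Dict[str, int] = {n: 0 for n in nodes}

    def pick(self) -> str:
        node = min(self.active, key=self.active.__getitem__)  # ties go to the first node listed
        self.active[node] += 1
        return node

    def release(self, node: str) -> None:
        self.active[node] -= 1
```

Random and round-robin need no feedback from the nodes. Least-connections adapts to slow nodes but requires the caller to track request completion.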

Service fault tolerance

  • After the system is split into microservices, the probability that any single microservice fails drops and each fault has a smaller scope, but the number of microservice nodes increases greatly
  • Taken as a whole, the probability that some microservice in the system is failing rises sharply. For example, if each of 100 nodes independently has a 1% chance of failing at a given moment, the chance that at least one is failing is 1 - 0.99^100, about 63%
  • Faults in microservices tend to spread. If a fault is not handled promptly, it can propagate until many service nodes in the system are failing. Microservices therefore need to handle such faults automatically and promptly. If every node failure required manual handling, it would take manpower and be slow, and slow handling lets faults spread quickly. This is why services need fault tolerance
  • Common service fault-tolerance techniques include request retry, flow control, and service isolation
  • Service fault tolerance is typically integrated into the service discovery and service routing systems
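Two of these techniques are easy to sketch. The first is request retry with exponential backoff. The second is service isolation as a "bulkhead", which caps concurrent calls to one dependency so that a slow dependency cannot tie up every worker thread. For flow control, see the token bucket in the API gateway section. The retryable exception types, delays, and rejection behaviour are assumptions chosen for illustration.

```python
import threading
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.05,
                    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Request retry: re-invoke func on transient errors, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")


class Bulkhead:
    """Service isolation: at most max_concurrent calls to one dependency at a time."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func: Callable[[], T]) -> T:
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full: call rejected")  # fail fast instead of queueing
        try:
            return func()
        finally:
            self._slots.release()
```

Retries should be limited and backed off. Unbounded immediate retries against a struggling service amplify its load and help the fault spread.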

Service monitoring

  • After microservices are split, the number of nodes increases greatly, and so does the number of objects to monitor: machines, networks, processes, interface calls, and so on. Moreover, when a fault occurs, it must be located quickly from many kinds of information, which is unrealistic to do by hand
  • For example, when users complain about a business problem, collecting and analyzing information manually (such as opening the logs of dozens of nodes) may take more than ten minutes. A service monitoring system is therefore needed to monitor the microservice nodes
Roles
  • Collects and analyzes information in real time, so faults need not be analyzed from scratch after they occur, which shortens handling time
  • Raises early warnings based on real-time analysis, detecting problems at an early stage and reducing their scope and duration

Because service monitoring needs to collect and analyze a large amount of data, it is best built as a separate system rather than integrated into service discovery, the API gateway, or similar systems
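The early-warning role can be sketched as a sliding time window over one interface's calls. The window tracks call count, error rate, and average latency, and raises an alert when the error rate crosses a threshold. The window length, threshold, and minimum sample count are illustrative assumptions, not values from the article. A real monitoring system would aggregate such windows across many nodes and interfaces.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class InterfaceMonitor:
    def __init__(self, window_seconds: float = 60.0,
                 error_rate_threshold: float = 0.05, min_samples: int = 20) -> None:
        self.window = window_seconds
        self.threshold = error_rate_threshold
        self.min_samples = min_samples  # avoid alerting on a handful of calls
        self._events: Deque[Tuple[float, float, bool]] = deque()  # (ts, latency_ms, ok)

    def record(self, ts: float, latency_ms: float, ok: bool) -> None:
        self._events.append((ts, latency_ms, ok))
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and now - self._events[0][0] > self.window:
            self._events.popleft()

    def snapshot(self, now: float) -> Dict[str, float]:
        self._evict(now)
        n = len(self._events)
        if n == 0:
            return {"calls": 0, "error_rate": 0.0, "avg_latency_ms": 0.0}
        errors = sum(1 for _, _, ok in self._events if not ok)
        total_latency = sum(lat for _, lat, _ in self._events)
        return {"calls": n, "error_rate": errors / n, "avg_latency_ms": total_latency / n}

    def should_alert(self, now: float) -> bool:
        stats = self.snapshot(now)
        return stats["calls"] >= self.min_samples and stats["error_rate"] > self.threshold
```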

Service tracking

  • Service monitoring can handle node-level monitoring and information collection, but it struggles to trace the full path of a single request through the microservices. If the complete call-chain information of every request were sent to the service monitoring system in real time, the data volume would be too large to process
  • The difference between service monitoring and service tracing can be summed up as macro versus micro. For example, suppose service A calls service B over HTTP 10 times and B returns a JSON object each time. Service monitoring records aggregates such as the number of requests, response times, and the distribution of error codes. Service tracing records the details of one particular request: when it started, its response time, error code, request parameters, and the returned JSON object
  • Today, whether for distributed tracing or microservice tracing, most request-tracing implementations are based on Google's paper “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure”
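The core Dapper-style idea can be sketched as follows. Every request carries a trace ID shared by all its spans, and each span records its own ID and its parent's ID, so a collector can rebuild the call tree. The header names `x-trace-id` and `x-span-id` are hypothetical, not a standard. Sampling, which Dapper uses to keep data volume manageable, is not shown.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TRACE_HEADER = "x-trace-id"  # hypothetical header names
SPAN_HEADER = "x-span-id"


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span that caused this one; None for the root
    name: str
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: Dict[str, object] = field(default_factory=dict)


class Tracer:
    def __init__(self) -> None:
        self.finished: List[Span] = []  # a real collector would sample and ship these

    def start_span(self, name: str, headers: Optional[Dict[str, str]] = None) -> Span:
        headers = headers or {}
        trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex  # new trace at the entry point
        return Span(trace_id, uuid.uuid4().hex[:16], headers.get(SPAN_HEADER), name)

    def inject(self, span: Span) -> Dict[str, str]:
        # Headers the caller attaches to its outgoing request.
        return {TRACE_HEADER: span.trace_id, SPAN_HEADER: span.span_id}

    def finish(self, span: Span, **tags: object) -> None:
        span.end = time.time()
        span.tags.update(tags)
        self.finished.append(span)
```

In the A-calls-B example, A's handler starts a root span and sends `inject(root)` as headers. B starts its span from those headers, so both spans share one trace ID and B's span points to A's as its parent.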

Service security

  • After the system is split into microservices, data is scattered across the microservice nodes
  • From a connectivity perspective, any microservice can reach all other microservice nodes
  • From a business perspective, however, some sensitive data or operations should be accessible only to certain microservices, not all of them. A service security mechanism is therefore needed to protect services and data
  • Service security consists of three parts: access security, data security, and transmission security
  • Service security can generally be integrated into the configuration center. The configuration center holds each microservice's access security and data security policies, and each microservice node fetches this configuration and handles incoming calls according to those policies
  • Because these policies are generic, they are usually wrapped in a common library that every microservice uses
  • The basic structure is as follows:
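A minimal sketch of such a common library appears below. A node loads its policy, as it might be fetched from the configuration center. It then rejects callers that are not allowed (access security) and masks sensitive fields for callers not entitled to read them (data security). The policy layout, service names, and masking rule are hypothetical. Transmission security (for example TLS between services) is handled outside this code.

```python
from typing import Any, Callable, Dict, Set

# Hypothetical policies, as a node might fetch them from the configuration center.
SECURITY_POLICIES: Dict[str, Dict[str, Any]] = {
    "user-service": {
        # access security: which services may call user-service at all
        "allowed_callers": {"order-service", "api-gateway"},
        # data security: sensitive field -> callers allowed to read it in clear text
        "field_readers": {"phone": {"order-service"}},
    }
}


class AccessDenied(PermissionError):
    pass


class SecurityFilter:
    """Common-library sketch that a service wraps around its request handlers."""

    def __init__(self, service: str, policies: Dict[str, Dict[str, Any]]) -> None:
        policy = policies[service]
        self.allowed_callers: Set[str] = set(policy["allowed_callers"])
        self.field_readers: Dict[str, Set[str]] = policy["field_readers"]

    def handle(self, caller: str, handler: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        if caller not in self.allowed_callers:
            raise AccessDenied(f"{caller} is not allowed to call this service")
        record = handler()
        result: Dict[str, Any] = {}
        for key, value in record.items():
            readers = self.field_readers.get(key)
            # Mask a sensitive field unless this caller is on its reader list.
            result[key] = "***" if readers is not None and caller not in readers else value
        return result
```

Because the policy is data rather than code, tightening access only requires updating the configuration center, not redeploying each microservice.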
