Microservices architecture pain points
Business code has to handle communication between services
Service iteration slows down
Microservices Architecture 1.0
One gateway layer, multiple business logic layers, multiple data access layers, multiple DBs/caches, plus a registry and a configuration center
Microservices Architecture 2.0 - Service Mesh
Infrastructure upgrades are difficult
This limits how much and how fast the infrastructure team can deliver
Because applications pull in the communication components as JAR packages,
upgrading a communication component requires every application to upgrade its JAR package version
Communication across multiple programming languages
Maintaining a separate set of infrastructure for each language the business uses is expensive
Microservices architecture evolution
Service Mesh Definition
Service Mesh Architecture
Problem: the call chain gets longer
Average response time (RT) goes up
However, the overhead between an application and the sidecar on the same host does not exceed 1 millisecond
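To make the 1 ms figure concrete, here is a back-of-the-envelope upper bound. It assumes every remote call crosses two app-to-sidecar hops (caller app to caller sidecar, callee sidecar to callee app) at the quoted ceiling of 1 ms each, and that the sidecar-to-sidecar hop simply replaces the direct network hop. The function and constant names are illustrative only.

```python
# Back-of-the-envelope bound on the extra RT sidecars add to a call chain.
# Assumptions: each app <-> local sidecar hop costs at most 1 ms, and each
# remote call crosses two such hops (caller side and callee side).
MAX_LOCAL_HOP_MS = 1.0
LOCAL_HOPS_PER_CALL = 2


def max_added_latency_ms(chain_depth: int) -> float:
    """Worst-case extra latency for `chain_depth` sequential remote calls."""
    if chain_depth < 0:
        raise ValueError("chain_depth must be non-negative")
    return chain_depth * LOCAL_HOPS_PER_CALL * MAX_LOCAL_HOP_MS


if __name__ == "__main__":
    for depth in (1, 3, 5):
        print(f"chain depth {depth}: at most {max_added_latency_ms(depth)} ms extra")
```

For a chain A -> B -> C (depth 2) the bound is 4 ms; since 1 ms is a ceiling, the real overhead is at most this.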
Open source frameworks
Linkerd (the earliest)
Applications communicate with the sidecar over TCP or HTTP/1.1, and the two keep a long-lived connection
Istio
- Pilot
The control center
1. Control communication between proxies
2. Load balancing
- Mixer
Data collection service:
After each call between proxies, metrics (latency and request count) are reported.
Everything is reported synchronously
Centralized collection is unreliable
Mixer's performance directly affects the proxy's own performance
- Citadel
Handles authentication and security
Authenticates proxies, for example via TLS/SSL
SOFAMesh
Open-sourced by Ant Financial
Architecture
1. Rewrites Istio's proxy
Istio's proxy (Envoy) is written in C++
SOFAMesh rewrites it in Go
2. Istio's data collection is centralized, while SOFAMesh's is distributed: each proxy has its own Mixer
3. No company has yet recommended running Istio at large scale, and its community is not active
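The difference between synchronous, centralized reporting and a Mixer embedded in each proxy can be sketched as a local aggregator: the proxy records each call in memory on the request path and only hands off per-service (count, average latency) snapshots when flushed. Class and method names are hypothetical; this is not SOFAMesh's or Istio's actual API, and the periodic timer that would call `flush` is left out.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Tuple

Snapshot = Dict[str, Tuple[int, float]]  # service -> (count, avg latency ms)


class LocalMetrics:
    """Hypothetical per-proxy metrics aggregator (an embedded 'Mixer').

    `record` is a cheap in-memory update, so the request path never waits
    on a remote collector. `flush` is meant to be called periodically.
    """

    def __init__(self, sink: Callable[[Snapshot], None]):
        self._sink = sink                       # where snapshots are shipped (assumed)
        self._lock = threading.Lock()
        self._count: Dict[str, int] = defaultdict(int)
        self._total_ms: Dict[str, float] = defaultdict(float)

    def record(self, service: str, latency_ms: float) -> None:
        with self._lock:
            self._count[service] += 1
            self._total_ms[service] += latency_ms

    def flush(self) -> Snapshot:
        # Take the current window, reset it, then report outside the lock.
        with self._lock:
            snapshot = {svc: (n, self._total_ms[svc] / n)
                        for svc, n in self._count.items()}
            self._count.clear()
            self._total_ms.clear()
        self._sink(snapshot)
        return snapshot


if __name__ == "__main__":
    m = LocalMetrics(sink=print)
    m.record("serviceB", 10.0)
    m.record("serviceB", 30.0)
    m.flush()
```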
Sina Weibo Mesh
What the service mesh does
How to choose
1. Upgrading business code costs too much. To cut that cost to zero, the mesh must be compatible with every existing RPC usage
2. The goal is that the business only has to swap its RPC JAR package for the RPC Mesh JAR package
In-house development approach
1. Be compatible with traditional physical machines, virtual machines, and the cloud
2. The control center includes the service management platform and the data collection center
Architecture design
1. Data collection center:
A. Metric: collects call time and response time
B. Trace: distributed request tracing (APM)
C. Alarm: alerting
2. Protocol
A. RPC: compatible with the old RPC protocols
B. Mesh: includes the communication protocols (HTTP/1.1 and HTTP/2) and the data protocol (Protobuf)
(Note: HTTP/1.0 uses short connections by default; HTTP/1.1 and HTTP/2 support keep-alive long connections; TCP is a long connection, and over a long connection the server can also push data directly to the client.)
3. Health checks between sidecars are done by the sidecars themselves, not through the registry
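A registry-free health check of the kind described in item 3 could look roughly like this: each sidecar probes its peers itself and keeps its own routing table, dropping an endpoint after a few consecutive failures and restoring it on the next success. The class name, the `probe` callback, and the threshold of 3 are all assumptions for illustration.

```python
from typing import Callable, Dict, List


class LocalRoutingTable:
    """Hypothetical per-sidecar routing table driven by peer health checks.

    No registry round trip: the sidecar probes peers directly and removes an
    endpoint after `fail_threshold` consecutive failures; one success restores it.
    """

    def __init__(self, endpoints: List[str], probe: Callable[[str], bool],
                 fail_threshold: int = 3):
        self._probe = probe                   # e.g. a TCP connect or ping (assumed)
        self._threshold = fail_threshold
        self._failures: Dict[str, int] = {ep: 0 for ep in endpoints}

    def check_all(self) -> None:
        # One probe round; meant to run on a timer inside the sidecar.
        for ep in self._failures:
            if self._probe(ep):
                self._failures[ep] = 0
            else:
                self._failures[ep] += 1

    def healthy(self) -> List[str]:
        # Endpoints currently eligible for routing.
        return [ep for ep, n in self._failures.items() if n < self._threshold]


if __name__ == "__main__":
    down = {"10.0.0.2:8080"}
    table = LocalRoutingTable(["10.0.0.1:8080", "10.0.0.2:8080"],
                              probe=lambda ep: ep not in down)
    for _ in range(3):
        table.check_all()
    print(table.healthy())
```

Requiring several consecutive failures avoids flapping on a single lost probe; the trade-off is a short window where calls may still hit a dead endpoint.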
The overall process
A user issues a command to circuit-break (fuse) service B
1. The service management platform, control center, and data collection center already exist (introduced in previous articles), so the in-house Service Mesh only needs to implement the proxy
2. The service and the proxy used to run in one process
They now need to be split into two separate processes
3. Both are placed in the same pod
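On the proxy side, the "circuit-break service B" command can be sketched as a set of broken services that the proxy checks before every outbound call, failing fast without touching the network. The command methods are assumed to be invoked when the control center pushes the user's command; all names here are hypothetical, and only a manual break/restore is modeled (no automatic error-rate tripping).

```python
from typing import Callable, Set


class CircuitOpenError(Exception):
    """Raised when a call targets a service that has been circuit-broken."""


class Proxy:
    """Hypothetical sidecar proxy honouring circuit-break commands.

    The control center is assumed to call break_service / restore_service;
    business code only ever goes through invoke.
    """

    def __init__(self, transport: Callable[[str, bytes], bytes]):
        self._transport = transport           # the real network call (assumed)
        self._broken: Set[str] = set()

    def break_service(self, service: str) -> None:
        self._broken.add(service)

    def restore_service(self, service: str) -> None:
        self._broken.discard(service)

    def invoke(self, service: str, payload: bytes) -> bytes:
        if service in self._broken:
            # Fail fast locally instead of sending traffic to the broken service.
            raise CircuitOpenError(f"{service} is circuit-broken")
        return self._transport(service, payload)


if __name__ == "__main__":
    proxy = Proxy(transport=lambda svc, p: b"ok")
    proxy.break_service("serviceB")
    try:
        proxy.invoke("serviceB", b"req")
    except CircuitOpenError as exc:
        print(exc)
```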
Does it matter if the sidecar dies?
No impact.
How is a failed sidecar pod handled?
If the sidecar fails, the failure is detected and the current pod is killed; Kubernetes then automatically starts a new pod
What is the architecture when two applications run on the same physical machine?
Drift
1. Log drift
Service 1 generates log 1 on server 1
If service 1 on server 1 goes down,
service 2 is started on server 2 and generates log 2
If log 1 and log 2 are strongly dependent,
service 1 must be restarted on server 1 so it can keep writing on top of log 1
2. Retry failover
If a pod goes down and is restarted, its IP changes
For retries this does not matter: the call can drift to any node on the cloud
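Because the retried call may land on any node, retry failover can be sketched as "try a different node after each failure". This assumes a stateless service, a node list freshly pulled from the registry (so a restarted pod's new IP is already included), and that `ConnectionError` marks a node as unusable; the function name and parameters are hypothetical.

```python
import random
from typing import Callable, List, Optional


def call_with_failover(nodes: List[str], call: Callable[[str], bytes],
                       max_attempts: int = 3,
                       rng: Optional[random.Random] = None) -> bytes:
    """Retry a stateless call on a different node after each failure.

    Sketch only: nodes are tried in random order, each at most once.
    """
    rng = rng or random.Random()
    candidates = list(nodes)
    rng.shuffle(candidates)
    last_error: Optional[Exception] = None
    for node in candidates[:max_attempts]:
        try:
            return call(node)
        except ConnectionError as exc:
            last_error = exc                  # drift to the next node
    attempts = min(max_attempts, len(candidates))
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


if __name__ == "__main__":
    def flaky(node: str) -> bytes:
        if node == "10.0.0.1":
            raise ConnectionError(f"{node} is down")
        return f"served by {node}".encode()

    print(call_with_failover(["10.0.0.1", "10.0.0.2"], flaky, max_attempts=2))
```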
Complete flow chart
This complete flow chart covers
DNS, CDN, Nginx, FastDFS (or Ceph),
LVS, Service Mesh, data collection center,
Registry, Control Center, Gateway, business logic layer,
Data access layer, storage layer and other data interaction processes
Call chain
1. Protocol parsing exists to stay compatible with old protocols
After the client sends a request, both the client side and the server side parse the protocol
If both sides use the mesh protocol, no parsing or re-encapsulation is needed
2. The client must still do serialization and deserialization; this is independent of communication
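The pass-through rule above can be sketched as a single decision in the proxy: if sender and peer speak the same protocol the message is forwarded untouched, otherwise only the envelope is re-encapsulated while the already-serialized body is kept as-is. The `Message` type and protocol tags are hypothetical, and real frame conversion is omitted.

```python
from dataclasses import dataclass

RPC, MESH = "rpc", "mesh"


@dataclass(frozen=True)
class Message:
    protocol: str   # RPC or MESH
    body: bytes     # payload already serialized by the client


def forward(msg: Message, peer_protocol: str) -> Message:
    """Return the message the proxy should send to a peer speaking `peer_protocol`."""
    if msg.protocol == peer_protocol:
        # Same protocol on both sides: no parsing, no re-encapsulation.
        return msg
    # Protocols differ: re-encapsulate. The body is NOT re-serialized;
    # only the envelope changes (real framing omitted in this sketch).
    return Message(peer_protocol, msg.body)


if __name__ == "__main__":
    print(forward(Message(RPC, b"\x01\x02"), MESH))
```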
Caller sequence diagram
Server sequence diagram
The cache manages multiple maps:
The functions a service provider exposes are found by scanning its JAR package with reflection, which yields the class and method names the service provides
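As a rough Python stand-in for scanning a JAR with Java reflection, the sketch below inspects provider objects with the standard `inspect` module and builds one of those cache maps: `"ClassName.method"` to a callable the server can dispatch to. `UserService` and `build_method_map` are hypothetical; the Java version would iterate over classes loaded from the JAR instead.

```python
import inspect
from typing import Callable, Dict, Iterable


def build_method_map(services: Iterable[object]) -> Dict[str, Callable]:
    """Map "ClassName.method" to a bound method for every public method."""
    table: Dict[str, Callable] = {}
    for svc in services:
        cls_name = type(svc).__name__
        for name, member in inspect.getmembers(svc, predicate=inspect.ismethod):
            if not name.startswith("_"):      # skip private/internal methods
                table[f"{cls_name}.{name}"] = member
    return table


class UserService:                            # hypothetical provider class
    def get_user(self, user_id: int) -> str:
        return f"user-{user_id}"

    def _internal(self) -> None:
        pass


if __name__ == "__main__":
    methods = build_method_map([UserService()])
    # A request naming "UserService.get_user" is dispatched via the map.
    print(methods["UserService.get_user"](42))
```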
Protocol design
Data protocol
1. Protocol Buffers
2. A message consists of a separator, a version number, and the Mesh message
1. The transport protocol carries a version number
For example, version 1 indicates the RPC protocol
and version 2 indicates the mesh protocol
so the old and new protocols can be told apart by version number
2. Consecutive packets are delimited by head and tail separators
3. Each separator is 5 bytes
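A frame codec following this layout (5-byte head separator, version number, message body, 5-byte tail separator) might look like the sketch below. The separator byte values and the one-byte version field are assumptions, not the actual Weibo Mesh wire format. The decoder returns any incomplete trailing bytes so the caller can prepend them to the next read from the TCP connection.

```python
from typing import List, Tuple

HEAD = b"\xfeMSH\x01"      # hypothetical 5-byte head separator
TAIL = b"\xfeMSH\x02"      # hypothetical 5-byte tail separator
VERSION_RPC, VERSION_MESH = 1, 2


def encode(version: int, body: bytes) -> bytes:
    """Frame = head separator | 1-byte version | body | tail separator."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit in one byte")
    if TAIL in body:
        raise ValueError("body must not contain the tail separator")
    return HEAD + bytes([version]) + body + TAIL


def decode(stream: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a byte stream into (version, body) frames plus leftover bytes."""
    frames: List[Tuple[int, bytes]] = []
    pos = 0
    while True:
        start = stream.find(HEAD, pos)
        if start < 0:
            # Keep a possible partial head separator at the end of the buffer.
            return frames, stream[max(pos, len(stream) - len(HEAD) + 1):]
        body_start = start + len(HEAD) + 1    # skip head and version byte
        end = stream.find(TAIL, body_start)
        if end < 0:
            return frames, stream[start:]     # incomplete frame: wait for more
        frames.append((stream[start + len(HEAD)], stream[body_start:end]))
        pos = end + len(TAIL)


if __name__ == "__main__":
    wire = encode(VERSION_RPC, b"old") + encode(VERSION_MESH, b"\x08\x96\x01")
    print(decode(wire))
```

Pure separator framing forbids the tail separator inside the body, which binary Protobuf payloads cannot guarantee on their own; a length field after the version would remove that restriction.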
Mesh communication protocol
1. TCP long connections
2. HTTP/1.1 or HTTP/2
Hybrid Cloud Deployment
1. Caller
A. Sidecar + service (Mesh)
B. Service only (RPC)
2. Provider
A. Sidecar + service (Mesh)
B. Service only (RPC)
Access flow
1. On startup, each Mesh service or plain RPC service registers with the registry, so the registry knows each node's service type
2. The caller pulls the service information, learns the provider's service type, and picks the matching protocol for the call
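The two access steps can be sketched with an in-memory registry that records each node's service type, plus a protocol choice on the caller side. The rule "use the mesh protocol only when both ends run a sidecar, otherwise fall back to RPC" is an assumption consistent with the compatibility goal above; names and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Registry:
    """Hypothetical in-memory registry: service -> {address: service type}."""
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, service: str, address: str, kind: str) -> None:
        # Step 1: every node registers itself with its service type.
        if kind not in ("mesh", "rpc"):
            raise ValueError(f"unknown service type: {kind}")
        self.nodes.setdefault(service, {})[address] = kind

    def lookup(self, service: str) -> Dict[str, str]:
        # Step 2: the caller pulls a copy of the provider list.
        return dict(self.nodes.get(service, {}))


def choose_protocol(caller_is_mesh: bool, provider_kind: str) -> str:
    # Assumed rule: mesh protocol only when both ends have a sidecar.
    return "mesh" if caller_is_mesh and provider_kind == "mesh" else "rpc"


if __name__ == "__main__":
    reg = Registry()
    reg.register("serviceB", "10.0.0.5:9000", "mesh")
    reg.register("serviceB", "10.0.0.6:9000", "rpc")
    for addr, kind in reg.lookup("serviceB").items():
        print(addr, choose_protocol(True, kind))
```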
Details
1. Circuit breaking is done in the mesh, with no involvement from the business side
2. Downstream retry counts are the same across a service: the granularity is the service, not the interface
3. Health checks between proxies (Mesh) are distributed; as soon as a problem occurs upstream or downstream of a proxy, its local routing table is updated
4. Load-balancing algorithms: Random, RR, and Hash (mainly consistent hashing)
(RR, round robin:
the first request is routed to the first node,
the second request to the second node,
the third request to the third node,
the fourth request back to the first node,
and so on.)
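The three load-balancing strategies can be sketched as follows. The consistent-hash ring uses MD5 and 100 virtual nodes per real node; both are illustrative choices, not necessarily what Weibo Mesh uses. The useful property is that removing a node only remaps the keys that were on that node.

```python
import bisect
import hashlib
import itertools
import random
from typing import List


def pick_random(nodes: List[str], rng: random.Random) -> str:
    return rng.choice(nodes)


class RoundRobin:
    """Node 1, node 2, node 3, then back to node 1, and so on."""

    def __init__(self, nodes: List[str]):
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        return next(self._cycle)


class ConsistentHash:
    """Hash ring with virtual nodes; a key goes to the next point clockwise."""

    def __init__(self, nodes: List[str], replicas: int = 100):
        self._ring = sorted((self._hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(replicas))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        # Wrap around to the first point when the hash is past the last one.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]


if __name__ == "__main__":
    rr = RoundRobin(["n1", "n2", "n3"])
    print([rr.pick() for _ in range(4)])
    print(ConsistentHash(["n1", "n2", "n3"]).pick("user-42"))
```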
Future architecture
Two platforms, one center, one trend
Service Mesh decouples the platform from the services
Container cloud elastic platform
Service governance platform (control center, registry, data collection center)
Artificial intelligence (AI)
Call relationships: how the service management platform collects and stores the data
From the provider and caller perspectives
Service provider: 5 million records per minute
Caller: 500,000 records per minute
Total: 5.5 million records per minute
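Scaling the per-minute figures up shows why storage needs its own design. The arithmetic below only restates the numbers above (5,000,000 + 500,000 records per minute) per second and per day, assuming the rate is constant around the clock.

```python
# Call-relationship records, from the figures above (records per minute).
PROVIDER_PER_MIN = 5_000_000
CALLER_PER_MIN = 500_000

per_minute = PROVIDER_PER_MIN + CALLER_PER_MIN   # 5,500,000
per_second = per_minute / 60                     # about 91,667
per_day = per_minute * 60 * 24                   # 7,920,000,000 (constant rate assumed)

if __name__ == "__main__":
    print(f"{per_minute:,} per minute, ~{per_second:,.0f} per second, {per_day:,} per day")
```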
Storage schemes
Scheme 1
Scheme 2
Repeated data is extracted as metadata
Scheme 3
The actual call data volume is only 1/10 of Scheme 1's
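Scheme 2's idea of extracting repeated data as metadata can be sketched like this: each distinct (caller, provider, method) triple is stored once with an integer id, and every call row keeps only that id plus its per-call fields. The record layout is an assumption; the source does not specify which fields are repeated.

```python
from typing import Dict, List, Tuple

# Assumed record layout: caller, provider, method, timestamp, latency_ms.
Record = Tuple[str, str, str, int, float]
MetaKey = Tuple[str, str, str]


def extract_metadata(records: List[Record]) -> Tuple[Dict[MetaKey, int],
                                                     List[Tuple[int, int, float]]]:
    """Store each distinct (caller, provider, method) once; rows keep only its id."""
    meta: Dict[MetaKey, int] = {}
    rows: List[Tuple[int, int, float]] = []
    for caller, provider, method, ts, latency in records:
        # New triples get the next id; known triples reuse their id.
        meta_id = meta.setdefault((caller, provider, method), len(meta))
        rows.append((meta_id, ts, latency))
    return meta, rows


if __name__ == "__main__":
    recs = [("A", "B", "getUser", 1, 3.0),
            ("A", "B", "getUser", 2, 4.0),
            ("C", "B", "getUser", 3, 5.0)]
    print(extract_metadata(recs))
```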