Preface
The previous article, which was more source-focused, explained how the RemoteDialer module in Rancher implements a duplex tunnel. Here is a quick summary. The need for the tunnel comes from security requirements. A duplex tunnel can be built on a layer 4 connection, but handling sending and receiving on a layer 4 connection is more troublesome than on a layer 7 connection. WebSocket is a layer 7 protocol that supports duplex communication, so it can stand in for a layer 4 connection to implement the duplex tunnel.
In this article, we will look at how Rancher does cluster registration and cluster management based on this tunnel.
Cluster registration
Rancher Server: cluster creation
Assuming that we import an existing cluster rather than provisioning one through an IaaS provider, a cluster import (more accurately, a cluster creation) must be done on the server before the cluster is registered.
Importing a cluster really just creates a cluster instance on Rancher's management side. This instance has no real cluster information, only some metadata such as the cluster ID and the cluster display name. The actual cluster information, such as the cluster endpoint, is populated by the agent during the cluster registration phase.
Of course, Rancher does some extra things, such as:
- Creating a service account
- Creating a cluster registration token, which is used to authenticate the agent
We can ignore these parts for now and start with the cluster registration process initiated by the Agent.
Rancher Agent: Cluster registration
After the cluster is created, you can generate a YAML file for deploying the Agent. After the Agent is deployed on the target K8S cluster, the Agent initiates cluster registration.
The entry point of the agent is cmd/agent/main.go. Ignoring auxiliary code, we can look directly at the run() function. To follow the agent logic, we need the following background knowledge:
- Agents are divided into the cluster agent and the node agent. According to Rancher's architecture description, Rancher tries to connect through the cluster agent first and falls back to the node agent when the cluster agent is unavailable. There is no essential difference between the two, so we only look at the cluster agent.
- Some environment variables determine which code branch is taken: CLUSTER_CLEANUP=false, CATTLE_WRITE_CERT_ONLY=false, CATTLE_TOKEN (the token issued by Rancher), and CATTLE_SERVER (the address of the Rancher server).
The whole process is divided into the following steps:
- Obtain the root CA cert, service account token, and API server endpoint of the K8S cluster. This information is serialized and carried in a header of the request made by the agent, under the key X-API-Tunnel-Params.
- Obtain the cluster registration token and the Rancher server address. The token is carried in the X-API-Tunnel-Token header.
- Repeatedly initiate a WebSocket connection to the server and, once the connection is established, call the callback function onConnect. Note that the WebSocket connection is wrapped in the RemoteDialer client. On the one hand, the callback is invoked after the WebSocket connection is established. On the other hand, once connected, the client can receive requests forwarded by the server and pass them on to the K8S cluster behind it.
The callback function launches some K8S operators; these are minor and are not covered here.
From the above description, we can see that the cluster agent passes tunnel authentication on the server side by carrying the token, and that the K8S cluster information is reported as part of cluster registration.
Cluster management
Cluster management generally includes the following functions:
- Manage metadata of a cluster and manage the mapping between metadata and cluster resources
- Synchronize cluster status, including health status and authentication information
- Manage tunnels established with clusters
- Manage routing and forwarding rules
Rancher Server implements cluster management and routing through the multi-cluster-manager.
multi-cluster-manager
Rancher’s multi-cluster-manager (MCM) is messy: a lot of its code is scattered across the repository and hard to read. Here is a quick summary of what the multi-cluster-manager does:
- Manage the CA, endpoint, and health status of the cluster to access the back-end cluster
- Managing Tunnel Status
- Manage cluster routing, used to decide what requests to forward to where
- Manages the forwarding unit, which is used to forward requests and add various additional information
- Manages cluster status to determine cluster health status
MCM wraps a DeferredServer, which we need not worry about except that it registers a number of API handlers. The registered routes are in pkg/multiclustermanager/routes.go. The two handlers we need to focus on are:
- ConnectHandler: API handler for cluster registration, which is actually the server side of RemoteDialer
- Proxy Handler: API handler used to forward requests to back-end clusters
These two are at the heart of Rancher’s tunnel management.
remotedialer server
The ConnectHandler acts as the RemoteDialer server, which was described in detail in the previous article and is not covered again here. Readers should remember that a custom authorizer can be registered in RemoteDialer for tunnel authentication. Rancher declares its authorizer in the Authorize function of pkg/tunnelserver/tunnel.go. Here is a brief description of the process:
- Read the X-API-Tunnel-Token header and check whether the cluster registration token exists in the database. In this step, the token is used to find the cluster, so the agent does not need to carry the cluster ID, only the token. This information is used for tunnel authentication.
- Read the X-API-Tunnel-Params header, parse the K8S cluster information, and update it on the cluster CRD. This information is used when forwarding requests.
This information is reported every time the agent reconnects, so that Rancher can keep the cluster information up to date.
In terms of token management, Rancher does not use JWT, which is common in the industry, but generates its own token strings, which seems unnecessary. We will set authentication aside for now.
proxy service
The proxy handler is actually the package "github.com/rancher/rancher/pkg/k8sproxy". Interested readers can read the source code; after a pile of roundabout logic, you finally arrive at the package "github.com/rancher/rancher/pkg/clusterrouter/proxy".
How is the cluster information reported by the agent used? This touches on authentication and authorization, which will be explained in detail in the authentication section. The key steps performed by the proxy are briefly introduced here:
- Check whether the cluster is reachable, and return an error if it is not
- Get the cluster API endpoint
- Get the service account token reported by the agent. Holding this token grants access to the K8S cluster, so it is added as the Authorization header
- Forward the request to the back-end cluster
The request is then routed through the RemoteDialer tunnel to the API server of the back-end K8S cluster.
Readers who have studied authentication will notice that the forwarded request carries only authentication information, not identity information. Rancher fills in the identity information in another component (see pkg/auth/requests/filter.go) using K8S impersonation. That is not covered here and will be described in detail later.
Dealing with high availability
If the agent maintains only one long-lived connection, the tunnel can become a bottleneck under high concurrency. Rancher does not have a solution for this at present.
If you recall, the RemoteDialer package is designed to support multiple replicas: the list of sessions maintained on the server side is an array. This means that when there are multiple clients, the server could apply a load-balancing algorithm. In practice, multiple agents should be able to register with Rancher at the same time, and Rancher should balance load across them so that the WebSocket connection does not become a bottleneck. Supporting high availability may therefore not be difficult. If high availability is handled well in the tunnel, it can be provided transparently to the upper layers.
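To make the idea concrete, here is a minimal round-robin sketch over a list of sessions. It illustrates the suggestion in the text, not anything Rancher or RemoteDialer implements; the sessionPool type and the string session identifiers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sessionPool holds the sessions registered for one cluster, one per
// connected agent. Session identifiers are plain strings for illustration.
type sessionPool struct {
	mu       sync.Mutex
	sessions []string
	next     int
}

// pick returns the sessions in turn, so that requests are spread evenly
// across all agents connected for the cluster.
func (p *sessionPool) pick() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.sessions) == 0 {
		return "", errors.New("no agent connected")
	}
	s := p.sessions[p.next%len(p.sessions)]
	p.next = (p.next + 1) % len(p.sessions)
	return s, nil
}

func main() {
	pool := &sessionPool{sessions: []string{"agent-a", "agent-b"}}
	for i := 0; i < 4; i++ {
		s, err := pool.pick()
		if err != nil {
			panic(err)
		}
		fmt.Println("request", i, "goes through", s)
	}
}
```

Round-robin is only the simplest choice; a real scheme would also need to drop sessions that disconnect and might weight agents by load.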
Conclusion
Based on this tunnel, Rancher completed cluster registration, tunnel authentication, cluster information reporting, and cluster management with MCM on the server side:
- Uplink: reports cluster information and cluster authentication information
- Downlink: Forwards the request
There are many technical details not mentioned in this article, most of which concern authentication and authorization. The next article will cover these details separately.