By Zhao Ning (QunqunOPS)

I joined Qunar in 2011 and have worked on the operation and maintenance of private cloud infrastructure and Ceph storage, accumulating extensive O&M experience. I am now in charge of the container and storage group.

I. Introduction

Calico is an open source CNI project that provides networking for containerized applications. This article gives a brief overview of how we use Calico to provide networking capabilities for containerized applications at Qunar.

II. Calico architecture

Calico is a pure Layer 3 data center networking solution. As a CNI plug-in, it provides TCP/IP-based Layer 3 network communication for containers running in Kubernetes, and it can also be integrated with IaaS cloud platforms such as OpenStack. It uses protocols such as BGP and IPIP to provide network connectivity for workloads, enabling efficient and controllable communication between VMs, containers, and physical machines.

Figure 1: Core components

The core components of Calico include:

✧ Felix, the Calico agent, runs on each container host node and is responsible for configuring routes, ACLs, and other information to keep containers reachable;

✧ Etcd, a distributed key/value store, is responsible for the consistency of network metadata to ensure the accuracy of the Calico network state;

✧ The BGP client (BIRD) advertises the routing information written to the kernel by Felix into the Calico network to keep communication between containers effective;

✧ BGP Route Reflector: by default, Calico works in node-to-node mesh mode, in which all nodes are connected to each other. Node-to-node mesh works well in small deployments, but at large scale the number of connections grows very large and consumes too many resources. To avoid this problem, one or more BGP RRs can be used to distribute routes in a centralized manner, reducing network resource consumption and improving the efficiency and stability of Calico.

Calico uses the Linux kernel to implement an efficient vRouter on each container host node to handle data forwarding. Each vRouter advertises the routing information for the workloads running on its node to the rest of the Calico network via the BGP protocol.

Figure 2: Data path

III. Use of Calico in Qunar

In Qunar's network environment, host domain names are a strong dependency. For example, Nginx/OR requires upstream members to be directly reachable by IP, which neither containers themselves nor Kubernetes can satisfy out of the box. Therefore, when we started to evaluate Kubernetes as a container orchestration tool in 2017, direct access to Pods was a mandatory test criterion. After a period of testing, we chose Calico from among Flannel, Cilium, Calico, and other solutions. Calico now provides networking for 4000+ Pods in our ESAAS dedicated cluster, and our online business clusters have also chosen the Calico solution. The main reasons for selecting Calico are as follows:

✧ It is a pure Layer 3 solution. With no overlay network, the design is simple and controllable; there is no packet encapsulation or decapsulation, which saves CPU resources and improves the performance of the whole network. A Layer 3 solution also does not cause ARP broadcast storms as the number of containers changes, and there is no need to worry about network disturbance caused by containers frequently starting and stopping, so network stability is guaranteed;

✧ Pod IPs and Service IPs are directly routable; there is no intermediate hop such as NAT, and all traffic is interconnected via plain IP packets;

✧ It is suitable for large-scale deployment.

To sum up, the Calico network solution is simple, efficient, stable and suitable for large-scale production environments.

The Qunar container cloud platform uses Kubernetes to orchestrate containers. Calico is deployed as a DaemonSet on each host node of the cluster. For large-scale use, we adopt Calico's RR mode: the rack switch acts as the route reflector and establishes BGP neighbor relationships with the host nodes. All host nodes are configured with the same AS number. Each host node advertises routes for its local container IPs to the rack switch, and the switch then advertises those routes to the other nodes in the same AS.

Figure 3: Host BGP Peer list

As shown in Figure 3, the BGP peers on Node1 are the IP addresses of the two rack switches. Two peers are configured for redundancy: when the connection to one peer fails, routing information can still be advertised through the other, so online services are not affected.

Figure 4: Overall architecture

As shown in Figure 4, every Kubernetes node bonds two 10G network adapters, connects to the two rack switches, and establishes iBGP sessions with them. The rack switches connect to the core switches through eBGP. The whole Kubernetes cluster uses the same AS number. Below is the BGPPeer configuration for Calico.

Figure 5: BGP Peer configuration
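As a hedged sketch of what such a BGPPeer resource looks like (the peer IP and AS number below are placeholder values, not our production configuration), a Calico BGPPeer for one rack switch might be:

```yaml
# Sketch of a Calico BGPPeer resource (projectcalico.org/v3 API).
# peerIP and asNumber are placeholders, not Qunar's real values.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack-switch-1
spec:
  peerIP: 10.10.5.1   # address of the rack switch acting as route reflector
  asNumber: 64512     # the cluster-wide iBGP AS number
```

A second BGPPeer pointing at the other rack switch provides the redundancy described above.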

1. Calico IPAM

Calico allocates IP addresses by dividing each IPPool into multiple address blocks (by default /26 blocks). Each node claims one or more blocks, and when a Pod is scheduled to a node, that node preferentially allocates addresses from its own blocks. The hosts in the cluster learn each other's route entries through BGP (BIRD) and add them to their local routing tables, so every host knows which blocks are owned by the other hosts in the cluster.
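As a rough illustration of the block-splitting arithmetic (the pool CIDR below is a made-up example; Calico's default IPAM block size is /26, i.e. 64 addresses per block):

```python
import ipaddress

# Hypothetical IPPool CIDR; Calico's default IPAM block size is /26.
pool = ipaddress.ip_network("10.10.0.0/22")
blocks = list(pool.subnets(new_prefix=26))

print(len(blocks))              # a /22 pool yields 16 /26 blocks
print(blocks[0])                # 10.10.0.0/26
print(blocks[0].num_addresses)  # 64 addresses per block
```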

2. Communication between Pods in the same Kubernetes cluster:

Figure 6: Pod access traffic in the same cluster

As shown in the figure above, when Container1 requests access to Container4, the packet enters the host network stack through the caliXXXX virtual interface of the veth pair. In Node1's routing table there is a route to the network segment where Container4 resides: 10.10.1.0/26 via 10.10.5.10 dev bond0 proto bird. This route was advertised by Node2 after Calico allocated that IP block to it; the next hop toward the destination network 10.10.1.0/26 is 10.10.5.10, which is the IP address of Node2. When the packet reaches host Node2, the route to the destination IP address 10.10.1.13 is looked up: 10.10.1.13 dev cali11239f98883 scope link. This route was added by Calico after Container4 was scheduled to Node2. Finally, the packet is delivered to Container4 via the veth pair cali11239f98883. The return path works the same way in reverse, so the two Pods can establish a connection and communicate with each other.
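The two lookups above boil down to longest-prefix matching. A minimal sketch (the route strings mirror the example addresses; the dict-based table is a simplification, not how the kernel stores routes):

```python
import ipaddress

# Simplified routing tables for the two nodes in the walkthrough.
routes_node1 = {
    "10.10.1.0/26": "via 10.10.5.10 dev bond0 proto bird",  # advertised by Node2
}
routes_node2 = {
    "10.10.1.0/26": "blackhole",                        # covers the whole bound block
    "10.10.1.13/32": "dev cali11239f98883 scope link",  # added when Container4 started
}

def lookup(table, dst):
    """Return the route for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(c) for c in table
               if addr in ipaddress.ip_network(c)]
    best = max(matches, key=lambda n: n.prefixlen)  # longest prefix wins
    return table[str(best)]

print(lookup(routes_node1, "10.10.1.13"))  # forwarded toward Node2 (10.10.5.10)
print(lookup(routes_node2, "10.10.1.13"))  # /32 beats the blackhole: veth delivery
```

Note how on Node2 the more specific /32 route wins over the block-wide blackhole route, which is exactly why the blackhole only matters for addresses with no local /32 (see the borrowing problem below).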

3. Communication between Pods in a Kubernetes cluster and the external network:

Figure 7: Paths to access addresses outside the cluster

Once you understand how Pods within the same cluster communicate with each other, it is easy to understand how containers communicate with addresses outside the cluster. After the eBGP sessions are established between the rack switches and the core switches, the cluster routes are advertised to the core switches, which forward all traffic destined for the container network to the corresponding rack switch. When a Pod accesses an external address (such as Gitlab, which sits outside the cluster network), traceroute shows the following:

First hop: the host IP address of the container's node

Second hop: the gateway of the VLAN where the bond0 IP resides

Third and fourth hops: the IP addresses of the rack switch and core switch

Fifth hop: the IP address of the core switch

Sixth and seventh hops: addresses of network devices in Qunar's underlying network

Eighth hop: the final hop, the destination address

Figure 8: Outbound traffic from the cluster

IV. Problems

We have run into some problems while using Calico. For example: when Calico IPAM assigns an IP, it follows this logic:

1. If an IP Block is already bound to the node, an IP address is assigned from that block.

2. If no address is available in step 1, or the host has no bound IP Block, an unbound IP Block is claimed from the IP Pool and the assignment is retried.

3. If step 2 also fails, all IP Blocks are searched for an unused IP address (borrowing).
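The three-step fallback can be sketched as follows (the block and host structures are simplified stand-ins, not Calico's real data model):

```python
# Toy model of the three-step IPAM fallback described above.
def assign_ip(host, blocks):
    # 1. Prefer a block already bound (affine) to this host.
    for b in blocks:
        if b["host"] == host and b["free"]:
            return b["free"].pop(0)
    # 2. Otherwise claim an unbound block from the pool and allocate from it.
    for b in blocks:
        if b["host"] is None and b["free"]:
            b["host"] = host
            return b["free"].pop(0)
    # 3. Last resort: borrow a free IP from a block bound to another host.
    for b in blocks:
        if b["free"]:
            return b["free"].pop(0)
    raise RuntimeError("IP pool exhausted")

blocks = [
    {"host": "node1", "free": ["10.10.1.1", "10.10.1.2"]},  # bound to node1
    {"host": None,    "free": ["10.10.1.65"]},              # unbound block
]
print(assign_ip("node1", blocks))  # 10.10.1.1  (step 1: affine block)
print(assign_ip("node2", blocks))  # 10.10.1.65 (step 2: claims the free block)
print(assign_ip("node2", blocks))  # 10.10.1.2  (step 3: borrowed from node1)
```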

The problem is this: after all IP Blocks have been allocated to hosts, when a new host joins the cluster, containers started on it can only borrow available IPs from blocks already bound to other hosts. However, when Calico uses BIRD to advertise BGP routes, it installs a blackhole route for each bound IP Block, and because of those blackhole routes the borrowed IP cannot communicate. Before Calico 3.14, our workaround was to take both IP capacity and cluster size into account during cluster planning, and to monitor IP Block usage so that new IP Pools could be added before blocks ran out. From 3.14 onward, strict IP affinity can be configured to turn off IP borrowing.
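On Calico 3.14 and later, strict affinity can be toggled with calicoctl; a sketch (verify the exact flags against your calicoctl version):

```shell
# Enforce strict block affinity so hosts no longer borrow IPs
# from blocks bound to other hosts (Calico >= 3.14).
calicoctl ipam configure --strictaffinity=true

# Check IPAM utilization to see how close the blocks are to exhaustion.
calicoctl ipam show
```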

V. Conclusion

We now know how containers communicate with each other and with addresses outside the cluster. Calico has many more features and usage patterns, such as the Typha mode for very large-scale deployments, which can provide robust and efficient networking for even larger container platforms. Going forward, we will adjust the configuration according to actual usage scenarios to meet the needs and growth of the business.