This document was first written in June 2018; at the time, the latest kOPS release was 1.9.1.
This article does not cover Kubernetes basics. If you are not yet familiar with them, go to the official website and run through a Minikube demo first.
0. Objective and context
The Jike technology team practices a DevOps culture and deploys its Kubernetes clusters with the goal of being Production Ready, meaning cluster stability must meet production standards.
We have used two Kubernetes deployment solutions in production so far, and both had stability and other issues.
First solution: Juju
Juju was easy to use, but the community was not active, customizability was weak, and the Ubuntu instances later ran into kernel panic issues.
Second solution: Kubespray
When we chose the second solution, kOPS had not yet integrated Gossip and could not be used in AWS China, so we picked the highly customizable Kubespray. Kubespray itself is not the main problem; rather, it is too flexible and lacks best practices. As a result, we ran into these problems:
- Based on past experience we chose CentOS as the operating system, but later we hit Docker issue 5618 frequently and there was no good fix; this bug is triggered more easily on CentOS/RHEL.
- Launch Configurations and Auto Scaling Groups had to be managed with our own scripts, which was error-prone;
- Scaling down was tedious: the node had to be removed from the Auto Scaling Group, drained, deleted from Kubernetes, and, if Calico is used, removed from the Calico configuration before the machine could be shut down (roughly the steps sketched below).
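A rough sketch of the manual scale-down we ended up scripting around Kubespray; all names are placeholders, and the calicoctl step only applies if Calico is the CNI:
NODE=your-node-name
INSTANCE_ID=your-instance-id
ASG=your-asg-name

# 1. Detach the instance from the Auto Scaling Group
aws autoscaling detach-instances --instance-ids $INSTANCE_ID \
    --auto-scaling-group-name $ASG --should-decrement-desired-capacity

# 2. Drain the node and remove it from Kubernetes
kubectl drain $NODE --ignore-daemonsets --delete-local-data
kubectl delete node $NODE

# 3. Remove it from the Calico configuration, then shut the machine down
calicoctl delete node $NODE
aws ec2 terminate-instances --instance-ids $INSTANCE_ID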
In choosing the third solution we learned from the first two and needed to strike a balance between ease of use and customizability. kOPS had by then integrated the Gossip deployment option and looked like a mature solution. We still ran into many problems during the rollout, enough that there were several service failures during the half month of switching clusters. The rest of this article covers the major pits we stepped in, though some are surely missing. Questions and comments are welcome.
1. Platform and tool selection
- Deployment tool: kOPS, because it integrates the following functions directly without much customization
- EBS -> PV
- LoadBalancer
- flannel vxlan
- AutoScalingGroup
- Operating system: Debian Jessie; although less feature-rich than Ubuntu, it is stable
2. Problems and solutions
AWS China does not provide Route53 DNS service
Since kOPS 1.7, gossip has been supported. As long as the cluster name ends in k8s.local, there is no need to do any DNS configuration.
Missing base AMI
There is no base image like k8s-1.8-debian-jessie-amd64-hvm-ebs-YYYY-MM-DD in the Marketplace in China, so you need to build one yourself. The resulting image has a customized kernel with tuned kernel parameters and can be put directly into production use.
The pit of flannel
To use flannel you also need to load the br_netfilter kernel module, otherwise you will run into this kube-dns problem. The easiest approach is to write br_netfilter into /etc/modules before building the AMI.
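A minimal sketch of loading the module and making it persistent when baking the AMI:
# Load br_netfilter immediately and verify that bridged traffic goes through iptables
sudo modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables

# Make it persistent so every instance built from the AMI loads it at boot
echo br_netfilter | sudo tee -a /etc/modules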
Some of the basic components are still hosted in places that are difficult to download from
kOPS originally supported configuring a containerRegistry, but the current version seems to have a bug and this option does not work, so some extra work is needed, mainly:
- Offline mode
- Create an EC2 instance with the AMI you built, then set up the proxy, and pull the following images:
gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.1.2-r2
gcr.io/google_containers/etcd:2.2.1
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.10
gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.10
gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.10
gcr.io/google_containers/pause-amd64:3.0
gcr.io/google_containers/kube-apiserver:v1.9.3
gcr.io/google_containers/kube-controller-manager:v1.9.3
gcr.io/google_containers/kube-proxy:v1.9.3
gcr.io/google_containers/kube-scheduler:v1.9.3
Tip: if you are not sure which images you need, you can try a deployment with a global HTTP proxy set up, and then you will see the full list of images
Tip: however, a cluster deployed behind an HTTP proxy should not be used in a production environment
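A sketch of mirroring those images into a registry reachable from China; the proxy address and target registry are placeholders, and the loop only lists two images for brevity:
# Configure an HTTP proxy for the Docker daemon via a systemd drop-in
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://your-proxy:port"
EOF
sudo systemctl daemon-reload && sudo systemctl restart docker

# Pull each gcr.io image, retag it, and push it to your own registry
for img in gcr.io/google_containers/kube-apiserver:v1.9.3 \
           gcr.io/google_containers/pause-amd64:3.0; do    # ...plus the rest of the list above
    docker pull "$img"
    docker tag "$img" "your-registry/${img#gcr.io/}"
    docker push "your-registry/${img#gcr.io/}"
done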
3. Start deployment
3.0 Preparations
- Create an IAM user; the Route53 part can be skipped
- Pick a cluster name ending in .k8s.local, for example jike-test.k8s.local
3.1 Configuring the AWS Environment
In the following scripts, all names starting with your- are placeholders; change them to your own values.
# Install awscli
pip install awscli

# Generate the local configuration used by awscli; enter the key and secret
# obtained in the previous step, plus the region (Beijing is cn-north-1)
aws configure

export AWS_REGION=$(aws configure get region)
export AWS_AZ=a                                  # availability zone suffix
export NAME=your-cluster.k8s.local               # cluster name

# Create an S3 bucket to store the kOPS state
export KOPS_STATE_STORE=your-kops-store
aws s3api create-bucket --bucket $KOPS_STATE_STORE \
    --create-bucket-configuration LocationConstraint=$AWS_REGION

# Generate a public key from the key pair; it will be copied to every node in the cluster
export SSH_PUBLIC_KEY=your-public-key.pub
ssh-keygen -f pemfile.pem -y > ${SSH_PUBLIC_KEY}

export VPC_ID=your-vpc                           # which VPC the cluster joins
export VPC_NETWORK_CIDR=your-vpc-network-cidr    # VPC CIDR
export SUBNETS=your-vpc-subnet                   # VPC subnet
export UTILITY_SUBNETS=your-vpc-utility-subnet   # needed if you deploy a bastion host
export AMI=your-ami                              # the AMI built earlier

export KUBERNETES_VERSION="v1.9.3"               # the recommended version is still 1.9.3
export KOPS_VERSION="1.9.1"

# Required for offline mode
export ASSET_BUCKET="kops-asset"
export KOPS_BASE_URL="https://s3.cn-north-1.amazonaws.com.cn/$ASSET_BUCKET/kops/$KOPS_VERSION/"
export CNI_VERSION_URL="https://s3.cn-north-1.amazonaws.com.cn/$ASSET_BUCKET/kubernetes/network-plugins/cni-plugins-amd64-v0.6.0.tgz"   # the CNI plugins
export CNI_ASSET_HASH_STRING="d595d3ded6499a64e8dac02466e2f5f2ce257c9f"   # sha1 hash of the CNI tarball

# Comma-separated CIDRs restricting the source IPs allowed to SSH in; written to the security group
export SSH_ACCESS=your-cidr1,your-cidr2
3.2 Creating a Cluster
kops create cluster \
    --zones ${AWS_REGION}${AWS_AZ} \
    --vpc ${VPC_ID} \
    --network-cidr ${VPC_NETWORK_CIDR} \
    --subnets ${SUBNETS} \
    --image ${AMI} \
    --associate-public-ip=true \
    --api-loadbalancer-type public \
    --topology public \
    --networking flannel \
    --kubernetes-version https://s3.cn-north-1.amazonaws.com.cn/$ASSET_BUCKET/kubernetes/release/$KUBERNETES_VERSION \
    --ssh-public-key ${SSH_PUBLIC_KEY} \
    --utility-subnets ${UTILITY_SUBNETS} \
    --master-count 3 \
    --master-size your-master-size \
    --node-count 1 \
    --node-volume-size 200 \
    ${NAME}

Notes on the flags:
- --api-loadbalancer-type public: set it to internal if the API should not be reachable from the public network
- --topology public: nodes are placed on a public subnet
- --networking flannel: the default backend is vxlan
- --utility-subnets: used by the bastion host
- --master-count 3: at least 3 master nodes are recommended; changing the master size later is not recommended, so size it up front (see https://kubernetes.io/docs/admin/cluster-large/)
3.3 Settings before implementation
At this point, we have generated a configuration for the cluster in the State Store, but have not yet implemented the deployment.
Take a look at the current cluster configuration
kops get clusters $NAME -o yaml
Take a look at all the configurations, including instance groups.
kops get --name $NAME -o yaml
List all IGs
kops get ig --name $NAME
There are currently two types of roles available: master and node
The default configuration, both the cluster configuration and the IG configurations, needs to be modified for the cluster to work properly.
- Modifying Cluster Configurations
kops edit cluster $NAME
spec:
  docker:
    registryMirrors:
      - https://xxxx                              # docker registry mirror
  kubelet:
    kubeletCgroups: "/systemd/system.slice"       # https://github.com/kubernetes/kops/issues/4049
    runtimeCgroups: "/systemd/system.slice"
    imageGCHighThresholdPercent: 70               # reclaim old images
    imageGCLowThresholdPercent: 50                # ditto
  masterKubelet:
    kubeletCgroups: "/systemd/system.slice"
    runtimeCgroups: "/systemd/system.slice"
- Modifying IG Configuration
kops edit ig your-ig-name --name $NAME
spec:
  rootVolumeSize: 200      # root volume size
  nodeLabels:
    [...]                  # some labels
  taints:
    [...]                  # some taints
3.4 Deployment
kops update cluster $NAME --yes
Wait a few minutes and then
kops validate cluster
If everything shows "Ready", then you are all set. If a node shows "NotReady" and kube-dns has not started, a taint may have been added to the node so that kube-dns cannot be scheduled onto it; this can be fixed by adding a toleration.
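For example, one way to add the toleration is to edit the kube-dns Deployment directly; the taint key below is a placeholder:
kubectl -n kube-system edit deployment kube-dns
# then add under spec.template.spec:
#   tolerations:
#   - key: your-taint-key
#     operator: Exists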
3.5 After Deployment
- Add, modify, and delete IG
kops create ig $IG_NAME --name $NAME --edit
Note the configuration changes, especially spec.image; the default image cannot be pulled in China.
- Install kubernetes-dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
Confirm that the relevant pods are up, then run
kubectl proxy
Use a browser to open
http://localhost:8001/ui
You will find that you now have to enter something to log in, which was not the case before. Note that the config file generated when we created the cluster cannot be used directly as login credentials; see the dashboard documentation for the specific method.
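One common way (per the dashboard documentation) is to create an admin ServiceAccount and log in with its token; the name admin-user is only an example:
kubectl -n kube-system create serviceaccount admin-user
kubectl create clusterrolebinding admin-user \
    --clusterrole=cluster-admin --serviceaccount=kube-system:admin-user

# Print the token to paste into the login page
kubectl -n kube-system describe secret \
    $(kubectl -n kube-system get secret | grep admin-user-token | awk '{print $1}')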
- Add IAM Statements to the master/node roles
kOPS supports adding IAM Statements to specific roles.
Note that this is per role, not per IG, so if you add a Statement to the node role, every node will carry that Statement.
Let’s take a look at the original Node policies:
aws iam list-role-policies --role-name nodes.${NAME}
The result:
{
    "PolicyNames": [
        "nodes.${NAME}"
    ]
}
Note that the role and the policy share the same name; this is an inline policy. To see which Statements it contains, run
aws iam get-role-policy --role-name nodes.${NAME} --policy-name nodes.${NAME}
The result:
{
    "RoleName": "nodes.${NAME}",
    "PolicyName": "nodes.${NAME}",
    "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "kopsK8sEC2NodePerms",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeRegions"
                ],
                "Resource": ["*"]
            },
            {
                "Sid": "kopsK8sS3GetListBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Resource": ["arn:aws-cn:s3:::kops-k8s-v1-state-store"]
            },
            {
                "Sid": "kopsK8sS3NodeBucketSelectiveGet",
                "Effect": "Allow",
                "Action": ["s3:Get*"],
                "Resource": [
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/addons/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/cluster.spec",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/config",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/instancegroup/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/pki/issued/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/pki/private/kube-proxy/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/pki/private/kubelet/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/pki/ssh/*",
                    "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/secrets/dockerconfig"
                ]
            },
            {
                "Sid": "kopsK8sS3NodeBucketGetKuberouter",
                "Effect": "Allow",
                "Action": ["s3:Get*"],
                "Resource": "arn:aws-cn:s3:::kops-k8s-v1-state-store/jike-a.k8s.local/pki/private/kube-router/*"
            },
            {
                "Sid": "kopsK8sECR",
                "Effect": "Allow",
                "Action": [
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:GetRepositoryPolicy",
                    "ecr:DescribeRepositories",
                    "ecr:ListImages",
                    "ecr:BatchGetImage"
                ],
                "Resource": ["*"]
            }
        ]
    }
}
You can see that there are several statements, each corresponding to a set of permissions.
To add a permission Statement to the node role:
kops edit cluster
Then add the following under .spec:
spec:
  additionalPolicies:
    node: |
      [
        {
          "Action": ["ec2:*"],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
Then run
kops update cluster --yes
Run again
aws iam list-role-policies --role-name nodes.${NAME}
The results changed:
{
    "PolicyNames": [
        "additional.nodes.${NAME}",
        "nodes.${NAME}"
    ]
}
The inline policy additional.nodes.${NAME} contains the permission we just added. To view it:
aws iam get-role-policy --role-name nodes.${NAME} --policy-name additional.nodes.${NAME}
The result is
{
    "RoleName": "nodes.${NAME}",
    "PolicyName": "additional.nodes.${NAME}",
    "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": [
                    "ec2:*"
                ],
                "Resource": "*"
            }
        ]
    }
}
4. Pits
kube-dns scaling problem
After running for a while we saw some network problems: pods could not reach each other. It turned out that DNS resolution was failing because there were not enough kube-dns pods. The number of kube-dns replicas is controlled by a cluster-proportional-autoscaler, so to get more kube-dns pods you need to modify the autoscaler's configuration.
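A sketch of the tweak, assuming the standard cluster-proportional-autoscaler addon (the ConfigMap name may differ in your cluster, and the values below are only examples):
# Find the autoscaler ConfigMap
kubectl -n kube-system get configmap | grep autoscaler

# Lower nodesPerReplica / coresPerReplica so the same cluster size gets more kube-dns replicas
kubectl -n kube-system edit configmap kube-dns-autoscaler
#   data:
#     linear: '{"coresPerReplica":256,"nodesPerReplica":8,"preventSinglePointFailure":true,"min":2}'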
HPA doesn’t work
We hit this issue: metrics-server needs to be deployed. If you do not want to deploy it, HorizontalPodAutoscalerUseRestClients can be set in the ClusterSpec's KubeControllerManagerConfig.
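A sketch of that ClusterSpec tweak, assuming you would rather not run metrics-server (field names follow the kOPS ClusterSpec):
kops edit cluster $NAME
# add under .spec:
#   kubeControllerManager:
#     horizontalPodAutoscalerUseRestClients: false
kops update cluster $NAME --yes
# a rolling update of the masters is required for the change to take effect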
5. FAQs
LoadBalancer Service
We originally thought this would route traffic from the ELB directly to the pods, but in fact it creates a NodePort and has the ELB listen on that NodePort. There are advantages to this, such as not having to manage the relationship between the ELB and the Auto Scaling Group yourself, and SSL certificates can be configured automatically.
However, there are a few caveats
- The HTTPS problem
If a Service has two ports configured and an SSL certificate is attached, both ports become HTTPS by default. If you want only specific ports to use HTTPS and the rest to stay on HTTP, you need to set the service.beta.kubernetes.io/aws-load-balancer-ssl-ports annotation (see the sketch after this list).
- Application Load Balancer (ALB)
It is not supported; at present only Classic ELBs can be created, which also leads to the WebSocket problem below.
- How to support WebSocket?
It is inconvenient to support: since kOPS currently only creates Classic ELBs, not ALBs, WebSocket only works if Proxy Protocol is enabled, which is cumbersome to configure. The better way to support WebSocket is to use an ALB:
- Create a NodePort Service instead of a LoadBalancer Service
- Attach the Auto Scaling Group corresponding to the appropriate instance group to a Target Group
- Point an Application LB to the Target Group
- Manually added configuration gets reset
If a Service generates an ELB and you add custom configuration in the AWS console, such as extra ports, the ELB configuration will be reset to whatever the Service generates when the master restarts or a rolling update happens. So do not modify the ELB configuration by hand; otherwise it will be difficult to upgrade the cluster.
- A TargetPort must correspond to its own NodePort
Sometimes you want a single TargetPort to map to one NodePort and have that NodePort back two ELB ports, but this is not possible.
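For the HTTPS caveat above, here is a sketch of a Service where only port 443 is HTTPS and port 80 stays plain HTTP; the certificate ARN, names, and ports are placeholders:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: your-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws-cn:acm:cn-north-1:your-account-id:certificate/your-cert-id"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  type: LoadBalancer
  selector:
    app: your-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8080
EOF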
Update and rolling-update operations
rolling-update is one of kOPS's best features, allowing the cluster to be upgraded in a rolling fashion with minimal disruption, but newcomers to kOPS often do not know when to run it. In general, if a change does not take effect until the nodes are restarted, a rolling-update is required. The command is kops rolling-update cluster --name $NAME; without --yes it only shows a preview, and with --yes it actually performs the update.
Here are some tips (a few command sketches follow the list):
- Some operations may require a rolling update unexpectedly, such as adding labels to the nodes of an IG
- The default rolling-update intervals are long and can be adjusted with the --master-interval and --node-interval parameters
- If a rolling update is interrupted, do not worry; you can rerun it until the cluster reaches the target state
- If you want to force a rolling update, add --force
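A few command sketches corresponding to the tips above (the interval values are only examples):
# Preview which instances would be replaced
kops rolling-update cluster --name $NAME

# Execute with shorter intervals
kops rolling-update cluster --name $NAME --yes \
    --master-interval=4m --node-interval=4m

# Force a rolling update even if kops reports nothing needs replacing
kops rolling-update cluster --name $NAME --yes --force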
Why does the Docker storage-driver use overlay instead of overlay2?
The Docker version in the AMI is 17.03.2. The reason for using storage-driver overlay rather than overlay2 is the kernel: overlay2 needs a newer kernel than Jessie provides, so Debian Stretch is recommended if you want overlay2.
Why is the default CIDR 100.64.0.0/10?
The default CIDR used by the cluster network (both pods and services) is 100.64.0.0/10. This is intentional, but it can be changed when the cluster is built.
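If you do want to change it, a sketch of where the setting lives: in the kOPS ClusterSpec it is the nonMasqueradeCIDR field (from which the pod and service ranges are derived), and it must be changed before the cluster is first deployed:
kops edit cluster $NAME
#   spec:
#     nonMasqueradeCIDR: your-cidr   # must not overlap the VPC CIDR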
How do I enable Bastion?
A bastion host can be used as a jump host to isolate the nodes in the cluster from the public network.
You can add the --bastion option when building the cluster, but kOPS requires both the master and node topology to be private. If you have already created a public master or node, you can still add a bastion after the cluster has been created:
kops create instancegroup bastions --role Bastion --subnet ${UTILITY_SUBNETS}
The utility subnet (UTILITY_SUBNETS) was mentioned earlier.
No consideration for IPVS?
IPVS mode entered beta in Kubernetes 1.9. We actually tried kube-router, but the network test results were poor: fast at its best, yet so unstable that it does not seem ready for production. There are also many hidden pits that we will not go into here.
Author: Ruoyu (Zhihu & Jiji)
Reference:
Kubernetes official documentation
Kops official documentation