This is the 21st day of my participation in Gwen Challenge.

1. The origin of K8s

K8s is an abbreviation of Kubernetes: the 8 stands for the eight letters "ubernete" between the K and the s.

2. K8s standalone deployment in practice

Environment:

• Ubuntu 16.04
• GPU driver 418.56
• Docker 18.06
• K8s 1.13.5

First, set up the environment. Back up the apt sources list, then switch it to the Aliyun mirrors:

cp /etc/apt/sources.list /etc/apt/sources.list.cp

vim /etc/apt/sources.list

deb-src http://archive.ubuntu.com/ubuntu xenial main restricted
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu/ xenial multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu xenial partner
deb-src http://archive.canonical.com/ubuntu xenial partner
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security multiverse

apt-get update

Fix broken packages: apt --fix-broken install

Upgrade: do not run an upgrade on GPU machines; it can pull in a new GPU driver and break the installation. See the pinning sketch below.
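
To guard against an accidental driver upgrade explicitly, the driver packages can be pinned. A minimal sketch, assuming the driver came from the nvidia-418 apt package (a hypothetical name; check the real ones with dpkg first):

dpkg -l | grep -i nvidia    # list the installed NVIDIA packages
apt-mark hold nvidia-418    # pin the driver; substitute the names dpkg reported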

Disable the firewall: ufw disable

Install the SELinux utilities: apt install selinux-utils

SELinux configuration (set it to permissive now and disable it permanently):

setenforce 0

vim /etc/selinux/config

SELINUX=disabled
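
To confirm SELinux is no longer enforcing, a quick check with the utility installed above:

getenforce    # should print Permissive now, and Disabled after a reboot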

Set up the kernel network parameters required by Kubernetes:

tee /etc/sysctl.d/k8s.conf <<-'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

modprobe br_netfilter

To apply the settings and check that the IPv4 and IPv6 values take effect, run sysctl --system.
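
The three values can also be queried individually. A quick verification sketch:

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# each line should end in "= 1"; if the bridge keys are missing, run modprobe br_netfilter first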

Configure iptables to allow forwarding, and persist it via rc.local:

iptables -P FORWARD ACCEPT

vim /etc/rc.local
/usr/sbin/iptables -P FORWARD ACCEPT

Permanently disable the swap partition (kubelet refuses to start with swap enabled by default): sed -i 's/.*swap.*/#&/' /etc/fstab
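
The sed command only takes effect at the next boot, so swap should also be switched off for the current session. A minimal sketch:

swapoff -a    # turn swap off immediately
free -h       # the Swap line should now read 0B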

Next, install Docker.

Execute the following commands:

apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get purge docker-ce docker docker-engine docker.io && rm -rf /var/lib/docker
apt-get autoremove docker-ce docker docker-engine docker.io
apt-get install -y docker-ce=18.06.3~ce~3-0~ubuntu

Start Docker and enable it on boot: systemctl enable docker && systemctl start docker

Docker configuration:

vim /etc/docker/daemon.json
{
 "log-driver": "json-file",
 "log-opts": {
   "max-size": "100m",
   "max-file": "10"
 },
 "insecure-registries": ["http://k8s.gcr.io"],
 "data-root": "",
 "default-runtime": "nvidia",
 "runtimes": {
     "nvidia": {
         "path": "/usr/bin/nvidia-container-runtime",
         "runtimeArgs": []
     }
 }
}

The configuration above is for machines with a GPU (note the nvidia default runtime). For machines without a GPU, use the following instead:

{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["http://k8s.gcr.io"],
  "live-restore": true
}

Reload the configuration, restart Docker, and verify:

systemctl daemon-reload && systemctl restart docker && docker info

3. Install K8s

Before pulling the images, configure the Kubernetes apt source:

apt-get update && apt-get install -y apt-transport-https curl

curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -

tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
EOF

Update and install the pinned versions:

apt-get update
# remove any previously installed versions
apt-get autoremove kubelet kubectl kubeadm
apt-get install -y kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
# pin the versions so a later apt upgrade cannot move them
apt-mark hold kubelet kubeadm kubectl

Enable kubelet on boot and start it:

systemctl enable kubelet && sudo systemctl start kubelet
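
At this point kubelet will restart in a loop; that is expected, since it has no configuration until kubeadm init runs. Standard systemd commands to watch it:

systemctl status kubelet --no-pager
journalctl -u kubelet -f    # follow the kubelet log while debugging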

Download the K8s images from registry.cn-hangzhou.aliyuncs.com, because k8s.gcr.io is not reachable over the network.

docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6

Retag the images with the names kubeadm expects:

docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5 k8s.gcr.io/kube-apiserver:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5 k8s.gcr.io/kube-controller-manager:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5 k8s.gcr.io/kube-scheduler:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5 k8s.gcr.io/kube-proxy:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
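
Since every pull-and-tag pair follows one pattern, the whole step can be scripted. A minimal sketch using the same image list as above (adjust the versions to your cluster):

#!/bin/bash
# pull each image from the Aliyun mirror and retag it as k8s.gcr.io/<name>
MIRROR=registry.cn-hangzhou.aliyuncs.com
for img in gg-gcr-io/kube-apiserver:v1.13.5 gg-gcr-io/kube-controller-manager:v1.13.5 \
           gg-gcr-io/kube-scheduler:v1.13.5 gg-gcr-io/kube-proxy:v1.13.5 \
           kuberimages/pause:3.1 kuberimages/etcd:3.2.24 kuberimages/coredns:1.2.6; do
  docker pull "$MIRROR/$img"
  docker tag "$MIRROR/$img" "k8s.gcr.io/${img#*/}"   # strip the mirror namespace
done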

4. Kubeadm initialization

Use kubeadm to initialize K8s; substitute your actual host IP for ${masterIp}:

kubeadm init --kubernetes-version=v1.13.5 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.16.0.0/16 \
  --apiserver-advertise-address=${masterIp} | tee kubeadm-init.log

If the host IP address is not fixed, a YAML file can be used to drive the initialization instead:

vi kube-init.yaml

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "k8s.api.server"
controlPlaneEndpoint: "k8s.api.server:6443"
networking:
  serviceSubnet: "10.1.0.0/16"
  podSubnet: "10.244.0.0/16"

HA version:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "api.k8s.com"
controlPlaneEndpoint: "api.k8s.com:6443"
etcd:
  external:
    endpoints:
    - https://ETCD_0_IP:2379
    - https://ETCD_1_IP:2379
    - https://ETCD_2_IP:2379
networking:
  serviceSubnet: "10.1.0.0/16"
  podSubnet: "10.244.0.0/16"

Note that apiVersion belongs to the kubeadm.k8s.io group, because kubeadm is what consumes this file. Finally, run the initialization:

kubeadm init --config=kube-init.yaml

If a problem occurs, resolve it and reset (see section 5 below); for more options, run:

 kubeadm --help

5. Deployment problems

To remove a node (cluster version), first drain it, then delete it:

kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>

Clear the init configuration on the node being removed (this command also rolls back any failed init or join):

kubeadm reset

6. Troubleshooting

If any problem occurs after initialization, the following commands help check container and network status:

sudo docker ps -a | grep kube | grep -v pause

sudo docker logs CONTAINERID

sudo docker images && systemctl status -l kubelet

netstat -nlpt

kubectl describe ep kubernetes

kubectl describe svc kubernetes

kubectl get svc kubernetes

kubectl get ep

netstat -nlpt | grep apiser

vi /var/log/syslog

7. Configure kubectl access to the apiserver for the current user

sudo mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

8. Install the network plug-in

kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml

wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml

# edit calico.yaml: turn off IPIP and match the pod CIDR used at init time
- name: CALICO_IPV4POOL_IPIP
  value: "off"
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"

kubectl apply -f calico.yaml
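
Once applied, the Calico pods should come up in kube-system and the node should turn Ready. A quick check:

kubectl get pods -n kube-system | grep calico    # wait until Running
kubectl get nodes                                # STATUS should become Ready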

For a single-machine setup, allow pods to be scheduled on the master node:

kubectl taint nodes --all node-role.kubernetes.io/master-
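
To confirm the taint was removed (plain kubectl):

kubectl describe nodes | grep -i taint    # should report "Taints: <none>" for the master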

To forbid scheduling pods on the master again:

kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule

That completes the standalone deployment. If your project ships hardware and software together on a single machine, you can stop here; just remember to allow pods on the master node as shown above.

Next, the cluster version is coming online!

Using the machine deployed above as the master node, continue on the master:

scp /etc/kubernetes/admin.conf $nodeUser@$nodeIp:/home/$nodeUser

scp /etc/kubernetes/pki/etcd/* $nodeUser@$nodeIp:/home/$nodeUser/etcd

kubeadm token generate

kubeadm token create $token_name --print-join-command --ttl=0

kubeadm join $masterIP:6443 --token  $token_name --discovery-token-ca-cert-hash $hash
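
If the CA cert hash is lost, it can be recomputed on the master from the cluster CA; this is the standard recipe from the kubeadm documentation:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'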

If the node machine needs CUDA, refer to the following materials:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation
https://blog.csdn.net/u012235003/article/details/54575758
https://blog.csdn.net/qq_39670011/article/details/90404111

Official steps to disable the nouveau driver:

vim /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
update-initramfs -u

Restart Ubuntu and check that nouveau is disabled (no output means success), then prepare for the CUDA installation:

lsmod | grep nouveau

apt-get remove --purge nvidia*

https://developer.nvidia.com/cuda-downloads

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

CUDA installation:

Accept select "Install"/Enter select "Yes" sh CUDA_10.1.168_418.67_linux. run echo 'export Bashrc echo 'export PATH=/usr/local/ CUDA-10.1/nsightCompute -2019.3:$PATH' >> ~/. Bashrc echo 'export PATH=/usr/local/ CUDA-10.1/nsightCompute -2019.3:$PATH' >> Bashrc echo 'export LD_LIBRARY_PATH=/usr/local/ CUDA-10.1 /lib64:$LD_LIBRARY_PATH' >> ~/. Bashrc source ~/Copy the code

Restart the PC and check whether CUDA is installed successfully.

Check whether the /dev/nvidia* devices exist:

cd /dev && ls -al

If not, create an nv.sh script to create the device nodes:

vi nv.sh

#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

chmod +x nv.sh && bash nv.sh

Restart the machine again and check the CUDA version:

nvcc -V

Compile and run the samples:

cd /usr/local/cuda-10.1/samples && make
cd /usr/local/cuda-10.1/samples/bin/x86_64/linux/release
./deviceQuery

If Result = PASS is displayed, CUDA is successfully installed.

Install nvidia-docker:

vim /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "storage-driver": "overlay2",
  "default-runtime": "nvidia",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": [$harborRegistry],
  "live-restore": true
}

Restart Docker:

sudo systemctl daemon-reload && sudo systemctl restart docker && docker info

Check whether the nvidia-docker installation succeeded:

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Switch to the node user on the node machine: su $nodeUser

Configure apiserver access (kubeconfig and etcd certificates) for the current node user, then join the cluster:

mkdir -p $HOME/.kube

cp -i  admin.conf $HOME/.kube/config

chown $(id -u):$(id -g) $HOME/.kube/config

mkdir -p $HOME/etcd

sudo rm -rf /etc/kubernetes

sudo mkdir -p /etc/kubernetes/pki/etcd

sudo cp /home/$nodeUser/etcd/* /etc/kubernetes/pki/etcd

sudo kubeadm join $masterIP:6443 --token  $token_name --discovery-token-ca-cert-hash $hash

For example:

sudo kubeadm join 192.168.8.116:6443 --token vyi4ga.foyxqr2iz9i391q3 --discovery-token-ca-cert-hash sha256:929143bcdaa3e23c6faf20bc51ef6a57df02edf9df86cedf200320a9b4d3220a

Check whether the node has joined the cluster: kubectl get node
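
A slightly fuller verification, run on the master with plain kubectl:

kubectl get nodes -o wide                    # every node should report STATUS Ready
kubectl get pods --all-namespaces -o wide    # kube-system pods should all be Running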

This article walked through a standalone K8s deployment, joining worker nodes to the master, and the kubeadm configuration for an HA master.