This is the 21st day of my participation in the Gengwen Challenge.
1. The origin of K8s
K8s is an abbreviation of Kubernetes: the 8 stands for the eight letters "ubernete" between the K and the s.
2. K8s standalone hands-on
Environment:

- Ubuntu 16.04
- GPU driver 418.56
- Docker 18.06
- K8s 1.13.5
Step 1: Set up the environment
Back up and edit the apt sources:

```bash
cp /etc/apt/sources.list /etc/apt/sources.list.bak
vim /etc/apt/sources.list
```

```
deb-src http://archive.ubuntu.com/ubuntu xenial main restricted
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu/ xenial multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu xenial partner
deb-src http://archive.canonical.com/ubuntu xenial partner
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security multiverse
```
Then update the package index:

```bash
apt-get update
```

If packages are broken, fix them with `apt --fix-broken install`. Do not run a full upgrade on GPU machines; it may upgrade the GPU driver and cause problems.

Disable the firewall: `ufw disable`

Install the SELinux utilities: `apt install selinux-utils`

Disable SELinux:

```bash
setenforce 0
vim /etc/selinux/config   # set: SELINUX=disabled
```
Set up the network parameters:

```bash
tee /etc/sysctl.d/k8s.conf <<-'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter
```
Run `sysctl --system` to make the IPv4/IPv6 bridge settings take effect.
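A quick way to confirm the module loaded and the sysctls took effect:

```bash
lsmod | grep br_netfilter                      # the module should be listed
sysctl net.bridge.bridge-nf-call-iptables \
       net.bridge.bridge-nf-call-ip6tables \
       net.ipv4.ip_forward                     # all three should print 1
```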
Configure iptables and persist the rule across reboots:

```bash
iptables -P FORWARD ACCEPT
vim /etc/rc.local
# add: /usr/sbin/iptables -P FORWARD ACCEPT
```
Permanently disable the swap partition: `sed -i 's/.*swap.*/#&/' /etc/fstab`
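The fstab edit only covers future reboots; the kubelet also requires swap to be off in the current session, so:

```bash
swapoff -a   # turn swap off now; the commented fstab entry keeps it off after reboot
free -m      # the Swap row should read 0 total / 0 used
```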
Step 2: Install Docker
Execute the following commands:

```bash
apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get purge docker-ce docker docker-engine docker.io && rm -rf /var/lib/docker
apt-get autoremove docker-ce docker docker-engine docker.io
apt-get install -y docker-ce=18.06.3~ce~3-0~ubuntu
```
Start Docker and enable it on boot: `systemctl enable docker && systemctl start docker`
Docker configuration:

```bash
vim /etc/docker/daemon.json
```

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "insecure-registries": ["http://k8s.gcr.io"],
  "data-root": "",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
The above is the configuration for machines with a GPU; for machines without one, use:
```json
{
  "registry-mirrors": [
    "https://registry.docker-cn.com"
  ],
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "insecure-registries": ["http://k8s.gcr.io"],
  "live-restore": true
}
```
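Docker will refuse to start if daemon.json is not valid JSON, so it is worth validating the file before restarting; a minimal check, assuming Python is available (it ships with Ubuntu 16.04):

```bash
python -m json.tool /etc/docker/daemon.json > /dev/null && echo "daemon.json is valid JSON"
```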
Reload systemd, restart Docker, and confirm it is healthy:

```bash
systemctl daemon-reload && systemctl restart docker && docker info
```
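If you used the `native.cgroupdriver=systemd` option above, it is worth confirming Docker actually picked it up, since the kubelet's cgroup driver must match it:

```bash
docker info 2>/dev/null | grep -i "cgroup driver"   # expect: Cgroup Driver: systemd
```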
Step 3: Install K8s
Configure the apt repository before pulling images:

```bash
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
EOF
```
Update, install the pinned versions, and hold them so apt does not upgrade them:

```bash
apt-get update
apt-get autoremove kubelet kubectl kubeadm
apt-get install -y kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
apt-mark hold kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
```
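A quick sanity check that the pinned versions landed and are held:

```bash
kubeadm version -o short            # expect v1.13.5
kubectl version --client --short    # expect v1.13.5
apt-mark showhold | grep kube       # kubelet, kubeadm, kubectl should be listed
```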
Start the kubelet and enable it on boot:

```bash
systemctl enable kubelet && sudo systemctl start kubelet
```
Because gcr.io is not reachable, pull the K8s images from registry.cn-hangzhou.aliyuncs.com instead:
```bash
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6
```
Tag them back to their official k8s.gcr.io names:

```bash
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5 k8s.gcr.io/kube-apiserver:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5 k8s.gcr.io/kube-controller-manager:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5 k8s.gcr.io/kube-scheduler:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5 k8s.gcr.io/kube-proxy:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
```
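Since the pulls and tags follow a single pattern, a small loop can do both in one pass; this is just a convenience sketch over the same image list as above:

```bash
#!/bin/bash
# Pull each image from the Aliyun mirror, then retag it under k8s.gcr.io
repo=registry.cn-hangzhou.aliyuncs.com
for img in gg-gcr-io/kube-apiserver:v1.13.5 gg-gcr-io/kube-controller-manager:v1.13.5 \
           gg-gcr-io/kube-scheduler:v1.13.5 gg-gcr-io/kube-proxy:v1.13.5 \
           kuberimages/pause:3.1 kuberimages/etcd:3.2.24 kuberimages/coredns:1.2.6; do
  docker pull "$repo/$img"
  docker tag "$repo/$img" "k8s.gcr.io/${img#*/}"   # strip the mirror namespace
done
```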
Step 4: Initialize with kubeadm
Use kubeadm to initialize K8s, filling in the actual host IP:

```bash
kubeadm init --kubernetes-version=v1.13.5 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.16.0.0/16 \
  --apiserver-advertise-address=${masterIp} | tee kubeadm-init.log
```
If you would rather not hard-code the host IP, you can initialize from a YAML file instead:

```bash
vi kube-init.yaml
```

```yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "k8s.api.server"
controlPlaneEndpoint: "k8s.api.server:6443"
networking:
  serviceSubnet: "10.1.0.0/16"
  podSubnet: "10.244.0.0/16"
```
HA version:

```yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "api.k8s.com"
controlPlaneEndpoint: "api.k8s.com:6443"
etcd:
  external:
    endpoints:
    - https://ETCD_0_IP:2379
    - https://ETCD_1_IP:2379
    - https://ETCD_2_IP:2379
networking:
  serviceSubnet: 10.1.0.0/16
  podSubnet: 10.244.0.0/16
```
Note: the apiVersion sits under the kubeadm.k8s.io group because kubeadm is what consumes this file. Finally, run the initialization:

```bash
kubeadm init --config=kube-init.yaml
```
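A successful init ends by printing a kubeadm join command; save it for the cluster section below. To confirm the control-plane containers came up, a quick check:

```bash
sudo docker ps --format '{{.Names}}' | grep -E 'apiserver|controller|scheduler|etcd'
```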
If a problem occurs, fix it and reset before retrying; for more options, run:

```bash
kubeadm --help
```
Step 5: Deployment problems
First remove the node (cluster version):

```bash
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>
```
Clear the init configuration on the node to be deleted (this command also rolls back a failed init or join):

```bash
kubeadm reset
```
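Note that kubeadm reset does not flush iptables rules or CNI state; when re-initializing on the same machine, the extra cleanup that kubeadm itself hints at is roughly:

```bash
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -rf /etc/cni/net.d $HOME/.kube/config
```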
Step 6: Troubleshooting
If anything looks wrong after initialization, use the following commands to inspect container, service, and network status:

```bash
sudo docker ps -a | grep kube | grep -v pause
sudo docker logs CONTAINERID
sudo docker images && systemctl status -l kubelet
netstat -nlpt
kubectl describe ep kubernetes
kubectl describe svc kubernetes
kubectl get svc kubernetes
kubectl get ep
netstat -nlpt | grep apiser
vi /var/log/syslog
```
Step 7: Configure apiserver access credentials for the current user

```bash
sudo mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
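With the kubeconfig in place, kubectl should now be able to reach the apiserver; a quick verification:

```bash
kubectl cluster-info
kubectl get nodes   # the master shows NotReady until the network plugin is installed
```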
Step 8: Install the network plugin

Install Calico, turning off IPIP and matching the pool CIDR to the pod subnet used at init:

```bash
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
# edit calico.yaml:
#   - name: CALICO_IPV4POOL_IPIP
#     value: "off"
#   - name: CALICO_IPV4POOL_CIDR
#     value: "10.244.0.0/16"
kubectl apply -f calico.yaml
```
For a single-node setup, allow pods to be scheduled on the master:

```bash
kubectl taint nodes --all node-role.kubernetes.io/master-
```
To forbid scheduling pods on the master again:

```bash
kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule
```
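To see which taints are currently set on a node (with $nodeName standing in for your node's name):

```bash
kubectl describe node $nodeName | grep -i taint
```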
That completes the standalone deployment. If your project delivers software and hardware together on a single machine, you can stop here; just remember to allow pods on the master node as shown above.

Next up: the cluster version!
Using the machine deployed above as the master node, continue:

```bash
scp /etc/kubernetes/admin.conf $nodeUser@$nodeIp:/home/$nodeUser
scp /etc/kubernetes/pki/etcd/* $nodeUser@$nodeIp:/home/$nodeUser/etcd
kubeadm token generate
kubeadm token create $token_name --print-join-command --ttl=0
kubeadm join $masterIP:6443 --token $token_name --discovery-token-ca-cert-hash $hash
```
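Tokens created without --ttl=0 expire after 24 hours; you can list the valid ones and regenerate a full join command at any time:

```bash
kubeadm token list
kubeadm token create --print-join-command
```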
If the node machines need CUDA, refer to:

- https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation
- https://blog.csdn.net/u012235003/article/details/54575758
- https://blog.csdn.net/qq_39670011/article/details/90404111
Official steps: first blacklist the nouveau driver, then rebuild the initramfs:

```bash
vim /etc/modprobe.d/blacklist-nouveau.conf
# add:
#   blacklist nouveau
#   options nouveau modeset=0
update-initramfs -u
```
Restart Ubuntu and check that nouveau is disabled (no output means success), remove old NVIDIA packages, download CUDA from https://developer.nvidia.com/cuda-downloads, and install the build dependencies:

```bash
lsmod | grep nouveau
apt-get remove --purge nvidia*
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
```
CUDA installation (accept the license, then choose "Install"), followed by the environment variables:

```bash
sh cuda_10.1.168_418.67_linux.run
echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-10.1/NsightCompute-2019.3:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
Restart the PC and check whether CUDA installed successfully. First check whether the "nvidia*" devices exist:

```bash
cd /dev && ls -al
```
If not, create an nv.sh script:

```bash
vi nv.sh
```

```bash
#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the NVIDIA 3D and VGA controllers, then create a device node for each
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi
```

```bash
chmod +x nv.sh && bash nv.sh
```
Restart the machine again and check the CUDA version:

```bash
nvcc -V
```
Compile and run the samples:

```bash
cd /usr/local/cuda-10.1/samples && make
cd /usr/local/cuda-10.1/samples/bin/x86_64/linux/release
./deviceQuery
```
If Result = PASS is displayed, CUDA is successfully installed.
Install nvidia-docker and configure the runtime:

```bash
vim /etc/docker/daemon.json
```

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "storage-driver": "overlay2",
  "default-runtime": "nvidia",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": [$harborRegistry],
  "live-restore": true
}
```
Restart Docker:

```bash
sudo systemctl daemon-reload && sudo systemctl restart docker && docker info
```
Check whether the nvidia-docker installation succeeded:

```bash
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
```
Switch to the node user on the node machine: `su $nodeUser`
Configure apiserver access credentials for the node user, restore the etcd certificates, and join the cluster:

```bash
mkdir -p $HOME/.kube
cp -i admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
mkdir -p $HOME/etcd
sudo rm -rf /etc/kubernetes
sudo mkdir -p /etc/kubernetes/pki/etcd
sudo cp /home/$nodeUser/etcd/* /etc/kubernetes/pki/etcd
sudo kubeadm join $masterIP:6443 --token $token_name --discovery-token-ca-cert-hash $hash
```
For example:

```bash
sudo kubeadm join 192.168.8.116:6443 --token vyi4ga.foyxqr2iz9i391q3 --discovery-token-ca-cert-hash sha256:929143bcdaa3e23c6faf20bc51ef6a57df02edf9df86cedf200320a9b4d3220a
```
Check whether the node joined the master: `kubectl get node`
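Once the join succeeds, the new node and the calico DaemonSet pod scheduled on it should go Ready/Running within a few minutes; a quick check from the master:

```bash
kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide   # the calico pod for the new node should be Running
```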