Open Source Project Recommendation
Pepper Metrics is an open source tool I developed with my colleagues (github.com/zrbcool/pep…). It collects performance statistics from components such as jedis, mybatis, httpservlet, dubbo, and motan, exposes them in a format compatible with Prometheus and other mainstream time-series databases, and displays the trends in Grafana. Its plug-in architecture also makes it easy to extend and integrate other open source components. Please give us a STAR, and we welcome you to become a contributor, submit PRs, and improve the project together.
Background
Our existing monitoring system is deployed on virtual machines and needs to be migrated into the Kubernetes cluster to reduce the cost of those VMs. The monitoring system is built on Prometheus, has been running for nearly a year with a 180-day data retention, and currently holds about 800 GB of data. The following describes our migration plan and how the migration went.
Plan
1. In the Kubernetes cluster
- Create the storage, PV, and PVC (see the sketch after this list)
- Create the StatefulSet, Service, and other objects
- Bring up Prometheus + Grafana in K8s
- kubectl delete the related objects, keeping the YAML for later
2. On the old Prometheus and Grafana instances
- Mount the disk that was verified in the K8s cluster onto the old instance whose data is to be migrated
- Migrate the data
- Fix the owner group and permissions of the data
- Unmount the new disk
3. In the Kubernetes cluster
- Re-apply the predefined, tested YAML
- Observe and test
- Switch the Grafana data source to the Prometheus running in K8s
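To make step 1 concrete, here is a minimal sketch of the storage objects, based only on the names and sizes that appear later in this post (PV pv-metrics-ecs-promethues, 1 Ti, Retain, storage class "disk"). The volume source depends on your cloud provider; the CSI driver name and disk ID below are placeholders, not our actual configuration.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-metrics-ecs-promethues
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: disk
  csi:
    driver: example.csi.driver     # placeholder: substitute your cloud provider's volume driver
    volumeHandle: d-xxxxxxxx       # placeholder: ID of the pre-created cloud disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-metrics-ecs-promethues
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: disk
  volumeName: pv-metrics-ecs-promethues   # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Ti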
Implementation process
Bring up the new Prometheus in the K8s cluster
Check the pod logs:

kubectl logs prometheus-ecs-0 -n monitoring
level=info ts=2019-07-08T04:xx:xx.715Z caller=main.go:714 msg="Notifier manager stopped"
level=info ts=2019-07-08T04:xx:xx.715Z caller=main.go:544 msg="Scrape manager stopped"
level=error ts=2019-07-08T04:xx:xx.715Z caller=main.go:723 err="opening storage failed: mkdir data/: permission denied"
A permission error. Add a securityContext to the spec and try again; see the related discussions on Google Groups and in GitHub issues.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-ecs
  namespace: monitoring
spec:
  serviceName: "prometheus-ecs"
  selector:
    matchLabels:
      app: prometheus-ecs
  template:
    metadata:
      labels:
        app: prometheus-ecs
    spec:
      ...
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
        runAsNonRoot: true
      ...
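For completeness, here is a hedged sketch of how the pre-created PVC could be referenced inside the elided part of this StatefulSet. The image tag, flags, and mount path are assumptions based on the /prometheus data directory and the 180-day retention mentioned in this post, not our exact manifest.

      containers:
        - name: prometheus
          image: prom/prometheus:v2.10.0           # placeholder version
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus/data
            - --storage.tsdb.retention.time=180d   # 180-day retention, as described above
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
      volumes:
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: pv-metrics-ecs-promethues   # the PVC created in step 1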
After adding the securityContext and bringing the pod back up, data is written normally; the data directory now looks like this:
/prometheus $ ls -l
total 5242908
drwxr-sr-x 3 1000 2000 4096 Jul 8 04:26 data
drwxrwsr-x 4 root 2000 4096 Jul 8 04:09 k8s-resource-monitoring
drwxrwS--- 2 root 2000 16384 Jul 8 02:37 lost+found
Next, kubectl delete all of the objects we just created, keep the YAML ready, and then mount the disk on the machine we plan to migrate from so we can synchronize the data.
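A sketch of that cleanup, assuming all of the step-1 manifests sit in one local directory (the path is hypothetical):

# delete everything applied in step 1, but keep the YAML files so the same
# objects can be re-applied in step 3
kubectl delete -f ./monitoring-manifests/ -n monitoring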
Mount the disk to the old Prometheus machine
Mount the newly created target disk on the old Prometheus VM and use rsync to copy the data across. Use --bwlimit to throttle the transfer, otherwise disk I/O will be saturated and online performance will suffer.
rsync -av --bwlimit=200M --delete --progress --log-file=/tmp/rsync.log /data/coohua/prometheus/data /data1/data
Watch the system to make sure it stays stable and all metrics remain normal:
# dstat -a
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 28   0  72   0   0   0| 652k  666k|   0     0 |   0     0 |1953  1401
 37   4  54   5   0   0| 141M  237M| 344k 5408B|   0     0 |6918  5466
 39   6  47   9   0   0| 205M  293M|  17k 3006B|   0     0 |7477  5874
 48   6  38   8   0   0| 201M  306M| 959k   10k|   0     0 |8001  5705
 50   5  45   0   0   0| 187M   12M|  83k 3531B|   0     0 |7589  5135
 41   6  50   3   0   0| 204M  113M| 360k   11k|   0     0 |9505  8630
 38   5  47   9   0   1| 177M  204M|  84k 1468B|   0     0 |8268  7147
 71   5  20   4   0   0| 147M  308M| 131k 4766B|   0     0 |8334  5281
 94   4   2   0   0   0| 132M  301M| 180k 4392B|   0     0 |9274  5789
 46   5  48   0   0   0| 210M   42M| 393k 5334B|   0     0 |7928  6030
 40   5  55   0   0   0| 190M    0 | 361k   16k|   0     0 |6994  5119
 28   5  58   9   0   0| 156M  228M|  80k 3554B|   0     0 |5414  4121

# top
top - 14:21:03 up 186 days, 23:07, 2 users, load average: 6.36, 5.54, 5.07
Tasks: 144 total, 4 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu0 : 85.0 us,  0.0 sy, 0.0 ni, 15.0 id,  0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 29.1 us,  1.0 sy, 0.0 ni, 61.6 id,  7.9 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 27.1 us,  3.3 sy, 0.0 ni, 69.6 id,  0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 66.3 us,  1.7 sy, 0.0 ni, 29.0 id,  3.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 37.3 us, 11.0 sy, 0.0 ni, 51.0 id,  0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 31.8 us,  8.4 sy, 0.0 ni, 54.2 id,  5.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 61.8 us,  3.0 sy, 0.0 ni, 34.6 id,  0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 31.5 us, 12.2 sy, 0.0 ni, 46.1 id, 10.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16267428 total,   160392 free,  7907888 used,  8199148 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7989000 avail Mem

  PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM      TIME+ COMMAND
 4903 coohua   20  0  0.682t 7.784g 817072 S 279.7 50.2   62469:47 prometheus
20712 root     20  0  129656   1180    332 R  62.1  0.0    2:10.42 rsync
20710 root     20  0  129696   2348   1232 R  56.1  0.0    1:53.88 rsync
   61 root     20  0       0      0      0 S   6.6  0.0  110:21.78 kswapd0

# iostat -xdm 1
Linux 3.10.0-514.26.2.el7.x86_64 (Monitor-Storage001)  2019-07-08  _x86_64_  (8 CPU)

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda       0.00   0.64   0.06   1.33    0.00    0.01    17.52     0.01   7.71   23.99    6.95   0.86   0.12
vdb       0.01   0.36   4.11   1.88    0.67    0.64   446.97     0.04   7.46   10.11    1.67   1.35   0.81
vdc       0.00   0.00   0.00   0.00    0.04  959.25     0.01   179.72  32.19  179.79    1.54   0.01

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda       0.00   0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00
vdb       0.00   2.00 359.00   2.00  160.00    0.02   907.79     3.16   8.33    8.37    1.00   1.34  48.50
vdc       0.00   0.00   0.00   0.00    0.00    0.00     0.00

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda       0.00   0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00    0.00
vdb       1.00   0.00 470.00   8.00  208.55    3.44   908.30     4.50   9.82    9.48   30.00   1.31  62.40
vdc       0.00   8.00   0.00 409.00    0.00  190.92   955.99    68.50 154.77    0.00  154.77   1.47  60.30

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda       0.00   4.00   0.00   2.00    0.00    0.02     0.00     0.00   0.00    0.00    0.00   0.50   0.50   0.10
vdb       0.00   0.00 457.00   0.00  202.09    0.00   905.65     4.32   9.48    9.48    0.00   1.33  61.00
vdc       0.00   0.00   0.00 640.00    0.00  300.49   961.58   127.32 194.03    0.00  194.03   1.56 100.10
Sit back and wait for data to be synchronized
Mount the disk back into the pod to use the migrated data
The data migration is complete. Looking at the disk read/write statistics during the migration, the throughput matched the 200 MiB/s rate limit, as expected.
Remember to fix the ownership and permissions of the data, otherwise the pod will fail to start again:
# umount -l /data1
# chown -R 1000:2000 /data1/data
Because the PV/PVC were not cleaned up properly, the following problem appeared: the PV went into the Released state and could not be bound again.
kubectl get pvc -n monitoring pv-metrics-ecs-promethues
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pv-metrics-ecs-promethues Pending
kubectl get pv -n monitoring pv-metrics-ecs-promethues
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv-metrics-ecs-promethues 1Ti RWX Retain Released monitoring/pv-metrics-ecs-promethues disk 3h32m
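One common way to make a Released PV bindable again is to clear its claimRef; we cannot say for certain this is exactly the fix applied here, but as a sketch:

# remove the stale claimRef so the PV returns to Available and the Pending PVC can bind to it
kubectl patch pv pv-metrics-ecs-promethues -p '{"spec":{"claimRef": null}}'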
Once the PV/PVC issue was sorted out, the Grafana instance that had not yet been migrated to K8s was reconnected: its data source was modified to point to the Prometheus now running in K8s. The monitoring data lost during the migration corresponded to a service interruption of less than 10 minutes.
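For reference, a hedged sketch of what the updated Grafana data source might look like as a provisioning file. The URL assumes Grafana can reach the in-cluster Service prometheus-ecs in the monitoring namespace; a Grafana still running outside the cluster would instead use whatever NodePort or Ingress address exposes that Service.

apiVersion: 1
datasources:
  - name: Prometheus-K8S          # hypothetical data source name
    type: prometheus
    access: proxy
    url: http://prometheus-ecs.monitoring.svc.cluster.local:9090
    isDefault: true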
Summary & Reflection
It’s not a perfect solution, but it’s simple
After the data is synchronized, the pod is brought up in K8s. During initialization Prometheus first processes the existing data and only then starts scraping new data, so there is an interruption of 10 to 15 minutes, which means at least 10 to 15 minutes of monitoring data is lost. From a business perspective this is acceptable; just pick a suitable time window for the operation. Is there a perfect solution? The approach suggested by developers in the official Prometheus community is to start a new instance with the same configuration and wait long enough for the old one to be retired. That is probably the perfect solution, but in our scenario we keep nearly 1 TB of data for 180 days on SSDs, so the hardware cost would double for the duration of the retirement period. We rejected that proposal on economic grounds.
Be cautious when using NFS as the storage behind a PV
NFS was used in our first migration attempt. Given the cloud provider's offerings, NFS is more economical, can be mounted from multiple places, and makes data management very convenient. However, it struggled under Prometheus's heavy I/O, so in heavy I/O scenarios think carefully before choosing NFS and do a proper performance evaluation first; a sample benchmark sketch follows.
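As an example of such an evaluation, a simple fio run against the NFS mount point can reveal whether random I/O latency and throughput are anywhere near what Prometheus needs (the mount path and sizes here are placeholders):

# mixed random read/write test against the NFS mount (placeholder path)
fio --name=prom-nfs-test --directory=/mnt/nfs-test \
    --rw=randrw --bs=16k --size=2G --numjobs=4 \
    --runtime=60 --time_based --direct=1 --group_reporting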