This article is compiled from a talk given by Hanming Tian, operations development engineer at 36Kr, in the Rancher Technology Exchange group on January 19th. To join the group and follow future sharing sessions in real time, search for Rancher2 on WeChat and add Rancher as a friend.
Hanming Tian, operations development engineer at 36Kr, is mainly responsible for operations automation, CI/CD construction, and promoting application containerization.
Background
36Kr is a media company founded in 2010 that focuses on technology and venture capital. Its business scenarios are not complicated: the front end is mainly rendered with NodeJS, the mobile side covers Android and iOS, and the back-end services are almost entirely built on PHP. PHP was chosen during the initial technology selection mainly because it was efficient for web development at the time, and that choice simply carried on.
Later, however, as the business grew rapidly and the code was never decoupled, many application services became entangled into a very bloated monolith with serious logical coupling, which in turn caused many performance problems. The harder the problems became to fix and the tighter the development schedule got, the more fixes were postponed, and the problems left behind became even harder to fix later, forming a vicious circle. A large amount of technical debt accumulated, which hurt subsequent development, and once something went wrong it was hard to trace the root cause. Back then we often heard the phrase, "this is a problem left over from history."
B/S, C/S, a single monolithic application: this is a very traditional and very simple architecture, but its shortcomings were fully exposed. A performance problem in one piece of business logic would often drag down every other business. On the operations side, all we could do was stack more machines and upgrade configurations, which cost a lot in hardware and manpower but achieved little; we were always on the back foot.
The situation had become urgent, and the technical team finally decided to rebuild in Java and split the monolith into microservices, so that a single application failure could no longer take down large parts of the production environment.
Requirements analysis + selection
After the rebuild began, we ran multiple Java programs on a single virtual machine to save VM resources. However, without resource isolation and a flexible scheduling system, some resources were actually wasted, and under high concurrency, resource contention occasionally let one application affect another. For this we built an automated deployment system with basic functions such as deployment, monitoring and health checks, rollback on failed deployments, and restarts.
With the popularity of Kubernetes at the time and the release of Rancher 2.x, we gradually realized that the problems we were facing could largely be solved out of the box: resource isolation, the deployment controller model, and a flexible scheduling system are essentially the automated deployment system we had been trying to build. So the operations side also decided to move toward containerization.
In terms of selection, since our services run almost entirely on Alibaba Cloud, Alibaba Cloud's offering was the first thing that came to mind. We also had some business dealings with Huawei at the time, so Huawei's CCE was considered as an alternative; but given that all of our service resources were already on Alibaba Cloud, the migration cost would have been too high, so Huawei Cloud was ruled out.
We had already used Rancher 1.6, though only to manage native Docker deployed on hosts, and that experience left us with a good impression of Rancher's products.
On the requirements side, ease of use of the container management platform was important to reduce the learning cost for our developers. The full set of basic Kubernetes features was a must. Since Kubernetes was still evolving rapidly, we needed to be able to keep up with upgrades at any time and to patch security vulnerabilities immediately, and we also needed basic permission control. Moreover, we have no dedicated Kubernetes team and only two operations engineers, so having a professional service team to help with technical issues when problems arise was very important.
Taken together, Rancher was the clear winner. The UI is very friendly and developers get used to it quickly; updates and iterations come fast, and when vulnerabilities are found there is a detailed patching plan. Its authentication integrates perfectly with our OpenLDAP directory, allowing different permissions for development, testing, and operations staff. It was also among the first to support multi-cloud environments, which leaves room for cross-cloud solutions in the future.
We went through the following main stages in the containerization process. Today I'd like to share some of our practices on Rancher, and I hope they are helpful to you:
- Containerization of applications
- Rancher's high availability
- Container operation and maintenance
- Multi-tenant isolation
Containerization of applications
To be friendlier to developers, our images are split into two layers. The main Dockerfile is written by our operations staff, while the Dockerfile in the developer's code repository is as simple as possible: basically just copying the code plus a few required variables. See the example below:
# Base image Dockerfile, maintained by operations
FROM alpine:3.8
MAINTAINER yunwei <[email protected]>

WORKDIR /www

RUN mv /etc/apk/repositories /etc/apk/repositories.bak \
    && echo "http://mirrors.aliyun.com/alpine/v3.8/main/" >> /etc/apk/repositories \
    && apk update && apk upgrade

RUN apk --no-cache add ca-certificates wget && \
    wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.29-r0/glibc-2.29-r0.apk && \
    apk add glibc-2.29-r0.apk && rm -f glibc-2.29-r0.apk

RUN apk add -u --no-cache \
    bash \
    sudo \
    tzdata \
    drill \
    iputils \
    curl \
    busybox-extras \
    && rm -rf /var/cache/apk/* \
    && ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

COPY jdk1.8.0_131 /usr/local/jdk1.8.0_131

ENV TZ="Asia/Shanghai"
ENV JAVA_HOME=/usr/local/jdk1.8.0_131
ENV CLASSPATH=$JAVA_HOME/bin
ENV PATH=.:$JAVA_HOME/bin:$PATH
ENV JAVA_OPTS="-server -Xms1024m -Xmx1024m"

CMD java -jar $JAVA_OPTS -Dserver.port=8080 server.jar

# ==============================================
# Example Dockerfile maintained by the developer
FROM harbor.36kr.com/java:v1.1.1
MAINTAINER developer <[email protected]>

ADD web.jar ./server.jar
As you can see, the Dockerfile maintained by the developer is extremely simple, which greatly lowers the maintenance burden on developers.
In addition, since image size largely determines deployment time, we used Alpine, arguably the most minimal base image. Alpine has several advantages:
- Small size
- A package manager with a rich set of packages available
- Backed by major vendors and officially used by many of them, including Docker itself
Alpine does not ship the glibc library; it uses the smaller musl libc instead. Java, however, depends on glibc. Fortunately a precompiled glibc package, alpine-pkg-glibc, is available on GitHub; it provides full Java support while keeping the image small.
Rancher’s high availability
There are two ways to install Rancher: single-node installation and high-availability cluster installation. A single-node installation is generally only suitable for test or demo environments, so the HA cluster installation is recommended for production.
At the beginning we used a single-node installation for the test environment. Later, a restart of the Rancher Server machine broke the test environment, and despite having backups we still lost a small amount of data. In the end we switched the test environment to an HA deployment as well; the overall architecture is shown in the figure below.
We used RKE to install Rancher Server. To guard against an Alibaba Cloud zone failure, we spread the three Rancher Server machines across two availability zones: Rancher Server-001 and 003 are in Beijing Zone H, and Rancher Server-002 is in Beijing Zone G.
For load balancing we use Alibaba Cloud SLB, purchased as an active/standby instance pair to avoid a single point of failure. Rancher must be served over SSL, and we have our own certificate for the domain. To make SSL certificate maintenance easier, we terminate it on the SLB using the layer-7 listener. The Rancher Server architecture is shown in the figure below:
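For reference, a minimal RKE cluster.yml for such a three-node Rancher Server setup might look like the sketch below; the addresses, SSH user, and key path are placeholders, not our actual values:

nodes:
  - address: 10.0.1.11                    # rancher-server-001, Beijing Zone H (placeholder IP)
    user: rancher
    role: [controlplane, worker, etcd]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 10.0.2.12                    # rancher-server-002, Beijing Zone G (placeholder IP)
    user: rancher
    role: [controlplane, worker, etcd]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 10.0.1.13                    # rancher-server-003, Beijing Zone H (placeholder IP)
    user: rancher
    role: [controlplane, worker, etcd]
    ssh_key_path: ~/.ssh/id_rsa

Once the nodes are up, running rke up brings up the Kubernetes cluster, and Rancher is then installed on it with Helm as described in the official documentation.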
The downstream clusters, that is, the Kubernetes clusters that carry the actual business, are also split roughly half and half across the two Alibaba Cloud availability zones. Note that to make this highly available disaster-recovery architecture work, the network latency between the two zones must be kept at or below 15 ms.
For backups we use two schemes: Alibaba Cloud ECS snapshots, plus etcd snapshots uploaded via the S3 protocol to Alibaba Cloud OSS object storage, so that services can be restored quickly if a failure occurs.
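In plain RKE (and in the cluster options of Rancher-launched clusters), the recurring etcd snapshots and the S3-compatible target, OSS in our case, are declared in cluster.yml. A sketch under assumed values, with placeholder credentials, bucket, and endpoint:

services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 12                 # take a snapshot every 12 hours
      retention: 6                       # keep the 6 most recent snapshots
      s3backupconfig:
        access_key: "<ACCESS_KEY>"
        secret_key: "<SECRET_KEY>"
        bucket_name: etcd-backup                # placeholder bucket
        endpoint: oss-cn-beijing.aliyuncs.com   # placeholder OSS endpoint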
A detailed deployment tutorial can be found in the official Rancher documentation.
Container operation and maintenance
For container monitoring, Rancher ships with Prometheus and Grafana and integrates them into its UI, which is very convenient, so I won't go into monitoring here; I'll focus on log collection.
In Kubernetes, log collection is more complicated than on traditional physical or virtual machines, because Kubernetes provides a dynamic environment in which binding a hostPath is not practical. The following table gives an intuitive comparison:
As you can see, Kubernetes has to collect many kinds of logs, and with containerized deployment the number of applications per machine is high and constantly changing, so traditional log collection methods do not fit Kubernetes well.
At present, Kubernetes log collection approaches fall roughly into two categories: passive collection and active push.
Active push generally comes in two forms: DockerEngine and direct write from the business. DockerEngine means using Docker's native LogDriver, which can usually only capture stdout and is generally not recommended. Direct write means integrating a log collection SDK into the application and sending logs straight to the collector; logs never touch disk and no agent is needed, but the business is tightly coupled to the SDK and flexibility is low.
Passive collection means deploying a log collection agent, in one of two ways: DaemonSet, with one agent per machine node, or Sidecar, with one agent per Pod running as a sidecar container.
Sidecar deployment consumes more resources, since every Pod carries its own agent, but it offers strong flexibility and isolation, making it suitable for large Kubernetes clusters or clusters that serve business teams as a PaaS platform. DaemonSet deployment consumes fewer resources and is suitable for clusters with a single purpose and few kinds of business.
Considering our own scenario, a small cluster with few businesses, we chose the DaemonSet deployment mode. In the test environment, after some investigation, we picked log-pilot, an open-source log collection component from Alibaba (GitHub: github.com/AliyunContainerService/log-pilot). Combined with Elasticsearch and Kibana, it makes a good Kubernetes logging solution.
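As a rough sketch of how log-pilot is used (not our exact configuration): it runs as a DaemonSet on every node and discovers what to collect from environment variables on the business containers, so a Deployment only needs something like the snippet below. The application name and image are placeholders, and the Elasticsearch output is configured on the log-pilot DaemonSet itself.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                             # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: harbor.36kr.com/java:v1.1.1   # placeholder image
          env:
            - name: aliyun_logs_demo           # log-pilot discovers containers by this env var prefix
              value: "stdout"                  # collect stdout; a file path or glob also works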
Because our servers are on Alibaba Cloud and we have only two operations engineers, we did not have the energy to maintain a large distributed storage cluster ourselves, so we store business logs in Alibaba Cloud Log Service. In production our Kubernetes clusters therefore use Log Service as well, and it handles 600 million+ log entries per day without any problems.
To collect logs with Alibaba Cloud, you enable Log Service and then install the Logtail components via the alibaba-log-controller Helm chart; the official documentation (linked below) provides an installation script. During installation the AliyunLogConfig CRD is created automatically, the alibaba-log-controller Deployment is deployed, and Logtail is installed as a DaemonSet. After that you can configure which logs to collect from the console. Once installed, it looks like this:
Logtail collects text logs generated inside containers and uploads container metadata along with them. Its Kubernetes file collection has the following features:
- You only need to configure the log path inside the container, without caring about how it maps to the host
- Support for specifying collection containers by label
- Support for excluding specific containers by label
- Support for specifying collection containers by environment variable
- Support for excluding specific containers by environment variable
- Support for multi-line logs (such as Java stack traces)
- Automatic tagging of Docker container metadata
- Automatic tagging of Kubernetes container metadata
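For example, the environment-variable method works much like the log-pilot sketch earlier: the collection target is declared on the business container itself using the aliyun_logs_{key} convention. A minimal, assumed snippet (the logstore names and path are placeholders):

env:
  - name: aliyun_logs_applog             # collects into a logstore named "applog" (placeholder)
    value: "/www/logs/*.log"             # file path inside the container to collect
  - name: aliyun_logs_appstdout          # a second config that collects stdout
    value: "stdout"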
If you want to learn more, check out the official documentation of Alibaba Cloud Log Service:
help.aliyun.com/document_de…
Multi-tenant isolation of containers
What I'm discussing here is multi-tenant isolation for users within the enterprise, not multi-tenancy for SaaS or KaaS service models.
In terms of permissions, our company controls access strictly, and Rancher conveniently provides permission control at the cluster, project, and namespace granularities, and supports our OpenLDAP-based authentication, which makes management easy. I can grant cluster/project/namespace permissions to the developers and testers on different project teams.
As shown in the figure below, I can add users to a cluster or to a project, assign any of several built-in roles, and even define custom roles.
For example, in scenario 1, I can grant Owner permission on development cluster -> project 1 to the project leader, who is then free to add team members to the project and assign them the appropriate permissions.
In scenario 2, I can grant Owner permission on the test cluster to the test lead, who decides which projects' test deployments they are responsible for, while developers can only view logs.
On the resource side, containers must have resource quotas set. Without resource limits, a performance problem in one application will affect every application on the same node; Kubernetes will then reschedule the problematic application onto other nodes, and if those don't have enough resources either, the whole system can collapse in an avalanche.
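A minimal sketch of what that looks like on a single container (the numbers are illustrative, not our production values):

resources:
  requests:
    cpu: "500m"                          # reserved share, used by the scheduler
    memory: "1Gi"
  limits:
    cpu: "1"                             # hard CPU ceiling; throttled beyond this
    memory: "2Gi"                        # exceeding this gets the container OOM-killed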
There is also a pitfall with resource quotas for Java applications. By default the JVM reads memory information from /proc/meminfo and sets Max Heap Size to 25% of system memory. But inside a container, /proc/meminfo is the host's, mounted read-only, so the default cannot be used: the application exceeds the container's memory quota and gets OOM-killed, the health check restarts the service, and the application keeps restarting over and over.
Can we just set the JVM memory equal to the container memory limit? No. The JVM consumes more memory than just the heap; the JVM is itself a process that needs extra space to do its work, so the quota you configure should cover Metaspace + thread stacks + heap + the memory the JVM process itself needs + other data. There is a lot more to this topic, so I won't expand on it here; interested readers can look it up.
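A rough illustration of the idea with assumed numbers, not a recipe: set the heap explicitly and leave the container limit comfortably above it so that Metaspace, thread stacks, and JVM overhead still fit. Later JDK 8 updates also offer container-aware flags (for example -XX:MaxRAMPercentage) as an alternative to a fixed -Xmx.

containers:
  - name: demo-app                       # hypothetical application
    image: harbor.36kr.com/java:v1.1.1   # placeholder image
    env:
      - name: JAVA_OPTS
        # 1 GiB heap + Metaspace + thread stacks + JVM overhead must fit within the 2 GiB limit
        value: "-server -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=256m"
    resources:
      limits:
        memory: "2Gi"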
Summary
Because our business scenarios are not complicated, our containerization journey was actually fairly smooth. We have very few operations staff, only two, so we don't have the time or energy to maintain too many self-built systems. We use a lot of Alibaba Cloud products, and Rancher, with its convenient deployment, friendly UI, and integrated monitoring, gave us a lot of confidence on the road to containerization.
We reduced the learning curve for developers by building two-layer images, and used Alpine plus precompiled glibc to keep image sizes small and shorten deployment times. Architecturally, we adopted a dual-availability-zone disaster-recovery setup on Alibaba Cloud with a complete backup scheme. With a DaemonSet-deployed log collection component shipping to Alibaba Cloud Log Service, we support a logging system handling 600 million entries per day. Rancher also gave us a deeply integrated monitoring system, multi-tenant isolation, and more, and we shared the resource quota pitfalls we stepped into ourselves.
In fact, containerization is not that complicated. Without Kubernetes we would need to build our own health checks and release system and maintain many different host environments, and we could not partition resources at fine granularity or use compute resources effectively. In my view it all comes down to saving cost and improving efficiency: virtualization, automation, intelligence, high performance, high availability, high concurrency are all about cost and efficiency, and Kubernetes already does that for us. A platform like Rancher then lowers the learning curve of Kubernetes itself, so all that's left is to embrace Kubernetes and go. That's the end of this sharing. Thank you!
Community Q&A
Q1: Is there a recommended high availability storage solution for K8S in the production environment?
A1: There is no standard answer for storage. We mainly use Alibaba Cloud, so we use Alibaba Cloud's block storage. Common solutions include Ceph, GlusterFS, Portworx, OpenEBS, and so on; each has its pros and cons and should be chosen according to your own business needs.
Q2: For gray release, Kubernetes traffic can be split at the network level through a service mesh, but how do you do a gray release when a major application version update also involves database schema changes?
A2: I haven't run into this scenario myself, but one idea is to prepare two sets of databases and route traffic to the matching database as well; you would need to verify whether that is feasible in your case.
To be clear, there are two layers involved: a logic layer and a data layer.
Q3: What do you build your pipeline with? In the pipeline, how do you handle the scenario where the same branch needs to be tested in multiple environments? I used Rancher's pipeline and hit a big limitation: the same branch cannot be tested in multiple environments in parallel. I use namespaces, but for the same branch the namespace is written in .rancher.yml, so the environments cannot be distinguished, and Rancher's pipeline cannot inject external variables to tell them apart.
A3: Rancher's pipeline is indeed not flexible enough yet. We use a self-built Jenkins for pipelines; parallel tests can be isolated by namespace or other isolation strategies, or you can prepare multiple test environments.
Q4: How are the two Dockerfiles combined?
A4: The developer's Dockerfile builds FROM the image produced by the operations-maintained Dockerfile.
Q5: What tools do you use for vulnerability scanning in K8s? What severity of image vulnerability needs to be fixed?
A5: We don't use a vulnerability scanning tool for now; we mainly apply fixes according to the remediation advice in Rancher's enterprise service notifications.
Q6: For example, to manage a container from the Internet by its service IP, you could do that by exposing the service IP. How do you expose the service IP? Please advise.
A6: If what you need is to manage containers, you can simply use Rancher's user permission control to give a user access to just those containers; there is no need to expose a service IP to the public network so that users can manage containers.
Q6: OK, thank you. I still don't understand the service IP part. How is it exposed? Do you mean letting different users manage different containers through the Rancher platform? Please explain again, thanks.
A6: You can expose it with a NodePort, which is accessed via a node IP and port, or use a load balancing product from a public cloud provider.
Q6: That's not what I mean. I want to expose the service IP itself, not just access it from inside the cluster.
A6: The service IP is internal to Kubernetes and cannot be exposed directly; traffic can only be forwarded to it.
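To illustrate the NodePort option mentioned above, a generic sketch (names and ports are placeholders, not from the talk):

apiVersion: v1
kind: Service
metadata:
  name: demo-app                         # hypothetical service
spec:
  type: NodePort
  selector:
    app: demo-app
  ports:
    - port: 80                           # cluster-internal service port
      targetPort: 8080                   # container port
      nodePort: 30080                    # reachable on every node IP at this port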
Q7: Why not use three availability zones? If Zone H fails, does the cluster become unavailable?
A7: Three availability zones is certainly possible. With Rancher's HA architecture, as long as one Server remains available it doesn't matter.
Q8: What does the pipeline look like for multiple development and test environments? Do you use Helm templates? Can you share more details?
A8: Currently we use parameterized Jenkins deployments. At deployment time you choose the namespace, environment identifier, branch, and so on, and the template is modified with sed.
Q9: What does your DevOps flow look like? Does each environment get its own Docker image, or do test/pre/prod share one image? If they share one image, how do you handle things like different configurations per environment and parallel development?
A9: We use the same image. At deployment time, by selecting different environment identifier parameters, the program automatically injects the configuration for that environment; this requires some corresponding changes on the development side.
Q10: Could you explain how to configure container resource limits?
A10: In Rancher they can be set at the project, namespace, and Pod granularity, with priority going in the opposite direction (the finer-grained setting takes precedence).