Preface
Containerization has become a major trend. It can solve many operations pain points, such as efficiency, cost, and stability, but teams often run into problems and friction while adopting containers. At the very beginning, the purpose of containerization at Youzan was to deliver development and test environments quickly. Along the way we encountered all kinds of issues: the container technology itself, adapting the operations toolchain, changing user habits, and so on. This article introduces the problems we encountered during containerization at Youzan and the solutions we adopted.
Why containerization
At Youzan, many projects and daily feature branches are developed in parallel, and contention for shared environments seriously affects the efficiency of development, testing, and release. We need to provide a dedicated Daily and QA environment for each project, and a project environment should be created and destroyed along with the lifecycle of the project or daily branch. One of our earliest containerization requirements was therefore solving rapid environment delivery.
[Youzan environments]
The figure above shows our general R&D process. In the standard process we have four stable environments: the Daily environment, the QA environment, the pre-release environment, and the production environment. Development, testing, and joint debugging are not carried out directly in the stable environments; instead, an independent project environment is pulled out for each project. Code is developed and tested there, goes through pre-release, is finally released to production, and is then synchronized back to the Daily/QA stable environments.
[Project Environment]
On the basis of the stable Daily/QA environments, we isolate N project environments, so that the largest planned set of parallel projects can be delivered with minimal resources. A project environment only needs compute resources for the applications the project actually touches; calls to any other services are served by the stable environment. It is in these project environments that we use container technology heavily.
[Continuous delivery]
Building on the fast-delivery solution for project environments, we then implemented a continuous delivery pipeline. So far there are more than 600 project/continuous delivery environments; together with the Daily/QA stable environments, they involve four to five thousand compute instances, most of which have very low CPU and memory utilization. Containerization solves the environment delivery efficiency problem well, and raises resource utilization to reduce cost.
Youzan's containerization solution
Our containerization scheme is based on Kubernetes (1.7.10) and Docker (1.12.6 and 1.13.1). The problems we encountered in each area, and our solutions, are introduced below.
Network
At first the whole site could not be fully containerized, so the container network needed routed connectivity with the original clusters. Because we could not solve interconnection between an overlay network and the cloud network on the public clouds we used, we initially gave up the overlay approach and adopted a MACVLAN scheme on the hosting network. This solved network connectivity and network performance, but meant we could not enjoy the elastic resources of the public cloud. As Youzan's multi-cloud architecture evolved and more and more cloud vendors began to support interconnection between overlay networks and VPC networks, the elastic resource problem has been alleviated.
Isolation
Container isolation mainly relies on the kernel's namespace and cgroup technologies, which perform well for isolating and limiting resources such as processes, CPU, memory, and I/O, but fall short of virtual machines in other respects. One of the most common problems we encounter is that the CPU count and memory size seen inside a container are wrong: the /proc file system is not isolated, so processes in the container "see" the CPU count and memory size of the physical machine.
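To see why this matters, consider how a process queries its resources. A minimal sketch (the class and method names are illustrative): on a runtime without cgroup awareness, both values below are ultimately derived from the unisolated /proc view and therefore reflect the host, not the container's limits.

```java
public class ResourceProbe {
    // On a JVM without cgroup awareness, this returns the host's CPU
    // count, because it comes from the (unisolated) /proc view.
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Likewise, the default maximum heap is computed from the memory
    // the process believes the machine has.
    static long visibleMaxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("CPUs visible:     " + visibleCpus());
        System.out.println("Max heap (bytes): " + visibleMaxHeapBytes());
    }
}
```

Inside a container with, say, a 2-CPU / 4 GiB limit on a 64-core host, an unaware runtime would still report the host's 64 cores here.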
Memory problems
Our Java applications decide their JVM parameters based on the memory size of the server they run on, so we use the LXCFS solution to ensure that the memory size seen inside the container matches its actual limit.
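To make this concrete, here is a hedged sketch of the kind of sizing logic involved; the half-of-memory policy and class names are illustrative assumptions, not Youzan's actual formula. With LXCFS mounted over /proc/meminfo, the same parsing code automatically sees the container's limit rather than the host's total memory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapSizing {
    private static final Pattern MEM_TOTAL =
            Pattern.compile("MemTotal:\\s+(\\d+)\\s+kB");

    // Parse MemTotal out of /proc/meminfo content. With LXCFS mounted
    // over /proc/meminfo, this reflects the container's memory limit
    // instead of the physical machine's memory.
    static long memTotalKb(String meminfo) {
        Matcher m = MEM_TOTAL.matcher(meminfo);
        if (!m.find()) {
            throw new IllegalArgumentException("no MemTotal line");
        }
        return Long.parseLong(m.group(1));
    }

    // Example policy (assumption): give the JVM heap half of the
    // visible memory.
    static long heapKb(long memTotalKb) {
        return memTotalKb / 2;
    }
}
```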
CPU count problem
Because we oversell CPU and Kubernetes uses CPU shares by default for CPU limits, the CPU count was still wrong even with LXCFS. The JVM and many Java SDKs decide how many threads to create based on the system's CPU count, so Java applications ended up with far more threads, and far higher memory usage, than on virtual machines, which severely hurt their performance; other types of applications have similar problems. We inject an environment variable, NUM_CPUS, based on the container's specification; a Node.js application, for example, uses this variable to decide how many worker processes to create. For Java applications, we simply override the JVM_ActiveProcessorCount function via LD_PRELOAD so that it returns NUM_CPUS [1].
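For code we control directly, the same idea can be applied in user space without LD_PRELOAD. A minimal sketch, assuming the NUM_CPUS variable described above (the helper and class names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuAwarePool {
    // Resolve the effective CPU count: prefer the injected NUM_CPUS
    // value, falling back to whatever the runtime reports.
    static int effectiveCpus(String numCpusEnv, int runtimeCpus) {
        if (numCpusEnv != null) {
            try {
                int n = Integer.parseInt(numCpusEnv.trim());
                if (n > 0) {
                    return n;
                }
            } catch (NumberFormatException ignored) {
                // Malformed value: fall through to the runtime count.
            }
        }
        return runtimeCpus;
    }

    // Size a worker pool from the container spec rather than from the
    // (possibly host-wide) processor count.
    public static ExecutorService newWorkerPool() {
        int cpus = effectiveCpus(System.getenv("NUM_CPUS"),
                Runtime.getRuntime().availableProcessors());
        return Executors.newFixedThreadPool(cpus);
    }
}
```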
Application access
Before containerization, all Youzan applications had already been connected to the publishing system, which standardizes application packaging and publishing, so the cost of onboarding applications was relatively small and business teams did not need to provide a Dockerfile.
- Node.js, Python, PHP-SOA, and similar applications were designed from the start to be hosted in containers; an app.yaml file defines the runtime and the startup command the container runs.
- Java applications that had already gone through standardization require no changes from the business side.
- Non-standardized Java applications need to be standardized first.
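The article does not show the app.yaml schema; a minimal hypothetical example along these lines (the field names are illustrative assumptions, not Youzan's actual format):

```yaml
# Illustrative app.yaml sketch: declares what the container should run.
runtime: nodejs-8               # language runtime to use (assumed field)
start_command: node server.js   # process started in the container (assumed field)
```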
Image integration
Load Balancing (ingress)
Container login and debugging
Logging
Grayscale release
The traffic involved in grayscale release mainly includes three parts:
- HTTP access traffic of the client
- HTTP calls between applications
- Dubbo calls between applications

First, at the unified access layer at the entrance, we tag requests with the labels of the dimensions grayscale needs (such as user and shop). We then had to modify the unified access layer, the HTTP client, and the Dubbo client so that these labels are transparently propagated along the whole call chain. When we do a container grayscale release, we create a grayscale Deployment and then configure grayscale rules in the unified access layer and the grayscale configuration center; every caller on the link senses these rules, which realizes the grayscale release.
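Label propagation of this kind is typically implemented as a client-side hook that copies the gray tags from the inbound request context onto every outbound call. A minimal sketch under assumed names (the header name and classes are illustrative, not Youzan's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class GrayTagPropagation {
    // Hypothetical header carrying the grayscale labels (user, shop, ...).
    static final String GRAY_HEADER = "X-Gray-Tags";

    // Copy the gray tags, if present, from the inbound headers onto the
    // outbound headers, so the labels survive across the call chain.
    static Map<String, String> propagate(Map<String, String> inbound,
                                         Map<String, String> outbound) {
        Map<String, String> result = new HashMap<>(outbound);
        String tags = inbound.get(GRAY_HEADER);
        if (tags != null && !tags.isEmpty()) {
            result.put(GRAY_HEADER, tags);
        }
        return result;
    }
}
```

In a real system this hook would live in the HTTP and Dubbo client filters mentioned above, so application code never handles the labels directly.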
Standard environment containerization
Why containerize the standard environments
- As with the project environments, more than half of the servers in the standard stable environments (Daily, QA, Pre, and Prod) run at low utilization and waste resources.
- To save cost, Daily, QA, and Pre each run on a single virtual machine, so whenever a stable environment needs a release, both the standard stable environment and the project environments become temporarily unavailable.
- Virtual machines are slow to deliver, and doing grayscale release with virtual machines is complicated.
- Virtual machines tend to live for years or longer, and over that time converging operating system and base software versions is very troublesome.
Progress of standard environment containerization
After the rollout and iteration of the project/continuous delivery environments, most applications themselves are already containerized. Going to production, however, requires the entire operations system to accommodate containers: monitoring, publishing, logging, and so on. At present our production containerization preparations are basically complete; some front-end Node.js applications have been deployed to the production network, and other applications are being migrated one after another. We hope to share more production containerization experience in the future.
Conclusion
The above covers Youzan's use of containerization, along with the problems and solutions we encountered in the process. Our production environment is still in the early stage of containerization, and we will run into all kinds of problems in the future; we hope to keep learning from each other and to share more experience later.
References
[1] github.com/fabianenard…