Link to the live recording: live.juejin.cn/4354/595741
Opening
Welcome to Juejin's first live broadcast of August. I'm Wang Shengsong.
The topic I'm bringing you today is how to build CI/CD systematically. Without further ado, let's get started.
About me
Before we begin, let me introduce myself. My name is Janlay. I'm currently a front-end development engineer on the Gitee DevOps product team, working on the front end of the team's DevOps products. I've led front-end development for CI/CD-related products, and I'm the author of the Juejin booklet Implementing a CI/CD Process from 0 to 1.
Status quo and dilemma
Before we get started, let's talk about the current state of CI/CD mastery among front-end engineers.
The status quo
This is a chat I had earlier with another front-end developer. As you can see, front-end engineers are interested in the CI/CD direction, but also anxious about it. Some big-company interviews ask related questions too, which can even touch on Kubernetes.
How well do you know it?
A survey of 162 front-end engineers' mastery of CI/CD tools
Before writing the booklet last year, I surveyed 162 intermediate and senior front-end engineers on their mastery of CI/CD tools. Some of them work at big companies; some already lead a front-end team.
Hands-on practice with and mastery of Docker turned out to be very, very low: less than 20%, in fact only 19%. Mastery of Kubernetes was under 1%; out of 162 people, only one had ever worked with it. Many people had never heard of the artifact library concept, while a few had built their own. Build tools and deployment tools are things you probably hear about a lot, and there are many common, mature options, but quite a few students were not very clear on this part, or were still deploying manually. Frankly, that is out of step with the times and does nothing for efficiency.
The DevOps toolchain
Here is a chart of the DevOps toolchain compiled by a consulting firm. Overall, there are a great many of these tools. Besides continuous build and deployment, there is monitoring and operations, automated testing, process management, container orchestration, and more.
There are seven Git code-hosting offerings alone. And CI is not just Jenkins or GitLab CI; there are other toolchains too, and the tool ecosystem around CI/CD keeps getting bigger and more varied. It is with these tool ecosystems that we can fully unlock the value of CI/CD.
So if CI/CD is used merely as an automation tool, how significant is it really?
Why automation?
So why automate? You may have heard of a development model called waterfall.
At the top left is how waterfall development works. In the traditional waterfall model, the requirement schedule is fixed, requirements may not be modified midway, and everything ships in one big release at the end. Both the testing process and the release process are complex.
As you can see in this diagram, there is a wall between development and testing, and another wall between testing and operations. Code can only be handed to testing after every feature is developed, and only handed to operations for release after everything is tested. In today's Internet era, this development model is seriously out of step.
In the Internet era, product requirements change very frequently, and waterfall cannot keep up, which is where agile development comes in. The one big chunk of waterfall requirements is broken down into iterations, each with its own requirements. Once an iteration's requirements are developed, they are handed to testing, and the original long march becomes a series of short sprints: development finishes part of the requirements, testing verifies that part, and that part goes live. The product can be validated against the market at an early stage.
Of course, frequently changing requirements are a necessity today, because things move fast. If everyone could foresee market trends and user needs, JD and Taobao would never have left room for Pinduoduo to exist.
But then comes agile's drawback. Since the release process is complex and not automated, once development and testing are complete, requirements still have to wait for a fixed release window. Because releasing is complicated, requirements still cannot be validated the moment they are ready; the lack of automation is the threshold everything gets stuck on.
That is where DevOps came in and broke down the wall between operations and development/testing. Release, build, and deployment can all be automated: automatic release, automatic build, automatic deployment. That gives us a lower release cost and a faster product release cycle. The right to release shifts from operations to development, even to the business side. This is the point of automated build and deployment: it lets requirements be validated by the market faster.
So automation is not only about improving efficiency at the development level; it also drives rapid business iteration. That is why, from a developer's perspective, mastering automated build and deployment is in such demand.
The evolution
We have just seen what automation means for the business. Now let's walk through the evolution of CI/CD and see how the technology evolved to automate as much as possible.
The ancient era: FTP + Tomcat/Nginx
First, build and deployment in the ancient era. Actually, it is not that long ago: at a technology-sharing session I attended in 2018, a back-end engineer told me he was deploying via FTP. Many school textbooks even teach this. FTP really was common in the early days of web deployment.
The first step is to build the code locally.
In the FTP days, Webpack might not have existed at all; maybe it was just a few jQuery pages or static pages, and you threw the page files straight onto the server.
Often they were served alongside the back-end code, by something like Tomcat or Apache.
As times moved on, the FTP approach became too backward. As I mentioned earlier, product requirements need to be validated in the market quickly; besides splitting up requirements, the release process itself has to be automated to achieve that.
Without automation, the risk is also relatively high. Sometimes an operation goes wrong, a careless touch or an accidentally deleted file, and the service or the system breaks. That has happened n times in history. Automation also removes a lot of repetitive labor.
If a machine can perform an operation automatically, having a human perform it is a waste.
The opening act of automation: Shell + Nginx
So the first exploration of automation was writing Shell scripts, which let you automate these operations. First, the script pulls down the code (the git fetch command is actually missing in the screenshot above; you can imagine it, haha). Once the code is pulled, run npm run build, compile it, and drop the output into the corresponding service directory.
In this setup there is a separate deployment server. We can scp the compiled tarball to the target server, then use SSH to remotely unpack it there. For example, line 10 above runs tar zxvf to unpack the tarball, then mv moves it to the corresponding directory.
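To make that concrete, here is a minimal sketch of such a deploy script, assuming a hypothetical repository URL, server address, and web root:

```sh
#!/usr/bin/env bash
set -e  # stop at the first failed command

# 1. Fetch the latest code (repository URL is a placeholder)
git clone git@example.com:team/app.git app 2>/dev/null || (cd app && git pull)
cd app

# 2. Install dependencies and compile
npm ci
npm run build

# 3. Pack the build output and copy it to the target server
tar -czvf dist.tar.gz dist
scp dist.tar.gz deploy@192.168.1.100:/tmp/

# 4. Unpack remotely and move it into the web directory served by Nginx
ssh deploy@192.168.1.100 \
  "tar -zxvf /tmp/dist.tar.gz -C /tmp && cp -r /tmp/dist/* /usr/share/nginx/html/"
```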
That was one of the earliest forms of automation, and a fair number of businesses still use it today, including some small internal services of ours, which are deployed exactly this way. Which confirms that the cost of this approach is still quite low.
The cost is low: there are no extra service dependencies, no particularly complex operations, and no complex architecture required.
But as the business grows and the team expands, it can become less friendly.
First, being able to execute the script on a server means having access to that server. If everyone who wants to build has access, the risk of leaked passwords and keys grows. Anyone can reach the server: what if someone decides to delete the database and run? Or accidentally removes a file, and one slip of the hand takes a service module down? That is a catastrophic problem.
Second, it is not very user-friendly. So can we borrow an idea from software development and wrap these raw operations in an upper layer of encapsulation: provide users with a unified entrance where they can only perform the range of operations you grant, executed with one click from an interface?
So, like many companies, we chose Jenkins, a very classic build tool with features such as visualization. It has an account system, so permissions can be assigned per account: different people see different tasks and can run different tasks. It also has all kinds of plugins for extension, such as the Git Plugin and the Node Plugin, plus a very open API you can use to build custom extensions.
Visualized execution: Jenkins + Nginx
Once automated, everything looks better. With the Shell script written in advance, we can execute it with one click.
Jenkins even gives you features like scheduled execution. When you push code or merge a pull request, Git can call your Jenkins build-trigger URL through a webhook and kick off a build.
You can see the two images above: the upper left is the Shell script editing area, and on the right is a build log. Everything can be written and executed in the web page.
So here is the overall flow: we commit the code to the repository, open Jenkins, click the build button, and the build runs. Or we push straight to the repository, the webhook fires, and the build runs directly. Much more convenient.
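As an aside, the webhook path boils down to an HTTP request against the job's trigger URL. A minimal sketch, assuming a job with "Trigger builds remotely" enabled (host, job name, and token are placeholders):

```sh
# What a Git webhook effectively does: call the Jenkins build-trigger URL.
# Assumes the job has "Trigger builds remotely" enabled; all values are placeholders.
curl -X POST "https://jenkins.example.com/job/my-app/build?token=MY_TRIGGER_TOKEN"
```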
However, one day your service is no longer just for you. Many companies do ToB, for example: your service is not only for your own use; you have to take it to the customer and deploy it there.
In that case, if your service needs a very complicated environment installed, it becomes a headache. With many customers, installing environments is time-consuming and laborious, and you even run into operating-system inconsistencies: Ubuntu in your own development environment becomes CentOS at the client site, and some environment dependencies have incompatible installation methods.
Version iteration is also a big problem. Each release package is huge, and every installation or deployment means hauling around a particularly heavy installer, which takes forever. Ideally, I would just carry a virtual machine to the client site and run it.
This is where Docker comes in handy. It packs things like your runtime and service code into a single image. Wherever you want to run, as long as there is a Docker runtime environment, you can run the image and get the complete runtime experience.
Moreover, Docker image updates are incremental: only the changed layers are pulled, not the full image every time, which saves space, time, and effort.
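As a sketch of what "runtime packed into the image" looks like for a front-end service, here is a multi-stage Dockerfile that compiles with Node and serves the output with Nginx (image names, versions, and paths are placeholder assumptions):

```sh
# Write a minimal multi-stage Dockerfile; every name here is a placeholder
cat > Dockerfile <<'EOF'
# Stage 1: compile the front-end code
FROM node:16 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: serve the compiled output with Nginx
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
EOF

# Build and tag the image
docker build -t my-app:1.0.0 .
```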
The container era: Jenkins + Docker + Nexus + Nginx
So we run the original service as a Docker image, and the build step now compiles the code into an image. A new thing is introduced here as well, shown on the right: the artifact library. The output of each build is called an artifact, and the file storage system that holds artifacts is called an artifact library.
The screenshot in the upper right is a Nexus artifact repository. Java has its own package format, Maven; Node has npm; and Docker has images. Nexus can create and host repositories for all of them. For small and medium-sized teams, Nexus is quite versatile and cost-saving: one system covers both front end and back end.
So the process becomes: we build the image, then push it to the image registry. The original "build the code" turns into "build the image, push it to the registry, and tell the remote server to pull the image."
The picture on the left shows an SSH command at the bottom: it uses the Docker CLI on the 151 server to pull the image down, then stops the old container, deletes it, and runs the new image. This way, our service is highly integrated with the environment it needs to run in; you no longer worry about distribution, and you are no longer at the mercy of the operating system.
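A minimal sketch of that push-and-replace step, assuming a placeholder registry address, server IP, and container name:

```sh
# Push the freshly built image to the registry (address is a placeholder)
docker tag my-app:1.0.0 registry.example.com/team/my-app:1.0.0
docker push registry.example.com/team/my-app:1.0.0

# Replace the running container on the target server over SSH
ssh deploy@192.168.1.151 "
  docker pull registry.example.com/team/my-app:1.0.0 &&
  docker stop my-app && docker rm my-app &&
  docker run -d --name my-app -p 80:80 registry.example.com/team/my-app:1.0.0
"
```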
But here comes the next problem. One day the user base grows and you need to add servers; one is not enough. The biggest headache is batch-updating servers to a new version, or adding new servers. Adding one is fine, but what about five, or ten? As you scale out, it never ends.
We need a general strategy that lets us operate on these servers in batches and makes adding a server painless, ideally as simple as writing a configuration file: write a list, and the operation follows the list.
Enter a very well-known tool: Ansible. Ansible is an automated operations tool from Red Hat that can batch-operate and deploy servers based on a server list (an inventory) you prepare in advance.
You write it like a notepad file, a YAML or INI file that lists your first server's IP, account, and password; your second server's account and password; and so on. That makes a list. Hand the list to Ansible, and it connects to each remote server using the addresses and credentials written there and executes the scripted tasks.
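For instance, a minimal inventory sketch. IPs and credentials are placeholders, and in practice you would prefer SSH keys or ansible-vault over plain-text passwords:

```sh
# Write a minimal Ansible inventory (INI format); all values are placeholders
cat > hosts.ini <<'EOF'
[web]
192.168.1.101 ansible_user=deploy ansible_password=secret1
192.168.1.102 ansible_user=deploy ansible_password=secret2
EOF

# Ad-hoc sanity check: ping every host in the list
ansible -i hosts.ini web -m ping
```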
In essence it is an automated operations tool: it automatically executes commands for you, and not just for deployment. If you have a large number of nodes that all need an environment installed, for example, Ansible is very useful. It is written in Python and works over plain SSH, so it is quick to get going.
Batch-operating servers with Ansible, based on an inventory list
This is the Playbook, the script set for Ansible operations.
On the right you can see a sample file with a tasks field. Ansible generates a task for each command your script needs to execute. Asynchronous execution between one task and the next is supported, and so is error handling: for example, if a step in the middle fails and would abort, I can choose to ignore the error and keep executing.
The whole Playbook can also be instantiated with variables. For example, I use a timestamp variable here: I define it under the vars field, whose value is the variable's default. When I run the ansible-playbook command on the left, I can pass the timestamp with the -e parameter, and each run produces a new, distinct instance of the operation.
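A minimal sketch of such a playbook and its invocation; file names, the image name, and the timestamp value are all placeholders:

```sh
# A tiny playbook: a vars default, a task allowed to fail, and a task using the variable
cat > deploy.yml <<'EOF'
- hosts: web
  vars:
    timestamp: "latest"         # default value; overridden per run with -e
  tasks:
    - name: Remove the old container (ignore the error if none exists)
      command: docker rm -f my-app
      ignore_errors: yes
    - name: Run the new container tagged with this build's timestamp
      command: docker run -d --name my-app registry.example.com/team/my-app:{{ timestamp }}
EOF

# Pass a fresh timestamp for this run via -e
ansible-playbook -i hosts.ini deploy.yml -e "timestamp=20220801-1200"
```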
So in this setup, we replace "operate the servers directly" with "operate Ansible," and let Ansible drive the corresponding machines through an image replacement. As you can see: after the image is pushed to the registry, Ansible batch-operates the servers; each managed server pulls the image, deletes the old container, and runs a new container on the new version.
Ansible fulfills the dream of batch-operating servers, but it is a little blunt, because it is still just an automated operations tool.
We are running containers, and sometimes we hit situations where load balancing requires upstream values configured through environment variables. Especially in ToB scenarios (micro frontends, or services referencing one another), you need a set of environment variables, because the upstream values are not fixed.
For example, I have a system where a request to /user should reach the machine configured for /user, and a request to /a should reach the container for /a. If it were only for our own use, hardcoding would be fine. But suppose you are at a customer site: the environment there is complex and you do not know what they will deploy with. If you write the upstream values into the config as fixed literals, it does not help later deployment: the actual values are not fixed, and the environment's network segments and DNS all affect them.
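For reference, this is the kind of configuration in question; the upstream addresses are precisely the values that cannot be hardcoded, and everything below is a placeholder:

```sh
# Path-based routing in Nginx; the upstream addresses vary from site to site
cat > /etc/nginx/conf.d/app.conf <<'EOF'
upstream user_service { server 10.0.0.21:8080; }  # not fixed: depends on each site's network/DNS
upstream a_service    { server 10.0.0.22:8080; }

server {
    listen 80;
    location /user { proxy_pass http://user_service; }
    location /a    { proxy_pass http://a_service; }
}
EOF
nginx -s reload
```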
So we want the containers to have an operating ecosystem of their own, a highly isolated soil in which containers can resolve these messy concerns inside their own ecology. Then when we deploy at a customer site, we only need to bring this environmental ecosystem along, and it can be reproduced 1:1.
Also, consider how we update a container today: we delete the old container and create a new one. If a user accesses the service in between, it is down; there is nothing there to serve. And that is the relatively good case. If the new container fails to start because of an error, you have a serious problem, an incident, and rolling back is a hassle too.
So we want releases with zero downtime, and if a new release fails, we want it automatically aborted and rolled back to the previous version.
While maintaining such an operating ecosystem, we also want to squeeze the most out of machine resources. We have multiple environments that must be isolated from and unaffected by each other; day to day there is a test environment and a development environment. To maximize server utilization, one set of well-provisioned servers could host both.
We hope to reach that ideal state, and for it, Kubernetes (K8s for short) was chosen.
Kubernetes is a container orchestration tool. It gives containers an ecosystem that is like a complete home, which is why it is called an orchestration tool. Kubernetes manages the servers your containers run on as a cluster; each server is a node. Based on a node's remaining resources, the labels you assign manually, and other factors, it automatically schedules where services are deployed, maximizing the utilization of our server resources.
For application updates, it can release services in a rolling manner, with no interruption of access between the old and new versions during the release, that is, zero downtime. The technology is quite popular; many companies are using it now.
Container cluster orchestration: Jenkins + K8s + Docker + Nginx
The top left shows what happens when K8s performs a release: after starting a new container, K8s makes sure the new one is healthy before killing the old one. That guarantees accessibility: no downtime caused by service restarts.
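Driving such an update from the command line can be sketched like this (the deployment and image names are placeholders):

```sh
# Point the Deployment at the new image; K8s then performs the rolling update
kubectl set image deployment/my-app my-app=registry.example.com/team/my-app:1.0.1

# Watch the rollout; this blocks until the new pods are healthy or the rollout fails
kubectl rollout status deployment/my-app

# If the new version misbehaves, roll back to the previous revision
kubectl rollout undo deployment/my-app
```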
The lower left shows how K8s adds a new node. As you can see, one remote call against the API is enough to join the cluster; everything is service-ized. Joining is based on a key, an IP, and a port. A node is decoupled from the other nodes and from the cluster's control plane: delete it if you want, add one if you want; just treat it like a service.
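With kubeadm, for example, joining looks roughly like this; the address, token, and certificate hash are placeholders of the kind printed by kubeadm init:

```sh
# Run on the new machine: join the cluster at the control plane's IP:port,
# authenticating with a bootstrap token and the CA certificate hash (placeholders)
kubeadm join 192.168.1.10:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash-printed-by-kubeadm-init>
```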
On the right is a flowchart of the K8s integration. At the bottom: when a user commits code to Git, the webhook is triggered, or Jenkins is triggered manually. Jenkins builds the image, uploads it to the artifact library, and then calls the K8s cluster to deploy the artifact.
At that point we tell K8s: my image version has been updated, and here is the address of the new version. K8s changes the image version in the configuration, pulls the new image from the registry, and once it is pulled, the old Pods are deleted and replaced.
You can roughly think of a Pod as a container, but it is not strictly one. A Pod in K8s is not limited to a single container; it can hold several. It is also the smallest schedulable unit in K8s, and it owns its own network allocation, so it is not equal to a container.
Here you can see the rolling-upgrade strategy. On the left there were originally four blue Pods. First one blue Pod is taken down and a new one is created, OK; three old ones remain, kill one, bring up a new one; two are left, kill another, start another new one, and the deployment is complete. That is a rolling release.
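That pacing is governed by the Deployment's update strategy. A minimal sketch, with placeholder names and numbers:

```sh
# Declare how many pods may be missing or extra at any moment during a rolling update
cat > my-app-deployment.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  selector:
    matchLabels: { app: my-app }
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # kill at most one old pod before its replacement is ready
      maxSurge: 1         # run at most one pod beyond the desired replica count
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
        - name: my-app
          image: registry.example.com/team/my-app:1.0.1
EOF
kubectl apply -f my-app-deployment.yml
```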
Beyond upgrades, K8s also introduces a concept called Ingress, which is somewhat like Nginx load balancing. You can see here: when the browser makes a request, it passes through a series of forwards into the K8s cluster, and inside the cluster it first reaches the Ingress. The rule on the left is path = /, mapping to a service. The rule on the right also has path = /, but it additionally matches a grayscale cookie: it marks the user as a gray-release user and routes them to the gray version of the service. That is what Ingress is for: it can do gray releases as well as path-based forwarding.
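With the NGINX Ingress controller, for example, cookie-based gray routing can be sketched like this; host, service name, and cookie name are all placeholder assumptions:

```sh
# A canary Ingress: requests carrying the cookie "gray=always" reach the gray service
cat > gray-ingress.yml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-gray
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-cookie: "gray"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: my-app-gray, port: { number: 80 } }
EOF
kubectl apply -f gray-ingress.yml
```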
K8s provides this complete operating ecosystem for containers. Big domestic companies are basically all on K8s, including serverless products, some of which use K8s as the underlying implementation. This kind of orchestration saves time, labor, and worry.
Of course, K8s is not the final answer. In the future, Serverless deployment will be even easier.
Take the elastic scaling in the lower left, pay-as-you-go. With K8s today we still add and remove nodes manually; it does not automatically scale nodes with your service's traffic. Serverless can, and the real savings appear when physical resources scale. Serverless function compute also covers part of the stateless operations and services story, and it is cheaper and faster. Of course, it too is built on Docker.
Domestic cloud vendors also provide quick-deployment support for common front-end and back-end frameworks: Next.js, Koa.js, and Egg.js, for example, all have quick-deploy support. Deployment becomes much easier.
Back to the start
So that is our evolutionary journey. You can see that, to improve engineering efficiency, our predecessors tried every possible means to optimize and improve.
Let’s get back to the main point: Why automation?
Automation is not about slacking off, not doing for the sake of doing, and not about padding KPIs/OKRs. It is about getting products and requirements validated online faster, analyzing user feedback earlier, and matching requirements to users better. That is the goal we automate for.
The reality is harsh
The reality is harsh. The release is automated, but the total time from development to release may be barely affected. So you have to ask: how much time goes into testing? Is there a lot of manual testing?
Then there is the last mile that cannot be automated, as with the classic example of mini-programs: a mini-program release works differently from an ordinary web release; how do you get through that last mile?
And input may not be proportional to output: if your business is not all that hot and you deploy only a handful of times a month, do you even need CI/CD? How much is it worth?
For some specific scenarios, current open-source tools still cannot cover the whole pipeline; the last mile has to be done by hand. And even once automation is in place, its impact is sometimes minor. So choose the right tools for the business, including but not limited to extending existing tools, or even consider building your own.
Be a platform, not a tool
So again: be a platform, not a tool. Let me give some examples. IDE integration: one-click publishing supported in the mini-program IDE is a good example of linking the IDE with CI/CD; many large companies are building WebIDEs partly with this in mind. There is also linking and binding with our daily task cards, which makes tracking requirement progress much better.
Automated testing: our release time has dropped, but our testing time has not. Lots of manual testing still wastes cost, and we can consider optimizing it.
Gray-release strategies for different users: for example, based on the region in a user profile, or gray release based on other factors. There is also service monitoring and alerting.
A data-measurement platform: we can count build durations, release counts, and so on, in order to optimize further. Or even really nice drag-and-drop process visualization: CI/CD pipeline orchestration is hot right now. These are just some examples of fitting CI/CD to the business.
Tools are a relatively abstract base, but business scenarios are complex and changeable; we need to evolve tools into a platform of our own. Docker, Jenkins, Nexus, and Kubernetes all have their own open APIs, so you can build custom encapsulation around your business needs.
Keep evolving
To summarize: automation is a means of solving problems, not the end result; doing data measurement and statistics well helps you improve the process further; and choose tools that fit before you automate.
Q&A
Here is the Q&A:
See the recording: live.juejin.cn/4354/595741, at 39 minutes 53 seconds.
Booklet promotion
And here let me promote my booklet. Portal: juejin.cn/book/689761…
Recommended books
Here are two recommended books.
The first is a book on advanced K8s practice. I personally reach for it often when looking up related knowledge, though the official documentation is still worth reading.
The second is the DevOps practice guide, which is also very helpful for building your automated processes, DevOps culture, and measurement platform.
Recommended platforms & projects
Here are some recommended platforms and projects. This is not advertising; if you are interested in building your own, you can reference these platforms' feature design.
The first is Huawei Cloud's DevCloud. Personally I find their CI/CD side quite complete, including visual drag-and-drop CI/CD orchestration and support for interrupting pipeline execution. Tasks form a pipeline, just like a factory assembly line, and tasks within the pipeline can also run in parallel.
The second is Baidu's Efficiency Cloud. It is maintained at a relatively low frequency, but its distinguishing feature is the data linkage across development: for example, when code is committed, the commit message carries the card ID, and the card detects that the requirement's code has been committed to the repository. Builds are likewise bound to the requirement card, so the business side can clearly see the status and progress of a requirement's development and release; a single card threads through every progress state.
The third is KubeSphere, a cluster-management product for K8s, covering K8s service monitoring and CI/CD orchestration. It is open source, and domestic as well.
Thanks
That’s all for my share, thank you!