Introduction: Efficient development, robust release.

In a cloud-native environment, the Kubernetes-based toolchain simplifies many of the daily chores for developers, but it also brings many new concepts and changes to the way they work. This article focuses on cloud-native infrastructure and discusses how to develop, debug, and release efficiently in a cloud-native development process.

First, in a general sense, as a developer, what kind of development process do you expect?

Let me first describe the ideal development process as I understand it.

Based on this ideal process, what issues arise in development, debugging, and release when the development infrastructure is migrated to cloud-native and microservice architectures, and how can they be solved?

A typical development process consists of three phases: development, testing, and deployment.

Development is primarily about writing and testing code. Once you’ve written your code and unit tests, you need to perform functional validation in a runtime environment. The local IDE provides rich debugging functionality, and local services can be restarted quickly. Being able to write, start, and debug entirely locally is the ideal way to work, rather than deploying your code to a remote test environment.

In practice, in order to verify a specific functional scenario, it is often necessary to cooperate with external dependencies: which other services does the service I am writing depend on, and which other services need to call my service before a full validation can be done?

These issues need to be addressed in order to truly enjoy the benefits of local development.

Testing generally refers to test validation in the various automated test and acceptance environments of a CI pipeline. That is not the focus of this article, so we assume it has been done. It’s time to deploy.

By this point we have a fair amount of confidence in the quality of the release, since it has been extensively verified. However, once a release goes live, it is inevitable that some defects will slip through from time to time, so minimizing the impact of these problems is what we call a robust release.

First, let’s focus on developing and debugging in a cloud-native environment.

With the proliferation of microservices and various open source service components, today’s software systems more or less consist of several independent service entities connected to each other through interface calls. Therefore, local service testing inevitably involves interaction with other upstream and downstream services. In particular, full functional verification requires all services on the upstream and downstream links to be started locally. However, as the system evolves and the number of services increases, local resources quickly become unable to support starting the whole system. So, is it possible to combine the shared service nodes in the test environment with the local services into a complete test link?

In a cloud-native environment, the test environment is isolated behind the Kubernetes cluster’s network boundary. Accessing services in the test cluster from outside requires going through a unified Ingress gateway, and only the small subset of services with a gateway route can be reached. Conversely, because the developer’s local host usually does not have a public IP address, there is no way for the test environment to connect to a local service instance.

To this end, Cloud Efficiency created the KT-Connect tool to solve the problem of network connectivity during local testing. It creates a virtual two-way network path between the developer’s local environment and the Kubernetes test cluster.

KT-Connect is a simple command-line tool. For connecting to the test environment from the local machine, it provides a connect command that uses a Pod deployed in the cluster as a network proxy, so that any Service domain name, Service IP address, or Pod IP address in the cluster can be accessed directly from the local network. For accessing local services from the cluster, KT-Connect provides the exchange command, which directs all in-cluster requests to a specified service to a specified local port through another reverse proxy Pod.

For an individual developer, the above two commands are perfectly adequate for everyday work. However, in team development, a new problem arises. When one developer uses the exchange command to redirect a particular service’s traffic locally, all other developers working in the same cluster are affected. To avoid such interference, KT-Connect provides a third command, mesh, which is similar to exchange but does not import all of the service’s traffic to the developer’s local environment. Instead, it directs only the matching test traffic there, based on specific mesh rules. This maximizes the utilization of test environment resources and lets multiple developers coexist peacefully.
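The three commands above can be sketched as CLI invocations. This is an illustrative dry run, not a definitive reference: the service name `service-b`, the port, and the exact flag spellings are assumptions based on the tool's documented command shape, and the real commands require a running cluster and kubeconfig, so the helper below only prints them.

```shell
# Illustrative KT-Connect usage. The commands need a cluster to actually run,
# so run() prints each command instead of executing it.
run() { echo "+ $*"; }

# connect: make cluster Service DNS names and Service/Pod IPs reachable locally
run ktctl connect

# exchange: redirect ALL in-cluster traffic for a service to local port 8080
run ktctl exchange service-b --expose 8080

# mesh: redirect only traffic matching mesh rules to local port 8080,
# leaving other developers' traffic on the in-cluster version
run ktctl mesh service-b --expose 8080
```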

In essence, KT-Connect is implemented with the Layer 4 network proxy capabilities of Kubernetes’ native command-line port forwarding and open source SSH tools, with no intrusion into the application itself. All of its source code is now open source on GitHub.

Next comes the release.

Rolling releases are built into the cloud-native infrastructure to ensure that the release itself is reliable and graceful. However, this pattern has some problems: release and rollback take a relatively long time, and there is no way to pause mid-release and observe business metrics. A more advanced pattern is the blue-green release, where a full new copy is launched and all traffic is switched to the new version. Release and rollback are faster this way, but all traffic is switched at once, without incremental verification. A canary release solves this by gradually importing traffic to the new version through routing control. But it is generally a percentage of traffic, so there is no way to designate a specific group of users to use the new version.
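The percentage-based canary routing described above can be sketched in a few lines. This is a minimal illustration, not any particular gateway's implementation; the class and version names are assumptions. Note how the bucket depends only on a hash of the user ID, which is exactly why this scheme cannot target a specific group of users.

```java
// Minimal sketch of percentage-based canary routing (names are illustrative).
public class CanaryRouter {
    private final int canaryPercent; // 0..100, share of traffic for the new version

    public CanaryRouter(int canaryPercent) {
        this.canaryPercent = canaryPercent;
    }

    /** Hash the user ID into a stable bucket in 0..99. */
    static int bucket(String userId) {
        return Math.floorMod(userId.hashCode(), 100);
    }

    /** Decide which version serves this user: same user always gets the same answer. */
    public String route(String userId) {
        return bucket(userId) < canaryPercent ? "v2-new" : "v1-stable";
    }
}
```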

A more controlled canary approach requires each user to be given a traffic flag, such as a cookie. That is, a mechanism similar to an interceptor determines whether the current user should be a grayscale user; if so, it sets a cookie for them, and all subsequent traffic from that user will carry this cookie. With this traffic flag, the traffic entry point can decide, based on the cookie’s value, whether a request should go to the new version or the old one.
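A minimal sketch of that interceptor, assuming a hand-picked set of grayscale users and plain maps in place of real HTTP request/response objects (all names here are illustrative):

```java
import java.util.Map;
import java.util.Set;

// Sketch of the cookie-based traffic flag: mark chosen users once, then
// route every later request by the cookie they carry.
public class GrayMarkInterceptor {
    static final String GRAY_COOKIE = "gray";
    private final Set<String> grayUsers; // the specific group chosen for the new version

    public GrayMarkInterceptor(Set<String> grayUsers) {
        this.grayUsers = grayUsers;
    }

    /** On first contact, decide whether this is a gray user and set the cookie. */
    public void markIfGray(String userId, Map<String, String> responseCookies) {
        if (grayUsers.contains(userId)) {
            responseCookies.put(GRAY_COOKIE, "true");
        }
    }

    /** At the traffic entry point, route by the cookie on subsequent requests. */
    public static String route(Map<String, String> requestCookies) {
        return "true".equals(requestCookies.get(GRAY_COOKIE)) ? "new-version" : "old-version";
    }
}
```

Unlike the percentage scheme, membership in `grayUsers` can be any business criterion (region, tenant, opt-in list), which is the point of this approach.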

This routing mechanism alone is still not enough. Our actual application is not a single service; it consists of multiple services calling each other, as shown in the diagram. For example, when I release service B, service B does not face the browser directly, so it cannot receive the user’s cookie. This is where an automatic traffic-marker propagation mechanism is needed. The general approach is to place the grayscale marker in a ThreadLocal at the entry point of the request, and then pass the value of that ThreadLocal along in a cookie at the application’s exit, such as in an OkHttpClient call.
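That ThreadLocal pattern can be sketched framework-free. This is a hedged illustration: the `x-gray` header name is an assumption (the marker could equally travel as a cookie, as above), and in a real application the outbound step would live in an OkHttp interceptor rather than a plain method.

```java
import java.util.HashMap;
import java.util.Map;

// Propagate the gray marker through a service that never sees the user's cookie:
// capture it into a ThreadLocal on the way in, re-attach it on the way out.
public class GrayContext {
    private static final ThreadLocal<String> GRAY_MARK = new ThreadLocal<>();

    /** Request entry: capture the marker from the incoming headers. */
    public static void enter(Map<String, String> inboundHeaders) {
        GRAY_MARK.set(inboundHeaders.get("x-gray"));
    }

    /** Outbound call: re-attach the marker so the next hop can route on it. */
    public static Map<String, String> outboundHeaders() {
        Map<String, String> headers = new HashMap<>();
        String mark = GRAY_MARK.get();
        if (mark != null) {
            headers.put("x-gray", mark);
        }
        return headers;
    }

    /** Request exit: clear the ThreadLocal to avoid leaks across pooled threads. */
    public static void exit() {
        GRAY_MARK.remove();
    }
}
```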

Now that we’ve seen how full-link controlled canary release works, let’s look at how it is implemented technically. At Alibaba, we use a technique called unified access: all requests, both inbound traffic from outside and traffic between internal services, go through this access layer, and it decides where to send each request.

In the cloud-native era, the concept of Service Mesh emerged, which is essentially a “distributed unified access”. This unified access is no longer a centralized service, but a proxy process deployed alongside each instance of each service. It receives the instance’s incoming traffic and forwards it to the actual service, and it also intercepts the instance’s outbound traffic and decides the next hop.

Istio is a widely adopted implementation of Service Mesh.
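In Istio, the cookie-based routing described above maps to a VirtualService config fragment along the following lines. This is a hedged sketch: the host `service-b`, the `gray=true` cookie, and the subset names are assumptions, and a companion DestinationRule defining the `v1`/`v2` subsets is assumed to exist.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-b
spec:
  hosts:
  - service-b
  http:
  # Requests carrying the gray cookie go to the new version...
  - match:
    - headers:
        cookie:
          regex: ".*gray=true.*"
    route:
    - destination:
        host: service-b
        subset: v2
  # ...everything else stays on the stable version.
  - route:
    - destination:
        host: service-b
        subset: v1
```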

With this in mind, you can see what a release process looks like in the figure above.

This release process involves multiple updates to Kubernetes resources, which can be complex and error-prone if done entirely with native commands and manual configuration. For this reason, Cloud Efficiency has productized the common cloud-native release patterns: developers only need to configure a few simple release and routing rules to achieve a safe and controllable release process.


This article is original content from Alibaba Cloud and may not be reproduced without permission.