Authors | Chen Xianlu, Jun Bao
Recently, the CNCF Technical Oversight Committee (TOC) voted to accept Argo as an incubation-level hosted project. As a new addition, Argo focuses on Kubernetes-native workflows, continuous deployment, and more.
The Argo project is a collection of Kubernetes-native tools for running and managing jobs and applications on Kubernetes. It provides a simple combination of three computing patterns for creating jobs and applications on Kubernetes: the service pattern, the workflow pattern, and the event-based pattern. All Argo tools are implemented as controllers and custom resources.
Alibaba Cloud Container Service was one of the early adopters of Argo Workflow in China. In production, the team solved a number of performance bottlenecks and contributed new features back to the community; team members also serve as Argo project maintainers.
Argo Project: Workflow for K8s
A directed acyclic graph (DAG) is a classic structure in graph theory that can model interdependent data-processing tasks, such as audio and video transcoding, machine-learning data flows, and big-data analysis.
Argo first became known in the community through its workflow engine. The Argo Workflow project is named simply Argo; it is the original project of the Argo organization. Argo Workflow focuses on Kubernetes-native workflow design: it provides a declarative workflow mechanism that integrates with a Kubernetes cluster via CRDs. Each task runs as a Pod, workflows can express dependency topologies such as DAGs, and multiple workflows can be composed using the WorkflowTemplate CRD.
A typical DAG structure is shown above. Argo Workflow makes it easy to build an interdependent workflow from a user-submitted orchestration template: it resolves the declared dependencies and runs the tasks in the order the user specified.
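To make the dependency handling concrete, here is a minimal Go sketch of the idea: each task waits for all of its dependencies to finish before it runs, so the tasks execute in DAG order. The task names and the `run` helper are illustrative stand-ins, not Argo's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// deps maps each task to the tasks it depends on; this mirrors the
// "dependencies" field of a DAG template, but the types are illustrative.
var deps = map[string][]string{
	"A": {},
	"B": {"A"},
	"C": {"A"},
	"D": {"B", "C"},
}

func run(name string, done map[string]chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	// Block until every dependency has signaled completion.
	for _, d := range deps[name] {
		<-done[d]
	}
	fmt.Println("running", name) // in Argo, this would launch a Pod
	close(done[name])            // signal our own completion to dependents
}

func main() {
	done := make(map[string]chan struct{}, len(deps))
	for name := range deps {
		done[name] = make(chan struct{})
	}
	var wg sync.WaitGroup
	for name := range deps {
		wg.Add(1)
		go run(name, done, &wg)
	}
	wg.Wait()
}
```

Here A always prints first and D always prints last; B and C may run in either order, exactly as a DAG permits.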
Argo CD is another recent project that is becoming well known. Argo CD targets GitOps processes, addressing the need for one-click deployment to Kubernetes via Git, with the ability to quickly track and roll back versions. Argo CD also provides multi-cluster deployment, solving the problem of deploying the same application across multiple clusters.
Argo Events provides declarative management of event dependencies and triggers Kubernetes resources from a variety of event sources. A common use of Argo Events is to trigger Argo Workflows and to produce events for long-running services deployed with Argo CD.
Argo Rollouts is a project created to support multiple deployment strategies. It implements a variety of progressive (gray) release methods and combines with Ingress, Service Mesh, and other mechanisms to handle traffic management and canary testing.
Argo subprojects can be used individually or in combination. In general, combining multiple subprojects brings out more of Argo's capabilities and enables more functionality.
Problems and solutions encountered in using Argo
Alibaba Cloud was among the first to put Argo Workflow into production. The first problem encountered was permission management. Argo Workflow executes each task in a Pod and uses a sidecar container to monitor the main task container. Originally, the sidecar monitored the main container by mounting docker.sock, which bypasses the Kubernetes APIServer RBAC mechanism and makes precise control of user permissions impossible. We worked with the community to develop the Kubernetes APIServer native executor: the sidecar watches the main container's state through the APIServer under its ServiceAccount, restoring Kubernetes RBAC support and tightening permissions.
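As an illustration of the executor idea, here is a minimal client-go sketch: the sidecar queries the Pod's status through the APIServer using its ServiceAccount credentials, so every call is subject to normal RBAC. The namespace, pod, and container names are illustrative assumptions, not Argo's actual code.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config: credentials come from the Pod's ServiceAccount,
	// so every call below goes through the APIServer's RBAC checks.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Read the Pod's own status to track the main container
	// ("my-namespace" and "my-workflow-pod" are illustrative names).
	pod, err := client.CoreV1().Pods("my-namespace").Get(
		context.TODO(), "my-workflow-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name == "main" && cs.State.Terminated != nil {
			fmt.Println("main container exited with code",
				cs.State.Terminated.ExitCode)
		}
	}
}
```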
At each step of the DAG resolution process, Argo Workflow scans the states of all Pods matching the workflow label to determine whether further action is required. Originally, each scan ran sequentially: in a cluster with many workflows, scanning was slow and workflow tasks waited a long time. We therefore developed a parallel scanning feature that runs all scans concurrently with goroutines, greatly improving workflow execution efficiency and cutting one task from 20 hours down to 4 hours. This feature has been contributed back to the community and was released in Argo Workflow v2.4.
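The idea behind the change can be sketched in a few lines of Go: instead of assessing workflows one after another, each scan runs in its own goroutine. `assessPods` and the workflow names are illustrative stand-ins, not the controller's real code.

```go
package main

import (
	"fmt"
	"sync"
)

// assessPods stands in for the per-workflow scan that matches Pods
// against the workflow label and decides the next action.
func assessPods(workflow string) string {
	return "assessed " + workflow
}

func main() {
	workflows := []string{"wf-1", "wf-2", "wf-3"}

	results := make([]string, len(workflows))
	var wg sync.WaitGroup
	// Launch one goroutine per workflow so the scans run in parallel
	// rather than sequentially.
	for i, wf := range workflows {
		wg.Add(1)
		go func(i int, wf string) {
			defer wg.Done()
			results[i] = assessPods(wf)
		}(i, wf)
	}
	wg.Wait()

	for _, r := range results {
		fmt.Println(r)
	}
}
```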
In real production, the more steps an Argo Workflow executes, the more space it occupies, because every execution step is recorded in the CRD's Status field. When a task exceeds 1,000 steps, the single object becomes too large to store in etcd, or its traffic overwhelms the APIServer. We worked with the community to develop status compression, which shrinks the Status string to about 1/20 of its original size, enabling large workflows of more than 5,000 steps.
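A minimal sketch of the compression idea, assuming a gzip-plus-base64 scheme: the status string is compressed before being written into the object and decompressed when the controller reads it back. The helper names are illustrative, not Argo's actual functions.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// compress gzips and base64-encodes a status string so the stored
// object stays small enough for etcd.
func compress(status string) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(status)); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decompress reverses compress when the object is read back.
func decompress(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Thousands of similar step entries are highly repetitive,
	// which is why the status compresses so well.
	status := strings.Repeat(`{"step":"align","phase":"Succeeded"}`, 1000)
	enc, _ := compress(status)
	fmt.Printf("original %d bytes, compressed %d bytes\n", len(status), len(enc))
	dec, _ := decompress(enc)
	fmt.Println("round-trip ok:", dec == status)
}
```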
Argo in practice: the gene data processing scenario
AGS (Alibaba Cloud Genomics Service) is mainly used for secondary analysis of genome sequencing data. Through the AGS accelerated API, the whole 30X WGS pipeline of alignment, sorting, duplicate marking, and variant calling completes in only 15 minutes, 120 times faster than the classical pipeline and still 2 to 4 times faster than the fastest FPGA/GPU solutions available today.
By analyzing the mutation mechanisms in an individual's genome sequence, AGS can provide strong support for genetic disease detection and tumor screening, and it will play a major role in clinical medicine and gene diagnosis. The human genome contains about 3 billion base pairs, and a 30X WGS run produces about 100 GB of data. AGS has great advantages in computing speed, accuracy, cost, ease of use, and integration with upstream sequencers. It is also suitable for SNP/INDEL and CNV structural-variation detection in DNA, as well as DNA/RNA virus detection.
The AGS workflow is implemented on Argo and provides a Kubernetes-native containerized workflow: each step in the workflow is defined as a container.
The workflow engine is implemented as a Kubernetes CRD (Custom Resource Definition), so workflows can be managed with kubectl and integrate natively with other Kubernetes services such as Volumes, Secrets, and RBAC. The workflow controller provides complete workflow functionality, including parameter substitution, artifact storage, loops, and recursive workflows.
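Because workflows are plain custom resources, any standard Kubernetes client can manage them. Below is a minimal client-go dynamic-client sketch that lists Workflow objects, the same objects `kubectl get workflows` shows. The kubeconfig handling and namespace are illustrative; the argoproj.io/v1alpha1 workflows GroupVersionResource matches Argo's public CRD.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the same way kubectl does.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// Argo workflows are ordinary custom resources under argoproj.io.
	gvr := schema.GroupVersionResource{
		Group: "argoproj.io", Version: "v1alpha1", Resource: "workflows",
	}
	list, err := client.Resource(gvr).Namespace("default").List(
		context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, wf := range list.Items {
		fmt.Println(wf.GetName()) // same objects `kubectl get workflows` lists
	}
}
```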
In the genomics scenario, Alibaba Cloud uses Argo Workflow to run data processing and analysis on Kubernetes clusters. It supports large workflows of more than 5,000 steps and can be up to 100 times faster than traditional data-processing methods; the customized workflow engine greatly improves the efficiency of genomic data processing.
About the authors
Chen Xianlu, Alibaba Cloud technical expert, has been a Docker and Kubernetes contributor for many years. He is a Kubernetes community member and the author of "Write Docker by Yourself". He focuses on container orchestration and infrastructure research, loves digging into source code, embraces open source culture, and actively participates in community open source projects.
Jun Bao, Kubernetes project contributor and member of the Kubernetes and Kubernetes-SIGs communities, has years of hands-on experience with containers and Kubernetes. He currently works on the Container Service team at Alibaba Cloud, focusing on container storage, container orchestration, and AGS products.
AGS Trial link: help.aliyun.com/document_de…