The little ant says:
Ant Financial has accumulated rich experience and engineering best practices in the continuous delivery of financial-grade Internet products. At the 2018 ATEC technology exploration conference, Lv Zhongbang (rev), a solution architect at Ant Financial, drew on the industry background to analyze the core challenges of continuous delivery for financial-grade Internet products. He then shared the engineering practices Ant uses to meet those challenges along two dimensions, “delivering value faster and earlier” and “holding the technical-risk bottom line to guarantee delivery quality”, and closed with the practical experience of Ant's R&D efficiency platform in supporting continuous delivery. Follow along with Ant to learn more.
I. Industry background and main challenges
In the context of digital transformation, enterprises need to build a range of core capabilities, which in turn requires them to upgrade to or adopt a new generation of technical architecture. A very important part of this is continuous delivery on cloud-based infrastructure and distributed architecture. Continuous delivery brings specific challenges to mind: how to shorten the development and launch time of new business products and respond quickly to customers; how to handle the complex business scenarios and high concurrency that come with a distributed microservices architecture; how to use automation to reduce the manual effort in the R&D process; and so on.
In addition, we need to look closely at the characteristics of our own industry. Financial Internet products have two key words at their core. The first is “finance”. The most important financial attributes are fund safety, security, and high availability, which can be summed up in one word: “stable”. The second is “Internet”, whose most prominent feature is delivering value quickly and supporting rapid business innovation. We boil this down to another word: “fast”. Being both fast and stable is the basic characteristic of the financial Internet industry. The two seem contradictory, yet neither can be given up.
Speaking of “stable” and “fast”, how is Ant Financial doing? Here is some actual data from the last fiscal year: online service availability was 100%; there were more than 150 application releases per day; the average iteration took 5.8 days; the test automation rate exceeded 80%; and the operations automation rate exceeded 98%.
Given the background of digital transformation and the basic characteristics of the industry, we believe the core challenge for financial Internet products in continuous delivery is how to balance fast and stable: how to deliver value quickly and with agility while innovating steadily, holding the technical-risk bottom line, and continuing to meet regulatory requirements.
II. Agile delivery: how to deliver value faster and earlier
This chapter has four parts. The first two, lean R&D process customization and diversified branching and release strategies, address how our systems and processes adapt to different business scenarios. Choosing the right path and approach is the basic premise of R&D delivery and efficiency improvement.
The last two, servitization of functional teams and efficient joint debugging and problem diagnosis, describe how technology and automation free people from manual work and improve efficiency.
1. Lean R&D process customization
When it comes to process customization, many people ask: how do we customize the R&D process? One practice that works well at Ant is grading applications. Application grading considers three main factors: the volume of dependent service calls, the daily transaction amount, and the daily PV and UV. Based on these three factors, we define twelve application levels, from A1 to C4, and then set baseline R&D rules for each level. On top of the baseline rules, each business can also customize additional risk control measures.
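The relationship between levels, baseline rules, and business-specific additions can be sketched as follows. This is a minimal illustration, not Ant's actual rule set: the rule names, the per-tier baselines, and the assumption that tier A carries the strictest rules are all hypothetical. Only the twelve level names come from the text.

```python
from dataclasses import dataclass, field

# Twelve levels, A1 through C4, as named in the text.
LEVELS = [f"{tier}{n}" for tier in "ABC" for n in range(1, 5)]

# Hypothetical baseline rules per tier (assumption: A is the strictest).
BASELINE_BY_TIER = {
    "A": ["code_review", "ci_tests", "security_scan", "gray_release"],
    "B": ["code_review", "ci_tests", "security_scan"],
    "C": ["ci_tests"],
}


@dataclass
class Application:
    name: str
    level: str  # one of LEVELS
    extra_rules: list = field(default_factory=list)  # business-defined additions


def effective_rules(app: Application) -> list:
    """Return the baseline rules for the app's level plus any business extras."""
    if app.level not in LEVELS:
        raise ValueError(f"unknown level: {app.level}")
    rules = list(BASELINE_BY_TIER[app.level[0]])
    for rule in app.extra_rules:
        if rule not in rules:  # extras only add to the baseline, never remove
            rules.append(rule)
    return rules
```

For example, `effective_rules(Application("payments", "A1", ["capital_loss_check"]))` would return the tier A baseline followed by the business's own `capital_loss_check` rule.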
Here are two examples. The first is configuring the process on demand. Ant Financial's business is complex and diverse. Some core business systems have high stability requirements, so their technical risk controls and their test and verification steps are correspondingly thorough. By contrast, some new businesses and internal service systems want to go live faster and earlier, so their processes are relatively lightweight and agile. The second is pipelines that can be orchestrated and extended. The component center of the efficiency platform defines many quality inspection components, including third-party components and components built by the businesses themselves. Using the platform's orchestration capability, pipeline templates can be customized for different businesses. Some applications enforce code review, while others require CI automated testing or specific tests before a change can move forward; scenarios like these can all be implemented through pipelines.
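An orchestrated pipeline of this kind can be sketched as an ordered list of named components, each of which passes or fails a change. The component names, the template names, and the fields of the `change` dictionary below are hypothetical and chosen only for illustration; the platform's real components and interfaces are not described in the text.

```python
from typing import Callable, Dict, List, Tuple

# A component inspects a change and reports whether it passes.
Component = Callable[[dict], bool]

# Hypothetical component registry standing in for the "component center".
COMPONENTS: Dict[str, Component] = {
    "code_review": lambda change: change.get("reviewed", False),
    "ci_tests": lambda change: change.get("tests_passed", False),
    "security_scan": lambda change: not change.get("vulnerabilities"),
}

# Hypothetical templates: a strict one for core systems, a light one for others.
TEMPLATES: Dict[str, List[str]] = {
    "core": ["code_review", "ci_tests", "security_scan"],
    "lightweight": ["ci_tests"],
}


def run_pipeline(template: str, change: dict) -> Tuple[bool, List[str]]:
    """Run the template's components in order and stop at the first failure.

    Returns (passed, names of the components that were executed).
    """
    executed: List[str] = []
    for name in TEMPLATES[template]:
        executed.append(name)
        if not COMPONENTS[name](change):
            return False, executed
    return True, executed
```

With this shape, adding a new gate for one business means registering a component and listing it in that business's template, without touching the other templates.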
2. Diversified branching and release strategies
For branching and release strategies, Ant has four main modes.
The first is the daily release. We compare it to a regularly scheduled train. It suits scenarios where the site-wide core business systems and applications are strongly interrelated.
The second is the independent release. We compare it to a car, which can set off whenever it wants. It suits scenarios where the business domain is independent but the applications still have some coupling and correlation.
The third is the single-application release, which we compare to a motorcycle. It suits scenarios where the business is even more independent and is completely decoupled from other businesses at the architecture level. These first three modes typically follow the “branch development, trunk release” model.
The last is the emergency release, which we compare to an ambulance. It suits urgent business needs or fixes for online faults, and usually follows the “branch development, branch release” model.
These four modes cover essentially all of Ant's business scenarios, and each business can pick the mode that matches its needs.
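The four modes and their branch models can be summarized in a small lookup table. The type and key names below are hypothetical; the modes, analogies, and branch models are the ones described above.

```python
from dataclasses import dataclass
from enum import Enum


class BranchModel(Enum):
    BRANCH_DEV_TRUNK_RELEASE = "branch development, trunk release"
    BRANCH_DEV_BRANCH_RELEASE = "branch development, branch release"


@dataclass(frozen=True)
class ReleaseMode:
    name: str
    analogy: str
    branch_model: BranchModel


# Hypothetical keys; the content mirrors the four modes in the text.
RELEASE_MODES = {
    "daily": ReleaseMode("daily release", "train", BranchModel.BRANCH_DEV_TRUNK_RELEASE),
    "independent": ReleaseMode("independent release", "car", BranchModel.BRANCH_DEV_TRUNK_RELEASE),
    "single_app": ReleaseMode("single-application release", "motorcycle", BranchModel.BRANCH_DEV_TRUNK_RELEASE),
    "emergency": ReleaseMode("emergency release", "ambulance", BranchModel.BRANCH_DEV_BRANCH_RELEASE),
}
```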
3. Servitization of functional teams
Having covered the approach and path of agile delivery, let's move on to automation.
An R&D iteration usually involves many functional departments. The traditional approach is manual risk control based on experience-driven reviews: for example, once developers finish coding and self-testing, the security, risk control, and other functional teams each review the change based on experience. These “department walls” can seriously slow collaboration and delivery. At Ant, the functional teams work in a completely different way. They no longer take part directly in project iterations or carry the repetitive, mechanical work of testing and reviewing each iteration. Instead, they have shifted to exporting their capabilities and building automation tools, turning each function into a service that business development and test teams can use on their own. This model greatly improves the efficiency of R&D collaboration.
4. Efficient joint debugging and problem diagnosis
The business scenarios of financial Internet products are very complex, and building a project environment is very time-consuming. For example, if a transaction link involves 20 applications, the common practice is to deploy every one of them during the development iteration to form a joint debugging environment. Ant does things differently. First, we set up a shared STABLE environment, so during a development iteration only the applications that have been changed need to be deployed. Then SOFARouter routes calls across all 20 applications to form a joint debugging environment. This greatly improves efficiency and also makes the most of testing resources. In addition, whenever code is released online, the platform automatically updates the STABLE environment to keep it current.
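The routing idea can be sketched as a simple resolution rule: for each application on the link, use the instance deployed in the project environment if there is one, and fall back to the shared STABLE environment otherwise. This is only an illustration of the principle, not the router's actual API; the function name, the dictionaries, and the host names are hypothetical.

```python
from typing import Dict, List


def resolve_targets(
    link_apps: List[str],
    project_env: Dict[str, str],
    stable_env: Dict[str, str],
) -> Dict[str, str]:
    """Map each application on the link to the address that should serve it.

    Applications changed in this iteration are deployed in project_env;
    everything else is served by the shared STABLE environment.
    """
    targets: Dict[str, str] = {}
    for app in link_apps:
        if app in project_env:
            targets[app] = project_env[app]
        elif app in stable_env:
            targets[app] = stable_env[app]
        else:
            raise LookupError(f"{app} is deployed in neither environment")
    return targets


# Hypothetical 20-application link in which only app3 was changed.
apps = [f"app{i}" for i in range(1, 21)]
stable = {app: f"{app}.stable.example" for app in apps}
project = {"app3": "app3.project.example"}
joint_env = resolve_targets(apps, project, stable)
```

In this example a single deployment is enough to obtain a complete 20-application environment, because the other 19 applications resolve to STABLE.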
When problems are found during joint debugging, locating and diagnosing them across such a complex link is also very important. Developers can query the link diagram and sequence diagram by TraceID or transaction number to get an intuitive and complete view of the calls between applications. Combined with service logs, this makes it easy to find the faulty application and locate the root cause.
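A trace-based lookup of this kind can be sketched as follows. The `Span` record and its fields are hypothetical, and the failure heuristic (pick the failed call whose callee made no failed downstream call of its own) is an assumption made for illustration rather than the platform's actual diagnosis logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    caller: str
    callee: str
    start_ms: int
    ok: bool


def call_sequence(spans: List[Span], trace_id: str) -> List[Span]:
    """Return the calls of one trace in time order (the sequence-diagram view)."""
    return sorted((s for s in spans if s.trace_id == trace_id), key=lambda s: s.start_ms)


def deepest_failure(spans: List[Span], trace_id: str) -> Optional[Span]:
    """Return the failed call whose callee has no failed downstream call.

    The callee of that span is the first place to look in the service logs.
    """
    failed = [s for s in call_sequence(spans, trace_id) if not s.ok]
    failing_callers = {s.caller for s in failed}
    for span in failed:
        if span.callee not in failing_callers:
            return span
    return None


# Hypothetical trace: the gateway call fails because the account call failed.
spans = [
    Span("t1", "gateway", "trade", 0, False),
    Span("t1", "trade", "risk", 5, True),
    Span("t1", "trade", "account", 9, False),
    Span("t2", "gateway", "trade", 3, True),
]
suspect = deepest_failure(spans, "t1")
```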
III. Steady innovation: hold the risk bottom line to guarantee delivery quality
This chapter also has four parts. Technical risk assessment, built-in quality, and test and verification follow the order of before, during, and after development. Finally, we share how we hold the security bottom line and ensure information security.
1. Data-empowered technical risk assessment
Technical risk assessment before development is without doubt very important. At Ant, technical risk assessment has two main inputs. The first is the requirements. The second, which matters more to us, is data from governance analysis. Developers can easily obtain data such as application dependencies, service invocations, message inspection, component control, and code search results, and use it to assess the technical risks of a change comprehensively and accurately. Based on this data and analysis, it is easy to decide on a risk response strategy and to close the feedback loop effectively.
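As one example of how dependency data supports such an assessment, the sketch below computes which applications could be affected by a change, by walking the dependency graph in reverse from the changed application. The function name, the input format, and the sample applications are hypothetical; the text does not describe how the platform stores or queries this data.

```python
from collections import deque
from typing import Dict, List, Set


def affected_apps(depends_on: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every application that directly or transitively depends on `changed`."""
    # Invert the graph: for each application, who calls it?
    callers: Dict[str, Set[str]] = {}
    for app, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(app)

    # Breadth-first walk up the caller chain.
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen and caller != changed:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical dependency data.
depends_on = {
    "gateway": ["trade"],
    "trade": ["account", "risk"],
    "report": ["account"],
    "risk": [],
}
impact = affected_apps(depends_on, "account")
```

Here a change to `account` touches `trade` and `report` directly and `gateway` transitively, which suggests a wider test scope than a change to `gateway` itself.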
2. Built-in quality with real-time closed-loop feedback
During development, built-in quality and real-time closed-loop feedback help developers get things right the first time. Internally, we encourage GitFlow-based best practices: code is submitted to the project branch or trunk through a MergeRequest rather than a direct Push, so that code admission checks and CI inspection have a chance to run. All nodes and components in the pipeline can be orchestrated and extended. After code is submitted, the platform reports the result in real time as each component finishes and automatically updates the quality data of the iteration, helping developers and testers keep quality risks under control.
3. Full-environment verification and layered business verification
After development comes testing and verification, and we share two points here. The first is full-environment verification: a change moves from development to integration, pre-release, and gray release, with each environment simulating production more closely than the last, to ensure that nothing goes wrong at production release. The second is layered business verification: each environment has its own corresponding test methods. For example, many companies run stress tests, but mostly in offline environments. Ant runs stress tests directly in the production environment, which gives real confidence in the system's high availability.
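The step-by-step progression through environments can be sketched as an ordered gate check. The stage identifiers and the shape of the `results` dictionary are hypothetical; only the order of the environments comes from the text.

```python
from typing import Dict, Optional

# Environments in the order a change passes through them before production.
STAGES = ["development", "integration", "pre-release", "gray"]


def blocking_stage(results: Dict[str, bool]) -> Optional[str]:
    """Return the first stage that has not passed verification, or None if all passed."""
    for stage in STAGES:
        if not results.get(stage, False):
            return stage
    return None


def ready_for_production(results: Dict[str, bool]) -> bool:
    """A change may be released only after every earlier environment has verified it."""
    return blocking_stage(results) is None
```

A stage with no recorded result counts as not passed, so a change cannot skip an environment.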
4. Information security guarantee
As the final part of safe delivery, Ant has a complete system for ensuring information security. In the requirements and design review phase, architects assess business risks. In the development phase, the security of the SOFA framework itself is guaranteed first; in addition, every code submission goes through an automatic security scan, and there is also dedicated security testing. Finally, after the system goes live, there are monitoring and emergency response mechanisms specifically for security.
IV. AntLinkE: support from the R&D efficiency platform
Having covered both agile delivery and steady innovation, we now share how the tool platform supports the overall continuous delivery process.
1. Platform Introduction
The work of Ant's R&D efficiency platform comes down to three areas:
- DevOps: one-stop development and integration for continuous delivery
- DevMind: real-time, multi-dimensional data analysis that empowers the R&D process
- DevServices: efficient technical support and consulting services for developers
2. Product overview
The following is the big picture of our products. At the top are the businesses the platform supports and the value it delivers, and at the bottom are the main capabilities of DevOps, DevMind, and DevServices. Today we focus on the product layer in the middle.
The first is continuous delivery, which supports the entire R&D life cycle from creating an iteration to completing the release. The supporting sub-products are as follows. R&D collaboration manages projects, iterations, and requirements. Code services provide code hosting, code search, CR, and other capabilities. The IDE is a distinctive Ant product that helps developers scan code at the earliest moment; it is also integrated with the Web side of the efficiency platform, so developers do not need to keep switching workbenches while developing. Test services support test management and automated testing. Fault diagnosis relies on distributed tracing and service logs to locate and resolve problems quickly.
At the bottom is R&D insight. During R&D, both the IDE side and the Web side of the efficiency platform accumulate a large amount of data, which is a very valuable asset. By collecting, aggregating, and analyzing this data, we drive the continuous optimization and upgrading of the whole R&D system and organization.
3. One-stop continuous delivery solution
The one-stop continuous delivery solution is shown in figure 1. In the middle is the R&D efficiency platform, which provides a continuous delivery engine from requirements to release and links all the relevant capabilities and tools together. At the bottom is the application PaaS platform; the efficiency platform interacts with it through OpenAPI to connect environment management and environment deployment. On the right is the distributed middleware. The efficiency platform manages the middleware configuration of multiple environments uniformly through the R&D container, which provides isolation between environments as well as automatic conversion and synchronization between them. The efficiency platform is also integrated with the technical risk prevention and control platform, so that risk control measures are enforced within the R&D process. Finally, Ant's R&D efficiency platform is open to integration by design: it can connect to an enterprise's own tool platforms through components, making the most of existing assets.
V. Conclusion
Both internally and externally, Ant's R&D efficiency platform has always held, and will continue to hold, to its original goal: to improve the happiness of developers and the innovation efficiency of enterprises. Let us redefine R&D together!