Traditional banks run a large number of vendor-supplied systems, and our business units rely heavily on those vendors for ongoing custom development and fixes.
Discussions of continuous delivery or automated deployment usually assume in-house development built on current architectural thinking, or rapid application deployment using Ansible, blue-green deployment, containers, and similar methods.
But most vendor systems have antiquated designs, and changes to them rely heavily on database updates (much of the logic lives in stored procedures). Achieving continuous delivery with such a system requires a different approach.
The vendor system we manage is a single application serving 10 countries and regions around the world, including Asia and Europe. It is built on .NET and an Oracle database and runs on Windows Server and IIS. More than 20 changes need to be released to the production environment each month.
We had always released on a monthly cadence. Each release was large in scope and high in risk, required business departments around the world to work overtime to support testing, and took a long time to execute. Because the maintenance window (downtime when the business is not using the system) is limited to weekends and 6 a.m. to 8 a.m. on weekdays, the only way to release more often was more overtime, which is obviously not sustainable.
1. Goals
To steer departments toward DevOps, the company set a clear goal: double the number of releases and halve the number of failures. While we don’t fixate on the numbers, the goal gives every department both direction and pressure. For us, it translated into releasing during the workday without having to work overtime.
2. Important discoveries
We did an in-depth analysis of the system and the release process and came to the following important conclusions:
Most releases are data patches that affect only a single country or region, so they are small in scope and low in risk. Releasing these data patches on business days greatly accelerates the delivery of business value, raises business unit satisfaction, and solves our “80% problem” (the Pareto principle). The few remaining patches with global impact are still released monthly, over the weekend, but with most data patches removed, their scope and risk are greatly reduced.
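The split described above can be expressed as a simple routing rule. The following is an illustrative Python sketch, not our actual tooling: the `Patch` fields and the track names `"weekday"` and `"monthly"` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch descriptor; the field names are illustrative."""
    name: str
    regions: frozenset  # countries/regions the patch affects
    data_only: bool     # True for a pure data patch


def release_track(patch: Patch) -> str:
    """Route single-region data patches to weekday releases.

    Everything else stays in the monthly weekend release.
    """
    if patch.data_only and len(patch.regions) == 1:
        return "weekday"
    return "monthly"
```

For instance, `release_track(Patch("fix-hk", frozenset({"HK"}), True))` returns `"weekday"`, while a patch touching several regions returns `"monthly"`.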
Also, because data patches affect only a single country or region, we no longer have to stick to the global maintenance window of 6 a.m. to 8 a.m. Business hours differ by country and region, so we can set the release window for each data patch according to local business hours, for example 11:00 PM to 8:00 AM the next day in Hong Kong and 2:00 AM to 4:00 PM in Europe. This gives us much greater flexibility.
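A per-region window such as Hong Kong's wraps past midnight, which is easy to get wrong in a scheduling check. Below is a minimal Python sketch of such a check, assuming times are already expressed in the region's local time; the function name `in_window` and the constant `HK_WINDOW` are made up for illustration.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` (regional local time) lies in [start, end).

    Windows such as 23:00-08:00 wrap past midnight, so the check is
    split into two cases.
    """
    if start <= end:
        return start <= now < end
    # Wrapping window: late evening OR early morning
    return now >= start or now < end


# Hong Kong example from the text: 11:00 PM to 8:00 AM the next day
HK_WINDOW = (time(23, 0), time(8, 0))
```

With these definitions, `in_window(time(2, 0), *HK_WINDOW)` is true and `in_window(time(12, 0), *HK_WINDOW)` is false.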
3. Implementation plan
In the past, after the vendor provided a patch, our IT team had to deploy it manually to the test environment and then notify the business to test. The process was laborious and of little value. Since the vendor’s on-site staff can access our network and GitHub, we decided to automate this process and drew up the following pipeline design:
3.1 Test environment
Goal: The IT team doesn’t need to get involved
- The vendor’s on-site staff push the patch to the test branch on GitHub, which triggers the corresponding Jenkins job.
- The Jenkins job triggers the company’s own automated database deployment tool to deploy the scripts to the appropriate test environment.
- We require vendors to supply validation scripts with each patch; these are executed at deployment time.
- If validation passes, we notify the business to start user acceptance testing (UAT).
- If validation fails, the validation script throws an exception and the deployment tool notifies us to roll back.
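The control flow of the steps above can be sketched as follows. This is a simplified Python illustration, not the company's deployment tool: the three callables are hypothetical stand-ins for the deployment step, the vendor's validation script, and the notification channel.

```python
from typing import Callable


def deploy_to_test(
    deploy: Callable[[], None],
    validate: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Deploy a patch, run the vendor's validation, and notify the right party.

    All three callables are hypothetical stand-ins: `deploy` for the
    database deployment tool, `validate` for the vendor's validation
    script (expected to raise on failure), and `notify` for whatever
    messaging channel is used.
    """
    deploy()
    try:
        validate()
    except Exception as exc:
        # Validation failed: ask for a rollback instead of starting UAT
        notify(f"Validation failed, rollback needed: {exc}")
        return False
    notify("Validation passed, UAT can start")
    return True
```

The sketch deliberately leaves errors raised by `deploy` itself unhandled, so that a broken deployment surfaces as a failed job rather than a validation result.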
3.2 Production environment
Goal: Release automatically within the maintenance window
- Once the patch passes UAT, we merge it into the master branch on GitHub and start the release approval process required by company regulations.
- We set the scheduled execution time of the corresponding Jenkins job to the next business-day maintenance window of the country or region covered by the patch.
- The Jenkins job triggers the automated database deployment tool to deploy the script to production.
- If the validation script passes, the release is successful.
- If validation fails, we are notified to log in and roll back.
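Picking the scheduled execution time in the second step can be sketched as below. This is an assumption-laden Python illustration: it treats Saturday and Sunday as the only non-business days, ignores public holidays, and works with naive datetimes in the target region's local time. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_business_day_window(approved_at: datetime, window_start: time) -> datetime:
    """Return the start of the maintenance window on the next business day.

    `approved_at` and the result are naive datetimes in the target
    region's local time. Only Saturday and Sunday are skipped; public
    holidays are ignored in this sketch.
    """
    day = approved_at.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, window_start)
```

A release approved on a Friday afternoon would therefore be scheduled for the window that opens the following Monday. The resulting time would then be used to set the job's schedule.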
4. Results
With this approach, we achieved the following:
- Most data patches no longer wait for the monthly release to go live. This enables continuous delivery, greatly accelerates the delivery of business value, and improves business satisfaction.
- IT is freed from the laborious, low-value task of deployment to do more valuable work, such as business requirements or failure analysis.
- A patch release is no longer a high-risk event, and neither the business nor IT dreads each release.
- Overtime hours are significantly reduced.
- The number of releases per month rose from one or two to more than a dozen. Because each release is small in scope and low in risk, release-related failures fell as well, which met the company’s DevOps goals.
5. Implementation difficulties
The process was not always smooth sailing. We ran into many problems along the way, and this section shares how we handled them, for reference.
5.1 Ensuring progress
With daily delivery and maintenance taking up almost all of our working time, important but not urgent tasks like the one described in this article often get neglected. To avoid this, we used daily stand-up meetings to discuss implementation details and track progress.
We split the stand-ups into two kinds: Monday, Wednesday, and Friday cover day-to-day work; Tuesday and Thursday cover the implementation of this pipeline.
This kept us making progress, and we achieved our goal in a little over a month.
5.2 Automatic acceptance
Because a production release may run automatically outside working hours, we need an automated acceptance step to confirm that the system works properly after the release. If acceptance fails, the release fails and we are notified to perform a rollback.
This requires vendors to provide acceptance scripts along with the patch scripts as part of the deployment, so that the deployment can automatically test whether the patch produced the expected results.
For example:
DECLARE
  -- Actual vs. expected number of rows affected by the patch
  NumberOfPatch_Actual NUMBER;
  NumberOfPatch_Expect INT := 2;
BEGIN
  SELECT COUNT(*) INTO NumberOfPatch_Actual FROM USERS a WHERE a.user_id = 'USERA';
  IF NumberOfPatch_Actual <> NumberOfPatch_Expect THEN
    -- Raising an error marks the deployment as failed
    RAISE_APPLICATION_ERROR(-20001, 'Error, This is testing for the log capturing');
  END IF;
END;
/
This is effectively test-driven development (TDD) in disguise, and it requires the vendor’s cooperation.
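When an acceptance script like the one above fails, Oracle reports the error as `ORA-20001: ...` in the execution output. One way a deployment tool could detect this is by scanning the captured log, sketched here in Python. How our own tool captures and inspects the log is not covered in this article, so the function name and the log-scanning approach are assumptions for illustration.

```python
import re

# Oracle reports errors as "ORA-" followed by a five-digit code;
# RAISE_APPLICATION_ERROR uses the range -20000 to -20999.
ORA_ERROR = re.compile(r"ORA-(\d{5}): (.*)")


def find_acceptance_errors(log_text: str) -> list:
    """Return (code, message) pairs for user-defined ORA-20xxx errors in a log."""
    errors = []
    for code, message in ORA_ERROR.findall(log_text):
        if 20000 <= int(code) <= 20999:
            errors.append((int(code), message.strip()))
    return errors
```

An empty result means no user-defined error was raised. Any entry indicates that an acceptance check failed and the release should be treated as failed.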
5.3 Automatic rollback
With automatic acceptance in place, we also considered automatic rollback, which would make the whole process fully automated. However, because patch execution involves several steps and it is difficult to predict every failure scenario, we ultimately decided that rolling back manually is safer.
5.4 Notification of failure
When the acceptance script does not pass, the release has failed. Getting notified promptly outside working hours, so that we can log in immediately and roll back, is key to keeping the system running normally.
At present, however, our pipeline can only send email notifications and has no way to send instant notifications to mobile phones, so we still need to find a more effective channel.
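For the email channel we have today, a failure notification could be composed as in the following Python sketch, which uses the standard library's `email.message.EmailMessage`. The subject format, parameter names, and addresses are placeholders, not our actual configuration.

```python
from email.message import EmailMessage


def build_failure_email(patch_name: str, region: str, error: str,
                        sender: str, recipients: list) -> EmailMessage:
    """Compose a release-failure email; all addresses and fields are placeholders."""
    msg = EmailMessage()
    # A distinctive subject prefix makes it easier to filter or forward these mails
    msg["Subject"] = f"[RELEASE FAILED] {patch_name} ({region}) - rollback required"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Acceptance script failed for patch {patch_name} in {region}.\n"
        f"Error: {error}\n"
        "Please log in and perform the rollback."
    )
    return msg
```

The message could then be handed to `smtplib.SMTP.send_message`. A fixed subject prefix also leaves room for mail rules that forward such messages to a faster channel later.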
6. Summary
Most of what is commonly described as continuous delivery and automated deployment targets applications developed in-house.
For vendor systems with dated designs, or legacy systems, where most changes are made in the database, continuous delivery may require a different approach.
Understand your system and release process thoroughly before considering a solution. Use daily stand-up meetings to keep discussing implementation details and tracking progress so that goals are achieved.
Author: Liu Hua, IDCF Community Sharing Guest
The author of Operation Cheetah: A Journey of Agility in the Smoke