What is Database Migration
Database migration uses code to define changes to a Database. DB migration codifies and automates changes to the database, as opposed to writing SQL manually, and makes it easy to synchronize those changes across different environments.
The problems we have
In the early days of our product deployment to EC2, human SSH performed migration on the machine, which was not ideal but acceptable at the time.
Later, as the team grew and the deployment process became automated, and CI deployment was applied to the Kubernetes cluster, continuing to rely on manual migration brought us several problems:
- Need to communicate and easy to miss: our product will cut a release to Producton every week. However, not every release contains database changes, so the development team needs to inform the operations team that a release will handle migration, but we have encountered several instances where migration was missed due to communication failures.
- Migration may be terminated: In the case of insufficient Kubernetes node resources, pods may be reschedule to other nodes and the original pods terminated. If it’s in use at the moment
kubectl exec
When you run migration, it is very likely that the process will be interrupted and an unexpected “intermediate state” will occur. - The database needs to be backed up: To ensure data security, we want to fully back up the database before each migration, but relying on manual processing is not only a waste of valuable engineer time, but also prone to human errors.
In summary, we decided to automate the process of performing migration.
Automate execution using jobs
To automate the execution of migration, we added a Job resource in the Helm Chart, creating a new Job with each deployment.
Different from ordinary jobs, a migration Job must ensure that:
- Pod has a high priority and does not interrupt as much as possible.
- Automatic retry is prohibited. If the retry fails, manual intervention is required to ensure data security.
- Naming is unique and a new Job should be created for each Helm deployment.
- Each deployment generates a new Job. Therefore, you must delete the Job after a period of time.
- The database root password is required.
Based on these requirements, we write a Job that looks something like this:
{{- $fullName : = printf "migration-%s" (include "chart.fullname" .). -}}
---
apiVersion: batch/v1
kind: Job
metadata:
name: {{ $fullName }}. {{ .Release.Revision }}
labels:
# slightly
spec:
ttlSecondsAfterFinished: 172800 # 2 days
backoffLimit: 0
template:
metadata:
# slightly
spec:
priorityClassName: high-priority-class
terminationGracePeriodSeconds: 86400 # 24 hours
restartPolicy: Never
containers:
- name: php
# slightly
env:
- name: DB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "chart.fullname" . }}-db-root-password
key: dbRootPassword
args:
- php
- artisan
- migrate
- --force
Copy the code
You can see:
- In PodSpec, we have configured the super-long
terminationGracePeriodSeconds
, allowing enough time for migration to finish, even if terminate is signaled. More importantly, we give this Pod a very high priority (priorityClassName
When Kubernetes starts to expel pods due to resource problems, it can ensure the normal operation of high-priority PODS as much as possible. - The Job of
backoffLimit: 0
Pod,restartPolicy: Never
To ensure that the Job does not retry and the Pod does not restart. - The name of the Job is tied to revision of the Helm Release, so it is guaranteed to be unique on each deployment.
- The Job of
ttlSecondsAfterFinished
为172800
Seconds: The Job is automatically deleted 48 hours after the Job is successfully executed. - In the Container
env
The inside is injected with the nameDB_ROOT_PASSWORD
The value of the environment variable is referenced from a separate secret rather than written in plain text to ensure security. Regarding the storage and management of confidential Information, we have previously introduced relevant schemes. Please refer to our previous work”Use SOPS to manage Secret”.
In addition, if you are also using Laravel, migration with APP_ENV as production will have an interactive prompt like this:
**************************************
* Application In Production! *
**************************************
Do you really wish to run this command? (yes/no) [no]:
>
Copy the code
There is no TTY in Pod, so don’t forget to add the –force parameter to the PHP artisan Migrate command to force it to run directly in production.
User-defined Migrate command
Since Laravel’s built-in PHP Artisan Migrate command was not fully available, we built a layer of encapsulation on it and customized additional features.
Automatically enters the Maintenance state
To ensure data consistency, there should be no business requests to update data during migration. Therefore, before starting Migrate, the application needs to be in maintenance; After the Migration completes successfully, the service should be restored. So we called PHP Artisan Down and PHP Artisan Up before calling PHP Artisan Migrate. We also specify the –message option to identify maintenance reasons:
Artisan::call('down'['--message'= >'running migration'].$this->output);
// ...
Artisan::call('up'[],$this->output);
Copy the code
Automatic Data Backup
The database we use is AWS RDS. To prevent any unexpected problems with the automated migration, we created a Snapshot for THE RDS by calling the CreateDBSnapshot API through the AWS PHP SDK in the Migrate: Privileged command. If the migration fails, we can use Snapshot to roll back the data to the pre-migration state.
Do not execute multiple Migration tasks simultaneously
Although the probability is extremely small, consider the following two situations:
- The two CI pipelines were very close in time, the second migration job was created and started at the same time before the migrations of the former were completed.
- If a problem occurs, you need to manually perform migration while the other Migration job is still running.
To avoid these two possible data anomalies, we also add a mutex to the command so that only one instance can be running at a time.
Slack notice
Finally, we added Slack notifications to the entire process so that engineers could know the status of migration execution in real time:
By listening for the \Illuminate\Console\Events\CommandFinished event, the message is notified to Slack after PHP Artisan Down and up are called.
final class MaintenanceNotification
{
public function handle(CommandFinished $event): void
{
if(! in_array($event->command, ['up'.'down'].true)) {
return;
}
if ($event->command === 'down') {
$message = $event->input->getOption('message') = = =null ? 'maintenance' : $event->input->getOption('message');
}
if (app()->isDownForMaintenance()) {
$color = 'warning';
$text = "API is now DOWN for $message.";
} else {
$color = 'good';
$text = "API is now UP.";
}
// Send to Slack webhook...}}Copy the code
Welcome to follow our wechat official account “RightCapital”