What is database migration?

Database migration uses code to define changes to a database. Instead of writing SQL by hand, migrations codify and automate database changes, which makes it easy to keep those changes synchronized across different environments.
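As an illustration (this is not the tooling we use; Laravel ships its own migrator), a migration system boils down to versioned change scripts plus a record of which ones have already been applied. A minimal sketch in Python against SQLite:

```python
import sqlite3

# Each migration is a version identifier plus the SQL that applies it.
MIGRATIONS = [
    ("0001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("0002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply pending migrations in order; return the versions applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, sql in MIGRATIONS:
        if version in done:
            continue  # already applied in this environment
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        applied.append(version)
    conn.commit()
    return applied
```

Because the applied versions are recorded, running migrate a second time applies nothing, which is what keeps different environments in sync.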

The problems we have

In the early days, when our product was deployed to EC2, an engineer would SSH into the machine and run migrations by hand, which was not ideal but acceptable at the time.

Later, as the team grew, the deployment process was automated, and CI began deploying to a Kubernetes cluster, continuing to rely on manual migration caused several problems:

  • Requires communication and is easy to miss: our product cuts a release to production every week. Not every release contains database changes, so the development team has to tell the operations team when a release requires a migration, and we have had several instances where a migration was missed because of a communication failure.
  • Migration may be terminated: when Kubernetes node resources are insufficient, pods may be rescheduled to other nodes and the original pods terminated. If a migration is being run through kubectl exec at that moment, the process is very likely to be interrupted, leaving the database in an unexpected "intermediate state".
  • The database needs to be backed up: to keep data safe, we want a full backup of the database before each migration, but doing this manually not only wastes valuable engineering time but is also prone to human error.

In summary, we decided to automate the execution of migrations.

Automate execution using jobs

To automate migrations, we added a Job resource to the Helm chart, creating a new Job with each deployment.

Unlike an ordinary Job, a migration Job must ensure that:

  1. The Pod has a high priority and is interrupted as rarely as possible.
  2. Automatic retries are prohibited. If the migration fails, manual intervention is required to ensure data safety.
  3. The name is unique, so that a new Job can be created for each Helm deployment.
  4. Since each deployment generates a new Job, finished Jobs must be deleted after a period of time.
  5. The database root password is available to it.

Based on these requirements, we write a Job that looks something like this:

{{- $fullName := printf "migration-%s" (include "chart.fullname" .) -}}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ $fullName }}-{{ .Release.Revision }}
  labels:
    # omitted
spec:
  ttlSecondsAfterFinished: 172800 # 2 days
  backoffLimit: 0
  template:
    metadata:
      # omitted
    spec:
      priorityClassName: high-priority-class
      terminationGracePeriodSeconds: 86400 # 24 hours
      restartPolicy: Never
      containers:
        - name: php
          # omitted
          env:
            - name: DB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "chart.fullname" . }}-db-root-password
                  key: dbRootPassword
          args:
            - php
            - artisan
            - migrate
            - --force

You can see:

  1. In the PodSpec we configure a very long terminationGracePeriodSeconds, allowing enough time for the migration to finish even if the Pod receives a termination signal. More importantly, we give the Pod a very high priority (priorityClassName), so that when Kubernetes starts evicting pods due to resource pressure, high-priority pods keep running for as long as possible.
  2. The Job's backoffLimit: 0 and the Pod's restartPolicy: Never together ensure that the Job does not retry and the Pod does not restart.
  3. The name of the Job includes the revision of the Helm Release, so it is guaranteed to be unique on each deployment.
  4. The Job's ttlSecondsAfterFinished of 172800 seconds means the Job is automatically deleted 48 hours after it finishes.
  5. The container's env injects an environment variable named DB_ROOT_PASSWORD whose value is referenced from a separate Secret rather than written in plain text, for security. Regarding the storage and management of confidential information, we have previously introduced our approach; see our earlier post "Use SOPS to manage Secret".
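The Job above references priorityClassName: high-priority-class, but a PriorityClass is a cluster-scoped resource defined outside this chart. A minimal definition might look like the following (the name matches the chart; the value is illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-class
value: 1000000          # higher value = evicted later under resource pressure
globalDefault: false
description: "For migration Jobs that should not be interrupted."
```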

In addition, if you are also using Laravel, running a migration with APP_ENV set to production triggers an interactive prompt like this:

**************************************
*     Application In Production!     *
**************************************

 Do you really wish to run this command? (yes/no) [no]:
 >

There is no TTY in the Pod, so don't forget to add the --force flag to the php artisan migrate command so that it runs directly in production.

A custom migrate command

Since Laravel's built-in php artisan migrate command did not fully meet our needs, we wrapped it in a layer of our own and added extra features.

Automatically entering maintenance mode

To ensure data consistency, no business requests should be updating data during a migration. Therefore, before starting the migration, the application needs to enter maintenance mode; after the migration completes successfully, the service should be restored. So we call php artisan down before php artisan migrate, and php artisan up afterwards. We also pass the --message option to record the reason for maintenance:

Artisan::call('down', ['--message' => 'running migration'], $this->output);
// ...
Artisan::call('up', [], $this->output);

Automatic Data Backup

The database we use is AWS RDS. To guard against any unexpected problems with the automated migration, we create a snapshot of the RDS instance by calling the CreateDBSnapshot API through the AWS PHP SDK inside our custom migrate command. If the migration fails, we can use the snapshot to roll the data back to its pre-migration state.

Do not execute multiple Migration tasks simultaneously

Although the probability is extremely small, consider the following two situations:

  1. Two CI pipelines run very close together in time, and the second migration Job is created and starts before the first one's migrations have completed.
  2. A problem occurs and someone manually runs a migration while another migration Job is still running.

To avoid the data anomalies these situations could cause, we also added a mutex to the command so that only one instance can run at a time.
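The post does not show the lock itself; since pods do not share a local filesystem, a real implementation needs shared state (for example Laravel's Cache::lock atomic locks, or a database advisory lock). As a self-contained illustration of the acquire/fail-fast/release pattern, here is a file-based version in Python:

```python
import contextlib
import os

@contextlib.contextmanager
def migration_lock(path: str = "/tmp/migration.lock"):
    """Fail fast if another migration holds the lock; always release on exit."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller can succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError("another migration is already running")
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(path)
```

Failing fast (rather than queueing) is deliberate: a second migration starting right after the first is exactly the situation that warrants human attention.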

Slack notifications

Finally, we added Slack notifications to the whole process so that engineers can follow the status of the migration in real time:

By listening for the \Illuminate\Console\Events\CommandFinished event, we send a message to Slack after php artisan down and php artisan up are called.

final class MaintenanceNotification
{
    public function handle(CommandFinished $event): void
    {
        if (! in_array($event->command, ['up', 'down'], true)) {
            return;
        }

        if ($event->command === 'down') {
            $message = $event->input->getOption('message') === null
                ? 'maintenance'
                : $event->input->getOption('message');
        }

        if (app()->isDownForMaintenance()) {
            $color = 'warning';
            $text  = "API is now DOWN for $message.";
        } else {
            $color = 'good';
            $text  = "API is now UP.";
        }

        // Send to Slack webhook...
    }
}

Welcome to follow our WeChat official account "RightCapital".