In fact, I have spent more than a month recently working on a database migration. The project I am developing used MySQL as its main database in the early days; later, for business reasons, the data was migrated from MySQL to MongoDB. The database then went largely unattended, and since MongoDB itself is schema-less, dirty data became a serious problem. The current team members do not have extensive Rails development experience, so we still want to use ActiveRecord plus migrations to impose strong constraints on the data and ensure that everything in the database is valid.

This article introduces some of the problems the author ran into while migrating the database, and provides a feasible solution for readers who can afford to stop the service during migration; if you need a zero-downtime migration, you will have to design a different solution. The approach here targets a MongoDB instance with millions of documents, with an expected downtime of about two hours. If the data volume is in the tens of millions or more, such a long downtime may be unacceptable and a zero-downtime migration should be designed instead. Either way, the author hopes this article gives some useful ideas to developers who want to migrate a database without making mistakes.

From relationships to documents

Although this article focuses on migrating from MongoDB to MySQL, the author would like to briefly mention the migration from MySQL to MongoDB first. If we just import all the data from MySQL into MongoDB, it is actually relatively simple, the most important reason being that MySQL's data types are essentially a subset of MongoDB's:

During the migration, we can export all the data in MySQL in CSV format, and then use mongoimport to load the CSV files into MongoDB:

$ mysqldump -u<username> -p<password> \
    -T <output_directory> \
    --fields-terminated-by ',' \
    --fields-enclosed-by '\"' \
    --fields-escaped-by '\' \
    --no-create-info <database_name>

$ mongoimport --db <database_name> --collection <collection_name> \
    --type csv \
    --file <data.csv> \
    --headerline

Although the whole process looks simple enough, just two commands, there are plenty of problems when it comes to actually doing it. The author has no experience migrating from MySQL or any other relational database to MongoDB, but there is a lot of relevant material on Google; plenty of people have done this before, and their experience can save us a lot of time.

Exporting data as CSV works fine in the vast majority of cases, but if some documents in the database contain rich text, the export will succeed while the import can fail with strange errors.

From documents to relationships

Compared with the migration from MySQL to MongoDB, the reverse migration is more than twice the trouble. This is mainly because MongoDB has data types and association patterns that do not exist in MySQL, such as embedded data structures, collection types like arrays and hashes, and its own implementation of many-to-many relationships. Many of these problems cannot be solved by data migration alone: we need to restructure part of the data before migrating, and the second half of this article introduces the data structures and logic that have to be handled.

When we are ready to fully migrate the database to MySQL, we need to do some preparatory work to minimize the work required for the final migration and ensure that the downtime is not too long. The goal of the preparatory work is to eliminate as many complex data structures as possible in the project.

Data preprocessing

There are a number of things to do before migrating. The first thing is to change all embedded data structures to non-embedded data structures:

That is, change embeds_many and embeds_one relationships to has_many and has_one, and replace embedded_in with belongs_to. Meanwhile, we need to update the corresponding tests in the project to use these reference relationships. However, just changing the relationships in the code doesn't actually change the data in MongoDB.

def embeds_many_to_has_many(parent, child)
  child_key_name = child.to_s.underscore.pluralize
  parent.collection.find({}).each do |parent_document|
    next unless parent_document[child_key_name]
    parent_document[child_key_name].each do |child_document|
      new_child = child_document.merge "#{parent.to_s.underscore}_id": parent_document['_id']
      child.collection.insert_one new_child
    end
  end
  parent.all.unset(child_key_name.to_sym)
end

embeds_many_to_has_many(Person, Address)

With the code above we can convert an embedded relationship model into a referenced one, flattening all of the complex data relationships. Its running time is proportional to the number of embedded documents across the two models. Note that embedded documents in MongoDB may, for some reason, carry the same _id, which makes insert_one raise a duplicate-key error and crashes the script; you can wrap insert_one in a rescue to make sure the code doesn't stop running for this reason.
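
A minimal sketch of that guard, adapting the loop body above and assuming the Ruby mongo driver, which raises Mongo::Error::OperationFailure on duplicate-key inserts:

begin
  # Skip a child document whose `_id` already exists instead of
  # aborting the whole conversion.
  child.collection.insert_one new_child
rescue Mongo::Error::OperationFailure => exception
  STDERR.puts "Skipping duplicated child document: #{exception}"
end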

With this code we can easily expand all the original embedded relationships into referenced ones. Besides these two changes, turning embedded relationships into references requires nothing else; neither the query code nor the model-creation code needs to change. But remember to add an index on the parent model's foreign key in the child model, otherwise fetching all the child models held by a parent will cause a full table scan:

class Comment
  include Mongoid::Document
  index post_id: 1
  belongs_to :post
end

After dealing with the embedded relationships unique to MongoDB, we need to handle complex collection types such as arrays and hashes, something we would not strictly have to do on MySQL 5.7 or PostgreSQL, since their latest versions already provide JSON support. Here, however, the author converted the arrays and hashes in the project into more conventional data structures.

In this optional process, there is no standard answer, and we can transform different data into different data structures as needed:

For example, we can change an array into a comma-separated string or into a one-to-many relationship, and change a hash into key-value columns on the current document; how to handle each collection field is up to our business logic. While changing these fields, we should try to provide the upper layers with the same API as before, for example a #tag_titles method for tags or a method that splits a comma-separated categories string:

class Post
  ...
  def tag_titles
    tags.map(&:title)
  end
  
  def split_categories
    categories.split(',')
  end
end

This step is also optional; the code above is only intended to reduce the modification burden elsewhere. If you want to use the JSON support in MySQL 5.7 or PostgreSQL instead, that works too, except that querying collection fields becomes less convenient.
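
If you do keep such collection fields, here is a minimal sketch of declaring a native JSON column in a Rails migration (the table and column names are illustrative; this requires MySQL 5.7+ or PostgreSQL):

class AddTagsToPosts < ActiveRecord::Migration[5.1]
  def change
    # Store the `tags` array as a native JSON column instead of a
    # comma-separated string or a separate table.
    add_column :posts, :tags, :json
  end
end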

Mongoid’s “little brothers”

During development with Mongoid, we inevitably end up using plugins from its ecosystem, such as mongoid-enum, mongoid-slug and mongoid-history. The implementations of these plugins can differ substantially from plugins with the same functionality in ActiveRecord.

Some plugins are easy to handle. mongoid-slug, for example, just inserts a _slugs field into the documents of any model that includes the plugin, so we only need to ignore these fields during data migration and change all #slug calls to #id, with no other preprocessing required. Enumerations, however, are implemented differently in Mongoid's plugin than in ActiveRecord:

mongoid-enum uses strings and a _status field to store enumeration values, while ActiveRecord uses integers and a status column. The underlying storage differs, and we will address this in the migration script later.
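
For reference, a hedged side-by-side sketch of the two declarations, assuming a status field with active and archived values (the names are illustrative):

# mongoid-enum: persists "active" / "archived" as strings in `_status`.
class Post
  include Mongoid::Document
  include Mongoid::Enum
  enum :status, [:active, :archived]
end

# ActiveRecord: persists 0 / 1 as integers in a `status` column.
class Post < ActiveRecord::Base
  enum status: { active: 0, archived: 1 }
end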

If we use a lot of Mongoid plugins in a project, we can only decide how to migrate each one according to its implementation. For plugins that support very specific features, it may be difficult to find an equivalent in ActiveRecord, so consider temporarily removing non-essential functionality during the migration.

The primary key and UUID

Another important reason we want to migrate from MongoDB to MySQL is that each MongoDB document's primary key is unwieldy: the 24-character hexadecimal _id gives us very little information and only makes reading harder. In addition, the project never deployed a MongoDB cluster, so we never actually enjoyed the benefits the default _id generation mechanism is designed for.

Not only did we not enjoy the benefits of these UUID-like ids, they also caused a lot of trouble when migrating to MySQL, partly because ActiveRecord's default primary key is an integer and does not support long string ids out of the box. If we wanted to keep MongoDB's ids unchanged, migrating to MySQL would still be fine: we would just change the default integer primary key to a string and use a UUID generator to ensure that all primary keys increase over time and never conflict.

Using the UUID-plus-generator approach saves a lot of migration time, but it doesn't feel particularly elegant, so there are trade-offs to evaluate. If we choose an auto-incrementing integer primary key instead, we need to do a lot of extra work: add a uuid column to every table, and for every foreign key such as post_id add a matching post_uuid column.

During the data migration, we map the original _id to uuid and post_id to post_uuid; by maintaining the relationship between uuid and post_uuid, we ensure the relationships between models are not lost. During this phase, id and post_id have no connection to each other at all.

When we iterate through the documents in _id order and insert their data into the tables, MySQL automatically generates an incrementing primary key id for every row, while post_id is left empty for the moment.

After all the data has been inserted into MySQL, we use #find_by_uuid queries to translate the uuid/post_uuid relationships into id/post_id, and then remove all uuid-related fields. This way we ensure that the relationships between models do not disappear and that the relative order of the data rows is exactly the same as before the migration.
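
A minimal sketch of this translation step, assuming a Comment model that was imported with uuid and post_uuid columns; the full implementation appears in the RelationBuilder section below:

# Rebuild the integer foreign keys from the temporary uuid columns.
Comment.unscoped.find_each do |comment|
  post = Post.find_by_uuid comment.post_uuid
  comment.update_columns post_id: post.id if post.present?
end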

Migration of code

Mongoid models gain their behavior by including Mongoid::Document, while ActiveRecord models inherit from ActiveRecord::Base. After preprocessing the data, we can start modifying the existing model layer code.

First, replace include Mongoid::Document in every model with inheritance from ActiveRecord::Base, and move the field definitions out of the models into migration files:

# app/models/post.rb
class Post < ActiveRecord::Base
  validates_presence_of :title, :content
end

# db/migrate/20170908075625_create_posts.rb
class CreatePosts < ActiveRecord::Migration[5.1]
  def change
    create_table :posts do |t|
      t.string :title, null: false
      t.text :content, null: false
      t.string :uuid, null: false

      t.timestamps null: false
    end

    add_index :posts, :uuid, unique: true
  end
end

Note: add a uuid column of type string to every table and create a unique index on it, to speed up rebuilding the relationships between the different data models through uuid lookups.

In addition to building the database migration files and modifying the base classes, we also need to modify some included modules and the queries unique to Mongoid, such as date queries using gte or lte and pattern-matching queries using regular expressions. These queries work completely differently in ActiveRecord than in Mongoid, and in some places we need to fall back to handwritten SQL.
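
A hedged comparison of equivalent queries, assuming a created_at timestamp and a title string column; the last line uses the MySQL-specific REGEXP operator:

# Mongoid: symbol operators and native regular expressions.
Post.where(:created_at.gte => 1.week.ago)
Post.where(title: /draven/i)

# ActiveRecord: range conditions and pattern matching go through SQL.
Post.where('created_at >= ?', 1.week.ago)
Post.where('title REGEXP ?', 'draven')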

Things like Mongoid's inverse_of, which corresponds to foreign_key in ActiveRecord, also need changing. None of these changes is complicated, but if we want to replace all of this code we need to test the business logic in detail to make sure no residual problems remain, which puts high demands on our test coverage. Most Rails projects have good enough test coverage to ensure this part of the code and logic migrates smoothly; but if the project has no tests at all, or very low coverage, either write the tests yourself or trust your luck, or simply forget about the migration, write more tests first, and worry about the refactoring later.

Migration of data

Creating migration files and tables for each model involves some manual work we can't skip entirely, but we can automate adding the uuid column and index to every model, together with a corresponding xxx_uuid column (such as post_uuid) for every field like post_id:

class AddUuidColumns < ActiveRecord::Migration[5.1]
  def change
    Rails.application.eager_load!
    ActiveRecord::Base.descendants.map do |klass|
      # add `uuid` column and create unique index on `uuid`.
      add_column klass.table_name, :uuid, :string
      add_index klass.table_name, :uuid, unique: true

      # add `xxx_uuid` columns, ex: `post_uuid`, `comment_uuid` and etc.
      uuids = klass.attribute_names
        .select { |attr| attr.include? '_id' }
        .map { |attr| attr.gsub '_id', '_uuid' }

      next unless uuids.present?
      uuids.each do |uuid|
        add_column klass.table_name, uuid, :string
      end
    end
  end
end

Once the uuid columns are added and indexed, we are ready to migrate the database. Because we decided to change the primary keys of the original data, we split the migration into two steps: data migration and relationship reconstruction. First we import the documents into MySQL, converting _id to uuid and every xx_id to xx_uuid; then we re-establish the relationships between models by matching uuid with xx_uuid, and at the very end delete all the uuid fields.

We can migrate the data with the code below, which iterates through a MongoDB collection and yields each document to an optional block. It uses the DatabaseTransformer#delete_obsolete_columns and DatabaseTransformer#update_rename_columns methods to delete obsolete columns and rename others, and finally calls DatabaseTransformer#update_id_columns to turn all id columns into uuid columns:

module DatabaseTransformer
  def import(collection_name, *obsolete_columns, **rename_columns)
    collection = Mongoid::Clients.default.collections.select do |c|
      c.namespace == "#{database}.#{collection_name.to_s.pluralize}"
    end.first

    unless collection.present?
      STDOUT.puts "#{collection_name.to_s.yellow}: skipped"
      STDOUT.puts
      return
    end

    constant = collection_name.to_s.singularize.camelcase.constantize
    reset_callbacks constant

    DatabaseTransformer.profiling do
      collection_count = collection.find.count
      collection.find.each_with_index do |document, index|
        document = yield document if block_given?
        delete_obsolete_columns document, obsolete_columns
        update_rename_columns document, rename_columns
        update_id_columns document

        insert_record constant, document
        STDOUT.puts "#{index}/#{collection_count}\n" if (index % 1000).zero?
      end
    end
  end
end

After processing each document, the method directly calls DatabaseTransformer#insert_record to insert the data into the corresponding MySQL table. We can migrate all the documents of a collection to MySQL with the following code:

transformer = DatabaseTransformer.new 'draven_production'
transformer.import :post, :_slugs, name: :title, _status: :status

This call ignores the _slugs field of every document in the collection, renames name to title and _status to status. Although mongoid-enum's enumeration fields are stored completely differently from ActiveRecord's, inserting them directly poses no problem here: the ActiveRecord model handles the conversion between strings and integers itself when a record is created:

def insert_record(constant, params)
  model = constant.new params
  model.save! validate: false
rescue Exception => exception
  STDERR.puts "Import Error: #{exception}"
  raise exception
end

To speed up data insertion and avoid all side effects of insertion, we reset all callbacks during data migration:

def reset_callbacks(constant)
  %i(create save update).each do |callback|
    constant.reset_callbacks callback
  end
end

This code only takes effect while the script is running and does not affect the rest of the project. At the same time, the script prints the current progress to standard output for every 1000 successful model inserts, helping us quickly find problems and estimate migration times.

You can find the complete data migration code in database_transformer.rb.

After all the data has been inserted into the MySQL tables, there are still no explicit relationships between the models. We need to convert the uuid-based connections into id-based ones so that associated objects can be accessed directly with dot syntax. Establishing the relationships is actually very simple: we take all attributes of the current class ending in _uuid, then iterate over every row of data; from the uuid value and the "post" part of an attribute like post_uuid we derive the table name and fetch the corresponding associated model. Special cases such as polymorphic associations are also handled here:

module RelationBuilder
  def build_relations(class_name, polymorphic_associations = [], rename_associations = {})
    uuids = class_name.attribute_names.select { |name| name.end_with? '_uuid' }

    unless uuids.present?
      STDOUT.puts "#{class_name.to_s.yellow}: skipped"
      STDOUT.puts
      return
    end

    reset_callbacks class_name

    RelationBuilder.profiling do
      models_count = class_name.count
      class_name.unscoped.all.each_with_index do |model, index|
        update_params = uuids.map do |uuid|
          original_association_name = uuid[0...-5]
          association_model = association_model(
            original_association_name,
            model[uuid],
            polymorphic_associations,
            rename_associations
          )
          [original_association_name.to_s, association_model]
        end.compact

        begin
          Hash[update_params].each do |key, value|
            model.send "#{key}=", value
          end
          model.save! validate: false
        rescue Exception => e
          STDERR.puts e
          raise e
        end

        STDOUT.puts "#{index}/#{models_count}\n" if (index % 1000).zero?
      end
    end
  end
end

Once the corresponding data row is found, the rest is very simple: we call setters such as #post= to update the foreign key and save its value directly to the database. As in the data migration step, this code prints the current progress while it runs.

When initializing RelationBuilder, we can pass in the constants to process and then call RelationBuilder#build!; if nothing is passed, all subclasses of ActiveRecord::Base are loaded by default, except for the intermediate classes that has_and_belongs_to_many creates (their names contain ::). We will show how to handle those many-to-many relationships separately in the next section:

def initialize(constants = [])
  if constants.present?
    @constants = constants
  else
    Rails.application.eager_load!
    @constants = ActiveRecord::Base.descendants
        .reject { |constant| constant.to_s.include?('::') }
  end
end

The complete code for the relationship migration can be found in relation_builder.rb; it is used like this:

builder = RelationBuilder.new([Post, Comment])
builder.build!

MongoDB and ActiveRecord handle many-to-many relationships in very different ways, so we need to solve this problem separately. Once all the migration issues above are resolved, we can use the following migration file to remove all uuid-related columns from the database:

class RemoveAllUuidColumns < ActiveRecord::Migration[5.1]
  def change
    Rails.application.eager_load!
    ActiveRecord::Base.descendants.map do |klass|
      attrs = klass.attribute_names.select { |n| n.include? 'uuid' }
      next unless attrs.present?
      remove_columns klass.table_name, *attrs
    end
  end
end

At this point, the migration process is almost complete. What remains are a few related topics, such as how to handle many-to-many relationships and the importance of testing.

Many-to-many relationships

Mongoid implements many-to-many relationships with has_and_belongs_to_many, but without an intermediate collection: each side simply stores an array of the associated documents' ids, such as the tag_ids field on a post document:

# The post document.
{
  "_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
  "tag_ids" : [
    ObjectId("4d3ed089fb60ab534684b7f2"), 
    ObjectId("4d3ed089fb60ab53468831f1")
  ],
  "title": "xxx",
  "content": "xxx"
}

ActiveRecord, by contrast, creates a separate join table whose name is the alphabetical concatenation of the two table names; for Post and Tag, the corresponding many-to-many table is posts_tags. has_and_belongs_to_many also creates two subclasses of ActiveRecord::Base, Tag::HABTM_Posts and Post::HABTM_Tags:

require 'active_record'

class Tag < ActiveRecord::Base; end
class Post < ActiveRecord::Base
  has_and_belongs_to_many :tags
end
class Tag < ActiveRecord::Base
  has_and_belongs_to_many :posts
end
puts ActiveRecord::Base.descendants
# => [Tag, Post, Post::HABTM_Tags, Tag::HABTM_Posts]

Intermediate classes such as Post::HABTM_Tags are generated automatically, so we don't import them through their own constants. We simply import the posts_tags table with DatabaseTransformer and then iterate through it to update the many-to-many relationships:

class PostsTag < ActiveRecord::Base; end

# migrate data from mongodb to mysql.
transformer = DatabaseTransformer.new 'draven_production'
transformer.import :posts_tags

# establish association between posts and tags.
PostsTag.unscoped.all.each do |model|
  post = Post.find_by_uuid model.post_uuid
  tag = Tag.find_by_uuid model.tag_uuid
  next unless post.present? && tag.present?
  model.update_columns post_id: post.id, tag_id: tag.id
end

All many-to-many relationships using has_and_belongs_to_many need to be migrated through the code above, which needs to be completed before deleting all UUID fields in the database.

The importance of testing

Before stopping the online service and migrating, we actually need to run both partial and full tests against the existing data. In the partial test phase, we can prepare a local MongoDB database with 1/10 or 1/100 of the production data volume; by rehearsing the migration in a local environment that simulates both MongoDB and MySQL, we ensure that errors in the migration scripts are found as quickly as possible.

The test database can be prepared by deleting most rows of the major models; declaring dependent: :destroy on their associations lets MongoDB delete the related models along with them, preserving data consistency and integrity as far as possible. Before the online migration, however, we still need a full test over all the data, which uncovers the more hidden problems and reduces the number of surprises on the real system.
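
A minimal sketch of preparing a 1/10 sample, assuming Post is a major model whose associations declare dependent: :destroy:

# Keep roughly every tenth post; destroying the rest also removes
# their children through `dependent: :destroy`.
Post.pluck(:id).each_with_index do |id, index|
  Post.find(id).destroy unless (index % 10).zero?
end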

A database migration is really a form of refactoring. Before migrating the MongoDB database we must make sure the project has a solid test suite and test cases, so that after reworking the project we can be confident there are no unexpected problems and everything stays under control. If a project does not have enough tests, don't even think about refactoring: unit tests are the foundation of refactoring.

Conclusion

How to migrate from MongoDB to MySQL is really an engineering problem: we need to keep looking for things that could go wrong, break a complex task into pieces, and minimize the impact of the migration on the service's availability and stability before actually carrying it out.

In addition, the choice between MongoDB and MySQL is not always either-or. We migrated most of the data in the project to MySQL, but left some of the data for calculation and analysis in MongoDB. In this way, MongoDB outages will not affect the main tasks of the project, and MySQL backup and recovery will be very fast due to the smaller database.

Finally, testing is really important. Without tests, no one can modify a large amount of business code and guarantee that no business logic is lost; without them, a lot of business logic would quietly disappear along the way.

If you have any questions about this article or MongoDB migration, feel free to leave them in the comments.

