AskTUG.com technical question and answer website believe everyone is not unfamiliar, but in addition to the daily familiar front page, the database behind the support of its operation there is an unknown story. This article is shared by Wang Xingzong, one of the authors of AskTUG.com, to reveal the fantastic story of AskTUG.com born after Discourse, its migration from PostgreSQL to MySQL and its stable operation in TiDB.

An AD:

AskTUG.com is a gathering place for TiDB Users, Contributors, and partners where you can find answers to all your TiDB-related questions. Welcome to register to experience ~

Link: asktug.com/

background

“Through one platform, we are sure to find satisfactory answers to all of TiDB’s questions.”

Due to this desire, users, contributors and partners in TiDB ecosystem jointly established AskTUG.com technology q&A website, which was officially launched in August 2019. As a “gathering place” for TUG members to learn and share, TiDB users can put forward and answer questions here, communicate and discuss with each other, and gather the collective wisdom of TiDB users here. Since its launch, AskTUG.com has gradually attracted more and more users’ attention. By the end of June 2021, AskTUG.com had 7,000 + registered users, 1.6 million + questions and 300+ technical articles.

Many partners have discovered that the AskTUG.com back-end program is a Discourse program. Discourse is a new open source forum project by Jeff Atwood, co-founder of Stack Overflow, with the aim of transforming forum software that has not changed for a decade. At the beginning of the establishment of AskTUG.com, Discourse was determined from the following perspectives:

  • Powerful: Discourse features are rich and customizable. It is the WordPress of the forum world. Compared with other traditional forums, Discourse simplifies the classification of traditional forums and replaces it with hot posts. It is a bit like question and answer. It can avoid the confusion of users who cannot find their direction when they enter traditional forums.

  • Extensive audience: Most popular open source projects use Discourse to build their own communities, including:

    • Docker:forums.docker.com/

    • Github Atom:discuss.atom.io/

    • Mozilla:discourse.mozilla.org/

    • TiDB:asktug.com/

    • Discourse:meta.discourse.org/

    • Rust:users.rust-lang.org/

    • More: discourse.org/customers
  • Good usability: Discourse posts are presented in the form of bubbles, with all Ajax loading and computer and mobile versions. The forum adopts the design of waterfall flow, which automatically loads the next page without manual page turning. In short, this is an excellent system.

Why migrate

Up to now, Everything is Good except that Discourse officially only supports PostgreSQL database.

As an open source database vendor, we have great enthusiasm and sufficient reasons for AskTUG.com to run on our TiDB database. When we first had this idea, of course, we were looking for any solution that has Discourse port to MySQL. As a result, many people asked but no action was taken.

Therefore, we decided to do Discourse database transformation by ourselves. There are two reasons:

  • Verify TiDB compatibility by eating your own dog food.

  • **Discourse is a typical HTAP application. Its management background has complex report query. As the amount of forum data increases, standalone PostgreSQL and MySQL are prone to performance bottlenecks. ** The TiFlash MPP computing model introduced in TiDB 5.0 just meets the requirements of this application scenario. By introducing TiFlash nodes, the parallel processing of some complex statistical analysis queries can achieve the effect of acceleration. There is no need to change SQL and complex ETL processes.

The migration practice

Before, we have described the reasons for doing AskTUG & Discourse database transformation project. Next, we will talk in detail about the “pit” that PostgreSQL migrated to MySQL/TiDB. If you have migrated from PG to MySQL, you can refer to it.

TiDB is compatible with MySQL protocol and ecology, so it is easy to migrate and has very low operation and maintenance cost. Therefore, Discourse migration from PG to TiDB can be roughly divided into two steps:

Step 1: Migrate Discourse to MySQL;

Step 2: Adapt TiDB.

Migrate to MySQL 5.7

🌟mini_sql

Minisql is a lightweight SQL wraper that makes it easy to do queries that ORM is not good at and prevents SQL injection. Previously only PG and SQLite were supported. The Discourse code relies on minisQL in many places, and rewriting it will require a lot of work. Patch mini_SQL to support MySQL is an important step to complete the migration: github.com/discourse/m…

🌟 schema migration

Rails Schema migration is used to maintain DDL, which reflects the process of database schema changes. For migration, it actually increases the workload. The solution is to create a final schema.rb file and make changes to the final result. Generate a new Migration file. Delete the migration file generated during the intermediate process.

🌟 character set utf8mb4

database.yml

development:
  prepared_statements: false
   encoding: utf8mb4
   socket: /tmp/mysql.sock
   adapter: mysql2
Copy the code

/etc/mysql/my.cnf

[client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] character-set-client-handshake  = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ciCopy the code

🌟 MySQL can index only the first N chars of a BLOB or TEXT column

MySQL > select * from ‘PG’; MySQL > select * from ‘PG’;

t.index ["error"], name: "index_incoming_emails_on_error", length: 100
Copy the code

In the case of composite indexes, however, it is more complicated to ignore those with the text type. Fortunately, indexes do not affect functionality.

🌟 data migration

Pg2mysql can convert pgdump insert statements into a form that is compatible with MySQL syntax, but only in simple forms. Some formats with array and JSON will be messed up, but this part is correct in Ruby. Mysql > select user_options, site_settings, etc.

PGPASSWORD=yourpass pg_dump discourse_development -h localhost --quote-all-identifiers --quote-all-identifiers --inserts  --disable-dollar-quoting --column-inserts --exclude-table-data user_options --exclude-table-data user_api_keys --exclude-table-data reviewable_histories --exclude-table-data reviewables --exclude-table-data notifications --exclude-table-data site_settings --exclude-table-data reviewables --no-acl --no-owner --format p --data-only -f pgfile.sqlCopy the code

The rest of the data is migrated using seed_dump:

bundle exec rake db:seed:dump 
  MODELS=UserApiKey,UserOption,ReviewableHistory,
    Reviewable,Notification,SiteSetting 
EXCLUDE=[] IMPORT=true
Copy the code

🌟 distinct on

PG has a distinct ON usage that is equivalent to MySQLONLY_FULL_GROUP_BY when the MySQLONLY_FULL_GROUP_BY parameter is turned off, but as of MySQL 5.7, this parameter is turned on by default. So one solution is to turn off the ONLY_FULL_GROUP_BY parameter, and the other is to simulate the GROUP and aggregate functions:

# postgresql
SELECT DISTINCT ON (pr.user_id) pr.user_id, pr.post_id, pr.created_at granted_at
  FROM post_revisions pr
  JOIN badge_posts p on p.id = pr.post_id
  WHERE p.wiki
      AND NOT pr.hidden
      AND (:backfill OR p.id IN (:post_ids))
# mysql
SELECT pr.user_id, MIN(pr.post_id) AS post_id, MIN(pr.created_at) AS granted_at
  FROM post_revisions pr
  JOIN badge_posts p on p.id = pr.post_id
  WHERE p.wiki
      AND NOT pr.hidden
      AND (:backfill OR p.id IN (:post_ids))
  GROUP BY pr.user_id
Copy the code

🌟 returning

PG UPDATE, DELETE, and INSERT statements can all return results with returning keywords. MySQL > select * from primary key; UPDATE key; DELETE key;

update users set updated_at = now() where id = 801 returning id,updated_at ; Id | updated_at + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 801 | 2019-12-30 15:43:35. 81969Copy the code

MySQL version:

update users set updated_at = now() where id = 801;
select id, updated_at from users where id = 801;
+-----+---------------------+
| id  | updated_at          |
+-----+---------------------+
| 801 | 2019-12-30 15:45:46 |
+-----+---------------------+
Copy the code

For a single INSERT, use the last_insert_id() function:

PG version:

Insert into category_users(user_id, category_id, notification_level) values(100,100,1) returning id, user_id, 2000 category_id; id | user_id | category_id ----+---------+------------- 59 | 100 | 100Copy the code

MySQL > update MySQL

Insert into category_users(user_id, category_id, notification_level) values(100,100,1); select id, category_id, user_id from category_users where id = last_insert_id(); +----+-------------+---------+ | id | category_id | user_id | +----+-------------+---------+ | 48 | 100 | 100 | +----+-------------+---------+Copy the code

INSERT (last_insert_id()); INSERT (last_insert_id()); INSERT (last_insert_id());

ub_ids = records.map do |ub|
  DB.exec(
    "INSERT IGNORE INTO user_badges(badge_id, user_id, granted_at, granted_by_id, post_id) 
     VALUES (:badge_id, :user_id, :granted_at, :granted_by_id, :post_id)",
     badge_id: badge.id,
     user_id: ub.user_id,
     granted_at: ub.granted_at,
     granted_by_id: -1,
     post_id: ub.post_id
  )
  DB.raw_connection.last_id
end

DB.query("SELECT id, user_id, granted_at FROM user_badges WHERE id IN (:ub_ids)", ub_ids: ub_ids)
Copy the code

🌟 insert into on conflict do nothing

PG 9.5 starts to support upsert, MySQL has the same functionality, just written differently:

# postgresql
DB.exec(<<~SQL, args)
  INSERT INTO post_timings (topic_id, user_id, post_number, msecs)
  SELECT :topic_id, :user_id, :post_number, :msecs
  ON CONFLICT DO NOTHING
SQL
# MySQL
DB.exec(<<~SQL, args)
  INSERT IGNORE INTO post_timings (topic_id, user_id, post_number, msecs)
  SELECT :topic_id, :user_id, :post_number, :msecs
SQL
Copy the code

🌟 select without from

Select 1 where 1=2;

This is illegal in MySQL because there is no FROM clause, so the trick is to manually create a table with only one data that is specifically compatible with this syntax.

execute("create table one_row_table (id int)")
execute("insert into one_row_table values (1)")
Copy the code

MySQL using:

# MySQL
select 1 from one_row_table where 1=2;
Copy the code

🌟 full outer join

MySQL does not support full outer join, so you need to use LEFT join + RIGHT join + UNION.

# MySQL
SELECT * FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION
SELECT * FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
Copy the code

🌟 recursive cte

Prior to MySQL 8.0, CTE/Recursive CTE was not supported. Simple CTES can be changed to subqueries with no functionality other than poor readability. Recursive CTE can be modeled using User-defined variables. Discourse has a nested reply query:

WITH RECURSIVE breadcrumb(id, level) AS ( SELECT 8543, 0 UNION SELECT reply_id, level + 1 FROM post_replies AS r JOIN breadcrumb AS b ON (r.post_id = b.id) WHERE r.post_id <> r.reply_id AND b.level < 1000 ), breadcrumb_with_count AS ( SELECT id, level, COUNT(*) AS count FROM post_replies AS r JOIN breadcrumb AS b ON (r.reply_id = b.id) WHERE r.reply_id <> r.post_id GROUP  BY id, level ) SELECT id, level FROM breadcrumb_with_count ORDER BY idCopy the code

Use MySQL 5.7 to be compatible with:

# MySQL
SELECT id, level FROM (
  SELECT id, level, count(*) as count FROM (
    SELECT reply_id AS id, length(@pv) - length((replace(@pv, ',', '')))  AS level
      FROM (
             SELECT * FROM post_replies ORDER BY post_id, reply_id) pr,
             (SELECT @pv := 8543) init
     WHERE find_in_set(post_id, @pv)
           AND length(@pv := concat(@pv, ',', reply_id))
  ) tmp GROUP BY id, level
) tmp1
WHERE (count = 1)
ORDER BY id
Copy the code

The CTE of PG can be nested, such as the query in the Discourse WITH period_actions which is nested in flag_count:

 WITH mods AS (
  SELECT
  id AS user_id,
  username_lower AS username,
  uploaded_avatar_id
  FROM users u
  WHERE u.moderator = 'true'
  AND u.id > 0
  ),
  time_read AS (
  SELECT SUM(uv.time_read) AS time_read,
  uv.user_id
  FROM mods m
  JOIN user_visits uv
  ON m.user_id = uv.user_id
  WHERE uv.visited_at >= '#{report.start_date}'
  AND uv.visited_at <= '#{report.end_date}'
  GROUP BY uv.user_id
  ),
  flag_count AS (
      WITH period_actions AS (
      SELECT agreed_by_id,
      disagreed_by_id
      FROM post_actions
      WHERE post_action_type_id IN (#{PostActionType.flag_types_without_custom.values.join(',')})
      AND created_at >= '#{report.start_date}'
      AND created_at <= '#{report.end_date}'
      ),
      agreed_flags AS (
      SELECT pa.agreed_by_id AS user_id,
      COUNT(*) AS flag_count
      FROM mods m
      JOIN period_actions pa
      ON pa.agreed_by_id = m.user_id
      GROUP BY agreed_by_id
      ),
      disagreed_flags AS (
      SELECT pa.disagreed_by_id AS user_id,
      COUNT(*) AS flag_count
      FROM mods m
      JOIN period_actions pa
      ON pa.disagreed_by_id = m.user_id
      GROUP BY disagreed_by_id
   )
Copy the code

This is very complex to simulate WITH subqueries, and can be compatible WITH temporary tables. The query part does not need any modification, just replace the WITH part WITH temporary table in order of dependency:

DB.exec(<<~SQL)
    CREATE TEMPORARY TABLE IF NOT EXISTS mods AS (
      SELECT
      id AS user_id,
      username_lower AS username,
      uploaded_avatar_id
      FROM users u
      WHERE u.moderator = true
      AND u.id > 0
    )
  SQL

  DB.exec(<<~SQL)
    CREATE TEMPORARY TABLE IF NOT EXISTS time_read AS (
      SELECT SUM(uv.time_read) AS time_read,
      uv.user_id
      FROM mods m
      JOIN user_visits uv
      ON m.user_id = uv.user_id
      WHERE uv.visited_at >= '#{report.start_date.to_s(:db)}'
      AND uv.visited_at <= '#{report.end_date.to_s(:db)}'
      GROUP BY uv.user_id
    )
  SQL
Copy the code

🌟 delete & update

The writing methods of UPDATE/DELETE statements in PG and MySQL are different. ORM will process them automatically. However, a large number of codes in Discourse are written SQL using MINI_SQL and need to be replaced one by one.

PG update statement:

# postgresql
UPDATE employees
SET department_name = departments.name
FROM departments
WHERE employees.department_id = departments.id
Copy the code

MySQL > update;

# MySQL
UPDATE employees
LEFT JOIN departments ON employees.department_id = departments.id
SET department_name = departments.name
Copy the code

The delete statement is similar.

🌟 You can’t specify target table xx for update in FROM clause

Specify target table ‘users’ for update in FROM clause You can’t specify target table ‘users’ for update in FROM clause

# MySQL
update users set updated_at = now() where id in (
  select id from users where id < 10
);
# You can't specify target table 'users' for update in FROM clause
Copy the code

Derived table; derived table;

# MySQL
update users set updated_at = now() where id in (
  select id from (select * from users) u where id < 10
);
Copy the code

🌟 MySQL doesn’t yet support ‘LIMIT & IN/ALL/ANY/SOME subquery’

For example, if the subquery has a LIMIT:

# MySQL
update users set updated_at = now() where id in (
  select id from (select * from users) u where id < 10 limit 10
);
# MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Copy the code

The simplest solution is to do it again:

# MySQL
update users set updated_at = now() where id in (
  select id from (
    select id from (select * from users) u where id < 10 limit 10
  ) u1
);
Copy the code

🌟 window function

MySQL 8.0 did not have window functions, but can use user-defined Variables instead:

# postgresql
WITH ranked_requests AS (
  SELECT row_number() OVER (ORDER BY count DESC) as row_number, id
    FROM web_crawler_requests
   WHERE date = '#{1.day.ago.strftime("%Y-%m-%d")}'
)
DELETE FROM web_crawler_requests
WHERE id IN (
  SELECT ranked_requests.id
    FROM ranked_requests
   WHERE row_number > 10
)
# MySQL
DELETE FROM web_crawler_requests
WHERE id IN (
  SELECT ranked_requests.id
    FROM (
          SELECT @r := @r + 1 as row_number, id
            FROM web_crawler_requests, (SELECT @r := 0) t
           WHERE date = '#{1.day.ago.strftime("%Y-%m-%d")}'
        ORDER BY count DESC
    ) ranked_requests
   WHERE row_number > 10
)
Copy the code

🌟 swap columns

MySQL > update MySQL > update MySQL > update MySQL > update MySQL > update MySQL > update MySQL > update

# postgresql
create table tmp (id integer primary key, c1 varchar(10), c2 varchar(10));

insert into tmp values (1,2,3);
insert into tmp values (2,4,5);

select * from tmp;
 id | c1 | c2
----+----+----
  1 | 3  | 2
  2 | 5  | 4

update tmp set c1=c2,c2=c1;

select * from tmp;
 id | c1 | c2
----+----+----
  1 | 3  | 2
  2 | 5  | 4
# MySQL

create table tmp (id integer primary key, c1 varchar(10), c2 varchar(10));

insert into tmp values (1,2,3);
insert into tmp values (2,4,5);

select * from tmp;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 2    | 3    |
|  2 | 4    | 5    |
+----+------+------+

update tmp set c1=c2,c2=c1;

select * from tmp;
+----+------+------+
| id | c1   | c2   |
+----+------+------+
|  1 | 3    | 3    |
|  2 | 5    | 5    |
+----+------+------+
Copy the code

🌟 function

There are some inconsistencies in the names and behavior of some of the built-in functions in PG and MySQL:

  • regexp_replace -> replace

  • pg_sleep -> sleep

  • ilike -> lower + like

  • ~* -> regexp

  • || -> concat

  • set local statementtimeout -> set session statementtimeout

  • offset a limit b -> limit a offset b

  • @ -> ABS

  • Interval -> date_add or datediff

  • extract epoch from -> unix_timestimp

  • unnest -> union all

  • Json ->> ‘username’ to json->> ‘$. Username ‘

  • position in -> locate

  • generate_series -> union

  • greatest & least -> greatest/least + coalesce

🌟 type & casting

MySQL uses the cast function, PG also supports the same syntax, but more commonly uses four points ::, such as SELECT 1::varchar, MySQL can only convert the following five types: CHAR[(N)], DATE, DATETIME, DECIMAL, SIGNED, TIME.

select cast('1' as signed);

In Rails, PG maps to varchar, MySQL maps to varchar(255), and PG can store more than 255. Some string data in Discourse will exceed 255 and will be truncated after conversion to MySQL. The solution is to use text type for this Discourse.

🌟 keywords

MySQL and PG have different keyword lists. For example, read is a keyword in MySQL, but not in PG. For ORM production of SQL has been processed, some handwritten SQL need to quote, PG use “”, MySQL use ‘ ‘.

🌟 expression index

PG supports expression indexing:

CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

Some of the functions in Discourse will add unique constraints to the expression index. MySQL does not have a direct corresponding, but it can be simulated by using Stored Generated Column. A Stored Generated Column is redundant first and then the unique constraints are added to it. The same effect was achieved.

Rails also supports:

t.virtual "virtual_parent_category_id", type: :string, as: "COALESCE(parent_category_id, '-1')", stored: true 
t.index "virtual_parent_category_id, name", name: "unique_index_categories_on_name", unique: true
Copy the code

🌟 array && json

PG supports array and JSON. MySQL 5.7 has completed JSON. In Discourse, array and JSON are used only for storage without advanced retrieval requirements. JSON can be used directly instead of PG array and JSON. MySQL JSON and text do not support the default value. You can only set the default value in the application layer. You can use github.com/FooBarWidge…

Adapter TiDB

TiDB supports the MySQL transport protocol and most of its syntax, but some features are not implemented well in a distributed environment, so there are still some differences between the performance of some features and MySQL, see the documentation pingcap.com/docs-cn/sta…

🌟 TiDB reserves keywords

In the new version of TiDB (v3.0.7 is used in this migration), Window Function is supported and group, rank, row_number and other functions are introduced. However, the above Function names are treated as keywords by TiDB. Therefore, when opening window functions, we need to modify the SQL named similar to the window function name, and the related keywords are enclosed in backquotes.

TiDB reserved keywords :pingcap.com/docs-cn/sta…

TiDB window functions: pingcap.com/docs-cn/sta…

🌟 Insert into SELECT syntax is incompatible

TiDB does not support this syntax. You can use insert into select from dual to bypass this syntax:

invalid: insert into t1 (i) select 1;
valid: insert into t1 (i) select 1 from dual;
Copy the code

🌟 Nested transactions & SavePoint

TiDB does not support nested transactions, nor does SavePoint. But Rails ActiveRecord uses savePoint to simulate nested transactions when the database is MySQL or PostgreSQL and requiRES_new to control. Documents: api.rubyonrails.org/classes/Act…

Therefore, after the database migration to TiDB, we need to adjust the business code, adjust the original logic involving nested transactions to single-layer transactions, roll back uniformly in case of exceptions, and cancel the requiRES_new option in discourse.

Strong compatibility with TiDB

TiDB is 100% compatible with MySQL 5.7 protocol. In addition, it supports common MySQL 5.7 features and syntax. System tools in the MySQL 5.7 ecosystem (PHPMyAdmin, Navicat, MySQL Workbench, mysqlDump, Mydumper/Myloader), clients, etc., are available for TiDB. With TiDB 5.0, new features such as expression indexes, CTES, and temporary tables will be released. TiDB compatibility will be improved and it will be easier to migrate from MySQL or PostgreSQL to TiDB.

conclusion

The project is 100% complete, and now AskTUG website (AskTug.com) runs smoothly on TiDB (current version: TIDB-v5.0.x) for more than a year. Yes, without changing the experience, no one has noticed that the database has been quietly changed ~ proving the feasibility of migrating business running on PG to TiDB.

The project address is: github.com/tidb-incuba… Fork & raise PR to participate in the improvement and follow the progress of the project. We also welcome the Ruby community, Ruby On Rails community and partners of Discourse community to experience the goodwill from TiDB community.