36 | why can temporary table name repetition?

Today we start with this question: what are the characteristics of temporary tables and what scenarios are suitable for them?

Here, I need to help you clear up a misunderstanding: some people may think that temporary tables are memory tables. However, the two concepts are quite different.

  • Create table… create table… Engine = the memory. ** The table data is stored in memory and will be cleared when the system restarts, but the table structure remains. ** Except for these two “strange” features, it is a normal table from other features.
  • Temporary tables that can be used with various engine types. If InnoDB engine or MyISAM engine is used for temporary tables, data is written to disk. Of course, Memory engines can also be used for temporary tables.

Now that we know the difference between a memory table and a temporary table, let’s take a look at the characteristics of a temporary table.

Temporary table features

To make it easier to understand, let’s look at the following operation sequence:

As you can see, temporary tables have the following characteristics:

  1. Create temporary table… .
  2. A temporary table can only be accessed by the session that created it and is not visible to other threads. Therefore, the temporary table T created by Session A in the figure is not visible to Session B.
  3. Temporary tables can have the same name as regular tables.
  4. If session A has A temporary table with the same name and A regular table with the same name, the showcreate statement and the add, delete, alter, query statement access the temporary table.
  5. The showtables command does not display temporary tables.

Because a temporary table can only be accessed by the session that created it, the temporary table is automatically dropped when the session ends.

Because of this feature, temporary tables are particularly well suited for scenarios like join optimization in the previous article. Why is that? The reasons include the following two aspects:

  1. Temporary tables of different sessions can have the same name. If multiple sessions perform join optimization at the same time, there is no need to worry about table creation failure caused by duplicate table names.
  2. There is no need to worry about data deletion. If a common table is used and the client is abnormally disconnected or the database is abnormally restarted during process execution, you need to clean the data tables generated during the process. Temporary tables do not need this extra operation because they are automatically reclaimed.

Application of temporary tables

Temporary tables are often used to optimize complex queries without worrying about duplicate name conflicts between threads. Among them, the cross-library query of the system is a typical application scenario.

In a typical database and table split scenario, a logically large table is spread across different database instances. For example. A large table HT was divided into 1024 sub-tables by field F and distributed across 32 database instances. As shown below:

In general, this kind of database and table system has an intermediate proxy. However, there are some solutions that let the client connect directly to the database without the proxy layer.

In this architecture, partitioning keys are chosen based on “reducing cross-library and cross-table queries.” If most statements contain f equivalence conditions, use F as the partitioning key. This way, after the SQL statement is parsed at the Proxy level, you can determine which sub-table to route the statement to for query.

For example, the following statement:

select v from ht where f=N;

At this point, we can use a table partition rule (for example, N%1024) to determine which table the required data is placed on. This statement requires only one access to the table and is the most popular form of statement in the database and table scheme.

However, if there is another index k on the table and the query statement looks like this:

select v from ht where k >= M order by t_modified desc limit 100;

At this time, since partition field F is not used in the query condition, we can only find all rows that meet the condition in all partitions and then perform the operation of order BY uniformly. In this case, there are two common ideas.

The first idea is to implement sorting in the process code of the Proxy layer. The advantage of this method is fast processing speed, after getting the data of the branch library, directly participate in the calculation in memory. However, the disadvantages of this scheme are also obvious:

  1. The development workload is relatively large. This statement is relatively simple. If complex operations are involved, such as group by or even JOIN, the development capability of the middle layer is quite high.
  2. The pressure on the proxy is high, especially the problems of insufficient memory and CPU bottleneck.

Another way to do this is to aggregate the data from each database into a table in a MySQL instance, and then perform logical operations on that aggregate instance.

For example, in the above statement, the execution flow could look like this:

  • Create temporary table temp_HT (v, K, T_MODIFIED) on table TEMP_HT;
  • Execute on each branchselect v,k,t_modified from ht_x where k >= M order by t_modified desc limit 100;
  • Insert the result of repository execution into the temp_HT table;
  • performselect v from temp_ht order by t_modified desc limit 100;

Get the result. The corresponding flow chart of this process is as follows:

In practice, we often find that each branch is not saturated with computation, so we directly place the temporary table TEMP_HT on one of the 32 branches.

Why can temporary tables have the same name?

How, you may ask, can different threads create temporary tables with the same name?

We are executing

create temporary table temp_t(id int primary key)engine=innodb;

MySQL needs to create an FRM file for the InnoDB table to hold the table structure definition and a place to store the table data.

This FRM file is stored in the temporary file directory with the suffix. FRM and the prefix “# SQL {process ID}_ {thread ID}_ sequence number”.

MySQL > create temporary table T1 InnoDB temporary table T1 = InnoDB temporary table T1 = InnoDB temporary table T1 = InnoDB temporary table T1

Let’s start with an example.

The process id is 1234, the thread ID of session A is 4 and the thread ID of session B is 5. So, as you can see, session A and session B create temporary tables that don’t have the same file name on disk.

MySQL maintains tables. In addition to physical files, there is also a set of mechanisms in memory to distinguish different tables. Each table has a table_def_key.

  • The table_def_key value of a normal table is obtained by “library name + table name”, so if you want to create two normal tables with the same name in the same library, you will find that table_def_key already exists on the second table.
  • For temporary tables, table_def_key is added with server_id+thread_id in addition to database name + table name.

In other words, session A and session B create two temporary table T1, their table_def_key is different, the disk file name is different, so can coexist.

Implementationally, each thread maintains its own temporary linked list. In this way, when a table is operated in each session, the linked table is first traversed to check whether there is a temporary table with this name. If there is a temporary table, the temporary table is operated first. If there is no common table, the temporary table is operated first. At the end of the session, execute “DROPTEMPORARY TABLE + TABLE name” for each temporary TABLE in the list.

The command DROPTEMPORARY TABLE is also recorded in the binlog. Temporary tables can only be accessed by the thread itself, so why write to binlog? This brings us to master/slave replication.

Temporary tables and master/slave replication

So if I write a binlog, that means that I need a backup. You can imagine executing the following statement sequence on the main library:

create table t_normal(id int primary key, c int)engine=innodb; /*Q1*/ create temporary table temp_t like t_normal; /*Q2*/ insert into temp_t values(1,1); /*Q3*/ insert into t_normal select * from temp_t; /*Q4*/Copy the code

If operations on temporary tables are not logged, then only binlogs for create TABLE T_normal and INSERT inTOT_normal SELECT * fromtemp_t are logged in the standby database. Insert into T_normal; error “table TEMP_t does not exist”;

You might say, wouldn’t it be nice if I set binlog to row? Insert intot_normal binlog (1,1); insert intot_normal binlog (1,1);

That’s true. If the current binlog_format=row, temporary table statements will not be recorded in the binlog. That is, temporary table operations are only recorded in binlog if binlog_format=statment/mixed.

In this case, the statement that creates the temporary table is passed to the standby database for execution, so the synchronous thread of the standby database creates the temporary table. The primary database automatically deletes temporary tables when a thread exits, but the standby database synchronizes the thread to run continuously. Create a DROPTEMPORARY TABLE on the primary database and pass it to the standby database.

It is ok for different threads on the primary to create temporary tables with the same name, but what about passing execution to the standby?

Now, to give you an example, in the following sequence instance S is the standby library of M.

Two sessions on primary database M create a temporary table T1 with the same name. Both create TEMPORARY table T1 statements are passed to standby database S.

However, the application log threads in the standby library are shared, which means that the create statement is executed twice in the application thread. (Even if multithreaded replication is enabled, it may be assigned to the same worker from the library). So, will this cause the synchronization thread to report an error?

Obviously not, otherwise temporary tables would be a bug. That is, the standby thread will execute the two T1 tables as if they were two different temporary tables. And how does that work? MySQL writes the id of the thread that executed the statement to the binlog. In this way, the application thread in the standby database can know the primary thread ID of each statement and use this thread ID to construct the table_def_key of the temporary table:

  1. Temporary table T1 of session A, table_def_key in the standby database is: database name + T1 + serverID of M + thread_id of session A;
  2. Temporary table T1 of session B, table_def_key in the standby database is: library name + T1 + serverID of M + thread_id of session B.

Because the table_def_key is different, the two tables do not conflict in the application thread of the standby library.