MySQL join buffer

In MySQL’s processing of join operation, join buffer is an important concept, as well as an important optimization method of MySQL for table Join. Although this concept is not complicated to implement, it is an important method to implement MySQL join optimization, which can greatly improve the efficiency of join query in the case of “violent” join.

Assume you have the following join: Assume you have the following join: Assume you have the following join

Table name      Type
t1              range
t2              ref
t3              ALL
The join is then done as follows:
 
- While rows in t1 matching range
 - Read through all rows in t2 according to reference key
  - Store used fields from t1, t2 in cache
  - If cache is full
    - Read through all rows in t3
      - Compare t3 row against all t1, t2 combinations in cache
        - If row satisfies join condition, send it to client
    - Empty cache
 
- Read through all rows in t3
 - Compare t3 row against all stored t1, t2 combinations in cache
   - If row satisfies join condition, send it to client
Copy the code

2. Allocate join buffer cache space

Table_count specifies the number of nonconst tables before the join table. Tables [I]. Table ->read_set ();

Each execution of the double loop replicates the description structure of the field (and its corresponding data source) that needs to be cached, or the double loop is just for assigning and storing metadata. Cache ->buff=(uchar*) my_malloc(size,MYF(0))

Static int join_init_cache(THD * THD,JOIN_TAB *tables,uint table_count) {...... for (i=0 ; i < table_count ; i++) { bool have_bit_fields= FALSE; uint null_fields=0,used_fields; Field **f_ptr,*field; MY_BITMAP *read_set= tables[i].table->read_set; for (f_ptr=tables[i].table->field,used_fields=tables[i].used_fields ; used_fields ; f_ptr++) { field= *f_ptr; if (bitmap_is_set(read_set, field->field_index)) { used_fields--; length+=field->fill_cache_field(copy); ... } } cache->length=length+blobs*sizeof(char*); cache->blobs=blobs; *blob_ptr=0; /* End sequentel */ size=max(thd->variables.join_buff_size, cache->length); if (! (cache->buff=(uchar*) my_malloc(size,MYF(0)))) DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */ cache->end=cache->buff+size; reset_cache_write(cache); DBUG_RETURN(0); }Copy the code

Three, common multi table query implementation

This “ordinary” of course can also be understood as “simple”, “intuitive” meaning, is the execution process in most cases. A normal query is actually a recursive call to each table, the same as matrix multiplication, and this correspondence is very intuitive and very general.

This regular query action is implemented through the sub_select function, which is essentially executed

tsecer_select() { for (r = first ; r ! = end ; r = next) { if(sofartest()) { nexttable.tsecer_select() } } }Copy the code

Where sofarTest () stands for “a judgment that can be made using all currently read tables”, which is the push-down expression in where. For example, select * from a, b where A.A > 10 and b.B + a.a = 10. Of course, this is a way of describing even pseudo-code, whereas real code corresponds to:

Enum_nested_loop_state sub_SELECT (JOIN_TAB * JOIN_TAB,bool end_OF_records) {...... error= (*join_tab->read_first_record)(join_tab); rc= evaluate_join_record(join, join_tab, error); ... while (rc == NESTED_LOOP_OK) { error= info->read_record(info); rc= evaluate_join_record(join, join_tab, error); }... return rc; Evaluate_record (JOIN * JOIN, JOIN_TAB * JOIN_TAB, int error) {... if (select_cond) { select_cond_result= test(select_cond->val_int()); /* check for errors evaluating the condition */ if (join->thd->is_error()) return NESTED_LOOP_ERROR; }... if (found) { enum enum_nested_loop_state rc; /* A match from join_tab is found for the current partial join. */ rc= (*join_tab->next_select)(join, join_tab+1, 0); if (rc ! = NESTED_LOOP_OK && rc ! = NESTED_LOOP_NO_MORE_ROWS) return rc; if (join->return_tab < join_tab) return NESTED_LOOP_OK; /* Test if this was a SELECT DISTINCT query on a table that was not in the field list; In this case we can abort if we found a row, as no new rows can be added to the result. */ if (not_used_in_distinct && found_records ! = join->found_records) return NESTED_LOOP_NO_MORE_ROWS; }... }Copy the code

And as you can see here, this is a recursion to produce a set of Cartesian cross products, which is pretty neat both programmatically and mathematically. In MySQL’s implementation, the for loop in tsecer_SELECT is roughly equivalent to the while loop in sub_SELECT, and the contents of the body of the tsecer_SELECT loop are placed in evaluate_join_record, Evaluate_join_record ::test(select_cond->val_int()); Evaluate_join_record :(* join_TAB -> next_SELECT)(join, join_TAB +1, 0)

4. Select implementation of Join buffer

When join buffer cache is used, the next_SELECT function points to sub_select_cache

enum_nested_loop_state sub_select_cache(JOIN *join,JOIN_TAB *join_tab,bool end_of_records) { enum_nested_loop_state rc; if (end_of_records) { rc= flush_cached_records(join,join_tab,FALSE); if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS) rc= sub_select(join,join_tab,end_of_records); return rc; } if (join->thd->killed) // If aborted by user { join->thd->send_kill_message(); return NESTED_LOOP_KILLED; /* purecov: inspected */ } if (join_tab->use_quick ! = 2 || test_if_quick_select(join_tab) <= 0) { if (! store_record_in_cache(&join_tab->cache)) return NESTED_LOOP_OK; // There is more room in cache return flush_cached_records(join,join_tab,FALSE); } rc= flush_cached_records(join, join_tab, TRUE); if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS) rc= sub_select(join, join_tab, end_of_records); return rc; }Copy the code

This code makes sense when combined with the MySQL documentation. The initial determination of end_of_records corresponds to

if (! store_record_in_cache(&join_tab->cache)) return NESTED_LOOP_OK; // There is more room in cache return flush_cached_records(join,join_tab,FALSE);Copy the code

The corresponding

  - Store used fields from t1, t2 in cache
  - If cache is full
Copy the code

The store_record_in_cache function determines whether the cache is full, stores the combined record of the previous table in the cache if more cache can be held, and returns NESTED_LOOP_OK. Note: This is the key to the overall cache optimization, since no table scan is enabled here. Conversely, if the cache is full, the flush_cached_records function is called to do the following

    - Read through all rows in t3
      - Compare t3 row against all t1, t2 combinations in cache
        - If row satisfies join condition, send it to client
    - Empty cache
Copy the code

The uniqueness of this process is that the traversal driver compares each record of the table with all combinations of T1 and T2 in the cache to determine whether the where condition (If row distribution condition) is satisfied. Run join_TAB -> next_SELECT (send it to client).

Static enum_nested_loop_state flush_cached_records(JOIN * JOIN,JOIN_TAB * JOIN_TAB,bool skip_last) {...... info= &join_tab->read_record; Do {// select * from t3; for (i=(join_tab->cache.records- (skip_last ? 1:0)); i-- > 0 ;) {read_cached_record(join_tab); skip_record= FALSE; if (select && select->skip_record(join->thd, &skip_record)) {// reset_cache_write(&join_tab->cache); return NESTED_LOOP_ERROR; } if (! Rc = (join_tab->next_select)(join,join_tab+1,0); if (rc ! = NESTED_LOOP_OK && rc ! = NESTED_LOOP_NO_MORE_ROWS) { reset_cache_write(&join_tab->cache); return rc; }}... } while (! (error=info->read_record(info)));Copy the code

5. Give an example of the process

The core idea of this implementation is not complicated, and it is more simple and intuitive when combined with specific examples. For example, we use two simple tables, which store the values of x and y respectively. We want to use a join operation to calculate all the values that satisfy X in these two tables

x + y

Y == 5 * 5, which is our most common “tick three, four, five” such a classic tick number value.

mysql> create table harry (x int); Query OK, 0 rows affected (0.03 SEC) mysql> insert Harry values (1),(2),(3),(4),(5); Query OK, 5 rows affected (0.00 SEC) Records: 5 Duplicates: 0 Warnings: 0 mysql> create table tsecer (y int); Query OK, 0 rows affected (0.01 SEC) mysql> insert tsecer VALUES (1),(2),(3),(4),(5); Query OK, 5 rows Affected (0.00 SEC) Records: 5 Duplicates: 0 Warnings: 0 mysql> explain select * from harry, tsecer where x * x + y * y = 5 * 5; +----+-------------+--------+------+---------------+------+---------+------+------+--------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + - + -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1  | SIMPLE | harry | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | tsecer | ALL | NULL | NULL | NULL | NULL | 5 | Using where; Using join buffer | + - + -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 2 Rows in set (0.00 SEC) mysql>Copy the code

If no Join buffer is used, a full table scan is performed on the tsecer table for each x value of Harry, and then the combination of x and y is used to determine whether x meets the requirement

x + y

Y is equal to 5 times 5. Since x has a total of five values, tsecer requires five full table scans.

For each value of x, the tsecer table will cache the value in the JoinBuffer. If the buffer is not empty, the tsecer table will store the value in the joinBuffer and return it directly. When the join buffer is full or the last record, the tSECer table is scanned at this time. For each record read in the TSECer table, the previous cached record is combined to see whether the judgment condition is met. For the example we saw, where all 5 values of harry are in the cache, during the tSecer table scan, for each record read from tSecer, combined with the “every” cache in the cache, determine whether the combination results meet the condition. If any of the groups are satisfied, So next_select. In the example using buffer, you can see that there is only one scan of the TSecer table. Normally, the database scan code is the highest (because disk reads are involved), so using buffer reduces the tSecer table scan to one, so this is much more efficient. Especially if there are multiple tables involved and/or a large number of records in each table.

In essence, this efficiency improvement is due to the increased “utilization” of each record retrieved from the table. In the intuitive scan mode, a full table scan matches only one combination, whereas in the buffer mode, all combinations in the cache are matched.