Function call optimization

MySQL functions are marked internally as deterministic or indeterminate. A function given fixed values for arguments is indeterminate if it can return different results for different calls. Examples of indeterminate functions: RAND(), UUID().

If a function is marked as indeterminate, WHERE evaluates the reference to the function in the clause for each row (when selecting from a table) or for a combination of rows (when selecting from a multi-table join).

MySQL also determines when to evaluate a function based on the type of argument (whether the argument is a table column or a constant value). Every time a table column changes its value, a deterministic function that takes the table column as an argument must be evaluated.

Nondeterministic functions can affect query performance. For example, some optimizations may not be available, or may require more locking. The following discussion uses RAND() but applies to other uncertain functions as well.

Suppose a table T has the following definitions:

CREATE TABLE t (id INT NOT NULL PRIMARY KEY, col_a VARCHAR(100));
Copy the code

Consider the following two queries:

SELECT * FROM t WHERE id = POW(1.2);
SELECT * FROM t WHERE id = FLOOR(1 + RAND() * 49);
Copy the code

Due to equality comparisons with primary keys, both queries appear to use primary key lookups, but this only applies to the first query:

  • The first query always produces at most one row, because the constant of POW() with a constant argument is a constant value and is used for index lookups.
  • The second query contains an expression that uses a nondeterministic functionRAND()Is not constant in the query, but actually has a new value for each row of the tablet. Therefore, the query reads each row of the table, evaluates the predicates for each row, and outputs all rows where the primary key matches the random value. According to theidThe column values andRAND()The value in the sequence, which can be zero, one, or more lines.

The effect of uncertainty is not limited to SELECT statements. The UPDATE statement uses nondeterministic functions to select rows to modify:

UPDATE t SET col_a = some_expr WHERE id = FLOOR(1 + RAND() * 49);
Copy the code

The approximate goal is to update at most one row that the primary key matches the expression. However, it may update zero, one or more rows, depending on the ID column value and the value in the RAND() sequence.

The behavior just described has an impact on performance and replication:

  • Because uncertain functions do not produce constant values, the optimizer cannot use other strategies that might be applicable, such as index lookups. The result may be a table scan.
  • InnoDBIt is possible to upgrade to a range key lock instead of obtaining a single row lock for a matching row.
  • An update that cannot be determined to be performed is unsafe for replication.

The difficulty stems from the fact that RAND() evaluates the function once for each row of the table. To avoid multifunctional evaluations, use one of the following techniques:

  • Move an expression containing an indeterminate function into a separate statement, storing the value in a variable. In the original statement, replace the expression with a reference to a variable that the optimizer can treat as a constant value:

    SET @keyval = FLOOR(1 + RAND() * 49);
    UPDATE t SET col_a = some_expr WHERE id = @keyval;
    Copy the code
  • Assigns random values to variables in derived tables. This technique causes the variable to be assigned a value before being used in a comparison in the WHERE clause:

    SET optimizer_switch = 'derived_merge=off';
    UPDATE t, (SELECT @keyval := FLOOR(1 + RAND() * 49)) AS dt
    SET col_a = some_expr WHERE id = @keyval;
    Copy the code

As mentioned earlier, uncertain expressions in this WHERE clause can prevent optimizations and cause table scans. However, WHERE clauses can be partially optimized if other expressions are deterministic. Such as:

SELECT * FROM t WHERE partial_key=5 AND some_column=RAND();
Copy the code

If the optimizer partial_key can be used to reduce the set of selected rows, RAND() is executed less often, which reduces the impact of uncertainty on optimization.

More content welcome to pay attention to my personal public number “Han Elder brother has words”, 100G artificial intelligence learning materials, a large number of back-end learning materials waiting for you to take.