1, SQL and Cartesian product

So, first of all, a little explanation of the Cartesian product.

Now, we have two sets A and B.

A = {0,1}     B = {2,3,4}
Copy the code

The result sets of A×B and B×A can be expressed in the following form respectively:

A×B = {(0,2), (1,2), (0,3), (1,3), (0,4), (1,4)}; B×A = {(2,0), (2,1), (3,0), (3,1), (4,0), (4,1)};Copy the code

So this A by B and B by A is called the Cartesian product of two sets.

From the above data analysis, we can draw the following two conclusions:

  1. If two sets are multiplied, the exchange rate is not satisfied, i.e. A×B ≠ B×A;

  2. When you multiply A by B, it covers all the possibilities of sequential combinations of members of A and members of B. The number of elements in the new set is the number of elements in the set A times the number of elements in the set B;

  3. The permutation is very similar to the permutation in high school math, except that it contains (2,0) and (0,2), whereas the cartesian product has only one of them: AxB is (0,2) and BxA is (2,0).

The algorithm followed when database table joins data row matching is the cartesian product mentioned above, and the connection between tables can be seen as doing multiplication.

Student_subject = student; student_subject = student; student_subject = student; student_subject = student;We execute the following SQL statement purely to join the table.

SELECT * from student JOIN student_subject;
SELECT * from student_subject JOIN student;
Copy the code

Take a look at the result:From the point of view of the implementation results, the results are consistent with the two conclusions we put forward above;

Taking the first SQL statement as an example, let’s take a look at its execution flow.

  1. The FROM statement loads the student and student_subject tables from the database file into memory.

  2. The join statement multiplicates two tables, matching each row in student with each row in student_subject.

  3. After matching, we have a temporary table with (number of records in student × number of records in student_subject table). The temporary tables formed in memory are shown in Table 1.0. We also call the table shown in table 1.0 in memory a Cartesian product table.

In view of the above theory, we put forward a question, is it necessary to form a Cartesian product table when tables are joined?

If the amount of data in both tables is large, it will take up a lot of memory space, which is obviously not reasonable. Therefore, the syntax of JOIN XXX ON XXX is generally used in table JOIN query. The execution of THE ON statement is before the JOIN statement, that is to say, when the data rows of two tables are matched, whether the data rows meet the conditions behind the ON statement will be judged first before joining.

Therefore, there is an obvious SQL optimization scheme is, when two tables of data is relatively large, and need to JOIN the query, should use FROM table1 JOIN table2 ON XXX syntax, avoid using FROM table1,table2 WHERE XXX syntax, Because the latter will form a large cartesian product table in memory, increasing the memory overhead.

We can summarize a query SQL statement execution flow.

 1. From
 2. ON
 3. JOIN
 4. WHERE
 5. GROUP BY
 6. SELECT
 7. HAVING
 8. ORDER BY
 9. LIMIT
Copy the code

Finally, for two database table connection of the underlying implementation, I use Java code simulation, interested can see, can help us understand:

package com.opensource.util;

import java.util.Arrays;

public class DecareProduct {
    
    public static void main(String[] args) {
        
        // Use a two-dimensional array to simulate the student table
        String[][] student ={
                {"0"."jsonp"},
                {"1"."alice"}};// Use a two-dimensional array to emulate the student_subject table
        String[][] student_subject2 ={
                {"0"."0"."Chinese"},
                {"1"."0"."Mathematics"}};SELECT * from student JOIN student_subject;
        String[][] resultTowArray1 = getTwoDimensionArray(student,student_subject2);
        SELECT * from student_subject JOIN student;
        String[][] resultTowArray2 = getTwoDimensionArray(student_subject2,student);
        
        int length1 = resultTowArray1.length;
        for (int i = 0; i <length1 ; i++) {
            System.out.println(Arrays.toString(resultTowArray1[i]));
        }
        System.err.println("-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --");
        int length2 = resultTowArray2.length;
        for (int i = 0; i <length2 ; i++) { System.out.println(Arrays.toString(resultTowArray2[i])); }}/** * to join two tables *@param towArray1
     * @param towArray2
     * @return* /
    public static String[][] getTwoDimensionArray(String[][] towArray1,String[][] towArray2){
        
        // Get the height of the two-dimensional array (i.e. the number of one-dimensional arrays used to refer to the number of records in the database table)
        int high1 = towArray1.length;
        int high2 = towArray2.length;
        
        // Get the width of the two-dimensional array (i.e. the length of the two-dimensional array, used to refer to the columns in the database table)
        int wide1 = towArray1[0].length;
        int wide2 = towArray2[0].length;
        
        // Calculate the height and width of the cartesian product of two arrays, i.e. the number of rows and columns in the cartesian product table
        int resultHigh = high1 * high2;
        int resultWide = wide1 + wide2;
        
        // Initialize the result set number group, both cartesian product table
        String[][] resultArray = new String[resultHigh][resultWide];
        
        // Iterate over variables
        int index = 0;
        
        // iterate over the second two-dimensional array
        for (int i = 0; i < high2; i++) {
            
            // Take out the elements of the towArray2 two-dimensional array
            String[] tempArray = towArray2[i];
            
            // loop nesting, traversing the towArray1 array
            for (int j = 0; j < high1; j++) {
                
                // Initialize an array of length 'resultWide' as an element of the result set number group, i.e. a row in the Cartesian product table
                String[] tempExtened = new String[resultWide];
                
                // Take out the towArray1 element of the two-dimensional array
                String[] tempArray1 = towArray1[j];
                
                // Copy the elements of tempArray1 and tempArray into the elements of the result set. (Array expansion is used here)
                System.arraycopy(tempArray1, 0, tempExtened, 0, tempArray1.length);
                System.arraycopy(tempArray, 0, tempExtened, tempArray1.length, tempArray.length);
                
                // Put tempExtened in the result set group
                resultArray[index] = tempExtened;
                
                // Add oneindex++; }}returnresultArray; }}Copy the code

Execution Result:

The Cartesian product of several joins:

  • Two tables are joined directly, and the result of the Cartesian product is the multiplication of the data volumes of the two tables
  • A Cartesian product with where/on condition ids equal yields the same result as an inner join, but an inner join is faster
  • Select * from TEST_A where ID = null; left JOIN (TEST_B where ID = null)
  • Right Join is the opposite
  • Full JOIN: equal to the union of left Join and right Join

So if your program does need to join multiple tables, try to join them in pairs, shrink the result set by where or on or inner join, and then join the result set to other tables…

2. JavaIO decoration pattern and Cartesian product

While learning about the java.io package, the InputStream class was very annoying, not to mention a lot of subclasses, and very strange to use because it used decorator mode…

Suppose we want to read the byte stream from a file as a cache, the usual operation is always: Create a FileInputStream, and then put that FileInputStream into the BufferedInputStream constructor to create BufferedInputStream. Do this before you start reading the file:

	try (FileInputStream fis = new FileInputStream("c:/a.txt");
	     BufferedInputStream bis = new BufferedInputStream(fis)) {

		byte[] buffer = new byte[1024];
		int len;
		StringBuilder result = new StringBuilder();
		while((len = bis.read(buffer)) ! = -1) {
			result.append(new String(buffer, 0, len)); }}catch (IOException e) {
		//handle
	}
Copy the code

Why can’t Sun just create input stream classes that cache data from files?

Or why InputStream extends decorator mode instead of direct inheritance, i.e. Decorator mode vs. inheritance.

To answer this question, combine InputStream with FilterInputStream. If I use inheritance, let’s see what our class diagram looks like:Deja vu, let’s look again:

InputStream: [FileInputStream, ByteArrayInput Stream, SequenceInputStream, ObjectInputream, PipedInputStream, StringBufferInputStream… includes other subclasses of InputStream that inherit from the InputStream implementation. There are at least two hundred implementations of InputStream.]

FilterInputStream (which also inherits from InputStream) : [BufferedInputStream, DataInputStream, PushbackInputStream…]

If any combination of the two is assumed, a so-called output stream class can be formed. Then the number of such output stream classes will be a Cartesian product, namely explosive growth. Meanwhile, InputStream can also be combined with each other internally.

And if you use decoration mode, I don’t care how you match, just need to set a decoration, that is, into a new function of the output stream class!

SQL > select * from Descartes;

  • Blog.csdn.net/csdn_hklm/a…