Java programming requires a mixture of object-oriented thinking and general imperative programming, and it is up to the programmer to combine the two well. Imperative programming patterns can be learned by almost anyone, and some developers lean on design patterns so heavily that they try to apply a pattern everywhere and force every problem into one. Writing good object-oriented programs, by contrast, is much harder than writing imperative ones and takes real work. But when Java programmers write SQL, everything changes again: SQL is a declarative language, neither object-oriented nor imperative. Writing a query in SQL is very simple, whereas expressing the equivalent logic in Java is not, because the programmer has to think about the algorithm as well as the programming paradigm.
Here, in no particular order, are 10 common mistakes Java programmers make when writing SQL.
1. The misconception Java programmers have about NULL when writing SQL is probably the biggest mistake of all. Perhaps because (though not only because) NULL is also known as UNKNOWN; it would be easier to understand if it were simply called UNKNOWN. Another reason is that JDBC maps SQL NULL to Java null when you fetch data from the database or bind variables. This leads to the misconception that NULL = NULL in SQL behaves like null == null in Java.
The biggest misconception about NULL arises when it is used in a row value expression or an integrity constraint. Another misconception concerns the behaviour of NULL in NOT IN anti-joins.
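For illustration, here is a minimal sketch of both traps; the table and column names (users, orders, blacklist, and so on) are hypothetical:

-- NULL is not equal to anything, not even to itself:
SELECT * FROM users WHERE middle_name = NULL;   -- never returns any rows
SELECT * FROM users WHERE middle_name IS NULL;  -- correct

-- A single NULL in the subquery makes the NOT IN anti-join return no rows at all:
SELECT *
FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM blacklist);
-- empty result if blacklist.customer_id contains even one NULL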
Solutions:
Train yourself well. Every time you write SQL, think about NULL: is this predicate or constraint correct with respect to NULL? Does NULL affect the result?
2. Few Java developers understand SQL well enough, so they process data in Java memory instead. The occasional JOIN and the odd UNION are fine, but what about window functions? What about grouping sets? Many Java developers load SQL data into memory, convert it into some comparable collection type, and then perform tedious calculations on those collections with verbose loop control structures (at least before Java 8 improved its collection APIs).
But many SQL databases support advanced (and SQL-standard) OLAP features, which perform better and are easier to write. One (not very standard) example is Oracle's excellent MODEL clause. Let the database do the processing and bring only the results into Java memory. After all, very smart people have spent years optimizing these expensive products. So by moving OLAP work into the database, you gain two things:
Convenience: it is probably easier to write this correctly in SQL than in Java.
Performance: the database is likely to be faster than your algorithm, and, more importantly, you no longer have to pass millions of records around.
Solutions:
Every time you implement a data-centric algorithm in Java, ask yourself: Is there a way for the database to do the dirty work for me instead?
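As a hedged sketch of letting the database do the work (the sales table and its columns are invented), the per-group and overall totals that are often computed in Java loops can be produced in one pass with the SQL-standard GROUPING SETS feature mentioned above:

-- Totals per region, per product, and a grand total, computed by the database
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY GROUPING SETS ((region), (product), ());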
3. UNION removes duplicate rows, whereas UNION ALL keeps them. Not only is removing duplicate rows rarely necessary (and sometimes even wrong), it can also be quite slow for large result sets with many rows, because the two subselects need to be sorted and each tuple has to be compared with its subsequent tuples.
Note that even though the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any databases implement these less commonly useful set operators.
Solutions:
Every time you write a UNION statement, consider whether you actually need a UNION ALL statement.
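A minimal illustration with hypothetical tables: UNION ALL simply concatenates the two results, while UNION additionally has to compare all rows to remove duplicates:

-- Keeps duplicates, no deduplication step:
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers;

-- Removes duplicates, forcing the database to sort or hash and compare all rows:
SELECT name FROM customers
UNION
SELECT name FROM suppliers;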
4. Most databases support some form of paging clause, such as LIMIT .. OFFSET, TOP .. START AT, or OFFSET .. FETCH. Even in databases without such clauses, it is still possible to filter on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008, etc.), which is much faster than paging in memory. This is especially true when dealing with large amounts of data.
Solutions:
Just use these clauses; a tool (such as jOOQ) can simulate them if your database does not support them directly.
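For example (a hypothetical articles table; the first form uses the SQL-standard OFFSET .. FETCH clause, the second works on databases that only have ROW_NUMBER()):

-- Standard paging clause:
SELECT id, title
FROM articles
ORDER BY id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;

-- Equivalent filtering on ROW_NUMBER() for databases without OFFSET .. FETCH:
SELECT id, title
FROM (
  SELECT id, title, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM articles
) t
WHERE rn BETWEEN 21 AND 30;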
5. Joining data in Java memory. Joins have been part of SQL since its early days, yet some developers still feel uneasy about using JOIN in SQL, out of an inherent fear that joins are slow.
It may be true that all tables are loaded into database memory before a joined table source is created, if the cost-based optimizer chooses to implement a nested loop join. But the odds of that happening are low. With proper predicates, constraints and indexes, merge joins and hash joins are all quite fast. It all comes down to correct metadata (I cannot cite Tom Kyte often enough here). Still, there are probably quite a few Java developers who load two tables with separate queries into maps and join them in Java memory, to some extent.
Solutions:
If you run queries against several tables in several steps, consider whether you can express them in a single statement.
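A sketch of the idea with made-up tables: instead of loading customers and orders separately and matching them in a Java map, let the database join them:

-- One statement, one round trip, joined by the database:
SELECT c.id, c.name, o.order_date, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= DATE '2024-01-01';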
6. Using DISTINCT or UNION to remove duplicates from an accidental Cartesian product. With complex joins, it is easy to lose track of all the relations that play a role in a SQL statement. In particular, when multi-column foreign key relationships are involved, it is possible to forget to add the relevant predicates to the JOIN .. ON clause. This can lead to duplicate records, but perhaps only in exceptional cases. Some developers then choose DISTINCT to eliminate these duplicate records. This is wrong in three ways:
It (may) address the symptoms but not the problem, and in extreme cases it may not even address the symptoms.
It is slow for large result sets with many columns, because DISTINCT has to sort (or hash) the result set to remove duplicates.
It is slow for large Cartesian products, which still load a lot of data into memory.
Solutions:
As a rule of thumb, when you get unwanted duplicate records, check your JOIN predicates. There is probably an accidental Cartesian product hiding somewhere.
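A hedged sketch with an invented composite foreign key (company_id, order_no): if only one of the two key columns is joined, duplicates appear, and the fix is the complete ON clause, not DISTINCT:

-- Accidental partial join: produces duplicate rows
SELECT o.order_no, i.product_code
FROM orders o
JOIN order_items i ON i.order_no = o.order_no;  -- second key column forgotten

-- Complete join predicate: no duplicates, no DISTINCT needed
SELECT o.order_no, i.product_code
FROM orders o
JOIN order_items i ON i.order_no = o.order_no
                  AND i.company_id = o.company_id;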
7. Not using the MERGE statement. This is not really a mistake, but rather a lack of knowledge of, or confidence in, the powerful MERGE statement. Some databases understand other forms of UPSERT statements, such as MySQL's ON DUPLICATE KEY UPDATE, but MERGE is genuinely powerful, most importantly in databases that heavily extend the SQL standard, such as SQL Server.
Solutions:
If you use combinations such as INSERT followed by UPDATE, or SELECT .. FOR UPDATE followed by INSERT or UPDATE, think again. Apart from risking race conditions, you may be able to express the same intent with a simpler MERGE statement.
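A sketch of such an upsert expressed as a single MERGE statement (the staging and target tables are hypothetical):

-- Update existing customers, insert new ones, in one atomic statement:
MERGE INTO customers c
USING staged_customers s
ON (c.id = s.id)
WHEN MATCHED THEN
  UPDATE SET c.name = s.name, c.email = s.email
WHEN NOT MATCHED THEN
  INSERT (id, name, email) VALUES (s.id, s.name, s.email);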
8. Using aggregate functions instead of window functions. Before window functions were introduced, aggregating data in SQL meant using GROUP BY statements together with aggregate functions. This works well in many cases, and when aggregated data needs to enrich regular rows, the grouped query is pushed into a joined subquery.
However, SQL:2003 defined window functions, which are now implemented by many mainstream databases. Window functions aggregate data over a result set without grouping it. In fact, each window function has its own, independent PARTITION BY clause, which makes them a great tool for reporting.
Using window functions will:
Make your SQL more readable (fewer dedicated GROUP BY statements in subqueries).
Improve performance, since a relational database management system will usually find it easier to optimize window functions.
Solutions:
Whenever you use a GROUP BY statement in a subquery, think twice about whether it could be expressed with a window function instead.
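For instance (an invented employees table): enriching each row with a per-department average no longer needs a joined GROUP BY subquery once a window function is available:

-- GROUP BY pushed into a joined subquery:
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN (SELECT department_id, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id) d
  ON d.department_id = e.department_id;

-- The same result with a window function:
SELECT name, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM employees;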
9. SQL's ORDER BY clause supports many types of expressions, including CASE statements, which are very useful for indirect sorting. You should probably not sort the data again in Java memory just because you think that:
SQL sorting is too slow.
SQL sorting cannot do it.
Solutions:
If you sort any SQL data in memory, think twice about whether you cannot sort it in the database instead. This also works nicely together with database paging.
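One small example of such an indirect sort (the tickets table and its status values are hypothetical): put open items first, then sort by date, entirely in the database:

SELECT id, status, due_date
FROM tickets
ORDER BY CASE WHEN status = 'OPEN' THEN 0 ELSE 1 END, due_date;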
10. Inserting a large number of records one by one. JDBC supports batching, and you should use it. Do not INSERT thousands of records one by one, creating a new PreparedStatement object each time. If all your records go into the same table, create a batch INSERT statement with a single SQL statement and multiple sets of bind values. Depending on your database and database settings, you may need to commit after a certain number of inserts to keep the UNDO log small.
Solutions:
Always insert large amounts of data using batch processing.
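As a sketch (hypothetical measurements table), this is the single-statement, many-value-sets form mentioned above; most databases accept multi-row VALUES lists, and with JDBC the same effect is achieved by adding several bind value sets to one PreparedStatement batch:

-- One statement carrying many rows, instead of thousands of single-row INSERTs:
INSERT INTO measurements (sensor_id, recorded_at, reading)
VALUES (1, TIMESTAMP '2024-01-01 00:00:00', 20.5),
       (1, TIMESTAMP '2024-01-01 00:01:00', 20.7),
       (2, TIMESTAMP '2024-01-01 00:00:00', 19.9);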