This article is from OPPO Internet technology team, please note the author. At the same time, welcome to follow our official account: OPPO_tech, share with you OPPO cutting-edge Internet technology and activities.

Xiao O is a new analyst in OPPO. He met many like-minded colleagues in the company, which made him very happy. However, there is one thing that troubles him recently, because when searching data, he usually uses Hive query, but a lot of times his query is running very slowly, or even can not find out.

If Spark also runs slowly, you can use Presto in interactive queries. Therefore, O has mastered two new query engine magic tools.

However, after using Spark for some queries, Spark was not running as fast as Hive. Which engine should I use? And the syntax of different engines is often different, so that his head is big.

One day after work, small O met a colleague small Q, small Q know small O’s troubles said: “this is easy to solve, you can use the company’s development of the south Gate interactive query and self-help take number, I used for half a year, query data this saves a lot of trouble.”

Q explained in detail: “Osql, the underlying engine of Ntianmen, will automatically help us choose the appropriate query engine, we write the SQL, we can get the query result data, regardless of the difference between different engines, whether it is USING ANSI SQL, or HiveQL, it will do its best to help us find the results of the query. So far in 2020, Osql has provided us with more than 2 million queries, and 70% of the queries can return data in less than 1 minute, greatly improving the efficiency of the query.”

After listening to the introduction, small O is very excited, “Osql is so magical, that I will not be as long as Osql this engine can deal with, put my SQL to it, and then waiting for the return of the query results, so that the query this matter is much simpler. “

“Yes, so the tedious engine selection, syntax compatibility problems are as far as possible to the development of small brother, as long as we have a good combination of business needs, write their own SQL on the line, which greatly reduced the threshold and cost of our query.” Q responded.

This Osql service instantly solved xiao O’s problem and ignited his curiosity. “Then tell me how they did such a troublesome thing and what is the overall architecture of Osql system? “

“Don’t worry. Let me tell you all about it. If you look at the diagram below, it is the overall architecture of Osql.”

“In different application scenarios, we use different query engines, such as Spark/Hive for offline data, Presto for Ahoc queries, Druid for real-time data, etc. However, it also exposes too many different kinds of query interfaces to the upper layer business and creates obstacles to unified query management.

Therefore, based on the idea of unity, we hope to build a unified intelligent routing layer on many Olap (Online Analytical Processing) engines, and separate various common modules of query engine to Osql implementation, such as permission check, SQL specification check, query audit and so on. The end result is to provide users with a unified query portal and optimize the distribution of data sources across query engines based on query history to intelligently route queries to appropriate query engines.” “Q introduced.

“Osql can do so many things, I know SQL parsing, other query engines have this module, Osql query parsing what is different?” O continued.

“Yes, Osql query parsing mainly does the following things:

  1. Extract the user’s set parameters and specific SQL text, convenient for us to customize the set parameters in the tuning;

  2. To analyze whether the SQL conforms to the syntax specification, SparkSQL syntax specification is currently referenced;

  3. After parsing, the library name, table name, dimension and index are extracted to facilitate subsequent hot data mining.

  4. In the interactive scenario, results are limited to 500,000 rows to prevent arbitrary large queries.”

“Well, how does this determine which query engine we use to run? What are the rules?” O continued to ask the question in his mind.

“Ha ha ha, know you want to ask this, draw a picture for you clear.”

Seeing the above routing rules, O’s doubts were finally answered.

“I get it, they divide it up based on a few scenarios. If the user is an advanced player, follow the user’s wishes, such as set engine=presto, then the query will be run in presto. If the user does not actively set, Osql takes into account the current cluster resources and SQL usage scenarios, such as metadata, fetch, DDL, and normal Select, to Select the most appropriate query engine.”

Small Q listened to, praised, “very fierce, you understand very right.”

Small O modestly said: “are you the picture of good, but for me this is a small white users, there is a problem, often need to query the correlation engine parameter tuning, although this can be set, but I do not understand, don’t know which one to use to optimize parameters, the problem making small development have given the brothers?”

“This issue has certainly been taken into account and is gradually being improved. I remember last time when MY SQL was just starting to run, a small window popped up on the interface reminding me that I could add parameters to optimize, like the following.

The second time the same query is added with this parameter, the query efficiency is improved a lot. Later I found out that this parameter increases the maximum number of bytes in a single partition when Spark reads files. I would never have thought of using this parameter if I had tuned it myself.”

Hearing this, O also gave a thumbs-up, “Indeed, this function you said can solve a lot of trouble for us. By the way, Q teacher just mentioned the two modules of nantianmen interactive query and self-help number retrieval, they all use Osql, is there any difference in the implementation?”

“They do have some differences in implementation, as shown in the graph below, mainly for the following reasons.

Interactive query The following uses multiple query engines, which need to be more general. Generally, the amount of data returned is limited (500,000). The JDBC+CSV method is general. Presto, Druid, Clickhouse, etc do not support ‘Insert Overwrite Directory’;

Insert Overwrite Directory is used. The main reason is that the query results are large in the numeral-fetch scenario. If the JDBC+CSV method is used, the Osql machine is prone to run out of memory. Because the query results leave too much memory data, the data is written to a directory in the HDFS. Q explains.

“Qsql is quite creative, what are the latest development plans?”

Xiao Q said: “Last time we had a technical sharing in the company. They said that they would collect more common errors and optimization methods and give us suggestions through a real-time pop-up window. Then we will optimize the logic of engine routing more fine-grained, and strive to use the most appropriate engine to fulfill our query requirements as quickly as possible. It’s a continuous improvement process, and if you’re interested, you can work with them to make the service more stable and efficient in the future.”

After listening to Q’s introduction, O couldn’t wait to experience the new interactive query engine.