On December 2nd, Oracle launched MySQL Database Service with Analytics Engine on its website. This feature is notable as a major enhancement to the MySQL product. I did a brief survey over the weekend, and you can get a glimpse of one or two. (Some materials and illustrations are from Oracle’s official website).

1.MySQL’s natural weakness “data analysis”

Life is basically two things, topic selection and problem solving. The best life is to choose the right problem and solve it well at each key point. The greatest pain in life is to solve the right question, but choose the wrong question, and do not know that they have chosen the wrong question. Just as the greatest regret in life is not that you couldn’t, but that you could have.

MySQL, as the most popular open source database, has been widely used. As the latest DB-Engines index shows, it has taken an important place in the database space.

However, as such a popular database product, there is an obvious shortcoming is data analysis. I believe that MySQL users have this feeling, in the case of large data, MySQL is inadequate. Although the kernel has been improving its capabilities in this area (for example, the latest 8.0 supports Hash Join, histogram, etc.), it still has some disadvantages compared to other databases. The core is that MySQL is designed natively for OLTP scenarios, not OLAP scenarios. Although some vendors have expanded their storage engines to enhance their data analysis capabilities, the overall situation is not satisfactory. Thus, a common option is to migrate the data to another database/big data architecture when data is analyzed. Through the latter, to complete the data analysis work. In this process, you need to define your own ETL logic (possibly based on log parsing or logical data extraction) to complete the data migration. At the same time, pay attention to data synchronization during operation to ensure data validity. In use, it is necessary to write separate statements (statements of heterogeneous data sources) to complete data analysis actions. The whole process, no doubt, has a certain threshold for users to use, and it takes extra effort to solve the problem. I believe Oracle has seen this problem and has launched MySQL Analytics Engine.

2. What exactly is a MAE?

Life is basically two things, topic selection and problem solving. The best life is to choose the right problem and solve it well at each key point. The greatest pain in life is to solve the right question, but choose the wrong question, and do not know that they have chosen the wrong question. Just as the greatest regret in life is not that you couldn’t, but that you could have.

MySQL Analytics Engine (MAE) is basically a built-in Analytics Engine. By combining it with MySQL Database, Database administrators and application developers can directly use MySQL Database as a unified service for OLTP and OLAP workloads. The MySQL Database Service with Analytics Engine consists of a MySQL DB instance and multiple analysis nodes. When analysis is enabled, analysis services are installed on DB instances and are responsible for cluster management, data loading, query execution, and so on. From the user’s point of view, it can be connected through the standard MySQL JDBC/ODBC Connector. Let’s expand on this

First of all, in terms of overall use, MySQL Database Service is provided externally. Users still use it in the traditional way, whether OLTP or OLAP scenarios. Inside the service, the traditional OLTP engine and the new Analytics engine are included. Statistically, there are two. The former is still stored in a storage engine such as InnoDB, while the latter is stored in memory in a cluster of nodes. Data changes resulting from normal transaction operations are transparently propagated to subsequent analysis clusters to speed up analysis processing. This enables customers to run BOTH OLTP and OLAP workloads on a single database platform. In short, two computing engines and two data stores.

The core idea of MAE is to “make the data smaller”. Through the partitioning mechanism, data is broken up and processed by independent CPU resources. All the processed results are returned.

Within an MAE, it consists of multiple analysis nodes. The specific number of nodes can be obtained automatically by the automatic configuration advisor provided by the MySQL analysis engine. In the node, the data is stored in a hybrid column compression format. This facilitates vectorization, which results in very good query performance. Data is encoded and compressed before running in memory. This compression and optimization of memory usage, particularly for numeric and string data, improves performance and reduces memory footprint, thereby reducing costs for customers. Parallel operation techniques are also used on each node, which provides high cache hit ratios for analysis and good scalability between nodes. Each analysis node within the cluster and each core within the node can process partitioned data in parallel, including parallel scanning, joining, grouping, aggregation, and top-K processing.

The MySQL analysis engine implements the latest algorithms for distributed memory analysis processing. Connections within a partition can be handled quickly by using vectorization to build and probe the connection kernel. Network communication between analysis nodes is optimized by using asynchronous batch I/O. The algorithm is designed to overlap computation time with data communication across nodes, which helps achieve good scalability.

The integration of the MySQL analysis engine with MySQL database services provides a single data management platform for all OLTP and analysis needs of the enterprise. The MySQL Analysis Engine is designed to be a MySQL pluggable storage engine that completely hides all the underlying implementation details of the storage layer from the end user. Users and applications interact with MySQL analytics through MySQL database nodes in the cluster. Users connect to the MySQL analysis engine through standard tools and standards-based ODBC/JDBC connectors. The MySQL analysis engine supports the same ANSI SQL standard and ACID properties as MySQL, and supports different data types. This allows existing applications to take advantage of the MySQL analysis engine without making any changes to their applications for simple and fast integration. Once a user submits a query to the MySQL database, the MySQL query optimizer transparently decides whether to offload the query to the analysis cluster to speed up execution. This is based on whether the MySQL analysis engine supports all the operators and functions referenced in the query and if the estimated time to process the query using the analysis engine is less than in MySQL, the query will be pushed down to the analysis node for processing. After processing, the results are sent back to the MySQL database node and returned to the user.

Since the MySQL Analysis engine is a memory processing engine, data is persisted to the MySQL InnoDB storage engine. Any updates to the table are automatically propagated to the memory of the analysis node in real time. This allows subsequent queries to always have access to the latest data. This is done in the background with a lightweight change propagation algorithm that keeps up with MySQL’s data update rate. The synchronization principle is not specified?

3. What does MAE bring to the table?

Life is basically two things, topic selection and problem solving. The best life is to choose the right problem and solve it well at each key point. The greatest pain in life is to solve the right question, but choose the wrong question, and do not know that they have chosen the wrong question. Just as the greatest regret in life is not that you couldn’t, but that you could have.

From a product perspective, MySQL Analytics Engine is a cloud-native service that is specifically provided within the Oracle cloud infrastructure, providing compelling performance and cost for analysis workloads. Organizations that use the MySQL database to manage enterprise data can now run analysis queries using the MySQL analysis engine with significantly improved performance, lower cost, no ETL required, and support for real-time analysis. The service can be deployed only in the cloud or in a hybrid environment, simplifying the management of transaction and analysis applications. From its launched products, can be summarized as the following concepts:

❖ ONE DB

The core value of this product is the unified processing of scenarios. Previously MySQL had to use the following approach to handle mixed business scenarios (figure below)

MAE is introduced to enhance MySQL’s capability and achieve a unified effect.

❖ ONE SQL

The second value point is not only to unify the platform, but also to interact in the same way. The original logic does not need to be changed. There is no need to change either the transaction scenario at the front end or the data visualization analysis at the back end. This is attractive to users and protects their existing software assets.

❖ NO ETL

For users, there is no need to worry about the details of data synchronization and no need to write additional ETL jobs. This will greatly reduce the burden on users.

❖ NO TUNING

Before MAE, there were two ways for users to solve data analysis problems. Either in the repository or outside the repository. If it is solved in the library, it needs to do a lot of optimization work for MySQL or take storage engine for analysis scenario, which brings optimization work. The same is true for out-of-library solutions, where users need to do their own optimization. Using MAE doesn’t have to worry about this. It leverages automatic machine learning (AutoML) developed by Oracle LABS to automate all aspects of the service. Because this automation is based on machine learning, the system can intelligently predict and act on various scenarios, including automatically estimating the number of analysis nodes required for a workload. When the service starts, the database tables that run the analysis queries need to be loaded into the MySQL analysis cluster memory. The size of the cluster required depends on the tables and columns required to load and the compression of this data in memory. In a traditional configuration, the user has to guess the size of the cluster. Underestimation can result in data loading or query execution failure due to space constraints. Overestimation results in additional costs for unnecessary resources. As a result, the user iterates until the correct cluster size is determined, and this size estimate becomes inaccurate when the table is updated. MAE does this automatically.

❖ HIGH PERFORMANCE

MAE, as a distributed, scalable, in-memory, hybrid column query engine, achieves high performance through vectorization in memory and large-scale inter-node and intra-node parallel processing. The query processing has been optimized for Oracle cloud infrastructure, including network bandwidth optimization between nodes. Through these capabilities, MAE provides powerful analytical capabilities. Below is a performance comparison of its main competitors.

  • MySQL Analytics vs MySQL

  • MySQL Analytics vs Amazon Aurora

  • MySQL Analytics vs Amazon Redshift

❖ LOW COST

The cost of using MySQL database services with an analysis engine depends on the number of analysis nodes configured. The size of an analysis cluster depends on the size of the data set and the nature of the workload. An analysis node can hold about 400GB of data. When customers migrate to MySQL database services using Analytics Engine, they can expect to see significant cost reductions. Compared to Amazon Aurora and Redshift, the cost of the MySQL analytics engine is 1/3 of the cost.

❖ EASY SCALE

MAE currently supports up to 24 analysis nodes (thousands of cores) in a cluster, with a processing capacity of about 10 TB of analysis data. 10TB is the approximate amount of data that can be filled into the memory of the analysis node at a given time. There is no limit to the amount of data stored in the MySQL database, and the customer can choose which tables or columns to load from the MySQL database schema into the memory of the analysis node. If the tables are no longer needed for queries, the user can remove them from memory to make room for other data.

❖ EASY USE

MAE hides a lot of detail behind it, making it very simple for the front-end customer to use. You only need to configure the analysis cluster based on the recommended size, configure pre-accelerated query objects, and manually load the cluster for the first time. Then you can enjoy the acceleration of analysis clustering.

By looking at the execution plan, you can visually see if an analysis cluster is being used (Using Secondary Engine RAPID in the figure below)

In addition, for hybrid cloud scenarios (that is, customers who cannot deploy their data to the cloud), MySQL replication can be used to replicate local MySQL data to the MySQL analytics engine, again without the need for ETL. MAE is currently only available on the Oracle Cloud Infrastructure (OCI) Gen 2 Hardware platform.

Write in the last

MAE makes up for the shortcomings of MySQL in data analysis scenarios. We believe that this feature will further expand the use of MySQL and pave the way for its larger and more important enterprise adoption. Unfortunately, MAE can only be used in OCI at present, which is not available to the majority of offline users in China.