OLAP (Online Analytical Processing) is a collection of analytical operations built on the multidimensional model of a data warehouse. It provides flexible operations such as roll-up and drill-down, and it is a way of presenting integrated information for decision making. It is used in decision support systems, business intelligence, and data warehouses. Its users are internal team members such as operations, sales, and marketing staff and data analysts, not end customers. Its main scenario is analysis systems built on the multidimensional model of a data warehouse.
OLAP can be understood in a broad or a narrow sense. In the broad sense, which matches the literal meaning, it refers to any analysis that does not update the data. More often, though, OLAP is understood in the narrow sense: multidimensional analysis based on cube calculations.
So what is the difference between OLAP and OLTP?
Online Transaction Processing (OLTP) is a transaction-oriented processing system. I understand its main scenario as real-time systems with frequent human-computer interaction, small amounts of data per operation, and fast responses. Databases such as MySQL and Oracle are typical industrial OLTP software.
A single OLTP operation touches a relatively small amount of data and very few tables, usually only one or two. OLAP instead looks for patterns in large amounts of data. It often uses aggregate functions such as COUNT(), SUM(), and AVG() to describe the current situation and to support future planning and decision making, so joining and summarizing data from multiple tables is very common.
To reflect this difference in data volume and complexity from an OLTP database, the object that OLAP operates on is generally called a data warehouse. The data in a data warehouse often comes from multiple databases and their corresponding business logs.
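The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, invented for this illustration.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Shanghai'), (2, 'Nanjing');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# OLTP-style: touch a single row in a single table, identified by its key.
conn.execute("UPDATE orders SET amount = 35.0 WHERE id = 2")
oltp_row = conn.execute("SELECT amount FROM orders WHERE id = 2").fetchone()

# OLAP-style: join tables and aggregate over many rows.
olap_rows = conn.execute("""
    SELECT c.city, COUNT(*), SUM(o.amount), AVG(o.amount)
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

print(oltp_row)
print(olap_rows)
```

The first statement pair reads and writes one row, while the second scans every order to produce one summary row per city.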
What are the salient features of OLAP?
- The multidimensional data model is built in advance to match the way users think about the data.
- Users can quickly query and analyze data along any dimension.
- Users can switch dynamically between dimensions or analyze several dimensions together, which makes analysis very flexible.
OLAP and data warehouse
OLAP and the data warehouse complement each other. OLAP is generally built on top of the data warehouse: after processing in the warehouse, wide tables are generated and imported into OLAP storage, where data analysis tools read them.
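As a rough illustration of what generating a wide table can mean, the sketch below joins a fact table with its dimension tables once and stores the denormalized result. The table name `sales_wide` and the sample rows are assumptions made for this example, not part of any particular OLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse tables with made-up rows.
    CREATE TABLE Locates (Locate_key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Products (Product_key INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE Sales (Locate_key INTEGER, Product_key INTEGER, Quantity INTEGER);
    INSERT INTO Locates VALUES (1, 'Jiangsu'), (2, 'Shanghai');
    INSERT INTO Products VALUES (1, 'Food'), (2, 'Toys');
    INSERT INTO Sales VALUES (1, 1, 5), (2, 2, 7);

    -- Denormalize once: every fact row now carries its dimension attributes.
    CREATE TABLE sales_wide AS
    SELECT Locates.Region, Products.Category, Sales.Quantity
    FROM Sales
    JOIN Locates ON Sales.Locate_key = Locates.Locate_key
    JOIN Products ON Sales.Product_key = Products.Product_key;
""")

# An analysis tool can now read the wide table without any joins.
wide_rows = conn.execute(
    "SELECT Region, Category, Quantity FROM sales_wide ORDER BY Region"
).fetchall()
print(wide_rows)
```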
The relationship between OLAP tools and data warehouses in a normalized data warehouse looks something like this:
In this case, OLAP tools are not allowed to access the central database. On the one hand, the central database uses normalized modeling, while OLAP only supports analysis of dimensionally modeled data. On the other hand, the central database of a normalized data warehouse is not open to upper-level developers in the first place. In a dimensional modeling data warehouse, the relationship between OLAP/BI tools and the data warehouse looks like this:
In a dimensional modeling data warehouse, OLAP tools can fetch data for analysis not only directly from the data warehouse, but also from the data marts on which the architecture is built.
What are the architectural patterns and types of OLAP?
- MOLAP (Multidimensional Online Analytical Processing)
The MOLAP architecture generates a new, physical data cube. Its architecture is shown in the figure below:
In this cube, each cell has a direct address and common queries are precomputed. Queries are therefore very fast, but updating the cube is slow, so whether to use this architecture has to be decided case by case.
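The idea of directly addressed, precomputed cells can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real MOLAP engine stores its cube: the fact rows are made up, and `ALL` is a hypothetical marker for a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (quarter, region, category, quantity).
facts = [
    (1, "Jiangsu", "Food", 5),
    (1, "Shanghai", "Food", 7),
    (2, "Jiangsu", "Toys", 3),
    (2, "Shanghai", "Food", 4),
]

ALL = "*"  # marker meaning "this dimension is aggregated away"

def build_cube(rows):
    """Precompute SUM(quantity) for every kept/collapsed combination of dimensions."""
    cube = defaultdict(int)
    for *dims, quantity in rows:
        # Each dimension is either kept at its value or collapsed to ALL,
        # so one fact row contributes to 2 ** len(dims) cells.
        for cell in product(*[(value, ALL) for value in dims]):
            cube[cell] += quantity
    return dict(cube)

cube = build_cube(facts)

# A query is now a single dictionary lookup by cell address.
food_in_shanghai = cube[(ALL, "Shanghai", "Food")]
grand_total = cube[(ALL, ALL, ALL)]
print(food_in_shanghai, grand_total)
```

The lookup is constant time, but every new fact row has to update many cells, which mirrors the fast-query, slow-update trade-off described above.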
- ROLAP (Relational Online Analytical Processing)
Instead of generating a physical cube, the ROLAP architecture simulates one using a star schema made of multiple relational tables. Its architecture is shown in the figure below:
Queries under this architecture are not as fast as in MOLAP, because every ROLAP query is converted into SQL statements whose execution involves JOIN operations between multiple tables.
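A minimal sketch of that translation step, assuming a small star schema in SQLite: the helper `cube_query_sql` and the `DIMENSIONS` mapping are hypothetical names invented here to show how a request for dimensions becomes a JOIN plus GROUP BY statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table and two dimension tables.
    CREATE TABLE Locates (Locate_key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Products (Product_key INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE Sales (Locate_key INTEGER, Product_key INTEGER, Quantity INTEGER);
    INSERT INTO Locates VALUES (1, 'Jiangsu'), (2, 'Shanghai');
    INSERT INTO Products VALUES (1, 'Food'), (2, 'Toys');
    INSERT INTO Sales VALUES (1, 1, 5), (1, 2, 3), (2, 1, 7), (2, 1, 4);
""")

# Dimension column -> (dimension table, join key shared with the fact table).
DIMENSIONS = {
    "Region": ("Locates", "Locate_key"),
    "Category": ("Products", "Product_key"),
}

def cube_query_sql(dimensions):
    """Translate a list of requested dimensions into one SQL statement."""
    if not dimensions or any(d not in DIMENSIONS for d in dimensions):
        raise ValueError(f"unsupported dimensions: {dimensions}")
    cols = ", ".join(f"{DIMENSIONS[d][0]}.{d}" for d in dimensions)
    joins = " ".join(
        f"JOIN {table} ON Sales.{key} = {table}.{key}"
        for table, key in (DIMENSIONS[d] for d in dimensions)
    )
    return (
        f"SELECT {cols}, SUM(Sales.Quantity) FROM Sales {joins} "
        f"GROUP BY {cols} ORDER BY {cols}"
    )

sql = cube_query_sql(["Region", "Category"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Every query pays for the joins at execution time, which is why this path tends to be slower than a lookup in a precomputed cube.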
- HOLAP (Hybrid Online Analytical Processing)
This hybrid architecture combines MOLAP and ROLAP: queries that need special acceleration go to the MOLAP engine, and all other queries go to the ROLAP engine.
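The routing idea can be sketched as follows. Everything here is an assumption made for illustration: the denormalized `Sales` table, the `precomputed` dictionary standing in for a MOLAP engine, and the `query` function that decides which path serves a request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, already denormalized sales table.
    CREATE TABLE Sales (Region TEXT, Category TEXT, Quantity INTEGER);
    INSERT INTO Sales VALUES
        ('Jiangsu', 'Food', 5), ('Jiangsu', 'Toys', 3),
        ('Shanghai', 'Food', 7), ('Shanghai', 'Food', 4);
""")

KNOWN_DIMENSIONS = ("Region", "Category")

def run_sql(dimensions):
    """ROLAP path: aggregate on demand with GROUP BY."""
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(Quantity) FROM Sales GROUP BY {cols}"
    return {tuple(row[:-1]): row[-1] for row in conn.execute(sql)}

# MOLAP path: the query expected to be hot is computed once, ahead of time.
precomputed = {("Region",): run_sql(("Region",))}

def query(dimensions):
    """Serve from the precomputed store when possible, otherwise fall back to SQL."""
    dimensions = tuple(dimensions)
    if not dimensions or any(d not in KNOWN_DIMENSIONS for d in dimensions):
        raise ValueError(f"unknown dimensions: {dimensions}")
    if dimensions in precomputed:
        return "molap", precomputed[dimensions]
    return "rolap", run_sql(dimensions)

hot = query(["Region"])
cold = query(["Region", "Category"])
print(hot)
print(cold)
```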
Data Cube (Basic OLAP Operations)
In the early days, information had to be extracted by hand from piles of data reports. These reports are usually two-dimensional tables of rows and columns. In the real world, however, data can be analyzed from many angles, and a data cube can be understood as a two-dimensional table extended to more dimensions.
The following figure shows a data cube with multidimensional data abstraction:
Although this example is three-dimensional, a data cube is more often n-dimensional. Most pure OLAP users work with this logical concept of a data cube and do not need to know how it is implemented. The basic workflow in an OLAP tool is to configure the dimension tables and the fact table first, and then, for each query, tell the tool which dimension fields, fact fields, and aggregate operations to display.
Here are the five most common data cube operations (the basic OLAP operations):
- Slice
- Dice
- Pivot (rotate)
- Roll-up
- Drill-down
Slice and dice
Selecting one member on a single dimension of the data cube is called slicing, while applying selections on two or more dimensions is called dicing.
    -- Slice: fix one member (quarter 2) on the date dimension
    SELECT Locates.Region, Products.Category, SUM(Quantity)
    FROM Sales, Dates, Products, Locates
    WHERE Dates.Quarter = 2
      AND Sales.Date_key = Dates.Date_key
      AND Sales.Locate_key = Locates.Locate_key
      AND Sales.Product_key = Products.Product_key
    GROUP BY Locates.Region, Products.Category

    -- Dice: select members on two dimensions (date and location)
    SELECT Dates.Quarter, Locates.Region, Products.Category, SUM(Quantity)
    FROM Sales, Dates, Products, Locates
    WHERE (Dates.Quarter = 2 OR Dates.Quarter = 3)
      AND (Locates.Region = 'Jiangsu' OR Locates.Region = 'Shanghai')
      AND Sales.Date_key = Dates.Date_key
      AND Sales.Locate_key = Locates.Locate_key
      AND Sales.Product_key = Products.Product_key
    GROUP BY Dates.Quarter, Locates.Region, Products.Category
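Slicing and dicing can be tried end to end against a small in-memory database. The sketch below uses Python's sqlite3 module, and the column names (`Quarter`, `Region`, `Category`, `Quantity`) and all sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema with made-up rows.
    CREATE TABLE Dates (Date_key INTEGER PRIMARY KEY, Quarter INTEGER);
    CREATE TABLE Locates (Locate_key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Products (Product_key INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE Sales (Date_key INTEGER, Locate_key INTEGER,
                        Product_key INTEGER, Quantity INTEGER);
    INSERT INTO Dates VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO Locates VALUES (1, 'Jiangsu'), (2, 'Shanghai'), (3, 'Zhejiang');
    INSERT INTO Products VALUES (1, 'Food'), (2, 'Toys');
    INSERT INTO Sales VALUES
        (1, 1, 1, 10), (2, 1, 1, 5), (2, 2, 2, 7),
        (3, 2, 1, 4), (3, 3, 1, 9), (2, 3, 2, 6);
""")

# Shared FROM/JOIN clause linking the fact table to its dimensions.
JOINS = """
    FROM Sales
    JOIN Dates ON Sales.Date_key = Dates.Date_key
    JOIN Locates ON Sales.Locate_key = Locates.Locate_key
    JOIN Products ON Sales.Product_key = Products.Product_key
"""

# Slice: fix one member (quarter 2) on a single dimension.
slice_rows = conn.execute(f"""
    SELECT Locates.Region, Products.Category, SUM(Sales.Quantity) {JOINS}
    WHERE Dates.Quarter = 2
    GROUP BY Locates.Region, Products.Category
    ORDER BY Locates.Region, Products.Category
""").fetchall()

# Dice: restrict two dimensions to a subset of their members.
dice_rows = conn.execute(f"""
    SELECT Dates.Quarter, Locates.Region, Products.Category, SUM(Sales.Quantity) {JOINS}
    WHERE Dates.Quarter IN (2, 3) AND Locates.Region IN ('Jiangsu', 'Shanghai')
    GROUP BY Dates.Quarter, Locates.Region, Products.Category
    ORDER BY Dates.Quarter, Locates.Region, Products.Category
""").fetchall()

print(slice_rows)
print(dice_rows)
```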
Rotation (Pivot)
Rotation refers to changing the presentation orientation of a report or page. To the user, it is a view operation, but from the perspective of SQL simulation, it is simply changing the order of the fields following the SELECT.
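Seen as a view operation, a pivot only changes which dimension labels the rows and which labels the columns. A minimal sketch, assuming an already aggregated result set with made-up values and a hypothetical `pivot` helper:

```python
from collections import defaultdict

# Hypothetical aggregated result: (region, category, total quantity).
result = [
    ("Jiangsu", "Food", 5),
    ("Jiangsu", "Toys", 3),
    ("Shanghai", "Food", 11),
]

def pivot(rows, row_index, col_index, value_index=2):
    """Lay out aggregated rows as table[row_label][col_label] = value."""
    table = defaultdict(dict)
    for row in rows:
        table[row[row_index]][row[col_index]] = row[value_index]
    return dict(table)

# Same data, two orientations: regions as rows, then categories as rows.
by_region = pivot(result, row_index=0, col_index=1)
by_category = pivot(result, row_index=1, col_index=0)
print(by_region)
print(by_category)
```

No value is recomputed between the two calls; only the layout changes.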
Roll-up and drill-down
Roll-up can be understood as “ignoring” certain dimensions, while drill-down subdivides certain dimensions.
    -- Roll-up: the date dimension is ignored
    SELECT Locates.Region, Products.Category, SUM(Quantity)
    FROM Sales, Products, Locates
    WHERE Sales.Locate_key = Locates.Locate_key
      AND Sales.Product_key = Products.Product_key
    GROUP BY Locates.Region, Products.Category

    -- Drill-down: the date dimension is subdivided into quarter and month
    SELECT Dates.Quarter, Dates.Month, Locates.Region, Products.Category, SUM(Quantity)
    FROM Sales, Dates, Products, Locates
    WHERE Sales.Date_key = Dates.Date_key
      AND Sales.Locate_key = Locates.Locate_key
      AND Sales.Product_key = Products.Product_key
    GROUP BY Dates.Quarter, Dates.Month, Locates.Region, Products.Category
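Roll-up and drill-down amount to shortening or lengthening the GROUP BY list over the same fact data. The sketch below shows this with Python's sqlite3 module. The schema, the sample rows, and the helper `total_by` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema with made-up rows.
    CREATE TABLE Dates (Date_key INTEGER PRIMARY KEY, Quarter INTEGER, Month INTEGER);
    CREATE TABLE Locates (Locate_key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Sales (Date_key INTEGER, Locate_key INTEGER, Quantity INTEGER);
    INSERT INTO Dates VALUES (1, 1, 1), (2, 1, 2), (3, 2, 4);
    INSERT INTO Locates VALUES (1, 'Jiangsu'), (2, 'Shanghai');
    INSERT INTO Sales VALUES (1, 1, 10), (2, 1, 5), (3, 1, 2), (1, 2, 7), (3, 2, 4);
""")

def total_by(*columns):
    """Sum quantities grouped by the given dimension columns."""
    cols = ", ".join(columns)
    return conn.execute(f"""
        SELECT {cols}, SUM(Sales.Quantity)
        FROM Sales
        JOIN Dates ON Sales.Date_key = Dates.Date_key
        JOIN Locates ON Sales.Locate_key = Locates.Locate_key
        GROUP BY {cols}
        ORDER BY {cols}
    """).fetchall()

base = total_by("Locates.Region", "Dates.Quarter")
# Roll-up: drop the date dimension from the grouping.
rolled_up = total_by("Locates.Region")
# Drill-down: subdivide each quarter into months.
drilled_down = total_by("Locates.Region", "Dates.Quarter", "Dates.Month")

print(rolled_up)
print(base)
print(drilled_down)
```

The grand total stays the same at every level; only the granularity of the groups changes.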
BI (Business Intelligence)
Business intelligence refers to using modern data warehouse technology, online analytical processing, data mining, and data presentation technology to analyze data and realize business value. Broadly speaking, BI is the use of various technologies to assist business decision making. It relies on the data in the data warehouse and on analysis by an OLAP system and, where necessary, on data mining methods to uncover deeper value.
So how do the data warehouse, OLAP, and BI relate to one another?
After the data warehouse is built, users can write SQL statements to access it and analyze its data. But writing SQL for every query is cumbersome, and the SQL used to analyze dimensionally modeled data follows a fixed pattern. Hence OLAP tools, which are dedicated to analyzing dimensionally modeled data. BI tools present OLAP results as charts, and the two now usually appear together.