What is a data warehouse?
A data warehouse is a strategic collection of data that provides all types of data support for decision-making at every level of the enterprise. It is a single data store created for analytical reporting and decision support. For businesses that need business intelligence, it provides guidance for improving business processes and for monitoring time, cost, quality, and control.
What are the characteristics of a data warehouse?
1. High efficiency
Analysis data in a data warehouse is generally organized by day, week, month, quarter, year, and so on. The daily cycle places the highest demand on efficiency: customers expect to see the analysis of yesterday's data within 24 hours, or even within 12 hours.
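The reporting periods described above can be illustrated with a minimal sketch that rolls the same daily records up to coarser periods. The sample rows and the names `daily_sales` and `roll_up` are hypothetical and only serve the illustration; a real warehouse would normally do this in SQL or a dedicated engine.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, amount).
daily_sales = [
    (date(2024, 1, 30), 100.0),
    (date(2024, 1, 31), 150.0),
    (date(2024, 2, 1), 200.0),
]


def roll_up(rows, key_fn):
    """Sum amounts per reporting period; key_fn maps a date to a period label."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key_fn(day)] += amount
    return dict(totals)


# The same daily data viewed at three coarser grains.
by_month = roll_up(daily_sales, lambda d: (d.year, d.month))
by_quarter = roll_up(daily_sales, lambda d: (d.year, (d.month - 1) // 3 + 1))
by_year = roll_up(daily_sales, lambda d: d.year)
```

Because the coarser periods are derived from the daily grain, the daily load is the one that has to finish on time; the weekly, monthly, and yearly views follow from it.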
2. Extensibility
Some large data warehouse systems have a complex architecture because the design accounts for expansion over the next 3-5 years. That way, the business does not have to spend money rebuilding the data warehouse system later, and the system can run stably. Extensibility is mainly reflected in sound data modeling. Intermediate layers are added to the data warehouse design so that massive data flows have enough buffering, and the system can keep operating even when the data volume grows much larger.
3. Subject-oriented
Data in an operational database is organized around transaction processing tasks, and each business system is kept separate from the others. Data in a data warehouse, by contrast, is organized by subject area. A subject is an abstract concept, the counterpart of the application in a traditional database: it is a higher-level abstraction for synthesizing, classifying, and analyzing the data in the enterprise information system. Each subject corresponds to a broad area of analysis. A data warehouse leaves out data that is not useful for decision making and provides a concise view of a particular subject.
4. Integration
Transaction-oriented operational databases are usually tied to specific applications, independent of one another, and often heterogeneous. The data in a data warehouse, by contrast, is produced by extracting and cleaning the scattered source data and then systematically processing, summarizing, and organizing it. Inconsistencies in the source data must therefore be eliminated, so that the information in the data warehouse is consistent, global information about the whole enterprise.
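A minimal sketch of this kind of integration follows, assuming two hypothetical source systems (a CRM and a billing system) that describe the same customer with different field names, codes, and units. All names and values here are invented for illustration.

```python
# Hypothetical records from two independent operational systems.
crm_rows = [{"cust_id": "17", "gender": "M", "revenue_usd": 120.0}]
billing_rows = [{"customer": 17, "sex": "male", "revenue_cents": 4550}]

# One agreed encoding for the warehouse, whatever each source uses.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM record onto the shared warehouse schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "gender": GENDER_CODES[row["gender"].lower()],
        "revenue_usd": row["revenue_usd"],
    }


def from_billing(row):
    """Map a billing record onto the same schema, converting cents to dollars."""
    return {
        "customer_id": int(row["customer"]),
        "gender": GENDER_CODES[row["sex"].lower()],
        "revenue_usd": row["revenue_cents"] / 100,
    }


integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

After the mapping, both records share one key type, one gender encoding, and one currency unit, which is what allows them to be analyzed together.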
5. Reflects historical change
An operational database is mainly concerned with current data within a given period, while the data in a data warehouse usually contains historical information. The system records information about the enterprise at each stage from some point in the past (such as the point at which the data warehouse was first put into use) up to the present. With this information, the enterprise can make quantitative analyses and predictions about its development and future trends.
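The contrast between current-only operational data and historical warehouse data can be sketched as follows. This is a simplified illustration, not a definitive design: `operational`, `warehouse_history`, `record_change`, and `city_as_of` are hypothetical names, and the customer data is invented.

```python
from datetime import date

operational = {}        # current state only: customer_id -> city
warehouse_history = []  # append-only: (snapshot_date, customer_id, city)


def record_change(snapshot_date, customer_id, city):
    """Overwrite the operational value, but append a dated row to the warehouse."""
    operational[customer_id] = city
    warehouse_history.append((snapshot_date, customer_id, city))


def city_as_of(customer_id, as_of):
    """Return the latest recorded city on or before as_of, or None if unknown."""
    matches = [
        (d, city)
        for d, cid, city in warehouse_history
        if cid == customer_id and d <= as_of
    ]
    return max(matches)[1] if matches else None


record_change(date(2023, 1, 1), 17, "Berlin")
record_change(date(2024, 6, 1), 17, "Munich")
```

Here the operational store only knows the customer's current city, while the history rows can still answer where the customer was at the end of 2023.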
What are some common misunderstandings about data warehouses?
1. Building a data warehouse is a one-time project. In fact, a data warehouse needs to be updated every year, every month, every week, or even every day. It is not a job that can be completed by loading historical data once.
2. A data warehouse is a very large warehouse. In fact, the quality of a data warehouse is not measured by the amount of data it holds. Some high-quality data warehouse projects do not contain very much data.
3. As long as the data warehouse is established and used, the problem will be solved.
4. Focusing only on internal archived data while ignoring the potential value of external data and of image, audio, and video files.
5. A data warehouse stores all business data together. One goal of a data warehouse is to bring disparate businesses together, but this is usually done selectively, for specific analytical purposes, rather than by integrating all the business data.