This is my 36th original article
I shared the middle platform (zhongtai) before, and I shared data governance two days ago. One of you urged me to write about the data middle platform. Although I have built a data middle platform, it was not a success, so I never feel very confident writing about it.
Since my friend invited me, I will sort out my understanding and share it with you.
OK, Let’s GO!
The evolution of the data industry
In the data field, from the business perspective there are two domains: OLTP (online transaction processing) and OLAP (online analytical processing). It's a mouthful, but it's fun.
OLTP is "online" as opposed to stand-alone transaction processing: when the data is recorded locally, it is a stand-alone transaction; when the data is recorded on a server, it is an online transaction.
OLAP means the data is stored on a server and you query it for analysis over the network, hence online analytical processing.
The evolution of OLTP has been the evolution of databases: from stand-alone relational databases to highly available versions to distributed relational databases. In addition, to meet data storage and query requirements in all kinds of scenarios, a series of databases such as NoSQL, MPP, and time-series databases were born, which marked the beginning of the big data industry.
OLAP's path is also interesting. It started with building tables in the business database to produce a few fixed reports. Later, as more and more content needed to be analyzed, the field kept developing, ushering in a golden period for big data analysts.
Wild times
In the early days there was no data warehouse, and OLAP had very little content: just a few fixed reports, which developers stored directly in the business database.
Many systems are still like this today. For example, if you purchase an ERP or e-commerce platform, most of its built-in reports live in the same database as the business system. This period corresponds to the "monolithic architecture" era in system architecture.
Data Warehouse age
As information systems kept being built, managers were no longer satisfied with a few fixed reports. They expected to see more detail, spot anomalies, and find problems. An information system was needed to meet those needs, so DSS/BI systems were born, and at almost the same time the concept of the data warehouse was put forward.
Around 1990, modern BI and the data warehouse were born at about the same time. There is nothing odd about that; the two were made for each other. BI is the business application of OLAP, and the warehouse was created to support BI. Bill Inmon, the father of data warehousing, wrote a book in 1991 called Building the Data Warehouse. Yes, that Inmon, the Inmon who is always mentioned alongside Kimball.
At this point, business data and analytical data began to diverge and take different paths. Business data processing (OLTP) moved toward accuracy, timeliness, and consistency. Analytical data processing (OLAP) moved toward historical, static, aggregated, linked, and multidimensional data.
A typical problem: a business database cannot tell you how long an order took to go from placed to paid, picked, shipped, and completed. Of course, you could make N timestamp fields redundant, but you still cannot track every state change, because OLTP only reflects the current state.
OLAP can store historical state data in full tables, zipper tables, and other structures, so the history of each object can be analyzed.
Besides zipper tables, there are also full snapshot tables, incremental tables, and flow (transaction-log) tables, all of which are used to preserve historical data; a small sketch of the zipper-table idea follows.
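To make the zipper-table idea concrete, here is a minimal Python sketch. In practice a zipper table is usually maintained by a SQL or Hive job; the order IDs, statuses, and dates below are invented purely for illustration. Each row carries a start_date/end_date pair, so every historical state is preserved instead of being overwritten the way an OLTP row would be.

```python
from datetime import date

# Sentinel end date meaning "this is still the current record".
OPEN_END = date(9999, 12, 31)

# A tiny zipper table: one row per (order, status) interval.
zipper = [
    {"order_id": 1001, "status": "placed",
     "start_date": date(2023, 1, 1), "end_date": OPEN_END},
]

def apply_status_change(table, order_id, new_status, change_date):
    """Close the currently open row for this order, then append the new state."""
    for row in table:
        if row["order_id"] == order_id and row["end_date"] == OPEN_END:
            row["end_date"] = change_date  # close the old state
    table.append({"order_id": order_id, "status": new_status,
                  "start_date": change_date, "end_date": OPEN_END})

# An OLTP table would keep only the latest status; the zipper table keeps them all.
apply_status_change(zipper, 1001, "paid", date(2023, 1, 2))
apply_status_change(zipper, 1001, "shipped", date(2023, 1, 4))

for row in zipper:
    # Time spent in each state is simply end_date - start_date.
    print(row["status"], row["start_date"], "->", row["end_date"])
```

With rows like these, the "how long from placed to paid to shipped" question above becomes a simple date subtraction, which is exactly what the OLTP schema could not answer.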
The purpose of building a data warehouse is data analysis. In principle, data in a data warehouse is not modified; Hive, for example, originally shipped with no update or delete at all. Many people don't understand why, but it is simply because Hive was built for data warehousing.
The core problems a data warehouse solves are the ones mentioned above: tracking historical states, providing data analysis capability, and coping with frequently changing business requirements, among others.
To conclude in the words of Bill Inmon:
A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of decision making.
I have an article that explains the full path and details of data warehouse construction; you can check it out: How to Build a Data Warehouse.
What does this look like? Doesn't it resemble microservices in system architecture? Horizontal layering, vertical segmentation by domain: it is the same idea as microservices.
Data lake age
The data warehouse is great, and multidimensional analysis covers just about everything your boss needs. It lets decision makers start from the company's overall picture and drill down to the contribution of each salesperson, greatly satisfying their desire for control while giving enterprise decisions a solid data foundation.
However, the data warehouse also has its fatal drawbacks: all data must be defined before it can be used, all data is processed by ETL, and all data is aggregated.
As a data worker, you will certainly understand what this means. Once the data has been manipulated, the information will be lost.
In the age of algorithms, this is unacceptable.
So in 2010, after 20 years of data warehousing, Pentaho founder James Dixon proposed the concept of the "data lake". Simply put, a data lake can be thought of as a giant ODS layer.
Anyone who works with data can go straight to the data lake and pull data freely:
- If a multidimensional analysis report still cannot answer the question even at its finest drill-down granularity, go to the data lake and check the raw data to find the root cause.
- When designing algorithms, the data processed in the warehouse has already lost some information, so go to the data lake for more detailed, richer underlying data, where you may find better features.
Data middle platform age
A data lake may seem like the perfect solution to every problem, but it cannot fool you. Yes, a data lake is at best a giant pool of raw, untapped source data, and at worst a data dump. No matter how well you manage it, that does not change.
Suppose you have found an exceptional customer and want to see how this customer behaves across your company's business flow. You communicate and follow up with them through the CRM; trade with them through the trading platform; purchase goods through the ERP; record warehousing information in the WMS; and record transportation information in the TMS. Finally, you receive their complaint on Weibo and a complaint call in the customer service call center.
What can you do at this point? The systems were built independently; all the data sits in the data lake, yet you simply cannot connect it! And that is only one line of business. Companies usually have N lines of business, each with its own systems, and a customer interacts with the company through more and more of them.
At this point, the data middle platform appears.
The picture comes from the Alibaba Cloud data middle platform solution manual.
So you can see what the data middle platform solves:
- Entity linking and profiling: OneID;
- Unified construction and management of data assets: OneModel;
- Unified data services: OneService.
These three points together constitute the methodology system of OneData.
OneID works at the lowest data level: the same entity (a customer, for example) across different business lines and business systems is identified in a unified way. On the consumer side, it feels like logging in to every Alibaba app with a single ID. On the enterprise side, no matter which client the user comes from, the system can recognize them as the same user.
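As a rough illustration of the OneID idea (not Alibaba's actual implementation), here is a small Python sketch that links records from different systems into one global customer ID whenever they share a strong identifier such as a phone number or e-mail. The systems, fields, and values are all invented for the example.

```python
from collections import defaultdict

# Records from different business systems, each with its own local ID.
records = [
    {"system": "CRM",   "local_id": "crm-001", "phone": "138-0001", "email": "a@example.com"},
    {"system": "Trade", "local_id": "trd-778", "phone": "138-0001", "email": None},
    {"system": "Call",  "local_id": "call-45", "phone": None,       "email": "a@example.com"},
    {"system": "CRM",   "local_id": "crm-002", "phone": "139-0002", "email": "b@example.com"},
]

parent = {}  # union-find structure over (system, local_id) nodes

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Group nodes that share any strong identifier, then union each group.
by_key = defaultdict(list)
for r in records:
    node = (r["system"], r["local_id"])
    find(node)  # register the node
    for key in ("phone", "email"):
        if r[key]:
            by_key[(key, r[key])].append(node)
for nodes in by_key.values():
    for other in nodes[1:]:
        union(nodes[0], other)

# Assign one global OneID per connected component.
one_ids = {}
for r in records:
    root = find((r["system"], r["local_id"]))
    one_id = one_ids.setdefault(root, f"ONE-{len(one_ids) + 1:06d}")
    print(r["system"], r["local_id"], "->", one_id)
```

Real ID-mapping pipelines use many more signals (device IDs, login accounts, fuzzy matching), but the core idea is the same: collapse many local IDs into one global entity ID.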
OneModel is the unified modeling of the middle-platform data, which is essentially a data warehouse. It is just not one line of business's warehouse, but a unified warehouse for the whole enterprise, the whole group.
OneService provides unified data services at the business level: the same familiar things such as master data, ad hoc queries, fixed reports, multidimensional analytics, and so on. Of course there will also be more algorithm experiments, but those are still just experiments.
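To give a feel for what OneService means in practice, here is a deliberately simplified, hypothetical sketch: every consumer (reports, dashboards, algorithm jobs) calls one service facade in front of the unified warehouse instead of querying each business system directly. The class names and the in-memory "warehouse" are invented for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    one_id: str
    name: str
    segment: str

class InMemoryWarehouse:
    """Stand-in for the unified warehouse (OneModel) behind the service layer."""
    def __init__(self):
        self.dim_customer = {"ONE-000001": Customer("ONE-000001", "Alice", "VIP")}
        self.rpt_daily_gmv = [{"dt": "2023-01-01", "gmv": 12345.6}]

class DataService:
    """Single entry point exposing master data, fixed reports, and so on."""
    def __init__(self, warehouse: InMemoryWarehouse):
        self._wh = warehouse

    def get_customer(self, one_id: str) -> Customer:
        # Master-data lookup keyed by the global OneID.
        return self._wh.dim_customer[one_id]

    def run_fixed_report(self, name: str):
        # Fixed reports are pre-modeled result tables; callers never
        # touch the underlying business systems directly.
        return getattr(self._wh, f"rpt_{name}")

svc = DataService(InMemoryWarehouse())
print(svc.get_customer("ONE-000001"))
print(svc.run_fixed_report("daily_gmv"))
```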
In my previous job, I built a data middle platform. The main work was linking product IDs and user IDs, carrying out globally unified modeling, and providing unified coding and unified data output for product master data.
If you have read my previous post "All in One Breath: An Architect's Perspective", you will find that this is exactly the architecture of the middle platform, isn't it?
As the figure above shows, the previous article already made clear what a data middle platform is: at the data level, integrating, standardizing, and unifying the services of multiple lines of business, that is the data middle platform. Every architecture exists to solve specific problems, and the core problems the data middle platform solves are global unification, global standards, and global connectivity.
So don't mythologize the middle platform, and don't mythologize the data middle platform. If you don't have these problems, or if your most urgent problems are not the ones a middle platform is best at solving, then don't build one just to follow the trend. That would be a miserable way to go.
That's all. I hope it helps!
The data middle platform is still too big a topic for one article. So I have collected the product construction plans of three mainstream data middle platform (zhongtai) vendors on the market, plus a data middle platform white paper from a consulting firm, to share with you. Follow the official account "Big Data Architect" and reply DATA in the background to get the download link. This is all hard-earned insider material~~~
By the way, for last time's "Building a Data Governance System", I also prepared the internationally popular "DAMA-DMBOK Data Management Body of Knowledge" file; reply DAMA in the background to get the download link. Go grab it from the official account~~~