BI: Business Intelligence.
Every means of operating, judging, and managing a business intelligently can be grouped under BI. Even AI (Artificial Intelligence) counts as part of BI as long as it serves those functions.
What does intelligent business operation, judgment, and management look like?
The story of “beer and diapers” is well known in the industry. When young dads buy diapers, they tend to pick up a pack of beer as well, so Walmart deliberately put the two on the same shelf to increase sales of both.
If you have never experienced this kind of thing yourself, open the Amazon app or website. Search for Data Warehouse and you will see a series of books with Data Warehouse in the title. Click on the first one and you will see the following screen:
This is a typical application. All of the technologies that make it possible can be summed up as BI.
BI is a concept at the level of an enterprise department. To carry out specific work, we have to look at the technology that implements it.
The Amazon book purchase case above reflects one BI application concept, namely recommendation. To see how this application is implemented, we need to go into the technical details. Specific engineers are assigned to each detail, and each engineer handles the tasks involved in that part of the problem.
(Image from Google, all rights reserved)
Buying books on Amazon is a popular thing to do, and it gives Amazon a wealth of data on users’ “trajectories”. By recording this behavior data, Amazon builds a data warehouse that stores your book purchases, mine, and everyone else’s in a table, and uses SQL to store each combination of books bought together as a “special purchase”. When other people buy a book we have bought, they are offered recommendations drawn from these previously stored “special purchases”.
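The idea of storing purchase combinations and reusing them for recommendations can be sketched roughly as follows. This is only a minimal illustration, not how Amazon actually does it: the table names `purchases` and `co_purchases`, the helper functions, the user names, and the book titles are all made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical purchase log: one row per (user, book). All names are invented.
SAMPLE_PURCHASES = [
    ("alice", "Data Warehouse Basics"),
    ("alice", "ETL in Practice"),
    ("bob", "Data Warehouse Basics"),
    ("bob", "ETL in Practice"),
    ("bob", "Dimensional Modeling"),
    ("carol", "Data Warehouse Basics"),
    ("carol", "Dimensional Modeling"),
    ("dave", "ETL in Practice"),
    ("erin", "Data Warehouse Basics"),
    ("erin", "ETL in Practice"),
]


def build_warehouse(rows):
    """Load the purchase log and precompute which books are bought together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id TEXT, book TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    # Self-join: pair every book with every other book bought by the same user,
    # then count how many users bought each pair.
    conn.execute(
        """
        CREATE TABLE co_purchases AS
        SELECT a.book AS book, b.book AS also_bought, COUNT(*) AS buyers
        FROM purchases AS a
        JOIN purchases AS b
          ON a.user_id = b.user_id AND a.book <> b.book
        GROUP BY a.book, b.book
        """
    )
    return conn


def recommend(conn, book, limit=3):
    """Return the books most often bought together with `book`."""
    rows = conn.execute(
        "SELECT also_bought FROM co_purchases WHERE book = ? "
        "ORDER BY buyers DESC, also_bought ASC LIMIT ?",
        (book, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    warehouse = build_warehouse(SAMPLE_PURCHASES)
    print(recommend(warehouse, "Data Warehouse Basics"))
```

The stored `co_purchases` table plays the role of the “special purchases”: it is computed once from the raw log and then looked up whenever someone views a book.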
Everyone responsible for technical implementation in the diagram can be called a Data Engineer. More precisely, those from ETL rightward are called BI people. In more detail, the person who implements ETL is called an ETL engineer, the designer of the Data Warehouse model is called a Data Modeler, and the person responsible for visual design is called a BI Reporter.
Early BI implementations relied solely on data warehouses.
To talk about implementing a data warehouse, we need to mention two people, Inmon and Kimball. The data warehouse theories they put forward largely laid the foundation for how our era understands the data warehouse.
(Image from Google, all rights reserved)
Inmon’s Data Warehouse theory leans toward a large, unified Data Warehouse, while Kimball leans toward subject-oriented Data Marts, advocating that the data in a Data Mart be fed back to applications to serve the business. The recommendation system described earlier is a prototypical Data Mart application.
Modern BI implementations are much richer: small-data approaches, big-data approaches, statistical reports, ad hoc queries, distributed computing, stream computing, recommendation systems, knowledge graphs, NLP, voice interaction, pattern recognition. So data mining engineers, data scientists, and data analysts have joined the roster, and those responsible for implementing big data technology are called big data engineers.
Modern BI is segmented mainly because of the growth in distributed applications brought about by the data explosion.
With big data giving BI technology a second spring, implementation architectures are becoming more and more complex. I will use the commonly used **Lambda Architecture** to explain.
Around 2010, big data began to emerge with Hadoop. At that time, data warehouse technology was still going strong, and there was no plan to replace it for the time being, because the earlier investment and its own value were still paying off. However, new data requirements forced the adoption of new computing architectures, so many companies combined traditional data warehouse technology with distributed architectures based on Hadoop. Roughly as follows:
(Image from the internet, all rights reserved)
The pink part shows the emerging data processing technologies (mainly Hadoop, Spark, Hive, Kafka, Machine Learning, etc.), while the green part shows conventional data applications.
In the post-modern BI architecture, as Hadoop and related technologies matured, traditional data warehouse technology was gradually replaced and everything moved to distributed storage and computing, forming the Lambda Architecture. That is, batch processing is used for part of the data, while real-time (Storm, Flink) and near-real-time (Spark) processing are used for the rest, to meet data timeliness requirements.
(Image from Google. All rights reserved.)
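The split between batch and real-time processing can be sketched in a few lines. This is a toy, single-process illustration under my own assumptions, not a production design: the class `LambdaViews` and its method names are invented for the example, and plain in-memory counters stand in for what would really be a batch job (for example on Hadoop or Spark) and a streaming job (for example on Storm or Flink).

```python
from collections import Counter


class LambdaViews:
    """Toy Lambda Architecture: a batch view plus a real-time view, merged at query time."""

    def __init__(self):
        self.master_log = []         # append-only record of every event ever received
        self.batch_view = Counter()  # rebuilt from the full log by the batch layer
        self.speed_view = Counter()  # incremental counts for events since the last batch run

    def ingest(self, book):
        """A new purchase event goes to the master log and to the real-time view."""
        self.master_log.append(book)
        self.speed_view[book] += 1

    def run_batch(self):
        """Batch layer: recompute the view from scratch over the whole log."""
        self.batch_view = Counter(self.master_log)
        # Everything ingested so far is now covered by the batch view,
        # so the real-time view can start again from empty.
        self.speed_view.clear()

    def query(self, book):
        """Serving layer: answer with batch result plus recent real-time increments."""
        return self.batch_view[book] + self.speed_view[book]


if __name__ == "__main__":
    views = LambdaViews()
    for title in ["Book A", "Book A", "Book B"]:
        views.ingest(title)
    views.run_batch()        # slow but complete
    views.ingest("Book A")   # arrives after the batch run
    print(views.query("Book A"))
```

The point of the design is that the batch result is complete but stale, the real-time result is fresh but partial, and a query combines the two so the answer is both complete and up to date.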
It can be seen that the one moving the data is still ETL. I think this will remain a popular position for a long time, and it is even better if you can build the whole data application. That is the much sought-after Data Architect.
Artificial Intelligence (AI)
By extension, AI technology is also used in BI, as in the Amazon books case. The following is Gartner’s 2018 Hype Cycle for artificial intelligence technologies:
Machine Learning, NLP, and Intelligent Applications can all serve BI. As a result, the boundaries of BI are getting harder and harder to draw, and more and more positions are simply called Data Engineer or Data Scientist.
This is just my personal opinion. If you disagree, please be kind; discussion is welcome.