As demand for big data skills has grown, high salaries and good benefits have prompted many people to study for, or switch into, the big data industry. What is the point of learning big data? To become a senior big data engineer. And what lets big data engineers command high salaries? Project experience, of course. Here are 12 big data projects of the kind used for Alibaba’s “Double 11”, “Double 12” and “Double Dan” promotions, the upcoming “618”, and Tencent’s big data work:
A big data analysis project is typically divided among the following groups:
Information collection group, data cleaning group, data fusion group, data mining group, data visualization group.
The role of each group is largely evident from its name.
The information collection group mainly collects data with web crawlers. It can also collect data in other ways, depending on business requirements.
The data cleaning group mainly finds invalid, dirty data and removes or replaces it. The task is very large, because crawled data contains a great deal of dirty data, so this group's working cycle is generally long and its workload heavy.
The data fusion group mainly classifies similar course information, and arranges items that have a superior and subordinate relationship as subclasses and parent classes. This group's work is very hard to complete. The fusion results we achieved were not good, so fusing data well remains a difficult point.
The data mining group applies data mining algorithms to the available data to uncover cause-and-effect relationships between influencing factors. The main classification algorithms are decision trees, Bayesian classification, rule-based classification, neural networks, support vector machines (SVM), and lazy learning algorithms such as k-nearest neighbor classification and case-based reasoning.
As the name implies, the data visualization group displays the results of the data mining group visually, so that you can see the relationships in the data intuitively. It uses data analysis and development tools to surface previously unknown information.
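To make one of the classification algorithms named above concrete, here is a minimal sketch of k-nearest neighbor classification in plain Python. It is only an illustration: the function name `knn_predict`, the toy features (hours of video, number of exercises), and the labels are assumptions, not taken from any of the projects described here.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    query: tuple of numbers with the same length as the training features.
    """
    # "Lazy" learning: there is no training step, all work happens at query time.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hours of video, number of exercises) -> course level.
train = [
    ((1.0, 2.0), "beginner"),
    ((1.5, 1.0), "beginner"),
    ((2.0, 2.5), "beginner"),
    ((8.0, 9.0), "advanced"),
    ((9.0, 8.5), "advanced"),
    ((7.5, 9.5), "advanced"),
]

print(knn_predict(train, (1.2, 1.8)))
print(knn_predict(train, (8.5, 9.0)))
```

Because the classifier keeps the whole training set and sorts it on every query, this form only suits small data. On large data sets a production system would more likely rely on an indexed or distributed implementation.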
Now that the composition of a big data project is clear, how do you deliver one successfully?
Signs of a successful project
Success is often the reverse of failure:
First, the project use case (goal/utility value) is clear.
From top to bottom, everyone knows what the big data project is for. The specific business units know it: if the project is used in the marketing department, for example, then that department understands it clearly, as do people such as the corporate finance director and the technology department responsible for implementation. Getting this clear up front matters a great deal to everyone involved in completing the project.
Second, the project is well planned and advances through rapid, iterative R&D with trial and error.
When you plan a project, don’t plan it the old-fashioned way, in blocks of three or six months, only to test at the end of the first phase and find that it doesn’t deliver what you wanted. A big project should be built in fast iterations: each week, launch one feature and test it quickly. If it passes both internal and external market tests, move on the following week to developing, extending, and promoting the next feature. This makes trial and error fast and cheap. If, say, the direction taken in the second week turns out to be wrong, or a feature can’t be implemented, or the design isn’t what was intended, the error surfaces right away rather than six months later, and the direction, the development content, or the features can be adjusted by the third week. After four or five weeks of such testing and development, the probability of a serious mistake three months in is much lower.
Third, the selected technology meets the functional requirements of the big data project.
Many people have heard that a big data project must use some special technology. But the most important thing in a big data project is not choosing the most impressive platform or a special technology. It is choosing a technology that fits the business functions as originally designed. That technology may be relatively simple, such as SAS software or a Java program. It doesn’t need to be high tech, as long as it meets your requirements. Many enterprises choose high-end products, only to find that they have spent a lot of money without meeting the expected requirements. High-end products complicate integration and raise the amount of data required, so the budget and costs grow and profitability becomes hard to achieve. So pick a technology that fits the goals of your project.
Fourth, the project team has all aspects of professional knowledge and skills.
As with any innovative product or project in an enterprise, a big data project needs to bring in everyone who can contribute to it. The resources involved may include human resources, technical resources, market resources, operations, and more, and together they form one team. The team then has the support of leadership and a shared consensus, front-line staff also know what they are doing, and both coordination and expertise are in place.
Fifth, the project delivers the results the business use case expected. After the project has run for three or six months and produces results, whether the business use case achieves the expected outcome is a very important indicator. It is often hard to reach one hundred percent of the expected result. Reaching 80% of what was expected is very good, and even 50% is good, because this is an innovative project that can keep being adjusted toward the desired outcome. The worst case is reaching only 20%, which is where many enterprises’ projects end up, as the statistics show. By industry standards, 50 percent is considered successful, and 80 percent is pretty good.
Metrics for successful big data projects
There are five criteria for measuring a successful project:
First, the project achieves or approaches its predetermined goal within the predetermined time.
Second, the project or product delivers distinctive internal and external business value that traditional data methods cannot.
Third, with limited big data investment, the benefits brought to one business area can easily be replicated in others. For example, a success achieved in the marketing department can be extended to the product R&D department or to the business operations department. In this way, more gets done at a small cost.
Fourth, the business departments that benefit from big data can use big data tools to work efficiently and conveniently. This is the most direct criterion, because the original purpose of a big data product or service project is to improve operational and work efficiency.
Fifth, through the project, the enterprise gains a new business model and new growth points, which is the most important criterion. From a strategic point of view, the big data product or project has then successfully driven the transformation and upgrading of the enterprise.
A roadmap for successful big data projects
The road map to big data success is divided into six steps:
Step 1: Identify big data use cases and innovation directions that will have a significant impact on your business.
Step 2: Develop a detailed product and service innovation plan based on big data projects.
Step 3: Understand in detail the business function requirements required by the big data project and select the technology to match them.
Step 4: Reach an internal consensus on the business benefits of big data projects.
Step 5: Choose goals that are easy to achieve, and advance steadily through rapid, iterative R&D and trial and error. In other words, don’t start out with a grand, ambitious project. Such projects fail almost every time: the budget is too large, the chosen tools are too complicated, and too many resources are tied up, so it is hard to reach the goal all at once. Instead, start from a known, easily achieved goal, which also builds morale. Most importantly, mistakes should be made in the early stages of development, not in the middle or at the end.
Step 6: Big data projects and products must uncover and deliver the special value that only big data can bring, value that other methods or other kinds of data cannot provide. Only by realizing this value can we deliver the specific functions the business needs, whether that is expanding market share, understanding customers’ requirements more accurately, increasing margins, or speeding up time to market and shortening the development cycle. Another kind of value is cross-industry innovation: through big data, traditional enterprises can combine their business with that of other enterprises.
Here is a list of 12 projects that cover a wide range of areas.
The following projects come with videos of their construction and design. Readers with some big data foundation and work experience can build each whole project by following the videos. Very practical! Readers who want the videos can join the editor’s Java and big data discussion group (615997810) and ask the group owner for them. Here is a brief introduction to the main content and focus areas of these 12 projects:
1. Offline data processing: This project collects and cleans website access logs and combines them with structured user data from the database to compute and display the site’s PV (page views) and UV (unique visitors), so that the operation of the website can be monitored. Through this project, you review and connect the offline data processing technologies covered earlier, such as Flume, Sqoop, Hive, and Spark, and come to understand the general process and architecture of PB-level offline data processing.
2. Streaming data processing: This project monitors the website’s transactions in real time by synchronizing changes to transaction data in the database as they happen, which makes transaction monitoring more timely and reduces the site’s operational risk. Through this project, you review and connect the real-time data processing technologies covered earlier, such as Kafka, Spark Streaming, and HBase, and come to understand the general process and architecture of real-time data processing.
3. Recommendation system: This project covers product recommendation based on a public dataset and an analysis of the product recommendation system of a large internet finance company. By analyzing the company’s actual recommendation project and building a recommendation system on real data, you come to understand the general architecture and common algorithms of recommendation systems.
4. Search system: A web crawler collects website data, and a complete search system is then built on Elasticsearch and Kibana.
5. System operation dashboard: This project collects and cleans website access logs and combines them with structured user data from the database to compute and display the site’s PV and UV, so that the operation of the website can be monitored. Through this project, you review and connect the offline data processing technologies covered earlier, such as Flume, Sqoop, Hive, and Spark, and master the general process and architecture of PB-level offline data processing.
6. Real-time transaction monitoring system: This project monitors the website’s transactions in real time by synchronizing changes to transaction data in the database as they happen, which makes transaction monitoring more timely and reduces the site’s operational risk. Through this project, you review and connect the real-time data processing technologies covered earlier, such as Kafka, Spark Streaming, and HBase, and master the general process and architecture of real-time data processing.
7. Recommendation system theory and practice: Explains the background, common algorithms, and general architecture of recommendation systems, then builds a movie recommendation system from scratch on an open dataset. By analyzing a company’s actual recommendation project and building a recommendation system on real data, you come to understand the general architecture and common algorithms of recommendation systems.
8. Data warehouse construction theory and practice: Explains the methodology of data warehouse construction and common modeling theory. Starting from the data warehouse scenario of an internet finance company, examples demonstrate the process and technical architecture of building a data warehouse.
9. Distributed business monitoring system: Explains the requirements behind a business monitoring system and the technical solutions based on big data, then builds a complete business monitoring system from example code.
10. Log system based on ES: Builds a system for collecting and querying system logs on Flume, Elasticsearch, and related technologies.
11. Credit demand prediction system: Using the JD.com (Jingdong) credit demand prediction competition as background, examples explain how to design features, build a baseline model, and carry out modeling and parameter tuning in a data mining project.
12. User portrait (user profile) system: Explains the requirements behind a user portrait system and the solution based on big data technology, then demonstrates the construction of a user portrait system through example code.
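The PV and UV statistics mentioned in the offline processing and dashboard projects can be sketched in a few lines of plain Python. This is only a small-scale illustration under stated assumptions: the log format (`day user_id url`, separated by spaces), the function name `pv_uv_by_day`, and the sample records are all hypothetical. At PB scale the same aggregation would typically be expressed in Hive or Spark as a per-day count of rows and count of distinct users.

```python
from collections import defaultdict


def pv_uv_by_day(log_lines):
    """Return {day: (pv, uv)} from lines of the assumed form 'day user_id url'."""
    page_views = defaultdict(int)   # day -> number of page views (PV)
    visitors = defaultdict(set)     # day -> set of distinct user ids (for UV)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # cleaning step: skip malformed ("dirty") records
        day, user_id, _url = parts
        page_views[day] += 1
        visitors[day].add(user_id)
    return {day: (page_views[day], len(visitors[day])) for day in page_views}


# Hypothetical sample access log.
sample = [
    "2024-06-18 u1 /home",
    "2024-06-18 u1 /cart",
    "2024-06-18 u2 /home",
    "this record is malformed and gets dropped",
    "2024-06-19 u1 /home",
]

for day, (pv, uv) in sorted(pv_uv_by_day(sample).items()):
    print(day, "PV:", pv, "UV:", uv)
```

The sketch counts every well-formed record as one page view and counts each user once per day, which is one common definition of daily UV. A real dashboard would need to settle how users are identified (login id, cookie, or IP) before the numbers mean anything.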
Doing a project does take a lot of time. While you work hard, please also look after your health, since the body is the capital of the revolution. I wish everyone good health and a successful career. Thank you for following along. If you want the specific content of the projects, join the discussion group and ask the group manager for access: 615997810