The data analysis process consists of five key stages: clarifying the purpose of the analysis, clarifying the data source and data caliber, data processing, data analysis, and output.

1. Clarify the purpose of data analysis

Everything we do has a purpose, and data analysis is no exception. Before starting, first make clear why the analysis is being done. The purpose is determined by breaking the task down into users, requirements, and scenarios.

1. The user

The user here means the audience for the analysis content or results. The audience falls into three categories: yourself, internal business departments, and external customers. The latter two are the focus here.

Internal business departments:

These users usually devise strategies to improve particular indicators of the enterprise, whether they sit in the marketing, operations, or maintenance department. Their work often leads the enterprise to accumulate a large amount of data, but they do not know how to use it or how to turn it into effective decisions through data analysis.

External customers:

These users typically lack data for one or more areas of their industry and hope to use such data to understand their own users or market, and your business happens to hold that data. In this case the value of the data is realized through external output: the analysis helps external users understand the market better, and it also lets you monetize the data and bring revenue to the enterprise.

2. Requirements

Requirements are what your users ask for: why do they want the analysis done? Do they hope to uncover a problem through the data, or to improve a particular business indicator? All of this needs to be known before the analysis starts, because only by understanding the requirement can you choose a reasonable analysis method (analysis methods are introduced later).

3. Scenarios

Scenarios here mean the concrete situations in which the analysis is needed. For example, a business department wants to know why users drop out during the registration process; that is the scenario of the problem. The problem should be defined according to the scenario, after which the analysis approach can be sorted out and the analysis methods selected.
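The registration drop-out scenario above can be made concrete with a small funnel calculation. This is only a sketch: the step names and user counts are invented for illustration, and `funnel_dropoff` is a hypothetical helper, not part of any tool mentioned in this article.

```python
def funnel_dropoff(step_counts):
    """Return (from_step, to_step, drop_rate) for each pair of consecutive steps.

    step_counts is an ordered list of (step_name, users_reaching_step).
    """
    results = []
    for (prev_name, prev_n), (next_name, next_n) in zip(step_counts, step_counts[1:]):
        # Share of users who reached the previous step but not the next one.
        drop_rate = 1 - next_n / prev_n if prev_n else 0.0
        results.append((prev_name, next_name, round(drop_rate, 3)))
    return results


# Hypothetical registration funnel; step names and counts are made up.
registration_funnel = [
    ("open_signup_page", 1000),
    ("enter_phone_number", 600),
    ("verify_sms_code", 540),
    ("complete_profile", 270),
]

dropoffs = funnel_dropoff(registration_funnel)
# The transition with the highest drop rate is the first place to investigate.
worst_step = max(dropoffs, key=lambda row: row[2])
```

Framing the scenario this way turns "why are users lost during registration" into a narrower question about one specific transition in the funnel.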

2. Clarify the data source and data caliber

1. Data source

There are three main ways to obtain data. The first is to use a data collection tool that works on front-end pages, such as GrowingIO and other visual data collection products. The second is to build event tracking (tracking points) into the product during design, so that data can simply be extracted when it is needed; this requires that data collection was planned for at the product planning stage. The third, when no tracking points or visual collection tools were set up early on and the data cannot otherwise be obtained, is to ask the development team to retrieve it through back-end scripts or custom development.

2. Data caliber

Data caliber refers to the definition of a data indicator. As a simple example, the definition of churn varies from product to product and from domain to domain. For ordinary e-commerce products, users who do not log in or purchase within three days are regarded as churned. For luxury e-commerce, however, the number of days of inactivity that counts as churn is set differently.

To clarify the data caliber, combine the requirements of whoever proposed the analysis task with the specific business scenario. A clearly defined data caliber is of key importance for the subsequent data processing and data analysis.
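One way to keep a caliber explicit is to write it down as a parameter rather than leaving it implicit in a query. The sketch below assumes churn is defined purely by days of inactivity. The 3-day window follows the ordinary e-commerce example in the text; the 30-day window for luxury e-commerce is a made-up value used only to show that the same user can be classified differently under different calibers.

```python
from datetime import date


def is_churned(last_active, today, window_days):
    """A user counts as churned if inactive for more than window_days."""
    return (today - last_active).days > window_days


# 3 days comes from the article's example; 30 days is a hypothetical value.
CHURN_WINDOW_DAYS = {"ordinary_ecommerce": 3, "luxury_ecommerce": 30}

today = date(2024, 1, 31)
last_active = date(2024, 1, 21)  # 10 days of inactivity

# Same user, same activity record, different result per caliber.
churn_by_caliber = {
    product: is_churned(last_active, today, window)
    for product, window in CHURN_WINDOW_DAYS.items()
}
```

Agreeing on such a table with the person who requested the analysis avoids arguments later about what a reported churn rate actually means.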

3. Data processing

The main work of the data processing stage is data cleaning, data completion, and data integration.

1. Data cleaning

Data cleaning means finding outliers in the data. For example, when processing daily user login data, if the number of logins on one day is far above normal, you need to analyze whether there was a major marketing campaign that day or whether something went wrong in data collection. Outliers can reveal problems with the collection method, and sometimes the outliers are themselves the goal of the analysis: credit card fraud, for example, is detected by looking for abnormal data.
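As an illustration of flagging an unusual login day, the sketch below uses a modified z-score based on the median absolute deviation (MAD). This is one common rule among many and is not prescribed by the article; the login counts, the 3.5 threshold, and the function name `find_outlier_days` are all assumptions made for the example.

```python
from statistics import median


def find_outlier_days(daily_logins, threshold=3.5):
    """Flag days whose login count is far from the median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    which is less distorted by the outliers themselves than mean/stddev.
    """
    counts = list(daily_logins.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # at least half the values equal the median; nothing to scale by
    return [
        day
        for day, count in daily_logins.items()
        if 0.6745 * abs(count - med) / mad > threshold
    ]


# Hypothetical daily login counts.
daily_logins = {
    "2024-03-01": 1020,
    "2024-03-02": 980,
    "2024-03-03": 1005,
    "2024-03-04": 995,
    "2024-03-05": 4100,
    "2024-03-06": 1010,
    "2024-03-07": 990,
}

outlier_days = find_outlier_days(daily_logins)
```

A flagged day is only a prompt for investigation: it still has to be checked against the campaign calendar and the collection logs before deciding whether to keep or correct it.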

2. Data completion

There are two common ways to deal with missing data. One is to fill in an average value based on the records before and after the gap; the other is to discard the record and exclude it from the analysis. Each has advantages and disadvantages, and the choice should be made case by case.
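The two options can be compared side by side on a small series. This is a minimal sketch, assuming missing values are represented as `None` and that "the average according to the data before and after" means the mean of the nearest known values on each side; the order counts are invented.

```python
def fill_with_neighbor_mean(values):
    """Replace each None with the mean of the nearest known values around it."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest known value before and after position i in the original data.
        before = next((x for x in reversed(values[:i]) if x is not None), None)
        after = next((x for x in values[i + 1:] if x is not None), None)
        neighbors = [x for x in (before, after) if x is not None]
        if neighbors:
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


def drop_missing(values):
    """Discard missing records entirely."""
    return [v for v in values if v is not None]


# Hypothetical daily order counts with gaps.
daily_orders = [120, None, 130, 125, None, None, 140]

filled = fill_with_neighbor_mean(daily_orders)
dropped = drop_missing(daily_orders)
```

Filling keeps the series length intact, which matters for time-based analysis, but it invents values; dropping keeps only observed data but shrinks the sample and can break the time axis.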

3. Data integration

Different types of collected data may be potentially correlated. Through data integration, the dimensions of the data can be enriched and more valuable information can be found. For example, if user registration data is associated with user purchase data, the user’s basic attribute information can help judge whether the goods they buy are for their own use or are gifts.
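The registration and purchase example can be sketched as a join on a shared user ID. Everything below is hypothetical: the field names, the sample records, and especially the "likely gift" rule, which is a deliberately crude heuristic used only to show how joined attributes make a new question answerable.

```python
# Hypothetical registration data keyed by user ID.
registrations = {
    "u1": {"gender": "male", "age": 34},
    "u2": {"gender": "female", "age": 27},
}

# Hypothetical purchase records; target_gender is an assumed product attribute.
purchases = [
    {"user_id": "u1", "item": "lipstick", "target_gender": "female"},
    {"user_id": "u1", "item": "razor", "target_gender": "male"},
    {"user_id": "u2", "item": "handbag", "target_gender": "female"},
    {"user_id": "u3", "item": "watch", "target_gender": "male"},
]


def integrate(registrations, purchases):
    """Left-join purchases to registration attributes on user_id."""
    joined = []
    for p in purchases:
        profile = registrations.get(p["user_id"])
        row = dict(p)
        row["buyer_gender"] = profile["gender"] if profile else None
        if profile is None:
            # No registration record: the question cannot be answered.
            row["likely_gift"] = None
        else:
            # Crude illustrative rule: a product aimed at another gender
            # than the buyer's is guessed to be a gift.
            row["likely_gift"] = profile["gender"] != p["target_gender"]
        joined.append(row)
    return joined


joined = integrate(registrations, purchases)
likely_gifts = [row["item"] for row in joined if row["likely_gift"]]
```

Keeping unmatched purchases in the result (here, user `u3`) makes gaps in the registration data visible instead of silently dropping them.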

4. Data analysis

Data analysis approaches are also called data analysis methods. Data analysis must be goal-oriented, and the method is chosen according to the purpose. Generally speaking, there are the following kinds of analysis approach:

1. Anomaly analysis

Analyze data to find abnormal situations and find solutions to abnormal problems.

2. Look for relationships

Association analysis is also called shopping cart (market basket) analysis. The familiar Wal-Mart case of diapers and beer is the best-known example of association. By analyzing the relationship between different products or different behaviors, users’ habits can be discovered.
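Association between two products is usually quantified with support, confidence, and lift. The sketch below computes these three standard measures for one pair of items; the baskets are invented and carry no real sales data, and the function assumes both items appear at least once.

```python
def association_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule a -> b.

    Assumes both a and b occur in at least one basket.
    """
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    support = count_ab / n              # share of baskets containing both
    confidence = count_ab / count_a     # share of a-baskets that also contain b
    lift = confidence / (count_b / n)   # > 1 means bought together more than chance
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence, lift = association_metrics(baskets, "diapers", "beer")
```

A lift only slightly above 1 on a handful of baskets, as here, would not justify any decision; the measures become meaningful on large transaction volumes.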

3. Classify and stratify

Classify and stratify users based on their characteristics and behaviors, so as to enable refined operations and accurate business recommendations, and further improve operating efficiency and conversion rates.
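A simple rule-based stratification might look like the following. The tier names and every threshold (5 orders, 30 days, 90 days) are assumptions chosen for illustration; in practice they would come from the agreed data caliber and the business goal.

```python
def assign_tier(order_count, days_since_last_order):
    """Assign a user to a tier using hypothetical frequency/recency thresholds."""
    if order_count >= 5 and days_since_last_order <= 30:
        return "high_value"
    if order_count >= 1 and days_since_last_order <= 90:
        return "active"
    if order_count >= 1:
        return "lapsed"
    return "never_purchased"


# Hypothetical users: user_id -> (order_count, days_since_last_order).
users = {
    "u1": (8, 5),
    "u2": (2, 45),
    "u3": (6, 200),
    "u4": (0, 0),
}

# Group user IDs by tier so each group can receive its own operating strategy.
tiers = {}
for user_id, (orders, recency) in users.items():
    tiers.setdefault(assign_tier(orders, recency), []).append(user_id)
```

Explicit rules like these are easy to explain to a business department, which is often more useful than a more sophisticated segmentation nobody can interpret.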

4. Forecasting

Predict users’ future behavior from their historical behavior in order to improve user perception and user experience.

5. Output

As mentioned earlier, the purpose of data analysis is to gain a clear understanding of users, products, and the current state of the business through data, so as to reach effective strategic decisions and guide the next step of development.

How can data provide a clear understanding of users, products, and the state of the business? Business departments and external customers cannot intuitively grasp the meaning behind the data from row after row of dry numbers. Data visualization is therefore needed, which simply means converting rows of data into charts that show the trends in the data and the associations within it. When visualizing, consider how many dimensions of data, and which data, should be presented to the audience, since both affect the form the visualization takes.

For example, a pie chart can show the gender ratio of registered users, a line chart can show the growth of registered users over time, and a bar chart or map can show where registered users are located. When choosing a visualization, fully consider the characteristics of the data and what the chart is meant to show, so that the analysis results are presented in the most intuitive way.

In addition, the output of data analysis is usually presented in the form of a data analysis report. The main structure of a data analysis report is as follows:

Background of the data analysis

Data sources and data description

Data analysis method

Data visualization

Decisions based on the data

The above is the general framework of a relatively formal data analysis report. For day-to-day reports that do not need to be formal, the structure can be adapted case by case.

6. Summary

Data analysis methodology must serve to guide concrete work, so mastering the methodology alone is not enough; the methods have to be continually improved and optimized through practice. Only when you actually do data analysis do you discover your weaknesses. The best approach is simply to start doing it.