Background of data analysis
With computer technology now woven into everyday life, the volume of network data has grown explosively, ushering in the era of big data.
This raises a question: with so much data sitting in databases, how can the valuable parts be extracted quickly?
Data analysis can obtain hidden valuable information from massive data to help enterprises or individuals predict future trends and behaviors.
The bottom line: No matter what industry you’re in, mastering data analytics tends to make you more competitive.
What is data analysis
Data analysis is the process of examining large amounts of collected data with appropriate statistical methods, extracting useful information from it, forming conclusions, and then studying and summarizing the data in detail.
The purpose of data analysis
To extract the useful information hidden in large volumes of seemingly chaotic data, and so uncover the underlying patterns of the subject being studied.
Classification of data analysis
- Descriptive data analysis: summarizes a data set by describing its central tendency and dispersion.
- Exploratory data analysis: looks for patterns in large volumes of data and generates analytical models and research hypotheses.
- Confirmatory data analysis: verifies whether the conditions required for scientific hypothesis testing are met, to ensure the reliability of the confirmatory analysis.
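To make the descriptive category concrete, here is a minimal sketch that summarizes central tendency and dispersion with Python's standard `statistics` module. The `daily_orders` values are made up purely for illustration.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
daily_orders = [12, 15, 11, 15, 20, 18, 14]

# Central tendency: where the values cluster.
mean_orders = statistics.mean(daily_orders)
median_orders = statistics.median(daily_orders)
mode_orders = statistics.mode(daily_orders)

# Dispersion: how widely the values spread.
value_range = max(daily_orders) - min(daily_orders)
sample_stdev = statistics.stdev(daily_orders)

print(f"mean={mean_orders}, median={median_orders}, mode={mode_orders}")
print(f"range={value_range}, stdev={sample_stdev:.2f}")
```

In practice, libraries such as Pandas offer the same summaries for whole tables, but the idea is the same.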
Application scenarios of data analysis
Marketing applications
Collect consumer information through membership cards in order to study buying habits further and identify valuable target groups.
Medical applications
By recording and analyzing a baby’s heartbeat, doctors can monitor premature and sick babies, and predict what symptoms a baby’s body is likely to show, which can help doctors better care for babies.
Network security applications
The new virus defense system can use data analysis technology to establish a potential attack identification and analysis model, monitor a large amount of network activity data and corresponding access behavior, and identify suspicious patterns of possible intrusion.
Transportation and logistics applications
Users can obtain data from business systems and GPS positioning systems and use it to build a traffic prediction and analysis model. Such a model can effectively predict real-time road conditions, logistics conditions, traffic flow, and cargo throughput, which makes it possible to replenish goods in advance and to formulate inventory management strategies.
The process of data analysis
Data analysis can be roughly divided into the following five stages:
Define the purpose and approach | Data collection | Data processing | Data analysis | Data presentation |
---|---|---|---|---|
Decide which business problem is being solved | Collect and integrate data | Clean, transform, and organize the data | Explore and analyze the data | Present the results of the analysis in charts |
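The five stages above can be sketched end to end with the standard library alone. This is only an illustrative outline, not a prescribed workflow: the business question, the inline CSV rows, and the function names `collect`, `clean`, `analyze`, and `show` are all assumptions made for the example.

```python
import csv
import io
import statistics

# Stage 1 - purpose: hypothetical question,
# "what is the average order amount per city?"

# Stage 2 - collection: an inline CSV stands in for a real data source
# (made-up rows; one amount is deliberately missing).
RAW_CSV = """city,amount
Beijing,120
Shanghai,95
Beijing,
Shanghai,115
Beijing,80
"""


def collect(text):
    """Read the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Stage 3 - processing: drop rows with a missing amount and convert types.
def clean(rows):
    return [
        {"city": row["city"], "amount": float(row["amount"])}
        for row in rows
        if row["amount"].strip()
    ]


# Stage 4 - analysis: average amount per city.
def analyze(rows):
    amounts_by_city = {}
    for row in rows:
        amounts_by_city.setdefault(row["city"], []).append(row["amount"])
    return {city: statistics.mean(vals) for city, vals in amounts_by_city.items()}


# Stage 5 - presentation: a plain-text bar chart instead of a plotting library.
def show(averages):
    lines = []
    for city, avg in sorted(averages.items()):
        lines.append(f"{city:<10}{'#' * round(avg / 10)} {avg:.1f}")
    return "\n".join(lines)


averages = analyze(clean(collect(RAW_CSV)))
print(show(averages))
```

A real project would typically replace the inline CSV with a database or file, and the text chart with a plotting library.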
Why Python was chosen for data analysis
The main reasons for choosing Python for data analysis are the following:
- Simple, concise syntax that is suitable for beginners
- A large and active scientific computing community
- Strong general-purpose programming capabilities
- The common language of the artificial intelligence era
- Easy to integrate with other languages
Data analysis environment
Here we use Anaconda’s Python environment.
We recommend that beginners in data analysis install Anaconda.
Anaconda is a Python distribution that makes it easy to obtain and manage packages and to manage environments in a unified way. Its main characteristics are:
- Includes many popular Python libraries for science, mathematics, engineering, and data analysis
- Fully open source and free
- A free license can be requested for academic use
- Cross-platform: supports Linux, Windows, and Mac OS X
Install Anaconda in Windows
Anaconda can be downloaded from either of the following addresses:
Download website: www.anaconda.com/download/
Tsinghua mirror station download: mirrors.tuna.tsinghua.edu.cn/anaconda/ar…
During installation, click Next at each step to accept the default installation path.
After the installation is complete, open “Start Menu” -> “All Programs” in the lower left corner of the system and find the Anaconda3 folder. This directory contains multiple components.
The home page of Anaconda Navigator is shown in the figure below.
Manage Python packages through Anaconda
Anaconda bundles the commonly used extension packages and makes them easy to manage, for example by installing and uninstalling them. This management relies on Conda.
Conda is an open source package management system and environment management system running on Windows, Mac OS, and Linux that allows you to quickly install, run, and update software packages and their dependencies.
- In Windows, you can run the following command in the Anaconda Prompt to check whether Conda is installed. This assumes the environment variables are configured: the path of the Scripts directory under the Anaconda installation directory must be added to the PATH environment variable.
conda --version
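The same check can be scripted. The sketch below is one possible way to do it from Python, using only the standard library; the function name `conda_version` is an assumption for this example, and it simply reports `None` when no `conda` executable is found on PATH.

```python
import shutil
import subprocess


def conda_version():
    """Return the text printed by `conda --version`, or None if conda is not on PATH."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Fall back to stderr in case a release prints the version there.
    return (result.stdout or result.stderr).strip()


version = conda_version()
print(version if version else "conda was not found on PATH")
```

If the script reports that conda was not found, the PATH configuration described above is the first thing to check.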
To learn quickly how to manage packages with the conda command, you can view the help text by typing ‘conda -h’ or ‘conda --help’ in the Anaconda Prompt.
- You can use the list command to obtain information about the packages installed in the current environment. After the command is executed, the name and version of each installed package are displayed in the terminal.
conda list
- Use the search command to find packages available for installation.
conda search --full-name package_full_name
In the above command, --full-name requests an exact match and is followed by the full name of the package (shown here as package_full_name).
- To install a package into a specific environment, name that environment explicitly with the --name option of the install command.
conda install --name env_name package_name
In the preceding command, env_name indicates the name of the installation environment, and package_name indicates the name of the package to be installed.
- To uninstall a package from a specific environment, use the remove command with the --name option.
conda remove --name env_name package_name
To uninstall a package from the current environment instead, run the remove command without the --name option.
- To update all packages in the current environment, run the following command:
conda update --all
- To update only one or several packages, add the package names after the update command, separated by spaces.
conda update pandas numpy matplotlib
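When package information is needed inside a script rather than on screen, `conda list --json` prints the same listing as JSON. The following is a minimal sketch of reading that output; `SAMPLE_JSON` is made-up data in the same general shape (a list of objects with `name` and `version` keys), and `parse_conda_list` is a hypothetical helper name.

```python
import json

# Made-up sample in the general shape of `conda list --json` output:
# a JSON array of objects with at least "name" and "version" keys.
# The package versions here are illustrative, not real listings.
SAMPLE_JSON = """
[
  {"name": "numpy", "version": "1.0.0", "channel": "defaults"},
  {"name": "pandas", "version": "2.0.0", "channel": "defaults"}
]
"""


def parse_conda_list(json_text):
    """Map each package name to its version string."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(json_text)}


packages = parse_conda_list(SAMPLE_JSON)
for name, version in sorted(packages.items()):
    print(f"{name:<10}{version}")

# To use real data, the text could come from the conda command itself, e.g.:
#   import subprocess
#   text = subprocess.run(["conda", "list", "--json"],
#                         capture_output=True, text=True).stdout
```

Parsing JSON avoids depending on the column layout of the plain-text listing.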
Miniconda is a minimal conda installation that contains only Python, Conda, and their required dependencies. It is an option for users who are short on disk space, but all other libraries must then be installed manually.
Start the Jupyter Notebook that comes with Anaconda
In the Start menu, open the Anaconda3 directory and click Jupyter Notebook. The startup window is displayed.
At this point, you can open any of the links in the red box below in a browser.
The following figure shows the main interface of Jupyter Notebook in the browser. By default, the directory that is opened, and where files are saved, is C:\Users\<current user name>.
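Once a notebook is open, a first cell along these lines can confirm which Python interpreter the notebook is using and which directory it is working in. This is only a suggested sanity check; the printed values depend on the machine.

```python
import os
import sys

# Version of the Python interpreter running this notebook or script.
python_version = sys.version.split()[0]
print("Python version:", python_version)

# Full path of the interpreter; with Anaconda this normally points
# inside the Anaconda installation directory.
print("Interpreter:", sys.executable)

# Current working directory, where relative file paths are resolved.
working_dir = os.getcwd()
print("Working directory:", working_dir)
```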
Common data analysis tools
Python on its own is not very powerful for data analysis, so third-party extension libraries need to be installed to enhance its capabilities. Common ones include:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- NLTK
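As a quick way to see which of these libraries are present in the current environment, the sketch below looks each one up by its import name without actually importing it. The import names in `LIBRARIES` are the usual lowercase module names, and `installed_status` is a hypothetical helper for this example.

```python
import importlib.util

# Import names of the libraries listed above.
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "nltk"]


def installed_status(names):
    """Return {name: True/False} depending on whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_status(LIBRARIES)
for name, found in status.items():
    print(f"{name:<12}{'installed' if found else 'missing'}")
```

Any library reported as missing can be added with the conda install command described earlier.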
Conclusion
As the first part of this column, this article introduced the background, purpose, and process of data analysis, and explained why Python is chosen for it. It then introduced the Anaconda Python environment and showed how to install it and manage Python packages, followed by how to start the Jupyter Notebook. Finally, it listed some common data analysis tools. We hope that readers now have a preliminary understanding of data analysis and a development environment ready for the following chapters.