Hello, everyone. I’m going to share with you some content about Python data analysis. I hope you enjoy it.
A quick plug:
PS: I have been a bit lazy these past two days and have not posted anything original for a while. I recently launched a paid column on CSDN that publishes a manuscript I wrote last year; interested readers can take a look (part of the first chapter is already uploaded as a sample). It mainly covers SpringCloud microservices. Typeset as a physical book it would run to more than 400 pages, so it is a fairly thick book. The column is not expensive, just 9.9. The text has not been proofread, so there may be quite a few typos, but at this price, what more could you ask for?
What is data analysis?
Let’s get back to business. Before we do anything else, let’s settle one thing: what is data analysis?
When you have a question, the first step is of course to look it up on Baidu; that goes without saying!
Data analysis refers to applying appropriate statistical methods to large amounts of collected data, then summarizing, understanding and digesting that data so as to get the most out of it and put it to use. It is the process of studying and summarizing data in detail in order to extract useful information and form conclusions.
The mathematical foundations of data analysis were established in the early 20th century, but it was not until the advent of computers that practical operations became possible and data analysis became widespread. Data analysis is a combination of mathematics and computer science.
Baidu’s explanation is a bit dry, so here is my simple summary:
The first key point about data analysis is the phrase “a large amount of data”. Underline it; it will be on the test.
As for how much data counts as a large amount, there is no precise definition. You could call 1 MB a large amount of data, or 1 GB, or 1 PB. Either way the volume has to be big; it cannot be just a few dozen or a few hundred records, because data at that scale can be taken in at a glance.
The second point is mathematics. That’s right, mathematics, and statistics in particular. Once we have a large amount of data, we process it to some degree with mathematical methods and then analyze it in the context of the specific business to reach the goal we need, for example monitoring a business, improving an enterprise’s management efficiency, or optimizing its management structure.
In the age before computers, analyzing a large amount of data was very difficult even with mathematics to support it. To take just one example, imagine how long it would take to draw a simple line chart of 1 million data points without a computer.
So, as the last sentence of the Baidu Encyclopedia entry says, data analysis is the product of combining mathematics and computer science.
Job prospects
At this point many readers will surely ask: is it easy to find a job in data analysis, and what does the job mainly involve day to day?
This question is actually easy to answer: go straight to a recruitment website and look at the relevant job postings and their requirements.
There aren’t a lot of data analyst positions that require Python skills.
A few JDs (job descriptions):
There is no inherent connection between data analysis and Python. Python is simply more convenient when dealing with large amounts of data. If the amount of data is not that large, you can do the work in Excel.
If you are learning this in order to find a job in data analysis, you can head for the door now. To be clear up front: reading these articles will not get you a data analysis job.
If your attitude is that you can never have too many skills, and you just want to build up some knowledge in advance, then read on.
Why do you need data analysis?
Before we talk about this question, let’s take a look at some classic examples of big data analysis:
1. Beer and diapers
As the often-told story goes, global retail giant Walmart analyzed consumer behavior and found that men buying diapers for their babies often treated themselves to a couple of beers, so it launched a promotion that placed beer together with diapers. This reportedly led to a large increase in both diaper and beer sales. Today the “beer + diapers” result is retold with great relish as a classic case of applied big data technology.
2. Google successfully predicted winter flu
In 2009, Google took the 50 million search terms most frequently entered by Americans, compared them with U.S. Centers for Disease Control and Prevention data on the spread of seasonal flu from 2003 to 2008, and built a specific mathematical model. In the end, Google was able to predict the spread of the 2009 winter flu, down to specific regions and states.
Data analysis can extract the information hidden behind large amounts of data and summarize the underlying patterns in that data.
Data analysis is gradually replacing the old gut-feeling style of decision-making in enterprises. More and more enterprises are therefore starting to take data analysis seriously, as the recruitment for data analysis positions shows.
Tools
All the background above is only meant to give you a rough idea of what data analysis is; if it does not interest you, feel free to skip it.
Opinions differ on data analysis tools. They range from Excel, through various databases and SQL, to R and to Python, which this series plans to cover.
The choice of tool depends mostly on the application scenario. If the amount of data is small and you are comfortable with Excel, then Excel is without doubt the best choice.
If the amount of data is very large and stored in structured databases, then SQL is an indispensable tool. If the amount of data is very large and stored in a big data cluster, then R or Python may be a good choice.
In Python, three toolkits are known as the Three Musketeers of data analysis: Pandas, NumPy, and Matplotlib.
Pandas
Official website: pandas.pydata.org/
Chinese website: www.pypandas.cn/
What is Pandas?
Pandas is a powerful toolset for analyzing structured data. It is built on NumPy, which provides high-performance matrix computation. It is used for data mining and data analysis, and it also provides data cleaning features.
DataFrame:
A DataFrame is a tabular data structure in Pandas that contains an ordered set of columns, each of which can hold a different value type (numeric, string, Boolean, etc.). A DataFrame can be thought of as a dictionary of Series, and it has both a row index and a column index.
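To make the “dictionary of Series” idea concrete, here is a minimal sketch. The column names and values are invented purely for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration.
names = pd.Series(["Alice", "Bob", "Carol"])
ages = pd.Series([31, 27, 45])
is_member = pd.Series([True, False, True])

# Build a DataFrame from a dictionary of Series: each key becomes a column.
df = pd.DataFrame({"name": names, "age": ages, "is_member": is_member})

print(df)                # the whole table
print(df.dtypes)         # each column keeps its own value type
print(list(df.columns))  # the column index
print(list(df.index))    # the row index
```

Each column you pull back out, for example `df["age"]`, is itself a Series.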
Series:
A Series is an object similar to a one-dimensional array, consisting of a set of data (of any NumPy data type) and a set of data labels (the index) associated with it. A simple Series can also be created from the data alone.
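A short sketch of both ways to build a Series, with and without explicit labels. The values and labels are made up for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# A Series with explicit labels (the index) attached to each value.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

# A Series built from the data alone: pandas supplies a default integer index.
plain = pd.Series([10, 20, 30])

print(temps["tue"])       # look up a value by its label
print(list(plain.index))  # the default index starts at 0
print(plain.dtype)        # the underlying NumPy data type
```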
NumPy
Official website: numpy.org/
Chinese website: www.numpy.org.cn/
What is NumPy?
NumPy is the fundamental package for scientific computing with Python. It includes, among other things:
- A powerful N-dimensional array object.
- Sophisticated broadcasting functions.
- Tools for integrating C/C++ and Fortran code.
- Useful linear algebra, Fourier transform and random number capabilities.
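Here is a small sketch that touches several of the features listed above: broadcasting, linear algebra, the Fourier transform and random numbers. The arrays are arbitrary examples chosen so the results are easy to check by hand, and the seed value is an assumption made only to keep runs repeatable.

```python
import numpy as np

# Broadcasting: the 1-D row is applied to every row of the 2-D array.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
row = np.array([10.0, 20.0])
shifted = matrix + row  # [[11, 22], [13, 24]]

# Linear algebra: solve matrix @ x = b for x.
b = np.array([5.0, 11.0])
x = np.linalg.solve(matrix, b)  # x + 2y = 5 and 3x + 4y = 11

# Fourier transform of a constant signal: everything lands in the first bin.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator, so repeated runs give the same values.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)  # three floats in [0, 1)

print(shifted)
print(x)
print(spectrum)
print(samples)
```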
ndarray:
One of NumPy’s most important features is its N-dimensional array object, ndarray: a collection of data of the same type whose elements are indexed starting from 0. An ndarray is a multidimensional array that holds elements of the same type, and each element occupies a block of memory of the same size.
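The properties described above can be checked directly on an array. This is a minimal sketch with an arbitrary 2x3 array; `int32` is chosen here only so the per-element size is easy to see.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(arr.ndim)      # number of dimensions
print(arr.shape)     # size along each dimension
print(arr.dtype)     # the single element type shared by every element
print(arr.itemsize)  # bytes used by each element
print(arr.nbytes)    # total bytes: itemsize * number of elements
print(arr[0, 0])     # indexing starts at 0

# Mixed input is converted to one common type, since all elements must match.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```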
Slicing and indexing:
The contents of an ndarray can be accessed and modified by indexing or slicing, just like slicing a list in Python. An array of n elements is indexed with subscripts from 0 to n-1, and a slice object, created with the built-in slice function and its start, stop, and step parameters, can be used to cut a new array out of the original.
Matplotlib
Website: www.matplotlib.org/
Chinese website: www.matplotlib.org.cn/
What is Matplotlib?
Matplotlib is a Python 2D plotting library that generates publication-quality figures in a variety of hardcopy formats and in interactive environments across platforms.
What can Matplotlib do for you?
Draw line charts, scatter plots, contour plots, horizontal and vertical bar charts, 3D graphics, and even animations.
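As a taste of what that looks like, here is a minimal sketch that draws a line chart and a bar chart side by side. The data is made up, and two choices are assumptions made only so the example runs anywhere: the non-interactive `Agg` backend (no window is opened) and saving to an in-memory buffer instead of a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line chart of a sine wave.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_title("Line chart")
ax_line.legend()

# Bar chart of three made-up categories.
ax_bar.bar(["a", "b", "c"], [3, 7, 5])
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Save as PNG into memory; pass a filename instead to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

In an interactive session you would normally leave out the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.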
Since the purpose of this series is to share Python content, the following articles will of course focus on these three tools, and we will go through the use of the Three Musketeers of Python data analysis in detail.