How can recorded data be analyzed and plotted in Python?
This article introduces numpy, matplotlib, pandas, scipy, and other packages for data analysis and plotting.
Prepare the environment
The Anaconda distribution is recommended for Python environments.
- Official: www.anaconda.com/products/in…
- Tsinghua source: mirrors.tuna.tsinghua.edu.cn/anaconda/ar…
Anaconda is a Python distribution for scientific computing that already bundles many popular packages for numerical work and data analysis.
Running conda list shows the installed packages, including several of those covered in this article:
$ conda list | grep numpy
numpy 1.17.2 py37h99e6662_0
$ conda list | grep "matplot\|seaborn\|plotly"
matplotlib 3.1.1 py37h54f8f79_0
seaborn 0.9.0 py37_0
$ conda list | grep "pandas\|scipy"
pandas 0.25.1 py37h0a44026_0
scipy 1.3.1 py37h1410ff5_0
If you already have a Python environment, install the packages with pip:
pip install numpy matplotlib pandas scipy
# pypi mirror: https://mirrors.tuna.tsinghua.edu.cn/help/pypi/
The environment used in this article is Python 3.7.4 (Anaconda3-2019.10).
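To confirm from Python itself that the four packages are importable, and to see which versions are installed, something like the following can be used. This is a small sketch and not part of the article's Gist; the `versions` dictionary is a name introduced here for illustration.

```python
import importlib

# versions: hypothetical name, maps package name -> version string (or None)
versions = {}
for name in ("numpy", "matplotlib", "pandas", "scipy"):
    try:
        module = importlib.import_module(name)
        versions[name] = module.__version__
    except ImportError:
        versions[name] = None

for name, version in versions.items():
    print("{}: {}".format(name, version or "not installed"))
```

The versions printed will differ from the ones listed above if a newer distribution is installed.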
Prepare the data
This article assumes a data file in the following format, for example data0.txt:
id, data, timestamp
0, 55, 1592207702.688805
1, 41, 1592207702.783134
2, 57, 1592207702.883619
3, 59, 1592207702.980597
4, 58, 1592207703.08313
5, 41, 1592207703.183011
6, 52, 1592207703.281802
...
This is CSV format: comma-separated, easy to read and write, and it can be opened in Excel.
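To experiment without a real recording, a file in this layout can be generated. The following is a minimal sketch, not the tool that produced the article's data: the helper name `write_sample`, the 0.1 s interval, and the random value range 40 to 60 are all assumptions made for illustration.

```python
import csv
import os
import random
import tempfile
import time


def write_sample(path, count=20, interval=0.1, seed=0):
    """Write `count` rows of id, data, timestamp to `path` (hypothetical helper)."""
    rng = random.Random(seed)
    start = time.time()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "data", "timestamp"])
        for i in range(count):
            # data: a random integer; timestamp: start time plus i * interval
            stamp = "{:.6f}".format(start + i * interval)
            writer.writerow([i, rng.randint(40, 60), stamp])


# Write into a temporary directory so nothing in the working directory is touched.
sample_path = os.path.join(tempfile.mkdtemp(), "data0.txt")
write_sample(sample_path)

with open(sample_path) as f:
    rows = f.read().splitlines()
print(rows[0])    # id,data,timestamp
print(len(rows))  # 21 (header + 20 data rows)
```

Note that `csv.writer` puts no space after the commas; a file written this way is still comma-separated in the same column order.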
After that, we will achieve the following goals together:
- CSV data: read it and compute statistics with numpy
- data column: plot it with matplotlib
- data column: interpolate it with scipy to form a smooth curve
- timestamp column: analyze the differences between consecutive values and the number of records per second
Numpy reads the data
Numpy can read CSV data directly with loadtxt.
import numpy as np
# id, (data), timestamp
datas = np.loadtxt(path, dtype=np.int32, delimiter=",", skiprows=1, usecols=(1))
- dtype=np.int32: the data type is np.int32
- delimiter=",": the delimiter is ","
- skiprows=1: skip the first line (the header)
- usecols=(1): read column 1 (columns are counted from 0, so this is the data column)
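The effect of these parameters can be checked on a few rows held in memory. This is a minimal sketch: the `text` string stands in for the file and is an assumption of the example, with its rows copied from the sample data above.

```python
import io

import numpy as np

# A few rows in the article's format, held in memory instead of a file.
text = """id, data, timestamp
0, 55, 1592207702.688805
1, 41, 1592207702.783134
2, 57, 1592207702.883619
"""

# skiprows=1 drops the header; usecols=1 keeps only the zero-based column 1.
datas = np.loadtxt(io.StringIO(text), dtype=np.int32,
                   delimiter=",", skiprows=1, usecols=1)
print(datas)        # [55 41 57]
print(datas.dtype)  # int32
```

Because only one column is selected, the result is a one-dimensional array with one element per data row.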
To read multiple columns:
# id, (data, timestamp)
dtype = {'names': ('data', 'timestamp'), 'formats': ('i4', 'f8')}
datas = np.loadtxt(path, dtype=dtype, delimiter=",", skiprows=1, usecols=(1, 2))
For more on dtype, see: numpy.org/devdocs/ref…
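With a dtype that names its fields, loadtxt returns a structured array, and each column can be accessed by name. A minimal sketch, again using an in-memory `text` string (an assumption of this example) in place of the file:

```python
import io

import numpy as np

text = """id, data, timestamp
0, 55, 1592207702.688805
1, 41, 1592207702.783134
"""

# 'i4' is a 4-byte integer, 'f8' an 8-byte float.
dtype = {'names': ('data', 'timestamp'), 'formats': ('i4', 'f8')}
datas = np.loadtxt(io.StringIO(text), dtype=dtype,
                   delimiter=",", skiprows=1, usecols=(1, 2))

# Each named field is itself an array with the declared type.
print(datas['data'])             # [55 41]
print(datas['timestamp'].dtype)  # float64
```

Keeping the timestamp as `f8` matters here: a 4-byte float would not preserve the microsecond digits of values this large.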
Numpy analyzes the data
Numpy computes the mean and the sample standard deviation:
# average
data_avg = np.mean(datas)
# data_avg = np.average(datas)
# standard deviation
# data_std = np.std(datas)
# sample standard deviation
data_std = np.std(datas, ddof=1)
print(" avg: {:.2f}, std: {:.2f}, sum: {}".format(
    data_avg, data_std, np.sum(datas)))
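The difference between the two standard deviations is only the divisor: np.std divides the sum of squared deviations by N, while ddof=1 divides by N - 1. The sketch below computes both by hand and compares them with numpy; the seven values are the data column of the sample rows above, and the variable names are introduced here for illustration.

```python
import numpy as np

datas = np.array([55, 41, 57, 59, 58, 41, 52])
mean = np.mean(datas)

# Sum of squared deviations from the mean.
ss = np.sum((datas - mean) ** 2)

pop_std = np.sqrt(ss / len(datas))           # population: divide by N
sample_std = np.sqrt(ss / (len(datas) - 1))  # sample: divide by N - 1

print(np.isclose(pop_std, np.std(datas)))             # True
print(np.isclose(sample_std, np.std(datas, ddof=1)))  # True
```

The sample standard deviation is always the larger of the two, and the gap shrinks as the number of records grows.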
Plotting with matplotlib
Plotting the data takes only four lines (the four calls at the end of _plot):
import sys
import matplotlib.pyplot as plt
import numpy as np

def _plot(path):
    print("Load: {}".format(path))
    # id, (data), timestamp
    datas = np.loadtxt(path, dtype=np.int32, delimiter=",", skiprows=1, usecols=(1))
    fig, ax = plt.subplots()
    # x: data indices, y: data values; the file path labels the line
    ax.plot(range(len(datas)), datas, label=path)
    ax.legend()
    plt.show()

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("python data_plot.py *.txt")
    _plot(sys.argv[1])
In ax.plot(x, y, ...), the x-axis uses the data indices, range(len(datas)).
For the complete code, see data_plot.py in the Gist linked at the end of this article. It runs as follows:
$ python data_plot.py data0.txt
Args
  nonzero: False
Load: data0.txt
  size: 20
  avg: 52.15, std: 8.57, sum: 1043
It can also read multiple files and display them together:
$ python data_plot.py data*.txt
Args
  nonzero: False
Load: data0.txt
  size: 20
  avg: 52.15, std: 8.57, sum: 1043
Load: data….txt
  size: 20
  avg: 53.35, std: 6.78, sum: 1067
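Displaying several files together amounts to calling ax.plot once per file on the same axes. The following is a sketch of that idea, not the code of data_plot.py: the function name `plot_files`, the stand-in files a.txt and b.txt, and the use of the off-screen Agg backend (so that it runs without a display) are assumptions of this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen instead of opening a window
import matplotlib.pyplot as plt
import numpy as np


def plot_files(paths):
    """Draw the data column of every file as one line on a shared axes."""
    fig, ax = plt.subplots()
    for path in paths:
        datas = np.loadtxt(path, dtype=np.int32, delimiter=",",
                           skiprows=1, usecols=1)
        ax.plot(range(len(datas)), datas, label=os.path.basename(path))
    ax.legend()
    return fig, ax


# Two small stand-in files (hypothetical names and values).
tmpdir = tempfile.mkdtemp()
paths = []
for name, values in (("a.txt", [55, 41, 57]), ("b.txt", [52, 60, 48])):
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write("id, data, timestamp\n")
        for i, value in enumerate(values):
            f.write("{}, {}, {}\n".format(i, value, 1592207702.0 + i * 0.1))
    paths.append(path)

fig, ax = plot_files(paths)
image_path = os.path.join(tmpdir, "plot.png")
fig.savefig(image_path)
```

Each file gets its own legend entry, so the lines stay distinguishable when many files are passed on the command line.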
Scipy interpolates data
Scipy interpolates the x and y data to form a smooth curve:
from scipy import interpolate
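# sample the x range densely, with a step of 0.01
xnew = np.arange(xvalues[0], xvalues[-1], 0.01)
# interp1d returns a function; call it on xnew to get the interpolated values
ynew = interpolate.interp1d(xvalues, yvalues, kind='cubic')(xnew)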
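A self-contained version of this step might look as follows. It is a sketch under stated assumptions: `yvalues` holds the data column of the sample rows above, `xvalues` is simply the row index, and `f` is a name introduced here for the function that interp1d returns.

```python
import numpy as np
from scipy import interpolate

yvalues = np.array([55, 41, 57, 59, 58, 41, 52])
xvalues = np.arange(len(yvalues))

# interp1d builds a callable; kind='cubic' uses a cubic spline.
f = interpolate.interp1d(xvalues, yvalues, kind='cubic')

# Evaluate it on a dense grid inside the original x range.
xnew = np.arange(xvalues[0], xvalues[-1], 0.01)
ynew = f(xnew)

# The curve passes through the original samples.
print(np.allclose(f(xvalues), yvalues))  # True
```

Plotting `xnew` against `ynew` gives the smooth curve, while `xvalues` against `yvalues` gives the original points. By default the callable raises an error for x values outside the original range, which is why `xnew` stays within `xvalues[0]` and `xvalues[-1]`.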
For the complete code, see data_interp.py in the Gist linked at the end of this article. It runs as follows:
python data_interp.py data0.txt
For how to configure the figure, delay its display, and save it with matplotlib, see the code and its comments.
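As a rough illustration of what such configuration can look like, the sketch below sets a title, axis labels, and a grid, then saves the figure to a file. It is not taken from data_interp.py: the figure size, the labels, the output path `out_path`, and the off-screen Agg backend are all assumptions of this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

datas = [55, 41, 57, 59, 58, 41, 52]

# Configure: figure size, markers, title, axis labels, grid, legend.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(range(len(datas)), datas, marker="o", label="data")
ax.set_title("data0.txt")
ax.set_xlabel("index")
ax.set_ylabel("data")
ax.grid(True)
ax.legend()

# Save: out_path is a hypothetical location chosen for this example.
out_path = os.path.join(tempfile.mkdtemp(), "data0.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
```

With an interactive backend, `plt.pause(seconds)` keeps the window responsive for the given time before the script continues, whereas `plt.show()` blocks until the window is closed.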
Pandas analyzes the data
Here we need to read the timestamp column:
# id, data, (timestamp)
stamps = np.loadtxt(path, dtype=np.float64, delimiter=",", skiprows=1, usecols=(2))
Numpy computes the differences between consecutive timestamps:
stamps_diff = np.diff(stamps)
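np.diff returns an array one element shorter than its input, where element i is stamps[i + 1] - stamps[i]. A small sketch using the first five timestamps from the sample data; the summary line is an addition of this example and not necessarily what stamp_diff.py prints.

```python
import numpy as np

stamps = np.array([1592207702.688805, 1592207702.783134, 1592207702.883619,
                   1592207702.980597, 1592207703.08313])

# Interval in seconds between each record and the one before it.
stamps_diff = np.diff(stamps)

print(len(stamps_diff))  # 4
print("min: {:.3f}s, max: {:.3f}s, mean: {:.3f}s".format(
    stamps_diff.min(), stamps_diff.max(), stamps_diff.mean()))
```

For this sample the intervals are all close to 0.1 s; a much larger value would indicate a gap in the recording.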
Pandas counts the number of records per second:
stamps_int = np.array(stamps, dtype='int')
stamps_int = stamps_int - stamps_int[0]
import pandas as pd
stamps_s = pd.Series(data=stamps_int)
stamps_s = stamps_s.value_counts(sort=False)
The timestamps are truncated to whole seconds, and pandas then counts how many times each value occurs.
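Applied to the seven sample timestamps shown earlier, the counting works out as follows. This is a minimal sketch; the `counts` name and the final `sort_index()` call (to print the seconds in order) are additions of this example.

```python
import numpy as np
import pandas as pd

stamps = np.array([1592207702.688805, 1592207702.783134, 1592207702.883619,
                   1592207702.980597, 1592207703.08313, 1592207703.183011,
                   1592207703.281802])

# Truncate to whole seconds, then make the first second 0.
stamps_int = np.array(stamps, dtype='int')
stamps_int = stamps_int - stamps_int[0]

# value_counts: index = second, value = number of records in that second.
counts = pd.Series(data=stamps_int).value_counts(sort=False)
counts = counts.sort_index()

print(int(counts.loc[0]), int(counts.loc[1]))  # 4 3
```

Four records fall into the first second and three into the next, which matches the roughly 0.1 s spacing of the samples given that the recording starts partway through a second.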
For the complete code, see stamp_diff.py in the Gist linked at the end of this article. It runs as follows:
python stamp_diff.py data0.txt
Matplotlib displays the multiple charts in one figure; see the code for details.
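One common way to put several charts into a single figure is plt.subplots with a row and column count. The sketch below is an assumption about how such a figure could be laid out, not the layout used by stamp_diff.py; the titles, the bar chart, and the off-screen Agg backend are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

stamps = np.array([1592207702.688805, 1592207702.783134, 1592207702.883619,
                   1592207702.980597, 1592207703.08313, 1592207703.183011,
                   1592207703.281802])

# Differences between consecutive timestamps.
stamps_diff = np.diff(stamps)

# Records per whole second, counted here with numpy for brevity.
seconds = stamps.astype(int)
seconds = seconds - seconds[0]
values, counts = np.unique(seconds, return_counts=True)

# Two rows, one column: one axes per chart.
fig, (ax_diff, ax_count) = plt.subplots(2, 1, figsize=(6, 6))
ax_diff.plot(range(len(stamps_diff)), stamps_diff)
ax_diff.set_title("difference between consecutive timestamps (s)")
ax_count.bar(values, counts)
ax_count.set_title("records per second")
fig.tight_layout()
```

`plt.subplots(2, 1)` returns the figure plus one axes object per chart, so each chart is configured independently through its own axes.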
Conclusion
The code for this article is in this Gist: gist.github.com/ikuokuo/862…
Sharing practical coding tips and knowledge. Follow along and let's grow together!