Copyright notice: This technical column series is the author's (Qin Kaixin) summary and distillation of day-to-day work. It draws on cases from real business environments and offers tuning suggestions for business applications, capacity planning for cluster environments, and related content. Please keep following this blog. QQ email: [email protected]. Feel free to get in touch for academic exchange.
1. Data preprocessing
- Time series data generation

    import datetime as dt
    import numpy as np
    import pandas as pd

    # date_range: specify a start time and a number of periods
    # freq codes: H = hours, D = days, M = months
    # Equivalent spellings of 1 July 2016:
    #   '2016 Jul 1', '7/1/2016', '2016-07-01', '2016/07/01'
    # ('1/7/2016' is read month-first, as 7 January, unless dayfirst=True is set)
    rng = pd.date_range('2016-07-01', periods=10, freq='3D')
    rng
    DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
                   '2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
                   '2016-07-25', '2016-07-28'],
                  dtype='datetime64[ns]', freq='3D')

    # 20 random values indexed by 20 consecutive days
    time = pd.Series(np.random.randn(20),
                     index=pd.date_range(dt.datetime(2016, 1, 1), periods=20))
    print(time)
    2016-01-01   -0.129379
    2016-01-02    0.164480
    2016-01-03   -0.639117
    2016-01-04   -0.427224
    2016-01-05    2.055133
    2016-01-06    1.116075
    2016-01-07    0.357426
    2016-01-08    0.274249
    2016-01-09    0.834405
    2016-01-10   -0.005444
    2016-01-11   -0.134409
    2016-01-12    0.249318
    2016-01-13   -0.297842
    2016-01-14   -0.128514
    2016-01-15    0.063690
    2016-01-16   -2.246031
    2016-01-17    0.359552
    2016-01-18    0.383030
    2016-01-19    0.402717
    2016-01-20   -0.694068
    Freq: D, dtype: float64
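One caveat about the date spellings listed in the comment above: pandas reads a slash-separated date month-first by default, so '1/7/2016' means 7 January unless `dayfirst=True` is passed. The following is a minimal sketch that checks this; the variable names are illustrative and do not come from the original post.

```python
import pandas as pd

target = pd.Timestamp(2016, 7, 1)

# These spellings all parse to 1 July 2016.
same_day = ['2016 Jul 1', '7/1/2016', '2016-07-01', '2016/07/01']
parsed = [pd.Timestamp(s) for s in same_day]

# '1/7/2016' is ambiguous: month-first by default, day-first on request.
month_first = pd.to_datetime('1/7/2016')
day_first = pd.to_datetime('1/7/2016', dayfirst=True)

# Ten timestamps, three days apart, starting from the target date.
rng = pd.date_range(target, periods=10, freq='3D')
print(parsed, month_first, day_first, rng[-1])
```

When the input format is known in advance, passing an explicit `format=` to `pd.to_datetime` avoids the ambiguity altogether.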
- Truncate filter

    # keep everything from 2016-01-10 onwards (the boundary is included)
    time.truncate(before='2016-1-10')
    2016-01-10   -0.005444
    2016-01-11   -0.134409
    2016-01-12    0.249318
    2016-01-13   -0.297842
    2016-01-14   -0.128514
    2016-01-15    0.063690
    2016-01-16   -2.246031
    2016-01-17    0.359552
    2016-01-18    0.383030
    2016-01-19    0.402717
    2016-01-20   -0.694068
    Freq: D, dtype: float64

    # keep everything up to 2016-01-10 (the boundary is included)
    time.truncate(after='2016-1-10')
    2016-01-01   -0.129379
    2016-01-02    0.164480
    2016-01-03   -0.639117
    2016-01-04   -0.427224
    2016-01-05    2.055133
    2016-01-06    1.116075
    2016-01-07    0.357426
    2016-01-08    0.274249
    2016-01-09    0.834405
    2016-01-10   -0.005444
    Freq: D, dtype: float64

    # label-based slicing includes both endpoints
    print(time['2016-01-15':'2016-01-20'])
    2016-01-15    0.063690
    2016-01-16   -2.246031
    2016-01-17    0.359552
    2016-01-18    0.383030
    2016-01-19    0.402717
    2016-01-20   -0.694068
    Freq: D, dtype: float64

    # 'M' = month end (pandas 2.2 and later spell this alias 'ME')
    data = pd.date_range('2010-01-01', '2011-01-01', freq='M')
    print(data)
    DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
                   '2010-05-31', '2010-06-30', '2010-07-31', '2010-08-31',
                   '2010-09-30', '2010-10-31', '2010-11-30', '2010-12-31'],
                  dtype='datetime64[ns]', freq='M')

    # the opening of this call is cut off in the source;
    # the start date and period count are inferred from the output below
    rng = pd.date_range('2016-07-01', periods=10, freq='D')
    pd.Series(range(len(rng)), index=rng)
    2016-07-01    0
    2016-07-02    1
    2016-07-03    2
    2016-07-04    3
    2016-07-05    4
    2016-07-06    5
    2016-07-07    6
    2016-07-08    7
    2016-07-09    8
    2016-07-10    9
    Freq: D, dtype: int32
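To make the boundary behaviour of `truncate` easier to verify, here is a small sketch on a deterministic series (integers instead of random numbers, an assumption made only so the result is reproducible). It suggests that `truncate(before=...)` keeps the boundary date and matches an open-ended label slice.

```python
import numpy as np
import pandas as pd

# A deterministic 20-day series so the results are reproducible.
idx = pd.date_range('2016-01-01', periods=20, freq='D')
s = pd.Series(np.arange(20), index=idx)

from_10th = s.truncate(before='2016-01-10')   # keeps 01-10 .. 01-20
up_to_10th = s.truncate(after='2016-01-10')   # keeps 01-01 .. 01-10

# Label-based slicing on a DatetimeIndex includes both endpoints.
sliced = s['2016-01-10':]
print(len(from_10th), len(up_to_10th), from_10th.equals(sliced))
```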
- Specify the index

    # a Series can be indexed by time periods (here: months) instead of timestamps
    periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]
    ts = pd.Series(np.random.randn(len(periods)), index=periods)
    ts
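As a complement, the sketch below builds the same three monthly periods with `pd.period_range` and pokes at a single `Period`. The fixed values 10, 20, 30 are placeholders chosen for reproducibility, not data from the post.

```python
import pandas as pd

# period_range builds consecutive periods in one call.
periods = pd.period_range('2016-01', periods=3, freq='M')
ts = pd.Series([10, 20, 30], index=periods)

jan = pd.Period('2016-01')        # frequency is inferred as monthly
print(jan.start_time)             # first instant of the month
print(jan.end_time)               # last instant of the month
print(jan + 1)                    # the following month
print(ts.loc[pd.Period('2016-02')])
```

A `Period` represents a whole span of time, which is why it exposes both a start and an end, whereas a `Timestamp` is a single instant.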
- Converting between timestamps and time periods

    ts = pd.Series(range(10), pd.date_range('07-10-16 8:00', periods=10, freq='H'))
    ts
    2016-07-10 08:00:00    0
    2016-07-10 09:00:00    1
    2016-07-10 10:00:00    2
    2016-07-10 11:00:00    3
    2016-07-10 12:00:00    4
    2016-07-10 13:00:00    5
    2016-07-10 14:00:00    6
    2016-07-10 15:00:00    7
    2016-07-10 16:00:00    8
    2016-07-10 17:00:00    9
    Freq: H, dtype: int32

    # convert the timestamp index into hourly periods
    ts_period = ts.to_period()
    ts_period
    2016-07-10 08:00    0
    2016-07-10 09:00    1
    2016-07-10 10:00    2
    2016-07-10 11:00    3
    2016-07-10 12:00    4
    2016-07-10 13:00    5
    2016-07-10 14:00    6
    2016-07-10 15:00    7
    2016-07-10 16:00    8
    2016-07-10 17:00    9
    Freq: H, dtype: int32

    # period slicing keeps every period that overlaps the range,
    # so the 08:00 period (which contains 08:30) is included
    ts_period['2016-07-10 08:30':'2016-07-10 11:45']
    2016-07-10 08:00    0
    2016-07-10 09:00    1
    2016-07-10 10:00    2
    2016-07-10 11:00    3
    Freq: H, dtype: int32

    # timestamp slicing keeps only the instants inside the range,
    # so 08:00:00 is excluded
    ts['2016-07-10 08:30':'2016-07-10 11:45']
    2016-07-10 09:00:00    1
    2016-07-10 10:00:00    2
    2016-07-10 11:00:00    3
    Freq: H, dtype: int32
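The conversion also works in the other direction. The sketch below round-trips an hourly series through `to_period` and `to_timestamp` and repeats the two slices from the listing above. It uses the lowercase `'h'` alias, which is an assumption about the installed pandas version (recent releases prefer it over `'H'`).

```python
import pandas as pd

idx = pd.date_range('2016-07-10 08:00', periods=10, freq='h')
ts = pd.Series(range(10), index=idx)

ts_period = ts.to_period()              # DatetimeIndex -> PeriodIndex
round_trip = ts_period.to_timestamp()   # PeriodIndex -> DatetimeIndex (period start)

# Same string bounds, different semantics for instants and periods.
by_stamp = ts['2016-07-10 08:30':'2016-07-10 11:45']
by_period = ts_period['2016-07-10 08:30':'2016-07-10 11:45']
print(len(by_stamp), len(by_period))
```

`to_timestamp` anchors each period at its start by default, which is why the round trip recovers the original index here.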
2. Resampling data
- Resampling converts time series data from one frequency to another
- Downsampling: from a higher frequency to a lower one
- Upsampling: from a lower frequency to a higher one
    rng = pd.date_range('1/1/2011', periods=90, freq='D')
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts.head()
    2011-01-01   -1.025562
    2011-01-02    0.410895
    2011-01-03    0.660311
    2011-01-04    0.710293
    2011-01-05    0.444985
    Freq: D, dtype: float64

    # downsample: daily values summed per month
    ts.resample('M').sum()
    2011-01-31    2.510102
    2011-02-28    0.583209
    2011-03-31    2.749411
    Freq: M, dtype: float64

    # downsample: daily values summed per 3-day bin
    ts.resample('3D').sum()
    2011-01-01    0.045643
    2011-01-04   -2.255206
    2011-01-07    0.571142
    2011-01-10    0.835032
    ...
    2011-01-22    2.883952
    2011-01-25    1.566908
    2011-01-28    1.435563
    2011-01-31    0.311565
    2011-02-03   -2.541235
    2011-02-06    0.317075
    2011-02-09    1.598877
    2011-02-12   -1.950509
    2011-02-15    2.928312
    2011-02-18   -0.733715
    2011-02-21    1.674817
    2011-02-24   -2.078872
    2011-02-27    2.172320
    2011-03-02   -2.022104
    2011-03-05   -0.070356
    2011-03-08    1.276671
    2011-03-11   -2.835132
    2011-03-14   -1.384113
    2011-03-17    1.517565
    ...
    2011-03-26    2.244319
    2011-03-29    2.951082
    Freq: 3D, dtype: float64

    # downsample: mean of each 3-day bin
    day3Ts = ts.resample('3D').mean()
    day3Ts
    2011-01-01    0.015214
    2011-01-04   -0.751735
    2011-01-07    0.190381
    2011-01-10    0.278344
    2011-01-13   -0.132255
    2011-01-16   -0.385418
    2011-01-19   -0.428961
    2011-01-22    0.961317
    2011-01-25    0.522303
    2011-01-28    0.478521
    2011-01-31    0.103855
    2011-02-03   -0.847078
    2011-02-06    0.105692
    2011-02-09    0.532959
    ...
    2011-02-21    0.558272
    2011-02-24   -0.692957
    2011-02-27    0.724107
    2011-03-02   -0.674035
    2011-03-05   -0.023452
    2011-03-08    0.425557
    2011-03-11   -0.945044
    2011-03-14   -0.461371
    ...
    Freq: 3D, dtype: float64

    # upsample back to daily: the newly created days have no value yet
    print(day3Ts.resample('D').asfreq())
    2011-01-01    0.015214
    2011-01-02         NaN
    2011-01-03         NaN
    2011-01-04   -0.751735
    2011-01-05         NaN
    2011-01-06         NaN
    2011-01-07    0.190381
    2011-01-08         NaN
    2011-01-09         NaN
    2011-01-10    0.278344
    2011-01-11         NaN
    2011-01-12         NaN
    2011-01-13   -0.132255
    2011-01-14         NaN
    2011-01-15         NaN
    2011-01-16   -0.385418
    2011-01-17         NaN
    2011-01-18         NaN
    2011-01-19   -0.428961
    2011-01-20         NaN
    2011-01-21         NaN
    2011-01-22    0.961317
    ...
    Freq: D, Length: 88, dtype: float64
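Because the listing above uses random data, the numbers cannot be reproduced. A minimal sketch with a deterministic series (the values 0 to 8 are an assumption for illustration) makes the binning easy to check by hand.

```python
import numpy as np
import pandas as pd

days = pd.date_range('2011-01-01', periods=9, freq='D')
daily = pd.Series(np.arange(9, dtype=float), index=days)

# Downsampling: each 3-day bin is reduced to a single value.
sum_3d = daily.resample('3D').sum()      # 0+1+2, 3+4+5, 6+7+8
mean_3d = daily.resample('3D').mean()

# Upsampling: new daily slots appear and stay empty until they are filled.
back_to_daily = mean_3d.resample('D').asfreq()
print(sum_3d.tolist())
print(mean_3d.tolist())
print(back_to_daily.isna().sum())
```

Note that the upsampled series ends at the label of the last 3-day bin, so it is shorter than the original daily series; the same effect explains why the 90-day series above comes back with length 88.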
- ffill: a missing value takes the previous value
- bfill: a missing value takes the next value
- interpolate: missing values are filled by linear interpolation
    # forward fill, at most 1 consecutive missing value
    day3Ts.resample('D').ffill(limit=1)
    2011-01-01    0.015214
    2011-01-02    0.015214
    2011-01-03         NaN
    2011-01-04   -0.751735
    2011-01-05   -0.751735
    2011-01-06         NaN
    2011-01-07    0.190381
    2011-01-08    0.190381
    2011-01-09         NaN
    2011-01-10    0.278344
    2011-01-11    0.278344

    # backward fill, at most 1 consecutive missing value
    day3Ts.resample('D').bfill(limit=1)
    2011-01-01    0.015214
    2011-01-02         NaN
    2011-01-03   -0.751735
    2011-01-04   -0.751735
    2011-01-05         NaN
    2011-01-06    0.190381
    2011-01-07    0.190381
    2011-01-08         NaN
    2011-01-09    0.278344
    2011-01-10    0.278344
    2011-01-11         NaN
    2011-01-12   -0.132255
    2011-01-13   -0.132255

    # linear interpolation between the known points
    day3Ts.resample('D').interpolate('linear')
    2011-01-01    0.015214
    2011-01-02   -0.240435
    2011-01-03   -0.496085
    2011-01-04   -0.751735
    2011-01-05   -0.437697
    2011-01-06   -0.123658
    2011-01-07    0.190381
    2011-01-08    0.219702
    2011-01-09    0.249023
    2011-01-10    0.278344
    2011-01-11    0.141478
    2011-01-12    0.004611
    2011-01-13   -0.132255
    2011-01-14   -0.216643
    2011-01-15   -0.301030
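The three fill strategies are easiest to compare on a tiny series. The sketch below assumes just two known points, 0.0 and 3.0, three days apart, so every filled value can be checked by eye.

```python
import pandas as pd

coarse = pd.Series([0.0, 3.0],
                   index=pd.date_range('2011-01-01', periods=2, freq='3D'))

# limit=1 fills at most one missing value next to each known point.
forward = coarse.resample('D').ffill(limit=1)     # 0, 0, NaN, 3
backward = coarse.resample('D').bfill(limit=1)    # 0, NaN, 3, 3

# Linear interpolation spreads the change evenly over the gap.
linear = coarse.resample('D').interpolate('linear')   # 0, 1, 2, 3
print(forward.tolist(), backward.tolist(), linear.tolist())
```

Forward fill never uses future information, which usually makes it the safer choice when the filled series feeds a forecasting model; interpolation and backward fill both look ahead.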
3. Sliding window
- Sliding window calculation

    %matplotlib inline
    import matplotlib.pylab
    import numpy as np
    import pandas as pd

    df = pd.Series(np.random.randn(600),
                   index=pd.date_range('7/1/2016', freq='D', periods=600))
    df.head()
    2016-07-01   -0.192140
    2016-07-02    0.357953
    2016-07-03   -0.201847
    2016-07-04   -0.372230
    2016-07-05    1.414753
    Freq: D, dtype: float64

    # 10-day window; the first 9 results are NaN because the window is not yet full
    r = df.rolling(window=10)
    # other aggregations: r.max, r.median, r.std, r.skew, r.sum, r.var
    print(r.mean())
    2016-07-01         NaN
    2016-07-02         NaN
    2016-07-03         NaN
    2016-07-04         NaN
    2016-07-05         NaN
    2016-07-06         NaN
    2016-07-07         NaN
    2016-07-08         NaN
    2016-07-09         NaN
    2016-07-10    0.300133
    2016-07-11    0.284780
    2016-07-12    0.252831
    2016-07-13    0.220699
    2016-07-14    0.167137
    2016-07-15    0.018593
    2016-07-16   -0.061414
    2016-07-17   -0.134593
    2016-07-18   -0.153333
    2016-07-19   -0.218928
    2016-07-20   -0.169426
    2016-07-21   -0.219747
    2016-07-22   -0.181266
    2016-07-23   -0.173674
    2016-07-24   -0.130629
    2016-07-25   -0.166730
    2016-07-26   -0.233044
    2016-07-27   -0.256642
    2016-07-28   -0.280738
    2016-07-29   -0.289893
    2016-07-30   -0.379625
                    ...
    2018-01-22   -0.211467
    2018-01-23    0.034996
    2018-01-24   -0.105910
    2018-01-25   -0.145774
    2018-01-26   -0.089320
    2018-01-27    0.164370
    2018-01-28   -0.110892
    2018-01-29   -0.205786
    2018-01-30   -0.101162
    2018-01-31   -0.034760
    2018-02-01    0.229333
    2018-02-02    0.043741
    2018-02-03    0.052837
    2018-02-04    0.057746
    2018-02-05   -0.071401
    2018-02-06   -0.011153
    2018-02-07   -0.045737
    2018-02-08   -0.021983
    2018-02-09   -0.196715
    2018-02-10   -0.063721
    2018-02-11   -0.289452
    2018-02-12   -0.050946
    2018-02-13   -0.047014
    2018-02-14    0.048754
    2018-02-15    0.143949
    2018-02-16    0.424823
    2018-02-17    0.361878
    2018-02-18    0.363235
    2018-02-19    0.517436
    2018-02-20    0.368020
    Freq: D, Length: 600, dtype: float64
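The leading NaN values in the rolling mean come from windows that are not yet full. A short sketch on a hand-checkable series (the values 1 to 6 and the window of 3 are assumptions for illustration) shows this, along with the `min_periods` parameter that relaxes the requirement.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              index=pd.date_range('2016-07-01', periods=6, freq='D'))

# Default: a result is produced only once the window holds 3 values.
strict = s.rolling(window=3).mean()

# min_periods=1: partial windows at the start are averaged as well.
relaxed = s.rolling(window=3, min_periods=1).mean()
print(strict.tolist())
print(relaxed.tolist())
```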
- Visualization

    import matplotlib.pyplot as plt
    %matplotlib inline

    plt.figure(figsize=(15, 5))
    df.plot(style='r--')                           # raw series, red dashed line
    df.rolling(window=10).mean().plot(style='b')   # 10-day rolling mean, blue line
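4. ARIMA prediction
- Data preprocessing

    import pandas as pd
    import pandas_datareader
    import datetime
    import matplotlib.pylab as plt
    import seaborn as sns
    from matplotlib.pylab import style
    # statsmodels 0.13 and later removed this module;
    # there the class is imported from statsmodels.tsa.arima.model instead
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    style.use('ggplot')
    # font settings so that Chinese labels and minus signs render correctly
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False

    # load the data, using the first column as a parsed date index
    stockFile = 'data/T10yr.csv'
    stock = pd.read_csv(stockFile, index_col=0, parse_dates=[0])
    stock.head(10)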
    # weekly mean of the closing price, weeks ending on Monday
    stock_week = stock['Close'].resample('W-MON').mean()
    # training window: 2000 through 2015
    stock_train = stock_week['2000':'2015']
    stock_train.plot(figsize=(12, 8))
    plt.legend(bbox_to_anchor=(1.25, 0.5))
    plt.title("Stock Close")
    sns.despine()
    # first-order differencing to remove the trend; the first value becomes NaN and is dropped
    stock_diff = stock_train.diff()
    stock_diff = stock_diff.dropna()
    plt.figure()
    plt.plot(stock_diff)
    plt.title('First difference')
    plt.show()
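To see why first-order differencing is applied before fitting, consider a toy series with a purely linear trend (the numbers below are hypothetical, not taken from the stock data). A minimal sketch:

```python
import numpy as np
import pandas as pd

# A hypothetical series with a linear trend: 2, 5, 8, 11, 14.
trend = pd.Series(2.0 + 3.0 * np.arange(5))

# After one difference the trend is gone and only the constant step remains.
diff1 = trend.diff().dropna()

# Differencing is invertible given the first observation.
rebuilt = pd.concat([trend.iloc[:1], diff1]).cumsum()
print(diff1.tolist())
print(rebuilt.tolist())
```

The invertibility is what allows a model fitted on differences to produce forecasts on the original price scale.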
    # autocorrelation of the differenced series, up to lag 20
    acf = plot_acf(stock_diff, lags=20)
    plt.title("ACF")
    acf.show()
    # partial autocorrelation of the differenced series, up to lag 20
    pacf = plot_pacf(stock_diff, lags=20)
    plt.title("PACF")
    pacf.show()
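For readers who want to see what the ACF bars represent, here is a rough NumPy sketch of the sample autocorrelation. The function name `sample_acf` and the alternating test series are made up for illustration, and the biased estimator used here is an assumption; it is not necessarily identical to what `plot_acf` computes internally.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags (biased estimator, common denominator)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[:len(x) - k]) / denom
        for k in range(nlags + 1)
    ])

# Hypothetical alternating series: neighbours are strongly negatively correlated.
x = np.array([1.0, -1.0] * 10)
acf_values = sample_acf(x, nlags=3)
print(acf_values)
```

Lag 0 is always 1 by construction. In the usual rule of thumb, the lags at which the ACF and PACF cut off are used as hints for the moving-average and autoregressive orders respectively.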
    # ARIMA with p=1 (AR terms), d=1 (differencing), q=1 (MA terms)
    model = ARIMA(stock_train, order=(1, 1, 1), freq='W-MON')
    result = model.fit()
    # print(result.summary())

    # dynamic=True: forecasts are built on earlier forecasts, not on observed values
    # typ='levels': return predictions on the original scale instead of differences
    pred = result.predict('20140609', '20160701', dynamic=True, typ='levels')
    print(pred)
    2014-06-09    2.463559
    2014-06-16    2.455539
    2014-06-23    2.449569
    2014-06-30    2.444183
    2014-07-07    2.438962
    2014-07-14    2.433788
    2014-07-21    2.428627
    2014-07-28    2.423470
    2014-08-04    2.418315
    2014-08-11    2.413159
    2014-08-18    2.408004
    2014-08-25    2.402849
    2014-09-01    2.397693
    2014-09-08    2.392538
    2014-09-15    2.387383

    # plot the prediction on top of the training series
    plt.figure(figsize=(6, 6))
    plt.xticks(rotation=45)
    plt.plot(pred)
    plt.plot(stock_train)
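For reference, `order=(1, 1, 1)` means one autoregressive term, one round of differencing, and one moving-average term. In common textbook notation (the symbols below are not from the original post, and sign and constant conventions differ between references and libraries), the model can be sketched as:

```latex
w_t = y_t - y_{t-1}, \qquad
w_t = c + \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
```

Here $y_t$ is the weekly mean close, $w_t$ is its first difference, $\phi_1$ and $\theta_1$ are the fitted AR and MA coefficients, $c$ is a constant, and $\varepsilon_t$ is white noise.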
5. Conclusion
These notes were compiled for easy review. The content is rough, so please bear with it.