How do I use pandas?

Practicing on a real data set is probably the fastest and most effective way to learn.

This walkthrough covers the basics of inspecting and manipulating data in pandas.

1. Get stock data

The data is obtained through the BigQuant data API; see the BigQuant Data API documentation for details.

stock = D.instruments()[:20]  # the first 20 instruments
df = D.history_data(stock, '2017-08-01', '2017-08-05', ['company_name', 'company_type', 'volume', 'fs_net_profit', 'fs_roe'])  # fs_roe: return on equity
df.head()  # only look at the first 5 rows
instrument date volume company_name company_type fs_net_profit fs_roe
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537
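
The BigQuant API is only available on that platform. For readers following along elsewhere, a small stand-in frame with the same column names can be built by hand. This is only a sketch: `make_sample` is a hypothetical helper (not part of pandas or BigQuant), the `company_type` labels are shortened, and the three rows reuse values printed in this article, with the negative signs taken from the `df.iloc[22]` output in section 3.

```python
import pandas as pd


def make_sample() -> pd.DataFrame:
    """Hypothetical stand-in for the BigQuant query: same columns, three rows."""
    return pd.DataFrame({
        "instrument": ["000001.SZA", "000002.SZA", "000004.SZA"],
        "date": pd.to_datetime(["2017-08-01"] * 3),
        "volume": [203570991, 20952262, 653388],
        "company_name": [
            "Ping An Bank Co., Ltd.",
            "China Vanke Co., Ltd.",
            "Shenzhen Cau Technology Co., Ltd.",
        ],
        "company_type": ["public", "public", "private"],  # shortened labels
        "fs_net_profit": [6.214000e09, 6.954116e08, -1.177969e06],
        "fs_roe": [3.0319, 0.6116, -0.9797],
    })


df = make_sample()
print(df.head())
```

Any frame with these seven columns should be enough to try the calls in the following sections, although the printed results will differ from the 80-row outputs shown here.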

2. A first look at the data

df.head(3)  # view the first 3 rows
instrument date volume company_name company_type fs_net_profit fs_roe
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797
df.tail(3)  # view the last 3 rows
instrument date volume company_name company_type fs_net_profit fs_roe
77 000021.SZA 2017-08-04 11471227 Shenzhen Kaifa Technology Co., Ltd. Central state-owned enterprise 116724688.0 2.1430
78 000022.SZA 2017-08-04 13885068 Shenzhen Chiwan Wharf Holdings Limited Central state-owned enterprise 138844496.0 2.9044
79 000023.SZA 2017-08-04 0 Shenzhen Universe (Group) Co., Ltd The public enterprise 10531537.0 2.6827
df.sample(5)  # view 5 randomly selected rows
instrument date volume company_name company_type fs_net_profit fs_roe
67 000009.SZA 2017-08-04 84400932 China Baoan Group Co., Ltd. The public enterprise 3.347200 e+07 0.7437
51 000014.SZA 2017-08-03 1209840 Shahe Industrial Co., Ltd. Local state-owned enterprises 3.359943 e+06 0.4574
38 000022.SZA 2017-08-02 10208777 Shenzhen Chiwan Wharf Holdings Limited Central state-owned enterprise 1.388445 e+08 2.9044
19 000023.SZA 2017-08-01 0 Shenzhen Universe (Group) Co., Ltd The public enterprise 1.053154 e+07 2.6827
74 000018.SZA 2017-08-04 9099017 Sino Great Wall Co., Ltd. The private enterprise 9.902958 e+07 5.4350
df.shape  # check the number of rows and columns
(80, 7)
df.columns  # check the column names
Index(['instrument', 'date', 'volume', 'company_name', 'company_type',
       'fs_net_profit', 'fs_roe'],
      dtype='object')
df.describe()  # view summary statistics of the numeric columns
       volume        fs_net_profit  fs_roe
count  8.000000e+01  8.000000e+01   80.000000
mean   2.153483e+07  3.934997e+08    1.595305
std    3.982048e+07  1.353417e+09    2.877810
min    0.000000e+00 -2.550807e+07   -1.158000
25%    3.457600e+06 -1.352505e+05   -0.058325
50%    7.385609e+06  1.838845e+07    0.821900
75%    1.738763e+07  1.222546e+08    2.286550
max    2.062069e+08  6.214000e+09   11.774600
df.info()  # check the data type of each column
<class 'pandas.core.frame.DataFrame'>
Int64Index: 80 entries, 0 to 79
Data columns (total 7 columns):
instrument       80 non-null object
date             80 non-null datetime64[ns]
volume           80 non-null int64
company_name     80 non-null object
company_type     80 non-null object
fs_net_profit    80 non-null float64
fs_roe           80 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(1), object(3)
memory usage: 5.0+ KB
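
The `describe()` output above has only three columns because, by default, it summarises numeric columns and skips text columns such as `instrument`. A minimal sketch of that behaviour on a made-up four-row frame (the values are illustrative, not taken from the stock data):

```python
import pandas as pd

# Made-up data: one text column and two numeric columns.
df = pd.DataFrame({
    "instrument": ["A", "B", "C", "D"],
    "volume": [100, 200, 300, 400],
    "fs_roe": [1.0, 2.0, 3.0, 4.0],
})

n_rows, n_cols = df.shape   # shape is (number of rows, number of columns)
summary = df.describe()     # numeric columns only by default

print(n_rows, n_cols)
print(list(summary.columns))
print(summary.loc[["count", "mean", "min", "max"]])
```

Passing `include="all"` to `describe()` would also report the text column, with counts and the most frequent value instead of a mean.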

3. Row/column selection

df.iloc[22]  # select a single row by position with iloc
instrument       000004.SZA
date             2017-08-02 00:00:00
volume           792395
company_name     Shenzhen Cau Technology Co., Ltd.
company_type     The private enterprise
fs_net_profit    -1.17797e+06
fs_roe           -0.9797
Name: 22, dtype: object
df.loc[22:25]  # select rows by label with loc; both end labels are included
instrument date volume company_name company_type fs_net_profit fs_roe
22 000004.SZA 2017-08-02 792395 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797
23 000005.SZA 2017-08-02 6313300 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938
24 000006.SZA 2017-08-02 15834085 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537
25 000007.SZA 2017-08-02 5313578 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 0.0747
df.loc[[22, 33, 44]]  # select several rows with a list of labels
instrument date volume company_name company_type fs_net_profit fs_roe
22 000004.SZA 2017-08-02 792395 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797
33 000017.SZA 2017-08-02 3603500 China Bicycle Company (Holdings) Limited The private enterprise 2.123222 e+05 1.4668
44 000006.SZA 2017-08-03 14414290 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537
df['company_name']  # select a single column
0 Ping An Bank Co., Ltd.
1 China Vanke Co., Ltd.
2 Shenzhen Cau Technology Co., Ltd.
3 Shenzhen Fountain Corporation
4 Shenzhen Zhenye (Group) Co., Ltd.
5 Shenzhen Quanxinhao Co., Ltd
6 China High-speed Railway Technology Co., Ltd.
7 China Baoan Group Co., Ltd.
8 Shenzhen Ecobeauty Co., Ltd.
9 Shenzhen Properties & Resources Development (Group) Ltd.
10 CSG Holding Co., Ltd.
11 Shahe Industrial Co., Ltd.
12 Konka Group Co., Ltd
13 China Bicycle Company (Holdings) Limited
14 Sino Great Wall Co., Ltd.
15 Shenzhen Shenbao Industrial Co., Ltd
16 Shenzhen Zhongheng Huafa Co., Ltd.
17 Shenzhen Kaifa Technology Co., Ltd.
18 Shenzhen Chiwan Wharf Holdings Limited
19 Shenzhen Universe (Group) Co., Ltd
20 Ping An Bank Co., Ltd.
21 China Vanke Co., Ltd.
22 Shenzhen Cau Technology Co., Ltd.
23 Shenzhen Fountain Corporation
24 Shenzhen Zhenye (Group) Co., Ltd.
25 Shenzhen Quanxinhao Co., Ltd
26 China High-speed Railway Technology Co., Ltd.
27 China Baoan Group Co., Ltd.
28 Shenzhen Ecobeauty Co., Ltd.
29 Shenzhen Properties & Resources Development (Group) Ltd.
...
50 CSG Holding Co., Ltd.
51 Shahe Industrial Co., Ltd.
52 Konka Group Co., Ltd
53 China Bicycle Company (Holdings) Limited
54 Sino Great Wall Co., Ltd.
55 Shenzhen Shenbao Industrial Co., Ltd
56 Shenzhen Zhongheng Huafa Co., Ltd.
57 Shenzhen Kaifa Technology Co., Ltd.
58 Shenzhen Chiwan Wharf Holdings Limited
59 Shenzhen Universe (Group) Co., Ltd
60 Ping An Bank Co., Ltd.
61 China Vanke Co., Ltd.
62 Shenzhen Cau Technology Co., Ltd.
63 Shenzhen Fountain Corporation
64 Shenzhen Zhenye (Group) Co., Ltd.
65 Shenzhen Quanxinhao Co., Ltd
66 China High-speed Railway Technology Co., Ltd.
67 China Baoan Group Co., Ltd.
68 Shenzhen Ecobeauty Co., Ltd.
69 Shenzhen Properties & Resources Development (Group) Ltd.
70 CSG Holding Co., Ltd.
71 Shahe Industrial Co., Ltd.
72 Konka Group Co., Ltd
73 China Bicycle Company (Holdings) Limited
74 Sino Great Wall Co., Ltd.
75 Shenzhen Shenbao Industrial Co., Ltd
76 Shenzhen Zhongheng Huafa Co., Ltd.
77 Shenzhen Kaifa Technology Co., Ltd.
78 Shenzhen Chiwan Wharf Holdings Limited
79 Shenzhen Universe (Group) Co., Ltd
Name: company_name, dtype: object
df[['instrument', 'company_name', 'fs_roe']]  # select several columns
instrument company_name fs_roe
0 000001.SZA Ping An Bank Co., Ltd. 3.0319
1 000002.SZA China Vanke Co., Ltd. 0.6116
2 000004.SZA Shenzhen Cau Technology Co., Ltd. 0.9797
3 000005.SZA Shenzhen Fountain Corporation 0.7938
4 000006.SZA Shenzhen Zhenye (Group) Co., Ltd. 2.0537
5 000007.SZA Shenzhen Quanxinhao Co., Ltd 0.0747
6 000008.SZA China High-speed Railway Technology Co., Ltd. 0.1525
7 000009.SZA China Baoan Group Co., Ltd. 0.7437
8 000010.SZA Shenzhen Ecobeauty Co., Ltd. 1.1580
9 000011.SZA Shenzhen Properties & Resources Development (Group) Ltd. 11.7746
10 000012.SZA CSG Holding Co., Ltd. 2.1545
11 000014.SZA Shahe Industrial Co., Ltd. 0.4574
12 000016.SZA Konka Group Co., Ltd 0.9001
13 000017.SZA China Bicycle Company (Holdings) Limited 1.4668
14 000018.SZA Sino Great Wall Co., Ltd. 5.4350
15 000019.SZA Shenzhen Shenbao Industrial Co., Ltd 0.9659
16 000020.SZA Shenzhen Zhongheng Huafa Co., Ltd. 0.1317
17 000021.SZA Shenzhen Kaifa Technology Co., Ltd. 2.1430
18 000022.SZA Shenzhen Chiwan Wharf Holdings Limited 2.9044
19 000023.SZA Shenzhen Universe (Group) Co., Ltd 2.6827
20 000001.SZA Ping An Bank Co., Ltd. 3.0319
21 000002.SZA China Vanke Co., Ltd. 0.6116
22 000004.SZA Shenzhen Cau Technology Co., Ltd. 0.9797
23 000005.SZA Shenzhen Fountain Corporation 0.7938
24 000006.SZA Shenzhen Zhenye (Group) Co., Ltd. 2.0537
25 000007.SZA Shenzhen Quanxinhao Co., Ltd 0.0747
26 000008.SZA China High-speed Railway Technology Co., Ltd. 0.1525
27 000009.SZA China Baoan Group Co., Ltd. 0.7437
28 000010.SZA Shenzhen Ecobeauty Co., Ltd. 1.1580
29 000011.SZA Shenzhen Properties & Resources Development (Group) Ltd. 11.7746
. . . .
50 000012.SZA CSG Holding Co., Ltd. 2.1545
51 000014.SZA Shahe Industrial Co., Ltd. 0.4574
52 000016.SZA Konka Group Co., Ltd 0.9001
53 000017.SZA China Bicycle Company (Holdings) Limited 1.4668
54 000018.SZA Sino Great Wall Co., Ltd. 5.4350
55 000019.SZA Shenzhen Shenbao Industrial Co., Ltd 0.9659
56 000020.SZA Shenzhen Zhongheng Huafa Co., Ltd. 0.1317
57 000021.SZA Shenzhen Kaifa Technology Co., Ltd. 2.1430
58 000022.SZA Shenzhen Chiwan Wharf Holdings Limited 2.9044
59 000023.SZA Shenzhen Universe (Group) Co., Ltd 2.6827
60 000001.SZA Ping An Bank Co., Ltd. 3.0319
61 000002.SZA China Vanke Co., Ltd. 0.6116
62 000004.SZA Shenzhen Cau Technology Co., Ltd. 0.9797
63 000005.SZA Shenzhen Fountain Corporation 0.7938
64 000006.SZA Shenzhen Zhenye (Group) Co., Ltd. 2.0537
65 000007.SZA Shenzhen Quanxinhao Co., Ltd 0.0747
66 000008.SZA China High-speed Railway Technology Co., Ltd. 0.1525
67 000009.SZA China Baoan Group Co., Ltd. 0.7437
68 000010.SZA Shenzhen Ecobeauty Co., Ltd. 1.1580
69 000011.SZA Shenzhen Properties & Resources Development (Group) Ltd. 11.7746
70 000012.SZA CSG Holding Co., Ltd. 2.1545
71 000014.SZA Shahe Industrial Co., Ltd. 0.4574
72 000016.SZA Konka Group Co., Ltd 0.9001
73 000017.SZA China Bicycle Company (Holdings) Limited 1.4668
74 000018.SZA Sino Great Wall Co., Ltd. 5.4350
75 000019.SZA Shenzhen Shenbao Industrial Co., Ltd 0.9659
76 000020.SZA Shenzhen Zhongheng Huafa Co., Ltd. 0.1317
77 000021.SZA Shenzhen Kaifa Technology Co., Ltd. 2.1430
78 000022.SZA Shenzhen Chiwan Wharf Holdings Limited 2.9044
79 000023.SZA Shenzhen Universe (Group) Co., Ltd 2.6827

80 rows × 3 columns

df.loc[:10, ['company_name', 'fs_roe']]  # select rows and columns by label
company_name fs_roe
0 Ping An Bank Co., Ltd. 3.0319
1 China Vanke Co., Ltd. 0.6116
2 Shenzhen Cau Technology Co., Ltd. 0.9797
3 Shenzhen Fountain Corporation 0.7938
4 Shenzhen Zhenye (Group) Co., Ltd. 2.0537
5 Shenzhen Quanxinhao Co., Ltd 0.0747
6 China High-speed Railway Technology Co., Ltd. 0.1525
7 China Baoan Group Co., Ltd. 0.7437
8 Shenzhen Ecobeauty Co., Ltd. 1.1580
9 Shenzhen Properties & Resources Development (Group) Ltd. 11.7746
10 CSG Holding Co., Ltd. 2.1545
df.iloc[:5, 3:]  # select rows and columns by position
company_name company_type fs_net_profit fs_roe
0 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319
1 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116
2 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797
3 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938
4 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537
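
The outputs above show an asymmetry worth spelling out: `df.loc[22:25]` returned four rows and `df.loc[:10, ...]` returned eleven, while `df.iloc[:5, 3:]` returned five. Label slices with `loc` include the end label, whereas positional slices with `iloc` exclude the end position. A minimal sketch on made-up data, with index labels chosen to differ from positions so the two cannot be confused:

```python
import pandas as pd

df = pd.DataFrame(
    {"company_name": list("ABCDEF"), "fs_roe": [3.0, 0.6, -1.0, 0.8, 2.1, 0.1]},
    index=[10, 11, 12, 13, 14, 15],  # labels deliberately differ from positions
)

by_label = df.loc[11:13]          # labels 11, 12, 13: the end label is included
by_position = df.iloc[1:3]        # positions 1 and 2: the end position is excluded
one_row = df.iloc[2]              # a single row comes back as a Series
block = df.loc[:12, ["fs_roe"]]   # rows up to label 12, one named column

print(len(by_label), len(by_position))
print(one_row["company_name"])
print(block.shape)
```

In the article's data the index labels happen to equal the positions (0 to 79), so the two accessors agree on single rows and differ only in how slices treat the end point.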

4. More row and column operations

Next, we will practice using what we have learned above:

import numpy as np
df['open_int'] = np.nan  # add a new column filled with NaN
df.head()
instrument date volume company_name company_type fs_net_profit fs_roe open_int
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 NaN
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 NaN
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 NaN
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 NaN
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 NaN
df['open_int'] = 999  # overwrite the whole column with a single value
df.head()
instrument date volume company_name company_type fs_net_profit fs_roe open_int
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999
df['test'] = df.company_type == 'The private enterprise'  # add a boolean column from a comparison
df.head()
instrument date volume company_name company_type fs_net_profit fs_roe open_int test
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 False
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 False
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 True
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 True
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 False
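df.loc[df.company_type == 'The private enterprise', 'test'] = 'He is a private enterprise, the tax burden is really not light'  # label the private enterprises
df.loc[df.company_type != 'The private enterprise', 'test'] = "It’s not a private enterprise. I don’t know what it is"  # label everything else
df.head()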
instrument date volume company_name company_type fs_net_profit fs_roe open_int test
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 He is a private enterprise, the tax burden is really not light
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 He is a private enterprise, the tax burden is really not light
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 It’s not a private enterprise. I don’t know what it is
df.loc[2:4, 'test'] = "I’m not listening. I’m not listening"  # select rows 2 to 4 by label and assign a value
df.head()
instrument date volume company_name company_type fs_net_profit fs_roe open_int test
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
df.rename(columns={'test': 'A random column'}, inplace=True)  # rename the column; inplace=True modifies df itself
df.head()
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
df_test.columns = ['Column %s' % str(i) for i in range(1, len(df_test.columns) + 1)]  # rename every column by position
df_test
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
5 000007.SZA 2017-08-01 10004406 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 0.0747 999 He is a private enterprise, the tax burden is really not light
df_test.reindex(columns=['Column 1', 'Column 2', 'Column 4', 'Column 12', 'Column 3', 'Column 5', 'Column 6', 'Column 8', 'Column 7', 'Column 9'])  # reorder the columns; reindex returns a new frame, so df_test itself is unchanged
df_test
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
5 000007.SZA 2017-08-01 10004406 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 0.0747 999 He is a private enterprise, the tax burden is really not light
df_test.reindex(index=[3, 4, 5, 0, 1, 2])  # reorder the rows
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
5 000007.SZA 2017-08-01 10004406 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 0.0747 999 He is a private enterprise, the tax burden is really not light
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 000004.SZA 2017-08-01 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
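
The operations in this section combine two ideas: assigning through a boolean mask with `loc`, and reshaping labels with `rename` and `reindex`. The sketch below replays them on a made-up four-row frame; the column values and the names `label` and `no_such_column` are illustrative assumptions, not part of the stock data.

```python
import pandas as pd

df = pd.DataFrame({
    "company_type": ["public", "private", "private", "local state-owned"],
    "fs_roe": [3.0, -1.0, 0.8, 2.1],
})

# Give every row a default label, then overwrite the rows matched by the mask.
df["test"] = "not private"
df.loc[df["company_type"] == "private", "test"] = "private"

# Without inplace=True (or assigning the result back), rename and reindex
# leave df untouched and return new objects.
renamed = df.rename(columns={"test": "label"})
reordered = renamed.reindex(columns=["label", "fs_roe", "company_type"])

# Asking reindex for a column that does not exist yields a column of NaN.
with_missing = renamed.reindex(columns=["label", "no_such_column"])

print(df["test"].tolist())
print(list(reordered.columns))
print(with_missing["no_such_column"].isna().all())
```

This is also why displaying `df_test` right after a bare `reindex` call shows the original column order: the reordered frame was returned but never stored.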

5. Delete rows and columns

df_test.drop([2, 5], axis=0)  # drop rows by index label
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 000002.SZA 2017-08-01 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
3 000005.SZA 2017-08-01 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
df_test.drop(['Column 1', 'Column 2'], axis=1)  # drop columns by name
Column 3 Column 4 Column 5 Column 6 Column 7 Column 8 Column 9
0 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
1 20952262 China Vanke Co., Ltd. The public enterprise 6.954116 e+08 0.6116 999 It’s not a private enterprise. I don’t know what it is
2 653388 Shenzhen Cau Technology Co., Ltd. The private enterprise 1.177969 e+06 0.9797 999 I’m not listening. I’m not listening
3 7343560 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 0.7938 999 I’m not listening. I’m not listening
4 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
5 10004406 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 0.0747 999 He is a private enterprise, the tax burden is really not light
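
Both `drop` calls above return a new frame and leave `df_test` as it was, which is why the second output still contains rows 2 and 5. A minimal sketch on made-up data, also showing the `columns=` keyword as an alternative spelling of `axis=1`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

fewer_rows = df.drop([1, 3], axis=0)       # drop rows by index label
fewer_cols = df.drop(columns=["a", "b"])   # equivalent to drop(["a", "b"], axis=1)

print(list(fewer_rows.index))
print(list(fewer_cols.columns))
print(df.shape)  # the original frame is unchanged
```

To keep the result, assign it back (`df = df.drop(...)`) or pass `inplace=True`.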

6. Data type conversion

df['date'].sample(5)  # the date column holds timestamps
9    2017-08-01
35   2017-08-02
54   2017-08-03
23   2017-08-02
5    2017-08-01
Name: date, dtype: datetime64[ns]
print(type(df.date[0]))
df.date = df.date.map(lambda x: x.strftime('%Y-%m-%d'))  # convert each timestamp to a string
print(type(df.date[0]))
<class 'pandas.tslib.Timestamp'>
<class 'str'>
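
The `map` call above converts one timestamp at a time. As a sketch of an alternative, the `.dt` accessor applies `strftime` to the whole column at once, and `pd.to_datetime` converts the strings back when date arithmetic is needed again. The three dates below are made up for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2017-08-01", "2017-08-02", "2017-08-03"]))

as_text = dates.dt.strftime("%Y-%m-%d")                   # vectorised timestamp -> string
round_trip = pd.to_datetime(as_text, format="%Y-%m-%d")   # string -> timestamp again

print(type(as_text.iloc[0]))
print(round_trip.dtype)
```

Keeping dates as timestamps is usually more convenient for filtering and sorting; the string form is mainly useful for display or export.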

7. Data filtering

df[df['fs_roe'] > 1].head()  # select rows where fs_roe > 1
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
9 000011.SZA 2017-08-01 10230327 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 3.015978 e+08 11.7746 999 It’s not a private enterprise. I don’t know what it is
10 000012.SZA 2017-08-01 15099069 CSG Holding Co., Ltd. The public enterprise 1.701309 e+08 2.1545 999 It’s not a private enterprise. I don’t know what it is
13 000017.SZA 2017-08-01 2857907 China Bicycle Company (Holdings) Limited The private enterprise 2.123222 e+05 1.4668 999 He is a private enterprise, the tax burden is really not light
df[(df['fs_roe'] > 1) & (df['fs_roe'] < 4)].head()  # select rows where fs_roe lies within a range
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
4 000006.SZA 2017-08-01 19458890 Shenzhen Zhenye (Group) Co., Ltd. Local state-owned enterprises 1.038588 e+08 2.0537 999 I’m not listening. I’m not listening
10 000012.SZA 2017-08-01 15099069 CSG Holding Co., Ltd. The public enterprise 1.701309 e+08 2.1545 999 It’s not a private enterprise. I don’t know what it is
13 000017.SZA 2017-08-01 2857907 China Bicycle Company (Holdings) Limited The private enterprise 2.123222 e+05 1.4668 999 He is a private enterprise, the tax burden is really not light
17 000021.SZA 2017-08-01 7955667 Shenzhen Kaifa Technology Co., Ltd. Central state-owned enterprise 1.167247 e+08 2.1430 999 It’s not a private enterprise. I don’t know what it is
df[(df['fs_roe'] > 1) & (df['company_type'] != 'Local state-owned enterprises')].head()  # select rows that satisfy several conditions
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
0 000001.SZA 2017-08-01 203570991 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999 It’s not a private enterprise. I don’t know what it is
10 000012.SZA 2017-08-01 15099069 CSG Holding Co., Ltd. The public enterprise 1.701309 e+08 2.1545 999 It’s not a private enterprise. I don’t know what it is
13 000017.SZA 2017-08-01 2857907 China Bicycle Company (Holdings) Limited The private enterprise 2.123222 e+05 1.4668 999 He is a private enterprise, the tax burden is really not light
14 000018.SZA 2017-08-01 8572320 Sino Great Wall Co., Ltd. The private enterprise 9.902958 e+07 5.4350 999 He is a private enterprise, the tax burden is really not light
17 000021.SZA 2017-08-01 7955667 Shenzhen Kaifa Technology Co., Ltd. Central state-owned enterprise 1.167247 e+08 2.1430 999 It’s not a private enterprise. I don’t know what it is
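
Each condition in the filters above is wrapped in parentheses because `&` binds more tightly than `>` or `!=` in Python; without them the expression is parsed differently and typically raises an error. The sketch below repeats the pattern on made-up data and, as an optional alternative, expresses the range test with `Series.between` (the `inclusive="neither"` spelling assumes pandas 1.3 or later).

```python
import pandas as pd

df = pd.DataFrame({
    "company_type": ["public", "local state-owned", "private", "private", "local state-owned"],
    "fs_roe": [3.0, 11.8, 0.6, 1.5, 2.0],
})

# Two conditions on the same column, combined element-wise with &.
in_range = df[(df["fs_roe"] > 1) & (df["fs_roe"] < 4)]

# The same open interval written with between().
same_range = df[df["fs_roe"].between(1, 4, inclusive="neither")]

# Conditions on different columns combine the same way.
not_local = df[(df["fs_roe"] > 1) & (df["company_type"] != "local state-owned")]

print(list(in_range.index))
print(list(not_local.index))
```

Use `|` for "or" and `~` for "not"; the Python keywords `and`, `or`, and `not` do not work element-wise on Series.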

8. Data sorting

df.sort_values(by='fs_roe').head()  # sort by fs_roe in ascending order (the default)
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
28 000010.SZA 2017-08-02 5410561 Shenzhen Ecobeauty Co., Ltd. The private enterprise -2.550807e+07 -1.1580 999 He is a private enterprise, the tax burden is really not light
68 000010.SZA 2017-08-04 5912089 Shenzhen Ecobeauty Co., Ltd. The private enterprise -2.550807e+07 -1.1580 999 He is a private enterprise, the tax burden is really not light
8 000010.SZA 2017-08-01 4519425 Shenzhen Ecobeauty Co., Ltd. The private enterprise -2.550807e+07 -1.1580 999 He is a private enterprise, the tax burden is really not light
48 000010.SZA 2017-08-03 4826822 Shenzhen Ecobeauty Co., Ltd. The private enterprise -2.550807e+07 -1.1580 999 He is a private enterprise, the tax burden is really not light
62 000004.SZA 2017-08-04 711022 Shenzhen Cau Technology Co., Ltd. The private enterprise -1.177969e+06 -0.9797 999 He is a private enterprise, the tax burden is really not light
df.sort_values(by='fs_roe', ascending=False).head()  # ascending=False sorts in descending order
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
69 000011.SZA 2017-08-04 5736572 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
49 000011.SZA 2017-08-03 7180287 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
29 000011.SZA 2017-08-02 8782902 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
9 000011.SZA 2017-08-01 10230327 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
14 000018.SZA 2017-08-01 8572320 Sino Great Wall Co., Ltd. The private enterprise 99029584.0 5.4350 999 He is a private enterprise, the tax burden is really not light
df.sort_values(by=['fs_roe', 'fs_net_profit'], ascending=False).head()  # sort by several columns
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
9 000011.SZA 2017-08-01 10230327 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
29 000011.SZA 2017-08-02 8782902 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
49 000011.SZA 2017-08-03 7180287 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
69 000011.SZA 2017-08-04 5736572 Shenzhen Properties & Resources Development (Group) Ltd. Local state-owned enterprises 301597824.0 11.7746 999 It’s not a private enterprise. I don’t know what it is
14 000018.SZA 2017-08-01 8572320 Sino Great Wall Co., Ltd. The private enterprise 99029584.0 5.4350 999 He is a private enterprise, the tax burden is really not light
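
When sorting by several columns, later columns only break ties in earlier ones, and `ascending` may be a list giving a direction per column. A minimal sketch on made-up data (the single-letter instrument names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["A", "B", "C", "D"],
    "fs_roe": [2.0, 5.0, 2.0, -1.0],
    "fs_net_profit": [10.0, 30.0, 20.0, 5.0],
})

ascending = df.sort_values(by="fs_roe")   # smallest fs_roe first

# fs_roe descending; within equal fs_roe, fs_net_profit ascending.
mixed = df.sort_values(by=["fs_roe", "fs_net_profit"], ascending=[False, True])

print(ascending.iloc[0]["instrument"])
print(mixed["instrument"].tolist())
```

Like the other methods in this article, `sort_values` returns a sorted copy; the original index labels are kept, which is why the outputs above show labels such as 69, 49, 29 out of order.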

9. Descriptive statistics of data

This section uses the 'fs_roe' column as an example for computing descriptive statistics.

df['fs_roe'].mean()  # mean of the column
1.5953049875795842
df['fs_roe'].idxmax()  # index label of the largest fs_roe
9
df.loc[df['fs_roe'].idxmin()]  # locate the row with the smallest fs_roe
instrument       000010.SZA
date             2017-08-01
volume           4519425
company_name     Shenzhen Ecobeauty Co., Ltd.
company_type     The private enterprise
fs_net_profit    -2.55081e+07
fs_roe           -1.158
open_int         999
A random column  He is a private enterprise, the tax burden is really not light
Name: 8, dtype: object
df.fs_net_profit.corr(df.volume)  # correlation between net profit and volume
0.81501625795410615
df.company_type.unique()  # distinct values in the column
array(['The public enterprise', 'The private enterprise', 'Local state-owned enterprises', 'Central state-owned enterprise'], dtype=object)
df.company_type.value_counts()  # number of rows for each distinct value
The private enterprise            28
The public enterprise             24
Local state-owned enterprises     16
Central state-owned enterprise    12
Name: company_type, dtype: int64
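
A point that is easy to miss in the calls above: `idxmax` and `idxmin` return an index label, not the value itself, which is why the result is passed to `loc` to fetch the full row. The sketch below repeats the section's calls on a made-up five-row frame so the results can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "company_type": ["public", "private", "private", "public", "private"],
    "volume": [100, 20, 30, 80, 10],
    "fs_net_profit": [50.0, 8.0, 12.0, 45.0, 2.0],
    "fs_roe": [3.0, -1.0, 0.5, 2.5, 0.0],
})

mean_roe = df["fs_roe"].mean()
best_label = df["fs_roe"].idxmax()              # index label of the maximum
worst_row = df.loc[df["fs_roe"].idxmin()]       # full row at the minimum
corr = df["fs_net_profit"].corr(df["volume"])   # Pearson correlation by default
counts = df["company_type"].value_counts()      # sorted by count, largest first

print(mean_roe, best_label, worst_row.name)
print(round(corr, 3))
print(counts)
```

`corr` also accepts `method="spearman"` or `method="kendall"` when a rank-based measure is more appropriate.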

10. Processing missing data

df_test = df.sample(5)
df_test.loc[df_test['fs_roe'] <= 1, 'fs_roe'] = np.nan  # blank out the small fs_roe values
df_test.loc[66] = np.nan  # add a row that is entirely NaN
df_test  # practice data set construction completed
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
23 000005.SZA 2017-08-02 6313300.0 Shenzhen Fountain Corporation The private enterprise 1.014027 e+07 NaN 999.0 He is a private enterprise, the tax burden is really not light
47 000009.SZA 2017-08-03 69392114.0 China Baoan Group Co., Ltd. The public enterprise 3.347200 e+07 NaN 999.0 It’s not a private enterprise. I don’t know what it is
16 000020.SZA 2017-08-01 0.0 Shenzhen Zhongheng Huafa Co., Ltd. The private enterprise 4.211734 e+05 NaN 999.0 He is a private enterprise, the tax burden is really not light
5 000007.SZA 2017-08-01 10004406.0 Shenzhen Quanxinhao Co., Ltd The private enterprise 2.768926 e+05 NaN 999.0 He is a private enterprise, the tax burden is really not light
40 000001.SZA 2017-08-03 98421938.0 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999.0 It’s not a private enterprise. I don’t know what it is
66 NaN NaN NaN NaN NaN NaN NaN NaN NaN
df_test.dropna()  # drop rows that contain any NaN value
instrument date volume company_name company_type fs_net_profit fs_roe open_int A random column
40 000001.SZA 2017-08-03 98421938.0 Ping An Bank Co., Ltd. The public enterprise 6.214000 e+09 3.0319 999.0 It’s not a private enterprise. I don’t know what it is
df_test.dropna(how='all', inplace=True)  # drop only rows in which every value is NaN; inplace=True modifies df_test itself
df_test
|  | instrument | date | volume | company_name | company_type | fs_net_profit | fs_roe | open_int | A random column |
|---|---|---|---|---|---|---|---|---|---|
| 23 | 000005.SZA | 2017-08-02 | 6313300.0 | Shenzhen Fountain Corporation | The private enterprise | 1.014027e+07 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 47 | 000009.SZA | 2017-08-03 | 69392114.0 | China Baoan Group Co., Ltd. | The public enterprise | 3.347200e+07 | NaN | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 16 | 000020.SZA | 2017-08-01 | 0.0 | Shenzhen Zhongheng Huafa Co., Ltd. | The private enterprise | 4.211734e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 5 | 000007.SZA | 2017-08-01 | 10004406.0 | Shenzhen Quanxinhao Co., Ltd | The private enterprise | 2.768926e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 40 | 000001.SZA | 2017-08-03 | 98421938.0 | Ping An Bank Co., Ltd. | The public enterprise | 6.214000e+09 | 3.0319 | 999.0 | It’s not a private enterprise. I don’t know what it is |
df_test.dropna(axis=1)  # Drop every column that contains at least one NaN value
|  | instrument | date | volume | company_name | company_type | fs_net_profit | open_int | A random column |
|---|---|---|---|---|---|---|---|---|
| 23 | 000005.SZA | 2017-08-02 | 6313300.0 | Shenzhen Fountain Corporation | The private enterprise | 1.014027e+07 | 999.0 | He is a private enterprise, the tax burden is really not light |
| 47 | 000009.SZA | 2017-08-03 | 69392114.0 | China Baoan Group Co., Ltd. | The public enterprise | 3.347200e+07 | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 16 | 000020.SZA | 2017-08-01 | 0.0 | Shenzhen Zhongheng Huafa Co., Ltd. | The private enterprise | 4.211734e+05 | 999.0 | He is a private enterprise, the tax burden is really not light |
| 5 | 000007.SZA | 2017-08-01 | 10004406.0 | Shenzhen Quanxinhao Co., Ltd | The private enterprise | 2.768926e+05 | 999.0 | He is a private enterprise, the tax burden is really not light |
| 40 | 000001.SZA | 2017-08-03 | 98421938.0 | Ping An Bank Co., Ltd. | The public enterprise | 6.214000e+09 | 999.0 | It’s not a private enterprise. I don’t know what it is |
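
The three dropna variants above can be compared side by side on a small frame. The following is only a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not the BigQuant data used in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, standing in for df_test.
toy = pd.DataFrame(
    {
        "instrument": ["000001.SZA", "000005.SZA", None],
        "volume": [98421938.0, 6313300.0, np.nan],
        "fs_roe": [3.0319, np.nan, np.nan],
    },
    index=[40, 23, 66],
)

rows_any = toy.dropna()           # drop rows with at least one NaN
rows_all = toy.dropna(how="all")  # drop rows where every value is NaN
cols_any = toy.dropna(axis=1)     # drop columns with at least one NaN

print(rows_any.index.tolist())    # [40]
print(rows_all.index.tolist())    # [40, 23]
print(cols_any.columns.tolist())  # []
```

Without `inplace=True`, each call returns a new DataFrame and leaves `toy` unchanged. Here every column contains a NaN in row 66, so `dropna(axis=1)` removes all of them.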

When cleaning data, deleting rows or columns is often not the best choice; filling in the missing values appropriately usually preserves more information. Let’s construct a new data set to demonstrate.

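df_test.loc[23] = np.nan  # Set every value in row 23 to NaN
df_test.loc[[16, 5], 'volume'] = np.nan  # Set volume to NaN for rows 16 and 5
df_test.loc[49] = np.nan  # Assumed step: add the all-NaN row 49 seen in the output
df_test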
|  | instrument | date | volume | company_name | company_type | fs_net_profit | fs_roe | open_int | A random column |
|---|---|---|---|---|---|---|---|---|---|
| 23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 47 | 000009.SZA | 2017-08-03 | 69392114.0 | China Baoan Group Co., Ltd. | The public enterprise | 3.347200e+07 | NaN | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 16 | 000020.SZA | 2017-08-01 | NaN | Shenzhen Zhongheng Huafa Co., Ltd. | The private enterprise | 4.211734e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 5 | 000007.SZA | 2017-08-01 | NaN | Shenzhen Quanxinhao Co., Ltd | The private enterprise | 2.768926e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 40 | 000001.SZA | 2017-08-03 | 98421938.0 | Ping An Bank Co., Ltd. | The public enterprise | 6.214000e+09 | 3.0319 | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 49 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
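
Before choosing a fill strategy, it can help to count how many values are missing in each column. The sketch below is one possible way to do that; the `toy` frame is hypothetical and only mimics a few columns of df_test.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few of the columns used above.
toy = pd.DataFrame(
    {
        "instrument": ["000009.SZA", "000020.SZA", "000001.SZA"],
        "volume": [69392114.0, 0.0, 98421938.0],
        "fs_roe": [np.nan, np.nan, 3.0319],
    },
    index=[47, 16, 40],
)

toy.loc[16, "volume"] = np.nan  # blank out a single cell
toy.loc[49] = np.nan            # assigning to a new label appends an all-NaN row

# isna() marks missing cells as True; sum() counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

A column with only a few gaps is usually a candidate for filling, while a column that is mostly missing may be better dropped.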
df_test.fillna(0)  # Fill all missing values with 0
|  | instrument | date | volume | company_name | company_type | fs_net_profit | fs_roe | open_int | A random column |
|---|---|---|---|---|---|---|---|---|---|
| 23 | 0 | 0 | 0.0 | 0 | 0 | 0.000000e+00 | 0.0000 | 0.0 | 0 |
| 47 | 000009.SZA | 2017-08-03 | 69392114.0 | China Baoan Group Co., Ltd. | The public enterprise | 3.347200e+07 | 0.0000 | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 16 | 000020.SZA | 2017-08-01 | 0.0 | Shenzhen Zhongheng Huafa Co., Ltd. | The private enterprise | 4.211734e+05 | 0.0000 | 999.0 | He is a private enterprise, the tax burden is really not light |
| 5 | 000007.SZA | 2017-08-01 | 0.0 | Shenzhen Quanxinhao Co., Ltd | The private enterprise | 2.768926e+05 | 0.0000 | 999.0 | He is a private enterprise, the tax burden is really not light |
| 40 | 000001.SZA | 2017-08-03 | 98421938.0 | Ping An Bank Co., Ltd. | The public enterprise | 6.214000e+09 | 3.0319 | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 49 | 0 | 0 | 0.0 | 0 | 0 | 0.000000e+00 | 0.0000 | 0.0 | 0 |
df_test.fillna({'date': '1988-09-01', 'volume': '20000000'})  # Fill missing values in different columns with different values
|  | instrument | date | volume | company_name | company_type | fs_net_profit | fs_roe | open_int | A random column |
|---|---|---|---|---|---|---|---|---|---|
| 23 | NaN | 1988-09-01 | 20000000 | NaN | NaN | NaN | NaN | NaN | NaN |
| 47 | 000009.SZA | 2017-08-03 | 6.93921e+07 | China Baoan Group Co., Ltd. | The public enterprise | 3.347200e+07 | NaN | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 16 | 000020.SZA | 2017-08-01 | 20000000 | Shenzhen Zhongheng Huafa Co., Ltd. | The private enterprise | 4.211734e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 5 | 000007.SZA | 2017-08-01 | 20000000 | Shenzhen Quanxinhao Co., Ltd | The private enterprise | 2.768926e+05 | NaN | 999.0 | He is a private enterprise, the tax burden is really not light |
| 40 | 000001.SZA | 2017-08-03 | 9.84219e+07 | Ping An Bank Co., Ltd. | The public enterprise | 6.214000e+09 | 3.0319 | 999.0 | It’s not a private enterprise. I don’t know what it is |
| 49 | NaN | 1988-09-01 | 20000000 | NaN | NaN | NaN | NaN | NaN | NaN |
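
Note that the call above passes the volume fill value as the string '20000000', which appears to be why the volume column prints as a mix of formats in the output. If the column should stay numeric, one option is to pass a number instead. This is a minimal sketch on a hypothetical `toy` frame, not the tutorial's data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one text column and one numeric column.
toy = pd.DataFrame(
    {
        "date": ["2017-08-03", None, "2017-08-01"],
        "volume": [69392114.0, np.nan, np.nan],
    }
)

# The dict maps column name -> fill value; columns not listed are left as-is.
# A numeric fill value keeps the volume column as float64.
filled = toy.fillna({"date": "1988-09-01", "volume": 20000000})

print(filled["volume"].dtype)     # float64
print(filled["volume"].tolist())  # [69392114.0, 20000000.0, 20000000.0]
```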
df_test.volume.fillna(df_test.volume.mean())  # Fill missing values with the column mean
23    83907026.0
47    69392114.0
16    83907026.0
5     83907026.0
40    98421938.0
49    83907026.0
Name: volume, dtype: float64
df_test.volume.fillna(method='ffill')  # Forward fill ('ffill'); use 'bfill' for backward fill
23           NaN
47    69392114.0
16    69392114.0
5     69392114.0
40    98421938.0
49    98421938.0
Name: volume, dtype: float64
df_test.volume.fillna(method='ffill', limit=1)  # Forward fill, but fill at most 1 consecutive NaN
23           NaN
47    69392114.0
16    69392114.0
5            NaN
40    98421938.0
49    98421938.0
Name: volume, dtype: float64
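
Newer pandas releases deprecate the `method=` argument of `fillna` in favour of the dedicated `ffill()` and `bfill()` methods. The sketch below reproduces the forward-fill results above with those methods; the `volume` Series is rebuilt by hand from the values shown in this section, so treat it as an illustration rather than the tutorial's actual object.

```python
import numpy as np
import pandas as pd

# Volume column rebuilt from the values printed above.
volume = pd.Series(
    [np.nan, 69392114.0, np.nan, np.nan, 98421938.0, np.nan],
    index=[23, 47, 16, 5, 40, 49],
    name="volume",
)

forward = volume.ffill()                 # carry the last valid value forward
forward_limited = volume.ffill(limit=1)  # fill at most 1 consecutive NaN
backward = volume.bfill()                # pull the next valid value backward

print(forward_limited)
```

Row 23 stays NaN under forward fill because there is no earlier value to copy, and with `limit=1` row 5 stays NaN because it is the second gap in a row.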

11. Saving data

Save the cleaned data on the platform:

df.to_csv('df_Pandaslearning')  # Save the data as a CSV file
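
To check that saved data can be read back, a round trip through `to_csv` and `read_csv` is a quick test. This is a minimal sketch under a few assumptions: the `toy` frame is hypothetical, an in-memory buffer stands in for a file path, and `index=False` is an optional choice that leaves the row index out of the file.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the cleaned DataFrame.
toy = pd.DataFrame(
    {
        "instrument": ["000001.SZA", "000002.SZA"],
        "volume": [98421938.0, 20952262.0],
    }
)

buffer = io.StringIO()           # in-memory stand-in for a file on disk
toy.to_csv(buffer, index=False)  # write without the row index
buffer.seek(0)                   # rewind so the data can be read back

restored = pd.read_csv(buffer)
print(restored)
```

If the index is written (the default), reading the file back produces an extra unnamed column unless `index_col=0` is passed to `read_csv`.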

That covers the basic usage of Pandas.