The stock market has been quite hot lately. I bought in in early July, but the market has since fallen. I was in profit for a while along the way, but I gave it all back by the time I sold and ended up with nothing.

In this article we will do a simple analysis of stock data in Python. The data set is 1,095 stocks on the Shanghai Stock Exchange from 1999 to 2016.


The data comes as roughly 1,000 CSV files.

The analysis covers the following questions:

  • How many new stocks are listed each year
  • Which companies have the largest market capitalization today
  • How stock prices rise or fall over time
  • How individual stocks perform in a bull market

First, import the modules:

import pandas as pd
import numpy as np
import os
import seaborn as sns
import matplotlib.pyplot as plt
# Display Chinese characters in plots
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

Use pandas to read the files:

file_list = os.listdir('./data/a-share/')

pieces = []
for file_name in file_list:
    path = './data/a-share/%s' % file_name
    file = pd.read_csv(path, encoding='gb2312')
    pieces.append(file)

shares = pd.concat(pieces)

The files are read with read_csv using encoding='gb2312'. After concatenating the DataFrames from the individual files, reset the index and preview the data:

shares.reset_index(inplace=True, drop=True)
shares.head()
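As an alternative way to do the loading step, here is a minimal sketch that wraps it in a function. The helper name `load_csv_dir` and the two tiny demo files are made up for illustration and are not part of the original data set; the sketch also assumes every file in the directory ends in `.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_dir(directory, encoding="gb2312"):
    """Read every .csv file in `directory` and stack them into one DataFrame."""
    paths = sorted(Path(directory).glob("*.csv"))
    frames = [pd.read_csv(path, encoding=encoding) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1, so no separate reset_index is needed
    return pd.concat(frames, ignore_index=True)


# Demo on two made-up files written in the same encoding
with tempfile.TemporaryDirectory() as tmp_dir:
    for code, name in [("600000", "甲公司"), ("600001", "乙公司")]:
        demo = pd.DataFrame({
            "the code": [code] * 2,
            "简称": [name] * 2,
            "date": ["2016-06-07", "2016-06-08"],
        })
        demo.to_csv(Path(tmp_dir) / f"{code}.csv", index=False, encoding="gb2312")
    demo_shares = load_csv_dir(tmp_dir)

print(demo_shares)
```

Passing `ignore_index=True` to `pd.concat` has the same effect as calling `reset_index(drop=True)` afterwards.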

The columns we care most about here are date, code, abbreviation, and closing price.

Following that plan, let's first look at the total number of listed companies:

len(shares['the code'].unique())

Deduplicating the stock codes and counting them shows a total of 1,095 listed companies. Next, let's look at how many companies are newly listed each year:

# Calculate the earliest trading time for each stock (i.e., listing time)
shares_min_date = shares.groupby('简称').agg({'date':'min'})
shares_min_date['Year of market'] = shares_min_date['date'].apply(lambda x: str(x)[:4])

# Number of public companies per year
shares_min_date.groupby('Year of market').count().plot()
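To make the logic of this step easier to follow, here is a small sketch of the same idea on made-up data. The stock names and dates are invented, and the sketch assumes dates are stored as `YYYY-MM-DD` strings.

```python
import pandas as pd

# Made-up trading records: one row per stock per trading day
toy = pd.DataFrame({
    "简称": ["A", "A", "B", "B", "C"],
    "date": ["1999-11-10", "2000-01-04", "2000-03-01", "2001-05-08", "2000-07-12"],
})

# ISO dates (YYYY-MM-DD) sort correctly as strings, so min() gives the first trading day
first_day = toy.groupby("简称")["date"].min()

# The first four characters are the listing year
listing_year = first_day.str[:4]

# Number of newly listed stocks per year, in chronological order
new_per_year = listing_year.value_counts().sort_index()
print(new_per_year)
```

Here the earliest trading day in the data stands in for the listing date, which only holds for stocks listed after the data set begins.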

In most years, 60 to 80 companies were listed, but very few companies were listed from 2005 to 2013. In 2013 in particular only one company was listed, because IPOs were suspended that year.

Next, let's see which companies had the largest market capitalization at the latest date in the data set (2016-06-08):

shares_market_value = shares[shares['date'] == '2016-06-08'][['简称', 'Total Market Value (YUAN)']].sort_values(by='Total Market Value (YUAN)', ascending=False)

# Top 10 companies by market capitalization
tmp_df = shares_market_value.head(10)

# Plot
sns.barplot(x=tmp_df['Total Market Value (YUAN)'], y=tmp_df['简称'])
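The same ranking can also be expressed with `nlargest`. The sketch below uses made-up numbers and picks the latest date with `max()` instead of hard-coding it, which assumes dates are `YYYY-MM-DD` strings.

```python
import pandas as pd

# Made-up market values for three companies
toy = pd.DataFrame({
    "date": ["2016-06-07", "2016-06-08", "2016-06-08", "2016-06-08"],
    "简称": ["A", "A", "B", "C"],
    "Total Market Value (YUAN)": [9.0e11, 1.0e12, 1.5e12, 3.0e11],
})

# Keep only the rows for the most recent date in the data
latest = toy[toy["date"] == toy["date"].max()]

# nlargest sorts in descending order and keeps the first n rows in one call
top2 = latest.nlargest(2, "Total Market Value (YUAN)")[["简称", "Total Market Value (YUAN)"]]
print(top2)
```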

As of June 8, 2016, the market capitalization of Industrial and Commercial Bank of China (ICBC) had reached 1.5 trillion yuan, living up to its nickname as the "biggest bank in the universe". You can also see that most of the top 10 companies are banks.

Next, let's look at how stocks rose and fell over the five years from 2011-06-09 to 2016-06-08. The starting point is 2011-06-09 because the data set contains about 900 stocks on that date, which is a large sample. We take the closing prices of the stocks on those two days and calculate the gains and losses:

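shares_110609 = shares[shares['date'] == '2011-06-09'][['the code', '简称', 'Closing Price (yuan)']]
shares_160609 = shares[shares['date'] == '2016-06-08'][['the code', 'Closing Price (yuan)']]

# Join the two days of data on the stock code
shares_price = shares_110609.merge(shares_160609, on='the code')

# Percentage change (assumed step: not in the original listing, but used below);
# merge() suffixes the two closing-price columns with _x and _y
start_price, end_price = shares_price['Closing Price (yuan)_x'], shares_price['Closing Price (yuan)_y']
shares_price['Up or down (%)'] = (end_price - start_price) / start_price * 100
shares_price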

The merged table contains 879 stocks.
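The count is lower than the roughly 900 stocks present on the start date, most likely because `merge` performs an inner join by default, so only codes that appear on both dates are kept. Here is a minimal sketch on made-up prices; the `_start` and `_end` suffixes are my own choice for readability, not the original code's.

```python
import pandas as pd

# Made-up closing prices on the start and end dates
start = pd.DataFrame({"the code": [1, 2, 3], "简称": ["A", "B", "C"],
                      "Closing Price (yuan)": [10.0, 20.0, 5.0]})
end = pd.DataFrame({"the code": [1, 2, 4],
                    "Closing Price (yuan)": [15.0, 10.0, 8.0]})

# Inner join (the default): codes 3 and 4 appear on only one date and are dropped
prices = start.merge(end, on="the code", suffixes=("_start", "_end"))

# Percentage change from the start price to the end price
prices["Up or down (%)"] = (
    (prices["Closing Price (yuan)_end"] - prices["Closing Price (yuan)_start"])
    / prices["Closing Price (yuan)_start"] * 100
)
print(prices)
```

Naming the suffixes explicitly avoids having to remember which of `_x` and `_y` is the earlier date.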

# How many stocks rose
shares_price[shares_price['Up or down (%)'] > 0].count()

# How many stocks fell
shares_price[shares_price['Up or down (%)'] < 0].count()

As you can see, 627 stocks, or 71%, rose. Now let's look at how the gains are distributed among the stocks that rose:

bins = np.array([0, 40, 70, 100, 1700])
# Companies whose stock price rose (copy to avoid modifying a slice of shares_price)
shares_up = shares_price[shares_price['Up or down (%)'] > 0].copy()
# Group by size of the gain
shares_up['label'] = pd.cut(shares_up['Up or down (%)'], bins)
# Count the stocks in each group and compute each group's share
up_label_count = shares_up[['label', 'the code']].groupby('label').count()
up_label_count['proportion'] = up_label_count['the code'] / up_label_count['the code'].sum()
sns.barplot(x=up_label_count['proportion'], y=up_label_count.index)
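To show how `pd.cut` assigns values to the bins, here is a small sketch with made-up gains. It uses the same bin edges as above but computes the proportions with `value_counts(normalize=True)`, which is a shortcut of mine rather than the original approach.

```python
import pandas as pd

gains = pd.Series([5.0, 40.0, 55.0, 90.0, 250.0, 1600.0])  # made-up percentage gains
bins = [0, 40, 70, 100, 1700]

# Intervals are right-closed by default: 40.0 falls in (0, 40], not (40, 70]
labels = pd.cut(gains, bins)

# Share of stocks per bin, ordered by bin rather than by size
proportion = labels.value_counts(normalize=True).sort_index()
print(proportion)
```

Any value outside the outermost edges (for example a gain above 1700%) would get `NaN` as its label and drop out of the counts, so the last edge needs to cover the largest gain in the data.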

The distribution of gains is quite skewed. Although most stocks rose, 30% of the rising stocks gained less than 40%, which averages out to less than 8% a year. If a wealth-management product yields 10% a year, 8% is clearly on the low side. Add in the stocks that fell, and more than 50% of stocks returned less than 10% a year, so money in the stock market isn't that easy to make.
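The 8% figure is a simple average (40% divided by 5 years). As a quick check, the sketch below also computes the compound annual rate, which is a bit lower; the variable names are only illustrative.

```python
total_gain = 0.40   # 40% gain over the whole period
years = 5

# Simple average: spread the total gain evenly over the years
simple_average = total_gain / years

# Compound annual rate: the constant yearly rate that grows 1.0 into 1.4 in 5 years
compound_rate = (1 + total_gain) ** (1 / years) - 1

print(f"simple average: {simple_average:.2%}")
print(f"compound annual rate: {compound_rate:.2%}")
```

Under compounding, a 40% gain over five years works out to roughly 7% a year, so the comparison with a 10% annual yield is, if anything, slightly less favorable than the simple average suggests.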

Of course, you can also get lucky, for example by buying one of these stocks and holding it for the long term:

# Top gainers
tmp_df = shares_up.sort_values(by='Up or down (%)', ascending=False)[:8]
sns.barplot(y=tmp_df['简称'], x=tmp_df['Up or down (%)'])

A stock like Gold Securities rose 16-fold over the five years.

In the same way, we can look at the distribution of stock declines


Because the code is similar, I won't post it here. According to the statistics, nearly 70% of the stocks that fell were down by 0-40% after five years.
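Since the original code for the decline distribution is not shown, here is one possible sketch on made-up data. The bin edges are an assumption of mine; the article only mentions the 0-40% range.

```python
import pandas as pd

changes = pd.Series([-5.0, -20.0, -39.0, -55.0, -80.0, 12.0])  # made-up % changes
losers = changes[changes < 0]

# Bin edges must be increasing, so the deepest decline comes first
bins = [-100, -70, -40, 0]
labels = pd.cut(losers, bins)

# Share of falling stocks in each decline range
decline_share = labels.value_counts(normalize=True).sort_index()
print(decline_share)
```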

One last interesting statistic: let's look at how individual stocks rose and fell during a bull market. We compare stock prices on two dates, 2014-06-30 and 2015-06-08. The approach is the same as before, so I'll jump straight to the results.

During the bull market, 99.6% of stocks rose, which means almost every stock went up. Let's look at the distribution of the gains:


As you can see, 86% of the stocks doubled, so once a bull market arrives you can basically make money with your eyes closed. I don't know when a bull market this big will come again, and of course whether you can seize it when it does is another big question.
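The bull-market comparison follows the same steps as the five-year one, so it could be wrapped in a reusable helper. The sketch below is one way to do that on made-up prices; the function name `period_change` and its parameters are hypothetical, not taken from the original source code.

```python
import pandas as pd


def period_change(shares, start_date, end_date,
                  code_col="the code", price_col="Closing Price (yuan)"):
    """Percentage change in closing price between two dates, per stock code."""
    start = shares.loc[shares["date"] == start_date, [code_col, price_col]]
    end = shares.loc[shares["date"] == end_date, [code_col, price_col]]
    # Inner join: only stocks that traded on both dates are compared
    merged = start.merge(end, on=code_col, suffixes=("_start", "_end"))
    change = (merged[price_col + "_end"] / merged[price_col + "_start"] - 1) * 100
    return pd.Series(change.to_numpy(), index=merged[code_col], name="change_pct")


# Made-up prices for three stocks on the two dates
toy = pd.DataFrame({
    "date": ["2014-06-30"] * 3 + ["2015-06-08"] * 3,
    "the code": [1, 2, 3, 1, 2, 3],
    "Closing Price (yuan)": [10.0, 4.0, 8.0, 25.0, 6.0, 7.0],
})

change = period_change(toy, "2014-06-30", "2015-06-08")
share_up = (change > 0).mean()          # fraction of stocks that rose
share_doubled = (change >= 100).mean()  # fraction that at least doubled
print(change, share_up, share_doubled)
```

Taking the mean of a boolean Series gives the fraction of `True` values, which is how figures like "99.6% rose" or "86% doubled" could be computed.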

That's the end of my analysis. There is actually a lot more interesting analysis to be done with this data, such as bringing in the P/E ratio and other dimensions; interested readers can explore on their own. I think a more challenging analysis would be predicting the trend of individual stocks. Although that is not feasible in practice, it is still well worth studying as a learning exercise. If this post gets enough likes, I'll consider writing about it next week.

The data and source code have been packaged; reply with the keyword "stock" on the public account to get them.

Follow the public account "du Ma" for practical content you won't find elsewhere.