Preface
Recently my job has stabilized and I have some spare money on hand. Watching it earn next to nothing in the bank was hard to bear, I couldn't focus at work, and I kept thinking about how to grow my income. One day at lunch I saw that my colleagues were all buying funds, and I suddenly wanted to jump into this pit too. That was when I really understood the importance of managing money. Every day after work, instead of studying source code and JVM courses, I started watching popular finance and economics videos on Bilibili.
If you want to analyze funds and pick a good one out of thousands, you need years of data. As a programmer, I figured the convenient route was to crawl all the data first and process it afterwards. But how do you get the data? Read the numbers one by one and write them down? No way, that's not going to happen.
So I picked web crawling, a fairly common technique, learned a little Python, and started writing my own crawler. Time to crawl!
Analysis
To crawl data, you first need to decide where to crawl it from. After some searching and comparing, I found that Tiantian Fund (fund.eastmoney.com) has fairly complete data and is easy to crawl, so I started there.
We pick one of last year's star funds, China Merchants CSI Liquor Index (161725), and open its page, which shows all kinds of information about it.
I'm using Chrome. Press F12 to open the developer tools, select the Network tab, and refresh the page to see a side that ordinary visitors don't see. Scrolling down the Network list, we find a request named <fund code>.js?v=...; clicking it shows the request URL: http://fund.eastmoney.com/pingzhongdata/161725.js?v=20210719125054. In other words, an interface of the form http://fund.eastmoney.com/pingzhongdata/<fund code>.js?v=<current time> returns the data for the corresponding fund.
Click Preview next to it and, wow, all the data we need is hiding right there.
Hands-on
Importing the modules
We now have the interface and have looked at the data it returns. The URL is http://fund.eastmoney.com/pingzhongdata/<fund code>.js?v=<current time>
With the fund code and the current time we can fetch the corresponding file. The next step is to extract the data we want from it, which is what we call data cleaning. The data from this site is not in the usual JSON format, so extracting it by string lookup can be a bit tricky. But since the file is JavaScript, I found a more fitting approach: the PyExecJS module, which makes it easy to compile and evaluate the JS code from Python.
First, install the PyExecJS module from the terminal (it also needs a JavaScript runtime such as Node.js on the machine).
pip install PyExecJS
Then import these modules:
import requests
import time
import execjs
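For comparison, the string-lookup route mentioned above can be sketched without a JavaScript runtime. This is only a rough sketch: it assumes the value assigned to each variable happens to be a valid JSON literal, which is not guaranteed for the real file. The helper name extract_js_var and the sample text are made up for illustration.

```python
import json
import re

# Made-up stand-in for the downloaded file; only the variable names come from the article.
SAMPLE_JS = (
    'var fS_name = "Example Fund";'
    'var fS_code = "000000";'
    'var Data_netWorthTrend = [{"x": 1626624000000, "y": 1.0}];'
    'var Data_ACWorthTrend = [[1626624000000, 1.0]];'
)

def extract_js_var(js_text, name):
    """Return the JSON-decoded value assigned to `var <name> = ...` in js_text."""
    match = re.search(r'\bvar\s+' + re.escape(name) + r'\s*=\s*', js_text)
    if match is None:
        raise KeyError(name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing semicolon).
    value, _end = json.JSONDecoder().raw_decode(js_text, match.end())
    return value

print(extract_js_var(SAMPLE_JS, 'fS_code'))
print(extract_js_var(SAMPLE_JS, 'Data_ACWorthTrend'))
```

If a value is not plain JSON (single-quoted strings, expressions, and so on), json raises an error, and that is where evaluating the file as real JavaScript earns its keep.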
Building the interface URL
def getUrl(fscode):
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    # The v parameter is the current local time as YYYYMMDDHHMMSS
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())
    return head + fscode + tail
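To check the URL format against the one seen in the Network tab, here is a small variant that accepts the time as a parameter. The name build_url and its when parameter are my own additions for illustration, not part of the original script.

```python
import time

BASE_URL = 'http://fund.eastmoney.com/pingzhongdata/'

def build_url(fscode, when=None):
    """Build the data URL for a fund code; `when` is a time.struct_time (defaults to now)."""
    if when is None:
        when = time.localtime()
    stamp = time.strftime('%Y%m%d%H%M%S', when)
    return BASE_URL + fscode + '.js?v=' + stamp

# Reproduce the URL from the Network tab by passing the same timestamp.
fixed = time.strptime('20210719125054', '%Y%m%d%H%M%S')
print(build_url('161725', fixed))
```

Passing the time in explicitly makes the function easy to verify, since the output no longer depends on when it runs.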
Getting the net value
def getWorth(fscode):
    # Fetch the JS file with requests
    content = requests.get(getUrl(fscode))
    # Compile it with execjs so its variables can be evaluated
    jsContent = execjs.compile(content.text)
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')
    netWorth = []
    ACWorth = []
    # Extract the net values, newest first
    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])
    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name, code)
    return netWorth, ACWorth
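getWorth keeps only the values, but each entry of Data_netWorthTrend also seems to carry a timestamp. The sketch below pairs every value with its date. It rests on two assumptions of mine that the article does not confirm: that the 'x' field is a Unix timestamp in milliseconds, and that dates should be read in UTC+8. The function name and the sample data are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 'x' is a Unix timestamp in milliseconds, interpreted in UTC+8.
CHINA_TZ = timezone(timedelta(hours=8))

def to_dated_values(net_worth_trend):
    """Turn [{'x': ms, 'y': value}, ...] into [(date, value), ...], newest first."""
    pairs = []
    for entry in net_worth_trend[::-1]:
        day = datetime.fromtimestamp(entry['x'] / 1000, tz=CHINA_TZ).date()
        pairs.append((day, entry['y']))
    return pairs

# Made-up sample in the assumed shape, oldest first.
sample = [
    {'x': 1626624000000, 'y': 1.0},
    {'x': 1626710400000, 'y': 1.01},
]
for day, value in to_dated_values(sample):
    print(day, value)
```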
View the data
Now we can look up the data for any fund by its code.
netWorth, ACWorth = getWorth('161725')
print(netWorth)
And we can see that it works: the first element is the most recent net unit value.
Of course, we can also draw our own chart of the last 60 values to verify this.
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
# Take the 60 most recent values and flip them into chronological order
plt.plot(netWorth[:60][::-1])
plt.show()
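The double slice in the plotting call is easy to misread, so here is a toy example of what it does. The numbers are made up; the list just stands in for netWorth, which getWorth returns newest first.

```python
# Stand-in for netWorth as returned by getWorth: newest value first.
net_worth = [1.05, 1.04, 1.03, 1.02, 1.01]

recent = net_worth[:3]         # the 3 most recent values, still newest first
chronological = recent[::-1]   # flipped so the oldest of the three comes first

print(chronological)  # [1.03, 1.04, 1.05]
```

Plotting the flipped list makes time run left to right, which is how a net value chart is normally read.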
The complete source code is given below.
import requests
import time
import execjs
import matplotlib.pyplot as plt

def getUrl(fscode):
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    # The v parameter is the current local time as YYYYMMDDHHMMSS
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())
    return head + fscode + tail

def getWorth(fscode):
    # Fetch the JS file with requests
    content = requests.get(getUrl(fscode))
    # Compile it with execjs so its variables can be evaluated
    jsContent = execjs.compile(content.text)
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')
    netWorth = []
    ACWorth = []
    # Extract the net values, newest first
    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])
    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name, code)
    return netWorth, ACWorth

netWorth, ACWorth = getWorth('161725')
plt.figure(figsize=(10, 5))
# Take the 60 most recent values and flip them into chronological order
plt.plot(netWorth[:60][::-1])
plt.show()
Get all fund data
I found the interface for the full fund list in the same way. http://fund.eastmoney.com/js/fundcode_search.js returns every fund code directly, and with those codes we can loop over and crawl the data for all funds. I save the downloaded data as CSV files, which are convenient to open in Excel or read from code.
import requests
import time
import execjs

def getUrl(fscode):
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    # The v parameter is the current local time as YYYYMMDDHHMMSS
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())
    return head + fscode + tail

# Obtain net values according to fund code
def getWorth(fscode):
    content = requests.get(getUrl(fscode))
    jsContent = execjs.compile(content.text)
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')
    netWorth = []
    ACWorth = []
    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])
    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name, code)
    return netWorth, ACWorth

# Obtain the codes of all funds
def getAllCode():
    url = 'http://fund.eastmoney.com/js/fundcode_search.js'
    content = requests.get(url)
    jsContent = execjs.compile(content.text)
    # The file defines a variable r; the first item of each entry is the fund code
    rawData = jsContent.eval('r')
    allCode = []
    for code in rawData:
        allCode.append(code[0])
    return allCode
allCode = getAllCode()
netWorthFile = open('./netWorth.csv', 'w')
ACWorthFile = open('./ACWorth.csv', 'w')
for code in allCode:
    try:
        netWorth, ACWorth = getWorth(code)
    except Exception:
        # Skip funds whose data cannot be fetched or parsed
        continue
    if len(netWorth) <= 0 or len(ACWorth) <= 0:
        print(code + "'s data is empty.")
        continue
    # One row per fund: the quoted fund code, then its values
    netWorthFile.write("'" + code + "',")
    netWorthFile.write(",".join(list(map(str, netWorth))))
    netWorthFile.write("\n")
    ACWorthFile.write("'" + code + "',")
    ACWorthFile.write(",".join(list(map(str, ACWorth))))
    ACWorthFile.write("\n")
    print("write " + code + "'s data success.")
netWorthFile.close()
ACWorthFile.close()
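For the "read it from code" part, here is one possible sketch using the standard csv module. It is not the script above: the helper names write_rows and read_rows are hypothetical, the values are made up, and it writes the fund code without the surrounding single quotes. io.StringIO stands in for a real file so the round trip can be shown in one place.

```python
import csv
import io

def write_rows(handle, rows):
    """Write {code: [values...]} as one CSV row per fund: code, v1, v2, ..."""
    writer = csv.writer(handle)
    for code, values in rows.items():
        writer.writerow([code, *values])

def read_rows(handle):
    """Read the rows back into {code: [float, ...]}, keeping codes as strings."""
    result = {}
    for row in csv.reader(handle):
        if row:  # skip blank lines
            result[row[0]] = [float(v) for v in row[1:]]
    return result

# Made-up values; with a real file, open it with newline='' as the csv docs advise.
buffer = io.StringIO()
write_rows(buffer, {'161725': [1.05, 1.04], '000001': [2.0]})
buffer.seek(0)
print(read_rows(buffer))
```

Reading the code back as a string keeps leading zeros such as 000001 intact in Python. Excel may still display an unquoted code as a number, which is a trade-off against the quoted form used in the script.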