
Preface

Recently my job has settled down and I have some spare money on hand. Watching it earn next to nothing in the bank bothered me so much that I could not concentrate at work and kept thinking about how to increase my income. One day at lunch I noticed my colleagues were all buying funds, and I suddenly wanted to jump in too. That was when I really understood the importance of managing money. Since then, instead of studying source code and JVM courses after work, I have been watching introductory finance and economics videos on Bilibili.

To analyze funds and pick a good one out of thousands, you need years of data. As a programmer, I find it most convenient to crawl all the data first and process it afterwards. But how do you get the data? Read the pages one by one and write the numbers down by hand? That is not going to happen.

I chose a crawler, a fairly common technique, learned a little Python, and started writing my own crawler program. Let's crawl!

Analysis

To crawl data, you first need to decide where to crawl it from. After some searching and consideration, I found that the data on Tiantian Fund (fund.eastmoney.com) is fairly complete and very easy to crawl, so I started there.

As an example, we pick one of last year's most popular funds, the China Merchants CSI Liquor Index fund (161725). Its page lists all of its information.

I am using Chrome. Press F12 to open the developer tools, select the Network tab, and refresh the page to see the requests that ordinary visitors never see. In the request list there is an entry named after the fund code, ending in .js?v=...; click it and the request URL appears next to it: http://fund.eastmoney.com/pingzhongdata/161725.js?v=20210719125054. In other words, the data for any fund is available from an interface of the form http://fund.eastmoney.com/pingzhongdata/<fund code>.js?v=<current time>.

Click Preview next to it, and there it is: all the data we need is in this file.

Hands-on

Importing the modules

We now have the interface and have seen what the data looks like. The URL is http://fund.eastmoney.com/pingzhongdata/<fund code>.js?v=<current time>

Given a fund code and the current time, we can fetch the corresponding file. The next step is to extract the data we want from it, which is the process usually called data cleaning. The site does not serve plain JSON, so extracting values by string search would be a bit tricky. Since the file is JavaScript, though, I found a more suitable approach: the PyExecJS module can compile and evaluate the JS code for us.
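For comparison, the string-lookup route mentioned above can be sketched without a JavaScript runtime. This is only a minimal sketch: extract_js_var is a hypothetical helper, not part of the original code, and it assumes each variable is declared as var NAME = <value> with a JSON-compatible value (double-quoted strings, arrays, objects). If the file deviates from that, it will not work.

```python
import json
import re


def extract_js_var(js_text, var_name):
    """Return the value assigned to `var_name` in js_text, or None.

    Hypothetical helper: assumes a declaration of the form
    `var NAME = <JSON-compatible literal>`.
    """
    # Find "var NAME =" and consume the whitespace before the value.
    match = re.search(r'\bvar\s+' + re.escape(var_name) + r'\s*=\s*', js_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it (for example the trailing ';').
    value, _end = json.JSONDecoder().raw_decode(js_text, match.end())
    return value


if __name__ == '__main__':
    sample = 'var fS_name = "Demo Fund";var fS_code = "000001";'
    print(extract_js_var(sample, 'fS_name'))
```

The PyExecJS approach used in the rest of this article is more tolerant, because it evaluates the file as real JavaScript instead of relying on the values being valid JSON.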

First, install the PyExecJS module from the terminal. PyExecJS also needs a JavaScript runtime such as Node.js on the machine.

pip install PyExecJs

Then import these modules:

import requests
import time
import execjs

Building the interface URL

def getUrl(fscode):
    # Base path + fund code + ".js?v=" + current time (YYYYmmddHHMMSS)
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())

    return head + fscode + tail

Getting the net value

def getWorth(fscode):
    # Fetch the JS file with requests
    content = requests.get(getUrl(fscode))

    # Compile the JS with execjs and read the variables we need
    jsContent = execjs.compile(content.text)
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')

    netWorth = []
    ACWorth = []

    # Extract the net values, newest first
    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])

    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name, code)
    return netWorth, ACWorth

View the data

Now we can look up the data for any fund by its code.

netWorth, ACWorth = getWorth('161725')
print(netWorth)

The output matches the site. Because the lists are reversed, the first element is the most recent net unit value.
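The list returned by getWorth holds bare values with no dates attached. If dates are needed, something like the following sketch could pair each value with its day. It rests on assumptions that the original code does not confirm: that each entry of Data_netWorthTrend also carries an 'x' field holding a millisecond Unix timestamp, and that the timestamp refers to midnight in China Standard Time (UTC+8). The function to_dated_series is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the site's timestamps are midnight in China Standard Time.
CST = timezone(timedelta(hours=8))


def to_dated_series(trend):
    """Turn [{'x': ms_timestamp, 'y': value}, ...] into [(date_str, value), ...].

    Hypothetical helper; the 'x' field and its unit are assumptions.
    """
    series = []
    for point in trend:
        # Convert milliseconds to seconds before building the datetime.
        day = datetime.fromtimestamp(point['x'] / 1000, tz=CST)
        series.append((day.strftime('%Y-%m-%d'), point['y']))
    return series


if __name__ == '__main__':
    demo = [{'x': 0, 'y': 1.0}, {'x': 86400000, 'y': 1.1}]
    for date_str, value in to_dated_series(demo):
        print(date_str, value)
```

Unlike getWorth, this keeps the entries in their original order, oldest first, so the result can be reversed afterwards if the newest value should come first.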

Of course, we can draw our own chart to verify this.

import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
# Plot the latest 60 values, reversed back into chronological order
plt.plot(netWorth[:60][::-1])
plt.show()

The source code is given below.

import requests
import time
import execjs
import matplotlib.pyplot as plt

def getUrl(fscode):
    # Base path + fund code + ".js?v=" + current time (YYYYmmddHHMMSS)
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())

    return head + fscode + tail

def getWorth(fscode):
    # Fetch the JS file with requests
    content = requests.get(getUrl(fscode))

    # Compile the JS with execjs and read the variables we need
    jsContent = execjs.compile(content.text)
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')

    netWorth = []
    ACWorth = []

    # Extract the net values, newest first
    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])

    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name, code)
    return netWorth, ACWorth

netWorth, ACWorth = getWorth('161725')

plt.figure(figsize=(10, 5))
# Plot the latest 60 values, reversed back into chronological order
plt.plot(netWorth[:60][::-1])
plt.show()

Get all fund data

I found the interface for the full fund list in the same way. http://fund.eastmoney.com/js/fundcode_search.js returns every fund code directly, and by iterating over the codes we can crawl the data for all funds. I save the downloaded data as CSV so that it can be opened in Excel or read by code.
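As a side note on the CSV step, Python's standard csv module can handle the separators and quoting and makes the file easy to read back. The sketch below is one possible way to do it, not the approach used in the script that follows; write_fund_rows and read_fund_rows are hypothetical helpers, and the sketch assumes every value is numeric.

```python
import csv


def write_fund_rows(path, rows):
    """Write one line per fund: the fund code followed by its values.

    `rows` is an iterable of (code, values) pairs. Hypothetical helper.
    """
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for code, values in rows:
            writer.writerow([code, *values])


def read_fund_rows(path):
    """Read the file back into {code: [float, ...]}."""
    result = {}
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            if not row:
                continue
            # The code stays a string so leading zeros are preserved.
            result[row[0]] = [float(v) for v in row[1:]]
    return result


if __name__ == '__main__':
    write_fund_rows('demo.csv', [('161725', [1.2, 1.1]), ('000001', [0.9])])
    print(read_fund_rows('demo.csv'))
```

Rows may have different lengths because funds have different histories, which is why the reader returns a dictionary of lists instead of a table.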

import requests
import time
import execjs

def getUrl(fscode):
    # Base path + fund code + ".js?v=" + current time (YYYYmmddHHMMSS)
    head = 'http://fund.eastmoney.com/pingzhongdata/'
    tail = '.js?v=' + time.strftime("%Y%m%d%H%M%S", time.localtime())

    return head + fscode + tail

# Get the net values for a fund code
def getWorth(fscode):
    content = requests.get(getUrl(fscode))
    jsContent = execjs.compile(content.text)
    
    name = jsContent.eval('fS_name')
    code = jsContent.eval('fS_code')
    # Net unit value trend
    netWorthTrend = jsContent.eval('Data_netWorthTrend')
    # Cumulative net value trend
    ACWorthTrend = jsContent.eval('Data_ACWorthTrend')

    netWorth = []
    ACWorth = []

    for dayWorth in netWorthTrend[::-1]:
        netWorth.append(dayWorth['y'])

    for dayACWorth in ACWorthTrend[::-1]:
        ACWorth.append(dayACWorth[1])
    print(name,code)
    return netWorth, ACWorth
  
# Get the codes of all funds from the fund list interface
def getAllCode():
    url = 'http://fund.eastmoney.com/js/fundcode_search.js'
    content = requests.get(url)
    jsContent = execjs.compile(content.text)
    rawData = jsContent.eval('r')
    allCode = []
    for code in rawData:
        allCode.append(code[0])
    return allCode

allCode = getAllCode()



netWorthFile = open('./netWorth.csv', 'w')
ACWorthFile = open('./ACWorth.csv', 'w')

for code in allCode:
    try:
        netWorth, ACWorth = getWorth(code)
    except Exception:
        # Skip funds whose data cannot be fetched or parsed
        continue
    if len(netWorth) <= 0 or len(ACWorth) <= 0:
        print(code + "'s data is empty.")
        continue
    # Each row: the quoted fund code, then its values, newest first
    netWorthFile.write("'" + code + "',")
    netWorthFile.write(",".join(list(map(str, netWorth))))
    netWorthFile.write("\n")

    ACWorthFile.write("'" + code + "',")
    ACWorthFile.write(",".join(list(map(str, ACWorth))))
    ACWorthFile.write("\n")
    print("write " + code + "'s data success.")

netWorthFile.close()
ACWorthFile.close()
