Preface
Hello everyone, Xiao Zhi here, bringing you some hands-on material today. I wrote the earlier post on a first attempt at machine learning stock price prediction when I had just gotten into quantitative trading; I was struggling to find a data source, so I turned to a third-party platform to obtain stock data.
Later I became interested in the IPython Notebook used on that platform. I haven't been learning Python for very long, so I was genuinely delighted to come across such a distinctive and elegant interactive environment. IPython's mix of code, text, and charts makes it easy to document your work, and code runs instantly, so it is simply fun to use.
So I did some research and learned that we can actually run IPython Notebook as a local editor ourselves, which made me very happy. On top of that, this Friday, tomorrow, I'm giving a talk in front of the entire company on artificial intelligence and quantitative trading, so I took the opportunity to write the demo code in an IPython Notebook. Tomorrow I'll run the code first and then walk through it while showing the charts, which is very comfortable. Later in this article you will also get a feel for the charm of the IPython Notebook.
I will share the slides, and I will write an article covering what I said and what I was thinking during the talk. I hope to exchange ideas with you.
Of course, the style of the slides may not meet your expectations. I'm just a programmer and really not good at this sort of thing, so black text on a white background feels just fine to me.
URL analysis
In the previous article I analyzed stock index data. For Friday's meeting, however, I need to use a product that our company actually deals in (spot precious metals), so I asked our CTO for a way to get gold price data, and he gave me an address: the Wallstreetcn (Wall Street News) site. I clicked on the chart for the category I wanted and, using Chrome's developer tools, easily captured the data URL. It looks something like this.
The request URL for this data looks like this
https://forexdata.wallstreetcn.com/kline?prod_code=XAUUSD&candle_period=8&data_count=1000&end_time=1413158399&fields=time_stamp%2Copen_px%2Cclose_px%2Chigh_px%2Clow_px
The format of the data is clear. We can guess that the request parameter data_count is the number of records requested and end_time is a Unix timestamp; together they mean "return the data_count trading days of data leading up to end_time".
In each record, time_stamp is the timestamp of that data point, close is the closing price, open is the opening price, high is the highest price, and low is the lowest price. These five fields are the basic data we need to draw a K-line chart, the so-called candlestick chart. If you're not familiar with K-lines, you can look them up; I won't elaborate here.
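For reference, based on how the code below parses the response, the JSON presumably nests the candles under data -> candle -> XAUUSD, with each row ordered as in the fields parameter. A quick way to confirm this yourself (a sketch, not from the original post; end_time is just the example timestamp from the captured URL):

import requests
import json

# Hypothetical quick check of the response shape; the parameters mirror the URL captured above
url = ("https://forexdata.wallstreetcn.com/kline?prod_code=XAUUSD"
       "&candle_period=8&data_count=10&end_time=1413158399"
       "&fields=time_stamp%2Copen_px%2Cclose_px%2Chigh_px%2Clow_px")
response = requests.get(url)
body = json.loads(response.text)
print(list(body.keys()))                      # top-level keys; 'data' is the one we care about
print(body["data"]["candle"]["XAUUSD"][:2])   # first two rows: [time_stamp, open, close, high, low]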
Crawl data
After analyzing the URL, we can formally crawl the data. I want to get 10 years of gold data (strictly speaking gold/USD, ticker XAUUSD, a foreign-exchange instrument). Note that, from my experiments, data_count in this URL returns at most 1000 records; if you ask for more than 1000, it still returns only 1000 by default. Naturally, our request parameter end_time will therefore have to change dynamically.
For convenience, I decided to crawl only one year at a time, so data_count is fixed at 365, and end_time is filled in from the function argument via format, as follows:
import requests
import json
import time
import pandas as pd

def get_data(end_time, count):
    url = "https://forexdata.wallstreetcn.com/kline?prod_code=XAUUSD&candle_period=8&data_count=365&end_time="\
          "{end_time}"\
          "&fields=time_stamp%2Copen_px%2Cclose_px%2Chigh_px%2Clow_px".format(end_time=end_time)
    response = requests.get(url)           # request the data
    data_list = json.loads(response.text)  # parse the JSON body
    data = data_list.get("data").get("candle").get("XAUUSD")
    # convert to a DataFrame; count..count+364 keeps a unique running index across batches
    df = pd.DataFrame(data, columns=['date', 'open', 'close', 'high', 'low'],
                      index=list(range(count, count + 365)))
    return df
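As a quick sanity check (a sketch, not from the original post; 1413158399 is the example end_time from the URL captured earlier), you can call it once and inspect the result:

# Hypothetical one-off call to verify the function works
sample = get_data(1413158399, 0)
print(sample.shape)    # expect (365, 5)
print(sample.head())   # date is a Unix timestamp; open/close/high/low are prices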
Here we use the third-party requests package to fetch the data, parse it with json, and convert it into a pandas DataFrame. These are routine operations, so I think everyone should be fine with them.
Once the data-fetching function is written, we call get_data in a loop 10 times and concatenate the resulting DataFrame objects, which gives us our 10 years of gold data. Note that each iteration must include a certain delay, so that the anti-crawler mechanism doesn't block our IP.
init_time = 1237507200        # March 20, 2009
window = 60 * 60 * 24 * 365   # get data for 365 days at a time
df = pd.DataFrame()
for i in range(10):
    df = pd.concat([df, get_data(init_time + i * window, i * 365)])
    print("get data success", i)
    time.sleep(0.5)           # small delay to avoid the anti-crawler mechanism
Ok, after executing the code, let's look at the df data, with a screenshot in IPython Notebook style.
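If you are following along without the screenshot, a couple of quick calls (a sketch, not from the original post) show what landed in the DataFrame:

print(df.shape)    # expect (3650, 5): 10 batches of 365 daily rows each
print(df.head())   # the date column is still a raw Unix timestamp at this point
print(df.tail())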
Working in IPython this way is an experience I can sum up in one word: delightful.
Play with the data
Well, now that our DataFrame holds 3,650 rows of data, it's time to play with it, since Python is a master of data analysis. (Remember to import matplotlib.pyplot as plt.)
Let's plot the closing price of gold in three lines of code:
df['close'].plot(figsize=(15, 10))
plt.grid(True)
plt.show()
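One small caveat: with the integer index we built above, the x-axis of this plot is just a row number. If you would rather see dates on the x-axis, one option (a sketch, not part of the original post) is to plot against a datetime index built from the raw timestamps:

# Hypothetical variant: index the closing prices by real dates instead of row numbers
close = pd.Series(df['close'].values, index=pd.to_datetime(df['date'], unit='s'))
close.plot(figsize=(15, 10))
plt.grid(True)
plt.show()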
Since we have the five basic fields for K-line data, it would be a shame not to draw a K-line chart. The K-line code is a bit more involved, mainly because of the time on the x-axis: we need to convert the timestamp to a %Y-%m-%d string, and then convert that format into the date representation that matplotlib supports.
import matplotlib.finance as mpf   # removed in newer matplotlib; the standalone mpl_finance package offers the same API
from matplotlib.pylab import date2num
import datetime

# convert the Unix timestamps in the date column to '%Y-%m-%d' strings
r = map(lambda x: time.strftime('%Y-%m-%d', time.localtime(x)), df['date'])
df['date'] = list(r)

def date_to_num(dates):
    # convert '%Y-%m-%d' strings into matplotlib date numbers
    num_time = []
    for date in dates:
        date_time = datetime.datetime.strptime(date, '%Y-%m-%d')
        num_date = date2num(date_time)
        num_time.append(num_date)
    return num_time

fig, ax = plt.subplots(figsize=(15, 10))
mat_data = df.values               # as_matrix() is gone in newer pandas; .values does the same job
num_time = date_to_num(mat_data[:, 0])
mat_data[:, 0] = num_time
fig.subplots_adjust(bottom=0.2)
ax.xaxis_date()
# each row is (date, open, close, high, low), which is exactly what candlestick_ochl expects
mpf.candlestick_ochl(ax, mat_data, width=0.6, colorup='r', colordown='g')
plt.grid(True)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
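A note if you are on a recent matplotlib: matplotlib.finance has been removed, but the same candlestick_ochl function lives in the standalone mpl_finance package, so the only change needed should be the import (a sketch under that assumption):

# Hypothetical drop-in on newer matplotlib: same function, different package
# (ax and mat_data are the objects built in the block above)
from mpl_finance import candlestick_ochl
candlestick_ochl(ax, mat_data, width=0.6, colorup='r', colordown='g')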
We can also plot the daily ups and downs over the decade, which shows gold's bull runs, bear runs, and sideways swings:
rate_of_return = (df['close']-df['open'])/df['open']
rate_of_return.plot(kind='line', style='k--', figsize=(15, 10))
plt.show()
As you can see, gold spent most of the time oscillating, with some anomalies near the beginning and in the middle, which I guess correspond to the correction in the period after the financial crisis, when the dollar plummeted.
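If you want to put numbers on those anomalies (a sketch, not from the original post), the return series itself makes it easy:

# A quick look at the distribution of daily moves and the most extreme days
print(rate_of_return.describe())
print(rate_of_return.abs().nlargest(10))   # the ten largest single-day swings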
Oh, and here I want to correct a mistake from my last post. Remember that wobbly bar chart? Yes, that one.
I thought it was an IPython bug, but it turned out it wasn't: I had added this line to the code.
with plt.xkcd():
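In full it is used as a context manager wrapping the plotting calls; for example (a minimal sketch reusing the closing-price plot from earlier, not the bar chart from the last post):

# Everything plotted inside the with-block is rendered in the hand-drawn xkcd style
with plt.xkcd():
    df['close'].plot(figsize=(15, 10))
    plt.grid(True)
    plt.show()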
xkcd is the name of a webcomic, and this function draws the figure in a style that mimics the comic's hand-drawn look. So what does the comic look like?
Emm… it does look hand-drawn.
Wrapping up
Well, that's it for the data crawling. Most crawling work is similar; market data just has a few quirks of its own.
In the next post I'll play around with some machine learning code on this data and tune the parameters with you.
Recommended reading
Machine learning stock price prediction
What’s up with quantitative trading and artificial intelligence
Share some tips for learning AI