A neural network (NN) is used here for simple temperature prediction; the dataset has been uploaded. The main fields of the dataset are:
- year, month, day, week: the date of the record. week is stored as a string, so it needs one-hot encoding
- temp_2: the maximum temperature two days earlier
- temp_1: the previous day's maximum temperature
- average: the historical average maximum temperature for this day of the year
- actual: the actual maximum temperature of the day (the label to predict)
- friend: the last column of the dataset; it needs no special handling and stays in as an ordinary feature
Raw data dimension: (348, 9), data:
   year  month  day  week  temp_2  temp_1  average  actual  friend
0  2016      1    1   Fri      45      45     45.6      45      29
1  2016      1    2   Sat      44      45     45.7      44      61
2  2016      1    3   Sun      45      44     45.8      41      56
3  2016      1    4   Mon      44      41     45.9      40      53
4  2016      1    5  Tues      41      40     46.0      44      41
We now train the parameters of the NN prediction model on the training data, using every column except actual as input and actual as the label. The week column holds strings rather than numbers like the other columns, so it cannot be used in numeric computation during training and needs extra processing. It is one-hot encoded first (with pandas get_dummies in the code below), which turns the single week column into seven 0/1 columns and the 8 input columns into 14. All features are then standardized with sklearn. The processed training data look like this:
Standardized raw data, dimensions: (348, 14) Specific data:
[[ 0.         -1.5678393  -1.65682171 ... -0.40482045 -0.41913682 -0.40482045]
 [ 0.         -1.5678393  -1.54267126 ... -0.40482045 -0.41913682 -0.40482045]
 [ 0.         -1.5678393  -1.4285208  ... -0.40482045 -0.41913682 -0.40482045]
 ...
 [ 0.          1.5810006  ...  2.47023092 -0.41913682 -0.40482045]
 [ 0.          1.5810006   1.65354153 ... -0.40482045 -0.41913682 -0.40482045]
 [ 0.          1.5810006   1.65354153 ... -0.40482045 -0.41913682 -0.40482045]]
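The two preprocessing steps can be sketched on a toy table. This is a minimal illustration rather than the article's code: the four-row frame is made up, and the standardization is written out by hand with NumPy (subtract each column's mean, divide by its standard deviation), which is the computation StandardScaler performs with its default settings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numeric column and one string column
toy = pd.DataFrame({
    "temp_1": [45, 44, 41, 40],
    "week": ["Fri", "Sat", "Sun", "Fri"],
})

# One-hot encoding: week becomes one 0/1 column per distinct value
encoded = pd.get_dummies(toy, dtype=float)
print(list(encoded.columns))

# Standardization: zero mean and unit variance per column
values = encoded.to_numpy()
standardized = (values - values.mean(axis=0)) / values.std(axis=0)
print(standardized.shape)
```

In the real data every row has year 2016, so that column has zero variance. The hand-written division above would divide by zero there, whereas StandardScaler leaves such a column at 0, which is why the first column of the output above is 0.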
Here we use a single hidden layer with 128 units and a batch size of 16 (a batch is the portion of the data fed into the network at each training step; the batch size is the number of training samples in each batch). The network is designed as follows: the 14 input features feed a fully connected hidden layer of 128 units with a sigmoid activation, which feeds a single output unit that predicts the temperature.
(You can also change the batch size or the hidden layer size yourself and observe the effect on the predictions, such as overfitting.)
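To make the batch arithmetic concrete, here is a small sketch in plain Python of how 348 samples split into batches of 16. The variable names are illustrative only; the slicing mirrors the start and end indices a mini-batch loop would use.

```python
import math

n_samples = 348   # number of rows in the dataset
batch_size = 16

# (start, end) index pairs; the last batch is cut short at the end of the data
batches = [(start, min(start + batch_size, n_samples))
           for start in range(0, n_samples, batch_size)]

n_batches = len(batches)
last_size = batches[-1][1] - batches[-1][0]
print(n_batches, last_size)  # prints: 22 12
```

Since 348 is not a multiple of 16, the final batch holds only 12 samples, so each pass over the data performs 22 parameter updates.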
The code is as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import datetime
from sklearn import preprocessing
import matplotlib
import warnings
warnings.filterwarnings("ignore")
features = pd.read_csv('temps.csv')
# see what the data looks like, head() displays the first five by default
print('Raw data dimension: {0}, data: \n{1}'.format(features.shape, features.head()))
# One-hot encode the string column week (Fri, Sat, ...) into numeric 0/1 columns
features = pd.get_dummies(features)
features.head(5)
# The labels are the true values the model should predict
labels = np.array(features['actual'])
# Remove the label column from the features
features = features.drop('actual', axis=1)
# Save the feature column names separately for later use
feature_list = list(features.columns)
# Convert to the appropriate format
features = np.array(features)
# Standardize every feature column to zero mean and unit variance
input_features = preprocessing.StandardScaler().fit_transform(features)
print("Standardized raw data, dimensions: {0} Specific data: \n{1}".format(input_features.shape, input_features))
# Build the network model
input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss(reduction='mean') # Calculate the loss function (mean square error)
optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001) # the optimizer
# Train the network
losses = []
for i in range(500):
    batch_loss = []
    # Mini-batch training: step through the data batch_size samples at a time
    for start in range(0, len(input_features), batch_size):
        end = min(start + batch_size, len(input_features))
        xx = torch.tensor(input_features[start:end], dtype=torch.float)
        # Reshape the labels to (batch, 1) so they match the network output
        # and the loss does not broadcast the two against each other
        yy = torch.tensor(labels[start:end], dtype=torch.float).reshape(-1, 1)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward()
        # All optimizers implement step(), which updates the parameters
        # once backward() has computed the gradients
        optimizer.step()
        batch_loss.append(loss.item())
    # Record and print the mean loss every 100 epochs
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss), batch_loss)
# Predict training results
x = torch.tensor(input_features, dtype=torch.float)
predict = my_nn(x).data.numpy()
# Convert date format
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]
dates = [str(int(year)) + '-' + str(int(month)) + '-' + str(int(day)) for year, month, day in zip(years, months, days)]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
# Create a table to store the dates and their corresponding label values
true_data = pd.DataFrame(data={'date': dates, 'actual': labels})
# Create another one to store the date and its corresponding model predicted value
test_dates = [str(int(year)) + '-' + str(int(month)) + '-' + str(int(day)) for year, month, day in
              zip(years, months, days)]
test_dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates]
predictions_data = pd.DataFrame(data={'date': test_dates, 'prediction': predict.reshape(-1)})
# Start drawing
# Optional: select a font with Chinese glyphs (Songti SC ships with macOS);
# the default font cannot display Chinese, but the English labels below do not need it
matplotlib.rc("font", family='Songti SC')
# True values
plt.plot(true_data['date'], true_data['actual'], 'b+', label='True value')
# Predicted values
plt.plot(predictions_data['date'], predictions_data['prediction'], 'r+', label='Predicted value')
plt.xticks(rotation=60)
plt.legend()
# Axis labels and title
plt.xlabel('date')
plt.ylabel('Maximum temperature (F: Fahrenheit)')
plt.title('Real and predicted temperatures')
plt.show()
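One detail of the training loop deserves a closer look: the network output has shape (batch, 1), whereas a slice of labels has shape (batch,). The sketch below shows why the two should share a shape before a mean squared error is taken. It uses NumPy as a stand-in, on the assumption (stated in the PyTorch documentation) that PyTorch broadcasting follows the same rules; the three-element arrays are made up for illustration.

```python
import numpy as np

# Toy stand-ins: a (3, 1) "prediction" like the network output
# and a (3,) label vector holding the same values
prediction = np.array([[1.0], [2.0], [3.0]])
labels = np.array([1.0, 2.0, 3.0])

# Mismatched shapes broadcast to a (3, 3) grid: every prediction
# is compared against every label
mismatched = prediction - labels
mse_mismatched = np.mean(mismatched ** 2)

# Matching shapes compare each prediction with its own label only
matched = prediction - labels.reshape(-1, 1)
mse_matched = np.mean(matched ** 2)

print(mismatched.shape, mse_mismatched)
print(matched.shape, mse_matched)
```

Even though the predictions here equal the labels exactly, the mismatched version reports a nonzero error, so a model trained on it is pulled toward the mean label of each batch rather than toward each individual label.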