Keras LSTM RNN forecast - Shifting fitted forecast backward - time-series

I am trying to use an LSTM recurrent neural net in Keras to forecast future purchases. My input variables are a time window of purchases over the previous 5 days, and a categorical variable which I encoded as dummy variables A, B, ..., I. My input data looks like the following:
>>> dataframe.head()
day price A B C D E F G H I TS_bigHolidays
0 2015-06-16 7.031160 1 0 0 0 0 0 0 0 0 0
1 2015-06-17 10.732429 1 0 0 0 0 0 0 0 0 0
2 2015-06-18 21.312692 1 0 0 0 0 0 0 0 0 0
My problem is that my forecasts/fitted values (for both training and test data) seem to be shifted forward. Here is a plot:
My question is what parameter in LSTM Keras should I change to correct this issue? Or do I need to change anything in my input data?
Here is my code:
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas
import math
import time
import csv
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from sklearn.preprocessing import MinMaxScaler
np.random.seed(1234)
exo_feature = ["A","B","C","D","E","F","G","H","I", "TS_bigHolidays"]
look_back = 5 #this is number of days we are looking back for sliding window of time series
forecast_period_length = 40
# load the dataset
dataframe = pandas.read_csv('processedDataframeGameSphere.csv', header = 0, engine='python', skipfooter=6)
dataframe["price"] = dataframe['price'].astype('float32')
scaler = MinMaxScaler(feature_range=(0, 100))
# MinMaxScaler expects 2-D input, so pass a one-column DataFrame and flatten the result
dataframe["price"] = scaler.fit_transform(dataframe[['price']]).ravel()
# this function is used to make sliding window for time series data
def create_dataframe(dataframe, look_back=1):
    dataX, dataY = [], []
    for i in range(dataframe.shape[0]-look_back-1):
        price_lookback = dataframe['price'][i: (i + look_back)] #i+look_back is exclusive here
        exog_feature = dataframe[exo_feature].iloc[i + look_back - 1] #Y is i+ look_back ,that's why
        # Series.append was removed in pandas 2.0; concat joins the two Series into one row
        row_i = pandas.concat([price_lookback, exog_feature])
        dataX.append(row_i)
        dataY.append(dataframe["price"][i + look_back])
    return np.array(dataX), np.array(dataY)
window_dataframe, Y = create_dataframe(dataframe, look_back)
# split into train and test sets
train_size = int(dataframe.shape[0] - forecast_period_length) # forecast_period_length is the number of days we want to forecast
test_size = dataframe.shape[0] - train_size
test_size_start_point_with_lookback = train_size - look_back
trainX, trainY = window_dataframe[0:train_size,:], Y[0:train_size]
print(trainX.shape)
print(trainY.shape)
#below changed datawindowY indexing, since it's just array.
testX, testY = window_dataframe[train_size:dataframe.shape[0],:], Y[train_size:dataframe.shape[0]]
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
print(trainX.shape)
print(testX.shape)
# create and fit the LSTM network
dimension_input = testX.shape[2]
model = Sequential()
layers = [dimension_input, 50, 100, 1]
epochs = 100
model.add(LSTM(
input_dim=layers[0],
output_dim=layers[1],
return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
layers[2],
return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(
output_dim=layers[3]))
model.add(Activation("linear"))
start = time.time()
model.compile(loss="mse", optimizer="rmsprop")
print("Compilation Time : ", time.time() - start)
model.fit(
trainX, trainY,
batch_size= 10, nb_epoch=epochs, validation_split=0.05,verbose =2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
trainScore = math.sqrt(trainScore)
trainScore = scaler.inverse_transform(np.array([[trainScore]]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = model.evaluate(testX, testY, verbose=0)
testScore = math.sqrt(testScore)
testScore = scaler.inverse_transform(np.array([[testScore]]))
print('Test Score: %.2f RMSE' % (testScore))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
np_price = np.array(dataframe["price"])
print(np_price.shape)
np_price = np_price.reshape(np_price.shape[0],1)
trainPredictPlot = np.empty_like(np_price)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
testPredictPlot = np.empty_like(np_price)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+look_back+1:dataframe.shape[0], :] = testPredict
# plot baseline and predictions
plt.plot(dataframe["price"])
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

It's not a problem of the LSTM; if you use just a simple feed-forward network, the effect will be the same.
The problem is that the network tends to mimic yesterday's value instead of 'forecasting' as you expect.
(It is a nice strategy in terms of reducing the MSE loss.)
You need more 'care' to avoid this issue, and it's not a simple issue.
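To see why copying yesterday's value is attractive under MSE, and why it produces exactly the one-step shift in the plot, one can compare a naive persistence forecast with a constant forecast. The sketch below uses a synthetic random walk as a stand-in for the purchase series (an assumption; the real data is not available here). It also shows first differencing, one commonly used mitigation that the answer does not itself name. All helper names are hypothetical.

```python
import numpy as np

def persistence_forecast(series):
    """Hypothetical helper: predict y[t] (t >= 1) with the previous value y[t-1]."""
    series = np.asarray(series, dtype=float)
    return series[:-1]

def mse(a, b):
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def to_differences(series):
    """First differences d[t] = y[t] - y[t-1], a common level-free training target."""
    return np.diff(np.asarray(series, dtype=float))

def from_differences(first_value, diffs):
    """Undo to_differences: rebuild levels from the first value and the differences."""
    return np.concatenate([[first_value], first_value + np.cumsum(diffs)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=500))  # synthetic random walk, not the real data
    y_true = series[1:]
    # Plotted over time, the persistence forecast is y_true shifted right by one step.
    print("persistence MSE:  ", mse(y_true, persistence_forecast(series)))
    print("mean-forecast MSE:", mse(y_true, np.full_like(y_true, series.mean())))
```

On a random walk the persistence forecast beats the constant forecast by a wide margin, so a network trained with MSE can reach a low loss by learning little more than "repeat the last input". Comparing a model's error against `persistence_forecast` is a quick check for this; training on `to_differences` output and rebuilding levels with `from_differences` removes the free win from copying the last value.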

Related

How can I avoid underfitting in Pytorch NeuralNetwork

I am trying to predict the power consumption of a plant based on seven features. I have built two simple neural network models.
The first one is a linear model, and the second is an RNN model. However, both models perform badly on the test set: their forecast is a straight line.
Something about data
There are about 360 samples in the CSV file. I take the first 300 samples for training and the others for testing. The first 7 columns of the raw data are features of daily operation. The last column is the electricity consumption of each day.
Setting of training set
In the linear model, the training input is the first 7 columns for a given day, and the corresponding target is the power consumption of that day.
In the RNN model, the training input is all 8 columns for three days (seven features plus power consumption), and the corresponding target is the power consumption of the next three days.
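The RNN windowing described above can be sketched in NumPy. `make_windows` is a hypothetical helper, not from the question. It assumes the target for each window is the consumption series shifted one day ahead, which is what the RNN code's `initialData.iloc[-1, 2:302]` slice appears to do, rather than a separate block of three later days.

```python
import numpy as np

def make_windows(features, target, window=3):
    """Hypothetical helper: overlapping windows of `window` consecutive days.

    features: shape (n_days, n_cols), e.g. 7 features + consumption = 8 columns
    target:   shape (n_days,), the consumption series
    Returns X with shape (n_windows, window, n_cols) and Y with shape (n_windows, window),
    where Y[k, j] is the consumption on the day after the day in X[k, j] (a one-day shift).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_windows = len(features) - window  # leave room for the one-day shift
    X = np.stack([features[i:i + window] for i in range(n_windows)])
    Y = np.stack([target[i + 1:i + 1 + window] for i in range(n_windows)])
    return X, Y
```

With 300 training days and `window=3` this yields 297 windows, each of shape (3, 8), which matches the `batch_first=True` layout the RNN model expects.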
Code
Code of RNN model
import torch
import pandas as pd
import numpy as np
import torch.nn.functional as f
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from matplotlib import pyplot as plt
'''
build simple RNN
'''
batchSize = 3
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
netPath = ''
'''Data processing'''
# read raw data
filePath = 'F:/.csv'
initialData = pd.read_csv(filePath)
print(initialData.head(10))
print('hello world')
# Separate features and power consumption.
trainDatas = initialData.iloc[0:7, 1:301]
trainPowerConsum = pd.DataFrame(initialData.iloc[-1, 1:301]).T
trainDatas = pd.concat([trainDatas, trainPowerConsum], axis=0)
trainPowerConsum = initialData.iloc[-1, 2:302]
# Plot
powerConsumPlot = trainDatas.iloc[-1, :]
xData = np.linspace(1, powerConsumPlot.shape[0], 300)
plt.plot(xData, powerConsumPlot)
plt.show()
testDatas = initialData.iloc[0:7, 302:-1]
testPowerConsum = pd.DataFrame(initialData.iloc[-1, 302:-1]).T
testDatas = pd.concat([testDatas, testPowerConsum], axis=0)
testPowerConsum = initialData.iloc[-1, 303:]
# convert to dataframe
trainDatas = pd.DataFrame(trainDatas)
trainDatas = trainDatas.T
trainPowerConsum = pd.DataFrame(trainPowerConsum)
testDatas = pd.DataFrame(testDatas)
testDatas = testDatas.T
testPowerConsum = pd.DataFrame(testPowerConsum)
# change the unit of PowerConsumption
trainDatas.iloc[:, -1] = trainDatas.iloc[:, -1] * 1000
testDatas.iloc[:, -1] = testDatas.iloc[:, -1] * 1000
trainPowerConsum.iloc[:, 0] = trainPowerConsum.iloc[:, 0] * 1000
testPowerConsum.iloc[:, 0] = testPowerConsum.iloc[:, 0] * 1000
assert testPowerConsum.shape[0] == testDatas.shape[0]
assert trainDatas.shape[0] == trainPowerConsum.shape[0]
# convert dataframe to tensor
trainDatas = torch.tensor(trainDatas.values.astype(float), device=device)
trainPowerConsum = torch.tensor(trainPowerConsum.values.astype(float), device=device)
testDatas = torch.tensor(testDatas.values.astype(float), device=device)
testPowerConsum = torch.tensor(testPowerConsum.values.astype(float), device=device)
trainDatasList = list()
trainPowerConsumList = list()
for i in range(298):
    trainDatasList.append(trainDatas[i:i + 3])
    trainPowerConsumList.append(trainPowerConsum[i:i + 3])
from torch.nn.utils.rnn import pad_sequence
trainPowerConsum = pad_sequence(trainPowerConsumList, batch_first=True)
trainDatas = pad_sequence(trainDatasList, batch_first=True)
print(trainDatas.shape)
# ensure the batch_size of test data is 1
testDatas = torch.unsqueeze(testDatas, dim=0)
testPowerConsum = torch.unsqueeze(testPowerConsum, dim=0)
'''build dataloader'''
trainDataLoader = DataLoader(
TensorDataset(
trainDatas, trainPowerConsum
),
shuffle=True, batch_size=batchSize, drop_last=True)
print('Data is ready')
seqLen = 2
inputDim = 8
hiddenSize = 3
numLayer = 2
learningRate = 0.01
class RNNModel(torch.nn.Module):
    def __init__(self, inputsize, hiddensize, batchsize, numLayer):
        super(RNNModel, self).__init__()
        self.batchsize = batchsize
        self.inputsize = inputsize
        self.hiddensize = hiddensize
        self.numlayers = numLayer
        self.rnn = torch.nn.RNN(input_size=self.inputsize, hidden_size=self.hiddensize, num_layers=self.numlayers,
                                batch_first=True)
        # use the constructor argument, not the global hiddenSize
        self.l1 = torch.nn.Linear(hiddensize, hiddensize)
        self.l2 = torch.nn.Linear(hiddensize, 1)
    def forward(self, input, hidden):
        out, hidden = self.rnn(input.float(), hidden.float())
        batch_size, seq_len, input_dim = out.shape
        out = out.reshape(-1, input_dim)
        # out = f.sigmoid(self.l1(out))
        out = f.relu(self.l1(out))
        out = self.l2(out)
        out = out.reshape(batch_size, seq_len, -1)
        return out, hidden
    def initHidden(self):
        hidden = torch.zeros(self.numlayers, self.batchsize, self.hiddensize, device=device, dtype=torch.float64)
        return hidden
net = RNNModel(inputDim, hiddenSize, batchSize, numLayer).to(device)
criterion = torch.nn.L1Loss()
# Adam has no `momentum` argument (its betas play that role); passing one raises a TypeError
optimizer = optim.Adam(net.parameters(), lr=learningRate)
def train(epoch):
    runLoss = 0.
    optimizer.zero_grad()
    hidden = net.initHidden()
    for batchIndex, data in enumerate(trainDataLoader, 0):
        inputs, target = data
        optimizer.zero_grad()
        outputs, hidden = net(inputs, hidden)
        hidden = hidden.detach()
        loss = criterion(outputs.float(), target.float())
        loss = loss.mean()
        loss.backward()
        optimizer.step()
    print(f'{epoch + 1},\t Loss={loss.item()}')
    # torch.save(net.state_dict(), netPath)
def test():
    testDatasVice = torch.clone(testDatas)
    input = testDatasVice[:, 0, :]
    input = input.view(1, 1, -1)
    assert input.shape[2] == 8
    predictPowConsum = list()
    # the first hidden tensor in test set is zero
    hidden = torch.zeros(2, 1, 3, device=device, dtype=torch.float64)
    with torch.no_grad():
        for i in range(testDatas.shape[1]):
            output, hidden = net(input, hidden)
            if i < 51:
                testDatasVice[:, i + 1, -1] = output[0]
                input = torch.unsqueeze(testDatasVice[:, i + 1, :], dim=0)
                predictPowConsum.append(output.data.cpu().numpy().ravel()[0])
            elif i == 51:
                predictPowConsum.append(output.data.cpu().numpy().ravel()[0])
            else:
                print('\tindexError') # Exclude potential Errors
    return predictPowConsum
if __name__ == '__main__':
    epochNum = 300
    for epoch in range(epochNum):
        train(epoch)
    predictPowConsum = test()
    # plotting
    xData = np.arange(303, 303 + testPowerConsum.size(1))
    plt.plot(xData, testPowerConsum.cpu().numpy()[0, :, 0])
    plt.plot(xData, predictPowConsum)
    plt.show()
Code of Linear model
import torch
import pandas as pd
import numpy as np
import torch.nn.functional as f
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from matplotlib import pyplot as plt
filePath = 'F:.csv'
initialData = pd.read_csv(filePath)
print(initialData.head(10))
print('hello world')
trainDatas = initialData.iloc[0:7, 1:300]
trainPowerConsum = initialData.iloc[-1, 1:300]
testDatas = initialData.iloc[0:7, 300:-1]
testPowerConsum = initialData.iloc[-1, 300:-1]
trainDatas = pd.DataFrame(trainDatas)
trainDatas = trainDatas.T
trainPowerConsum = pd.DataFrame(trainPowerConsum)
testDatas = pd.DataFrame(testDatas)
testDatas = testDatas.T
testPowerConsum = pd.DataFrame(testPowerConsum)
trainPowerConsum.iloc[:, 0] = trainPowerConsum.iloc[:, 0] * 1000
testPowerConsum.iloc[:, 0] = testPowerConsum.iloc[:, 0] * 1000
# build dataloader
trainData = DataLoader(
TensorDataset(
torch.tensor(trainDatas.values.astype(float)).float(),
torch.tensor(trainPowerConsum.values.astype(float)).float()
),
shuffle=True, batch_size=15)
testData = DataLoader(
TensorDataset(
torch.tensor(testDatas.values.astype(float)).float(),
torch.tensor(testPowerConsum.values.astype(float)).float()
),
shuffle=False, batch_size=15)
print('data is ready')
class SimpleNet(torch.nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.l1 = torch.nn.Linear(7, 15)
        self.l2 = torch.nn.Linear(15, 30)
        self.l3 = torch.nn.Linear(30, 15)
        self.l4 = torch.nn.Linear(15, 5)
        self.l5 = torch.nn.Linear(5, 1)
    def forward(self, x):
        x = f.relu(self.l1(x))
        x = f.relu(self.l2(x))
        x = f.relu(self.l3(x))
        x = f.relu(self.l4(x))
        return self.l5(x)
model = SimpleNet()
criterion = torch.nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001)
def train(epoch):
    runLoss = 0.
    for batch_index, data in enumerate(trainData, 0):
        inputs, target = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        # .item() keeps a plain float instead of holding on to the autograd graph
        runLoss += loss.item()
        print(f'{epoch + 1},{batch_index + 1},\tLoss={runLoss / 5}')
        runLoss = 0
def test(epoch):
    totalError = 0.
    print('Start to test the model')
    with torch.no_grad():
        for data in testData:
            # test ---------data for test
            # testlab ---------corresponding power consumption
            test, testlab = data
            outputs = model(test)
            predicted = outputs.data
            testError = testlab - predicted
            # plotting
            if epoch % 50 == 2:
                xData = np.linspace(1, 15, 15)
                if predicted.size(0) != 15:
                    pass
                else:
                    plt.plot(xData, predicted[:, 0].numpy(), label='predicted', color='red')
                    plt.plot(xData, testlab[:, 0].numpy(), label='origData', color='blue')
                    plt.show()
            totalError += (torch.abs(testError).sum().item())
        print(f'Average Error on test set is {totalError / 54}')
if __name__ == '__main__':
    for epoch in range(1000):
        train(epoch)
        test(epoch)
Image of Output
[Figure: output of RNN]
The blue line is the actual data, and the orange line is the output of RNN model.
Solutions and their effects
I have looked around and apparently I've got the choice between these solutions:
Add new domain-specific features
Decrease the amount of regularization used
Increase the duration of training
Increase the complexity or type of the model
Decrease the learning rate
Try other activation functions
I have tried some solutions:
The training data isn't regularized; I just changed the unit of electricity from kWh to Wh
I switched to ReLU as the activation function after using Sigmoid, but it doesn't work
I adjusted the learning rate from 0.01 to 0.001; it doesn't improve
I tried different optimizers such as SGD and Adam on both models, even with momentum; it doesn't get better
The sequence length of the RNN model was 60 at first, then set to 3. The loss dropped more rapidly in the latter case, but the forecast is still a straight line
In a word, none of the solutions I found work.
Besides, if shuffle is True when building the DataLoader, the loss jumps around violently between epochs. But it drops slowly and eventually settles close to a constant when shuffle is False.
What could be the best way to avoid the problem?
Thanks in advance!
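The question notes that the data is not normalized, only rescaled from kWh to Wh. A standard step that is not among the solutions tried above is to standardize every feature and the target with statistics computed on the training split only, train on the standardized values, and map predictions back afterwards. A minimal NumPy sketch; the helper names are hypothetical and nothing here comes from the question's code.

```python
import numpy as np

def fit_standardizer(train):
    """Hypothetical helper: per-column mean and std from the training split only."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns: avoid division by zero
    return mean, std

def standardize(x, mean, std):
    return (np.asarray(x, dtype=float) - mean) / std

def unstandardize(z, mean, std):
    """Map standardized values (e.g. model predictions) back to original units."""
    return np.asarray(z, dtype=float) * std + mean
```

Usage, assuming `train_X` and `test_X` are 2-D arrays of shape (days, features): compute `mean, std = fit_standardizer(train_X)` once, then apply `standardize(train_X, mean, std)` and `standardize(test_X, mean, std)` with the same statistics so no test information leaks into training. The target column gets its own mean and std, and `unstandardize` converts the network's output back to Wh.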

Test set accuracy is very high after very few epochs on mnist dataset

With very few epochs this model learns to classify between 1 and 0 extremely quickly, which leads me to suspect something is wrong.
The code below downloads the MNIST dataset and extracts the images that contain only 1 or 0. A random sample of size 200 is selected from this subset of MNIST images, and this random sample is the dataset the model is trained on. With just 2 epochs the model achieves 90%+ test set accuracy. Is this expected behaviour? I expected many more epochs would be required to train the model to this level of test set accuracy.
Model code :
%reset -f
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.utils.data as data_utils
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
import torchvision.datasets as dset
import os
import torch.nn.functional as F
import time
import random
import pickle
from sklearn.metrics import confusion_matrix
import pandas as pd
import sklearn
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])
root = './data'
if not os.path.exists(root):
    os.mkdir(root)
train_set = dset.MNIST(root=root, train=True, transform=trans, download=True)
test_set = dset.MNIST(root=root, train=False, transform=trans, download=True)
batch_size = 64
train_loader = torch.utils.data.DataLoader(
dataset=train_set,
batch_size=batch_size,
shuffle=True)
test_loader = torch.utils.data.DataLoader(
dataset=test_set,
batch_size=batch_size,
shuffle=True)
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 500)
        self.fc2 = nn.Linear(500, 256)
        self.fc3 = nn.Linear(256, 2)
    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
num_epochs = 2
random_sample_size = 200
values_0_or_1 = [t for t in train_set if (int(t[1]) == 0 or int(t[1]) == 1)]
values_0_or_1_testset = [t for t in test_set if (int(t[1]) == 0 or int(t[1]) == 1)]
print(len(values_0_or_1))
print(len(values_0_or_1_testset))
train_loader_subset = torch.utils.data.DataLoader(
dataset=values_0_or_1,
batch_size=batch_size,
shuffle=True)
test_loader_subset = torch.utils.data.DataLoader(
dataset=values_0_or_1_testset,
batch_size=batch_size,
shuffle=False)
train_loader = train_loader_subset
# Hyper-parameters
input_size = 100
hidden_size = 100
num_classes = 2
# learning_rate = 0.00001
learning_rate = .0001
# Device configuration
device = 'cpu'
print_progress_every_n_epochs = 1
model = NeuralNet().to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
N = len(train_loader)
# Train the model
total_step = len(train_loader)
most_recent_prediction = []
test_actual_predicted_dict = {}
rm = random.sample(list(values_0_or_1), random_sample_size)
train_loader_subset = data_utils.DataLoader(rm, batch_size=4)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader_subset):
        # Move tensors to the configured device, flattening each image to 784 values
        # (reshape(-1, 2) also ran, but only because forward() re-views the data as (-1, 28*28))
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch) % print_progress_every_n_epochs == 0:
        print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, i+1, total_step, loss.item()))
predicted_test = []
model.eval() # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
probs_l = []
predicted_values = []
actual_values = []
labels_l = []
with torch.no_grad():
    for images, labels in test_loader_subset:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        predicted_test.append(predicted.cpu().numpy())
        sm = torch.nn.Softmax()
        probabilities = sm(outputs)
        probs_l.append(probabilities)
        labels_l.append(labels.cpu().numpy())
predicted_values.append(np.concatenate(predicted_test).ravel())
actual_values.append(np.concatenate(labels_l).ravel())
if (epoch) % 1 == 0:
    print('test accuracy : ', 100 * len((np.where(np.array(predicted_values[0])==(np.array(actual_values[0])))[0])) / len(actual_values[0]))
Output of the model (12665 and 2115 are the training and test set sizes):
12665
2115
Epoch [1/2], Step [50/198], Loss: 0.1256
Epoch [2/2], Step [50/198], Loss: 0.0151
test accuracy : 99.76359338061465
/anaconda3/envs/pytorch/lib/python3.7/site-packages/ipykernel_launcher.py:143: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
Here's my 2 cents on your binary experiment.
It would seem that you have severely reduced the complexity of your dataset, and with the high number of neurons in your intermediate layers, your model is expected to converge very quickly.
Note that MNIST images have only one channel, which makes the task very simple.
You could try CIFAR10 and see whether you still get high accuracy in just 2 epochs.
That's not a particularly well-posed question, because what is expected is entirely subjective. That being said, I am not surprised, because 0 and 1 are very different digits. For instance, 0 has background surrounded by foreground, whereas 1 does not; that's an almost infallible test to distinguish the two. As a sanity check, I would swap out 0 for 7, which is similar to 1. I would expect to see a significantly lower success rate. That being said, that's only a sanity check: even if it passes, there may still be bugs or errors in your method.
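The answer's intuition, that a 0 encloses a region of background while a 1 does not, can be made concrete as a hole count: flood-fill each background region and count those that never reach the image border. This is an illustrative plain-NumPy sketch; `count_holes` is a hypothetical helper, not part of the question or of torchvision. It assumes raw pixel intensities in [0, 1] (before the question's `Normalize((0.5,), (1.0,))` shift) and 4-connected background.

```python
from collections import deque
import numpy as np

def count_holes(img, threshold=0.5):
    """Hypothetical helper: count background regions fully enclosed by foreground.

    Foreground pixels are those > threshold; background connectivity is 4-neighbour.
    A cleanly drawn '0' has one enclosed region, a '1' has none.
    """
    fg = np.asarray(img) > threshold
    h, w = fg.shape
    seen = np.zeros_like(fg, dtype=bool)
    holes = 0
    for r in range(h):
        for c in range(w):
            if fg[r, c] or seen[r, c]:
                continue
            # Flood-fill this background component and note whether it reaches the border.
            touches_border = False
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes
```

A single hand-written feature like this separating the two classes is a hint of why a 784-500-256-2 network can do it after very little training. Handwritten zeros with a gap in the stroke would have no enclosed region, which is why the answer calls the test "almost" infallible.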

RNN can't learn integral function

To study deep learning (RNNs, LSTMs and so on), I tried to make an RNN fit an integration function. I fed a random signal in the range 0 to 1 as input to the RNN, computed the integral of the input signal biased by -0.5, limited that integral to the range [0, 1], and used it as the RNN target to learn. Blue: random input, orange: integrated input.
So I have a time series with only one input (random) and one output (the limited integral of the input), and I want the RNN to predict the output from the input.
I used PyTorch and tried a vanilla RNN, a GRU cell, different hidden-layer sizes, stacking several RNNs, putting densely connected layers on the RNN output, and different depths of backpropagation through time (rolling back gradients 2 to 50 steps). I can't get a good result at all! It works somewhat, but I can't find a way to fit the integral function precisely. Here is the best of my results:
Green: RNN output. The green line (model output) does not fit the orange line in many cases; that is the problem.
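Reading `generate_data` in the code below, the target is not a plain integral but a clipped running sum. Writing $x_t$ for the input and $y_t$ for the target, with $y_0 = 0$, each step computes (the `0.0*(...)` and `*0` terms in the code vanish):

$$
y_t = \min\Bigl(1,\ \max\bigl(0,\ y_{t-1} + 0.1\,(x_t - 0.5)\bigr)\Bigr),
$$

where $x_t$ is redrawn uniformly from $[0, 1)$ with probability $0.1$ at each step and otherwise equals $x_{t-1}$ (with $x_0 = 0$). Because of the clipping, $y_t$ depends on the whole input history rather than on a fixed window, so the network effectively has to carry $y_{t-1}$ in its hidden state.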
Here is my source code in jupyter.
My questions: is it possible to learn a saturated integral function with an RNN? Where is my problem? What else can I try to achieve good quality? Ideally I want the RNN output to equal the desired output (the integral function) over the whole time series.
PS:
My code in raw format:
import numpy as np
from scipy.stats import truncnorm
import random
import math
import copy
import matplotlib  # needed for matplotlib.rcParams below
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.cm as cm
def generate_data(num_of_data):
    input_data=[]
    output_data=[]
    current_input_value=0
    current_output_value=0
    for i in range(num_of_data):
        if (random.random()<0.1):
            current_input_value=random.random()
            # current_output_value=0
        current_input_value=current_input_value+(random.random()-0.5)*0
        current_output_value=current_output_value+0.0*(current_input_value-current_output_value)+(current_input_value-0.5)*0.1
        if (current_output_value<0):
            current_output_value=0
        if (current_output_value>1):
            current_output_value=1
        input_data.append(current_input_value)
        output_data.append(current_output_value)
    return input_data,output_data
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (20, 6)
input_data,output_data=generate_data(500)
plt.plot(input_data)
plt.plot(output_data)
plt.show()
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch import optim
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.number_of_layers=1
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size,self.number_of_layers)
        self.Dense1 = nn.Linear(hidden_size, hidden_size)
        self.Dense1A = nn.ReLU()
        self.Dense2 = nn.Linear(hidden_size, output_size)
    def forward(self, input, hidden):
        gru_output, hidden = self.gru(input, hidden)
        Dense1Out=self.Dense1(gru_output)
        Dense1OutAct=self.Dense1A(Dense1Out)
        output=self.Dense2(Dense1OutAct)
        return output, hidden
    def initHidden(self):
        return Variable(torch.zeros(self.number_of_layers,1,self.hidden_size))
import time
import math
import operator
def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)
rnn = RNN(1, 50, 1)
n_iters = 250000
print_every = 2000
plot_every = 2000
all_losses = []
total_loss_print = 0
total_loss_plot = 0
criterion=nn.L1Loss()
print("training...\n")
start = time.time()
optimizer = optim.Adam(rnn.parameters(), lr=0.0002)
rnn_hidden = rnn.initHidden()
rnn.zero_grad()
loss = 0
#for gata_q in range(int(n_iters/500)):
# rnn_hidden = rnn.initHidden()
input_data,output_data=generate_data(n_iters)
for data_index in range(len(input_data)):
    input_tensor=torch.zeros(1, 1, 1)
    input_tensor[0][0][0]=input_data[data_index]
    output_tensor=torch.zeros(1, 1, 1)
    output_tensor[0][0][0]=output_data[data_index]
    rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
    loss += criterion(rnn_output, Variable(output_tensor))
    if data_index%2==0:
        loss.backward()
        # loss.item() replaces loss.data[0], which fails on 0-dim tensors in PyTorch >= 0.4
        total_loss_print += loss.item()
        total_loss_plot += loss.item()
        optimizer.step()
        rnn_hidden=Variable(rnn_hidden.data)
        rnn.zero_grad()
        loss = 0
    if data_index % print_every == 0:
        print('%s (%d %d%%) tl=%.4f' % (timeSince(start), data_index, data_index / n_iters * 100,total_loss_print/print_every))
        total_loss_print = 0
    if data_index % plot_every == 0:
        all_losses.append(total_loss_plot / plot_every)
        total_loss_plot = 0
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.figure()
plt.plot(all_losses)
plt.show()
rnn_hidden = rnn.initHidden()
rnn.zero_grad()
loss = 0
rnn_output_data=[]
input_data,output_data=generate_data(1500)
for data_index in range(len(input_data)):
    input_tensor=torch.zeros(1, 1, 1)
    input_tensor[0][0][0]=input_data[data_index]
    rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
    rnn_output_data.append(rnn_output.data.numpy()[0][0][0])
plt.plot(input_data)#blue
plt.plot(output_data)#ogange
plt.plot(rnn_output_data)#green
plt.show()
I have found the problem myself. It was a kind of overfitting to the most recent data, similar to how overfitting can occur in reinforcement learning when exploiting the latest strategy. I was not using any mini-batches and applied the optimiser directly after each new data point, and because consecutive data points are similar over runs of 20-50 samples, the optimiser simply fitted the network to the latest points and forgot its fit to the earlier ones. I solved it by accumulating gradients over 50 points and only then applying one optimiser step. The network now learns much better, but still not perfectly.
Here is the modification of the code that makes it work:
# (body of the `for data_index ...` training loop)
rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
loss += criterion(rnn_output, Variable(output_tensor))
if data_index % 2==0:
    loss.backward()
    total_loss_print += loss.item()
    rnn_hidden=Variable(rnn_hidden.data)
    loss = 0
    # torch.nn.utils.clip_grad_norm(rnn.parameters(), 0.01)
if data_index % 50==0:
    optimizer.step()
    rnn.zero_grad()
and the new result of learning the integral:
[Figure: new result of learning the integral]
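The fix in this answer is gradient accumulation: gradients from many consecutive samples are combined before a single optimizer update, so one update no longer chases only the newest, highly correlated points. The mechanism does not depend on PyTorch, so here it is as a hypothetical plain-NumPy sketch on a toy linear model (`train_linear` and its arguments are illustrative, not from the question). `accumulate=1` reproduces per-sample updates; `accumulate=50` matches the answer's step interval.

```python
import numpy as np

def train_linear(x, y, lr=0.1, accumulate=1, epochs=1):
    """Hypothetical sketch: fit y ~ w*x + b by gradient descent on squared error.

    Gradients of `accumulate` consecutive samples are averaged and applied in one
    update. (Repeated backward() calls in PyTorch sum instead; that only rescales the step.)
    """
    w, b = 0.0, 0.0
    gw, gb, count = 0.0, 0.0, 0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            err = (w * xi + b) - yi
            gw += 2.0 * err * xi   # d(err**2)/dw
            gb += 2.0 * err        # d(err**2)/db
            count += 1
            if count == accumulate:
                # one optimizer step for the whole accumulated block
                w -= lr * gw / accumulate
                b -= lr * gb / accumulate
                gw, gb, count = 0.0, 0.0, 0
    return w, b
```

For example, `train_linear(np.linspace(0, 1, 200), 3 * np.linspace(0, 1, 200) + 1, accumulate=10, epochs=200)` recovers a slope close to 3 and an intercept close to 1. In the answer's PyTorch loop, the same effect comes from calling `loss.backward()` every step but `optimizer.step()` and `rnn.zero_grad()` only every 50 steps.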

value prediction with tensorflow and python

I have a data set which contains a list of stock prices. I need to use the tensorflow and python to predict the close price.
Q1: I have the following code, which takes the first 2000 records as training and records 2001 to 20000 as test, but I don't know how to change the code to predict the close price of today and 1 day later. Please advise!
#!/usr/bin/env python2
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
def feature_scaling(input_pd, scaling_meathod):
    if scaling_meathod == 'z-score':
        scaled_pd = (input_pd - input_pd.mean()) / input_pd.std()
    elif scaling_meathod == 'min-max':
        scaled_pd = (input_pd - input_pd.min()) / (input_pd.max() -
                                                   input_pd.min())
    return scaled_pd
def input_reshape(input_pd, start, end, batch_size, batch_shift, n_features):
    temp_pd = input_pd[start-1: end+batch_size-1]
    output_pd = map(lambda y : temp_pd[y:y+batch_size], xrange(0, end-start+1, batch_shift))
    output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
    output = np.reshape(output_temp, [-1, batch_size, n_features])
    return output
def target_reshape(input_pd, start, end, batch_size, batch_shift, n_step_ahead, m_steps_pred):
    temp_pd = input_pd[start+batch_size+n_step_ahead-2: end+batch_size+n_step_ahead+m_steps_pred-2]
    print temp_pd
    output_pd = map(lambda y : temp_pd[y:y+m_steps_pred], xrange(0, end-start+1, batch_shift))
    output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
    output = np.reshape(output_temp, [-1,1])
    return output
def lstm(input, n_inputs, n_steps, n_of_layers, scope_name):
    num_layers = n_of_layers
    input = tf.transpose(input,[1, 0, 2])
    input = tf.reshape(input,[-1, n_inputs])
    input = tf.split(0, n_steps, input)
    with tf.variable_scope(scope_name):
        cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=n_inputs)
        cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers)
        output, state = tf.nn.rnn(cell, input, dtype=tf.float32)
        output = output[-1]
    return output
feature_to_input = ['open price', 'highest price', 'lowest price', 'close price','turnover', 'volume','mean price']
feature_to_predict = ['close price']
feature_to_scale = ['volume']
sacling_meathod = 'min-max'
train_start = 1
train_end = 1000
test_start = 1001
test_end = 20000
batch_size = 100
batch_shift = 1
n_step_ahead = 1
m_steps_pred = 1
n_features = len(feature_to_input)
lstm_scope_name = 'lstm_prediction'
n_lstm_layers = 1
n_pred_class = 1
learning_rate = 0.1
EPOCHS = 1000
PRINT_STEP = 100
input_pd = pd.read_csv('./stock_price.csv')  # the code below refers to this frame as input_pd
temp_pd = feature_scaling(input_pd[feature_to_scale],sacling_meathod)
input_pd[feature_to_scale] = temp_pd
train_input_temp_pd = input_pd[feature_to_input]
train_input_nparr = input_reshape(train_input_temp_pd,
train_start, train_end, batch_size, batch_shift, n_features)
train_target_temp_pd = input_pd[feature_to_predict]
train_target_nparr = target_reshape(train_target_temp_pd, train_start, train_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
test_input_temp_pd = input_pd[feature_to_input]
test_input_nparr = input_reshape(test_input_temp_pd, test_start, test_end, batch_size, batch_shift, n_features)
test_target_temp_pd = input_pd[feature_to_predict]
test_target_nparr = target_reshape(test_target_temp_pd, test_start, test_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
tf.reset_default_graph()
x_ = tf.placeholder(tf.float32, [None, batch_size, n_features])
y_ = tf.placeholder(tf.float32, [None, 1])
lstm_output = lstm(x_, n_features, batch_size, n_lstm_layers, lstm_scope_name)
W = tf.Variable(tf.random_normal([n_features, n_pred_class]))
b = tf.Variable(tf.random_normal([n_pred_class]))
y = tf.matmul(lstm_output, W) + b
cost_func = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_func)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for ii in range(EPOCHS):
        sess.run(train_op, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
        if ii % PRINT_STEP == 0:
            cost = sess.run(cost_func, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
            print 'iteration =', ii, 'training cost:', cost
Very simply, prediction (a.k.a. scoring or inference) comes from running the input through only the forward pass and collecting the score for each input vector. It's the same process flow as testing. The differences lie in the four stages of model use:
Train: learn from the training data set; adjust weights as needed.
Test: evaluate the model's performance; if accuracy has converged, stop training.
Validate: evaluate the accuracy of the trained model. If it doesn't meet the acceptance criteria, change something and start over with the training.
Predict: you've passed validation, so release the model for use by the intended application.
All four stages follow the same forward logic flow; training additionally includes back-propagation, the others do not. Simply follow the forward-only process and you'll get the results you need.
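The following minimal NumPy sketch makes the forward-only idea concrete. A plain linear model stands in for the LSTM. That substitution is an assumption made only to keep the example short and runnable. `train_step` runs the forward pass, back-propagates the mean-squared-error gradient and updates the weights. Prediction calls `forward` alone and leaves the weights untouched. In the question's TensorFlow code, the equivalent of the last line would be evaluating `y` inside the same session, for example `sess.run(y, feed_dict={x_: test_input_nparr})`, without running `train_op`. This assumes `test_input_nparr` has the same layout as the training input.

```python
import numpy as np

def forward(X, W, b):
    """Forward pass of a linear model: the only computation prediction needs."""
    return X @ W + b

def train_step(X, y, W, b, lr):
    """One gradient-descent step on MSE: forward pass, backward pass, update."""
    pred = forward(X, W, b)
    err = pred - y                       # shape (n, 1)
    grad_W = 2.0 * X.T @ err / len(X)    # dMSE/dW
    grad_b = 2.0 * err.mean(axis=0)      # dMSE/db
    return W - lr * grad_W, b - lr * grad_b

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
true_W = np.array([[1.5], [-2.0], [0.5]])
y_train = X_train @ true_W + 0.3         # noise-free synthetic targets

# Training: forward + back-propagation + weight update, repeated.
W = np.zeros((3, 1))
b = np.zeros(1)
for _ in range(500):
    W, b = train_step(X_train, y_train, W, b, lr=0.1)

# Prediction: forward pass only; W and b are not modified.
X_new = rng.normal(size=(5, 3))
y_pred = forward(X_new, W, b)
```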
I worry about your data partition: only 10% for training, 90% for testing, and none for validation. A more typical split is 50-30-20, or something in that general area.
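Time-series data is usually split chronologically rather than at random, so the model is never evaluated on days earlier than those it was trained on. The sketch below assumes the 50-30-20 figures mean 50% training, 30% validation and 20% testing, in that time order. That mapping is an assumption on my part. `chronological_split` is a hypothetical helper, not part of any library.

```python
def chronological_split(n_samples, train_frac=0.5, val_frac=0.3):
    """Return (train, validation, test) index ranges for time-ordered data.

    Whatever is left after train_frac and val_frac becomes the test set.
    No shuffling: each set lies strictly after the previous one in time.
    """
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    if n_train <= 0 or n_train + n_val >= n_samples:
        raise ValueError("split must leave non-empty train and test sets")
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_samples)
    return train, val, test

train_idx, val_idx, test_idx = chronological_split(1000)
```

The ranges can then slice the windowed arrays, e.g. `X[train_idx.start:train_idx.stop]`.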
Q1: You should change your LSTM setup so that it returns a sequence of length two: the prediction for that day and for the day after.
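The answer does not say how to build the two-step targets. The sketch below shows one possible way, assuming a single price series and the question's `look_back = 5`. Each input window is paired with the two values that follow it. `make_windows` is a hypothetical helper. In Keras, one common way to produce two outputs is a final `Dense(2)` layer trained on these pairs. That is my assumption, not necessarily what the answer meant.

```python
import numpy as np

def make_windows(series, look_back=5, n_ahead=2):
    """Build (X, y) pairs: X[i] holds look_back past values and
    y[i] holds the n_ahead values that follow them (day t and day t+1)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - look_back - n_ahead + 1
    if n <= 0:
        raise ValueError("series too short for this look_back/n_ahead")
    X = np.stack([series[i:i + look_back] for i in range(n)])
    y = np.stack([series[i + look_back:i + look_back + n_ahead] for i in range(n)])
    # Keras LSTMs expect inputs shaped (samples, timesteps, features).
    return X[..., np.newaxis], y

X, y = make_windows(np.arange(10), look_back=5, n_ahead=2)
```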
Q2: Your model is clearly underfitting the data, which is no surprise with a 10% train / 90% test split! You should use a more balanced ratio, as suggested in the previous answer.

LSTM history length vs prediction error

I use an LSTM to predict the next-step voltage value in a voltage time-series signal. I have a question:
Why does training the LSTM on longer sequences (5 or 10 steps back in time) not improve the prediction and reduce the prediction error? (It actually degrades it; see the figures, e.g. the results for sequence_length=5 are better than those for sequence_length=10.)
[Figure: epochs: 10, ratio: 1, sequence_length: 10, mean error: 0.00116802704509]
[Figure: epochs: 10, ratio: 1, sequence_length: 5, mean error: 0.000495359163296]
(predicted signal in green, real signal in red)
import os
import matplotlib.pyplot as plt
import numpy as np
import time
import csv
from keras.layers import Dense, Activation, Dropout, LSTM
from keras.models import Sequential
np.random.seed(1234)
def data_power_consumption(path_to_dataset,
                           sequence_length=50,
                           ratio=1.0):
    max_values = ratio * 2049280
    with open(path_to_dataset) as f:
        data = csv.reader(f, delimiter=",")
        power = []
        nb_of_values = 0
        for line in data:
            try:
                power.append(float(line[4]))
                nb_of_values += 1
            except ValueError:
                pass  # skip header or non-numeric rows
            # 2049280.0 is the total number of valid values, i.e. ratio = 1.0
            if nb_of_values >= max_values:
                print("max value", nb_of_values)
                break
    print("Data loaded from csv. Formatting...")
    # Sliding windows: each row holds sequence_length consecutive values.
    result = []
    for index in range(len(power) - sequence_length):
        result.append(power[index: index + sequence_length])
    result = np.array(result)  # shape (2049230, 50)
    result_mean = result.mean()
    result -= result_mean  # centre the data
    print("Shift : ", result_mean)
    print("Data : ", result.shape)
    # First 90% of the windows for training, the rest for testing.
    row = int(round(0.9 * result.shape[0]))  # slice indices must be ints
    train = result[:row, :]
    np.random.shuffle(train)
    # Inputs are the first sequence_length - 1 values, the target is the last one.
    X_train = train[:, :-1]
    y_train = train[:, -1]
    X_test = result[row:, :-1]
    y_test = result[row:, -1]
    # LSTM input shape: (samples, timesteps, features)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
    return [X_train, y_train, X_test, y_test]
def build_model():
    model = Sequential()
    layers = [1, 50, 100, 1]  # input features, LSTM-1 units, LSTM-2 units, outputs
    model.add(LSTM(
        layers[1],
        input_shape=(None, layers[0]),
        return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(
        layers[2],
        return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(layers[3]))
    model.add(Activation("linear"))
    start = time.time()
    model.compile(loss="mse", optimizer="adam")
    print("Compilation Time : ", time.time() - start)
    return model
def run_network(model=None, data=None):
    global_start_time = time.time()
    epochs = 10
    ratio = 1
    sequence_length = 3
    path_to_dataset = 'TIMBER_DATA_1.csv'
    if data is None:
        print('Loading data... ')
        X_train, y_train, X_test, y_test = data_power_consumption(
            path_to_dataset, sequence_length, ratio)
    else:
        X_train, y_train, X_test, y_test = data
    print('\nData Loaded. Compiling...\n')
    if model is None:
        model = build_model()
    try:
        model.fit(
            X_train, y_train,
            batch_size=512, epochs=epochs, validation_split=0.05)
        predicted = model.predict(X_test)
        predicted = np.reshape(predicted, (predicted.size,))
        print("done")
    except KeyboardInterrupt:
        print('Training duration (s) : ', time.time() - global_start_time)
        return model, y_test, 0
    try:
        fig, ax = plt.subplots()
        txt = "epochs: " + str(epochs), "ratio: " + str(ratio), "sequence_length: " + str(sequence_length)
        # calculate error (shift predicted by sequence_length - 1 and apply mean with abs)
        y_test_mean = y_test - np.mean(y_test)
        y_test_mean_shifted = y_test_mean[:-1*(sequence_length - 1)]
        predicted_mean = predicted - np.mean(predicted)
        predicted_mean_shifted = predicted_mean[(sequence_length - 1):]
        prediction_error = np.mean(abs(y_test_mean_shifted - predicted_mean_shifted))
        text_mean = "mean error: ", str(prediction_error)
        txt = txt + text_mean
        ax.plot(y_test_mean_shifted[900:1000], 'r--', label='Real data')
        ax.plot(predicted_mean_shifted[900:1000], 'g:', label='Predicted')
        # Add the legend after plotting so it picks up the line labels.
        ax.legend(loc='upper center', shadow=True)
        fig.text(0.4, 0.2, txt, horizontalalignment='center', verticalalignment='center', transform=ax.transAxes)
        plt.savefig(os.path.join('cern_figures', 'testplot' + str(txt) + '.png'))
        plt.show()
    except Exception as e:
        print(str(e))
    print('Training duration (s) : ', time.time() - global_start_time)
    return model, y_test, predicted
# main
if __name__ == "__main__":
    _, y_test_out, predicted_out = run_network()
    # y_test_out_mean = y_test_out - np.mean(y_test_out)
    # predicted_out_mean = predicted_out - np.mean(predicted_out)
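Before comparing history lengths, it may help to check how predictions and targets line up. In `data_power_consumption`, row i of `X_test` and entry i of `y_test` come from the same window. So `predicted[i]` is already the forecast for `y_test[i]`. The sketch below reproduces that windowing with illustrative names. It uses a hypothetical forecaster that is exactly right and ignores the mean-centering. Its error is zero without a shift but non-zero once predictions are shifted by `sequence_length - 1`, as `run_network` does. Since that shift grows with `sequence_length`, it is one factor worth ruling out.

```python
import numpy as np

def windows(values, sequence_length):
    """Same windowing as data_power_consumption: each row holds sequence_length
    consecutive values; the first sequence_length - 1 are inputs, the last is the target."""
    values = np.asarray(values, dtype=float)
    rows = np.array([values[i:i + sequence_length]
                     for i in range(len(values) - sequence_length)])
    return rows[:, :-1], rows[:, -1]

def mean_abs_error(y_true, y_pred, shift=0):
    """Mean absolute error after dropping `shift` trailing targets and
    `shift` leading predictions, mirroring the shift applied in run_network."""
    if shift:
        y_true, y_pred = y_true[:-shift], y_pred[shift:]
    return float(np.mean(np.abs(y_true - y_pred)))

signal = np.sin(np.linspace(0, 20, 200))
X, y = windows(signal, sequence_length=5)
# Hypothetical perfect forecaster: returns the true next value for every window.
perfect = y.copy()
```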
Maybe because your time series at time t does not depend on its value at time t-10. If you have a time series (x_1, ..., x_n) and there is no link between x_n and x_(n-p), there is no reason to use a step back of p.
For example, if you want to predict the weather one hour ahead, you will not use a step back of two weeks. Why? Because the weather two weeks ago has no influence on the weather right now. Instead, you will use the weather of the last hour (or the last day).
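One quick way to check whether x_n is linked to x_(n-p) before choosing a step back of p is to look at the sample autocorrelation at lag p. This diagnostic is swapped in here; the answer does not prescribe it. It only measures linear dependence, and an LSTM can in principle exploit non-linear structure that autocorrelation misses. The synthetic AR(1) series below is purely illustrative.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (lag 0 is always 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
noise = rng.normal(size=5000)
# AR(1) process: each value depends directly only on the previous one,
# so the autocorrelation decays roughly like 0.5 ** lag.
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.5 * ar[t - 1] + noise[t]
acf = autocorrelation(ar, max_lag=10)
```

Lags whose autocorrelation is indistinguishable from zero add inputs without adding information. Here that means any step back much beyond a few samples.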
P.S.: I use this weather-forecasting example because, to me, there is no link between the weather two weeks ago and now. But maybe an expert in weather forecasting would prove me wrong!
Cheers!
