How to extract the final model data-set from caret subsampling inside of resampling of a random forest model - r-caret

Following a subsampling inside of resampling procedure, as exemplified here https://topepo.github.io/caret/subsampling-for-class-imbalances.html#subsampling-during-resampling my question is simply how to extract the actual data-set resulting from this procedure when the caret method = “rf” and the sampling method is “smote”.
If, for example, method= glm is used then the data can be extracted with model$finalModel$data; if the method = “rpart” the data can be similarly extracted with model$finalModel$call$data.
Using subsampling inside of resampling and the method = rpart the smote data-set can be extrated as follows:
library(caret)
library(DMwR)
data("GermanCredit")
set.seed(122)
index1<-createDataPartition(GermanCredit$Class, p=.7, list = FALSE)
training<-GermanCredit[index1, ]
#testing<-GermanCredit[-index1,]
colnames(training)
metric <- "ROC"
ctrl1<- trainControl(
method = "repeatedcv",
number = 10,
repeats = 5,
search = "random",
classProbs = TRUE, # note class probabilities included
savePredictions = T, #"final"
returnResamp = "final",
allowParallel = TRUE,
summaryFunction = twoClassSummary,
sampling = "smote")
set.seed(1)
mod_fit<-train(Class ~ Age +
ForeignWorker +
Property.RealEstate +
Housing.Own +
CreditHistory.Critical, data=training, method="rpart",
metric = metric,
trControl= ctrl1)
mod_fit # ROC 0.5951215
dat_smote<- mod_fit$finalModel$call$data
table(dat_smote$.outcome)
# Bad Good
# 630 840
head(dat_smote)
# Age ForeignWorker Property.RealEstate Housing.Own CreditHistory.Critical .outcome
# 40 1 0 1 1 Good
# 29 1 0 0 0 Good
# 37 1 1 0 1 Good
# 47 1 0 0 0 Good
# 53 1 0 1 0 Good
# 29 1 0 1 0 Good
I simply would like to be able to perform the same data-set extraction when the method = "rf". The code might look like this
dat<- mod_fit$trainingData[mod_fit$trainingData == mod_fit$finalModel$x,]

I think that the only way to do it is to write a custom model that saves the data object in the fit module (that's pretty unsatisfying though).

Related

How can I improve this Reinforced Learning scenario in Stable Baselines3?

In this scenario, I present a box observation with numbers 0, 1 or 2 and shape (1, 10).
The odds for 0 and 2 are 2% each, and 96% for 1.
I want the model to learn to pick the index of any 2 that comes. If it doesn't have a 2, just choose 0.
Bellow is my code:
import numpy as np
import gym
from gym import spaces
from stable_baselines3 import PPO, DQN, A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack
action_length = 10
class TestBot(gym.Env):
def __init__(self):
super(TestBot, self).__init__()
self.total_rewards = 0
self.time = 0
self.action_space = spaces.Discrete(action_length)
self.observation_space = spaces.Box(low=0, high=2, shape=(1, action_length), dtype=np.float32)
def generate_next_obs(self):
p = [0.02, 0.02, 0.96]
a = [0, 2, 1]
self.observation = np.random.choice(a, size=(1, action_length), p=p)
if 2 in self.observation[0][1:]:
self.best_reward += 1
def reset(self):
if self.time != 0:
print('Total rewards: ', self.total_rewards, 'Best possible rewards: ', self.best_reward)
self.best_reward = 0
self.time = 0
self.generate_next_obs()
self.total_rewards = 0
self.last_observation = self.observation
return self.observation
def step(self, action):
reward = 0
if action != 0:
last_value = self.last_observation[0][action]
if last_value == 2:
reward = 1
else:
reward = -1
self.time += 1
self.generate_next_obs()
done = self.time == 4096
info = {}
self.last_observation = self.observation
self.total_rewards += reward
return self.observation, reward, done, info
For training, I used the following:
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
model = PPO('MlpPolicy', env, verbose=0)
iters = 0
while True:
iters += 1
model.learn(total_timesteps=4096, reset_num_timesteps=True)
PPO gave the best result, which wasn't so great. It learned to have positive rewards, but took a long time and got stuck in a point far from optimal.
How can I improve the learning of this scenario?
I managed to solve my problem by tunning the PPO parameters.
I had to change the following parameters:
gamma: from 0.99 to 0. It determines the importance of future rewards in the decision-making process. A value of 0 means that only imediate rewards should be considered.
gae_lambda: from 0.95 to 0.65. The gae_lambda parameter in Reinforcement Learning is used in the calculation of the Generalized Advantage Estimation (GAE). GAE is a method for estimating the advantage function in reinforcement learning, which is a measure of how much better a certain action is compared to the average action. A lower value means that PPO doesn't need to use the GAE too much.
clip_range: from 0.2 to function based. It determines the percentage of the decisions that will be done for exploration. At the end, exploration starts to be irrelevant. So, I made a function that uses a high exploration in the first few iteractions and goes to 0 at the end.
I also made a small modification in the environment in order to penalize more the loss of oportunity of picking a number 2 index, but that is done just to accelerate the training.
The following is my final code:
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
iters = 0
def clip_range_schedule():
def real_clip_range(progress):
global iters
cr = 0.2
if iters > 20:
cr = 0.0
elif iters > 12:
cr = 0.05
elif iters > 6:
cr = 0.1
return cr
return real_clip_range
model = PPO('MlpPolicy', env, verbose=0, gamma=0.0, gae_lambda=0.65, clip_range=clip_range_schedule())
while True:
iters += 1
model.learn(total_timesteps=4096, reset_num_timesteps=True)

XGBoost using Hyperopt. Facing issues while Hyper-Parameter Tuning

I am trying to Hyper-Parameter Tune XGBoostClassifier using Hyperopt. But I am facing a error. Please find below the code that I am using and the error as well:-
Step_1: Objective Function
import csv
from hyperopt import STATUS_OK
from timeit import default_timer as timer
MAX_EVALS = 200
N_FOLDS = 10
def objective(params, n_folds = N_FOLDS):
"""Objective function for XGBoost Hyperparameter Optimization"""
# Keep track of evals
global ITERATION
ITERATION += 1
# # Retrieve the subsample if present otherwise set to 1.0
# subsample = params['boosting_type'].get('subsample', 1.0)
# # Extract the boosting type
# params['boosting_type'] = params['boosting_type']['boosting_type']
# params['subsample'] = subsample
# Make sure parameters that need to be integers are integers
for parameter_name in ['max_depth', 'colsample_bytree',
'min_child_weight']:
params[parameter_name] = int(params[parameter_name])
start = timer()
# Perform n_folds cross validation
cv_results = xgb.cv(params, train_set, num_boost_round = 10000,
nfold = n_folds, early_stopping_rounds = 100,
metrics = 'auc', seed = 50)
run_time = timer() - start
# Extract the best score
best_score = np.max(cv_results['auc-mean'])
# Loss must be minimized
loss = 1 - best_score
# Boosting rounds that returned the highest cv score
n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
# Write to the csv file ('a' means append)
of_connection = open(out_file, 'a')
writer = csv.writer(of_connection)
writer.writerow([loss, params, ITERATION, n_estimators,
run_time])
# Dictionary with information for evaluation
return {'loss': loss, 'params': params, 'iteration': ITERATION,
'estimators': n_estimators, 'train_time': run_time,
'status': STATUS_OK}
I have defined the sample space and the optimization algorithm as well. While running Hyperopt, I am encountering this error below. The error is in the objective function.
Error:KeyError: 'auc-mean'
<ipython-input-62-8d4e97f16929> in objective(params, n_folds)
25 run_time = timer() - start
26 # Extract the best score
---> 27 best_score = np.max(cv_results['auc-mean'])
28 # Loss must be minimized
29 loss = 1 - best_score
First, print cv_results and see which key exists.
In the below example notebook the keys were : 'test-auc-mean' and 'train-auc-mean'
See cell 5 here:
https://www.kaggle.com/tilii7/bayesian-optimization-of-xgboost-parameters
#avvinci is correct. Let me explain it further.
cv_results = xgb.cv(params, train_set, num_boost_round = 10000,
nfold = n_folds, early_stopping_rounds = 100,
metrics = 'auc', seed = 50)
This is xgboost cross validation and it return the evaluation history. The history is essentially a pandas dataframe. The column names in the dataframe depends upon what is being passes as in, train, test and eval.
best_score = np.max(cv_results['auc-mean'])
Here you are looking for the best auc in the evaluation history which are called
'test-auc-mean' and 'train-auc-mean'
as #avvinci suggested. The column name 'auc-mean' does not exists so it throws KeyError. Either you call it train-auc-mean for best auc in training set or test-auc-mean for best auc in test set.
If you are in doubt, just run that cross validation outside and use head on the cv_results.

How to create combined ROC Curve for 2 classifiers and two different data set

I have a dataset of 1127 patients. My goal was to classify each patient to 0 or 1.
I have two different classifiers but with the same purpose - to classify the patient to 0 or 1.
I've run one classifier on 364 patients and the second classifier on the 763 patients.
for each classifier\group, I generated the ROC curve.
Now, I would like to combine the curves.
someone could guide me on how to do it?
I'm thinking of calculating the weighted FPR and TPR, but I'm not sure how to do it.
The number of FPR\TPR pairs is different between the curves (The first ROC curve based on 312 pairs and the second ROC curve based on 666 pairs).
Thanks!!!
Imports
import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
Data generation
# simulate first dataset with 364 obs
df1 = \
pd.DataFrame(i for i in range(364))
df1['predict_proba_1'] = np.random.normal(0,1,len(df1))
df1['epsilon'] = np.random.normal(0,1,len(df1))
df1['true'] = (0.7*df1['epsilon'] < df1['predict_proba_1']) * 1
df1 = df1.drop(columns=[0, 'epsilon'])
# simulate second dataset with 763 obs
df2 = \
pd.DataFrame(i for i in range(763))
df2['predict_proba_2'] = np.random.normal(0,1,len(df2))
df2['epsilon'] = np.random.normal(0,1,len(df2))
df2['true'] = (0.7*df2['epsilon'] < df2['predict_proba_2']) * 1
df2 = df2.drop(columns=[0, 'epsilon'])
Quick look at generated data
df1
predict_proba_1 true
0 1.234549 1
1 -0.586544 0
2 -0.229539 1
3 0.132185 1
4 -0.411284 0
.. ... ...
359 -0.218775 0
360 -0.985565 0
361 0.542790 1
362 -0.463667 0
363 1.119244 1
[364 rows x 2 columns]
df2
predict_proba_2 true
0 0.278755 1
1 0.653663 0
2 -0.304216 1
3 0.955658 1
4 -1.341669 0
.. ... ...
758 1.359606 1
759 -0.605894 0
760 0.379738 0
761 1.571615 1
762 -1.102565 0
[763 rows x 2 columns]
Necessary functions
def show_ROCs(scores_list: list, ys_list: list, labels_list:list = None):
"""
This function plots a couple of ROCs. Corresponding labels are optional.
Parameters
----------
scores_list : list of array-likes with scorings or predicted probabilities.
ys_list : list of array-likes with ground true labels.
labels_list : list of labels to be displayed in plotted graph.
Returns
----------
None
"""
if len(scores_list) != len(ys_list):
raise Exception('len(scores_list) != len(ys_list)')
fpr_dict = dict()
tpr_dict = dict()
for x in range(len(scores_list)):
fpr_dict[x], tpr_dict[x], _ = roc_curve(ys_list[x], scores_list[x])
for x in range(len(scores_list)):
try:
plot_ROC(fpr_dict[x], tpr_dict[x], str(labels_list[x]) + ' AUC:' + str(round(auc(fpr_dict[x], tpr_dict[x]),3)))
except:
plot_ROC(fpr_dict[x], tpr_dict[x], str(x) + ' ' + str(round(auc(fpr_dict[x], tpr_dict[x]),3)))
plt.show()
def plot_ROC(fpr, tpr, label):
"""
This function plots a single ROC. Corresponding label is optional.
Parameters
----------
fpr : array-likes with fpr.
tpr : array-likes with tpr.
label : label to be displayed in plotted graph.
Returns
----------
None
"""
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label=label)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
Plotting
show_ROCs(
[df1['predict_proba_1'], df2['predict_proba_2']],
[df1['true'], df2['true']],
['df1 with {} obs'.format(len(df1)), 'df2 with {} obs'.format(len(df2))]
)

Neural Network MNIST: Backpropagation is correct, but training/test accuracy very low

I am building a neural network to learn to recognize handwritten digits from MNIST. I have confirmed that backpropagation calculates the gradients perfectly (gradient checking gives error < 10 ^ -10).
It appears that no matter how I train the weights, the cost function always tends towards around 3.24-3.25 (never below that, just approaching from above) and the training/test set accuracy is very low (around 11% for the test set). It appears that the h values in the end are all very close to 0.1 and to each other.
I cannot find why my program cannot produce better results. I was wondering if anyone could maybe take a look at my code and please tell me any reasons for this occurring. Thank you so much for all your help, I really appreciate it!
Here is my Python code:
import numpy as np
import math
from tensorflow.examples.tutorials.mnist import input_data
# Neural network has four layers
# The input layer has 784 nodes
# The two hidden layers each have 5 nodes
# The output layer has 10 nodes
num_layer = 4
num_node = [784,5,5,10]
num_output_node = 10
# 30000 training sets are used
# 10000 test sets are used
# Can be adjusted
Ntrain = 30000
Ntest = 10000
# Sigmoid Function
def g(X):
return 1/(1 + np.exp(-X))
# Forwardpropagation
def h(W,X):
a = X
for l in range(num_layer - 1):
a = np.insert(a,0,1)
z = np.dot(a,W[l])
a = g(z)
return a
# Cost Function
def J(y, W, X, Lambda):
cost = 0
for i in range(Ntrain):
H = h(W,X[i])
for k in range(num_output_node):
cost = cost + y[i][k] * math.log(H[k]) + (1-y[i][k]) * math.log(1-H[k])
regularization = 0
for l in range(num_layer - 1):
for i in range(num_node[l]):
for j in range(num_node[l+1]):
regularization = regularization + W[l][i+1][j] ** 2
return (-1/Ntrain * cost + Lambda / (2*Ntrain) * regularization)
# Backpropagation - confirmed to be correct
# Algorithm based on https://www.coursera.org/learn/machine-learning/lecture/1z9WW/backpropagation-algorithm
# Returns D, the value of the gradient
def BackPropagation(y, W, X, Lambda):
delta = np.empty(num_layer-1, dtype = object)
for l in range(num_layer - 1):
delta[l] = np.zeros((num_node[l]+1,num_node[l+1]))
for i in range(Ntrain):
A = np.empty(num_layer-1, dtype = object)
a = X[i]
for l in range(num_layer - 1):
A[l] = a
a = np.insert(a,0,1)
z = np.dot(a,W[l])
a = g(z)
diff = a - y[i]
delta[num_layer-2] = delta[num_layer-2] + np.outer(np.insert(A[num_layer-2],0,1),diff)
for l in range(num_layer-2):
index = num_layer-2-l
diff = np.multiply(np.dot(np.array([W[index][k+1] for k in range(num_node[index])]), diff), np.multiply(A[index], 1-A[index]))
delta[index-1] = delta[index-1] + np.outer(np.insert(A[index-1],0,1),diff)
D = np.empty(num_layer-1, dtype = object)
for l in range(num_layer - 1):
D[l] = np.zeros((num_node[l]+1,num_node[l+1]))
for l in range(num_layer-1):
for i in range(num_node[l]+1):
if i == 0:
for j in range(num_node[l+1]):
D[l][i][j] = 1/Ntrain * delta[l][i][j]
else:
for j in range(num_node[l+1]):
D[l][i][j] = 1/Ntrain * (delta[l][i][j] + Lambda * W[l][i][j])
return D
# Neural network - this is where the learning/adjusting of weights occur
# W is the weights
# learn is the learning rate
# iterations is the number of iterations we pass over the training set
# Lambda is the regularization parameter
def NeuralNetwork(y, X, learn, iterations, Lambda):
W = np.empty(num_layer-1, dtype = object)
for l in range(num_layer - 1):
W[l] = np.random.rand(num_node[l]+1,num_node[l+1])/100
for k in range(iterations):
print(J(y, W, X, Lambda))
D = BackPropagation(y, W, X, Lambda)
for l in range(num_layer-1):
W[l] = W[l] - learn * D[l]
print(J(y, W, X, Lambda))
return W
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Training data, read from MNIST
inputpix = []
output = []
for i in range(Ntrain):
inputpix.append(2 * np.array(mnist.train.images[i]) - 1)
output.append(np.array(mnist.train.labels[i]))
np.savetxt('input.txt', inputpix, delimiter=' ')
np.savetxt('output.txt', output, delimiter=' ')
# Train the weights
finalweights = NeuralNetwork(output, inputpix, 2, 5, 1)
# Test data
inputtestpix = []
outputtest = []
for i in range(Ntest):
inputtestpix.append(2 * np.array(mnist.test.images[i]) - 1)
outputtest.append(np.array(mnist.test.labels[i]))
np.savetxt('inputtest.txt', inputtestpix, delimiter=' ')
np.savetxt('outputtest.txt', outputtest, delimiter=' ')
# Determine the accuracy of the training data
count = 0
for i in range(Ntrain):
H = h(finalweights,inputpix[i])
print(H)
for j in range(num_output_node):
if H[j] == np.amax(H) and output[i][j] == 1:
count = count + 1
print(count/Ntrain)
# Determine the accuracy of the test data
count = 0
for i in range(Ntest):
H = h(finalweights,inputtestpix[i])
print(H)
for j in range(num_output_node):
if H[j] == np.amax(H) and outputtest[i][j] == 1:
count = count + 1
print(count/Ntest)
Your network is tiny, 5 neurons make it basically a linear model. Increase it to 256 per layer.
Notice, that trivial linear model has 768 * 10 + 10 (biases) parameters, adding up to 7690 floats. Your neural network on the other hand has 768 * 5 + 5 + 5 * 5 + 5 + 5 * 10 + 10 = 3845 + 30 + 60 = 3935. In other words despite being nonlinear neural network, it is actualy a simpler model than a trivial logistic regression applied to this problem. And logistic regression obtains around 11% error on its own, thus you cannot really expect to beat it. Of course this is not a strict argument, but should give you some intuition for why it should not work.
Second issue is related to other hyperparameters, you seem to be using:
huge learning rate (is it 2?) it should be more of order 0.0001
very little training iterations (are you just executing 5 epochs?)
your regularization parameter is huge (it is set to 1), so your network is heavily penalised for learning anything, again - change it to something order of magnitude smaller
The NN architecture is most likely under-fitting. Maybe, the learning rate is high/low. Or there are most issues with the regularization parameter.

How to train a RNN with LSTM cells for time series prediction

I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values.
I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.
For example let's assume we want to learn the given sequence:
[1,2,3,4,5,6,7,8,9,10,11,...]
And we unroll the network for num_steps=4.
Option 1
input data label
1,2,3,4 2,3,4,5
5,6,7,8 6,7,8,9
9,10,11,12 10,11,12,13
...
Option 2
input data label
1,2,3,4 2,3,4,5
2,3,4,5 3,4,5,6
3,4,5,6 4,5,6,7
...
Option 3
input data label
1,2,3,4 5
2,3,4,5 6
3,4,5,6 7
...
Option 4
input data label
1,2,3,4 5
5,6,7,8 9
9,10,11,12 13
...
Any help would be appreciated.
I'm just about to learn LSTMs in TensorFlow and try to implement an example which (luckily) tries to predict some time-series / number-series genereated by a simple math-fuction.
But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:
LSTM Future Predictor Model
Option 5:
input data label
1,2,3,4 5,6,7,8
2,3,4,5 6,7,8,9
3,4,5,6 7,8,9,10
...
Beside this paper, I (tried) to take inspiration by the given TensorFlow RNN examples. My current complete solution looks like this:
import math
import random
import numpy as np
import tensorflow as tf
LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4
def ground_truth_func(i, j, t):
return i * math.pow(t, 2) + j
def get_batch(batch_size):
seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
for b in xrange(batch_size):
i = float(random.randint(-25, 25))
j = float(random.randint(-100, 100))
for t in xrange(NUM_T_STEPS):
value = ground_truth_func(i, j, t)
seq[b, t, 0] = value
for t in xrange(NUM_T_STEPS):
tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
return seq, tgt
# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
# ENCODER
with tf.variable_scope('ENC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
state = initial_state
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
learned_representation = state
# DECODER
with tf.variable_scope('DEC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
state = learned_representation
logits_stacked = None
loss = 0.0
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
# output can be used to make next number prediction
logits = tf.matmul(output, fc1_weight) + fc1_bias
if logits_stacked is None:
logits_stacked = logits
else:
logits_stacked = tf.concat(1, [logits_stacked, logits])
loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE
reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
total_loss = 0.0
for step in xrange(MAX_STEPS):
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
_, current_loss = sess.run([train, reg_loss], feed)
if step % 10 == 0:
print("#{}: {}".format(step, current_loss))
total_loss += current_loss
print('Total loss:', total_loss)
print('### SIMPLE EVAL: ###')
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
prediction = sess.run([logits_stacked], feed)
for b in xrange(BATCH_SIZE):
print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
print(" `-> Prediction: {}".format(prediction[0][b]))
Sample output of this looks like this:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
The model is a LSTM-autoencoder having 2 layers each.
Unfortunately, as you can see in the results, this model does not learn the sequence properly. I might be the case that I'm just doing a bad mistake somewhere, or that 1000-10000 training steps is just way to few for a LSTM. As I said, I'm also just starting to understand/use LSTMs properly.
But hopefully this can give you some inspiration regarding the implementation.
After reading several LSTM introduction blogs e.g. Jakob Aungiers', option 3 seems to be the right one for stateless LSTM.
If your LSTMs need to remember data longer ago than your num_steps, your can train in a stateful way - for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch size greater than one, however. I guess that in your case a batch size of four with stateful LSTM could be used with the following data (written as input -> label):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
By this, the state of e.g. the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.
This is somehow similar to your option 4, however you are not using all available labels there.
Update:
In extension to my suggestion where batch_size equals the num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.
I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py
def ptb_iterator(raw_data, batch_size, num_steps):
"""Iterate on the raw PTB data.
This generates batch_size pointers into the raw PTB data, and allows
minibatch iteration along these pointers.
Args:
raw_data: one of the raw data outputs from ptb_raw_data.
batch_size: int, the batch size.
num_steps: int, the number of unrolls.
Yields:
Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
The second element of the tuple is the same data time-shifted to the
right by one.
Raises:
ValueError: if batch_size or num_steps are too high.
"""
raw_data = np.array(raw_data, dtype=np.int32)
data_len = len(raw_data)
batch_len = data_len // batch_size
data = np.zeros([batch_size, batch_len], dtype=np.int32)
for i in range(batch_size):
data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
epoch_size = (batch_len - 1) // num_steps
if epoch_size == 0:
raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
for i in range(epoch_size):
x = data[:, i*num_steps:(i+1)*num_steps]
y = data[:, i*num_steps+1:(i+1)*num_steps+1]
yield (x, y)
However, another Option is to select a pointer into your data array randomly for each training sequence.

Resources