Questions regarding custom multiclass metrics (Keras) - machine-learning

Could anyone explain how to write a custom multiclass metric for Keras? I tried to write one but ran into some issues. The main problem is that I am not familiar with how tensors work during training (I think this is called graph mode?). I am able to build a confusion matrix and derive the F1 score using NumPy or Python lists.
I printed out y_true and y_pred and tried to understand them, but the output was not what I expected.
Below is the function I used:
def f1_scores(y_true, y_pred):
    y_true = K.print_tensor(y_true, message='y_true = ')
    y_pred = K.print_tensor(y_pred, message='y_pred = ')
    print(f"y_true_shape:{K.int_shape(y_true)}")
    print(f"y_pred_shape:{K.int_shape(y_pred)}")
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    gt = K.argmax(y_true_f)
    pred = K.argmax(y_pred_f)
    print(f"pred_print:{pred}")
    print(f"gt_print:{gt}")
    pred = K.print_tensor(pred, message='pred= ')
    gt = K.print_tensor(gt, message='gt =')
    print(f"pred_shape:{K.int_shape(pred)}")
    print(f"gt_shape:{K.int_shape(gt)}")
    pred_f = K.flatten(pred)
    gt_f = K.flatten(gt)
    pred_f = K.print_tensor(pred_f, message='pred_f= ')
    gt_f = K.print_tensor(gt_f, message='gt_f =')
    print(f"pred_f_shape:{K.int_shape(pred_f)}")
    print(f"gt_f_shape:{K.int_shape(gt_f)}")
    conf_mat = tf.math.confusion_matrix(y_true_f, y_pred_f, num_classes=14)
    """
    add codes to find F1 score for each class
    """
    # return an arbitrary number, as F1 scores not found yet.
    return 1
The output when epoch 1 had just started:
y_true_shape:(None, 256, 256, 14)
y_pred_shape:(None, 256, 256, 14)
pred_print:Tensor("ArgMax_1:0", shape=(), dtype=int64)
gt_print:Tensor("ArgMax:0", shape=(), dtype=int64)
pred_shape:()
gt_shape:()
pred_f_shape:(1,)
gt_f_shape:(1,)
The output for the rest of the steps and epochs was similar to the following:
y_true = [[[[1 0 0 ... 0 0 0]
[1 0 0 ... 0 0 0]
[1 0 0 ... 0 0 0]
...
y_pred = [[[[0.0889623 0.0624801107 0.0729747042 ... 0.0816219151 0.0735477135 0.0698677748]
[0.0857798532 0.0721047595 0.0754121244 ... 0.0723947287 0.0728530064 0.0676521733]
[0.0825942457 0.0670698211 0.0879610255 ... 0.0721599609 0.0845924541 0.0638583601]
...
pred= 1283828
gt = 0
pred_f= [1283828]
gt_f = [0]
Why is pred a single number instead of a list of numbers, each representing a class index? Similarly, why is pred_f a list with only one number instead of a list of indices?
And for gt (and gt_f), why is the value 0? I expected them to be lists of indices.

It looks like argmax() is simply being applied to the flattened y, so it returns a single index into the whole flattened tensor.
You need to apply argmax() to the unflattened tensor and specify which axis it should reduce. That is probably the last one, in your case axis 3. Then you'll get pred with shape (None, 256, 256), containing integers between 0 and 13.
Try something like this: pred = K.argmax(y_pred, axis=3)
This is the documentation for tensorflow argmax. (But I'm not sure you're using exactly that, since I cannot see what K is imported as.)
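The difference between the two calls can be seen on a tiny array. The sketch below uses NumPy's argmax as a stand-in for K.argmax (an assumption made so it runs without Keras), and the 2x2 "image" with 3 classes is made up for illustration:

```python
import numpy as np

# Hypothetical one-hot labels: batch of 1, 2x2 pixels, 3 classes.
y = np.array([[[[1, 0, 0], [0, 1, 0]],
               [[0, 0, 1], [0, 1, 0]]]])

flat_index = np.argmax(y.flatten())   # a single index into the flattened array
class_map = np.argmax(y, axis=-1)     # one class index per pixel

print(flat_index)         # 0
print(class_map.shape)    # (1, 2, 2)
print(class_map)
```

The flattened call returns 0 here for the same reason gt was 0 in the question: the first pixel is one-hot encoded with its 1 in the very first position.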

Related

Pytorch Multiclass Logistic Regression Type Errors

I'm new to ML and even newer to PyTorch. Here's the problem. (I've skipped certain parts, like the random_split(), which seem to work just fine.)
I have to predict wine quality (red), which is the last column of the dataset and has 6 classes.
That's what my dataset looks like
The link to the dataset (winequality-red.csv)
features = df.drop(['quality'], axis = 1)
targets = df.iloc[:, -1] # theres 6 classes
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).float())
# here's where I think the error might be, but I might be wrong
batch_size = 8
# Dataloader
train_loader = DataLoader(train_ds, batch_size, shuffle = True)
val_loader = DataLoader(val_ds, batch_size)
test_ds = DataLoader(test_ds, batch_size)
input_size = len(df.columns) - 1
output_size = 6
threshold = .5
class WineModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
    def forward(self, xb):
        out = self.linear(xb)
        return out
model = WineModel()
n_iters = 2000
num_epochs = n_iters / (len(train_ds) / batch_size)
num_epochs = int(num_epochs)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# the part below returns the error on running
iter = 0
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(x)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
RuntimeError: expected scalar type Long but found Float
Hopefully that is sufficient info
The targets for nn.CrossEntropyLoss are given as class indices, which are required to be integers. To be precise, they need to be of type torch.long, which is equivalent to torch.int64.
You converted the targets to floats, but you should convert them to longs:
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).long())
Since the targets are the indices of the classes, they must be in range [0, num_classes - 1]. As you have 6 classes that would be in range [0, 5]. Having a quick look at your data, the quality uses values in range [3, 8]. Even though you have 6 classes, the values cannot be used directly as the classes. If you list the classes as classes = [3, 4, 5, 6, 7, 8], you can see that the first class is 3, classes[0] == 3, up to the last class being classes[5] == 8.
You need to replace the class values with the indices, just like you would for named classes (e.g. if you had the classes dog and cat, dog would be 0 and cat would be 1), but you can avoid having to look them up, since the values are simply shifted by 3, i.e. index = classes[index] - 3. Therefore you can subtract 3 from the entire target tensor:
torch.Tensor(targets).long() - 3
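When the class values are not a simple shift, the lookup can be built instead of hard-coded. This is a minimal sketch using NumPy rather than PyTorch so that it runs without the dataset; the `quality` values are a made-up sample, not the real column:

```python
import numpy as np

quality = np.array([5, 3, 8, 6, 5, 7, 4])   # made-up sample of raw labels

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the position of each original value within that sorted list.
classes, indices = np.unique(quality, return_inverse=True)

print(classes)   # [3 4 5 6 7 8]
print(indices)   # [2 0 5 3 2 4 1]
```

For this data the result is the same as `quality - 3`, and the resulting indices lie in [0, 5] as the loss requires.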

Implementing a Generative RNN with continuous input and discrete output

I am currently using a generative RNN to classify indices in a sequence (sort of saying whether something is noise or not noise).
My input is continuous (i.e. a real value between 0 and 1) and my output is either 0 or 1.
For example, if the model marks a 1 for numbers greater than 0.5 and 0 otherwise,
[.21, .35, .78, .56, ..., .21] => [0, 0, 1, 1, ..., 0]:
     0     0     1     1            0
     ^     ^     ^     ^            ^
     |     |     |     |            |
o -> L1 -> L2 -> L3 -> L4 -> ... -> L10
     ^     ^     ^     ^            ^
     |     |     |     |            |
    .21   .35   .78   .56    ...   .21
Using
n_steps = 10
n_inputs = 1
n_neurons = 7
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
rnn_outputs becomes a tensor of shape (?, 10, 7), presumably 7 outputs for each of the 10 time steps.
Previously, I ran the following snippet on rnn_outputs wrapped in an output projection to get a classification label per sequence.
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
loss = tf.reduce_mean(xentropy)
How would I run something similar on rnn_outputs to get a sequence?
Specifically,
1. Can I get the rnn_output from each step and feed it into a softmax?
curr_state = rnn_outputs[:,i,:]
logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
2. What loss function should I use and should it be applied across every value of every sequence? (for sequence i and step j, loss = y_{ij} (true) - y_{ij}(predicted) )?
Should my loss be loss = tf.reduce_mean(np.sum(xentropy))?
EDIT
It seems I am trying to implement in TensorFlow something similar to what is described in https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/.
In Keras, there's a TimeDistributed wrapper:
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently
How would I go about implementing something similar in TensorFlow?
First up, it looks like you're doing seq-to-seq modelling. For this kind of problem it's usually a good idea to go with an encoder-decoder architecture rather than predicting the sequence from the same RNN. TensorFlow has a big tutorial about it under the name "Neural Machine Translation (seq2seq) Tutorial", which I'd recommend you check out.
However, the architecture that you're asking about is also possible, provided that n_steps is known statically (despite using dynamic_rnn). In this case, it's possible to compute the cross-entropy of each cell's output and then sum up all the losses. It's possible if the RNN length is dynamic as well, but that would be hairier. Here's the code:
n_steps = 2
n_inputs = 3
n_neurons = 5
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs], name='x')
y = tf.placeholder(dtype=tf.int32, shape=[None, n_steps], name='y')
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
# Transpose to make `time` the leading (0) axis
time_based_outputs = tf.transpose(outputs, [1, 0, 2])
time_based_labels = tf.transpose(y, [1, 0])
losses = []
for i in range(n_steps):
    cell_output = time_based_outputs[i]  # get the output; further dense layers can be applied if needed
    labels = time_based_labels[i]  # get the label (sparse)
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=cell_output)
    losses.append(loss)  # collect all losses
total_loss = tf.reduce_sum(losses)  # compute the total loss
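The per-step loop above can also be written as one batched operation, which is roughly what applying a dense layer to every time step amounts to. The sketch below is in NumPy rather than TensorFlow so that it runs on its own; the random arrays stand in for the RNN outputs and labels, and `sparse_softmax_xent`, `W` and `b` are hypothetical names introduced for illustration:

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Cross-entropy per position; logits (..., n_classes), integer labels (...)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]

rng = np.random.default_rng(0)
batch, n_steps, n_neurons, n_classes = 4, 10, 7, 2

rnn_outputs = rng.normal(size=(batch, n_steps, n_neurons))  # stand-in for the RNN output
labels = rng.integers(0, n_classes, size=(batch, n_steps))
W = rng.normal(size=(n_neurons, n_classes))                 # one dense layer shared by all steps
b = np.zeros(n_classes)

# A matmul on the last axis applies the same dense layer to every time step at once.
logits = rnn_outputs @ W + b                                # (batch, n_steps, n_classes)
loss_all = sparse_softmax_xent(logits, labels).sum()

# Same total, computed one time step at a time as in the loop above.
loss_loop = sum(sparse_softmax_xent(rnn_outputs[:, t] @ W + b, labels[:, t]).sum()
                for t in range(n_steps))
```

Both totals agree, so the choice between the loop and the batched form is mainly one of convenience.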

Why does Keras not generalize my data?

I've been trying to implement a basic multilayered LSTM regression network to find correlations between cryptocurrency prices.
After running into unusable training results, I decided to play around with some sandbox code to make sure I've got the idea right before trying again on my full dataset.
The problem is I can't get Keras to generalize from my data.
ts = 3
in_dim = 1
data = [i*100 for i in range(10)]
# tried this, didn't accomplish anything
# data = [(d - np.mean(data))/np.std(data) for d in data]
x = data[:len(data) - 4]
y = data[3:len(data) - 1]
assert(len(x) == len(y))
x = [[_x] for _x in x]
y = [[_y] for _y in y]
x = [x[idx:idx + ts] for idx in range(0, len(x), ts)]
y = [y[idx:idx + ts] for idx in range(0, len(y), ts)]
x = np.asarray(x)
y = np.asarray(y)
x looks like this:
[[[ 0]
[100]
[200]]
[[300]
[400]
[500]]]
and y:
[[[300]
[400]
[500]]
[[600]
[700]
[800]]]
This works well when I predict on a very similar dataset, but it doesn't generalize when I try a similar sequence with scaled values:
model = Sequential()
model.add(BatchNormalization(
    axis = 1,
    input_shape = (ts, in_dim)))
model.add(LSTM(
    100,
    input_shape = (ts, in_dim),
    return_sequences = True))
model.add(TimeDistributed(Dense(in_dim)))
model.add(Activation('linear'))
model.compile(loss = 'mse', optimizer = 'rmsprop')
model.fit(x, y, epochs = 2000, verbose = 0)
p = np.asarray([[[10],[20],[30]]])
prediction = model.predict(p)
print(prediction)
prints
[[[ 165.78544617]
[ 209.34489441]
[ 216.02174377]]]
I want
[[[ 40.0000]
[ 50.0000]
[ 60.0000]]]
How can I format this so that when I plug in a sequence with values of a completely different scale, the network will still output the values I expect? I've tried normalizing my training data, but the results are still entirely unusable.
What have I done wrong here?
How about transforming your input data before sending it into your LSTM, using something like sklearn.preprocessing.StandardScaler? After prediction you can call scaler.inverse_transform(prediction).
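To make the suggestion concrete, here is a hand-rolled NumPy sketch of the kind of scaling StandardScaler performs (subtract the mean, divide by the standard deviation) and of its inverse. The helper functions are hypothetical stand-ins named after the scaler's methods; they are not the scikit-learn API itself:

```python
import numpy as np

def fit(values):
    """Return the mean and standard deviation used for scaling."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std()

def transform(values, mean, std):
    """Scale values to zero mean and unit variance."""
    return (np.asarray(values, dtype=float) - mean) / std

def inverse_transform(scaled, mean, std):
    """Map scaled values (e.g. predictions) back to the original units."""
    return np.asarray(scaled, dtype=float) * std + mean

train = [i * 100 for i in range(10)]     # the training values from the question
mean, std = fit(train)

scaled = transform(train, mean, std)     # what the network would be trained on
restored = inverse_transform(scaled, mean, std)
```

The network would then be trained and queried on scaled values only, with inverse_transform applied to its predictions.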

Use neural network to learn a square wave function

Out of curiosity, I am trying to build a simple fully connected NN using tensorflow to learn a square wave function such as the following one:
Therefore the input is a 1D array of x values (the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. There are 3 hidden layers (100*100*100), a single input node and a single output node. The input data are generated to match the above wave shape, so the data size is not a problem.
However, the trained model seems to fail completely, always predicting the negative class.
So I am trying to figure out why this happened: whether the NN configuration is suboptimal, or whether it is due to some mathematical flaw in NNs beneath the surface (though I think an NN should be able to imitate any function).
Thanks.
As per the suggestions in the comment section, here is the full code. One thing I stated incorrectly earlier: there are actually 2 output nodes (because there are 2 output classes):
"""
See if neural net can find piecewise linear correlation in the data
"""
import time
import os
import tensorflow as tf
import numpy as np
def generate_placeholder(batch_size):
    x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
    return x_placeholder, y_placeholder
def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
    x_selected = [[None]] * batch_size
    y_selected = [None] * batch_size
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
    feed_dict = {x_placeholder: x_selected,
                 y_placeholder: y_selected}
    return feed_dict
def inference(input_x, H1_units, H2_units, H3_units):
    with tf.name_scope('H1'):
        weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights')
        biases = tf.Variable(tf.zeros([H1_units]), name='biases')
        a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)
    with tf.name_scope('H2'):
        weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights')
        biases = tf.Variable(tf.zeros([H2_units]), name='biases')
        a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)
    with tf.name_scope('H3'):
        weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights')
        biases = tf.Variable(tf.zeros([H3_units]), name='biases')
        a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights')
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a3, weights) + biases
    return logits
def loss(logits, labels):
    labels = tf.to_int32(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')
def inspect_y(labels):
    return tf.reduce_sum(tf.cast(labels, tf.int32))
def training(loss, learning_rate):
    tf.summary.scalar('lost', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op
def evaluation(logits, labels):
    labels = tf.to_int32(labels)
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))
def run_training(x, y, batch_size):
    with tf.Graph().as_default():
        x_placeholder, y_placeholder = generate_placeholder(batch_size)
        logits = inference(x_placeholder, 100, 100, 100)
        Loss = loss(logits, y_placeholder)
        y_sum = inspect_y(y_placeholder)
        train_op = training(Loss, 0.01)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        max_steps = 10000
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
            _, loss_val = sess.run([train_op, Loss], feed_dict = feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
        x_test = np.array(range(1000)) * 0.001
        x_test = np.reshape(x_test, (1000, 1))
        _ = sess.run(logits, feed_dict={x_placeholder: x_test})
        print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))
        print(_)
if __name__ == '__main__':
    population = 10000
    input_x = np.random.rand(population)
    input_y = np.copy(input_x)
    for bin in range(10):
        print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
        input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin
    batch_size = 1000
    input_x = np.reshape(input_x, (population, 1))
    run_training(input_x, input_y, batch_size)
Sample output shows that the model always prefers the first class over the second, as shown by min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit for the first class is higher than the maximum logit for the second class over the whole test sample.
My mistake. The problem was in these lines:
for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
Because x_selected was built with [[None]] * batch_size, every row refers to the same inner list, so each assignment overwrites the value in all rows. This code issue is now resolved. The fix is:
x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
After this fix, the model shows more variation. It currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5, but this is still far from ideal.
After changing the network configuration to 4 layers of 100 nodes and running 1 million training steps (batch size = 100, sample size = 10 million), the model performs very well, showing errors only at the edges where y flips.
Therefore this question is closed.
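The list problem behind the original bug can be reproduced without TensorFlow. This small pure-Python sketch, using a made-up batch size of 3, contrasts the multiplied list with a list comprehension:

```python
batch_size = 3

aliased = [[None]] * batch_size          # three references to ONE inner list
for i in range(batch_size):
    aliased[i][0] = i                    # every write lands in the same list

independent = [[None] for _ in range(batch_size)]   # three separate inner lists
for i in range(batch_size):
    independent[i][0] = i

print(aliased)       # [[2], [2], [2]]
print(independent)   # [[0], [1], [2]]
```

With the multiplied list, every sample in the batch ends up holding the last value written, which would explain why the network saw almost no variation in x.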
You are essentially trying to learn a periodic function that is highly non-linear and non-smooth, so it is NOT as simple as it looks. In short, a better representation of the input feature helps.
Suppose you have a period T = 2, so f(x) = f(x+2).
For a reduced problem where the inputs and outputs are integers, your function is then f(x) = 1 if x is odd else -1. In this case, your problem reduces to this discussion, in which we train a neural network to distinguish between odd and even numbers.
I guess the second bullet in that post should help (even for the general case where the inputs are floating-point numbers).
Try representing the numbers in binary using a fixed length precision.
In our reduced problem above, it's easy to see that the output is determined iff the least-significant bit is known.
decimal   binary    -> output
1:        0 0 1     -> 1
2:        0 1 0     -> -1
3:        0 1 1     -> 1
...
I created the model and the structure for the problem of recognizing odd/even numbers in here.
If you note that:
decimal   binary    -> output
1:        0 0 1     -> 1
2:        0 1 0     -> -1
3:        0 1 1     -> 1
is almost equivalent to:
decimal   binary    -> output
1:        0 0 1     -> 1
2:        0 1 0     -> 0
3:        0 1 1     -> 1
you may update the code to fit your needs.
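A minimal sketch of the fixed-length binary representation suggested above, for the integer version of the problem. The helper `to_bits` and the width of 3 bits are assumptions chosen to match the small table; they are not taken from the linked code:

```python
import numpy as np

def to_bits(n, width):
    """Fixed-width binary representation of a non-negative integer, MSB first."""
    return [(n >> shift) & 1 for shift in range(width - 1, -1, -1)]

numbers = [1, 2, 3]
features = np.array([to_bits(n, 3) for n in numbers])

# The least-significant bit (last column) alone determines the label.
labels = np.where(features[:, -1] == 1, 1, -1)

print(features)   # rows: 0 0 1 / 0 1 0 / 0 1 1
print(labels)     # [ 1 -1  1]
```

With this encoding a network only has to learn to read one input column, instead of approximating a rapidly alternating function of a single scalar.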

How to train a RNN with LSTM cells for time series prediction

I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values.
I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.
For example let's assume we want to learn the given sequence:
[1,2,3,4,5,6,7,8,9,10,11,...]
And we unroll the network for num_steps=4.
Option 1
input data      label
1,2,3,4         2,3,4,5
5,6,7,8         6,7,8,9
9,10,11,12      10,11,12,13
...
Option 2
input data      label
1,2,3,4         2,3,4,5
2,3,4,5         3,4,5,6
3,4,5,6         4,5,6,7
...
Option 3
input data      label
1,2,3,4         5
2,3,4,5         6
3,4,5,6         7
...
Option 4
input data      label
1,2,3,4         5
5,6,7,8         9
9,10,11,12      13
...
Any help would be appreciated.
I'm just about to learn LSTMs in TensorFlow and am trying to implement an example which (luckily) tries to predict some time series / number series generated by a simple math function.
But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:
LSTM Future Predictor Model
Option 5:
input data      label
1,2,3,4         5,6,7,8
2,3,4,5         6,7,8,9
3,4,5,6         7,8,9,10
...
Besides this paper, I tried to take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:
import math
import random
import numpy as np
import tensorflow as tf
LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4
def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j
def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value
        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt
# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()
        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
learned_representation = state
# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()
        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias
        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])
        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE
reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)
        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("#{}: {}".format(step, current_loss))
        total_loss += current_loss
    print('Total loss:', total_loss)
    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))
Sample output of this looks like this:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
The model is an LSTM autoencoder with 2 layers each for the encoder and the decoder.
Unfortunately, as you can see in the results, this model does not learn the sequence properly. It might be the case that I'm just making a bad mistake somewhere, or that 1000-10000 training steps is just way too few for an LSTM. As I said, I'm also just starting to understand and use LSTMs properly.
But hopefully this can give you some inspiration regarding the implementation.
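One thing that may be worth checking in the listing above is the shape of the loss term; this is an observation about shapes, not a confirmed diagnosis. The logits there have shape (BATCH_SIZE, 1) while target[:, t_step] has shape (BATCH_SIZE,), and under NumPy-style broadcasting, which TensorFlow also follows, subtracting the two gives a (BATCH_SIZE, BATCH_SIZE) matrix instead of one difference per sample. The NumPy sketch below only reproduces the shapes:

```python
import numpy as np

BATCH_SIZE = 16
logits = np.zeros((BATCH_SIZE, 1))    # same shape as matmul(output, fc1_weight) + fc1_bias
target_step = np.zeros(BATCH_SIZE)    # same shape as target[:, t_step]

diff_broadcast = logits - target_step        # broadcasts to (16, 16)
diff_aligned = logits[:, 0] - target_step    # stays (16,)

print(diff_broadcast.shape, diff_aligned.shape)
```

If that applies, squeezing the logits to shape (BATCH_SIZE,) before subtracting would compare each prediction with its own target only.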
After reading several LSTM introduction blogs, e.g. Jakob Aungiers', option 3 seems to be the right one for a stateless LSTM.
If your LSTMs need to remember data from longer ago than your num_steps, you can train in a stateful way; for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for a batch size greater than one, however. I guess that in your case a batch size of four with a stateful LSTM could be used with the following data (written as input -> label):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
This way, the state of e.g. the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.
This is somewhat similar to your option 4; however, you are not using all available labels there.
Update:
As an extension of my suggestion, where batch_size equals num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.
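The batch layout suggested above (batch size equal to num_steps) can be generated with a few lines of plain Python. This is a sketch under that assumption; the function names are hypothetical and it does not cover the divisor case from the update:

```python
def sliding_windows(seq, num_steps):
    """Option 3: every window of num_steps values, labelled with the next value."""
    return [(seq[i:i + num_steps], seq[i + num_steps])
            for i in range(len(seq) - num_steps)]

def stateful_batches(seq, num_steps):
    """Group the windows into batches of size num_steps, so that sample k of one
    batch starts where sample k of the previous batch ended."""
    windows = sliding_windows(seq, num_steps)
    full = len(windows) - len(windows) % num_steps   # drop an incomplete last batch
    return [windows[i:i + num_steps] for i in range(0, full, num_steps)]

seq = list(range(1, 14))                  # 1..13
batches = stateful_batches(seq, num_steps=4)

for number, batch in enumerate(batches):
    print("batch #{}:".format(number))
    for inputs, label in batch:
        print(inputs, "->", label)
```

For the sequence 1..13 this yields the two complete batches listed above; the leftover window 9,10,11,12 -> 13 is dropped because it does not fill a batch.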
I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py
def ptb_iterator(raw_data, batch_size, num_steps):
    """Iterate on the raw PTB data.
    This generates batch_size pointers into the raw PTB data, and allows
    minibatch iteration along these pointers.
    Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.
    Yields:
        Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
        The second element of the tuple is the same data time-shifted to the
        right by one.
    Raises:
        ValueError: if batch_size or num_steps are too high.
    """
    raw_data = np.array(raw_data, dtype=np.int32)
    data_len = len(raw_data)
    batch_len = data_len // batch_size
    data = np.zeros([batch_size, batch_len], dtype=np.int32)
    for i in range(batch_size):
        data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)
However, another option is to select a pointer into your data array at random for each training sequence.
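A minimal sketch of that random-pointer variant, assuming the labels are the inputs shifted by one step as in ptb_iterator. The function name `random_windows` and the seeded generator are illustrative choices, not part of the reference implementation:

```python
import numpy as np

def random_windows(data, batch_size, num_steps, rng):
    """Draw batch_size random start positions and cut one window per position."""
    data = np.asarray(data)
    # The highest valid start leaves room for num_steps inputs plus one shifted label.
    max_start = len(data) - num_steps - 1
    starts = rng.integers(0, max_start + 1, size=batch_size)
    x = np.stack([data[s:s + num_steps] for s in starts])
    y = np.stack([data[s + 1:s + num_steps + 1] for s in starts])
    return x, y

rng = np.random.default_rng(0)
x, y = random_windows(np.arange(1, 21), batch_size=3, num_steps=4, rng=rng)
print(x)
print(y)
```

Because consecutive batches are unrelated here, this variant fits stateless training, where the LSTM state is reset for every batch.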
