Torch: How to shuffle a tensor by its rows? - lua

I am currently working in torch to implement a random shuffle (on the rows, the first dimension in this case) on some input data. I am new to torch, so I have some troubles figuring out how permutation works..
The following is supposed to shuffle the data:
if argshuffle then
local perm = torch.randperm(sids:size(1)):long()
print("\n\n\nSize of X and y before")
print(X:view(-1, 1000, 128):size())
print(y:size())
print(sids:size())
print("\nPerm size is: ")
print(perm:size())
X = X:view(-1, 1000, 128)[{{perm},{},{}}]
y = y[{{perm},{}}]
print(sids[{{1}, {}}])
sids = sids[{{perm},{}}]
print(sids[{{1}, {}}])
print(X:size())
print(y:size())
print(sids:size())
os.exit(69)
end
This prints out
Size of X and y before
99
1000
128
[torch.LongStorage of size 3]
99
1
[torch.LongStorage of size 2]
99
1
[torch.LongStorage of size 2]
Perm size is:
99
[torch.LongStorage of size 1]
5
[torch.LongStorage of size 1x1]
5
[torch.LongStorage of size 1x1]
99
1000
128
[torch.LongStorage of size 3]
99
1
[torch.LongStorage of size 2]
99
1
[torch.LongStorage of size 2]
Out of the value, I can imply that the function did not shuffle the data. How can I make it shuffle correctly, and what is the common solution in lua/torch?

I also faced a similar issue. In the documentation, there is no shuffle function for tensors (there are for dataset loaders). I found a workaround to the problem using torch.randperm.
>>> a=torch.rand(3,5)
>>> print(a)
tensor([[0.4896, 0.3708, 0.2183, 0.8157, 0.7861],
[0.0845, 0.7596, 0.5231, 0.4861, 0.9237],
[0.4496, 0.5980, 0.7473, 0.2005, 0.8990]])
>>> # Row shuffling
...
>>> a=a[torch.randperm(a.size()[0])]
>>> print(a)
tensor([[0.4496, 0.5980, 0.7473, 0.2005, 0.8990],
[0.0845, 0.7596, 0.5231, 0.4861, 0.9237],
[0.4896, 0.3708, 0.2183, 0.8157, 0.7861]])
>>> # column shuffling
...
>>> a=a[:,torch.randperm(a.size()[1])]
>>> print(a)
tensor([[0.2005, 0.7473, 0.5980, 0.8990, 0.4496],
[0.4861, 0.5231, 0.7596, 0.9237, 0.0845],
[0.8157, 0.2183, 0.3708, 0.7861, 0.4896]])
I hope it answers the question!

dim = 0
idx = torch.randperm(t.shape[dim])
t_shuffled = t[idx]
If your tensor is e.g. of shape CxNxF (channels by rows by features), then you can shuffle along the second dimension like so:
dim=1
idx = torch.randperm(t.shape[dim])
t_shuffled = t[:,idx]

A straightforward solution is to use permutation matrices (those that are usual in linear algebra). Since you seem to be interested in the 3d case, we will have to flatten your 3d tensor first. So, here's an example code (ready-to use) that I came up with
data=torch.floor(torch.rand(5,3,2)*100):float()
reordered_data=data:view(5,-1)
perm=torch.randperm(5);
perm_rep=torch.repeatTensor(perm,5,1):transpose(1,2)
indexes=torch.range(1,5);
indexes_rep=torch.repeatTensor(indexes,5,1)
permutation_matrix=indexes_rep:eq(perm_rep):float()
permuted=permutation_matrix*reordered_data
print("perm")
print(perm)
print("before permutation")
print(data)
print("after permutation")
print(permuted:view(5,3,2))
As you will see from one execution, it reorders the tensor data according to the row indexes given in perm.

Based on your syntax, I assume you're using to torch with lua rather than PyTorch.
torch.Tensor.index is your function, it works like below:
x = torch.rand(4, 4)
p = torch.randperm(4)
print(x)
print(p)
print(x:index(1,p:long())

Related

Loss for Multi-label Classification

I am working on a multi-label classification problem. My gt labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is the vector with values 1 if the item in sequence belongs to the object and 0 otherwise.
My output is also of same shape: 14 x 10 x 128. Since, my input sequence was of varying length I had to pad it to make it of fixed length 10. I'm trying to find the loss of the model as follows:
total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
for data in training_dataloader:
optimizer.zero_grad()
# shape of input 14 x 10 x 128
output = model(data)
batch_loss = 0.0
for batch_idx, sequence in enumerate(output):
# sequence shape is 10 x 128
true_seq_len = unpadded_seq_lengths[batch_idx]
# only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
# loop through unpadded predicted and gt labels and calculate loss
for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
# predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
gt_labels_seq_item = gt_labels[item_idx]
current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
total_loss += current_loss
batch_loss += current_loss
batch_loss.backward()
optimizer.step()
Can anybody please check to see if I'm calculating loss correctly. Thanks
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
TP = FP = TN = FN = 0.
for x, y, mask in tr_dl:
# mask shape: (10,)
out = model(x) # out shape: (14, 10, 128)
y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64) # consider all predictions above 0.5 as 1, rest 0
y_pred = y_pred[mask] # y_pred shape: (14, 10, 10, 128)
y_labels = y[mask] # y_labels shape: (14, 10, 10, 128)
# do I flatten y_pred and y_labels?
y_pred = y_pred.flatten()
y_labels = y_labels.flatten()
for idx, prediction in enumerate(y_pred):
if prediction == 1 and y_labels[idx] == 1:
# calculate IOU (overlap of prediction and gt bounding box)
iou = 0.78 # assume we get this iou value for objects at idx
if iou >= 0.5:
TP += 1
else:
FP += 1
elif prediction == 1 and y_labels[idx] == 0:
FP += 1
elif prediction == 0 and y_labels[idx] == 1:
FN += 1
else:
TN += 1
EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
It is usually recommended to stick with batch-wise operations and avoid going into single-element processing steps while in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels with additionally a mask that will come useful for loss computation. In other words, to compute the loss term with sequences of varying sizes, we will use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
class Dataset(data.Dataset):
def __init__(self):
super().__init__()
def __len__(self):
return 100
def __getitem__(self, index):
i = random.randint(5, SEQ_LEN) # for demo puporse, generate x with random length
x = torch.rand(i, EMB_SIZE)
y = torch.randint(0, N_CLASSES, (i, EMB_SIZE))
# pad data to fit in batch
pad = torch.zeros(SEQ_LEN-len(x), EMB_SIZE)
x_padded = torch.cat((pad, x))
y_padded = torch.cat((pad, y))
# construct tensor to mask loss
mask = torch.cat((torch.zeros(SEQ_LEN-len(x)), torch.ones(len(x))))
return x_padded, y_padded, mask
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element.
Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors which are shape invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used nn.BCEWithLogitsLoss, is the correct one since it's a multi-dimensional loss used for binary classification. In other words, you can use it here in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss) as suggested elsewhere, since the softmax will push a single logit (i.e. class), which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss.
for x, y, mask in dl:
y_pred = model(x)
loss = mask*bce(y_pred, y)
# backpropagation, loss postprocessing, logs, etc.
This is what you need for the first part of the question, there are already loss functions implemented in tensorflow: https://medium.com/#aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because there's conditioning on the IOU, generally, when you do machine learning, you should heavily depend on matrix multiplication, in your case, you probably need to pre-calculate the IOU -> 1 or 0 as a vector, then multiply with the y_pred , element-wise, this will give you the modified y_pred . After that, you can use any accuracy available function to calculate the final result.
if you can use the CROSSENTROPYLOSS instead of BCEWithLogitsLoss there is something called ignore_index. you can use it to exclude your padded sequences. the difference between the 2 losses is the activation function used (softmax vs sigmoid). but I think you can still use the CROSSENTROPYLOSSfor binary classification as well.

Map Dask bincount over 2d array columns

I am trying to use bincount over a 2D array. Specifically I have this code:
import numpy as np
import dask.array as da
def dask_bincount(weights, x):
da.bincount(x, weights)
idx = da.random.random_integers(0, 1024, 1000)
weight = da.random.random((1000, 2))
bin_count = da.apply_along_axis(dask_bincount, 1, weight, idx)
The idea is that the bincount can be made with the same idx array on each one of the weight columns. That would return an array of size (np.amax(x) + 1, 2) if I am correct.
However when doing this I get this error message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-17-5b8eed89ad32> in <module>
----> 1 bin_count = da.apply_along_axis(dask_bincount, 1, weight, idx)
~/.local/lib/python3.9/site-packages/dask/array/routines.py in apply_along_axis(func1d, axis, arr, dtype, shape, *args, **kwargs)
454 if shape is None or dtype is None:
455 test_data = np.ones((1,), dtype=arr.dtype)
--> 456 test_result = np.array(func1d(test_data, *args, **kwargs))
457 if shape is None:
458 shape = test_result.shape
<ipython-input-14-34fd0eb9b775> in dask_bincount(weights, x)
1 def dask_bincount(weights, x):
----> 2 da.bincount(x, weights)
~/.local/lib/python3.9/site-packages/dask/array/routines.py in bincount(x, weights, minlength, split_every)
670 raise ValueError("Input array must be one dimensional. Try using x.ravel()")
671 if weights is not None:
--> 672 if weights.chunks != x.chunks:
673 raise ValueError("Chunks of input array x and weights must match.")
674
AttributeError: 'numpy.ndarray' object has no attribute 'chunks'
I thought that when dask array were created the library automatically assigns them chunks, so the error does not say much. How can I fix this?
I made an script that does it on numpy with map.
idx_np = np.random.randint(0, 1024, 1000)
weight_np = np.random.random((1000,2))
f = lambda y: np.bincount(idx_np, weight_np[:,y])
result = map(f, [i for i in range(2)])
np.array(list(result))
array([[0.9885341 , 0.9977873 , 0.24937023, ..., 0.31024526, 1.40754883,
0.87609759],
[1.77406303, 0.84787723, 0.14591474, ..., 0.54584068, 0.38357015,
0.85202672]])
I would like to the same but with dask
There are multiple problems at play.
Weights should be (2, 1000)
You discover this by trying to write the same function in numpy using apply_along_axis.
idx_np = np.random.random_integers(0, 1024, 1000)
weight_np = np.random.random((2, 1000)) # <- transposed
# This gives the same result as the code you provided
np.apply_along_axis(lambda weight, idx: np.bincount(idx, weight), 1, weight_np, idx_np)
da.apply_along_axis applies the function to numpy arrays
You're getting the error
AttributeError: 'numpy.ndarray' object has no attribute 'chunks'
This suggests that what makes it into the da.bincount method is actually a numpy array. The fact is that da.apply_along_axis actually takes each row of weight and sends it to the function as a numpy array.
Your function should therefore actually be a numpy function:
def bincount(weights, x):
return np.bincount(x, weights)
However, if you try this, you will still get the same error. I believe that happens for a whole another reason though:
Dask doesn't know what the output shape will be and tries to infer it
In the code and/or documentation for apply_along_axis, we can see that Dask tries to infer the output shape and dtype by passing in the array [1] (related question). This is a problem, since bincount cannot just accept such argument.
What we can do instead is provide shape and dtype to the method so that Dask doesn't have to infer it.
The problem here is that bincount's output shape depends on the maximum value of the input array. Unless you know it beforehand, you will sadly need to compute it. The whole operation therefore won't be fully lazy.
This is the full answer:
import numpy as np
import dask.array as da
idx = da.random.random_integers(0, 1024, 1000)
weight = da.random.random((2, 1000))
def bincount(weights, x):
return np.bincount(x, weights)
m = idx.max().compute()
da.apply_along_axis(bincount, 1, weight, idx, shape=(m,), dtype=weight.dtype)
Appendix: randint vs random_integers
Be careful, because these are subtly different
randint takes integers from low (inclusive) to high (exclusive)
random_integers takes integers from low (inclusive) to high (inclusive)
Thus you have to call randint with high + 1 to get the same value.

How to export all the information from 3d numpy array to a csv file

Kaggle Dataset and code link
I'm trying to solve the above Kaggle problem and I want to export preprocessed csv so that I can build a model on weka, but when I'm trying to save it in csv I'm losing a dimension, I want to retain all the information in that csv.
please help me with the relevant code or any resource.
Thanks
print (scaled_x)
|x |y |z |label
|1.485231 |-0.661030 |-1.194153 |0
|0.888257 |-1.370361 |-0.829636 |0
|0.691523 |-0.594794 |-0.936247 |0
Fs=20
frame_size = Fs*4 #80
hop_size = Fs*2 #40
def get_frames(df, frame_size, hop_size):
N_FEATURES = 3
frames = []
labels = []
for i in range(0,len(df )- frame_size, hop_size):
x = df['x'].values[i: i+frame_size]
y = df['y'].values[i: i+frame_size]
z = df['z'].values[i: i+frame_size]
label = stats.mode(df['label'][i: i+frame_size])[0][0]
frames.append([x,y,z])
labels.append(label)
frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
labels = np.asarray(labels)
return frames, labels
x,y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape
((78728, 80, 3), (78728,))
According to the link you posted, the data is times series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. They want to aggregate the time series into frames (with the corresponding label being the most common label during a given frame).
So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.
Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.
EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.
x_2d = np.reshape(x, (x.shape[0], -1))
full = np.concatenate([x, y], axis=1)
import pandas as pd
df = pd.DataFrame(full)
df.to_csv("frames.csv")
If you also want proper column names:
columns = []
for i in range(1, x.shape[1] + 1):
columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
columns.append("label")
df = pd.DataFrame(full, columns=columns)

image shuffling and slicing

This is my code for slicing my 512*512 image into a cube of 64*64*64 dimension. but when i reshape it again into a 2D array why is it not giving me the original image.am i doing something incorrect please help.
clc;
im=ind2gray(y,ymap);
% im=imresize(im,0.125);
[rows ,columns, colbands] = size(im)
end
image3d=reshape(image3d,512,512);
figure,imshow(uint8(image3d));
Just a small hint.
P(:,:,1) = [0,0;0,0]
P(:,:,2) = [1,1;1,1]
P(:,:,3) = [2,2;2,2]
P(:,:,4) = [3,3;3,3]
B = reshape(P,4,4)
B =
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3
So you might change the slicing or do the reshaping on your own.
If I have understood your question right, you can look into the code below to perform the same operation.
% Random image of the provided size 512X512
imageX = rand(512,512)
imagesc(imageX)
% Converting the image "imageX" into the cube of 64X64X64 dimension
sliceColWise = reshape(imageX,64,64,64)
size(sliceColWise)
% Reshaping the cube to obtain the image original that was "imageX",
% in order to observe that they are identical the difference is plotted
imageY = reshape(sliceColWise,512,512);
imagesc(imageX-imageY)
n.b: From MATLAB help you can see that the reshape works column wise
reshape(X,M,N) or reshape(X,[M,N]) returns the M-by-N matrix
whose elements are taken columnwise from X. An error results
if X does not have M*N elements.

How to train a RNN with LSTM cells for time series prediction

I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values.
I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.
For example let's assume we want to learn the given sequence:
[1,2,3,4,5,6,7,8,9,10,11,...]
And we unroll the network for num_steps=4.
Option 1
input data label
1,2,3,4 2,3,4,5
5,6,7,8 6,7,8,9
9,10,11,12 10,11,12,13
...
Option 2
input data label
1,2,3,4 2,3,4,5
2,3,4,5 3,4,5,6
3,4,5,6 4,5,6,7
...
Option 3
input data label
1,2,3,4 5
2,3,4,5 6
3,4,5,6 7
...
Option 4
input data label
1,2,3,4 5
5,6,7,8 9
9,10,11,12 13
...
Any help would be appreciated.
I'm just about to learn LSTMs in TensorFlow and try to implement an example which (luckily) tries to predict some time-series / number-series genereated by a simple math-fuction.
But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:
LSTM Future Predictor Model
Option 5:
input data label
1,2,3,4 5,6,7,8
2,3,4,5 6,7,8,9
3,4,5,6 7,8,9,10
...
Beside this paper, I (tried) to take inspiration by the given TensorFlow RNN examples. My current complete solution looks like this:
import math
import random
import numpy as np
import tensorflow as tf
LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4
def ground_truth_func(i, j, t):
return i * math.pow(t, 2) + j
def get_batch(batch_size):
seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
for b in xrange(batch_size):
i = float(random.randint(-25, 25))
j = float(random.randint(-100, 100))
for t in xrange(NUM_T_STEPS):
value = ground_truth_func(i, j, t)
seq[b, t, 0] = value
for t in xrange(NUM_T_STEPS):
tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
return seq, tgt
# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
# ENCODER
with tf.variable_scope('ENC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
state = initial_state
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
learned_representation = state
# DECODER
with tf.variable_scope('DEC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
state = learned_representation
logits_stacked = None
loss = 0.0
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
# output can be used to make next number prediction
logits = tf.matmul(output, fc1_weight) + fc1_bias
if logits_stacked is None:
logits_stacked = logits
else:
logits_stacked = tf.concat(1, [logits_stacked, logits])
loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE
reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
total_loss = 0.0
for step in xrange(MAX_STEPS):
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
_, current_loss = sess.run([train, reg_loss], feed)
if step % 10 == 0:
print("#{}: {}".format(step, current_loss))
total_loss += current_loss
print('Total loss:', total_loss)
print('### SIMPLE EVAL: ###')
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
prediction = sess.run([logits_stacked], feed)
for b in xrange(BATCH_SIZE):
print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
print(" `-> Prediction: {}".format(prediction[0][b]))
Sample output of this looks like this:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
The model is a LSTM-autoencoder having 2 layers each.
Unfortunately, as you can see in the results, this model does not learn the sequence properly. I might be the case that I'm just doing a bad mistake somewhere, or that 1000-10000 training steps is just way to few for a LSTM. As I said, I'm also just starting to understand/use LSTMs properly.
But hopefully this can give you some inspiration regarding the implementation.
After reading several LSTM introduction blogs e.g. Jakob Aungiers', option 3 seems to be the right one for stateless LSTM.
If your LSTMs need to remember data longer ago than your num_steps, your can train in a stateful way - for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch size greater than one, however. I guess that in your case a batch size of four with stateful LSTM could be used with the following data (written as input -> label):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
By this, the state of e.g. the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.
This is somehow similar to your option 4, however you are not using all available labels there.
Update:
In extension to my suggestion where batch_size equals the num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.
I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py
def ptb_iterator(raw_data, batch_size, num_steps):
"""Iterate on the raw PTB data.
This generates batch_size pointers into the raw PTB data, and allows
minibatch iteration along these pointers.
Args:
raw_data: one of the raw data outputs from ptb_raw_data.
batch_size: int, the batch size.
num_steps: int, the number of unrolls.
Yields:
Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
The second element of the tuple is the same data time-shifted to the
right by one.
Raises:
ValueError: if batch_size or num_steps are too high.
"""
raw_data = np.array(raw_data, dtype=np.int32)
data_len = len(raw_data)
batch_len = data_len // batch_size
data = np.zeros([batch_size, batch_len], dtype=np.int32)
for i in range(batch_size):
data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
epoch_size = (batch_len - 1) // num_steps
if epoch_size == 0:
raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
for i in range(epoch_size):
x = data[:, i*num_steps:(i+1)*num_steps]
y = data[:, i*num_steps+1:(i+1)*num_steps+1]
yield (x, y)
However, another Option is to select a pointer into your data array randomly for each training sequence.

Resources