Tensorflow RNN example limited to fixed batch size? - machine-learning

When looking at the RNN example in the TensorFlow tutorials, I'm having an issue with how the initial state is constructed. At graph-build time, the graph is limited to inputs of a single batch size. This is a problem for me because I want to be able to feed in a single example and get a prediction for that example.
The part of the code that restricts this is:
initial_state = state = tf.zeros([batch_size, lstm.state_size])
So my question is: how can I modify the example to use a variable batch size, so that the same model can be trained on mini-batches and then used to predict on a single example?

This is how I'm doing it. You can pass batch_size in as a placeholder like this:
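import tensorflow as tf  # TF 1.x graph-mode API
batch_size = tf.placeholder(tf.int32, shape=[])  # scalar batch size, fed at run time
init_state = cell.zero_state(batch_size, tf.float32)  # zero state shaped [batch_size, state_size]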
where cell is one of the RNN cells (BasicLSTMCell, GRUCell, MultiRNNCell, etc.). However, if you're preserving the state across multiple batches, this won't work, since the state's size has to be constant.
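To see why a variable batch size is possible at all, here is a toy, pure-Python sketch (not TensorFlow code) of one tanh RNN step. The helpers rnn_step and zero_state are hypothetical; the point is that the weights have no batch dimension, so the same weights serve a batch of 3 or a single example, and only the zero state has to be created with the runtime batch size.

```python
import math
from typing import List

Matrix = List[List[float]]


def zero_state(batch_size: int, state_dim: int) -> Matrix:
    """Hypothetical analogue of cell.zero_state: one zero row per batch element."""
    return [[0.0] * state_dim for _ in range(batch_size)]


def rnn_step(x_batch: Matrix, h_batch: Matrix, w_x: Matrix, w_h: Matrix) -> Matrix:
    """One tanh RNN step, h' = tanh(x @ w_x + h @ w_h), applied to each batch row.

    w_x is input_dim x state_dim and w_h is state_dim x state_dim; neither
    depends on how many rows the batch has.
    """
    state_dim = len(w_h)
    new_h = []
    for x, h in zip(x_batch, h_batch):
        row = []
        for j in range(state_dim):
            acc = sum(x[i] * w_x[i][j] for i in range(len(x)))
            acc += sum(h[k] * w_h[k][j] for k in range(state_dim))
            row.append(math.tanh(acc))
        new_h.append(row)
    return new_h


# The same (made-up) weights are used for a batch of 3 and for a single example.
w_x = [[0.5, -0.2], [0.1, 0.3]]
w_h = [[0.2, 0.0], [0.0, 0.2]]
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_batch = rnn_step(batch, zero_state(3, 2), w_x, w_h)
h_single = rnn_step(batch[:1], zero_state(1, 2), w_x, w_h)
```

The first row of the batched result equals the single-example result, which is why only the initial state, not the weights, needs to know the batch size.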

The TensorFlow text generation tutorial (now for TF 2.0) explains how to do this. It seems that batch_size becomes part of the built model, so you have to rebuild the model with the new batch size and reload the saved weights:
https://www.tensorflow.org/tutorials/text/text_generation#restore_the_latest_checkpoint
To keep this prediction step simple, use a batch size of 1.
Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built. To run the model with a different batch_size, we need to rebuild the model and restore the weights from the checkpoint.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
I don't know for sure why you have to do this, but I always assumed it's because batching for recurrent layers requires managing multiple parallel hidden states (one per batch row), so the layer preallocates them.
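A toy sketch of that assumption, in plain Python with a hypothetical ToyStatefulRNN class (not the Keras API): a layer that carries state between calls needs one state slot per batch row, so it is tied to the batch size it was built with. Reusing the learned weights at batch size 1 then means building a new instance and copying the weights over, which mirrors the rebuild-and-load_weights step in the tutorial.

```python
from typing import List


class ToyStatefulRNN:
    """Hypothetical scalar RNN that keeps its state between calls (like stateful=True).

    The state buffer is allocated once, with one slot per batch row, so the
    layer can only be called with the batch size it was built for.
    """

    def __init__(self, batch_size: int, w_x: float = 0.5, w_h: float = 0.9):
        self.batch_size = batch_size
        self.w_x = w_x
        self.w_h = w_h
        self.state = [0.0] * batch_size  # preallocated per-row state

    def get_weights(self) -> List[float]:
        return [self.w_x, self.w_h]

    def set_weights(self, weights: List[float]) -> None:
        self.w_x, self.w_h = weights

    def __call__(self, xs: List[float]) -> List[float]:
        if len(xs) != self.batch_size:
            raise ValueError(f"built for batch_size={self.batch_size}, got {len(xs)}")
        # Linear toy update: new_state = w_h * old_state + w_x * input, per row.
        self.state = [self.w_h * h + self.w_x * x for h, x in zip(self.state, xs)]
        return list(self.state)


# "Trained" at batch_size=4 (weights simply set here), then rebuilt at batch_size=1.
trained = ToyStatefulRNN(batch_size=4, w_x=0.3, w_h=0.7)
predictor = ToyStatefulRNN(batch_size=1)
predictor.set_weights(trained.get_weights())  # analogous to restoring a checkpoint
```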

Related

PyTorch: Batch size and individual datum in nn.Module

In a PyTorch nn.Module, the model seems to be agnostic of the batch size. That is, if an individual datum has 128 dimensions and we are training in batches of 64, the model should have an input size of 128, not 128 x 64.
The first step of my nn.Sequential is a Flatten. When I apply the model to a single datum (no batch), I need to make sure the Flatten has start_dim=0. But this is incorrect when applying it to a batch. This seems to be the opposite of the interface above: you need to tailor your model to whether or not you are using batches.
So:
Does a nn.Module need to be aware of batching?
If yes: How do you apply the model to a single sample, without a batch?
If not: How do you apply Flatten, when you might send a batch, or you might send a single sample?
An equivalent question might be: how do I build a PyTorch model that trains on batches but can still be applied to an individual datum at production time?
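One common convention, stated here as an assumption rather than an answer from the original thread: keep the model batch-first everywhere (PyTorch's nn.Flatten defaults to start_dim=1) and give a single sample a batch dimension of 1 before calling the model, e.g. with sample.unsqueeze(0). The plain-Python sketch below mimics that with nested lists; flatten and flatten_from_dim1 are hypothetical helpers, not PyTorch functions.

```python
from typing import Any, List


def flatten(x: Any) -> List[Any]:
    """Recursively flatten a nested list into a flat list of values."""
    if isinstance(x, list):
        return [v for item in x for v in flatten(item)]
    return [x]


def flatten_from_dim1(batch: List[Any]) -> List[List[Any]]:
    """Like Flatten(start_dim=1): keep the leading batch dimension, flatten the rest."""
    return [flatten(sample) for sample in batch]


sample = [[1, 2], [3, 4]]   # one 2x2 datum, no batch dimension
batch_of_one = [sample]     # analogous to sample.unsqueeze(0) in PyTorch
out = flatten_from_dim1(batch_of_one)
```

With this convention the model code never changes; only the caller adds (and, if needed, removes) the batch dimension of size 1.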

Specifying class or sample weights in Keras for one-hot encoded labels in a TF Dataset

I am trying to train an image classifier on an unbalanced training set. In order to cope with the class imbalance, I want either to weight the classes or the individual samples. Weighting the classes does not seem to work. And somehow for my setup I was not able to find a way to specify the samples weights. Below you can read how I load and encode the training data and the two approaches that I tried.
Training data loading and encoding
My training data is stored in a directory structure where each image is placed in the subfolder corresponding to its class (I have 32 classes in total). Since the training data is too big to load into memory all at once, I use image_dataset_from_directory, which describes the data as a TF Dataset:
train_ds = keras.preprocessing.image_dataset_from_directory(
    training_data_dir,
    batch_size=batch_size,
    image_size=img_size,
    label_mode='categorical')
I use label_mode 'categorical' so that the labels are one-hot encoded vectors.
I then prefetch the data:
train_ds = train_ds.prefetch(buffer_size=buffer_size)
Approach 1: specifying class weights
In this approach I try to specify the class weights of the classes via the class_weight argument of fit:
model.fit(
train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
class_weight=class_weights
)
For each class I compute a weight that is inversely proportional to the number of training samples in that class. This is done as follows (before the train_ds.prefetch() call described above):
class_num_training_samples = {}
for f in train_ds.file_paths:
    class_name = f.split('/')[-2]
    if class_name in class_num_training_samples:
        class_num_training_samples[class_name] += 1
    else:
        class_num_training_samples[class_name] = 1
max_class_samples = max(class_num_training_samples.values())
class_weights = {}
for i in range(0, len(train_ds.class_names)):
    class_weights[i] = max_class_samples/class_num_training_samples[train_ds.class_names[i]]
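The counting loop above can be written more compactly with collections.Counter. This sketch uses a hypothetical compute_class_weights helper and made-up paths in place of train_ds.file_paths and train_ds.class_names; it also takes the class name from the parent directory with os.path instead of splitting on '/'. The keys are integer indices in class_names order, the same convention the code above assumes.

```python
import os
from collections import Counter
from typing import Dict, List


def compute_class_weights(file_paths: List[str], class_names: List[str]) -> Dict[int, float]:
    """Weight each class index by max_count / count, so the largest class gets 1.0.

    Assumes every class in class_names has at least one file.
    """
    # The class name is the name of the directory that contains each file.
    counts = Counter(os.path.basename(os.path.dirname(p)) for p in file_paths)
    max_count = max(counts.values())
    return {i: max_count / counts[name] for i, name in enumerate(class_names)}


# Hypothetical paths standing in for train_ds.file_paths / train_ds.class_names.
paths = [
    "data/cat/1.jpg", "data/cat/2.jpg", "data/cat/3.jpg", "data/cat/4.jpg",
    "data/dog/1.jpg", "data/dog/2.jpg",
]
weights = compute_class_weights(paths, ["cat", "dog"])
```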
What I am not sure about is whether this works, because the Keras documentation does not specify the keys of the class_weight dictionary when the labels are one-hot encoded.
I trained the network this way but found that the weights had no real influence on the resulting network: when I looked at the distribution of predicted classes for each individual class, it mirrored the distribution of the overall training set, i.e. for every class, the dominant classes were the most likely predictions.
Running the same training without any class weights led to similar results.
So I suspect that the weights have no influence in my case.
Is this because specifying class weights does not work for one-hot encoded labels, or is this because I am probably doing something else wrong (in the code I did not show here)?
Approach 2: specifying sample weight
As an attempt at a different (in my opinion less elegant) solution, I wanted to specify the individual sample weights via the sample_weight argument of fit. However, the documentation says:
[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.
This is indeed the case in my setup, where train_ds is a dataset. I am having real trouble finding documentation that shows how to modify train_ds so that it has a third element containing the weight. I thought the dataset's map method could be useful, but the solution I came up with is apparently not valid:
train_ds = train_ds.map(lambda img, label: (img, label, class_weights[np.argmax(label)]))
Does anyone have a solution that may work in combination with a dataset loaded by image_dataset_from_directory?
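The map attempt above likely fails because np.argmax and a Python dict lookup cannot operate on the symbolic, batched label tensor inside map. One alternative is to store the class weights as a dense vector indexed by class and take the dot product of each one-hot label with it, which picks out the weight of that sample's class. The sketch below shows that arithmetic in plain Python with hypothetical helper names. In TensorFlow the analogous step would presumably be something like train_ds.map(lambda img, label: (img, label, tf.reduce_sum(label * w, axis=-1))) with w a tf.constant of shape [num_classes], but that line is an untested assumption.

```python
from typing import Dict, List


def class_weight_vector(class_weights: Dict[int, float], num_classes: int) -> List[float]:
    """Turn a {class_index: weight} dict into a dense vector indexed by class."""
    return [class_weights[i] for i in range(num_classes)]


def sample_weights(one_hot_batch: List[List[float]], weight_vector: List[float]) -> List[float]:
    """Per-sample weight = dot(one_hot_label, weight_vector), one value per batch row.

    For a one-hot row this selects the weight of the labelled class.
    """
    return [sum(l * w for l, w in zip(row, weight_vector)) for row in one_hot_batch]


w = class_weight_vector({0: 1.0, 1: 2.0, 2: 4.0}, num_classes=3)
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # a batch of 3 one-hot labels
sw = sample_weights(labels, w)
```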

Is the Keras's default LSTM unrolled or stateful?

In the Keras documentation, both stateful and unroll default to False. So how is the recurrence done in Keras if it's neither of these?
Keras RNN documentation
I have checked the source code for RNN in Keras, and it seems that the default behavior is to initialize the LSTM at every time step. Am I wrong?
if initial_state is not None:
    pass
elif self.stateful:
    initial_state = self.states
else:
    initial_state = self.get_initial_state(inputs)
If I am correct, does that mean that, for time series analysis, it would be better to set unroll=True?
Neither unrolled nor stateful.
Remember that "stateful" in Keras means only that "two consecutive batches will be interpreted as two parts of the same sequences". Nothing else. (Batch 2 is a sequel of batch 1)
All LSTMs, of course, have states (it's impossible not to).
Be careful with the expression "initialize the LSTM". A stateful=False layer will "reset states" for every batch. The practical result is: "each batch is a group of individual sequences from start to end". (Batch 2 is not a sequel of batch 1)
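A toy sketch of the difference, in plain Python (not Keras code) with a hypothetical scalar RNN and one sequence per batch for simplicity; in real Keras, row i of one batch continues row i of the previous batch. With stateful=False every batch starts from a zero state, while with stateful=True the final state of one batch becomes the initial state of the next, so processing batch 2 is the same as continuing the sequence from batch 1.

```python
from typing import List


def run_sequence(xs: List[float], h0: float, w_x: float = 0.5, w_h: float = 0.5) -> float:
    """Toy linear RNN over one sequence; returns the final state."""
    h = h0
    for x in xs:
        h = w_h * h + w_x * x
    return h


def run_batches(batches: List[List[float]], stateful: bool) -> List[float]:
    """Process consecutive batches and return the final state after each one.

    stateful=False: every batch starts from a zero state (reset per batch).
    stateful=True: the final state of batch k is the initial state of batch k+1.
    """
    h = 0.0
    finals = []
    for xs in batches:
        if not stateful:
            h = 0.0  # reset: each batch is a complete sequence
        h = run_sequence(xs, h)
        finals.append(h)
    return finals


batches = [[1.0, 1.0], [1.0, 1.0]]
reset_each_batch = run_batches(batches, stateful=False)
carried = run_batches(batches, stateful=True)
```

Note that the weights (w_x, w_h) are identical in both modes; only the handling of the state between batches differs.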
"States" are information about the "history of a sequence up to the current step". They are completely different from "weights", which are what the layer actually learned from all sequences.
"Unroll" is a way to transform the recurrent calculation into a single graph without recurrence. It's meant only for short sequences: it can speed up processing at the expense of using more memory.
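To illustrate what unrolling means, here is a plain-Python sketch with hypothetical functions: the looped form applies one step function repeatedly and works for any length, while the unrolled form writes every step out for a fixed length (3 here). Both compute the same value; in a real graph the unrolled version has no loop but keeps a separate node for every step, which is where the extra memory goes.

```python
from typing import List


def rnn_loop(xs: List[float], w_x: float = 0.5, w_h: float = 0.5) -> float:
    """Recurrent form: one step function applied in a loop (any sequence length)."""
    h = 0.0
    for x in xs:
        h = w_h * h + w_x * x
    return h


def rnn_unrolled_3(x0: float, x1: float, x2: float, w_x: float = 0.5, w_h: float = 0.5) -> float:
    """Unrolled form for a fixed length of 3: every step is written out explicitly."""
    h0 = 0.0
    h1 = w_h * h0 + w_x * x0
    h2 = w_h * h1 + w_x * x1
    h3 = w_h * h2 + w_x * x2
    return h3
```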

Are epochs and training steps the same thing?

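import numpy as np
import tensorflow as tf  # TF 1.x only; tf.contrib was removed in TF 2.x

features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features)
x = np.array([1., 2., 3., 4.])  # not shown in the question; assumed example values
y = np.array([0., -1., -2., -3.])
input_fn = tf.contrib.learn.io.numpy_input_fn({"x": x}, y, batch_size=4,
                                              num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=1000)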
For example, do "steps=1000" and "num_epochs=1000" mean exactly the same thing? If yes, why does it need to be duplicated? If not, can I set these two parameters differently?
Here is the basic difference between epochs and steps in any machine learning algorithm or framework:
One epoch is one pass in which the framework goes through all the data points in its training set to update its parameters. A step is one update of the parameters (e.g. the weights of a neural network, if it is training a DNN). This update can be computed from a single data point, from a mini-batch of data points (e.g. 100 data points drawn at random, with or without replacement), or from all the points. So if all your data points are used in one step (one parameter update), then one step = one epoch.
Typically, frameworks use mini-batching: in each step they batch 100 (or some other number of) data points together and do one update. In that case, if you have 1 million data points (10^6) in total, one epoch has 10,000 steps, since each step consumes 100 data points.
No, they are not the same. As with most (all?) frameworks, TensorFlow has some commands that specify epochs and some that work on steps, a.k.a. iterations. A step is one batch, whose size is governed by the batch size specified in the model's input.
For instance, if you are using AlexNet with its default batch size of 256, and the ILSVRC 2012 data set of roughly 1.28M images, then you have about 5000 steps per epoch (1,280,000 / 256).
Batch size is the number of images processed in parallel. If there are 1.28M images in the data set, then you have to process 1.28M images per epoch: that's the definition of an epoch, processing each input once. At 256 images per step, that takes about 5000 steps. Now is that arithmetic clear?
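The arithmetic in these answers can be written down directly. The helper name steps_per_epoch is hypothetical, and the ceiling division assumes a final partial batch is kept rather than dropped.

```python
def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Parameter updates needed to see every example once (ceiling division)."""
    return (num_examples + batch_size - 1) // batch_size


one_million_at_100 = steps_per_epoch(10**6, 100)   # the 10^6 / 100 example
imagenet_at_256 = steps_per_epoch(1_280_000, 256)  # the AlexNet / ILSVRC 2012 example
question_case = steps_per_epoch(4, 4)              # 4 data points, batch_size=4
```

In the question's setup (4 data points, batch_size=4) one epoch is exactly one step, so num_epochs=1000 and steps=1000 happen to describe the same amount of training; with more data or a smaller batch size they would not.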

Torch, how to get a tensor of loss values during batch optimization

I am training a network with batch optimization over my training set, and I would like to get a loss vector containing the loss of each of my training examples.
More specifically I am using images (of size 3x64x64) in a batch of size 64. Therefore my input is a tensor of size 64x3x64x64.
During training, when I write
output = net:forward(input)
loss = criterion:forward(output, target)  -- the criterion compares the network output (not the input) with the target
loss is a number, but I would like to get a tensor (of size 64) with one entry per image in my batch, corresponding to the loss value of this precise image.
Is there a way to do that without looping on the first dimension of my input tensor?
The forward method calls another method, updateOutput, which can be overridden.
For example, in the case of MSECriterion(), you can change that method by commenting out the call to the THNN library and writing yourself how you want the criterion to behave: do a plain element-wise subtraction, square the result (again element-wise) and divide (again element-wise) by the total number of data points; then return the result as a tensor.
You will also need to recompile the nn package after making this change, by navigating to the nn folder and running luarocks make rocks/[the scm file in the folder].
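If patching nn is not an option, the per-sample values can also be computed directly from output and target: square the difference and average over every dimension except the batch dimension. In a tensor library this is a reshape to (batch, -1) followed by a mean over the second dimension, with no explicit loop over the batch; the plain-Python sketch below (hypothetical per_sample_mse helper) just spells out that arithmetic. Averaging the per-sample losses gives back the usual size-averaged MSE when all samples have the same number of elements.

```python
from typing import List


def per_sample_mse(output: List[List[float]], target: List[List[float]]) -> List[float]:
    """Mean squared error for each row (sample) of a batch.

    Each sample is assumed to be already flattened to a 1-D list
    (e.g. 3*64*64 values per image).
    """
    return [
        sum((o - t) ** 2 for o, t in zip(out_row, tgt_row)) / len(out_row)
        for out_row, tgt_row in zip(output, target)
    ]


output = [[1.0, 2.0], [0.0, 0.0]]
target = [[1.0, 0.0], [1.0, 1.0]]
losses = per_sample_mse(output, target)  # one loss value per sample
```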
