Keras LSTM multi-dimension input - machine-learning

My input time-series data has the shape (nb_samples, 75, 32), where 75 is the number of timesteps and 32 is the input dimension.
model = Sequential()
model.add(LSTM(4, input_shape=(75, 32)))
model.summary()
The LSTM weight matrices [W_i, W_c, W_f, W_o] all act on the 32-dimensional input, yet the output is a single vector: the output shape of the above model is (1, 4). But since the output of an LSTM is also a vector, shouldn't it be (32, 4) for a many-to-one implementation like the one above? Why does it produce a single value even for multi-dimensional input?

As you can read in the Keras documentation for recurrent layers:
For an input of shape (nb_sample, timestep, input_dim), you have two possible outputs:
if you set return_sequences=True in your LSTM (which is not your case), you return every hidden state, i.e. the intermediate steps as the LSTM 'reads' your sequence. You get an output of shape (nb_sample, timestep, output_dim).
if you set return_sequences=False (the default), it only outputs the last state, so you get an output of shape (nb_sample, output_dim).
So if you define your LSTM layer like this:
model.add(LSTM(4, return_sequences=True, input_shape=(75, 32)))
you will have an output of shape (None, 75, 4). If 32 is actually your time dimension, you will have to transpose the data before feeding it to the LSTM, since the first dimension of input_shape is the temporal one.
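To make the two shapes concrete, here is a minimal sketch (the sample count of 2 is arbitrary) comparing both settings:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

x = np.random.rand(2, 75, 32)  # (nb_samples, timesteps, input_dim)

last_only = Sequential([LSTM(4, input_shape=(75, 32))])
print(last_only.predict(x).shape)   # (2, 4): only the final hidden state

all_steps = Sequential([LSTM(4, return_sequences=True, input_shape=(75, 32))])
print(all_steps.predict(x).shape)   # (2, 75, 4): one hidden state per timestep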
I hope this helps :)

Related

Keras: Is there any workaround to split the output of an intermediate layer without using a Lambda layer?

Say I have a 10x10x4 intermediate output of a convolution layer, which I need to split into 100 1x1x4 volumes and apply softmax on each to get 100 outputs from the network. Is there any way to accomplish this without using the Lambda layer? The issue with the Lambda layer in this case is that this simple splitting task takes 100 passes through the Lambda layer during the forward pass, which makes the network far too slow for my practical use. Please suggest a quicker way of doing this.
Edit: I had already tried the Softmax+Reshape approach before asking the question. With that approach, the 10x10x4 matrix is reshaped into a 100x4 tensor output using Reshape. What I really need is a multi-output network with 100 different outputs. In my application it is not possible to jointly optimize over the 10x10 matrix, but I get good results using a network with 100 different outputs built with the Lambda layer.
Here are code snippets of my approach using the Keras functional API:
With Lambda layer (slow, gives 100 Tensors of shape (None, 4) as desired):
# Assume conv_output is the output of a convolutional layer with shape (None, 10, 10, 4)
preds = []
for i in range(10):
    for j in range(10):
        y = Lambda(lambda x, i, j: x[:, i, j, :], arguments={'i': i, 'j': j})(conv_output)
        preds.append(Activation('softmax', name='predictions_' + str(i * 10 + j))(y))
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
With Softmax+Reshape (fast, but gives Tensor of shape (None, 100, 4))
# Assume conv_output is the output of a convolutional layer with shape (None, 10, 10, 4)
y = Softmax(name='softmax', axis=-1)(conv_output)
preds = Reshape([100, 4])(y)
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
I don't think it is possible in the second case to individually optimize over each of the 100 outputs (one can probably think of it as learning the joint distribution, whereas I need to learn the marginals, as in the first case). Please let me know if there is any way to accomplish what I am doing with the Lambda layer in the first code snippet in a faster way.
You can use the Softmax layer and set the axis argument to the last axis (i.e. -1) to apply softmax over that axis:
from keras.layers import Softmax
soft_out = Softmax(axis=-1)(conv_out)
Note that the axis argument by default is set to -1, so you may not even need to pass that.
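As a quick sanity check (a sketch, not from the original answer, using the shapes from the question), a single Softmax over the last axis followed by Reshape yields one probability row per grid cell:
import numpy as np
from keras.layers import Input, Softmax, Reshape
from keras.models import Model

inp = Input(shape=(10, 10, 4))
out = Reshape((100, 4))(Softmax(axis=-1)(inp))
m = Model(inp, out)

probs = m.predict(np.random.rand(1, 10, 10, 4))
# row k holds the class probabilities of grid cell (k // 10, k % 10)
print(probs.shape)         # (1, 100, 4)
print(probs.sum(axis=-1))  # every row sums to 1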

Training LSTMs in Keras with time series of different lengths

I'm new to Keras and wondering how to train an LSTM with (interrupted) time series of different lengths. Consider, for example, a continuous series from day 1 to day 10 and another continuous series from day 15 to day 20. Simply concatenating them into a single series might yield wrong results. I see basically two options to bring them to shape (batch_size, timesteps, features):
Pad the shorter series with some default value (0), i.e. for the above example we would have the following batch:
d1, ..., d10
d15, ..., d20, 0, 0, 0, 0
Compute the GCD of the lengths (here gcd(10, 6) = 2), cut the series into pieces of that length, and use a stateful LSTM, resetting the state between series, i.e.:
d1, d2
d3, d4
...
d9, d10
reset_state
d15, d16
...
d19, d20
Are there any other / better solutions? Is training a stateless LSTM with a complete sequence equivalent to training a stateful LSTM with pieces?
Have you tried feeding the LSTM layer inputs of different lengths? The input time series can be of different lengths when an LSTM is used (even the batch size can differ from one batch to another, but obviously the feature dimension must be the same). Here is an example in Keras:
from keras import models, layers
n_feats = 32
latent_dim = 64
lstm_input = layers.Input(shape=(None, n_feats))
lstm_output = layers.LSTM(latent_dim)(lstm_input)
model = models.Model(lstm_input, lstm_output)
model.summary()
Output:
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, None, 32)          0
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                24832
=================================================================
Total params: 24,832
Trainable params: 24,832
Non-trainable params: 0
As you can see, the first and second axes of the Input layer are None, meaning they are not pre-specified and can take any value. You can think of the LSTM as a loop: no matter the input length, as long as there are remaining data vectors of the same length (i.e. n_feats), the LSTM layer processes them. Therefore, as you can see above, the number of parameters in an LSTM layer does not depend on the batch size or the time-series length; it only depends on the length of the input feature vector and the latent dimension of the LSTM.
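You can verify that parameter count by hand (a quick check, not part of the original answer): each of the LSTM's four gates has an input kernel, a recurrent kernel, and a bias.
# 4 gates x (input kernel + recurrent kernel + bias)
n_feats, latent_dim = 32, 64
params = 4 * (n_feats * latent_dim + latent_dim * latent_dim + latent_dim)
print(params)  # 24832, matching the summary above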
import numpy as np
# feed the LSTM with: batch_size=10, timesteps=5
model.predict(np.random.rand(10, 5, n_feats))   # this works
# feed the LSTM with: batch_size=5, timesteps=100
model.predict(np.random.rand(5, 100, n_feats))  # this also works
However, depending on the specific problem you are working on, this may not be suitable (though I don't have a specific example in mind where it would fail), in which case you should make sure all the time series have the same length.
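If you do need a fixed length (option 1 in the question), the usual pattern is to zero-pad and add a Masking layer so the LSTM skips the dummy timesteps. A minimal sketch, assuming an all-zero vector never occurs as real data:
import numpy as np
from keras import models, layers

n_feats = 32
series = [np.random.rand(10, n_feats), np.random.rand(6, n_feats)]  # lengths 10 and 6

max_len = max(len(s) for s in series)
batch = np.zeros((len(series), max_len, n_feats))
for k, s in enumerate(series):
    batch[k, :len(s)] = s  # pad the shorter series with trailing zeros

model = models.Sequential()
model.add(layers.Masking(mask_value=0.0, input_shape=(max_len, n_feats)))  # skip padded steps
model.add(layers.LSTM(64))
model.summary()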

Number of neurons in input layer for Feedforward neural network

I'm trying to classify 1D data with a 3-layer feedforward neural network (multilayer perceptron).
Currently I have input samples (time series) consisting of 50 data points each. I've read in many sources that the number of neurons in the input layer should equal the number of data points (50 in my case); however, after experimenting with cross-validation a bit, I've found that I can get slightly better average classification performance (with lower variance as well) with 25 neurons in the input layer.
I'm trying to understand the math behind it: does it make any sense to have fewer neurons than data points in the input layer? Or are the results better just because of some error?
Also, are there any other rules for setting the number of neurons in the input layer?
Update - to clarify what I mean:
I use Keras with a TensorFlow backend for this. My model looks like this:
model = Sequential()
model.add(Dense(25, input_dim=50, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(input_data, output_data, epochs=150, batch_size=10)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
input_data and output_data are NumPy arrays, with my data points in the former and the corresponding value of 1 or 0 in the latter.
25 is the number of neurons in the first layer and input_dim is the number of my data points, so technically it works, yet I'm not sure whether it makes sense to do so, or whether I have misunderstood the concept of neurons in the input layer and what they do.
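For what it's worth, in Keras the model above has no trainable input layer at all: input_dim=50 merely declares a placeholder for the 50 data points, and the 25 units form the first hidden layer. A minimal sketch of the equivalent model in the functional API makes the roles explicit:
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(50,))               # placeholder: 50 data points, no weights
h = Dense(25, activation='relu')(inp)  # first *hidden* layer: 50*25 + 25 parameters
h = Dense(8, activation='relu')(h)
out = Dense(1, activation='sigmoid')(h)
model = Model(inp, out)
model.summary()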

Keras simple RNN implementation

I ran into problems when trying to compile a network with one recurrent layer. There seems to be an issue with the dimensionality of the first layer, and thus with my understanding of how RNN layers work in Keras.
My code sample is:
model = Sequential()
model.add(Dense(8, input_dim=2, activation="tanh", use_bias=False))
model.add(SimpleRNN(2, activation="tanh", use_bias=False))
model.add(Dense(1, activation="tanh", use_bias=False))
The error is
ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
This error is returned regardless of the input_dim value. What am I missing?
That message means: the input going into the RNN has 2 dimensions, but an RNN layer expects 3.
For an RNN layer, you need inputs shaped like (BatchSize, TimeSteps, FeaturesPerStep). These are the 3 expected dimensions.
A Dense layer (in Keras 2) can work with either 2 or 3 dimensions. We can see that you're working with 2 because you passed input_dim instead of input_shape=(Steps, Features).
There are many possible ways to solve this, but the most meaningful and logical one applies when your input data is a sequence with time steps.
Solution 1 - Your training data is a sequence:
If your training data is a sequence, you shape it like (NumberOfSamples, TimeSteps, Features) and pass it to your model. Make sure you use input_shape=(TimeSteps,Features) in the first layer instead of using input_dim.
Solution 2 - You reshape the output of the first dense layer so it has the additional dimension:
model.add(Reshape((TimeSteps,Features)))
Make sure that the product TimeSteps*Features is equal to 8, the output of your first dense layer.
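For instance, here is a sketch of solution 2 with the hypothetical choice TimeSteps=4 and Features=2, so that 4*2 equals the 8 outputs of the first Dense layer:
from keras.models import Sequential
from keras.layers import Dense, Reshape, SimpleRNN

model = Sequential()
model.add(Dense(8, input_dim=2, activation="tanh", use_bias=False))
model.add(Reshape((4, 2)))  # (batch, 8) -> (batch, 4, 2)
model.add(SimpleRNN(2, activation="tanh", use_bias=False))
model.add(Dense(1, activation="tanh", use_bias=False))
model.summary()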

How to input the unknown size of time-step to TimeDistributed Keras layer

I have an input state with shape (84, 84, 4):
state = Input(shape=(84,84,4), dtype="float")
I want to pass this to a TimeDistributed layer with a time-step count between 1 and 5, and I don't know in advance exactly which it will be.
My next layer is something like this:
conv1 = TimeDistributed(Convolution2D(16, 8, 8, subsample=(4, 4), border_mode='valid',
                                      activation='relu', dim_ordering='tf'))(state)
And I've got an error at this layer:
IndexError: tuple index out of range
I just want to pass a time series of unknown length to TimeDistributed and then on to an LSTM.
Basically, in Keras you need to provide the sequence length because, during computation, Keras layers accept as input a NumPy array with a specified shape, which forces all inputs (at least within one batch) to have a fixed length. However, you can still deal with varying input sizes by 0-padding (making all sequences the same length by adding all-zero dummy timesteps at the beginning) and then masking, which makes your network equivalent to a varying-length-input network.
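As a sketch of that padding step (pad_batch is a hypothetical helper, not from the original answer), assuming each sequence is an array of shape (t_i, 84, 84, 4):
import numpy as np

def pad_batch(seqs, max_len):
    # prepend all-zero dummy frames so every sequence has max_len timesteps
    padded = np.zeros((len(seqs), max_len) + seqs[0].shape[1:], dtype=seqs[0].dtype)
    for k, s in enumerate(seqs):
        padded[k, -len(s):] = s
    return padded

batch = pad_batch([np.random.rand(3, 84, 84, 4), np.random.rand(5, 84, 84, 4)], max_len=5)
print(batch.shape)  # (2, 5, 84, 84, 4)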
You can give a variable sequence length, like this:
classifier.add(TimeDistributed(Convolution2D(64, (3, 3)), input_shape=(None, None, None, 3)))
But now you will have to adjust the length of the vector when it is flattened or unrolled at prediction time.
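Putting the two answers together, here is a minimal sketch in the Keras 2 API (the LSTM width of 32 is an arbitrary choice) that accepts an unknown number of 84x84x4 frames:
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, Flatten, LSTM

model = Sequential()
model.add(TimeDistributed(Conv2D(16, (8, 8), strides=(4, 4), activation='relu'),
                          input_shape=(None, 84, 84, 4)))  # None = variable number of timesteps
model.add(TimeDistributed(Flatten()))  # flatten each frame's feature maps
model.add(LSTM(32))                    # consumes the variable-length sequence
model.summary()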
