Keras simple RNN implementation - machine-learning

I found problems when trying to compile a network with one recurrent layer. It seems there is some issue with the dimensionality of the first layer and thus my understanding of how RNN layers work in Keras.
My code sample is:
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
model = Sequential()
model.add(Dense(8,
                input_dim = 2,
                activation = "tanh",
                use_bias = False))
model.add(SimpleRNN(2,
                    activation = "tanh",
                    use_bias = False))
model.add(Dense(1,
                activation = "tanh",
                use_bias = False))
The error is
ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
This error is returned regardless of the input_dim value. What am I missing?

That message means: the input going into the RNN has 2 dimensions, but an RNN layer expects 3 dimensions.
For an RNN layer, you need inputs shaped like (BatchSize, TimeSteps, FeaturesPerStep). These are the 3 dimensions expected.
A Dense layer (in keras 2) can work with either 2 or 3 dimensions. We can see that you're working with 2 because you passed an input_dim instead of passing an input_shape=(Steps,Features).
There are many possible ways to solve this, but the most meaningful and logical would be a case where your input data is a sequence with time steps.
Solution 1 - Your training data is a sequence:
If your training data is a sequence, you shape it like (NumberOfSamples, TimeSteps, Features) and pass it to your model. Make sure you use input_shape=(TimeSteps,Features) in the first layer instead of using input_dim.
Solution 2 - You reshape the output of the first dense layer so it has the additional dimension:
model.add(Reshape((TimeSteps,Features)))
Make sure that the product TimeSteps*Features is equal to 8, the output of your first dense layer.
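As a rough illustration of Solution 2, the sketch below lists every (TimeSteps, Features) pair a Reshape could use for an 8-unit Dense output, and regroups one such output on a plain Python list. The helper names reshape_targets and regroup are made up for this example and are not part of Keras.

```python
def reshape_targets(units):
    """Return every (time_steps, features) pair with time_steps * features == units."""
    return [(t, units // t) for t in range(1, units + 1) if units % t == 0]


def regroup(vector, time_steps, features):
    """Split one flat output vector into time_steps rows of `features` values each."""
    if time_steps * features != len(vector):
        raise ValueError("time_steps * features must equal len(vector)")
    return [vector[i * features:(i + 1) * features] for i in range(time_steps)]


dense_output = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # one sample, 8 units
print(reshape_targets(8))           # [(1, 8), (2, 4), (4, 2), (8, 1)]
print(regroup(dense_output, 4, 2))  # [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
```

Any of the listed pairs keeps the product at 8; which one makes sense depends on what a "time step" means for your data.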

Related

ValueError: Input 0 of layer "lstm_6" is incompatible with the layer

I am trying to create a hybrid model which consists of EfficientNetB7 and LSTM.
# pretrained model act as a feature extractor
Effnet=tensorflow.keras.applications.EfficientNetB7( input_shape=(IMG_SIZE,IMG_SIZE,3), include_top=False,weights="imagenet",pooling="avg")
Effnet.trainable = False
x = Flatten()(Effnet.output)
x=(BatchNormalization())(x)
#add two LSTM Layers
x=LSTM(8,input_shape=(IMG_SIZE,IMG_SIZE,3),return_sequences=False)(x)
x=LSTM(8)(x)
x=(BatchNormalization())(x)
#add two fully connected dense layers 1024 as my model
x=Dense(1024)(x)
x=(BatchNormalization())(x)
x=Activation('relu')(x)
x=Dense(1024)(x)
x=(BatchNormalization())(x)
x=Activation('relu')(x)
x = Dense(NUM_CLASSE)(x)
x=(BatchNormalization())(x)
prediction =Activation('softmax')(x)
model = Model(inputs=Effnet.input, outputs=prediction)
model.summary()
But it gives me the following error:
ValueError: Input 0 of layer "lstm_6" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 2560)
EfficientNetB7 ends with average pooling, and I think that is causing the problem. How do I remove it?
How can I fix it, please? Thank you, Regards!
The problem is on this line:
x=LSTM(8,input_shape=(IMG_SIZE,IMG_SIZE,3),return_sequences=False)(x)
You pass input_shape=(IMG_SIZE,IMG_SIZE,3) to the LSTM layer. However, that shape only holds at the very beginning of your network, where the images flow into EfficientNetB7. Once you take the (already average-pooled) output from EfficientNet and flatten it, you have one flat feature vector per sample, i.e. a 2-D tensor of shape (batch, features).
The error message is actually pretty straightforward.
expected ndim=3, found ndim=2. Full shape received: (None, 2560)
2560 comes from flattening the features, and the first dimension is the one for batch size.
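You must correct the input to your LSTM layer so that it is 3-D, shaped (batch, timesteps, features), for example by reshaping the feature vector before the LSTM. You can drop the input_shape argument, since Keras infers the shape from the incoming tensor, but that alone does not add the missing time dimension. Note also that the first LSTM uses return_sequences=False, so the second LSTM would receive a 2-D input and fail in the same way.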
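To make the shape bookkeeping concrete, here is a small pure-Python trace of per-sample shapes (batch axis omitted). It is only a sketch: trace_shapes is a hypothetical helper, not a Keras function, and treating the 2560 pooled features as a single time step is just one possible choice, not necessarily a good modelling decision.

```python
from math import prod


def trace_shapes(input_shape, steps):
    """Follow a per-sample shape (batch axis omitted) through a list of steps.

    Each step is ("reshape", target_shape) or ("lstm", (units, return_sequences)).
    """
    shape = tuple(input_shape)
    history = [shape]
    for kind, arg in steps:
        if kind == "reshape":
            if prod(arg) != prod(shape):
                raise ValueError(f"cannot reshape {shape} into {arg}")
            shape = tuple(arg)
        elif kind == "lstm":
            units, return_sequences = arg
            if len(shape) != 2:  # 2 per-sample dims == ndim 3 with the batch axis
                raise ValueError(f"expected ndim=3, found ndim={len(shape) + 1}")
            shape = (shape[0], units) if return_sequences else (units,)
        else:
            raise ValueError(f"unknown step {kind!r}")
        history.append(shape)
    return history


# Pooled EfficientNetB7 features: 2560 values per image.
fixed = trace_shapes((2560,), [
    ("reshape", (1, 2560)),  # treat the vector as a single time step
    ("lstm", (8, True)),     # keep the time axis for the next LSTM
    ("lstm", (8, False)),    # last LSTM returns only its final output
])
print(fixed)  # [(2560,), (1, 2560), (1, 8), (8,)]
```

Feeding (2560,) straight into an "lstm" step raises the same kind of "expected ndim=3, found ndim=2" complaint as in the question.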

Input 0 is incompatible with layer lstm_12: expected ndim=3, found ndim=2

I am new to ML and trying to make an RNN LSTM model.
I want to optimize the hyper-parameter using GridSearchCV. What I want to optimize is the number of layers and nodes for each number of layer selection.
Here is the code to generate the model:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
def create_model(layers,activation):
    model = Sequential()
    for i,node in enumerate(layers):
        if i == 0:
            model.add(LSTM(units=node, input_shape=(x_train.shape[1],1)))
            model.add(Activation(activation))
            model.add(Dropout(0.2))
        else:
            model.add(LSTM(units=node, input_shape=(x_train.shape[1],1)))
            model.add(Activation(activation))
            model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])
    return model
and here are the variables
layers=[[40,40],[30,30],[30,30,30],[30,30,30,30],[30,30,30,30,30]]
activations =['sigmoid','relu']
batch_size = [32,50]
epochs = [50]
then I wrap it up using gridsearchcv
param_grid = dict(layers=layers,activation=activations,batch_size=batch_size,epochs=epochs)
grid = GridSearchCV(estimator=model,param_grid=param_grid)
When I do it
grid_result = grid.fit(x_train,y_train,verbose=3)
I got this error
ValueError: Input 0 is incompatible with layer lstm_14: expected ndim=3, found ndim=2
I don't know what is happening. My x_train shape is (13871, 60, 1) and y_train shape is (13871,). Thank you beforehand and your help will be very much appreciated!
Thanks!
Phil
The error message actually explains this well. An LSTM requires a time-series input of shape (batch_size, timesteps, features). You have this correct for your first LSTM layer. However, by default the output of an LSTM is not a sequence: it has shape (batch_size, units), so subsequent LSTM layers will not receive a 3-D input.
You can make an LSTM output a sequence as well by setting the parameter
return_sequences=True
Note that you should leave return_sequences as False in the final LSTM layer before the Dense layer, or else flatten its output.
Does that help?
PS: the bodies of your if and else branches are exactly the same. Is that something you plan to change later?
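One way to apply this to a variable-depth stack is to decide the arguments per layer up front. The sketch below is plain Python and only builds the keyword arguments; plan_lstm_stack is a hypothetical helper name, and the (60, 1) input shape is taken from the x_train shape given in the question.

```python
def plan_lstm_stack(layers, time_steps, features):
    """Return keyword arguments for each LSTM layer in a stack.

    Every layer but the last keeps return_sequences=True so the next LSTM
    still receives a 3-D input; only the first layer gets input_shape.
    """
    plan = []
    for i, units in enumerate(layers):
        kwargs = {"units": units, "return_sequences": i < len(layers) - 1}
        if i == 0:
            kwargs["input_shape"] = (time_steps, features)
        plan.append(kwargs)
    return plan


for kwargs in plan_lstm_stack([30, 30, 30], time_steps=60, features=1):
    print(kwargs)
# {'units': 30, 'return_sequences': True, 'input_shape': (60, 1)}
# {'units': 30, 'return_sequences': True}
# {'units': 30, 'return_sequences': False}
```

Inside create_model, each dictionary could then be unpacked as LSTM(**kwargs), which also removes the need for the duplicated if/else branches.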

Weight initialization in neural networks

Hi I am developing a neural network model using keras.
code
def base_model():
    # Initialising the ANN
    regressor = Sequential()
    # Adding the input layer and the first hidden layer
    regressor.add(Dense(units = 4, kernel_initializer = 'he_normal', activation = 'relu', input_dim = 7))
    # Adding the second hidden layer
    regressor.add(Dense(units = 2, kernel_initializer = 'he_normal', activation = 'relu'))
    # Adding the output layer
    regressor.add(Dense(units = 1, kernel_initializer = 'he_normal'))
    # Compiling the ANN
    regressor.compile(optimizer = 'adam', loss = 'mse', metrics = ['mae'])
    return regressor
I have been reading about which kernel_initializer to use and came across this link: https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404
It talks about Glorot and He initializations. I have tried different weight initializations, but all of them give the same results. I want to understand how important it is to do a proper initialization.
Thanks
I'll try to explain why weight initialisation matters.
Suppose our NN has an input layer with 1000 neurons, and suppose we initialise the weights from a normal distribution with mean 0 and variance 1 ($w_i \sim \mathcal{N}(0, 1)$).
At the second layer, assume that only 500 of the first layer's neurons are activated ($x_i = 1$), while the other 500 are not ($x_i = 0$).
The input z of a neuron in the second layer will be the weighted sum (ignoring the bias term):
$z = \sum_i w_i x_i$
so it will also be normally distributed, but with variance $\sigma^2 = 500$, i.e. a standard deviation of about 22.4.
This means z will often be either z >> 1 or z << -1, so neurons will saturate. The network will then learn very slowly, if at all.
A solution is to initialise the weights as $w_i \sim \mathcal{N}(0, 1/n)$ (variance $1/n$), where $n$ is the number of inputs of the first layer. In this way z will be distributed as $\mathcal{N}(0, 1/2)$ and so is much less spread out; consequently neurons are less prone to saturate.
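A quick numerical check of the argument above, under two stated assumptions: active inputs contribute x = 1 and inactive ones x = 0, and the bias is ignored. The function name simulated_std and the trial count are choices made for this sketch only.

```python
import math
import random


def simulated_std(n_active, weight_std, trials=500, seed=0):
    """Estimate the standard deviation of z = sum of the weights on active inputs.

    Active inputs have x = 1, so z is just the sum of n_active random weights.
    """
    rng = random.Random(seed)
    zs = [sum(rng.gauss(0.0, weight_std) for _ in range(n_active))
          for _ in range(trials)]
    mean = sum(zs) / trials
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / trials)


n, active = 1000, 500

# Unit-variance weights: theory says std(z) = sqrt(500), about 22.4.
print(math.sqrt(active), simulated_std(active, 1.0))

# Weights with standard deviation 1/sqrt(n): theory says std(z) = sqrt(500/1000), about 0.71.
print(math.sqrt(active / n), simulated_std(active, 1.0 / math.sqrt(n)))
```

The simulated values should land close to the theoretical ones; the point is the roughly 30-fold reduction in the spread of z, not the exact digits.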
This trick can help as a start, but in deep neural networks, because of the many hidden layers, the weight initialisation should be done this way at each layer. Batch normalization is another technique that may help.
Besides this, from your code I can see you've chosen MSE as the cost function, which is a quadratic cost. I don't know if your problem is a classification one, but if it is, I suggest using a cross-entropy cost function instead, which usually lets the network learn faster.

Keras: Is there any workaround to split the output of an intermediate layer without using Lambda layer?

Say I have a 10x10x4 intermediate output of a convolution layer, which I need to split into 100 volumes of 1x1x4 and apply softmax on each to get 100 outputs from the network. Is there any way to accomplish this without using the Lambda layer? The issue with the Lambda layer in this case is that this simple task of splitting takes 100 passes through the Lambda layer during the forward pass, which makes the network too slow for my practical use. Please suggest a quicker way of doing this.
Edit: I had already tried the Softmax+Reshape approach before asking the question. With that approach, I would be getting a 10x10x4 matrix reshaped to a 100x4 Tensor with use of Reshape as the output. What I really need is a multi output network with 100 different outputs. In my application, it is not possible to jointly optimize over the 10x10 matrix, but I get good results by using a network with 100 different outputs with the Lambda layer.
Here are code snippets of my approach using the Keras functional API:
With Lambda layer (slow, gives 100 Tensors of shape (None, 4) as desired):
# Assume conv_output is output from a convolutional layer with shape (None, 10, 10,4)
preds = []
for i in range(10):
    for j in range(10):
        y = Lambda(lambda x, i,j: x[:, i, j,:], arguments={'i': i,'j':j})(conv_output)
        preds.append(Activation('softmax',name='predictions_' + str(i*10+j))(y))
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
With Softmax+Reshape (fast, but gives Tensor of shape (None, 100, 4))
# Assume conv_output is output from a convolutional layer with shape (None, 10, 10,4)
y = Softmax(name='softmax', axis=-1)(conv_output)
preds = Reshape([100, 4])(y)
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
I don't think in the second case it is possible to individually optimize over each of the 100 outputs (probably one can think of it as learning the joint distribution, whereas I need to learn the marginals as in the first case). Please let me know if there is any way to accomplish what I am doing with the Lambda layer in the first code snippet in a faster way
You can use the Softmax layer and set the axis argument to the last axis (i.e. -1) to apply softmax over that axis:
from keras.layers import Softmax
soft_out = Softmax(axis=-1)(conv_out)
Note that the axis argument by default is set to -1, so you may not even need to pass that.
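To see why a softmax over the last axis does the same per-position work as slicing first, here is a small pure-Python sketch on nested lists. The 2x2x4 grid stands in for the 10x10x4 output, and softmax_last_axis is a hypothetical helper written for this illustration, not the Keras layer itself.

```python
import math


def softmax(values):
    """Numerically stable softmax of a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_last_axis(grid):
    """Apply softmax along the last axis of a rows x cols x channels nested list."""
    return [[softmax(cell) for cell in row] for row in grid]


# A tiny 2x2x4 stand-in for the 10x10x4 convolution output.
conv_out = [[[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]],
            [[-1.0, 0.5, 2.0, 0.0], [5.0, 1.0, 1.0, 1.0]]]
soft_out = softmax_last_axis(conv_out)

# Each cell is normalised on its own, just as slicing it out first would do.
print(soft_out[0][1])  # [0.25, 0.25, 0.25, 0.25]
print(all(abs(sum(cell) - 1.0) < 1e-9 for row in soft_out for cell in row))  # True
```

Each of the four cells ends up as its own probability distribution over the 4 channels, independent of its neighbours.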

Number of neurons in input layer for Feedforward neural network

I'm trying to classify 1D data with 3-layered feedforward neural network (multilayer perceptron).
Currently I have input samples (time-series) consisting of 50 data points each. I've read in many sources that the number of neurons in the input layer should be equal to the number of data points (50 in my case); however, after experimenting with cross-validation a bit, I've found that I get slightly better average classification performance (with lower variance as well) with 25 neurons in the input layer.
I'm trying to understand the math behind it: does it make any sense to have fewer neurons than data points in the input layer? Or are the results perhaps better just because of some error?
Also - are there any other rules to set number of neurons in input layer?
Update - to clarify what I mean:
I use Keras with the TensorFlow backend for this. My model looks like this:
model = Sequential()
model.add(Dense(25, input_dim=50, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(input_data, output_data, epochs=150, batch_size=10)
predictions = model.predict(input_data)
rounded = [round(x[0]) for x in predictions]
print(rounded)
input_data and output_data are NumPy arrays holding my data points and the corresponding labels (1 or 0), respectively.
25 is the number of neurons in the first layer and input_dim is the number of my data points, so technically it works, yet I'm not sure whether it makes sense to do so, or whether I have misunderstood the concept of neurons in the input layer and what they do.
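One way to read the model above: Dense(25, input_dim=50) describes a layer of 25 units that receives 50 input values, so the 50 inputs are implied by input_dim and 25 is the width of the first hidden layer. The sketch below tallies the trainable parameters under that reading; dense_params is a helper written for this example, not a Keras function.

```python
def dense_params(n_inputs, units, use_bias=True):
    """Trainable parameters of a fully connected layer: weights plus optional biases."""
    return n_inputs * units + (units if use_bias else 0)


input_dim = 50            # values per sample; fixed by the data
layer_units = [25, 8, 1]  # units of the three Dense layers in the model above

n_in, total = input_dim, 0
for units in layer_units:
    count = dense_params(n_in, units)
    print(f"Dense({units}) fed by {n_in} values: {count} parameters")
    total += count
    n_in = units
print(total)  # 1492
```

Changing 25 to 50 would therefore change the size of the first hidden layer, not the number of inputs the network reads.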

Resources