I'm trying to train a simple many-to-one RNN classifier using LSTM. My sequences are 100 timesteps long with 7 features each, and I have a total of 192382 samples. Here is my model:
model = Sequential()
model.add(LSTM(50,input_shape = (100,7),name = 'LSTM',return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(3, activation='softmax',name = 'softmax_layer'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'],name='softmax')
model.fit(datax,datay,epochs=25,batch_size=128)
model.summary()
The model compiles fine with no error; however, I can't fit it. Here is the error it returns:
ValueError: Error when checking target: expected softmax_layer to have shape (None, 3) but got array with shape (192282, 100)
Does anyone have an idea why the softmax layer is returning a (192282, 100) matrix? Isn't return_sequences=False in the LSTM layer supposed to give me only one output per sample?
Actually, the softmax_layer does output (None, 3), because the last Dense layer has 3 units. What the error is complaining about is the target: Keras expects datay to have shape (None, 3) to match that layer, but your datay has shape (192282, 100).
To fix it, the size of the output layer (softmax_layer) has to match the second dimension of your label array (datay.shape[1]), which should be the number of classes.
A quick fix (with the stray name argument dropped from model.compile, since compile has no such option):
model = Sequential()
model.add(LSTM(50, input_shape=(100, 7), name='LSTM', return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(datay.shape[1], activation='softmax', name='softmax_layer'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(datax, datay, epochs=25, batch_size=128)
model.summary()
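Alternatively, keep the 3-unit softmax and fix the targets instead: they need to be one-hot encoded with 3 columns. A minimal sketch, assuming datay currently holds one integer class label (0, 1, or 2) per sample:

from keras.utils import to_categorical

# datay assumed to be shape (n_samples,) with integer labels 0..2
datay = to_categorical(datay, num_classes=3)   # -> shape (n_samples, 3)
model.fit(datax, datay, epochs=25, batch_size=128)

If datay really contains one label per timestep (which its (192282, 100) shape suggests), then this is a sequence-labelling problem and you would need return_sequences=True plus per-timestep targets instead.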
Related
I am trying to run an LSTM with TF-IDF features as input, but I am getting an error. My TF-IDF vectors are 11915-dimensional.
Code is as follows:
## Creating model
model=Sequential()
model.add(Bidirectional(LSTM(100, input_shape=(1, 11915),return_sequences=True)))
model.add(Dropout(0.3))
model.add(Dense(1,activation='sigmoid'))
model.build(input_shape=(1, 11915))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
print(model.summary())
Error is as follows
Input 0 of layer bidirectional_27 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [1, 11915]
I am new to this area, so any help would be highly appreciated. It would be really nice if someone could write dummy code for running a Bidirectional LSTM on such an input.
My input is a TF-IDF matrix of shape 10229*11915. I want to do fake news detection using an LSTM on the TF-IDF features.
This is a complete working example:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, Dropout, Dense

# create fake data
n_sample = 10229
X = np.random.uniform(0,1, (n_sample,11915))
y = np.random.randint(0,2, n_sample)
# expand X to 3D
X = X.reshape(X.shape[0],1,X.shape[-1])
model=Sequential()
model.add(Bidirectional(LSTM(100, return_sequences=False), input_shape=(1, 11915)))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
print(model.summary())
model.fit(X,y, epochs=3, batch_size=256)
The error occurs because you probably didn't shape your data correctly: an LSTM expects 3D input of shape (samples, timesteps, features), so the 2D TF-IDF matrix has to be expanded to 3D first. Also take care to define the first layer correctly and to use return_sequences=False, because your output is 2D (one prediction per sample, not per timestep).
You want to specify your input_shape in the Bidirectional layer:
model.add(Bidirectional(LSTM(100, return_sequences=True), input_shape=(1, 11915)))
I am trying to do multi-class classification with tf.keras. I have 20 labels in total and 63952 samples, and I have tried the following code:
features = features.astype(float)
labels = df_test["label"].values
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
Then
def baseline_model():
    model = Sequential()
    model.add(Dense(50, input_dim=3, activation='relu'))
    model.add(Dense(40, activation='softmax'))
    model.add(Dense(30, activation='softmax'))
    model.add(Dense(20, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
finally
history = model.fit(data, dummy_y,
                    epochs=5000,
                    batch_size=50,
                    validation_split=0.3,
                    shuffle=True,
                    callbacks=[ch]).history
I get very poor accuracy with this. How can I improve it?
Softmax activations in the intermediate layers do not make any sense at all. Change all of them to relu and keep softmax only in the last layer, as sketched below.
Having done that, if you are still getting unsatisfactory accuracy, experiment with different architectures (different numbers of layers and nodes) using a small number of epochs (say ~50), in order to get a feeling for how your model behaves, before going for a full fit with your 5,000 epochs.
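A minimal corrected sketch of the model (the layer sizes and input_dim=3 are taken from the question; adjust them to your actual feature count):

def baseline_model():
    model = Sequential()
    model.add(Dense(50, input_dim=3, activation='relu'))
    model.add(Dense(40, activation='relu'))      # relu instead of softmax
    model.add(Dense(30, activation='relu'))      # relu instead of softmax
    model.add(Dense(20, activation='softmax'))   # softmax only in the output layer
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model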
You did not give us vital information, but here are some guidelines:
1. Reduce the number of Dense layers - you have a complicated model for a small amount of data (63k samples is somewhat small), so you might be overfitting on your training data.
2. Did you check that the test set has the same distribution as your training set?
3. Avoid using softmax in intermediate Dense layers - softmax should only be used in the final layer; use sigmoid or relu instead.
4. Plot the loss as a function of epoch (see the sketch after this list) and check whether it decreases - that tells you whether your learning rate is too high or too low.
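A minimal plotting sketch for point 4, assuming history is the dict returned by model.fit(...).history as in the question:

import matplotlib.pyplot as plt

plt.plot(history['loss'], label='training loss')
plt.plot(history['val_loss'], label='validation loss')  # present because validation_split=0.3
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()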
I am enjoying the simplicity that Keras offers; however, I have not been successful in configuring a Keras regression model with multiple outputs.
More specifically, I have a Keras model that consumes X values with 308 columns and predicts 28 target Y values. The model is (I think) quite simple and I would have thought it would converge quite quickly, but in fact it does not.
I am guessing here, but I think I have setup the model incorrectly and am looking for assistance on how to configure a Keras model to work properly.
Data information:
Number of rows: 46038
My input shape: X_train: (46038, 308)
My target shape: Y_train: (46038, 28)
The inputs (X) are a series of floats representing values that influence the allocation of a resource. The targets (Y) are a series of floats that sum to 1.0, representing the actual percentage allocation to each resource. My goal is to predict the resource percentage allocations (Y) from the provided inputs (X). As such, I believe this is a regression problem and not a classification problem (correct me if I am wrong).
Sample data:
X: [100, 200, 400, 600, 32, 1, 0.1, 0.5, 2500...] (308 columns, with 40000+ rows)
Y: [0.333, 0.667, 0.0, 0.0, 0.0, ...]
In the case of Y above, this means that 0.333 (33%) of the resource is allocated to the first resource, 0.667 (67%) to the second resource, and 0.0 to all others.
Model:
model = Sequential()
model.add(Dense(256, input_shape=(308,) ))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28))
model.compile(loss='mean_squared_error', optimizer='adam')
Here are a few specific questions:
1. Is my model configured properly to achieve my goals?
2. Should I have different activation functions?
3. Are my input shapes (308,) set up properly? Are my output shapes (28) correct?
4. Should I have an activation on my output layer (for example, model.add(Activation('softmax')))? If yes, what type would be ideal?
(I don't think it is particularly relevant, but I am using a Tensorflow backend)
model = Sequential()
model.add(Dense(256, input_shape=(308,) ))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
This should solve the problem. Although it looks like a regression problem, the allocations compete with each other, which makes it more like a classification problem and calls for a softmax nonlinearity and the categorical_crossentropy loss (cross-entropy is well defined for soft targets that sum to 1, not only for one-hot labels).
Update
For early stopping you'll need a validation set and the following code:
earlyStopping=keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
model.fit(X, y, batch_size=100, epochs=100, verbose=1, callbacks=[earlyStopping], validation_split=0.2, shuffle=True)  # hold out some data so val_loss exists
You'll also need to define a custom metric function that returns the cross-entropy loss instead of accuracy, and pass it to the metrics argument of model.compile.
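A minimal sketch of such a metric (the name xent is just for illustration):

import keras.backend as K

def xent(y_true, y_pred):
    # report the categorical cross-entropy as a metric instead of accuracy
    return K.categorical_crossentropy(y_true, y_pred)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[xent])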
I ran into problems when trying to compile a network with one recurrent layer. There seems to be an issue with the dimensionality of the first layer, and thus with my understanding of how RNN layers work in Keras.
My code sample is:
model.add(Dense(8,
                input_dim=2,
                activation="tanh",
                use_bias=False))
model.add(SimpleRNN(2,
                    activation="tanh",
                    use_bias=False))
model.add(Dense(1,
                activation="tanh",
                use_bias=False))
The error is
ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
This error is returned regardless of the input_dim value. What am I missing?
That message means that the input going into the RNN has 2 dimensions, while an RNN layer expects 3 dimensions.
For an RNN layer, you need inputs shaped like (BatchSize, TimeSteps, FeaturesPerStep). These are the 3 dimensions expected.
A Dense layer (in keras 2) can work with either 2 or 3 dimensions. We can see that you're working with 2 because you passed an input_dim instead of passing an input_shape=(Steps,Features).
There are many possible ways to solve this, but the most meaningful and logical would be a case where your input data is a sequence with time steps.
Solution 1 - Your training data is a sequence:
If your training data is a sequence, you shape it like (NumberOfSamples, TimeSteps, Features) and pass it to your model. Make sure you use input_shape=(TimeSteps,Features) in the first layer instead of using input_dim.
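A minimal sketch of that setup, keeping the layer sizes from the question (TimeSteps and Features here are hypothetical placeholders for your sequence length and per-step feature count):

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

TimeSteps, Features = 10, 2   # assumed values, use your own

model = Sequential()
model.add(Dense(8,
                input_shape=(TimeSteps, Features),  # 3D input instead of input_dim=2
                activation="tanh",
                use_bias=False))
model.add(SimpleRNN(2, activation="tanh", use_bias=False))
model.add(Dense(1, activation="tanh", use_bias=False))
model.summary()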
Solution 2 - You reshape the output of the first dense layer so it has the additional dimension:
model.add(Reshape((TimeSteps,Features)))
Make sure that the product TimeSteps*Features equals 8, the output size of your first Dense layer.
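A minimal sketch of Solution 2, choosing TimeSteps=4 and Features=2 so that 4*2 matches the 8 outputs of the first Dense layer (any factorization of 8 would do):

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Reshape

model = Sequential()
model.add(Dense(8, input_dim=2, activation="tanh", use_bias=False))
model.add(Reshape((4, 2)))   # (batch, 8) -> (batch, 4, 2), now 3D for the RNN
model.add(SimpleRNN(2, activation="tanh", use_bias=False))
model.add(Dense(1, activation="tanh", use_bias=False))
model.summary()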
My input time series data has shape (nb_samples, 75, 32).
75 is the number of timesteps and 32 is the input dimension.
model = Sequential()
model.add(LSTM(4, input_shape=(75, 32)))
model.summary()
The LSTM weight matrices [W_i, W_c, W_f, W_o] all have 32 input dimensions, but the output is just a single value: the output shape of the above model is (1, 4). But in an LSTM the output is also a vector, so shouldn't it be (32, 4) for a many-to-one implementation like the above? Why does it give a single value even for multi-dimensional input?
As you can read in the Keras doc for recurrent layers:
For an input of shape (nb_sample, timestep, input_dim), you have two possible outputs:
if you set return_sequences=True in your LSTM (which is not your case), it returns every hidden state, i.e. the intermediate steps as the LSTM 'reads' your sequence. You get an output of shape (nb_sample, timestep, output_dim).
if you set return_sequences=False (which is the default), it only outputs the last state, so you get an output of shape (nb_sample, output_dim).
So if you define your LSTM layer like this:
model.add(LSTM(4, return_sequences=True, input_shape=(75, 32)))
you will have an output of shape (None, 75, 4). If 32 were actually your time dimension, you would have to transpose the data before feeding it to the LSTM, since the first dimension after the samples is the temporal one.
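A minimal sketch of that transpose, only needed if (contrary to what you described) 32 were the time axis; data is assumed to be a NumPy array of shape (nb_samples, 75, 32):

import numpy as np

# move the time axis in front of the feature axis:
# (nb_samples, 75, 32) -> (nb_samples, 32, 75)
data = np.transpose(data, (0, 2, 1))
# the LSTM would then use input_shape=(32, 75)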
I hope this helps :)