Issue in LSTM Input Dimensions in Keras - machine-learning

I am trying to implement a multi-input LSTM model using keras. The code is as follows:
data_1 -> shape (1150,50)
data_2 -> shape (1150,50)
y_train -> shape (1150,50)
input_1 = Input(shape=data_1.shape)
LSTM_1 = LSTM(100)(input_1)
input_2 = Input(shape=data_2.shape)
LSTM_2 = LSTM(100)(input_2)
concat = Concatenate(axis=-1)
x = concat([LSTM_1, LSTM_2])
dense_layer = Dense(1, activation='sigmoid')(x)
model = keras.models.Model(inputs=[input_1, input_2], outputs=[dense_layer])
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['acc'])
model.fit([data_1, data_2], y_train, epochs=10)
When I run this code, I get a ValueError:
ValueError: Error when checking model input: expected input_1 to have 3 dimensions, but got array with shape (1150, 50)
Does anyone have a solution to this problem?

Use data_1 = np.expand_dims(data_1, axis=2) before you define the model. An LSTM expects inputs with dimensions (batch_size, timesteps, features), so in your case, where I am guessing you have 1 feature, 50 time steps and 1150 samples, you need to add a dimension at the end of your array, giving it shape (1150, 50, 1).
This needs to be done before you define the model. Note also that the shape argument of Input does not include the batch dimension: when you set input_1 = Input(shape=data_1.shape) on the original array, you are telling Keras that your input has 1150 timesteps and 50 features, so it will expect inputs of shape (None, 1150, 50) (the None stands for "any size will be accepted"). After expanding the dimensions, pass shape=data_1.shape[1:], i.e. (50, 1), so that the model expects inputs of shape (None, 50, 1).
The same holds for input_2.
Hope this helps
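To make the shapes concrete, here is a minimal sketch using only NumPy. The zero-filled array is a stand-in for the question's data_1, and input_shape is a hypothetical name for the value that would be passed to Input(shape=...).

```python
import numpy as np

# Stand-in for the question's data_1: 1150 samples, 50 time steps each.
data_1 = np.zeros((1150, 50))

# Add a trailing feature axis so each time step carries exactly 1 feature.
data_1 = np.expand_dims(data_1, axis=2)

# Shape to hand to Input(shape=...): everything except the batch axis.
input_shape = data_1.shape[1:]

print(data_1.shape)   # (1150, 50, 1)
print(input_shape)    # (50, 1)
```

The same two steps would apply to data_2.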

Related

CIFAR-10 test set classification accuracy different on PyTorch and Keras

I’ve made a custom CNN in PyTorch for classifying the 10 classes of the CIFAR-10 dataset. My classification accuracy on the test dataset is 45.739%, which is very low. I thought this was because my model is not very deep, but I implemented the same model in Keras and the classification accuracy comes out to 78.92% on the test dataset. There is no problem in Keras, so I think there is something I'm missing in my PyTorch program.
I have used the same model architecture, strides, padding, dropout rate, optimizer, loss function, learning rate, batch size and number of epochs in both PyTorch and Keras. Despite that, the difference in classification accuracy is still huge, so I am not able to decide how I should debug my PyTorch program further.
For now I suspect 3 things. In Keras I use the categorical cross-entropy loss function (one-hot vector labels) and in PyTorch I use the standard cross-entropy loss function (scalar index labels); can this be a problem? If not, then I suspect either my training loop or the code for calculating classification accuracy in PyTorch. I have attached both my programs below and will be grateful for any suggestions.
My program in Keras:
#================Function that defines the CNN model===========
def CNN_model():
    model = Sequential()
    model.add(Conv2D(32,(3,3),activation='relu',padding='same', input_shape=(size,size,channels))) #SAME PADDING
    model.add(Conv2D(32,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2))) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Conv2D(64,(3,3),activation='relu', padding='same')) #SAME PADDING
    model.add(Conv2D(64,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2))) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Conv2D(128,(3,3),activation='relu', padding='same')) #SAME PADDING
    model.add(Conv2D(128,(3,3),activation='relu')) #VALID PADDING
    model.add(MaxPooling2D(pool_size=(2,2),name='feature_extractor_layer')) #VALID PADDING
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu', name='second_last_layer'))
    model.add(Dropout(0.25))
    model.add(Dense(10, activation='softmax', name='softmax_layer')) #10 nodes in the softmax layer
    model.summary()
    return model
#=====Main program starts here========
#get_train_data() and get_test_data() are my own custom functions to get CIFAR-10 dataset
images_train, labels_train, class_train = get_train_data(0,10)
images_test, labels_test, class_test = get_test_data(0,10)
model = CNN_model()
model.compile(loss='categorical_crossentropy', #loss function of the CNN
              optimizer=Adam(lr=1.0e-4), #Optimizer
              metrics=['accuracy']) #'accuracy' metric is to be evaluated
#images_train and images_test contain images and
#class_train and class_test contain one-hot vector labels
model.fit(images_train,class_train,
          batch_size=128,
          epochs=50,
          validation_data=(images_test,class_test),
          verbose=1)
scores=model.evaluate(images_test,class_test,verbose=0)
print("Accuracy: "+str(scores[1]*100)+"% \n")
My program in PyTorch:
#========DEFINE THE CNN MODEL=====
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3,1,1)#SAME PADDING
        self.conv2 = nn.Conv2d(32,32,3,1,0)#VALID PADDING
        self.pool1 = nn.MaxPool2d(2,2) #VALID PADDING
        self.drop1 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.conv3 = nn.Conv2d(32,64,3,1,1)#SAME PADDING
        self.conv4 = nn.Conv2d(64,64,3,1,0)#VALID PADDING
        self.pool2 = nn.MaxPool2d(2,2)#VALID PADDING
        self.drop2 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.conv5 = nn.Conv2d(64,128,3,1,1)#SAME PADDING
        self.conv6 = nn.Conv2d(128,128,3,1,0)#VALID PADDING
        self.pool3 = nn.MaxPool2d(2,2)#VALID PADDING
        self.drop3 = nn.Dropout2d(0.25) #DROPOUT OF 0.25
        self.fc1 = nn.Linear(128*2*2, 512)#128*2*2 IS OUTPUT DIMENSION AFTER THE PREVIOUS LAYER
        self.drop4 = nn.Dropout(0.25) #DROPOUT OF 0.25
        self.fc2 = nn.Linear(512,10) #10 output nodes
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = self.drop1(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool2(x)
        x = self.drop2(x)
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = self.pool3(x)
        x = self.drop3(x)
        x = x.view(-1,2*2*128) #FLATTENING OPERATION 2*2*128 IS OUTPUT AFTER THE PREVIOUS LAYER
        x = F.relu(self.fc1(x))
        x = self.drop4(x)
        x = self.fc2(x) #LAST LAYER DOES NOT NEED SOFTMAX BECAUSE THE LOSS FUNCTION WILL TAKE CARE OF IT
        return x
#=======FUNCTION TO CONVERT INPUT AND TARGET TO TORCH TENSORS AND LOADING INTO GPU======
def PrepareInputDataAndTargetData(device,images,labels,batch_size):
    #GET MINI BATCH OF TRAINING IMAGES AND RESHAPE THE TORCH TENSOR FOR CNN PROCESSING
    mini_batch_images = torch.tensor(images)
    mini_batch_images = mini_batch_images.view(batch_size,3,32,32)
    #GET MINI BATCH OF TRAINING LABELS, TARGET SHOULD BE IN LONG FORMAT SO CONVERT THAT TOO
    mini_batch_labels = torch.tensor(labels)
    mini_batch_labels = mini_batch_labels.long()
    #FEED THE INPUT DATA AND TARGET LABELS TO GPU
    mini_batch_images = mini_batch_images.to(device)
    mini_batch_labels = mini_batch_labels.to(device)
    return mini_batch_images,mini_batch_labels
#==========MAIN PROGRAM==========
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#get_train_data() and get_test_data() are my own custom functions to get CIFAR-10 dataset
Images_train, Labels_train, Class_train = get_train_data(0,10)
Images_test, Labels_test, Class_test = get_test_data(0,10)
net = Net()
net = net.double() #https://discuss.pytorch.org/t/runtimeerror-expected-object-of-scalar-type-double-but-got-scalar-type-float-for-argument-2-weight/38961
print(net)
#MAP THE MODEL ONTO THE GPU
net = net.to(device)
#CROSS ENTROPY LOSS FUNCTION AND ADAM OPTIMIZER
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-4)
#PREPARE THE DATALOADER
#Images_train contains images and Labels_trains contains indices i.e. 0,1,...,9
dataset = TensorDataset( Tensor(Images_train), Tensor(Labels_train) )
trainloader = DataLoader(dataset, batch_size= 128, shuffle=True)
#START TRAINING THE CNN MODEL FOR 50 EPOCHS
for epoch in range(0,50):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = torch.tensor(inputs).double()
        inputs = inputs.view(len(inputs),3,32,32) #RESHAPE THE IMAGES
        labels = labels.long() #MUST CONVERT LABEL TO LONG FORMAT
        #MAP THE INPUT AND LABELS TO THE GPU
        inputs=inputs.to(device)
        labels=labels.to(device)
        #FORWARD PROP, BACKWARD PROP, PARAMETER UPDATE
        optimizer.zero_grad()
        outputs = net.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    #CALCULATE CLASSIFICATION ACCURACY ON ALL 10 CLASSES
    with torch.no_grad():
        Images_class,Labels_class = PrepareInputDataAndTargetData(device,Images_test,Labels_test,len(Images_test))
        network_outputs = net.forward(Images_class)
        correct = (torch.argmax(network_outputs.data,1) == Labels_class.data).float().sum()
        acc = float(100.0*(correct/len(Images_class)))
    print("Accuracy is: "+str(acc)+"\n")
    del Images_class
    del Labels_class
    del network_outputs
    del correct
    del acc
    torch.cuda.empty_cache()
print("Done\n")
I am not fully aware of how the core backend works in either library; however, I would expect the classification accuracy of a given model to be almost the same regardless of the library.
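As a sanity check on the 128*2*2 figure used for fc1 and the flattening step in the PyTorch program, the spatial size after each layer can be worked out by hand. The sketch below uses the standard output-size formula for convolution and pooling windows; conv_out, trace and flat_features are illustrative names, not part of either program.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # CIFAR-10 images are 32x32
trace = [size]
for _ in range(3):  # three blocks of: same conv, valid conv, 2x2 max pool
    size = conv_out(size, kernel=3, padding=1)  # "same" padding keeps the size
    trace.append(size)
    size = conv_out(size, kernel=3, padding=0)  # "valid" padding shrinks by 2
    trace.append(size)
    size = conv_out(size, kernel=2, stride=2)   # pooling halves, rounding down
    trace.append(size)

flat_features = 128 * size * size  # 128 channels after the last conv block

print(trace)          # [32, 32, 30, 15, 15, 13, 6, 6, 4, 2]
print(flat_features)  # 512
```

So the layer sizes agree between the two programs, which suggests that the gap in accuracy comes from somewhere other than the architecture dimensions.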

Using softmax in Neural Networks to decide label of input

I am using the keras model with the following layers to predict a label of input (out of 4 labels)
embedding_layer = keras.layers.Embedding(MAX_NB_WORDS,
                                         EMBEDDING_DIM,
                                         weights=[embedding_matrix],
                                         input_length=MAX_SEQUENCE_LENGTH,
                                         trainable=False)
sequence_input = keras.layers.Input(shape = (MAX_SEQUENCE_LENGTH,),
                                    dtype = 'int32')
embedded_sequences = embedding_layer(sequence_input)
hidden_layer = keras.layers.Dense(50, activation='relu')(embedded_sequences)
flat = keras.layers.Flatten()(hidden_layer)
preds = keras.layers.Dense(4, activation='softmax')(flat)
model = keras.models.Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(X_train, Y_train, batch_size=32, epochs=100)
However, the softmax layer returns 4 outputs (because I have 4 labels).
When I use the predict function to get the predicted Y with the same model, I get an array of 4 values for each X rather than one single label for the input.
model.predict(X_test, batch_size = None, verbose = 0, steps = None)
How do I make the output layer of the keras model, or the model.predict function, decide on one single label, rather than output weights for each label?
The following is a common function to sample from a probability vector
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
Taken from here.
The temperature parameter decides how strongly the differences between the probability weights are weighted. A temperature of 1 uses each weight as it is, a temperature larger than 1 reduces the differences between the weights, and a temperature smaller than 1 increases them.
Here is an example using a probability vector over 3 labels:
p = np.array([0.1, 0.7, 0.2]) # The first label has a probability of 10% of being chosen, the second 70%, the third 20%
print(sample(p, 1)) # sample using the input probabilities, unchanged
print(sample(p, 0.1)) # the new vector of probabilities from which to sample is [ 3.54012033e-09, 9.99996371e-01, 3.62508322e-06]
print(sample(p, 10)) # the new vector of probabilities from which to sample is [ 0.30426696, 0.36962778, 0.32610526]
To see the new vector, make sample return preds instead of the sampled index.
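If a deterministic label is enough, taking the index of the largest probability in each row is a common alternative to sampling. The sketch below shows both ideas side by side: probs is a made-up stand-in for the output of model.predict, and rescale is a hypothetical helper that returns the temperature-adjusted vector rather than drawing from it.

```python
import numpy as np

def rescale(preds, temperature=1.0):
    """Return the temperature-adjusted probabilities that sample() draws from."""
    preds = np.asarray(preds, dtype="float64")
    scaled = np.exp(np.log(preds) / temperature)
    return scaled / np.sum(scaled)

# Hypothetical stand-in for model.predict(X_test): 2 inputs, 4 labels each.
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.7, 0.1, 0.1, 0.1]])

# One label index per input: the position of the largest probability.
labels = np.argmax(probs, axis=1)
print(labels)  # [1 0]

# Effect of the temperature on the 3-label example vector.
p = np.array([0.1, 0.7, 0.2])
sharp = rescale(p, 0.1)  # low temperature: almost all mass on the second label
flat = rescale(p, 10)    # high temperature: close to uniform
print(sharp)
print(flat)
```

Sampling returns varying labels for the same input, while argmax always returns the most probable one, so the choice depends on whether variety is wanted.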

How to interpret predictions in TensorFlow, they seem to have the wrong shape

I have a TensorFlow model with some of these characteristics:
state_size = 800,
num_classes = 14313,
batch_size = 10,
num_steps = 16, # width of the tensor
num_layers = 3
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
tf.split(x_one_hot, num_steps, 1)] # still a list of tensors (batch_size, num_classes)
...
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
Now I want to feed it an np.array (shape = batch_size x num_steps, so 10 x 16), and I get a predictions tensor back.
Weirdly, its shape is 160 x 14313. The latter is the number of classes. But where does 160 come from? I don't understand that. I would like to have a probability for each of my classes, for each of the elements of the batch (which is 10). How did num_steps become involved, and how do I read from this predictions tensor which element is expected to follow those 16 numbers?
In this case the 160 comes from the shapes, as you suspected: each batch holds 10 sequences of 16 timesteps, and these are flattened into 10 * 16 = 160 rows, presumably when the RNN outputs are reshaped before the matmul.
At this point you have logits of shape 160 x num_classes, so predictions[i] gives, for row i, the probability of each class being the desired class.
To get the chosen class you would do something like tf.argmax(predictions, 1), which returns a tensor with the classification.
This will have a shape of 160 in your case, so it holds the predicted class for each of the 160 rows, that is, one per timestep of each sequence in the batch.
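In order to get probabilities from the logits directly, you could apply the logistic function element-wise (note that predictions already holds the softmax probabilities):
def prob(logit):
    # element-wise logistic sigmoid of a logit
    return 1/(1 + np.exp(-logit))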
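To read the 160 rows per sequence, the flat predictions can be reshaped back into (batch_size, num_steps, num_classes). The sketch below is only an illustration under a loud assumption: that the rows are batch-major, i.e. row b * num_steps + t belongs to sequence b at timestep t. The elided part of the model decides the real ordering; if the per-step outputs were concatenated step by step, the reshape would be (num_steps, batch_size, num_classes) instead. The random array and the small class count are stand-ins for the real tensor.

```python
import numpy as np

batch_size, num_steps, num_classes = 10, 16, 5  # 5 classes keeps the demo small

# Hypothetical stand-in for the (160, num_classes) predictions tensor.
rng = np.random.default_rng(0)
flat_predictions = rng.random((batch_size * num_steps, num_classes))
flat_predictions /= flat_predictions.sum(axis=1, keepdims=True)  # rows sum to 1

# ASSUMPTION: rows are batch-major (sequence 0 steps 0..15, then sequence 1, ...).
predictions = flat_predictions.reshape(batch_size, num_steps, num_classes)

# Distribution over the element expected after the last of the 16 inputs.
next_probs = predictions[:, -1, :]          # shape (10, 5)
next_class = np.argmax(next_probs, axis=1)  # shape (10,)

print(predictions.shape, next_probs.shape, next_class.shape)
```

Under that assumption, next_probs has one row of class probabilities for each of the 10 sequences in the batch.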

keras model fit_generator ValueError: Error when checking model target: expected cropping2d_4 to have 4 dimensions, but got array with shape (32, 1)

I'm trying to use keras model.fit_generator() to fit a model, below is my definition of the generator:
import cv2
import numpy as np
from sklearn.utils import shuffle
IMG_PATH_PREFIX = "./data/IMG/"
def generator(samples, batch_size=64):
    num_samples = len(samples)
    while 1: # Loop forever so the generator never terminates
        samples = shuffle(samples) # sklearn's shuffle returns a shuffled copy, so keep the result
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset+batch_size]
            images = []
            angles = []
            for batch_sample in batch_samples:
                name = IMG_PATH_PREFIX + batch_sample[0].split('/')[-1]
                center_image = cv2.imread(name)
                center_angle = float(batch_sample[3])
                images.append(center_image)
                angles.append(center_angle)
            X_train = np.array(images)
            y_train = np.array(angles)
            #X_train = np.expand_dims(X_train, axis=0)
            #y_train = np.expand_dims(y_train, axis=1)
            print("X_train shape: ", X_train.shape, " y_train shape:", y_train.shape)
            #print("X train: ", X_train)
            yield X_train, y_train
train_generator = generator(train_samples, batch_size = 32)
validation_generator = generator(validation_samples, batch_size = 32)
Here the output shape is:
X_train shape: (32, 160, 320, 3) y_train shape: (32,)
The model fit code is:
model = Sequential()
#cropping layer
model.add(Cropping2D(cropping=((50,20), (1,1)), input_shape=(160,320,3), dim_ordering='tf'))
model.compile(loss = "mse", optimizer="adam")
model.fit_generator(train_generator, samples_per_epoch= len(train_samples), validation_data=validation_generator, nb_val_samples=len(validation_samples), nb_epoch=3)
Then I get the error message:
ValueError: Error when checking model target: expected cropping2d_6 to have 4 dimensions, but got array with shape (32, 1)
Could someone help me understand what the issue is?
The big question here is: do you know what you are trying to do?
1) If you read here, the input is a 4D tensor and the output is ALSO a 4D tensor. Your target is a 2D tensor of shape (batch_size, 1). So of course, when Keras tries to compute the error between the output, which is 3D (without the batch dimension), and the target, which is 1D (without the batch dimension), it cannot make sense of that. Outputs and targets must have the same dimensions.
2) Do you know what Cropping2D is actually doing? It is cropping your images, that is, removing values at the beginning and end of the cropped dimensions. In your case you are outputting images of shape (90, 318, 3). This is not a prediction; there is no weight to train in this layer, so there is no reason to fit the "model". Your model is just cropping images. No training is needed for that.
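The output shape of the cropping layer can be worked out by hand from the cropping argument in the question, ((50, 20), (1, 1)), applied to an input of shape (160, 320, 3). The helper below is a plain-Python sketch of that arithmetic for channels-last images; cropped_shape is an illustrative name, not a Keras function.

```python
def cropped_shape(input_shape, cropping):
    """Shape of a channels-last image after removing rows and columns at the edges."""
    (top, bottom), (left, right) = cropping
    height, width, channels = input_shape
    return (height - top - bottom, width - left - right, channels)

out = cropped_shape((160, 320, 3), ((50, 20), (1, 1)))
print(out)  # (90, 318, 3)
```

A target of one steering angle per image has nothing in common with that shape, which is why the model needs further layers ending in a single output before fitting makes sense.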

Fully Connected Layer Followed by LSTM Layer

How do I combine a TensorFlow fully connected layer with an LSTM layer that follows it? My goal is to feed data with batch size batch_size, sequence length seq_length, and dimension 1. The target is a 21-dimensional one-hot vector.
Here is the code I tried. It throws the error "Shape must be rank 2 but is rank 3" for the W_first line. What am I doing wrong?
data = tf.placeholder(tf.float32, [None, 20,1]) #Number of examples,number of input, dimension of each input
target = tf.placeholder(tf.float32, [None, 21])
W_first=tf.Variable(tf.random_normal([10,num_hidden]))
out_1= tf.matmul(data,W_first)
cell = tf.nn.rnn_cell.LSTMCell(num_hidden,state_is_tuple=True)
val, _ = tf.nn.dynamic_rnn(cell, out_1, dtype=tf.float32)
Thanks in advance!
