In the Keras documentation for Convolution2D, the input_shape for 128x128 RGB pictures is given as input_shape=(3, 128, 128), so I figured the first component should be the number of planes (or feature channels).
If I run the following code:
model = Sequential()
model.add(Convolution2D(4, 5, 5, border_mode='same', input_shape=(3, 19, 19), activation='relu'))
print(model.output_shape)
I get an output_shape of (None, 3, 19, 4), whereas in my understanding this should be (None, 4, 19, 19), with 4 being the number of filters.
Is this an error in the example from the keras documentation or am I missing something?
(I am trying to recreate a part of AlphaGo, so the 19x19 is the board size, which corresponds to the image size.)
You are using the Theano dimension ordering (channels, rows, cols) for your input, but your Keras installation seems to use the TensorFlow one, which is (rows, cols, channels).
So either you can switch to the Theano dimension ordering directly in your code with:
import keras.backend as K
K.set_image_dim_ordering('th')
Or edit the keras.json file (usually in ~/.keras) and switch
"image_dim_ordering": "tf" to "image_dim_ordering": "th"
Or you can keep the TensorFlow dimension ordering and switch your input_shape to (19, 19, 3).
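For example, a minimal check with the Keras 1.x API used in the question (the printed shape assumes the TensorFlow ordering is active):

from keras.models import Sequential
from keras.layers import Convolution2D

model = Sequential()
# channels last, matching the TensorFlow dimension ordering
model.add(Convolution2D(4, 5, 5, border_mode='same', input_shape=(19, 19, 3), activation='relu'))
print(model.output_shape)  # (None, 19, 19, 4): rows, cols, then the 4 filters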
Yes, it should be (None, 4, 19, 19). There is something called dim_ordering in Keras that decides at which index the number of input channels is placed. Check the "dim_ordering" parameter in the documentation. Mine is set to 'tf'.
So, just change the input shape to (19, 19, 3), like so:
model.add(Convolution2D(4, 5, 5, border_mode='same', input_shape=(19, 19, 3), activation='relu'))
Then check the output shape.
You can also modify dim_ordering in the config file (usually at ~/.keras/keras.json) to your liking.
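For reference, that file typically looks something like this in Keras 1.x (in Keras 2 the key is named image_data_format instead):

{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}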
I am specifically looking at the AlexNet architecture found here:
https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py
I am confused as to how they are getting the input and output channels. Based on my reading of AlexNet, I can't figure out where they are getting output channels = 64 from (the second argument to the Conv2d function). Even if the 256 is split across 2 GPUs, that should give 128 rather than 64. The initial input channel of 3 represents the color channels, as per my assumption. However, the other input and output channels don't make sense to me either.
Could anyone clarify what the input and output channels are?
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # why 64?
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
The 3 is the number of input channels (R, G, B). The 64 is the number of channels (i.e. feature maps) in the output of the first convolution operation. So the first conv layer takes a color (RGB) image as input, applies an 11x11 kernel with stride 4, and outputs 64 feature maps.
I agree that this differs from the number of channels (96, i.e. 48 on each GPU) in the architecture diagram of the original AlexNet implementation.
However, PyTorch does not implement the original AlexNet architecture. Rather, it implements a variant of AlexNet described in the paper One weird trick for parallelizing convolutional neural networks.
Also, see cs231n - convolutional networks for more details about how input, filters, stride, and padding determine the output size after the conv operation.
P.S: See pytorch/vision/issues/185
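To make the shape arithmetic concrete, the spatial output size of a conv layer is floor((W - K + 2P) / S) + 1. A quick sanity check of the first layer above (the 224x224 input size is the usual AlexNet assumption, not something stated in the code):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
print(conv1(x).shape)  # torch.Size([1, 64, 55, 55])
# spatial size: (224 - 11 + 2*2) // 4 + 1 = 55; 64 feature maps because out_channels=64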
I am enjoying the simplicity that Keras offers; however, I have not been successful in configuring a Keras regression model with multiple outputs.
More specifically, I have a Keras model that consumes X values with 308 columns and has 28 target Y values. The model is (I think) quite simple, and I would have thought it would converge quite quickly, but in fact it does not.
I am guessing here, but I think I have set up the model incorrectly and am looking for assistance on how to configure a Keras model to work properly.
Data information:
Number of rows: 46038
My input shape: X_train: (46038, 308)
My target shape: Y_train: (46038, 28)
The inputs (X) are a series of floats representing values that influence the allocation of a resource. The targets (Y) are a series of floats which sum to 1.0, representing the actual percentage allocation to each resource. My goal is to predict resource percentage allocations (Y) based on the provided inputs (X). As such, I believe this is a regression problem and not a classification problem (correct me if I am wrong).
Sample data:
X: [100, 200, 400, 600, 32, 1, 0.1, 0.5, 2500...] (308 columns, with 40000+ rows)
Y: [0.333, 0.667, 0.0, 0.0, 0.0, ...]
In the case of Y above, this means that 0.333 (33%) of the resource is allocated to the first resource, 0.667 (67%) to the second, and 0.0 to all others.
Model:
model = Sequential()
model.add(Dense(256, input_shape=(308,) ))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28))
model.compile(loss='mean_squared_error', optimizer='adam')
Here are a few specific questions:
1. Is my model configured properly to achieve my goals?
2. Should I have different activation functions?
3. Is my input shape (308,) set up properly? Is my output shape (28) correct?
4. Should I have an activation on my output layer (for example: model.add(Activation('softmax')))? If yes, what type would be ideal?
(I don't think it is particularly relevant, but I am using a Tensorflow backend)
model = Sequential()
model.add(Dense(256, input_shape=(308,) ))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
This should solve the problem. Although it looks like a regression problem, the allocations compete with each other, which makes it more like a classification and calls for a softmax nonlinearity and the categorical_crossentropy loss.
Update
For early stopping you'll need a validation set and code along these lines:
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
model.fit(X, y, batch_size=100, nb_epoch=100, verbose=1, callbacks=[earlyStopping],
          validation_split=0.2, shuffle=True)
(EarlyStopping monitors val_loss, so validation_split must be non-zero or validation_data must be provided.)
Also, you'll need to define a custom metric function that returns the cross-entropy loss instead of accuracy, and pass it to the metrics argument of model.compile.
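A minimal sketch of such a metric (the function name is illustrative; the import lives in keras.objectives in Keras 1.x and keras.losses in Keras 2):

from keras.objectives import categorical_crossentropy

def crossentropy_metric(y_true, y_pred):
    # report the cross-entropy itself instead of accuracy
    return categorical_crossentropy(y_true, y_pred)

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=[crossentropy_metric])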
I am writing code for image classification with two classes using Keras with the TensorFlow backend. My images are stored in a folder on my computer and I want to feed these images to my Keras model. load_img takes only one input image, so I have to use either flow(x, y) or flow_from_directory(directory); but with flow(x, y) we also need to provide labels, which is a lengthy task, so I am using flow_from_directory(directory).

My images are of variable sizes, like 20*40, 55*43, ..., but here it is mentioned that a fixed target_size is required. In this solution it is suggested that we can give variable-size images as input to a convolution layer using input_shape=(1, None, None) or input_shape=(None, None, 3) (channels last, color images), but fchollet mentions that this does not work with a Flatten layer, and my model contains both convolution and Flatten layers. In that post only moi90 suggests trying different batches, where every batch contains images of the same size, but it is not possible for me to group images by size because my data is very scattered. So I decided to go with batch_size=1 and wrote the following code:
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras import backend as K
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
input_shape = (None,None,3)
model = Sequential()
model.add(Conv2D(8, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.get_weights()
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
train_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory('/data/train', target_size=input_shape, batch_size=1,class_mode='binary')
validation_generator = test_datagen.flow_from_directory('/data/test',target_size=input_shape,batch_size=1,class_mode='binary')
model.fit_generator(train_generator,steps_per_epoch=1,epochs=2,validation_data=validation_generator,validation_steps=1)
Now I am getting the following error:
Traceback (most recent call last):
File "<ipython-input-8-4e22d22e4bd7>", line 23, in <module>
model.add(Flatten())
File "/home/nd/anaconda3/lib/python3.6/site-packages/keras/models.py", line 489, in add
output_tensor = layer(self.outputs[0])
File "/home/nd/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 622, in __call__
output_shape = self.compute_output_shape(input_shape)
File "/home/nd/anaconda3/lib/python3.6/site-packages/keras/layers/core.py", line 478, in compute_output_shape
'(got ' + str(input_shape[1:]) + '. '
ValueError: The shape of the input to "Flatten" is not fully defined (got (None, None, 16). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.
I am sure it is not because of image_dim_ordering or the backend; I have checked and both are set to 'th'. Please help me correct the code, or explain how I can give variable-size images as input to my model.
You can train with variable sizes, as long as you don't try to put variable sizes in the same numpy array.
But some layers do not support variable sizes, and Flatten is one of them. It's impossible to train models containing Flatten layers with variable sizes.
You can try, though, to replace the Flatten layer with either a GlobalMaxPooling2D or a GlobalAveragePooling2D layer. But these layers may condense too much information into too few values, so it might be necessary to add more convolutions with more channels before them.
You must make sure that your generator will produce batches containing images of the same size, though. The generator will fail when trying to put two or more images with different sizes in the same numpy array.
See the answer in https://github.com/keras-team/keras/issues/1920
You should keep the input shape as (None, None, 3) and then, at the end, add GlobalAveragePooling2D() instead of Flatten(). Try something like this:
from keras.layers import GlobalAveragePooling2D

model = Sequential()
model.add(Conv2D(8, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(None, None, 3)))  # note the shape: height and width are unspecified
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# IMPORTANT: GlobalAveragePooling2D replaces Flatten, so a fixed spatial size is not needed
model.add(GlobalAveragePooling2D())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Unfortunately, you can't train a neural network with variable-size images as it is. You have to resize all images to a given size. Fortunately, you don't have to do this on your hard drive permanently; Keras does it for you on the fly.
Inside your flow_from_directory you should define a target_size like this:
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),  # every image will be resized to (150, 150) before being fed to the network
    batch_size=32,
    class_mode='binary')
Also, if you do so, you can have whatever batch size you want.
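For example, a sketch of the training call under that setup (the epoch count is arbitrary; the samples attribute is the Keras 2 name, nb_sample in older Keras 1 versions):

batch_size = 32
model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples // batch_size,
                    epochs=10)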
Background:
Tagging TensorFlow since Keras runs on top of it and this is more of a general deep learning question.
I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. The model below has the original CNN structure I used for this competition, and it performed okay.
def build_model1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), padding="Same", activation="relu", input_shape=[28, 28, 1]))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above in that it achieved better accuracy, lower error rate, and took many more epochs before overfitting the training data.
def build_model2():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Question:
Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?
Thank y'all for your time and help.
The main difference between these two approaches is that the latter (2 convs) has more flexibility in expressing non-linear transformations without losing information. Max pooling removes information from the signal, and dropout forces a distributed representation; both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convs (with ReLU) makes it easier to learn, that's it. Also note that you are comparing a model with 3 max poolings to a model with only 2, so the second one will potentially lose less information. Another thing is that it has a much bigger fully connected part at the end, while the first one's is tiny (64 neurons + 0.5 dropout means that you effectively have at most 32 neurons active on average, which is a tiny layer!). To sum up:
These architectures differ in many aspects, not just in stacking conv layers.
Stacking convs usually leads to less information being lost in processing; see for example "all convolutional" architectures.
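If you want to see the size difference concretely, a quick sketch (assuming the two build functions above and the same models/layers imports) is to build both models and compare their parameter counts:

model1 = build_model1()
model2 = build_model2()
model1.summary()  # layer output shapes and total parameter count of the first model
model2.summary()  # note how much larger the Dense(256) head is than the first model's Dense(64)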
I'm training a convolutional neural network on text (at the character level) and I want to do max pooling. tf.nn.max_pool expects a rank-4 tensor, but 1-D convolutions in TensorFlow produce rank-3 tensors ([batch, width, depth]), so when I pass the output of conv1d to the max pool function, this is the error:
ValueError: Shape (1, 144, 512) must have rank 4
I'm new to TensorFlow and deep learning frameworks in general, and would like advice on the best practice here, because I can imagine there are multiple workarounds. How can I perform max pooling in the 1-D case?
Thanks.
A quick way would be to add an extra singleton dimension, i.e. make the shape (1, 1, 144, 512); from there you can reduce it back with tf.squeeze.
I'm curious about other approaches though.
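For example, a minimal sketch of that workaround with the TF 1.x API (the pooling width and stride are illustrative values, not taken from the question):

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 144, 512])   # [batch, width, depth], the conv1d output shape
x4 = tf.expand_dims(x, 1)                       # [batch, 1, width, depth], now rank 4
pooled = tf.nn.max_pool(x4,
                        ksize=[1, 1, 3, 1],     # pool over the width axis only
                        strides=[1, 1, 2, 1],
                        padding='VALID')
out = tf.squeeze(pooled, [1])                   # back to [batch, new_width, depth]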