Why does this model have two activations? - machine-learning

I do not understand why the code below has both an activation layer and an activation parameter. What is the point of having a linear activation followed by LeakyReLU? And finally, does the loss function act on the last layer's raw output or on its activated output?
fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3), activation='linear', input_shape=(28, 28, 1), padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2), padding='same'))
# ...more code...
fashion_model.add(Dense(num_classes, activation='softmax'))
fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

'Linear' activation means no activation (it is just the identity mapping), so the Conv2D layer emits its raw values and the separate LeakyReLU layer then applies the actual non-linearity.
The loss is applied to the output of the model, and the output of the model in this case is the softmax.
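A minimal sketch of the equivalence (not the poster's full model; the names m1 and m2 are just illustrative): activation='linear' leaves the convolution output untouched, so it is the same as omitting the activation argument and letting the LeakyReLU layer do the work.
from keras.models import Sequential
from keras.layers import Conv2D, LeakyReLU

# The two blocks below build identical computations.
m1 = Sequential()
m1.add(Conv2D(32, kernel_size=(3, 3), activation='linear',  # identity, i.e. no activation
              input_shape=(28, 28, 1), padding='same'))
m1.add(LeakyReLU(alpha=0.1))  # the actual non-linearity

m2 = Sequential()
m2.add(Conv2D(32, kernel_size=(3, 3), input_shape=(28, 28, 1), padding='same'))  # default activation is already linear
m2.add(LeakyReLU(alpha=0.1))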

Related

How to use categorical_hinge loss in keras in order to train with an SVM in the last layer?

I want to train a CNN with an SVM as the classifier in the last layer. I understand that categorical_hinge is the best loss function for that. I have 6 classes to classify.
My model is as shown below:
model = Sequential()
model.add(Conv2D(50, (3, 3), activation='relu', input_shape=train_data.shape[1:]))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(400, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
Is there a problem with the network, the data processing, or the loss function?
The model does not learn anything after a point, as shown in the image.
What should I do?
Your model has a single output neuron; there is no way this will work with 6 classes. The output of your model should have 6 neurons. Also, the output layer should have no activation function, so that it produces the logits the categorical hinge can use.
Note that the categorical hinge was added recently (2-3 weeks ago), so it's quite new and probably not many people have tested it.
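A minimal sketch of that fix (replacing only the final Dense(1, activation='sigmoid') layer of the model above; the optimizer choice is just an example):
from keras.layers import Dense

model.add(Dense(6))  # one neuron per class, no activation: raw scores (logits) for categorical_hinge
model.compile(loss='categorical_hinge',
              optimizer='adam',
              metrics=['accuracy'])
Note that the targets then need to be one-hot encoded (e.g. with keras.utils.to_categorical).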
Use a hinge loss and a linear activation in the last layer.
model.add(Dense(nb_classes, W_regularizer=l2(0.01)))
model.add(Activation('linear'))
model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
For more information see https://github.com/keras-team/keras/issues/6090

Keras model with multiple outputs not converging

I am enjoying the simplicity that Keras offers, however I have not been successful in configuring a Keras regression model with multiple outputs.
More specifically, I have a Keras model that consumes X values with 308 columns and predicts 28 target Y values. The model is (I think) quite simple and I would have thought it would converge quickly, but in fact it does not.
I am guessing here, but I think I have set up the model incorrectly and am looking for assistance on how to configure a Keras model to work properly.
Data information:
Number of rows: 46038
My input shape: X_train: (46038, 308)
My target shape: Y_train: (46038, 28)
The inputs (X) are a series of floats representing values that influence the allocation of a resource. The targets (Y) are a series of floats that sum to 1.0, representing the actual percentage allocation to each resource. My goal is to predict the resource percentage allocations (Y) based upon the provided inputs (X). As such, I believe this is a regression problem and not a classification problem (correct me if I am wrong).
Sample data:
X: [100, 200, 400, 600, 32, 1, 0.1, 0.5, 2500...] (308 columns, with 40000+ rows)
Y: [0.333, 0.667, 0.0, 0.0, 0.0, ...]
In the case of Y above, this means that 0.333 (33%) of the resource is allocated to the first resource, 0.667 (67%) to the second, and 0.0 to all others.
Model:
model = Sequential()
model.add(Dense(256, input_shape=(308,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28))
model.compile(loss='mean_squared_error', optimizer='adam')
Here are a few specific questions:
1. Is my model configured properly to achieve my goals?
2. Should I have different activation functions?
3. Are my input shapes (308,) set up properly? Are my output shapes (28) correct?
4. Should I have an activation on my output layer (for example: model.add(Activation('softmax')))? If yes, what type would be ideal?
(I don't think it is particularly relevant, but I am using a Tensorflow backend)
model = Sequential()
model.add(Dense(256, input_shape=(308,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
This should solve the problem. Although it looks like a regression problem, the allocations compete with each other, which makes it more like a classification problem and calls for a softmax non-linearity and the categorical_crossentropy loss.
Update
For early stopping you'll need a validation set and the following code:
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
model.fit(X, y, batch_size=100, nb_epoch=100, verbose=1, callbacks=[earlyStopping],
          validation_split=0.2, shuffle=True)  # a validation split is needed so that val_loss is available
You'll also need to define a custom metric function that returns the cross-entropy loss instead of accuracy, and pass this function to the metrics argument of model.compile.
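A minimal sketch of such a metric (assuming a Keras 2-style API; the function name xent_metric is just illustrative):
import keras.backend as K
from keras.losses import categorical_crossentropy

def xent_metric(y_true, y_pred):
    # report the mean categorical cross-entropy per batch instead of accuracy
    return K.mean(categorical_crossentropy(y_true, y_pred))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[xent_metric])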

Why is binary_crossentropy more accurate than categorical_crossentropy for multiclass classification in Keras?

I'm learning how to create convolutional neural networks using Keras. I'm trying to get a high accuracy for the MNIST dataset.
Apparently categorical_crossentropy is for more than 2 classes and binary_crossentropy is for 2 classes. Since there are 10 digits, I should be using categorical_crossentropy. However, after training and testing dozens of models, binary_crossentropy consistently outperforms categorical_crossentropy significantly.
On Kaggle, I got 99+% accuracy using binary_crossentropy and 10 epochs. Meanwhile, I can't get above 97% using categorical_crossentropy, even using 30 epochs (which isn't much, but I don't have a GPU, so training takes forever).
Here's what my model looks like now:
model = Sequential()
model.add(Convolution2D(100, 5, 5, border_mode='valid', input_shape=(28, 28, 1), init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(100, 3, 3, init='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(100, init='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, init='glorot_uniform', activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
Short answer: it is not.
To see that, simply try to calculate the accuracy "by hand", and you will see that it is different from the one reported by Keras with the model.evaluate method:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.99794011611938471
# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98999999999999999
The reason it seems to be so is a rather subtle issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation.
If you check the source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns.
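To make the difference concrete, here is a rough numpy sketch of what the two metrics compute (illustrative only, not the Keras source):
import numpy as np

def binary_accuracy_np(y_true, y_pred):
    # element-wise: every one of the 10 outputs per sample is scored separately
    return np.mean(np.equal(y_true, np.round(y_pred)))

def categorical_accuracy_np(y_true, y_pred):
    # one decision per sample: is the argmax of the prediction the true class?
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))
With one-hot targets, the element-wise version also gives credit for all the confidently predicted zeros, which is why the reported number is inflated.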
To avoid that, i.e. to keep using binary cross entropy as your loss function (nothing wrong with this, in principle) while still getting the categorical accuracy required by the problem at hand (i.e. MNIST classification), you should explicitly ask for categorical_accuracy in the model compilation as follows:
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=[categorical_accuracy])
And after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:
sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000 == score[1]
# True
(HT to this great answer to a similar problem, which helped me understand the issue...)
UPDATE: After my post, I discovered that this issue had already been identified in this answer.
First of all, binary_crossentropy is not only for cases with two classes.
The name "binary" is there because it is adapted to binary outputs: each number of the softmax is pushed towards being 0 or 1.
Here, it checks each number of the output separately.
This doesn't explain your result, since categorical_crossentropy exploits the fact that it is a classification problem.
Are you sure that when you read your data there is one and only one class per sample? That is the only explanation I can give.

How to set initial weights in Convolution2D in Keras?

The syntax for adding a Convolution2D layer in Keras is described at
https://keras.io/layers/convolutional/#convolution2d. I am unable to pass the "weights" argument correctly. How should I do that?
conv1_1 = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same',
weight_and_bias=[weight, biases], name='conv1_1')(input)
The shape of weight is (nb_filter, nb_channel, filter_size, filter_size) and the shape of biases is (nb_channel,).
You should pass a list of numpy arrays to set as the initial weights.
For Convolution2D, the weights list has two items: the kernel with shape (nb_filter, nb_channel, nb_row, nb_col) and the bias with shape (nb_filter,).
According to the author of Keras:
If you have doubts about what these shapes are, you can simply instantiate your layer then call get_weights(), and look at the output. The argument weights, and also the method set_weights(weights), expect exactly the same format as the output of get_weights().
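A minimal sketch of passing the weights argument (Keras 1-style API with Theano dimension ordering, matching the shapes above; the array names and sizes are just illustrative):
import numpy as np
from keras.layers import Convolution2D

nb_filter, nb_channel, nb_row, nb_col = 64, 3, 3, 3
kernel = np.random.randn(nb_filter, nb_channel, nb_row, nb_col).astype('float32')
biases = np.zeros(nb_filter, dtype='float32')

# 'weights' takes a list [kernel, bias] in exactly the get_weights() format
conv1_1 = Convolution2D(nb_filter, nb_row, nb_col, border_mode='same',
                        activation='relu', weights=[kernel, biases],
                        name='conv1_1')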

Keras: How to feed input directly into other hidden layers of the neural net than the first?

I have a question about Keras, to which I'm rather new. I'm using a convolutional neural net that feeds its results into a standard perceptron layer, which generates my output. This CNN is fed with a series of images. This is so far quite normal.
Now I would like to pass a short non-image input vector directly into the last perceptron layer without sending it through all the CNN layers. How can this be done in Keras?
My code looks like this:
# last CNN layer before perceptron layer
model.add(Convolution2D(200, 2, 2, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# perceptron layer
model.add(Flatten())
# here I like to add to the input from the CNN an additional vector directly
model.add(Dense(1500, W_regularizer=l2(1e-3)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
Any answers are greatly appreciated, thanks!
You didn't show me which kind of model you use, but I assume that you initialized your model as Sequential. In a Sequential model you can only stack one layer after another, so adding a "short-cut" connection is not possible.
For this reason the authors of Keras added the option of building "graph" models. In this case you can build a graph (DAG) of your computations. It's more complicated than designing a stack of layers, but still quite easy.
Check the documentation site for more details.
Provided your Keras backend is Theano, you can do the following:
import theano
import numpy as np

# Dense and activation joined into one layer, assuming you are interested in the post-activation values
d = Dense(1500, W_regularizer=l2(1e-3), activation='relu')
model.add(d)
model.add(Dropout(0.5))
model.add(Dense(1))

c = theano.function([d.get_input(train=False)], d.get_output(train=False))
# refer to d.input_shape to get the proper dimensions of the layer's input; in my case it was (None, 20000)
layer_input_data = np.random.random((1, 20000)).astype('float32')
o = c(layer_input_data)
The answer here works. It is higher level and also works with the TensorFlow backend:
input_1 = Input(input_shape)
input_2 = Input(input_shape)
merged = merge([input_1, input_2], mode="concat")  # could also be "sum", "dot", etc.
hidden = Dense(hidden_dims)(merged)
classify = Dense(output_dims, activation="softmax")(hidden)
model = Model(input=[input_1, input_2], output=classify)
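A brief usage sketch under the same assumptions (the array names image_features, extra_vector, and labels are placeholders): with two Input layers, the model is trained on a list of arrays, one per input.
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit([image_features, extra_vector], labels, nb_epoch=10, batch_size=32)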
