getting values between layers of a neural network using functional API - machine-learning

I am using below simple neural network for classification
model2 = keras.Sequential([
keras.layers.Flatten(input_shape=(X.shape[1], X.shape[2])),
keras.layers.Dense(64, activation ='relu'),
keras.layers.Dense(32, activation ='relu'),
keras.layers.Dense(16, activation ='relu'),
keras.layers.Dense(5, activation ='softmax')
])
I would like to check how the values for different classes change after going through each layer using the Euclidean distance to see which layer is the most useful and which is the least.
Does it make sense? If so, how can I do it using functional API? I encountered the first problems at the very beginning - they are related to the shape of the data - originally it is (1991, 13, 1292).

Related

Can the number of units in NN input layer be different than the number of features in the data?

Based on the tensorflow keras API tutorial;
model = keras.Sequential([
keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
keras.layers.Dense(10, activation='softmax')
])
I couldn't understand that why the number of units in the input layer is 10 while the input shape is 32. Also, there are many examples like this one in the tensorflow tutorials.
This is a rather common confusion by new practitioners, and not without a reason: the answer, as it has already been hinted at in the comments, is that in the Keras Sequential API there is an implicit input layer, determined by the input_shape argument of the first explicit layer.
This is directly visible in the Keras Functional API (check the example in the docs), where Input is an explicit layer itself, and in which your model would be written as:
inputs = Input(shape=(32,)) # input layer
x = Dense(10, activation='softmax')(inputs) # hidden layer
outputs = Dense(10, activation='softmax')(x) # output layer
model = Model(inputs, outputs)
i.e. your model is actually an example of a "good old" neural net with three layers (input, hidden, and output), despite that it looks like a two-layer net in the Keras Sequential API.
(BTW, and irrelevant to the question, it does not make much sense to have softmax as activation for your hidden layer.)

Why is ReLU used in regression with Neural Networks?

I am following the official TensorFlow with Keras tutorial and I got stuck here: Predict house prices: regression - Create the model
Why is an activation function used for a task where a continuous value is predicted?
The code is:
def build_model():
model = keras.Sequential([
keras.layers.Dense(64, activation=tf.nn.relu,
input_shape=(train_data.shape[1],)),
keras.layers.Dense(64, activation=tf.nn.relu),
keras.layers.Dense(1)
])
optimizer = tf.train.RMSPropOptimizer(0.001)
model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
return model
The general reason for using non-linear activation functions in hidden layers is that, without them, no matter how many layers or how many units per layer, the network would behave just like a simple linear unit. This is nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
In your case, looking more closely, you'll see that the activation function of your final layer is not the relu as in your hidden layers, but the linear one (which is the default activation when you don't specify anything, like here):
keras.layers.Dense(1)
From the Keras docs:
Dense
[...]
Arguments
[...]
activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
which is indeed what is expected for a regression network with a single continuous output.

Visual proof that neural network can approximate any function

I want to create a neural network that will exactly fit to my sample, however at some point the learning process stops. Different optimizers like adam or sgd also don't work.
Cybenko proved that the statement from the title is true for any sigmoidal function as an activation function so I want to use either sigmoid or tanh. I want also to keep only one hidden layer.
What are tweaks that I can use? Here is my setup (python3) and results:
features = np.linspace(-0.5,.5,2000).reshape(2000,1)
target = np.sin(12*features)
hn = 100
nn = models.Sequential()
#Only one hidden layer
nn.add(layers.Dense(units=hn, activation='sigmoid',
input_shape=(features.shape[1],)))
nn.add(layers.Dense(units=1, activation='linear'))
# Compile neural network
nn.compile(loss='mse',
optimizer='RMSprop',
metrics=['mse'])
nn.fit(features,
target,
epochs=200,
verbose=2,
batch_size=5)

How to apply CNN for multi-channel pixel data based weights to each channel?

I have an image with 8 channels.I have a conventional algorithm where weights are added to each of these channels to get an output as '0' or '1'.This works fine with several samples and complex scenarios. I would like implement the same in Machine Learning using CNN method.
I am new to ML and started looking out the tutorials which seem to be exclusively dealing with image processing problems- Hand writing recognition,Feature extraction etc.
http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
I have setup the Keras with Theano as background.Basic Keras samples are working without problem.
What steps do I require to follow in order achieve the same result using CNN ? I do not comprehend the use of filters,kernels,stride in my use case.How do we provide Training data to Keras if the pixel channel values and output are in the below form?
Pixel#1 f(C1,C2...C8)=1
Pixel#2 f(C1,C2...C8)=1
Pixel#3 f(C1,C2...C8)=0 .
.
Pixel#N f(C1,C2...C8)=1
I think you should treat this the same way you use CNN to do semantic segmentation. For an example look at
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
You can use the same architecture has they are using but for the first layer instead of using filters for 3 channels use filters for 8 channels.
For the loss function you can use the same loos function or something that is more specific for binary loss.
There are several implementation for keras but with tensorflow
backend
https://github.com/JihongJu/keras-fcn
https://github.com/aurora95/Keras-FCN
Since the input is in the form of channel values,that too in sequence.I would suggest you to use Convolution1D. Here,you are taking each pixel's channel values as the input and you need to predict for each pixel.Try this
eg :
Conv1D(filters, kernel_size, strides=1, padding='valid')
Conv1D()
MaxPooling1D(pool_size)
......
(Add many layers as you want)
......
Dense(1)
use binary_crossentropy as the loss function.

Cross Entropy Loss for Semantic Segmentation Keras

I'm pretty sure this is a silly question but I can't find it anywhere else so I'm going to ask it here.
I'm doing semantic image segmentation using a cnn (unet) in keras with 7 labels. So my label for each image is (7,n_rows,n_cols) using the theano backend. So across the 7 layers for each pixel, it's one-hot encoded. In this case, is the correct error function to use categorical cross-entropy? It seems that way to me but the network seems to learn better with binary cross-entropy loss. Can someone shed some light on why that would be and what the principled objective is?
Binary cross-entropy loss should be used with sigmod activation in the last layer and it severely penalizes opposite predictions. It does not take into account that the output is a one-hot coded and the sum of the predictions should be 1. But as mis-predictions are severely penalizing the model somewhat learns to classify properly.
Now to enforce the prior of one-hot code is to use softmax activation with categorical cross-entropy. This is what you should use.
Now the problem is using the softmax in your case as Keras don't support softmax on each pixel.
The easiest way to go about it is permute the dimensions to (n_rows,n_cols,7) using Permute layer and then reshape it to (n_rows*n_cols,7) using Reshape layer. Then you can added the softmax activation layer and use crossentopy loss. The data should also be reshaped accordingly.
The other way of doing so will be to implement depth-softmax :
def depth_softmax(matrix):
sigmoid = lambda x: 1 / (1 + K.exp(-x))
sigmoided_matrix = sigmoid(matrix)
softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=0)
return softmax_matrix
and use it as a lambda layer:
model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(7,n_rows,n_cols)))
model.add(Permute(2,3,1))
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))
If tf image_dim_ordering is used then you can do way with the Permute layers.
For more reference check here.
I tested the solution of #indraforyou and think that the proposed method has some mistakes. As the commentsection does not allow for proper code segments, here is what I think would be the fixed version:
def depth_softmax(matrix):
from keras import backend as K
exp_matrix = K.exp(matrix)
softmax_matrix = exp_matrix / K.expand_dims(K.sum(exp_matrix, axis=-1), axis=-1)
return softmax_matrix
This method will expect the ordering of the matrix to be (height, width, channels).

Resources