I am trying to build a recurrent convolutional autoencoder in Tensorflow, but I am having trouble linking the convolutional autoencoder with the recurrent layer.
From my understanding, a TensorFlow RNNCell takes an input of shape (batch_size, time_steps, info_vector), but my 1D convolutional layer has an output shape of (batch_size, info_vector). Is there a way to have TensorFlow store the previous information vectors? Alternatively, do I need to use a 2D convolution, add an extra time_step dimension to the input, and then not convolve over that dimension?
Try to expand the dimensionality of the tensor:
cnn_out = last_output_of_cnn  # for example, shape [32, 10]
cnn_out = tf.expand_dims(cnn_out, axis=-1)  # new shape [32, 10, 1]
You can feed this to the first layer of your RNN; here the number of time steps is 10.
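The shape bookkeeping can be checked without a model. The sketch below uses NumPy as a stand-in for TensorFlow (np.expand_dims and np.stack follow the same axis conventions as tf.expand_dims and tf.stack); the sizes and the per_step list are made up for illustration. It also shows the other reading of the question, where the CNN is run once per time step and the results are stacked along a new time axis.

```python
import numpy as np

batch_size, features, time_steps = 32, 10, 5  # illustrative sizes only

# Stand-in for the CNN output at a single time step: (batch_size, info_vector)
cnn_out = np.zeros((batch_size, features))

# Option from the answer: append a trailing axis, so the 10 features
# are read as 10 time steps with one value each.
as_sequence = np.expand_dims(cnn_out, axis=-1)
print(as_sequence.shape)  # (32, 10, 1)

# Alternative: keep one CNN output per time step and stack them
# along a new axis 1 to get (batch_size, time_steps, info_vector).
per_step = [np.full((batch_size, features), float(t)) for t in range(time_steps)]
stacked = np.stack(per_step, axis=1)
print(stacked.shape)  # (32, 5, 10)
```

Which layout is right depends on whether the sequence runs along the feature axis or across separate CNN calls.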
I have been trying to get a deeper understanding of the convolution operation as I am implementing a convolutional neural network. But I am stuck while trying to calculate the backward pass, or deconvolution.
Let's say the input is a 3-dimensional RGB image with dimension 3x7x7. The filter has the dimension 3x3x3. On convolving with stride set to 2, we will get an output of dimension 3x3.
Now here is my problem. I have read that deconvolution is the convolution of the output with the flipped kernel. But after flipping, the kernel is still of dimension 3x3x3 and the output is of dimension 3x3, while the input was of dimension 3x7x7. So how is deconvolution calculated?
Here is a nice visualisation of how convolution and deconvolution (transposed convolution) work. The white pieces are simply zeros.
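For the 3x7x7 input, 3x3x3 filter and stride 2 from the question, the backward pass with respect to the input can be sketched in NumPy as below. This is a minimal illustration with a single filter, no padding and random data, not a reference implementation; the function names are made up. It computes the input gradient in two ways: by scattering each output gradient back through the filter, and by inserting zeros between the output gradients (the "white pieces"), padding, and sliding the flipped filter with stride 1. Both give a 3x7x7 result.

```python
import numpy as np

def conv_forward(x, w, stride):
    """Valid cross-correlation of x (C, H, W) with one filter w (C, k, k)."""
    k = w.shape[1]
    out_h = (x.shape[1] - k) // stride + 1
    out_w = (x.shape[2] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * w)
    return out

def backward_scatter(dout, w, stride, x_shape):
    """Input gradient: add dout[i, j] * w into the window that produced out[i, j]."""
    k = w.shape[1]
    dx = np.zeros(x_shape)
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            dx[:, i * stride:i * stride + k, j * stride:j * stride + k] += dout[i, j] * w
    return dx

def backward_flipped(dout, w, stride, x_shape):
    """Same gradient via zero insertion, padding and the spatially flipped filter.
    Assumes (H - k) is divisible by the stride, as in the 7x7 / 3x3 / stride-2 case."""
    k = w.shape[1]
    dilated = np.zeros(((dout.shape[0] - 1) * stride + 1,
                        (dout.shape[1] - 1) * stride + 1))
    dilated[::stride, ::stride] = dout   # zeros between the output gradients
    padded = np.pad(dilated, k - 1)      # k - 1 zeros on every side
    w_flip = w[:, ::-1, ::-1]            # flip height and width, not channels
    dx = np.zeros(x_shape)
    for c in range(x_shape[0]):
        for i in range(x_shape[1]):
            for j in range(x_shape[2]):
                dx[c, i, j] = np.sum(padded[i:i + k, j:j + k] * w_flip[c])
    return dx

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 7, 7))
w = rng.normal(size=(3, 3, 3))
out = conv_forward(x, w, stride=2)       # shape (3, 3)
dout = rng.normal(size=out.shape)        # stand-in for the upstream gradient
dx = backward_scatter(dout, w, 2, x.shape)
dx_flip = backward_flipped(dout, w, 2, x.shape)
print(out.shape, dx.shape, np.allclose(dx, dx_flip))
```

The dimension mismatch in the question goes away because the 3x3 output gradient grows to 5x5 after zero insertion and to 9x9 after padding, so a stride-1 pass with a 3x3 window yields 7x7 per channel.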
I'm trying to fit data with the following shapes to the pretrained Keras VGG16 model.
image input shape is (32383, 96, 96, 3)
label shape is (32383, 17)
and I got this error
expected block5_pool to have 4 dimensions, but got array with shape (32383, 17)
at this line
model.fit(x=X_train, y=Y_train, validation_data=(X_valid, Y_valid),
          batch_size=64, verbose=2, epochs=epochs, callbacks=callbacks, shuffle=True)
Here's how I define my model
model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(96,96,3),classes=17)
Why did the max-pooling layer give me a 2D tensor instead of a 4D tensor? I'm using the original model from keras.applications.vgg16. How can I fix this error?
Your problem comes from VGG16(include_top=False, ...), which loads only the convolutional part of VGG. The last layer of that part, block5_pool, produces a 4-dimensional output of shape (nb_of_examples, height, width, channels), so Keras expects 4-dimensional targets and complains when it gets your 2-dimensional labels of shape (32383, 17). To overcome this, you need to either set include_top=True or add layers that squash the convolutional output to 2D (e.g. Flatten, GlobalMaxPooling2D or GlobalAveragePooling2D, followed by a set of Dense layers, the final one being a Dense layer of size 17 with a softmax activation). Note that with weights='imagenet' the included top is fixed to 224x224 inputs and 1000 classes, so adding your own layers is the option that fits your data.
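A minimal sketch of the second option, assuming the tensorflow.keras API. The hidden Dense size of 256, the optimizer and the loss are illustrative choices, not taken from the question. weights=None is used only so the sketch builds without downloading anything; pass weights='imagenet' to get the pretrained filters as in the question.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Convolutional part only; for 96x96 inputs block5_pool outputs (None, 3, 3, 512).
# weights=None keeps the sketch offline; use weights='imagenet' for pretrained filters.
base = VGG16(include_top=False, weights=None, input_shape=(96, 96, 3))

# Squash the 4D feature map to 2D and add a small classification head.
x = layers.GlobalAveragePooling2D()(base.output)     # (None, 512)
x = layers.Dense(256, activation='relu')(x)          # hypothetical hidden size
outputs = layers.Dense(17, activation='softmax')(x)  # (None, 17), matches the labels

model = models.Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
print(base.output_shape, model.output_shape)
```

If an image can carry several of the 17 labels at once, a sigmoid output with binary_crossentropy is the usual substitute for softmax with categorical_crossentropy.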
When convolving a multi-channel image into a one-channel image, you usually have only one bias variable (as the output has one channel). If I want to set local biases, that is, a separate bias for each pixel of the output image, how do I do this in Caffe and Torch?
In TensorFlow, this is very simple. You just set a bias matrix, for example:
data is 25(height)X25(width)X48(channels)
weights is 3X3(kernel size)X48(input channels)X1(output channels)
biases is 25X25,
hidden = tf.nn.conv2d(data, weights, [1, 1, 1, 1], padding='SAME')
output = tf.nn.relu(hidden + biases)
Is there a similar solution in Caffe or Torch?
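One detail about the shapes in the TensorFlow snippet is worth checking. The sketch below uses NumPy, whose broadcasting rules TensorFlow follows, with a made-up batch size of 4 and random values standing in for the conv2d output. With a batched output of shape (batch, 25, 25, 1), a per-pixel bias broadcasts as intended when it has shape (25, 25, 1), whereas a plain (25, 25) bias is aligned against the last two axes and silently produces a (batch, 25, 25, 25) result.

```python
import numpy as np

batch, height, width = 4, 25, 25  # batch size is an illustrative assumption

# Stand-in for the conv2d output: one output channel per pixel.
hidden = np.random.default_rng(0).normal(size=(batch, height, width, 1))

# One bias per output pixel, with a trailing channel axis of size 1.
biases = np.zeros((height, width, 1))
biases[0, 0, 0] = 100.0  # make one pixel's bias easy to spot

output = np.maximum(hidden + biases, 0.0)  # ReLU
print(output.shape)  # (4, 25, 25, 1)

# A (25, 25) bias lines up with the (width, channel) axes instead.
wrong = hidden + np.zeros((height, width))
print(wrong.shape)   # (4, 25, 25, 25)
```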
For Caffe, see the post "Scale layer in Caffe". The Scale layer can only provide one bias variable.
The answer is the Bias layer: a Bias layer can hold a weight matrix, which can be treated as the per-pixel biases.
For Torch, there is an nn.Add() layer, which works much like TensorFlow's tf.add() function, so nn.Add() is the solution.
All of this has been verified with actual models.
But still thank you very much #Shai
I am using the sigmoid cross entropy loss function for a multilabel classification problem as laid out by this tutorial. However, in both their results on the tutorial and my results, the output predictions are in the range (-Inf, Inf), while the range of a sigmoid is [0, 1]. Is the sigmoid only processed in the backprop? That is, shouldn't a forward pass squash the output?
In this example, the input to the "SigmoidCrossEntropyLoss" layer is the output of a fully connected layer. Indeed, there are no constraints on the output values of an "InnerProduct" layer, and they can lie anywhere in the range (-inf, inf).
However, if you examine the "SigmoidCrossEntropyLoss" layer carefully, you'll notice that it includes a "Sigmoid" layer inside, to ensure stable gradient estimation.
Therefore, at test time, you should replace the "SigmoidCrossEntropyLoss" with a simple "Sigmoid" layer to output per-class predictions.
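The relationship between the raw scores, the sigmoid and the loss can be sketched in NumPy. The formula below is the standard numerically stable way to compute sigmoid cross-entropy directly from unbounded scores; it is not copied from Caffe's source, and the example values are made up.

```python
import numpy as np

def sigmoid(z):
    """Squash unbounded scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, targets):
    """Per-label loss computed from raw scores without forming log(sigmoid(z)).
    Uses max(z, 0) - z * t + log(1 + exp(-|z|)), which avoids overflow."""
    return np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

logits = np.array([-4.0, 0.0, 2.5])   # unbounded scores from the last linear layer
targets = np.array([0.0, 1.0, 1.0])   # multilabel ground truth

probs = sigmoid(logits)               # what a "Sigmoid" layer outputs at test time
loss = sigmoid_cross_entropy(logits, targets)

# The naive form agrees for moderate scores but breaks down for large |z|.
naive = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
print(probs, np.allclose(loss, naive))
```

This is why the scores feeding the loss look unbounded during training: the squashing happens inside the loss, and a separate sigmoid is only needed when per-class probabilities are wanted.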
If I have a feed-forward multilayer perceptron with sigmoid activation function, which is trained and has known weights, how can I find the equation of the curve that is approximated by the network (the curve that separates between 2 types of data)?
In general, there is no closed-form solution for the input points where your NN output is 0.5 (or 0, in the case of -1/1 targets instead of 0/1).
What is usually done for visualization in a low-dimensional input space is gridding up the input space and computing the contours of the NN output. (The contours are a smooth estimate of what the NN response surface looks like.)
In MATLAB, one would do
[X,Y] = meshgrid(linspace(-1,1), linspace(-1,1));
followed by evaluating f on the grid points and drawing contours of the result, where f is your trained NN and the input space is assumed to be [-1,1] x [-1,1].
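The same gridding idea can be sketched in Python with NumPy. The 2-2-1 network and its weights below are hypothetical values chosen only so that the 0.5 level set crosses the grid; a real trained network would be substituted for f.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "trained" weights for a 2-2-1 network (illustrative values only).
W1 = np.array([[3.0, -2.0],
               [1.0,  4.0]])   # input -> hidden
b1 = np.array([0.0, 0.0])
W2 = np.array([2.0, -1.0])     # hidden -> output
b2 = -0.5

def f(points):
    """Network output for an (N, 2) array of input points."""
    hidden = sigmoid(points @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

# Grid over [-1, 1] x [-1, 1]; 100 points per axis mirrors MATLAB's linspace default.
xs = np.linspace(-1.0, 1.0, 100)
X, Y = np.meshgrid(xs, xs)
Z = f(np.column_stack([X.ravel(), Y.ravel()])).reshape(X.shape)

# Mark grid points where the output crosses 0.5 relative to a neighbour.
above = Z > 0.5
boundary = np.zeros_like(above)
boundary[:, 1:] |= above[:, 1:] != above[:, :-1]
boundary[1:, :] |= above[1:, :] != above[:-1, :]
boundary_points = np.column_stack([X[boundary], Y[boundary]])
print(boundary_points.shape)
```

With matplotlib available, plt.contour(X, Y, Z, levels=[0.5]) would draw the same curve directly. The marked points only approximate the boundary to within the grid spacing.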