Pytorch VNet final softmax activation layer for segmentation. Different channel dimensions to labels. How do I get prediction output? - machine-learning

I am trying to build a V-Net. When I pass the images to segment during training, the output has 2 channels after the softmax activation (as specified in the architecture in the attached image) but the label and input has 1. How do I convert this such that output is the segmented image? Do I just take one of the channels as the final output when training (e.g output = output[:, 0, :, :, :]) and the other channel would be background?
outputs = network(inputs)
batch_size = 32
outputs.shape: [32, 2, 64, 128, 128]
inputs.shape: [32, 1, 64, 128, 128]
labels.shape: [32, 1, 64, 128, 128]
Here is my Vnet forward pass:
def forward(self, x):
# Initial input transition
out = self.in_tr(x)
# Downward transitions
out, residual_0 = self.down_depth0(out)
out, residual_1 = self.down_depth1(out)
out, residual_2 = self.down_depth2(out)
out, residual_3 = self.down_depth3(out)
# Bottom layer
out = self.up_depth4(out)
# Upward transitions
out = self.up_depth3(out, residual_3)
out = self.up_depth2(out, residual_2)
out = self.up_depth1(out, residual_1)
out = self.up_depth0(out, residual_0)
# Pass to convert to 2 channels
out = self.final_conv(out)
# return softmax
out = F.softmax(out)
return out [batch_size, 2, 64, 128, 128]
V Net architecture as described in (https://arxiv.org/pdf/1606.04797.pdf)

That paper has two outputs as they predict two classes:
The network predictions, which consist of two volumes having the same resolution as the original input data, are processed through a soft-max layer which
outputs the probability of each voxel to belong to foreground and to background.
Therefore this is not an autoencoder, where your inputs are passed back through the model as outputs. They use a set of labels which distinguish between their pixels of interest (foreground) and other (background). You will need to change your data if you wish to use the V-net in this manner.
It won't be as simple as designating a channel as output because this will be a classification task rather than a regression task. You will need annotated labels to work with this model architecture.

Related

Is hidden and output the same for a GRU unit in Pytorch?

I do understand conceptually what an LSTM or GRU should (thanks to this question What's the difference between "hidden" and "output" in PyTorch LSTM?) BUT when I inspect the output of the GRU h_n and output are NOT the same while they should be...
(Pdb) rnn_output
tensor([[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
(Pdb) hidden
tensor([[[ 0.1700, 0.2388, -0.4159, ..., -0.1949, 0.0692, -0.0630],
[ 0.1304, 0.0426, -0.2874, ..., 0.0882, 0.1394, -0.1899],
[-0.0071, 0.1512, -0.1558, ..., -0.1578, 0.1990, -0.2468],
...,
[ 0.0856, 0.0962, -0.0985, ..., 0.0081, 0.0906, -0.1234],
[ 0.1773, 0.2808, -0.0300, ..., -0.0415, -0.0650, -0.0010],
[ 0.2207, 0.3573, -0.2493, ..., -0.2371, 0.1349, -0.2982]],
[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
they are some transpose of each other...why?
They are not really the same. Consider that we have the following Unidirectional GRU model:
import torch.nn as nn
import torch
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True)
Please make sure you observe the input shape carefully.
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
Definitely,
torch.equal(out, hn)
False
One of the most efficient ways that helped me to understand the output vs. hidden states was to view the hn as hn.view(num_layers, num_directions, batch, hidden_size) where num_directions = 2 for bidirectional recurrent networks (and 1 other wise, i.e., our case). Thus,
hn_conceptual_view = hn.view(3, 1, 1024, 50)
As the doc states (Note the italics and bolds):
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len (i.e., for the last timestep)
In our case, this contains the hidden vector for the timestep t = 112, where the:
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively.
So, consequently, one can do:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
True
Explanation: I compare the last sequence from all batches in out[:, -1] to the last layer hidden vectors from hn[-1, 0, :, :]
For Bidirectional GRU (requires reading the unidirectional first):
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True bidirectional = True)
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
View is changed to (since we have two directions):
hn_conceptual_view = hn.view(3, 2, 1024, 50)
If you try the exact code:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
False
Explanation: This is because we are even comparing wrong shapes;
out[:, 0].shape
torch.Size([1024, 100])
hn_conceptual_view[-1, 0, :, :].shape
torch.Size([1024, 50])
Remember that for bidirectional networks, hidden states get concatenated at each time step where the first hidden_state size (i.e., out[:, 0, :50]) are the the hidden states for the forward network, and the other hidden_state size are for the backward (i.e., out[:, 0, 50:]). The correct comparison for the forward network is then:
torch.equal(out[:, -1, :50], hn_conceptual_view[-1, 0, :, :])
True
If you want the hidden states for the backward network, and since a backward network processes the sequence from time step n ... 1. You compare the first timestep of the sequence but the last hidden_state size and changing the hn_conceptual_view direction to 1:
torch.equal(out[:, -1, :50], hn_conceptual_view[-1, 1, :, :])
True
In a nutshell, generally speaking:
Unidirectional:
rnn_module = nn.RECURRENT_MODULE(num_layers = X, hidden_state = H, batch_first = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 1, B, H)
Where RECURRENT_MODULE is either GRU or LSTM (at the time of writing this post), B is the batch size, S sequence length, and E embedding size.
torch.equal(output[:, S, :], hn_conceptual_view[-1, 0, :, :])
True
Again we used S since the rnn_module is forward (i.e., unidirectional) and the last timestep is stored at the sequence length S.
Bidirectional:
rnn_module = nn.RECURRENT_MODULE(num_layers = X, hidden_state = H, batch_first = True, bidirectional = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 2, B, H)
Comparison
torch.equal(output[:, S, :H], hn_conceptual_view[-1, 0, :, :])
True
Above is the forward network comparison, we used :H because the forward stores its hidden vector in the first H elements for each timestep.
For the backward network:
torch.equal(output[:, 0, H:], hn_conceptual_view[-1, 1, :, :])
True
We changed the direction in hn_conceptual_view to 1 to get hidden vectors for the backward network.
For all examples we used hn_conceptual_view[-1, ...] because we are only interested in the last layer.
There are three things you have to remember to make sense of this in PyTorch.
This answer is written on the assumption that you are using something like torch.nn.GRU or the like, and that if you are making a multi-layer RNN with it, that you are using the num_layers argument to do so (rather than building one from scratch out of individual layers yourself.)
The output will give you the hidden layer outputs of the network for each time-step, but only for the final layer. This is useful in many applications, particularly encoder-decoders using attention. (These architectures build up a 'context' layer from all the hidden outputs, and it is extremely useful to have them sitting around as a self-contained unit.)
The h_n will give you the hidden layer outputs for the last time-step only, but for all the layers. Therefore, if and only if you have a single layer architecture, h_n is a strict subset of output. Otherwise, output and h_n intersect, but are not strict subsets of one another. (You will often want these, in an encoder-decoder model, from the encoder in order to jumpstart the decoder.)
If you are using a bidirectional output and you want to actually verify that part of h_n is contained in output (and vice-versa) you need to understand what PyTorch does behind the scenes in the organization of the inputs and outputs. Specifically, it concatenates a time-reversed input with the time-forward input and runs them together. This is literal. This means that the 'forward' output at time T is in the final position of the output tensor sitting right next to the 'reverse' output at time 0; if you're looking for the 'reverse' output at time T, it is in the first position.
The third point in particular drove me absolute bonkers for about three hours the first time I was playing RNNs and GRUs. In fairness, it is also why h_n is provided as an output, so once you figure it out, you don't have to worry about it any more, you just get the right stuff from the return value.
Is Not the transpose ,
you can get rnn_output = hidden[-1] when the layer of lstm is 1
hidden is a output of every cell every layer, it's shound be a 2D array for a specifc input time step , but lstm return all the time step , so the output of a layer should be hidden[-1]
and this situation discussed when batch is 1 , or the dimention of output and hidden need to add one

OneClass SVM model using pretrained ResNet50 network

I'm trying to build OneClass classifier for image recognition. I found this article, but because I have no full source code I don't exactly understand what am i doing.
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)
# X_train (2250, 200, 200, 3)
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
features_array = resnet_model.predict(X_train)
# features_array (2250, 7, 7, 2048)
pca = PCA(svd_solver='randomized', n_components=450, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
param_grid = {'svc__C': [1, 5, 10, 50], 'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(X_train, y_train)
I have 2250 images (food and not food) 200x200px size, I send this data to predict method of ResNet50 model. Result is (2250, 7, 7, 2048) tensor, any one know what this dimensionality does it mean?
When I try to run grid.fit method i get an error:
ValueError: Found array with dim 4. Estimator expected <= 2.
These are the findings I could make.
You are getting the output tensor above the global average pooling layer. (See resnet_model.summary() to know about how input dimension changes to output dimension)
For a simple fix, add an Average pooling 2d Layer on top of resnet_model.
(So that output shape becomes (2250,1,1, 2048))
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
resnet_op = AveragePooling2D((7, 7), name='avg_pool_app')(resnet_model.output)
resnet_model = Model(resnet_model.input, resnet_op, name="ResNet")
This generally is present in the source code of ResNet50 itself. Basically we are appending an AveragePooling2D layer to the resnet50 model. The last line combines the layer (2nd line) and the base line model into a model object.
Now the output dimension (feature_array) will be (2250, 1, 1, 2048) (because of added average pooling layer).
To avoid the ValueError you ought to reshape this feature_array to (2250, 2048)
feature_array = np.reshape(feature_array, (-1, 2048))
In the last line of the program in the question,
grid.fit(X_train, y_train)
you have fit with X_train (which are images in this case). The correct variable here is features_array (This is considered to be summary of the image). Entering this line will rectify the error,
grid.fit(features_array, y_train)
For more finetuning in this fashion by extracting feature vectors do look here (training with neural nets instead of using PCA and SVM).
Hope this helps!!

Weights from Conv Layer when applied to image layer gives saturated output

I am visualizing my first layer output with image_layer when applied with trained weight. However, when I try to visualize, I get white images as follows:
Ignore the last four, the filters are of size 7x7 and there are 32 of them.
The model is built on the following architecture (Code Attached):
import numpy as np
import tensorflow as tf
import cv2
from matplotlib import pyplot as plt
% matplotlib inline
model_path = "T_set_4/Model/model.ckpt"
# Define the model parameters
# Convolutional Layer 1.
filter_size1 = 7 # Convolution filters are 7 x 7 pixels.
num_filters1 = 32 # There are 32 of these filters.
# Convolutional Layer 2.
filter_size2 = 7 # Convolution filters are 7 x 7 pixels.
num_filters2 = 64 # There are 64 of these filters.
# Fully-connected layer.
fc_size = 512 # Number of neurons in fully-connected layer.
# Define the data dimensions
# We know that MNIST images are 48 pixels in each dimension.
img_size = 48
# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size
# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)
# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1
# Number of classes, one class for each of 10 digits.
num_classes = 2
def new_weights(shape):
return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def new_biases(length):
return tf.Variable(tf.constant(0.05, shape=[length]))
def new_conv_layer(input, # The previous layer.
num_input_channels, # Num. channels in prev. layer.
filter_size, # Width and height of each filter.
num_filters, # Number of filters.
use_pooling=True): # Use 2x2 max-pooling.
# Shape of the filter-weights for the convolution.
# This format is determined by the TensorFlow API.
shape = [filter_size, filter_size, num_input_channels, num_filters]
# Create new weights aka. filters with the given shape.
weights = new_weights(shape=shape)
# Create new biases, one for each filter.
biases = new_biases(length=num_filters)
# Create the TensorFlow operation for convolution.
# Note the strides are set to 1 in all dimensions.
# The first and last stride must always be 1,
# because the first is for the image-number and
# the last is for the input-channel.
# But e.g. strides=[1, 2, 2, 1] would mean that the filter
# is moved 2 pixels across the x- and y-axis of the image.
# The padding is set to 'SAME' which means the input image
# is padded with zeroes so the size of the output is the same.
layer = tf.nn.conv2d(input=input,
filter=weights,
strides=[1, 1, 1, 1],
padding='SAME')
# Add the biases to the results of the convolution.
# A bias-value is added to each filter-channel.
layer += biases
# Rectified Linear Unit (ReLU).
# It calculates max(x, 0) for each input pixel x.
# This adds some non-linearity to the formula and allows us
# to learn more complicated functions.
layer = tf.nn.relu(layer)
# Use pooling to down-sample the image resolution?
if use_pooling:
# This is 2x2 max-pooling, which means that we
# consider 2x2 windows and select the largest value
# in each window. Then we move 2 pixels to the next window.
layer = tf.nn.max_pool(value=layer,
ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1],
padding='SAME')
# norm1
norm1 = tf.nn.lrn(layer, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
name='norm1')
# Note that ReLU is normally executed before the pooling,
# but since relu(max_pool(x)) == max_pool(relu(x)) we can
# save 75% of the relu-operations by max-pooling first.
# We return both the resulting layer and the filter-weights
# because we will plot the weights later.
return layer, weights
def flatten_layer(layer):
# Get the shape of the input layer.
layer_shape = layer.get_shape()
# The shape of the input layer is assumed to be:
# layer_shape == [num_images, img_height, img_width, num_channels]
# The number of features is: img_height * img_width * num_channels
# We can use a function from TensorFlow to calculate this.
num_features = layer_shape[1:4].num_elements()
# Reshape the layer to [num_images, num_features].
# Note that we just set the size of the second dimension
# to num_features and the size of the first dimension to -1
# which means the size in that dimension is calculated
# so the total size of the tensor is unchanged from the reshaping.
layer_flat = tf.reshape(layer, [-1, num_features])
# The shape of the flattened layer is now:
# [num_images, img_height * img_width * num_channels]
# Return both the flattened layer and the number of features.
return layer_flat, num_features
def new_fc_layer(input, # The previous layer.
num_inputs, # Num. inputs from prev. layer.
num_outputs, # Num. outputs.
use_relu=True): # Use Rectified Linear Unit (ReLU)?
# Create new weights and biases.
weights = new_weights(shape=[num_inputs, num_outputs])
biases = new_biases(length=num_outputs)
# Calculate the layer as the matrix multiplication of
# the input and weights, and then add the bias-values.
layer = tf.matmul(input, weights) + biases
# Use ReLU?
if use_relu:
layer = tf.nn.relu(layer)
return layer
# Create the model
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')
x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
y_true = tf.placeholder(tf.float32, shape=[None, num_classes],
name='y_true')
y_true_cls = tf.argmax(y_true, dimension=1)
# Create the model footprint
layer_conv1, weights_conv1 = new_conv_layer(input=x_image,
num_input_channels=num_channels,
filter_size=filter_size1,
num_filters=num_filters1,
use_pooling=True)
layer_conv2, weights_conv2 = new_conv_layer(input=layer_conv1,
num_input_channels=num_filters1,
filter_size=filter_size2,
num_filters=num_filters2,
use_pooling=True)
layer_flat, num_features = flatten_layer(layer_conv2)
layer_fc1 = new_fc_layer(input=layer_flat,
num_inputs=num_features,
num_outputs=fc_size,
use_relu=True)
layer_fc2 = new_fc_layer(input=layer_fc1,
num_inputs=fc_size,
num_outputs=num_classes,
use_relu=False)
y_pred = tf.nn.softmax(layer_fc2)
y_pred_cls = tf.argmax(y_pred, dimension=1)
# Restore the model
saver = tf.train.Saver()
session = tf.Session()
saver.restore(session, model_path)
The code I followed to create visualized weights is from the following:
Source code
Can someone tell me is the training or network is too shallow?
This is a perfectly fine visualization of the feature maps (not the weights) produced by the first convolutional layer on the early stages of the training.
The first layers learn to extract simple features, the learning process is somehow slow and thus you first learn to "blur" the input images, but once the network starts to converge you'll see that the first layers will start extracting meaningful low-level features (edges and so on).
Just monitor the training process and let the network training a bit more.
If, instead, you got bad performance (always look at the validation accuracy) your feature maps will always look noisy and you should start tuning the hyperparameters (lowering the learning rate, regularize, ...) in order to extract meaningful features and thus get good results

The dark mystery of tensorflow and tensorboard using cross-validation in training. Weird graphs showing up

This is the first time I'm using tensorboard, as I am getting a weird bug for my graph.
This is what I get if I open up the 'STEP' window.
However, this is what I get if I open up the 'RELATIVE'. (Similary when opening the 'WALL' window).
In addition to that, to test the performance of the model, I apply cross-validation every few steps. The accuracy of this cross-validation drops from ~10% (random guessing), to 0% after some time. I am not sure where I have made a mistake, as I am not a pro with tensorflow, but I suspect my problem to be in the graph building. The code looks as follows:
def initialize_parameters():
global_step = tf.get_variable("global_step", shape=[], trainable=False,
initializer=tf.constant_initializer(1), dtype=tf.int64)
Weights = {
"W_Conv1": tf.get_variable("W_Conv1", shape=[3, 3, 1, 64],
initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
),
...
"W_Affine3": tf.get_variable("W_Affine3", shape=[128, 10],
initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
)
}
Bias = {
"b_Conv1": tf.get_variable("b_Conv1", shape=[1, 16, 8, 64],
initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
),
...
"b_Affine3": tf.get_variable("b_Affine3", shape=[1, 10],
initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
)
}
return Weights, Bias, global_step
def build_model(W, b, global_step):
keep_prob = tf.placeholder(tf.float32)
learning_rate = tf.placeholder(tf.float32)
is_training = tf.placeholder(tf.bool)
## 0.Layer: Input
X_input = tf.placeholder(shape=[None, 16, 8], dtype=tf.float32, name="X_input")
y_input = tf.placeholder(shape=[None, 10], dtype=tf.int8, name="y_input")
inputs = tf.reshape(X_input, (-1, 16, 8, 1)) #must be a 4D input into the CNN layer
inputs = tf.contrib.layers.batch_norm(
inputs,
center=False,
scale=False,
is_training=is_training
)
## 1. Layer: Conv1 (64, stride=1, 3x3)
inputs = layer_conv(inputs, W['W_Conv1'], b['b_Conv1'], is_training)
...
## 7. Layer: Affine 3 (128 units)
logits = layer_affine(inputs, W['W_Affine3'], b['b_Affine3'], is_training)
## 8. Layer: Softmax, or loss otherwise
predict = tf.nn.softmax(logits) #should be an argmax, or should this even go through
## Output: Loss functions and model trainers
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
labels=y_input,
logits=logits
)
)
trainer = tf.train.GradientDescentOptimizer(
learning_rate=learning_rate
)
updateModel = trainer.minimize(loss, global_step=global_step)
## Test Accuracy
correct_pred = tf.equal(tf.argmax(y_input, 1), tf.argmax(predict, 1))
acc_op = tf.reduce_mean(tf.cast(correct_pred, "float"))
return X_input, y_input, loss, predict, updateModel, keep_prob, learning_rate, is_training
Now I suspect my error to be in the definition of the loss-function of the graph, but I am not sure. Any idea what the problem could be? Or does the model converge correctly and all those errors are expected?
Yes I think you are runing same model more than once with your cross-validation implementation.
Just try at the end of every loop
session.close()
I suspect you are getting such strange output (and I have seen similar myself) because you are running the same model more than once and it is saving the Tensorboard output in exactly the same place. I can't see in your code how you name the file where you are putting the output? Try to make the file path in this part of code unique:
`summary_writer = tf.summary.FileWriter(unique_path_to_log, sess.graph)`
You can also try to locate the directory where your existing output has bene put in and try to remove the files that have the older (or newer?) timestamps and this way Tensorboard will not be confused as to which one to use.

Transposed convolution on feature maps using Theano

I asked similar question on CrossValidation for the image interpretation. I'm moving my detailed question here to include some code details.
The results I'm having are not fully desirable So maybe you have faced this issue before and you can help me find it out.
It is fully convolution neural network "no fully connected part".
Training part
first the images are transposed to match the convolution function. (batch_no,img_channels,width,height)
input.transpose(0, 3, 1, 2)
Learning optimized using learning rate:3e-6, Hu_uniform initialization and nestrove for 500 epochs until this convergence.
Training cost: 1.602449
Training loss: 4.610442
validation error: 5.126761
Test loss: 5.885714
Backward part
Loading Image
jpgfile = np.array(Image.open(join(testing_folder,img_name)))
Reshape to one batch
batch = jpgfile.reshape(1, jpgfile.shape[0], jpgfile.shape[1], 3)
Run the model to extract first feature map after activation using Relu
output = classifier.layer0.output
Test_model = theano.function(
inputs=[x],
outputs=output,
)
layer_Fmaps = Test_model(test_set_x)
Apply backwork model to reconstruct the image using the only activated
neurons
bch, ch, row, col = layer_Fmaps.shape
output_grad_reshaped = layer_Fmaps.reshape((-1, 1, row, col))
output_grad_reshaped = output_grad_reshaped[0].reshape(1,1,row,col)
input_shape = (1, 3, 226, 226)
W = classifier.layer0.W.get_value()[0].reshape(1,3,7,7)
kernel = theano.shared(W)
inp = T.tensor4('inp')
deconv_out = T.nnet.abstract_conv.conv2d_grad_wrt_inputs(
output_grad = inp,
filters=kernel,
input_shape= input_shape,
filter_shape=(1,3,7,7),
border_mode=(0,0),
subsample=(1,1)
)
f = theano.function(
inputs = [inp],
outputs= deconv_out)
f_out = f(output_grad_reshaped)
deconved_relu = T.nnet.relu(f_out)[0].transpose(1,2,0)
deconved = f_out[0].transpose(1,2,0)
Here we have two images results, the first is the transposed image without activation and the second with relu since kernels might have some negative weights.
It is clear from the transposed convolution image that this kernel is learn to detect some useful feature related to this image. But the reconstructing part is breaking the image color scheme during the transpose convolution. It might be because the pixels values are small float numbers. Do you see where is the problem here ?

Resources