We are implementing a GlobalAveragePooling on top of a masking layer causing Not Supported error.
We saw this solution but unfortunately this solution cant fit our situation.
We use a custom embedding algorithm, causing some samples to be all zeros, thus we can not mask with zero vectors e.g. self.model.add(Masking(mask_value=-9999., input_shape=(max_length, nr_in)))And some layers afterwards
avged = GlobalAveragePooling1D()(result, mask=max_len)
maxed = GlobalMaxPooling1D()(result, mask=max_len)
merged = merge([avged, maxed])
Is there a way to use the global average and max pooling methods with masks of this sort?
Thanks!
We are using Tensorflow 0.12.0 and keras 1.2.2
Related
Let's pretend that in plus of having an image, I also have a gradient from left to right on the X axis of an image, and another gradient from top to bottom on the Y axis. Those two gradients are of the same size of the image, and could both range from -0.5 to 0.5.
Now, I'd like to make the convolution kernel (a.k.a. convolution filter, or convolution weights) depend on the (x, y) location in the gradient. So the kernel is a function of the gradient as if the kernel was the output of a nested mini-neural net. This would make the weights of the filter to be different in every position, but slightly similar to their neighbors. How do I do that within PyTorch or TensorFlow?
Sure, I could compute a Toeplitz matrix (a.k.a. diagonal-constant matrix) by myself, but the matrix multiplication would take O(n^3) operations if pretending x==y==n, whereas convolutions can be implemented in O(n^2) normally. Or I could maybe iterate on every element myself and do the multiplications in an unvectorized fashion.
Any better ideas? I'd like to see creativity here, thinking about how could this be implemented neatly. I believe coding that would be an interesting way to build a network layer capable of doing things similar to a simplified version of a Spatial Transformer Networks, but which's spatial transformation would be independent of the image.
Here is a solution I thought for a simplified version of this problem where a linear combination of weights would be used rather than truly using a nested mini neural network:
It may be possible to do 4 different convolutions passes so as to have 4 feature maps, then to multiply those 4 maps with the gradients (2 vertical and 2 horizontal gradients), and add them together so that only 1 map remains. However, that would be a linear combination of the different maps which is simpler than truly using a nested neural network which would alter the kernel in the first place.
Thinking more about it, here is a solution to an equivalent question. The thing with this solution is that it flips the problem around by placing the "mini neural net" after rather than before, and in a quite different way. So it solves the problem, but offer a much different optimization space and convergence behavior which is less natural for me to think about than how I formulated the problem.
In a sense, a solution to the problem could be very similar to simply concatenating the two gradients to 1 regular feature map (from a regular convolution) such as having a depth of d_2 = d_1 + 2 after the concatenation), and then performing more convolutions on top of this. I won't prove why this is a valid solution to an equivalent problem, but I thought through this and it seems provable.
The optimization space (for the weights) would be here very different and I think it wouldn't converge with the same behavior. I'd like to know what you people think about this solution in terms of optimization convergence.
The reason why convolutions are more efficient than fully connected layers is because they are translation invariant. If you wish to have convolutions which are dependent on location you would need to add two extra parameters to the convolution i.e. having N+2 input channels where x, y coord are the values of the two additonal channels (as in e.g. CoordConv).
As for alternative solutions, is the gradient meaningful? If not, and it is uniform across all images, it might be better to just manually remove it in the pre-processing stage (similar to orientation correction, cropping etc). If it is not (e.g. differences in lighting, shadows) then including other layers under the assumption they would learn the invariance of different lightings is a common hands-off approach.
I recently found the "global_pooling" flag in the Pooling layer in caffe, however was unable to find sth about it in the documentation here (Layer Catalogue)
nor here (Pooling doxygen doc) .
Is there an easy forward examply explanation to this in comparison to the normal Pool-Layer behaviour?
With Global pooling reduces the dimensionality from 3D to 1D. Therefore Global pooling outputs 1 response for every feature map. This can be the maximum or the average or whatever other pooling operation you use.
It is often used at the end of the backend of a convolutional neural network to get a shape that works with dense layers. Therefore no flatten has to be applied.
Convolutions can work on any image input size (which is big enough). However, if you have a fully connected layer at the end, this layer needs a fixed input size. Hence the complete network needs a fixed image input size.
However, you can remove the fully connected layer and just work with convolutional layers. You can make a convolutional layer at the end which has the same number of filters as you have classes. But you want one value for each class which indicates the probability of that class. Hence you apply a pooling filter over the complete remaining feature map. This pooling is hence "global" as it always is as big as necessary. In contrast, usual pooling layers have a fixed size (e.g. of 2x2 or 3x3).
This is a general concept. You can also find global pooling in other libraries, e.g. Lasagne. If you want a good reference in literature, I recommend reading Network In Network.
We get only one value from entire feature map when we apply GP layer, in which kernel size is the h×w of the feature map. GP layers are used to reduce the spatial dimensions of a three-dimensional feature map. However, GP layers perform a more extreme type of dimensionality reduction, where a feature map with dimensions h×w×d is reduced in size to have dimensions 1×1×d. GP layers reduce each h×w feature map to a single number by simply taking the average of all hw values.
If you are looking for information regarding flags/parameters of caffe, it is best look them up in the comments of '$CAFFE_ROOT/src/caffe/proto/caffe.proto'.
For 'global_pooling' parameter the comment says:
// If global_pooling then it will pool over the size of the bottom by doing
// kernel_h = bottom->height and kernel_w = bottom->width
For more information about caffe layers, see this help pages.
I'm trying to implement deepdream in theano. As far as I understood I need to choose a layer in a pre-trained neural network to be the objective , then I need to set the activation functions of that layer to be equal to gradients, then backpropagate to the input image and finally apply the changes.
I'm new to theano and I can't think of how to implement it, how can I set the layer's values to be equal to gradients and backpropagate back to image? Is it possible to do in a couple of lines in theano?
Also if my understanding is wrong, please correct me.
Can the Keras deal with input images with different size? For example, in the fully convolutional neural network, the input images can have any size. However, we need to specify the input shape when we create a network by Keras. Therefore, how can we use Keras to deal with different input size without resizing the input images to the same size? Thanks for any help.
Yes.
Just change your input shape to shape=(n_channels, None, None).
Where n_channels is the number of channels in your input image.
I'm using Theano backend though, so if you are using tensorflow you might have to change it to (None,None,n_channels)
You should use:
input_shape=(1, None, None)
None in a shape denotes a variable dimension. Note that not all layers
will work with such variable dimensions, since some layers require
shape information (such as Flatten).
https://github.com/fchollet/keras/issues/1920
For example, using keras's functional API your input layer would be:
For a RGB dataset
inp = Input(shape=(3,None,None))
For a Gray dataset
inp = Input(shape=(1,None,None))
Implementing arbitrarily sized input arrays with the same computational kernels can pose many challenges - e.g. on a GPU, you need to know how big buffers to reserve, and more weakly how much to unroll your loops, etc. This is the main reason that Keras requires constant input shapes, variable-sized inputs are too painful to deal with.
This more commonly occurs when processing variable-length sequences like sentences in NLP. The common approach is to establish an upper bound on the size (and crop longer sequences), and then pad the sequences with zeros up to this size.
(You could also include masking on zero values to skip computations on the padded areas, except that the convolutional layers in Keras might still not support masked inputs...)
I'm not sure if for 3D data structures, the overhead of padding is not prohibitive - if you start getting memory errors, the easiest workaround is to reduce the batch size. Let us know about your experience with applying this trick on images!
Just use None while specifying input shape. But I still do not know how to pass different-shaped images into fit function.
I'm trying to add noise and blur functions to my project in Cuda and after quite some research i've hit a bit of a stumbling block, I've read up on the Gaussian blur matrix but i'm still having trouble getting a working piece of code which would be able to blur certain parts of an image, I've managed to get a form of noise to show.
If anyone could give a bit of help in either explaining how to implement a Gaussian or a simpler blur method or even providing a bit of code which implements blurring.
Gratefully appreciated!!
Gaussian blur is a separable filter, so you can apply the 1D kernel first to all the rows in your ROI and then to the columns of the blurred rows.
The tricky part with CUDA is that this is a neighbourhood operation, so typically you will need to have each block overlap by half the kernel size in order to get the required neighbourhood pixels into shared memory.
FYI, these are two separate questions and should be asked separately in this site.
Regarding the blur - for large blur kernels (strong blurs) the best approach is to use the FFT on the image and on a Gaussian noise kernel image then multiply the results using the complex multiplication and inverse FFT that result. You will have to implement a FFT-Shift function yourself and if you are using color images, you will have to split the image into a separate buffer per channel.
For small blur kernels (gentile blurs) the simplest approach is for each pixel in the result image, sum nearby pixels in the source image (with a Gaussian weight function).
Regarding the noise - test easiest approach is to load a pre-generated pseudo-random generator's result image into CUDA after transforming it from uniformly distributed random numbers to normal distributed random numbers. E.g. this question.
The a correctly size region in the random image should be multiplied by the noise sigma and added to the source image to receive the result.
Last time I checked there was no random buffer generation solution for CUDA, however, that was a few years ago.
Update: CUDA now has cuRand so you should be able to generator random numbers instead of using a pregenerated random buffer.