How exactly ZeroPadding2D works in Keras? If I apply padding=(0,40) attribute to a (96,1366,1) image, I get (96,1440,1), but I don't understand how 1440 is the result from computation.
It's just adding 40 to both sides of your image (adding 80 in total).
https://keras.io/layers/convolutional/#zeropadding2d
I really beleive that your values are either:
input (96,1360,1) and output (96,1440,1); or
input (96,1366,1) and output (96,1446,1)
Related
I am working on a machine learning project to learn more about this field. The project is about image classification. I want to use the EffnetB0 architecure and they mention in this architecure they use in the fisrt stage the following layer: "Conv3X3" and the following layers they use "MBConv1".
I tried to understand the difference between these two layers but I can't seem to find the answer. These two layers are both convolutional layers right ?
But what exactly is the difference between "Conv" and "MBConv"?
Thank you for helping me!
A conv means that there is a convolution core to scan the matrix corresponding to the target image line by line and convolution, the result of each convolution constitutes a value of the output matrix.
About the MBConv,i think you means mobile inverted bottleneck convolution,it's more of an encapsulated module than a single conv layer. A MBConv's structure can be expressed as follows:
MBConv = 1x1conv(ascending dimension) + Depthwise Convolution + SENet + 1x1conv(dimensionality reduction) + add
By the way, you may notice the new names Depthwise Convolution and SENet, which are also a kind of modules(honestly, it's like a nesting doll)
If you just want to use it, you don't necessarily need to fully understand it until you need to improve your model structure. So my answer to your question
What is the difference between these two layers : CONV and MBConv?
is : the former is a simple layer, and the latter is a complex module made up of many simple layers
I have to binarize the output o of a torch model (lua script), the value range is [-1,+1], i want to threshold those values in such a way that:
0 if o[i]<0
1 if o[i]>=0
The output is composed by 32 layers with size 1x1 float tensors, so 32 floats, i want to get 32 bits from those 32 floats but i cannot find a layer that allows to do that.
At the moment I have a for cycle that checks the value of each level but it is very slow.
Maybe I can use the threshold layer or implement one by my own, do you have any advice?
You can use the 'greater or equal than' operator https://github.com/torch/torch7/blob/master/doc/maths.md#torchgea-b
local threshold_tensor = o:ge(0)
In the Tensorflow example "Deep MNIST for Experts" https://www.tensorflow.org/get_started/mnist/pros
I am not clear how to determine the feature number specified in weight of activation function.
For example:
We can now implement our first layer. It will consist of convolution,
followed by max pooling. The convolution will compute 32 features for
each 5x5 patch.
W_conv1 = weight_variable([5, 5, 1, 32])
Why 32 is picked here?
In order to build a deep network, we stack several layers of this
type. The second layer will have 64 features for each 5x5 patch.
W_conv2 = weight_variable([5, 5, 32, 64])
Again, why 64 is picked?
Now that the image size has been reduced to 7x7, we add a
fully-connected layer with 1024 neurons to allow processing on the
entire image.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Why 1024 here?
Thanks
Each of these filters will actually do something, like check for edges, check for colour change, or right-shift, left-shit the image, sharpen, blur etc.
Each of these filters are actually working on finding out the meaning of the image by sharpening, enhancing, smoothening, intensifying etc.
For e.g. check this link which explains the meaning of these filters
http://setosa.io/ev/image-kernels/
So all these filters are actually neurons where the output will be max-pooled and eventually fed into a FC layer after some activation.
If you are looking for just understanding the filters, that is another approach. However if you are looking to learn how conv. architectures work but since these are tried and tested filters over the dataset, you hsould just go with it for now.
The filters also learn through Backprop.
32 and 64 are number of filters in the respective layers.
1024 is the number of output neurons in the fully connected layer.
Your question basically is about the reason behind the choice of these hyperparameters.
There is no mathematical or programming reason behind these specific choices. These have been picked up after experiments as they delivered a good accuracy over MNIST dataset.
You can change these numbers and that is one way by which you can modify a model.
Unfortunately you cannot yet explore the reason for the choice behind these parameters within TensorFlow or any other literature source.
I am trying to parameterise a 1D conv net via Torch.
Let's say I have a Tensor called data that is of dimensions 10 x 512, in that there are 10 rows and 512 columns. As such, I want to implement a single 3-layer stack of a TemporalConvolution layer, followed by ReLU, followed by TemporalMaxPooling. My classification problem is binary, and there is a corresponding labels tensor, which is 10 x 1. Let us assume that there is already written a feval to iterate through each row in both data and labels.
As such, the problem is to construct a net that can map from 512 columns down to 1 column
Adapted from the documentation:
...
model = nn.Sequential()
model:add(nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW]))
model:add(nn.ReLU())
model:add(nn.TemporalMaxPooling(kW2, [dW2])
...
criterion = nn.BCECriterion()
...
I have parameterised it as follows, but the following doesn't work : /
TemporalConvolution(512,1,3,1)
ReLU())
TemporalMaxPooling(3, 1)
It throws the error: 2D or 3D(batch mode) tensor expected. As a result I tried to reshape data before passing it to the net:
data = data:resize(1, 100, 512)
But this throws the error: invalid input frame size.
I can see that the error concerns the shape of the data coming into the conv net and of course the parameterisation too. I am further confused by this post here which seems to suggest that inputFrameSize of TemporalConvolution should be set to 10 not 512.
Any guidance would be appreciated, as to how to build a 1D conv net.
P.S. I have tested the script with a logisticRegression model, and that runs, so the issue is purely with the conv net architecture / the shape of the data coming into it.
I guess you misunderstand the meaning of inputFrameSize, which is not the seqlen of your input but n_channels (e.g. for 512*512 RGB images in 2d-convlution, the inputFrameSize should be 3 not 512).
I have trained my own dataset of images (traffic light images 11x27) with LeNet, using caffe and DIGITS interface. I get 99% accuracy and when I give new images via DIGITS, it predicts the good label, so the network seems to work very well.
However, I struggle to predict the labels through Python/Matlab API for caffe. The last layer output (ip2) is a vector with 2 elements (I have 2 classes), which looks like [4.8060, -5.2608] for example (the first component is always positive, the second always negative and the absolute values range from 4 to 20). I know it from many tests in Python, Matlab and DIGITS.
My problem is :
Argmax can't work directly on this layer (it always gives 0)
If I use a softmax function, it will always give me [1, 0] (and that's actually the value of net.blobs['prob'] or out['prob'] in the python interface, no matter the class of my image)
So, how can I get the good label predicted ?
Thanks!