What is the difference between these two layers : CONV and MBConv? - machine-learning

I am working on a machine learning project to learn more about this field. The project is about image classification. I want to use the EfficientNet-B0 architecture, and its description says that the first stage uses a "Conv3x3" layer and the following stages use "MBConv1" layers.
I tried to understand the difference between these two layers but I can't seem to find the answer. Both of them are convolutional layers, right?
But what exactly is the difference between "Conv" and "MBConv"?
Thank you for helping me!

A conv is a plain convolution layer: a convolution kernel scans the matrix corresponding to the input image position by position, and the result of each convolution becomes one value of the output matrix.
As for MBConv, I think you mean the mobile inverted bottleneck convolution. It is more of an encapsulated module than a single conv layer. An MBConv's structure can be expressed as follows:
MBConv = 1x1 conv (channel expansion) + depthwise convolution + SENet (squeeze-and-excitation) + 1x1 conv (channel reduction) + add (residual connection)
By the way, you may notice the new names depthwise convolution and SENet. These are themselves modules (honestly, it's like a nesting doll).
If you just want to use MBConv, you don't need to fully understand it until you have to modify your model structure. So my answer to your question
What is the difference between these two layers : CONV and MBConv?
is: the former is a single simple layer, and the latter is a complex module made up of several simple layers.
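To make the "simple layer versus module" point concrete, here is a rough sketch that counts the weights in a plain convolution and in an MBConv block assembled from the pieces listed above. The function names, the `se_ratio` default of 0.25, the channel sizes in the example, and the choice to skip the expansion conv when the expansion ratio is 1 are all assumptions made for illustration. Biases and batch-norm parameters are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in one ordinary kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def mbconv_params(kernel, c_in, c_out, expand_ratio, se_ratio=0.25):
    """Weights in a hypothetical MBConv block, summed over its sub-layers."""
    c_mid = c_in * expand_ratio
    total = 0
    # 1x1 conv that expands the channels (assumed to be skipped when ratio is 1).
    if expand_ratio != 1:
        total += conv_params(1, c_in, c_mid)
    # Depthwise convolution: one kernel x kernel filter per channel.
    total += kernel * kernel * c_mid
    # Squeeze-and-excitation: two 1x1 convs through a small bottleneck.
    c_se = max(1, int(c_in * se_ratio))
    total += 2 * c_mid * c_se
    # 1x1 conv that reduces the channels again.
    total += conv_params(1, c_mid, c_out)
    return total


# Assumed channel sizes: a 3x3 conv from 3 to 32 channels,
# followed by an MBConv1 block (expand_ratio=1) from 32 to 16 channels.
stem = conv_params(3, 3, 32)
block = mbconv_params(3, 32, 16, expand_ratio=1)
print("Conv3x3 weights:", stem)
print("MBConv1 weights:", block)
```

The final add has no weights of its own, so it does not show up in the count.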

Related

How to apply CNN for multi-channel pixel data based weights to each channel?

I have an image with 8 channels. I have a conventional algorithm where weights are applied to each of these channels to get an output of '0' or '1'. This works fine with several samples and complex scenarios. I would like to implement the same thing in machine learning using a CNN.
I am new to ML and started looking at tutorials, which seem to deal exclusively with image processing problems: handwriting recognition, feature extraction, etc.
http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
I have set up Keras with Theano as the backend. Basic Keras samples are working without problems.
What steps do I need to follow in order to achieve the same result using a CNN? I do not understand the use of filters, kernels, and stride in my use case. How do we provide training data to Keras if the pixel channel values and outputs are in the form below?
Pixel#1 f(C1,C2...C8)=1
Pixel#2 f(C1,C2...C8)=1
Pixel#3 f(C1,C2...C8)=0
...
Pixel#N f(C1,C2...C8)=1
I think you should treat this the same way you would use a CNN for semantic segmentation. For an example, look at
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
You can use the same architecture as they are using, but in the first layer use filters for 8 channels instead of filters for 3 channels.
For the loss function, you can use the same loss function or something more specific to binary classification.
There are several implementations for Keras, but with the TensorFlow backend:
https://github.com/JihongJu/keras-fcn
https://github.com/aurora95/Keras-FCN
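Since the input is in the form of channel values, and in sequence at that, I would suggest you use Convolution1D (Conv1D). Here you take each pixel's channel values as the input, and you need to predict an output for each pixel. Try something like this (the filter counts and kernel sizes are placeholders, not tuned values):
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
model = Sequential()
# Each sample is one pixel: its 8 channel values treated as a sequence of length 8.
model.add(Conv1D(16, 3, strides=1, padding='valid', activation='relu', input_shape=(8, 1)))
model.add(Conv1D(16, 3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# ... add as many layers as you want ...
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
Use binary_crossentropy as the loss function, with a sigmoid on the final Dense(1) so that the output is a probability.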
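Both answers come down to treating each pixel's 8 channel values as one training example with one binary label. The sketch below, in plain Python with no Keras, shows one way the data could be laid out, and shows that a 1x1 convolution with a single output channel is the same per-pixel weighted sum as the conventional algorithm. The helper names, the toy image, and the weights are hypothetical.

```python
def pixels_to_samples(image, mask):
    """Flatten an H x W x C image and an H x W mask into (features, label) pairs."""
    samples = []
    for row_pixels, row_labels in zip(image, mask):
        for channels, label in zip(row_pixels, row_labels):
            samples.append((list(channels), int(label)))
    return samples


def pointwise_conv(image, weights, bias=0):
    """1x1 convolution with one output channel: a weighted sum over channels per pixel."""
    return [
        [sum(w * c for w, c in zip(weights, pixel)) + bias for pixel in row]
        for row in image
    ]


# Toy 2 x 2 image with 8 channels per pixel (made-up values).
image = [
    [[1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]],
    [[1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]],
]
mask = [[1, 0], [1, 1]]           # desired 0/1 output per pixel
weights = [2, -1, 0, 0, 0, 0, 0, 3]  # hand-picked, standing in for learned weights

samples = pixels_to_samples(image, mask)   # N = H * W rows of 8 features each
scores = pointwise_conv(image, weights)
predicted = [[1 if s > 0 else 0 for s in row] for row in scores]
print(len(samples), predicted)
```

In a trained model the weights would be learned from the (features, label) pairs rather than chosen by hand.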

Features extraction methods

Which methods/algorithms can be used to extract the features from this image?
The image is a linear combination of several images with different weights,
i.e., image= w1×LP01 + w2×LP02 + w3×LP03 + w4×LP11 + w5×LP12 ...etc
The LPmn images are something like this,
w is the weight.
I am looking for other methods except linear regression based methods, e.g., PCA, LDA, SVD ...
I have tried to use wavelet transform but it doesn't work. Any suggestions?
As a start, I would experiment with reshaping the image into a vector, using the entire vector as your feature, and training a simple neural network to see how that works out.
Finding features is an iterative process. It is not always obvious!

How to apply mean/average pooling over the batch size to get a single output for the whole batch in Keras?

For example, an input with dimensions [10,1,224,224] needs to be reduced to [1,1,224,224], where [samples,channels,rows,columns] is the convention for the dimensions.
Then your problem is badly formulated. Consider using [10,1,224,224] as the input_shape and making batches of such tensors. Then use AveragePooling3D, see doc here.
You won't be able to perform operations across a batch with the usual layers, except maybe if you build your own custom layer: see here.
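For reference, the reduction being asked for is an element-wise mean over the first (samples) axis that keeps that axis with size 1. The plain-Python sketch below only illustrates the arithmetic on nested lists; it is not a Keras layer, and `batch_mean` is a hypothetical helper name.

```python
def batch_mean(batch):
    """Average equally shaped nested lists element-wise, keeping a leading axis of size 1."""
    def mean_of(items):
        first = items[0]
        if isinstance(first, list):
            # Recurse into each position of the nested structure.
            return [mean_of([item[i] for item in items]) for i in range(len(first))]
        # Leaf level: items is a list of numbers, one per sample.
        return sum(items) / len(items)

    return [mean_of(batch)]


# Two samples of shape [channels=1, rows=2, columns=2] instead of [1, 224, 224].
batch = [
    [[[1, 2], [3, 4]]],
    [[[3, 4], [5, 6]]],
]
pooled = batch_mean(batch)  # shape [1, 1, 2, 2]
print(pooled)
```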

Net surgery: How to reshape a convolution layer of a caffemodel file in caffe?

I'm trying to reshape the size of a convolution layer of a caffemodel (This is a follow-up question to this question). Although there is a tutorial on how to do net surgery, it only shows how to copy weight parameters from one caffemodel to another of the same size.
Instead I need to add a new channel (all 0) to my convolution filter such that it changes its size from currently (64x3x3x3) to (64x4x3x3).
Say the convolution layer is called 'conv1'. This is what I tried so far:
# Load the original network.
net = caffe.Net('../models/train.prototxt',
                '../models/train.caffemodel',
                caffe.TRAIN)
Now I can perform this:
net.blobs['conv1'].reshape(64, 4, 3, 3)
net.save('myNewTrainModel.caffemodel')
But the saved model seems not to have changed. I've read that the actual weights of the convolution are stored rather in net.params['conv1'][0].data than in net.blobs but I can't figure out how to reshape the net.params object. Does anyone have an idea?
As you well noted, net.blobs does not store the learned parameters/weights, but rather stores the result of applying the filters/activations on the net's input. The learned weights are stored in net.params. (see this for more details).
AFAIK, you cannot directly reshape net.params and add a channel.
What you can do is define two nets, deploy_trained_net_with_3ch.prototxt and deploy_empty_net_with_4ch.prototxt. The two files can be almost identical apart from the input shape definition and the first layer's name.
Then you can load both nets to python and copy the relevant part:
net3ch = caffe.Net('deploy_trained_net_with_3ch.prototxt', 'train.caffemodel', caffe.TEST)
net4ch = caffe.Net('deploy_empty_net_with_4ch.prototxt', 'train.caffemodel', caffe.TEST)
Since all layer names are identical (apart from conv1), net4ch.params will have the weights of train.caffemodel. As for the first layer, you can now manually copy the relevant part:
net4ch.params['conv1_4ch'][0].data[:,:3,:,:] = net3ch.params['conv1'][0].data[...]
and finally:
net4ch.save('myNewTrainModel.caffemodel')

Translating a TensorFlow LSTM into synapticjs

I'm working on implementing an interface between a TensorFlow basic LSTM that's already been trained and a javascript version that can be run in the browser. The problem is that in all of the literature that I've read LSTMs are modeled as mini-networks (using only connections, nodes and gates) and TensorFlow seems to have a lot more going on.
The two questions that I have are:
Can the TensorFlow model be easily translated into a more conventional neural network structure?
Is there a practical way to map the trainable variables that TensorFlow gives you to this structure?
I can get the 'trainable variables' out of TensorFlow, the issue is that they appear to only have one value for bias per LSTM node, where most of the models I've seen would include several biases for the memory cell, the inputs and the output.
Internally, the LSTMCell class stores the LSTM weights as one big matrix instead of 8 smaller ones, for efficiency. It is quite easy to divide it horizontally and vertically to get to the more conventional representation. However, it might be easier and more efficient if your library does a similar optimization.
Here is the relevant piece of code of the BasicLSTMCell:
concat = linear([inputs, h], 4 * self._num_units, True)
# i = input_gate, j = new_input, f = forget_gate, o = output_gate
i, j, f, o = array_ops.split(1, 4, concat)
The linear function does the matrix multiplication to transform the concatenated input and the previous h state into 4 matrices of [batch_size, self._num_units] shape. The linear transformation uses a single matrix and bias variables that you're referring to in the question. The result is then split into different gates used by the LSTM transformation.
If you'd like to explicitly get the transformations for each gate, you can split that matrix and bias into 4 blocks. It is also quite easy to implement it from scratch using 4 or 8 linear transformations.
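As a rough illustration of that split, the sketch below cuts a combined kernel and bias into per-gate blocks using plain Python lists. It assumes the kernel has shape [input_size + num_units, 4 * num_units], with the input rows first (matching the `[inputs, h]` concatenation) and the gate columns in the order i, j, f, o from the snippet above. The function name and the toy numbers are hypothetical, so check the layout against your own trained variables.

```python
def split_lstm_weights(kernel, bias, input_size, num_units):
    """Split a combined LSTM kernel and bias into per-gate input/recurrent blocks."""
    gates = {}
    for index, name in enumerate(("i", "j", "f", "o")):
        lo, hi = index * num_units, (index + 1) * num_units
        columns = [row[lo:hi] for row in kernel]
        gates[name] = {
            # Rows that multiply the current input.
            "input_kernel": columns[:input_size],
            # Rows that multiply the previous hidden state h.
            "recurrent_kernel": columns[input_size:],
            "bias": bias[lo:hi],
        }
    return gates


# Toy example: input_size=1, num_units=2, so the kernel is 3 x 8.
kernel = [[row * 10 + col for col in range(8)] for row in range(3)]
bias = list(range(8))
gates = split_lstm_weights(kernel, bias, input_size=1, num_units=2)
print(gates["f"])
```

This gives the 8 smaller matrices (an input block and a recurrent block for each of the 4 gates) plus one bias vector per gate.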
