I am a newbie and I need your help. How can I combine 3 different images into a single input for a CNN? In simple words, I want to use a list of 3 images as a single input with one output.
Let's suppose:
x1=[img1,img2,img3] ====> y1
x2=[img1,img2,img3] ====> y2
x3=[img1,img2,img3] ====> y3
Thanks!
The idea in this case is known as a multi-input model. Such a model has a separate stack of layers for each input up to some point, where the stacks are concatenated and followed by a new stack that produces the prediction. You can imagine it like this:
The source of this image and the implementation of such model can be found here: https://blog.csdn.net/alphachx/article/details/96482328
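For instance, a minimal sketch of such a model with the Keras functional API could look like this (the 64x64 input shape and all layer sizes are placeholders, not taken from the linked post):

from tensorflow.keras import layers, Model

def branch(inp):
    # one convolutional stack per image
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    return x

inputs = [layers.Input(shape=(64, 64, 3)) for _ in range(3)]  # img1, img2, img3
merged = layers.concatenate([branch(i) for i in inputs])      # join the three stacks
output = layers.Dense(1)(merged)                              # single output y

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer="adam", loss="mse")
# model.fit([imgs1, imgs2, imgs3], y, ...)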
I am working on a machine learning project to learn more about this field. The project is about image classification. I want to use the EfficientNetB0 architecture, and they mention that in the first stage this architecture uses a "Conv3x3" layer, and in the following stages it uses "MBConv1" layers.
I tried to understand the difference between these two layers, but I can't seem to find the answer. These two layers are both convolutional layers, right?
But what exactly is the difference between "Conv" and "MBConv"?
Thank you for helping me!
A Conv means there is a convolution kernel that scans the matrix corresponding to the target image, step by step, and the result of each convolution step becomes one value of the output matrix.
About the MBConv: I think you mean the mobile inverted bottleneck convolution. It's more of an encapsulated module than a single conv layer. An MBConv's structure can be expressed as follows:
MBConv = 1x1 conv (ascending dimension) + depthwise convolution + SENet + 1x1 conv (dimensionality reduction) + add
By the way, you may notice the new names depthwise convolution and SENet, which are themselves modules (honestly, it's like a nesting doll).
If you just want to use it, you don't necessarily need to fully understand it until you need to improve your model's structure. So my answer to your question
What is the difference between these two layers: CONV and MBConv?
is: the former is a simple layer, and the latter is a complex module made up of many simple layers.
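To make the nesting concrete, here is a rough, hypothetical Keras sketch of an MBConv-style block following the formula above. The kernel size, expansion factor and SE ratio are illustrative, not EfficientNet's exact values (the real blocks also use swish activations and batch normalization):

from tensorflow.keras import layers

def mbconv(x, out_channels, expansion=4, se_ratio=0.25):
    in_channels = x.shape[-1]
    mid = in_channels * expansion
    h = layers.Conv2D(mid, 1, padding="same", activation="relu")(x)      # 1x1 conv, ascending dimension
    h = layers.DepthwiseConv2D(3, padding="same", activation="relu")(h)  # depthwise convolution
    se = layers.GlobalAveragePooling2D()(h)                              # squeeze...
    se = layers.Dense(int(mid * se_ratio), activation="relu")(se)
    se = layers.Dense(mid, activation="sigmoid")(se)                     # ...and excitation (the SENet part)
    h = layers.multiply([h, layers.Reshape((1, 1, mid))(se)])
    h = layers.Conv2D(out_channels, 1, padding="same")(h)                # 1x1 conv, dimensionality reduction
    if out_channels == in_channels:
        h = layers.add([h, x])                                           # the residual "add"
    return h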
I am building a face recognition model using FaceNet. In most of the papers I could see, LFW is used for validation. I am trying to understand how LFW is used for validation, as only 1600 of its 5400 classes have more than 2 images. I am trying to find answers to the following questions:
1) For validation, do we need to use only the classes with more than 1 image and neglect the remaining classes?
2) At the link below there are files named 'pairs.txt' and 'people.txt'. How exactly are they used?
http://vis-www.cs.umass.edu/lfw/
To prepare a flipped dataset as a query dataset: you can use the original LFW as the reference dataset, and flip it to create the query dataset.
Check this repo for details: https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/blob/master/util/extract_feature_v1.py. The author also provides extract_feature_v2.py, which adds a centre crop before the flip.
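In case it helps, the core idea of that script can be sketched like this in PyTorch (a rough sketch under my assumptions; the exact preprocessing and fusion in the repo may differ):

import torch

def extract_feature(model, images):
    # images: a (N, C, H, W) batch; model returns one embedding per image
    model.eval()
    with torch.no_grad():
        emb = model(images)                        # features of the original (reference) images
        emb_flip = model(torch.flip(images, [3]))  # features of the horizontally flipped (query) images
        feature = emb + emb_flip                   # fuse both views
    # L2-normalize so features can be compared by cosine/Euclidean distance
    return feature / feature.norm(dim=1, keepdim=True)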
I am planning to predict the next image from an image sequence. I have searched the internet (Google/YouTube) for tutorials and similar work, but I couldn't find any.
I want to know whether it is possible to find the pattern and predict the next image, and whether I can find some tutorials for that.
You can use a CNN:
The input is then not 3 * w * h but (3 * number of images) * w * h - so you can just concatenate the images along the depth (channel) dimension.
The output is just an image instead of a class, so there is no flattening in between... or a reshape has to be added (see the sketch at the end of this answer).
Have a look at Fully Convolutional Networks for Semantic Segmentation and image-to-image translation.
If you haven't seen it already: Keras is pretty handy.
You might also be interested in the concept of optical flow.
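Putting those pieces together, a minimal fully convolutional sketch in Keras could look like this (frame count, resolution and layer sizes are placeholders):

from tensorflow.keras import layers, Model

n_frames, h, w = 3, 64, 64
inp = layers.Input(shape=(h, w, 3 * n_frames))  # past frames concatenated in depth
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # the next frame - no flattening

model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")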
I have a neural network with an input layer of 10 nodes, some hidden layers, and an output layer with only 1 node. I put a pattern into the input layer, and after some processing it outputs a value in the output neuron, which is a number from 1 to 10. After training, this model is able to produce the output, given the input pattern.
Now, my question is whether it is possible to calculate the inverse model: I provide a number on the output side (i.e., using the output side as input) and get back a pattern from those 10 input neurons (i.e., using the input side as output).
I want to do this because I will first train a network on the difficulty of patterns (the input is the pattern and the output is how difficult the pattern is to understand). Then I want to feed the network a number so it creates patterns of that difficulty.
I hope I understood your problem correctly, so I will summarize it in my own words: you have a given model and want to determine the input which yields a given output.
Supposing that this is correct, there is at least one way I know of to do this approximately. It is very easy to implement, but might take a while to compute a value; there are probably better ways, but I am not sure of any (I needed this technique a few weeks ago for a reinforcement learning problem and did not find anything better). Let's assume your model M maps an input x to an output y. We now create a new model, which we will call M⁻¹: this model will later calculate the inverse of M, so that it gives you the input which yields a specific output. To construct M⁻¹, create a new model consisting of one plain Dense layer with the same dimension m as the input of M, and connect it to the input of M. Next, make all weights of M non-trainable (this is very important!).
Now we are set up to find an inverse value already: assume you want to find an input corresponding to the output y (corresponding means here: it creates that output, but it is not unique). Create a new input vector v which is the all-ones vector of dimension m, and form the input-output data pair (v, y). Now use any optimizer you wish to train the network on this pair until the error converges to zero. Once this has happened, you can calculate the real input which gives the output y: supposing the weights of the new input layer are called W and the bias is b, the desired input u is u = W*1 + b (where 1 is the all-ones vector).
You might be asking why this equation holds, so let me try to answer: the model will try to learn the weights of your new input layer so that the all-ones vector as input creates the given output. As only the newly added input layer is trainable, only its weights will be changed. Therefore, each weight in this layer represents the corresponding component of the desired input vector. By using an optimizer to minimize the l2 distance between the wanted output and the output of our inverse model M⁻¹, we finally determine a set of weights which gives a good approximation of the input vector.
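Here is a minimal sketch of this procedure in Keras, assuming M is a trained Keras model with input dimension m and a single output neuron (the function name and optimizer settings are placeholders):

import numpy as np
from tensorflow.keras import layers, Model

def invert(M, m, y, epochs=500):
    M.trainable = False                     # freeze all weights of M (very important!)
    v = layers.Input(shape=(m,))
    u = layers.Dense(m)(v)                  # the only trainable layer
    inverse = Model(v, M(u))
    inverse.compile(optimizer="adam", loss="mse")
    ones = np.ones((1, m))                  # the all-ones input vector v
    inverse.fit(ones, np.array([[y]]), epochs=epochs, verbose=0)
    W, b = inverse.layers[1].get_weights()  # Dense kernel (m, m) and bias (m,)
    return ones @ W + b                     # u = W*1 + b, the recovered input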
First of all: I'm completely new to Machine Learning and TensorFlow - I've just been playing around with this technology for a few weeks - and I really like it.
But I have a (maybe simple) question about the MNIST data set in combination with TensorFlow: I'm currently working through the "MNIST for ML Beginners" tutorial (https://www.tensorflow.org/versions/r0.11/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners). I fully understand how the whole thing works and what the source code accomplishes.
My question is now the following:
Is it possible to see the individual weight parameters for each pixel? As far as I understand, I can't access the individual weight parameters for each pixel directly, because the tf.matmul() operation returns the sum over all weight parameters for a given class.
I want to access the individual weight parameters because I want to see how these values change during the training process of the neural network.
Thanks for your help,
-Klaus
You can get the actual weights by just doing something like:
w = sess.run(W)  # W is a tf.Variable, so no feed_dict is needed to fetch it
print(w.shape)   # (784, 10) in the tutorial: one weight per pixel and class
If you want the per-pixel results, just do an element-wise multiply of batch_xs * w (reshaped appropriately).
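For example, one hypothetical way to get those per-pixel products with NumPy broadcasting (w being the (784, 10) matrix fetched above, batch_xs of shape (batch, 784)):

per_pixel = batch_xs[:, :, None] * w[None, :, :]  # shape (batch, 784, 10)
print(per_pixel[0, :, 3].reshape(28, 28))         # pixel contributions of the first image to class 3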