Background
While going through the TensorFlow documentation, I came across the convolutional neural network example in the section on variable sharing. The example motivates the need for variable sharing by describing a problem in which two images are passed through the same image filter, which should reuse its variables.
Problem
I am not able to get my head around this approach. Instead of introducing the concept of variable sharing, can we not place all the images in a matrix (as pixel values, with each row denoting a new image) and perform the necessary operations, like filters, on the whole matrix using a single set of variables? The approach I am suggesting is similar to what we use in vanilla neural networks: we don't reuse variables per training example; rather, we stack the examples into a matrix and apply a common weight matrix to all of them.
Can someone point out where I am wrong in drawing this parallel between the two approaches?
In the forward pass, you can't distinguish between a shared variable and two identical variables. But in training, the backward pass is different. If you had two identical variables, you would get two adjustments, and those will in general not be identical, so the two variables will diverge. If you have a single shared variable, you have only a single adjustment.
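To make the mechanics concrete, here is a minimal sketch using the TF1-style variable-scope API that the tutorial was written against; the shapes, sizes, and initializer below are illustrative assumptions:

```python
import tensorflow.compat.v1 as tf  # TF1-style API, as in the old tutorial
tf.disable_eager_execution()

def conv_relu(images, kernel_shape):
    # get_variable creates "weights" on the first call; inside a scope
    # opened with reuse=True it returns that same variable again.
    weights = tf.get_variable("weights", kernel_shape,
                              initializer=tf.truncated_normal_initializer())
    conv = tf.nn.conv2d(images, weights, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.relu(conv)

image1 = tf.random_normal([1, 32, 32, 3])  # illustrative shapes
image2 = tf.random_normal([1, 32, 32, 3])

with tf.variable_scope("filter"):
    out1 = conv_relu(image1, [5, 5, 3, 32])  # creates filter/weights
with tf.variable_scope("filter", reuse=True):
    out2 = conv_relu(image2, [5, 5, 3, 32])  # reuses the exact same variable
```

If the second scope were opened without reuse=True, get_variable would raise an error rather than silently create a duplicate, which is exactly the mistake the sharing mechanism guards against.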
Related
I was wondering if it is possible to combine images and some "bios" data to find patterns. For example, if I want to know whether an image is a cat or a dog and I have:
Enough image data to train my model
Enough "bios" data like:
size of the animal
size of the tail
weight
height
Thanks!
Are you looking for a simple yes or no answer? In that case, yes. You are in complete control over building your models, which includes what data you make them process and what predictions you get.
If you actually wanted to ask how to do it, it will depend on the specific datasets and application, but one way would be to have two models: one specialized in determining the output label (cat or dog) from the image, so perhaps some kind of simple CNN, and another that processes the text data and finds patterns in it. At the end, you could have either a non-AI evaluator that naively combines these two predictions into one, or you could feed the outputs of both models into a simple neural network that learns patterns from them.
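If each image is paired with its "bios" record, a hedged sketch of this idea is a single two-branch network built with the Keras functional API; every shape and layer size below is an illustrative assumption, not a tuned value:

```python
from tensorflow.keras import layers, Model

# Image branch: a small CNN over 64x64 RGB images (assumed size).
img_in = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# "Bios" branch: 4 numeric features (animal size, tail size, weight, height).
bio_in = layers.Input(shape=(4,))
y = layers.Dense(16, activation="relu")(bio_in)

# Combine both representations and predict cat vs. dog.
z = layers.concatenate([x, y])
z = layers.Dense(16, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid")(z)  # e.g. 1 = dog, 0 = cat

model = Model(inputs=[img_in, bio_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

This fuses the two branches inside one network rather than training two separate models, but the idea is the same: each data type gets its own feature extractor, and a small network on top learns to combine them.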
That is just one possible way to do it though and, as I said, the exact implementation will depend on a lot of other factors. How are both of the datasets labeled? Are the data connected to each other? That is, for each picture, do you have some textual data for that specific image? Or do you just have a separate dataset of pictures and a separate dataset of biological information?
There is also a consideration you'll probably want to make about the necessity of this approach. Current models can predict categories from images with super-human precision. Unless this is an exercise in creating a more complex model, it seems like overkill.
PS: I wouldn't use the term "bios" in this context; I believe it is not a very common usage, and here on SO it will mostly confuse people into thinking you mean the actual BIOS.
I am just a beginner in ML. I have gone through several websites covering the basics, and there is obviously a lot that is still unclear to me; below is one such question.
In a CNN (Convolutional Neural Network), is it required to tell the system in advance how many classes are available as output?
I was going through the URL below when this question came up.
https://www.youtube.com/watch?v=2-Ol7ZB0MmU
Yes. The final layer of the CNN depends on the quantity of output classes you have: one element for each class. A CNN is built to handle a particular problem; this includes knowing the full shapes of the input and output.
For instance, the ILSVRC image data set comes with classified images, 1000 classes in all. The topologies that learn on this data set have 1000 elements in the final layer.
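A hedged sketch of what this means in code (Keras here, with made-up layer sizes): the width of the final layer is fixed by the class count when the network is built.

```python
from tensorflow.keras import layers, models

num_classes = 1000  # e.g. ILSVRC; must be known before the net is built
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),       # the input shape is fixed too
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),  # one element per class
])
```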
Does that solve the problem?
I want to classify the input as one of 3 possibilities. Is it better to use 3 networks with one output each or 1 network with 3 outputs?
(i.e. 3 networks that each output 0 or 1, or 1 network that outputs a one-hot vector of length 3, such as [1,0,0])
Does the answer change depending on how complex the incoming data is to classify?
At what amount of outputs does it make sense to partition the networks (if ever)? For example, if I want to classify into 20 groups, does it make a difference?
I would say it would make more sense to use a single network with multiple outputs.
The main reason is that hidden layers (I'm assuming you'll have at least one hidden layer) can be interpreted as transforming the data from the original space (feature space) into a different space that is more suitable for the task (classification in your case). For example, when training a network to recognize faces from raw pixels, it might use a hidden layer to first detect simple shapes such as small lines based on pixels, then use another hidden layer to combine those lines into more complex shapes such as eyes/noses, etc. (it may not be entirely as ''clean'' as this, but this is an easy-to-understand example).
Such a transformation that a network can learn is typically useful for the classification task, regardless of what class the specific example has. For example, it is useful to be able to detect eyes in images regardless of whether or not the actual image contains a face; if you do indeed detect two eyes, you can classify it as a face, and otherwise you classify it as not being a face. In both cases, you were looking for eyes.
So, by splitting up into multiple networks, you may end up learning quite similar patterns in all networks anyway. Then you might as well have saved yourself the computational effort and just learned it once.
Another disadvantage of splitting up into multiple networks would be that you would probably cause your dataset to become imbalanced (or more imbalanced if it already is imbalanced). Suppose you have three classes, with exactly 1/3 of the dataset belonging to each class. If you use three networks for three binary classification tasks, you suddenly always have 1/3 ''1'' classes and 2/3 ''0'' classes. A network may then become biased towards predicting 0s everywhere, since those are the majority classes in each of the three separate problems.
Note that this is all based on my intuition; the best solution if you have time would be to simply try both approaches and test! I don't think I have ever seen someone using multiple networks for a single classification task in practice though, so if you only have time for one approach I'd recommend going for a single network.
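For concreteness, a minimal sketch of the single-network option in Keras; the input width and hidden size are made up, and the data here is random toy data:

```python
import numpy as np
from tensorflow.keras import layers, models, utils

model = models.Sequential([
    layers.Input(shape=(10,)),              # 10 input features, assumed
    layers.Dense(32, activation="relu"),    # one shared hidden representation
    layers.Dense(3, activation="softmax"),  # one output per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# One-hot targets, e.g. [1, 0, 0] for class 0 (random toy data here).
X = np.random.rand(100, 10)
y = utils.to_categorical(np.random.randint(0, 3, size=100), num_classes=3)
model.fit(X, y, epochs=1, verbose=0)
```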
I think the only case where it would really make sense to use multiple networks would be if you actually want to predict multiple unrelated values (or at least values that are not strongly related). For example, if, given images, you want to 1) predict whether or not there is a dog on the image, and 2) whether it is a photograph or a painting. Then it may be better to use two networks with two outputs each, instead of a single network with four outputs.
I'm using the default 'bvlc_reference_caffenet' model. I'm trying to detect a spatula. The results I'm getting are pretty satisfactory: the spatula class is always among the top 5 predicted classes, but the rest are useless random things that I'm never going to be looking for. I could add a filter at the end to remove undesirable results, but does Caffe provide this functionality on its own? Can it be told not to look for certain classes?
Yes, it does. 'bvlc_reference_caffenet' comes with a text file that defines the structure of the neural network. It is composed of an input layer, a set of hidden layers, and an output layer. If you'd like to make it the best possible spatula-finder, then you have to modify the output layer so that it distinguishes "spatula" from "rest of the world".
Mind that this requires re-training the model. In fact, it's enough to fine-tune it: take the weights of the existing model and run only a fraction of the iterations that were used to produce it. It's still going to be computationally very expensive. Also, the architecture of the hidden layers probably wouldn't be optimal.
My guess is that filtering on your own is just what you need.
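A hedged sketch of that do-it-yourself filtering with Caffe's Python interface; the file paths, the preprocessing, and the spatula class index are assumptions, so check your own deploy.prototxt and synset_words.txt:

```python
import caffe

# Load the stock 1000-class model (paths are assumptions).
net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# Classes we actually care about; 813 is 'spatula' in the usual
# ILSVRC-2012 synset ordering, but verify against your synset_words.txt.
WANTED = {813: 'spatula'}

def wanted_scores(preprocessed_image):
    # Run the unchanged network, then keep only the whitelisted classes.
    net.blobs['data'].data[...] = preprocessed_image
    probs = net.forward()['prob'][0]  # 1000-way softmax output
    return {name: float(probs[idx]) for idx, name in WANTED.items()}
```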
I want to build a model that recognizes the species based on multiple indicators. The problem is, neural networks (usually) receive vectors, and my indicators are not always easily expressed in numbers. For example, one of the indicators is not only whether the species performs some actions (that would be, say, '0' or '1', or anything in between, if the nature of the action permits that), but sometimes also the order in which those actions are performed. I want the system to be able to decide and classify species based on these indicators. There are not many classes, but rather many indicators.
The amount of training data is not an issue; I can get as much as I want.
What machine learning techniques should I consider? Maybe some special kind of neural network would do? Or maybe something completely different.
If you treat a sequence of actions as a string, then using features like "an action A was performed" is akin to a unigram model. If you want to account for the order of actions, you should add bigrams, trigrams, etc.
That will blow up your feature space, though. For example, if you have M possible actions, then there are M^2 possible bigrams (or M(M-1) if an action never immediately repeats itself). In general, there are O(M^k) possible k-grams. This leads to the following issues:
The more features you have, the harder it is to apply some methods; for example, many models suffer from the curse of dimensionality.
The more features you have, the more data you need to capture meaningful relations.
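As a concrete illustration of the unigram/bigram idea above, here is a minimal sketch with scikit-learn's CountVectorizer; the action names are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each observed individual becomes one "document" of space-separated actions.
sequences = [
    "dig hide sing dig",
    "sing sing fly",
]

# ngram_range=(1, 2) yields unigrams ("was action A performed?") plus
# bigrams ("was action A immediately followed by action B?").
vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vectorizer.fit_transform(sequences)
print(vectorizer.get_feature_names_out())
print(X.toarray())
```

Raising ngram_range to (1, 3) brings in the trigrams, and the feature count grows accordingly, which is exactly the blow-up described above.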
This is just one possible approach to your problem; there may be others. For example, if you know that there is some set of parameters ϴ that governs the action-generating process in a known (at least approximately) way, you can build a separate model to infer those first, and then use ϴ as features.
The process of coming up with a sensible numerical representation of your data is called feature engineering. Once you've done that, you can use any Machine Learning algorithm at your disposal.