Classify visually distinct objects as one class - machine-learning

We are building a neural network to classify objects and have a large dataset of images for 1000 classes. One of the classes is “banana” and it contains 1000 images of bananas. Some of those images (about 10%) are of mashed bananas, which are visually very different from the rest of the images in that class.
If we want both mashed bananas and regular bananas to be classified, should we split the banana images into two separate classes and train separately, or keep the two subsets merged?
I am trying to understand how the presence of a visually distinct subclass impacts the recognition of a given class.

The problem here is simple. You need your neural network to learn both groups of images. That means you need to back-propagate sensible error information. If you do have the ground truth information about mashed bananas, back-propagating that is definitely useful. It helps the first layers learn two sets of features.
Note that the nice thing about neural networks is that you can back-propagate any kind of error vector. If your output has 3 nodes (banana, non-mashed banana, mashed banana), you basically sidestep the binary choice implied in your question. You can always drop output nodes during inference.

There is no standard answer to be given here; it might be very hard for your network to generalize over classes if their subclasses are distinct in the feature space, in which case introducing multiple dummy classes that you collapse into a single one via post-processing would be the ideal solution. You could also pretrain a model with distinct classes (so as to build representations that discriminate between them), and then pop the final network layer (the classifier) and replace it with a collapsed classifier, fitting with the initial labels. This would accomplish having discriminating representations which are simply classified commonly. In any case I would advise you to construct the subclass-specific labels and check per-subclass error while training with the original classes; this way you will be able to quantify the prediction error you get and avoid over-engineering your network in case it can learn the task by itself without stricter supervision.
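As a rough illustration of the "dummy classes collapsed via post-processing" idea and of checking per-subclass error, here is a minimal sketch. The class names, the fine-to-coarse mapping and the probabilities are all made up for the example; the softmax outputs would come from your own network.

```python
import numpy as np

# Hypothetical fine-grained label set: indices 0 and 1 are both "banana".
FINE_CLASSES = ["banana_whole", "banana_mashed", "apple"]
COARSE_CLASSES = ["banana", "apple"]
FINE_TO_COARSE = np.array([0, 0, 1])  # fine index -> coarse index


def collapse_probabilities(fine_probs):
    """Sum fine-grained class probabilities into their parent classes."""
    fine_probs = np.asarray(fine_probs, dtype=float)
    coarse = np.zeros((fine_probs.shape[0], len(COARSE_CLASSES)))
    for fine_idx, coarse_idx in enumerate(FINE_TO_COARSE):
        coarse[:, coarse_idx] += fine_probs[:, fine_idx]
    return coarse


def per_subclass_error(coarse_pred, fine_true):
    """Error rate of the coarse predictions, reported per fine subclass."""
    coarse_pred = np.asarray(coarse_pred)
    fine_true = np.asarray(fine_true)
    coarse_true = FINE_TO_COARSE[fine_true]
    errors = {}
    for fine_idx, name in enumerate(FINE_CLASSES):
        mask = fine_true == fine_idx
        if mask.any():
            errors[name] = float((coarse_pred[mask] != coarse_true[mask]).mean())
    return errors


# Made-up softmax outputs for four images (rows sum to 1).
fine_probs = np.array([
    [0.70, 0.10, 0.20],
    [0.30, 0.30, 0.40],
    [0.10, 0.20, 0.70],
    [0.05, 0.15, 0.80],
])
fine_true = np.array([0, 1, 1, 2])  # fine-grained ground truth

coarse_probs = collapse_probabilities(fine_probs)
coarse_pred = coarse_probs.argmax(axis=1)
subclass_errors = per_subclass_error(coarse_pred, fine_true)
print(subclass_errors)
```

Note the second image: its single most likely fine class is "apple", but the two banana subclasses together outweigh it once collapsed. The per-subclass report is what tells you whether the mashed images are the ones being missed.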

Related

Balance problem for classification on Cleveland Dataset

I’ve questioned the way the famous Cleveland heart disease dataset labels its objects here
This dataset is very unbalanced (many objects of the “no disease” class). I noticed that many papers that used this dataset combined all the other classes and reduced the problem to a binary classification (disease vs no disease)
Are there other ways to deal with this class imbalance problem, rather than reducing the number of classes, to get a good result from a classifier?
Generally speaking, when a dataset is heavily imbalanced, an unsupervised (anomaly-detection) approach is worth considering.
You may use the multivariate normal distribution.
In your case, if you have many elements in one class and very few in the other, a supervised learning method may struggle to learn the rare class. Fitting a multivariate normal distribution, which is an unsupervised approach, may therefore be the solution. The algorithm learns from the data and estimates the parameters that describe it (i.e. the bulk of the data, here the "no disease" cases). Once these parameters are estimated, one can search for the elements that do not fit them; these are the so-called "abnormal elements" or "anomalies". In your case, these are the "disease" individuals.
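A minimal sketch of that idea, using only NumPy: fit a mean and covariance to the majority class, then flag points whose density falls below a threshold. The data here is synthetic, and the 1% threshold is an arbitrary assumption; on the real dataset you would tune it on labelled validation examples.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate mean and covariance from (mostly) normal examples."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, cov


def log_density(X, mu, cov):
    """Log of the multivariate normal density at each row of X."""
    d = mu.shape[0]
    diff = X - mu
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row from the mean.
    mahalanobis_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahalanobis_sq)


rng = np.random.default_rng(0)
# Synthetic stand-in for the majority ("no disease") class: 500 rows, 2 features.
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
mu, cov = fit_gaussian(normal_train)

# Assumed threshold: anything less likely than the lowest 1% of training points.
threshold = np.percentile(log_density(normal_train, mu, cov), 1)

candidates = np.array([
    [0.1, -0.2],  # close to the bulk of the data
    [6.0, 6.0],   # far away from it
])
is_anomaly = log_density(candidates, mu, cov) < threshold
print(is_anomaly)
```

This assumes the majority class is roughly Gaussian in the chosen features, which may not hold for the Cleveland data without some transformation.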
A second solution would be to balance your dataset and use the initial supervised learning algorithm. You can do that using the following techniques. These suggestions are generally good, but they depend a lot on the data you have (mind, I do not have access to your input data!), so you should test them and see which one best fits your purpose.
Collect more elements for the class with few elements.
Duplicate elements in the smaller class until it has as many elements as the larger class. This becomes a problem when the two classes differ greatly in size and you use a neural network, because the class with duplicated elements will not be very varied, and neural networks give good results only when trained on a large amount of varied data.
Use less data from the larger class, so that it has as many elements as the smaller class. Here too there might be a problem when using a neural network, because training it with less data might not give good results. Also be careful to keep more input elements than features, otherwise it would not work.
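The last two techniques (duplicating the small class, trimming the large one) can be sketched in a few lines of NumPy. The function names and the toy data are invented for the example; this is plain random resampling, not any specific library's implementation.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement (zero rows for the largest class).
        extra = idx[rng.integers(0, len(idx), size=target - count)]
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


def random_undersample(X, y, rng):
    """Drop majority-class rows at random until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False)
        for cls in classes
    ])
    return X[keep], y[keep]


rng = np.random.default_rng(0)
X = np.arange(24).reshape(12, 2)    # toy features: 12 rows, 2 columns
y = np.array([0] * 10 + [1] * 2)    # 10 "no disease" rows, 2 "disease" rows

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
print(np.bincount(y_over), np.bincount(y_under))
```

Resample only the training split; if duplicated rows leak into the test split, the evaluation will look better than it really is.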

Deep learning - Find patterns combining images and bios data

I was wondering whether it is possible to combine images and some "bios" data to find patterns. For example, if I want to know whether an image is a cat or a dog and I have:
Enough image data to train my model
Enough "bios" data like:
size of the animal
size of the tail
weight
height
Thanks!
Are you looking for a simple yes or no answer? In that case, yes. You are in complete control over building your models which includes what data you make them process and what predictions you get.
If you actually wanted to ask how to do it, it will depend on the specific datasets and application, but one way would be to have two models: one specialized for determining the output label (cat or dog) from the image, so perhaps some kind of simple CNN, and another that processes the text data and finds patterns in it. Then at the end, you could either have a non-AI evaluator that naively combines these two predictions into one, or feed the outputs of both models into a simple neural network that learns patterns from them.
That is just one way to possibly do it though and, as I said, the exact implementation will depend on a lot of other factors. How are both of the datasets labeled? Are the data connected to each other? Meaning that, for each picture, do you have some textual data that is for that specific image? Or do you just have a separate dataset of pictures and a separate dataset of biological information?
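For the naive, non-learned way of combining the two predictions, something like the following weighted average would do. It assumes the data is paired (row i of both arrays refers to the same animal) and that each model already outputs class probabilities; the function name, the weight and the numbers are made up.

```python
import numpy as np

LABELS = ["cat", "dog"]


def fuse_predictions(image_probs, tabular_probs, image_weight=0.5):
    """Weighted average of two models' class probabilities (late fusion)."""
    image_probs = np.asarray(image_probs, dtype=float)
    tabular_probs = np.asarray(tabular_probs, dtype=float)
    fused = image_weight * image_probs + (1.0 - image_weight) * tabular_probs
    # Renormalise so every row is a valid probability distribution.
    return fused / fused.sum(axis=1, keepdims=True)


# Hypothetical outputs of the image model and the size/weight model
# for the same two animals.
image_probs = np.array([[0.55, 0.45],
                        [0.90, 0.10]])
tabular_probs = np.array([[0.20, 0.80],
                          [0.60, 0.40]])

fused = fuse_predictions(image_probs, tabular_probs)
fused_labels = [LABELS[i] for i in fused.argmax(axis=1)]
print(fused_labels)
```

In the first row the image model leans slightly towards "cat", but the measurements push the combined prediction to "dog". Replacing the fixed weight with a small trained network is the learned variant described above.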
There is also the question of whether this approach is necessary at all. Current models can classify common categories such as cats and dogs very accurately from images alone. Unless this is an exercise in creating a more complex model, this seems like overkill.
PS: I wouldn't use the term "bios" in this context. I believe it is not a very common usage, and here on SO it will mostly confuse people into thinking you mean the actual BIOS.

Multiple neural networks with one output each or one with multiple outputs?

I want to classify the input as one of 3 possibilities. Is it better to use 3 networks with one output each or 1 network with 3 outputs?
(i.e. 3 networks that each output 0 or 1, or 1 network that outputs a one-hot vector of length 3 such as [1,0,0])
Does the answer change depending on how complex the incoming data is to classify?
At what amount of outputs does it make sense to partition the networks (if ever)? For example, if I want to classify into 20 groups, does it make a difference?
I would say it would make more sense to use a single network with multiple outputs.
The main reason is that hidden layers (I'm assuming you'll have at least one hidden layer) can be interpreted as transforming the data from the original space (feature space) into a different space that is more suitable for the task (classification in your case). For example, when training a network to recognize faces from raw pixels, it might use a hidden layer to first detect simple shapes such as small lines based on pixels, then use another hidden layer to detect more complex shapes such as eyes/noses based on the lines from the first layer, etc. (it may not be entirely as "clean" as this, but this is an easy-to-understand example).
Such a transformation that a network can learn is typically useful for the classification task, regardless of what class the specific example has. For example, it is useful to be able to detect eyes in images regardless of whether or not the actual image contains a face; if you do indeed detect two eyes, you can classify it as a face, and otherwise you classify it as not being a face. In both cases, you were looking for eyes.
So, by splitting up into multiple networks, you may end up learning quite similar patterns in all networks anyway. Then you might as well have saved yourself the computational effort and just learned it once.
Another disadvantage of splitting up into multiple networks would be that you would probably cause your dataset to become imbalanced (or more imbalanced if it already is imbalanced). Suppose you have three classes, with exactly 1/3 of the dataset belonging to each class. If you use three networks for three binary classification tasks, each task suddenly has 1/3 "1" labels and 2/3 "0" labels. A network may then become biased towards predicting 0 everywhere, since that is the majority class in each of the three separate problems.
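The arithmetic is easy to check on a toy label vector (300 made-up labels, 100 per class):

```python
import numpy as np

# Perfectly balanced 3-class problem: 100 examples of each class.
labels = np.repeat([0, 1, 2], 100)

positive_fractions = []
for cls in range(3):
    # One-vs-rest target for the network dedicated to this class.
    binary_target = (labels == cls).astype(int)
    positive_fractions.append(float(binary_target.mean()))

# Accuracy of a degenerate network that always predicts 0 on each binary task.
always_zero_accuracy = [1.0 - frac for frac in positive_fractions]
print(positive_fractions, always_zero_accuracy)
```

Each binary task ends up with one third positives, so a network that never predicts "1" already scores about 67% accuracy on it, which is why the bias towards 0 is tempting for the optimizer.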
Note that this is all based on my intuition; the best solution if you have time would be to simply try both approaches and test! I don't think I have ever seen someone using multiple networks for a single classification task in practice though, so if you only have time for one approach I'd recommend going for a single network.
I think the only case where it would really make sense to use multiple networks would be if you actually want to predict multiple unrelated values (or at least values that are not strongly related). For example, if, given images, you want to 1) predict whether or not there is a dog in the image, and 2) predict whether it is a photograph or a painting, then it may be better to use two networks with two outputs each, instead of a single network with four outputs.

Can a neural network be trained while it changes in size?

Are there known methods of continuous training and graceful degradation of a neural net while it shrinks or grows in size (by number of nodes, connections, whatever)?
To the best of my memory, everything I've read about neural networks is from a static perspective. You define the net and then train it.
If there is some neural network X with N nodes (neurons, whatever), is it possible to train the network (X) so that while N increases or decreases, the network is still useful and capable of performing?
In general, changing the network architecture (adding new layers, adding more neurons to existing layers) after the network has been trained makes sense and is a rather common operation in deep learning. One example is dropout: during training, a random fraction of the neurons (commonly half) is switched off completely, and only the remaining neurons participate in that training iteration (each iteration, i.e. each mini-batch, uses a different random set of switched-off neurons). Another example is transfer learning, where you train a network on one set of input data, cut off some of the output-side layers, replace them with new layers, and retrain the model on another dataset.
To better explain why it makes sense, let's step back for a moment. In deep networks, where you have lots of hidden layers, each layer learns some abstraction from the incoming data. Each additional layer uses the abstract representations learned by the previous layer and builds upon them, combining such abstractions to form a higher-level representation of the data. For instance, you could be trying to classify images with a DNN. The first layer will learn rather simple concepts from images, like edges or points in the data. The next layer could combine these simple concepts to learn primitives, like triangles, circles or squares. The next layer could take it further and combine these primitives to represent objects which you could find in images, like 'a car' or 'a house', and using softmax the network calculates the probabilities of the answer you are looking for (what to actually output). I need to mention that these learned representations can actually be checked. You could visualize the activations of your hidden layers and see what they learned. For example, this was done in Google's 'Inceptionism' project. With that in mind, let's get back to what I mentioned earlier.
Dropout is used to improve the generalization of the network. It forces each neuron to 'not be so sure' that some pieces of information from the previous layer will be available, and makes it try to learn representations that rely on less favorable and less informative pieces of abstraction from the previous layer. It forces the neuron to consider all of the representations from the previous layer when making decisions, instead of putting all of its weight on the couple of neurons it 'likes most of all'. By doing this, the network is usually better prepared for new data where the input differs from the training set.
Q: "As far as you're aware is the quality of the stored knowledge (whatever training has done to the net) still usable following the dropout? Maybe random halves could be substituted by random 10ths with a single 10th dropping, that might result in less knowledge loss during the transition period."
A: Unfortunately I can't properly explain why precisely half of the neurons are switched off and not 10% (or any other fraction). Maybe there is an explanation, but I haven't seen it. In general it just works and that's it.
Also I need to mention that the purpose of dropout is to ensure that each neuron doesn't rely on just a few of the neurons from the previous layer and is ready to make a decision even if the neurons which usually helped it make the correct decision are not available. This is used for generalization only and helps the network cope better with data it hasn't seen previously; nothing else is achieved with dropout.
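For reference, here is a small NumPy sketch of dropout with a configurable drop fraction, so that "half" and "a tenth" are just two settings of the same parameter. It uses the "inverted" variant (survivors are rescaled during training so nothing changes at inference), which is one common formulation and an assumption on my part, not something stated above.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations  # the full network is used at inference time
    keep_prob = 1.0 - rate
    # A fresh random mask is drawn on every call, i.e. on every training step.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
hidden = np.ones((1000, 100))  # stand-in for one layer's activations

half_dropped = dropout(hidden, rate=0.5, rng=rng)
tenth_dropped = dropout(hidden, rate=0.1, rng=rng)
at_inference = dropout(hidden, rate=0.5, rng=rng, training=False)

print((half_dropped == 0).mean(), (tenth_dropped == 0).mean())
```

Because the survivors are scaled by 1 / keep_prob, the expected activation stays the same for any rate, so trying 10% instead of 50% is a matter of changing one argument and comparing validation results.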
Now let's consider transfer learning again. Suppose you have a network with 4 layers. You train it to recognize specific objects in pictures (cat, dog, table, car, etc.). Then you cut off the last layer, replace it with three additional layers, and train the resulting 6-layer network on a dataset which, for instance, contains short sentences describing what is shown in each image ('a cat is on the car', 'house with windows and tree nearby', etc.). What did we achieve with this operation? Our original 4-layer network was able to tell whether some specific object is in the image we feed it. Its first 3 layers learned good representations of the images: the first layer learned about possible edges, points or some extremely primitive geometric shapes in images. The second layer learned some more elaborate geometric figures like 'circle' or 'square'. The third layer knows how to combine them to form higher-level objects such as 'car', 'cat', 'house'. Now we can simply re-use this good representation, which was learned in a different domain, and add several more layers. Each of them will use abstractions from the last (3rd) layer of the original network and learn how to combine them to create meaningful descriptions of images. While you train on the new dataset, with images as input and sentences as output, the first 3 layers taken from the original network will be adjusted too, but these adjustments will be mostly minor, while the 3 new layers will be adjusted significantly. What we achieve with transfer learning is:
1) We can learn much better data representations. We could create a network which is very good at a specific task and then build upon that network to perform something different.
2) We can save training time: the first layers of the network will already be trained well enough that the layers closer to the output receive rather good data representations from the start. So training should finish much faster with pre-trained first layers.
So the bottom line is that pre-training a network and then re-using part or all of it in another network makes perfect sense and is not uncommon.
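Structurally, the 4-layer to 6-layer example amounts to keeping three weight matrices and appending three new ones. The toy sketch below shows only that surgery: the layer sizes are arbitrary, biases are omitted, and the "pretrained" weights are random stand-ins, since no actual training is performed here.

```python
import numpy as np


def init_layer(n_in, n_out, rng):
    """Random weight matrix (He-style scaling); stands in for a trained layer."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))


def forward(x, layers):
    """ReLU after every layer except the last, which returns raw scores."""
    for weights in layers[:-1]:
        x = np.maximum(x @ weights, 0.0)
    return x @ layers[-1]


rng = np.random.default_rng(0)

# Pretend this 4-layer network was already trained on object classification.
pretrained = [
    init_layer(64, 32, rng),
    init_layer(32, 16, rng),
    init_layer(16, 8, rng),
    init_layer(8, 4, rng),   # old classifier layer, about to be cut off
]

# Keep the first three layers, drop the classifier, append three fresh layers.
new_layers = [
    init_layer(8, 16, rng),
    init_layer(16, 16, rng),
    init_layer(16, 10, rng),
]
transferred = pretrained[:-1] + new_layers

batch = rng.normal(size=(5, 64))  # five made-up input vectors
outputs = forward(batch, transferred)
print(len(transferred), outputs.shape)
```

When training the new network, a common choice is to use a smaller learning rate for the re-used layers (or to freeze them at first), which matches the remark above that their adjustments should stay minor.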
This is something I have seen in the likes of this video...
https://youtu.be/qv6UVOQ0F44
There are links to further resources in the video description.
It is based on a process called NEAT (NeuroEvolution of Augmenting Topologies).
It uses a genetic algorithm and evolutionary process to design and evolve a neural net from scratch with no prior assumptions of structure or complexity of the neural net.
I believe this is what you are looking for.

Is there any algorithm good at picking out a special category?

When I look at machine learning, especially classification, I find that some algorithms, for example decision trees, are designed to classify without taking into account the consideration described next:
For a problem with two categories, A and B, people may be interested in one special category, for example A. Assume that we have 100 samples of A and 1000 of B. A good classifier may produce one part that mixes 100 A and 100 B, and another part with 900 B. This is good as a classification. But is there an algorithm that can put, for example, 50 A and 5 B into one part and leave 50 A and 995 B in the other? This may not be as good from a classification point of view, but if someone is interested in category A, I think the second algorithm is better because it gives a purer A result.
In short: is there an algorithm that can purify a special category, rather than classify all categories without bias?
It would be even better if scikit-learn already includes such an algorithm.
Look into a matching algorithm such as the "Stable Marriage Problem."
https://en.wikipedia.org/wiki/Stable_marriage_problem
If I understand you correctly, I think you're asking for a machine learning algorithm that gives a higher weight to certain classes and is therefore proportionally more likely to predict those "special" classes.
If that's what you're asking, you could use any algorithm that outputs a probability of each class during prediction. I think most algorithms take that approach actually, but I know specifically that neural nets do. Then, you can either train the network on proportionally more data on the "special" classes, or manually post-process the prediction output (the array of probabilities of each class) to adapt the probabilities to your specification.
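The post-processing route can be as simple as raising the probability threshold for the "special" class. The sketch below uses made-up probabilities; in scikit-learn you would typically obtain such an array from a classifier's `predict_proba` method. The function names and thresholds are mine, chosen for illustration.

```python
import numpy as np


def select_special(probs, special_index, threshold):
    """Pick only the samples whose probability for the special class is high enough."""
    return probs[:, special_index] >= threshold


def purity(selected, y_true, special_index):
    """Fraction of the picked samples that really belong to the special class."""
    if not selected.any():
        return float("nan")
    return float((y_true[selected] == special_index).mean())


# Made-up predicted probabilities of class A (index 0) for eight samples.
p_a = np.array([0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10])
probs = np.column_stack([p_a, 1.0 - p_a])
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # 0 = A, 1 = B

default_pick = select_special(probs, special_index=0, threshold=0.5)
strict_pick = select_special(probs, special_index=0, threshold=0.75)

purity_default = purity(default_pick, y_true, special_index=0)
purity_strict = purity(strict_pick, y_true, special_index=0)
print(default_pick.sum(), purity_default, strict_pick.sum(), purity_strict)
```

With the stricter threshold the picked group is smaller but contains only A, which is the "50 A and 5 B" kind of trade-off from the question: purity goes up while some A samples are left in the other group.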
