Deep learning - Find patterns combining images and "bios" data

I was wondering whether it is possible to combine images with some "bios" data to find patterns. For example, say I want to know whether an image shows a cat or a dog, and I have:
Enough image data to train my model
Enough "bios" data, such as:
size of the animal
size of the tail
weight
height
Thanks!

Are you looking for a simple yes-or-no answer? In that case, yes. You are in complete control over building your models, which includes what data you make them process and what predictions you get.
If you actually wanted to ask how to do it, that will depend on the specific datasets and application, but one way would be to have two models: one specialized in determining the output label (cat or dog) from the image, so perhaps a simple CNN, and another that processes the tabular data and finds patterns in it. Then at the end, you could either have a non-AI evaluator that naively combines the two predictions into one, or feed the outputs of both models into a simple neural network that learns a pattern from them (see the sketch below).
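For illustration only, here is a minimal sketch of that two-branch idea in Keras; the input sizes, layer widths, and the four tabular features are placeholders, not recommendations.

```python
# One CNN branch for the image, one dense branch for the tabular
# "bios" data, merged into a single classifier head.
from tensorflow.keras import layers, Model

# Image branch: a small CNN.
img_in = layers.Input(shape=(128, 128, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Tabular branch: e.g. animal size, tail size, weight, height.
bio_in = layers.Input(shape=(4,), name="bio")
y = layers.Dense(16, activation="relu")(bio_in)

# Merge both branches and predict cat (0) vs. dog (1).
merged = layers.concatenate([x, y])
out = layers.Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[img_in, bio_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit({"image": images, "bio": bios}, labels, epochs=10)
```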
That is just one way to possibly do it, though, and, as I said, the exact implementation will depend on a lot of other factors. How are both of the datasets labeled? Are the data connected to each other? That is, for each picture, do you have some textual data for that specific image? Or do you just have a separate dataset of pictures and a separate dataset of biological information?
There is also a consideration you'll probably want to make about whether this approach is necessary at all. Current models can predict categories from images with super-human precision, so unless this is an exercise in creating a more complex model, it seems like overkill.
PS: I wouldn't use the term "bios" in this context; I believe it is not a common usage, and here on SO it will mostly confuse people into thinking you mean the actual BIOS.

Related

Adding vocabulary and improving word embeddings with another model that was built on a bigger corpus

I'm new to NLP. I'm currently building an NLP system in a specific domain. After training word2vec and fastText models on my documents, I found that the embeddings are not very good because I didn't feed in enough documents (e.g. the embedding can't see that "bar" and "pub" are strongly correlated, because "pub" appears only a few times in the documents). Later, I found a word2vec model online built on that domain-specific corpus, which definitely has much better embeddings (so "pub" is more related to "bar"). Is there any way to improve my word embedding using the model I found? Thanks!
Word2Vec (and similar) models really require a large volume of varied data to create strong vectors.
But also, a model's vectors are typically only meaningful alongside other vectors that were trained together in the same session. This is both because the process includes some randomness and because the vectors only acquire their useful positions via a tug-of-war with all the other vectors and aspects of the model-in-training.
So, there's no standard location for a word like 'bar' - just a good position, within a certain model, given the training data and model parameters and other words co-populating the model.
This means mixing vectors from different models is non-trivial. There are ways to learn a 'translation' that moves vectors from the space of one model to another – but that is itself a lot like a re-training. You can pre-initialize a model with vectors from elsewhere... but as soon as training starts, all the words in your training corpus will start drifting into the best alignment for that data, and gradually away from their original positions, and away from pure comparability with other words that aren't being updated.
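For intuition, the simplest linear version of such a 'translation' can be written in a few lines of numpy; the toy vocabularies below are placeholders standing in for the two models' vectors.

```python
import numpy as np

# Toy placeholder vocabularies; in practice these come from your two models.
rng = np.random.default_rng(0)
small_vecs = {w: rng.normal(size=50) for w in ["bar", "pub", "cafe"]}
big_vecs = {w: rng.normal(size=100) for w in ["bar", "pub", "cafe", "tavern"]}

shared = [w for w in small_vecs if w in big_vecs]
X = np.stack([small_vecs[w] for w in shared])  # source space
Y = np.stack([big_vecs[w] for w in shared])    # target space

# Least-squares solution of X @ W ~= Y: a linear 'translation' matrix.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(word):
    """Project a word from the small model's space into the big model's."""
    return small_vecs[word] @ W
```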
In my opinion, the best approach is usually to expand your corpus with more appropriate data, so that it has "enough" examples of every word important to you, in sufficiently varied contexts.
Many people use large free text dumps like Wikipedia articles for word-vector training, but be aware that its style of writing – dry, authoritative reference texts – may not be optimal for all domains. If your problem-area is "business reviews", you'd probably do best finding other review texts. If it's fiction stories, more fictional writing. And so forth. You can shuffle these other text sources in with your data to expand the vocabulary coverage.
You can also potentially shuffle in extra repeated examples of your own local data, if you want it to effectively have relatively more influence. (Generally, merely repeating a small number of non-varied examples can't help improve word-vectors: it's the subtle contrasts of different examples that helps. But as a way to incrementally boost the influence of some examples, when there are plenty of examples overall, it can make more sense.)
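A minimal sketch of that corpus-mixing idea, assuming gensim 4.x; the toy sentences below stand in for real tokenized corpora.

```python
import random
from gensim.models import Word2Vec

# Toy stand-ins; in practice these are lists of tokenized sentences.
domain_sentences = [["the", "pub", "was", "busy"],
                    ["nice", "bar", "downtown"]]
extra_sentences = [["the", "bar", "served", "beer"],
                   ["a", "pub", "with", "ale"]]

# Repeat the local data to boost its relative influence, then shuffle
# it together with the extra texts.
mixed = domain_sentences * 3 + extra_sentences
random.shuffle(mixed)

model = Word2Vec(sentences=mixed, vector_size=50, window=5,
                 min_count=1, workers=2, epochs=5)
print(model.wv.most_similar("pub"))
```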

Supervised learning and the number of predictable classes

I am just a beginner in ML. I have gone through several websites for the basics, and a lot of things are obviously still unclear to me; below is one of them.
In a CNN (Convolutional Neural Network), is it required to tell the system in advance how many classes are available as output?
I was going through the URL below, and this question came up.
https://www.youtube.com/watch?v=2-Ol7ZB0MmU
Yes. The final layer of the CNN depends on the number of output classes you have: one element for each class. A CNN is built to handle a particular problem; this includes knowing the full shapes of the input and output.
For instance, the ILSVRC image data set comes with classified images, 1000 classes in all. The topologies that learn on this data set have 1000 elements in the final layer.
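As a hedged illustration (not the exact topology from the video), a minimal Keras model makes the dependency explicit: the width of the last layer is the class count.

```python
# The output layer's width equals the number of classes; everything
# else here is an arbitrary small placeholder architecture.
from tensorflow.keras import layers, models

n_classes = 1000  # e.g. ILSVRC

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(n_classes, activation="softmax"),  # one unit per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```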
Does that solve the problem?

Do you have any suggestions for a Machine Learning method that may actually learn to distinguish these two classes?

I have a dataset in which the classes overlap a lot. So far my results with SVM are not good. Do you have any recommendations for a model that may be able to differentiate between these 2 classes?
[Scatter plot of both classes]
It is easy to fit the dataset by interpolating one of the classes and predicting the other one otherwise. The problem with this approach, though, is that it will not generalize well. The question you have to ask yourself is whether you can predict the class of a point given its attributes. If you cannot, then every ML algorithm will also fail to do so.
Then the only reasonable thing you can do is to collect more data and more attributes for every point. Maybe by adding a third dimension you can separate the data more easily.
If the data overlap this much, the points would appear to belong to the same class, but we know they do not. So there must be some feature or variable that separates these data points into two classes. Try to add more features to the data.
And sometimes, just transforming the data into a different scale can help.
The two classes need not be equally distributed; a skewed class distribution can be handled separately.
First of all, what is your criterion for "good results"? What kind of SVM did you use? A simple linear one will certainly fail for most notions of "good", but a seriously convoluted Gaussian kernel might dredge something out of the handfuls of contiguous points in the upper regions of the plot.
I suggest that you run some basic statistics on the data you've presented, to see whether the classes are actually as separable as you'd want; a two-sample t-test is a good starting point.
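For instance, a quick per-feature check might look like the following sketch; the data here are random placeholders, so only the shape of the computation matters.

```python
# Two-sample t-test per feature, as a rough separability check.
# X is (n_samples, n_features); y holds 0/1 class labels.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))       # placeholder features
y = rng.integers(0, 2, size=200)    # placeholder labels

for j in range(X.shape[1]):
    t, p = ttest_ind(X[y == 0, j], X[y == 1, j], equal_var=False)
    print(f"feature {j}: t={t:.2f}, p={p:.3f}")
```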
If you have other dimensions, I strongly recommend that you use them. Start with the greatest amount of input you can handle, and reduce from there (principal component analysis). Until we know the full shape and distribution of the data, there's not much hope of identifying a useful algorithm.
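And a minimal sketch of the reduce-from-everything step with scikit-learn's PCA, again on placeholder data:

```python
# Fit PCA on all available features and inspect the explained variance,
# to decide how many components are worth keeping.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # placeholder: every feature you have

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # small values suggest droppable dims
```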
That said, I'll make a pre-emptive suggestion that you look into spectral clustering algorithms when you add the other dimensions. Some are good with density, some with connectivity, while others key on gaps.

Is there a way to limit a trained Caffe model to just a set of classes?

I'm using the default 'bvlc_reference_caffenet' model. I'm trying to detect a spatula. The results I'm getting are pretty satisfactory: the spatula class is always among the top 5 predicted classes, but the rest are useless random things that I'm never going to be looking for. I could add a filter at the end to remove undesirable results, but does Caffe provide this functionality on its own? Can I tell it not to look for those classes?
Yes, it does. 'bvlc_reference_caffenet' comes with a text file that defines the structure of the neural network. It is composed of an input layer, a set of hidden layers, and an output layer. If you'd like to make it the best possible spatula-finder, you have to modify the output layer so it distinguishes "spatula" from "rest of the world".
Mind that this requires re-training the model. In fact it's enough to just refine it: take the weights of the existing model and run only a fraction of the iterations that were used to produce it. That is still going to be computationally very expensive. Also, the architecture of the hidden layers would probably no longer be optimal.
My guess is that filtering on your own is just what you need.
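For what it's worth, such a filter is only a few lines. This is a hedged sketch assuming the standard pycaffe deploy setup, where the softmax output blob of bvlc_reference_caffenet is named 'prob'; the file paths, the preprocessed `image`, and the class index are all placeholders.

```python
import numpy as np
import caffe

net = caffe.Net('deploy.prototxt',                     # placeholder path
                'bvlc_reference_caffenet.caffemodel',  # placeholder path
                caffe.TEST)

WANTED = {813}  # placeholder ImageNet class indices you care about

# `image` is assumed already preprocessed to the net's input shape.
net.blobs['data'].data[...] = image
probs = net.forward()['prob'][0].copy()

# Zero out everything outside the whitelist, then renormalize.
filtered = np.where(np.isin(np.arange(len(probs)), list(WANTED)),
                    probs, 0.0)
if filtered.sum() > 0:
    filtered /= filtered.sum()

print("best wanted class:", filtered.argmax(), filtered.max())
```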

Which machine learning model is applicable to the following case

I want to build a model that recognizes the species based on multiple indicators. The problem is that neural networks (usually) receive vectors, and my indicators are not always easily expressed in numbers. For example, one of the indicators is not only whether a species performs some actions (that would be, say, '0' or '1', or anything in between, if the nature of the action permits that), but sometimes also the order in which those actions are performed. I want the system to be able to decide and classify species based on these indicators. There are not many classes, but rather many indicators.
The amount of training data is not an issue, I can get as much as I want.
What machine learning techniques should I consider? Maybe some special kind of neural network would do? Or maybe something completely different.
If you treat a sequence of actions as a string, then using features like "action A was performed" is akin to a unigram model. If you want to account for the order of actions, you should add bigrams, trigrams, etc. (see the sketch after the list below).
That will blow up your feature space, though. For example, if you have M possible actions, there are up to M² bigrams; in general, there are O(M^k) k-grams. This leads to the following issues:
The more features you have, the harder it is to apply some methods. For example, many models suffer from the curse of dimensionality.
The more features you have, the more data you need to capture meaningful relations.
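To make the unigram/bigram idea concrete, here is a sketch using scikit-learn's CountVectorizer; the action names are made up, and each sequence of action tokens is joined into a string so the vectorizer can tokenize it.

```python
# Turn action sequences into unigram + bigram count features.
from sklearn.feature_extraction.text import CountVectorizer

sequences = [
    ["dig", "run", "hide"],   # made-up actions for one observation
    ["run", "dig", "dig"],
]
docs = [" ".join(seq) for seq in sequences]

vec = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vec.fit_transform(docs)                # (n_samples, n_ngrams) counts
print(vec.get_feature_names_out())
```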
This is just one possible approach to your problem. There may be others. For example, if you know that there's some set of parameters ϴ that governs the action-generating process in a known (at least approximately) way, you can build a separate model to infer those parameters first, and then use ϴ as features.
The process of coming up with sensible numerical representation of your data is called feature engineering. Once you've done that, you can use any Machine Learning algorithm at your disposal.
