Recognition of images with additional data - machine-learning

Good morning everyone, first I would like to make it clear that I began to take my first steps in machine learning yesterday.
I've read most basic items and attended some presentations.
I will participate in a project here a few months that this technology will be applied.
As a beginner I would like to ask a question that I think is silly, but I could not find answers for her.
In presentations and articles, I have seen the creation of a classifier that can classify images or data sets, but never both at the same time.
For example, Iris flower data set, which is used as an example. In this data set we have the characteristics of flowers, such as petal width, but we do not have a visual representation of it. It is possible to fit both and for example, to estimate the width of the petal of a certain image?
I imagine this is a very basic question, but I could not find something suitable for a beginner.
I would be very grateful.

Machine learning models always work on some abstract data items like vectors, points in multidimensional spaces etc. For the simplicity, let us assume for a moment that ML algorithms work on vectors. Classification therefore would be a task of assigning a label Y to a vector X(n).
Now with a data set conversion of values in a row into a vector is relatively easy - well, you have to somehow convert texts onto numbers or vice versa, but it is a standard procedure.
With images it is different. You have to now build a ML-suitable representation of an image. In other words you need to create features (e.g. numerical) describing the image, that you can later use as inputs to your ML.
Examples of such features are: colour histograms, average brightness, number of edges, various convolutions etc. There can be more complicated, semantic features like the presence of a human on the picture. Calculating these however is much more difficult.
So summing up - you can build a classifier on both the image and dataset, but it basically means transforming both into a set of features.

Related

Whether Data augmentation really needed in Machine Learning [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I am interested in knowing the importance of data augmentation(rotation at various angles, flipping the images) while providing a dataset to a Machine Learning problem.
Whether it is really needed? Or the CNN networks using will handle that as well no matter how different the data are transformed?
So I took a classification task with 2 classes to conclude some results
Arrow shapes
Circle shapes
The idea is to train the shapes with only one orientation(I have taken arrows pointing right) and check the model with a different orientation(I have taken arrows pointing downwards) which is not at all given during the training stage.
Some of the samples used in Training
Some of the samples used in Testing
This is the entire dataset I am using in for creating a tensorflow model.
https://bitbucket.org/akhileshmalviya/samples/src/bab50b85d826?at=master
I am wondering with the results I got,
(i) Except a few downward arrows all others are getting predicted correctly as arrow. Does it mean data augmentation is not at all needed?
(ii) Or is this the right use case I have taken to understand the importance of data augmentation?
Kindly share your thoughts, Any help could be really appreciated!
Data augmentation is a data-depended process.
In general, you need it when your training data is complex and you have a few samples.
A neural network can easily learn to extract simple patterns like arcs or straight lines and these patterns are enough to classify your data.
In your case data augmentation can barely help, the features the network will learn to extract are easy and highly different from each other.
When you, instead, have to deal with complex structures (cats, dogs, airplanes, ...) you can't rely on simple features like edges, arcs, etc..
Instead, you have to show to your network that the instances you're trying to classify got an high variance and that the features extracted can be combined in a lot of different ways for the same subject.
Think about a cat: it can be of any color, the picture can be taken in different light conditions, its whole body can be in any position, the picture could be taken with a certain orientation...
To correctly classify instances so different, the network must learn to extract robust features that could be learned only after seeing a lot of different inputs.
In your case, instead, simple features can completely discriminate your input, thus any sort of data augmentation could help by just a little bit.
The task you are solving can be easily solved without any NN and even without machine learning.
Just because the problem is so simple it does not really matter whether you do a data augmentation or not. The need for data augmentation is task specific and depends on many things:
how easy is to augment the data with preserving the ability to correctly mark the class. For image, sounds which we used to see/hear it is not a problem (we know that adding small noise to the sound does not change the meaning, rotating the lizard is still a lizard). For other things augmenting without preserving the class/value is hard (for example in Go, randomly adding a stone can change the value of the position dramatically)
does the augmented data is drawn from the same distribution you care about. Adding random stones to Go does not work, but rotating flipping the board works and preserves distribution. But for example in a racing king game (variant of chess) it will not help. You can't flip the position (left <-> right), the evaluation stays the same, but it will never happen in real game and therefore drawn from different distribution and useless
how much data do you have and how expressive is your model. The more parameters you model have, the bigger the chance of overfitting and the more is your need for data. If you train a linear regression in n dims, you will have n + 1 params. You do not really need to augment this. Also if you already have 10bln data points, the augmentation is probably will not be helpful.
how expensive the augmentation procedure. For rotating/scaling the image it is very cheap, but for other augmentation it can be computationally expensive
something else that I forgot.

Relation between features for classification and clustering

I'm a newbie for machine learning, and I have following question. Suppose that I have implemented a classification algorithm on some data, and recognized the best combination of features for the classification algorithm. If someday I get data from same resource, which lack the target feature in previous classification task, Can I use the best combination of features for classification directly to clustering task? (I know I can use the model I trained to predict the target of data, but I just want to know whether the best combination of features is same between classification and clustering algorithms)
I have searched websites and any resource I know, but I can't find the answer for my question, Could somebody tell me or just give me a link? Thanks!
I would say yes, provided the nature of the target is the same in both cases. What we want ideally is a tractable number of features which are orthogonal (perpendicular) to each other in N space, so that each can contribute maximally to the prediction.
Take a concrete example, that of T shirts and whether they are Large size or Small size. You are given data which shows that in the manufacturing process there is a bit of material shrinkage which means the T shirts come out a bit irregular, and the shrinkage varies between the height and width, but not much. The data shows height, width and colour and you want to decide if they are in the large group or the small. You find that the height and width are important but the colour is not, so you decide to go with the height and width as your classification features.
The important point is that these two features have been identified as the most orthogonal to each other, which should apply in a classification or clustering context. The number of clusters remains a factor to be examined.
It may not be good enough.
For example a decision tree or random forest can be analyzed to get the importance of features. But this will not tell you what kind of preprocessing (in particular scaling and weighting) is necessary to be able to cluster them (in particular, categorical features are difficult to use, anything that is not continuous or that is skewed is hard).
Furthermore, data tends to change over time. Features that were important once (e.g. Facebook likes) are useless now.

Interpreting a Self Organizing Map

I have been doing reading about Self Organizing Maps, and I understand the Algorithm(I think), however something still eludes me.
How do you interpret the trained network?
How would you then actually use it for say, a classification task(once you have done the clustering with your training data)?
All of the material I seem to find(printed and digital) focuses on the training of the Algorithm. I believe I may be missing something crucial.
Regards
SOMs are mainly a dimensionality reduction algorithm, not a classification tool. They are used for the dimensionality reduction just like PCA and similar methods (as once trained, you can check which neuron is activated by your input and use this neuron's position as the value), the only actual difference is their ability to preserve a given topology of output representation.
So what is SOM actually producing is a mapping from your input space X to the reduced space Y (the most common is a 2d lattice, making Y a 2 dimensional space). To perform actual classification you should transform your data through this mapping, and run some other, classificational model (SVM, Neural Network, Decision Tree, etc.).
In other words - SOMs are used for finding other representation of the data. Representation, which is easy for further analyzis by humans (as it is mostly 2dimensional and can be plotted), and very easy for any further classification models. This is a great method of visualizing highly dimensional data, analyzing "what is going on", how are some classes grouped geometricaly, etc.. But they should not be confused with other neural models like artificial neural networks or even growing neural gas (which is a very similar concept, yet giving a direct data clustering) as they serve a different purpose.
Of course one can use SOMs directly for the classification, but this is a modification of the original idea, which requires other data representation, and in general, it does not work that well as using some other classifier on top of it.
EDIT
There are at least few ways of visualizing the trained SOM:
one can render the SOM's neurons as points in the input space, with edges connecting the topologicaly close ones (this is possible only if the input space has small number of dimensions, like 2-3)
display data classes on the SOM's topology - if your data is labeled with some numbers {1,..k}, we can bind some k colors to them, for binary case let us consider blue and red. Next, for each data point we calculate its corresponding neuron in the SOM and add this label's color to the neuron. Once all data have been processed, we plot the SOM's neurons, each with its original position in the topology, with the color being some agregate (eg. mean) of colors assigned to it. This approach, if we use some simple topology like 2d grid, gives us a nice low-dimensional representation of data. In the following image, subimages from the third one to the end are the results of such visualization, where red color means label 1("yes" answer) andbluemeans label2` ("no" answer)
onc can also visualize the inter-neuron distances by calculating how far away are each connected neurons and plotting it on the SOM's map (second subimage in the above visualization)
one can cluster the neuron's positions with some clustering algorithm (like K-means) and visualize the clusters ids as colors (first subimage)

find mosquitos' head in the image

I have images of mosquitos similar to these ones and I would like to automatically circle around the head of each mosquito in the images. They are obviously in different orientations and there are random number of them in different images. some error is fine. Any ideas of algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to recreate your training set. For this you would like to extract small images with examples of what is a mosquito head or what is not.
Then you can use those images to train a classification algorithm, be careful to have a balanced training set, since if your data is skewed to one class it would hit the performance of the algorithm. Since images are 2D and algorithms usually just take 1D arrays as input, you will need to arrange your images to that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could make the trick too. If you decide to use support vector machines I strongly recommend you to check libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. Also they have a very easy to follow guide targeted to beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, you should be able to avoid tolerance to orientation. If you don't have enough data, then you could create more training rows with some samples rotated, so you would have a more representative training set.
As for the prediction what you could do is given an image, cut it using a grid where each cell has the same dimension that the ones you used on your training set. Then you pass each of this image to the classifier and mark those squares where the classifier gave you a positive output. If you really need circles then take the center of the given square and the radius would be the half of the square side size (sorry for stating the obvious).
So after you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others) , since we are not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos in the same scale, we still might miss some of them just because they didn't fit in our grid perfectly. To address this, we will need to repeat this procedure (grid cut and predict) rescaling the given image to different sizes. How many sizes? well here you would have to determine that through experimentation.
This approach is sensitive to the size of the "window" that you are using, that is also something I would recommend you to experiment with.
There are some research may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white unmarked underground, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example here i did a quick try taking the red channel, then substracting the blue channel*5, then applying a threshold of 80:
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be moquitoes by Connected Component Labeling. If a blob is large enough to be two mosquitoes, cut it out, and apply some more dilation/erosion to it.
Once you have a single blob like this
you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is the thickest.

Face Recognition Logic

I want to develop an application in which user input an image (of a person), a system should be able to identify face from an image of a person. System also works if there are more than one persons in an image.
I need a logic, I dont have any idea how can work on image pixel data in such a manner that it identifies person faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting into the new vector space, and computing the linear distance between it and other projected face vectors.
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
Have a look at the Face Recognition Hompage - there are algorithms, papers, and even some source code.
There are many many different alghorithms out there. Basically what you are looking for is "computer vision". We had made a project in university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved so be prepared. First go to wikipedia. Then you will want to search for pdf publications of specific algorithms.
You can go a hard way - write an implementaion of all alghorithms by yourself. Or easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work. So be brave. A lot harder is to make a software that will work under different and constantly varying conditions. And that is where google won't help you. But I suppose you don't want to go that deep.

Resources