SVM or ANN methods search for a surface that separates the data points as well as possible, and this surface is returned in vector or parametric form. Are there methods that instead return a spatial bitmap, where each voxel contains a numeric value defining the class of all points lying within that voxel?
Since I'm relatively new to machine learning, I can't be sure whether this has been done before, or whether there are reasons that make the approach worthless for real data. I tried it using an adaptive grid. The images below were obtained by representing each data point by an RBF and calculating the influence of all such RBFs on each voxel.
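To make the idea concrete, below is a minimal 2D sketch of this voxel labelling, with my adaptive grid replaced by a regular one for brevity; the Gaussian RBF and the kernel width `gamma` are arbitrary choices:

```python
import numpy as np

def rbf_voxel_map(points, labels, grid_min, grid_max, resolution=50, gamma=10.0):
    """Label each voxel of a regular 2D grid with the class whose summed
    Gaussian-RBF influence at the voxel centre is largest."""
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(grid_min, grid_max)]
    xx, yy = np.meshgrid(*axes)
    centres = np.stack([xx.ravel(), yy.ravel()], axis=1)    # voxel centres

    classes = np.unique(labels)
    influence = np.zeros((len(classes), len(centres)))
    for k, c in enumerate(classes):
        diff = centres[:, None, :] - points[labels == c][None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        influence[k] = np.exp(-gamma * sq_dist).sum(axis=1)  # sum of RBFs

    # each voxel stores the class with the largest total influence
    return classes[np.argmax(influence, axis=0)].reshape(resolution, resolution)
```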
Tests with a dummy dataset.
Test with the Zip Code dataset: RBF view (f = 225); voxel views at f = 10, 20, 30, 50, and 100.
I am trying to detect potatoes using deep learning, with a sliding-window approach to locate them. Since the CNN is not affected by the orientation of the objects it was trained on, I don't see any problem in training the model, but when it comes to detection there is a huge issue. You see, potatoes are shaped more or less like cucumbers, and since I'm using the sliding-window technique, a window cannot fit a potato that lies in a different orientation. For reference, see the image below. What should I do for the detection part of the segmentation process?
This neural network is an example of how to handle object rotation. It uses a dataset where both object positions and orientations are labeled, and additionally it uses data augmentation with rotations. The rotation angle (more precisely, its sine and cosine) is added to the model output and to the loss function.
So, the model detects the objects regardless of their orientation and also predicts their angles.
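As an illustration only (not the network linked above), a minimal Keras sketch of such an output head might look like the following; the backbone, layer sizes, and the MSE loss on the (sine, cosine) pair are my assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# classification head plus an angle head predicting (sin(theta), cos(theta))
class_out = layers.Dense(1, activation="sigmoid", name="is_object")(x)
angle_out = layers.Dense(2, activation="tanh", name="sin_cos")(x)

model = keras.Model(inputs, [class_out, angle_out])
model.compile(
    optimizer="adam",
    loss={"is_object": "binary_crossentropy", "sin_cos": "mse"},
)
```

At inference time the angle is recovered as atan2(sin, cos); regressing the (sine, cosine) pair instead of the raw angle avoids the wrap-around discontinuity at 0°/360°.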
If you do not need to predict the angle, you can still add rotated objects to the dataset via data augmentation without learning the angle. This will make the model effectively rotation invariant.
I trained up a very vanilla CNN using keras/theano that does a pretty good job of detecting whether a small (32X32) portion of an image contains a (relatively simple) object of type A or B (or neither). The output is an array of three numbers [prob(neither class), prob(A), prob(B)]. Now, I want to take a big image (512X680, methinks), sweep across it, and run the trained model on each 32X32 sub-image to generate a feature map that's 481X649, at each point consisting of a 3-vector of the aforementioned probabilities. Basically, I want to use my whole trained CNN as a (nonlinear) filter with three-dimensional output. At the moment, I am cutting each 32X32 patch out of the image one at a time, running the model on it, and dropping the resulting 3-vectors into a big 3X481X649 array. However, this approach is very slow. Is there a faster/better way to do this?
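One minimal sketch of a speed-up for the loop described above -- gathering all 32X32 windows into a single array and making one batched predict() call -- assuming a Keras model and a single-channel image (theano dimension ordering may need (N, 1, 32, 32) instead):

```python
import numpy as np

def dense_feature_map(model, image, win=32):
    """Run `model` on every win-by-win window of `image` in a single
    batched predict() call instead of one call per window."""
    h, w = image.shape[:2]
    out_h, out_w = h - win + 1, w - win + 1
    # gather all windows up front (memory-hungry but simple)
    windows = np.stack([image[i:i + win, j:j + win]
                        for i in range(out_h) for j in range(out_w)])
    probs = model.predict(windows[..., np.newaxis], batch_size=4096)
    # reshape the flat batch back into a 3 x out_h x out_w feature map
    return probs.reshape(out_h, out_w, 3).transpose(2, 0, 1)
```

This trades memory for speed (a 512X680 image yields about 312,000 patches), so the batch may need to be processed in chunks.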
Is there any way to reduce the dimension of the following features from 2D coordinates (x, y) to one dimension?
Yes. In fact, there are infinitely many ways to reduce the dimension of the features. It's by no means clear, however, how they perform in practice.
Feature reduction is usually done via a principal component analysis (PCA), which involves a singular value decomposition. It finds the directions with the highest variance -- that is, the directions in which "something is going on".
In your case, a PCA might find the black line as one of the two principal components:
The projection of your data onto this one-dimensional subspace then yields the reduced form of your data.
One can already see by eye that on this line the three feature sets can be separated -- I coloured the three ranges accordingly. For your example, it is even possible to completely separate the data sets. A new data point would then be classified according to the range in which its projection onto the black line (or, more generally, onto the principal component subspace) lies.
Formally, one could obtain such a division with further methods that take the PCA-reduced data as input, such as clustering methods or a k-nearest-neighbour model.
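As a concrete sketch of this pipeline with scikit-learn -- three synthetic Gaussian blobs along a line stand in for your three feature sets:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the three feature sets in the question
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2)) + np.repeat([[0, 0], [4, 4], [8, 8]], 30, axis=0)
y = np.repeat([0, 1, 2], 30)

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)     # projection onto the "black line"

# classify in the reduced one-dimensional space
clf = KNeighborsClassifier(n_neighbors=5).fit(X_1d, y)
print(clf.score(X_1d, y))
```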
So, yes: in the case of your example it may well be possible to make such a strong reduction from 2D to 1D and, at the same time, still obtain a reasonable model.
Let's say I have a perfect 3D model of the rigid object I am looking for.
I want to find this object in a scene image using the histogram of oriented gradients (HOG) algorithm.
One way to train my SVM would be to render this object on top of a bunch of random backgrounds, in order to generate the positive training examples.
But, is there a faster, more direct way to use the model to train the SVM? One that doesn't involve rendering it multiple times?
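For reference, the rendering-based baseline from the question could be sketched like this; `render_on_background` and `random_background_patch` are hypothetical helpers standing in for your renderer, and the HOG parameters are arbitrary:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def features(img):
    # HOG descriptor of a grayscale training patch
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# positives: the 3D model rendered over random backgrounds (hypothetical helper)
positives = [features(render_on_background()) for _ in range(500)]
# negatives: background-only patches (hypothetical helper)
negatives = [features(random_background_patch()) for _ in range(500)]

X = np.array(positives + negatives)
y = np.array([1] * len(positives) + [0] * len(negatives))
svm = LinearSVC(C=1.0).fit(X, y)
```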
I have been reading about Self Organizing Maps, and I understand the algorithm (I think); however, something still eludes me.
How do you interpret the trained network?
How would you then actually use it for, say, a classification task (once you have done the clustering with your training data)?
All of the material I seem to find (printed and digital) focuses on training the algorithm. I believe I may be missing something crucial.
SOMs are mainly a dimensionality-reduction algorithm, not a classification tool. They are used for dimensionality reduction just like PCA and similar methods: once trained, you can check which neuron is activated by your input and use that neuron's position as the reduced value. The main difference is their ability to preserve a given topology in the output representation.
So what a SOM actually produces is a mapping from your input space X to the reduced space Y (most commonly a 2D lattice, which makes Y a two-dimensional space). To perform actual classification, you should transform your data through this mapping and then run some other classification model (SVM, neural network, decision tree, etc.).
In other words, SOMs are used for finding another representation of the data: one that is easy for humans to analyse further (as it is mostly two-dimensional and can be plotted) and very easy for any further classification model. They are a great method for visualizing high-dimensional data and analysing "what is going on", e.g. how the classes are grouped geometrically. But they should not be confused with other neural models like artificial neural networks or even growing neural gas (a very similar concept which, however, yields a direct data clustering), as they serve a different purpose.
Of course, one can use SOMs directly for classification, but this is a modification of the original idea: it requires a different data representation, and in general it does not work as well as using some other classifier on top of the SOM.
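A minimal sketch of this "classifier on top of a SOM" pipeline, using the MiniSom package and the iris data purely as stand-ins; the grid size and training length are arbitrary:

```python
import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)        # normalize the features

som = MiniSom(10, 10, X.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)

# the SOM mapping: input vector -> 2D grid position of its winning neuron
X_reduced = np.array([som.winner(x) for x in X], dtype=float)

# any classifier can now run in the reduced 2D space
clf = SVC().fit(X_reduced, y)
print(clf.score(X_reduced, y))
```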
EDIT
There are at least a few ways of visualizing a trained SOM:
one can render the SOM's neurons as points in the input space, with edges connecting the topologically close ones (this is only possible if the input space has a small number of dimensions, like 2-3)
display the data classes on the SOM's topology - if your data is labeled with numbers {1, ..., k}, we can bind k colors to them; for the binary case, let us consider blue and red. Next, for each data point we find its corresponding neuron in the SOM and add that label's color to the neuron. Once all data have been processed, we plot the SOM's neurons, each at its original position in the topology, with the color being some aggregate (e.g. the mean) of the colors assigned to it (a minimal sketch of this colouring appears after the list). This approach, if we use a simple topology like a 2D grid, gives us a nice low-dimensional representation of the data. In the following image, the subimages from the third one onwards are the results of such a visualization, where red means label `1` (the "yes" answer) and blue means label `2` (the "no" answer)
one can also visualize the inter-neuron distances by calculating how far apart connected neurons are and plotting this on the SOM's map (the second subimage in the above visualization)
one can cluster the neurons' positions with some clustering algorithm (like k-means) and visualize the cluster ids as colors (the first subimage)
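Continuing the MiniSom sketch above, the second item in this list (colouring each neuron by the mean label of the samples mapped to it) could look roughly as follows, with the iris labels collapsed to a binary case for the demo:

```python
import numpy as np
import matplotlib.pyplot as plt

# mean label per neuron: 0 -> blue, 1 -> red, mixtures in between
sums = np.zeros((10, 10))
counts = np.zeros((10, 10))
for xi, label in zip(X, (y == 0).astype(float)):   # binarized labels
    i, j = som.winner(xi)        # best-matching neuron for this sample
    sums[i, j] += label
    counts[i, j] += 1
mean_label = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

plt.imshow(mean_label, cmap="coolwarm", vmin=0, vmax=1)
plt.colorbar(label='mean label (blue = "no", red = "yes")')
plt.show()
```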