What is the current state of the art in Multi-View Clustering? - machine-learning

Many real-world datasets have representations in the form of multiple views. For example, a person can be identified by face, fingerprint, signature, and iris, or an image can be represented by its color and texture features. Multi-view data is, in essence, information about the same entities obtained from multiple sources. In the context of machine learning, data clustering, and computer vision, what are the most relevant applications that deal with this approach?

In the context of computer vision, multi-view refers to images of the same object taken from different views/angles/positions. There are many applications of this setting; 3D reconstruction from multiple views is one of the most popular examples.
The type of multi-view you are referring to is basically data augmentation applied to a single problem: as you mentioned, identifying a person from different kinds of data sources is one such application. There are many others; for example, expression estimation (identifying a person's mood) can combine data from an RGB camera, 3D data from a Kinect, and audio.
In the context of machine learning, this kind of data augmentation is everywhere: combining different features of an image or of an audio signal for classification is a common example.
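To make the "combining different features" idea concrete, here is a minimal sketch of the simplest form of it (early fusion): feature vectors from two views of the same samples are concatenated and fed to a single classifier. The random arrays stand in for real color and texture features, so the numbers themselves are meaningless; only the fusion pattern is the point.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Placeholder "views": in practice these would come from real feature extractors,
# e.g. a color histogram and a texture descriptor computed on the same images.
rng = np.random.default_rng(0)
n_samples = 200
color_view = rng.normal(size=(n_samples, 64))    # view 1: color features
texture_view = rng.normal(size=(n_samples, 32))  # view 2: texture features
labels = rng.integers(0, 2, size=n_samples)      # synthetic labels

# Early fusion: concatenate the per-view feature vectors sample by sample.
fused = np.hstack([color_view, texture_view])

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy on held-out data:", clf.score(X_test, y_test))
```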

Related

Explanation of feature descriptors in computer vision and machine learning

I've started working with computer vision techniques quite a bit, mainly deep learning, but I want to get a good understanding of the more traditional techniques as well for a solid grounding. I have been playing around with some manual feature engineering techniques for classification with RF and SVM classifiers. I've looked at texture representations like HOG and LBP descriptors, as well as edge filters, Gabor filters, and spatial features such as Fourier descriptors. What I'm lacking is a good idea of how the different features group together and which categories they each belong to. I know some are described as global and others as local, but what does this mean exactly, and which ones are which? Are there other categories, such as texture and geometric, that I should consider? Any explanation would be useful and much appreciated (I've looked a lot online but it all seems a bit fragmented).
Thanks!
Features are information extracted from images in the form of numerical values, which are often difficult for a human to interpret and correlate directly. If we consider the image as the data, then the information extracted from that data is known as features. Generally, features extracted from an image have a much lower dimensionality than the original image, and this reduction in dimensionality reduces the overhead of processing large collections of images.
Broadly, two types of features are extracted from images, depending on the application: local and global features. Features are sometimes referred to as descriptors. Global descriptors are generally used in image retrieval, object detection, and classification, while local descriptors are used for object recognition/identification. There is a significant difference between detection and recognition: detection is establishing the existence of something (finding whether an object exists in an image/video), whereas recognition is establishing its identity (recognizing a particular person or object).
Global features describe the image as a whole in order to generalize the entire object, whereas local features describe image patches (around key points in the image) of an object. Global features include contour representations, shape descriptors, and texture features, while local features represent the texture in an image patch. Shape matrices, invariant moments (Hu, Zernike), Histogram of Oriented Gradients (HOG), and Co-HOG are examples of global descriptors. SIFT, SURF, LBP, BRISK, MSER, and FREAK are examples of local descriptors.
Generally, global features are used for lower-level applications such as object detection and classification, and local features are used for higher-level applications such as object recognition. Combining global and local features improves recognition accuracy at the cost of additional computational overhead.
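To make the global/local distinction concrete, here is a small sketch using OpenCV, assuming the opencv-python package is installed: a HOG descriptor summarizing the whole image (global) and ORB keypoint descriptors computed on image patches (local). The file name is a placeholder, and the default HOG window size (64x128) is just the library default, not a recommendation.

```python
import cv2

# Load an image in grayscale; "part.png" is a placeholder path.
img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# --- Global descriptor: one HOG vector for the whole (resized) image ---
img_resized = cv2.resize(img, (64, 128))      # HOG's default detection window size
hog = cv2.HOGDescriptor()
global_desc = hog.compute(img_resized)        # single vector describing the image
print("global HOG descriptor length:", global_desc.shape[0])

# --- Local descriptors: one vector per detected keypoint / image patch ---
orb = cv2.ORB_create(nfeatures=500)
keypoints, local_desc = orb.detectAndCompute(img, None)
print("number of keypoints:", len(keypoints))
if local_desc is not None:
    print("bytes per local descriptor:", local_desc.shape[1])
```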

Match an image from a set of images: combining traditional computer vision with deep learning/CNN

In the application I am developing, I have about 5,000 product label images (one label per product).
One feature of my application is that a user can take a picture with their camera and get possible match(es) against the product labels registered in the system.
Since my system initially has only one sample per product, I decided to go with traditional computer vision techniques. I managed to implement this using feature extraction and descriptor matching (using OpenCV SIFT and FLANN techniques, following https://github.com/kipr/opencv/blob/master/samples/cpp/matching_to_many_images.cpp).
Now I am thinking about how to improve accuracy by combining this with CNN or deep learning techniques, since as users approve matches the system gradually accumulates more label samples per product.
Is it possible to build a hybrid image matching system combining classical computer vision techniques with CNN/deep learning techniques?
Are there any similar services already available?
You should learn more about Distance Metric Learning (DML). There is a lot of information on the internet, but briefly:
You must compute embeddings (vector representations) for each image in your base, e.g. the feature vector from the last convolutional layer of one of the modern CNNs (Inception, VGG, ResNet, DenseNet).
Then, when you get a new image, you create the vector representation of that image and find the closest vector in your base (by Euclidean distance, for example).
This topic is quite involved, so study it carefully :)
Good luck!
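A minimal sketch of the retrieval step described above, assuming PyTorch and torchvision are available: the output of a pretrained ResNet-18 with its final classification layer removed is used as the embedding, and the closest base image is found by Euclidean distance. The image paths are placeholders, and depending on your torchvision version you may need the newer weights= argument instead of pretrained=True.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained backbone with the final fully connected layer removed,
# so each image maps to a 512-dimensional embedding.
backbone = models.resnet18(pretrained=True)
embedder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Return an embedding for one image file (path is a placeholder)."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return embedder(x).flatten()

# Build the base of label embeddings once, then query new photos against it.
base_paths = ["label_001.jpg", "label_002.jpg"]   # placeholder label images
base = torch.stack([embed(p) for p in base_paths])

query = embed("user_photo.jpg")                   # placeholder user photo
dists = torch.norm(base - query, dim=1)           # Euclidean distances to the base
print("best match:", base_paths[int(torch.argmin(dists))])
```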

Making a trained model (machine learning) from 3D models

I have a database with almost 20k 3D files; they are drawings of machine parts designed in CAD software (SolidWorks). I'm trying to build a trained model from all of these 3D models, so I can build a 3D object recognition app where someone can take a picture of one of these parts (in the real world) and the app can provide useful information about material, size, treatment, and so on.
If anyone has already done something similar, any information you can provide would be greatly appreciated!
Some ideas:
1) Several pictures instead of only one. As Rodrigo commented, and as Brad Larson tried to circumvent with his method, the problem with the user taking only one input picture is that you necessarily lack the information needed to triangulate and form a 3D point cloud. With about four pictures taken from slightly different angles, you can already reconstruct parts of the object. Comparing point clouds would make the task much easier for any ML algorithm, whether neural networks (NN), support vector machines (SVM), or others. A common standard for creating point clouds is ASTM E2807, which uses the E57 file format.
On the downside, a 3D vision algorithm may be heavy on the user's device and is not the easiest to implement.
2) Artificial picture training. By training on pre-computed artificial pictures, as Brad Larson suggested, you take over much of the computation, to the user's benefit. Be aware that you should probably use features extracted from the pictures rather than the complete pictures, both for training and for classification (a small sketch of this pipeline follows this list). The problem with this method is that you may be very sensitive to lighting and background context. Take care to produce CAD renderings with the same lighting conditions for all objects, so that the classifier doesn't overfit on aspects of the "pictures" that do not belong to the object.
This is where solution 1) is much more stable: it is less sensitive to the visual context.
3) Scale. The size of your object is an important descriptor, so you should add scale information to your object descriptor before training. You could ask the user to take pictures alongside a reference object, or to make a rule-of-thumb estimate of the object size ("What are the approximate dimensions of the object, in cm?"). Providing size can make your algorithm significantly faster and more accurate.
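The sketch referenced under idea 2) above: one way pre-rendered CAD views could be turned into features and a classifier, assuming scikit-image and scikit-learn are available. The directory layout (renders/<part_id>/*.png), image size, and the choice of HOG features with a linear SVM are illustrative assumptions, not a prescription.

```python
import glob, os
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_feature(path):
    """HOG feature for one rendered view; size and HOG parameters are illustrative."""
    img = resize(imread(path, as_gray=True), (128, 128))
    return hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

# Assumed layout: renders/<part_id>/*.png, several rendered views per CAD part.
features, labels = [], []
for part_dir in glob.glob("renders/*"):
    part_id = os.path.basename(part_dir)
    for view in glob.glob(os.path.join(part_dir, "*.png")):
        features.append(hog_feature(view))
        labels.append(part_id)

clf = LinearSVC().fit(np.array(features), labels)

# At query time, the user's photo goes through the same feature pipeline.
print("predicted part:", clf.predict([hog_feature("user_photo.png")])[0])
```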
If your test data in production consists mainly of images of the 3D objects, then the method described in the comments by Brad Larson is the better approach; it is also easier to implement and takes a lot less effort and fewer resources to get up and running.
However, if you want to classify between 3D models, there are existing networks designed to classify 3D point clouds. You would have to convert your models to point clouds and use them as training samples. One such network, which I have used, is VoxNet. I also suggest adding more variation to the training data, such as different rotations of the 3D models.
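As a rough illustration of what feeding point clouds to a VoxNet-style network involves, here is a NumPy-only sketch that converts a point cloud into the kind of fixed-size binary occupancy grid such networks take as input. The 32x32x32 resolution matches the grid size commonly used with VoxNet; normalizing the cloud into a unit cube and the random example cloud are simplifying assumptions.

```python
import numpy as np

def to_occupancy_grid(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    This is the fixed-size voxel input used by VoxNet-style networks;
    scaling the cloud into the unit cube is a simplification.
    """
    points = np.asarray(points, dtype=np.float64)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max() + 1e-9
    normalised = (points - mins) / extent            # cloud now lives in [0, 1)^3
    # Map each point to a voxel index and mark that voxel as occupied.
    idx = np.clip((normalised * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution, resolution, resolution), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Example: a random cloud standing in for points sampled from a CAD mesh.
cloud = np.random.rand(2048, 3)
voxels = to_occupancy_grid(cloud)
print("occupied voxels:", int(voxels.sum()))
```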
You can also use pre-trained 3D deep neural networks; there are several that could help with this task and produce high accuracy.

How to detect if an arbitrary document has been signed with OpenCV

I've been tasked with classifying 350k documents into "signed" and "not signed" piles. What is the fastest way to search for something that looks like a human signature with open source tools? To compound the problem, I need to assume each document has a unique length and signature location. Does anyone have any ideas?
With a little searching on Google, I found this IEEE article. You will need an account to be able to read it.
Here is the description:
Detecting and segmenting free-form objects from cluttered backgrounds
is a challenging problem in computer vision. Signature detection in
document images is one classic example and as of yet no reasonable
solutions have been presented. In this paper, we propose a novel
multi-scale approach to jointly detecting and segmenting signatures
from documents with diverse layouts and complex backgrounds. Rather
than focusing on local features that typically have large variations,
our approach aims to capture the structural saliency of a signature by
searching over multiple scales. This detection framework is general
and computationally tractable. We present a saliency measure based on
a signature production model that effectively quantifies the dynamic
curvature of 2D contour fragments. Our evaluation using large real
world collections of handwritten and machine printed documents
demonstrates the effectiveness of this joint detection and
segmentation approach.

Image classification/recognition open source library

I have a set of reference images (200) and a set of photos of those images (tens of thousands). I have to classify each photo in a semi-automated way. Which algorithm and open source library would you advise me to use for this task? The best thing for me would be to have a similarity measure between the photo and the reference images, so that I could show a human operator the images ordered from most to least similar, to make her work easier.
To give a little more context, the reference images are branded packages, and the photos are of the same packages, but with all kinds of noise: reflections from the flash, low light, imperfect perspective, etc. The photos are already (manually) segmented: only the package is visible.
Back in my days with image recognition (around 15 years ago) I would probably have tried to train a neural network with the reference images, but I wonder if there are better ways to do this now.
I recommend that you use Python, with the NumPy/SciPy libraries for your numerical work. Some helpful libraries for handling images are the Mahotas library and the scikits.image library.
In addition, you will want scikits.learn, which is a Python wrapper for Libsvm, a very standard SVM implementation.
The hard part is choosing your descriptor. The descriptor is the feature you compute from each image, intended to give a similarity distance to the set of reference images. Good things to try are Histograms of Oriented Gradients, SIFT features, and color histograms; also play around with various ways of binning different parts of the image and concatenating such descriptors together.
Next, set aside some of your data for training. You have to label these data manually according to the true reference image they belong to. You can feed these labels into the built-in functions in scikits.learn and train a multiclass SVM to recognize your images.
After that, you may want to look at MPI4Py, an implementation of MPI in Python, to take advantage of multiple processors when doing the large descriptor computation and classification of the tens of thousands of remaining images.
The task you describe is very difficult, and solving it with high accuracy could easily lead to a research-level publication in the field of computer vision. I hope I've given you some starting points: searching for any of the above concepts on Google will turn up useful research papers and more details about how to use the various libraries.
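Since the asker mainly wants a ranked list for a human operator, here is a small OpenCV sketch of the simplest descriptor mentioned above, a color histogram, used to sort the reference images by distance to a photo. The file paths, bin counts, and the chi-square metric are illustrative choices, not a recommendation over the other descriptors.

```python
import cv2

def color_histogram(path, bins=(8, 8, 8)):
    """Normalized 3D BGR color histogram for one image; path is a placeholder."""
    img = cv2.imread(path)
    hist = cv2.calcHist([img], [0, 1, 2], None, bins, [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

reference_paths = ["ref_001.jpg", "ref_002.jpg", "ref_003.jpg"]  # placeholders
reference_hists = {p: color_histogram(p) for p in reference_paths}

photo_hist = color_histogram("photo.jpg")  # placeholder query photo

# Smaller chi-square distance = more similar; show the operator this ordering.
ranked = sorted(
    reference_paths,
    key=lambda p: cv2.compareHist(photo_hist, reference_hists[p], cv2.HISTCMP_CHISQR),
)
print("references from most to least similar:", ranked)
```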
The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
One way people do this is with the so-called "earth mover's distance". Briefly, one imagines each pixel in an image as a stack of rocks whose height corresponds to the pixel value, and defines the distance between two images as the minimal amount of work needed to transform one arrangement of rocks into the other.
Algorithms for this are a current research topic. Here is some MATLAB code for one: http://www.cs.huji.ac.il/~ofirpele/FastEMD/code/ . It looks like they have a Java version as well. Here is a link to the original paper and C code: http://ai.stanford.edu/~rubner/emd/default.htm
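OpenCV also ships an implementation of this distance (cv2.EMD). The sketch below compares two grayscale intensity histograms treated as one-dimensional "piles of rocks"; the signature format (weight in the first column, bin position in the second) is what cv2.EMD expects, while the image paths and bin count are placeholders.

```python
import cv2
import numpy as np

def histogram_signature(path, bins=32):
    """Build an EMD 'signature': rows of [weight, bin_position] as float32."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([img], [0], None, [bins], [0, 256]).flatten()
    hist /= hist.sum()                              # weights sum to 1
    positions = np.arange(bins, dtype=np.float32)   # 1-D bin coordinates
    return np.column_stack([hist, positions]).astype(np.float32)

sig1 = histogram_signature("photo.jpg")             # placeholder paths
sig2 = histogram_signature("reference.jpg")

# cv2.EMD returns the distance, a lower bound, and the flow matrix.
distance, _, _ = cv2.EMD(sig1, sig2, cv2.DIST_L2)
print("earth mover's distance:", distance)
```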
Try RapidMiner (one of the most widely used data-mining platforms, http://rapid-i.com) with IMMI (the Image Mining Extension, http://www.burgsys.com/mumi-image-mining-community.php), AGPL licence.
It currently implements several similarity measurement methods (not only trivial pixel-by-pixel comparison). The similarity measures can serve as input for a learning algorithm (e.g. neural network, KNN, SVM, ...), which can be trained to give better performance. Some information about the methods is given in this paper:
http://splab.cz/wp-content/uploads/2012/07/artery_detection.pdf
Nowadays, deep-learning-based frameworks such as Torch, TensorFlow, Theano, and Keras are the best open source tools/libraries for object classification/recognition tasks.
