Imagine a huge 360° panoramic shot from a security camera. I need to find people in this image. Suppose I already have a good classifier that tells whether a given picture is a picture of a single human being.
What I don't understand is how to apply this classifier to the panoramic shot, which can contain multiple people. Should I apply it to all possible regions of the image? Or is there some way to search for "interesting points" and feed only the regions around those points to my classifier?
Which keywords should I google, or which algorithms should I read about, to find ways of searching for regions of an image that may (or may not) contain information for subsequent classification?
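A minimal sketch of the brute-force version of this idea (slide a window over the panorama and run the single-person classifier on every crop); `classify_person` here is a stand-in for the existing classifier, and the window sizes, stride, and threshold are arbitrary placeholders:

```python
import cv2

def sliding_window_detect(panorama_path, classify_person,
                          window_sizes=((64, 128), (128, 256)), stride=32):
    """Run a single-patch person classifier over every crop of a panorama.

    classify_person(patch) -> float is assumed to return a confidence score.
    Window sizes, stride and the threshold are placeholder values, not tuned.
    """
    image = cv2.imread(panorama_path)
    h, w = image.shape[:2]
    detections = []
    for win_w, win_h in window_sizes:
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                patch = image[y:y + win_h, x:x + win_w]
                score = classify_person(patch)
                if score > 0.5:                      # arbitrary threshold
                    detections.append((x, y, win_w, win_h, score))
    return detections
```

Overlapping hits on the same person are normally merged afterwards with non-maximum suppression; "sliding window detection" and "region proposals" are the usual names for the two approaches described in the question.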
Newbie question here: right now I need a relatively large database of images, say a few thousand, for image recognition training purposes.
I found some good ways to get rough results, like playing with Google Images search terms etc. and bulk-downloading the hits. But only about 50% of the resulting images actually show what I need (the others are either related things, or plain garbage). Picking by hand isn't really an option, because it would take a whole lot of time. So is there a quicker way of picking those images? Like using some other image recognition network, or specific software?
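A rough sketch of the "use some other image recognition network" idea: run a pretrained ImageNet classifier over every downloaded file and keep only the ones whose top prediction falls in a set of classes you care about. The class names, threshold, and file layout below are placeholder assumptions:

```python
import shutil
from pathlib import Path

import torch
from PIL import Image
from torchvision import models

# Pretrained ImageNet classifier plus the preprocessing that matches its weights.
weights = models.ResNet18_Weights.IMAGENET1K_V1
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]

RELEVANT = {"golden retriever", "Labrador retriever"}   # placeholder classes

def keep_relevant(src_dir, dst_dir, threshold=0.3):
    """Copy images whose top-1 ImageNet class is in RELEVANT into dst_dir."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        with torch.no_grad():
            probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)[0]
        score, idx = probs.max(dim=0)
        if categories[int(idx)] in RELEVANT and float(score) > threshold:
            shutil.copy(path, Path(dst_dir) / path.name)
```

It will still miss things and let some garbage through, but reviewing the kept and rejected folders is much faster than hand-picking the raw downloads.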
You can search Kaggle for image recognition competitions such as cdiscount-image-classification-challenge.
The data for those competitions is usually easy to download.
What image recognition technology is good at identifying a low resolution object?
Specifically I want to match a low resolution object to a particular item in my database.
Specific Example:
Given a picture with bottles of wine, I want to identify which wines are in it. Here's an example picture:
I already have a database of high resolution labels to match to.
Given a high-res picture of an individual bottle of wine, it was very easy to match it to its label using Vuforia (an image recognition service). However, the service doesn't work well for lower-resolution matching, like the bottles in the example image.
Research:
I'm new to this area of programming, so apologies for any ambiguities or obvious answers to this question. I've been researching, but there's a huge breadth of technologies out there for image recognition. Evaluating each one takes a significant amount of time, so I'll try to keep this question updated as I research them.
OpenCV: seems to be the most popular open-source computer vision library. Many modules; not sure which are applicable yet.
Haar cascade feature detection: helps with pre-processing an image by orienting a component correctly (e.g. making a wine label vertical)
OCR: good for reading text at decent resolutions - not good for low-resolution labels where a lot of the text is not visible
Vuforia: a hosted service that does some types of image recognition. Mostly meant for augmented reality developers. Doesn't offer control over the algorithm. Doesn't work at this kind of resolution.
Given a logo image as a reference image, how to detect/recognize it in a cluttered natural image?
The logo may be quite small in the image, and it can appear on clothes, hats, shoes, a background wall, etc. I have tried SIFT features for matching without any other preprocessing, and the results are good when the logo in the image is large and clear. However, it fails in cases where the scene is quite cluttered and the logo is small compared with the whole image. It also seems that SIFT features are sensitive to perspective distortions.
Does anyone know better features or ideas for logo detection/recognition in natural images? For example, one could train a classifier to locate candidate regions first, and then apply SIFT matching directly for further recognition. However, training a model needs a lot of data, in particular manually annotated logo regions, and it needs re-training (collecting and annotating new images) whenever I want to apply it to a new logo.
So, any suggestions for this? Detailed workflow/code/reference will be highly appreciated, thanks!
There are many algorithms, from shape matching to Haar classifiers. The best algorithm depends heavily on the kind of logo.
If you want to continue with feature-based matching, I recommend:
For detection of small logos, use tiles. Split the whole image into smaller (overlapping) tiles and run the usual detection on each one; this exploits the "locality" of the searched features (see the sketch after this list).
Try ASIFT for affine-invariant detection.
Use many template images for reference feature extraction, with different lighting and different backgrounds (black, white, gray).
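A rough sketch of the tiling idea, assuming SIFT keypoint matching between a logo template and each tile; the tile size, overlap, ratio-test constant, and match threshold are placeholder values:

```python
import cv2

def detect_logo_tiled(scene_gray, template_gray, tile=512, overlap=128, min_matches=10):
    """Look for a logo template in a large scene image, one tile at a time.

    Matching SIFT descriptors per tile keeps small logos from being drowned
    out by features from the rest of the cluttered scene.
    """
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    h, w = scene_gray.shape[:2]
    step = tile - overlap
    hits = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            patch = scene_gray[y:y + tile, x:x + tile]
            kp_p, des_p = sift.detectAndCompute(patch, None)
            if des_p is None or len(des_p) < 2:
                continue
            # Lowe's ratio test over the two nearest neighbours of each descriptor.
            pairs = matcher.knnMatch(des_t, des_p, k=2)
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
            if len(good) >= min_matches:
                hits.append((x, y, len(good)))   # tile origin + match count
    return hits
```

A geometric verification step on the best tile's matches (e.g. cv2.findHomography with RANSAC) is the usual next step before declaring a detection.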
I am trying to compare two monochrome, basic hand-drawn images, captured electronically. The scale may be different but the essence of the image is the same. I want to compare one hand-drawn image to a saved library of images and get a relative score of how similar they are. Think of several basic geometric shapes, lines, and curves that make up a drawing.
I have tried several techniques without much luck. Pixel-based comparisons are too exact. I have tried scaling and cropping images and that did not give accurate results.
I have tried OpenCV with C# and have had a little success. I have experimented with SURF and it works for a few images, but not for others that the eye can tell are very similar.
So now my question: Are there any examples of using OpenCV or commercial software that can support comparing drawings that are not exact? I prefer C#, but I am open to any solutions.
Thanks in advance for any guidance.
(I have been working on this for over a month and have searched the internet and Stack Overflow without success. I of course could have missed something)
You need to extract features from these images; after that, a basic Euclidean distance would be enough to calculate similarity. But it is not easy to extract features from hand-drawn things. For example, companies that work on face recognition generally get much lower accuracy on drawn face portraits.
I have a suggestion for you. For a machine learning homework, one of my friends got a signature recognition assignment. I do not fully know how he achieved a high accuracy, but I know the feature extraction part. First he converted the image to a binary image. Then he counted the black pixels in each row. Then he used those counts as features to train a neural network or similar.
So you can use a similar approach to extract features, then use a Euclidean distance to calculate similarities.
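A small sketch of that approach: binarise each drawing, resize to a common size, count the dark pixels in each row, and compare the resulting vectors with a Euclidean distance (the size and threshold values are arbitrary):

```python
import cv2
import numpy as np

def row_profile(path, size=64):
    """Binarise a drawing and return the count of dark pixels in each row."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (size, size))
    _, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
    return (binary == 0).sum(axis=1).astype(np.float32)  # dark pixels per row

def similarity(path_a, path_b):
    """Smaller distance means more similar row profiles."""
    return float(np.linalg.norm(row_profile(path_a) - row_profile(path_b)))
```

Adding the same count per column usually makes the feature vector a bit more discriminative for drawings.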
I want to develop an application in which the user inputs an image of a person, and the system should be able to identify the face in that image. The system should also work if there is more than one person in the image.
I need the logic; I don't have any idea how to work on image pixel data in such a way that it identifies faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, and make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other. You then compute the mean face (the per-pixel mean across all the training images) and subtract it from every face vector so that the data is centered. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together they describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting it into the new vector space, and computing the linear distance between it and the other projected face vectors.
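A compressed sketch of that pipeline (grayscale, resize, vectorise, subtract the mean face, take principal components, then project and compare). The image size and number of components are placeholder choices, and SVD is used here in place of an explicit covariance eigendecomposition, which yields the same principal components:

```python
import cv2
import numpy as np

def train_eigenfaces(paths, size=(64, 64), n_components=20):
    """Build an eigenface basis from a list of face image paths."""
    faces = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)      # grayscale
        faces.append(cv2.resize(img, size).flatten())  # same dims, vectorised
    X = np.array(faces, dtype=np.float64)
    mean = X.mean(axis=0)                              # mean face
    X -= mean                                          # center the data
    # Right singular vectors = eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return mean, vt[:n_components]                     # eigenface basis

def project(img_path, mean, basis, size=(64, 64)):
    """Convert one image into its coordinates in face space."""
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    vec = cv2.resize(img, size).flatten().astype(np.float64) - mean
    return basis @ vec

def distance(a, b):
    """Linear distance between two projected faces; smaller = more similar."""
    return float(np.linalg.norm(a - b))
```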
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
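For the detection and cropping step, OpenCV's bundled Haar cascade frontal-face detector is a common starting point; a minimal sketch (the scale factor, neighbour count, and output size are typical but untuned values):

```python
import cv2

def extract_faces(image_path, out_prefix="face", size=(64, 64)):
    """Detect faces in one image and save each as a separate, resized file."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for i, (x, y, w, h) in enumerate(faces):
        crop = cv2.resize(gray[y:y + h, x:x + w], size)
        cv2.imwrite(f"{out_prefix}_{i}.png", crop)
    return len(faces)
```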
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
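The average-face sanity check is only a few lines, assuming all images in a set are already the same size:

```python
import cv2
import numpy as np

def average_face(paths):
    """Per-pixel mean of same-sized face images; it should look roughly human."""
    stack = np.stack([cv2.imread(p).astype(np.float64) for p in paths])
    return stack.mean(axis=0).astype(np.uint8)

# Example usage (gallery_paths / probe_paths are your own lists of files):
# cv2.imwrite("average_gallery.png", average_face(gallery_paths))
# cv2.imwrite("average_probe.png", average_face(probe_paths))
```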
Have a look at the Face Recognition Homepage - there are algorithms, papers, and even some source code.
There are many, many different algorithms out there. Basically what you are looking for is "computer vision". We made a project at university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved, so be prepared. First go to Wikipedia. Then you will want to search for PDF publications on specific algorithms.
You can go the hard way - write an implementation of all the algorithms yourself. Or the easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work, so be brave. It is a lot harder to make software that works under different and constantly varying conditions, and that is where Google won't help you. But I suppose you don't want to go that deep.