Logo detection/recognition in natural images - opencv

Given a logo image as a reference image, how to detect/recognize it in a cluttered natural image?
The logo may be quite small in the image, it can appear in clothes, hats, shoes, background wall etc. I have tried SIFT feature for matching without any other preprocessing, and the result is good for cases in which the size of the logo in images is big and the logo is clear. However, it fails for some cases where the scene is quite cluttered and the proportion of the logo size is quite small compared with the whole image. It seems that SIFT feature is sensitive to perspective distortions.
Anyone know some better features or ideas for logo detection/recognition in natural images? For example, training a classifier to locate candidate regions first, and then apply directly SIFT matching for further recognition. However, training a model needs many data, especially it needs manually annotating logo regions in images, and it needs re-training (needs to collect and annotate new image) if I want to apply it for new logos.
So, any suggestions for this? Detailed workflow/code/reference will be highly appreciated, thanks!

There are many algorithms from shape matching to haar classifiers. The best algorithm very depend on kind of logo.
If you want to continue with feature registration, i recommend:
For detection of small logos, use tiles. Split whole image to smaller (overlapping) tiles and perform usual detection. It will use "locality" of searched features.
Try ASIFT for affine invariant detection.
Use many template images for reference feature extraction, with different lightning , different background images (black, white, gray)

Related

OpenCV Matching Exposure across images

I was wondering if its possible to match the exposure across a set of images.
For example, lets say you have 5 images that were taken at different angles. Images 1-3,5 are taken with the same exposure whilst the 4th image have a slightly darker exposure. When I then try to combine these into a cylindrical panorama using (seamFinder with: gc_color, surf detection, MULTI_BAND blending,Wave correction, etc.) the result turns out with a big shadow in the middle due to the darkness from image 4.
I've also tried using exposureCompensator without luck.
Since I'm taking the pictures in iOS, I maybe could increase exposure manually when needed? But this doesn't seem optimal..
Have anyone else dealt with this problem?
This method is probably overkill (and not just a little) but the current state-of-the-art method for ensuring color consistency between different images is presented in this article from HaCohen et al.
Their algorithm can correct a wide range of errors in image sets. I have implemented and tested it on datasets with large errors and it performs very well.
But, once again, I suppose this is way overkill for panorama stitching.
Sunreef has provided a very good paper, but it does seem overkill because of the complexity of a possible implementation.
What you want to do is to equalize the exposure not on the entire images, but on the overlapping zones. If the histograms of the overlapped zones match, it is a good indicator that the images have similar brightness and exposure conditions. Since you are doing more than 1 stitch, you may require a global equalization in order to make all the images look similar, and then only equalize them using either a weighted equalization on the overlapped region or a quadratic optimiser (which is again overkill if you are not a professional photographer). OpenCV has a simple implmentation of a simple equalization compensation algorithm.
The detail::ExposureCompensator class of OpenCV (sample implementation of such a stitiching is here) would be ideal for you to use.
Just create a compensator (try the 2 different types of compensation: GAIN and GAIN_BLOCKS)
Feed the images into the compensator, based on where their top-left cornes lie (in the stitched image) along with a mask (which can be either completely white or white only in the overlapped region).
Apply compensation on each individual image and iteratively check the results.
I don't know any way to do this in iOS, just OpenCV.

Proper approach to feature detection with opencv

My goal is to find known logos in static image and videos. I want to achieve that by using feature detection with KAZE or AKAZE and RanSac.
I am aiming for a similar result to: https://www.youtube.com/watch?v=nzrqH...
While experimenting with the detection example from the docs which is great btw, i was facing several issues:
Object resolution: Differences in size between the known object and
the resolution of the scene where the object should be located
sometimes breaks the detection algorithm - the object won't be
recognized in images with a low resolution although the image quality
is still allright for a human eye.
Color contrast with the background: It seems, that the detection can
easily be distracted by different background contrasts (eg: object is
logo black on white background, logo in scene is white on black
background). How can I make the detection more robust against
different luminations and background contrasts?
Preprocessing: Should there be done any kind of preprocessing of the
object / scene? For example enlarge the scene up to a specific size?
Is there any guideline how to approach the feature detection in
several steps to get the best results?
I think your issue is more complicated than feature-descriptor-matching-homography process.
It is more likely oriented to pattern recognition or classification.
You can check this extended paper review of shape matching:
http://www.staff.science.uu.nl/~kreve101/asci/vir2001.pdf
Firstly, the resolution of images is very important,
because usually matching operation makes a pixel intensity cross-correlation
between your sample image (logo) and your process image, so you will get the best-crosscorrelated area.
In the same way, the background colour intensity
is very important because background illumination could affect severally to your final result.
Feature-based methods are widely researched:
http://docs.opencv.org/2.4/modules/features2d/doc/feature_detection_and_description.html
http://docs.opencv.org/2.4/modules/features2d/doc/common_interfaces_of_descriptor_extractors.html
So for example, you can try alternative methods such as:
Hog descritors: Histogram oriented gradients:
https://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
Pattern matching or template matching
http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
I think the lastest (Pattern matching) is the easiest to check your algorithm.
Hope these references helps.
Cheers.
Unai.

Iris detection in still images

I have good resolution faces images and I would like to automatically detect the iris and know its color. Is there any state-of-the-art (standard) way to detect iris other than HoughCircles which is not reporting consistent results on different images. One condition I have is that I have to use still images (no video is available)?
I am using OpenCV-Python for image processing. Any help is highly appreciated.
I think the problem can be split into two parts:
Localisation of the iris regions
Estimating the colour
Step one is time consuming, but I have done this at my workplace. You can train a Haar-cascade classifier for iris images (grayscale), and localise the iris within the eye-region returned be the cascade classifier for the eyes. If you already have a collection of face images, you can use them. Otherwise, try to collect as many samples as possible, with the same image quality as the images you want to use.
Step two is relatively easy, but might not be "very easy" because of auto white balance etc.
If you want a simpler approach, try detecting the white regions of the eye and using them to loc

find mosquitos' head in the image

I have images of mosquitos similar to these ones and I would like to automatically circle around the head of each mosquito in the images. They are obviously in different orientations and there are random number of them in different images. some error is fine. Any ideas of algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to recreate your training set. For this you would like to extract small images with examples of what is a mosquito head or what is not.
Then you can use those images to train a classification algorithm, be careful to have a balanced training set, since if your data is skewed to one class it would hit the performance of the algorithm. Since images are 2D and algorithms usually just take 1D arrays as input, you will need to arrange your images to that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could make the trick too. If you decide to use support vector machines I strongly recommend you to check libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. Also they have a very easy to follow guide targeted to beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, you should be able to avoid tolerance to orientation. If you don't have enough data, then you could create more training rows with some samples rotated, so you would have a more representative training set.
As for the prediction what you could do is given an image, cut it using a grid where each cell has the same dimension that the ones you used on your training set. Then you pass each of this image to the classifier and mark those squares where the classifier gave you a positive output. If you really need circles then take the center of the given square and the radius would be the half of the square side size (sorry for stating the obvious).
So after you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others) , since we are not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos in the same scale, we still might miss some of them just because they didn't fit in our grid perfectly. To address this, we will need to repeat this procedure (grid cut and predict) rescaling the given image to different sizes. How many sizes? well here you would have to determine that through experimentation.
This approach is sensitive to the size of the "window" that you are using, that is also something I would recommend you to experiment with.
There are some research may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white unmarked underground, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example here i did a quick try taking the red channel, then substracting the blue channel*5, then applying a threshold of 80:
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be moquitoes by Connected Component Labeling. If a blob is large enough to be two mosquitoes, cut it out, and apply some more dilation/erosion to it.
Once you have a single blob like this
you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is the thickest.

Face Recognition Logic

I want to develop an application in which user input an image (of a person), a system should be able to identify face from an image of a person. System also works if there are more than one persons in an image.
I need a logic, I dont have any idea how can work on image pixel data in such a manner that it identifies person faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting into the new vector space, and computing the linear distance between it and other projected face vectors.
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
Have a look at the Face Recognition Hompage - there are algorithms, papers, and even some source code.
There are many many different alghorithms out there. Basically what you are looking for is "computer vision". We had made a project in university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved so be prepared. First go to wikipedia. Then you will want to search for pdf publications of specific algorithms.
You can go a hard way - write an implementaion of all alghorithms by yourself. Or easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work. So be brave. A lot harder is to make a software that will work under different and constantly varying conditions. And that is where google won't help you. But I suppose you don't want to go that deep.

Resources