Comparing similar images as photographs -- detecting difference, image diff - image-processing

The situation is somewhat different from anything I have been able to find asked already, and is as follows: if I took a photo of two similar images, I'd like to be able to highlight the differing features in the two images. For example, the following two halves of a children's spot-the-difference game:
The differences in the images will be bits missing/added and/or colour changes: the type of differences that would be easily detectable from the original image files by doing nothing cleverer than a pixel-by-pixel comparison. However, because they're subject to the fluctuations of light and the imprecision of photography, I'll need a far more lenient/clever algorithm.
As you can see, the images won't necessarily line up perfectly if overlaid.
This question is tagged language-agnostic as I expect answers that point me towards relevant algorithms, however I'd also be interested in current implementations if they exist, particularly in Java, Ruby, or C.

The following approach should work. All of these functionalities are available in OpenCV. Take a look at this example for computing homographies.
Detect keypoints in the two images using a corner detector.
Extract descriptors (SIFT/SURF) for the keypoints.
Match the keypoints and use RANSAC to compute a homography that maps the second image onto the first.
Apply the homography to the second image, so that it is aligned with the first.
Now simply compute the pixel-wise difference between the two images, and the difference image will highlight everything that has changed from the first to the second.
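As a rough sketch of those steps in Python with OpenCV (ORB is used here because SIFT/SURF may require the contrib build; the file names are placeholders):

import cv2
import numpy as np

img1 = cv2.imread('left.jpg')    # reference half (placeholder file name)
img2 = cv2.imread('right.jpg')   # half to align (placeholder file name)
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# Detect keypoints and extract descriptors
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(gray1, None)
kp2, des2 = orb.detectAndCompute(gray2, None)

# Match descriptors (Hamming norm because ORB descriptors are binary)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Estimate a homography mapping image 2 onto image 1 with RANSAC
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(pts2, pts1, cv2.RANSAC, 5.0)

# Warp the second image into the first image's frame
h, w = img1.shape[:2]
aligned = cv2.warpPerspective(img2, H, (w, h))

# Pixel-wise difference highlights what changed
diff = cv2.absdiff(img1, aligned)
cv2.imwrite('diff.png', diff)

Blurring or thresholding diff before viewing it helps suppress the residual noise from lighting changes and small misalignments.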

My general approach would be to use an optical flow to align both images and perform a pixel by pixel comparison once they are aligned.
However, for the specifics, standard optical flow methods (OpenCV etc.) are likely to fail if the two images differ significantly, as in your case. If that indeed fails, there are recent optical flow techniques that are supposed to work even when the images are drastically different. For instance, you might want to look at the SIFT Flow paper by Ce Liu et al., which addresses this problem with dense SIFT correspondences.
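If you want to try the plain optical-flow route first, a minimal sketch (assuming OpenCV's Farneback flow and placeholder file names) could be:

import cv2
import numpy as np

img1 = cv2.imread('a.jpg')     # placeholder file names
img2 = cv2.imread('b.jpg')
g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# Dense flow from image 1 to image 2: flow[y, x] is where pixel (x, y) of image 1 ended up
flow = cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Warp image 2 back onto image 1's pixel grid using the flow field
h, w = g1.shape
map_x, map_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (map_x + flow[..., 0]).astype(np.float32)
map_y = (map_y + flow[..., 1]).astype(np.float32)
aligned = cv2.remap(img2, map_x, map_y, cv2.INTER_LINEAR)

# Pixel-by-pixel comparison once the images are aligned
diff = cv2.absdiff(img1, aligned)

If the flow breaks down on your images, that is exactly the case the SIFT Flow work is meant to handle.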

Related

OpenCV Matching Exposure across images

I was wondering if it's possible to match the exposure across a set of images.
For example, let's say you have 5 images that were taken at different angles. Images 1-3 and 5 are taken with the same exposure, whilst the 4th image has a slightly darker exposure. When I then try to combine these into a cylindrical panorama (using seamFinder with gc_color, SURF detection, MULTI_BAND blending, wave correction, etc.) the result turns out with a big shadow in the middle due to the darkness from image 4.
I've also tried using exposureCompensator without luck.
Since I'm taking the pictures on iOS, maybe I could increase the exposure manually when needed? But this doesn't seem optimal.
Has anyone else dealt with this problem?
This method is probably overkill (and not just a little) but the current state-of-the-art method for ensuring color consistency between different images is presented in this article from HaCohen et al.
Their algorithm can correct a wide range of errors in image sets. I have implemented and tested it on datasets with large errors and it performs very well.
But, once again, I suppose this is way overkill for panorama stitching.
Sunreef has provided a very good paper, but it does seem overkill because of the complexity of a possible implementation.
What you want to do is to equalize the exposure not over the entire images, but over the overlapping zones. If the histograms of the overlapping zones match, it is a good indicator that the images have similar brightness and exposure conditions. Since you are doing more than one stitch, you may require a global equalization to make all the images look similar, and then only equalize them using either a weighted equalization on the overlapping region or a quadratic optimiser (which is again overkill if you are not a professional photographer). OpenCV has a simple implementation of a basic exposure compensation algorithm.
The detail::ExposureCompensator class of OpenCV (a sample implementation of such a stitching is here) would be ideal for you to use.
Just create a compensator (try the 2 different types of compensation: GAIN and GAIN_BLOCKS)
Feed the images into the compensator, based on where their top-left corners lie in the stitched image, along with a mask (which can be either completely white or white only in the overlapping region).
Apply compensation on each individual image and iteratively check the results.
I don't know any way to do this in iOS, just OpenCV.
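The same class is exposed to Python through the cv2.detail module (as used in OpenCV's stitching_detailed.py sample). A minimal sketch, where the warped images, corners and masks are placeholders for whatever your stitching pipeline already produces:

import cv2
import numpy as np

# Placeholders: in a real pipeline these come from the warping stage of the stitcher
images_warped = [cv2.imread('img1.jpg'), cv2.imread('img2.jpg')]
corners = [(0, 0), (800, 0)]    # top-left corner of each image in panorama coordinates
masks_warped = [255 * np.ones(img.shape[:2], np.uint8) for img in images_warped]

# Try both GAIN and GAIN_BLOCKS and compare the results
compensator = cv2.detail.ExposureCompensator_createDefault(cv2.detail.ExposureCompensator_GAIN)

# Feed all images at once so the compensator can estimate relative gains
compensator.feed(corners=corners, images=images_warped, masks=masks_warped)

# Apply the compensation to each image in place, then re-run blending
for idx, (corner, img, mask) in enumerate(zip(corners, images_warped, masks_warped)):
    compensator.apply(idx, corner, img, mask)

Exact binding names can vary a little between OpenCV versions, so check the stitching_detailed.py sample shipped with your version.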

Image similarity of apartment photos

I want to design an algorithm that would find matches in images of the same apartment, when put up by different real estate agents.
Photos are taken at roughly the same time, so the interior of the rooms should not change much, but of course every agent takes different pictures from different angles, etc.
(TL;DR: an apartment goes up for sale, different real estate agents come in and take their own pictures, and I want to know whether the pictures from the various agents are of the same place.)
I know that the selection of image processing and recognition algorithms depends highly on the use case, so could you point me in the right direction given my use case?
http://reality.bazos.sk/inzerat/56232813/Prenajom-1-izb-bytu-v-sirsom-centre.php
http://reality.bazos.sk/inzerat/56371292/-PRENAJOM-krasny-1i-byt-rekonstr-Kupeckeho-Ruzinov-BA-II.php
You can actually use Clarifai's Custom Training API endpoint; it's fairly simple and straightforward. All you would have to do is train on the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In javascript, to declare a positive it is:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to provide a negative, but it can only help. Then, when you are comparing images to the first apartment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). The API is basically doing the machine learning for you, without you having to manage the training yourself. Clarifai's API also has a tagging feature that is extremely accurate with basic tags. The API is free for a certain number of calls per month. Definitely worth checking out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
...
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to find a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
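A very rough bag-of-visual-words sketch (ORB descriptors with a k-means vocabulary from scikit-learn are my assumptions here; any descriptor/clustering combination follows the same pattern):

import cv2
import numpy as np
from sklearn.cluster import KMeans

def orb_descriptors(path):
    # Local feature descriptors for one photo
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = cv2.ORB_create(1000).detectAndCompute(img, None)
    return des

paths = ['flat_a_1.jpg', 'flat_a_2.jpg', 'flat_b_1.jpg']    # placeholder file names
all_des = [orb_descriptors(p) for p in paths]

# Build the visual vocabulary by clustering every descriptor from the whole set
vocab = KMeans(n_clusters=100, n_init=10).fit(np.vstack(all_des).astype(np.float32))

def bow_histogram(des):
    # Normalised histogram of visual-word occurrences ("bag of words" for one photo)
    words = vocab.predict(des.astype(np.float32))
    hist = np.bincount(words, minlength=100).astype(np.float32)
    return hist / hist.sum()

hists = [bow_histogram(d) for d in all_des]

# Smaller distance between histograms = more likely to be the same apartment
print(np.linalg.norm(hists[0] - hists[1]))   # two listings of the same flat
print(np.linalg.norm(hists[0] - hists[2]))   # a different flat

In practice you would tune the vocabulary size and pick the decision threshold from labelled examples.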
If you don't expect the scene to change much, you could try the standard first step of any structure-from-motion algorithm to establish a notion of similarity between a pair of images. A pair of images is similar if it contains a number of matching image features, larger than a threshold, that also satisfy the geometrical constraint of the scene. For a general scene, that geometrical constraint is given by a fundamental matrix F computed from a subset of the matching features.
Here are the steps. I have inserted the opencv method for each step, but you could write your methods too:
Read the pair of images. Use img = cv2.imread(filename).
Use SIFT/SURF to detect image features/descriptors in both images.
sift = cv2.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(img,None)
Match features using the descriptors.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # L2 norm: SIFT/SURF descriptors are floating point (NORM_HAMMING is for binary descriptors)
matches = bf.match(des1,des2)
Use RANSAC to compute the fundamental matrix.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
mask contains the inliers. Simply count them to determine whether the number of matches satisfying the geometrical constraint is large enough.
CAUTION: In the case of a planar scene, we use a homography instead of a fundamental matrix, and the steps described above work out pretty nicely, because a homography takes a point to the corresponding point in the other image. A fundamental matrix, however, takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend running these steps a few more times with a little jitter added to the feature locations, and collating the evidence over several trials before making the decision. You can also use more advanced steps to make this process more robust, but only if the steps described above don't yield the results you need.
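Putting the steps together, a minimal end-to-end sketch (the file names and inlier threshold are placeholders to tune):

import cv2
import numpy as np

img1 = cv2.imread('view1.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread('view2.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.xfeatures2d.SIFT_create()       # cv2.SIFT_create() in recent OpenCV builds
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# RANSAC fit of the fundamental matrix; mask marks the inlier matches
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
inliers = int(mask.sum()) if mask is not None else 0

THRESHOLD = 50                              # assumed value, tune on your own data
print('similar' if inliers >= THRESHOLD else 'not similar', inliers)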

OpenCV compare similar hand drawn images

I am trying to compare two monochrome, basic hand-drawn images, captured electronically. The scale may be different, but the essence of the images is the same. I want to compare one hand-drawn image to a saved library of images and get a relative score of how similar they are. Think of several basic geometric shapes, lines, and curves that make up a drawing.
I have tried several techniques without much luck. Pixel-based comparisons are too exact. I have tried scaling and cropping images, and that did not give accurate results.
I have tried OpenCV with C# and have had a little success. I have experimented with SURF and it works for a few images, but not others that the eye can tell are very similar.
So now my question: are there any examples of using OpenCV, or commercial software, that can support comparing drawings that are not exact? I prefer C#, but I am open to any solutions.
Thanks in advance for any guidance.
(I have been working on this for over a month and have searched the internet and Stack Overflow without success. I could of course have missed something.)
You need to extract features from these images, and after that a basic Euclidean distance would be enough to calculate similarity. But hand-drawn things are not easy to extract features from. For example, companies that work on face recognition generally have much lower accuracy on drawn face portraits.
I have a suggestion for you. For a machine learning homework, one of my friends got a signature recognition assignment. I do not fully know how he achieved high accuracy, but I know the feature extraction part. First he converted the image to a binary image. Then he counted each row's black pixels. Then he used those features to train a neural network or similar.
So you can use a similar approach to extract features, then use Euclidean distance to calculate the similarities.
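A minimal sketch of that row-profile idea (the fixed resize dimensions are an assumption, so that the profiles of two drawings line up):

import cv2
import numpy as np

def row_profile(path, size=64):
    # Binarise the drawing and count ink pixels per row
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (size, size))     # fixed size so the two profiles are comparable
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return (binary > 0).sum(axis=1).astype(np.float32)

a = row_profile('drawing_a.png')            # placeholder file names
b = row_profile('drawing_b.png')

# Smaller Euclidean distance = more similar drawings
print(np.linalg.norm(a - b))

A column profile could be added the same way if rows alone are not discriminative enough.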

find mosquitos' head in the image

I have images of mosquitos similar to these ones, and I would like to automatically circle the head of each mosquito in the images. They are obviously in different orientations, and there is a random number of them in different images. Some error is fine. Any ideas of algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to create your training set. For this you would like to extract small images with examples of what is a mosquito head and what is not.
Then you can use those images to train a classification algorithm. Be careful to have a balanced training set, since if your data is skewed towards one class it will hurt the performance of the algorithm. Since images are 2D and algorithms usually take 1D arrays as input, you will need to arrange your images into that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could do the trick too. If you decide to use support vector machines, I strongly recommend you check out libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings for several programming languages. They also have a very easy-to-follow guide aimed at beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, the classifier should learn to tolerate different orientations on its own. If you don't have enough data, you could create more training rows with some samples rotated, so you would have a more representative training set.
As for prediction, what you could do is: given an image, cut it using a grid where each cell has the same dimensions as the patches you used for your training set. Then you pass each of these cells to the classifier and mark those squares where the classifier gave you a positive output. If you really need circles, then take the centre of the given square and make the radius half of the square's side length (sorry for stating the obvious).
After you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others), since we have not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos at the same scale, we might still miss some of them just because they didn't fit in our grid perfectly. To address this, we will need to repeat this procedure (grid cut and predict), rescaling the given image to different sizes. How many sizes? Well, that you would have to determine through experimentation.
This approach is sensitive to the size of the "window" that you are using; that is also something I would recommend you experiment with.
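A bare-bones sketch of the grid/predict loop (scikit-learn's SVC stands in for libsvm here purely for brevity; the window size, scales and the saved training arrays are assumptions):

import cv2
import numpy as np
from sklearn.svm import SVC

WIN = 32                                    # assumed window size, same as the training patches

# X: flattened (row-major) training patches, labels: 1 = head, 0 = not head -- assumed files
X = np.load('patches.npy')
labels = np.load('labels.npy')
clf = SVC().fit(X, labels)

img = cv2.imread('mosquitos.jpg', cv2.IMREAD_GRAYSCALE)    # placeholder file name
found = []
for scale in (1.0, 0.75, 0.5):              # rescaling handles mosquitos at different sizes
    resized = cv2.resize(img, None, fx=scale, fy=scale)
    h, w = resized.shape
    for y0 in range(0, h - WIN + 1, WIN):
        for x0 in range(0, w - WIN + 1, WIN):
            cell = resized[y0:y0 + WIN, x0:x0 + WIN].reshape(1, -1)
            if clf.predict(cell)[0] == 1:
                # Map back to original coordinates; radius is half the cell side
                found.append((int((x0 + WIN / 2) / scale), int((y0 + WIN / 2) / scale), int(WIN / (2 * scale))))

for cx, cy, r in found:
    cv2.circle(img, (cx, cy), r, 255, 2)
cv2.imwrite('heads.png', img)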
Here is some research that may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white, unmarked background, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example, here I made a quick attempt by taking the red channel, subtracting the blue channel times 5, and then applying a threshold of 80:
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be mosquitoes by connected component labeling. If a blob is large enough to be two mosquitoes, cut it out and apply some more dilation/erosion to it.
Once you have a single blob like this, you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is thickest.
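A sketch of that pipeline (the channel weighting and threshold of 80 come from the description above; the blob size limits and kernel size are assumptions):

import cv2
import numpy as np

img = cv2.imread('mosquitos.jpg').astype(np.int32)         # placeholder file name
b, r = img[:, :, 0], img[:, :, 2]

# Red channel minus 5x blue channel, thresholded at 80
mask = ((r - 5 * b) > 80).astype(np.uint8) * 255

# Morphological opening/closing removes the thin leg structures
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Connected component labeling; keep blobs in a plausible single-mosquito size range (assumed)
n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
for i in range(1, n):
    if not (500 < stats[i, cv2.CC_STAT_AREA] < 20000):
        continue
    ys, xs = np.where(labels == i)
    pts = np.column_stack([xs, ys]).astype(np.float32)

    # PCA on the blob pixels: the largest eigenvector is the body axis;
    # walk along it and look for the thickest cross-section to locate the head
    centre = pts.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov((pts - centre).T))
    body_axis = eigvecs[:, np.argmax(eigvals)]
    print('blob', i, 'centre', centre, 'body axis', body_axis)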

Shape context matching in OpenCV

Does OpenCV have an implementation of shape context matching? I've found only the matchShapes() function, which does not work for me. I want to get a set of corresponding features out of shape context matching. Is it a good idea to use it to compare, and to find the rotation and displacement of, a detected contour across two different images?
Some example code would also be very helpful for me.
I want to detect, for example, a pink square, and in the second case a pen. Other examples could be squares with some holes, stars, etc.
The basic steps of image processing are
Image Acquisition > Preprocessing > Segmentation > Representation > Recognition
and what you are asking for seems to lie within the representation part of this general pipeline. You want some features that describe the objects you are interested in, right? Before sharing what I've done for simple hand-gesture recognition, I would like you to consider what you actually need. A lot of the time, simplicity makes things a lot easier. Consider a fixed colour for your objects, and consider background subtraction (these two mainly tie into preprocessing and segmentation). As for representation: what features are you interested in, and can you eliminate the need for some of them?
My project group and I took a simple approach to preprocessing and segmentation, choosing a green glove for our hand. Here's an example of the glove, camera and detection on the screen:
We used a threshold on the convexity defects, tuned to find the defects between fingers, and we calculated the aspect ratio of a rotated rectangular bounding box to see how square our blob is. With only four different hand gestures chosen, we were able to distinguish them with just these two features.
The functions we used and the measurements are all available in the OpenCV documentation on structural analysis, and access to values in vectors (which we used a lot) is covered in the C++ documentation for vectors.
I hope you can use the train of thought put into this; if you want more specific info I'll be happy to comment. Enjoy.
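A sketch of those two measurements (the HSV range for the green glove and the defect-depth threshold are assumptions):

import cv2
import numpy as np

img = cv2.imread('hand.jpg')                               # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Segment the green glove (assumed HSV range)
mask = cv2.inRange(hsv, (40, 60, 60), (80, 255, 255))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)

# Feature 1: convexity defects deep enough to be the gaps between extended fingers
hull = cv2.convexHull(contour, returnPoints=False)
defects = cv2.convexityDefects(contour, hull)
finger_gaps = 0
if defects is not None:
    finger_gaps = int((defects[:, 0, 3] / 256.0 > 20).sum())   # 20 px depth threshold, assumed

# Feature 2: aspect ratio of the rotated bounding box ("how square the blob is")
(_, _), (w, h), _ = cv2.minAreaRect(contour)
ratio = min(w, h) / max(w, h)

print('finger gaps:', finger_gaps, 'box ratio:', round(ratio, 2))

With just these two numbers, a handful of clearly different gestures can be separated by simple thresholds.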
