Correlating a vector with edges in an image - image-processing

I'm trying to implement user-assisted edge detection using OpenCV.
Assume you have an image in which we need to find a polygonal shape. For the sake of discussion, let's say we need to find the top of a rectangular table in a picture. The user will click on the four corners of the table to help us narrow things down. Connecting those four points gives us a polygon, or four vectors.
But the user is not very accurate when clicking on those corners. So I'd like to use edge information from the image to increase the accuracy.
I'm using a Canny edge detector with a fairly high threshold to determine the important edges in my image (more precisely, I'm scaling down, blurring, converting to grayscale, then running Canny). How can I compute whether a vector aligns with an edge in my image? If I have a way to compute "alignment", my overall algorithm comes down to perturbing the locations of the four corner points and computing the total "alignment" of my polygon with the edges in the image until I find an optimum.
What is a good way to define and compute this "alignment" metric?

You may want to try to use FindContours to detect your table or any other contour. Then build a contour also from the user input points. After this you can read about Contour Moments by which you can compare contours. You can compare all the contours from the image with the one built from the user points and then select the closest match.
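The "alignment" asked about in the question could also be scored directly against the edge map. This is only a minimal sketch and not the contour-moment approach described above: it assumes a hypothetical binary edge map stored as a list of rows (1 = edge pixel, for example Canny output converted to 0/1), and the function names are made up for illustration. It samples points along each side of the polygon and reports the fraction that land on an edge pixel.

```python
def segment_alignment(edge_map, p0, p1, samples=50):
    """Fraction of points sampled along p0 -> p1 that land on an edge pixel.

    edge_map is a hypothetical binary image given as a list of rows
    (1 = edge, 0 = background); points are (x, y) pixel coordinates.
    """
    height, width = len(edge_map), len(edge_map[0])
    hits = 0
    for i in range(samples):
        t = i / (samples - 1)
        x = int(round(p0[0] + t * (p1[0] - p0[0])))
        y = int(round(p0[1] + t * (p1[1] - p0[1])))
        # Samples that fall outside the image count as misses.
        if 0 <= x < width and 0 <= y < height and edge_map[y][x]:
            hits += 1
    return hits / samples


def polygon_alignment(edge_map, corners, samples=50):
    """Mean segment alignment over all sides of a closed polygon."""
    n = len(corners)
    scores = [
        segment_alignment(edge_map, corners[i], corners[(i + 1) % n], samples)
        for i in range(n)
    ]
    return sum(scores) / n
```

A search over small perturbations of the four clicked corners could then keep the corner set with the highest polygon_alignment. In practice a blurred edge map or a distance transform would probably give a smoother score than exact pixel hits.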

Related

Matching bounding boxes from multiple viewing angles

I have mounted two cameras on different shelves of a fridge (at the bottom and top), facing each other. An object detector is fed with two images from these two streams and returns bounding boxes from them, independently.
The problem: Given bounding boxes from different viewing angles, determine their correspondence.
I know that since depth is unknown, the x, y coordinates in one camera may correspond to multiple positions in the other. This is why our solutions so far have been approximations that work with varying success. The naive solution has been to ignore any difference between the camera coordinates and use the Euclidean distance to get the correspondence.
Another solution is to use the Fundamental Matrix, which gives a way to calculate correspondence based on epipolar geometry. This solution may be quite tedious because it requires a form of calibration and the results have not been great. It may be due to the fact that my calibration was sloppy.
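To make the Fundamental Matrix idea concrete, here is a minimal sketch of how the epipolar constraint could be used to score candidate matches. It assumes a 3x3 matrix F (nested lists) that maps a point in view 1 to its epipolar line in view 2, and boxes given as (x_min, y_min, x_max, y_max); both conventions and all function names are assumptions for illustration. Box centers are only a rough stand-in for a true corresponding point.

```python
import math


def epipolar_line(F, point):
    """Coefficients (a, b, c) of the line l' = F x in view 2 for a pixel in view 1."""
    x, y = point
    h = (x, y, 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][c] * h[c] for c in range(3)) for r in range(3))


def epipolar_distance(F, point1, point2):
    """Perpendicular distance of point2 (view 2) from the epipolar line of point1."""
    a, b, c = epipolar_line(F, point1)
    x, y = point2
    # Note: a = b = 0 only happens if point1 is the epipole itself.
    return abs(a * x + b * y + c) / math.hypot(a, b)


def box_center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def match_boxes(F, boxes1, boxes2):
    """For each box in view 1, pick the box in view 2 nearest its epipolar line."""
    matches = {}
    for i, b1 in enumerate(boxes1):
        c1 = box_center(b1)
        matches[i] = min(
            range(len(boxes2)),
            key=lambda j: epipolar_distance(F, c1, box_center(boxes2[j])),
        )
    return matches
```

This greedy pick can assign the same box twice; a one-to-one assignment over the distance matrix would be the natural next step.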
The last solution, which works poorly, is to use edge and keypoint detection and match them. Since the edges, shapes etc. differ quite radically between the views, we understand why this is so.
Ultimately, I was wondering how you would have tackled the problem, or if there is anything you can point me towards to get a more robust solution.
In the left image we have the bottom view. This image has been mirrored about the y-axis. The right image is from the top shelf. The object detector has detected two bounding boxes for each viewing angle, a total of four bounding boxes. These are drawn on both cameras. For example, the bounding box with id 16 is detected on the top-positioned camera and is also drawn on the bottom camera to indicate its displacement. How would you go about matching the bounding boxes which belong to the same object viewed from another viewing angle?

Algorithm to detect credit card sized card

I want to detect a credit card sized card in an image. The card can be any card, e.g. an identity card or a member card. Currently, I am thinking of using Canny edge detection, Hough Line and Hough Circle to detect the card. But the process will be tedious when I want to combine all the information from Hough Line and Hough Circle to locate the card. Some people suggest thresholding and findContours, but the color of the card can be similar to the background, which makes it difficult to achieve the desired result with this method. Is there any kernel or method which can help me to detect the card?
I think your problem is similar to a document scanner. You can refer to this link
Find edges in the image using Canny edge detector (lower and higher thresholds can be set as 0.66*meanIntensity and 1.33*meanIntensity) and do a morphological close operation.
Edge image after performing close
Find the contours in the image using findContours
Filter out unwanted contours (I used contourArea to filter contours)
Using approxPolyDP, approximate the contours to 7 or more points (I used 0.005 * perimeter as the parameter here)
If you want to find accurate edges, fit lines between the points and get the 4 biggest lines. Find their intersections (since the card may or may not contain curved edges)
You'll end up with the card endpoints which can be used further for homography or to determine the region.
vertices of the card
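Two of the steps above are easy to sketch without OpenCV: the Canny threshold rule (0.66 and 1.33 times the mean intensity) and the intersection of two fitted lines to recover a corner. This is a minimal illustration; the function names and the list-of-rows grayscale image are assumptions, not part of the answer's code.

```python
def canny_thresholds(gray):
    """Lower and upper Canny thresholds as 0.66x and 1.33x the mean intensity.

    gray is assumed to be a grayscale image given as a list of rows.
    """
    pixels = [value for row in gray for value in row]
    mean_intensity = sum(pixels) / len(pixels)
    return 0.66 * mean_intensity, 1.33 * mean_intensity


def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if parallel."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel (or identical) lines have no single intersection
    det12 = x1 * y2 - y1 * x2
    det34 = x3 * y4 - y3 * x4
    x = (det12 * (x3 - x4) - (x1 - x2) * det34) / denom
    y = (det12 * (y3 - y4) - (y1 - y2) * det34) / denom
    return (x, y)
```

Intersecting each pair of adjacent sides among the four longest fitted lines gives the four card vertices, even when the physical corners are rounded.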
Edit
Edited the answer to include the steps to obtain the vertices of the card and results are updated.
There are two sub-problems here -
Detect rectangular object in the image.
The rectangular object's actual size should be similar to Credit Card.
For first part, you can try out several methods to extract rectangular region in the image and see which suits your need.
This post shows a lot of methods which you can try out.
In my experience edge detection works best in most cases. Try Canny > Contours with Approximations > Filter out irrelevant contours > Search for rectangles using some shape detection, template matching or any other method. Even this post does a similar thing to achieve its task.
Coming to the second point, you cannot find out the size of an object in an image unless you have a reference object of known size in the image. If the image was captured from a closer distance, the card will seem larger, and if taken from far away, the card will seem smaller. So while capturing, you will have to enforce some restrictions; for example, you can ask the user to capture the image along with a standard ruler. You can also ask the user to capture the image on an A4 sheet with all the sheet edges visible. Since you know the size of the A4 sheet, you'll be able to estimate the size of the card.
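As a rough sketch of the A4 idea: once the sheet and the candidate rectangle have both been measured in pixels, the scale follows from the known sheet size. The code below assumes a fronto-parallel view (or an image already rectified with a homography), an A4 sheet of 210 x 297 mm and the ID-1 card size of 85.60 x 53.98 mm; the function names and the 10% tolerance are arbitrary choices for illustration.

```python
A4_MM = (210.0, 297.0)   # A4 sheet size in millimetres
CARD_MM = (85.60, 53.98)  # ID-1 (credit card) size in millimetres


def mm_per_pixel(a4_long_side_px):
    """Scale factor derived from the measured long side of the A4 sheet."""
    return max(A4_MM) / a4_long_side_px


def is_card_sized(width_px, height_px, a4_long_side_px, tolerance=0.10):
    """True if the rectangle's physical size is within tolerance of a card."""
    scale = mm_per_pixel(a4_long_side_px)
    long_mm, short_mm = sorted((width_px * scale, height_px * scale), reverse=True)
    card_long, card_short = max(CARD_MM), min(CARD_MM)
    return (
        abs(long_mm - card_long) <= tolerance * card_long
        and abs(short_mm - card_short) <= tolerance * card_short
    )
```

For example, if the A4 long side measures 2970 px, each pixel covers 0.1 mm, so a 856 x 540 px rectangle would pass as card-sized.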
Apart from the above methods, if you have a large enough dataset of such images, you can use Haar Classifiers or Neural Network/Deep Learning based methods, which may give much better accuracy.

What are keypoints in image processing?

When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have orientation but I couldn't understand what this actually is. Is it an angle, but between the radius and something? Can you give some explanation? I think what I need first is something simpler, and after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stands out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason why keypoints are special is because no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in this modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: 'module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the concept of keypoints, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also I found out that they have orientation but I couldn't understand what actually it is. It is an angle but between the radius and something?
Basically, if you want to detect keypoints regardless of scale and orientation, when they talk about the orientation of keypoints, what they really mean is that they search a pixel neighbourhood that surrounds the keypoint and figure out how this pixel neighbourhood is oriented, or what direction this patch is oriented in. It depends on what descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching so that you can match keypoints together. Take a look at the first figure I have with the two cameramen - one rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the top of the cameraman as an interest point matches with the rotated version because we take a look at the points that surround the keypoint and see what orientation all of these points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
1. Blur and resample the image with different blur widths and sampling rates to create a scale-space.
2. Use the difference of Gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale.
3. Assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest number of counts.
4. Assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods.
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
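The orientation assignment (a histogram of gradient orientations around the keypoint, then the most populated bin) can be sketched in a few lines. This is a simplified illustration, not the actual SIFT implementation: it assumes the patch is a small grayscale neighbourhood given as a list of rows, uses 36 bins of 10 degrees, and weights each vote by gradient magnitude only. Real implementations typically add Gaussian weighting and peak interpolation.

```python
import math


def dominant_orientation(patch, bins=36):
    """Centre angle (degrees, 0-360) of the strongest gradient-orientation bin.

    patch is assumed to be a grayscale neighbourhood around a keypoint,
    stored as a list of rows. Angles follow image coordinates (y grows down).
    """
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // bin_width) % bins] += magnitude
    best = max(range(bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width
```

If the patch is rotated, the dominant orientation rotates with it, which is what lets a descriptor be computed relative to that angle and matched across rotated images.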
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF

What are the possible fast ways to detect circle in an image?

What are the possible fast ways to detect a circle in an image?
For example, I have an image with one big circle that has 6 small circles inside it.
I need to find the big circle without using Hough Circles (OpenCV).
Standard algorithms to find circles are Hough (which jamk mentioned in the comments) and RANSAC. Parameterizing these algorithms will set a baseline speed for your application.
http://en.wikipedia.org/wiki/Hough_transform
http://en.wikipedia.org/wiki/RANSAC
To speed up these algorithms, you can look at your collection of images and decide whether limiting the search ranges will help speed up the search. That's straightforward enough: only search within a reasonable range for the radius. Since they take edge points as inputs, you can also look at methods to reduce the number of edge points checked.
However, there are a few other tricks to speed up processing.
Carefully set the range or ranges over which radii are checked. For example, you might not simply check from the smallest possible radius to the largest possible radius, but instead you might split the search into two different ranges: from radius R1 to R2, and then from radius R3 to R4.
Ditch the Canny edge detection in favor of the fastest possible edge detection your application can tolerate. (You can ditch Canny for lots of applications.)
Preprocess your image of edge points to eliminate outliers. The appropriate algorithm to eliminate outliers will be specific to your image set, but you'll probably be able to find an algorithm that eliminates obvious outliers and thereby saves some search time in the more expensive circle fit algorithms.
If your circles are very well defined, and all or nearly all points are present, figure out how you might match only a quarter circle or semicircle instead of a full circle.
Long story short: start with a complete implementation and benchmark it, then gradually tighten up parameter settings and limit search ranges while ensuring that you can still find circles for your application and your image set.
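As an illustration of limiting the radius range, here is a minimal RANSAC circle fit that rejects candidate circles outside [r_min, r_max] before counting inliers. It is a sketch under stated assumptions: the edge points are a list of (x, y) tuples, and the function names, iteration count and tolerance are arbitrary choices for illustration.

```python
import math
import random


def circle_from_points(p1, p2, p3):
    """Circle (center_x, center_y, radius) through three points, or None if collinear."""
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def ransac_circle(points, r_min, r_max, iterations=200, tol=1.0, seed=0):
    """Best circle with radius in [r_min, r_max], judged by inlier count."""
    rng = random.Random(seed)
    best, best_inliers = None, 0
    for _ in range(iterations):
        model = circle_from_points(*rng.sample(points, 3))
        # Skipping out-of-range radii early avoids the costly inlier count.
        if model is None or not (r_min <= model[2] <= r_max):
            continue
        ux, uy, r = model
        inliers = sum(
            1 for x, y in points if abs(math.hypot(x - ux, y - uy) - r) <= tol
        )
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best
```

With one big circle containing smaller ones, setting r_min above the small circles' radius means samples drawn from the small circles are discarded almost for free.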
If your images are amenable to scaling, then one possibility is to create an image pyramid of images at different scales: 1/2 scale, 1/4 scale, 1/8 scale, etc. You'll need an edge-preserving scaling method at smaller scales.
Once you have your image pyramid, try the following:
Find circles at the very smallest scale. The image will be small and the range of possible radii will be limited, so this should be a quick operation.
If you find a circle using the initial fit at the small scale, improve the fit by testing in the next larger scale image -OR- go ahead and search in the full scale image.
Check the next largest scale. Circles that weren't visible in the smaller scale image may suddenly "appear" in the current scale.
Repeat the steps above through all scales in the image.
Image scaling will be a fast operation, and you can see that if at least one of your circles is present in a smaller scale image you should be able to reduce the total number of cycles by performing a rough circle fit in the small scale image and then optimizing the fit for those edge points alone in the full scale image.
Edge-preserving scaling can also make it possible to use correlation-type tools to find circles, but being able to do so depends on the content of your images, including the noise, how completely edge points represent circles, and so on.
Maybe detect contours and check their properties, e.g. try cv::isContourConvex. Another way could be to use the eigenvalues of the covariance matrix of the contour points and check whether the first eccentricity of the contour's representative ellipse is ~0.
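The eigenvalue check can be sketched without any library. Assuming the contour is a list of (x, y) points sampled around the shape (the function name is made up for illustration), the eigenvalues of the 2x2 covariance matrix are proportional to the squared semi-axes of the representative ellipse, so the first eccentricity is sqrt(1 - l2 / l1) with l1 >= l2.

```python
import math


def contour_eccentricity(points):
    """First eccentricity of the ellipse implied by the points' covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    largest, smallest = half_trace + root, half_trace - root
    if largest <= 0:
        return 0.0  # degenerate contour (all points identical)
    return math.sqrt(max(0.0, 1.0 - smallest / largest))
```

A value close to 0 suggests a circle, while values approaching 1 indicate an elongated shape. Note that a square also gives ~0, so this is best combined with another test such as convexity or a radius consistency check.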

Remove high frequency vertical shear noise from image

I have some scanned images where the scanner appears to have introduced a certain kind of noise that I've not encountered before. I would like to find a way to remove it automatically. The noise looks like high frequency vertical shear. In other words, a horizontal line that should look like ------------ shows up as /\/\/\/\/\/\/\/\/\, where the amplitude and frequency of the shear seem pretty regular.
Can someone suggest a way of doing the following steps?
Given an image, identify the frequency and amplitude of the shear noise. One can assume that it is always vertical and the characteristic frequency is higher than other frequencies that naturally appear in the image.
Given the above parameters, apply an opposite, vertical, periodic shear to the image to cancel this noise.
It would also be helpful to know how these could be implemented using the tools implemented by a freely available image processing package. (Netpbm, ImageMagick, Gimp, some Python library are some examples.)
Update: Here's a sample from an image with this kind of distortion. Actually, this sample shows that the shear amplitude need not be uniform throughout the image. :-(
The original images are higher resolution (600 dpi).
My solution to the problem would be to convert the image to the frequency domain using an FFT. From the complex result you get two matrices: the image signal amplitude and the image signal phase. Both have the same dimensions as the input image.
Now, use the amplitude matrix to detect a spike in the area that corresponds to the noise frequency. Note that where the low and high frequencies sit in this matrix depends on the FFT layout: in the usual unshifted output the top-left corner holds the zero-frequency (DC) component and the highest frequencies lie toward the middle of the matrix, while a shifted spectrum puts DC at the center.
After you have identified the spike, set the corresponding coefficients (amplitude matrix entries) to zero. After you apply the inverse FFT you should get the input image without the noise.
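Here is the same idea in one dimension, as a minimal sketch rather than a full 2D solution: a naive DFT of a single row of samples, a search for the strongest bin above an assumed cutoff (min_bin, a hypothetical parameter), zeroing that bin together with its mirrored counterpart, and an inverse transform. A real image would use a 2D FFT from a numerical library; the pure-Python transform here is only for illustration and is slow.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(), returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def remove_dominant_high_frequency(signal, min_bin):
    """Zero the strongest bin at or above min_bin and return (bin, filtered signal)."""
    n = len(signal)
    spectrum = dft(signal)
    # Only bins up to n // 2 are searched; the rest mirror them for real input.
    spike = max(range(min_bin, n // 2 + 1), key=lambda k: abs(spectrum[k]))
    spectrum[spike] = 0
    spectrum[(n - spike) % n] = 0  # mirrored (conjugate) counterpart
    return spike, idft(spectrum)
```

Zeroing a single bin only removes a perfectly periodic component; if the noise frequency drifts across the image, a small neighbourhood around the spike would need to be attenuated instead.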
Please provide an example image for a more concrete, practical solution to your problem.
You could use a Hough fit or RANSAC to fit lines first. For Hough to work you may need to "smear" the points using Gaussian blur or morphological dilation so that you get more hits for a given (rho, theta) line in parameter space.
Once you have line fits, you can determine the relative distance of the original points to each line. From that spatial information you can use an FFT to help find a "best fit" spatial frequency and then shift pixels up/down accordingly.
As a first take, you might even skip FFT and use more of a brute force method:
Find the best fit lines using Hough or RANSAC.
Determine the orientation of the lines.
Sampling perpendicular to the (nominally) horizontal lines, find the points along that column with respect to the closest best fit lines.
If the points along one sample are on average a distance +N away from their best fit lines, shift all the pixels in that column (or along that perpendicular sample) by -N.
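The brute-force steps above might look roughly like this for the simplest case. The sketch assumes a grayscale image stored as a list of rows, exactly vertical shear, and a single nominally horizontal best-fit line whose row (line_row) has already been found by Hough or RANSAC; the function names, the dark threshold and the fill value are assumptions for illustration.

```python
def shift_column(image, col, offset, fill=255):
    """Shift one column down by offset rows (negative = up), in place."""
    height = len(image)
    original = [image[r][col] for r in range(height)]
    for r in range(height):
        src = r - offset
        # Pixels shifted in from outside the image get the fill value.
        image[r][col] = original[src] if 0 <= src < height else fill


def straighten(image, line_row, dark_threshold=128, fill=255):
    """Shift each column so its dark pixels land on the fitted line row."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for col in range(width):
        dark_rows = [r for r in range(height) if image[r][col] < dark_threshold]
        if not dark_rows:
            continue  # nothing to measure in this column
        mean_offset = sum(dark_rows) / len(dark_rows) - line_row
        # A column sitting +N rows from the line is shifted by -N.
        shift_column(result, col, -int(round(mean_offset)), fill)
    return result
```

With several lines in the image, the offset for a column would be averaged over the distances to each point's closest fitted line, as described in the steps.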
This sort of technique should work if the shear is consistent along a vertical sample, but not necessarily from left to right. If the shear is always exactly vertical, then finding horizontal lines should be relatively easy.
Judging from your sample image, it looks as though the shear may be consistent across a horizontal line segment between a 3-way or 4-way intersection with a nominally vertical line segment. You could use corner detectors or other methods to find these intersections to limit the extent over which a pixel shifting operation takes place.
A technique I posted here is another way to find horizontal stretches of dark pixels in case they don't fall on a line:
Is there an efficient algorithm for segmentation of handwritten text?
All that aside, is there a chance you could have the scanner fixed?

Resources