I want to transform an image area (e.g. a piece of paper with text) from an arbitrary, distorted view to a predefined view (e.g. top-down). So basically I have two images: image1 shows the piece of paper from the desired view, and image2 shows it from an arbitrary view with additional distortion (e.g. bending of the paper and/or lens distortion).
My best try so far was to find features in both images, estimate an affine transformation matrix from them and apply it to image2. This works to a certain degree, but when overlaying both images I noticed that image2 still shows some distortion in some areas. I assume there is also some lens distortion and/or some distortion due to the bending of the paper within image2. What I saw was that areas with many features fit better than areas with fewer, which would make sense, since there is only one transformation matrix for the whole image and it is biased towards areas with many features.
I have added an example image which shows two pages. Please assume that both pages contain identical content and can be feature matched. The left page is the predefined view and the right page shall be transformed. The green and red lines show the distortion due to bending.
But how can I achieve the unbending of the paper and removing the lens distortion (assuming image1 is distortion free)?
Lens calibration (e.g. with a checkerboard) is not an option. I assume that I would require a method which uses many transformation matrices for different areas of the image.
Example
The question is:
Briefly suggest how blurring and edge detection be combined to select edges of a
particular scale (for example, only the edges of large objects)
This is from a mock question paper prepared by our professor.
I understand that the Sobel filter has the effect of both blurring (Gaussian) and detecting edges in an image. However, what confuses me is how a combination of filters helps in detecting edges of a particular scale.
Is this related to Difference of Gaussians (DoG)? If yes, would it help with finding large object edges (or, equivalently, small object edges)?
This is one of the results of DoG using Gaussians with sigma 1.0 and 2.5:
It seems to have detected all types of edges.
Original Image:
Another hypothesis is the method used in SIFT to find features at different scales: blur the image at multiple levels and take the differences between the differently blurred images. Would this help in finding large object edges (or, equivalently, small object edges)?
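To make the "blur, then detect edges" idea concrete, here is a minimal 1D sketch in plain Python. It is not DoG itself: it simply blurs with a Gaussian of a chosen sigma and then takes a central difference. The synthetic row, the sigma values and all function names are assumptions made up for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve with a Gaussian, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def edge_strength(signal, sigma):
    """Absolute central difference of the blurred signal."""
    s = blur(signal, sigma)
    n = len(s)
    return [abs(s[min(i + 1, n - 1)] - s[max(i - 1, 0)]) / 2 for i in range(n)]

# Synthetic image row: a wide object (samples 40..119)
# and a narrow 3-sample detail (samples 150..152).
row = [0.0] * 200
for i in range(40, 120):
    row[i] = 1.0
for i in range(150, 153):
    row[i] = 1.0

fine = edge_strength(row, sigma=1.0)
coarse = edge_strength(row, sigma=8.0)

# Strongest response near the large object's left edge and near the small detail.
large_fine, small_fine = max(fine[30:50]), max(fine[140:165])
large_coarse, small_coarse = max(coarse[30:50]), max(coarse[140:165])
```

With the small sigma both structures respond about equally; with the large sigma the narrow detail is smeared out before differentiation, so its response drops well below that of the wide object. A threshold placed between the two responses would then keep only the large-scale edges.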
Please let me know if any additional details are required.
I have a large set of images which are cellphone photos taken of driver’s licenses (but this could apply to any type of document). They come in all shapes and sizes, meaning, different angles of the camera, different distances from the camera to the driver’s license, different lighting, etc.
Is there any way in OpenCV to identify in each image a known landmark, then crop, resize, rotate each image so that for the final result, I have a set of images that are completely uniform (e.g. driver’s license fills the whole image, they all look the same, aligned the same, etc)?
Thanks!
One approach (using the functionality available in OpenCV) that I can suggest is to:
Train a Haar Cascade Classifier to recognize the landmark (works quite well for logos)
Take a few rotations of the image and look for the logo in each of them. You should find one or more matches, depending on the number of rotations that you use
Perform edge detection and project the edge pixels to the vertical axis. The projection with largest gaps will represent the spacing between text, and lead you to select one image that is closest to the correct orientation
Now you can crop, zoom or un-zoom this image using the logo position using the known properties of a driving license (logo is so many inches from the top left etc.).
Instead of a Haar cascade classifier, you can also match SIFT features of a logo with that of the photo.
I did not post any code or examples because the question is very broad. But you can easily find OpenCV documentation and examples for each of these steps.
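As a rough illustration of the projection step, here is a plain-Python sketch that works on edge-pixel coordinates directly instead of rotating the image. It scores each candidate angle by the variance of the row-projection histogram, which is a stand-in for the "largest gaps" criterion described above, not the same measure. The synthetic text lines, the bin count and the function names are assumptions.

```python
import math

def profile_score(points, angle_deg, bins):
    """Variance of the row-projection histogram after rotating points by angle_deg."""
    a = math.radians(angle_deg)
    ys = [x * math.sin(a) + y * math.cos(a) for x, y in points]
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for y in ys:
        hist[min(int((y - lo) / width), bins - 1)] += 1
    mean = len(ys) / bins
    return sum((h - mean) ** 2 for h in hist) / bins

def best_angle(points, candidates, bins=64):
    """Candidate rotation whose projection is most sharply peaked."""
    return max(candidates, key=lambda ang: profile_score(points, ang, bins))

# Synthetic "text lines": 5 horizontal rows of edge pixels, skewed by 10 degrees.
text = [(x, 20 * r) for r in range(5) for x in range(0, 200, 2)]
skew = math.radians(10)
skewed = [(x * math.cos(skew) - y * math.sin(skew),
           x * math.sin(skew) + y * math.cos(skew)) for x, y in text]

# Rotation (in degrees) that best undoes the skew, from a coarse candidate set.
correction = best_angle(skewed, range(-20, 21, 5))
```

When the lines of text are horizontal, the edge pixels pile up in a few histogram bins with empty bins between them, so the variance peaks at the correct rotation.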
Is there any way in OpenCV to
identify in each image a known landmark: there are several ways to do it, see here: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_table_of_contents_feature2d/py_table_of_contents_feature2d.html
crop: yes, take a subset and don't forget to copyTo; see "Select a subset of a Mat and copy them to create a new mat in C++/Opencv"
resize, rotate each image so that for the final result, I have a set of images that are completely uniform: you should use transformations as shown here: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations.html
this tutorial should be useful for you: https://www.learnopencv.com/homography-examples-using-opencv-python-c/
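For a feel of what the homography step does, here is a small plain-Python sketch that solves for the 3x3 matrix from four corner correspondences. In OpenCV the equivalent would be cv2.getPerspectiveTransform followed by cv2.warpPerspective. The corner coordinates and the 856x540 target size below are invented for illustration, and detecting the corners is assumed to have been done already.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography_from_corners(src, dst):
    """3x3 homography (h33 fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def warp_point(h, x, y):
    """Apply the homography to one point (with the perspective divide)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical license corners found in a photo (clockwise from top-left).
photo_corners = [(120, 80), (520, 110), (540, 370), (100, 330)]
# Target: every license fills the same 856x540 canvas.
target_corners = [(0, 0), (856, 0), (856, 540), (0, 540)]

H = homography_from_corners(photo_corners, target_corners)
mapped = [warp_point(H, x, y) for x, y in photo_corners]
```

Warping every photo with its own H onto the same target corners is what makes the final set uniform in crop, size and rotation.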
I have two images which I know represent the exact same object. In the picture below, they are referred as Reference and Match.
The image Match can undergo the following transformations compared to Reference:
The object may have changed its appearance locally by addition (e.g. dirt or lettering added to the side) or omission (e.g. a side mirror has been taken off).
Stretched or reduced in size horizontally only (it is not resized in vertical direction)
Portions of Reference image are not present in Match (shaded in red in Reference Image).
Question: How can the regions which have "changed" in the ways mentioned above be identified?
Idea#1: Dynamic Time Warping seems like a good candidate once the beginning and end of Match image (numbered 1 and 3 in the image) are aligned with corresponding columns in Reference Image, but I am not sure how to proceed.
Idea#2: Match SIFT features across images. The tessellation produced by feature point locations breaks up the image into non-uniform tiles. Use feature correspondences across images to determine which tiles to match across images. Use a similarity measure to figure out any changes.
You might want to consider an iterative registration algorithm. Basically you want to perform optimization to find the parameters of the transform, in your case horizontal scaling and horizontal translation. Once you optimize the parameters you will have the transformation between the two images, transform one to match the other, and can then use a subtraction to identify the regions with differences.
For registration take a look at the ITK library.
You can probably do a gradient descent optimization using mutual information as the metric. ITK has a number of different transforms that will capture translation and scaling. The code should run quickly on the sample images you show.
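To show the shape of the optimization without pulling in ITK, here is a toy plain-Python version on 1D column profiles. It swaps in an exhaustive grid search and a mean-squared-difference metric for the gradient descent and mutual information mentioned above, so it only illustrates the idea of searching over horizontal scale and translation. All names and the synthetic data are assumptions.

```python
def sample(profile, x):
    """Linear interpolation; positions outside the profile return None."""
    if x < 0 or x > len(profile) - 1:
        return None
    i = min(int(x), len(profile) - 2)
    t = x - i
    return profile[i] * (1 - t) + profile[i + 1] * t

def cost(reference, match, scale, shift):
    """Mean squared difference between match[x] and reference[scale * x + shift]."""
    total, count = 0.0, 0
    for x, value in enumerate(match):
        ref = sample(reference, scale * x + shift)
        if ref is not None:
            total += (value - ref) ** 2
            count += 1
    # Reject parameter pairs that leave too little overlap to compare.
    return total / count if count >= len(match) // 2 else float("inf")

def register(reference, match, scales, shifts):
    """Exhaustive search for the (scale, shift) pair with the lowest cost."""
    return min(((s, t) for s in scales for t in shifts),
               key=lambda p: cost(reference, match, *p))

# Synthetic column-intensity profile of the Reference image.
reference = [((i * 37) % 11) / 10 + (1.0 if 60 <= i < 90 else 0.0)
             for i in range(200)]
# Match: Reference from column 20 onward, squeezed horizontally to 80% width.
match = [sample(reference, 1.25 * x + 20) for x in range(120)]

scales = [1.0 + 0.05 * k for k in range(-6, 11)]   # 0.70 .. 1.50
shifts = list(range(0, 41, 5))
best_scale, best_shift = register(reference, match, scales, shifts)
```

Once the parameters are known, resampling Reference at scale * x + shift and subtracting it from Match leaves large residuals only where the object actually changed.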
I've created an iPhone app that can scan an image of a page of graph paper and can then tell me which squares have been blacked out and which squares are blank.
I do this by scanning from left to right and use the graph paper's lines as guides. When I encounter a graph paper line, I start to look for black, until I hit the graph paper line again. Then, instead of continuing along the scan line, I go ahead and completely scan the square for black. Then I continue on to the next box. At the end of the line, I skip down so many pixels before starting the scan on a new line (since I have already figured out how tall each box is).
This sort of works, but there are problems. Sometimes I mistake the graph lines as "black". Sometimes, if the image is skewed, or I don't have uniform lighting across the page, then I don't get good results.
What I'd like to do is to specify a few "alignment" boxes that I then resize and rotate (and skew) the picture to align with those. Then, I was thinking that once I have the image aligned, I would then know where all the boxes are and won't have to scan for the boxes, just scan inside the location of the boxes to see if they are black. This should be faster and more reliable. And if I were to operate on images coming from the camera, I'd have more flexibility in asking the user to align the picture to match the alignment marks, rather than having to align the image myself.
Given that this is my first Image Processing project, I feel like I am reinventing the wheel. I'd like suggestions on how to do this, and whether to utilize libraries like OpenCV.
I am enclosing an image similar to what I would like processed. I am looking for a list of all squares that have a significant amount of black marking, i.e. A8, C4, E7, G4, H1, J9.
Issues to be aware of:
Light coverage of the image may not be ideal, but should be relatively consistent across the image (i.e. no shadows)
All squares may be empty or all dark, and the algorithm needs to be able to determine that
the image may be skewed or rotated about any of the axes. Rotation about the z axis may be easy to fix. There may be rotation around the x or y axis, making one side of the image wider than the other. However, if I scan the image in real time as it comes from the camera, I can ask the user to align the alignment marks with marks on the screen. How can I best check that alignment so that I can give the user appropriate feedback? Just checking that the 4 corners are dark could result in a false positive when the camera is pointing at a black surface.
not every square will be equally or consistently blacked, but I think there will be enough black to make it unquestionable to a human eye.
the blue grid may be useful, but there are cases where the black markings may overlap the blue grid. I think a virtual grid is probably better than relying on the printed grid. I would think that using the alignment markers to align the image, would then allow for a precise virtual grid to be laid out. And then the contents of each grid box could be sampled, to see if it was predominantly black, vs scanning from left-to-right, no? Here is another image with more markings on the grid. In this image, in addition to the previous marking in A8, C4, E7, G4, H1, J9, I have marked E2, G8 and G9, and I4 and J4 and you can see how the blue grid is obscured.
This is my first phase of this project. Eventually I'd like to scale this algorithm to be able to process at least a few hundred slots and possibly different colors.
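A minimal sketch of the virtual-grid idea in plain Python, assuming the image has already been aligned so that the grid fills it exactly. The labelling convention (letters for columns, numbers for rows), the thresholds and the margin are all assumptions, not something taken from the sample images.

```python
def filled_cells(image, rows, cols, dark_threshold=0.5, fill_fraction=0.4, margin=0.2):
    """Return labels such as 'A8' for grid cells that are mostly dark.

    image: aligned grayscale image as a list of rows, values in [0, 1].
    Columns are lettered A, B, ... and rows are numbered from 1 (assumed convention).
    margin: fraction of each cell skipped at its border to avoid the printed grid lines.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height / rows, width / cols
    marked = []
    for c in range(cols):
        for r in range(rows):
            y0, y1 = int((r + margin) * cell_h), int((r + 1 - margin) * cell_h)
            x0, x1 = int((c + margin) * cell_w), int((c + 1 - margin) * cell_w)
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            dark = sum(1 for p in pixels if p < dark_threshold)
            if pixels and dark / len(pixels) >= fill_fraction:
                marked.append(f"{chr(ord('A') + c)}{r + 1}")
    return marked

# Synthetic 100x100 page with a 10x10 grid; blacken cells A8 and C4.
size, n = 100, 10
page = [[1.0] * size for _ in range(size)]
for col, row in [(0, 7), (2, 3)]:
    for y in range(row * 10 + 1, row * 10 + 9):
        for x in range(col * 10 + 1, col * 10 + 9):
            page[y][x] = 0.1

result = filled_cells(page, n, n)
```

Sampling only the inner part of each cell is one way to keep the blue grid lines out of the decision, and the fill fraction gives some tolerance for squares that are not blacked out evenly.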
To start with, this problem reminded me a bit of these demos, which might be useful to learn from:
The DNA microarray image processing
The Matlab Sudoku solver
The iPhone Sudoku solver blog post, explaining the image processing
Personally, I think the most simple approach would be to detect the squares in your image.
1) Remove the background and small cruft
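% Threshold each block against a fraction of its own median intensity
% (assumes im is scaled to [0, 1], since im2bw expects a level in that range)
f_makebw = @(I) im2bw(I.data, double(median(I.data(:)))/1.3);
bw = ~blockproc(im, [128 128], f_makebw);
% Drop connected components smaller than 30 pixels
bw = bwareaopen(bw, 30);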
2) Remove everything but the squares and circles.
se = strel('disk', 5);
bw = imerode(bw, se);
% Detect the squares and circles via morphology
[B, L] = bwboundaries(bw, 'noholes');
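3) Detect the squares using 'Extent' from regionprops. The 'Extent' metric measures what proportion of the bounding box is filled, which makes it a nice measure for distinguishing between circles and squares: an axis-aligned filled square covers nearly all of its bounding box, while a disc covers only about pi/4 (roughly 0.79) of it, hence the 0.8 threshold.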
stats = regionprops(L, 'Extent');
extent = [stats.Extent];
idx1 = find(extent > 0.8);
bw = ismember(L, idx1);
4) This leaves you with your features, to synchronize or rectify the image with. An easy, and robust way, to do this, is via the Autocorrelation Function.
This gives nice peaks, which are easily detected. These peaks can be matched against the ACF peaks from a template image via the Hungarian algorithm. Once matched, you can correct rotation and scaling as you now have a linear system which you can solve:
x = Ax'
Translation can then be corrected using run-of-the-mill cross correlation against the same pre defined template.
If all goes well, you now have an aligned or synchronized image, which should help considerably in determining the position of the dots.
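For the linear system x = Ax', a least-squares fit over the matched peak pairs could look like the following plain-Python sketch. It assumes the peaks are expressed relative to the centre of the autocorrelation (so there is no translation term) and that the matching has already been done. The peak coordinates and function names are made up for illustration.

```python
import math

def fit_linear_map(template_pts, image_pts):
    """Least-squares 2x2 matrix A with template_pt ~= A * image_pt for matched pairs."""
    # Accumulate S = sum(x x'^T) and C = sum(x' x'^T); then A = S C^-1.
    s = [[0.0, 0.0], [0.0, 0.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for x, p in zip(template_pts, image_pts):
        for i in range(2):
            for j in range(2):
                s[i][j] += x[i] * p[j]
                c[i][j] += p[i] * p[j]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    c_inv = [[c[1][1] / det, -c[0][1] / det],
             [-c[1][0] / det, c[0][0] / det]]
    return [[sum(s[i][k] * c_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical ACF peak positions of the template, relative to the ACF centre.
template_peaks = [(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0), (7.0, 7.0)]

# Simulated photo: the same peaks rotated by 30 degrees and scaled by 1.5.
theta, zoom = math.radians(30), 1.5
image_peaks = [(zoom * (x * math.cos(theta) - y * math.sin(theta)),
                zoom * (x * math.sin(theta) + y * math.cos(theta)))
               for x, y in template_peaks]

A = fit_linear_map(template_peaks, image_peaks)
scale = math.hypot(A[0][0], A[1][0])                    # scaling that undoes the zoom
angle = math.degrees(math.atan2(A[1][0], A[0][0]))      # rotation that undoes the tilt
```

With more than two matched peaks the system is overdetermined, so the least-squares fit also averages out some of the peak localisation noise.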
I've been starting to do something similar using my GPUImage iOS framework, so that might be an alternative to doing all of this in OpenCV or something else. As its name indicates, GPUImage is entirely GPU-based, so it can have some tremendous performance benefits over CPU-bound processing (up to 180X faster for doing things like processing live video).
As a first stage, I took your images and ran them through a simple luminance thresholding filter with a threshold of 0.5 and arrived at the following for your two images:
I just added an adaptive thresholding filter, which attempts to correct for local illumination variances, and works really well for picking out text. However, in your images it uses too small of an averaging radius to handle your blobs well:
and seems to bring out your grid lines, which it sounds like you wish to ignore.
Maurits provides a more comprehensive description of what you could do, but there might be a way to implement these processing operations as high-performance GPU-based filters instead of relying on slower OpenCV versions of the same calculations. If you could grab rotation and scaling information from this thresholded image, you could construct a transform that could also be applied as a filter to your thresholded image to produce your final aligned image, which could then be downsampled and read out by your application to determine which grid locations were filled in.
These GPU-based thresholding operations run in less than 2 ms for 640x480 frames on an iPhone 4, so it might be possible to chain filters together to analyze incoming video frames as fast as the device's video camera can provide them.
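To illustrate the difference between the two thresholding styles, here is a CPU-only sketch in plain Python. It does not reproduce GPUImage's filters. It just compares a fixed luminance threshold with a local-mean threshold at two averaging radii. The synthetic page, the offset and the function names are assumptions.

```python
def global_threshold(image, threshold=0.5):
    """Mark a pixel as ink (True) when its luminance is below a fixed threshold."""
    return [[p < threshold for p in row] for row in image]

def adaptive_threshold(image, radius, offset=0.1):
    """Mark a pixel as ink when it is darker than its local mean by more than offset."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(image[y][x] < sum(window) / len(window) - offset)
        out.append(row)
    return out

def count(mask):
    """Number of pixels marked as ink."""
    return sum(v for row in mask for v in row)

# Synthetic page: lighting falls off from left to right, plus one 6x6 dark blob.
page = [[0.9 - 0.025 * x for x in range(24)] for _ in range(24)]
for y in range(9, 15):
    for x in range(4, 10):
        page[y][x] = 0.1

fixed = global_threshold(page)               # also flags the dim right-hand side
small = adaptive_threshold(page, radius=1)   # window smaller than the blob
wide = adaptive_threshold(page, radius=6)    # window larger than the blob
```

In this toy case the fixed threshold mistakes the dim side of the page for ink, the small radius keeps only the outline of the blob because its interior matches its own local mean, and the larger radius recovers the whole blob.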
I am currently facing what is, in my opinion, a rather common problem which should be quite easy to solve, but so far all my approaches have failed, so I am turning to you for help.
I think the problem is explained best with some illustrations. I have some Patterns like these two:
I also have an Image like (probably better, because the photo this one originated from was quite poorly lit) this:
(Note how the Template was scaled to kinda fit the size of the image)
The ultimate goal is a tool which determines whether the user shows a thumbs up/thumbs down gesture, and also some angles in between. So I want to match the patterns against the image and see which one resembles the picture the most (or, to be more precise, the angle the hand is showing). I know the direction in which the thumb is pointing in each pattern, so if I find the pattern which looks identical, I also have the angle.
I am working with OpenCV (with Python bindings) and have already tried cvMatchTemplate and MatchShapes, but so far it's not really working reliably.
I can only guess why MatchTemplate failed, but I think that a smaller pattern with a smaller white area fits fully into the white area of a picture, thus creating the best matching score although it's obvious that they don't really look the same.
Are there some methods hidden in OpenCV that I haven't found yet, or is there a known algorithm for this kind of problem that I should reimplement?
Happy New Year.
A few simple techniques could work:
After binarization and segmentation, find Feret's diameter of the blob (a.k.a. the farthest distance between points, or the major axis).
Find the convex hull of the point set, flood fill it, and treat it as a connected region. Subtract the original image with the thumb. The difference will be the area between the thumb and fist, and the position of that area relative to the center of mass should give you an indication of rotation.
Use a watershed algorithm on the distances of each point to the blob edge. This can help identify the connected thin region (the thumb).
Fit the largest circle (or largest inscribed polygon) within the blob. Dilate this circle or polygon until some fraction of its edge overlaps the background. Subtract this dilated figure from the original image; only the thumb will remain.
If the size of the hand is consistent (or relatively consistent), then you could also perform N morphological erode operations until the thumb disappears, then N dilate operations to grow the fist back to its original approximate size. Subtract this fist-only blob from the original blob to get the thumb blob. Then use the thumb blob direction (Feret's diameter) and/or center of mass relative to the fist blob center of mass to determine direction.
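The erode/dilate variant can be sketched in plain Python as follows. It uses a 3x3 square structuring element and the centre-of-mass comparison only, not Feret's diameter. The synthetic hand mask, the choice of N and the function names are assumptions.

```python
import math

def _morph(mask, combine):
    """Apply a 3x3 square structuring element; outside the image counts as background."""
    h, w = len(mask), len(mask[0])
    return [[combine(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def erode(mask):
    return _morph(mask, all)

def dilate(mask):
    return _morph(mask, any)

def centroid(mask):
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def thumb_angle(mask, n):
    """Angle in degrees (0 = right, 90 = up) from the fist centroid to the thumb centroid."""
    fist = mask
    for _ in range(n):          # N erosions remove the thin thumb
        fist = erode(fist)
    for _ in range(n):          # N dilations grow the fist back
        fist = dilate(fist)
    thumb = [[m and not f for m, f in zip(mrow, frow)]
             for mrow, frow in zip(mask, fist)]
    (fx, fy), (tx, ty) = centroid(fist), centroid(thumb)
    return math.degrees(math.atan2(fy - ty, tx - fx))   # image y grows downward

# Synthetic binary hand: a 13x13 fist with a 3-pixel-wide thumb pointing up.
hand = [[False] * 24 for _ in range(32)]
for y in range(14, 27):
    for x in range(4, 17):
        hand[y][x] = True
for y in range(6, 14):
    for x in range(9, 12):
        hand[y][x] = True

angle = thumb_angle(hand, n=2)
```

N has to be large enough to erase the thumb but small enough to leave a fist core behind, which is why this only works when the hand size is reasonably consistent.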
Techniques to find critical points (regions of strong direction change) are trickier. At the simplest, you might also use corner detectors and then check the distance from one corner to another to identify the place when the inner edge of the thumb meets the fist.
For more complex methods, look into papers about shape decomposition by authors such as Kimia, Siddiqi, and Xiaofing Mi.
MatchTemplate seems like a good fit for the problem you describe. In what way is it failing for you? If you are actually masking the thumbs-up/thumbs-down/thumbs-in-between signs as nicely as you show in your sample image then you have already done the most difficult part.
MatchTemplate does not include rotation and scaling in the search space, so you should generate more templates from your reference image at all rotations you'd like to detect, and you should scale your templates to match the general size of the found thumbs up/thumbs down signs.
[edit]
The result array for MatchTemplate contains a floating-point value for each location that specifies how well the template fits the image at that location. If you use CV_TM_SQDIFF then the lowest value in the result array is the location of best fit; if you use CV_TM_CCORR or CV_TM_CCOEFF then it is the highest value. If your scaled and rotated template images all have the same number of white pixels, then you can compare the best-fit values you find for all the different template images, and the template image that has the best fit overall is the one you want to select.
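Here is a tiny plain-Python imitation of that procedure, so the logic is visible without OpenCV. It computes a squared-difference score (the same idea as CV_TM_SQDIFF) for a set of rotated templates and keeps the overall best. The 4x3 thumb pattern, the labels and the function names are invented for illustration.

```python
def rotate_cw(t):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*t[::-1])]

def sqdiff_best(image, template):
    """Lowest sum of squared differences over all placements, with its (x, y)."""
    ih, iw, th, tw = len(image), len(image[0]), len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum((image[y + j][x + i] - template[j][i]) ** 2
                        for j in range(th) for i in range(tw))
            if best is None or score < best[0]:
                best = (score, (x, y))
    return best

def classify(image, templates):
    """Return (label, score, position) of the template that fits best overall."""
    results = {label: sqdiff_best(image, t) for label, t in templates.items()}
    label = min(results, key=lambda k: results[k][0])
    return (label, *results[label])

# Reference pattern: thumb on top of a fist. All rotations have equal white area.
up = [[0, 1, 0],
      [0, 1, 0],
      [1, 1, 1],
      [1, 1, 1]]
templates = {}
t = up
for name in ("up", "right", "down", "left"):
    templates[name] = t
    t = rotate_cw(t)

# Synthetic binary image containing a thumbs-down sign at x=3, y=2.
image = [[0] * 8 for _ in range(8)]
for j, row in enumerate(templates["down"]):
    for i, v in enumerate(row):
        image[2 + j][3 + i] = v

label, score, position = classify(image, templates)
```

For in-between angles you would generate templates at finer rotation steps than 90 degrees, which needs an interpolating rotation rather than the exact quarter turns used here.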
There are tons of rotation/scaling independent detection functions that could conceivably help you, but normalizing your problem to work with MatchTemplate is by far the easiest.
For the more advanced stuff, check out SIFT, Haar feature based classifiers, or one of the others available in OpenCV
I think you can get excellent results if you just compute the two points that have the furthest shortest path going through white. The direction in which the thumb is pointing is just the direction of the line that joins the two points.
You can do this easily by sampling points on the white area and using Floyd-Warshall.
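A small plain-Python sketch of this idea, assuming a binary mask and 8-connected steps between sampled white pixels. The step parameter, the toy mask and the function name are assumptions. Since Floyd-Warshall is cubic in the number of sampled points, the sampling has to stay coarse on real images.

```python
import math

def farthest_pair(mask, step=1):
    """Endpoints of the longest shortest path through white pixels sampled every `step`."""
    pts = [(x, y) for y in range(0, len(mask), step)
           for x in range(0, len(mask[0]), step) if mask[y][x]]
    index = {p: i for i, p in enumerate(pts)}
    n = len(pts)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i, (x, y) in enumerate(pts):
        dist[i][i] = 0.0
        for dx in (-step, 0, step):          # connect 8-neighbouring samples
            for dy in (-step, 0, step):
                j = index.get((x + dx, y + dy))
                if j is not None and j != i:
                    dist[i][j] = math.hypot(dx, dy)
    for k in range(n):                       # Floyd-Warshall relaxation
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == inf:
                continue
            di = dist[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    _, i, j = max((dist[i][j], i, j) for i in range(n)
                  for j in range(i + 1, n) if dist[i][j] < inf)
    return pts[i], pts[j]

# Toy mask: a 5x5 fist (rows 6..10) with a 1-pixel thumb along its right edge.
mask = [[False] * 5 for _ in range(11)]
for y in range(6, 11):
    for x in range(5):
        mask[y][x] = True
for y in range(0, 6):
    mask[y][4] = True

tip, base = farthest_pair(mask)
direction = math.degrees(math.atan2(base[1] - tip[1], tip[0] - base[0]))
```

One endpoint lands on the thumb tip and the other on the far side of the fist. Which of the two is the tip still has to be decided separately, for example by checking which end lies in the thinner part of the blob.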