Calculating gradient orientation (HOG-like): poor accuracy - OpenCV

I am trying to implement an image matching algorithm based on gradient orientation matching. The main algorithm consists of the following steps:
1) convert the image to polar coordinates
2) calculate gradients using the Sobel operator:
Xgrad = cv2.Sobel(gr,cv2.CV_64F,1,0,ksize=5)
Ygrad = cv2.Sobel(gr,cv2.CV_64F,0,1,ksize=3)
3) calculate the orientation of the gradient and binarize it.
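For reference, the polar conversion in step 1 might look something like this. This is only a sketch, assuming cv2.warpPolar (available in OpenCV 3.4+) with the pole at the image centre; the file name is hypothetical and the variable name gr matches the snippets below:
import cv2
import numpy as np

gr = cv2.imread('test_circles.png', cv2.IMREAD_GRAYSCALE)  # hypothetical test image
h, w = gr.shape
center = (w / 2.0, h / 2.0)
max_radius = np.hypot(w, h) / 2.0
# linear polar remap: a rotation about the centre becomes a circular shift along the rows
polar = cv2.warpPolar(gr, (w, h), center, max_radius, cv2.WARP_POLAR_LINEAR)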
Now I can compare images using the final feature map, with invariance to rotation and small changes.
But what I have found is that this algorithm matches rotated versions of the same image rather poorly. I built a test image with circles to test it:
and rotated it by 10 degrees.
Here are the polar conversions:
and the gradient orientation masks: as you can see, there is already a lot of noise in the gradient matrix, and it breaks the matching algorithm.
And here is its best-matching difference mask: whole line areas are marked as not matched. Small Gaussian blurring at different steps does not help at all, and I don't know why.
Update:
Gradient calculation:
gx = cv2.Sobel(gr,cv2.CV_64F,1,0,ksize=1)
gy = cv2.Sobel(gr,cv2.CV_64F,0,1,ksize=1)
blurredgx = cv2.GaussianBlur(gx,(11,3),1)
blurredgy = cv2.GaussianBlur(gy,(11,3),1)
magnitude, angle = cv2.cartToPolar(blurredgx, blurredgy)
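The "binarize" step is not shown above, so here is a sketch of how the orientation could be quantized into a discrete map while masking out weak gradients. This is not the OP's exact code; the bin count and the relative magnitude threshold are assumptions:
import numpy as np

n_bins = 16
mag_thresh = 0.1 * magnitude.max()           # assumed relative threshold
# cartToPolar returns angles in radians in [0, 2*pi)
bins = np.int32(angle / (2 * np.pi) * n_bins) % n_bins
valid = magnitude > mag_thresh               # suppress low-magnitude (noisy) pixels
orientation_map = np.where(valid, bins, -1)  # -1 marks "no reliable gradient"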

Could you please explain how you computed the orientation of the gradients? I believe you grouped the pixels into 4x4 windows and computed the orientation of the gradients inside each such window, but the Sobel operator used has size 5x5. This obviously results in some overlap. Could you elaborate on this?

Related

The length of the gradient vector

It's just a simple thing that I need to clarify.
I need a little refresher in mathematics:
In a circle, should the length of the gradient be the radius?
Or do we use the gradient only to get the orientation?
I got to this question after reading about gradients in image processing:
I've read this answer and this one about how to get the image gradient, and of course here.
I don't understand whether the magnitude is supposed to stand for a number of pixels, or whether it just stands for the strength of the intensity change at a specific point.
The following image is the magnitude of the gradient:
I ran the code and looked at the magnitude values, and the numbers are clearly not in the range of the image width/height.
I'm just looking for a simple clarification.
Thanks!
Mathematically speaking, the gradient magnitude, or in other words the norm of the gradient vector, represents the derivative (i.e. the slope) of a 2D signal. This is quite clear in the definition given by Wikipedia:
∇f = (∂f/∂x) · ux + (∂f/∂y) · uy
Here, f is the 2D signal and ux, uy are respectively the unit vectors in the horizontal and vertical directions.
In the context of images, the 2D signal (i.e. the image) is discrete rather than continuous, so the derivative is approximated by the difference between the intensity of the current pixel and that of the previous pixel in the considered direction (there are actually several ways to approximate the derivative, but let's keep it simple). We can therefore approximate the gradient by the following quantity:
∇f(u,v) = [ f(u,v) − f(u−1,v) ] · ux + [ f(u,v) − f(u,v−1) ] · uy
In this case, the gradient magnitude is the following:
|| ∇f(u,v) || = sqrt( [ f(u,v) − f(u−1,v) ]² + [ f(u,v) − f(u,v−1) ]² )
To summarize, the gradient magnitude is a measure of the local intensity change at a given point; it does not have much to do with a radius, nor with the width/height of the image.
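For example, a tiny NumPy sketch of the approximation above on a toy image shows that the magnitude tracks local intensity change, not image size (border handling is ignored here):
import numpy as np

f = np.array([[10, 10, 10, 10],
              [10, 10, 50, 50],
              [10, 10, 50, 50]], dtype=float)

gx = np.diff(f, axis=1, prepend=f[:, :1])   # f(u,v) - f(u-1,v), 0 at the left border
gy = np.diff(f, axis=0, prepend=f[:1, :])   # f(u,v) - f(u,v-1), 0 at the top border
magnitude = np.sqrt(gx**2 + gy**2)
# magnitude is roughly 40-57 exactly where the intensity jumps from 10 to 50,
# and 0 in the flat regions -- unrelated to the image width or height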

How can I select the best set of parameters in the Canny edge detection algorithm implemented in OpenCV?

I am working with OpenCV on the Android platform. With the tremendous help from this community and techies, I am able to successfully detect a sheet out of the image.
These are the steps I used:
Imgproc.cvtColor()
Imgproc.Canny()
Imgproc.GaussianBlur()
Imgproc.findContours()
Imgproc.approxPolyDP()
findLargestRectangle()
Find the vertices of the rectangle
Find the vertices of the rectangle top-left anticlockwise order using center of mass approach
Find the height and width of the rectangle just to maintain the aspect ratio and do warpPerspective transformation.
After applying all these steps I can easily get the document, or the largest rectangle, from an image. But this highly depends on the difference in intensity between the background and the document sheet. Since the Canny edge detector works on the principle of intensity gradients, a difference in intensity is always assumed by the implementation. That is why Canny takes two threshold parameters into account:
Lower threshold
Higher threshold
So if the intensity gradient of a pixel is greater than the higher threshold, the pixel is added as an edge pixel in the output image. A pixel is rejected completely if its gradient value is lower than the lower threshold. A pixel whose gradient falls between the two thresholds is only added as an edge pixel if it is connected to another pixel with a value larger than the higher threshold.
My main purpose is to use Canny edge detection for document scanning. So how can I compute these thresholds dynamically so that it works with both dark and light backgrounds?
I tried manually adjusting the parameters a lot, but I couldn't find a relationship that holds across scenarios.
You could calculate your thresholds using Otsu’s method.
The (Python) code would look like this:
high_thresh, thresh_im = cv2.threshold(im, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
lowThresh = 0.5*high_thresh
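Those two values can then be fed straight into Canny (a sketch; im is assumed to be the same grayscale image used for the Otsu threshold above):
edges = cv2.Canny(im, lowThresh, high_thresh)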
Use the following snippet, which I obtained from this blog:
import cv2
import numpy as np

sigma = 0.33
v = np.median(gray_image)
#---- apply automatic Canny edge detection using the computed median ----
lower = int(max(0, (1.0 - sigma) * v))
upper = int(min(255, (1.0 + sigma) * v))
edged = cv2.Canny(gray_image, lower, upper)
cv2.imshow('Edges', edged)
cv2.waitKey(0)
So what am I doing here?
I am taking the median value of the grayscale image. A sigma value of 0.33 is used to set the lower and upper thresholds; 0.33 is a rule-of-thumb value commonly used in statistics, so it is used here as well.

Trilinear interpolation in HOG

I am currently implementing HOG in Matlab, but I don't understand the binning, especially the trilinear interpolation part.
What I understood is that each pixel in a cell is dropped into a bin to form the histogram for that cell, but that's all I understand at the moment.
How is the magnitude computed?
What are the edges of the cube, and what are the 3D coordinates for one pixel?
Wikipedia describes the gradient (in the context of images) and shows how to obtain its x and y coordinates.
How is the magnitude computed?
r = sqrt(x*x+y*y)
what are the 3D coordinates for one pixel?
When computing the gradient, the image is considered as a height map. For a pixel at a position (x,y) with a gray scale value z it represents the height map 3D position (x,y,z).
A gradient at (x,y,z) has an orientation and a magnitude. The histogram is a discretization of all possible orientations into bins. For example, with 8 bins, all orientations from 0 to 45 degrees are associated with the same bin.
The bin is selected based on the gradient orientation, and a weight is added to the bin based on the magnitude.
Wikipedia describes the steps of HOG and gives detailed pointers to the original paper.
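To make the interpolation part concrete, here is a small Python/NumPy sketch of the orientation component of the (tri)linear interpolation: each pixel's magnitude is split between the two nearest orientation bins. The spatial part of HOG's trilinear interpolation additionally splits this vote between the four neighbouring cells. The question is about MATLAB, but the logic translates directly; bin count and angle range are the usual HOG defaults, and this is an illustration rather than a reference implementation:
import numpy as np

def orientation_histogram(mag, ang, n_bins=9, max_angle=180.0):
    # mag, ang: per-pixel gradient magnitude and orientation (degrees) of one cell
    bin_width = max_angle / n_bins
    hist = np.zeros(n_bins)
    # continuous bin position; bin centres sit at (i + 0.5) * bin_width
    pos = (ang % max_angle) / bin_width - 0.5
    lo = np.floor(pos).astype(int)
    frac = pos - lo                               # distance past the lower bin centre
    np.add.at(hist, lo % n_bins, mag * (1 - frac))
    np.add.at(hist, (lo + 1) % n_bins, mag * frac)
    return hist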

Warping Perspective using arbitary rotation angle

I have an image of a chessboard taken at an angle. Now I want to warp the perspective so the chessboard image looks again as if it was taken directly from above.
I know that I could use findHomography between matched points, but I wanted to avoid that and instead use e.g. rotation data from the mobile sensors to build the homography matrix on my own. I calibrated my camera to get the intrinsic parameters. Now let's say the following image has been taken at a ~60 degree angle around the x-axis. I thought that all I had to do was multiply the camera matrix by the rotation matrix to obtain the homography matrix. I tried the following code, but it looks like I'm not understanding something correctly, because it doesn't work as expected (the result image is completely black or white).
import cv2
import numpy as np
import math
camera_matrix = np.array([[5.7415988502105745e+02, 0., 2.3986181527877352e+02],
                          [0., 5.7473682183375217e+02, 3.1723734404756237e+02],
                          [0., 0., 1.]])
distortion_coefficients = np.array([1.8662919398453856e-01, -7.9649812697463640e-01,
                                    1.8178068172317731e-03, -2.4296638847737923e-03,
                                    7.0519002388825025e-01])
theta = math.radians(60)
rotx = np.array([[1, 0, 0],
                 [0, math.cos(theta), -math.sin(theta)],
                 [0, math.sin(theta), math.cos(theta)]])
homography = np.dot(camera_matrix, rotx)
im = cv2.imread('data/chess1.jpg')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
im_warped = cv2.warpPerspective(gray, homography, (480, 640), flags=cv2.WARP_INVERSE_MAP)
cv2.imshow('image', im_warped)
cv2.waitKey()
pass
I also have distortion_coefficients after calibration. How can those be incorporated into the code to improve results?
This answer is awfully late by several years, but here it is ...
(Disclaimer: my use of terminology in this answer may be imprecise or incorrect. Please do look up on this topic from other more credible sources.)
Remember:
Because you only have one image (view), you can only compute 2D homography (perspective correspondence between one 2D view and another 2D view), not the full 3D homography.
Because of that, the nice intuitive interpretation of a 3D homography (rotation matrix, translation matrix, focal distance, etc.) is not available to you.
What we say is that with 2D homography you cannot factorize the 3x3 matrix into those nice intuitive components like 3D homography does.
You have one matrix - (which is the product of several matrices unknown to you) - and that is it.
However,
OpenCV provides a getPerspectiveTransform function which solves for the 3x3 perspective matrix (in homogeneous coordinates) of a 2D homography between two planar quadrilaterals.
Link to documentation
To use this function,
Find the four corners of the chessboard on the image. These will be your source coordinates.
Supply four rectangle corners of your choice. These will be your destination coordinates.
Pass the source and destination coordinates into getPerspectiveTransform to generate a 3x3 matrix that can dewarp your chessboard into an upright rectangle (see the sketch after the notes below).
Notes to remember:
Mind the ordering of the four corners.
If the source coordinates are picked in clockwise order, the destination also needs to be picked in clockwise order.
Likewise, if counter-clockwise order is used, do it consistently.
Likewise, if z-order (top left, top right, bottom left, bottom right) is used, do it consistently.
Failure to order the corners consistently will generate a matrix that executes the point-to-point correspondence exactly (mathematically speaking), but will not generate a usable output image.
The aspect ratio of the destination rectangle can be chosen arbitrarily. In fact, it is not possible to deduce the "original aspect ratio" of the object in world coordinates, because "this is 2D homography, not 3D".
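Putting these steps together, a minimal Python sketch might look like this. The source corner coordinates are hypothetical; in practice you would click them or detect them, keeping the same ordering as the destination corners:
import cv2
import numpy as np

im = cv2.imread('data/chess1.jpg')

# four chessboard corners in the input image (hypothetical values) and the
# corresponding corners of the upright output rectangle, both in clockwise order
src = np.float32([[143, 82], [489, 99], [520, 430], [110, 412]])
dst = np.float32([[0, 0], [400, 0], [400, 400], [0, 400]])

H = cv2.getPerspectiveTransform(src, dst)
top_down = cv2.warpPerspective(im, H, (400, 400))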
One problem is that to multiply by a camera matrix you need some concept of a z coordinate. You should start by getting basic image warping given Euler angles to work before you think about distortion coefficients. Have a look at this answer for a slightly more detailed explanation and try to duplicate my result. The idea of moving your image down the z axis and then projecting it with your camera matrix can be confusing, let me know if any part of it does not make sense.
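For completeness, here is a rough sketch of that "move down the z axis, then project" construction, under the usual simplifying assumptions (square pixels, principal point at the image centre, focal length f taken from the calibration). It illustrates the idea rather than reproducing the linked answer exactly:
import cv2
import numpy as np
import math

def warp_about_x(img, angle_deg, f):
    h, w = img.shape[:2]
    theta = math.radians(angle_deg)
    # 2D -> 3D: centre the image plane at the origin
    A1 = np.array([[1, 0, -w / 2],
                   [0, 1, -h / 2],
                   [0, 0, 0],
                   [0, 0, 1]], dtype=float)
    # rotation around the x-axis
    R = np.array([[1, 0, 0, 0],
                  [0, math.cos(theta), -math.sin(theta), 0],
                  [0, math.sin(theta),  math.cos(theta), 0],
                  [0, 0, 0, 1]], dtype=float)
    # push the plane down the z axis so it sits in front of the camera
    T = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, f],
                  [0, 0, 0, 1]], dtype=float)
    # 3D -> 2D: simple pinhole projection back onto the image
    A2 = np.array([[f, 0, w / 2, 0],
                   [0, f, h / 2, 0],
                   [0, 0, 1,     0]], dtype=float)
    H = A2 @ T @ R @ A1
    return cv2.warpPerspective(img, H, (w, h))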
You do not need to calibrate the camera nor estimate the camera orientation (the latter, however, in this case would be very easy: just find the vanishing points of those orthogonal bundles of lines, and take their cross product to find the normal to the plane, see Hartley & Zisserman's bible for details).
The only thing you need to do is estimate the homography that maps the checkers to squares, then apply it to the image.
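A possible sketch of that approach, assuming the board's inner-corner grid size is known (7x7 here) and that cv2.findChessboardCorners returns the corners row by row, left to right:
import cv2
import numpy as np

pattern = (7, 7)        # assumed number of inner corners per row/column
square = 50             # side length of one square in the output, in pixels

img = cv2.imread('data/chess1.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, pattern)

if found:
    # ideal fronto-parallel positions of the same inner corners, same ordering
    objp = np.array([[(j + 1) * square, (i + 1) * square]
                     for i in range(pattern[1]) for j in range(pattern[0])],
                    dtype=np.float32)
    H, _ = cv2.findHomography(corners.reshape(-1, 2), objp, cv2.RANSAC)
    out_size = ((pattern[0] + 1) * square, (pattern[1] + 1) * square)
    top_down = cv2.warpPerspective(img, H, out_size)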

Pose correction for face recognition

I have a dataset of images with faces. For each face in the dataset I also have a set of 66 2D points that correspond to the face landmarks (nose, eyes, face contour, mouth).
So basically I have the shape of my face in terms of 2D points from my image.
Do you know any algorithm that I can use to rotate my shape so that the face shape is straight? Let's say that the pan angle is 30 degrees; I want the shape rotated by 30 degrees so that it ends up at 0 degrees of pan. I have illustrated below what I mean.
Basically, consider the shapes illustrated above as the outlines for my images, represented in 2D. I want to rotate the points of the first shape so that they look like the second shape. A shape is made up of 66 2D points, which are basically pixel coordinates. All I want to do is find the new position of each of those 66 points so that the shape is rotated by a certain angle of pan.
From your question, I can assume you either have the rotation parameters (e.g. degrees in x,y) or the point correspondences (since you have a database of matched points). Thus you either need to apply or estimate (and apply) a 2D similarity transformation for image alignment/registration. See also the response on this question: face alignment algorithm on images
From rotation angle and to new point locations: You can define a 2D rotation matrix R and transform your point coordinates with it.
From point correspondences between shape A and Shape B to rotation: Estimate a 2D similarity transform (image alignment) using 3 or more matching points.
From either rotation or point correspondences to warped image: From the similarity transform, map image values (accounting for interpolation or non-values) using the underlying coordinate transformation for the entire image grid.
(image courtesy of Denis Simakov, AAM Slides)
Most of these are already implemented in OpenCV and MATLAB. See also the background and relevant methods around Active Shape and Active Appearance Models (Tim Cootes page includes binaries and background material).
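For instance, a minimal Python/OpenCV sketch of the "point correspondences, then similarity transform, then warped image" path described above could look like this (landmarks, reference_shape and face_img are placeholders for your own data):
import cv2
import numpy as np

# 66 detected landmarks of the input face and 66 landmarks of a frontal
# reference shape, both as (N, 2) float arrays (placeholders)
src_pts = landmarks.astype(np.float32).reshape(-1, 1, 2)
dst_pts = reference_shape.astype(np.float32).reshape(-1, 1, 2)

# estimate a 2D similarity transform (rotation + uniform scale + translation)
M, inliers = cv2.estimateAffinePartial2D(src_pts, dst_pts)

# apply it to the landmark coordinates ...
aligned_pts = cv2.transform(src_pts, M).reshape(-1, 2)
# ... and to the whole image grid (interpolation handled by warpAffine)
h, w = face_img.shape[:2]
aligned_img = cv2.warpAffine(face_img, M, (w, h))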
