How to average two masks? - opencv

I have two boolean masks that I got from the object detection for two video frames i and i+1. Now I want to "avarage" them to remove noise. Masks are closed convex curves. So basically I want to find the middle line between them. How can I do this?
Here is an example:
Let's say that we have two maks red and blue for two successive frames, after filtering we need to get something like the green line that is between two contours.

You can accomplish this using the distance transform.
The core idea is to compute the signed distance to the edge of each mask, and find the zero level set for the average. There is no need to require convex masks for this algorithm. I do assume that the inputs are solid masks (i.e. a filled contour).
The distance transform computes the (Euclidean) distance of each object pixel to the nearest background pixel. The signed distance to the edge is formed by the combination of two distance transforms: the distance transform of the object and the distance transform of the background (i.e. of the inverted mask). The latter, subtracted from the former, gives an image where pixels outside the mask have negative distances to the edge of the mask, whereas pixels inside have positive distances. The edge of the mask is given by the zero crossings.
If you compute the signed distance to the edges of the two mask images, and average them together, you will obtain zero crossings at a location exactly half-way the edges of the two masks. Simply thresholding this result gives you the averaged mask.
Note that, since we're thresholding at 0, there is no difference between the sum of the two signed distances, or their average. The sum is cheaper to compute.
Here is an example, using your color coding (red and blue are the edges of the two inputs, green is the edge of the output):
The code below is MATLAB with DIPimage, which I wrote just to show the result. Just consider it pseudo-code for you to implement with OpenCV. :)
% inputs: mask1, mask2: binary images
d1 = dt(mask1) - dt(~mask1); % dt is the distance transform
d2 = dt(mask2) - dt(~mask2); % ~ is the logical negation
mask = (d1+d2) > 0; % output

Related

Finding edges in a height map

I want to find sharp edges in a heightmap image, while ignoring shallow edges.
OpenCV offers multiple approaches to finding edges in a 2d Image: Canny, Sobel, etc.
However, all these approaches work by comparing the intensity values on both sides of the edge.
If the 2D Image represents a height map of a 3D object, then this results in some weird behaviour.
In a height map, the height of a 3D object at a given X/Y coordinate is represented as the intensity of the 2D Pixel at that X/Y coordinate:
In the above picture, at the edge B the intensity changes only slightly between the left and right side, even though it is a sharp corner.
At the edge A, there is a bigchange in the intensity between pixels on the left side of the edge and the right, even though it is only a shallow angle.
So there is no threshold for Canny or Sobel that will preserve the sharp edge but filter the shallow edge.
(In the above example, the edge B has one side with an ascending slope, and one side with a descending slope. I could filter for this feature; but that would remove the edges C and D as well)
How can I get a binary edge image, containing only edges above a certain angle? (e.g. edge B, C, and D, but not A)
Or alternatively, how can I get a gradient derivative image, where the intensity of each pixel is proportional to the angle of the edge at that pixel?
Probably you'll want to use second derivative instead of first for this task.
Here's my intuition: taking derivative of height (intensity in your case) at each position on an evenly spaced grid would be proportional to arctan of the surface slope between sampling points (or at sampling points if you use a 2-sided derivative approximation). But since you want to detect sharp edges - you are looking for a derivative of slope at the sampling points. This means that you can set a threshold on a derivative of arctan of derivative of intensity to achieve your goal (luckily there's no "need to go deeper" :) )
You will have to be extra careful with taking a derivative of "slope angles" that you'll get - depending on the coordinate system you may come across ambiguity of angle difference (there are 2 ways to get from one angle to another, which are different in general case; you're looking for the "shorter" one). You can look for possible solution here
I have a rather simple approach that I came across wile reading a blog post.
It involves computing the median value of the gray scale image. Using this value we can now set two threshold values:
lower: max(0, (1.0 - 0.33) * v)
upper: min(255, (1.0 + 0.33) * v)
Now pass these two values as parameters into the cv2.Canny() function.
You will now be able to perform an optimized edge detection given any image. The crux of this answer depends on the median value of the image which varies for different images.
If i understand your question correctly, "what you need is basically a corner with high intensity values".
If that is so then look for Harris corner detector which would help you to find points with high gradient change in both direction.
http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html
Once you detect the corners you can filter the corners which have high intensity by using a suitable threshold.

how to perform hough transformation for spesific angle range

I have been trying to detect and track vehicle in a video stream. Recently I decided to implement a hard-coded method which find out the shadow of a vehicle and detect entire vehicle with respect to tire position. At the end, I partially done with my implementation. Here is the video link of demonstration.
At the first step I used canny edge detector to subtract edge of the video frames.
Then I used hough transform funciton in opencv.
However this functions finds all the horizontal and vertical lines while I only interested in horizontal lines which are possibly shadow of the vehicle.
My question is how I can use hough line transform function where it only checks the lines which are in a spesific range of angle and within a spesific area. Is there any parameter that tresholds the angle ? Or should I implement the function by myself ?
Since you end up with a binary image after the Canny operation, it may be easiest to convolve the image with a simple horizontal Prewitt operator before applying the Hough transform:
1 1 1
0 0 0
-1 -1 -1
which will give you a map of the grayscale intensities of each pixel, with pixels along horizontal edge giving the strongest signal. The advantage of using only the horizontal operator is that vertical edges receive no amplification, horizontal edges receive maximum amplification, and any edge within 45° of horizontal should have a signal somewhere between the two. You can use the resulting image to decide which pixels from the Canny mask to keep when you apply the detect edges to the original image: If the Prewitt signal is above a certain threshold for a pixel, that pixel is assumed to be along a 'horizontal-enough' edge that gets kept, discard otherwise. I believe opencv has this feature, but it's trivial to implement if not.

How to apply box filter on integral image? (SURF)

Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The
first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The
second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels.The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These
octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are:What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2 anddet(HessianApprox) = DxxDyy − (0.81Dxy)^2Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image has thus the sum of all the original pixels for value.
Then, you just need to remark that the box blur is equal to the sum of all the pixels inside a given rectangle (not originating in the top-lefmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you made a code for normal Gaussian 2D convolution, just use the box filter as a Gaussian kernel and the input image will be the same original image not integral image. The results from this method will be same with the one you asked.

Getting corners from convex points

I have written algorithm to extract the points shown in the image. They form convex shape and I know order of them. How do I extract corners (top 3 and bottom 3) from such points?
I'm using opencv.
if you already have the convex hull of the object, and that hull includes the corner points, then all you need to to do is simplify the hull until it only has 6 points.
There are many ways to simplify polygons, for example you could just use this simple algorithm used in this answer: How to find corner coordinates of a rectangle in an image
do
for each point P on the convex hull:
measure its distance to the line AB _
between the point A before P and the point B after P,
remove the point with the smallest distance
repeat until 6 points left
If you do not know the exact number of points, then you could remove points until the minimum distance rises above a certain threshold
you could also do Ramer-Douglas-Peucker to simplify the polygon, openCV already has that implemented in cv::approxPolyDP.
Just modify the openCV squares sample to use 6 points instead of 4
Instead of trying to directly determine which of your feature points correspond to corners, how about applying an corner detection algorithm on the entire image then looking for which of your feature points appear close to peaks in the corner detector?
I'd suggest starting with a Harris corner detector. The OpenCV implementation is cv::cornerHarris.
Essentially, the Harris algorithm applies both a horizontal and a vertical Sobel filter to the image (or some other approximation of the partial derivatives of the image in the x and y directions).
It then constructs a 2 by 2 structure matrix at each image pixel, looks at the eigenvalues of that matrix, and calls points corners if both eigenvalues are above some threshold.

Rectangle detection with Hough transform

I'm trying to implement rectangle detection using the Hough transform, based on
this paper.
I programmed it using Matlab, but after the detection of parallel pair lines and orthogonal pairs, I must detect the intersection of these pairs. My question is about the quality of the two line intersection in Hough space.
I found the intersection points by solving four equation systems. Do these intersection points lie in cartesian or polar coordinate space?
For those of you wondering about the paper, it's:
Rectangle Detection based on a Windowed Hough Transform by Cláudio Rosito Jung and Rodrigo Schramm.
Now according to the paper, the intersection points are expressed as polar coordinates, obviously you implementation may be different (the only way to tell is to show us your code).
Assuming you are being consistent with his notation, your peaks should be expressed as:
You must then perform peak paring given by equation (3) in section 4.3 or
where represents the angular threshold corresponding to parallel lines
and is the normalized threshold corresponding to lines of similar length.
The accuracy of the Hough space should be dependent on two main factors.
The accumulator maps onto Hough Space. To loop through the accumulator array requires that the accumulator divide the Hough Space into a discrete grid.
The second factor in accuracy in Linear Hough Space is the location of the origin in the original image. Look for a moment at what happens if you do a sweep of \theta for any given change in \rho. Near the origin, one of these sweeps will cover far less pixels than a sweep out near the edges of the image. This has the consequence that near the edges of the image you need a much higher \rho \theta resolution in your accumulator to achieve the same level of accuracy when transforming back to Cartesian.
The problem with increasing the resolution of course is that you will need more computational power and memory to increase it. Also If you uniformly increase the accumulator resolution you have wasted resolution near the origin where it is not needed.
Some ideas to help with this.
place the origin right at the
center of the image. as opposed to
using the natural bottom left or top
left of an image in code.
try using the closest image you can
get to a square. the more elongated an
image is for a given area the more
pronounced the resolution trap
becomes at the edges
Try dividing your image into 4/9/16
etc different accumulators each with
an origin in the center of that sub-image.
It will require a little overhead to link
the results of each accumulator together
for rectangle detection, but it should help
spread the resolution more evenly.
The ultimate solution would be to increase
the resolution linearly depending on the
distance from the origin. this can be achieved using the
(x-a)^2 + (y-b)^2 = \rho^2
circle equation where
- x,y are the current pixel
- a,b are your chosen origin
- \rho is the radius
once the radius is known adjust your accumulator
resolution accordingly. You will have to keep
track of the center of each \rho \theta bin.
for transforming back to Cartesian
The link to the referenced paper does not work, but if you used the standard hough transform than the four intersection points will be expressed in cartesian coordinates. In fact, the four lines detected with the hough tranform will be expressed using the "normal parametrization":
rho = x cos(theta) + y sin(theta)
so you will have four pairs (rho_i, theta_i) that identifies your four lines. After checking for orthogonality (for example just by comparing the angles theta_i) you solve four equation system each of the form:
rho_j = x cos(theta_j) + y sin(theta_j)
rho_k = x cos(theta_k) + y sin(theta_k)
where x and y are the unknowns that represents the cartesian coordinates of the intersection point.
I am not a mathematician. I am willing to stand corrected...
From Hough 2) ... any line on the xy plane can be described as p = x cos theta + y sin theta. In this representation, p is the normal distance and theta is the normal angle of a straight line, ... In practical applications, the angles theta and distances p are quantized, and we obtain an array C(p, theta).
from CRC standard math tables Analytic Geometry, Polar Coordinates in a Plane section ...
Such an ordered pair of numbers (r, theta) are called polar coordinates of the point p.
Straight lines: let p = distance of line from O, w = counterclockwise angle from OX to the perpendicular through O to the line. Normal form: r cos(theta - w) = p.
From this I conclude that the points lie in polar coordinate space.

Resources