I'm validating an image segmentation algorithm applied to 2D images. The algorithm generates a contour segment, i.e. a set of connected pixels that form a freeform curve in 2D space. The idea is to compare this set of pixels with a ground truth, in my case another contour segment manually traced by an expert. An image showing what a segmentation result and the corresponding manual (ground-truth) segmentation would look like is shown below:
I'm trying to think of an adequate comparison metric to validate the segmentation results. Ideally, the best metric would be the point-to-point Euclidean distance between corresponding pairs of pixels on each segment; however (as seen in the previous figure), the segments don't have the same length (i.e. they differ in the total number of pixels), so pixel-to-pixel comparisons have to be discarded.
Can you suggest an adequate metric for validating my algorithm? Thanks for any suggestion!
For each pixel in the ground truth, take the distance to the nearest pixel in the segmentation result. Then take the sum of that for all ground truth pixels as the total error.
That's basically recall weighted by distance. If you start with the pixels in the result, it would resemble precision instead.
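A minimal sketch of this metric in Python (assuming each segment is given as an N x 2 array of pixel coordinates; scipy's cKDTree handles the nearest-neighbour lookup):

    import numpy as np
    from scipy.spatial import cKDTree

    def directed_distance(gt_pixels, result_pixels):
        # Sum of distances from each ground-truth pixel to its nearest result pixel.
        tree = cKDTree(result_pixels)        # index the result segment
        dists, _ = tree.query(gt_pixels)     # nearest-neighbour distance for every GT pixel
        return dists.sum()                   # total error (use .mean() for a length-independent score)

    # Two short segments of unequal length, as (row, col) pixel coordinates:
    gt = np.array([[10, 10], [10, 11], [11, 12], [12, 13]])
    seg = np.array([[10, 10], [11, 11], [12, 12], [13, 13], [14, 14]])
    print(directed_distance(gt, seg))   # "recall-like" error
    print(directed_distance(seg, gt))   # "precision-like" error

Averaging instead of summing makes the score independent of the curve length, and taking the maximum of the two directed averages gives a symmetric, Hausdorff-like measure.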
If the curves are closed, you can compute the area between the curves. If you can tell which pixels belong to each segment's interior, that is as easy as computing the XOR of the two pixel sets and counting the resulting pixels.
Here is an example that I've created using Matlab:
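The Matlab example itself is not reproduced here, but a rough Python equivalent of the XOR idea (assuming the two closed contours have already been filled into binary masks of the same size) would be:

    import numpy as np

    def area_between(mask_a, mask_b):
        # Area between two closed curves given as filled binary masks of equal shape:
        # the XOR keeps exactly the pixels covered by one region but not the other.
        xor = np.logical_xor(mask_a, mask_b)
        return int(xor.sum())                # symmetric-difference area in pixels

    # Toy example: two filled squares that partially overlap
    a = np.zeros((20, 20), dtype=bool); a[5:15, 5:15] = True
    b = np.zeros((20, 20), dtype=bool); b[7:17, 7:17] = True
    print(area_between(a, b))                # number of disagreeing pixels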
You could divide each line into n segments of equal length, then compute the Euclidean distance between each segment and its pair on the other line.
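A sketch of that idea in Python (assuming each curve is an ordered array of pixel coordinates; both curves are resampled to n points, equally spaced by arc length, before the pointwise comparison):

    import numpy as np

    def resample(curve, n):
        # Resample an ordered polyline to n points spaced equally by arc length.
        curve = np.asarray(curve, dtype=float)
        seg_len = np.linalg.norm(np.diff(curve, axis=0), axis=1)
        t = np.concatenate([[0.0], np.cumsum(seg_len)])     # cumulative arc length
        targets = np.linspace(0.0, t[-1], n)
        x = np.interp(targets, t, curve[:, 0])
        y = np.interp(targets, t, curve[:, 1])
        return np.stack([x, y], axis=1)

    def mean_pointwise_distance(curve_a, curve_b, n=100):
        a, b = resample(curve_a, n), resample(curve_b, n)
        return np.linalg.norm(a - b, axis=1).mean()

If the two curves may be traced in opposite directions, also compare against the reversed ground truth and keep the smaller value.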
I have an image I, and I am trying to do automatic object extraction from it using quantum mechanics.
Each pixel in the image is considered as a potential field V(x,y), and hence each (eigen) wave function represents a meaningful region.
The 2D time-independent Schrödinger equation is

    -(ħ²/2m) ∇²ψ(x,y) + V(x,y) ψ(x,y) = E ψ(x,y)

Multiplying both sides by -2m/ħ², we get

    ∇²ψ(x,y) = (2m/ħ²) (V(x,y) - E) ψ(x,y)

Rewriting the Laplacian using a finite-difference approach,

    ∇²ψ(i) ≈ Σ_{j ∈ N_i} ψ(j) - |N_i| ψ(i)

where N_i is the set of neighbours of the pixel with index i, and |N_i| is the cardinality of N_i, i.e. the number of elements in N_i.

Combining the above two equations, we get

    Σ_{j ∈ N_i} ψ(j) - |N_i| ψ(i) = (2m/ħ²) (V(i) - E) ψ(i),   for i = 1, ..., M

where M is the number of elements in ψ, i.e. the number of pixels in the image.
Now, the left hand side of the equation is a measure of how similar the labels in a neighbourhood are, i.e. a measure of spatial coherence.
Now, for applying this to images, the potential V is given by the pixel intensities.
The right hand side is a measure of how close the pixel values in a segment are to a constant value E.
Now, the wave functions can be numerically calculated by computing the eigenvectors of the Hamiltonian operator in matrix form, which is

    H(i,j) = |N_i| + (2m/ħ²) V(i)   for i = j
    H(i,j) = -1                     for j ∈ N_i
    H(i,j) = 0                      elsewhere

so that H ψ = (2m/ħ²) E ψ.
Now, in this paper it is said that we first have to find the maximum and minimum eigenvalues, and then calculate the eigenvectors whose eigenvalues are closest to a number of values regularly selected between the minimum and maximum eigenvalues. The number of selected values is 300.
I have calculated the 300 eigenvectors, and the absolute square of each eigenvector is then thresholded to obtain the segments. Everything is fine up to this part.
Now, how do I reconstruct the eigenvectors into a 2D image so as to get the potential segments in the image?
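This is not the paper's reference code, just a rough Python sketch (assuming a 4-neighbour grid, units with 2m/ħ² = 1, and scipy's shift-invert mode to get eigenvectors whose eigenvalues are closest to a chosen target). The part that matters for the question is the last step: each eigenvector has one entry per pixel, in the same row-major order used to flatten the image, so a plain reshape recovers the 2D wave function.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def grid_hamiltonian(V):
        # H = L + diag(V), where L is the graph Laplacian of the 4-neighbour pixel grid,
        # i.e. H[i,i] = |N_i| + V(i), H[i,j] = -1 for neighbouring pixels, 0 elsewhere.
        h, w = V.shape
        path = lambda n: sp.diags([1.0, 1.0], [-1, 1], shape=(n, n))  # 1D path adjacency
        adj = sp.kron(path(h), sp.identity(w)) + sp.kron(sp.identity(h), path(w))
        degree = np.asarray(adj.sum(axis=1)).ravel()                  # |N_i| for each pixel
        return (sp.diags(degree + V.ravel()) - adj).tocsc()

    img = np.random.rand(32, 32)        # stand-in for the real image; V = pixel intensities
    H = grid_hamiltonian(img)

    # Eigenvectors whose eigenvalues are closest to one target value (repeat for each of the
    # regularly spaced values between the minimum and maximum eigenvalue):
    target = 0.5
    vals, vecs = spla.eigsh(H, k=4, sigma=target)   # shift-invert: eigenvalues nearest `target`

    # Each eigenvector has img.size entries in the same row-major order as V.ravel(), so
    # reshaping recovers the 2D wave function, which can then be thresholded:
    psi = vecs[:, 0].reshape(img.shape)
    segment = np.abs(psi) ** 2 > 0.5 * np.max(np.abs(psi) ** 2)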
I want to determine image sharpness by the amount of high frequencies within the image. As far as I understand, the dft() function from OpenCV returns two matrices with the real and imaginary parts of the transform.
This is where I am stuck. How can I determine the amount of high frequencies from this data?
I am thankful for every hint/link which could provide me with a better understanding.
Greetings
Make the Fourier transform.
Calculate the magnitude of the result.
Now you have a 2D matrix. Consider the upper-left quadrant (the others are mirrors for a real-valued source).
Here the Magn[0][0] entry corresponds to the zero frequency, and the Magn[(n-1)/2][(n-1)/2] entry corresponds to the highest frequency.
The upper-left part of this submatrix contains the low-frequency samples, so you can calculate the sum of values in this part and in the remaining part, and compare these sums. For example (pseudocode):
cvIntegral(Magn, Rect(0..n/4, 0..n/4)) compare with
cvIntegral(Magn, Rect(0..n/2, 0..n/2)) - cvIntegral(Magn, Rect(0..n/4, 0..n/4))
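A rough, runnable version of the same idea with OpenCV's Python bindings (the split at n/4 is the same arbitrary choice as in the pseudocode above; the random image is only a stand-in for real input):

    import cv2
    import numpy as np

    def high_freq_ratio(gray):
        # Compare high-frequency to low-frequency energy in the DFT magnitude.
        f = cv2.dft(np.float32(gray), flags=cv2.DFT_COMPLEX_OUTPUT)  # 2 channels: re, im
        magn = cv2.magnitude(f[:, :, 0], f[:, :, 1])
        h, w = magn.shape
        quad = magn[:h // 2, :w // 2]            # upper-left quadrant
        low = quad[:h // 4, :w // 4].sum()       # low-frequency corner
        high = quad.sum() - low                  # the rest of the quadrant
        return high / (low + 1e-12)              # larger ratio -> more high frequencies, i.e. sharper

    img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # replace with a real grayscale image
    print(high_freq_ratio(img))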
According to "Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of
computer vision 60.2 (2004): 91-110 "
"It is important to avoid all boundary affects in which the descriptor
abruptly changes as a sample shifts smoothly from being within one
histogram to another or from one orientation to another. Therefore,
trilinear interpolation is used to distribute the value of each
gradient sample into adjacent histogram bins. In other words, each
entry into a bin is multiplied by a weight of 1−d for each dimension,
where d is the distance of the sample from the central value of the
bin as measured in units of the histogram bin spacing."
I am calculating the orientation t and the location (x, y) of each gradient sample, which are floating-point values. Currently, I simply add the gradient magnitude to the 3D histogram entry values[t][x][y], using the lower bound (floor) of the floating-point values of t, x and y. But according to the paper, I have to distribute the gradient magnitude across the adjacent bins, and I am not sure how to distribute it.
I got my answer at the following link:
HOG Trilinear Interpolation of Histogram Bins
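For reference, a small illustration (not Lowe's implementation) of the trilinear distribution: assuming the sample's spatial coordinates (x, y) in [0, 4) and orientation coordinate t in [0, 8) are already expressed in units of bin spacing, measured from the centre of bin 0, the magnitude is split between the two nearest bins along each axis with weights 1 - d and d:

    import numpy as np

    def trilinear_vote(hist, x, y, t, magnitude):
        # hist has shape (4, 4, 8): 4x4 spatial bins, 8 orientation bins.
        x0, y0, t0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(t))
        dx, dy, dt = x - x0, y - y0, t - t0          # distance to the lower bin centre, in bin units
        for i, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            for j, wy in ((y0, 1 - dy), (y0 + 1, dy)):
                for k, wt in ((t0, 1 - dt), (t0 + 1, dt)):
                    if 0 <= i < 4 and 0 <= j < 4:    # spatial bins outside the 4x4 grid get no vote
                        hist[i, j, k % 8] += magnitude * wx * wy * wt   # orientation wraps around

    hist = np.zeros((4, 4, 8))
    trilinear_vote(hist, x=1.3, y=2.7, t=5.6, magnitude=0.9)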
The application of Konolige's block matching algorithm is not sufficiently explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list those parameters below and describe what I understand. Maybe someone can add a description of the parameters which are unclear.
preFilterType: Determines which filter is applied to the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to the mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity, which is taken into account. Default is zero, should be set to a negative value, if negative disparities are possible (depends on the angle between the cameras views and the distance of the measured object to the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations, where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only if SAD(d) >= SAD(d*)*(1 + uniquenessRatio/100.) for any d != d*+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says that a left-right check is performed. Guess: pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.
speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (where the difference between two adjacent values does not exceed speckleRange) whose size is less than or equal to speckleWindowSize (the number of pixels forming the blob) with the invalid disparity value (either -16 for short or -1.f for float).
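In the Python bindings the post-processing call could look like this (the numbers are placeholders; note that for StereoBM's 16-bit fixed-point output, maxDiff is expressed in the same 16x-scaled units):

    import numpy as np
    import cv2

    # Disparity map as produced by StereoBM: 16-bit fixed point (disparity * 16).
    disparity = np.zeros((480, 640), dtype=np.int16)  # placeholder for a real disparity map

    new_val = -16              # value written into rejected speckle pixels
    max_speckle_size = 100     # blobs up to this many pixels are removed
    max_diff = 2 * 16          # disparities within 2 (in fixed-point units) join the same blob
    cv2.filterSpeckles(disparity, new_val, max_speckle_size, max_diff)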
The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture
for reliable matching
Speckle range and size: Block-based matchers
often produce "speckles" near the boundaries of objects, where the
matching window catches the foreground on one side and the background
on the other. In this scene it appears that the matcher is also
finding small spurious matches in the projected texture on the table.
To get rid of these artifacts we post-process the disparity image with
a speckle filter controlled by the speckle_size and speckle_range
parameters. speckle_size is the number of pixels below which a
disparity blob is dismissed as "speckle." speckle_range controls how
close in value disparities must be to be considered part of the same
blob.
Number of disparities: How many pixels to slide the window over.
The larger it is, the larger the range of visible depths, but more
computation is required.
min_disparity: the offset from the x-position
of the left pixel at which to begin searching.
uniqueness_ratio:
Another post-filtering step. If the best matching disparity is not
sufficiently better than every other disparity in the search range,
the pixel is filtered out. You can try tweaking this if
texture_threshold and the speckle filtering are still letting through
spurious matches.
prefilter_size and prefilter_cap: The pre-filtering
phase, which normalizes image brightness and enhances texture in
preparation for block matching. Normally you should not need to adjust
these.
Also check out this ROS tutorial on choosing stereo parameters.
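Putting the parameters together, a typical setup with OpenCV's Python bindings might look like this (the numbers are scene-dependent placeholders, not recommendations):

    import numpy as np
    import cv2

    # Rectified grayscale stereo pair; random data stands in for real images here.
    left = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    right = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

    bm = cv2.StereoBM_create(numDisparities=64, blockSize=21)  # blockSize plays the role of SADWindowSize
    bm.setPreFilterType(1)        # cv::StereoBM::PREFILTER_XSOBEL (0 = PREFILTER_NORMALIZED_RESPONSE)
    bm.setPreFilterSize(9)
    bm.setPreFilterCap(31)
    bm.setMinDisparity(0)
    bm.setTextureThreshold(10)
    bm.setUniquenessRatio(15)
    bm.setSpeckleWindowSize(100)
    bm.setSpeckleRange(32)
    bm.setDisp12MaxDiff(1)

    disparity = bm.compute(left, right)  # 16-bit fixed point; divide by 16.0 for real disparity values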
I have one problem in the second step, which is to accumulate weighted votes for gradient orientation over spatial cells.
Assume the cell is 8x8. Let me use two matrices, GO[8][8] (with values in [1, 9]) and GM[8][8], to represent the gradient orientation and the gradient magnitude respectively.
The gradient orientation ranges from 0 to 180 degrees, and there are 9 orientation bins.
According to my understanding of HOG, for every pixel in a cell we add its gradient magnitude to its corresponding orientation bin. In this way, we get the histogram for every cell.
But there is one sentence that's confusing me.
"To reduce aliasing, votes(gradient magnitude) are interpolated
trilinearly between the neighbouring bin centers in both orientation
and position."1
Why interpolate? How do I interpolate? Can someone explain in more detail? I don't see how it reduces aliasing.
Thanks in advance.
1 This sentence is in Navneet Dalal's PhD thesis, p. 38, line 4.
Interpolation is a standard technique for computing histograms. The idea here is that each value is not simply placed into one bin, but is distributed between two neighboring bins (assuming a 1d histogram), based on how far away it is from the center of the original bin.
The purpose of this is to deal with situations when a small error in your measurement can cause a value to be placed into a different bin. This is a very good thing to do for any type of histogram, not just for HOGs, assuming you have the CPU cycles.
There are also bilinear and trilinear interpolation for 2D and 3D histograms, where each value is distributed among 4 and 8 neighboring bins, respectively.
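As a concrete 1D illustration (orientation only, with the 9 bins over 0-180 degrees from the question; the bin centres are assumed to lie at 10, 30, ..., 170 degrees):

    import numpy as np

    def vote_1d(hist, angle, magnitude, bin_width=20.0):
        # Split `magnitude` between the two orientation bins whose centres surround
        # `angle` (degrees, 0-180), each weighted by 1 - d with d the distance in bin units.
        pos = angle / bin_width - 0.5           # fractional bin coordinate relative to bin centres
        lo = int(np.floor(pos))
        d = pos - lo                            # distance to the lower bin centre
        hist[lo % 9] += magnitude * (1.0 - d)   # % 9 wraps around for angles near 0 or 180
        hist[(lo + 1) % 9] += magnitude * d

    hist = np.zeros(9)
    vote_1d(hist, angle=77.0, magnitude=2.0)    # 77 deg lies between the 70 and 90 degree centres
    print(hist)                                 # most of the weight lands in the 70-degree bin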