Image Processing (oblique image)

I hope you can give me some suggestions. I want to semantically segment an image of cyanobacteria on a lake, and then calculate the cyanobacteria area from the image. Since the image was taken at an angle (the camera is not pointing straight down), how should I preprocess it so that computing the actual area from pixels is more accurate? The image is as follows.

You can't make calibrated measurements (in true units of area) without knowing the scaling factors. So you should let a calibration target float on the water, wholly in the field of view*.
If the viewing distance is sufficiently large that the perspective effect can be neglected, the transformation is affine and it suffices to take the ratio of the apparent area of the cyanobacteria (in pixels) over the apparent area of the target (in pixels), times the true area of the target**.
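To make the ratio concrete with made-up numbers: if a floating square target of true area 1 m² covers 5,000 pixels in the image and the segmented cyanobacteria region covers 120,000 pixels, the estimate is (120,000 / 5,000) × 1 m² = 24 m².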
If the perspective is strong, the transformation is a homography, and things get a little more complicated. From four points of the target (say, the corners), you can obtain the coefficients of the homography that maps the viewed points to undistorted space. Then you need to undistort the outline of the cyanobacteria area (as a polygon), and you can compute its area by the shoelace formula.
You can also completely straighten the image before segmentation, though this is not really necessary.
*You could think of obtaining the scaling factors from the viewing angles and distance, but that method would be impractical to use in the field.
**Take a picture of a large square. If it appears like a parallelogram, you are good. If like a general quadrilateral, perspective must be corrected.
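To make the homography route concrete, here is a minimal OpenCV sketch. The corner coordinates, the 1 m × 1 m target size, and the outline polygon are all made-up placeholders; cv::contourArea implements the shoelace formula mentioned above.

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    // Corners of the floating target as seen in the image (hypothetical values),
    // listed in the same order as the metric corners below.
    std::vector<cv::Point2f> imageCorners = {
        {412.f, 310.f}, {508.f, 305.f}, {520.f, 398.f}, {405.f, 402.f}};

    // The same corners in undistorted metric space: a 1 m x 1 m square.
    std::vector<cv::Point2f> worldCorners = {
        {0.f, 0.f}, {1.f, 0.f}, {1.f, 1.f}, {0.f, 1.f}};

    // Homography that maps image coordinates onto the metric plane.
    cv::Mat H = cv::getPerspectiveTransform(imageCorners, worldCorners);

    // Outline of the segmented cyanobacteria patch (hypothetical polygon;
    // in practice this comes from your segmentation).
    std::vector<cv::Point2f> outline = {
        {100.f, 120.f}, {300.f, 110.f}, {340.f, 260.f}, {120.f, 280.f}};

    // Undistort the outline, then apply the shoelace formula
    // (cv::contourArea implements it) to get the area in square metres.
    std::vector<cv::Point2f> metricOutline;
    cv::perspectiveTransform(outline, metricOutline, H);
    std::printf("area = %.2f m^2\n", cv::contourArea(metricOutline));
    return 0;
}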

Related

Is Canny edge detection rotationally invariant?

Suppose that the Canny edge detector successfully detects an edge in an image. The edge is then rotated by θ, where the relationship between a point (x, y) on the original edge and a point (x′, y′) on the rotated edge is defined as x′ = x cos θ − y sin θ; y′ = x sin θ + y cos θ.
Will the rotated edge be detected using the same Canny edge detector?
(I think the answer should take into account that the detection of an edge by the Canny edge detector depends only on the magnitude of its derivative.)
The answer is both yes and no, and which one you go for depends on how literally you take the question.
First of all, we're dealing with a rectangular grid, so given an integer location (x,y), the corresponding point (x',y') in a rotated image is highly likely not an integer location. And considering that the output of Canny is a set of points, and not a smooth function that can be interpolated, it would be difficult to establish a correspondence between the set resulting from the rotated and the one resulting from the original image.
Think for example about the number of pixels on a discrete line of a given length at 0 degrees and at 45 degrees. (Hint: the line at 45 degrees has sqrt(2) times fewer pixels.)
But if you take the question more generally and interpret it as "will an edge that is detected in the original image also be detected after rotating the image by θ degrees?" then the answer is yes, in theory.
Of course practice is always a bit different than theory. The details of the implementation matter here. And there is always numerical imprecision to contend with.
Let's start by assuming the rotation is computed correctly, with a precise interpolation scheme (cubic, Lanczos) and not rounded afterwards to uint8 or something (i.e. we're computing using floating-point values).
If you read the original paper by Canny, you'll see he proposes using Gaussian derivatives as the best compromise between compact support and computational precision. I have seen few implementations that actually do. Typically I see a convolution with a Gaussian and then Sobel derivatives. Especially for smaller sigmas (less smoothing) the difference can be quite large. Gaussian derivatives are rotationally invariant, Sobel derivatives are not.
The next step in the algorithm is non-maximum suppression. This is where the continuous gradient is converted to a set of points. For each pixel, it checks to see if it is a local maximum in the direction of the gradient. Because this is done per pixel, a different set of locations are tested in the rotated image compared to the original. Nonetheless, it should detect points along the same ridges in both cases.
Next, a hysteresis threshold is applied. This is a two-threshold operation that keeps pixels above one threshold as long as at least one pixel above a second threshold is present in the same connected component. This is where differences can occur between the rotated and original images. Remember we're dealing with a set of pixels: we have sampled the continuous gradient function at discrete points. There could be an edge that has one pixel above the second threshold in one version of the image, but not in the other. This would only occur for edges very close to the chosen threshold, of course.
Next comes a thinning. Because the non-maximum suppression can yield points along a thicker line, a thinning operation is applied that removes pixels from the set that are not needed to maintain connectivity of the lines. Which pixels are selected here will also differ between rotated and original images, but this does not change the geometry of the solution, so we still have the same set of points.
So, the answer is yes and no. :)
Note that the same logic applies to translation.
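If you want to see the "yes and no" in practice, a quick experiment (my own sketch, not part of the answer above) is to rotate an image with a high-quality interpolator, run the same Canny detector on both versions, and rotate the edge map back for a rough comparison. The file name and thresholds are arbitrary.

#include <opencv2/opencv.hpp>
#include <cstdio>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // hypothetical file
    CV_Assert(!img.empty());

    // Rotate by 30 degrees about the center, using Lanczos interpolation on
    // floating-point data, as recommended above.
    double theta = 30.0;
    cv::Point2f center(img.cols / 2.f, img.rows / 2.f);
    cv::Mat R = cv::getRotationMatrix2D(center, theta, 1.0);
    cv::Mat imgF, rotatedF;
    img.convertTo(imgF, CV_32F);
    cv::warpAffine(imgF, rotatedF, R, img.size(), cv::INTER_LANCZOS4);

    // cv::Canny wants 8-bit input, so some quantization is unavoidable here.
    cv::Mat rotated8u, edgesOriginal, edgesRotated;
    rotatedF.convertTo(rotated8u, CV_8U);
    cv::Canny(img, edgesOriginal, 50, 150);
    cv::Canny(rotated8u, edgesRotated, 50, 150);

    // Rotate the edge map back and count disagreeing pixels. Expect a small
    // but nonzero count, mostly on edges near the hysteresis thresholds.
    cv::Mat Rback = cv::getRotationMatrix2D(center, -theta, 1.0);
    cv::Mat edgesBack;
    cv::warpAffine(edgesRotated, edgesBack, Rback, img.size(), cv::INTER_NEAREST);
    std::printf("differing edge pixels: %d\n",
                cv::countNonZero(edgesOriginal != edgesBack));
    return 0;
}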

Understanding Distance Transform in OpenCV

What is the distance transform? What is the theory behind it? If I have two similar images in different positions, how does the distance transform help in overlapping them? The results the distance transform function produces look as if they are divided in the middle: is the idea to find the center of one image so that the other is overlapped just halfway? I have looked into the OpenCV documentation, but it's still not clear.
Look at the picture below (you may want to increase your monitor brightness to see it better). The picture shows the distance from the red contour depicted with pixel intensities, so in the middle of the image, where the distance is maximum, the intensities are highest. This is a manifestation of the distance transform. Here is an immediate application: the green shape is a so-called active contour, or snake, that moves according to the gradient of distances from the contour (while also following some other constraints) and curls around the red outline. Thus one application of the distance transform is shape processing.
Another application is text recognition: one of the powerful cues for text is a stable stroke width. The distance transform run on segmented text can confirm this; the corresponding method is called the stroke width transform (SWT).
As for aligning two rotated shapes, I am not sure how you can use the distance transform. You can find the center of a shape to rotate it about, but you can just as well rotate it about any other point; the difference will be only a translation, which is irrelevant if you run matchTemplate to match them in the correct orientation.
Perhaps if you upload your images it will be clearer what to do. In general, you can match them as a whole, or by features (which is more robust to various deformations or perspective distortions), or even using outlines/silhouettes if there are only a few features. Finally, you can figure out the orientation of your object (if it has a dominant orientation) by running PCA or by fitting an ellipse (as a rotated rectangle):
cv::RotatedRect rect = cv::fitEllipse(points2D);  // fit an ellipse to the 2D point set
float angle_to_rotate = rect.angle;               // its dominant orientation, in degrees
The distance transform is an operation that works on a single binary image and fundamentally measures, for every empty point (zero pixel), the distance to the nearest boundary point (non-zero pixel).
The measurement can be based on various distance definitions, calculated discretely or precisely: e.g. Euclidean, Manhattan, or chessboard. The parameters of the OpenCV implementation let you choose among some of these and control their accuracy via the mask size.
The function can return the output measurement image (floating point), as well as a labelled connected-components image (a Voronoi diagram).
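A minimal sketch of the OpenCV call (the file name and threshold are assumptions), including the variant that also returns the Voronoi labels:

#include <opencv2/opencv.hpp>

int main() {
    // Hypothetical binary mask: boundary pixels are zero, the rest non-zero.
    cv::Mat binary = cv::imread("shape_mask.png", cv::IMREAD_GRAYSCALE);
    CV_Assert(!binary.empty());
    cv::threshold(binary, binary, 127, 255, cv::THRESH_BINARY);

    // OpenCV's convention: each non-zero pixel receives the distance to the
    // nearest zero pixel, stored in a floating-point image.
    cv::Mat dist;
    cv::distanceTransform(binary, dist, cv::DIST_L2, cv::DIST_MASK_PRECISE);

    // This variant additionally returns the labelled connected components
    // (the Voronoi diagram); it supports only the 3x3 and 5x5 masks.
    cv::Mat dist2, labels;
    cv::distanceTransform(binary, dist2, labels, cv::DIST_L2, cv::DIST_MASK_5,
                          cv::DIST_LABEL_CCOMP);

    // Normalize for display: the largest distances appear brightest.
    cv::Mat vis;
    cv::normalize(dist, vis, 0, 1.0, cv::NORM_MINMAX);
    cv::imshow("distance transform", vis);
    cv::waitKey(0);
    return 0;
}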
I see from another question you asked recently that you are looking to register two images. I don't think the distance transform is really what you are looking for here. If you want to align a set of points, I would instead suggest you look at techniques like Procrustes analysis, Iterative Closest Point (ICP), or RANSAC.

Pixel-Millimeter Proportion

I have a digital image, and I want to make some calculations based on distances in it, so I need the millimeter/pixel ratio. What I'm doing right now is marking two points whose real-world distance I know, calculating the Euclidean pixel distance between them, and then obtaining the ratio.
The question is: can I obtain the correct millimeter/pixel ratio with only two points, or do I need four points, two for the x-axis and two for the y-axis?
If your image is of a flat surface and the camera direction is perpendicular to that surface, then your scale factor should be the same in both directions.
If your image is of a flat surface, but it is tilted relative to the camera, then marking out a rectangle of known proportions on that surface would allow you to compute a perspective transform. (See for example this question)
If your image is of a 3D scene, then of course there is no way in general to convert pixels to distances.
If you know the distance between points A and B as measured on the picture (say, in inches), and you also know the number of pixels between the points, you can easily calculate the pixels/inch ratio by dividing <pixels>/<inches>.
I suggest choosing the points so that the line through them is either horizontal or vertical; that way the calculation is not thrown off if the pixels are not square (i.e. have a rectangular form).
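A tiny sketch of that computation (the coordinates and the known distance are made up):

#include <cmath>
#include <cstdio>

int main() {
    // Two marked points in pixel coordinates, with a known real-world
    // separation of 250 mm (all values hypothetical).
    double x1 = 132, y1 = 85, x2 = 901, y2 = 88;
    double knownDistanceMm = 250.0;

    // Euclidean pixel distance between the marks.
    double pixelDistance = std::hypot(x2 - x1, y2 - y1);

    // Scale factor; uniform across the image only for a flat,
    // fronto-parallel scene.
    std::printf("%.4f mm per pixel\n", knownDistanceMm / pixelDistance);
    return 0;
}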

OpenCV SimpleBlobDetector filterByInertia meaning?

I don't understand what filterByInertia means, nor do I understand the documentation's terse description:
By ratio of the minimum inertia to maximum inertia. Extracted blobs will have this ratio between minInertiaRatio (inclusive) and maxInertiaRatio (exclusive).
The image above pretty much explains what the different filter parameters do. SimpleBlobDetector is happiest when it sees a circular blob, and the different filters filter out different kinds of deviations from the circular shape.
Inertia measures the ratio of the minor and major axes of a blob.
The figure also shows the difference between circularity and inertia. I have copied this figure from Blob Detection Tutorial at LearnOpenCV.com
I've been wondering this for a while also; the OpenCV documentation isn't very helpful when it comes to blob detection.
Based on the descriptions of other blob analyzers, the inertia of a blob is "the inertial resistance of the blob to rotation about its principal axes". It depends on how the mass of the blob (I guess in this case the area) is distributed throughout the blob's shape.
There's a lot of mathy stuff involved -- most of which I don't remember how to do -- but the result at the bottom of this page on the properties of binary images sums it up fairly well (blob detection is done by converting the input image to a series of binary images):
The ratio gives us some idea of how rounded the object is. This ratio will be 0 for a line and 1 for a circle.
So basically, by specifying minInertiaRatio and maxInertiaRatio you can filter the blobs based on how elongated they are. An inertia ratio of 0 will yield elongated blobs (closer to lines) and an inertia ratio of 1 will yield blobs where the area is more concentrated toward the center (closer to circles).
Here's a physical interpretation:
If you cut the blob out of a piece of card, you could find its center of gravity, attach an axle through that point (the axle being parallel to the card), spin it, and measure its moment of inertia. Depending on the shape, you may get different values according to how you place the axle. For an ellipse, you get the lowest value when the axle is attached along the long (major) axis, and the largest when the axle is placed along the short axis (so that more of the card is far from the axle). For a circle the inertia is always the same, of course.
If there are different values, there will always be a 'max' inertia at some orientation, and a 'min' with the axle placed 90 degrees away from the 'max'. The inertia ratio is simply the ratio between these inertias, min/max.
For shapes which are not ellipses, the metric tells you whether the overall shape is roughly elongated, or roughly the same size in all directions; without caring in particular about an uneven boundary or cuts and concavities (which roundness and convexity look at).
Mathematically, it does something like this (a code sketch follows the list):
Consider the set of points within the blob to be a population of (x,y) samples
Find the mean of these, and the covariance matrix x vs. y
Find the two eigenvalues of the covariance matrix (which are the same as its singular values, due to the nature of this matrix)
The inertia ratio is the ratio between these two values, smallest/largest.
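A minimal OpenCV sketch of those steps, assuming a hypothetical binary mask in which the blob pixels are non-zero:

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat mask = cv::imread("blob_mask.png", cv::IMREAD_GRAYSCALE);  // hypothetical file
    CV_Assert(!mask.empty());

    // The (x, y) samples: coordinates of every blob pixel.
    std::vector<cv::Point> pts;
    cv::findNonZero(mask, pts);
    CV_Assert(pts.size() >= 2);

    // Pack them into an N x 2 float matrix for PCA.
    cv::Mat data(static_cast<int>(pts.size()), 2, CV_32F);
    for (int i = 0; i < data.rows; ++i) {
        data.at<float>(i, 0) = static_cast<float>(pts[i].x);
        data.at<float>(i, 1) = static_cast<float>(pts[i].y);
    }

    // PCA subtracts the mean and eigendecomposes the covariance matrix;
    // the eigenvalues come out sorted from largest to smallest.
    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW);
    float largest = pca.eigenvalues.at<float>(0);
    float smallest = pca.eigenvalues.at<float>(1);

    // Inertia ratio: ~0 for a line-like blob, ~1 for a circular one.
    std::printf("inertia ratio = %.3f\n", smallest / largest);
    return 0;
}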

How to determine the distance of a (skewed) rectangular target from a camera

I have a photograph containing multiple rectangles of various sizes and orientations. I am currently trying to find the distance from the camera to any rectangles present in the image. What is the best way to accomplish this?
For example, a photograph might look similar to this (although this is probably very out-of-proportion):
I can find the pixel coordinates of the corners of any of the rectangles in the image, along with the camera FOV and resolution. I also know beforehand the length and width of any rectangle that could be in the image (but not what angle they face the camera). The ratio of length to width of each rectangular target that could be in the image is guaranteed to be unique. The rectangles and the camera will always be parallel to the ground.
What I've tried:
I hacked out a solution based on some example code I found on the internet. I'm basically iterating through each rectangle and finding the average pixel length and height.
I then use this to find the ratio of length vs. height, and compare it against a list of the ratios of all known rectangular targets so I can find the actual height of the target in inches. I then use this information to find the distance:
...where actual_height is the real height of the target in inches, the IMAGE_HEIGHT is how tall the image is (in pixels), the pixel_height is the average height of the rectangle on the image (in pixels), and the VERTICAL_FOV is the angle the camera sees along the vertical axis in degrees (about 39.75 degrees on my camera).
I found this formula on the internet, and while it seems to work somewhat ok, I don't really understand how it works, and it always seems to undershoot the actual distance by a bit.
In addition, I'm not sure how to go about modifying the formula so that it can deal with rectangles that are very skewed from viewing them along an angle. Since my algorithm works by finding the proportion of the length and height, it works ok for rectangles 1 and 2 (which aren't too skewed), but doesn't work for rectangle 3, since it's very skewed, throwing the ratios completely off.
I considered finding the ratio using the method outlined in this StackOverflow question regarding the proportions of a perspective-deformed rectangle, but I wasn't sure how well that would work with what I have, and was wondering if it's overkill or if there's a simpler solution I could try.
FWIW I once did something similar with triangles (full 6DoF pose, not just distance).
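The distance formula referenced in the question is likely the standard pinhole-model estimate (an assumption on my part, since the exact snippet isn't shown): distance = actual_height * IMAGE_HEIGHT / (2 * pixel_height * tan(VERTICAL_FOV / 2)). A sketch under that assumption:

#include <cmath>
#include <cstdio>

// Assumed pinhole-model distance estimate (a reconstruction, not a quote).
// actual_height: real target height (inches); pixel_height: apparent height
// (pixels); image_height: frame height (pixels); vertical_fov_deg: degrees.
double estimateDistance(double actual_height, double pixel_height,
                        double image_height, double vertical_fov_deg) {
    const double kPi = 3.14159265358979323846;
    double half_fov = vertical_fov_deg * kPi / 180.0 / 2.0;
    // At distance d, the frame spans 2 * d * tan(half_fov) vertically, so
    // d = actual_height * image_height / (2 * pixel_height * tan(half_fov)).
    return actual_height * image_height / (2.0 * pixel_height * std::tan(half_fov));
}

int main() {
    // Made-up numbers: a 12-inch-tall target spanning 80 of 480 rows, with
    // the ~39.75-degree vertical FOV mentioned in the question.
    std::printf("distance ~= %.1f inches\n",
                estimateDistance(12.0, 80.0, 480.0, 39.75));
    return 0;
}

Note that this form assumes the target is roughly fronto-parallel: skew reduces the apparent pixel_height, which is exactly the failure mode described for rectangle 3.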
