PCL Point Feature Histograms - binning - histogram

The binning process, which is part of the point feature histogram estimation, results in b^3 bins if only the three angular features (alpha, phi, theta) are used, where b is the number of bins per feature.
Why is it b^3 and not b * 3?
Let's say we consider alpha.
The feature value range is subdivided into b intervals. You iterate over all neighbors of the query point and count how many alpha values lie in each interval. So you have b bins for alpha. When you repeat this for the other two features, you get 3 * b bins.
Where am I wrong?

For simplicity, I'll first explain it in 2D, i.e. with two angular features. In that case, you would have b^2 bins, not b*2.
The feature space is divided into a regular grid. Features are binned according to their position in the 2D (or 3D) space, not independently along each dimension. See the following example with two feature dimensions and b=4, where the feature is binned into the cell marked with #:
^ phi
|
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | |#|
+-+-+-+-+
| | | | |
+-+-+-+-+-> alpha
The feature is binned into the cell where alpha is in a given interval AND phi is in another interval. The key difference from your understanding is that the dimensions are not treated independently. Each cell specifies an interval in every dimension, not just a single one.
(This would work the same way in 3D, only that you would have another dimension for theta and a 3D grid instead of a 2D one.)
This way of binning results in b^2 bins for the 2D case, since each interval in the alpha dimension is combined with ALL intervals in the phi dimension, resulting in a squaring of the number, not a doubling. Add another dimension, and you get the cubing instead of the tripling, as in your question.
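A minimal sketch of that joint binning (plain Python/NumPy for illustration, not PCL's actual implementation), assuming each of the three angles has already been normalised to the range [0, 1):

import numpy as np

b = 4  # number of subdivisions per feature dimension

def joint_bin(alpha, phi, theta):
    """Map one (alpha, phi, theta) triple, each normalised to [0, 1),
    to a single cell index of the b x b x b grid."""
    ia = min(int(alpha * b), b - 1)  # interval index along the alpha axis
    ip = min(int(phi * b), b - 1)    # interval index along the phi axis
    it = min(int(theta * b), b - 1)  # interval index along the theta axis
    return (ia * b + ip) * b + it    # flatten the 3D cell index

hist = np.zeros(b ** 3)              # b^3 bins, as discussed above
# For every neighbour of the query point you would do:
#     hist[joint_bin(alpha, phi, theta)] += 1
hist[joint_bin(0.8, 0.3, 0.1)] += 1  # hypothetical feature values

The histogram has b ** 3 = 64 entries here, one per cell of the grid, which is exactly the squaring/cubing effect described above.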

Related

Mean Filter at first position (0,0)

Actually, I am in the middle of work on adaptive thresholding using the mean. I use a 3x3 matrix, so I calculate the mean value over that matrix and place it at M(1,1), the middle position of the matrix. I got confused about how to perform the process at the first position f(0,0).
This is a little illustration: let's assume that I am using a 3x3 matrix (M) and image (f), where the first position f(0,0) = M(1,1) = 4. So M(0,0), M(0,1), M(0,2), M(1,0) and M(2,0) have no value.
-1 | -1 | -1 |
-1 | 4 | 3 |
-1 | 2 | 1 |
Which one is the correct process,
a) ( 4 + 3 + 2 + 1 ) / 4
b) ( 4 + 3 + 2 + 1) / 9
I ask this because I followed a tutorial on adaptive mean thresholding and it shows a different result. So, I need to make sure that the process is correct. Thanks.
There is no single "correct" way to solve this issue. There are many different solutions used in practice, and they all have some downsides (a short code sketch of a few of them follows this list):
Averaging over only the known values (i.e. your suggested (4+3+2+1)/4). By averaging over fewer pixels, one obtains a result that is more sensitive to noise (i.e. the "amount of noise" left in the image after filtering is larger near the borders). Also, a bias is introduced, since the averaging happens over values to one side only.
Assuming 0 outside the image domain (i.e. your suggested (4+3+2+1)/9). Since we don't know what is outside the image, assuming 0 is as good as anything else, no? Well, no it is not. This leads to a filter result that has darker values around the edges.
Assuming a periodic image. Here one takes values from the opposite side of the image for the unknown values. This effectively happens when computing the convolution through the Fourier domain. But usually images are not periodic, with strong differences in intensities (or colors) at opposite sides of the image, leading to "bleeding" of the colors onto the opposite side of the image.
Extrapolation. Extending image data by extrapolation is a risky business. This basically comes down to predicting what would have been in those pixels had we imaged them. The safest bet is 0-order extrapolation (i.e. replicating the boundary pixel), though higher-order polynomial fits are possible too. The downside is that the pixels at the image edge become more important than other pixels; they will be weighted more heavily in the averaging.
Mirroring. Here the image is reflected at the boundary (imagine placing a mirror at the edge of the image). The value at index -1 is taken to be the value at index 1; at index -2 that at index 2, etc. This has similar downsides as the extrapolation method.
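To make the options above concrete, here is a minimal sketch using scipy.ndimage (purely an illustration of the border strategies, not the code of any particular tutorial), applied to the 2x2 example values from the question:

import numpy as np
from scipy import ndimage

img = np.array([[4., 3.],
                [2., 1.]])  # the known values from the question
size = 3                    # 3x3 averaging window

# Assume 0 outside the image: always divide by 9
zero_padded = ndimage.uniform_filter(img, size=size, mode='constant', cval=0.0)

# Average over the known values only: normalise by how many window pixels
# actually fall inside the image at each position
valid_count = ndimage.uniform_filter(np.ones_like(img), size=size,
                                     mode='constant', cval=0.0)
known_only = zero_padded / valid_count

# The other border strategies from the list above
replicate = ndimage.uniform_filter(img, size=size, mode='nearest')  # 0-order extrapolation
mirrored  = ndimage.uniform_filter(img, size=size, mode='mirror')   # mirroring (index -1 = index 1)
periodic  = ndimage.uniform_filter(img, size=size, mode='wrap')     # periodic image

print(known_only[0, 0])   # (4+3+2+1)/4 = 2.5
print(zero_padded[0, 0])  # (4+3+2+1)/9 ≈ 1.11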

How to get scale, rotation & translation after feature tracking?

I have implemented a Kanade–Lucas–Tomasi feature tracker and have used it on two images that show the same scene, but the camera has moved a bit between taking the pictures.
As a result I get the coordinates of the features. For example:
1st Picture:
| feature | (x,y)=val |
|---------|-----------------|
| 1 | (436,349)=33971 |
| 2 | (440,365)=29648 |
| 3 | ( 36,290)=29562 |
2nd Picture:
| feature | (x,y)=val |
|---------|--------------|
| 1 | (443.3,356.0)=0 |
| 2 | (447.6,373.0)=0 |
| 3 | ( -1.0, -1.0)=-4 |
So I know the position of the features 1 & 2 in both images and that feature 3 couldn't be found in the second image. The coordinates of the features 1 & 2 aren't the same, because the camera has zoomed in a bit and also moved.
Which algorithm is suitable to get the scale, rotation and translation between the two images? Is there a robust algorithm, that also considers outliers?
If you don't know what movement happened between the images, then you need to calculate the homography between them. The homography, however, needs at least 4 point correspondences to be calculated.
If you have 4 points in both images that lie roughly on a plane (the same flat surface, e.g. a window), then you can follow the steps described here on math.stackexchange to compute the homography matrix that transforms between the images.
Note that while rotation and translation may happen between the 2 images, they could also have been taken from different angles. If that is the case, then the homography is your only option. If, instead, the images differ only by rotation and translation (e.g. 2 satellite images), then you may find some other method, but a homography will also work.
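As a rough OpenCV sketch of that (the coordinates below are made up except for the first two, which follow the tracked features in the question), cv2.findHomography with RANSAC also gives you the robustness against outliers you asked about:

import numpy as np
import cv2

# Feature positions that were found in BOTH images; the lost feature 3 is left out.
pts1 = np.float32([[436.0, 349.0], [440.0, 365.0], [430.0, 300.0], [500.0, 320.0], [120.0, 80.0]])
pts2 = np.float32([[443.3, 356.0], [447.6, 373.0], [437.2, 307.5], [508.1, 328.0], [125.4, 84.9]])

# RANSAC rejects correspondences that do not fit the estimated homography.
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransacReprojThreshold=3.0)
print(H)                    # 3x3 homography mapping image 1 to image 2
print(inlier_mask.ravel())  # 1 for inliers, 0 for rejected correspondences

With exactly 4 correspondences the homography is fully determined and nothing can be rejected; RANSAC only helps once you have more points than the minimum.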
Depending upon whether the camera is calibrated or uncalibrated, use the tracked features to compute the essential or the fundamental matrix, respectively.
Factorize the matrix into R, t. Use the Multiple View Geometry book for help with the formulae: https://www.robots.ox.ac.uk/~vgg/hzbook/hzbook1/HZepipolar.pdf
Caution: these steps only work well if the features come from different depth planes and cover a wide field of view. In case all features lie on a single plane, you should estimate a homography and try to factorize that instead.
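Assuming the calibrated case, a rough OpenCV sketch of that route (K and the matched points below are placeholder values, not data from the question):

import numpy as np
import cv2

pts1 = np.float32([[436.0, 349.0], [440.0, 365.0], [36.0, 290.0], [500.0, 320.0], [120.0, 80.0]])
pts2 = np.float32([[443.3, 356.0], [447.6, 373.0], [40.2, 297.1], [508.1, 328.0], [125.4, 84.9]])
K = np.array([[1000.0, 0.0, 640.0],   # assumed intrinsics from a prior calibration
              [0.0, 1000.0, 480.0],
              [0.0, 0.0, 1.0]])

# Robust (RANSAC) estimate of the essential matrix, then factorization into R, t
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print(R)  # rotation between the two camera poses
print(t)  # translation direction only; absolute scale cannot be recovered from images alone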

Calculating Gray Level Co-occurrence Matrices (GLCM)

GLCM considers the relation between two pixels at a time, called the reference and the neighbour pixel. Based on the selection of the neighbour pixel, generally 4 different Gray Level Co-occurrence Matrices (GLCM) can be calculated for an image.
Selection of the neighbour pixel is as follows.
| reference pixel | neighbour pixel | description |
|-----------------|-----------------|----------------------------------|
| (x,y) | (x+1, y) | the pixel to its right |
| (x,y) | (x+1, y+1) | the pixel to its right and above |
| (x,y) | (x, y+1) | the pixel above |
| (x,y) | (x-1, y+1) | the pixel to its left and above |
A good, detailed explanation of GLCM is available here (original link).
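For concreteness, this is how I understand the counting, as a small NumPy sketch (the image, its 4 gray levels and the offsets are my own example; since the row index grows downwards in an array, "above" maps to dy = -1):

import numpy as np

def glcm(image, dx, dy, levels):
    """Count co-occurrences of gray levels for the neighbour offset (dx, dy)."""
    h, w = image.shape
    m = np.zeros((levels, levels), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:      # neighbour must lie inside the image
                m[image[y, x], image[ny, nx]] += 1
    return m

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)   # tiny 4-level example image

offsets = {"right": (1, 0), "right and above": (1, -1),
           "above": (0, -1), "left and above": (-1, -1)}
for name, (dx, dy) in offsets.items():
    print(name)
    print(glcm(img, dx, dy, levels=4))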
My question is, is it required to consider all 3 intensity values of an image pixel when calculating Gray Level Co-occurrence Matrices (GLCM) of a "gray scale image"?
As an example consider an image with 2 pixels
------------------------------------------------------------------------------------
| [pixel1] | [pixel2] |
| / | \ | / | \ |
| [intensity1] [intensity2] [intensity3] | [intensity4] [intensity5] [intensity6] |
------------------------------------------------------------------------------------
When calculating the GLCM of a gray scale image is it required to take into account all 3 intensity values of a pixel?
E.g- When the reference pixel is (x,y) and its neighbour pixel is (x+1, y) the pixel to its right
Is it required to take into account the occurrences of intensity levels individually as follows?
[intensity1] & [intensity2]
[intensity2] & [intensity3]
[intensity3] & [intensity4]
[intensity4] & [intensity5]
[intensity5] & [intensity6]
Or can I just take into account one intensity value from each pixel, assuming all 3 intensity values of a pixel are the same, as follows?
[intensity1] & [intensity4]
Which is the correct method? Is it applicable for all 4 neighbours?

Essential Matrix from Fundamental Matrix in OpenCV

I've already computed the Fundamental Matrix of a stereo pair through corresponding points, found using SURF. According to Hartley and Zisserman, the Essential Matrix is computed doing:
E = K.t() * F * K
How do I get K? Is there another way to compute E?
I don't know where you got that formula, but the correct one is
E = K'^T . F . K (see Hartley & Zisserman, §9.6, page 257 of second edition)
K is the intrinsic camera parameter matrix, holding the scale factors and the position of the image center, expressed in pixel units.
| \alpha_u 0 u_0 |
K = | 0 \alpha_u v_0 |
| 0 0 1 |
(sorry, Latex not supported on SO)
Edit : To get those values, you can either:
calibrate the camera
compute an approximate value if you have the manufacturer data. If the lens is correctly centered on the sensor, then u_0 and v_0 are half of the image width and height, respectively. And alpha = k·f, with f the focal length (in meters) and k the pixel scale factor: if you have pixels of, say, 6 µm, then k = 1/(6 µm).
For example, if the lens is 8 mm and the pixel size 8 µm, then alpha = 1000.
Computing E
Sure, there are several ways to compute E. For example, if you have strongly calibrated the camera rig, then you can extract R and t (the rotation matrix and translation vector) between the two cameras, and E is defined as the product of the skew-symmetric matrix of t and the matrix R: E = [t]_x · R.
But if you have the book, all of this is inside.
Edit Just noticed, there is even a Wikipedia page on this topic!
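As a small numeric illustration of the two edits above (the sensor resolution and F below are placeholders; the intrinsics follow the 8 mm / 8 µm example):

import numpy as np

# K built from manufacturer data, assuming the lens is centered on the sensor
alpha = 0.008 / 8e-6            # 8 mm lens, 8 um pixels -> alpha = 1000
u0, v0 = 1280 / 2, 960 / 2      # principal point at the image centre (assumed 1280x960 sensor)
K = np.array([[alpha, 0.0, u0],
              [0.0, alpha, v0],
              [0.0, 0.0, 1.0]])

F = np.eye(3)                   # placeholder: your estimated fundamental matrix goes here

# Same camera for both views, so K' = K and E = K'^T . F . K
E = K.T @ F @ K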

Calculate surface area

For a given terrain, how can you calculate its surface area?
As of now, I plan to build the terrain using Three.js with something like:
var geo = new THREE.PlaneGeometry(300, 300, 10, 10);
for (var i = 0; i < geo.vertices.length; i++)
geo.vertices[i].y = someHeight; // Makes the flat plain into a terrain
Next, if it's possible to iterate through each underlying triangle of the geometry (i.e. the triangles of the TRIANGLE_STRIP given to the WebGL array), the area of each triangle could be summed up to get the total surface area.
Does this approach sound right? If so, how do you determine vertices of individual triangles?
Any other ideas to build the terrain in WebGL/Three.js are welcome.
I think your approach sounds right and shouldn't be hard to implement.
I'm not familiar with three.js, but I think it's quite easy to determine the positions of the vertices. You know that the vertices are evenly distributed between x=0...300, z=0...300 and you know the y coordinate. So the [i,j]-th vertex has position [i*300/10, y, j*300/10].
You have 10x10 segments in total and each segment consists of 2 triangles. This is where you have to be careful.
The triangles could form two different shapes:
------ ------
| \ | | /|
| \ | or | / |
| \| | / |
------ ------
which could result in different shapes and (I'm not entirely sure about this) different surface areas.
When you find out how exactly three.js creates the surface, it should be relatively easy to iteratively sum the triangle surfaces.
It would be nice to be able to do the sum without actual iteration over all triangles, but, right now, I don't have any idea how to do it...
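For the iterative sum itself, here is a small sketch of the math (in Python/NumPy rather than three.js, with a made-up grid standing in for the geometry): the area of one triangle is half the norm of the cross product of two of its edge vectors, and the surface area is the sum over all triangles.

import numpy as np

def surface_area(vertices, triangles):
    """Sum the triangle areas; each area is half the norm of the cross
    product of two edge vectors of the triangle."""
    total = 0.0
    for i0, i1, i2 in triangles:
        a, b, c = vertices[i0], vertices[i1], vertices[i2]
        total += 0.5 * np.linalg.norm(np.cross(b - a, c - a))
    return total

# Made-up 2x2-segment grid over 300x300 units with random heights in y
n = 3                                        # vertices per side
xs, zs = np.meshgrid(np.linspace(0, 300, n), np.linspace(0, 300, n))
ys = np.random.rand(n, n) * 10               # stand-in for "someHeight"
verts = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

tris = []                                    # two triangles per cell (the "\" diagonal variant)
for j in range(n - 1):
    for i in range(n - 1):
        v = j * n + i
        tris.append((v, v + n, v + 1))
        tris.append((v + 1, v + n, v + n + 1))

print(surface_area(verts, tris))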
