Non-region of interest with Mat Image in OpenCV

I want to get features from the non-region-of-interest area. I know how to define an ROI in Mat format; however, I also need the rest of the area for negative image features. Thanks in advance.

You can use a mask to define any region from which you want to extract features. However, this requires the called function to support a mask.
For example:
void ORB::operator()(InputArray image, InputArray mask, vector<KeyPoint>& keypoints, OutputArray descriptors, bool useProvidedKeypoints=false ) const
mask – The operation mask.
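For instance, here is a minimal sketch of the mask approach, assuming the newer Feature2D::detectAndCompute interface (which also accepts a mask); the image path and the ROI rectangle are placeholders:
```cpp
// Detect ORB features only OUTSIDE a rectangular ROI by masking the ROI out.
// The image path and ROI rectangle below are placeholders.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat image = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
    cv::Rect roi(100, 100, 200, 150);          // the region you want to EXCLUDE

    // Non-zero mask pixels mark where detection is allowed.
    cv::Mat mask(image.size(), CV_8UC1, cv::Scalar(255));
    mask(roi).setTo(cv::Scalar(0));            // block the ROI itself

    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    orb->detectAndCompute(image, mask, keypoints, descriptors);  // non-ROI features only
    return 0;
}
```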
If the function does not support a mask, there are two tricks to get features from the non-ROI area:
Get the features of the whole image, then filter the result manually.
Split the non-ROI area into several ROIs (as in the figure below), then pass each ROI to the function; a sketch follows the figure.
For example:
|-----------------|
|        1        |
|----|-------|----|
|  2 |  ROI  |  3 |
|----|-------|----|
|        4        |
|-----------------|
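And a hedged sketch of that second trick: detect in each sub-rectangle and shift the keypoints back into full-image coordinates (the rectangle layout follows the figure above; ORB is used only as an example detector):
```cpp
// Split the non-ROI frame into four rectangles, detect in each one, and
// shift the keypoints back to the coordinates of the full image.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::KeyPoint> detectOutsideRoi(const cv::Mat& image, const cv::Rect& roi)
{
    const int W = image.cols, H = image.rows;
    std::vector<cv::Rect> parts = {
        cv::Rect(0, 0, W, roi.y),                                              // 1: top strip
        cv::Rect(0, roi.y, roi.x, roi.height),                                 // 2: left strip
        cv::Rect(roi.x + roi.width, roi.y, W - roi.x - roi.width, roi.height), // 3: right strip
        cv::Rect(0, roi.y + roi.height, W, H - roi.y - roi.height)             // 4: bottom strip
    };

    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> all;
    for (const cv::Rect& r : parts) {
        if (r.width <= 0 || r.height <= 0) continue;   // ROI touches the image border
        std::vector<cv::KeyPoint> kps;
        orb->detect(image(r), kps);
        for (cv::KeyPoint& kp : kps) {                 // back to full-image coordinates
            kp.pt.x += r.x;
            kp.pt.y += r.y;
        }
        all.insert(all.end(), kps.begin(), kps.end());
    }
    return all;
}
```
Note that features close to the internal cut lines may be missed, since each sub-rectangle is processed independently.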

Related

I am confused using the OpenCV sepFilter2D function's kernelX and kernelY

I don't know how to use sepFilter2D properly. I'm confused by parameters such as kernelX and kernelY in the OpenCV sepFilter2D function.
vector<double> filter1 = { 0.00443305, 0.0540056, 0.242036, 0.39905, 0.242036, 0.0540056, 0.00443305 }; // 7-tap row vector
sepFilter2D(src, convolvedImg, CV_64FC3, filter1, filter1, Point(-1, -1), 0.0, BORDER_DEFAULT);
As you might be aware, the operation of convolution is widely used in image processing. It involves a 2D filter, usually small in size (e.g. 3x3 or 5x5); the short explanation is that you overlay the filter at each position, multiply the values in the filter with the values in the image, and add everything together. The Wikipedia page presents this operation in much more detail.
Just to get a sense of the cost, assume you have an M x N image and a U x V filter. For each pixel, you have to apply the filter once; therefore, you have to perform M*N*U*V multiplications and additions.
Some filters have a nice property called separability: you can achieve the same effect as a U x V 2D filter by first applying a horizontal filter of size V and then a vertical filter of size U. Now you have M*N*U + M*N*V = M*N*(U+V) operations, which is more efficient.
The sepFilter2D does exactly this: applies a vertical and a horizontal 1D filter. The full function signature is:
void sepFilter2D(InputArray src, OutputArray dst, int ddepth, InputArray kernelX, InputArray kernelY, Point anchor=Point(-1,-1), double delta=0, int borderType=BORDER_DEFAULT )
Here src is your initial image, the filtered image will be in dst, ddepth represents the desired type of the destination image, kernelX and kernelY are the horizontal and vertical 1D kernels described above, anchor represents the kernel origin (the default means the center), delta represents a value added to the destination image to offset its brightness, and borderType represents the method used around the borders.
Use the Mat data structure to declare the kernels. (I'm not sure about vector; I'm not near my PC right now, I'll check later.)
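For instance, a minimal sketch assuming a Gaussian kernel built with cv::getGaussianKernel (the file name is a placeholder); the slower 2D equivalent is included only for comparison:
```cpp
// Build a 7-tap Gaussian kernel (similar in shape to the one in the question)
// and apply it once along rows and once along columns with sepFilter2D.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");                   // 8-bit, 3 channels
    cv::Mat kernel = cv::getGaussianKernel(7, -1, CV_64F);   // 7x1 column vector

    cv::Mat dst;
    // kernelX filters the rows (horizontal pass), kernelY filters the columns
    // (vertical pass); using the same kernel for both gives a symmetric blur.
    cv::sepFilter2D(src, dst, CV_64F, kernel, kernel,
                    cv::Point(-1, -1), 0.0, cv::BORDER_DEFAULT);

    // Equivalent (but slower) 2D filtering, for comparison:
    cv::Mat kernel2D = kernel * kernel.t();                   // 7x7 outer product
    cv::Mat dst2D;
    cv::filter2D(src, dst2D, CV_64F, kernel2D);
    return 0;
}
```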

How to get scale, rotation & translation after feature tracking?

I have implemented a Kanade–Lucas–Tomasi feature tracker. I have used it on two images, that show the same scene, but the camera has moved a bit between taking the pictures.
As a result I get the coordinates of the features. For example:
1st picture:
| feature | (x,y)=val        |
|---------|------------------|
| 1       | (436,349)=33971  |
| 2       | (440,365)=29648  |
| 3       | ( 36,290)=29562  |
2nd picture:
| feature | (x,y)=val        |
|---------|------------------|
| 1       | (443.3,356.0)=0  |
| 2       | (447.6,373.0)=0  |
| 3       | ( -1.0, -1.0)=-4 |
So I know the position of the features 1 & 2 in both images and that feature 3 couldn't be found in the second image. The coordinates of the features 1 & 2 aren't the same, because the camera has zoomed in a bit and also moved.
Which algorithm is suitable to get the scale, rotation and translation between the two images? Is there a robust algorithm, that also considers outliers?
If you don't know what movement happened between the images, then you need to calculate the homography between them. The homography, however, needs at least 4 point correspondences to be calculated.
If you have 4 points in both images that lie roughly on a plane (the same flat surface, e.g. a window), then you can follow the steps described on math.stackexchange to compute the homography matrix that transforms between the images.
Note that while rotation and translation may happen between 2 images, they could also have been taken from different angles. If this happens, then the homography is your only option. If, instead, the images for sure differ only by rotation and translation (e.g. 2 satellite images), then you may find some other method, but the homography will also work.
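If you are using OpenCV, here is a minimal sketch of both options with RANSAC handling the outliers: cv::findHomography for the full homography, and cv::estimateAffinePartial2D for a pure scale + rotation + translation model. The point vectors are assumed to contain only successfully tracked features, in matching order.
```cpp
// Estimate either a full homography or a similarity transform
// (scale + rotation + translation) from tracked point pairs, with RANSAC.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
#include <vector>

// pts1/pts2: tracked correspondences (same order; untracked features such as
// the (-1.0, -1.0) entry already removed).
void estimateMotion(const std::vector<cv::Point2f>& pts1,
                    const std::vector<cv::Point2f>& pts2)
{
    // Option 1: full homography (needs at least 4 correspondences).
    if (pts1.size() >= 4) {
        cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0);
        std::cout << "H =\n" << H << "\n";
    }

    // Option 2: similarity transform only (scale, rotation, translation;
    // needs at least 2 correspondences). Available in OpenCV 3.2+.
    cv::Mat inliers;
    cv::Mat A = cv::estimateAffinePartial2D(pts1, pts2, inliers, cv::RANSAC);
    if (!A.empty()) {
        double scale = std::hypot(A.at<double>(0, 0), A.at<double>(0, 1));
        double angle = std::atan2(A.at<double>(1, 0), A.at<double>(0, 0));
        std::cout << "scale = " << scale << ", rotation = " << angle
                  << ", t = (" << A.at<double>(0, 2) << ", "
                  << A.at<double>(1, 2) << ")\n";
    }
}
```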
Depending on whether the camera is calibrated or uncalibrated, use the tracked features to compute the essential or the fundamental matrix, respectively.
Factorize the matrix into R and T. Use the Multiple View Geometry book for help with the formulae: https://www.robots.ox.ac.uk/~vgg/hzbook/hzbook1/HZepipolar.pdf
Caution: these steps only work well if the features come from different depth planes and cover a wide field of view. If all features lie on a single plane, you should estimate a homography and try to factorize that instead.
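A minimal sketch of the calibrated case in OpenCV, assuming you already have the 3x3 camera matrix K (for the uncalibrated case you would use cv::findFundamentalMat instead):
```cpp
// Essential matrix from the tracked points, then factorization into R and t.
#include <opencv2/opencv.hpp>
#include <vector>

void relativePose(const std::vector<cv::Point2f>& pts1,
                  const std::vector<cv::Point2f>& pts2,
                  const cv::Mat& K)                 // 3x3 camera intrinsics
{
    // RANSAC here also takes care of outliers among the tracked features.
    cv::Mat inlierMask;
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0, inlierMask);

    cv::Mat R, t;   // rotation and translation (t only up to scale)
    cv::recoverPose(E, pts1, pts2, K, R, t, inlierMask);

    // Uncalibrated alternative: cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC).
}
```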

PCL Point Feature Histograms - binning

The binning process, which is part of the point feature histogram estimation, results in b^3 bins if only the three angular features (alpha, phi, theta) are used, where b is the number of bins per feature.
Why is it b^3 and not b * 3?
Let's say we consider alpha.
The feature value range is subdivided into b intervals. You iterate over all neighbors of the query point and count the number of alpha values which lie in each interval. So you have b bins for alpha. When you repeat this for the other two features, you get 3 * b bins.
Where am I wrong?
For simplicity, I'll first explain it in 2D, i.e. with two angular features. In that case, you would have b^2 bins, not b*2.
The feature space is divided into a regular grid. Features are binned according to their position in the 2D (or 3D) space, not independently along each dimension. See the following example with two feature dimensions and b=4, where the feature is binned into the cell marked with #:
^ phi
|
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | |#|
+-+-+-+-+
| | | | |
+-+-+-+-+-> alpha
The feature is binned into the cell where alpha is in a given interval AND phi in another interval. The key difference to your understanding is that the dimensions are not treated independently. Each cell specifies an interval on all the dimensions, not a single one.
(This would work the same way in 3D, only that you would have another dimension for theta and a 3D grid instead of a 2D one.)
This way of binning results in b^2 bins for the 2D case, since each interval in the alpha dimension is combined with ALL intervals in the phi dimension, resulting in a squaring of the number, not a doubling. Add another dimension, and you get the cubing instead of the tripling, as in your question.
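To make the joint binning concrete, here is a minimal sketch (not PCL's actual implementation) that flattens the three per-dimension interval indices into a single bin index of a b x b x b histogram:
```cpp
// Each (alpha, phi, theta) triple selects ONE cell of a b x b x b grid,
// so the histogram has b^3 bins rather than 3*b.
#include <algorithm>
#include <array>
#include <vector>

std::vector<int> jointHistogram(const std::vector<std::array<double, 3>>& features,
                                const double lo[3], const double hi[3], int b)
{
    std::vector<int> hist(b * b * b, 0);
    for (const auto& f : features) {
        int idx[3];
        for (int d = 0; d < 3; ++d) {
            double t = (f[d] - lo[d]) / (hi[d] - lo[d]);     // normalize to [0, 1]
            idx[d] = std::min(b - 1, std::max(0, static_cast<int>(t * b)));
        }
        // flatten the 3D cell coordinates into a single bin index
        ++hist[(idx[0] * b + idx[1]) * b + idx[2]];
    }
    return hist;
}
```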

Calculating Gray Level Co-occurrence Matrices (GLCM)

GLCM considers the relation between two pixels at a time, called the reference and the neighbour pixel. Based on the selection of the neighbour pixel, generally 4 different Gray Level Co-occurrence Matrices (GLCMs) can be calculated for an image.
Selection of the neighbour pixel is as follows.
| reference pixel | neighbour pixel | description                      |
|-----------------|-----------------|----------------------------------|
| (x,y)           | (x+1, y)        | the pixel to its right           |
| (x,y)           | (x+1, y+1)      | the pixel to its right and above |
| (x,y)           | (x, y+1)        | the pixel above                  |
| (x,y)           | (x-1, y+1)      | the pixel to its left and above  |
A good, detailed explanation of GLCM is available here (original link).
My question is, is it required to consider all 3 intensity values of an image pixel when calculating Gray Level Co-occurrence Matrices (GLCM) of a "gray scale image"?
As an example, consider an image with 2 pixels:
-----------------------------------------------------------------------------------
|                [pixel1]                |                [pixel2]                |
|       /            |            \      |       /            |            \      |
| [intensity1] [intensity2] [intensity3] | [intensity4] [intensity5] [intensity6] |
-----------------------------------------------------------------------------------
When calculating the GLCM of a gray scale image is it required to take into account all 3 intensity values of a pixel?
E.g., when the reference pixel is (x,y) and its neighbour pixel is (x+1, y), the pixel to its right:
Is it required to take into account the occurrences of intensity levels individually as follows?
[intensity1] & [intensity2]
[intensity2] & [intensity3]
[intensity3] & [intensity4]
[intensity4] & [intensity5]
[intensity5] & [intensity6]
Or can I just take into account one intensity value from each pixel, assuming all 3 intensity values of a pixel are the same, as follows?
[intensity1] & [intensity4]
Which is the correct method? Is it applicable for all 4 neighbours?
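For illustration, here is a minimal sketch (not tied to any particular library) of how the GLCM for the right-hand neighbour is typically accumulated on a single-channel 8-bit grayscale image, i.e. with one intensity value per pixel; a grayscale image stored with 3 identical channels would first be reduced to one channel (e.g. with cv::cvtColor(..., cv::COLOR_BGR2GRAY)):
```cpp
// Accumulate the co-occurrence matrix for the "pixel to its right" neighbour.
#include <opencv2/opencv.hpp>

cv::Mat glcmRight(const cv::Mat& gray, int levels = 256)   // gray: CV_8UC1
{
    cv::Mat glcm = cv::Mat::zeros(levels, levels, CV_32S);
    for (int y = 0; y < gray.rows; ++y)
        for (int x = 0; x + 1 < gray.cols; ++x) {
            int ref = gray.at<uchar>(y, x);        // reference pixel (x, y)
            int nb  = gray.at<uchar>(y, x + 1);    // neighbour pixel (x+1, y)
            glcm.at<int>(ref, nb)++;
        }
    return glcm;   // the other three neighbour offsets work the same way
}
```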

Calculate surface area

For a given terrain, how can you calculate its surface area?
As of now, I plan to build the terrain using Three.js with something like:
var geo = new THREE.PlaneGeometry(300, 300, 10, 10);
for (var i = 0; i < geo.vertices.length; i++) {
    geo.vertices[i].y = someHeight; // makes the flat plane into a terrain
}
Next, if it's possible to iterate through each underlying triangle of the geometry (i.e. the triangles of the TRIANGLE_STRIP given to the WebGL array), the area of each triangle could be summed up to get the total surface area.
Does this approach sound right? If so, how do you determine vertices of individual triangles?
Any other ideas to build the terrain in WebGL/Three.js are welcome.
I think your approach sounds right and shouldn't be hard to implement.
I'm not familiar with three.js, but I think it's quite easy to determine the positions of the vertices. You know that the vertices are evenly distributed between x=0...300, z=0...300, and you know the y coordinate. So the [i,j]-th vertex has position [i*300/10, y, j*300/10].
You have 10x10 segments in total and each segment consists of 2 triangles. This is where you have to be careful.
The triangles could form two different shapes:
------      ------
|\   |      |   /|
| \  |  or  |  / |
|  \ |      | /  |
------      ------
which could result in different shapes and (though I'm not entirely sure about this) in different surface areas.
When you find out how exactly three.js creates the surface, it should be relatively easy to iteratively sum the triangle surfaces.
It would be nice to be able to do the sum without actual iteration over all triangles, but, right now, I don't have any idea how to do it...
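The per-triangle sum itself is plain vector math and independent of Three.js; here is a minimal sketch (in C++ just for concreteness) of the cross-product area formula, assuming you can extract the three vertex positions of every face into a flat list:
```cpp
// Surface area of a triangle mesh:
// area of one triangle = 0.5 * |(B - A) x (C - A)|, summed over all faces.
// `triangles` is a hypothetical flat list: 3 vertices per triangle.
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

double surfaceArea(const std::vector<Vec3>& triangles)   // size is a multiple of 3
{
    double area = 0.0;
    for (std::size_t i = 0; i + 2 < triangles.size(); i += 3) {
        Vec3 ab = sub(triangles[i + 1], triangles[i]);
        Vec3 ac = sub(triangles[i + 2], triangles[i]);
        area += 0.5 * length(cross(ab, ac));
    }
    return area;
}
```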
