Map points from one 2D plane to another - mapping

Given a point on a plane A, I want to be able to map to its corresponding point on plane B. I have a set of N corresponding pairs of reference points between the two planes, however, the overall mapping is not a simple affine transform (no homographies for me).
Things I have tried:
For a given point, find the three closest reference points in plane A, compute barrycentric coordinates of that triangle, and then apply that transform to the corresponding reference points in plane B. How it failed: sometimes the three closest points were nearly collinear, so errors were huge. Also, there was no consistency in the mapping when crossing borders. It was very "jittery."
Compute all possible triangles given the N reference points (N^3). Order them by size. For the given point, find the smallest triangle that it's in. This fixes the linearly of the
points problem, but was still extremely jittery and slow.
Start with a triangulated plane A. Iterate through the reference points, adding each one to the reference plane. Every time you add a point it exists in at least one triangle. Break that triangle into three triangles using the new reference point as a vertex. You end up with plane A triangulated so you can map from plane A to plane B with ease. Issues: You can prove that every triangle will have a point that is on the edge of the planes. This results in huge errors if your reference points are far from the edge of the planes.
I feel like this should be a fairly standard problem. Are there standard algorithms/libraries for this?

There you go my friend.. I have used it myslef and can only recommend you give it a try.
Kahn Academy - Matrix transformations
Understanding how we can map one set of vectors to another set. Matrices used to define linear transformations
https://www.khanacademy.org/math/linear-algebra/matrix_transformations

Related

How to check whether Essential Matrix is correct or not without decomposing it?

On a very high level, my pose estimation pipeline looks somewhat like this:
Find features in image_1 and image_2 (let's say cv::ORB)
Match the features (let's say using the BruteForce-Hamming descriptor matcher)
Calculate Essential Matrix (using cv::findEssentialMat)
Decompose it to get the proper rotation matrix and translation unit vector (using cv::recoverPose)
Repeat
I noticed that at some point, the yaw angle (calculated using the output rotation matrix R of cv::recoverPose) suddenly jumps by more than 150 degrees. For that particular frame, the number of inliers is 0 (the return value of cv::recoverPose). So, to understand what exactly that means and what's going on, I asked this question on SO.
As per the answer to my question:
So, if the number of inliers is 0, then something went very wrong. Either your E is wrong, or the point matches are wrong, or both. In this case you simply cannot estimate the camera motion from those two images.
For that particular image pair, based on the visualization and from my understanding, matches look good:
The next step in the pipeline is finding the Essential Matrix. So, now, how can I check whether the calculated Essential Matrix is correct or not without decomposing it i.e. without calculating Roll Pitch Yaw angles (which can be done by finding the rotation matrix via cv::recoverPose)?
Basically, I want to double-check whether my Essential Matrix is correct or not before I move on to the next component (which is cv::recoverPose) in the pipeline!
The essential matrix maps each point p in image 1 to its epipolar line in image 2. The point q in image 2 that corresponds to p must be very close to the line. You can plot the epipolar lines and the matching points to see if they make sense. Remember that E is defined in normalized image coordinates, not in pixels. To get the epipolar lines in pixel coordinates, you would need to convert E to F (the fundamental matrix).
The epipolar lines must all intersect in one point, called the epipole. The epipole is the projection of the other camera's center in your image. You can find the epipole by computing the nullspace of F.
If you know something about the camera motion, then you can determine where the epipole should be. For example, if the camera is mounted on a vehicle that is moving directly forward, and you know the pitch and roll angles of the camera relative to the ground, then you can calculate exactly where the epipole will be. If the vehicle can turn in-plane, then you can find the horizontal line on which the epipole must lie.
You can get the epipole from F, and see if it is where it is supposed to be.

Measure distance to object with a single camera in a static scene

let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: By using a measuring tape measure the real distance to the object.
I could move the small object, repeat this three steps for every pixel coordinate and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (an object could cover an other object).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane, let
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
where f is the focal length.
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
Most vision libraries (including opencv) have built in functions that will take a couple points from a camera reference frame and the related points from a Cartesian plane and generate your warp matrix (affine transformation) for you. (some are fancy enough to include non-linearity mappings with enough input points, but that brings you back to your time to calibrate issue)
A final note: most vision libraries use some type of grid to calibrate off of ie a checkerboard patter. If you wrote your calibration to work off of such a sheet, then you would only need to measure distances to 1 target object as the transformations would be calculated by the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library or even writing your own matrix math functions prior to resorting to hand calculating the equations as you will have to solve them symbolically to turn it into code and that will be very expansive and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from eachother as possible (within reason, as at extreme distances in a vanishing point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are colinear (simply put, make your 4 points approximately square at almost the full span of your camera fov in the real world)
ps I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance from this position in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution for this may simply be to have a dimension-ed sheet(I am thinking piece of printer paper or a large board or something). The reason that this will be easier is that it will have built in axes (ie the two sides will be orthogonal and you will just use the four corners of the object and used canned distances in your calibration). EX: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0)
So using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skew depending on how you measured/ calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
As already mentioned in the previous answers you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical view and will try to summarize it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix and you transform a coordinate by multiplying it with the matrix. You may now think, wait 3x3 matrix and 2D coordinates: You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is finding (at least) four points (x/y-coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates and pass both to the "find-homography-function" of your desired computer vision or math library.
In OpenCV that would be findHomography, here an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.

What is SCNLookAtConstraint doing behind the scenes?

What is the math that SCNLookAtConstraint is doing? I want to try to recreate this with vectors.
I think it can be done with a cross product and a dot product once you have the two directional vectors.
By default the node points in the direction of the negative z-axis of its local coordinate system.
The other direction we are interested in is from the node that looks to the other node, in the node that looks's local coordinate system. You can get it by converting the positions using convertPosition:fromNode: or
convertPosition:toNode:.
If not done already, normalize the two directional vectors.
With the two directions in the local coordinate system, a cross product between the two gives a vector that is orthogonal to the plane that can be formed between the two directions. This vector is the surface normal to that plane. Any rotation around that normal is going to be another vector that remains in the plane.
Since the two directions are normalized, a dot product of the two should give you cos(ϴ), where ϴ is the angle between the two.
Rotating the first vector (the one that points in the direction of the negative z-axis) by this angle around the normal to the plane should make it point in the same direction as the second vector (that one that points at the other node).
That should be the way it's done for two vectors (or at least one way to do it).
To do it for a node, you would set a rotation of that angle around that axis, to the node that is looking. This would rotate the node so that it's local negative z-axis (the direction it's looking) would point at the other node.
I have a very similar example in one of the chapters for 3D Graphics with Scene Kit, where a node is rotated to point straight out of the surface of a sphere. You can look at the sample code to see how it's solved there.

Calculating the neighborhood distance

What method would you use to compute a distance that represents the number of "jumps" one has to do to go from one area to another area in a given 2D map?
Let's take the following map for instance:
(source: free.fr)
The end result of the computation would be a triangle like this:
A B C D E F
A
B 1
C 2 1
D 2 1 1
E . . . .
F 3 2 2 1 .
Which means that going from A to D, it takes 2 jumps.
However, to go from anywhere to E, it's impossible because the "gap" is too big, and so the value is "infinite", represented here as a dot for simplification.
As you can see on the example, the polygons may share points, but most often they are simply close together and so a maximum gap should be allowed to consider two polygons to be adjacent.
This, obviously, is a simplified example, but in the real case I'm faced with about 60000 polygons and am only interested by jump values up to 4.
As input data, I have the polygon vertices as an array of coordinates, from which I already know how to calculate the centroid.
My initial approach would be to "paint" the polygons on a white background canvas, each with their own color and then walk the line between two candidate polygons centroid. Counting the colors I encounter could give me the number of jumps.
However, this is not really reliable as it does not take into account concave arrangements where one has to walk around the "notch" to go from one polygon to the other as can be seen when going from A to F.
I have tried looking for reference material on this subject but could not find any because I have a hard time figuring what the proper terms are for describing this kind of problem.
My target language is Delphi XE2, but any example would be most welcome.
You can create inflated polygon with small offset for every initial polygon, then check for intersection with neighbouring (inflated) polygons. Offseting is useful to compensate small gaps between polygons.
Both inflating and intersection problems might be solved with Clipper library.
Solution of the potential neighbours problem depends on real conditions - for example, simple method - divide plane to square cells, and check for neighbours that have vertices in the same cell and in the nearest cells.
Every pair of intersecting polygons gives an edge in (unweighted, undirected) graph. You want to find all the path with length <=4 - just execute depth-limited BFS from every vertice (polygon) - assuming that graph is sparse
You can try a single link clustering or some voronoi diagrams. You can also brute-force or try Density-based spatial clustering of applications with noise (DBSCAN) or K-means clustering.
I would try that:
1) Do a Delaunay triangulation of all the points of all polygons
2) Remove from Delaunay graph all triangles that have their 3 points in the same polygon
Two polygons are neightbor by point if at least one triangle have at least one points in both polygons (or obviously if polygons have a common point)
Two polygons are neightbor by side if each polygon have at least two adjacents points in the same quad = two adjacent triangles (or obviously two common and adjacent points)
Once the gaps are filled with new polygons (triangles eventually combined) use Djikistra Algorithm ponderated with distance from nearest points (or polygons centroid) to compute the pathes.

Given a set of points to define a shape, how can I contract this shape like Photoshop's Selection>Contract

I have a set of points to define a shape. These points are in order and essentially are my "selection".
I want to be able to contract this selection by an arbitrary amount to get a smaller version of my original shape.
In a basic example with a triangle, the points are simply moved along their normal which is defined by the points to the left and the right of the points in question.
Eventually all 3 points will meet and form one point but until that point they will make a smaller and smaller triangle.
For more complex shapes, when moving the individual points inward, they may pass through the outer edge of the shape resulting in weird artifacts. Obviously I'll need to cull these points and remove them from the array.
Any help in exactly how I can do that would be greatly appreciated.
Thanks!
This is just an idea but couldn't you find the center of mass of the object, create a vector from the center to each point, and move each point along this vector?
To find the center of mass would of course involve averaging each x and y coordinate. Getting a vector is as simple a subtracting the point in question with the center point. Normalizing and scaling are common vector operations that can be found with the Google.
EDIT
Another way to interpret what you're asking is you want to erode your collection of points. As in morphology erosion. This is typically applied to binary images but you can slightly modify the concept to work with a collection of points. Essentially, you need to write a function that, given a point, will return true (black) or false (white) depending on if that point is inside or outside the shape defined by your points. You'd have to look up how to do that for shapes that aren't always concave (it's harder but not impossible).
Now, obviously, every single one of your actual points will return false because they're all on the border (by definition). However, you now have a matrix of points around your point of interest that define where is "inside" and where is "outside". Average all of the "inside" points and move your actual point along the vector between itself and towards this average. You could play with different erosion kernels to see what works best.
You could even work with a kernel with floating point weights instead of either/or values which will affect your average calculation proportional to their weights. With this, you could approximate a circular kernel with a low number of points. Try the simpler method first.
Find the selection center (as suggested by colithium)
Map the selection points to the coordinate system with the selection center at (0,0). For example, if the selection center is at (150,150), and a given selection point is at (125,75), the mapped position of the point becomes (-25,-75).
Scale the mapped points (multiply X and Y by something in the range of 0.0..1.0)
Remap the points back to the original coordinate system
Only simple maths required, no need to muck about normalizing vectors.

Resources