ARKit Perspective Correction - ios

I'm working on a project with ARKit, and I'm trying to perspective-correct the ARFrame.capturedImage so that a piece of paper sitting on a detected plane looks as if it were photographed from directly overhead, which is what the CoreML model I feed it into expects.
ARKit gives me the device orientation relative to the plane (ARCamera.transform, ARCamera.eulerAngles, and ARCamera.projectionMatrix all look promising).
So I have the orientation of the camera (and I know the plane is horizontal, since that's all ARKit detects right now), but I can't quite figure out how to create a GLKMatrix4 that will perform the correct perspective correction.
Originally I thought it would be as easy as transforming by the inverse of ARCamera.projectionMatrix, but that doesn't appear to work at all; I'm not entirely sure what that matrix describes, and it doesn't seem to change much based on the device orientation.
I've tried creating my own matrix using GLKMatrix4Rotate and the roll/pitch/yaw, but that didn't work; I couldn't even get it working with a single axis of rotation.
I found GLKMatrix4MakePerspective, GLKMatrix4MakeOrtho, and GLKMatrix4MakeFrustum, which seem to do perspective transforms, but I can't figure out how to translate the information I have into the inputs of those functions to produce the proper perspective transformation.
Edit:
As an example to better explain what I'm trying to do, I used the Perspective Warp tool in Photoshop to transform an example image; what I want to know is how to come up with a matrix that will perform a similar transform given the info I have about the scene.

I ended up using iOS 11 Vision's rectangle detection and then feeding the detected corners into Core Image's CIPerspectiveCorrection filter.

I solved this using an OpenCV perspective transformation (see https://docs.opencv.org/trunk/da/d6e/tutorial_py_geometric_transformations.html and https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#getperspectivetransform).
If you can get the corners of your paper in the scene (for example with an ARReferenceImage, projecting them into 2D), take them. Otherwise you can try to detect the corners directly with OpenCV (see https://stackoverflow.com/a/12636153/9298773) from the UIImage taken from sceneView.snapshot(), with sceneView of type ARSCNView. In this last case I'd suggest you binarize first and change the MAX_CORNERS variable in the snippet at the link above to 4 (the four corners of your paper).
Then create a new cv::Mat whose width and height respect the proportions of your paper and do the perspective transform; a sketch follows below. For a guideline to this last step, take a look at the section "Perspective Correction using Homography" at https://www.learnopencv.com/homography-examples-using-opencv-python-c/#download. Succinctly: you ask OpenCV to find the transform that projects your skewed paper corners onto a perfectly rectangular plane (your new cv::Mat).
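To make this concrete, here is a minimal Python sketch of the pipeline under the assumptions above. The file names, the A4 destination size, and the detector parameters are all invented, and cv2.goodFeaturesToTrack merely stands in for whatever corner detection works on your frames:

```python
import cv2
import numpy as np

img = cv2.imread("captured_frame.jpg")  # hypothetical snapshot of the frame

# Binarize so the paper stands out, then ask for exactly 4 strong corners
# (the MAX_CORNERS = 4 suggestion from the linked snippet).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
corners = cv2.goodFeaturesToTrack(binary, maxCorners=4, qualityLevel=0.1,
                                  minDistance=50)
pts = corners.reshape(-1, 2).astype(np.float32)

# Order the corners top-left, top-right, bottom-right, bottom-left so they
# pair up with the destination rectangle (the classic sum/diff trick).
s, d = pts.sum(axis=1), np.diff(pts, axis=1).ravel()
src = np.float32([pts[np.argmin(s)], pts[np.argmin(d)],
                  pts[np.argmax(s)], pts[np.argmax(d)]])

# Destination: a rectangle matching the paper's aspect ratio (A4 assumed).
w, h = 595, 842
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# The homography that projects the skewed paper onto the rectangle.
M = cv2.getPerspectiveTransform(src, dst)
top_down = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("rectified_paper.jpg", top_down)
```

The rectified output is what you would hand to the CoreML model in place of the raw capturedImage.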

Related

Can ARKit detect specific surfaces as planes?

Using ARKit on iOS 11 and iOS 12, we are currently able to detect planes on horizontal surfaces, and we can also visualize those planes.
I am wondering if we can declare, through some sort of image file, specific surfaces on which we want to detect planes (possibly ignoring all other planes that ARKit detects on other surfaces)?
If that is not possible, could we instead capture the detected plane as an image and process it through a CoreML model that identifies that specific surface?
ARKit has no support for such a thing at the moment. You can indeed capture the detected plane as an image, and if you're able to match it through Core ML in real time, I'm sure a lot of people would be interested!
You should:
get the 3D position of the corners of the plane
find their 2D position in the frame, using sceneView.projectPoint
extract the patch from currentFrame.capturedImage
do a perspective warp on the image so you're left with just your plane, reprojected to a rectangle
do some ML / image processing to detect a match (see the sketch after this answer)
Keep in mind that the rectangle ARKit detects is often not well aligned and can cover only part of the full plane.
Finally, and unfortunately, the feature points that ARKit exposes are not useful here, since they don't carry any descriptors for matching feature points across frames, and Apple has not said what algorithm it uses to compute them.
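A rough Python sketch of steps 4 and 5, assuming the four corners have already been projected into 2D with sceneView.projectPoint and exported alongside the captured frame. The file names, corner values, and matching threshold are made up for illustration; on-device, a CoreML classifier would replace the matchTemplate call:

```python
import cv2
import numpy as np

# Hypothetical inputs: the captured frame, plus the four plane corners
# already projected into 2D with sceneView.projectPoint (steps 1-3).
frame = cv2.imread("captured_frame.jpg")
corners_2d = np.float32([[220, 180], [540, 200], [560, 420], [200, 400]])

# Step 4: reproject the detected plane to an upright rectangle.
w, h = 320, 240
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
M = cv2.getPerspectiveTransform(corners_2d, dst)
patch = cv2.warpPerspective(frame, M, (w, h))

# Step 5: a crude match against a reference picture of the target surface;
# on-device this is where a CoreML classifier would run instead.
reference = cv2.resize(cv2.imread("reference_surface.jpg"), (w, h))
score = cv2.matchTemplate(patch, reference, cv2.TM_CCOEFF_NORMED).max()
is_match = score > 0.7  # threshold picked arbitrarily for this sketch
print("match" if is_match else "no match", score)
```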
Here is a small demo for finding a horizontal surface, in Swift 5, on GitHub.

Measuring distance between objects from a photo, Perspective transform

I have two questions which could be related:
1.) I would like to estimate distances between objects that are positioned in one plane, from a photo. The geometric shape of one object in the photo is rectangular and its dimensions are known, but there is no other information about the photo (camera focal length, photo angle, sensor size, etc.). For example, say I have a PCB photo where the dimensions of the rectangular chip are known to be 20x10 mm and all objects lie in a plane. Is it even possible to estimate the distances (in top view) between the other PCB components?
In this particular case, maximum distance error of 2-3mm would be acceptable.
2.) Say I have a similar PCB photo like the above, where I have one feature (object) that I know is rectangular. I would like to transform the image perspective so that the object looks rectangular. I have tried ImageJ (Fiji) and the Interactive Perspective Plugin for this task: first I display a rectangular grid over the image and then manually transform the image using the plugin until the object appears rectangular. But for some photo angles I find it impossible to manually adjust the control points so as to get a rectangular object shape.
Does somebody know an alternative approach using ImageJ (Fiji) or Octave? A solution in Python would also be OK, although I don't have much Python experience (I just recently installed Anaconda with Spyder).
A few years ago I created a piece of software that seems good for you: it corrects perspective by transforming a quadrilateral to a rectangle, and in the resulting image you can measure distances.
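In the same spirit, here is a hedged Python/OpenCV sketch for question 1: map the known 20x10 mm chip onto a metric rectangle and read all other distances off the rectified image. The chip's pixel corners and the 10 px/mm scale are assumptions, and the result is exact only for points lying in the chip's plane, which matches the "all objects in one plane" premise:

```python
import cv2
import numpy as np

img = cv2.imread("pcb_photo.jpg")

# The four image corners of the known 20 x 10 mm chip, picked by hand or by
# a corner detector (hypothetical pixel values).
chip = np.float32([[412, 310], [655, 338], [642, 471], [398, 440]])

# Map the chip to a metric rectangle at 10 px/mm, so 1 px = 0.1 mm in the
# rectified ("top view") image.
px_per_mm = 10.0
w, h = 20 * px_per_mm, 10 * px_per_mm
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
M = cv2.getPerspectiveTransform(chip, dst)

# Warp the whole photo with the same homography; any distance measured in
# it can now be converted to millimetres.
top_view = cv2.warpPerspective(img, M, (2000, 1500))

def distance_mm(p1, p2):
    """Euclidean distance between two rectified-image points, in mm."""
    return np.linalg.norm(np.subtract(p1, p2)) / px_per_mm
```

Whether the 2-3 mm error budget holds depends mainly on how precisely the chip corners can be located and on lens distortion, which this sketch ignores.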

How to determine distance of objects from camera using Epipolar Plane Image?

I am working on converting 2D images into a 3D environment. The images were collected from a video shot while moving laterally. The images were then placed one behind the other, so it would be easy to find the correspondences between neighbouring images. This is called a spatiotemporal volume.
Next I take a slice from the spatiotemporal volume. That slice is called the Epipolar Plane Image.
Using the Epipolar Plane Image, I want to calculate the depth of the objects in the scene and build a 3D environment. I have listed the reference below, but I have not been able to figure out the math described in the paper. Can someone help me figure this out? Any help is appreciated.
Reference
Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion
The math in this situation is straightforward.
First let's define the coordinate systems for two overlapping images taken by the same camera with the same focal length.
Say the first camera sits at the origin, and its orientation, described by three Euler angles, is zero about every axis; the corresponding rotation matrix is therefore the identity matrix.
The second camera is simply shifted along the baseline. Since its orientation is the same as the first camera's, all of its Euler angles are zero as well, which means its rotation matrix is also the identity.
If the images overlap and the orientations are identical, a point appears in the two images at horizontal coordinates that differ only by the parallax, and this geometric situation can be described using the Intercept Theorem; the measured image coordinates, together with their measurement accuracy, enter directly through that relation.
As you see, it's not complicated. But be aware that this solution is certainly not the best, since its base assumption, that all orientation angles are the same, can't be fulfilled in reality.
If you need to be accurate, then you have to perform a bundle adjustment. However, these equations are often used to determine an approximate solution for this geometric situation, whose values are then used to linearize the collinearity equations.
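For reference, a minimal sketch of those relations in the standard "normal case" of stereo photogrammetry. The symbols here are my own, not taken from the paper: focal length $c$, baseline $B$, and a point imaged at $(x', y')$ in the first image and at $x''$ in the second:

$$
\begin{aligned}
&\text{Camera 1: } (X_{01}, Y_{01}, Z_{01}) = (0,0,0),\quad \omega_1=\varphi_1=\kappa_1=0 \;\Rightarrow\; R_1 = I\\
&\text{Camera 2: } (X_{02}, Y_{02}, Z_{02}) = (B,0,0),\quad \omega_2=\varphi_2=\kappa_2=0 \;\Rightarrow\; R_2 = I\\
&\text{Parallax: } p = x' - x''\\
&\text{Intercept theorem: } \frac{Z}{c} = \frac{B}{p}
\;\Rightarrow\; Z = \frac{c\,B}{x' - x''},\qquad
X = \frac{Z}{c}\,x',\quad Y = \frac{Z}{c}\,y'
\end{aligned}
$$

Differentiating $Z = cB/p$ gives the usual accuracy estimate $\sigma_Z \approx \frac{Z^2}{cB}\,\sigma_p$, i.e. depth precision degrades quadratically with distance and improves with a longer baseline.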

How to create 3D perspective views of an image using OpenCV?

I have an image on a wall. I'd like to create 3D perspective views of it myself. Suppose the points on the image, the camera location, and the camera orientation are given; how do I obtain the 3D perspective matrix to apply to the original image?
I understand I can use the orientation of the camera to calculate the 3D rotation matrix, but I have no idea how to calculate the subsequent projection matrix.
I've come across this link (see the section Perspective Projection), but I don't understand what's going on after the projection. And what is the difference between the camera position and the viewer's position?
Thanks a lot.
Use OpenGL and its open examples to solve your problem. The link below has good material for understanding the 3D transform pipeline:
http://www.songho.ca/opengl/gl_transform.html
Hope it helps.
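Since the question asks about OpenCV specifically, here is a hedged Python sketch under the usual pinhole assumptions. Treat the picture as living on the world plane z = 0; a camera with intrinsics K, rotation R, and translation t then induces the homography H = K·[r1 | r2 | t] on that plane, and warping by H produces the perspective view. All concrete numbers and file names below are invented:

```python
import cv2
import numpy as np

# Intrinsics of the virtual camera (assumed): focal length f in pixels,
# principal point at the image centre.
f, w, h = 800.0, 640, 480
K = np.array([[f, 0, w / 2],
              [0, f, h / 2],
              [0, 0, 1]])

# Camera pose relative to the wall: rotate 30 deg about the y axis and
# stand 2 units in front of the wall.
R, _ = cv2.Rodrigues(np.array([0.0, np.deg2rad(30), 0.0]))
t = np.array([[0.0], [0.0], [2.0]])

# A point (X, Y, 0) on the wall projects to K(r1*X + r2*Y + t), so the
# plane-induced homography is H = K [r1 r2 t].
H = K @ np.hstack([R[:, 0:1], R[:, 1:2], t])

# Map the source image's pixel grid onto the wall plane first (here scaled
# so the picture spans a centred 1 x 1 world-unit square), then apply H.
src = cv2.imread("wall_picture.jpg")  # hypothetical input image
sh, sw = src.shape[:2]
S = np.array([[1.0 / sw, 0, -0.5],
              [0, 1.0 / sh, -0.5],
              [0, 0, 1]])
view = cv2.warpPerspective(src, H @ S, (w, h))
cv2.imwrite("perspective_view.jpg", view)
```

The "viewer's position" in such write-ups is just this camera pose; moving the camera changes R and t, and therefore H.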

Using OpenCV to correct stereo images

I intend to make a program that will take stereo pair images, taken by a single camera, and then correct and crop them so that when the images are viewed side by side with the parallel or cross-eye method, the best 3D effect is achieved. The left image will be the reference image; the right image will be modified for corrections. I believe OpenCV will be the best software for this. So far I expect the processing to go something like this:
Correct for rotation between images.
Correct for y axis shift.
I imagine doing so will result in irregular black borders above and below the right image, so:
Crop both images to the same height to remove borders.
Compute stereo-correspondence/disparity
Compute optimal disparity
Correct images for optimal disparity
Okay, so that's my take on what needs doing and the order in which it occurs. What I'm asking is: does that seem right? Is there anything I've missed or anything in the wrong order? Also, which specific OpenCV functions would I need for all the necessary steps, or is OpenCV not the way to go? Many thanks.
OpenCV is great for this.
There is a whole chapter on this in the book, and all of its sample code ships with the OpenCV distribution.
edit: Roughly the steps are:
Remap each image to remove lens distortion and rotate/translate the views toward the image center.
Crop pixels that don't appear in both views (optional).
Find matching objects in each view (stereo block matching) and create a disparity map.
Reproject the disparity map into a 3D model (sketched below).
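A condensed Python sketch of those steps, assuming the stereo calibration (camera matrix, distortion, relative pose R, T) is already known, e.g. from cv2.stereoCalibrate on a chessboard sequence; every number and file name below is a placeholder:

```python
import cv2
import numpy as np

# Hypothetical calibration results for the single camera in two positions.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
dist = np.zeros(5)                 # assume negligible lens distortion
R = np.eye(3)                      # relative rotation between the two shots
T = np.array([0.1, 0.0, 0.0])      # baseline: 10 cm lateral shift
size = (640, 480)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 1. Rectify: rotate/translate both views so epipolar lines are horizontal.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K, dist, K, dist, size, R, T)
mapLx, mapLy = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)
left_r = cv2.remap(left, mapLx, mapLy, cv2.INTER_LINEAR)
right_r = cv2.remap(right, mapRx, mapRy, cv2.INTER_LINEAR)

# 2. Stereo block matching -> disparity map (returned as fixed-point * 16).
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left_r, right_r).astype(np.float32) / 16.0

# 3. Reproject the disparity map into a 3D point cloud via the Q matrix.
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```

For side-by-side viewing rather than a 3D model, the rectified pair left_r/right_r (cropped to the valid ROIs) is already most of what you need.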
