Can ARKit detect specific surfaces as planes? - ios

Using iOS 11 and iOS 12 and ARKit, we are currently able to detect planes on horizontal surfaces, and we may also visualize that plane on the surface.
I am wondering whether we can declare, through some sort of image file, the specific surfaces on which we want to detect planes (possibly ignoring all other planes that ARKit detects on other surfaces)?
If that is not possible, could we instead capture the detected plane as an image and run it through a Core ML model that identifies that specific surface?

ARKit has no support for such a thing at the moment. You can indeed capture the detected plane as an image, and if you're able to match it through Core ML in real time, I'm sure a lot of people would be interested!
You should:
get the 3D position of the corners of the plane
find their 2D position in the frame, using sceneView.projectPoint
extract the frame from the currentFrame.capturedImage
apply a perspective transform (a homography) to the image so that you are left with just your plane, reprojected to a rectangle
do some ML / image processing to detect a match
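The reprojection step in the list above comes down to estimating a 3x3 homography that maps the four projected plane corners onto the corners of a rectangle. Here is a minimal sketch in plain Python, not ARKit code: the function names `solve_linear`, `homography`, and `apply_h` are made up for illustration, and the corner coordinates are assumed to come from something like sceneView.projectPoint.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(H, x, y):
    """Map one point through H, including the perspective divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

To resample the camera image you would normally estimate the transform in the opposite direction (rectangle to image) and look up one source pixel for every output pixel.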
Keep in mind that the rectangle ARKit reports for a plane is often not well aligned and may cover only part of the full plane.
Finally, unfortunately, the feature points that ARKit exposes are not useful here, since they don't carry any descriptors that could be used to match feature points across frames, and Apple has not said what algorithm they use to compute them.

Here is a small demo, written in Swift 5 and hosted on GitHub, that finds horizontal surfaces.


ARKit – White paper sheet detection

I would like to build a very simple AR app that is able to detect a white sheet of A4 paper in its surroundings. I thought it would be enough to use Apple's image recognition sample project together with a white sample image in the aspect ratio of an A4 sheet, but the ARSession fails with:
One or more reference images have insufficient texture: white_a4,
NSLocalizedRecoverySuggestion=One or more images lack sufficient
texture and contrast for accurate detection. Image detection works
best when an image contains multiple high-contrast regions distributed
across its extent.
Is there a simple way, to detect sheets of paper using ARKit? Thanks!
I think even ARKit 3.0 isn't ready to detect a plain white sheet at the moment.
If you have a white sheet with some markers at its corners, or some text on it, or even a white sheet placed inside a well-defined environment (that is, detection based on the surroundings, not on the sheet itself), then it makes sense.
But a plain white sheet has no distinct marks on it, so ARKit has no way to understand what it is, what its color is (outdoors it has a cold tint, for instance, while indoors it has a warm tint), what its contrast is (contrast is an important property in image detection), or how it's oriented (this depends mainly on your point of view).
The basic idea of image detection is that ARKit detects an image, not the absence of one.
So, for successful detection you'll need to give ARKit not only the sheet but its surroundings as well.
Also, you can look at Apple's recommendations for working with image detection:
Enter the physical size of the image in Xcode as accurately as possible. ARKit relies on this information to determine the distance of the image from the camera. Entering an incorrect physical size will result in an ARImageAnchor that’s the wrong distance from the camera.
When you add reference images to your asset catalog in Xcode, pay attention to the quality estimation warnings Xcode provides. Images with high contrast work best for image detection.
Use only images on flat surfaces for detection. If an image to be detected is on a nonplanar surface, like a label on a wine bottle, ARKit might not detect it at all, or might create an image anchor at the wrong location.
Consider how your image appears under different lighting conditions. If an image is printed on glossy paper or displayed on a device screen, reflections on those surfaces can interfere with detection.
I must add that you need a unique texture pattern, not a repetitive one.
What you could do is run a simple ARWorldTrackingConfiguration where you periodically analyze the camera image for rectangles using the Vision framework.
This post describes how to use ARKit in combination with Core ML.

Convert ARKit SCNNode's bounding extent

I have an ARKit app that uses plane detection, and successfully places objects on those planes. I want to use some of the information on what's sitting below the object in my approach to shading it - something a bit similar to the WWDC demo where the chameleon blended in with the color of the table. I want to grab the rectangular region of the screen around the footprint of the object, (or in this case, the bounding volume of the whole node would work just as well) so I can take the camera capture data for the region of interest and use it in the image processing, like a metal sphere that reflects the ground it's sitting on. I'm just not sure what combination of transforms to apply - I've tried various combinations of convertPoint and projectPoint, and I occasionally get the origin, height, or width right, but never all 3. Is there an easy helper method I'm missing? I assume basically what I'm looking for is a way of going from SCNNode -> extent.
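One common way to get such a screen-space extent is to project all eight corners of the bounding box and take the min/max of the results. In SceneKit that would presumably mean converting each boundingBox corner to world space with convertPosition(_:to:) and passing it to projectPoint. The sketch below shows only the geometry in plain Python. It assumes a simple pinhole camera looking down +z (SceneKit cameras look down -z), a 3x4 node-to-camera matrix called `pose`, and all corners in front of the camera; every name in it is hypothetical.

```python
from itertools import product


def to_camera(point, pose):
    """Apply a 3x4 [R | t] matrix (three rows of four numbers) to a node-local point."""
    return tuple(sum(row[i] * point[i] for i in range(3)) + row[3] for row in pose)


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point with z > 0."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def screen_rect(bbox_min, bbox_max, pose, fx, fy, cx, cy):
    """Return (x, y, width, height) of the 2D rectangle enclosing the projected box."""
    # product over (min, max) per axis yields the 8 corners of the box
    corners = product(*zip(bbox_min, bbox_max))
    pts = [project(to_camera(c, pose), fx, fy, cx, cy) for c in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

Projecting only the bounding box's min and max points is not enough: after rotation and perspective, any of the eight corners can end up being the extreme one on screen, which may be why origin, width, and height rarely all come out right at once.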

ARKit Perspective Correction

I'm working on a project with ARKit and I'm trying to do a perspective correction of the ARFrame.capturedImage to orient a piece of paper sitting on a detected plane so I can feed that into a CoreML model which expects images to be taken from directly overhead.
ARKit gives me the device orientation relative to the plane (ARCamera.transform, ARCamera.eulerAngles, and ARCamera.projectionMatrix all look promising).
So I have the orientation of the camera (and I know the plane is horizontal, since that's all ARKit detects right now), but I can't quite figure out how to create a GLKMatrix4 that will perform the correct perspective correction.
Originally I thought it would be as easy as transforming by the inverse of ARCamera.projectionMatrix, but that doesn't appear to work at all. I'm not entirely sure what that matrix is describing; it doesn't seem to change much based on the device orientation.
I've tried creating my own matrix using GLKMatrix4Rotate and the roll/pitch/yaw, but that didn't work. I couldn't even get it working with a single axis of rotation.
I found GLKMatrix4MakePerspective, GLKMatrix4MakeOrtho, and GLKMatrix4MakeFrustum which seem to do perspective transforms but I can't figure out how to take the information I have and translate it to the inputs of those functions to make the proper perspective transformation.
As an example to better explain what I'm trying to do, I used the Perspective Warp tool in Photoshop to transform an example image; what I want to know is how to come up with a matrix that will perform a similar transform given the info I have about the scene.
I ended up using iOS 11 Vision's rectangle detection and then feeding the result into Core Image's CIPerspectiveCorrection filter.
I solved this using OpenCV's perspective transformation.
If you're able to get the corners of your paper in the scene (for example with an ARReferenceImage, projecting them into 2D), use them. Otherwise you can try to detect the corners with OpenCV directly, from the UIImage returned by sceneView.snapshot(), where sceneView is an ARSCNView. In this last case I'd suggest binarizing the image first and changing the MAX_CORNERS variable in the linked snippet to 4 (the four corners of your paper).
Then create a new cv::Mat with a width and height of your choice that respects the aspect ratio of your paper, and do the perspective transform. For guidance on this last step, look at the section "Perspective Correction using Homography" on the linked page. Succinctly: you ask OpenCV to find an appropriate transform that projects the perspective-distorted corner points of your paper onto a perfectly rectangular plane (your new cv::Mat).
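In OpenCV this is typically done with cv::getPerspectiveTransform followed by cv::warpPerspective. What the warp does can be sketched in plain Python as follows. This is an illustration with nearest-neighbour sampling, not OpenCV's implementation; the function name `warp` is hypothetical, and `H_inv` is assumed to be a 3x3 matrix, already estimated, that maps output pixels back to source pixels.

```python
def warp(image, H_inv, out_w, out_h, fill=0):
    """Build an out_h x out_w image by looking up, for every output pixel (u, v),
    the source pixel that H_inv maps it to (nearest-neighbour sampling)."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            w = H_inv[2][0] * u + H_inv[2][1] * v + H_inv[2][2]
            x = round((H_inv[0][0] * u + H_inv[0][1] * v + H_inv[0][2]) / w)
            y = round((H_inv[1][0] * u + H_inv[1][1] * v + H_inv[1][2]) / w)
            inside = 0 <= x < src_w and 0 <= y < src_h
            row.append(image[y][x] if inside else fill)
        out.append(row)
    return out
```

Iterating over output pixels rather than source pixels guarantees that every pixel of the corrected image gets a value, which is why the mapping is applied in the output-to-source direction.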

how to find object distance from asus xtion pro camera opencv, ROS

Hi, I am using an Asus Xtion Pro Live camera for object detection, and I am new to OpenCV. I'm trying to get the distance of an object from the camera. The object is detected in the 2D image. I'm not sure what information I should use, or which calculations should follow, to get the distance between the camera and the detected object. Could someone advise me, please?
In short: You can't.
You're losing the depth information and any visible pixel in your camera image essentially transforms into a ray originating from your camera.
So once you've got an object at pixel X, all you know is that the object somewhere intersects the vector cast based on this pixel and the camera's intrinsic/extrinsic parameters.
You'll essentially need more information. One of the following should suffice:
Know at least one coordinate of the 3D point (e.g. everything detected is on the ground or in some known plane).
Know the relation between two projected points:
Either the same point from different positions (known camera movement/offset)
or two points with significant distance between them (like the two ends of some staff or bar).
Once you've got either, you're able to use simple trigonometry (rule of three) to calculate the missing values.
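As a rough sketch of two of these cases (all function names and numbers are hypothetical): the first assumes the object stands on the ground, that the camera's optical axis is horizontal, and that the camera is at a known height; the second assumes the object's real size is known. Both use a pinhole model with intrinsics fx, fy, cx, cy in pixels and the image y axis pointing down.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Direction of the ray through pixel (u, v) in the camera frame (z forward, y down)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def point_on_ground(u, v, fx, fy, cx, cy, camera_height):
    """Intersect the pixel's ray with the ground plane y = camera_height."""
    dx, dy, dz = pixel_ray(u, v, fx, fy, cx, cy)
    if dy <= 0:
        raise ValueError("pixel is at or above the horizon; the ray never hits the ground")
    s = camera_height / dy  # scale along the ray at which y reaches the ground
    return (dx * s, dy * s, dz * s)


def distance_from_size(real_size, pixel_size, focal_px):
    """Similar triangles: an object of known size seen roughly front-on."""
    return focal_px * real_size / pixel_size
```

For example, with fx = fy = 500, a principal point of (320, 240), and a camera 1 m above the ground, the pixel (320, 490) corresponds to a ground point 2 m in front of the camera.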
Since I initially missed that this is a camera with an OpenNI-compatible depth sensor: it's possible to build OpenCV with support for it by enabling WITH_OPENNI when configuring the build of the library.
I don't like to be the one breaking this to you but what you are trying to do is either impossible or extremely difficult with a single camera.
You need to move the camera, record a video, and process it with a complex technique. Usually 3D information is created from at least two 2D images taken from two different places. You also need to know quite precisely the distance and the rotation between the two images. The common technique is to have two cameras with a precisely measured distance between them.
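For the two-camera case, the depth calculation itself is short once the images are rectified and the same point has been matched in both. A minimal sketch, assuming two identical, parallel cameras separated horizontally by a known baseline (the function name is made up):

```python
def stereo_depth(x_left, x_right, focal_px, baseline_m):
    """Depth in metres of a point seen at column x_left in the left image
    and x_right in the right image of a rectified stereo pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("expected a positive disparity for a point in front of the rig")
    return focal_px * baseline_m / disparity
```

The hard part in practice is not this formula but calibrating the pair and finding reliable matches between the two images.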
The Xtion is not a basic webcam. It's a stereoscopic depth-sensing camera similar to the Kinect and other PrimeSense devices. The main API for it is OpenNI.
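With a depth sensor, the distance to a detected object can be read from the depth map at the object's pixel. A minimal sketch, assuming the depth map has already been fetched as rows of integer millimetre values aligned with the colour image, that 0 means "no reading", and that the intrinsics fx, fy, cx, cy are known; the function names are hypothetical and no OpenNI calls are shown.

```python
import math


def pixel_to_point(depth_mm, u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a 3D point in metres, or None if there is no reading."""
    z = depth_mm[v][u] / 1000.0
    if z == 0:
        return None
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def distance_to(point):
    """Straight-line distance from the camera centre to a 3D point."""
    return math.sqrt(sum(c * c for c in point))
```

Averaging or taking the median depth over the object's whole bounding box is usually more robust than a single pixel, since depth maps tend to have holes.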

Correspondence between a set of 3D model points and their image projections

I have a set of 3-d points and some images with the projections of these points. I also have the focal length of the camera and the principal point of the images with the projections (resulting from previously done camera calibration).
Is there any way to, given these parameters, find the automatic correspondence between the 3-d points and the image projections? I've looked through some OpenCV documentation but I didn't find anything suitable until now. I'm looking for a method that does the automatic labelling of the projections and thus the correspondence between them and the 3-d points.
The question is not very clear, but I think you mean to say that you have the intrinsic calibration of the camera, but not its location and attitude with respect to the scene (the "extrinsic" part of the calibration).
This problem does not have a unique solution for a general 3d point cloud if all you have is one image: just notice that the image does not change if you move the 3d points anywhere along the rays projecting them into the camera.
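The ambiguity is easy to check numerically. In this small sketch (a pinhole model with a hypothetical focal length and principal point), a point and the same point pushed three times further along its ray land on the same pixel:

```python
import math


def project(point, f, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z) with z > 0."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


near = (0.2, -0.1, 2.0)
far = tuple(3.0 * c for c in near)  # same ray, three times further away

a = project(near, 500.0, 320.0, 240.0)
b = project(far, 500.0, 320.0, 240.0)
same_pixel = all(math.isclose(p, q) for p, q in zip(a, b))
```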
If you have one or more images, you know everything about the 3D cloud of points (e.g. the points belong to an object of known shape and size, and are at known locations upon it), and you have matched them to their images, then it is a standard "camera resectioning" problem: you just solve for the camera extrinsic parameters that make the 3D points project onto their images.
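"Make the 3D points project onto their images" is usually expressed as minimising a reprojection error over the pose. The sketch below computes only that error for a candidate rotation R (3x3) and translation t; it is not a solver (in OpenCV, cv::solvePnP does the solving), and the function names are hypothetical.

```python
import math


def reproject(point, R, t, f, cx, cy):
    """Transform a model point into the camera frame and project it (pinhole model)."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def rms_reprojection_error(points3d, points2d, R, t, f, cx, cy):
    """Root-mean-square pixel distance between observed and reprojected points."""
    total = 0.0
    for p, (u, v) in zip(points3d, points2d):
        pu, pv = reproject(p, R, t, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(points3d))
```

A solver searches for the R and t that drive this value towards zero; note that it needs the 3D-to-2D matches as input, which is exactly the labelling the question asks about.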
If you have multiple images and you know that the scene is static while the camera is moving, and you can match "enough" 3d points to their images in each camera position, you can solve for the camera poses up to scale. You may want to start from David Nister's and/or Henrik Stewenius's papers on solvers for calibrated cameras, and then look into "bundle adjustment".
If you really want to learn about this (vast) subject, Zisserman and Hartley's book is as good as any. For code, look into libmv, vxl, and the ceres bundle adjuster.
