Finding the horizon using ARKit? - ios

I have an app using OpenCV to produce panoramas of outdoor scenes from multiple images (geez, I wish Apple would expose their pano feature!).
I need to extract the horizon. OCV does this, but is easily fooled in the samples I tried - it thought the roof was the horizon in one case.
So maybe there is a way to do this with ARKit instead? Essentially, I want to know where the "floor" is as the user moves the camera around to take images.

In ARKit, ARHitTestResult has several result types that recognise planes (such as estimatedHorizontalPlane, existingPlane and existingPlaneUsingExtent). The upper edge of the plane would be your horizon.
You could try these and check whether the result is acceptable.


Can ARCore track moving surfaces?

ARCore can track static surfaces according to its documentation, but doesn't mention anything about moving surfaces, so I'm wondering if ARCore can track flat surfaces (of course, with enough feature points) that can move around.
Yes, you definitely can track moving surfaces and moving objects in ARCore.
If you track a static surface using ARCore, the resulting features are mainly suitable for so-called camera tracking. If you track a moving object or surface, the resulting features are mostly suitable for object tracking.
You can also mask the moving or non-moving parts of the image and, of course, invert the six-degrees-of-freedom (translate xyz and rotate xyz) camera transform.
Watch this video to find out how they succeeded.
Yes, ARCore tracks feature points, estimates surfaces, and also allows access to the image data from the camera, so custom computer vision algorithms can be written as well.
I guess it should be possible theoretically.
However, I've tested it with some stuff in my house (running on an S8 with an app built with Unity and ARCore),
and the problem is more or less that it refuses to even start tracking movable things like books and plates:
because of the feature points of the surrounding floor and so on, it always picks up on those first.
Edit: I did some more testing and managed to get it to track a bed sheet; however, it does not adjust to any movement. As of now the plane stays fixed, although I saw some wobbling, which I guess was because it tried to adjust the position of the plane once its original feature points were moved.

Computing real depth map of image objects and reconstruction from several images

I have the following task: get a 3D reconstruction of a room from multiple images (possibly a video stream, it doesn't matter). There will be a spherical camera (in fact multiple cameras on a sphere-like construction), so the case is the right one on the image.
I decided to code it on the iOS platform, as I'm an iOS developer, and to model the cameras with an iPhone camera, rotating it as shown in the picture above. As I decompose this task, first I need to get the real distance to the objects (walls in most cases, I think). Is it possible? Which algorithms/methods should I use to achieve this? Obviously I'm not asking you to do the task for me, just to point me in the right direction, because I have no idea where to start: maybe some equations/tutorials/algorithms with an explanation for my case. Thank you!
The task of building a 3D model from multiple 2D images is called "scene reconstruction." It's still an active area of research, but solutions involve recognizing the same keypoint (e.g. a distinctive part of an object) in two images. Once you have that, you can use the known camera geometry to solve for the 3D position of that keypoint in the world.
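As a rough illustration of the last step (solving for the 3D position once the same keypoint has been matched in two images), here is a minimal sketch of midpoint triangulation in plain Python. It assumes you already have, for each camera, its centre and the direction of the viewing ray through the matched keypoint, both expressed in one common world frame; the function name and the example numbers are made up for illustration and do not come from any particular library.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two viewing rays.

    c1, c2: camera centres; d1, d2: ray directions through the matched
    keypoint (same world frame, not necessarily normalised).
    Returns the midpoint of the shortest segment joining the two rays.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = _along(c1, d1, s)
    p2 = _along(c2, d2, t)
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


# Two cameras 1 unit apart, both seeing a point at (0.5, 0, 2).
point = triangulate_midpoint((0, 0, 0), (0.5, 0, 2), (1, 0, 0), (-0.5, 0, 2))
print(point)
```

With noisy keypoints the two rays will not intersect exactly, which is why the midpoint of the closest approach is used here; real reconstruction pipelines usually refine such estimates further.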
Here's a reference:
You can google "scene reconstruction" to find lots more, and papers that go into more detail.

iOS:Which Augmented Reality SDK for virtual try room to be used?

I am working on an iOS augmented reality project where I need to integrate a virtual dressing room concept.
I tried OpenCV. It worked as desired for the face detection scenario only, but when I tried the upper body portion, it didn't work as desired.
I used UPPER_BODY_HAAR_CASCADE but it didn't work as desired.
The result was something like
but my desired output is something like this
If someone has achieved this functionality on iOS, please reply.
Not exactly the answer you are looking for. Your app becomes dependent on the SDK you choose. Most of them are quite expensive to use and their usage policy may change. Additionally, you drag all the extensive functionality you don't need into your app, so at the end of the day your app is 60-100MB in size.
If I were you (and I was in a similar situation), I would develop my own little SDK with the functionality you need. If you know how to do it, it takes a couple of days to get the basic things working. Add OpenCV and you are in good shape.
PS. @Tommy asked an interesting question: how could one approach implementing something like what is shown in this video:
Adding some info which is too long for a comment.
@Tommy Nice video. It seems to have all we need to proceed. First of all, for any AR application you need the calibration info for your camera (the mobile phone camera). In the simple case it consists of two matrices: the camera matrix and the distortion matrix. The camera matrix is then used to create the OpenGL projection matrix (how the 3D model is projected onto the flat 2D screen: field of view, planes, etc.). The distortion matrix is used, for example, for warping parts of your input frame when detecting something. In the example with the watch, we need to detect the belt and the watch body in order to place the 3D model in that position. Since the paper watch is not seen in ideal perspective, at a 90-degree angle to the eye, it needs to be transformed to that view.
In other words, your paper watch looks like this:
/ /
And for analysing it and detecting the model name, you need it to look like this:
| |
| |
This is where the distortion matrix is used, in order to get a precise transformation. Different cameras have their own distortions.
Most applications use so-called offline calibration: a chessboard is fed into OpenCV functions that detect its cells in a series of frames taken from different perspectives, and build the matrices based on how the cells are shaped.
In your case, the belt of your watch may be designed so that it contains everything needed for online calibration. In your video it has a special pattern, and I'm pretty sure it's there for exactly this purpose. You may do the same and use a chessboard pattern for simplicity.
Then you could use, let's say, the first 25 frames for online calibration and then, having all the matrices, go on to detecting the paper watch, building the projection matrix and replacing the paper watch with your 3D model. If all is done right, your paper watch will have coordinates 0 0 0 in 3D space and you could easily place something else at that position.
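To make the step from the camera matrix to the projection setup a bit more concrete, here is a minimal sketch in plain Python of how the vertical field of view and aspect ratio (the inputs a perspective projection typically needs) can be derived from the focal lengths in the camera matrix. It assumes a pinhole model with focal lengths fx and fy given in pixels and the principal point at the image centre; the function names and the example numbers are illustrative only.

```python
import math


def vertical_fov_degrees(fy, image_height):
    """Vertical field of view implied by focal length fy (in pixels)."""
    return math.degrees(2.0 * math.atan(image_height / (2.0 * fy)))


def frustum_aspect(fx, fy, image_width, image_height):
    """Width/height ratio of the viewing frustum.

    Equal to tan(fov_x / 2) / tan(fov_y / 2); it reduces to
    image_width / image_height when fx == fy (square pixels).
    """
    return (image_width * fy) / (image_height * fx)


# Hypothetical calibration result for a 640x480 frame.
fx, fy = 520.0, 520.0
print(vertical_fov_degrees(fy, 480))
print(frustum_aspect(fx, fy, 640, 480))
```

These two values, together with near and far clipping distances of your choice, are what a perspective projection is usually built from. If the calibrated principal point is noticeably off-centre, an off-axis frustum is needed instead, which this sketch does not cover.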

how to find object distance from asus xtion pro camera opencv, ROS

Hi, I am using an Asus Xtion Pro Live camera for object detection, and I am also new to OpenCV. I'm trying to get the distance of an object from the camera. The object is detected in the 2D image. I'm not sure what I should use to get the information, or which calculations to follow up with to get the distance between the camera and the detected object. Could someone advise me, please?
In short: You can't.
You're losing the depth information and any visible pixel in your camera image essentially transforms into a ray originating from your camera.
So once you've got an object at pixel X, all you know is that the object somewhere intersects the vector cast based on this pixel and the camera's intrinsic/extrinsic parameters.
You'll essentially need more information. One of the following should suffice:
Know at least one coordinate of the 3D point (e.g. everything detected is on the ground or in some known plane).
Know the relation between two projected points:
Either the same point from different positions (known camera movement/offset)
or two points with significant distance between them (like the two ends of some staff or bar).
Once you've got either, you're able to use simple trigonometry (rule of three) to calculate the missing values.
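For the first option (a known plane), the calculation could look roughly like the sketch below in plain Python. It assumes a pinhole camera with intrinsics fx, fy, cx, cy in pixels, looking horizontally, mounted at a known height above flat ground, with the camera's y axis pointing down; all names and numbers are illustrative assumptions, not part of OpenCV or OpenNI.

```python
import math


def ground_point_from_pixel(u, v, fx, fy, cx, cy, camera_height):
    """Intersect the viewing ray through pixel (u, v) with the ground plane.

    Camera frame: x right, y down, z forward; the ground is the plane
    y = camera_height. Returns the 3D point in camera coordinates.
    """
    # Direction of the ray cast through the pixel (not normalised).
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    dz = 1.0
    if dy <= 0:
        raise ValueError("pixel is at or above the horizon; ray never hits the ground")
    t = camera_height / dy  # scale at which the ray reaches y = camera_height
    return (t * dx, t * dy, t * dz)


def ground_distance(point):
    """Horizontal distance from the camera to the point."""
    x, _, z = point
    return math.hypot(x, z)


# Hypothetical setup: 640x480 image, camera 1.5 m above the floor.
p = ground_point_from_pixel(320, 490, 500.0, 500.0, 320.0, 240.0, 1.5)
print(p, ground_distance(p))
```

The same idea works for any known plane; only the plane equation in the intersection step changes.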
Since I initially missed that this is a camera with an OpenNI-compatible depth sensor: it's possible to build OpenCV with support for that by enabling the WITH_OPENNI build option when building the library.
I don't like to be the one breaking this to you but what you are trying to do is either impossible or extremely difficult with a single camera.
You need to have the camera moving, record a video of it and use a complex technique such as this. Usually 3D information is created from at least two 2D images taken from two different places. You also need to know quite precisely the distance and the rotation between the two views. The common technique is to have two cameras with a precisely measured distance between them.
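For the two-camera setup, the simplest case can be sketched as follows in plain Python. It assumes two identical, parallel (rectified) cameras separated by a known baseline, with the focal length given in pixels and the same point located in both images; the function name and numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth of a point seen by two parallel, rectified cameras.

    focal_px: focal length in pixels; baseline_m: distance between the
    cameras in metres; x_left, x_right: horizontal pixel coordinate of the
    same point in the left and right image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical rig: 600 px focal length, cameras 10 cm apart.
print(depth_from_disparity(600.0, 0.10, 350.0, 330.0))
```

Because depth is inversely proportional to disparity, a small error in the measured baseline or in the matched pixel positions grows quickly for distant objects, which is why the distance between the cameras has to be known precisely.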
The Xtion is not a basic webcam. It's a structured-light depth-sensing camera built on PrimeSense technology, similar to the Kinect. The main API for this is OpenNI - see

Fiducial marker detection in the presence of camera shake

I'm trying to make my OpenCV-based fiducial marker detection more robust when the user moves the camera (phone) violently. Markers are ArTag-style with a Hamming code embedded within a black border. Borders are detected by thresholding the image, then looking for quads based on the found contours, then checking the internals of the quads.
In general, decoding of the marker is fairly robust if the black border is recognized. I've tried the most obvious thing, which is downsampling the image twice and also performing quad detection on those levels. This helps with camera defocus on markers in the extreme near field, and also with very small levels of image blur, but doesn't hugely help in the general case of camera motion blur.
Is there available research on ways to make detection more robust? Ideas I'm wondering about include:
1. Can you do some sort of optical flow tracking to "guess" the positions of the marker in the next frame, then some sort of corner detection in the region of those guesses, rather than treating the rectangle search as a full-frame thresholding?
2. On PCs, is it possible to derive blur coefficients (perhaps by registration with recent video frames where the marker was detected) and deblur the image prior to processing?
3. On smartphones, is it possible to use the gyroscope and/or accelerometers to get deblurring coefficients and pre-process the image? (I'm assuming not, simply because if it were, the market would be flooded with shake-correcting camera apps.)
Links to failed ideas would also be appreciated if it saves me trying them.
1. Yes, you can use optical flow to estimate where the marker might be and localise your search, but that is just relocalisation: your tracking will already have broken for the blurred frames.
2. I don't know enough about deblurring except to say it's very computationally intensive, so real-time might be difficult.
3. You can use the sensors to guess the sort of blur you're faced with, but I would guess deblurring is too computationally expensive for mobile devices in real time.
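As a very rough illustration of localising the search, here is a sketch in plain Python that predicts where to look in the next frame from the marker's last two detected centres. It swaps in a simple constant-velocity guess instead of real optical flow, and the function name, margin factor and frame size are assumptions made up for the example.

```python
def predict_search_window(prev_center, curr_center, half_size,
                          frame_width, frame_height, margin=1.5):
    """Predict a search rectangle for the marker in the next frame.

    prev_center, curr_center: (x, y) marker centres in the two most recent
    frames where it was detected. half_size: half the marker's side length
    in pixels. margin: how much to enlarge the window to allow for error.
    Returns (x0, y0, x1, y1), clamped to the frame.
    """
    # Constant-velocity assumption: the marker moves as much as it just did.
    vx = curr_center[0] - prev_center[0]
    vy = curr_center[1] - prev_center[1]
    px, py = curr_center[0] + vx, curr_center[1] + vy

    half = half_size * margin
    x0 = max(0, int(px - half))
    y0 = max(0, int(py - half))
    x1 = min(frame_width, int(px + half))
    y1 = min(frame_height, int(py + half))
    return (x0, y0, x1, y1)


# Marker moved from (100, 100) to (120, 110) in a 640x480 frame.
print(predict_search_window((100, 100), (120, 110), 40, 640, 480))
```

Thresholding and quad detection would then run only inside the returned rectangle, falling back to a full-frame search if nothing is found there.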
Then some other approaches:
There is some really smart stuff in here: where they're doing edge detection (which could be used to find your marker borders, even though you're looking for quads right now), modelling the camera movements from the sensors, and using those values to estimate how an edge in the direction of blur should appear given the frame-rate, and searching for that. Very elegant.
Similarly here they just pre-blur the tracking targets and try to match the blurred targets that are appropriate given the direction of blur. They use Gaussian filters to model blur, which are symmetrical, so you need half as many pre-blurred targets as you might initially expect.
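The symmetry argument can be checked with a small sketch in plain Python. It builds an elongated (anisotropic) Gaussian kernel oriented along a given blur direction; this particular parameterisation, the function name and the sigma values are my own assumptions for illustration and are not taken from the referenced work.

```python
import math


def oriented_gaussian_kernel(size, sigma_along, sigma_across, angle_deg):
    """Normalised Gaussian kernel stretched along the given direction.

    size must be odd. sigma_along controls the spread in the blur
    direction, sigma_across the spread perpendicular to it.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    r = size // 2
    rows, total = [], 0.0
    for y in range(-r, r + 1):
        row = []
        for x in range(-r, r + 1):
            u = x * c + y * s    # coordinate along the blur direction
            v = -x * s + y * c   # coordinate across the blur direction
            w = math.exp(-0.5 * ((u / sigma_along) ** 2 + (v / sigma_across) ** 2))
            row.append(w)
            total += w
        rows.append(row)
    return [[w / total for w in row] for row in rows]


def max_abs_diff(k1, k2):
    return max(abs(a - b) for r1, r2 in zip(k1, k2) for a, b in zip(r1, r2))


# A blur at 30 degrees and one at 210 degrees give the same kernel.
k30 = oriented_gaussian_kernel(9, 3.0, 1.0, 30)
k210 = oriented_gaussian_kernel(9, 3.0, 1.0, 210)
print(max_abs_diff(k30, k210) < 1e-12)
```

Since a direction and its opposite produce the same kernel (up to floating-point rounding), pre-blurred targets only need to cover directions from 0 up to 180 degrees.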
If you do try implementing any of these, I'd be really interested to hear how you get on!
From some related work (attempting to use sensors/gyroscope to predict the likely location of features from one frame to another in video), I'd say that idea 3 (using the gyroscope and/or accelerometers) is likely to be difficult if not impossible. I think at best you could get an indication of the approximate direction and angle of motion, which may help you model blur using the approaches referenced by dabhaid, but I think it unlikely you'd get sufficient precision for it to be much more help.
