Using OpenCV rotation and translation in iOS Metal to draw AR cube - ios

I am new to AR, computer vision, and graphics. I am trying to make an AR experience using OpenCV and Metal on iOS. The rotation and translation are retrieved with solvePnP and passed to Metal to draw the cube's pose.
However, there are a few problems:
The cube jumps out of view as soon as the R and T are passed.
Even when I can see the cube, its rotation does not seem to match the phone's movement.
The translation values do not appear to move the cube in line with the phone's movement, so it fails to anchor to any surface or point in space.
I can provide more information to help analyze my problem if needed. Thank you.
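For context, the way a solvePnP pose is usually mapped into a Metal model-view matrix looks roughly like the sketch below. This is only a sketch under a couple of assumptions: rvec has already been converted to a 3x3 rotation matrix with cv::Rodrigues on the OpenCV side, R and t are passed across the bridge as plain floats, and the helper name is hypothetical. The key detail is that OpenCV's camera looks down +Z with +Y pointing down, while Metal (like OpenGL) looks down -Z with +Y pointing up, so the Y and Z rows of [R|t] have to be negated.

```swift
import simd

// Hypothetical helper: build a Metal-style model-view matrix from the pose that
// cv::solvePnP returns. R is the 3x3 rotation matrix (row-major, as OpenCV stores it)
// and t the translation vector, both in OpenCV's camera convention.
func metalViewMatrix(openCVRotation R: [[Float]], translation t: [Float]) -> simd_float4x4 {
    // The OpenCV extrinsic [R|t], written out as rows.
    var rows = [
        simd_float4(R[0][0], R[0][1], R[0][2], t[0]),
        simd_float4(R[1][0], R[1][1], R[1][2], t[1]),
        simd_float4(R[2][0], R[2][1], R[2][2], t[2]),
        simd_float4(0,       0,       0,       1)
    ]
    // Flip the Y and Z axes (multiply by diag(1, -1, -1, 1)) to go from OpenCV's
    // camera frame to Metal's -Z-forward, +Y-up camera frame.
    rows[1] = -rows[1]
    rows[2] = -rows[2]
    // simd matrices are column-major; init(rows:) takes care of the transposition.
    return simd_float4x4(rows: rows)
}
```

If the cube still drifts after this, the projection matrix is the other usual suspect: it has to be built from the same intrinsic matrix that solvePnP used, not from an arbitrary field of view.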

Related

Can ARKit detect specific surfaces as planes?

Using ARKit on iOS 11 and iOS 12, we are currently able to detect planes on horizontal surfaces, and we can also visualize those planes on the surface.
I am wondering if we can declare, through some sort of image file, specific surfaces on which we want to detect planes (possibly ignoring all other planes that ARKit detects on other surfaces)?
If that is not possible, could we then perhaps capture the plane detected (via an image), to which we could then process through a CoreML model which identifies that specific surface?
ARKit has no support for such a thing at the moment. You can indeed capture the detected plane as an image, and if you're able to match it through Core ML in real time, I'm sure a lot of people would be interested!
You should:
get the 3D position of the corners of the plane
find their 2D position in the frame using sceneView.projectPoint (a sketch of these first two steps follows below)
extract the frame from the currentFrame.capturedImage
apply an affine transform to the image so you are left with just your plane, reprojected to a rectangle
do some ML / image processing to detect a match
Keep in mind that the ARKit rectangle detection is often not well aligned and may cover only part of the full plane.
Finally, unfortunately, the feature points that ARKit exposes are not useful, since they don't carry any descriptors for matching feature points across frames, and Apple has not said what algorithm it uses to compute them.
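For the first two steps, here is a rough Swift sketch (the helper name is hypothetical; it uses ARPlaneAnchor's center/extent and SCNSceneRenderer's projectPoint):

```swift
import ARKit
import SceneKit

// Hypothetical helper: compute the four corners of a detected plane in world space
// and project them into 2D view coordinates.
func projectedPlaneCorners(of anchor: ARPlaneAnchor, in sceneView: ARSCNView) -> [CGPoint] {
    let c = anchor.center      // plane center in the anchor's local space
    let e = anchor.extent      // plane size in the anchor's local space
    // The plane lies in the anchor's local X/Z plane.
    let localCorners = [
        simd_float4(c.x - e.x / 2, c.y, c.z - e.z / 2, 1),
        simd_float4(c.x + e.x / 2, c.y, c.z - e.z / 2, 1),
        simd_float4(c.x + e.x / 2, c.y, c.z + e.z / 2, 1),
        simd_float4(c.x - e.x / 2, c.y, c.z + e.z / 2, 1)
    ]
    return localCorners.map { corner -> CGPoint in
        // Anchor space -> world space, then world space -> screen space.
        let world = anchor.transform * corner
        let projected = sceneView.projectPoint(SCNVector3(x: world.x, y: world.y, z: world.z))
        return CGPoint(x: CGFloat(projected.x), y: CGFloat(projected.y))
    }
}
```

For step 4, the cropped region of currentFrame.capturedImage can be rectified with Core Image's CIPerspectiveCorrection filter before feeding it to the model, keeping in mind that the captured image and the view use different coordinate spaces, so the points have to be converted (e.g. via the frame's displayTransform).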
Here is a small demo of finding a horizontal surface, in Swift 5, on GitHub.

How to get the coordinates of the moving object with GPUImage motionDetection - Swift

How would I go about getting the screen coordinates of something that enters the frame with the motionDetection filter? I'm fairly new to programming and would prefer a Swift answer if possible.
Example: I have the iPhone pointing at a wall, monitoring it with the motion detector. If I bounce a tennis ball against the wall, I want the app to place an image of a tennis ball on the iPhone display at the same spot it hit the wall.
To do this, I would need the coordinates of where the motion occurred.
I thought maybe the "centroid" argument did this... but I'm not sure.
I should point out that the motion detector is pretty crude. It works by taking a low-pass filtered version of the video stream (a composite image generated by a weighted average of incoming video frames) and then subtracting that from the current video frame. Pixels that differ above a certain threshold are marked. The number of these pixels, along with the centroid of the marked pixels are provided as a result.
The centroid is a normalized (0.0-1.0) coordinate representing the centroid of all of these differing pixels. A normalized strength gives you the percentage of the pixels that were marked as differing.
Movement in a scene will generally cause a bunch of pixels to differ, and for a single moving object the centroid will generally be the center of that object. However, it's not a reliable measure, as lighting changes, shadows, other moving objects, etc. can also cause pixels to differ.
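As a minimal sketch of reading that centroid from Swift (the property and block names below are as I recall them from GPUImage and should be treated as assumptions):

```swift
import UIKit
import GPUImage

// Sketch: attach a motion detector to an existing GPUImageVideoCamera and map the
// normalized centroid into view coordinates to position an overlay image.
func attachMotionDetector(to camera: GPUImageVideoCamera,
                          displayView: UIView,
                          ballImageView: UIImageView) {
    let motionDetector = GPUImageMotionDetector()
    motionDetector.lowPassFilterStrength = 0.5   // how quickly the background average adapts

    motionDetector.motionDetectionBlock = { centroid, intensity, _ in
        // centroid is normalized (0.0-1.0); intensity is the fraction of differing pixels.
        guard intensity > 0.01 else { return }   // noise threshold: an assumption, tune it
        DispatchQueue.main.async {
            // Map the normalized centroid to the display view's coordinate space.
            ballImageView.center = CGPoint(x: centroid.x * displayView.bounds.width,
                                           y: centroid.y * displayView.bounds.height)
        }
    }
    camera.addTarget(motionDetector)
}
```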
For true object tracking, you'll want to use feature detection and tracking algorithms. Unfortunately, the framework as published does not have a fully implemented version of any of these at present.

AR With External Tracking - Alignment is wrong, values are right

I recently managed to get my augmented reality application up and running close to what is expected. However, I'm having an issue where, even though the values are correct, the augmentation is still off by some translation! It would be wonderful to get this solved as I'm so close to having this done.
The system utilizes an external tracking system (Polaris Spectra stereo optical tracker) with IR-reflective markers to establish global and reference frames. I have a LEGO structure with a marker attached which is the target of the augmentation, a 3D model of the LEGO structure created using CAD with the exact specs of its real-world counterpart, a tracked pointer tool, and a camera with a world reference marker attached to it. The virtual space was registered to the real world using a toolset in 3D Slicer, a medical imaging software which is the environment I'm developing in. Below are a couple of photos just to clarify exactly the system I'm dealing with (May or may not be relevant to the issue).
So a brief overview of exactly what each marker/component does (Markers are the black crosses with four silver balls):
The world marker (1st image, on the right) is the reference frame for all other markers' transformations. It is fixed to the LEGO model so that a single registration can be done for the LEGO's virtual equivalent.
The camera marker (1st image, attached to camera) tracks the camera. The camera is registered to this marker by an extrinsic calibration performed using cv::solvePnP().
The checkerboard is used to acquire data for extrinsic calibration using a tracked pointer (unshown) and cv::findChessboardCorners().
Up until now I've been smashing my face against the mathematics behind the system until everything finally lined up. When I compare where I estimate the camera origin to be with the reference origin, the translation vector between the two is about [0; 0; 0]. So all of the registration appears to work correctly. However, when I run my application, I get the following results:
As you can see, there's a strange offset in the augmentation. I've tried removing distortion correction on the image (currently done with cv::undistort()), but it just makes the issue worse. The rotations are all correct and, as I said before, the translations all seem fine. I'm at a loss for what could be causing this. Of course, there's so much that can go wrong during implementation of the rendering pipeline, so I'm mostly posting this here under the hope that someone has experienced a similar issue. I already performed this project using a webcam-based tracking method and experienced no issues like this even though I used the same rendering process.
I've been purposefully a little ambiguous in this post to avoid bogging down readers with the minutia of the situation as there are so many different details I could include. If any more information is needed I can provide it. Any advice or insight would be massively appreciated. Thanks!
Here are a few tests that you could do to validate that each module works well.
First verify your extrinsic and intrinsic calibrations:
Check that the position of the virtual scene-marker with respect to the virtual lego scene accurately corresponds to the position of the real scene-marker with respect to the real lego scene (e.g. the real scene-marker may have moved since you last measured its position).
Same for the camera-marker, which may have moved since you last calibrated its position with respect to the camera optical center.
Check that the calibration of the camera is still accurate. For such a camera, prefer a camera matrix of the form [fx,0,cx;0,fy,cy;0,0,1] (i.e. with a skew fixed to zero) and estimate the camera distortion coefficients (NB: OpenCV's undistort functions do not support camera matrices with non-zero skews; using such matrices may not raise any exception but will result in erroneous undistortions).
Check that the marker tracker does not need to be recalibrated.
Then verify the rendering pipeline, e.g. by checking that the scene-marker reprojects correctly into the camera image when moving the camera around.
If it does not reproject correctly, there is probably an error with the way you map the OpenCV camera matrix into the OpenGL projection matrix, or with the way you map the OpenCV camera pose into the OpenGL model view matrix. Try to determine which one is wrong using toy examples with simple 3D points and simple projection and modelview matrices.
If it reprojects correctly, then there probably is a calibration problem (see above).
Beyond that, it is hard to guess what could be wrong without directly interacting with the system. If I were you and I still had no idea where the problem could be after doing the tests above, I would try to start back from scratch and validate each intermediate step using toy examples.
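As one concrete toy example for the reprojection check above: project a known 3D point by hand with the pinhole model, u = K [R|t] X, and compare it with where your rendering pipeline draws that point. The sketch below is plain simd and the function name is hypothetical; note that simd matrices are built column by column, which is itself a classic source of transposition bugs when porting a camera matrix.

```swift
import simd

// Toy reprojection: world point -> pixel coordinates with the pinhole model.
// K is the 3x3 intrinsic matrix, (R, t) the world-to-camera pose from your tracker.
func reproject(_ X: simd_float3, K: simd_float3x3, R: simd_float3x3, t: simd_float3) -> simd_float2 {
    let camera = R * X + t                     // world frame -> camera frame
    let h = K * camera                         // camera frame -> homogeneous image coordinates
    return simd_float2(h.x / h.z, h.y / h.z)   // divide by depth to get pixel coordinates
}
```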

How to find object distance from an Asus Xtion Pro camera - OpenCV, ROS

Hi, I am using an Asus Xtion Pro Live camera for my object detection, and I am also new to OpenCV. I'm trying to get the distance of an object from the camera. The detected object is in a 2D image. I'm not sure what I should use to get that information, or how to follow up with the calculations to get the distance between the camera and the detected object. Could someone advise me please?
In short: You can't.
You're losing the depth information and any visible pixel in your camera image essentially transforms into a ray originating from your camera.
So once you've got an object at pixel X, all you know is that the object somewhere intersects the vector cast based on this pixel and the camera's intrinsic/extrinsic parameters.
You'll essentially need more information. One of the following should suffice:
Know at least one coordinate of the 3D point (e.g. everything detected is on the ground or in some known plane).
Know the relation between two projected points:
Either the same point from different positions (known camera movement/offset)
or two points with significant distance between them (like the two ends of some staff or bar).
Once you've got either, you're able to use simple trigonometry (the rule of three) to calculate the missing values; a sketch of the known-plane case follows below.
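For the first case (everything detected lies in a known plane, say the ground at y = 0), a pixel can be back-projected into a ray and intersected with that plane. K is the intrinsic matrix and cameraToWorld the camera pose, both assumed to be known; the function name is hypothetical.

```swift
import simd

// Back-project a pixel into a viewing ray and intersect it with the ground plane y = 0.
// Returns the metric distance from the camera to the intersection, or nil if the ray
// never hits the plane in front of the camera.
func groundDistance(toPixel pixel: simd_float2,
                    K: simd_float3x3,
                    cameraToWorld: simd_float4x4) -> Float? {
    // Pixel -> normalized viewing ray in camera coordinates.
    let rayCamera = simd_normalize(K.inverse * simd_float3(pixel.x, pixel.y, 1))
    // Rotate the ray into world coordinates (w = 0: direction, not position).
    let ray4 = cameraToWorld * simd_float4(rayCamera.x, rayCamera.y, rayCamera.z, 0)
    let rayWorld = simd_float3(ray4.x, ray4.y, ray4.z)
    // Camera position in world coordinates.
    let o4 = cameraToWorld.columns.3
    let origin = simd_float3(o4.x, o4.y, o4.z)
    // Solve origin.y + s * rayWorld.y == 0 for the scale s along the ray.
    guard rayWorld.y != 0 else { return nil }
    let s = -origin.y / rayWorld.y
    guard s > 0 else { return nil }            // intersection is behind the camera
    return simd_distance(origin, origin + s * rayWorld)
}
```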
Since I initially missed that this is a camera with an OpenNI-compatible depth sensor: it's possible to build OpenCV with support for that by enabling the WITH_OPENNI option when building the library.
I don't like to be the one breaking this to you but what you are trying to do is either impossible or extremely difficult with a single camera.
You need to have the camera moving, record a video with it, and use a complex technique such as this. Usually 3D information is created from at least two 2D images taken from two different places. You also need to know quite precisely the distance and the rotation between the two images. The common technique is to have two cameras with a precisely measured distance between them.
The Xtion is not a basic webcam. It's a depth-sensing camera, similar to the Kinect and other PrimeSense devices. The main API for this is OpenNI - see http://structure.io/openni.

iOS Camera Color Recognition in Real Time: Tracking a Ball

I have been looking around for a bit and know that people are able to track faces with Core Image and OpenGL. However, I am not sure where to start the process of tracking a colored ball with the iOS camera.
Once I have a lead on being able to track the ball, I hope to create something to detect when the ball changes direction.
Sorry I don't have source code, but I am unsure where to even start.
The key point is image preprocessing and filtering. You can use the camera APIs to get the video stream from the camera. Take a snapshot picture from it, then apply a Gaussian blur (spatial enhancement), then a luminance average threshold filter (to make a black-and-white image). After that, a morphological preprocessing step is wise (opening and closing operators) to hide the small noise. Then run an edge detection algorithm (for example with a Prewitt operator). After these steps only the edges remain, and your ball should be a circle (if the recording environment was ideal). After that you can use a Hough transform to find the center of the ball. You should record the ball position, and in the next frame only a small part of the picture (around the ball) needs to be processed.
Another keyword could be: blob detection.
A fast library for image processing (on the GPU, with OpenGL) is Brad Larson's GPUImage library: https://github.com/BradLarson/GPUImage
It implements all the needed filters (except the Hough transform).
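A rough sketch of that preprocessing chain with GPUImage (the filter class names are as I recall them from the library and should be double-checked; the Hough transform itself still has to run elsewhere, e.g. on the CPU over the edge image):

```swift
import GPUImage

// Chain: camera -> blur -> threshold -> morphology -> edge detection -> output view.
func makeBallPreprocessingChain(camera: GPUImageVideoCamera, output: GPUImageView) {
    let blur = GPUImageGaussianBlurFilter()                    // spatial smoothing
    blur.blurRadiusInPixels = 2.0

    let threshold = GPUImageAverageLuminanceThresholdFilter()  // black-and-white image
    let opening = GPUImageOpeningFilter()                      // remove small noise blobs
    let closing = GPUImageClosingFilter()                      // fill small holes
    let edges = GPUImagePrewittEdgeDetectionFilter()           // keep only edges

    camera.addTarget(blur)
    blur.addTarget(threshold)
    threshold.addTarget(opening)
    opening.addTarget(closing)
    closing.addTarget(edges)
    edges.addTarget(output)   // display, or read back pixels here for the Hough step
}
```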
The tracking process can be defined as follows:
Start with the initial coordinates and dimensions of an object with given visual characteristics (image features).
In the next video frame, find the same visual characteristics near the coordinate of the last frame.
Near means considering basic transformations related to the last frame:
translation in each direction;
scale;
rotation;
The variation of these transformations is strictly related to the frame rate: the higher the frame rate, the nearer the position will be in the next frame. A small sketch of this search-window idea follows below.
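A trivial sketch of sizing the search window for the next frame from the frame rate and an assumed maximum object speed (the speed value below is a placeholder):

```swift
import CoreGraphics

// Expand the object's last known rectangle into a search window for the next frame.
func searchWindow(lastRect: CGRect,
                  frameRate: Double,
                  maxSpeedPixelsPerSecond: Double = 600) -> CGRect {
    // The higher the frame rate, the less the object can move between two frames.
    let maxShift = CGFloat(maxSpeedPixelsPerSecond / frameRate)
    // Negative insets grow the rectangle; the margin also absorbs small scale changes.
    return lastRect.insetBy(dx: -maxShift, dy: -maxShift)
}
```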
Marvin Framework provides plug-ins and examples to perform this task. It's not compatible with iOS yet. However, it is open source, and I think you could port the source code fairly easily.
This video demonstrates some tracking features, starting at 1:10.
