I have a few related questions.
1) Can you get the position of the device in global coordinates? I tried to get this value using ARFrame.camera.transform.columns.3, but it seems like the [X, Y, Z] in this column is always [0, 0, 0]. I interpreted this transform to be the camera's orientation with respect to the body frame. Can someone explain what exactly you get out of ARFrame.camera.transform?
2) If we have the position of the device (camera) in the global coordinates, I assume we can easily get the velocity of the device. Is this a valid statement?
3) Can you only get the global position when you are tracking an object? That is, do you only get your position relative to the tracked object? I would like to get the speed of the device even when the camera shakes a lot and the tracking quality is not always good.
Yes, you can make a speedometer with ARKit. A few people have already.
Regarding your more specific questions...
ARKit doesn’t have “global coordinates”, or probably not in the sense you’re thinking. Camera and anchor transforms use a shared reference frame (“world” space in traditional 3D graphics parlance), but that reference frame is valid only within the session: 0,0,0 is where your camera/device was at the beginning of the session.
If you have two positions at two different times in any shared reference frame, the difference between those positions is the average velocity over that time.
ARKit doesn’t track objects. The camera transform is always relative to “world” space. As mentioned above, it’s 0,0,0 at the beginning of your session because the reference frame is based within the session.
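For illustration, here's a minimal sketch (the type and method names are mine, not ARKit API) that reads the device position out of the translation column of the camera transform each frame and differentiates it to get an average velocity:

```swift
import ARKit
import simd

// Minimal sketch: read the device position from the camera transform and
// differentiate it between frames to get an average velocity (m/s).
struct VelocityEstimator {
    private var lastPosition: simd_float3?
    private var lastTimestamp: TimeInterval?

    mutating func update(with frame: ARFrame) -> simd_float3? {
        // columns.3 holds the device position in session ("world") space, in meters.
        let c = frame.camera.transform.columns.3
        let position = simd_float3(c.x, c.y, c.z)
        defer {
            lastPosition = position
            lastTimestamp = frame.timestamp
        }
        guard let previousPosition = lastPosition,
              let previousTimestamp = lastTimestamp,
              frame.timestamp > previousTimestamp else { return nil }
        // Average velocity over the interval between the two frames.
        return (position - previousPosition) / Float(frame.timestamp - previousTimestamp)
    }
}
```

Feed it each ARFrame from session(_:didUpdate:) and take simd_length of the result if you want a speed readout.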
If you want Global positioning — that is, relative to the Earth — you should be looking at Core Location. Note that there’s a difference of scale and precision, though: GPS is accurate to a meter or two but operates at planet scale, and ARKit is accurate to a centimeter or two but operates at room scale.
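If coarse Earth-relative positioning alongside the AR session is all you need, a minimal Core Location sketch looks something like this (the class name is a placeholder, and NSLocationWhenInUseUsageDescription must be in your Info.plist):

```swift
import CoreLocation

// Minimal sketch: request GPS fixes while an AR session runs elsewhere.
final class LocationReader: NSObject, CLLocationManagerDelegate {
    private let manager = CLLocationManager()

    override init() {
        super.init()
        manager.delegate = self
        manager.desiredAccuracy = kCLLocationAccuracyBest
        manager.requestWhenInUseAuthorization()
        manager.startUpdatingLocation()
    }

    func locationManager(_ manager: CLLocationManager,
                         didUpdateLocations locations: [CLLocation]) {
        guard let latest = locations.last else { return }
        // Horizontal accuracy is typically meters, versus centimeters for ARKit poses.
        print("lat/lon:", latest.coordinate, "±", latest.horizontalAccuracy, "m")
    }
}
```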
Related
I'm in a somewhat unusual situation where I need to calculate, without using ARCore anchors, the transform from the ARKit coordinate system (ARWorldTracking) to the geospatial coordinate system (see the note below on why I need to do this).
Each frame, I get the heading of the camera using the appropriate API. I would now like to use this heading to figure out how to map between ARKit coordinates and geospatial coordinates. When the phone is held upright in portrait orientation, it is fairly easy to figure out how the heading is determined (it appears to be based on the negative z-axis of the ARFrame's ARCamera object). When the phone is held flat with the screen up, the heading seems to follow the negative x-axis of the camera object instead.
What I am unable to determine is how the heading is derived when the phone has arbitrary yaw, pitch, and roll (in that situation it is unclear which axis the heading is referenced to). I've tried a bunch of different test cases, and so far I haven't been able to achieve the accuracy I'm expecting.
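For reference, this is roughly the projection I'm doing per frame; the assumption that the heading tracks the camera's negative z-axis is exactly the part that breaks down when the phone isn't upright:

```swift
import ARKit
import simd

// Sketch: project the camera's view direction onto the horizontal plane and
// take the angle around +Y as a yaw/heading candidate.
func cameraYaw(for frame: ARFrame) -> Float {
    let t = frame.camera.transform
    // The camera looks along its local -Z axis; express that direction in world space.
    let forward = -simd_float3(t.columns.2.x, t.columns.2.y, t.columns.2.z)
    // Drop the vertical component (ARKit's world +Y points opposite gravity).
    // This degenerates when the phone lies flat and forward points straight down,
    // which is where the heading appears to switch to the -X axis.
    let flat = simd_normalize(simd_float3(forward.x, 0, forward.z))
    // 0 when facing world -Z, increasing toward +X.
    return atan2(flat.x, -flat.z)
}
```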
Note: I cannot use the ARCore anchors because I am using an ARWorldMap in conjunction with a world-tracking ARSession. This means that while the system is relocalizing to the ARWorldMap, no ARCore anchors can be created (because the tracking state is .limited).
I am using ARCore to track an image. Based on the following reference, the camera's FocusMode should be set to FIXED for better AR tracking performance.
Since we can get the camera's intrinsic focal length for each frame, why do we need to use a fixed focus?
With a fixed camera focus, ARCore can better calculate parallax (no near or distant real-world objects should be out of focus), so your camera tracking will be reliable and accurate. At the tracking stage, your device should be able to clearly distinguish all the textures of surrounding objects and the feature points, in order to build a correct 3D scene.
Also, the scene-understanding stage requires a fixed focus as well (to correctly detect planes, estimate lighting intensity and direction, etc.). That's what you expect from ARCore, isn't it?
A fixed focus also guarantees that your "in-focus" rendered 3D model will be placed in the scene alongside real-world objects that are "in focus" too. However, if we're using the Depth API, we can defocus both real-world and virtual objects.
P.S.
In the future ARCore engineers may change the aforementioned behaviour of camera focus.
I have an application that detects objects in dashcam videos using TensorFlow object detection. I am trying to calculate the physical distance (and subsequently the angle) of the detected objects from the camera.
I tried the similar triangles method described in this post:
https://medium.com/geoai/road-feature-detection-geotagging-600ea03f9a8
It works for objects with a height, such as road signs, but how do I calculate the distance of flat objects such as potholes? I tried setting a height of 1 mm, but it is not correct.
When taking an image of a 3D scene we lose depth information in the process. In some cases, we can infer the lost information using various methods such as triangulation, or by using assumptions about the scene like the one you are making (knowing the height of the object whose distance you are trying to calculate).
When inferring the distance of an object that has no height, you will need to use some other information. For example, you can use the width/diameter of the pothole (if you know it) as a replacement for the height, substituting it for h and H in your calculations accordingly.
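A sketch of that substitution (the function name and all numbers are placeholders, not values from your setup):

```swift
// distance = f * W / w, where W is the assumed real-world width of the pothole
// and w is the width of its detection box in pixels.
func distanceToObject(realWidthMeters: Double,
                      boxWidthPixels: Double,
                      focalLengthPixels: Double) -> Double {
    return focalLengthPixels * realWidthMeters / boxWidthPixels
}

// e.g. a 0.5 m wide pothole spanning 80 px with f ≈ 1400 px comes out at ~8.75 m.
let distance = distanceToObject(realWidthMeters: 0.5,
                                boxWidthPixels: 80,
                                focalLengthPixels: 1400)
```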
I use single-camera calibration with a checkerboard, and I used one fixed position of the camera to do the calibration. My question is: if I keep the same position but change the height of the camera, do I need to calibrate again? If not, will I get the same result at a different camera height?
In my case, I changed the height of the camera but kept its position otherwise the same, and I got a different result after changing the height. So I was wondering whether I need to calibrate the camera again or not.
Please help me out.
Generally speaking, and to achieve the greatest accuracy, you will need to recalibrate the camera whenever it is moved. However, if the lens mount is rigid enough with respect to the sensor, you may get away with only updating the extrinsic calibration, especially if your accuracy requirements are modest.
To see why this is the case, notice that, unless you have a laboratory-grade rig holding and moving the camera, you can't change only the height. With a standard tripod, for example, there will in general be motion along all three axes amounting to a significant fraction of the sensor's size, which will show up as visible motion of several pixels with respect to the scene.
Things get worse / more complicated when you also add rotation to re-orient the field of view, since a mechanical mount will not, in general, rotate the camera around its optical center (i.e. the exit pupil of the lens), and therefore every rotation necessarily comes with an additional translation.
In your specific case, since you are only interested in measurements on a plane, and therefore can compute everything using homographies, refining the extrinsic calibration amounts to just recomputing the world-to-image scale. This can easily be achieved by taking one or more images of objects of known size on the plane - a calibration checkerboard is just such an object.
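As a sketch, and only under the assumption above that the viewing geometry relative to the plane is otherwise unchanged so that the overall scale is all that needs refreshing, the update is a single ratio (numbers are placeholders):

```swift
// Refresh the world-to-image scale after moving the camera, using one object
// of known size lying on the measurement plane (a side of the calibration
// checkerboard works).
func metersPerPixel(knownLengthMeters: Double,
                    measuredLengthPixels: Double) -> Double {
    return knownLengthMeters / measuredLengthPixels
}

let scale = metersPerPixel(knownLengthMeters: 0.24,   // e.g. 8 squares × 30 mm
                           measuredLengthPixels: 410) // measured in the new image
let separationMeters = 1230 * scale                   // any pixel distance on the plane
```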
Before ARKit 1.5, we had no way to adjust the focus of the camera, and getting the lens position would always return the same value. With ARKit 1.5, however, we can now use autofocus by setting ARWorldTrackingConfiguration.isAutoFocusEnabled. My question is: is there any way to get the current lens position from ARKit so that I can apply an out-of-focus effect to my virtual objects? I had a look at some classes where this information might be stored, like ARFrame and ARSession, but they don't seem to have such a field.
I've stumbled upon this thread where the OP says he was able to set the lens position by using some private APIs, but this was before the release of ARKit 1.5 and is a sure way to get your app rejected by the App Store.
Are there any legal ways to get the lens position from ARKit?
My guess is: probably not, but there are things you might try.
The intrinsics matrix vended by ARCamera is defined to express focal length in pixel units. But I’m not sure if that’s a measurement you could (together with others like aperture) define a depth blur effect with. Nor whether it changes during autofocus (that part you can test, at least).
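If you want to run that test, a minimal sketch would be to log the pixel focal length each frame and watch whether it moves while autofocus hunts (with isAutoFocusEnabled turned on):

```swift
import ARKit

// Log the pixel focal length from the per-frame intrinsics matrix.
func logFocalLength(for frame: ARFrame) {
    let K = frame.camera.intrinsics   // 3x3 intrinsics matrix
    let fx = K.columns.0.x            // focal length in pixels, x
    let fy = K.columns.1.y            // focal length in pixels, y
    print("fx: \(fx) px, fy: \(fy) px at t = \(frame.timestamp)")
}
```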
The AVCapture APIs underlying ARKit offer a lensPosition indicator, but it's a generic floating-point value: zero is the minimum focus distance, one is the maximum, and with no real-world measurement that this corresponds to, you wouldn't know how much blur to apply (or which physically based camera settings to use in SceneKit or Unity) for each possible lens position.
Even if you could put lensPosition to use, there’s no API for getting the capture device used by an ARSession. You can probably safely assume it’s the back (wide) camera, though.
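If you want to experiment anyway, a sketch under that assumption would be to grab the back wide-angle device yourself and key-value observe its lensPosition; whether that instance reflects the exact device driving the ARSession is a guess, not something the API promises:

```swift
import AVFoundation

// Not a supported path: observe lensPosition on the device ARKit presumably uses.
let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                     for: .video,
                                     position: .back)

let lensObservation = device?.observe(\.lensPosition, options: [.new]) { _, change in
    guard let position = change.newValue else { return }
    // Unitless: 0 is the shortest focus distance, 1 is the furthest.
    print("lens position: \(position)")
}
// Keep `lensObservation` alive for as long as you want updates.
```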