What sensors does ARCore use?

What sensors does ARCore use: single camera, dual-camera, IMU, etc. in a compatible phone?
Also, is ARCore dynamic enough to still work if a sensor is not available by switching to a less accurate version of itself?

Updated: May 10, 2022.
About ARCore and ARKit sensors
Google's ARCore, like Apple's ARKit, uses a similar set of sensors to track the real-world environment. ARCore can use a single RGB camera along with an IMU, which is a combination of an accelerometer, a gyroscope, and a magnetometer. Your phone runs world tracking at 60 fps, while the Inertial Measurement Unit operates at 1000 Hz. There is also one more sensor that ARCore can use – an iToF camera for scene reconstruction (Apple's counterpart is the LiDAR scanner). ARCore 1.25 supports the Raw Depth API and the Full Depth API.
Here's what Google says about its COM method, built on camera + IMU:
Concurrent Odometry and Mapping – An electronic device tracks its motion in an environment while building a three-dimensional visual representation of the environment that is used for fixing a drift in the tracked motion.
Here's Google's patent US15595617: System and method for concurrent odometry and mapping.
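The camera + IMU fusion behind COM can be illustrated with a toy 1-D complementary filter: integrate the high-rate gyro between frames, then pull the estimate toward the low-rate camera measurement whenever one arrives. This is a simplified sketch, not ARCore's actual algorithm; the rates and the `alpha` gain are illustrative assumptions.

```python
def fuse_yaw(gyro_rates, cam_yaws, dt=0.001, cam_every=16, alpha=0.98):
    """Toy 1-D complementary filter: 1000 Hz gyro corrected by a ~60 Hz camera."""
    yaw = 0.0
    for i, w in enumerate(gyro_rates):
        yaw += w * dt                                # dead-reckon from the gyro
        if i % cam_every == 0:                       # a camera yaw estimate arrives
            cam = cam_yaws[i // cam_every]
            yaw = alpha * yaw + (1 - alpha) * cam    # pull toward the camera
    return yaw

# A biased gyro alone drifts; the camera correction bounds that drift.
biased_gyro = [0.01] * 1000          # 0.01 rad/s of bias over 1 second
camera_truth = [0.0] * 63            # camera always sees the true yaw = 0
fused_yaw = fuse_yaw(biased_gyro, camera_truth)
pure_gyro_drift = sum(w * 0.001 for w in biased_gyro)   # ~0.01 rad of drift
```

The camera measurement caps the accumulated gyro error instead of letting it grow without bound – which is exactly why a camera-only or IMU-only tracker is worse than the fused pair.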
In 2014–2017 Google tended towards a multi-cam + depth-cam config (Project Tango)
In 2018–2020 Google tended to a single-cam + IMU config
In 2021 Google returned to a multi-cam + depth-cam config
We all know that the biggest problem for Android devices is calibration. iOS devices don't have this issue because Apple controls its own hardware and software. Poor calibration leads to errors in 3D tracking, so all your virtual 3D objects may "float" in a poorly tracked scene. If you use a phone without an iToF sensor, there's no miraculous button against bad tracking (and you can't switch to a less accurate version of tracking). The only solution in such a situation is to re-track your scene from scratch. However, tracking quality is much higher when your device is equipped with a ToF camera.
Here are five main rules for good tracking results (if you have no ToF camera):
Track your scene neither too fast nor too slow
Track appropriate surfaces and objects
Use a well-lit environment when tracking
Don't track reflective or refractive objects
Prefer horizontal planes – they are more reliable than vertical ones
SingleCam config vs MultiCam config
One of the biggest problems of ARCore (and of ARKit too) is energy impact. We understand that the higher the frame rate, the better the tracking results. But the energy impact at 30 fps is HIGH, and at 60 fps it's VERY HIGH. Such an energy impact will quickly drain your smartphone's battery (due to the enormous burden on the CPU/GPU). So, just imagine that you use 2 cameras for ARCore – your phone must process 2 image sequences at 60 fps in parallel, process and store feature points and AR anchors, and at the same time render animated 3D graphics with hi-res textures at 60 fps. That's too much for your CPU/GPU. In such a case, the battery will be dead in 30 minutes and as hot as a boiler. Users don't like that because it's a poor AR experience.

Related

Scene Reconstruction with ARGeoTrackingConfiguration

Is there a way to add Scene Reconstruction using ARGeoTrackingConfiguration in Xcode?
It appears it is only available for ARWorldTrackingConfiguration.
At the moment it's impossible, and it's easy to explain why. The Scene Reconstruction feature is a computationally intensive task (due to the processing of LiDAR data, the number of stored polygons, tracked anchors, and classifications). And that's not counting the fact that you need simultaneous plane detection, raycasting, rendering of PBR shaders, shadows, and physics at 60 fps.
But Geo Tracking is a highly expensive task too. Your gadget must process URLSession data, ML data, IMU data, and GPS data in real time. I suppose we'd need a considerably more powerful iOS device with a capacious battery to run Scene Reconstruction during Geo Tracking.

How do you make object placement realistic when there's a delay finding planes using ARCore?

There's a bit of a delay when detecting planes using ARCore. That's fine, but what do you do when you want to place an object on a plane as the user pans the phone?
With the delay, the object pops into the screen after the plane is detected, rather than appearing as panned, which isn't realistic.
Let's compare the two leading AR SDKs.
LiDAR scanner in iOS devices for ARKit 4.0
Since the official release of ARKit 3.5, there has been support for the brand-new Light Detection And Ranging scanner, which considerably reduces the time required for detecting vertical and/or horizontal planes (it operates at nanosecond speed). Apple implemented this sensor in the rear camera of the iPad Pro 2020. The LiDAR scanner (which is basically direct ToF) gives us an almost instant polygonal mesh of the real-world environment in an AR app, suitable for the People/Objects Occlusion feature, precise ZDepth object placement, and a complex collision shape for dynamics. The working distance of Apple's LiDAR scanner is up to 5 meters. The LiDAR scanner also helps you detect planes in poorly lit rooms with no feature points on walls and floors.
iToF cameras in Android Devices for ARCore 1.18
A 3D indirect Time-of-Flight sensor is a sort of scannerless LiDAR. It also surveys the surrounding environment and accurately measures distance. Although LiDAR and iToF are at their core almost the same thing, a scanner type is more accurate because it uses multiple laser pulses versus just one large flash laser pulse. In the Android world, Huawei and Samsung, for instance, include scannerless 3D iToF sensors in their smartphones; the Google Pixel 4 doesn't have an iToF camera. The working distance of an iToF sensor is up to 5 meters and more. Let's see what Google says about its brand-new Depth API:
Google's Depth API uses a depth-from-motion algorithm to create depth maps, which you can obtain using the acquireDepthImage() method. This algorithm takes multiple device images from different angles and compares them to estimate the distance to every pixel as the user moves their phone. If the device has an active depth sensor, such as a time-of-flight (iToF) sensor, that data is automatically included in the processed depth. This enhances the existing depth map and enables depth even when the camera is not moving. It also provides better depth on surfaces with few or no features, such as white walls, or in dynamic scenes with moving people or objects.
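At its core, depth-from-motion is triangulation: a feature that shifts by d pixels between two views taken with baseline B, through a lens of focal length f (in pixels), sits at depth Z = f·B/d. A minimal sketch of that relation (the numbers are illustrative, and this is not ARCore's implementation):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulate depth: Z = f * B / d – the math behind depth-from-motion."""
    return focal_px * baseline_m / disparity_px

# A feature that shifts 10 px after 10 cm of camera motion, with a 500 px focal length:
z = depth_from_disparity(focal_px=500.0, baseline_m=0.10, disparity_px=10.0)
# → 5.0 m: the smaller the disparity, the farther the point
```

This also explains why depth-from-motion alone fails on textureless walls (no features to match) and on a static camera (zero baseline) – precisely the cases where an active iToF sensor takes over.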
Recommendations
When you're using an AR app built on ARCore without iToF sensor support, you need to detect planes in a well-lit environment containing rich and unique wall and floor textures (you shouldn't track repetitive textures like polka dots). Also, you may use the Augmented Images feature to quickly get ARAnchors with the help of an image detection algorithm.
Conclusion
Plane detection is a very fast stage if you're using a LiDAR or iToF sensor. But on devices without LiDAR or iToF (or when you're using ARKit 3.0 and lower, or ARCore 1.17 and lower), there will be some delay at the plane detection stage.
If you need more details about LiDAR scanner, read my story on Medium.
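As for masking the pop-in itself, a common trick – independent of any particular SDK, and sketched here with a hypothetical easing factor – is to glide the object toward the freshly detected plane over several frames instead of snapping it into place:

```python
def ease_toward(current, target, factor=0.15):
    """Move a 3-D position a fraction of the way toward its target each frame."""
    return tuple(c + (t - c) * factor for c, t in zip(current, target))

# Run a few frames: the object eases onto the plane instead of popping in.
pos, goal = (0.0, 0.0, 0.0), (1.0, 0.0, -2.0)
for _ in range(30):
    pos = ease_toward(pos, goal)
```

After about 30 frames (half a second at 60 fps) the object sits essentially on the plane, and the perceived "pop" is replaced by a short, smooth settle.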

The coordinate system of ARKit unstable

I load a model into the AR environment and add an ARAnchor to stabilize it. When I place the device on a desktop and pick it up later, the model's position is initially unchanged, but it soon flies away – ARKit's coordinate system becomes unstable.
How can I avoid or deal with this situation?
ARKit/RealityKit world tracking system is based on a combination of five sensors:
Rear RGB Camera
LiDAR Scanner
Gyroscope
Accelerometer
Magnetometer
The three latter ones are known as the Inertial Measurement Unit (IMU), which operates at 1000 Hz. But what your RGB camera (running at 60 fps) and LiDAR (also at 60 fps) see is very important too.
Hence, the stability of world tracking greatly depends on the camera image.
Here are some recommendations for high-quality tracking:
Track only well-lit environment (if you don't have LiDAR)
Track only static objects (not moving)
Don't track poor-textured surfaces like white walls (if you don't have LiDAR)
Don't track surfaces with repetitive texture pattern (like Polka Dots)
Don't track mirrors, chrome and glass objects (reflective and refractive)
Move your iPhone slowly when tracking
Don't shake iPhone when tracking
Track as much environment as possible
Track high-contrast objects in environment (if you don't have LiDAR)
If you follow these recommendations, ARKit's coordinate system will be stable.
Also look at the picture in this SO post – it shows a good tracking example and a bad one.

Determine distance from user to screen with raspberry pi sensors

I have a Raspberry Pi connected to a monitor and a camera tracking the user. I would like to know the distance from the user to the screen (or to the camera, if that is better). Preferably, I would like to know the distance from the user's face straight to the screen.
Can I do this with just one camera and OpenCV? What about with two cameras?
Otherwise, should I just use a different sensor like the ultrasonic sensor? Is this sensor appropriate if it's below or on the side of the screen? What type of spread/'field of view' does it have?
You could do this with two cameras, I think, by comparing how far the images are displaced, and using some trigonometry. The math will be non-trivial, however. This sounds like a good application for an ultrasonic sensor. The popular HC-SR04 gives pretty accurate (for my purposes) readings from about 30cm to 2m, provided the object is on-axis. I get some useful measurements for objects up to about 20 degrees off axis, but it's considerably less accurate. You can connect a HC-SR04 to the GPIO pins, but I prefer to use commercial i2c interfaces, because doing the timing in the Pi CPU is a pain. In any event, the HC-SR04 is so cheap that you haven't lost a great deal if you buy one just to experiment with.
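The HC-SR04 math itself is simple: the echo pulse covers the round trip, so halve its duration and multiply by the speed of sound. A minimal sketch of the conversion (GPIO wiring and pin numbers are omitted, since they depend on whether you drive the sensor directly or through an i2c interface):

```python
def pulse_to_distance_cm(echo_seconds, speed_of_sound_cm_s=34300.0):
    """Convert an HC-SR04 echo pulse width to distance; halve for the round trip."""
    return echo_seconds * speed_of_sound_cm_s / 2.0

# A 10 ms echo corresponds to roughly 1.7 m:
d = pulse_to_distance_cm(0.010)   # → 171.5 cm
```

Note that at 34,300 cm/s a 1 ms timing error already means ±17 cm, which is why offloading the timing to an i2c interface board, as suggested above, gives steadier readings than bit-banging the GPIO from the Pi CPU.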

How to use the VGA camera as a optical sensor?

I am designing an information kiosk which incorporates a mobile phone hidden inside the kiosk.
I wonder whether it would be possible to use the VGA camera of the phone as a sensor to detect when somebody is standing in front of the kiosk.
Which SW components (e.g. Java, APIs, bluetooth stack etc) would be required for a code to use the VGA camera for movement detection?
The obvious choice is to use face detection, but you would have to calibrate it to ensure that the detected face is close enough to the kiosk – maybe by using the relative size of the face in the picture. This could be done with the widely used OpenCV library. But since this kiosk would be deployed in places where you'd have little control over the lighting, there's a good chance of false positives and negatives. Maybe you also want to consider a proximity sensor in combination with face detection.
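The relative-face-size idea can be sketched with the pinhole camera model: calibrate an apparent focal length once from a known distance, then invert the model at runtime. The average face width and the pixel measurements below are illustrative assumptions, not values from any library:

```python
def calibrate_focal_px(known_width_cm, known_distance_cm, measured_px):
    """Pinhole model calibration: f = (pixel width * distance) / real width."""
    return measured_px * known_distance_cm / known_width_cm

def distance_cm(known_width_cm, focal_px, measured_px):
    """Invert the model: distance = (real width * f) / pixel width."""
    return known_width_cm * focal_px / measured_px

# One-time calibration: a ~16 cm-wide face measured 160 px wide at 100 cm.
f = calibrate_focal_px(16.0, 100.0, 160.0)
# Later the detector reports an 80 px-wide face: the user is about 2 m away.
d = distance_cm(16.0, f, 80.0)    # → 200.0 cm
```

The `measured_px` value would come from your face detector's bounding box (e.g. OpenCV's Haar cascade output); accuracy is limited by variation in real face widths, so treat the result as a coarse near/far threshold rather than a measurement.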
Depending on what platform the information kiosk is using, the options will vary... But assuming there is Linux somewhere underneath, you should take a look at the OpenCV library. And in case it is of any use, here's a link to my funny experiment to get a 'nod-controlled interface' for reading long web pages.
And speaking of false positives – or even worse, false negatives – in case of bad lighting or an unusual angle, the chances are pretty high. So you'd need to complement this with some fallback mechanism, like an onscreen 'press here to start' button shown by default, and then use an inactivity timeout alongside face detection to avoid having just one information input vector.
Another idea (depending on the light conditions) might be to measure the overall amount of light in the picture: natural light should elicit only slow changes, while a person walking close to the kiosk would cause a rapid lighting change.
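That brightness heuristic is nearly a one-liner with NumPy: compare the mean intensity of consecutive grayscale frames and flag rapid jumps. The threshold here is an assumption you would tune on site:

```python
import numpy as np

def lighting_changed(prev_frame, curr_frame, threshold=10.0):
    """Flag a rapid jump in mean brightness between consecutive grayscale frames."""
    return abs(float(curr_frame.mean()) - float(prev_frame.mean())) > threshold

# A person stepping close typically darkens the frame abruptly:
ambient = np.full((120, 160), 100, dtype=np.uint8)   # steady background frame
shadowed = np.full((120, 160), 60, dtype=np.uint8)   # person blocking the light
```

Slow drift from daylight stays under the threshold frame-to-frame, while an occluding person trips it immediately, making this a cheap pre-filter before running full face detection.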
In J2ME (Java for mobile phones), you can use the MMAPI (Mobile Media API) to capture the camera image. Most phones support this.
Andrew's suggestion on OpenCV is good – there are a lot of motion detection projects. But I would suggest adding a cheap CMOS camera rather than using the mobile phone's camera.
