How does Google implement visual positioning services?

VPS (Visual Positioning Service) impressed me a lot. I know about visual positioning methods based on AR markers, but it is hard to do visual positioning in an open environment (without known markers). I guess they may use the sensors in the smartphone to get differences in world coordinates, which may be used in the calculation.
Does anyone know how Google does indoor positioning in an open environment? Thanks.

VPS is in closed beta right now, so finding out the specifics would probably be a breach of non-disclosure agreements.
However, Google Tango's current development stack determines absolute positioning in Euclidean space with three core software/hardware technologies:
Motion Tracking (achieved through a wide-angle "fisheye" monochrome camera in conjunction with the IMU's gyroscopes, magnetometers and accelerometers within the device).
Depth Perception (achieved through a "time of flight" infrared emitter and receiver, which creates a dense point cloud of depth measurements).
Area Learning (achieved through the RGB camera in conjunction with the fisheye and IR sensors, which 'maps' areas in the point cloud and 'remembers' their location).
The main focus here is that Tango isn't just a software stack; it is hardware dependent too. You can only develop Tango software on a Tango-enabled device such as the Lenovo Phab 2 Pro.
You could always sign up for the Tango VPS closed beta and find out more that way?

Related

How do you make object placement realistic when there's a delay finding planes using ARCore?

There's a bit of a delay when detecting planes using ARCore. That's fine, but what do you do when you want to place an object on a plane as the user pans the phone?
With the delay, the object pops into the screen after the plane is detected, rather than appearing as panned, which isn't realistic.
Let's compare two leading AR SDKs
LiDAR scanner in iOS devices for ARKit 4.0
Since the official release of ARKit 3.5 there has been support for the brand-new Light Detection And Ranging (LiDAR) scanner, which considerably reduces the time required for detecting vertical and/or horizontal planes (it operates at nanosecond speed). Apple implemented this sensor on the rear camera of the iPad Pro 2020. The LiDAR scanner (which is basically a direct ToF sensor) gives us an almost instant polygonal mesh of the real-world environment in an AR app, which is suitable for the People/Object Occlusion features, precise ZDepth object placement, and complex collision shapes for dynamics. The working distance of Apple's LiDAR scanner is up to 5 meters. The LiDAR scanner helps you detect planes in poorly lit rooms with no feature points on the walls and floor.
iToF cameras in Android devices for ARCore 1.18
A 3D indirect Time-of-Flight sensor is a sort of scannerless LiDAR. It also surveys the surrounding environment and accurately measures distance. Although LiDARs and iToFs are almost the same thing at their core, the scanner type is more accurate because it uses multiple laser pulses rather than just one large flash pulse. In the Android world, Huawei and Samsung, for instance, include scannerless 3D iToF sensors in their smartphones; the Google Pixel 4 doesn't have an iToF camera. The working distance of an iToF sensor is up to 5 meters and beyond. Let's see what Google says about its brand-new Depth API:
Google's Depth API uses a depth-from-motion algorithm to create depth maps, which you can obtain using the acquireDepthImage() method. This algorithm takes multiple device images from different angles and compares them to estimate the distance to every pixel as a user moves their phone. If the device has an active depth sensor, such as a time-of-flight (iToF) sensor, that data is automatically included in the processed depth. This enhances the existing depth map and enables depth even when the camera is not moving. It also provides better depth on surfaces with few or no features, such as white walls, or in dynamic scenes with moving people or objects.
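As a minimal Kotlin sketch of what consuming that depth map might look like on the app side (assuming an already running ARCore Session; the helper names are mine, and acquireDepthImage() follows the ARCore release quoted above):

```kotlin
import android.media.Image
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session
import com.google.ar.core.exceptions.NotYetAvailableException
import java.nio.ByteOrder

// Enable the Depth API if this device supports it (with or without an iToF sensor).
fun enableDepth(session: Session) {
    val config = session.config
    if (session.isDepthModeSupported(Config.DepthMode.AUTOMATIC)) {
        config.depthMode = Config.DepthMode.AUTOMATIC
    }
    session.configure(config)
}

// Sample the depth map at pixel (x, y); returns millimeters, or null if depth isn't ready yet.
fun depthAtPixelMm(frame: Frame, x: Int, y: Int): Int? {
    val depthImage: Image = try {
        frame.acquireDepthImage()
    } catch (e: NotYetAvailableException) {
        return null   // no depth data in the first frames after session start
    }
    depthImage.use {
        val plane = it.planes[0]
        val byteIndex = x * plane.pixelStride + y * plane.rowStride
        val sample = plane.buffer.order(ByteOrder.LITTLE_ENDIAN).getShort(byteIndex)
        // DEPTH16 format: the low 13 bits hold the range in millimeters, the top 3 bits hold confidence.
        return sample.toInt() and 0x1FFF
    }
}
```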
Recommendations
When you're using an AR app built on ARCore without iToF sensor support, you need to detect planes in a well-lit environment containing rich and unique wall and floor textures (avoid repetitive patterns such as "polka dots"). Also, you may use the Augmented Images feature to quickly get anchors with the help of an image detection algorithm.
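For the Augmented Images route, a rough Kotlin sketch might look like this (assuming a running Session; the image name "poster" and the bitmap source are illustrative placeholders):

```kotlin
import android.graphics.Bitmap
import com.google.ar.core.Anchor
import com.google.ar.core.AugmentedImage
import com.google.ar.core.AugmentedImageDatabase
import com.google.ar.core.Frame
import com.google.ar.core.Session
import com.google.ar.core.TrackingState

// Register a reference image so ARCore can anchor content to it when it is detected.
fun configureAugmentedImages(session: Session, markerBitmap: Bitmap) {
    val database = AugmentedImageDatabase(session)
    database.addImage("poster", markerBitmap)   // "poster" is just an illustrative name
    val config = session.config
    config.augmentedImageDatabase = database
    session.configure(config)
}

// Call once per frame: returns anchors for any reference images ARCore is currently tracking.
fun anchorsFromDetectedImages(frame: Frame): List<Anchor> =
    frame.getUpdatedTrackables(AugmentedImage::class.java)
        .filter { it.trackingState == TrackingState.TRACKING }
        .map { it.createAnchor(it.centerPose) }
```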
Conclusion
Plane detection is a very fast stage if you're using LiDAR or iToF sensors. But on devices without LiDAR or iToF (or when you're using ARKit 3.0 and lower, or ARCore 1.17 and lower), there will be some delay at the plane detection stage.
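One ARCore-side way to mask that delay is the Instant Placement API (ARCore 1.20+), which answers a screen tap immediately with an estimated pose and refines it once real planes or depth data arrive. A minimal Kotlin sketch; the 1.5 m distance guess and the helper names are only illustrative:

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session

// Enable Instant Placement so taps can be answered immediately, before any plane is found.
fun enableInstantPlacement(session: Session) {
    val config = session.config
    config.instantPlacementMode = Config.InstantPlacementMode.LOCAL_Y_UP
    session.configure(config)
}

// Place an object right away at the tapped screen point using a rough distance guess;
// ARCore refines the pose automatically once real plane/depth data becomes available.
fun placeImmediately(frame: Frame, tapX: Float, tapY: Float): Anchor? {
    val approxDistanceMeters = 1.5f   // placeholder guess, tune for your use case
    val hit = frame.hitTestInstantPlacement(tapX, tapY, approxDistanceMeters).firstOrNull()
    return hit?.createAnchor()
}
```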
If you need more details about the LiDAR scanner, read my story on Medium.

ARKit detect house exterior planes

I know that ARKit is able to detect and classify planes on A12+ processors. It does the job reasonably well inside the house, but what about outside? Is it able to detect windows and doors if I move around a house a little? I tried it myself and the result did not satisfy me: I moved around the building a lot and ARKit still did not distinguish a wall from a window.
I used the app from here for tests: https://developer.apple.com/documentation/arkit/tracking_and_visualizing_planes
Am I doing everything correctly? Maybe there is a third-party library that detects house parts better?
Thanks in advance!
When you test the sample app outside and try to use ARKit to detect the surfaces on the exterior of a house, it will not work. ARKit is built to map flat surfaces and their orientations (horizontal/vertical). This means ARKit can understand that a surface is flat and is either a wall or a floor. When you attempt to "map" the exterior of a house, ARKit will only detect the vertical surfaces as walls; it cannot distinguish between walls and windows.
You will need to develop/source an AI model and run it against the camera data using CoreML to enable your app to distinguish between windows and walls on the exterior of a house.
ARKit Plane tracking documentation for reference: https://developer.apple.com/documentation/arkit/tracking_and_visualizing_planes
A couple of articles about ARKit with CoreML:
https://www.rightpoint.com/rplabs/dev/arkit-and-coreml
https://medium.com/s23nyc-tech/using-machine-learning-and-coreml-to-control-arkit-24241c894e3b
[Update]
Yes, you are correct: for A12+ devices Apple does allow plane classification. I would assume the issue with exterior windows vs. interior ones is either the distance to the window (too far for the CV to classify it properly) or that Apple has tuned the classifier more for interior windows than exterior ones. The difference may seem trivial, but to a CV algorithm it's quite different.

What sensors does ARCore use?

What sensors does ARCore use: single camera, dual-camera, IMU, etc. in a compatible phone?
Also, is ARCore dynamic enough to still work if a sensor is not available by switching to a less accurate version of itself?
Updated: May 10, 2022.
About ARCore and ARKit sensors
Google's ARCore, like Apple's ARKit, uses a similar set of sensors to track a real-world environment. ARCore can use a single RGB camera along with an IMU, which is a combination of an accelerometer, a magnetometer and a gyroscope. Your phone runs world tracking at 60 fps, while the Inertial Measurement Unit operates at 1000 Hz. There is also one more sensor that can be used in ARCore – an iToF camera for scene reconstruction (Apple's counterpart is the LiDAR scanner). ARCore 1.25 supports both a Raw Depth API and a Full Depth API.
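As a rough Kotlin sketch of how an app might probe those capabilities on a given phone (the helper names are mine; the method names follow the ARCore 1.25-era Java API):

```kotlin
import com.google.ar.core.Config
import com.google.ar.core.Frame
import com.google.ar.core.Session
import com.google.ar.core.exceptions.NotYetAvailableException

// Full Depth (DepthMode.AUTOMATIC) works even without an iToF camera, via depth-from-motion;
// Raw Depth (DepthMode.RAW_DEPTH_ONLY) exposes unsmoothed measurements plus a confidence image.
fun depthSupportSummary(session: Session): String {
    val fullDepth = session.isDepthModeSupported(Config.DepthMode.AUTOMATIC)
    val rawDepthOnly = session.isDepthModeSupported(Config.DepthMode.RAW_DEPTH_ONLY)
    return "Full Depth supported: $fullDepth, Raw-Depth-only mode supported: $rawDepthOnly"
}

// With a depth mode enabled, the raw (unsmoothed) depth image and its per-pixel confidence
// can be read every frame; both return android.media.Image objects that must be closed.
fun readRawDepth(frame: Frame) {
    try {
        frame.acquireRawDepthImage().use { rawDepth ->
            frame.acquireRawDepthConfidenceImage().use { confidence ->
                // e.g. fuse rawDepth + confidence into a point cloud here
            }
        }
    } catch (e: NotYetAvailableException) {
        // No raw depth yet (first frames, or tracking not yet established).
    }
}
```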
Read what Google says about its COM method, built on Camera + IMU:
Concurrent Odometry and Mapping – An electronic device tracks its motion in an environment while building a three-dimensional visual representation of the environment that is used for fixing a drift in the tracked motion.
Here's Google US15595617 Patent: System and method for concurrent odometry and mapping.
In 2014–2017, Google tended towards a MultiCam + DepthCam config (the Tango project).
In 2018–2020, Google tended towards a SingleCam + IMU config.
In 2021, Google returned to a MultiCam + DepthCam config.
We all know that the biggest problem for Android devices is calibration. iOS devices don't have this issue (because Apple controls its own hardware and software). Low-quality calibration leads to errors in 3D tracking, so all your virtual 3D objects might "float" in a poorly tracked scene. If you use a phone without an iToF sensor, there's no miraculous button to fix bad tracking (and you can't switch to a less accurate version of tracking). The only solution in such a situation is to re-track your scene from scratch. However, tracking quality is much higher when your device is equipped with a ToF camera.
Here are five main rules for good tracking results (if you have no ToF camera):
Track your scene neither too fast nor too slow
Track appropriate surfaces and objects
Use a well-lit environment when tracking
Don't track reflective or refractive objects
Horizontal planes are more reliable than vertical ones
SingleCam config vs MultiCam config
One of the biggest problems of ARCore (and of ARKit too) is energy impact. We understand that the higher the frame rate, the better the tracking results. But the energy impact at 30 fps is HIGH, and at 60 fps it's VERY HIGH. Such an energy impact will quickly drain your smartphone's battery (due to the enormous burden on the CPU/GPU). So just imagine that you use 2 cameras for ARCore: your phone must process 2 image sequences at 60 fps in parallel, process and store feature points and anchors, and at the same time render animated 3D graphics with hi-res textures at 60 fps. That's too much for your CPU/GPU. In such a case, the battery will be dead in 30 minutes and as hot as a boiler. Users don't like that, because it makes for a poor AR experience.

Can ARKit version 1.5 detect vertical surfaces?

If I am not wrong, ARKit does not support vertical surface detection.
But
According to https://developer.apple.com/news/
and
https://developer.apple.com/news/?id=01242018b
iOS 11 is the biggest AR platform in the world, allowing you to create unparalleled augmented reality experiences for hundreds of millions of iOS users. Now you can build even more immersive experiences by taking advantage of the latest features of ARKit, available in iOS 11.3 beta. With improved scene understanding, your app can see and place virtual objects on vertical surfaces, and more accurately map irregularly shaped surfaces.
Does this mean that ARKit 1.5 is able to detect vertical surfaces too?

How to use the VGA camera as an optical sensor?

I am designing an information kiosk which incorporates a mobile phone hidden inside the kiosk.
I wonder whether it would be possible to use the VGA camera of the phone as a sensor to detect when somebody is standing in front of the kiosk.
Which software components (e.g. Java, APIs, Bluetooth stack, etc.) would be required for code that uses the VGA camera for movement detection?
The obvious choice is to use face detection, but you would have to calibrate this to ensure that the detected face is close enough to the kiosk, maybe by using the relative size of the face in the picture. This could be done with the widely used OpenCV library. But as this kiosk would be deployed in places where you have little control over the lighting, there's a good chance of false positives and negatives. Maybe you also want to consider a proximity sensor in combination with face detection.
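A rough Kotlin sketch of that idea using OpenCV's Java bindings (the cascade file path and the 25% face-size threshold are placeholders to tune for the kiosk):

```kotlin
import org.opencv.core.Mat
import org.opencv.core.MatOfRect
import org.opencv.core.Size
import org.opencv.imgproc.Imgproc
import org.opencv.objdetect.CascadeClassifier

// Decide whether somebody is standing close to the kiosk by looking for a
// sufficiently large face in the camera frame (face size approximates distance).
class PresenceDetector(cascadePath: String, private val minFaceToFrameRatio: Double = 0.25) {

    private val faceDetector = CascadeClassifier(cascadePath)  // e.g. haarcascade_frontalface_default.xml

    fun isSomeoneInFront(frameBgr: Mat): Boolean {
        val gray = Mat()
        Imgproc.cvtColor(frameBgr, gray, Imgproc.COLOR_BGR2GRAY)
        Imgproc.equalizeHist(gray, gray)   // helps a little with uneven kiosk lighting

        val faces = MatOfRect()
        faceDetector.detectMultiScale(gray, faces, 1.1, 3, 0, Size(60.0, 60.0), Size())

        // A face is "close enough" if its height is a large fraction of the frame height.
        return faces.toArray().any { it.height.toDouble() / frameBgr.rows() >= minFaceToFrameRatio }
    }
}
```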
The options would vary depending on what platform the information kiosk is using... But assuming there is Linux somewhere underneath, you should take a look at the OpenCV library. And in case it is of any use, here's a link to my fun experiment with a 'nod-controlled interface' for reading long web pages.
And speaking of false positives (or, even worse, false negatives): with bad lighting or an unusual angle the chances are pretty high. So you'd need to complement face detection with some fallback mechanism, like an on-screen 'press here to start' button that is shown by default, and then use an inactivity timeout alongside the face detection so you don't rely on just one input vector.
Another idea (depending on the light conditions) might be to measure the overall amount of light in the picture: natural light should produce only slow changes, while a person walking up to the kiosk would cause a rapid lighting change.
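A tiny Kotlin/OpenCV sketch of that idea (the 15% jump threshold and the adaptation rate are guesses to tune on site):

```kotlin
import kotlin.math.abs
import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.imgproc.Imgproc

// Flags a frame as "somebody just walked up" when mean brightness jumps noticeably
// compared to a slowly adapting baseline (slow drift = natural daylight changes).
class LightChangeDetector(
    private val jumpThreshold: Double = 0.15,  // 15% deviation from baseline, tune on site
    private val adaptRate: Double = 0.02       // how quickly the baseline follows slow changes
) {
    private var baseline: Double = -1.0

    fun isRapidChange(frameBgr: Mat): Boolean {
        val gray = Mat()
        Imgproc.cvtColor(frameBgr, gray, Imgproc.COLOR_BGR2GRAY)
        val brightness = Core.mean(gray).`val`[0]   // average pixel intensity, 0..255

        if (baseline < 0) { baseline = brightness; return false }
        val deviation = abs(brightness - baseline) / baseline.coerceAtLeast(1.0)
        baseline += adaptRate * (brightness - baseline)   // let the baseline drift slowly
        return deviation > jumpThreshold
    }
}
```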
In J2ME (Java for mobile phones), you can use the MMAPI (Mobile Media API) to capture images from the camera.
Most phones support this.
Andrew's suggestion of OpenCV is good. There are a lot of motion detection projects. But I would suggest adding a cheap CMOS camera rather than using the mobile phone's camera.
