In which cases is the SLAM technique used? - augmented-reality

I am doing a study in the field of augmented reality, focused in particular on Google's ARCore technology. I would like to know whether the SLAM method is required for model-based tracking. It seems obvious to me that it is not used in this case, but I could not find any article to confirm this.
My second question is similar to the first and concerns the Azure Spatial Anchors technology. This technology is able to recognize a scene that was visualized during a previous session. In a way, Azure Spatial Anchors reminds me a little of model-based tracking, since model-based tracking is able to recognize a 3D object that has been previously recorded. So, in the same way, I was wondering whether using the Azure Spatial Anchors technology requires the use of the SLAM method?

Have a look at Frequently asked questions about Azure Spatial Anchors
Azure Spatial Anchors depends on mixed reality / augmented reality trackers. These trackers perceive the environment with cameras and track the device in 6-degrees-of-freedom (6DoF) as it moves through the space.
Given a 6DoF tracker as a building block, Azure Spatial Anchors allows you to designate certain points of interest in your real environment as "anchor" points. You might, for example, use an anchor to render content at a specific place in the real-world.
When you create an anchor, the client SDK captures environment information around that point and transmits it to the service. If another device looks for the anchor in that same space, similar data transmits to the service. That data is matched against the environment data previously stored. The position of the anchor relative to the device is then sent back for use in the application.
...
For each point in the sparse point cloud, we transmit and store a hash of the visual characteristics of that point. The hash is derived from, but does not contain, any pixel data.
The Microsoft Research Blog discloses that the same type of visual simultaneous localization and mapping (SLAM) algorithms is used in Azure Spatial Anchors: Azure Spatial Anchors: How it works
For further details on the algorithm (under NDA), you can open a tech support ticket.
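To make the FAQ's description a bit more concrete, here is a purely illustrative Python sketch of the idea: hash the visual descriptor of each sparse-cloud point, store the hashes per anchor, and later match a querying device's hashes against the stored set. None of the names (descriptor_hash, AnchorStore, create_anchor, query) correspond to the actual Azure Spatial Anchors SDK or service; the real system sits on top of a 6DoF visual tracker and is far more sophisticated.

```python
import hashlib
import numpy as np

def descriptor_hash(descriptor):
    """Hash of a visual feature descriptor (e.g. a 32-byte binary descriptor).
    The hash is derived from, but does not contain, the underlying pixels."""
    return hashlib.sha256(descriptor.tobytes()).hexdigest()

class AnchorStore:
    """Toy stand-in for the cloud service: stores hashed descriptors per anchor."""

    def __init__(self):
        self._anchors = {}  # anchor_id -> set of descriptor hashes

    def create_anchor(self, anchor_id, descriptors):
        self._anchors[anchor_id] = {descriptor_hash(d) for d in descriptors}

    def query(self, descriptors, min_overlap=0.3):
        """Return the best-matching anchor if enough hashes overlap."""
        query_hashes = {descriptor_hash(d) for d in descriptors}
        best_id, best_score = None, 0.0
        for anchor_id, stored in self._anchors.items():
            score = len(stored & query_hashes) / max(len(stored), 1)
            if score > best_score:
                best_id, best_score = anchor_id, score
        return (best_id, best_score) if best_score >= min_overlap else (None, best_score)

# In a real system the descriptors would come from the device's visual tracker.
store = AnchorStore()
store.create_anchor("kitchen_table",
                    np.random.randint(0, 256, (500, 32), dtype=np.uint8))
```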

Related

Inquiry about the Drake simulator's automatic differentiation

I have a question regarding the Drake simulator's automatic differentiation abilities. I have a paper coming out in a few months, and some of the feedback was that I didn't comment enough on automatic differentiation.
I am familiar with automatic differentiation but am unclear on how exactly it works with physics simulators. As far as I'm aware, once you have constructed the graph, you can query it several times with a forward pass and calculate the partial derivatives of outputs with respect to inputs. In my head, querying such a graph should be computationally quick.
In the Drake simulator, once I load a scene, let's say a robot arm with a single free body (like a cube or cylinder), does it create a graph that you can query regardless of the state of the system? Or does the graph need to be reconstructed depending on the system state? For instance, would the same graph work both when the arm is in contact with the free body and when it is doing free-space motion?
There is this paper (https://arxiv.org/pdf/2202.13986.pdf) where they use Drake for contact-based manipulation tasks in Python. Their optimization takes significant time, and they claim it is down to Drake's automatic differentiation scheme. The only way I can see the derivatives over their trajectories taking so long is if a new graph needs to be constructed at each time step.
Is anyone from the Drake team able to comment on this? Or maybe even link me to a useful document on how Drake's automatic differentiation works? I have been unsuccessful in finding this information myself so far.
Drake uses Eigen's AutoDiffScalar instead of double to obtain derivatives from the same code we use for computation. That method does not build a graph at all but rather performs rote propagation of the chain rule through the computation, ending up finally with both the result and the partial derivatives of that result with respect to any chosen variables.
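To illustrate the scalar-type approach described above (as opposed to a taped graph), here is a minimal forward-mode sketch in Python with a hand-rolled dual number. This is not Drake's or Eigen's code, just the same chain-rule-propagation idea: swap the numeric type and run the ordinary computation.

```python
class Dual:
    """Minimal forward-mode AD scalar: carries a value and one derivative."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule propagated alongside the value; no graph is stored.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def f(x):
    # Any ordinary numeric code works unchanged once x is a Dual.
    return 3.0 * x * x + 2.0 * x + 1.0

x = Dual(2.0, deriv=1.0)   # seed dx/dx = 1
y = f(x)
print(y.value, y.deriv)    # 17.0 and df/dx = 6x + 2 = 14.0
```

Here each scalar carries a single derivative, whereas Eigen's AutoDiffScalar carries a vector of partials at once. Because nothing is recorded, there is no graph to rebuild between states; the derivatives simply follow whichever code path (contact or free-space) the current state takes.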

Creating Meshes from Pointclouds of Urban Scenes

I want to create high-fidelity meshes of urban street scenes using point cloud data. The data consists of point clouds captured with an HDL-64E, and the scenes are very similar to those in the KITTI dataset.
Currently I'm only able to use the 'raw' point clouds and the odometry of the car. Previous work already implemented the LeGO-LOAM algorithm to create a monolithic map and better odometry estimates.
Available Data:
Point clouds at 10 Hz
Odometry estimates at a higher frequency (LOAM output)
A monolithic map of the scene (LOAM output, ~1,500,000 points)
I already did some research and came to the conclusion that I can either
use the monolithic map with algorithms like Poisson reconstruction, Advancing Front, etc. (using CGAL); a sketch of this route follows below, or
go the robotics way and use packages like Voxgraph (which uses Marching Cubes internally).
As we might want to integrate image data at a later step, the second option would be preferred.
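For the first route, a minimal sketch using Open3D (as a readily scriptable alternative to CGAL; the file name and parameters such as the normal-search radius and Poisson depth are placeholders that would need tuning for an HDL-64E map) could look like this:

```python
import numpy as np
import open3d as o3d

# Load the monolithic LOAM map (path is a placeholder).
pcd = o3d.io.read_point_cloud("monolithic_map.pcd")

# Poisson reconstruction needs consistent normals; estimate and orient them.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)

# Higher depth preserves finer features (curbs, sign posts) but costs memory.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)

# Optionally trim low-density triangles that Poisson hallucinates in sparse areas.
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

o3d.io.write_triangle_mesh("urban_scene_mesh.ply", mesh)
```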
Questions:
Is there a State-of-the-Art way to go?
Is it possible to get a mesh that preserves small features like curbs and sign posts? (I know there might be a feasible limit on how fine the mesh can be.)
I am very interested in feedback and a discussion on how to tackle this problem 'the right way'.
Thank you in advance for your suggestions/answers!

ROS Human-Robot mapping (Baxter)

I'm having some difficulty understanding the concept of teleoperation in ROS, so I'm hoping someone can clear some things up.
I am trying to control a Baxter robot (in simulation) using an HTC Vive device. I have a node (publisher) which successfully extracts PoseStamped data (containing pose data referenced to the lighthouse base stations) from the controllers and publishes it on separate topics for the right and left controllers.
So now I wish to create the subscribers which receive the pose data from the controllers and convert it to a pose for the robot. What I'm confused about is the mapping... after reading documentation on Baxter and robotics transformations, I don't really understand how to map human poses to Baxter.
I know I need to use IK services, which essentially calculate the coordinates required to achieve a pose (given the desired location of the end effector). But it isn't as simple as just plugging the PoseStamped data from the node publishing controller data into the ik_service, right?
Human and robot anatomy are quite different, so I'm not sure if I'm missing a vital step here.
Looking at other people's example code attempting the same thing, I see that some have created a 'base'/'human' pose which hard-codes coordinates for the limbs to mimic a human. Is this essentially what I need?
Sorry if my question is quite broad but I've been having trouble finding an explanation that I understand... Any insight is very much appreciated!
You might find my former student's work on motion mapping using a Kinect sensor with a PR2 informative. It shows two methods:
Direct joint angle mapping (e.g., if the human's arm is bent at a right angle, then the robot's arm should also be at a right angle).
An IK method that controls the robot's end effector based on the human's hand position.
I know I need to use IK services which essentially calculate the
co-ordinates required to achieve a pose (given the desired location of
the end effector). But it isn't as simple as just plugging in the
PoseStamped data from the node publishing controller data to the
ik_service right?
Yes, indeed, this is a fairly involved process! In both cases, we took advantage of the Kinect's API to access the human's joint angle values and the position of the hand. You can read about how Microsoft Research implemented the human skeleton tracking algorithm here:
https://www.microsoft.com/en-us/research/publication/real-time-human-pose-recognition-in-parts-from-a-single-depth-image/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D145347
I am not familiar with the Vive device. You should see if it offers a similar API for accessing skeleton tracking information, since reverse-engineering Microsoft's algorithm would be challenging.
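To make the IK route concrete, here is a rough rospy sketch of the pipeline: subscribe to the controller pose, send it to Baxter's IK service, and command the resulting joint angles. The topic name /vive/right_controller_pose is hypothetical, and in practice you would first transform the Vive pose from the lighthouse frame into Baxter's base frame (e.g. with tf) and likely scale/offset it to the robot's workspace before calling IK.

```python
#!/usr/bin/env python
import rospy
import baxter_interface
from geometry_msgs.msg import PoseStamped
from baxter_core_msgs.srv import SolvePositionIK, SolvePositionIKRequest

LIMB = 'right'

def controller_callback(pose_msg):
    # NOTE: pose_msg is still in the Vive/lighthouse frame here; a real system
    # would transform it into Baxter's 'base' frame and rescale it first.
    ik_request = SolvePositionIKRequest()
    ik_request.pose_stamp.append(pose_msg)
    try:
        response = ik_service(ik_request)
    except rospy.ServiceException as exc:
        rospy.logwarn("IK service call failed: %s", exc)
        return
    if response.isValid[0]:
        joint_command = dict(zip(response.joints[0].name,
                                 response.joints[0].position))
        limb.set_joint_positions(joint_command)  # stream positions for teleop
    else:
        rospy.logwarn("No valid IK solution for this controller pose")

rospy.init_node('vive_to_baxter_teleop')
service_name = '/ExternalTools/' + LIMB + '/PositionKinematicsNode/IKService'
rospy.wait_for_service(service_name)
ik_service = rospy.ServiceProxy(service_name, SolvePositionIK)
limb = baxter_interface.Limb(LIMB)
# Hypothetical topic published by your Vive node.
rospy.Subscriber('/vive/right_controller_pose', PoseStamped, controller_callback)
rospy.spin()
```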

Is ARCore object recognition possible?

My goal is to overlay material/texture on a physical object (it would be an architectural model) that I would have an identical 3d model of. The model would be static (on a table if that helps), but I obviously want to look at the object from any side. The footprint area of my physical models would tend to be no smaller than 15x15cm and could be as large as 2-3m^2, but I would be willing to change the size of the model to work with ARCore's capability.
I know ARCore is mainly designed to anchor digital objects to flat horizontal planes. My main question is: in its current state, is it capable of accomplishing my end goal? If I have this right, it would record physical point cloud data and attempt to match it to the point cloud data of my digital model, then overlay the two on the phone screen?
If that really isn't what ARCore is for, is there an alternative that I should be focusing on? In my head this sounded fairly straightforward, but I'm sure I'll get way out of my depth if I go about it in an inefficient way. Speaking of depth, I would prefer not to use a depth sensor, since my target devices are phones.
I most definitely hope that it will be possible in the future - after all, an AR toolkit without computer vision is not that helpful.
Unfortunately, according to the ARCore employee Ian, this is currently not directly supported but you could try to access the pixels via glReadPixels and then use OpenCV with these image bytes.
Quote from Ian:
I can't speak to future plans, but I agree that it's a desirable
capability. Unfortunately, my understanding is that current Android
platform limitations prevent providing a single buffer that can be
used as both a GPU texture and CPU-accessible image, so care must be
taken in providing that capability.
Updated: 25 September, 2022.
At the moment there is still no 3D object recognition API in ARCore 1.33.
But you can use the ML Kit framework and the Augmented Images API (ARCore 1.2+) for some tasks.
According to Google's documentation, you can use ARCore as input for machine learning models.
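As a rough illustration of the "grab the camera image and hand it to OpenCV" route, here is a desktop Python sketch that matches ORB features between a reference image of the model and a camera frame. On Android you would do the equivalent with OpenCV's Java/NDK bindings on the bytes read back via glReadPixels; the file names here are placeholders.

```python
import cv2

# Placeholder inputs: a photo of the physical model and a grabbed camera frame.
reference = cv2.imread('model_reference.jpg', cv2.IMREAD_GRAYSCALE)
frame = cv2.imread('camera_frame.jpg', cv2.IMREAD_GRAYSCALE)

# Detect and describe keypoints with ORB (binary descriptors, fast on mobile).
orb = cv2.ORB_create(nfeatures=2000)
kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_frame, des_frame = orb.detectAndCompute(frame, None)

# Match descriptors and keep only clearly-best matches (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_ref, des_frame, k=2)
good = [m[0] for m in matches
        if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

print("Good matches:", len(good))
# With enough matches you could estimate a homography or pose for the overlay,
# e.g. via cv2.findHomography on the matched keypoint coordinates.
```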

What is an augmented reality mobile application?

I've heard the term "augmented reality" used before, but what does it mean?
In particular, what is an augmented reality iPhone application?
From: http://en.wikipedia.org/wiki/Augmented_reality
Augmented reality (AR) is a term for a live direct or indirect view of a physical, real-world environment whose elements are augmented by virtual computer-generated sensory input, such as sound or graphics. It is related to a more general concept called mediated reality, in which a view of reality is modified (possibly even diminished rather than augmented) by a computer. As a result, the technology functions by enhancing one's current perception of reality.

In the case of Augmented Reality, the augmentation is conventionally in real-time and in semantic context with environmental elements, such as sports scores on TV during a match. With the help of advanced AR technology (e.g. adding computer vision and object recognition) the information about the surrounding real world of the user becomes interactive and digitally usable. Artificial information about the environment and the objects in it can be stored and retrieved as an information layer on top of the real world view. The term augmented reality is believed to have been coined in 1990 by Thomas Caudell, an employee of Boeing at the time.
Incidentally, there are some images at the above URL that should make what's being discussed above fairly evident.
An augmented reality application is software that adds (augments) data or visuals to what you see through your device's camera.
Popular examples include Snapchat filters, Yelp Monocle, and various map applications.
"Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are "augmented" by computer-generated or extracted real-world sensory input such as sound, video, graphics or GPS data. It is related to a more general concept called computer-mediated reality, in which a view of reality is modified (possibly even diminished rather than augmented) by a computer. Augmented reality enhances one’s current perception of reality, whereas in contrast, virtual reality replaces the real world with a simulated one.1 Augmentation techniques are typically performed in real time and in semantic context with environmental elements, such as overlaying supplemental information like scores over a live video feed of a sporting event." source: wikipedia.org
