Contact mechanics in Drake

I have a general question regarding the accuracy of Drake's contact mechanics. So far I have tried several different open-source robotics simulation tools. They all appear to have the same problem when simulating contact between two meshed objects: the objects are unstable and fall off each other. E.g., in Gazebo I tried stacking two meshed objects (see https://youtu.be/_4qQh3pvAZ8) without success.
I am trying to learn assembly tasks using reinforcement learning. RL needs a lot of iterations (simulations) before it converges to a valid policy. Since RL needs to learn in a reasonable time, it is not possible to increase the accuracy by reducing the step size, because that increases the computation time too much. In the end the only solution was to go to Adams, an (expensive) multibody dynamics software package, which gives more freedom to tune the contact between two specific objects. I also tried the simulator Klampt, where the contact is more accurate, but it adds a layer of padding around each object.
Today I came across Drake and saw that in the videos the contact mechanics look really accurate. But most of the objects seem to be non-meshed objects (blocks and cylinders), whose behavior is easier to approximate. So I am wondering: does Drake also exhibit inaccurate behavior, like in the video above, with meshed objects stacked on top of each other? And is the simulation speed roughly real-time?

I can't comment specifically on what may be causing your instability in other applications (e.g., Gazebo), but I can shed some light on contact stability in Drake.
Drake's default contact model is the very commonly implemented "point contact" model (discussed here). Given two bodies in contact, the intersection between their collision geometries is detected and the penetration is reported as a pair of points representing the deepest penetration; the contact force is applied at that single contact point.
For a sphere on a plane, this is perfectly sufficient, because the contact between a rigid sphere and plane is a single point. For stacking boxes, it is a poor approximation; the contact interface between two stacked boxes isn't a point, but a polygon where force would be applied across the full contact interface. Representing it as a point introduces artificial torques.
Drake has an additional contact model, one that is currently in development and incomplete. It is called "hydroelastic" contact: instead of representing contact by a measurement at a single point, it computes an entire contact surface and distributes the contact force over that full surface. As you might imagine, this leads to far more stable contact. Because the model is not complete, there are restrictions on how you can use it and on when it will actually provide value. Still, the feature is available in Drake's public API and you are free to investigate it. A basic explanation of the model can be found here.
Some further thoughts based on the details above:
General, non-convex meshes.
For the point contact model, non-convex meshes are not directly supported. Instead, it uses the implicit convex hull of that mesh (which can negatively impact performance). If you know the mesh to be convex, you can declare it as such and Drake can use techniques which may improve the efficiency.
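If it helps, here is a rough pydrake sketch of that declaration (hedged: the class names are the ones I know from the pydrake.geometry module, and "part.obj" is just a placeholder mesh path). A mesh declared with Convex is treated as convex directly, while Mesh goes through the implicit convex hull for point contact.

from pydrake.geometry import Convex, Mesh

# Point contact will use the implicit convex hull of this mesh.
mesh_shape = Mesh("part.obj")
# If you know the mesh is convex, declare it as such instead.
convex_shape = Convex("part.obj")
# Either shape can then be registered as collision geometry, e.g. via
# MultibodyPlant.RegisterCollisionGeometry().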
The hydroelastic contact model does support non-convex meshes, but only for strictly "rigid" objects. And that model only computes contact between a soft and a rigid object. So, if you're hoping to compute contact between two non-convex meshes, you won't be able to use this model. In its current state, you need to model things as contact between rigid meshes and soft primitives (e.g., you can create a "soft" box to serve as a table and place any number of rigid, non-convex meshes on it stably, but contact between those rigid objects will not be stable).
Tricks for stable contact
One "trick" for getting better stability in contact with a point contact model is to change the representation of the collision geometry. Place small spheres on the surface of the object. The goal is to ensure that when it is in contact with other objects, you will get contact at multiple points. The challenge here is placing the spheres such that meaningful contact will generally get you at least three results.

Related

How to track Fast Moving Objects?

I'm trying to create an application that will be able to track rapidly moving objects in a video/camera feed, but I have not found any CV/DL solution that is good enough. Can you recommend a computer vision solution for tracking fast-moving objects on a regular laptop with a webcam? A demo app would be ideal.
For example, see this video where the tracking is done in hardware (I'm looking for a software solution): https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem and the origin uncertainty problem. The first one refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association (which measurement corresponds to which track; the literature is filled with principled ways in which this issue can be approached).
Before you can come up with a solution to your problem you will have to answer some questions yourself regarding the tracking problem you want to solve. For example: what are the values that you want to track (this will define your state vector), how are those values related to one another, are you trying to perform single-object or multiple-object tracking, how do the objects move (do they have a relatively constant acceleration or velocity?), do they make turns, can they be occluded, and so on.
The Kalman filter is a good solution for predicting the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so-called Deep Kalman Filter, which essentially does the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Solutions that deal with non-linear process or measurement models include the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF).
Now, regarding fast-moving objects: one idea is to use a larger (process noise) covariance matrix, since fast objects can move a lot more between frames, so the search space for the correct association has to be a bit larger. Additionally, you can use multiple motion models in case the motion cannot be captured by a single model.
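To make the covariance discussion concrete, here is a minimal constant-velocity Kalman filter sketch in plain NumPy (all matrices and noise values are illustrative, not tuned for any particular camera): the state is [x, y, vx, vy] in pixels, and the measurement is a noisy (x, y) detection once per frame. Inflating Q is the "larger covariance" idea for fast or agile targets.

import numpy as np

dt = 1.0 / 30.0                      # frame period, assuming 30 fps
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])        # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])         # only position is observed
Q = np.eye(4) * 1e-2                 # process noise; inflate for fast targets
R = np.eye(2) * 5.0                  # measurement noise (pixels^2)

x = np.zeros(4)                      # initial state estimate
P = np.eye(4) * 100.0                # initial uncertainty

def kf_step(x, P, z):
    """One predict/update cycle for measurement z = [u, v]."""
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update (data association is assumed already solved for this track).
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x_pred + K @ y, (np.eye(4) - K @ H) @ P_pred

# Feed a few noisy detections of an object moving to the right.
for t in range(5):
    z = np.array([10.0 * t + np.random.randn(), 5.0 + np.random.randn()])
    x, P = kf_step(x, P, z)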
Regarding occlusions, I will leave you this Stack Overflow thread, where I have given an answer covering occlusion handling for tracking in more detail, with some references for you to read. You will have to provide more details in your question if you would like more specific suggestions regarding a solution (for example, you should define "fast-moving" with respect to the camera frame rate).
I personally do not think there is a silver-bullet solution for the tracking problem; I prefer to tailor a solution to the specific problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would also be helpful to know more about your situation, as the performance of the chosen method pretty much depends on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets, or just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, specifically a discrete implementation, since we're working with discrete frames of information. This method is purely based on physics, and its predictions are very sensitive to noise. Depending on your application, the error rate could be acceptable… or not.
A more robust solution is the Kalman filter, and it is pretty much the go-to answer when tracking is needed. Its prediction is based on all the measurements obtained so far during the model's lifetime. It mainly assumes constant motion parameters (velocity or acceleration), although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.

Is deep learning the only way to detect humans in a picture?

I'm looking for a way to detect humans in a picture. For instance, regarding the picture below, I'd like to coarsely determine how many people are in the scene. I must be able to detect both standing and sitting people. I do not mind not detecting people located behind a physical object (such as the glass in the bus picture).
AFAIK, such a problem can rather easily be solved by training deep neural networks. However, my coworkers would like me to also implement a detection technique based on general image processing techniques. I've spent several days looking for techniques designed by researchers, but I couldn't find anything other than saliency-based techniques (which may be fine, but I'd like to test several approaches based on old-fashioned image processing).
I'd like to mention that I'm not new to the topic of image segmentation: I used to segment aortas in medical scans. However, that task was easier IMHO since scanners produce images with similar characteristics, whereas in this use case (human detection in a bus, for instance) the pictures will have very different characteristics (e.g. image contrast can vary strongly depending on whether the picture was taken during the day or at night).
Long story short, I'd like to know whether there's a segmentation technique for human detection that would be worth giving a shot, given that the image characteristics vary a lot.
Is deep learning the only way to detect humans in a picture?
No. Is it the best way we know? Depends on your conditions.
The simplest way to do detection is to generate lots of random bounding boxes and then solve a classification problem on each crop. Here is some Pythonic pseudo-code:
def detect_people(image):
    """
    Find all people in image.

    Parameters
    ----------
    image : image object

    Returns
    -------
    people : list of axis-aligned bounding boxes (aabb)
        Each bounding box contains a person
    """
    people = []
    for aabb in generate_random_aabb(image):   # placeholder box generator
        crop = crop_image(image, aabb)         # placeholder cropping helper
        if is_person(crop):                    # any person / no-person classifier
            people.append(aabb)  # keep the box, as promised by the docstring
    return people
In this case is_person can be any classifier, e.g. boosted decision stumps as used in the Viola–Jones object detection framework. Speaking of which: that would likely be the way to go without DL, but it is much more complicated to explain.
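As a concrete non-deep-learning baseline (my addition, not something the answer above relies on), OpenCV ships a HOG + linear SVM pedestrian detector that plays the role of the box generator and is_person at once. Note it works best for standing, upright people; "people.jpg" is a placeholder image path.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("people.jpg")                 # placeholder image path
rects, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)
print(f"Detected {len(rects)} people")
for (x, y, w, h) in rects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)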
Object Detection vs Segmentation
Your question mixes both. Object detection gives you bounding boxes (coarse) for instances. Semantic segmentation labels all pixels by classes, but does not distinguish different instances of the same class (e.g. different people). Instance segmentation is like object detection, but is fine-grained and aims for pixel-exact results.
If you are interested in segmentation, I can recommend my paper: A Survey of Semantic Segmentation

Real Time Camera/Image Based Large-Scale Slam Algorithm

I want to use an already implemented SLAM algorithm for mapping my college campus.
I have found some algorithms on OpenSLAM.org and some other independent ones such as LSD-SLAM and Hector SLAM, which show some promise, but they have limitations: some require LIDAR, others don't scale to large datasets, etc.
SLAM has been an active research topic for many years and some groups have even mapped an entire town. Can someone point me to such an efficient algorithm?
My requirements are:
It must use RGB camera/cameras.
Preferably produce (somewhat) dense map of area.
It should be able to map a large area (I have seen some algorithms which can only map up to a desk or a room; they usually lose track if there is a jerk in the camera motion (observed in LSD-SLAM) or keep very few landmarks, which is only useful for study purposes).
Preferably a ROS implementation.

Feature combination/joint features in supervised learning

While trying to come up with appropriate features for a supervised learning problem I had the following idea and wondered if it makes sense and if so, how to algorithmically formulate it.
In an image I want to classify two regions, i.e. two "types" of pixels. Say I have some bounded structure, let's take a circle, and I know I can limit my search space to this circle. Within that circle I want to find a segmenting contour, i.e. a contour that separates my pixels into an inner class A and an outer class B.
I want to implement the following model:
I know that pixels close to the bounding circle are more likely to be in the outer class B.
Of course, I can use the distance from the bounding circle as a feature, then the algorithm would learn the average distance of the inner contour from the bounding circle.
But: I wonder if I can exploit my model assumption in a smarter way. One heuristic idea would be to weight other features by this distance; so to speak, if a pixel further away from the bounding circle wants to belong to the outer class B, its other features have to be strongly convincing.
This leads to a general question:
How can one exploit joint information of features, that were prior individually learned by the algorithm?
And to a specific question:
In my outlined setup, does my heuristic idea make sense? At what point of the algorithm should this information be used? What would be recommended literature or what would be buzzwords if I wanted to search for similar ideas in the literature?
This leads to a general question:
How can one exploit joint information of features, that were prior individually learned by the algorithm?
It is not really clear what you are asking here. What do you mean by "individually learned by the algorithm", and what would be "joint information"? First of all, the problem is too broad: there is no such thing as a "generic supervised learning model"; each of them works in an at least slightly different way, most falling into three classes:
Building a regression model of some kind to map input data to the output and then aggregating the results for classification (linear regression, artificial neural networks)
Building a geometrical separation of the data (like support vector machines, classification SOMs, etc.)
Directly (more or less) estimating the probability of the given classes (like naive Bayes, classification restricted Boltzmann machines, etc.)
In each of them, "joint information" regarding the features is somehow encoded: the classification function is their joint information. In some cases it is easy to interpret (linear regression), and in some it is almost impossible (deep Boltzmann machines, and generally all deep architectures).
And to a specific question:
In my outlined setup, does my heuristic idea make sense? At what point of the algorithm should this information be used? What would be recommended literature or what would be buzzwords if I wanted to search for similar ideas in the literature?
To the best of my knowledge this concept is quite doubtful. Many models tend to learn and work better if your features are uncorrelated, while you are trying to do the opposite: correlate everything with one particular feature. This leads to one main concern: why are you doing this? To force the model to rely mainly on this feature?
If it is so important, maybe supervised learning is not the right tool; maybe you can model your problem directly by applying a set of simple rules based on this particular feature?
If you know the feature is important, but you are aware that in some cases other things matter and you cannot model them, then your problem becomes how much to weight your feature. Should it be just distance*other_feature? Why not sqrt(distance)*feature? What about log(distance)*feature? There are countless possibilities, and the search for the best weighting scheme may be much more costly than finding a better machine learning model that can learn from the raw features.
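One alternative to hand-picking a weighting scheme is to hand the model both the raw features and their pairwise products and let it learn the coupling itself. A minimal scikit-learn sketch (the data here is synthetic and the feature names are hypothetical):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
distance = rng.uniform(0, 1, size=(1000, 1))   # distance to the bounding circle
other = rng.normal(size=(1000, 3))             # any other per-pixel features
X = np.hstack([distance, other])
y = (distance[:, 0] + 0.5 * other[:, 0] > 0.8).astype(int)   # toy labels

# interaction_only=True adds products such as distance * feature_i, so the
# classifier can learn how strongly to couple them instead of us fixing it.
model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1000))
model.fit(X, y)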
If you only suspect that the feature is important, the best option would be to... not trust this belief. Numerous studies have shown that machine learning models are better at selecting features than humans. In fact, this is the whole point of non-linear models.
In the literature, the problem you are trying to solve is generally referred to as incorporating expert knowledge into the learning process. There are thousands of examples where there is some kind of knowledge that cannot be directly encoded in the data representation, yet is too valuable to omit. You should research terms like "machine learning expert knowledge" and its possible synonyms.
There's a fair amount of work treating the kind of problem you're looking at (which is called segmentation) as an optimisation to be performed on a Markov Random Field, which can be solved by graph theoretic methods like GraphCut. Some examples are the work of Pushmeet Kohli at Microsoft Research (try this paper).
What you describe is, in that framework, a prior on node membership, where p(B) is inversely proportional to the distance from the edge (in addition to any other connectivity constraints you want to impose; there's normally a connectedness constraint, and there will certainly be a likelihood term for the pixel's intensity). The advantage of doing this is that if you can express everything as a probability model, you don't need to rely on heuristics and you can use standard mechanisms for performing inference.
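A small NumPy sketch of that idea, covering only the unary (per-pixel) term (the prior shape, the Gaussian intensity likelihood, and all parameters are assumptions for illustration; pairwise terms and the actual graph-cut inference would come from a separate solver):

import numpy as np

def unary_costs(dist_to_circle, intensity, mu_A, mu_B, sigma, lam=1.0):
    """Per-pixel costs (negative log-probabilities) for labels A (inner) and B (outer)."""
    # Prior: p(B) is high near the bounding circle and decays with distance.
    p_B = np.clip(np.exp(-dist_to_circle), 1e-6, 1 - 1e-6)
    prior_A = -np.log(1.0 - p_B)
    prior_B = -np.log(p_B)
    # Likelihood: simple Gaussian intensity models for the two classes.
    lik_A = 0.5 * ((intensity - mu_A) / sigma) ** 2
    lik_B = 0.5 * ((intensity - mu_B) / sigma) ** 2
    return lik_A + lam * prior_A, lik_B + lam * prior_B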
The downside is you need a fairly strong mathematical background to attempt this; I don't know what the scale of the project you're proposing is, but if you want results quickly or you're lacking the necessary background this is going to be pretty daunting.

Is there a difference in printing quality between polys vs. NURBS for Maya 3D models?

I've been reading a lot about the many differences, pros and cons between NURBS and polys, but is there a difference when it comes to 3D printing?
The printed model is typically polygonized before printing; it's easier to do things like watertightness checks using triangle meshes. A NURBS model can be polygonized at various resolutions, so it should be possible to get a higher quality, smoother-looking print by starting with a NURBS model and using a very generous tessellation at printing time. The tessellation may not always produce a watertight mesh; depending on the software used to do the printing, that might cause problems which need to be fixed up by hand.
So, overall, the main advantage of a NURBS model in this context is that you can work with a more efficient, lightweight representation of the data up until it's time to print: the final printed mesh may be impractically dense for most ordinary applications (millions of triangles).
To add to theodox's answer: the other reason is that CAD/CAE applications do not really like polygon models and treat them as second-class citizens at best. So if you need to do some analysis on the model, do some extra operations, or send it to an engineer, the NURBS model is MUCH better. For the engineer it allows optimizing production paths, so if they are using high-end printers or CNC machines instead, it lets them do a much better job. If you do not use a NURBS model, the engineer will most likely just reverse-engineer your model and throw your data away.
Maya, on the other hand, is not a very conducive application for engineering. But as an upside you can just use subdivision surfaces and get both a NURBS model and the benefits of polygon modeling.
PS: For an engineering application, making the model watertight is no problem whatsoever if your gaps are not too big.
It depends what you mean by polys. Most of the time, what people mean is that you model a poly mesh and then smooth it (by hitting '3' or turning it into a subd).
If you're doing that, NURBS have absolutely no advantage over subds for 3D printing in terms of smoothness.
NURBS surfaces created with Class A technical surfacing may be considered "airtight" surface meshes. B-spline mathematical surfaces include physical dynamic compression/tension surface characteristics as "structurally loaded" systems model architectures. G-code file formats now apply B-spline data in vector-based tool-path manufacturing. Raster-based polygon smoothing is contrary to accurate modeling of functional engineered prototypes with zero-tolerance accuracy; the smooth function produces an unpredictable mesh that is an undefinable approximation. Professional 3D print solutions employ NURBS geometry G-code directly and do NOT create polygon-tessellated meshes as seen in common STL file formats. The future of 3D modeling and additive manufacturing is clearly vector-based B-spline NURBS surface product architecture.
