Creating Meshes from Pointclouds of Urban Scenes - ros

I want to create high-fidelity meshes of urban street scenes from point cloud data. The point clouds were captured with an HDL-64E, and the scenes are very similar to those in the KITTI dataset.
Currently I'm only able to use the 'raw' point clouds and the odometry of the car. Previous work already implemented the LeGO-LOAM algorithm to create a monolithic map and better odometry estimates.
Available Data:
Point clouds at 10 Hz
Odometry estimates at a higher frequency (LOAM output)
Monolithic map of the scene (LOAM output, ~1,500,000 points)
I have already done some research and concluded that I can either:
use the monolithic map with algorithms like Poisson reconstruction, advancing front, etc. (using CGAL; a rough sketch of this is below), or
go the robotics way and use packages like Voxgraph (which uses Marching Cubes internally).
As we might want to integrate image data at a later stage, the second option would be preferred.
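For reference, here is a rough sketch of what the first option might look like using Open3D as a stand-in for CGAL (the file name and parameters are placeholders I have not tuned on the real map):

    # Rough sketch: Poisson reconstruction of the monolithic LOAM map.
    # Open3D is used here instead of CGAL; file name and parameters are guesses.
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("loam_map.pcd")  # placeholder path to the monolithic map

    # Poisson needs oriented normals; estimate them from local neighborhoods.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(15)

    # Higher depth keeps finer features (curbs, sign posts) but costs memory/time.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=10)
    o3d.io.write_triangle_mesh("loam_map_mesh.ply", mesh)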
Questions:
Is there a State-of-the-Art way to go?
Is it possible to get a mesh that preserves small features like curbs and sign posts? (I know there is probably a feasible limit on how fine the mesh can be.)
I am very interested in feedback and a discussion on how to tackle this problem 'the right way'.
Thank you for your suggestions/answers in advance!

Related

Lane tracking with a camera: how to get distance from camera to the lane?

I am doing my final year project on lane tracking using a camera. The most challenging task now is how to measure the distance between the camera (actually the car that carries it) and the lane.
While the lane is easily recognized (Hough line transform), I have found no way to measure the distance to it.
There is a way to measure the distance to an object in front of the camera based on the pixel width of the object, but it does not work here because the nearest point of the line is not visible to the camera.
What you want is to directly infer the depth map with a monocular camera.
You can refer to my answer here:
https://stackoverflow.com/a/64687551/11530294
Usually we need photometric measurements from different positions in the world to form a geometric understanding of it (a.k.a. a depth map). From a single image it is not possible to measure geometry directly, but it is possible to infer depth from prior understanding.
One way to make a single image work is to use a deep-learning-based method to infer depth directly. These approaches are usually implemented in Python, so if you are only familiar with Python, this is the approach you should go for. If the image is small enough, I think real-time performance is possible. There is a lot of work of this kind using Caffe, TF, Torch, etc.; you can search GitHub for more options. The one I posted here is what I used recently (a condensed inference sketch follows the reference below).
reference:
Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." Proceedings of the IEEE international conference on computer vision. 2019.
Source code: https://github.com/nianticlabs/monodepth2
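A condensed inference sketch along the lines of the repo's test_simple.py (the class names, checkpoint layout, and the ("disp", 0) output key are taken from that script and may differ between versions; the image and model paths are placeholders):

    import torch
    from torchvision import transforms
    from PIL import Image
    import networks  # module from the monodepth2 repository

    # Load the pretrained encoder/decoder pair (paths follow the repo's layout).
    encoder = networks.ResnetEncoder(18, False)
    enc_ckpt = torch.load("models/mono+stereo_640x192/encoder.pth", map_location="cpu")
    encoder.load_state_dict({k: v for k, v in enc_ckpt.items() if k in encoder.state_dict()})
    decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc, scales=range(4))
    decoder.load_state_dict(torch.load("models/mono+stereo_640x192/depth.pth", map_location="cpu"))
    encoder.eval()
    decoder.eval()

    # Resize the input to the resolution the network was trained at.
    img = Image.open("frame.jpg").convert("RGB").resize((enc_ckpt["width"], enc_ckpt["height"]))
    x = transforms.ToTensor()(img).unsqueeze(0)

    with torch.no_grad():
        disp = decoder(encoder(x))[("disp", 0)]  # relative inverse depth, shape (1, 1, H, W)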
The other way is to use large-FOV video with single-camera SLAM. This has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSD-SLAM, DSO, etc. There are also a couple of packages from HKUST and ETH that do the mapping given the position (e.g. if you have GPS/compass); some of the well-known ones are REMODE+SVO and open_quadtree_mapping.
One typical example of single-camera SLAM is LSD-SLAM, which runs in real time.
It is implemented in C++ on ROS, and I remember it does publish the depth image. You can write a Python node to subscribe to the depth directly, or to the globally optimized point cloud, and project it into a depth map for any view angle (a minimal subscriber sketch follows the references below).
reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European conference on computer vision. Springer, Cham, 2014.
source code: https://github.com/tum-vision/lsd_slam
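A minimal rospy sketch of such a Python node: subscribe to a depth image topic and convert it to a numpy array with cv_bridge. The topic name below is a placeholder; check what your SLAM node actually publishes with rostopic list.

    import rospy
    import numpy as np
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def on_depth(msg):
        # Convert the ROS image to a numpy array without changing its encoding.
        depth = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
        rospy.loginfo("depth frame %dx%d, median %.2f",
                      depth.shape[1], depth.shape[0], float(np.nanmedian(depth)))

    rospy.init_node("depth_listener")
    rospy.Subscriber("/lsd_slam/depth", Image, on_depth)  # placeholder topic name
    rospy.spin()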

ROS Human-Robot mapping (Baxter)

I'm having some difficulty understanding the concept of teleoperation in ROS, so I'm hoping someone can clear some things up.
I am trying to control a Baxter robot (in simulation) using an HTC Vive device. I have a node (publisher) which successfully extracts PoseStamped data (containing pose data relative to the lighthouse base stations) from the controllers and publishes it on separate topics for the right and left controllers.
So now I wish to create the subscribers which receive the pose data from the controllers and convert it into a pose for the robot. What I'm confused about is the mapping... after reading documentation on Baxter and robotics transformations, I don't really understand how to map human poses to Baxter.
I know I need to use IK services which essentially calculate the co-ordinates required to achieve a pose (given the desired location of the end effector). But it isn't as simple as just plugging in the PoseStamped data from the node publishing controller data to the ik_service right?
Human and robot anatomy are quite different, so I'm not sure if I'm missing a vital step here.
Looking at other people's example code for doing the same thing, I see that some have created a 'base'/'human' pose which hard-codes coordinates for the limbs to mimic a human. Is this essentially what I need?
Sorry if my question is quite broad but I've been having trouble finding an explanation that I understand... Any insight is very much appreciated!
You might find my former student's work on motion mapping using a Kinect sensor with a PR2 informative. It shows two methods:
Direct joint angle mapping (e.g. if the human has their arm at a right angle, then the robot should also have its arm at a right angle).
An IK method that controls the robot's end effector based on the human's hand position.
"I know I need to use IK services which essentially calculate the co-ordinates required to achieve a pose (given the desired location of the end effector). But it isn't as simple as just plugging in the PoseStamped data from the node publishing controller data to the ik_service right?"
Yes, indeed, this is a fairly involved process! In both cases we took advantage of the Kinect's API to access the human's joint angle values and the position of the hand. You can read about how Microsoft Research implemented the human skeleton tracking algorithm here:
https://www.microsoft.com/en-us/research/publication/real-time-human-pose-recognition-in-parts-from-a-single-depth-image/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D145347
I am not familiar with the Vive device. You should check whether it offers a similar API for accessing skeleton tracking information, since reverse-engineering Microsoft's algorithm will be challenging.
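Coming back to the IK part of the question, here is a hedged sketch of feeding a controller PoseStamped into Baxter's IK service, modeled on the baxter_examples ik_service_client pattern. The controller topic name is an assumption, and the pose must already be transformed into Baxter's base frame and offset/scaled to a sensible workspace before it makes a useful end-effector target.

    import rospy
    import baxter_interface
    from geometry_msgs.msg import PoseStamped
    from baxter_core_msgs.srv import SolvePositionIK, SolvePositionIKRequest

    def on_controller_pose(msg):
        # msg is assumed to already be expressed in Baxter's "base" frame.
        req = SolvePositionIKRequest()
        req.pose_stamp.append(msg)
        resp = ik_srv(req)
        if resp.isValid[0]:
            joints = dict(zip(resp.joints[0].name, resp.joints[0].position))
            # set_joint_positions is non-blocking; move_to_joint_positions would
            # also work for a one-off move, but it blocks the callback.
            limb.set_joint_positions(joints)
        else:
            rospy.logwarn("no IK solution for this controller pose")

    rospy.init_node("vive_to_baxter")
    limb = baxter_interface.Limb("right")
    ns = "ExternalTools/right/PositionKinematicsNode/IKService"
    rospy.wait_for_service(ns)
    ik_srv = rospy.ServiceProxy(ns, SolvePositionIK)
    rospy.Subscriber("/vive/right_controller_pose", PoseStamped, on_controller_pose)  # assumed topic
    rospy.spin()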

Making a trained model (machine learning) from 3D models

I have a database with almost 20k 3D files; they are drawings of machine parts designed in CAD software (SolidWorks). I'm trying to build a trained model from all of these 3D models, so that I can build a 3D object recognition app: someone takes a picture of one of these parts (in the real world) and the app provides useful information about material, size, treatment, and so on.
If anyone has already done something similar, any information you can provide would be greatly appreciated!
Some ideas:
1) Several pictures instead of only one: As Rodrigo commented and Brad Larson tried to circumvent with his method, the problem with the user taking only one picture as input is that you necessarily lack the information to triangulate and form a 3D point cloud. With 4 pictures taken from slightly different angles, you can already reconstruct parts of the object. Comparing point clouds would make the endeavor much easier for any ML algorithm, whether neural networks (NN), support vector machines (SVM), or others. A common standard for point clouds is ASTM E2807, which uses the E57 file format.
On the downside, a 3D vision algorithm might be heavy on the user's device and is not the easiest to implement.
2) Artificial picture training: By training on pre-computed artificial pictures, as Brad Larson suggested, you take over much of the computation, to the user's benefit. Be aware that you should probably use features extracted from the pictures, not the complete pictures, both for training and for classification (see the sketch after this list). The problem with this method is that you might be very sensitive to lighting and background context. You should take care to produce CAD pictures that have the same lighting conditions for all objects, so that the classifier doesn't overfit aspects of the "pictures" that do not belong to the object.
This is where solution 1) is much more stable: it is less sensitive to the visual context.
3) Scale: The size of your object is an important descriptor. You should thus add scale information to your object descriptor before training. You could ask the user to take pictures alongside a reference object, or ask for a rule-of-thumb estimate of the object's size ("What are the approximate dimensions of the object, in cm?"). Providing the size could make your algorithm significantly faster and more accurate.
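A small OpenCV sketch of the "features, not the complete pictures" point from 2): extract ORB descriptors from a rendered CAD picture and from a user photo and match them. The file names are placeholders; in a real pipeline the descriptors, rather than the raw pixels, would feed the classifier.

    import cv2

    orb = cv2.ORB_create(nfeatures=500)

    render = cv2.imread("cad_render.png", cv2.IMREAD_GRAYSCALE)
    photo = cv2.imread("user_photo.jpg", cv2.IMREAD_GRAYSCALE)

    kp_r, des_r = orb.detectAndCompute(render, None)
    kp_p, des_p = orb.detectAndCompute(photo, None)

    # Hamming distance is the appropriate metric for ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_r, des_p), key=lambda m: m.distance)
    print("good matches:", sum(1 for m in matches if m.distance < 40))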
If your test data in production consists mainly of images of the 3D objects, then the method Brad Larson describes in the comment section is the better approach; it is also easier to implement and takes a lot less effort and fewer resources to get up and running.
However, if you want to classify the 3D models directly, there are existing networks for classifying 3D point clouds. You will have to convert the models to point clouds and use them as training samples. One that I have used is VoxNet. I also suggest adding more variation to the training data, such as different rotations of the 3D model (see the sketch below).
You can also use pre-trained 3D deep neural networks; there are many that could help with your work and produce high accuracy.
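A sketch of the point-cloud conversion and rotation augmentation mentioned above, using trimesh on a mesh exported from the CAD files (trimesh, the file name, and the sample counts are assumptions, not part of the original setup):

    import numpy as np
    import trimesh

    # Load a mesh exported from SolidWorks (e.g. as STL) and sample its surface.
    mesh = trimesh.load("part_0001.stl")   # placeholder file name
    points = mesh.sample(2048)             # (2048, 3) array of surface points

    def rotate_about_z(points, theta):
        """Rotate a point cloud about the vertical axis by angle theta."""
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        return points @ rot.T

    # Ten augmented copies at random rotations, usable as extra training samples.
    samples = [rotate_about_z(points, np.random.uniform(0, 2 * np.pi)) for _ in range(10)]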

Real Time Camera/Image Based Large-Scale Slam Algorithm

I want to use an already implemented SLAM algorithm to map my college campus.
I have found some algorithms on OpenSLAM.org and some other independent ones, such as LSD-SLAM and Hector SLAM, which show some promise, but they have limitations: some require LIDAR, others don't scale to large datasets, etc.
SLAM has been an active topic for many years, and some groups have even mapped an entire town. Can someone point me to such an efficient algorithm?
My requirements are:
It must use RGB camera/cameras.
Preferably produce a (somewhat) dense map of the area.
It should be able to map a large area (I have seen some algorithms which can only map up to a desk or a room, and they usually lose track if there is a jerk in camera motion (observed in LSD-SLAM) or track very few landmarks, which is only useful for study purposes).
Preferably a ROS implementation.

Is there a difference in printing quality between polys vs. NURBS for Maya 3D models?

I've been reading a lot about the many differences, pros and cons between NURBS and polys, but is there a difference when it comes to 3D printing?
The model is typically polygonized before printing; it's easier to do things like watertightness checks using triangle meshes. A NURBS model can be polygonized at various resolutions, so it should be possible to get a higher-quality, smoother-looking print by starting with a NURBS model and using a very generous tessellation at printing time. The tessellation may not always produce a watertight mesh; depending on the software used for printing, that might cause problems which need to be fixed up by hand.
So, overall, the main advantage of a NURBS model in this context is that you can work with a more efficient, lightweight representation of the data up until it's time to print: the final printed mesh may be impractically dense for most ordinary applications (millions of triangles).
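A hedged Maya Python sketch of the "generous tessellation at print time" idea; the surface name is a placeholder and the flag values are assumptions that should be checked against the nurbsToPoly documentation for your Maya version:

    import maya.cmds as cmds

    # Convert the NURBS surface to a dense triangle mesh just before export.
    printable = cmds.nurbsToPoly(
        "bodyPanelSurface",        # placeholder NURBS surface name
        polygonType=0,             # triangles, which most print/slicing tools expect
        format=1,                  # fit to a tolerance rather than a fixed face count
        chordHeightRatio=0.99,     # push the tolerance up for a very dense mesh
        constructionHistory=False)
    cmds.select(printable)
    # Export the selection via your STL/OBJ translator of choice for printing.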
To add to theodox's answer: the other reason is that CAD/CAE applications do not really like polygon models and treat them as second-class citizens at best. So if you need to do some analysis on the model, perform extra operations, or send it to an engineer, the NURBS model is MUCH better. For the engineer it allows optimizing production paths, so if they are using high-end printers or CNC machines instead, it lets them do a much better job. If you do not use a NURBS model, the engineer will most likely just reverse-engineer your model and throw your data away.
Maya, on the other hand, is not a very engineering-oriented application. But as an upside, you can just use subdivision surfaces and get both a NURBS model and the benefits of polygon modeling.
PS: For an engineering application, making the model watertight is no problem whatsoever as long as your gaps are not too big.
It depends what you mean by polys. Most of the time, what people mean is that you model a poly mesh and then smooth it (by hitting '3' or turning it into a subd).
If you're doing that, NURBS have absolutely no advantage over subds for 3D printing in terms of smoothness.
NURBS surfaces created with Class A technical surfacing may be considered "airtight" surface meshes. B-spline mathematical surfaces include physical dynamic compression/tension surface characteristics as "structurally loaded" system model architectures. G-code file formats now apply B-spline data in vector-based tool-path manufacturing. Raster-based polygon smoothing is contrary to accurate modeling for functional engineered prototypes of zero-tolerance accuracy; the smooth function produces an unpredictable mesh, as it is an undefinable approximation. Professional 3D print solutions employ NURBS geometry G-code directly and do NOT create the polygon tessellated mesh seen in common STL file formats. The future of 3D modeling and additive manufacturing is clearly vector-based B-spline NURBS surface product architecture.
