I am trying to extract features (vertices and road edges) from a ground-plane point cloud.
I have the ground plane estimated with RANSAC.
The point cloud contains road and off-road points of a street.
I need to detect the road, so I need the edges of the road.
This is what I am referring to as "edges".
Can someone please suggest a way to extract this information from the point cloud?
I have been asked to create the calibration for an eye-tracking algorithm. However, I still don't really understand how calibration helps to make gaze estimation more accurate, or how calibration in eye tracking actually works. I have read https://www.tobiidynavox.com/support-training/eye-tracker-calibration/, as well as https://developer.tobii.com/community/forums/topic/explain-calibration/, but I still don't fully understand it. I would appreciate it if somebody could explain it to me.
Thank you
In the answer below, I assume that you are referring to standard pupil-centre corneal-reflection video-oculography rather than any other form of eye tracking technology.
In eye tracking, calibration is the process that transforms the coordinates of features located in a two dimensional still video frame of the eye into gaze coordinates (i.e. coordinates that are related to the world being observed). For example, let's say your eye tracker produces a 400 × 400 pixel image of the eye, and the subject is looking at a screen that is 1024 × 768 pixels in size, some distance in front of them. The calibration process needs to relate coordinates in the eye image to where the person is looking (i.e. gazing) at on the display screen. This process is not trivial: just because the pupil is centred in the eye image does not mean that the person is looking at the centre of the display in the world, for example. And the position of the pupil centre could move within the eye image even though the direction of gaze is held constant in the world. This is why we track the centre of the pupil and the corneal reflection, as the vector linking the two is robust to translation of the eye within the image that occurs in the absence of a gaze rotation.
A standard way to do this mapping is via relatively straightforward 2D non-linear regression: you move a target at known coordinates on the display and ask the participant to fixate steadily on each, while recording the location of the pupil centre and corneal reflection in the eye image. The calibration process will map the vector linking the pupil centre and the corneal reflection to the corresponding known gaze coordinates. This produces a regression solution that allows you to map intermediate locations to their interpolated gaze coordinates.
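To make that regression step concrete, here is a minimal sketch in Python/NumPy (not any vendor's actual algorithm; the calibration numbers below are made up for illustration): a second-order polynomial maps the pupil-centre/corneal-reflection vector to screen coordinates.

```python
# Minimal sketch of regression-based gaze calibration.
# Hypothetical data: one row per calibration target on a 1024 x 768 display.
import numpy as np

# (dx, dy): pupil-centre minus corneal-reflection vector in the eye image.
calib_vectors = np.array([[-20.0, -15.0], [ 0.0, -14.0], [21.0, -16.0],
                          [-19.0,   1.0], [ 0.5,   0.0], [20.0,   1.5],
                          [-21.0,  14.0], [ 0.0,  15.0], [19.0,  16.0]])
# (sx, sy): known on-screen positions of the fixated targets.
calib_targets = np.array([[128.0,  96.0], [512.0,  96.0], [896.0,  96.0],
                          [128.0, 384.0], [512.0, 384.0], [896.0, 384.0],
                          [128.0, 672.0], [512.0, 672.0], [896.0, 672.0]])

def design_matrix(v):
    """Quadratic polynomial terms of the pupil-CR vector."""
    dx, dy = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])

# Solve for the regression coefficients (one column per screen axis).
A = design_matrix(calib_vectors)
coeffs, *_ = np.linalg.lstsq(A, calib_targets, rcond=None)

def gaze_from_vector(dx, dy):
    """Map a new pupil-CR vector to interpolated screen coordinates."""
    a = design_matrix(np.array([[dx, dy]]))
    return (a @ coeffs)[0]

print(gaze_from_vector(10.0, 7.0))   # somewhere right of and below centre
```

The polynomial order and the number of calibration targets are design choices; the point is simply that the mapping is fitted from known fixations and then used to interpolate gaze for new eye-image measurements.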
(An alternative, or supplementary, approach is model-based rather than regression-based, but let's not go there right now.)
So in essence, calibration doesn't improve gaze estimation; it provides gaze estimation. Without first doing a calibration, all you are doing is tracking the movements of features (the pupil and corneal reflection) within a relatively arbitrary image of the eye. Until calibration is carried out, you have no idea where that eye is actually pointing in the world.
Having said all that, this is not at all a coding-based question (or answer), so I'm not actually sure that Stack Overflow is the ideal venue for asking it.
I have four cameras at fixed positions, so I can physically measure the distances (and even the rotations) between them with a ruler. Cameras one and two give me one point cloud, and cameras three and four give me another point cloud. I need to merge these point clouds.
As far as I understand, ICP and similar algorithms perform a rigid transformation of one point cloud to match the other. My question is how I can use the extra knowledge (the distances between the cameras, in centimetres) in this transformation.
I am quite new to this kind of work, so please correct me if I have misunderstood something. Thanks in advance.
First, what kind of accuracy are you looking for, and over what volume of space? Achieving 0.1 mm registration accuracy over a 0.5 m tabletop scene is a completely different problem (in terms of mechanical design and constraints) than a few mm over a floor tens of meters wide.
Generally speaking, with a well reconstructed and unambiguous object shape, ICP will always give you a better solution than manual measurements.
If the cameras are static, then what you have is really a calibration problem, and you need to calibrate your 4-camera rig only at setup and when its configuration changes for whatever reason.
I suggest using a calibration object of precisely known size and geometry, e.g. a machined polyhedron. You can generate and ICP-register point clouds for it, then fit the merged cloud to the known geometry, thus obtaining position and orientation of every individual point cloud with respect to the fixed object. From these you can work out the poses of the cameras w.r.t. each other.
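As a rough sketch of that idea with Open3D (assumed to be available; the file names and thresholds below are placeholders, not part of your setup): each pair's scan of the calibration object is ICP-registered to a reference cloud of the object's known geometry, which gives each pair's pose with respect to the object and hence with respect to the other pair. Your ruler measurements can be passed as the initial guess so ICP only has to correct small residuals.

```python
import numpy as np
import open3d as o3d

# Placeholder inputs: reference cloud sampled from the known object geometry,
# plus one scan of the object from each camera pair.
ref = o3d.io.read_point_cloud("calibration_object_model.ply")
scan_12 = o3d.io.read_point_cloud("pair12_object_scan.ply")
scan_34 = o3d.io.read_point_cloud("pair34_object_scan.ply")

def register(scan, target, init=np.eye(4)):
    """Point-to-point ICP of a scan onto the reference object."""
    result = o3d.pipelines.registration.registration_icp(
        scan, target, max_correspondence_distance=0.01, init=init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation

# Pass your hand-measured transforms as `init` so ICP only refines them.
T_12 = register(scan_12, ref)     # pair 1/2 frame -> object frame
T_34 = register(scan_34, ref)     # pair 3/4 frame -> object frame

# Relative pose between the two camera pairs, and the merged cloud.
T_34_to_12 = np.linalg.inv(T_12) @ T_34
merged = scan_12 + scan_34.transform(T_34_to_12)
o3d.io.write_point_cloud("merged.ply", merged)
```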
I want to get a 3D model of some real-world object.
I have two webcams; using OpenCV and SBM for stereo correspondence I get a point cloud of the scene, and by filtering on z I can get a point cloud of only the object.
I know that ICP is good for this purpose, but it needs the point clouds to be roughly aligned initially, so it is combined with SAC to achieve better results.
But my SAC fitness score is too big, something like 70 or 40, and ICP doesn't give good results either.
My questions are:
- Is it OK for ICP if I just rotate the object in front of the cameras to obtain the point clouds? How large should the rotation angle between views be to achieve good results? Or is there a better way of taking pictures of the object to build the 3D model?
- Is it OK if my point clouds have some holes?
- What is the maximum acceptable SAC fitness score for a good ICP initialisation, and what is the maximum fitness score of a good ICP result?
Example of my point cloud files:
https://drive.google.com/file/d/0B1VdSoFbwNShcmo4ZUhPWjZHWG8/view?usp=sharing
My advice, from experience: you already have RGB (or grey) images. ICP is a good tool for optimising the alignment of point clouds, but it has trouble aligning them from scratch.
First start with RGB odometry: align the point clouds (which are rotated relative to each other) through feature points, and learn how ICP works with the point cloud library already mentioned. Let the RGB features give you a prediction, and then use ICP to optimise it where possible.
When this works, think about a good fitness-score calculation. If that all works, use the trunk version of ICP and tune its parameters. After all this is done you will have code that is not only fast but also has a low chance of going wrong.
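As a self-contained illustration of that coarse-then-fine idea (a sketch only; synthetic data stands in for RGB feature matches lifted to 3D, and Open3D is assumed): a closed-form rigid transform estimated from a handful of matched points seeds ICP, which then refines the alignment on the dense clouds and reports a fitness score.

```python
import numpy as np
import open3d as o3d

def rigid_from_matches(src, dst):
    """Closed-form (Kabsch) rigid transform mapping src points onto dst."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - c_src).T @ (dst - c_dst))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, c_dst - R @ c_src
    return T

# Synthetic stand-in for two scans of the same object, 30 degrees apart.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.1, 0.1, (2000, 3))
ang = np.deg2rad(30.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0],
                   [np.sin(ang),  np.cos(ang), 0],
                   [0, 0, 1]])
pts2 = pts @ R_true.T + np.array([0.05, 0.0, 0.02])

src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts2))

# "Feature matches": a small subset of corresponding points (in practice these
# would be ORB/SIFT matches back-projected to 3D), giving the coarse prediction.
idx = rng.choice(len(pts), 20, replace=False)
init = rigid_from_matches(pts[idx], pts2[idx])

# ICP only polishes the prediction; fitness/inlier_rmse tell you how good it is.
result = o3d.pipelines.registration.registration_icp(
    src, dst, 0.01, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
print("fitness:", result.fitness, "rmse:", result.inlier_rmse)
```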
The following post explains what can go wrong:
Using ICP, we refine this transformation using only geometric information. However, here ICP decreases the precision. What happens is that ICP tries to match as many corresponding points as it can. Here the background behind the screen has more points than the screen itself on the two scans. ICP will then align the clouds to maximize the correspondences on the background. The screen is then misaligned.
https://github.com/introlab/rtabmap/wiki/ICP
I'm trying to implement a head pose estimation algorithm and I'm using a Time-of-Flight camera. I need to detect the nose tip in the point cloud data I get from the camera.
After I know where the nose tip is I would sample N nearest neighbour points around it and do Least Square Error Plane Fitting on that part of the point cloud to retrieve the Yaw and Pitch angles.
The nose detection should work for different head poses not just for a full frontal head pose.
I implemented the plane fitting and that works fine but I don't know how to detect the nose tip from the 3D data.
Any advice on how this could be done would be much appreciated.
Regards,
V.
I used to work with Kinect images, which have a minimum-depth limit of z > 0.5 m (see the link below). I hope you don't have this restriction with your ToF camera. The nose is not a very pronounced object, but it can probably be detected using connected components on the depth image: you have to find it as a blob on an otherwise relatively flat face. You can further confirm that it is a nose by comparing face depth with nose depth and checking the nose position relative to the face. This of course doesn't apply to a non-frontal pose, where the nose has to be found differently.
I suggest inverting your logical chain of processing: instead of finding the nose and then the face, start by looking for the head first (as a larger object with possibly better depth contrast) and then for the nose. The head is well defined by its size and shape in 3D; a 2D face detection can also help, and you can fit a rough head model into your 3D point cloud using a 3D similarity transform.
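As a hedged illustration of that head-first ordering (a sketch only, with a synthetic depth map; replace it with a ToF frame and tune the thresholds): segment the nearest large connected component in the depth image as the head, then take the locally closest point inside that blob as a frontal-pose nose candidate.

```python
import numpy as np
import cv2

# Synthetic 240 x 320 depth map in metres: a flat wall at 2.0 m with a "head"
# bump around (120, 160) and a "nose" that is the closest point to the camera.
yy, xx = np.mgrid[0:240, 0:320].astype(np.float32)
depth = np.full((240, 320), 2.0, np.float32)
head = ((yy - 120) ** 2 + (xx - 160) ** 2) < 60 ** 2
depth[head] = 1.0 + 0.002 * np.sqrt((yy - 120) ** 2 + (xx - 160) ** 2)[head]
depth[118:123, 158:163] = 0.93                      # nose tip sticks out

# 1) Head: largest connected component significantly closer than the background.
mask = (depth < 1.5).astype(np.uint8)
n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
head_label = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # skip background (label 0)
head_mask = labels == head_label

# 2) Nose candidate: the point of minimum depth inside the head blob.
masked = np.where(head_mask, depth, np.inf)
nose_v, nose_u = np.unravel_index(np.argmin(masked), masked.shape)
print("nose pixel:", (nose_u, nose_v), "depth:", depth[nose_v, nose_u])
```

For non-frontal poses the minimum-depth heuristic breaks down, which is where fitting a head model to the 3D points becomes more useful.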
link to Kinect depth map
I'm working on a stereo-camera based obstacle avoidance system for a mobile robot. It'll be used indoors, so I'm working off the assumption that the ground plane is flat. We also get to design our own environment, so I can avoid specific types of obstacle that generate false positives or negatives.
I've already found plenty of resources for calibrating the cameras and getting the images lined up, as well as information on generating a disparity map/depth map. What I'm struggling with is techniques for detecting obstacles from this. A technique that instead worked by detecting the ground plane would be just as useful.
I'm working with OpenCV, and using the book Learning OpenCV as a reference.
Thanks, all
From the literature I've read, there are three main approaches:
Ground plane approaches determine the ground plane from the stereo data and assume that all points that are not on the plane are obstacles. If you assume that the ground is the dominant plane in the image, then you may be able to find it simply by fitting a plane to the reconstructed point cloud with a robust model-fitting algorithm (such as RANSAC).
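For the ground-plane approach, here is a minimal sketch with Open3D (assumed to be available; a synthetic floor-plus-box cloud stands in for your reconstructed stereo cloud): fit the dominant plane with RANSAC and treat everything far enough from it as an obstacle.

```python
import numpy as np
import open3d as o3d

# Synthetic scene: a noisy floor at z ~ 0 plus a box-shaped obstacle.
rng = np.random.default_rng(0)
floor = np.column_stack([rng.uniform(-2, 2, 5000), rng.uniform(0, 4, 5000),
                         rng.normal(0, 0.005, 5000)])
box = np.column_stack([rng.uniform(0.2, 0.6, 500), rng.uniform(1.0, 1.4, 500),
                       rng.uniform(0.0, 0.3, 500)])
cloud = o3d.geometry.PointCloud(
    o3d.utility.Vector3dVector(np.vstack([floor, box])))

# RANSAC plane fit: returns (a, b, c, d) with ax + by + cz + d = 0, plus inlier indices.
(a, b, c, d), inliers = cloud.segment_plane(distance_threshold=0.02,
                                            ransac_n=3, num_iterations=500)

# Points whose distance to the plane exceeds a threshold are obstacle candidates.
pts = np.asarray(cloud.points)
dist = np.abs(pts @ np.array([a, b, c]) + d) / np.linalg.norm([a, b, c])
obstacles = cloud.select_by_index(np.flatnonzero(dist > 0.05).tolist())
print(f"{len(obstacles.points)} obstacle points out of {len(pts)}")
```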
Disparity map approaches skip converting the stereo output to a point cloud. The most popular algorithms I've seen are called v-disparity and uv-disparity. Both look for the same attributes in the disparity map, but uv-disparity can detect some types of obstacles that v-disparity alone cannot.
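A rough sketch of the v-disparity construction (illustrative only; a faked disparity map stands in for the output of an OpenCV stereo matcher): each image row contributes a histogram of its disparity values, so a planar ground appears as a slanted line in the v-disparity image and obstacles appear as near-vertical segments.

```python
import numpy as np

max_disp = 64
rows, cols = 240, 320
rng = np.random.default_rng(1)

# Fake disparity map: a ground-plane ramp plus a block-shaped obstacle.
disparity = np.clip(np.linspace(0, max_disp - 1, rows)[:, None]
                    + rng.normal(0, 0.5, (rows, cols)), 0, max_disp - 1)
disparity[100:160, 140:180] = 40          # obstacle at roughly constant disparity

# v-disparity image: one histogram of disparities per image row.
v_disp = np.zeros((rows, max_disp), np.int32)
for v in range(rows):
    hist, _ = np.histogram(disparity[v], bins=max_disp, range=(0, max_disp))
    v_disp[v] = hist

# The ground line can then be extracted (e.g. with a Hough transform on v_disp),
# and pixels whose disparity deviates from that line are flagged as obstacles.
print(v_disp.shape, v_disp.max())
```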
Point cloud approaches project the disparity map into a three-dimensional point cloud and process those points. One example is the "inverted cone" algorithm, which uses a minimum obstacle height, a maximum obstacle height, and a maximum ground inclination to detect obstacles on arbitrary, non-flat terrain.
Of these three approaches, detecting the ground plane is the simplest but also the least reliable. If your environment has sparse obstacles and a textured ground, it should be sufficient. I don't have much experience with disparity-map approaches, but the results look very promising. Finally, the Manduchi algorithm works extremely well under the widest range of conditions, including on uneven terrain. Unfortunately, it is very difficult to implement and extremely computationally expensive.
References:
v-disparity: Labayrade, R., Aubert, D., and Tarel, J.P., "Real time obstacle detection in stereovision on non flat road geometry through v-disparity representation".
uv-disparity: Hu, Z. and Uchimura, K., "UV-disparity: an efficient algorithm for stereovision based scene analysis".
Inverted cone algorithm: Manduchi, R., Castano, A., Talukder, A., and Matthies, L., "Obstacle detection and terrain classification for autonomous off-road navigation".
There are a few papers on ground-plane obstacle detection algorithms, but I don't know of a good one off the top of my head. If you just need a starting point, you can read about my implementation for a recent project in Section 4.2.3 and Section 4.3.4 of this design report. There was not enough space to discuss the full implementation, but it does address some of the problems you might encounter.