Point Cloud Stitching and Processing - ros

I have pointclouds from two realsense D415 cameras, which is mounted in a way to have overlap. I am able to stitch the output pointclouds in realtime to get a single larger FOV pointcloud by using PCL ICP to find the transforms and transforming one pointcloud to match with the other. Now I would like to detect planes on this output pointcloud and further detect people with the help of the detected planes and ground plane. I have implemented the same with PCL libraries once again.
Now issues arise in two cases:
(1) The final stitched output is unordered leading to a lot of PCL functions not being able to use the pointcloud. To overcome this , say I resize my pointcloud to match the final dimensions with height not being 1, I run into (2).
(2) Upon passing the resized pointcloud I get an error from the normal algorithm I am using as Input not from a projective device, using only XXXX points (XXXX is a very small fraction of the total number of points in the pointcloud). Any other available normal algorithm performs really slow and is not capable of being used in real time applications.
Any ideas how to proceed with this? Happy to provide more information.

Related

Live 360° Panorama Image Stitching implementation

I am planning to implement a live 360° panorama stitcher having 6 cameras of the same model.
I came across the stitching_detailed.cpp implementation from OpenCV. The problem is that it takes around 1 second to stitch only 2 images together using my desired parameters, which is fairly slow.
As my application should be ran in real-time. I need to be able to stitch 6 images together in around 100 ms for it to be "acceptable". The output resolution should be around 0.2 Megapixels. Therefore, I am starting to do my own implementation in C++, based pretty much on what is done on stitchig_detailed. I am aiming to use as much as possible the CUDA functions on OpenCV (some of them are not even implemented stitching_detailed).
I have been carefully studying the stitching pipeline on which the previous algorithm is based, as described in Images stitching by OpenCV and in the paper Automatic Panoramic Image Stitching using Invariant Features.
As the stitching pipeline is too general, there are several assumptions I have made in order to simplify it and speed it up, I would like to get some feedback to know if they are valid:
All the images I will provide to the algorithm are for sure part of the panorama image. So I do not have to extra check on that.
The 6 cameras will be fixed in position and orientation. Therefore, I know beforehand the order in which the cameras need to be stitched into the panorama picture. I can therefore avoid trying to match images from cameras that are not contiguous.
As the cameras are going to remain static. It would be valid to perform the registration step in order to get the camera orientation Matrix R only once (as a kind of initialization). Afterwards, I could only perform the compositing block for subsequent frames. (Again all this assuming the cameras remain completely static).
I also have the following questions...
I can indeed calibrate the cameras prior to my application and obtain each of the intrinsic camera parameters Matrix K and its respective distortion parameters. Could I plug K into the stitching pipeline and therefore avoid the K calculation in the registration step?
What other thing (if any) could camera calibration bring into the pipeline? Distortion correction?
If my previous assumption about executing only the compositing block is correct... Could I still take out some parts of it? My guess is that maybe the seam finder should be ran only once (in the initialization of the algorithm).
Is exposure compensation needed at all for my application case? (As the cameras are literally the same).
Any lead would be deeply appreciated, thanks!
The first thing you can do to reduce your progressing time is to calibrate your camera so that you don't need to process images to find homography matrices based on features. Find them beforehand so that they are constant matrix

is 1.5 average error on stereo camera calibration bad? using opencv

i used the opencv sample code for stereo camera calibration to get the intrinsics and extrinsics of my stereo camera. I used 149 image pairs and the program detected 114 image pairs
Result of my Calibration:
..... 114 pairs have been successfully detected.
Running stereo calibration ...
done with RMS error = 1.60208
average epipolar error = 1.15512
i know the error should be below 1 but i only get below 1 of error in small number of image pairs. so im not sure if my result is good or bad.
You should be able to get an error below 1, but it's not so bad. I also do the calibration with around 100 of images. I often got a few images to discard in which the detection was not reliable.
If you decreased the number of images down to 10 images, then the calibration might overfit for these cases. The error would then not be reliable.
In the calibration process, the problems I faced came from the calibration setup. My recommendations are the following:
Check that your calibration pattern is perfectly flat. In my case I printed on adhesive paper and glued it on a piece of glass.
Check that your calibration pattern is not symmetrical in rotation, otherwise the pose estimation could be wrong.
Check the intermediate pattern points detection. There are some examples in opencv to show the corners or circles centers detected points.
The error can be also displayed for each frame. This can help you to understand for which images you have a problem. If you see that these images actually have a detection problem, you can discard them.
If you acquire videos and not images, both cameras should be synchronized with a hardware connection. In my case I cannot have such a link, therefore I built some kind of holder for the calibration target to keep it still, and I acquired only images, not videos.
This won't reduce your calibration error, but use very different pattern positions to cover the maximum of the field of view.
If your depth of field is small and you have blurry images before/after the focus because of that, change from the chessboard pattern to a circles pattern (functions also available in opencv).
If you don't have a strong distortion in your images (e.g. a photo with an iphone doesn't really show a strong fisheye-like distortion), consider forcing K3=0.
In my case, I fixed the "principal point" in the middle of the image, because the algorithm always found crazy values for these parameters, like for K3.
Hope this helps a bit. Good luck!

How to track people across multiple cameras?

This is the setup: A fairly large room with 4 fish-eye cameras mounted on the ceiling. There are no blind spots. Each camera coverage overlaps a little with the other.
The idea is to track people across these cameras. As of now a blob extracting algorithm is in place, which detects people as blobs. It's a fairly decently working algorithm which detects individual people pretty good. Am using the OpenCV API for all of this.
What I mean by track people is that - Say, camera 1 identifies two people, say Person A and Person B. Now, as these two people move from the coverage of camera 1 into the overlapping area of coverage of cam1 and cam2 and into the area where only cam2 covers, cam2 should be able to identify them as the same people A and B cam1 identified them as.
This is what I thought I'd do -
1) The camera renders the image at 15fps and I think the dimensions of the frames are of 1920x1920.
2) Identify blobs individually in each camera and give each blob an unique label.
3) Now as for the overlaps - Compute an affine transformation matrix which maps pixels on one camera's frame onto another camera's frame - this needn't be done for every frame - this can be done before the whole process starts, as a pre-processing step. So in real time, whenever I detect a blob which is in the overlapping area, all I have to do is apply the transformation matrix to the pixels in cam1 and see if there is a corresponding blob in cam2 and give them the same label.
So, Questions :
1) Would this system give me a badly-working system which tracks people decently ?
2) So, for the affine transform, do I have to convert the fish-eye to rectilinear image ? (My answer is yes, but am not too sure)
Please feel free to point out possible errors and why certain things might not work in the process I've described. Also alternate suggestions are welcome! TIA
1- blob extraction is not enough to track a specific object, for people case I suggest HoG - or at least background subtraction before blob extraction, since all of the cameras have still scenes.
2- opencv <=2.4.9 uses pinhole model for stereo vision. so, before any calibration with opencv methods your fisheye images must be converted to rectilinear images first. You might try calibrating yourself using other approaches too
release 3.0.0 will have support for fisheye model. It is on alpha stage, you can still download and give it a try.

Undistorting/rectify images with OpenCV

I took the example of code for calibrating a camera and undistorting images from this book: shop.oreilly.com/product/9780596516130.do
As far as I understood the usual camera calibration methods of OpenCV work perfectly for "normal" cameras.
When it comes to Fisheye-Lenses though we have to use a vector of 8 calibration parameters instead of 5 and also the flag CV_CALIB_RATIONAL_MODEL in the method cvCalibrateCamera2.
At least, that's what it says in the OpenCV documentary
So, when I use this on an array of images like this (Sample images from OCamCalib) I get the following results using cvInitUndistortMap: abload.de/img/rastere4u2w.jpg
Since the resulting images are cut out of the whole undistorted image, I went ahead and used cvInitUndistortRectifyMap (like it's described here stackoverflow.com/questions/8837478/opencv-cvremap-cropping-image). So I got the following results: abload.de/img/rasterxisps.jpg
And now my question is: Why is not the whole image undistorted? In some pics of my later results you can recognize that the laptop for example is still totally distorted. How can I acomplish even better results using the standard OpenCV methods?
I'm new to stackoverflow and I'm new to OpenCV as well, so please excuse any of my shortcomings when it comes to expressing my problems.
All chessboard corners should be visible to be found. The algorithm expect a certain size of chessboard such as 4x3 or 7x6 (for example). The white border around a chess board should be visible too or dark squares may not be defined precisely.
You still have high distortions at the image periphery after undistort() since distortions are radial (that is they increase with the radius) and your found coefficients are wrong. The latter are wrong since a calibration process minimizes the sum of squared errors in pixel coordinates and you did not represent the periphery with enough samples.
TODO: You have to have 20-40 chess board pattern images if you use 8 distCoeff. Slant your boards at different angles, put them at different distances and spread them around, especially at the periphery. Remember, the success of calibration depends on sampling and also on seeing vanishing points clearly from your chess board (hence slanting and tilting).

SURF feature detection for linear panoramas OpenCV

I have started on a project to create linear/ strip panorama's of long scenes using video. This meaning that the panorama doesn't revolve around a center but move parallel to a scene eg. vid cam mounted on a vehicle looking perpendicular to the street facade.
The steps I will be following are:
capture frames from video
Feature detection - (SURF)
Feature tracking (Kanade-Lucas-Tomasi)
Homography estimation
Stitching Mosaic.
So far I have been able to save individual frames from video and complete SURF feature detection on only two images. I am not asking for someone to solve my entire project but I am stuck trying complete the SURF detection on the remaing frames captured.
Question: How do I apply SURF detection to successive frames? Do I save it as a YAML or xml?
For my feature detection I used OpenCV's sample find_obj.cpp and just changed the images used.
Has anyone experienced such a project? An example of what I would like to achieve is from Iwane technologies http://www.iwane.com/en/2dpcci.php
While working on a similar project, I created an std::vector of SURF keypoints (both points and descriptors) then used them to compute the pairwise matchings.
The vector was filled while reading frame-by-frame a movie, but it works the same with a sequence of images.
There are not enough points to saturate your memory (and use yml/xml files) unless you have very limited resources or a very very long sequence.
Note that you do not need the feature tracking part, at least in most standard cases: SURF descriptors matching can also provide you an homography estimate (without the need for tracking).
Reading to a vector
Start by declaring a vector of Mat's, for example std::vector<cv::Mat> my_sequence;.
Then, you have two choices:
either you know the number of frames, then you resize the vector to the correct size. Then, for each frame, read the image to some variable and copy it to the correct place in the sequence, using my_sequence.at(i) = frame.clone(); or frame.copyTo(my_sequence.at(i));
or you don't know the size beforehand, and you simply call the push_back() method as usual: my_sequence.push_back(frame);

Resources