I am planning to implement a live 360° panorama stitcher having 6 cameras of the same model.
I came across the stitching_detailed.cpp implementation from OpenCV. The problem is that it takes around 1 second to stitch only 2 images together using my desired parameters, which is fairly slow.
As my application must run in real time, I need to be able to stitch 6 images together in around 100 ms for it to be "acceptable". The output resolution should be around 0.2 megapixels. Therefore, I am starting my own implementation in C++, based largely on what is done in stitching_detailed. I am aiming to use the CUDA functions in OpenCV as much as possible (some of them are not even used in stitching_detailed).
I have been carefully studying the stitching pipeline on which the previous algorithm is based, as described in Images stitching by OpenCV and in the paper Automatic Panoramic Image Stitching using Invariant Features.
As the stitching pipeline is too general, I have made several assumptions in order to simplify it and speed it up. I would like some feedback on whether they are valid:
All the images I provide to the algorithm are certain to be part of the panorama, so I do not need an extra check for that.
The 6 cameras will be fixed in position and orientation. Therefore, I know beforehand the order in which the cameras need to be stitched into the panorama picture. I can therefore avoid trying to match images from cameras that are not contiguous.
As the cameras are going to remain static, it should be valid to perform the registration step, which yields each camera's orientation matrix R, only once (as a kind of initialization). Afterwards, I could perform only the compositing block for subsequent frames. (Again, all this assumes the cameras remain completely static.)
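The second assumption (only contiguous cameras need to be matched) can be sketched as a table of neighbour pairs. This is a minimal sketch that assumes the cameras form a closed ring and are indexed in panorama order; `adjacentPairs` and `matchMask` are hypothetical helpers written for illustration, not OpenCV functions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: list the (i, j) camera pairs worth matching when
// n cameras are arranged in a closed ring, in panorama order.
std::vector<std::pair<int, int>> adjacentPairs(int n) {
    assert(n >= 2);
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < n; ++i) {
        if (n == 2 && i == 1) break;        // two cameras share a single pair
        pairs.emplace_back(i, (i + 1) % n); // right-hand neighbour, wrapping around
    }
    return pairs;
}

// Dense n x n mask (row-major): 1 where a pair should be matched, 0 otherwise.
std::vector<unsigned char> matchMask(int n) {
    std::vector<unsigned char> mask(static_cast<std::size_t>(n) * n, 0);
    for (const auto& p : adjacentPairs(n)) {
        mask[static_cast<std::size_t>(p.first) * n + p.second] = 1;
        mask[static_cast<std::size_t>(p.second) * n + p.first] = 1;
    }
    return mask;
}
```

For 6 cameras this gives 6 pairs to match instead of the 15 that an exhaustive pairwise matcher would try.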
I also have the following questions...
I can indeed calibrate the cameras prior to my application and obtain each of the intrinsic camera parameters Matrix K and its respective distortion parameters. Could I plug K into the stitching pipeline and therefore avoid the K calculation in the registration step?
What else (if anything) could camera calibration bring to the pipeline? Distortion correction?
If my previous assumption about executing only the compositing block is correct, could I still take out some parts of it? My guess is that the seam finder may need to run only once (in the initialization of the algorithm).
Is exposure compensation needed at all for my application case? (As the cameras are literally the same).
Any lead would be deeply appreciated, thanks!
The first thing you can do to reduce your processing time is to calibrate your cameras, so that you don't need to process images to find feature-based homography matrices on every frame. Find them beforehand so that they are constant matrices.
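Once the transforms are constant, the pixel mapping itself can be precomputed and reused. The sketch below rests on several assumptions: a single 3x3 homography `H` that maps output coordinates to source coordinates, nearest-neighbour sampling, and single-channel row-major images. `buildMap` and `applyMap` are hypothetical names; in OpenCV the comparable idea would be to compute the warp maps once and call cv::remap for each frame.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<double, 9>;  // row-major 3x3 homography

// Run once: for every output pixel, store the index of the source pixel it
// samples (nearest neighbour), or -1 if it falls outside the source image.
std::vector<int> buildMap(const Mat3& H, int outW, int outH, int srcW, int srcH) {
    assert(outW > 0 && outH > 0 && srcW > 0 && srcH > 0);
    std::vector<int> map(static_cast<std::size_t>(outW) * outH, -1);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            double w = H[6] * x + H[7] * y + H[8];
            if (std::abs(w) < 1e-12) continue;  // point at infinity, leave as -1
            int sx = static_cast<int>(std::lround((H[0] * x + H[1] * y + H[2]) / w));
            int sy = static_cast<int>(std::lround((H[3] * x + H[4] * y + H[5]) / w));
            if (sx >= 0 && sx < srcW && sy >= 0 && sy < srcH)
                map[static_cast<std::size_t>(y) * outW + x] = sy * srcW + sx;
        }
    }
    return map;
}

// Run per frame: a plain table lookup, with no feature detection or matching.
std::vector<unsigned char> applyMap(const std::vector<int>& map,
                                    const std::vector<unsigned char>& src) {
    std::vector<unsigned char> out(map.size(), 0);
    for (std::size_t i = 0; i < map.size(); ++i)
        if (map[i] >= 0) out[i] = src[static_cast<std::size_t>(map[i])];
    return out;
}
```

The per-frame cost is then one memory lookup per output pixel, which is the part that has to fit in the real-time budget.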
Related
This is the setup: A fairly large room with 4 fish-eye cameras mounted on the ceiling. There are no blind spots. Each camera coverage overlaps a little with the other.
The idea is to track people across these cameras. As of now a blob-extraction algorithm is in place, which detects people as blobs. It works fairly well and detects individual people reliably. I am using the OpenCV API for all of this.
What I mean by tracking people is this: say camera 1 identifies two people, Person A and Person B. As these two people move from the coverage of camera 1 into the area where cam1 and cam2 overlap, and then into the area that only cam2 covers, cam2 should be able to identify them as the same Person A and Person B that cam1 identified.
This is what I thought I'd do -
1) Each camera delivers images at 15 fps, and I think the frame dimensions are 1920x1920.
2) Identify blobs individually in each camera and give each blob a unique label.
3) Now as for the overlaps - Compute an affine transformation matrix which maps pixels on one camera's frame onto another camera's frame - this needn't be done for every frame - this can be done before the whole process starts, as a pre-processing step. So in real time, whenever I detect a blob which is in the overlapping area, all I have to do is apply the transformation matrix to the pixels in cam1 and see if there is a corresponding blob in cam2 and give them the same label.
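Step 3 can be sketched as follows. This is only an illustration under stated assumptions: each blob is reduced to its centroid (rather than transforming every pixel), the cam1-to-cam2 affine matrix has already been estimated offline, and a blob in cam2 counts as "the same" if it lies within `maxDist` pixels of the mapped centroid. `Blob`, `mapPoint` and `matchBlob` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct Blob { int label; double x, y; };   // centroid in pixel coordinates

using Affine = std::array<double, 6>;      // row-major 2x3: [a b tx; c d ty]

// Map a cam1 point into cam2's frame with the precomputed affine transform.
void mapPoint(const Affine& A, double x, double y, double& ox, double& oy) {
    ox = A[0] * x + A[1] * y + A[2];
    oy = A[3] * x + A[4] * y + A[5];
}

// Return the label of the cam2 blob closest to the mapped cam1 blob, or -1
// if none lies within maxDist pixels (the blob is then treated as unmatched).
int matchBlob(const Affine& cam1ToCam2, const Blob& b1,
              const std::vector<Blob>& cam2Blobs, double maxDist) {
    assert(maxDist >= 0.0);
    double mx = 0.0, my = 0.0;
    mapPoint(cam1ToCam2, b1.x, b1.y, mx, my);
    int best = -1;
    double bestDist = maxDist;
    for (const Blob& b2 : cam2Blobs) {
        double d = std::hypot(b2.x - mx, b2.y - my);
        if (d <= bestDist) { bestDist = d; best = b2.label; }
    }
    return best;
}
```

When `matchBlob` returns a label, the cam1 blob would take over that label; when it returns -1, the blob keeps its own. The distance threshold would need tuning against how accurately the affine model fits the overlap region.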
So, Questions :
1) Would this approach give me at least a basic working system that tracks people decently?
2) For the affine transform, do I have to convert the fish-eye image to a rectilinear image? (My answer is yes, but I am not too sure.)
Please feel free to point out possible errors and why certain things might not work in the process I've described. Also alternate suggestions are welcome! TIA
1- Blob extraction alone is not enough to track a specific object. For people, I suggest HoG, or at least background subtraction before blob extraction, since all of the cameras look at static scenes.
2- OpenCV <= 2.4.9 uses the pinhole model for stereo vision, so before any calibration with OpenCV methods your fisheye images must first be converted to rectilinear images. You might also try calibrating yourself using other approaches.
Release 3.0.0 will have support for the fisheye model. It is in the alpha stage, but you can already download it and give it a try.
When I capture images on a mobile device I can use the tilt-sensor and magnetometer to get the camera rotation matrix for each frame - or at least an initial estimate of it.
Is there a way to provide Stitcher or the "detailed-pipeline" these estimates for improving the convergence and results?
Looking at the code, the Stitcher class is probably not the way to go; it is too high-level.
Presumably, there should be a way to feed these initial guesses to the Bundle-Adjuster, but I cannot figure out how to do it.
Looking both at the high-level Stitcher class and the detailed demo in samples/stitching_detailed.cpp, you would have to work here:
detail::HomographyBasedEstimator estimator;
estimator(features_, pairwise_matches_, cameras_);
...
bundle_adjuster_->setConfThresh(conf_thresh_);
(*bundle_adjuster_)(features_, pairwise_matches_, cameras_);
In a sense, you have already computed the homographies frame by frame, so you can skip re-estimating them for all frames. When you set the cameras to your guesses, the bundle adjustment uses them as initial guesses.
However, it doesn't seem that the current stitching pipeline can update only the (k-)newest frame(s). So this approach would work if you record a video and stitch the images together finally. If you wanted to stitch continuously, you'd have to construct a more complex pipeline.
Has anyone done something like that?
My problem with the OpenCV stitcher is that it warps the images for panoramas, meaning the images get stretched a lot as one moves away from the first image.
From what I can tell, OpenCV also builds on the assumption that the camera stays in the same position. I am seeking a little guidance on this: is it just the warper I need to change, or do I also need to relax this assumption about the camera position being fixed before that?
I noticed that OpenCV also uses a bundle adjuster. Does it rely on the same assumption that the camera is fixed?
Aerial image mosaicing
The image warping routines that are used in remote sensing and digital geography (for example to produce geotiff files or more generally orthoimages) rely on both:
estimating the relative image motion (often improved with some aircraft motion sensors such as inertial measurement units),
the availability of a Digital Elevation Model of the observed scene.
Together, these make it possible to estimate the exact projection on the ground of each measured pixel.
This is well beyond what OpenCV provides with its built-in stitcher.
OpenCV's Stitcher
OpenCV's Stitcher class is indeed dedicated to the assembly of images taken from the same point.
This would not be so bad, except that the functions try to estimate just a rotation (to be more robust) instead of plain homographies (this is where the fixed camera assumption will bite you).
It does, however, add functionality that is useful for panorama creation, especially seam cut detection and image blending in overlapping areas.
What you can do
With aerial sensors, it is usually sound to assume (except when creating orthoimages) that the camera-to-scene distance is large enough that you can approximate the inter-frame transform by homographies (especially if your application does not require very accurate panoramas).
You can try to customize OpenCV's stitcher to replace the transform estimate and the warper to work with homographies instead of rotations.
I can't tell how difficult it will be: for the most part it consists of using the intermediate transform results and bypassing the final rotation estimation. You may have to modify the bundle adjuster too, however.
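One common way to work directly with inter-frame homographies, sketched here as an illustration rather than as what OpenCV's stitcher does, is to chain the pairwise transforms so that every frame is mapped into the coordinate system of the first one. The sketch assumes `pairwise[k]` maps frame k+1 into frame k; `Mat3`, `chainToFirst` and `project` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Mat3 = std::array<double, 9>;  // row-major 3x3 homography

Mat3 multiply(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k)
                C[r * 3 + c] += A[r * 3 + k] * B[k * 3 + c];
    return C;
}

// pairwise[k] maps frame k+1 into frame k. The result maps every frame into
// frame 0: toFirst[0] is the identity, toFirst[k+1] = toFirst[k] * pairwise[k].
std::vector<Mat3> chainToFirst(const std::vector<Mat3>& pairwise) {
    std::vector<Mat3> toFirst;
    toFirst.push_back(Mat3{1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0});
    for (const Mat3& H : pairwise)
        toFirst.push_back(multiply(toFirst.back(), H));
    return toFirst;
}

// Apply a homography to a pixel, including the perspective division.
std::array<double, 2> project(const Mat3& H, double x, double y) {
    double w = H[6] * x + H[7] * y + H[8];
    assert(std::abs(w) > 1e-12);
    return {(H[0] * x + H[1] * y + H[2]) / w, (H[3] * x + H[4] * y + H[5]) / w};
}
```

Errors in the pairwise estimates accumulate along the chain, which is one reason a global refinement step such as bundle adjustment is still worth having.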
I have started a project to create linear (strip) panoramas of long scenes from video. This means that the panorama doesn't revolve around a center but moves parallel to the scene, e.g. a video camera mounted on a vehicle looking perpendicular to the street facade.
The steps I will be following are:
capture frames from video
Feature detection - (SURF)
Feature tracking (Kanade-Lucas-Tomasi)
Homography estimation
Stitching Mosaic.
So far I have been able to save individual frames from video and complete SURF feature detection on only two images. I am not asking for someone to solve my entire project, but I am stuck trying to complete the SURF detection on the remaining captured frames.
Question: How do I apply SURF detection to successive frames? Do I save it as a YAML or xml?
For my feature detection I used OpenCV's sample find_obj.cpp and just changed the images used.
Has anyone experienced such a project? An example of what I would like to achieve is from Iwane technologies http://www.iwane.com/en/2dpcci.php
While working on a similar project, I created an std::vector of SURF keypoints (both points and descriptors) then used them to compute the pairwise matchings.
The vector was filled while reading frame-by-frame a movie, but it works the same with a sequence of images.
Unless you have very limited resources or a very, very long sequence, there will not be enough points to saturate your memory, so you should not need YAML/XML files.
Note that you do not need the feature tracking part, at least in most standard cases: matching SURF descriptors can also provide you with a homography estimate (without the need for tracking).
Reading to a vector
Start by declaring a vector of Mat's, for example std::vector<cv::Mat> my_sequence;.
Then, you have two choices:
either you know the number of frames, in which case you resize the vector to the correct size and then, for each frame, read the image into some variable and copy it to the correct place in the sequence, using my_sequence.at(i) = frame.clone(); or frame.copyTo(my_sequence.at(i));
or you don't know the size beforehand, in which case you simply call the push_back() method as usual: my_sequence.push_back(frame);
I am working with 2 fly cameras and trying to stitch them together.
I am working with OpenCV and C++ here.
Since I am trying to cover a large region using both cameras (and to do contour detection later on), I am wondering if there's a fast way to stitch the images from both cameras together?
Currently here's what I am doing:
Subtracting a previously stored background image from each camera's image (to speed up contour detection later on)
Un-distort each image using cvRemap function
And finally to set the ROI of the images for stitching them together.
My question is: is it possible to speed this up even more? Currently these steps take around 60 ms, and with additional functionality it slows down to 0.1 second.
Have I been using the slower functions of OpenCV? Or are there any tricks to gain more speed?
Take the latest OpenCV snapshot from here and try the stitching module implemented here. They have been working on stitching performance lately, so it's possible to get some good improvements.
By the way, which step takes the most time? Did you profile your app? Take a look at the profile results, and you'll be able to see exactly where to optimize, and maybe how to do it.
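If no profiler is at hand, a rough per-stage breakdown can be collected by hand. The sketch below is one possible way to do it with std::chrono; `StageTimer` is a hypothetical helper and the stage names are whatever you choose to pass in.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named pipeline stage.
class StageTimer {
public:
    // Run `work` and add its duration to the running total for `stage`.
    template <typename F>
    void run(const std::string& stage, F&& work) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        totalsMs_[stage] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
        ++calls_[stage];
    }

    // Mean time per call in milliseconds for a stage that ran at least once.
    double meanMs(const std::string& stage) const {
        auto it = calls_.find(stage);
        assert(it != calls_.end() && it->second > 0);
        return totalsMs_.at(stage) / it->second;
    }

private:
    std::map<std::string, double> totalsMs_;
    std::map<std::string, int> calls_;
};
```

Wrapping each step (background subtraction, undistortion, ROI copy) in a call such as timer.run("undistort", [&] { /* step */ }); over a few hundred frames should show which stage dominates the 60 ms.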