When I capture images on a mobile device I can use the tilt-sensor and magnetometer to get the camera rotation matrix for each frame - or at least an initial estimate of it.
Is there a way to pass these estimates to Stitcher or to the "detailed pipeline" to improve convergence and results?
Looking at the code, the Stitcher class is probably not the way to go, it is too high level.
Presumably, there should be a way to feed these initial guesses to the Bundle-Adjuster, but I cannot figure out how to do it.
Looking both at the high-level Stitcher class and the detailed demo in samples/stitching_detailed.cpp, you would have to work here:
detail::HomographyBasedEstimator estimator;
estimator(features_, pairwise_matches_, cameras_);
...
bundle_adjuster_->setConfThresh(conf_thresh_);
(*bundle_adjuster_)(features_, pairwise_matches_, cameras_);
In a sense, you have already computed the inter-frame transforms, so you can skip the homography-based estimation for all frames. If you set the entries of cameras_ to your guesses, the bundle adjuster uses them as its initial estimates.
However, it doesn't seem that the current stitching pipeline can update only the newest k frame(s). So this approach works if you record a video and stitch the images together at the end. If you wanted to stitch continuously, you'd have to construct a more complex pipeline.
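To make the "set the cameras to your guesses" step concrete, here is a minimal sketch of turning sensor angles into a 3x3 rotation matrix. It uses only the standard library; the function name and the Z-Y-X (yaw, pitch, roll) convention are my assumptions, so check them against what your device API actually reports. The result would then be copied into each camera's R (as far as I recall, stitching_detailed.cpp converts R to CV_32F before calling the bundle adjuster, and a focal length guess is still needed).

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Builds R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians.
// The Z-Y-X order is an assumption: match it to your sensor API.
Mat3 rotationFromYawPitchRoll(double yaw, double pitch, double roll) {
    assert(std::isfinite(yaw) && std::isfinite(pitch) && std::isfinite(roll));
    const double cy = std::cos(yaw),   sy = std::sin(yaw);
    const double cp = std::cos(pitch), sp = std::sin(pitch);
    const double cr = std::cos(roll),  sr = std::sin(roll);

    Mat3 R{};
    R[0] = {cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr};
    R[1] = {sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr};
    R[2] = {-sp,     cp * sr,                cp * cr};
    return R;
}
```

Whether the pipeline expects this matrix or its transpose depends on the direction of your sensor's rotation (device-to-world or world-to-device), so it is worth testing on two frames with a known rotation first.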
I am trying to differentiate between camera motion and tool motion in a surgical video.
I have tried optical flow using OpenCV's Farneback method and passed the results to an ML model to learn from, but with no success. A major issue is getting good keypoints in the case of camera motion. Is there an alternative technique to distinguish between camera motion and tool/tissue movement? Note: camera motion happens in only 10% of the video.
I wish I could add a comment (too new to be able to comment), as I don't have a good answer for you.
I think it really depends on the nature of the input image. Can you show some typical input images here?
What does your optical flow result look like? I would have expected it to give some reasonable results.
Have you tried a motion estimation method to analyze whether there is global movement across frames, or only local movement?
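As a rough illustration of the global-versus-local idea, the sketch below labels a frame as camera motion when most of the flow vectors are moving. It is only a heuristic under my own assumptions: the Flow struct, the function names, and both thresholds are hypothetical and would need tuning on real surgical footage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel displacement, e.g. one sample of a dense optical flow field.
struct Flow { float dx, dy; };

// Fraction of flow vectors whose magnitude exceeds minMagnitude (in pixels).
double movingFraction(const std::vector<Flow>& flow, float minMagnitude) {
    assert(!flow.empty());
    std::size_t moving = 0;
    for (const Flow& f : flow) {
        if (std::hypot(f.dx, f.dy) > minMagnitude) ++moving;
    }
    return static_cast<double>(moving) / static_cast<double>(flow.size());
}

// Heuristic: camera motion displaces most of the image,
// tool or tissue motion only a part of it.
bool looksLikeCameraMotion(const std::vector<Flow>& flow,
                           float minMagnitude = 1.0f,
                           double globalFraction = 0.6) {
    return movingFraction(flow, minMagnitude) >= globalFraction;
}
```

Since this works on the dense flow field directly, it does not depend on finding good keypoints.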
I am planning to implement a live 360° panorama stitcher having 6 cameras of the same model.
I came across the stitching_detailed.cpp implementation from OpenCV. The problem is that it takes around 1 second to stitch only 2 images together using my desired parameters, which is fairly slow.
Since my application should run in real time, I need to be able to stitch 6 images together in around 100 ms for it to be "acceptable". The output resolution should be around 0.2 megapixels. Therefore, I am starting my own implementation in C++, based largely on what is done in stitching_detailed. I am aiming to use the OpenCV CUDA functions as much as possible (some of them are not even used in stitching_detailed).
I have been carefully studying the stitching pipeline on which the previous algorithm is based, as described in Images stitching by OpenCV and in the paper Automatic Panoramic Image Stitching using Invariant Features.
As the stitching pipeline is too general, I have made several assumptions in order to simplify it and speed it up. I would like some feedback on whether they are valid:
All the images I provide to the algorithm are guaranteed to be part of the panorama, so I do not need an extra check for that.
The 6 cameras will be fixed in position and orientation. Therefore, I know beforehand the order in which the cameras need to be stitched into the panorama, and I can avoid trying to match images from cameras that are not contiguous.
As the cameras are going to remain static, it would be valid to perform the registration step only once (as a kind of initialization) to get the camera orientation matrix R. Afterwards, I could perform only the compositing block for subsequent frames. (Again, all this assumes the cameras remain completely static.)
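The second assumption (only contiguous cameras are matched) can be sketched as a pair mask. This is a standard-library illustration, not OpenCV code: the function name is hypothetical, and it assumes the 6 cameras form a closed ring so that the last camera also neighbours the first. As far as I know, the feature matchers in the detailed pipeline accept an optional mask of image pairs, which is where such a table would go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// mask[i][j] is true when cameras i and j are neighbours on the ring,
// i.e. the only pairs worth matching when the rig layout is known.
std::vector<std::vector<bool>> ringAdjacencyMask(std::size_t numCameras) {
    assert(numCameras >= 3);
    std::vector<std::vector<bool>> mask(
        numCameras, std::vector<bool>(numCameras, false));
    for (std::size_t i = 0; i < numCameras; ++i) {
        const std::size_t next = (i + 1) % numCameras;  // wraps 5 -> 0
        mask[i][next] = true;
        mask[next][i] = true;
    }
    return mask;
}
```

For 6 cameras this leaves 6 unordered pairs to match instead of 15.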
I also have the following questions...
I can indeed calibrate the cameras prior to my application and obtain each of the intrinsic camera parameters Matrix K and its respective distortion parameters. Could I plug K into the stitching pipeline and therefore avoid the K calculation in the registration step?
What other thing (if any) could camera calibration bring into the pipeline? Distortion correction?
If my previous assumption about executing only the compositing block is correct, could I still take out some parts of it? My guess is that the seam finder might need to be run only once (in the initialization of the algorithm).
Is exposure compensation needed at all for my application case? (As the cameras are literally the same).
Any lead would be deeply appreciated, thanks!
The first thing you can do to reduce your processing time is to calibrate your cameras so that you don't need to process images to find feature-based homography matrices at run time. Find them beforehand so that they are constant matrices.
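To show why constant matrices help, here is a minimal sketch that precomputes, once, where every output pixel samples its source image. Everything here is standard-library C++ with hypothetical names (Point2, applyHomography, buildLookup); in OpenCV the equivalent table would be built once and then applied per frame with a remap-style call.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Point2 { double x, y; };

// Maps a point through a 3x3 homography using homogeneous coordinates.
Point2 applyHomography(const Mat3& H, Point2 p) {
    const double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    assert(std::fabs(w) > 1e-12);  // point must not map to infinity
    return {(H[0][0] * p.x + H[0][1] * p.y + H[0][2]) / w,
            (H[1][0] * p.x + H[1][1] * p.y + H[1][2]) / w};
}

// Built once at start-up: entry (y * width + x) holds the source
// coordinates to sample for output pixel (x, y).
std::vector<Point2> buildLookup(const Mat3& outputToSource,
                                int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<Point2> lookup;
    lookup.reserve(static_cast<std::size_t>(width) *
                   static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            lookup.push_back(applyHomography(
                outputToSource,
                Point2{static_cast<double>(x), static_cast<double>(y)}));
        }
    }
    return lookup;
}
```

With the table in hand, the per-frame work reduces to sampling and blending, which is the part that benefits most from a GPU.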
Has anyone done something like that?
My problem with the OpenCV stitcher is that it warps the images for panoramas, meaning the images get stretched a lot as one moves away from the first image.
From what I can tell, OpenCV also builds on the assumption that the camera stays in the same position. I am seeking a little guidance on this: is it just the warper I need to change, or do I also need to relax this assumption about the camera position being fixed before that?
I noticed that OpenCV also uses a bundle adjuster. Does it make the same assumption that the camera is fixed?
Aerial image mosaicing
The image warping routines that are used in remote sensing and digital geography (for example to produce geotiff files or more generally orthoimages) rely on both:
estimating the relative image motion (often improved with some aircraft motion sensors such as inertial measurement units),
the availability of a Digital Elevation Model of the observed scene.
This makes it possible to estimate the exact projection on the ground of each measured pixel.
This is well beyond what OpenCV provides with its built-in stitcher.
OpenCV's Stitcher
OpenCV's Stitcher class is indeed dedicated to the assembly of images taken from the same point.
This would not be so bad, except that the functions try to estimate just a rotation (to be more robust) instead of plain homographies (this is where the fixed camera assumption will bite you).
It does, however, add functionality that is useful for panorama creation, especially the seam detection between images and the blending in overlapping areas.
What you can do
With aerial sensors, it is usually sound to assume (except when creating orthoimages) that the camera-to-scene distance is large enough that you can approximate the inter-frame transform by a homography (especially if your application does not require very accurate panoramas).
You can try to customize OpenCV's stitcher to replace the transform estimate and the warper to work with homographies instead of rotations.
I can't say how difficult it will be. For the most part it consists of using the intermediate transform results and bypassing the final rotation estimation part. You may have to modify the bundle adjuster too, however.
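If you work with plain homographies, the pairwise results have to be chained so that every frame is expressed in the coordinate system of one reference frame. Below is a minimal standard-library sketch of that chaining; the names are hypothetical, and it assumes pairwise[k] maps points of frame k+1 into frame k.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Plain 3x3 matrix product C = A * B.
Mat3 multiply(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// pairwise[k] maps frame k+1 into frame k (assumption).
// result[k] maps frame k into frame 0, so result[0] is the identity.
std::vector<Mat3> chainToFirstFrame(const std::vector<Mat3>& pairwise) {
    assert(!pairwise.empty());
    Mat3 identity{};
    identity[0][0] = identity[1][1] = identity[2][2] = 1.0;

    std::vector<Mat3> result;
    result.reserve(pairwise.size() + 1);
    result.push_back(identity);
    for (const Mat3& H : pairwise) {
        result.push_back(multiply(result.back(), H));
    }
    return result;
}
```

Errors accumulate along the chain, which is one reason a global refinement such as bundle adjustment is still useful for long sequences.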
I have started a project to create linear (strip) panoramas of long scenes using video. This means that the panorama doesn't revolve around a center but moves parallel to the scene, e.g. a video camera mounted on a vehicle looking perpendicular to the street facade.
The steps I will be following are:
capture frames from video
Feature detection - (SURF)
Feature tracking (Kanade-Lucas-Tomasi)
Homography estimation
Stitching Mosaic.
So far I have been able to save individual frames from video and complete SURF feature detection on only two images. I am not asking for someone to solve my entire project, but I am stuck trying to complete the SURF detection on the remaining frames captured.
Question: How do I apply SURF detection to successive frames? Do I save it as a YAML or xml?
For my feature detection I used OpenCV's sample find_obj.cpp and just changed the images used.
Has anyone experienced such a project? An example of what I would like to achieve is from Iwane technologies http://www.iwane.com/en/2dpcci.php
While working on a similar project, I created a std::vector of SURF features (both keypoints and descriptors), then used them to compute the pairwise matches.
The vector was filled while reading a movie frame by frame, but it works the same with a sequence of images.
There are not enough points to saturate your memory (and force you to use YAML/XML files) unless you have very limited resources or a very, very long sequence.
Note that you do not need the feature tracking part, at least in most standard cases: matching SURF descriptors can also provide you with a homography estimate (without the need for tracking).
Reading to a vector
Start by declaring a vector of Mat's, for example std::vector<cv::Mat> my_sequence;.
Then, you have two choices:
either you know the number of frames, then you resize the vector to the correct size. Then, for each frame, read the image to some variable and copy it to the correct place in the sequence, using my_sequence.at(i) = frame.clone(); or frame.copyTo(my_sequence.at(i));
or you don't know the size beforehand, and you simply call the push_back() method as usual: my_sequence.push_back(frame);
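One caveat worth illustrating: copying a cv::Mat copies only the header, and the copy shares its pixel buffer with the original unless clone() or copyTo() is used. If the capture loop reuses the same buffer for each frame (which I believe can happen, depending on the backend), storing the frame without a deep copy leaves every entry pointing at the last image. The sketch below reproduces that behaviour with a hypothetical stand-in Frame type, not with cv::Mat itself, so that it runs without OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for cv::Mat (hypothetical): copies share one pixel buffer,
// clone() makes an independent deep copy.
struct Frame {
    std::shared_ptr<std::vector<unsigned char>> pixels;

    explicit Frame(std::size_t size = 0, unsigned char value = 0)
        : pixels(std::make_shared<std::vector<unsigned char>>(size, value)) {}

    Frame clone() const {
        Frame copy;
        copy.pixels = std::make_shared<std::vector<unsigned char>>(*pixels);
        return copy;
    }
};

// Simulates a capture loop that reuses a single buffer for every frame.
std::vector<Frame> readSequence(std::size_t numFrames, bool deepCopy) {
    std::vector<Frame> sequence;
    sequence.reserve(numFrames);
    Frame frame(4);  // reused capture buffer
    for (std::size_t i = 0; i < numFrames; ++i) {
        (*frame.pixels)[0] = static_cast<unsigned char>(i);  // "grab" frame i
        sequence.push_back(deepCopy ? frame.clone() : frame);
    }
    return sequence;
}
```

With deepCopy set to false, all stored frames alias the same buffer and show the last grabbed image; with deepCopy set to true, each entry keeps its own content. When in doubt, pushing back a clone is the safer choice.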
I'm looking for the fastest and most efficient method of detecting an object in a moving video. Things to note about this video: it is very grainy and low resolution, and both the background and foreground are moving simultaneously.
Note: I'm trying to detect a moving truck on a road in a moving video.
Methods I've tried:
Training a Haar Cascade - I've attempted to train the classifiers to identify the object by cropping multiple images of the desired object. This produced either many false detections or no detections at all (the desired object was never detected). I used about 100 positive images and 4000 negatives.
SIFT and SURF Keypoints - When attempting to use either of these feature-based methods, I discovered that the object I wanted to detect was too low in resolution, so there were not enough features to match for an accurate detection. (The desired object was never detected.)
Template Matching - This is probably the best method I've tried. It's the most accurate, although the most hacky of them all. I can detect the object for one specific video using a template cropped from the video. However, there is no guaranteed accuracy, because all that is known is the best match for each frame; no analysis is done on how well the template matches the frame. Basically, it only works if the object is always in the video; otherwise it will create a false detection.
So those are the big 3 methods I've tried, and all have failed. What would work best is something like template matching but with scale and rotation invariance (which led me to try SIFT/SURF), but I have no idea how to modify the template matching function.
Does anyone have any suggestions how to best accomplish this task?
Apply optical flow to the image and then segment it based on flow field. Background flow is very different from "object" flow (which mainly diverges or converges depending on whether it is moving towards or away from you, with some lateral component also).
Here's an oldish project which worked this way:
http://users.fmrib.ox.ac.uk/~steve/asset/index.html
This vehicle detection paper uses a Gabor filter bank for low level detection and then uses the response to create the features space where it trains an SVM classifier.
The technique seems to work well and is at least scale invariant. I am not sure about rotation though.
Not knowing your application, my initial impression is normalized cross-correlation, especially since I remember seeing a purely optical cross-correlator that had vehicle-tracking as the example application. (Tracking a vehicle as it passes using only optical components and an image of the side of the vehicle - I wish I could find the link.) This is similar (if not identical) to "template matching", which you say kind of works, but this won't work if the images are rotated, as you know.
However, there's a related method based on log-polar coordinates that will work regardless of rotation, scale, shear, and translation.
I imagine this would also enable tracking that the object has left the scene of the video, too, since the maximum correlation will decrease.
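The point about the maximum correlation dropping when the object leaves the scene can be sketched as follows. This is a standard-library illustration of zero-mean normalized cross-correlation between a template and an equally sized patch, both flattened to vectors; the function names and the 0.7 threshold are hypothetical. In OpenCV, I believe the corresponding score is what matchTemplate produces with TM_CCOEFF_NORMED, with the best value read out via minMaxLoc.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Zero-mean normalized cross-correlation of two equally sized patches.
// Returns a score in [-1, 1]; 0 if either patch has no contrast.
double normalizedCrossCorrelation(const std::vector<double>& a,
                                  const std::vector<double>& b) {
    assert(!a.empty() && a.size() == b.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= static_cast<double>(a.size());
    meanB /= static_cast<double>(b.size());

    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    const double denom = std::sqrt(varA * varB);
    return denom == 0.0 ? 0.0 : num / denom;
}

// Reject the best match of a frame when its score is too low,
// instead of always accepting the best location.
bool objectPresent(double bestScore, double threshold = 0.7) {
    return bestScore >= threshold;
}
```

Because the score is normalized, it is unaffected by uniform brightness and contrast changes, which makes a fixed threshold more meaningful than with raw correlation.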
How low resolution are we talking? Could you also elaborate on the object? Is it a specific color? Does it have a pattern? The answers affect what you should be using.
Also, I might be reading your template matching statement wrong, but it sounds like you are overtraining it (by testing on the same video you extracted the object from??).
A Haar Cascade is going to require significant training data on your part, and will be poor for any adjustments in orientation.
Your best bet might be to combine template matching with an algorithm similar to CamShift in OpenCV (5.7 MB PDF), along with a probabilistic model (you'll have to figure this one out) of whether the truck is still in the image.