SURF feature detection for linear panoramas with OpenCV

I have started a project to create linear/strip panoramas of long scenes using video, meaning that the panorama doesn't revolve around a center but moves parallel to the scene, e.g. a video camera mounted on a vehicle looking perpendicular to the street facade.
The steps I will be following are:
Capture frames from video
Feature detection (SURF)
Feature tracking (Kanade-Lucas-Tomasi)
Homography estimation
Stitching / mosaicking
So far I have been able to save individual frames from the video and complete SURF feature detection on only two images. I am not asking anyone to solve my entire project, but I am stuck trying to complete the SURF detection on the remaining captured frames.
Question: How do I apply SURF detection to successive frames? Do I save the results as YAML or XML?
For my feature detection I used OpenCV's sample find_obj.cpp and just changed the images used.
Has anyone worked on such a project? An example of what I would like to achieve is from Iwane Technologies: http://www.iwane.com/en/2dpcci.php

While working on a similar project, I created std::vectors of SURF keypoints and descriptors, then used them to compute the pairwise matches.
The vectors were filled while reading a movie frame by frame, but it works the same with a sequence of images.
There are not enough points to saturate your memory (and justify YAML/XML files) unless you have very limited resources or a very, very long sequence.
Note that you do not need the feature-tracking part, at least in most standard cases: matching SURF descriptors can also provide you with a homography estimate (without the need for tracking).
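For example, a minimal sketch of that matching + homography step for two consecutive frames, assuming OpenCV 2.4.x with the nonfree module (the Hessian threshold of 400 is just a starting value):

    #include <opencv2/core/core.hpp>
    #include <opencv2/features2d/features2d.hpp>
    #include <opencv2/nonfree/features2d.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    // prev and next are expected to be 8-bit grayscale frames.
    cv::Mat homographyFromSurf(const cv::Mat& prev, const cv::Mat& next)
    {
        cv::SURF surf(400.0);                      // Hessian threshold
        std::vector<cv::KeyPoint> kp1, kp2;
        cv::Mat desc1, desc2;
        surf(prev, cv::Mat(), kp1, desc1);         // detect keypoints + compute descriptors
        surf(next, cv::Mat(), kp2, desc2);

        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<cv::DMatch> matches;
        matcher.match(desc1, desc2, matches);

        std::vector<cv::Point2f> pts1, pts2;
        for (size_t i = 0; i < matches.size(); ++i) {
            pts1.push_back(kp1[matches[i].queryIdx].pt);
            pts2.push_back(kp2[matches[i].trainIdx].pt);
        }
        // RANSAC rejects the outlier matches while estimating the homography.
        return cv::findHomography(pts1, pts2, CV_RANSAC, 3.0);
    }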
Reading frames into a vector
Start by declaring a vector of Mats, for example std::vector<cv::Mat> my_sequence;.
Then, you have two choices:
either you know the number of frames, in which case you resize the vector to the correct size. Then, for each frame, read the image into some variable and copy it to the correct place in the sequence, using my_sequence.at(i) = frame.clone(); or frame.copyTo(my_sequence.at(i));
or you don't know the size beforehand, and you simply call the push_back() method as usual: my_sequence.push_back(frame);
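Putting this together, here is a minimal sketch of reading a movie into such a vector while computing SURF keypoints and descriptors for every frame (assuming OpenCV 2.4.x with the nonfree module; the file name is a placeholder):

    #include <opencv2/core/core.hpp>
    #include <opencv2/highgui/highgui.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <opencv2/nonfree/features2d.hpp>
    #include <vector>

    int main()
    {
        cv::VideoCapture cap("facade.avi");              // hypothetical input file
        if (!cap.isOpened()) return 1;

        std::vector<cv::Mat> my_sequence;                // the frames themselves
        std::vector<std::vector<cv::KeyPoint> > all_keypoints;
        std::vector<cv::Mat> all_descriptors;            // one descriptor Mat per frame

        cv::SURF surf(400.0);                            // Hessian threshold
        cv::Mat frame, gray;
        while (cap.read(frame)) {
            my_sequence.push_back(frame.clone());        // clone: VideoCapture reuses its buffer

            cv::cvtColor(frame, gray, CV_BGR2GRAY);      // SURF expects a single-channel image
            std::vector<cv::KeyPoint> kp;
            cv::Mat desc;
            surf(gray, cv::Mat(), kp, desc);             // detect keypoints + compute descriptors
            all_keypoints.push_back(kp);
            all_descriptors.push_back(desc);
        }
        // all_descriptors[i] and all_descriptors[i+1] can now be matched pairwise.
        return 0;
    }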

Related

Live 360° Panorama Image Stitching implementation

I am planning to implement a live 360° panorama stitcher using 6 cameras of the same model.
I came across the stitching_detailed.cpp implementation from OpenCV. The problem is that it takes around 1 second to stitch only 2 images together using my desired parameters, which is fairly slow.
As my application should run in real time, I need to be able to stitch 6 images together in around 100 ms for it to be "acceptable". The output resolution should be around 0.2 megapixels. Therefore, I am starting my own implementation in C++, based pretty much on what is done in stitching_detailed. I am aiming to use the CUDA functions in OpenCV as much as possible (some of which are not even used in stitching_detailed).
I have been carefully studying the stitching pipeline on which the previous algorithm is based, as described in Images stitching by OpenCV and in the paper Automatic Panoramic Image Stitching using Invariant Features.
As the stitching pipeline is very general, I have made several assumptions in order to simplify it and speed it up. I would like to get some feedback on whether they are valid:
All the images I provide to the algorithm are guaranteed to be part of the panorama, so I do not have to do any extra checking for that.
The 6 cameras will be fixed in position and orientation. Therefore, I know beforehand the order in which the cameras need to be stitched into the panorama picture, and I can avoid trying to match images from cameras that are not contiguous.
As the cameras are going to remain static, it would be valid to perform the registration step only once (as a kind of initialization) to get each camera's orientation matrix R. Afterwards, I could run only the compositing block for subsequent frames (again, all of this assuming the cameras remain completely static).
I also have the following questions...
I can indeed calibrate the cameras prior to running my application and obtain each camera's intrinsic parameter matrix K and its respective distortion parameters. Could I plug K into the stitching pipeline and thereby avoid the K estimation in the registration step?
What else (if anything) could camera calibration bring to the pipeline? Distortion correction?
If my previous assumption about executing only the compositing block is correct... could I still take out some parts of it? My guess is that the seam finder, for instance, needs to be run only once (in the initialization of the algorithm).
Is exposure compensation needed at all in my case? (The cameras are literally the same model.)
Any lead would be deeply appreciated, thanks!
The first thing you can do to reduce your processing time is to calibrate your cameras so that you don't need to process the images to find homography matrices from features. Find them beforehand so that they are constant matrices.
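A minimal sketch of that idea, assuming OpenCV 3.x and plain perspective warps (a full 360° rig would rather use the camera rotations and a cylindrical/spherical warper, but the one-off initialization principle is the same):

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // One-off initialization: estimate the homography from point matches found
    // in a calibration frame pair (feature matching done only once, offline).
    cv::Mat initHomography(const std::vector<cv::Point2f>& ptsCamA,
                           const std::vector<cv::Point2f>& ptsCamB)
    {
        return cv::findHomography(ptsCamB, ptsCamA, cv::RANSAC, 3.0);
    }

    // Per-frame work: only a warp with the constant matrix, no feature detection.
    void warpFrame(const cv::Mat& H, const cv::Mat& frameB, cv::Mat& warpedB,
                   const cv::Size& panoSize)
    {
        cv::warpPerspective(frameB, warpedB, H, panoSize);
    }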

How to track people across multiple cameras?

This is the setup: a fairly large room with 4 fish-eye cameras mounted on the ceiling. There are no blind spots; each camera's coverage overlaps a little with the others.
The idea is to track people across these cameras. As of now a blob-extraction algorithm is in place, which detects people as blobs. It's a fairly decently working algorithm which detects individual people pretty well. I am using the OpenCV API for all of this.
What I mean by tracking people is this: say camera 1 identifies two people, Person A and Person B. Now, as these two people move from the coverage of camera 1 into the overlapping area of cam1 and cam2 and then into the area covered only by cam2, cam2 should be able to identify them as the same people A and B that cam1 identified.
This is what I thought I'd do -
1) The cameras render images at 15 fps, and I think the frames are 1920x1920.
2) Identify blobs individually in each camera and give each blob a unique label.
3) Now for the overlaps - compute an affine transformation matrix which maps pixels in one camera's frame onto another camera's frame. This needn't be done for every frame; it can be done before the whole process starts, as a pre-processing step. So in real time, whenever I detect a blob in the overlapping area, all I have to do is apply the transformation matrix to the pixels in cam1, see if there is a corresponding blob in cam2, and give them the same label.
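A minimal sketch of that pre-computed mapping step, assuming three manually picked correspondences in the overlap of cam1 and cam2 (all coordinates are placeholders):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    int main()
    {
        // Three corresponding points picked once in the overlapping region.
        cv::Point2f cam1Pts[3] = { cv::Point2f(1500.f,  200.f),
                                   cv::Point2f(1600.f,  900.f),
                                   cv::Point2f(1450.f, 1700.f) };
        cv::Point2f cam2Pts[3] = { cv::Point2f( 120.f,  180.f),
                                   cv::Point2f( 210.f,  880.f),
                                   cv::Point2f(  80.f, 1690.f) };
        cv::Mat A = cv::getAffineTransform(cam1Pts, cam2Pts);   // 2x3 affine matrix

        // At run time: map a blob centroid seen by cam1 into cam2 coordinates.
        std::vector<cv::Point2f> blobInCam1(1, cv::Point2f(1550.f, 1000.f)), blobInCam2;
        cv::transform(blobInCam1, blobInCam2, A);
        // Compare blobInCam2[0] against the blob centroids detected by cam2 and
        // reuse cam1's label if they are close enough.
        return 0;
    }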
So, Questions :
1) Would this approach give me a system that tracks people decently?
2) For the affine transform, do I have to convert the fish-eye images to rectilinear images? (My guess is yes, but I am not too sure.)
Please feel free to point out possible errors and why certain things might not work in the process I've described. Also alternate suggestions are welcome! TIA
1 - Blob extraction is not enough to track a specific object; for people I suggest HOG, or at least background subtraction before blob extraction, since all of the cameras view static scenes.
2 - OpenCV <= 2.4.9 uses the pinhole model for stereo vision, so before any calibration with OpenCV methods your fisheye images must first be converted to rectilinear images. You might also try calibrating yourself using other approaches.
Release 3.0.0 will have support for a fisheye camera model. It is in alpha stage, but you can already download it and give it a try.
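If you move to OpenCV >= 3.0, a minimal sketch of undistorting a fisheye frame to a rectilinear image with the cv::fisheye module could look like this (the K and D values below are placeholders; in practice they come from cv::fisheye::calibrate):

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>

    void undistortFisheye(const cv::Mat& fisheyeFrame, cv::Mat& rectilinear)
    {
        cv::Matx33d K(600, 0, 960,
                      0, 600, 960,
                      0,   0,   1);                 // placeholder intrinsics
        cv::Vec4d D(-0.05, 0.01, 0.0, 0.0);         // placeholder distortion coefficients

        // Knew controls the field of view of the output; reusing K keeps it simple.
        cv::fisheye::undistortImage(fisheyeFrame, rectilinear, K, D, K);
    }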

Object Recognition by Outlines vs Features

Context:
I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection from the RGB image, preprocessing by downscaling to 320x240, grayscale, stretching the contrast and balancing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Then those keypoints are used to build object descriptors which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube: SURF detects a few bits of grain on one face, then fails on the other faces. I need to detect (something like) the fact that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was that I could build object descriptors for each significantly different-looking orientation of the object, e.g. two descriptors for a bowling pin: one standing up and one lying down. For a cellphone, one lying on its front and one on its back. My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would be preferable, and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project and ended up using a super simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas. Obviously, there needs to be enough variance between the areas for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there; one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!
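For reference, a minimal sketch of the approxPolyDP idea, assuming a reasonably clean binarization of the objects against the table (the epsilon factor and vertex counts are starting values to tune):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    void classifyShapes(const cv::Mat& gray)
    {
        cv::Mat bw;
        cv::threshold(gray, bw, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(bw, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        for (size_t i = 0; i < contours.size(); ++i) {
            std::vector<cv::Point> approx;
            double eps = 0.02 * cv::arcLength(contours[i], true);
            cv::approxPolyDP(contours[i], approx, eps, true);

            if (approx.size() == 3)      { /* triangle */ }
            else if (approx.size() == 4) { /* rectangle / cube face */ }
            else if (approx.size() > 6)  { /* roughly circular outline */ }
        }
    }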

OpenCV intrusion detection

For a project of mine, I'm required to process image differences with OpenCV. The goal is to detect an intrusion in a zone.
To be a little more clear, here are the inputs and outputs:
Inputs:
A reference image
A second image from approximately the same point of view (there can be a margin of error)
Outputs:
Detection of new objects in the scene.
Bonus:
Recognition of those objects.
For me, the most difficult part is filtering out small differences (luminosity, small errors in camera position, movement of trees...).
I already read a lot about OpenCV image processing (subtraction, erosion, threshold, SIFT, SURF...) and have some good results.
What I would like is the list of steps you think best for getting good detection (humans, cars...), and the algorithms to use at each step.
Many thanks for your help.
Track-by-Detect, human tracker:
You apply the HOG detector to detect humans.
You draw the corresponding rectangle as a foreground area on the foreground mask.
You pass this mask to "The OpenCV Video Surveillance / Blob Tracker Facility".
You can now group the passing humans into public/restricted areas based on their blob {x,y} values.
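A minimal sketch of the first two steps, assuming the stock OpenCV people detector (2.4-style API):

    #include <opencv2/core/core.hpp>
    #include <opencv2/objdetect/objdetect.hpp>
    #include <vector>

    void peopleToForegroundMask(const cv::Mat& frame, cv::Mat& fgMask)
    {
        cv::HOGDescriptor hog;
        hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

        std::vector<cv::Rect> people;
        hog.detectMultiScale(frame, people);

        fgMask = cv::Mat::zeros(frame.size(), CV_8UC1);
        for (size_t i = 0; i < people.size(); ++i)
            cv::rectangle(fgMask, people[i], cv::Scalar(255), -1);   // -1: filled rectangle
    }

The resulting mask can then be handed to the blob tracker facility mentioned above.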
I had to deal with this problem last year.
I suggest an adaptive background-foreground estimation algorithm which produces a foreground mask.
On top of that, you add a blob detector and tracker, and then calculate if an intersection takes place between the blobs and your intrusion area.
OpenCV has samples of all of these within the legacy code. Of course, if you want, you can also use your own or other versions of these.
Links:
http://opencv.willowgarage.com/wiki/VideoSurveillance
http://experienceopencv.blogspot.gr/2011/03/blob-tracking-video-surveillance-demo.html
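A minimal sketch of that pipeline, assuming OpenCV 2.4's BackgroundSubtractorMOG2 and a rectangular intrusion zone (the zone itself is a placeholder; keep one subtractor instance alive across frames so the background model can adapt):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <opencv2/video/background_segm.hpp>
    #include <vector>

    bool intrusionInFrame(cv::BackgroundSubtractorMOG2& bg, const cv::Mat& frame,
                          const cv::Rect& intrusionZone)
    {
        cv::Mat fgMask;
        bg(frame, fgMask);                                   // adaptive fg/bg estimation
        cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);  // drop shadow pixels
        cv::erode(fgMask, fgMask, cv::Mat());                // clean up small noise
        cv::dilate(fgMask, fgMask, cv::Mat());

        std::vector<std::vector<cv::Point> > blobs;
        cv::findContours(fgMask, blobs, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        for (size_t i = 0; i < blobs.size(); ++i) {
            cv::Rect box = cv::boundingRect(blobs[i]);
            if ((box & intrusionZone).area() > 0)            // blob overlaps the zone
                return true;
        }
        return false;
    }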
I would definitely start with running-average background subtraction if the camera is static. Then you can use findContours() to find the intruding object's location and size. If you want to detect humans walking around in the scene, I would recommend looking at the built-in Haar cascade classifier:
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html#cascade-classifier
where you would just replace the XML with the upper-body classifier.
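A minimal sketch of the running-average approach for a static camera (the learning rate, threshold and minimum area are starting values to tune):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    // background32f is kept by the caller across frames (starts empty).
    void detectIntruders(cv::Mat& background32f, const cv::Mat& frame,
                         std::vector<cv::Rect>& objects)
    {
        cv::Mat gray, bg8u, diff, mask;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        if (background32f.empty())
            gray.convertTo(background32f, CV_32F);
        cv::accumulateWeighted(gray, background32f, 0.05);   // slowly updated average

        background32f.convertTo(bg8u, CV_8U);
        cv::absdiff(gray, bg8u, diff);
        cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);

        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        objects.clear();
        for (size_t i = 0; i < contours.size(); ++i)
            if (cv::contourArea(contours[i]) > 500)          // ignore tiny blobs
                objects.push_back(cv::boundingRect(contours[i]));
    }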

object recognition performance not good

I am trying to do object recognition using algorithms such as SURF, FERN and FREAK in OpenCV 2.4.2.
I am using the programs from opencv samples without modifications - find_obj.cpp, find_obj_ferns.cpp, freak_demo.cpp
I tried changing the parameters for the algorithms which didn't help.
I have my training images, test images and the result of FREAK recognition here
As you can see the result is pretty bad.
No feature descriptors are detected for one of the training images - image here
Feature descriptors are detected outside the object boundary for the other - image here
I have a few questions:
Why do these algorithms work only with grayscale images? It is apparent that for my training images above, the object could be detected easily if RGB were included. Is there any technique that takes this into account?
Is there any other way to improve performance? I tried fiddling with the feature parameters, which didn't work well.
The first thing I observed in your image is that the object is plain and has no texture differences. All the feature detectors you used look for corners, which are view invariant - that is, keypoints in an image which have a unique neighborhood and a good magnitude of x and y derivatives. I have uploaded my analysis - see the figures.
How can you check that what I am saying is correct?
Just look at the descriptor values of a keypoint found on your object: you will see that most of them are zeros, because a descriptor describes the variation of the edges around a corner point in specific directions (see the SURF documentation for more details).
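A minimal sketch of that check, assuming OpenCV 2.4 with the nonfree module (it prints one descriptor row so you can see how many of its entries are near zero):

    #include <opencv2/core/core.hpp>
    #include <opencv2/nonfree/features2d.hpp>
    #include <cstdio>
    #include <vector>

    // grayObject is an 8-bit grayscale image of the object.
    void inspectFirstDescriptor(const cv::Mat& grayObject)
    {
        cv::SURF surf(400.0);
        std::vector<cv::KeyPoint> kp;
        cv::Mat desc;                                  // one float row per keypoint
        surf(grayObject, cv::Mat(), kp, desc);

        if (desc.empty()) { std::printf("no keypoints found\n"); return; }
        for (int j = 0; j < desc.cols; ++j)
            std::printf("%.3f ", desc.at<float>(0, j));
        std::printf("\n");
    }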
The object you are trying to detect looks like a mobile phone, so just flip the object over and repeat the experiment; you will surely get better results, because the front side of such objects generally has more texture (switches, logos, etc.).
Here is a result I uploaded:
