I'm currently working on Image stitching using OpenCV 2.3.1 on Visual Studio 2010, but I'm having some trouble.
Problem Description
I'm trying to write code for stitching multiple images captured by a few cameras (about 3~4), i.e. the code should keep executing image stitching until I ask it to stop.
The following is what I've done so far:
(For simplification, I'll replace some part of the code with just a few words)
1. Read frames (images) from 2 cameras (currently I'm just working with 2 cameras).
2. Feature detection and descriptor calculation (SURF).
3. Feature matching using FlannBasedMatcher.
4. Remove outliers and calculate the homography from the inliers using RANSAC.
5. Warp one of the two images.
For step 5., I followed the answer in the following thread and just changed some parameters:
Stitching 2 images in opencv
However, the result is terrible.
I uploaded the result to YouTube; only those who have the link will be able to see it.
My code is shown below:
(Only crucial parts are shown)
VideoCapture cam1, cam2;
Mat frm1, frm2;
cam1 >> frm1;
cam2 >> frm2;
//(SURF detection, descriptor calculation
//and matching using FlannBasedMatcher)
double max_dist = 0; double min_dist = 100;
//-- Quick calculation of max and min distances between keypoints
for( int i = 0; i < descriptors_1.rows; i++ )
{
    double dist = matches[i].distance;
    if( dist < min_dist ) min_dist = dist;
    if( dist > max_dist ) max_dist = dist;
}
//-- (Draw only "good" matches,
//-- i.e. those whose distance is less than 3*min_dist)
vector<Point2f> frame1;
vector<Point2f> frame2;
for( size_t i = 0; i < good_matches.size(); i++ )
{
    //-- Get the keypoints from the good matches
    frame1.push_back( keypoints_1[ good_matches[i].queryIdx ].pt );
    frame2.push_back( keypoints_2[ good_matches[i].trainIdx ].pt );
}
Mat H = findHomography( Mat(frame1), Mat(frame2), CV_RANSAC );
cout << "Homography: " << H << endl;
/* warp the image */
Mat warpImage2;
warpPerspective(frm2, warpImage2,
H, Size(frm2.cols, frm2.rows), INTER_CUBIC);
Mat final(Size(frm2.cols*3 + frm1.cols, frm2.rows),CV_8UC3);
Mat roi1(final, Rect(frm1.cols, 0, frm1.cols, frm1.rows));
Mat roi2(final, Rect(2*frm1.cols, 0, frm2.cols, frm2.rows));
imshow("final", final);
What else should I do to make the stitching better?
Besides, is it reasonable to use a fixed homography matrix instead of recomputing it for every frame?
What I mean is specifying the angle and the displacement between the 2 cameras myself and deriving a homography matrix from them.
Thanks. :)

It sounds like you are going about this sensibly, but if you have access to both of the cameras, and they will remain stationary with respect to each other, then calibrating offline, and simply applying the transformation online will make your application more efficient.
One point to note is, you say you are using the findHomography function from OpenCV. From the documentation, this function:
Finds a perspective transformation between two planes.
However, your points are not restricted to a specific plane as they are imaging a 3D scene. If you wanted to calibrate offline, you could image a chessboard with both cameras, and the detected corners could be used in this function.
Alternatively, you may like to investigate the Fundamental matrix, which can be calculated with a similar function. This matrix describes the relative position of the cameras, but some work (and a good textbook) will be required to extract it.
If you can find it, I would strongly recommend having a look at Part II: "Two-View Geometry" in the book "Multiple View Geometry in computer vision", by Richard Hartley and Andrew Zisserman, which goes through the process in detail.

I have been working on image registration lately. My algorithm takes two images, calculates the SURF features, finds correspondences, finds the homography matrix and then stitches both images together. I did it with the following code:
void stich(Mat base, Mat target, Mat homography, Mat& panorama){
    Mat corners1(1, 4, CV_32F);   // x coordinates of the target corners
    Mat corners2(1, 4, CV_32F);   // y coordinates of the target corners
    Mat corners(1, 4, CV_32F);
    vector<Mat> planes;
    /* compute corners
    of warped image
    */
    // corner order: (0,0), (0,rows), (cols,0), (cols,rows)
    corners1.at<float>(0,0) = 0;           corners2.at<float>(0,0) = 0;
    corners1.at<float>(0,1) = 0;           corners2.at<float>(0,1) = target.rows;
    corners1.at<float>(0,2) = target.cols; corners2.at<float>(0,2) = 0;
    corners1.at<float>(0,3) = target.cols; corners2.at<float>(0,3) = target.rows;
    planes.push_back(corners1);
    planes.push_back(corners2);
    merge(planes, corners);
    perspectiveTransform(corners, corners, homography);
    /* compute size of resulting
    image and allocate memory
    */
    double x_start = min( min( (double)corners.at<Vec2f>(0,0)[0], (double)corners.at<Vec2f>(0,1)[0]), 0.0);
    double x_end = max( max( (double)corners.at<Vec2f>(0,2)[0], (double)corners.at<Vec2f>(0,3)[0]), (double)base.cols);
    double y_start = min( min( (double)corners.at<Vec2f>(0,0)[1], (double)corners.at<Vec2f>(0,2)[1]), 0.0);
    double y_end = max( max( (double)corners.at<Vec2f>(0,1)[1], (double)corners.at<Vec2f>(0,3)[1]), (double)base.rows);
    /*Creating image
    with same channels, depth
    as target
    and proper size
    */
    panorama.create(Size(x_end - x_start + 1, y_end - y_start + 1), target.depth());
    planes.clear();
    /*Planes should
    have same n.channels
    as target
    */
    for (int i=0;i<target.channels();i++){
        planes.push_back(panorama);
    }
    merge(planes, panorama);
    // create translation matrix in order to copy both images to correct places
    Mat T = Mat::eye(3, 3, CV_64F);
    T.at<double>(0,2) = -x_start;
    T.at<double>(1,2) = -y_start;
    // copy base image to correct position within output image
    warpPerspective(base, panorama, T, panorama.size(), INTER_LINEAR | CV_WARP_FILL_OUTLIERS);
    // change homography to take necessary translation into account
    gemm(T, homography, 1, T, 0, T);
    // warp second image and copy it to output image
    warpPerspective(target, panorama, T, panorama.size(), INTER_LINEAR);
}
If you have any questions, I will try to answer them.


OpenCV Multiple marker detection?

I've been working on detecting fiducial markers in scenes. An example of my fiducial marker is here:
I have been able to detect a single fiducial marker in a scene very well. What is the methodology for detecting multiple fiducial markers in a scene? Doing feature detection, extraction, and then matching is great for finding a single match, but it seems to be the wrong method for detecting multiple matches since it would be difficult to determine which features belong to which marker?
The fiducial markers would be the same, and would not be in a known location in the scene.
Below is some sample code. I was trying to match the first fiducial marker with x number of keypoints, and then use the remaining keypoints to match the second marker. However, this is not robust at all. Does anybody have any suggestions?
OrbFeatureDetector detector;
vector<KeyPoint> keypoints1, keypoints2;
detector.detect(im1, keypoints1);
detector.detect(im2, keypoints2);
Mat display_im1, display_im2;
drawKeypoints(im1, keypoints1, display_im1, Scalar(0,0,255));
drawKeypoints(im2, keypoints2, display_im2, Scalar(0,0,255));
SiftDescriptorExtractor extractor;
Mat descriptors1, descriptors2;
extractor.compute( im1, keypoints1, descriptors1 );
extractor.compute( im2, keypoints2, descriptors2 );
BFMatcher matcher;
vector< DMatch > matches1, matches2;
matcher.match( descriptors1, descriptors2, matches1 );
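sort (matches1.begin(), matches1.end());
matches2 = matches1;
int numElementsToSave = 50;
// the best numElementsToSave matches go to the first marker, the rest to the second
matches1.erase(matches1.begin() + numElementsToSave, matches1.end());
matches2.erase(matches2.begin(), matches2.begin() + numElementsToSave);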
Mat match_im1, match_im2;
drawMatches( im1, keypoints1, im2, keypoints2,
matches1, match_im1, Scalar::all(-1), Scalar::all(-1),
vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );
drawMatches( im1, keypoints1, im2, keypoints2,
matches2, match_im2, Scalar::all(-1), Scalar::all(-1),
vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );
I have never tried it before, but here you have a good explanation about the detection of multiple occurrences:
This tutorial shows how enabling multiple detection of the same
object. To enable multiple detection, the parameter
General->multiDetection should be checked. The approach is as follows:
As usual, we match all features between the objects and the scene.
For an object which is in the scene two times (or more), it should have twice of matched features. We apply a RANSAC algorithm to find a
homography. The inliers should belong to only one occurrence of the
object, all others considered as outliers. We redo the homography
process on the outliers, then find another homography… we do this
process until no homography can be computed.
It may happens that a homography can be found superposed on a previous one using the outliers. You could set
Homography->ransacReprojThr (in pixels) higher to accept more
inliers in the homographies computed, which would decrease the chance
of superposed detections. Another way is to ignore superposed
homographies on a specified radius with the parameter General->multiDetectionRadius (in pixels).
For more information see the page below:
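The loop described in the quoted tutorial (fit a model, keep its inliers as one occurrence, repeat on the outliers) can be sketched without OpenCV. This is only an illustration under loud assumptions: the model here is a pure 2D translation rather than a homography, every match is tried as a hypothesis instead of sampling randomly as RANSAC does, and the names `Match`, `Occurrence`, `detectOccurrences`, `tol` and `minInliers` are hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical match record: a point on the object and its matched scene point.
struct Match { double ox, oy, sx, sy; };

// One detected occurrence. Assumption: it is described by a translation only;
// a real detector would estimate a full homography with RANSAC instead.
struct Occurrence { double dx, dy; std::size_t inliers; };

std::vector<Occurrence> detectOccurrences(std::vector<Match> matches,
                                          double tol, std::size_t minInliers) {
    std::vector<Occurrence> found;
    while (matches.size() >= minInliers) {
        // Try every match as a hypothesis and keep the one with most support.
        std::size_t bestCount = 0;
        double bestDx = 0.0, bestDy = 0.0;
        for (const Match& h : matches) {
            const double dx = h.sx - h.ox, dy = h.sy - h.oy;
            std::size_t count = 0;
            for (const Match& m : matches)
                if (std::hypot(m.sx - m.ox - dx, m.sy - m.oy - dy) <= tol) ++count;
            if (count > bestCount) { bestCount = count; bestDx = dx; bestDy = dy; }
        }
        if (bestCount < minInliers) break;  // no further model can be fitted
        found.push_back({bestDx, bestDy, bestCount});

        // Keep only the outliers of this model and search them again.
        std::vector<Match> rest;
        for (const Match& m : matches)
            if (std::hypot(m.sx - m.ox - bestDx, m.sy - m.oy - bestDy) > tol)
                rest.push_back(m);
        matches.swap(rest);
    }
    return found;
}
```

With two groups of matches that agree on different displacements, the function reports two occurrences, and stray matches that support no model are left over when the loop stops.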
I developed a semi-automatic algorithm to detect multiple markers (interest points) in an image using the findContours method on a binary image (my markers are white on a green surface). I then limit my search with an area constraint, as I know how big each marker is in each frame. Of course this produces some false positives, but it was good enough. I couldn't see the picture in your post, as tinypic is blocked here for some reason, but you can use the OpenCV matchShapes function to eliminate the bad contours.
Here is the part of the code I wrote for this.
Mat tempFrame;
cvtColor(BallFrame, tempFrame, COLOR_BGR2GRAY);
GaussianBlur(tempFrame, tempFrame, Size(15, 15), 2, 2); // remove noise
Mat imBw;
threshold(tempFrame, imBw, 220, 255, THRESH_BINARY); // High threshold to get better results
std::vector<std::vector<Point> > contours;
std::vector<Vec4i> hierarchy;
findContours(imBw, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
Point2f center;
float radius = 0.0;
for (size_t i = 0; i < contours.size(); i++)
{
    double area = contourArea(contours[i]);
    if (area > 1 && area < 4000) {
        minEnclosingCircle(contours[i], center, radius);
        if (radius < 50) // eliminate wide narrow contours
        {
            // You can use `matchShapes` here to match your marker shape
        }
    }
}
I hope this will help

OpenCV, Haar cascade classifier: scaling feature or computing image pyramid?

I read the paper of Viola and Jones.
They stated clearly in the paper that their algorithm is faster than others because calculation of image pyramid is avoided by scaling feature rectangles.
But I googled around for a long time, only to find that OpenCV implements the image pyramid method instead of scaling the feature rectangles. An integral image is computed for every sub-image in the pyramid, and this is done for every frame if the algorithm is used to process video instead of a single picture.
What's the rationale of this choice? I don't quite get it.
All I can understand is completely the opposite: for video applications, scaling the features only needs to be done once, and the scaled features can be reused for all the frames. Only the integral image of the whole image needs to be computed.
Am I correct on this?
Viola and Jones also presented a 15fps frame rate on a Pentium 3 computer, but I hardly see anybody achieving that performance with the OpenCV implementation on a modern computer. That's strange, isn't it?
Any input will be helpful. Thank you.
I have tried to verify this by looking into the code. This is based on version 2.4.10. The short answer is: both. OpenCV can scale the image according to the scale factor at which detection is performed, and it can also rescale the features to different window sizes according to the scale factor. The justification is below:
1. Looking at the older functions, cvHaarDetectObjectsForROC from objdetect module (haar.cpp). Notable arguments are the CvSize minSize, CvSize maxSize and const CvArr* _img, double scaleFactor, int minNeighbors.
cvHaarDetectObjectsForROC( const CvArr* _img,
CvHaarClassifierCascade* cascade, CvMemStorage* storage,
std::vector<int>& rejectLevels, std::vector<double>& levelWeights,
double scaleFactor, int minNeighbors, int flags,
CvSize minSize, CvSize maxSize, bool outputRejectLevels )
{
CvMat stub, *img = (CvMat*)_img;
.... // skip a bit ahead to this part
if( flags & CV_HAAR_SCALE_IMAGE )
{
CvSize winSize0 = cascade->orig_window_size; // this would be the trained size of 24x24 pixels mentioned in the paper
for( factor = 1; ; factor *= scaleFactor )
{
// detection window for current scale
CvSize winSize = { cvRound(winSize0.width*factor), cvRound(winSize0.height*factor) };
//resized image size
CvSize sz = { cvRound( img->cols/factor ), cvRound( img->rows/factor ) };
// take every possible scale factor as long as the resulting window doesn't exceed the maximum size given and is bigger than the minimum one
if( winSize.width > maxSize.width || winSize.height > maxSize.height )
    break;
if( winSize.width < minSize.width || winSize.height < minSize.height )
    continue;
img1 = cvMat( sz.height, sz.width, CV_8UC1, imgSmall->data.ptr );
... // skip sum, square sum, tilted sums a.k.a integral image arrays initialization
cvResize( img, &img1, CV_INTER_LINEAR ); // scaling down the image here
cvIntegral( &img1, &sum1, &sqsum1, _tilted ); // compute integral representation for the scaled down version
... //skip some lines
cvSetImagesForHaarClassifierCascade( cascade, &sum1, &sqsum1, _tilted, 1. ) //-> set the structures and also rescales the feature according to the last parameter which is the scale factor.
// Notice it is 1.0 because the image was scaled down this time.
<call detection function with notable arguments: cascade,... factor, cv::Mat(&sum1), cv::Mat(&sqsum1) ...>
// the above call is a parallel for that evaluates a window at a certain position in the image with the cascade classifier
// note the class naming HaarDetectObjects_ScaleImage_Invoker in the actual code and skipped here.
} // end for
} // if
else
{
int n_factors = 0; // total number of factors
cvIntegral( img, sum, sqsum, tilted ); // -> makes a single integral image for the given image (the original one passed in the cvHaarDetectObjects)
// below aims to see the total number of scale factors at which detection is performed.
for( n_factors = 0, factor = 1;
factor*cascade->orig_window_size.width < img->cols - 10 &&
factor*cascade->orig_window_size.height < img->rows - 10;
n_factors++, factor *= scaleFactor );
... // skip some lines
for( ; n_factors-- > 0; factor *= scaleFactor )
{
CvSize winSize = { cvRound( cascade->orig_window_size.width * factor ), cvRound( cascade->orig_window_size.height * factor )};
... // skip check for minSize and maxSize here
cvSetImagesForHaarClassifierCascade( cascade, sum, sqsum, tilted, factor ); // -> notice here the scale factor is given so that the trained Haar features can be rescaled.
<parallel for detect call given a startX, endX and startY endY, window size and cascade> // Note the name here HaarDetectObjects_ScaleCascade_Invoker used in actual code and skipped here
} // end for
} // end of if
... // skip rest
} // end of cvHaarDetectObjectsForROC function
If you take the new C++ API, the CascadeClassifier class, when it loads the new .xml cascade format output by the traincascade.exe application, scales the image according to the scale factor (as far as I know, this holds for Haar features too). The detectMultiScale method of the class calls the detectSingleScale method at some point in the code:
if( !detectSingleScale( scaledImage, stripCount, processingRectSize, stripSize, yStep, factor, candidates, rejectLevels, levelWeights, outputRejectLevels ) )
break; // from cascadedetect.cpp in the detectMultiScale method.
Possible reason I can think of: In order to have a unified design in C++ this is the only method that can achieve transparency with a single interface for different types of features.
I have left this trail of thought so that, in case I have misunderstood or omitted something, another user can correct me by verifying it.
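To make the two branches above easier to compare, here is a small sketch that lists, for each scale factor, the detection window that results when the features are rescaled and the image size that results when the image is shrunk instead. It is not OpenCV code: `Size2`, `ScaleStep` and `enumerateScales` are hypothetical names, `std::lround` stands in for `cvRound`, and the stop condition is simplified to "the window no longer fits in the image" (the excerpt above uses `minSize`/`maxSize` and a 10 pixel margin).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Size2 { int width, height; };

// One entry per scale at which detection would run.
struct ScaleStep {
    double factor;
    Size2 window;       // window size if the features are scaled up
    Size2 scaledImage;  // image size if the image is scaled down instead
};

std::vector<ScaleStep> enumerateScales(Size2 image, Size2 trainedWindow,
                                       double scaleFactor) {
    std::vector<ScaleStep> steps;
    // Guard: with these inputs the loop below would never terminate.
    if (scaleFactor <= 1.0 || trainedWindow.width <= 0 || trainedWindow.height <= 0)
        return steps;
    for (double factor = 1.0; ; factor *= scaleFactor) {
        const Size2 window = {
            static_cast<int>(std::lround(trainedWindow.width * factor)),
            static_cast<int>(std::lround(trainedWindow.height * factor)) };
        // Simplified stop condition: the window must fit inside the image.
        if (window.width > image.width || window.height > image.height) break;
        const Size2 scaled = {
            static_cast<int>(std::lround(image.width / factor)),
            static_cast<int>(std::lround(image.height / factor)) };
        steps.push_back({factor, window, scaled});
    }
    return steps;
}
```

For a 96x96 image, a 24x24 trained window and a scale factor of 2, both strategies visit the factors 1, 2 and 4. Scanning a 96x96 window over the original image covers the same region as scanning the 24x24 window over the image shrunk to 24x24. The difference is whether one integral image is reused or a new one is computed per scale.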

OpenCV 2.4.3 - warpPerspective with reversed homography on a cropped image

When finding a reference image in a scene using SURF, I would like to crop the found object in the scene, and "straighten" it back using warpPerspective and the reversed homography matrix.
Meaning, let's say I have this SURF result:
Now, I would like to crop the found object in the scene:
and "straighten" only the cropped image with warpPerspective using the reversed homography matrix. The result I'm aiming at is that I'll get an image containing, roughly, only the object, and some distorted leftovers from the original scene (as the cropping is not a 100% the object alone).
Cropping the found object, and finding the homography matrix and reversing it are simple enough. Problem is, I can't seem to understand the results from warpPerspective. Seems like the resulting image contains only a small portion of the cropped image, and in a very large size.
While researching warpPerspective I found that the resulting image is very large due to the nature of the process, but I can't seem to wrap my head around how to do this properly. Seems like I just don't understand the process well enough. Would I need to warpPerspective the original (not cropped) image and then crop the "straightened" object?
Any advice?
try this.
given that you have the unconnected contour of your object (e.g. the outer corner points of the box contour) you can transform them with your inverse homography and adjust that homography to place the result of that transformation to the top left region of the image.
compute where those object points will be warped to (use the inverse homography and the contour points as input):
cv::Rect computeWarpedContourRegion(const std::vector<cv::Point> & points, const cv::Mat & homography)
{
    std::vector<cv::Point2f> transformed_points(points.size());
    for(unsigned int i=0; i<points.size(); ++i)
    {
        // warp the points
        transformed_points[i].x = points[i].x * homography.at<double>(0,0) + points[i].y * homography.at<double>(0,1) + homography.at<double>(0,2) ;
        transformed_points[i].y = points[i].x * homography.at<double>(1,0) + points[i].y * homography.at<double>(1,1) + homography.at<double>(1,2) ;
    }
    // dehomogenization necessary?
    if(homography.rows == 3)
    {
        float homog_comp;
        for(unsigned int i=0; i<transformed_points.size(); ++i)
        {
            homog_comp = points[i].x * homography.at<double>(2,0) + points[i].y * homography.at<double>(2,1) + homography.at<double>(2,2) ;
            transformed_points[i].x /= homog_comp;
            transformed_points[i].y /= homog_comp;
        }
    }
    // now find the bounding box for these points:
    cv::Rect boundingBox = cv::boundingRect(transformed_points);
    return boundingBox;
}
modify your inverse homography (result of computeWarpedContourRegion and inverseHomography as input)
cv::Mat adjustHomography(const cv::Rect & transformedRegion, const cv::Mat & homography)
{
    if(homography.rows == 2) throw("homography adjustment for affine matrix not implemented yet");
    // unit matrix
    cv::Mat correctionHomography = cv::Mat::eye(3,3,CV_64F);
    // correction translation
    correctionHomography.at<double>(0,2) = -transformedRegion.x;
    correctionHomography.at<double>(1,2) = -transformedRegion.y;
    return correctionHomography * homography;
}
you will call something like
cv::warpPerspective(objectWithBackground, output, adjustedInverseHomography, sizeOfComputeWarpedContourRegionResult);
hope this helps =)
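To check the idea numerically without OpenCV, the two steps above (warp the contour points, then prepend a translation that moves their bounding box to the origin) can be sketched with plain arrays. `Mat3`, `Pt`, `applyHomography` and `shiftToOrigin` are hypothetical names used only for this illustration. Unlike `cv::Rect`, the sketch keeps the bounding-box corner as a floating-point value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Pt { double x, y; };

// Apply a 3x3 homography to a point, including the projective divide.
Pt applyHomography(const Mat3& H, Pt p) {
    const double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    return { (H[0][0] * p.x + H[0][1] * p.y + H[0][2]) / w,
             (H[1][0] * p.x + H[1][1] * p.y + H[1][2]) / w };
}

// Return T * H, where T translates the top-left corner of the warped
// contour's bounding box to (0, 0).
Mat3 shiftToOrigin(const Mat3& H, const std::vector<Pt>& contour) {
    if (contour.empty()) return H;
    double minX = std::numeric_limits<double>::infinity();
    double minY = minX;
    for (const Pt& p : contour) {
        const Pt q = applyHomography(H, p);
        minX = std::min(minX, q.x);
        minY = std::min(minY, q.y);
    }
    // T = [1 0 -minX; 0 1 -minY; 0 0 1], so only rows 0 and 1 of H change.
    Mat3 shifted = H;
    for (int c = 0; c < 3; ++c) {
        shifted[0][c] -= minX * H[2][c];
        shifted[1][c] -= minY * H[2][c];
    }
    return shifted;
}
```

After the adjustment, the warped contour starts at the origin, so an output image that is only as large as the bounding box holds the whole object.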

How can I use Homography?

I am developing a program where I receive 2 pictures of the same scene, but one of them has a distortion:
Mat img_1 = imread(argv[1], 0); // normal picture
Mat img_2 = imread(argv[2], 0); // picture with distortion
I am already able to find the keypoints, and I would like to know whether I can use the function cv::findHomography for this. If so, how?
A homography will map one image plane to another. That means that if your distortion can be expressed as a 3x3 matrix, findHomography is what you want. If not, then it isn't what you want. It takes two vectors of corresponding points as input and will return the 3x3 matrix that best represents the transform between those points.
Alright, so suppose I have two pictures (A and B), one slightly distorted relative to the other, with translation, rotation and scale differences between them (for example, these pictures:)
So what I need is to apply some kind of transformation to picture B that compensates for the distortion/translation/rotation, so that both pictures have the same size and orientation, with no translation between them.
I've already extracted the points and found the homography, as shown below. But I don't know how to use the homography to transform Mat img_B so it looks like Mat img_A. Any idea?
//-- Localize the object from img_1 in img_2
std::vector<Point2f> obj;
std::vector<Point2f> scene;
for (unsigned int i = 0; i < good_matches.size(); i++) {
    //-- Get the keypoints from the good matches
}
Mat H = findHomography(obj, scene, CV_RANSAC);
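One way to think about applying the homography: for every pixel of the output image, the matrix tells you where to sample in the other image. Assuming `obj` holds points from img_A and `scene` the matching points from img_B (the loop body is not shown, so this is an assumption), H maps img_A coordinates to img_B coordinates. It can therefore serve directly as that lookup when resampling img_B into img_A's frame. In OpenCV this resampling is done by `cv::warpPerspective`, either with the inverse matrix or with the `WARP_INVERSE_MAP` flag. The sketch below shows the principle on a tiny grayscale buffer with nearest-neighbour sampling; `Gray` and `warpInverseMap` are hypothetical names, not OpenCV API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Minimal grayscale image with row-major pixels, initialised to 0.
struct Gray {
    int width, height;
    std::vector<std::uint8_t> px;
    Gray(int w, int h)
        : width(w), height(h), px(static_cast<std::size_t>(w) * h, 0) {}
    std::uint8_t& at(int x, int y) {
        return px[static_cast<std::size_t>(y) * width + x];
    }
    std::uint8_t at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * width + x];
    }
};

// Resample src into a dstW x dstH image. H maps destination (x, y) to
// source coordinates; destination pixels that land outside src stay 0.
Gray warpInverseMap(const Gray& src, const Mat3& H, int dstW, int dstH) {
    Gray dst(dstW, dstH);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            const double w = H[2][0] * x + H[2][1] * y + H[2][2];
            if (std::fabs(w) < 1e-12) continue;  // point maps to infinity
            const long sx = std::lround((H[0][0] * x + H[0][1] * y + H[0][2]) / w);
            const long sy = std::lround((H[1][0] * x + H[1][1] * y + H[1][2]) / w);
            if (sx >= 0 && sx < src.width && sy >= 0 && sy < src.height)
                dst.at(x, y) = src.at(static_cast<int>(sx), static_cast<int>(sy));
        }
    }
    return dst;
}
```

If the point lists were filled the other way round, the matrix has to be inverted first. Printing where H sends one known corner of img_A is a quick way to check the direction.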

OpenCV - Image Stitching

I am using the following code to stitch two input images. For an unknown
reason the output result is crap!
It seems that the homography matrix is wrong (or is applied wrongly)
because the transformed image looks like an "exploded star"!
I have marked with a comment the part that I guess is the source of the problem,
but I cannot figure it out.
Any help or pointer is appreciated!
Have a nice day,
void Stitch2Image(IplImage *mImage1, IplImage *mImage2)
{
    // Convert input images to gray
    IplImage* gray1 = cvCreateImage(cvSize(mImage1->width, mImage1->height), 8, 1);
    cvCvtColor(mImage1, gray1, CV_BGR2GRAY);
    IplImage* gray2 = cvCreateImage(cvSize(mImage2->width, mImage2->height), 8, 1);
    cvCvtColor(mImage2, gray2, CV_BGR2GRAY);
    // Convert gray images to Mat
    Mat img1(gray1);
    Mat img2(gray2);
    // Detect FAST keypoints and BRIEF features in the first image
    FastFeatureDetector detector(50);
    BriefDescriptorExtractor descriptorExtractor;
    BruteForceMatcher<L1<uchar> > descriptorMatcher;
    vector<KeyPoint> keypoints1;
    detector.detect( img1, keypoints1 );
    Mat descriptors1;
    descriptorExtractor.compute( img1, keypoints1, descriptors1 );
    /* Detect FAST keypoints and BRIEF features in the second image*/
    vector<KeyPoint> keypoints2;
    detector.detect( img2, keypoints2 );
    Mat descriptors2;
    descriptorExtractor.compute( img2, keypoints2, descriptors2 );
    vector<DMatch> matches;
    descriptorMatcher.match(descriptors1, descriptors2, matches);
    if (matches.size()==0)
        return;
    vector<Point2f> points1, points2;
    for(size_t q = 0; q < matches.size(); q++)
    {
        // collect the matched keypoint locations from both images
        points1.push_back(keypoints1[matches[q].queryIdx].pt);
        points2.push_back(keypoints2[matches[q].trainIdx].pt);
    }
    // Create the result image
    result = cvCreateImage(cvSize(mImage2->width * 2, mImage2->height), 8, 3);
    // Copy the second image in the result image
    cvSetImageROI(result, cvRect(mImage2->width, 0, mImage2->width, mImage2->height));
    cvCopy(mImage2, result);
    // Create warp image
    IplImage* warpImage = cvCloneImage(result);
    /************************** Is there anything wrong here!? *******************/
    // Find homography matrix
    Mat H = findHomography(Mat(points1), Mat(points2), 8, 3.0);
    CvMat HH = H; // Is this line converted correctly?
    // Transform warp image
    cvWarpPerspective(mImage1, warpImage, &HH);
    // Blend
    blend(result, warpImage);
}
This is what I would suggest you try, in this order:
1) Use the CV_RANSAC option for the homography. Refer to http://opencv.willowgarage.com/documentation/cpp/calib3d_camera_calibration_and_3d_reconstruction.html
2) Try other descriptors, particularly SIFT or SURF which ship with OpenCV. For some images FAST or BRIEF descriptors are not discriminating enough. EDIT (Aug '12): The ORB descriptors, which are based on BRIEF, are quite good and fast!
3) Try to look at the Homography matrix (step through in debug mode or print it) and see if it is consistent.
4) If above does not give you a clue, try to look at the matches that are formed. Is it matching one point in one image with a number of points in the other image? If so the problem again should be with the descriptors or the detector.
My hunch is that it is the descriptors (so 1) or 2) should fix it).
Also switch to Hamming distance instead of L1 distance in BruteForceMatcher. BRIEF descriptors are supposed to be compared using Hamming distance.
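The reason is that a BRIEF descriptor is a string of independent bits packed into bytes, so the meaningful distance is the number of differing bits, not the numeric difference between byte values. A minimal sketch of that distance follows. `hammingDistance` is a hypothetical helper written for illustration; in the OpenCV 2.x API the matcher type for this is `BruteForceMatcher<Hamming>`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of differing bits between two binary descriptors of equal length.
std::size_t hammingDistance(const std::vector<std::uint8_t>& a,
                            const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::size_t bits = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the two bytes disagree; count those bits.
        bits += std::bitset<8>(static_cast<unsigned>(a[i] ^ b[i])).count();
    }
    return bits;
}
```

For the bytes {0xFF, 0x00} and {0x0F, 0x01} the Hamming distance is 5, whereas the L1 distance over byte values would be 241, dominated by which bit positions happen to differ rather than by how many.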
Your homography might have been calculated from wrong matches and thus represent a bad alignment.
I suggest passing the matrix through an additional check for interdependency between its rows.
You can use the following code:
bool cvExtCheckTransformValid(const Mat& T){
    // Check the shape of the matrix
    if (T.empty())
        return false;
    if (T.rows != 3)
        return false;
    if (T.cols != 3)
        return false;
    // Check for linear dependency.
    Mat tmp;
    T.row(0).copyTo(tmp);
    tmp /= T.row(1);   // element-wise ratio of row 0 to row 1
    Scalar mean;
    Scalar stddev;
    meanStdDev(tmp, mean, stddev);
    double X = fabs(stddev[0]/mean[0]);
    printf("std of H:%g\n",X);
    if (X < 0.8)
        return false;
    return true;
}
