I am working on a project to detect objects of interest using background subtraction and track them using optical flow in OpenCV C++. I was able to detect the objects of interest using background subtraction, and I was able to implement OpenCV's Lucas-Kanade optical flow in a separate program. But I am stuck on how to combine these two programs into a single one. frame1 holds the actual frame from the video; contours2 are the selected contours from the foreground object.
To summarize, how do I feed the foreground object obtained from the background subtraction method to calcOpticalFlowPyrLK? Or tell me if my approach is wrong. Thank you in advance.
Mat mask = Mat::zeros(fore.rows, fore.cols, CV_8UC1);
drawContours(mask, contours2, -1, Scalar(255), 4, CV_FILLED);
if (first_frame)
{
    goodFeaturesToTrack(mask, features_next, 1000, 0.01, 10, noArray(), 3, false, 0.04);
    fm0 = mask.clone();
    features_prev = features_next;
    first_frame = false;
}
if (!features_prev.empty())
{
    calcOpticalFlowPyrLK(fm0, mask, features_prev, features_next, featuresFound, err, winSize, 3, termcrit, 0, 0.001);
    for (int i = 0; i < features_prev.size(); i++)
        line(frame1, features_prev[i], features_next[i], CV_RGB(0, 0, 255), 1, 8);
}
imshow("final optical", frame1);
goodFeaturesToTrack(mask, features_next, 1000, 0.01, 10, noArray(), 3, false, 0.04);
features_prev = features_next;
fm0 = mask.clone();
Your approach of using optical flow for tracking is wrong. The idea behind optical flow is that a moving point has the same pixel intensity at its start and end positions in two consecutive images. That means the motion of a feature is estimated by observing its appearance in the start image and searching for the same structure in the end image (very simplified).
calcOpticalFlowPyrLK is a point tracker, which means that points in the previous image are tracked to the current one. The method therefore needs the original gray-valued images of your system, because it can only estimate motion in structured / textured regions (you need x and y gradients in your image).
I think your code should do something like:
Extract objects by background subtraction (by contour); in the literature this is called a blob.
Extract objects in the next image and apply a blob association (which contour belongs to which); this is also called blob tracking.
It is possible to do blob tracking with calcOpticalFlowPyrLK, e.g. in a very simple way:
Track points from the contour, or a point inside the blob.
Association: the previous contour matches one of the current contours if the tracked points that belong to the previous contour land on that current contour.
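The association step above can be sketched as a simple vote. This is only an illustration under assumptions: the types Point2 and Box and the function associateBlobs are hypothetical, blobs are approximated by bounding boxes rather than real contours, and the tracked positions and status flags are assumed to come from a point tracker such as calcOpticalFlowPyrLK run on the gray frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { float x, y; };
struct Box { float x, y, w, h; };  // hypothetical stand-in for a contour's bounding box

// True if point p lies inside box b.
inline bool contains(const Box& b, const Point2& p) {
    return p.x >= b.x && p.x < b.x + b.w && p.y >= b.y && p.y < b.y + b.h;
}

// For every previous blob, count how many of its tracked points land in each
// current blob, and return the index of the current blob with the most votes
// (or -1 if no point of that blob landed anywhere).
//   prevLabel[i] : index of the previous blob that point i was sampled from
//   tracked[i]   : position of point i in the current frame
//   found[i]     : non-zero if point i was tracked successfully
std::vector<int> associateBlobs(std::size_t numPrevBlobs,
                                const std::vector<int>& prevLabel,
                                const std::vector<Point2>& tracked,
                                const std::vector<unsigned char>& found,
                                const std::vector<Box>& currBlobs) {
    assert(prevLabel.size() == tracked.size() && tracked.size() == found.size());
    std::vector<std::vector<int>> votes(numPrevBlobs,
                                        std::vector<int>(currBlobs.size(), 0));
    for (std::size_t i = 0; i < tracked.size(); ++i) {
        if (!found[i]) continue;  // skip points the tracker lost
        for (std::size_t c = 0; c < currBlobs.size(); ++c) {
            if (contains(currBlobs[c], tracked[i])) {
                ++votes[static_cast<std::size_t>(prevLabel[i])][c];
                break;
            }
        }
    }
    std::vector<int> match(numPrevBlobs, -1);
    for (std::size_t p = 0; p < numPrevBlobs; ++p) {
        int best = 0;
        for (std::size_t c = 0; c < currBlobs.size(); ++c) {
            if (votes[p][c] > best) {
                best = votes[p][c];
                match[p] = static_cast<int>(c);
            }
        }
    }
    return match;
}
```

A real implementation would probably test points against the contour itself (for example with a point-in-polygon test) and handle blobs that merge or split; the vote above ignores both cases.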
I think the output of background subtraction in OpenCV is not a grayscale image. Optical flow needs grayscale images as input.
I am developing an Android application for analyzing chess games based on a series of photos. To process the images, I am using OpenCV. My question is: how can I detect that there is a player's hand in a picture? I would like to filter out those photos and analyze only the ones that show just the chessboard.
So far I managed to get the Canny, so from an image like that
original image
I am able to get that canny
But I have no idea what can I do next...
The code I used to get Canny:
Mat gray, blur, cannyed;
cvtColor(img, gray, CV_BGR2GRAY);
GaussianBlur(gray, blur, Size(7, 7), 0, 0);
Canny(blur, cannyed, 50, 100, 3);
I would highly appreciate any ideas and advice on what to do next and what OpenCV functions can I use.
You have a very nice spectrum in the chessboard. A hand in it messes up the frequencies built up by the regular transitions between the black and white squares. Try moving a bigger window (let's say the size of 4.5 x 4.5 squares) around and see what happens to the frequencies.
Another approach, if you have the sequence of pictures taken as a movie, is to analyse the motion. Take the difference of consecutive frames (low-pass filter them a bit first) to detect motion. Filter the motion in time (over several frames). Then threshold it to get a binary image. Erode the binary shapes to filter out small moving objects (noise, chess pieces) so that you can detect whether any larger moving shape is on the board (e.g. a hand).
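The frame-differencing idea can be sketched without OpenCV on plain 8-bit buffers. This is a minimal sketch under assumptions: the names Gray, motionMask, erode3x3 and countSet are made up for the example, the temporal filtering over several frames is left out, and the threshold is a placeholder to be tuned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Gray = std::vector<std::uint8_t>;  // row-major 8-bit grayscale image

// Binary motion mask: 255 where |curr - prev| exceeds thresh, 0 elsewhere.
Gray motionMask(const Gray& prev, const Gray& curr, int thresh) {
    assert(prev.size() == curr.size());
    Gray mask(prev.size(), 0);
    for (std::size_t i = 0; i < prev.size(); ++i)
        if (std::abs(int(curr[i]) - int(prev[i])) > thresh) mask[i] = 255;
    return mask;
}

// 3x3 erosion: a pixel survives only if it and all 8 neighbours are set.
// Border pixels are always cleared.
Gray erode3x3(const Gray& src, int w, int h) {
    Gray dst(src.size(), 0);
    for (int y = 1; y + 1 < h; ++y) {
        for (int x = 1; x + 1 < w; ++x) {
            bool all = true;
            for (int dy = -1; dy <= 1 && all; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (!src[static_cast<std::size_t>((y + dy) * w + (x + dx))]) {
                        all = false;
                        break;
                    }
            if (all) dst[static_cast<std::size_t>(y * w + x)] = 255;
        }
    }
    return dst;
}

// Number of non-zero pixels left in a mask.
std::size_t countSet(const Gray& img) {
    std::size_t n = 0;
    for (std::uint8_t v : img)
        if (v) ++n;
    return n;
}
```

If countSet(erode3x3(mask, w, h)) stays above some minimum area after erosion, a large moving shape such as a hand is probably in the frame; single noisy pixels are removed by the erosion.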
Here, after Canny edge detection, I tried extracting the horizontal and vertical lines with morphological operations.
Mat horizontal = cannyed.clone();
// Specify size on horizontal axis
int horizontalsize = horizontal.cols / 60;
// Create structure element for extracting horizontal lines through morphology operations
Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));
erode(horizontal, horizontal, horizontalStructure, Point(-1, -1),2);
dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1),1);
Mat vertical = cannyed.clone();
// Specify size on vertical axis
int verticalsize = vertical.cols / 60;
// Create structure element for extracting vertical lines through morphology operations
Mat verticalStructure = getStructuringElement(MORPH_RECT, Size(1,verticalsize));
erode(vertical, vertical, verticalStructure, Point(-1, -1));
dilate(vertical, vertical, verticalStructure, Point(-1, -1),2);
The results are:
Horizontal Lines in the chess board
From the figure you can see that there is a regular interval between the lines. In the area where the hand is present, the interval between lines is larger.
If contours are extracted at that location, the hand (or any other object) over the chessboard can be detected.
This works for any object placed over the chessboard.
Thank you all very much for your suggestions.
So I solved the problem mostly using Gowthaman's method. First I use his code to generate vertical and horizontal lines. Then I combine them like this:
Mat combined = vertical + horizontal;
So I get something like that when there is no hand
or like that when there is a hand
Next I count white pixels using the code:
int GetPixelCount(Mat image, uchar color)
{
    int result = 0;
    for (int i = 0; i < image.rows; i++)
        for (int j = 0; j < image.cols; j++)
            if (image.at<uchar>(Point(j, i)) == color)
                result++;  // count every pixel that matches the requested value
    return result;
}
I do that for every photo in the series. The first photo is always without a hand, so I use it as a template. If the current photo has less than 98% of the template's white pixels, I deduce there is a hand (or something else) in it.
Most likely this is not an optimal method and has lots of weaknesses, but it is very simple and works for me just fine :)
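For reference, the 98% rule can be written down in a few lines. This is a sketch on plain byte buffers rather than cv::Mat; the function names countPixels and looksOccluded are made up for the example, and the default ratio is simply the value quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of pixels in a binary image that equal `color`.
std::size_t countPixels(const std::vector<std::uint8_t>& image, std::uint8_t color) {
    return static_cast<std::size_t>(std::count(image.begin(), image.end(), color));
}

// True if `current` has fewer than minRatio times the white pixels of the
// hand-free template, i.e. something probably covers part of the line grid.
bool looksOccluded(const std::vector<std::uint8_t>& templateImg,
                   const std::vector<std::uint8_t>& current,
                   double minRatio = 0.98) {
    assert(minRatio > 0.0 && minRatio <= 1.0);
    const std::size_t reference = countPixels(templateImg, 255);
    if (reference == 0) return false;  // empty template: nothing to compare against
    return static_cast<double>(countPixels(current, 255)) <
           minRatio * static_cast<double>(reference);
}
```

The comparison only makes sense if the camera and lighting stay fixed between the template and the later photos, since any change in the detected lines shifts the white-pixel count as well.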
I've got problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you can see, the markers that I'm detecting are slightly shifted at some camera angles. I've read on the web that this may be a camera calibration problem, so I'll tell you how I'm calibrating the camera, and maybe you'll be able to tell me what I am doing wrong.
At the beginning I'm collecting data from various images and storing the calibration corners in the _imagePoints vector like this:
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
Then, after collecting enough data, I'm calculating the camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
    std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
    std::vector<cv::Point3f> currentObjectPoints;
    for (int j = 0; j < currentImagePoints.size(); j++) {
        cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
        currentObjectPoints.push_back(newPoint);
    }
    objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? Is it an issue in the code? How many images should I use to perform calibration (right now I'm trying to obtain 20-30 images before ending calibration)?
Should I use images that contain wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with a circles grid instead of chessboards, but the results were much worse than now.
In case of questions how I'm detecting marker: I'm using solvepnp function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are coordinates of marker corners in processing image (I'm using custom algorithm to do that)
In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by @kobejohn in his comment, so that your code follows these steps:
1) collect various images of the chessboard target
2) find the chessboard corners in the images of point 1)
3) calibrate the camera (with cv::calibrateCamera) and so obtain the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
4) collect an image of your own custom target (the target is seen at 0:57 in your video and is shown in the following figure) and find some relevant points in it (let's call the points you found in the image image_custom_target_vertices, and world_custom_target_vertices the corresponding 3D points)
5) estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target from point 4), with a call to cv::solvePnP like this one: cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
6) given the 8 corners of the cube in 3D (let's call them world_cube_vertices), get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv::projectPoints like this one: cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
7) draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
Finding your own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where the results are "bad"?
Pose of the camera: I think that in 5) you use the intrinsic found in 3); are you sure nothing has changed in the camera in the meantime? Referring to Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.
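The reprojection-error check suggested for the calibration step could look roughly like this. It is a sketch under assumptions: Pt, rmsError and goodViews are hypothetical names, the reprojected points are assumed to come from cv::projectPoints for each calibration image, a per-image RMS error is used here instead of the averaged norm described in the quoted tutorial, and the threshold has to be found by experiment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// RMS distance (in pixels) between detected corners and the same corners
// reprojected with the current intrinsic/distortion/pose estimate.
double rmsError(const std::vector<Pt>& detected, const std::vector<Pt>& reprojected) {
    assert(detected.size() == reprojected.size() && !detected.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < detected.size(); ++i) {
        const double dx = detected[i].x - reprojected[i].x;
        const double dy = detected[i].y - reprojected[i].y;
        sum += dx * dx + dy * dy;
    }
    return std::sqrt(sum / static_cast<double>(detected.size()));
}

// Indices of the calibration images whose error does not exceed maxError.
// The idea is to recalibrate using only these images.
std::vector<std::size_t> goodViews(const std::vector<double>& perViewError,
                                   double maxError) {
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < perViewError.size(); ++i)
        if (perViewError[i] <= maxError) keep.push_back(i);
    return keep;
}
```

One way to use it is to calibrate once with all images, compute the error per image, drop the images above the threshold, and calibrate again with the remaining ones.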
So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of the calibration questions I asked.
First of all - in order to obtain good calibration results you have to use images with properly detected grid element/circle positions! Using all captured images in the calibration process (even those that aren't properly detected) will result in a bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID) - gives much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it hasn't helped much - in some cases (different lighting) it got better, but not by much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) - better results than the asymmetric grid, but still much worse than the standard grid (chessboard). It often produces errors like these:
Chessboard (found using the findChessboardCorners function) - this method produces the best results - it doesn't produce misaligned corners very often, and almost every calibration produces results similar to the best possible results from the symmetric circles grid.
For every calibration I've been using 20-30 images taken from different angles. I've even tried with 100+ images, but that didn't produce a noticeable change in the calibration results compared with a smaller number of images. It's worth noting that a larger number of test images increases the time needed to compute the camera parameters non-linearly (100 test images at 480x360 resolution take 25 minutes on an iPad 4, compared with 4 minutes for ~50 images).
I've also experimented with the solvePnP parameters, but that also hasn't given me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen any noticeable change.
I've also tried with useExtrinsicGuess set to true, using the rvec and tvec from the previous detection, but this resulted in the complete disappearance of the detected cube.
I've run out of ideas - what else could be causing these shifting problems?
For those still interested:
This is an old question, but I think your problem is not bad calibration.
I developed an AR app for iOS using OpenCV and SceneKit, and I had the same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you want to render the cube on top of the marker, at a specific distance along the marker's Z axis: exactly one half of the cube's side length. So you need to offset the Z coordinate of the marker translation vector by this distance.
In fact, when you look at your cube from the top, the cube is rendered properly.
I made an image to explain the problem, but my reputation prevents me from posting it.
I have problems getting the contours of an object in my picture(s).
In order to delete all noise, I use adjustROI() and Canny().
I also tried erode() and dilate(), Laplacian(), GaussianBlur(), Sobel()... and I even found this code snippet to sharpen a picture:
GaussianBlur(src, dst_gaussian, Size(0, 0), 3);
addWeighted(src, 1.5, dst_gaussian, -0.5, 0, dst);
But my result is always the same: My object is filled with black and white colour (like noise on a TV-screen) so that it is impossible to get the contours with findContours() (findContours() finds a million of contours, but not the one of the whole object. I check this with drawContours()).
I use C++ and I load my picture as a grayscale Mat (for Canny it has to be grayscale). My object has a different shape in every picture, but it is always around the middle of the picture.
I either need a way to get a more evenly coloured object through image processing - but I don't know what else to try - or a way to fill the object with colour after image processing (without having its contours, because those are what I want in the end).
Any ideas are welcome. Thank you in advance.
I found a solution that works in most cases. I fill my object using the probabilistic Hough transform HoughLinesP().
vector<Vec4i> lines;
HoughLinesP(dst, lines, 1, CV_PI/180, 80, 30, 10);
for(size_t i = 0; i < lines.size(); i++)
line(color_dst, Point(lines[i][0], lines[i][1]), Point(lines[i][2], lines[i][3]), Scalar(0,0,255), 3, 8);
This is from some sample code OpenCV documentation provides.
After using some edge-detection algorithm (like Canny()), the probabilistic Hough transform finds objects in binary pictures. The algorithm finds lines, which, if drawn, represent the whole object. Of course some of the parameters have to be adapted for some kind of picture.
I'm not sure if this will work on every picture or every object, but in my case, it does.
I'm doing coin detection using JavaCV (an OpenCV wrapper), but I have a little problem when the coins are connected. If I try to erode them to separate the coins, they lose their circular form, and if I try to count the pixels inside each coin, problems can arise where several coins are miscounted as one bigger coin. What I want to do is first reshape them into circles (with the radius of that coin) and then count the pixels inside them.
Here is my thresholded image:
And here is eroded image:
Any suggestions? Or is there any better way to break bridges between coins?
It looks similar to a problem I recently had to separate bacterial colonies growing on agar plates.
I performed a distance transform on the thresholded image (in your case you will need to invert it).
Then I found the peaks of the distance map (by calculating the difference between the dilated distance map and the distance map and finding the zero values).
Then, I assumed each peak to be the centre of a circle (coin) and the value of the peak in the distance map to be the radius of the circle.
Here is the result of your image after this pipeline:
I am new to OpenCV and C++, so my code is probably very messy, but this is what I did:
int main( int argc, char** argv ){
cv::Mat objects, distance,peaks,results;
std::vector<std::vector<cv::Point> > contours;
cv::cvtColor(objects, objects, CV_BGR2GRAY);
cv::blur( objects,objects,cv::Size(3,3));
/*Applies a distance transform to "objects".
* The result is saved in "distance" */
/* In order to find the local maxima, "distance"
* is subtracted from the result of the dilation of
* "distance". All the peaks keep the same value */
/* Now all the peaks should be exactly 0 */
/* And the non-peaks 255 */
/* Only the zero values of "peaks" that are non-zero
* in "objects" are the real peaks */
/* Peaks that are less than 2 pixels apart
* are merged by dilation */
/* In order to map the peaks, findContours() is used.
* The results are stored in "contours" */
cv::findContours(peaks, contours, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
/* The next steps are applied only if, at least,
* one contour exists */
/* Defines vectors to store the moments of the peaks, the centers
* and the theoretical circles of the objects of interest */
std::vector <cv::Moments> moms(contours.size());
std::vector <cv::Point> centers(contours.size());
std::vector<cv::Vec3f> circles(contours.size());
float rad,x,y;
/* Calculates the moments of each peak and then the center of the peak,
* which is approximately the center of each object of interest */
for(unsigned int i=0;i<contours.size();i++) {
moms[i]= cv::moments(contours[i]);
centers[i]= cv::Point(moms[i].m10/moms[i].m00,moms[i].m01/moms[i].m00);
x= (float) (centers[i].x);
y= (float) (centers[i].y);
if(x>0 && y>0){
rad= (float) (distance.at<float>((int)y,(int)x)+1);
circles[i][0]= x;
circles[i][1]= y; // Vec3f has indices 0..2: x, y, radius
circles[i][2]= rad;
cv::circle(results,centers[i],rad+1,cv::Scalar( 255, 0,0 ), 2, 4, 0 );
return 1;
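Since several statements of the listing above are missing, here is a small self-contained sketch of the same idea (distance map, local maxima, one circle per peak) on a plain binary buffer. It is not the author's code: the names Circle, distanceToBackground and peaksToCircles are made up, the distance transform is a brute-force one that only suits small images, and nearby peaks are not merged as they are in the original pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Circle { int x, y; float r; };

// Brute-force Euclidean distance from every foreground pixel (non-zero) to the
// nearest background pixel (zero). Background pixels get distance 0.
// Assumes the image contains at least one background pixel.
std::vector<float> distanceToBackground(const std::vector<std::uint8_t>& bin, int w, int h) {
    assert(bin.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    std::vector<float> dist(bin.size(), 0.f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!bin[static_cast<std::size_t>(y * w + x)]) continue;
            float best = std::numeric_limits<float>::max();
            for (int v = 0; v < h; ++v)
                for (int u = 0; u < w; ++u)
                    if (!bin[static_cast<std::size_t>(v * w + u)])
                        best = std::min(best, std::hypot(float(x - u), float(y - v)));
            dist[static_cast<std::size_t>(y * w + x)] = best;
        }
    return dist;
}

// A pixel is a peak if its distance is positive and no 8-neighbour is larger.
// Each peak becomes a circle: centre = peak position, radius = peak value.
std::vector<Circle> peaksToCircles(const std::vector<float>& dist, int w, int h) {
    std::vector<Circle> circles;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float d = dist[static_cast<std::size_t>(y * w + x)];
            if (d <= 0.f) continue;
            bool isPeak = true;
            for (int dy = -1; dy <= 1 && isPeak; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (dist[static_cast<std::size_t>(ny * w + nx)] > d) { isPeak = false; break; }
                }
            if (isPeak) circles.push_back({x, y, d});
        }
    return circles;
}
```

On real images, flat tops in the distance map produce several adjacent peaks per coin, which is why the original pipeline merges close peaks by dilation before extracting contours.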
You don't need to erode, just a good set of params for cvHoughCircles():
The code used to generate this image came from my other post: Detecting Circles, with these parameters:
CvSeq* circles = cvHoughCircles(gray, storage, CV_HOUGH_GRADIENT, 1, gray->height/12, 80, 26);
OpenCV has a function called HoughCircles() that can be applied to your case without separating the different circles. Can you call it from JavaCV? If so, it will do what you want (detecting and counting circles), bypassing your separation problem.
The main point is to detect the circles accurately without separating them first. Other algorithms (such as template matching) can be used instead of the generalized Hough transform, but you have to take into account the different sizes of the coins.
The usual approach for erosion-based object recognition is to label continuous regions in the eroded image and then re-grow them until they match the regions in the original image. Hough circles is a better idea in your case, though.
After detecting the joined coins, I recommend applying morphological operations to classify areas as "definitely coin" and "definitely not coin", apply a distance transformation, then run the watershed to determine the boundaries. This scenario is actually the demonstration example for the watershed algorithm in OpenCV − perhaps it was created in response to this question.
I am working on a project aimed at tracking the eye pupil. For this I have made a head-mounted system that captures images of the eye. The hardware portion is complete, but I am stuck on the software part. I am using OpenCV. Please let me know what would be the most efficient way to track the pupil. HoughCircles didn't perform well.
After that I also tried an HSV filter; here is the code and a
link to a screenshot of the raw image and the processed one. Please help me resolve this issue. The link also contains the video of the eye pupil that I am using in this code.
#include "cv.h"
IplImage* GetThresholdedImage(IplImage* img)
IplImage *imgHSV=cvCreateImage(cvGetSize(img),8,3);
IplImage *imgThresh=cvCreateImage(cvGetSize(img),8,1);
cvInRangeS(imgHSV,cvScalar(0, 84, 0, 0),cvScalar(179, 256, 11, 0),imgThresh);
return imgThresh;
int main(int argc, char **argv)
IplImage *imgScribble= NULL;
char c=0;
CvCapture *capture;
printf("Camera could not be initialized");
IplImage *img=0;
IplImage *timg=GetThresholdedImage(img);
CvMoments *moments=(CvMoments*)malloc(sizeof(CvMoments));
double moment10 = cvGetSpatialMoment(moments, 1, 0);
double moment01 = cvGetSpatialMoment(moments, 0, 1);
double area = cvGetCentralMoment(moments, 0, 0);
static int posX = 0;
static int posY = 0;
int lastX = posX;
int lastY = posY;
posX = moment10/area;
posY = moment01/area;
// Print it out for debugging purposes
printf("position (%d,%d)\n", posX, posY);
// We want to draw a line only if its a valid position
if(lastX>0 && lastY>0 && posX>0 && posY>0)
// Draw a yellow line from the previous point to the current point
cvLine(imgScribble, cvPoint(posX, posY), cvPoint(lastX, lastY), cvScalar(0,255,255), 5);
// Add the scribbling image and the frame...
cvAdd(img, imgScribble, img);
free(moments); // allocated with malloc above, so release with free, not delete
I am able to track the eye and find the center coordinates of pupil precisely.
First I threshold the image taken by the head-mounted camera. Then I use a contour-finding algorithm and compute the centroid of all the contours. This gives me the center coordinates of the eye pupil. The method works fine in real time and also detects eye blinks with very good accuracy.
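The threshold-then-centroid step can be sketched with raw image moments on a plain grayscale buffer. This is only an illustration: it takes the centroid of all dark pixels instead of running a contour finder, the names Centroid and darkCentroid are made up, and treating "no dark pixels" as a blink is an assumption, not something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Centroid {
    bool valid;   // false if no pixel passed the threshold
    double x, y;  // centroid in pixel coordinates
};

// Centroid of all pixels with value <= maxValue (the pupil is assumed to be
// the darkest region). Uses the raw moments m00, m10, m01 of the binary mask.
Centroid darkCentroid(const std::vector<std::uint8_t>& gray, int w, int h,
                      std::uint8_t maxValue) {
    assert(gray.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (gray[static_cast<std::size_t>(y * w + x)] <= maxValue) {
                m00 += 1.0;  // area
                m10 += x;    // sum of x coordinates
                m01 += y;    // sum of y coordinates
            }
    if (m00 == 0.0) return {false, 0.0, 0.0};  // avoids dividing by zero
    return {true, m10 / m00, m01 / m00};
}
```

Checking the area before dividing also covers the case the question's code does not guard against, where the thresholded image is empty.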
Now, my aim is to embed this feature into a game (a racing game), in which the car moves left/right if I look left/right and slows down if I blink. How should I proceed now? Would I need a game engine to do that?
I heard of some open-source game engines compatible with Visual Studio 2010 (Unity etc.). Is it feasible? If yes, how should I proceed?
I am one of the developers of SimpleCV. We maintain an open-source python library for computer vision. You can download it at SimpleCV.org. SimpleCV is great for solving these types of problems by hacking on the command line. I was able to extract the pupil in only a couple lines of code. Here you go:
img = Image("eye4.jpg") # load the image
bm = BlobMaker() # create the blob extractor
# invert the image so the pupil is white, threshold the image, and invert again
# and then extract the information from the image
blobs = bm.extractFromBinary(img.invert().binarize(thresh=240).invert(),img)
if(len(blobs)>0): # if we got a blob
blobs[0].draw() # the zeroth blob is the largest blob - draw it
locationStr = "("+str(blobs[0].x)+","+str(blobs[0].y)+")"
# write the blob's centroid to the image
# save the image
# and show us the result.
Here are the results.
So your next steps are to use some sort of tracker, like a Kalman filter, to track the pupil robustly. You may want to model the eye as a sphere and track the pupil's centroid in spherical coordinates (i.e. theta and phi). You will also want to write a bit of code to detect blink events so the system doesn't go all wonky when the user blinks. I suggest using a Canny edge detector to find the largest horizontal lines in the image and assuming those are the eyelids. I hope this helps, and please let us know how your work progresses.
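To give an idea of what such a tracker does, here is a minimal scalar Kalman filter that could smooth one pupil coordinate at a time. It is a sketch, not SimpleCV code: the struct Kalman1D is hypothetical, it uses a constant-position (random walk) model rather than a motion model in spherical coordinates, and the noise variances q and r are placeholders that would need tuning.

```cpp
#include <cassert>

// One-dimensional Kalman filter with a constant-position (random walk) model.
struct Kalman1D {
    double x;  // current state estimate (e.g. pupil x coordinate)
    double p;  // variance of the estimate
    double q;  // process noise variance (how fast the true value may drift)
    double r;  // measurement noise variance (how noisy the detector is)

    // Feed one measurement z and return the updated estimate.
    double update(double z) {
        assert(r > 0.0);
        p += q;                        // predict: uncertainty grows over time
        const double k = p / (p + r);  // Kalman gain, between 0 and 1
        x += k * (z - x);              // correct the estimate toward z
        p *= (1.0 - k);                // uncertainty shrinks after the update
        return x;
    }
};
```

One filter per coordinate (for example one for x and one for y, or one each for theta and phi) keeps the sketch simple; during a detected blink the update can be skipped so the last estimate is held.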
It all depends on how good your system must be. If it's a 2-months university project, that's ok to find and track some blobs or to use a ready-made solution, as Kscottz recommended.
But if you aim to have a more serious system, you must go deeper.
An approach I recommend is to detect the face interest points. A good example is Active Appearance Models, which seem to be the best at tracking faces.
It requires you a solid understanding of computer vision algorithms, good programming skills, and some work. But the results will be worth the effort.
And do not be fooled by the fact that the demos show whole-face tracking. You can train it to track anything: hands, eyes, flowers or leaves, etc.
(Before starting with AAM, you may want to read more about other face-tracking algorithms. They may be better for you)
This is my solution: I am able to track the eye and find the center coordinates of the pupil precisely.
First I threshold the image taken by the head-mounted camera. Then I use a contour-finding algorithm and compute the centroid of all the contours. This gives me the center coordinates of the eye pupil. The method works fine in real time and also detects eye blinks with very good accuracy.