Related
I need find edges of document that in user hands.
1) Original image from camera:
2) Then i convert image to BG:
3) Then i make blur:
3) Finds edges in an image using the Canny:
4) And use dilate :
As you can see on the last image the contour around the map is torn and the contour is not determined. What is my error and how to solve the problem in order to determine the outline of the document completely?
This is code how i to do it:
final Mat mat = new Mat();
sourceMat.copyTo(mat);
//convert the image to black and white
Imgproc.cvtColor(mat, mat, Imgproc.COLOR_BGR2GRAY);
//blur to enhance edge detection
Imgproc.GaussianBlur(mat, mat, new Size(5, 5), 0);
if (isClicked) saveImageFromMat(mat, "blur", "blur");
//convert the image to black and white does (8 bit)
int thresh = 128;
Imgproc.Canny(mat, mat, thresh, thresh * 2);
//dilate helps to connect nearby line segments
Imgproc.dilate(mat, mat,
Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3)),
new Point(-1, -1),
2,
1,
new Scalar(1));
This answer is based on my above comment. If someone is holding the document, you cannot see the edge that is behind the user's hand. So, any method for detecting the outline of the document must be robust to some missing parts of the edge.
I suggest using a variant of the Hough transform to detect the document. The Wikipedia article about the Hough transform makes it sound quite scary (as Wikipedia often does with mathematical subjects), but don't be discouraged, actually they are not too difficult to understand or implement.
The original Hough transform detected straight lines in images. As explained in this OpenCV tutorial, any straight line in an image can be defined by 2 parameters: an angle θ and a distance r of the line from the origin. So you quantize these 2 parameters, and create a 2D array with one cell for every possible line that could be present in your image. (The finer the quantization you use, the larger the array you will need, but the more accurate the position of the found lines will be.) Initialize the array to zeros. Then, for every pixel that is part of an edge detected by Canny, you determine every line (θ,r) that the pixel could be part of, and increment the corresponding bin. After processing all pixels, you will have, for each bin, a count of how many pixels were detected on the line corresponding to that bin. Counts which are high enough probably represent real lines in the image, even if parts of the line are missing. So you just scan through the bins to find bins which exceed the threshold.
OpenCV contains Hough detectors for straight lines and circles, but not for rectangles. You could either use the line detector and check for 4 lines that form the edges of your document; or you could write your own Hough detector for rectangles, perhaps using the paper Jung 2004 for inspiration. Rectangles have at least 5 degrees of freedom (2D position, scale, aspect ratio, and rotation angle), and memory requirement for a 5D array obviously goes up pretty fast. But since the range of each parameter is limited (ie, the document's aspect ratio is known, and you can assume the document will be well centered and not rotated much) it is probably feasible.
I've got problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you see I'm markers that I'm detecting are slightly moved at some camera angles. I've read on the web that this may be camera calibration problems, so I'll tell you guys how I'm calibrating camera, and maybe you'd be able to tell me what am I doing wrong?
At the beginnig I'm collecting data from various images, and storing calibration corners in _imagePoints vector like this
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Than, after collecting enough data I'm calculating camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
std::vector<cv::Point3f> currentObjectPoints;
for (int j = 0; j < currentImagePoints.size(); j++) {
cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
currentObjectPoints.push_back(newPoint);
}
objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? is that an issue in the code? How much images should I use to perform calibration (right now I'm trying to obtain 20-30 images before end of calibration).
Should I use images that containg wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with circles grid instead of of chessboards, but results were much worse that now.
In case of questions how I'm detecting marker: I'm using solvepnp function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are coordinates of marker corners in processing image (I'm using custom algorithm to do that)
In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by #kobejohn in his comment and so that your code follows these steps:
collect various images of chessboard target
find chessboard corners in images of point 1)
calibrate the camera (with cv::calibrateCamera) and so obtain as a result the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
collect an image of your own custom target (the target is seen at 0:57 in your video) and it is shown in the following figure and find some relevant points in it (let's call the point you found in image image_custom_target_vertices and world_custom_target_vertices the corresponding 3D points).
estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target you get in point 4), with a call to cv::solvePnP like this one cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
giving the 8 corners cube in 3D (let's call them world_cube_vertices) you get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv2::projectPoints like this one cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
Finding you own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where that results are "bad"?
Pose of the camera: I think that in 5) you use intrinsic found in 3), are you sure nothing is changed in the camera in the meanwhile? Referring to the Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.
So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of calibration questions I've asked.
First of all - in order to obtain good calibration results you have to use images with properly detected grid elements/circles positions!. Using all captured images in calibration process (even those that aren't properly detected) will result bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID), give much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it haven't helped much - in some cases (different light environment) it got better, but not much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) - better results than asymmetric grid, but still I've got much worse results than standard grid (chessboard). It often produces errors like these:
Chessboard (found using findChessboardCorners function) - this method is producing best possible results - it doesn't produce misaligned corners very often, and almost every calibration is producing similar results to best-possible results from symmetric circles grid
For every calibration I've been using 20-30 images that were coming from different angles. I've tried even with 100+ images but it haven't produced noticeable change in calibration results than smaller amount of images. It's worth noticing that larger number of test images is increasing time needed to compute camera parameters in non-linear way (100 test images in 480x360 resolution are computing 25 minutes in iPad4, compared with 4 minutes with ~50 images)
I've also experimented with solvePNP parameters - but is also haven't gave me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen aby noticeable change.
Also I've tried with useExtrinsicGuess set to true, and I've used rvec and tvec from previous detection, but this one resulted with complete disapperance of detected cube.
I've ran out of ideas - what else could be affecting these shifting problems?
For those still interested:
this is an old question, but I think your problem is not the bad calibration.
I developed an AR app for iOS, using OpenCV and SceneKit, and I have had your same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you wanna render the cube over the marker, at a specific distance along the Z axis of the marker, exactly at one half of the cube side size. So you need to improve the Z coordinate of the marker translation vector of this distance.
In fact, when you see your cube from the top, the cube is render properly.
I have done an image in order to explain the problem, but my reputation prevent to post it.
I have an image of a chessboard taken at some angle. Now I want to warp perspective so the chessboard image look again as if was taken directly from above.
I know that I can try to use 'findHomography' between matched points but I wanted to avoid it and use e.g. rotation data from mobile sensors to build homography matrix on my own. I calibrated my camera to get intrinsic parameters. Then lets say the following image has been taken at ~60degrees angle around x-axis. I thought that all I have to do is to multiply camera matrix with rotation matrix to obtain homography matrix. I tried to use the following code but looks like I'm not understanding something correctly because it doesn't work as expected (result image completely black or white.
import cv2
import numpy as np
import math
camera_matrix = np.array([[ 5.7415988502105745e+02, 0., 2.3986181527877352e+02],
[0., 5.7473682183375217e+02, 3.1723734404756237e+02],
[0., 0., 1.]])
distortion_coefficients = np.array([ 1.8662919398453856e-01, -7.9649812697463640e-01,
1.8178068172317731e-03, -2.4296638847737923e-03,
7.0519002388825025e-01 ])
theta = math.radians(60)
rotx = np.array([[1, 0, 0],
[0, math.cos(theta), -math.sin(theta)],
[0, math.sin(theta), math.cos(theta)]])
homography = np.dot(camera_matrix, rotx)
im = cv2.imread('data/chess1.jpg')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
im_warped = cv2.warpPerspective(gray, homography, (480, 640), flags=cv2.WARP_INVERSE_MAP)
cv2.imshow('image', im_warped)
cv2.waitKey()
pass
I also have distortion_coefficients after calibration. How can those be incorporated into the code to improve results?
This answer is awfully late by several years, but here it is ...
(Disclaimer: my use of terminology in this answer may be imprecise or incorrect. Please do look up on this topic from other more credible sources.)
Remember:
Because you only have one image (view), you can only compute 2D homography (perspective correspondence between one 2D view and another 2D view), not the full 3D homography.
Because of that, the nice intuitive understanding of the 3D homography (rotation matrix, translation matrix, focal distance, etc.) are not available to you.
What we say is that with 2D homography you cannot factorize the 3x3 matrix into those nice intuitive components like 3D homography does.
You have one matrix - (which is the product of several matrices unknown to you) - and that is it.
However,
OpenCV provides a getPerspectiveTransform function which solves the 3x3 perspective matrix (using homogenous coordinate system) for a 2D homography between two planar quadrilaterals.
Link to documentation
To use this function,
Find the four corners of the chessboard on the image. These will be your source coordinates.
Supply four rectangle corners of your choice. These will be your destination coordinates.
Pass the source coordinates and destination coordinates into the getPerspectiveTransform to generate a 3x3 matrix that is able to dewarp your chessboard to an upright rectangle.
Notes to remember:
Mind the ordering of the four corners.
If the source coordinates are picked in clockwise order, the destination also needs to be picked in clockwise order.
Likewise, if counter-clockwise order is used, do it consistently.
Likewise, if z-order (top left, top right, bottom left, bottom right) is used, do it consistently.
Failure to order the corners consistently will generate a matrix that executes the point-to-point correspondence exactly (mathematically speaking), but will not generate a usable output image.
The aspect ratio of the destination rectangle can be chosen arbitrarily. In fact, it is not possible to deduce the "original aspect ratio" of the object in world coordinates, because "this is 2D homography, not 3D".
One problem is that to multiply by a camera matrix you need some concept of a z coordinate. You should start by getting basic image warping given Euler angles to work before you think about distortion coefficients. Have a look at this answer for a slightly more detailed explanation and try to duplicate my result. The idea of moving your image down the z axis and then projecting it with your camera matrix can be confusing, let me know if any part of it does not make sense.
You do not need to calibrate the camera nor estimate the camera orientation (the latter, however, in this case would be very easy: just find the vanishing points of those orthogonal bundles of lines, and take their cross product to find the normal to the plane, see Hartley & Zisserman's bible for details).
The only thing you need to do is estimate the homography that maps the checkers to squares, then apply it to the image.
I'm doing a coin detection using JavaCV (OpenCV wrapper) but I have a little problem when the coins are connected. If I try to erode them to separate these coins they loose their circle form and if I try to count pixels inside each coin there can be problems so that some coins can be miscounted as one that bigger. What I want to do is firstly to reshape them and make them like a circle (equal with the radius of that coin) and then count pixels inside them.
Here is my thresholded image:
And here is eroded image:
Any suggestions? Or is there any better way to break bridges between coins?
It looks similar to a problem I recently had to separate bacterial colonies growing on agar plates.
I performed a distance transform on the thresholded image (in your case you will need to invert it).
Then found the peaks of the distance map (by calculating the difference between a the dilated distance map and the distance map and finding the zero values).
Then, I assumed each peak to be the centre of a circle (coin) and the value of the peak in the distance map to be the radius of the circle.
Here is the result of your image after this pipeline:
I am new to OpenCV, and c++ so my code is probably very messy, but I did that:
int main( int argc, char** argv ){
cv::Mat objects, distance,peaks,results;
std::vector<std::vector<cv::Point> > contours;
objects=cv::imread("CUfWj.jpg");
objects.copyTo(results);
cv::cvtColor(objects, objects, CV_BGR2GRAY);
//THIS IS THE LINE TO BLUR THE IMAGE CF COMMENTS OF THIS POST
cv::blur( objects,objects,cv::Size(3,3));
cv::threshold(objects,objects,125,255,cv::THRESH_BINARY_INV);
/*Applies a distance transform to "objects".
* The result is saved in "distance" */
cv::distanceTransform(objects,distance,CV_DIST_L2,CV_DIST_MASK_5);
/* In order to find the local maxima, "distance"
* is subtracted from the result of the dilatation of
* "distance". All the peaks keep the save value */
cv::dilate(distance,peaks,cv::Mat(),cv::Point(-1,-1),3);
cv::dilate(objects,objects,cv::Mat(),cv::Point(-1,-1),3);
/* Now all the peaks should be exactely 0*/
peaks=peaks-distance;
/* And the non-peaks 255*/
cv::threshold(peaks,peaks,0,255,cv::THRESH_BINARY);
peaks.convertTo(peaks,CV_8U);
/* Only the zero values of "peaks" that are non-zero
* in "objects" are the real peaks*/
cv::bitwise_xor(peaks,objects,peaks);
/* The peaks that are distant from less than
* 2 pixels are merged by dilatation */
cv::dilate(peaks,peaks,cv::Mat(),cv::Point(-1,-1),1);
/* In order to map the peaks, findContours() is used.
* The results are stored in "contours" */
cv::findContours(peaks, contours, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
/* The next steps are applied only if, at least,
* one contour exists */
cv::imwrite("CUfWj2.jpg",peaks);
if(contours.size()>0){
/* Defines vectors to store the moments of the peaks, the center
* and the theoritical circles of the object of interest*/
std::vector <cv::Moments> moms(contours.size());
std::vector <cv::Point> centers(contours.size());
std::vector<cv::Vec3f> circles(contours.size());
float rad,x,y;
/* Caculates the moments of each peak and then the center of the peak
* which are approximatively the center of each objects of interest*/
for(unsigned int i=0;i<contours.size();i++) {
moms[i]= cv::moments(contours[i]);
centers[i]= cv::Point(moms[i].m10/moms[i].m00,moms[i].m01/moms[i].m00);
x= (float) (centers[i].x);
y= (float) (centers[i].y);
if(x>0 && y>0){
rad= (float) (distance.at<float>((int)y,(int)x)+1);
circles[i][0]= x;
circles[i][3]= y;
circles[i][2]= rad;
cv::circle(results,centers[i],rad+1,cv::Scalar( 255, 0,0 ), 2, 4, 0 );
}
}
cv::imwrite("CUfWj2.jpg",results);
}
return 1;
}
You don't need to erode, just a good set of params for cvHoughCircles():
The code used to generate this image came from my other post: Detecting Circles, with these parameters:
CvSeq* circles = cvHoughCircles(gray, storage, CV_HOUGH_GRADIENT, 1, gray->height/12, 80, 26);
OpenCV has a function called HoughCircles() that can be applied to your case, without separating the different circles. Can you call it from JavaCV ? If so, it will do what you want (detecting and counting circles), bypassing your separation problem.
The main point is to detect the circles accurately without separating them first. Other algorithms (such as template matching can be used instead of generalized Hough transform, but you have to take into account the different sizes of the coins.
The usual approach for erosion-based object recognition is to label continuous regions in the eroded image and then re-grow them until they match the regions in the original image. Hough circles is a better idea in your case, though.
After detecting the joined coins, I recommend applying morphological operations to classify areas as "definitely coin" and "definitely not coin", apply a distance transformation, then run the watershed to determine the boundaries. This scenario is actually the demonstration example for the watershed algorithm in OpenCV − perhaps it was created in response to this question.
I'm intending to write a program to detect and differentiate certain objects from a nearly solid background. The foreground and the background have a high contrast difference which I would further increase to aid in the object identification process. I'm planning to use Hough transform technique and OpenCV.
Sample image
As seen in the above image, I would want to separately identify the circular objects and the square objects (or any other shape out of a finite set of shapes). Since I'm quite new to image processing I do not have an idea whether such a situation needs a neural network to be implemented and each shape to be learned beforehand. Would a technique such as template matching let me do this without a neural network?
These posts will get you started:
How to detect circles
How to detect squares
How to detect a sheet of paper (advanced square detection)
You will probably have to adjust some parameters in these codes to match your circles/squares, but the core of the technique is shown on these examples.
If you intend to detect shapes other than just circles, (and from the image I assume you do), I would recommend the Chamfer matching for a quick start, especially as you have a good contrast.
The basic premise, explained in simple terms, is following:
You do an edge detection (for example, cvCanny in opencv)
You create a distance image, where the value of each pixel means the distance fom the nearest edge.
You take the shapes you would like to detect, define sample points along the edges of the shape, and try to match these points on the distance image. Basically you just add the values on the distance image which are "under" the coordinates of your sample points, given a specific position of your objects.
Find a good minimization algorithm, the effectiveness of this depends on your application.
This basic approach is a general solution, usually works well, but without further advancements, it is very slow.
Usually it's a good idea to first separate the objects of interest, so you don't have to always do the full search on the whole image. Find a good threshold, so you can separate objects. You still don't know which object it is, but you only have to do the matching itself in close proximity of this object.
Another good idea is, instead of doing the full search on the high resolution image, first do it on a very low resolution. The result will not be very accurate, but you can know the general areas where it's worth to do a search on a higher resolution, so you don't waste your time on areas where there is nothing of interest.
There are a number of more advanced techniques, but it's still worth to take a look at the basic chamfer matching, as it is the base of a large number of techniques.
With the assumption that the objects are simple shapes, here's an approach using thresholding + contour approximation. Contour approximation is based on the assumption that a curve can be approximated by a series of short line segments which can be used to determine the shape of a contour. For instance, a triangle has three vertices, a square/rectangle has four vertices, a pentagon has five vertices, and so on.
Obtain binary image. We load the image, convert to grayscale, Gaussian blur, then adaptive threshold to obtain a binary image.
Detect shapes. Find contours and identify the shape of each contour using contour approximation filtering. This can be done using arcLength to compute the perimeter of the contour and approxPolyDP to obtain the actual contour approximation.
Input image
Detected objects highlighted in green
Labeled contours
Code
import cv2
def detect_shape(c):
# Compute perimeter of contour and perform contour approximation
shape = ""
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.04 * peri, True)
# Triangle
if len(approx) == 3:
shape = "triangle"
# Square or rectangle
elif len(approx) == 4:
(x, y, w, h) = cv2.boundingRect(approx)
ar = w / float(h)
# A square will have an aspect ratio that is approximately
# equal to one, otherwise, the shape is a rectangle
shape = "square" if ar >= 0.95 and ar <= 1.05 else "rectangle"
# Star
elif len(approx) == 10:
shape = "star"
# Otherwise assume as circle or oval
else:
shape = "circle"
return shape
# Load image, grayscale, Gaussian blur, and adaptive threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,31,3)
# Find contours and detect shape
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
# Identify shape
shape = detect_shape(c)
# Find centroid and label shape name
M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])
cv2.putText(image, shape, (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (36,255,12), 2)
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()