How to use OpenCV stereoCalibrate output to map pixels from one camera to another - opencv

Context: I have two cameras with very different focus and sensor size that I want to align for image processing. One is RGB, one is near-infrared. The cameras are in a static rig, so they are fixed relative to each other. Because the focus and field of view are so different, it's hard to even get the chessboard recognized in both images at the same time. Detection pretty much only works when the chessboard is centered in both images with very little skew/tilt.
I need to perform computations on the aligned images, so I need as good a mapping between the optical frames as I can get. Right now the results I'm getting are pretty far off. I'm not sure if I'm using the method itself wrong, or if I am misusing the output. Details and images below.
Computation: I am using OpenCV stereoCalibrate to estimate the rotation and translation matrices with the following code, and throwing out bad results based on final error.
int flag = cv::CALIB_FIX_INTRINSIC;
double err = cv::stereoCalibrate(temp_points_object_vec, temp_points_alignvec, temp_points_basevec,
                                 camera_mat_align, camera_distort_align,
                                 camera_mat_base, camera_distort_base,
                                 mat_align.size(), rotate_mat, translate_mat, essential_mat, F, flag,
                                 cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 30, 1e-6));
if (last_error_ == -1.0 || (err < last_error_ + improve_threshold_)) {
    // -1.0 indicates the first calibration, so accept the points. The other condition indicates acceptable error.
    points_alignvec_.push_back(addalign);
    points_basevec_.push_back(addbase);
    points_object_vec_.push_back(object_points);
}
The result doesn't produce an OpenCV error as is, and due to the large difference between the images, more than half of the matched points are rejected. Results are much better since I added the conditional on the error, but still pretty poor. The error as computed above starts around 30 but doesn't get lower than 15-17. For comparison, I believe a "good" error would be <1. So for starters, I know the output isn't great, but on top of that, I'm not sure I'm using the output correctly for visual validation. I've attached images showing some of the best and worst results I see. The middle image on the right of each shows the "cross-validated" chessboard keypoints. These are computed like this (note that addalign is the temporary vector containing only the chessboard keypoints from the current image in the frame to be aligned):
for (int i = 0; i < addalign.size(); i++) {
    cv::Point2f validate_pt; // = rotate_mat * addalign.at(i) + translate_mat;
    // Project pixel from image aligned to 3D
    cv::Point3f ray3d = align_camera_model_.projectPixelTo3dRay(addalign.at(i));
    // Rotate and translate
    rotate_mat.convertTo(rotate_mat, CV_32F);
    cv::Mat temp_result = rotate_mat * cv::Mat(ray3d, false);
    cv::Point3f ray_transformed;
    temp_result.copyTo(cv::Mat(ray_transformed, false));
    cv::Mat tmat = cv::Mat(translate_mat, false);
    ray_transformed.x += tmat.at<float>(0);
    ray_transformed.y += tmat.at<float>(1);
    ray_transformed.z += tmat.at<float>(2);
    // Reproject to base image pixel
    cv::Point2f pixel = base_camera_model_.project3dToPixel(ray_transformed);
    corners_validated.push_back(pixel);
}
Here are two images showing sample outputs, including both raw images, both images with "drawChessboard," and a cross-validated image showing the base image with above-computed keypoints translated from the alignment image.
Better result
Worse result
In the computation of corners_validated, I'm not sure I'm using rotate_mat and translate_mat correctly. I'm sure there is probably an OpenCV method that does this more efficiently, but I just did it the way that made sense to me at the time.
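For reference, here is roughly what I think the equivalent would look like with plain OpenCV calls, replacing the camera-model helpers with cv::undistortPoints, cv::Rodrigues and cv::projectPoints (a sketch only; it reuses the variables from the code above, the output vector name is just a placeholder, and every ray is placed at an assumed depth of Z = 1, so without real depth or a planar homography this is not an exact pixel-to-pixel mapping):

// Hedged sketch: map chessboard pixels from the "align" camera into the "base" image.
std::vector<cv::Point2f> undistorted;
cv::undistortPoints(addalign, undistorted, camera_mat_align, camera_distort_align);

std::vector<cv::Point3f> rays;
for (const cv::Point2f& p : undistorted)
    rays.emplace_back(p.x, p.y, 1.0f);             // normalized ray, arbitrary depth Z = 1

cv::Mat rvec;
cv::Rodrigues(rotate_mat, rvec);                   // projectPoints expects a rotation vector
std::vector<cv::Point2f> corners_validated_alt;    // placeholder name for the mapped points
cv::projectPoints(rays, rvec, translate_mat, camera_mat_base, camera_distort_base, corners_validated_alt);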
Also relevant: This is all inside a ROS package, using ROS noetic on Ubuntu 20.04 which only permits the use of OpenCV 4.2, so I don't have access to some of the newer opencv methods.

Related

Extract an object on a sheet of paper

From pictures of tools on a sheet of paper, I'm asked to find their outline contour to vectorize them.
I'm a total beginner in computer-vision-related problems and the only thing I thought about was OpenCV and edge detection.
The result is better than what I'd imagined, but it's still very unreliable, especially if the source picture isn't "perfect".
I took 2 photographs of a wrench they gave me.
After playing around with opencv bindings for node, I get this:
Then I tried with the lower-quality picture:
That's totally unusable.
I can get something a little better by changing the Canny threshold, but that has to be automated (given that the picture is reasonably good).
So I've got a few questions:
Am I taking the right approach? Is GrabCut better for this? A combination of GrabCut and Canny edge detection? I still need vertices at the end, but I feel that GrabCut does what I want too.
The borders are rough and contain errors. I can increase approxPolyDP's epsilon multiplier, but not without losing precision on the parts that are already good.
Related to the above point, I'm thinking of integrating the Savitzky-Golay algorithm to smooth the outline, instead of polygon simplification with approxPolyDP. Is that a good idea?
Normally, the line of the outer border must form a simple, cuttable block. Is there a way in OpenCV to prevent that line from doing impossible things, like crossing over itself? Or, failing that, to detect the problem? Those configurations are, of course, impossible, but they happen when detection fails (like in the second picture).
I'm looking for a way to do automatic Canny threshold calculation, since I have to tweak it manually for each image. Do you have a good example of that?
I've noticed that converting the image to grayscale before edge detection sometimes makes the result worse and sometimes better. Which should I choose? (Tools can be of any color, by the way!)
here is the source for my tests:
const cv = require('opencv');

const lowThresh = 90;
const highThresh = 90;
const nIters = 1;
const GRAY = [120, 120, 120];
const WHITE = [255, 255, 255];

cv.readImage('./files/viv1.jpg', function(err, im) {
  if (err) throw err;

  const width = im.width();
  const height = im.height();
  if (width < 1 || height < 1) throw new Error('Image has no size');

  const out = new cv.Matrix(height, width);
  im.convertGrayscale();
  const im_canny = im.copy();
  im_canny.canny(lowThresh, highThresh);
  im_canny.dilate(nIters);

  const contours = im_canny.findContours();
  let maxArea = 0;
  let biggestContour;

  for (let i = 0; i < contours.size(); i++) {
    const area = contours.area(i);
    if (area > maxArea) {
      maxArea = area;
      biggestContour = i;
    }
    out.drawContour(contours, i, GRAY);
  }

  const arcLength = contours.arcLength(biggestContour, true);
  contours.approxPolyDP(biggestContour, 0.001 * arcLength, true);

  out.drawContour(contours, biggestContour, WHITE, 5);
  out.save('./tmp/out.png');
  console.log('Image saved to ./tmp/out.png');
});
You'll need to add some pre-processing to clean up the image. Because the image has large intensity variations due to shadows, poor lighting, high shine on the tools, etc., you should equalize it. This will give you a better response in the regions that are currently poorly lit or have high shine.
Here's an opencv tutorial on Histogram equalization in C++: http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization.html
Hope this helps
EDIT:
You can build an automatic threshold based on some loss function. For example: if you know that the tool will be completely captured in the frame, you know that you should get a strong edge response in every column from, say, x = 10 to x = 800. You could then keep reducing the threshold until you get a response in every one of those columns. This is a very naive way of doing it, but it's an interesting experiment, I think, since you are generating the images yourself and have control over object placement.
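That naive loop could look roughly like this in OpenCV C++ (a sketch only; the column range, step size and fallback value are made-up numbers, and gray is assumed to be the grayscale input image):

#include <algorithm>
#include <opencv2/opencv.hpp>

// Lower the Canny threshold until every column in [colStart, colEnd) contains
// at least one edge pixel, following the idea described above.
int autoCannyThreshold(const cv::Mat& gray, int colStart = 10, int colEnd = 800)
{
    for (int high = 250; high >= 20; high -= 10) {
        cv::Mat edges;
        cv::Canny(gray, edges, high / 2, high);                // low = high/2 is a common ratio
        cv::Mat colSums;
        cv::reduce(edges, colSums, 0, cv::REDUCE_SUM, CV_32S); // sum of each column
        bool allCovered = true;
        for (int x = colStart; x < std::min(colEnd, colSums.cols); ++x) {
            if (colSums.at<int>(0, x) == 0) { allCovered = false; break; }
        }
        if (allCovered)
            return high;   // every column in the range has an edge response
    }
    return 50;             // fallback if no threshold satisfies the criterion
}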
You might also try running your images through an adaptive threshold first. This type of binarization is fairly adept at segmenting foreground and background in cases like this, even with inconsistent lighting/shadows (which seems to be the issue in your second example above). Adaptive thresholding will require some parameter fine-tuning, but once the entire tool is segmented from the background, Canny edge detection should produce more consistent results.
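In OpenCV C++ that pre-processing step could look roughly like this (a sketch; the block size and constant are placeholders to be tuned, and gray is assumed to be the grayscale input):

cv::Mat binary;
cv::adaptiveThreshold(gray, binary, 255,
                      cv::ADAPTIVE_THRESH_GAUSSIAN_C,   // threshold from a local Gaussian-weighted mean
                      cv::THRESH_BINARY_INV,            // tool becomes white on a black background
                      51,                               // odd block size, tuned to the tool scale
                      10);                              // constant subtracted from the local mean
cv::Mat edges;
cv::Canny(binary, edges, 50, 150);                      // Canny on the segmented mask is more stable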
As for the roughness in your contours, you could try setting your findContours mode to one of the CV_CHAIN_APPROX methods described here.

Improving image matching accuracy with OpenCV matchTemplate

I’m a making an iOS application that will find instances of a smaller (similar) image inside a larger image. For example, something like:
The image we are searching inside
The image we are searching for
The matched image
The main things to consider are: the smallImage size will match the size of the target in the bigImage, but the object may be slightly obscured in the bigImage (i.e. they won't always be identical). Also, the images I am dealing with are quite a bit smaller than my examples here: the image I am trying to match (the smallImage) is between 32 x 32 and 80 x 80 pixels, and the big image is around 1000 x 600 pixels. Other than potentially being slightly obscured, the smallImage will match the object in the big image in every way (size, color, rotation, etc.).
I have tried out a few methods using OpenCV. Feature matching didn’t seem accurate enough and gave me hundreds of meaningless results, so I am trying template matching. My code looks something like:
cv::Mat ref = [bigImage CVMat];
cv::Mat tpl = [smallImage CVMat];
cv::Mat gref, gtpl;
cv::cvtColor(ref, gref, CV_LOAD_IMAGE_COLOR);
cv::cvtColor(tpl, gtpl, CV_LOAD_IMAGE_COLOR);

cv::Mat res(ref.rows - tpl.rows + 1, ref.cols - tpl.cols + 1, CV_32FC1);
cv::matchTemplate(gref, gtpl, res, CV_TM_CCOEFF_NORMED);
cv::threshold(res, res, [tolerance doubleValue], 1., CV_THRESH_TOZERO);

double minval, maxval, threshold = [tolerance doubleValue];
cv::Point minloc, maxloc;
cv::minMaxLoc(res, &minval, &maxval, &minloc, &maxloc);

if (maxval >= threshold) {
    // match
}
bigImage is the large image in which we are trying to find the target
smallImage is the image we are looking for within the bigImage
tolerance is the tolerance for matches (between 0 and 1)
This does work, but there are a few issues.
I originally tried using a full image of the object I am trying to match (i.e. an image of the entire fridge), but I found it was very inaccurate: when the tolerance was high it found nothing, and when it was low it found many incorrect matches.
Next I tested out using smaller portions of the image, for example:
This increased the accuracy of finding the target in the big image, but it also produces a lot of incorrect matches.
I have tried all the available methods for matchTemplate from here, and they all return a large number of false matches, except CV_TM_CCOEFF_NORMED, which returns fewer matches (but still mostly false matches).
How can I improve the accuracy of image matching using OpenCV in iOS?
Edit:
I have googled loads, the most helpful posts are:
Pattern Matching - Find reference object in second image
Measure of accuracy in pattern recognition using SURF in OpenCV
Template Matching
Algorithm improvement for Coca-Cola can shape recognition
I can't find any suggestions on how to improve the accuracy
If the template image is not rotated (or under some projective distortion) in the image you are searching in, then, since all geometric and texture properties are preserved (assuming occlusion is not very large), the only variable left is scale. Hence, running template matching at multiple scales of the original template and taking the maximum normalized response over all scales should give a good match. One issue is that, for a perfect match, guessing (optimizing over) the exact scale will be computationally expensive or involve some heuristics. One heuristic: run template matching at 3 different scales (1, 2, 4); suppose you get the best response at a particular scale, say 2, then try (1.5, 2.25, 3) and keep refining. Of course, this is a heuristic that may work well in practice, but it is not a theoretically correct way of finding the right scale and can get stuck in local minima.
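A rough sketch of that multi-scale search in OpenCV C++ (the scale set and variable names are illustrative; cv::TM_CCOEFF_NORMED is the newer name for CV_TM_CCOEFF_NORMED used in the question):

// Hedged sketch: run matchTemplate at several template scales and keep the best
// normalized response. Assumes bigGray and tplGray are single-channel cv::Mat.
double bestScore = -1.0;
cv::Point bestLoc;
double bestScale = 1.0;
for (double scale : {0.5, 0.75, 1.0, 1.5, 2.0}) {
    cv::Mat scaledTpl;
    cv::resize(tplGray, scaledTpl, cv::Size(), scale, scale);
    if (scaledTpl.rows > bigGray.rows || scaledTpl.cols > bigGray.cols)
        continue;                                  // template larger than the image at this scale
    cv::Mat res;
    cv::matchTemplate(bigGray, scaledTpl, res, cv::TM_CCOEFF_NORMED);
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(res, &minVal, &maxVal, &minLoc, &maxLoc);
    if (maxVal > bestScore) {
        bestScore = maxVal;                        // strongest normalized response so far
        bestLoc = maxLoc;
        bestScale = scale;
    }
}
// bestLoc and bestScale now describe the strongest match over the scales tried.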
The reason feature-based methods will not work on this kind of image is that they rely on texture/sharp gradients, which are not very evident in the homogeneous template image you have shown.

getting strange coordinate values from vector<Point2f> in opencv

I have extracted 100 feature points of an image into vector<Point2f> cornersFrame1, but I am getting a segmentation fault while trying to access the pixel values of that image (480 x 640).
Then I tried to print the coordinate values using
cout<<"\n cornersFrame1[82]: ("<<cornersFrame1[82].y<<" , "<<cornersFrame1[82].x<<")\n \n";
and I got the following output:
cornersFrame1[83]: (-1.68725 , 317.552)
How can I get the coordinate values in float form?
By the way, I am using the following code to access the pixel values:
for (int i = 0; i < cornersFrame1.size(); i++)
{
    float value = calculatedU.at<float>(cornersFrame1[i]);
}
I would not be surprised to get float values from feature detectors in general. Bear in mind that detectors can be sophisticated tools that dig deep into the image (yep, literally; e.g. BRISK, which uses a sampling pattern that is scaled to find keypoints), so sub-pixel coordinates are normal. In that case, you can just round the returned floating-point values to get ints.
But it is strange that OpenCV can return negative point coordinates. I don't think that is proper OpenCV behavior. However, this small value (-1.68) is still understandable, because detectors usually have some radius (or patch) on which they operate. Still, OpenCV should have checked the boundaries, and it seems it does not.
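For example, the access loop from the question could be guarded roughly like this (a sketch; it only adds rounding and a bounds check around the existing variables):

for (size_t i = 0; i < cornersFrame1.size(); i++)
{
    int x = cvRound(cornersFrame1[i].x);   // round the sub-pixel coordinate
    int y = cvRound(cornersFrame1[i].y);
    // Skip keypoints that fall outside the 480 x 640 image to avoid the segfault
    if (x < 0 || y < 0 || x >= calculatedU.cols || y >= calculatedU.rows)
        continue;
    float value = calculatedU.at<float>(y, x);   // note: at<>() takes (row, col)
}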

OpenCV: solvePnP detection problems

I've got a problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you can see, the markers I'm detecting are slightly shifted at some camera angles. I've read on the web that this may be a camera calibration problem, so I'll tell you how I'm calibrating the camera, and maybe you'll be able to tell me what I'm doing wrong.
At the beginning I collect data from various images and store the calibration corners in the _imagePoints vector like this:
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
    cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
    cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
    cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1),
                     cvTermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 30, 0.1));
    cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Then, after collecting enough data, I calculate the camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector<cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
    std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
    std::vector<cv::Point3f> currentObjectPoints;
    for (int j = 0; j < currentImagePoints.size(); j++) {
        cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
        currentObjectPoints.push_back(newPoint);
    }
    objectPoints->push_back(currentObjectPoints);
}

std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? Is it an issue in the code? How many images should I use to perform the calibration (right now I'm trying to obtain 20-30 images before finishing the calibration)?
Should I use images that contain wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with a circles grid instead of chessboards, but the results were much worse than they are now.
In case you're wondering how I'm detecting the marker: I'm using the solvePnP function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are the coordinates of the marker corners in the processed image (I'm using a custom algorithm to find them).
In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by @kobejohn in his comment, so that your code follows these steps:
1) collect various images of the chessboard target
2) find the chessboard corners in the images from point 1)
3) calibrate the camera (with cv::calibrateCamera), obtaining the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
4) collect an image of your own custom target (the target seen at 0:57 in your video, shown in the following figure) and find some relevant points in it (let's call the points you found in the image image_custom_target_vertices and the corresponding 3D points world_custom_target_vertices)
5) estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your custom target obtained in point 4), with a call to cv::solvePnP like this one: cv::solvePnP(world_custom_target_vertices, image_custom_target_vertices, intrinsic, distortion, R, t)
6) given the 8 corners of the cube in 3D (let's call them world_cube_vertices), get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv::projectPoints like this one: cv::projectPoints(world_cube_vertices, R, t, intrinsic, distortion, image_cube_vertices)
7) draw the cube with your own draw function (a rough sketch of steps 5 and 6 follows right after this list).
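A rough sketch of steps 5 and 6 put together (variable names follow the list above; note that cv::solvePnP actually returns a Rodrigues rotation vector, which cv::projectPoints accepts directly):

cv::Mat rvec, tvec;   // rotation (Rodrigues vector) and translation of the camera w.r.t. the target
cv::solvePnP(world_custom_target_vertices,    // 3D points of the custom target
             image_custom_target_vertices,    // their detected 2D image positions
             intrinsic, distortion,
             rvec, tvec);

std::vector<cv::Point2f> image_cube_vertices;
cv::projectPoints(world_cube_vertices,        // the 8 cube corners in target coordinates
                  rvec, tvec,                 // pose estimated above
                  intrinsic, distortion,
                  image_cube_vertices);       // 2D points to feed the draw function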
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact the found parameters are. This should be as close to zero as possible. Given the intrinsic, distortion, rotation and translation matrices, we first transform the object point to an image point using cv2.projectPoints(). Then we calculate the absolute norm between what we got with our transformation and the corner finding algorithm. To find the average error we calculate the arithmetical mean of the errors calculated for all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
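A sketch of that extra step (assuming the per-view vectors returned by cv::calibrateCamera; the threshold value is illustrative only):

// Hedged sketch: compute the reprojection error for each calibration image and
// flag the views that should be discarded before re-running the calibration.
double maxAcceptableError = 0.5;   // pixels; found experimentally, as noted above
for (size_t i = 0; i < objectPoints.size(); i++) {
    std::vector<cv::Point2f> reprojected;
    cv::projectPoints(objectPoints[i], rvecs[i], tvecs[i],
                      cameraMatrix, distCoeffs, reprojected);
    // RMS reprojection error for this calibration image
    double err = cv::norm(imagePoints[i], reprojected, cv::NORM_L2)
                 / std::sqrt((double)reprojected.size());
    if (err > maxAcceptableError) {
        // Discard this view and run cv::calibrateCamera again without it.
    }
}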
Finding your own custom target: you don't seem to explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where those results are "bad"?
Pose of the camera: I think that in 5) you use the intrinsic found in 3); are you sure nothing has changed in the camera in the meantime? Referring to Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens after calibration". In particular, you may not refocus nor change the f-stop, because both focusing and iris affect the nonlinear lens distortion and (albeit less so, depending on the lens) the field of view. Of course, you are completely free to change the exposure time, as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.
So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of the calibration questions I asked.
First of all, in order to obtain good calibration results you have to use images with properly detected grid element/circle positions! Using all captured images in the calibration process (even those that aren't properly detected) will result in a bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID) gives much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners, like these:
I've experimented with CALIB_CB_CLUSTERING and it hasn't helped much; in some cases (different lighting environments) it got better, but not by much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) gives better results than the asymmetric grid, but still much worse than the standard grid (chessboard). It often produces errors like these:
Chessboard (found using the findChessboardCorners function) produces the best results: it doesn't generate misaligned corners very often, and almost every calibration gives results similar to the best results from the symmetric circles grid.
For every calibration I've been using 20-30 images taken from different angles. I've even tried 100+ images, but that didn't produce a noticeable change in the calibration results compared to a smaller number of images. It's worth noting that a larger number of test images increases the time needed to compute the camera parameters in a non-linear way (100 test images at 480x360 resolution take 25 minutes to compute on an iPad 4, compared with 4 minutes for ~50 images).
I've also experimented with solvePnP parameters, but that also hasn't given me any acceptable results: I've tried all 3 methods (ITERATIVE, EPNP and P3P), but I haven't seen any noticeable change.
I've also tried setting useExtrinsicGuess to true, using the rvec and tvec from the previous detection, but that resulted in the complete disappearance of the detected cube.
I've run out of ideas; what else could be affecting these shifting problems?
For those still interested:
this is an old question, but I think your problem is not bad calibration.
I developed an AR app for iOS, using OpenCV and SceneKit, and I have had your same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you want to render the cube on top of the marker, at a specific distance along the marker's Z axis: exactly half of the cube's side length. So you need to offset the Z coordinate of the marker's translation vector by this distance.
In fact, when you look at your cube from the top, the cube is rendered properly.
I made an image to explain the problem, but my reputation prevents me from posting it.
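In OpenCV terms, one way to sketch that correction is to shift the cube's 3D model points half a side length along the marker's Z axis before projecting or rendering them (a sketch only; kCubeSide is a placeholder, and the sign of the offset depends on your model and axis convention):

// Hedged sketch: the cube model is assumed to be centered on the marker origin,
// so push it half a side length along the marker's Z axis so it sits on the marker.
double offset = kCubeSide / 2.0;          // sign depends on which way your Z axis points
for (cv::Point3d& p : world_cube_vertices)
    p.z += offset;
// Then project / render world_cube_vertices with the same rvec and tvec as before.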

Threshold of blurry image - part 2

How can I threshold this blurry image to make the digits as clear as possible?
In a previous post, I tried adaptively thresholding a blurry image (left), which resulted in distorted and disconnected digits (right):
Since then, I've tried using a morphological closing operation as described in this post to make the brightness of the image uniform:
If I adaptively threshold this image, I don't get significantly better results. However, because the brightness is approximately uniform, I can now use an ordinary threshold:
This is a lot better than before, but I have two problems:
I had to manually choose the threshold value. Although the closing operation results in uniform brightness, the level of brightness might be different for other images.
Different parts of the image would do better with slight variations in the threshold level. For instance, the 9 and 7 in the top left come out partially faded and should have a lower threshold, while some of the 6s have fused into 8s and should have a higher threshold.
I thought that going back to an adaptive threshold, but with a very large block size (1/9th of the image) would solve both problems. Instead, I end up with a weird "halo effect" where the centre of the image is a lot brighter, but the edges are about the same as the normally-thresholded image:
Edit: remi suggested morphologically opening the thresholded image at the top right of this post. This doesn't work too well. Using elliptical kernels, only a 3x3 is small enough to avoid obliterating the image entirely, and even then there are significant breakages in the digits:
Edit2: mmgp suggested using a Wiener filter to remove blur. I adapted this code for Wiener filtering in OpenCV to OpenCV4Android, but it makes the image even blurrier! Here's the image before (left) and after filtering with my code and a 5x5 kernel:
Here is my adapted code, which filters in-place:
private void wiener(Mat input, int nRows, int nCols) { // I tried nRows=5 and nCols=5
    Mat localMean = new Mat(input.rows(), input.cols(), input.type());
    Mat temp = new Mat(input.rows(), input.cols(), input.type());
    Mat temp2 = new Mat(input.rows(), input.cols(), input.type());

    // Create the kernel for convolution: a constant matrix with nRows rows
    // and nCols cols, normalized so that the sum of the pixels is 1.
    Mat kernel = new Mat(nRows, nCols, CvType.CV_32F, new Scalar(1.0 / (double) (nRows * nCols)));

    // Get the local mean of the input. localMean = convolution(input, kernel)
    Imgproc.filter2D(input, localMean, -1, kernel, new Point(nCols/2, nRows/2), 0);

    // Get the local variance of the input. localVariance = convolution(input^2, kernel) - localMean^2
    Core.multiply(input, input, temp); // temp = input^2
    Imgproc.filter2D(temp, temp, -1, kernel, new Point(nCols/2, nRows/2), 0); // temp = convolution(input^2, kernel)
    Core.multiply(localMean, localMean, temp2); // temp2 = localMean^2
    Core.subtract(temp, temp2, temp); // temp = localVariance = convolution(input^2, kernel) - localMean^2

    // Estimate the noise as mean(localVariance)
    Scalar noise = Core.mean(temp);

    // Compute the result. result = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
    Core.max(temp, noise, temp2); // temp2 = max(localVariance, noise)
    Core.subtract(temp, noise, temp); // temp = localVariance - noise
    Core.max(temp, new Scalar(0), temp); // temp = max(0, localVariance - noise)
    Core.divide(temp, temp2, temp); // temp = max(0, localVar-noise) / max(localVariance, noise)
    Core.subtract(input, localMean, input); // input = input - localMean
    Core.multiply(temp, input, input); // input = max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
    Core.add(input, localMean, input); // input = localMean + max(0, localVariance - noise) / max(localVariance, noise) * (input - localMean)
}
Some hints that you might try out:
Apply a morphological opening to your original thresholded image (the noisy one at the right of the first picture). You should get rid of most of the background noise and be able to reconnect the digits.
Use a different preprocessing of your original image instead of the morphological closing, such as a median filter (which tends to blur the edges) or bilateral filtering, which preserves the edges better but is slower to compute.
As far as the threshold is concerned, you can use the CV_OTSU flag in cv::threshold to determine an optimal value for a global threshold. Local thresholding might still be better, but should work better combined with the bilateral or median filter.
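A minimal sketch of that Otsu call (assuming a preprocessed grayscale image gray; cv::THRESH_OTSU is the C++ name for the CV_OTSU flag mentioned above):

cv::Mat binary;
// With THRESH_OTSU the supplied threshold (0 here) is ignored and an optimal
// global value is chosen automatically; the chosen value is returned.
double otsuValue = cv::threshold(gray, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);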
I've tried thresholding each 3x3 box separately, using Otsu's algorithm (CV_OTSU - thanks remi!) to determine an optimal threshold value for each box. This works a bit better than thresholding the entire image, and is probably a bit more robust.
Better solutions are welcome, though.
If you're willing to spend some cycles on it, there are de-blurring techniques that could be used to sharpen up the picture prior to processing. There's nothing in OpenCV yet, but if this is a make-or-break kind of thing you could add it.
There's a bunch of literature on the subject:
http://www.cse.cuhk.edu.hk/~leojia/projects/motion_deblurring/index.html
http://www.google.com/search?q=motion+deblurring
And some chatter on the OpenCV mailing list:
http://tech.groups.yahoo.com/group/OpenCV/message/20938
The weird "halo effect" that you're seeing is likely due to OpenCV assuming black for the color when the adaptive threshold is at/near the edge of the image and the window that it's using "hangs over" the edge into non-image territory. There are ways to correct for this, most likely you would make an temporary image that's at least two full block-sizes taller and wider than the image from the camera. Then copy the camera image into the middle of it. Then set the surrounding "blank" portion of the temp image to be the average color of the image from the camera. Now when you perform the adaptive threshold the data at/near the edges will be much closer to accurate. It won't be perfect since its not a real picture but it will yield better results than the black that OpenCV is assuming is there.
My proposal assumes you can identify the sudoku cells, which, I think, is not asking too much. Trying to apply morphological operators (although I really like them) and/or binarization methods as a first step is the wrong way here, in my opinion of course. Your image is at least partially blurry, for whatever reason (original camera angle and/or movement, among other reasons). So what you need is to revert that by performing a deconvolution. Of course, asking for a perfect deconvolution is too much, but we can try some things.
One of these "things" is the Wiener filter, and in Matlab, for instance, the function is named deconvwnr. I noticed the blur to be in the vertical direction, so we can perform a deconvolution with a vertical kernel of a certain length (10 in the following example) and also assume the input is not noise free (an assumption of 5% noise) -- I'm just trying to give a very superficial view here, take it easy. In Matlab, your problem is at least partially solved by doing:
f = imread('some_sudoku_cell.png');
g = deconvwnr(f, fspecial('motion', 10, 90), 0.05);
h = im2bw(g, graythresh(g)); % graythresh is the Otsu method
Here are the results from some of your cells (original, otsu, otsu of region growing, morphological enhanced image, otsu from morphological enhanced image with region growing, otsu of deconvolution):
The enhanced image was produced by performing original + tophat(original) - bottomhat(original) with a flat disk of radius 3. I manually picked the seed point for region growing and manually picked the best threshold.
For empty cells you get weird results (original and otsu of deconvolution):
But I don't think you would have trouble to detect whether a cell is empty or not (the global threshold already solves it).
EDIT:
Added the best results I could get with a different approach: region growing. I also attempted some other approaches, but this was the second best one.
