Reshaping noisy coin into a circle form

Reshaping noisy coin into a circle form - image-processing

I'm doing a coin detection using JavaCV (OpenCV wrapper) but I have a little problem when the coins are connected. If I try to erode them to separate these coins they loose their circle form and if I try to count pixels inside each coin there can be problems so that some coins can be miscounted as one that bigger. What I want to do is firstly to reshape them and make them like a circle (equal with the radius of that coin) and then count pixels inside them.
Here is my thresholded image:
And here is eroded image:
Any suggestions? Or is there any better way to break bridges between coins?

It looks similar to a problem I recently had to separate bacterial colonies growing on agar plates.
I performed a distance transform on the thresholded image (in your case you will need to invert it).
Then found the peaks of the distance map (by calculating the difference between a the dilated distance map and the distance map and finding the zero values).
Then, I assumed each peak to be the centre of a circle (coin) and the value of the peak in the distance map to be the radius of the circle.
Here is the result of your image after this pipeline:
I am new to OpenCV, and c++ so my code is probably very messy, but I did that:
int main( int argc, char** argv ){
cv::Mat objects, distance,peaks,results;
std::vector<std::vector<cv::Point> > contours;
objects=cv::imread("CUfWj.jpg");
objects.copyTo(results);
cv::cvtColor(objects, objects, CV_BGR2GRAY);
//THIS IS THE LINE TO BLUR THE IMAGE CF COMMENTS OF THIS POST
cv::blur( objects,objects,cv::Size(3,3));
cv::threshold(objects,objects,125,255,cv::THRESH_BINARY_INV);
/*Applies a distance transform to "objects".
* The result is saved in "distance" */
cv::distanceTransform(objects,distance,CV_DIST_L2,CV_DIST_MASK_5);
/* In order to find the local maxima, "distance"
* is subtracted from the result of the dilatation of
* "distance". All the peaks keep the save value */
cv::dilate(distance,peaks,cv::Mat(),cv::Point(-1,-1),3);
cv::dilate(objects,objects,cv::Mat(),cv::Point(-1,-1),3);
/* Now all the peaks should be exactely 0*/
peaks=peaks-distance;
/* And the non-peaks 255*/
cv::threshold(peaks,peaks,0,255,cv::THRESH_BINARY);
peaks.convertTo(peaks,CV_8U);
/* Only the zero values of "peaks" that are non-zero
* in "objects" are the real peaks*/
cv::bitwise_xor(peaks,objects,peaks);
/* The peaks that are distant from less than
* 2 pixels are merged by dilatation */
cv::dilate(peaks,peaks,cv::Mat(),cv::Point(-1,-1),1);
/* In order to map the peaks, findContours() is used.
* The results are stored in "contours" */
cv::findContours(peaks, contours, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
/* The next steps are applied only if, at least,
* one contour exists */
cv::imwrite("CUfWj2.jpg",peaks);
if(contours.size()>0){
/* Defines vectors to store the moments of the peaks, the center
* and the theoritical circles of the object of interest*/
std::vector <cv::Moments> moms(contours.size());
std::vector <cv::Point> centers(contours.size());
std::vector<cv::Vec3f> circles(contours.size());
float rad,x,y;
/* Caculates the moments of each peak and then the center of the peak
* which are approximatively the center of each objects of interest*/
for(unsigned int i=0;i<contours.size();i++) {
moms[i]= cv::moments(contours[i]);
centers[i]= cv::Point(moms[i].m10/moms[i].m00,moms[i].m01/moms[i].m00);
x= (float) (centers[i].x);
y= (float) (centers[i].y);
if(x>0 && y>0){
rad= (float) (distance.at<float>((int)y,(int)x)+1);
circles[i][0]= x;
circles[i][3]= y;
circles[i][2]= rad;
cv::circle(results,centers[i],rad+1,cv::Scalar( 255, 0,0 ), 2, 4, 0 );
}
}
cv::imwrite("CUfWj2.jpg",results);
}
return 1;
}

You don't need to erode, just a good set of params for cvHoughCircles():
The code used to generate this image came from my other post: Detecting Circles, with these parameters:
CvSeq* circles = cvHoughCircles(gray, storage, CV_HOUGH_GRADIENT, 1, gray->height/12, 80, 26);

OpenCV has a function called HoughCircles() that can be applied to your case, without separating the different circles. Can you call it from JavaCV ? If so, it will do what you want (detecting and counting circles), bypassing your separation problem.
The main point is to detect the circles accurately without separating them first. Other algorithms (such as template matching can be used instead of generalized Hough transform, but you have to take into account the different sizes of the coins.

The usual approach for erosion-based object recognition is to label continuous regions in the eroded image and then re-grow them until they match the regions in the original image. Hough circles is a better idea in your case, though.

After detecting the joined coins, I recommend applying morphological operations to classify areas as "definitely coin" and "definitely not coin", apply a distance transformation, then run the watershed to determine the boundaries. This scenario is actually the demonstration example for the watershed algorithm in OpenCV − perhaps it was created in response to this question.

Related

Can't determine document edges from camera with OpenCV

I need find edges of document that in user hands.
1) Original image from camera:
2) Then i convert image to BG:
3) Then i make blur:
3) Finds edges in an image using the Canny:
4) And use dilate :
As you can see on the last image the contour around the map is torn and the contour is not determined. What is my error and how to solve the problem in order to determine the outline of the document completely?
This is code how i to do it:
final Mat mat = new Mat();
sourceMat.copyTo(mat);
//convert the image to black and white
Imgproc.cvtColor(mat, mat, Imgproc.COLOR_BGR2GRAY);
//blur to enhance edge detection
Imgproc.GaussianBlur(mat, mat, new Size(5, 5), 0);
if (isClicked) saveImageFromMat(mat, "blur", "blur");
//convert the image to black and white does (8 bit)
int thresh = 128;
Imgproc.Canny(mat, mat, thresh, thresh * 2);
//dilate helps to connect nearby line segments
Imgproc.dilate(mat, mat,
Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3)),
new Point(-1, -1),
2,
1,
new Scalar(1));

This answer is based on my above comment. If someone is holding the document, you cannot see the edge that is behind the user's hand. So, any method for detecting the outline of the document must be robust to some missing parts of the edge.
I suggest using a variant of the Hough transform to detect the document. The Wikipedia article about the Hough transform makes it sound quite scary (as Wikipedia often does with mathematical subjects), but don't be discouraged, actually they are not too difficult to understand or implement.
The original Hough transform detected straight lines in images. As explained in this OpenCV tutorial, any straight line in an image can be defined by 2 parameters: an angle θ and a distance r of the line from the origin. So you quantize these 2 parameters, and create a 2D array with one cell for every possible line that could be present in your image. (The finer the quantization you use, the larger the array you will need, but the more accurate the position of the found lines will be.) Initialize the array to zeros. Then, for every pixel that is part of an edge detected by Canny, you determine every line (θ,r) that the pixel could be part of, and increment the corresponding bin. After processing all pixels, you will have, for each bin, a count of how many pixels were detected on the line corresponding to that bin. Counts which are high enough probably represent real lines in the image, even if parts of the line are missing. So you just scan through the bins to find bins which exceed the threshold.
OpenCV contains Hough detectors for straight lines and circles, but not for rectangles. You could either use the line detector and check for 4 lines that form the edges of your document; or you could write your own Hough detector for rectangles, perhaps using the paper Jung 2004 for inspiration. Rectangles have at least 5 degrees of freedom (2D position, scale, aspect ratio, and rotation angle), and memory requirement for a 5D array obviously goes up pretty fast. But since the range of each parameter is limited (ie, the document's aspect ratio is known, and you can assume the document will be well centered and not rotated much) it is probably feasible.

Detecting a hand above a chessboard using opencv

I am developing an android application for analyzing chess games based on series of photos. To process images, I am using OpenCV. My question is how can I detect that there is a player's hand on a picture? Because I would like to filter those photos and analyze only the ones with the only chessboard on them.
So far I managed to get the Canny, so from an image like that
original image
I am able to get that canny
.
But I have no idea what can I do next...
The code I used to get Canny:
Mat gray, blur, cannyed;
cvtColor(img, gray, CV_BGR2GRAY);
GaussianBlur(gray, blur, Size(7, 7), 0, 0);
Canny(blur, cannyed, 50, 100, 3);
I would highly appreciate any ideas and advice on what to do next and what OpenCV functions can I use.

You have a very nice spectrum in the chess board. A hand in it messes up the frequencies built up by the regular transitions between the black and white squares. Try moving a bigger square (let's say the size of a 4.5 x 4.5 squares) around and see what happens to the frequencies.
Another approach if you have the sequence of pictures taken as a movie is to analyse the motions. Take the difference of consecutive frames (low pass filter them a bit first) to detect motions. Filter the motions in time (over several frames). Then threshold these motions to get a binary image. Erode the binary shapes to filter out small moving objects (noise, chess figure) be able to detect if any larger moving shape is on the board (e.g. a hand).

Here, After Canny Edge detection the morphological operations of horizontal and vertical lines extraction process i tried.
Mat horizontal = cannyed.clone();
// Specify size on horizontal axis
int horizontalsize = horizontal.cols / 60;
// Create structure element for extracting horizontal lines through morphology operations
Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));
erode(horizontal, horizontal, horizontalStructure, Point(-1, -1),2);
dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1),1);
imshow("horizontal",horizontal);
Mat vertical = cannyed.clone();
// Specify size on horizontal axis
int verticalsize = vertical.cols / 60;
// Create structure element for extracting horizontal lines through morphology operations
Mat verticalStructure = getStructuringElement(MORPH_RECT, Size(1,verticalsize));
erode(vertical, vertical, verticalStructure, Point(-1, -1));
dilate(vertical, vertical, verticalStructure, Point(-1, -1),2);
imshow("vertical",vertical);
the results are ,
Horizontal Lines in the chess board
Then, from the figure you can see there is a proper interval in between the lines. The area where hand is present there is more interval in lines.
In that location, if contour is done, the hand (or any object ) over the chess board can be detected.
This helps to solve for any object when placed over chess board.

Thank you all very much for your suggestions.
So I solved the problem mostly using Gowthaman's method. First I use his code to generate vertical and horizontal lines. Then I combine them like this:
Mat combined = vertical + horizontal;
So I get something like that when there is no hand
or like that when there is a hand
.
Next I count white pixels using the code:
int GetPixelCount(Mat image, uchar color)
{
int result = 0;
for (int i = 0; i < image.rows; i++)
{
for (int j = 0; j < image.cols; j++)
{
if (image.at<uchar>(Point(j, i)) == color)
result++;
}
}
return result;
}
I do that for every photo in the series. First photo is always without a hand, so I use is as a template. If current photo has less then 98% of template white pixels then I deduce there is hand (or something else) in it.
Most likely this is not an optimal method and has lots of weaknesses, but it is very simple and works for me just fine :)

Remove Boxes/rectangles from image

I have the following image.
this image
I would like to remove the orange boxes/rectangle around numbers and keep the original image clean without any orange grid/rectangle.
Below is my current code but it does not remove it.
Mat mask = new Mat();
Mat src = new Mat();
src = Imgcodecs.imread("enveloppe.jpg",Imgcodecs.CV_LOAD_IMAGE_COLOR);
Imgproc.cvtColor(src, hsvMat, Imgproc.COLOR_BGR2HSV);
Scalar lowerThreshold = new Scalar(0, 50, 50);
Scalar upperThreshold = new Scalar(25, 255, 255);
Mat mask = new Mat();
Core.inRange(hsvMat, lowerThreshold, upperThreshold, mask);
//src.setTo(new scalar(255,255,255),mask);
what to do next ?
How can i remove the orange boxes/rectangle from the original images ?
Update:
For information , the mask contains exactly all the boxes/rectangle that i want to remove. I don't know how to use this mask to remove boxes/rectangle from the source (src) image as if they were not present.

This is what I did to solve the problem. I solved the problem in C++ and I used OpenCV.
Part 1: Find box candidates
Firstly I wanted to isolate the signal that was specific for red channel. I splitted the image into three channels. I then subtracted the red channel from blue channel and the red from green channel. After that I subtracted both previous subtraction results from one another. The final subtraction result is shown on the image below.
using namespace cv;
using namespace std;
Mat src_rgb = imread("image.jpg");
std::vector<Mat> channels;
split(src_rgb, channels);
Mat diff_rb, diff_rg;
subtract(channels[2], channels[0], diff_rb);
subtract(channels[2], channels[1], diff_rg);
Mat diff;
subtract(diff_rb, diff_rg, diff);
My next goal was to divide the parts of obtained image into separate "groups". To do that, I smoothed the image a little bit with a Gaussian filter. Then I applied a threshold to obtain a binary image; finally I looked for external contours within that image.
GaussianBlur(diff, diff, cv::Size(11, 11), 2.0, 2.0);
threshold(diff, diff, 5, 255, THRESH_BINARY);
vector<vector<Point>> contours;
findContours(diff, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);
Click to see subtraction result, Gaussian blurred image, thresholded image and detected contours.
Part 2: Inspect box candidates
After that, I had to make an estimate whether the interior of each contour contained a number or something else. I made an assumption that numbers will always be printed with black ink and that they will have sharp edges. Therefore I took a blue channel image and I applied just a little bit of Gaussian smoothing and convolved it with a Laplacian operator.
Mat blurred_ch2;
GaussianBlur(channels[2], blurred_ch2, cv::Size(7, 7), 1, 1);
Mat laplace_result;
Laplacian(blurred_ch2, laplace_result, -1, 1);
I then took the resulting image and applied the following procedure for every contour separately. I computed a standard deviation of the pixel values within the contour interior. Standard deviation was high inside the contours that surrounded numbers; and it was low inside the two contours that surrounded the dog's head and the letters on top of the stamp.
That is why I could appliy the standard deviation threshold. Standard deviation was approx. twice larger for contours containing numbers so this was an easy way to only select the contours that contained numbers. Then I drew the contour interior mask. I used erosion and subtraction to obtain the "box edge mask".
The final step was fairly easy. I computed an estimate of average pixel value nearby the box on every channel of the image. Then I changed all pixel values under the "box edge mask" to those values on every channel. After I repeated that procedure for every box contour, I merged all three channels into one.
Mat mask(src_rgb.size(), CV_8UC1);
for (int i = 0; i < contours.size(); ++i)
{
mask.setTo(0);
drawContours(mask, contours, i, cv::Scalar(200), -1);
Scalar mean, stdev;
meanStdDev(laplace_result, mean, stdev, mask);
if (stdev.val[0] < 10.0) continue;
Mat eroded;
erode(mask, eroded, cv::Mat(), cv::Point(-1, -1), 6);
subtract(mask, eroded, mask);
for (int c = 0; c < src_rgb.channels(); ++c)
{
erode(mask, eroded, cv::Mat());
subtract(mask, eroded, eroded);
Scalar mean, stdev;
meanStdDev(channels[c], mean, stdev, eroded);
channels[c].setTo(mean, mask);
}
}
Mat final_result;
merge(channels, final_result);
imshow("Final Result", final_result);
Click to see red channel of the image, the result of convolution with Laplacian operator, drawn mask of the box edges and the final result.
Please note
This code is far from being optimal, especially the last loop does quite a lot of unnecessary work. But I think that in this case readability is more important (and the author of the question did not request an optimized solution anyway).
Looking towards more general solution
After I posted the initial reply, the author of the question noted that the digits can be of any color and their edges are not necessarily sharp. That means that above procedure can fail because of various reasons. I altered the input image so that it contains different kinds of numbers (click to see the image) and you can run my algorithm on this input and analyze what goes wrong.
The way I see it, one of these approaches is needed (or perhaps a mixture of both) to obtain a more "general" solution:
concentrate only on rectangle shape and color (confirm that the box candidate is really an orange box and remove it regardless of what is inside)
concentrate on numbers only (run a proper number detection algorithm inside the interior of every box candidate; if it contains a single number, remove the box)
I will give a trivial example of the first approach. If you can assume that orange box size will always be the same, just check the box size instead of standard deviation of the signal in the last loop of the algorithm:
Rect rect = boundingRect(contours[i]);
float area = rect.area();
if (area < 1000 || area > 1200) continue;
Warning: actual area of rectangles is around 600Px^2, but I took into account the Gaussian Blurring, which caused the contour to expand. Please also note that if you use this approach you no longer need to perform blurring or laplace operations on blue channel image.
You can also add other simple constraints to that condition; ratio between width and height is the first one that comes to my mind. Geometric properties can also be a good option (right angles, straight edges, convexness ...).

directional edge detection in OpenCV

I would like to detect edge that has certain angle/orientation.
Adapting from a post in SO, I've figured out to use OpenCV magnitude, phase and Sobel functions to filter out unwanted edge points. Then use the magnitude image (with phase image as condition) to output the edge points.
However, the results is not similar to Canny edge function. It's good that the edges with unwanted angles are filtered out but detected edges are blobs of points, not thin line edge
the left edge image is also plotted out after findContours is used, but this barely helps out
1) what else should be added to mimic Canny processing?
2) As for the purpose of directional edge detection, is this approach more robust than using a directional kernel other than typical Sobel ones?
Thank you!
Edit 01:
forgot to put my code link

alternatively, you can try lsd,(http://www.ipol.im/pub/art/2012/gjmr-lsd/). it outputs lines as two point pairs so directional filtering is also possible.
there's also another line segment implementation # http://sourceforge.net/projects/lswms/ though the lsd link above has better results
if you want a single pixel edge, you would need to do skeletonization/thinning
edit
rename the lsd.c into lsd.cpp when you are compiling. i used version 1.6 attached in the url. code and results below. you can tweak the thresholds to suppress the small segments as well.
#include "opencv2/opencv.hpp"
using namespace cv;
#include "lsd.h"
void lsd_call(Mat& im)
{
Mat gray;
cvtColor(im,gray,CV_BGR2GRAY);
Mat imgdouble;
gray.convertTo(imgdouble,CV_64FC1);
double * image;
double * out;
int x,y,i,j,n;
out = lsd(&n,(double*)imgdouble.data,imgdouble.cols,imgdouble.rows);
Mat lines = im.clone();
Mat lines_binary = Mat::zeros(gray.size(),CV_8UC1);
for(i=0;i<n;i++)
{
double x1,y1,x2,y2,w;
x1 = out[7*i+0];
y1 = out[7*i+1];
x2 = out[7*i+2];
y2 = out[7*i+3];
w = out[7*i+4];
double length = sqrt(pow(x1-x2,2)+pow(y1-y2,2));
double angle = atan2(y2 - y1, x2 - x1) * 180 / CV_PI;
if(angle<180 && angle>90)
{
line(lines,Point2d(out[7*i+0],out[7*i+1]),Point2d(out[7*i+2],out[7*i+3]),Scalar (0,0,255));
line(lines_binary,Point2d(out[7*i+0],out[7*i+1]),Point2d(out[7*i+2],out[7*i+3]) ,Scalar(255));
}
if(length>75)
{
//line(todraw,Point2d(out[7*i+0],out[7*i+1]),Point2d(out[7*i+2],out[7*i+3]), Scalar(0,0,255),out[7*i+4]);
}
}
imshow("lines",lines);
imshow("lines_binary",lines_binary);
imwrite("c:/data/lines.jpg",lines);
imwrite("c:/data/linesbinary.jpg",lines_binary);
free( (void *) out );
}
int main(int argc,char** argv )
{
Mat im = imread("c:/data/lines.png");
lsd_call(im);
waitKey(0);
}

1)
Canny edge detector produces thin edges because of non-maxima supression along the neighbours.
In order to mimic that, you need to choose edge pixels with maximimum edge response along that direction. So blobs of points can be prevented this way.
As you can probably guess, the weaker images in the grid can be suppressed with threshold defined by you.
2) I can't give a definite answer to that sadly. For the angel given, the kernels might be limited by
discretization. So for many different angles, this approach 'should' be better.

OpenCV: solvePnP detection problems

I've got problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you see I'm markers that I'm detecting are slightly moved at some camera angles. I've read on the web that this may be camera calibration problems, so I'll tell you guys how I'm calibrating camera, and maybe you'd be able to tell me what am I doing wrong?
At the beginnig I'm collecting data from various images, and storing calibration corners in _imagePoints vector like this
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Than, after collecting enough data I'm calculating camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
std::vector<cv::Point3f> currentObjectPoints;
for (int j = 0; j < currentImagePoints.size(); j++) {
cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
currentObjectPoints.push_back(newPoint);
}
objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? is that an issue in the code? How much images should I use to perform calibration (right now I'm trying to obtain 20-30 images before end of calibration).
Should I use images that containg wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with circles grid instead of of chessboards, but results were much worse that now.
In case of questions how I'm detecting marker: I'm using solvepnp function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are coordinates of marker corners in processing image (I'm using custom algorithm to do that)

In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by #kobejohn in his comment and so that your code follows these steps:
collect various images of chessboard target
find chessboard corners in images of point 1)
calibrate the camera (with cv::calibrateCamera) and so obtain as a result the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
collect an image of your own custom target (the target is seen at 0:57 in your video) and it is shown in the following figure and find some relevant points in it (let's call the point you found in image image_custom_target_vertices and world_custom_target_vertices the corresponding 3D points).
estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target you get in point 4), with a call to cv::solvePnP like this one cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
giving the 8 corners cube in 3D (let's call them world_cube_vertices) you get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv2::projectPoints like this one cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
Finding you own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where that results are "bad"?
Pose of the camera: I think that in 5) you use intrinsic found in 3), are you sure nothing is changed in the camera in the meanwhile? Referring to the Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.

So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of calibration questions I've asked.
First of all - in order to obtain good calibration results you have to use images with properly detected grid elements/circles positions!. Using all captured images in calibration process (even those that aren't properly detected) will result bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID), give much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it haven't helped much - in some cases (different light environment) it got better, but not much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) - better results than asymmetric grid, but still I've got much worse results than standard grid (chessboard). It often produces errors like these:
Chessboard (found using findChessboardCorners function) - this method is producing best possible results - it doesn't produce misaligned corners very often, and almost every calibration is producing similar results to best-possible results from symmetric circles grid
For every calibration I've been using 20-30 images that were coming from different angles. I've tried even with 100+ images but it haven't produced noticeable change in calibration results than smaller amount of images. It's worth noticing that larger number of test images is increasing time needed to compute camera parameters in non-linear way (100 test images in 480x360 resolution are computing 25 minutes in iPad4, compared with 4 minutes with ~50 images)
I've also experimented with solvePNP parameters - but is also haven't gave me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen aby noticeable change.
Also I've tried with useExtrinsicGuess set to true, and I've used rvec and tvec from previous detection, but this one resulted with complete disapperance of detected cube.
I've ran out of ideas - what else could be affecting these shifting problems?

For those still interested:
this is an old question, but I think your problem is not the bad calibration.
I developed an AR app for iOS, using OpenCV and SceneKit, and I have had your same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you wanna render the cube over the marker, at a specific distance along the Z axis of the marker, exactly at one half of the cube side size. So you need to improve the Z coordinate of the marker translation vector of this distance.
In fact, when you see your cube from the top, the cube is render properly.
I have done an image in order to explain the problem, but my reputation prevent to post it.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart