Triangulation of encoded points from Structured Light Scanning produces warped point cloud - opencv

I've been trying to code a 3D scanner that uses structured light (one camera and one projector only). As a start, I'm using Taubin and Moreno's software to obtain the projector intrinsics and extrinsics (their site is down right now, so use the Wayback Machine to check it out), together with the linear least squares triangulation method described here.
However, regardless of the object scanned, the point clouds obtained are warped in a convex manner (see images below). This is most likely not an intrinsics/extrinsics/distortion parameter issue, as the same calibration parameters give a proper point cloud when used with the software linked above. I'm also inclined to say that my decoding process is not faulty, as the row and column correspondences appear to be correct (see below). Using a previously decoded dataset also gives the same issue.
import numpy as np
import cv2

def linearLS_triangulation(u_c, u_p, P_c, P_p, A, B):
    """
    Performs linear least squares triangulation via an overdetermined linear
    system.
    """
    A[0][0] = u_c[0]*P_c[2][0] - P_c[0][0]
    A[0][1] = u_c[0]*P_c[2][1] - P_c[0][1]
    A[0][2] = u_c[0]*P_c[2][2] - P_c[0][2]
    A[1][0] = u_c[1]*P_c[2][0] - P_c[1][0]
    A[1][1] = u_c[1]*P_c[2][1] - P_c[1][1]
    A[1][2] = u_c[1]*P_c[2][2] - P_c[1][2]
    A[2][0] = u_p[0]*P_p[2][0] - P_p[0][0]
    A[2][1] = u_p[0]*P_p[2][1] - P_p[0][1]
    A[2][2] = u_p[0]*P_p[2][2] - P_p[0][2]
    A[3][0] = u_p[1]*P_p[2][0] - P_p[1][0]
    A[3][1] = u_p[1]*P_p[2][1] - P_p[1][1]
    A[3][2] = u_p[1]*P_p[2][2] - P_p[1][2]
    B[0][0] = -(u_c[0] * P_c[2][3] - P_c[0][3])
    B[1][0] = -(u_c[1] * P_c[2][3] - P_c[1][3])
    B[2][0] = -(u_p[0] * P_p[2][3] - P_p[0][3])
    B[3][0] = -(u_p[1] * P_p[2][3] - P_p[1][3])
    # Use of the normal equation, np.linalg.lstsq also works!
    w = np.linalg.inv(A.T @ A) @ A.T @ B
    return w[:, 0]
def get_cam_points(decoded, K_c):
    """
    Get list of camera pixels that have a correspondence to projector pixels.
    Returned in global coordinates, where the world centre is the centre of
    projection of the camera.
    """
    [height, width] = np.nonzero(decoded[0])
    points_cam = np.zeros([3, height.shape[0]], dtype=np.float64)
    K_c_inv = np.linalg.inv(K_c)
    for i in range(height.shape[0]):
        points_cam[:, i] = [width[i], height[i], 1]
    # Back-project the pixel coordinates through the inverse intrinsics
    points_cam = K_c_inv @ points_cam
    return points_cam
def get_proj_pixels(width_p, height_p, K_p, dist_p, R_p, T_p):
    """
    Passes the resolution of the projector along with the intrinsics and
    extrinsics, computing the mapping from projector pixels to the optical
    rays returned in [x, y, z] for each pixel in the 'image'.
    This assumes that the camera is the origin, with the rotation and
    translation matrices of the projector given with respect to it.
    """
    column_p = np.arange(width_p, dtype=np.float64)
    row_p = np.arange(height_p, dtype=np.float64)
    C, R = np.meshgrid(column_p, row_p)
    uv_p = np.zeros([np.ravel(C).shape[0], 1, 2], dtype=np.float64)
    uv_p[:, 0, :] = np.c_[np.ravel(C), np.ravel(R)]
    # Undistort and normalize the projector pixel coordinates
    uv_p = cv2.undistortPoints(uv_p, K_p, dist_p)
    uv_p = uv_p[:, 0, :]
    uv_p = np.c_[uv_p, np.ones([np.ravel(C).shape[0]])]
    uv_p = uv_p.transpose()
    uv_grid = np.zeros([3, height_p, width_p], dtype=np.float64)
    uv_grid[0] = np.reshape(uv_p[0, :], [height_p, width_p])
    uv_grid[1] = np.reshape(uv_p[1, :], [height_p, width_p])
    uv_grid[2] = np.reshape(uv_p[2, :], [height_p, width_p])
    return uv_grid
def triangulate_all(decoded, P_c, P_p, dist_p, K_c, K_p, width_p, height_p):
    [height, width] = np.nonzero(decoded[0])
    points = np.zeros([3, height.shape[0]], dtype=np.float64)
    points_cam = get_cam_points(decoded, K_c)
    points_proj = np.zeros([3, height.shape[0]], dtype=np.float64)
    uv_grid = get_proj_pixels(width_p, height_p, K_p, dist_p,
                              P_p[:, :3], P_p[:, 3].reshape(-1, 1))
    # Get list of projector pixels corresponding to non-zero camera pixels
    for i in range(height.shape[0]):
        inter = decoded[:, height[i], width[i]]
        points_proj[:, i] = uv_grid[:, inter[1], inter[0]]
    A = np.zeros((4, 3), dtype=np.float64)
    B = np.zeros((4, 1), dtype=np.float64)
    for i in range(height.shape[0]):
        points[:, i] = linearLS_triangulation(points_cam[:, i],
                                              points_proj[:, i],
                                              P_c, P_p, A, B)
    return points
print('Loading calibration parameters...')
calib_params = cv2.FileStorage('calibration.yml', cv2.FILE_STORAGE_READ)
dist_c = calib_params.getNode('cam_kc').mat()
dist_p = calib_params.getNode('proj_kc').mat()
K_c = calib_params.getNode('cam_K').mat()
K_p = calib_params.getNode('proj_K').mat()
R_p = calib_params.getNode('R').mat()
R_p = R_p.transpose() # Rotation matrix of projector with respect to camera origin
R_c = np.array([[1,0,0],[0,1,0],[0,0,1]])
T_p = calib_params.getNode('T').mat()
T_c = np.array([0,0,0])
width_p = 1920
height_p = 1080
P_c = np.c_[R_c, T_c]
P_p = np.c_[R_p, T_p]
print('Loading color image...')
color = cv2.imread(scandir + 'Image01.jpg')
color = color/255
print('Loading decoded matrix...')
# A 2 x imgheight x imgwidth (in pixels) matrix, with the first channel being the column (x-direction)
# estimates and the second channel being the row (y-direction) pixel estimates of the projector.
# E.g. a pixel at point [300, 400] (Origin at top left of image!) would correspond to the projector
# pixels of [16, 4] (Origin at top left). A zero would indicate the lack of correspondence for that
# specific pixel
decoded = np.load('Decoded Matrix.npy')
points = triangulate_all(decoded, P_c, P_p, dist_p, K_c, K_p,
width_p, height_p)
Point cloud and original image
Another point cloud, and the decoded row and column estimates
Help would be greatly appreciated! At a loss of what to do.
Got rid of the line normalizing the ray vectors
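One possible reading of the note above, offered as a guess rather than a confirmed diagnosis: the triangulation equations assume every image point is homogeneous with a third component of exactly 1. Scaling a ray to unit length changes its first two components by a factor that depends on how far the point is from the principal point, which would bend the reconstruction. The sketch below checks this on one synthetic point. The projector pose, the test point, and the helper names `triangulate_lls` and `project` are all made up for illustration.

```python
import numpy as np

def triangulate_lls(u_c, u_p, P_c, P_p):
    """Linear least squares triangulation from two 3x4 projection matrices.
    u_c and u_p are expected to be homogeneous points with third component 1."""
    rows, rhs = [], []
    for u, P in ((u_c, P_c), (u_p, P_p)):
        for k in (0, 1):
            rows.append(u[k] * P[2, :3] - P[k, :3])
            rhs.append(-(u[k] * P[2, 3] - P[k, 3]))
    X, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return X

def project(P, X):
    """Project a 3D point and scale so that the third component is 1."""
    x = P @ np.append(X, 1.0)
    return x / x[2]

# Hypothetical setup: camera at the origin, projector rotated and shifted.
P_c = np.hstack([np.eye(3), np.zeros((3, 1))])
a = np.deg2rad(10.0)
R_p = np.array([[np.cos(a), 0.0, np.sin(a)],
                [0.0, 1.0, 0.0],
                [-np.sin(a), 0.0, np.cos(a)]])
P_p = np.hstack([R_p, np.array([[-200.0], [0.0], [30.0]])])

X_true = np.array([50.0, -20.0, 800.0])
u_c, u_p = project(P_c, X_true), project(P_p, X_true)

# Points kept in (x, y, 1) form: the true point is recovered.
X_ok = triangulate_lls(u_c, u_p, P_c, P_p)
# Same rays scaled to unit length: the equations no longer hold.
X_bad = triangulate_lls(u_c / np.linalg.norm(u_c),
                        u_p / np.linalg.norm(u_p), P_c, P_p)

print("homogeneous:", X_ok)
print("unit length:", X_bad)
```

Because the scale factor grows with the distance from the image centre, the error would be smallest in the middle of the scan and largest at the edges, which is at least consistent with a bowed point cloud.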


Open CV snap points to a rectangle of a specific size

I am attempting to detect an image of a certain type on a page of degraded quality, where the image may be rotated and translated. I need to crop the detected image out of the page, so I need the rotation and coordinates of the detected image. An example would be an image that has been photocopied onto an A4 page.
I am using SIFT to detect objects on the scanned page. These images can be rotated and translated, but they are not sheared and have no perspective distortion. I am using the classic feature-matching approach (SIFT, SURF, ORB, etc.); however, it assumes a perspective transform in order to create the 4 points of the bounding polygon. The issue is that since the key points don't line up perfectly (due to varying image quality), the projection assumes spatial distortion and the polygon is rightfully distorted.
The approach I want to try is to "snap" the detected polygon points to the dimensions/area of the input image. This should allow me to determine the angle of rotation and translation of the image on the page.
Things I have tried (and failed):
Filtering key points to remove outliers and minimise distortion.
Affine/rotation/etc. matrices; however, they assume the points from the samples are equidistant and don't do approximations.
ICP: would probably work, but there are not enough samples and it seems to be more of an approach than a method. I am certain there is a better way.
def detect(img, frame, detector):
    frame = frame.copy()
    kp1, desc1 = detector.detectAndCompute(img, None)
    kp2, desc2 = detector.detectAndCompute(frame, None)
    index_params = dict(algorithm=0, trees=5)
    search_params = dict()
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1, desc2, k=2)
    good_points = []
    for m, n in matches:
        # Ratio test: keep a match only if it is clearly better than the runner-up
        if m.distance < 0.5 * n.distance:
            good_points.append(m)
        if len(good_points) == 20:
            break
    # out_img=cv2.drawMatches(img, kp1, frame, kp2, good_points, flags=2, outImg=None)
    # plt.figure(figsize = (6*4, 8*4))
    # plt.imshow(out_img)
    if len(good_points) > 10:  # require more than 10 good matches
        # Get the matching points
        query_pts = np.float32([kp1[m.queryIdx].pt for m in good_points]).reshape(-1, 1, 2)
        train_pts = np.float32([kp2[m.trainIdx].pt for m in good_points]).reshape(-1, 1, 2)
        matrix, mask = cv2.findHomography(query_pts, train_pts, cv2.RANSAC, 5.0)
        matches_mask = mask.ravel().tolist()
        h, w = img.shape
        pts = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, matrix)
        overlayImage = cv2.polylines(frame, [np.int32(dst)], True, (0, 0, 0), 3)
        plt.figure(figsize = (6*2, 8*2))
        plt.imshow(overlayImage)

sift = cv2.SIFT_create()
for frame in frames:
    detect(img, frame, sift)
This is an example of a page with the image we are trying to detect on it.
Blue line: rectangle with correct size
Red Line: determines polygon using perspective transform
I stumbled on a post that shows how to extract the minimum bounding box from a set of points. This works really well, as it also gives the rotation.
def detect_ICP(img, frame, detector):
    frame = frame.copy()
    kp1, desc1 = detector.detectAndCompute(img, None)
    kp2, desc2 = detector.detectAndCompute(frame, None)
    index_params = dict(algorithm=0, trees=5)
    search_params = dict()
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1, desc2, k=2)
    matches = sorted(matches, key = lambda x:x[0].distance + 0.5 * x[1].distance)
    good_points = []
    for m, n in matches:
        if m.distance < 0.5 * n.distance:
            good_points.append(m)
    out_img=cv2.drawMatches(img, kp1, frame, kp2, good_points, flags=2, outImg=None)
    plt.figure(figsize = (6*4, 8*4))
    plt.imshow(out_img)
    if len(good_points) > 10:  # require more than 10 good matches
        # Get the matching points
        query_pts = np.float32([kp1[m.queryIdx].pt for m in good_points]).reshape(-1, 1, 2)
        train_pts = np.float32([kp2[m.trainIdx].pt for m in good_points]).reshape(-1, 1, 2)
        matrix, mask = cv2.findHomography(query_pts, train_pts, cv2.RANSAC, 5.0)
        # matches_mask = mask.ravel().tolist()
        h, w = img.shape
        pts = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, matrix)
        # determine the minimum bounding box
        minAreaRect = cv2.minAreaRect(dst) # This will have size and rotation information
        rotatedBox = cv2.boxPoints(minAreaRect)
        rotatedBox = np.float32(rotatedBox).reshape(-1, 1, 2)
        overlayImage = cv2.polylines(frame, [np.int32(rotatedBox)], True, (0, 0, 0), 3)
        plt.figure(figsize = (6*2, 8*2))
        plt.imshow(overlayImage)
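
Since the page is described as only rotated and translated, another way to get the "snap" behaviour might be to fit a rigid transform (rotation plus translation, no scale or perspective) to the matched key points instead of a homography. This is a minimal NumPy sketch of a Kabsch/Procrustes style fit, not something from the post; `fit_rigid_2d` is a hypothetical helper, and the synthetic `query_pts`/`train_pts` stand in for the matched SIFT coordinates. It has no outlier rejection, so in practice it would need to run on inliers only.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least squares rotation R (2x2) and translation t (2,) so that
    dst is approximately src @ R.T + t."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)   # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), never a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Synthetic stand-in for matched key points: 30 degree rotation plus noise.
rng = np.random.default_rng(0)
w, h = 200.0, 300.0
angle_true = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle_true), -np.sin(angle_true)],
                   [np.sin(angle_true),  np.cos(angle_true)]])
query_pts = rng.uniform([0.0, 0.0], [w, h], size=(40, 2))
train_pts = query_pts @ R_true.T + np.array([120.0, 80.0])
train_pts += rng.normal(0.0, 0.5, size=train_pts.shape)

R, t = fit_rigid_2d(query_pts, train_pts)
angle_deg = np.degrees(np.arctan2(R[1, 0], R[0, 0]))

# The template corners keep their exact width and height after mapping.
corners = np.array([[0.0, 0.0], [0.0, h], [w, h], [w, 0.0]])
snapped = corners @ R.T + t
print("estimated rotation (deg):", angle_deg)
```

The mapped corners form a rectangle of exactly the template size by construction, so noisy key points can only shift or rotate it, not distort it.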

opencv stereo camera calibration

I am working on stereo camera calibration with OpenCV, following the standard tutorial. However, the calibrated output is not good and the RMS value is 78.26. I have already tried every solution I could find via Google, but none of them worked.
Detailed implementation:
I use 13 image pairs to find the object points and image points with the code below.
import glob
import cv2
import numpy as np

def getCalibrateParams(leftImgPath, rightImgPath):
    w = 9
    h = 7
    chess_size = (9, 7)
    chess_size_r = (7,9)
    # termination criteria
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(8,6,0)
    #objp = np.zeros((,3), np.float32)
    #objp[:,:2] = np.indices(chess_size).T.reshape(-1,2)
    objp = np.zeros((w*h, 3), np.float32)
    objp[:,:2] = np.mgrid[0:w, 0:h].T.reshape(-1,2)
    # Arrays to store object points and image points from all the images.
    objpoints = [] # 3d point in real world space
    leftImgpoints = [] # 2d points in image plane.
    rightImgPoints = []
    leftImg = glob.glob(leftImgPath)
    rightImg = glob.glob(rightImgPath)
    for fname in leftImg:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
        # Find the chess board corners
        ret, corners = cv2.findChessboardCorners(gray, (w,h), None)
        if not ret:
            raise ChessboardNotFoundError('No chessboard could be found!')
        # One set of object points per image pair
        objpoints.append(objp)
        #increase the accuracy of seeking for corners
        leftImgpoints.append(corners)
        # Draw and display the corners
        #cv2.drawChessboardCorners(img, chess_size, corners,ret)
    for fname in rightImg:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
        ret, corners = cv2.findChessboardCorners(gray, chess_size_r)
        if not ret:
            raise ChessboardNotFoundError('No chessboard could be found!')
        #increase the accuracy of seeking for corners
        rightImgPoints.append(corners)
    return objpoints,leftImgpoints,rightImgPoints
After that, I try to calibrate an image pair with the code below:
objectPoints, imagePoints1, imagePoints2 = getCalibrateParams(leftImgPath, rightImgPath)
#use any image to find the size
img = cv2.imread('/home/wuyang/vr/img/test/test_1_01_02.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = img.shape[:2]
#single camera calibration to fetch a more accurate camera matrix
ret1, cameraMatrix1, distCoeffs1, rvecs1, tvecs1 = cv2.calibrateCamera(objectPoints, imagePoints1, gray.shape[::-1],None, None)
ret2, cameraMatrix2, distCoeffs2, rvecs2, tvecs2 = cv2.calibrateCamera(objectPoints, imagePoints2, gray.shape[::-1],None, None)
print ret1, ret2
stereo_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
stereo_flags = cv2.CALIB_FIX_INTRINSIC
rms, cameraMatrix1,distCoeffs1, cameraMatrix2, distCoeffs2, R, T = cv2.stereoCalibrate(objectPoints, imagePoints1,
imagePoints2, imageSize = (w,h),
cameraMatrix1 = cameraMatrix1, distCoeffs1 = distCoeffs1,
cameraMatrix2 = cameraMatrix2, distCoeffs2 = distCoeffs2,
criteria = stereo_criteria, flags = stereo_flags)[:-2]
print 'stereo calibration result: ',rms
#print cv2.CALIB_FIX_INTRINSIC 256
#print cv2.CALIB_RATIONAL_MODEL 16384
#print cv2.CALIB_FIX_K1 32
#print cv2.CALIB_FIX_K2 64
#print cv2.CALIB_FIX_K3 128
#print cv2.CALIB_FIX_K4 2048
#print cv2.CALIB_FIX_K5 4096
#print cv2.CALIB_FIX_K6 8192
print 'rms value:', rms
print 'cameraMatrix1:\n', cameraMatrix1
print 'cameraMatrix2:\n', cameraMatrix2
print 'disCoeffs1:\n', distCoeffs1
print 'disCoeffs2:\n', distCoeffs2
print 'rotation vector:\n', R
print 'translation vector:\n', T
#left camera calibration test
computeReprojectionError(objectPoints, imagePoints1, rvecs1, tvecs1, cameraMatrix1, distCoeffs1)
newcameramtx1, roi1 = getCameraMatrix(img, cameraMatrix1, distCoeffs1)
undistort(img, cameraMatrix1, distCoeffs1, newcameramtx1, roi1)
R1, R2, P1, P2, Q = cv2.stereoRectify(cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
(w,h), R, T, flags = 0, alpha = -1)[:-2]
# distort images
undistort_map1, rectify_map1 = cv2.initUndistortRectifyMap(cameraMatrix1, distCoeffs1, R1, P1, (w,h), cv2.CV_32FC1)
undistort_map2, rectify_map2 = cv2.initUndistortRectifyMap(cameraMatrix2, distCoeffs2, R2, P2, (w,h), cv2.CV_32FC1)
lpath = '/home/wuyang/vr/img/test/test_2_01_01.jpg'
rpath = '/home/wuyang/vr/img/test/test_2_01_02.jpg'
lImg = cv2.imread(lpath)
rImg = cv2.imread(rpath)
#undistor_output1 = cv2.undistort(test,undistort_map1, rectify_map1, None, newcameramtx)
undistor_output1 = cv2.remap(lImg, undistort_map1, rectify_map1, cv2.INTER_LINEAR)
undistor_output2 = cv2.remap(rImg, undistort_map2, rectify_map2, cv2.INTER_LINEAR)
cv2.imwrite('ss.jpg', undistor_output1)
The flow is quite standard while the output is not good.
The left image to be calibrated:
The calibrated result:
Please help to see how to get a reasonable good calibrated result. Thanks a lot!
I would say your captured photos are just not good enough. That RMS error is far too high. Check your image pairs carefully and make sure none of them are blurred. Additionally, capture a few more pairs from different points of view and at different distances from the camera, and make sure some of them show the chessboard near the borders of the image. A good calibration should have an error under 0.5. Note that a single bad pair of images can increase your error considerably.
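
To find which pair is dragging the error up, it may help to look at the reprojection error per view instead of only the overall RMS. The following is a rough NumPy sketch with hypothetical helper names (`per_view_rms`, `flag_bad_views`) and synthetic data. With real data, the reprojected points would presumably come from `cv2.projectPoints` using each view's `rvec`/`tvec`, and the 0.5 threshold is simply the rule of thumb quoted above.

```python
import numpy as np

def per_view_rms(detected, reprojected):
    """RMS reprojection error in pixels, one value per view.
    Both arguments are lists of (N, 2) or (N, 1, 2) point arrays."""
    errors = []
    for det, rep in zip(detected, reprojected):
        det = np.asarray(det, dtype=np.float64).reshape(-1, 2)
        rep = np.asarray(rep, dtype=np.float64).reshape(-1, 2)
        sq_dist = np.sum((det - rep) ** 2, axis=1)
        errors.append(float(np.sqrt(np.mean(sq_dist))))
    return np.array(errors)

def flag_bad_views(rms_per_view, threshold=0.5):
    """Indices of views whose RMS error exceeds the threshold."""
    return [i for i, e in enumerate(rms_per_view) if e > threshold]

# Synthetic example: three views of a 9x7 board, the second one is way off.
rng = np.random.default_rng(1)
detected = [rng.uniform(0.0, 640.0, size=(63, 2)) for _ in range(3)]
reprojected = [d + rng.normal(0.0, 0.2, size=d.shape) for d in detected]
reprojected[1] = detected[1] + np.array([60.0, 50.0])

rms = per_view_rms(detected, reprojected)
bad = flag_bad_views(rms)
print("per-view RMS:", rms)
print("views to re-check or drop:", bad)
```

Views flagged this way are candidates for re-shooting or removal before running the stereo calibration again.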

Difficult time trying to do shape recognition for 3D objects

I am trying to make a shape recognition classifier in which if you give an individual picture of an object (from a scene), it would be able to classify (after machine learning) the shape of an object (cylinder, cube, sphere, etc).
Original scene:
Individual objects it will classify:
I attempted to do this using cv2.approxPolyDP, starting with an attempt to classify a cylinder. However, either my implementation isn't good or this wasn't a good choice of algorithm in the first place: the cylinder-shaped objects were approximated by cv2.approxPolyDP with only 3 or 4 vertices.
Perhaps I could threshold and simply assume that any object with 3 or 4 vertices is a cylinder, but that doesn't seem like a reliable method for 3D shape classification, since it could easily confuse a cylinder with a cube. I feel there must be a better method than hardcoding values.
Is there any way I can improve my 3D shape recognition program?
import cv2
import numpy as np
from pyimagesearch import imutils
from PIL import Image
from time import time
def invert_img(img):
    img = (255-img)
    return img

def threshold(im):
    imgray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
    imgray = cv2.medianBlur(imgray,9)
    imgray = cv2.Canny(imgray,75,200)
    return imgray

def view_all_contours(im, size_min, size_max):
    main = np.array([[]])
    cnt_target = im.copy()
    for c in cnts:
        epsilon = 0.1*cv2.arcLength(c,True)
        approx = cv2.approxPolyDP(c,epsilon,True)
        area = cv2.contourArea(c)
        print 'area: ', area
        test = im.copy()
        # To weed out contours that are too small or big
        if area > size_min and area < size_max:
            print c[0,0]
            print 'approx: ', len(approx)
            max_pos = c.max(axis=0)
            max_x = max_pos[0,0]
            max_y = max_pos[0,1]
            min_pos = c.min(axis=0)
            min_x = min_pos[0,0]
            min_y = min_pos[0,1]
            # Load each contour onto image
            cv2.drawContours(cnt_target, c, -1,(0,0,255),2)
            print 'Found object'
            frame_f = test[min_y:max_y , min_x:max_x]
            main = np.append(main, approx[None,:][None,:])
            thresh = frame_f.copy()
            thresh = threshold(thresh)
            contours_small, hierarchy = cv2.findContours(thresh.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
            cnts_small = sorted(contours_small, key = cv2.contourArea, reverse = True)
            cv2.drawContours(frame_f, cnts_small, -1,(0,0,255),2)
            cv2.imshow('Thresh', thresh)
            cv2.imshow('Show Ya', frame_f)
            # Uncomment in order to show all rectangles in image
            print '---------------------------------------------'
            #cv2.drawContours(cnt_target, cnts, -1,(0,255,0),2)
    print main.shape
    print main
    return cnt_target
time_1 = time()
roi = cv2.imread('images/beach_trash_3.jpg')
hsv = cv2.cvtColor(roi,cv2.COLOR_BGR2HSV)
target = cv2.imread('images/beach_trash_3.jpg')
target = imutils.resize(target, height = 400)
hsvt = cv2.cvtColor(target,cv2.COLOR_BGR2HSV)
img_height = target.shape[0]
img_width = target.shape[1]
# calculating object histogram
roihist = cv2.calcHist([hsv],[0, 1], None, [180, 256], [0, 180, 0, 256] )
# normalize histogram and apply backprojection
dst = cv2.calcBackProject([hsvt],[0,1],roihist,[0,180,0,256],1)
# Now convolute with circular disc
disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
# threshold and binary AND
ret,thresh = cv2.threshold(dst,50,255,0)
thresh_one = thresh.copy()
thresh = cv2.merge((thresh,thresh,thresh))
res = cv2.bitwise_and(target,thresh)
# Implementing morphological erosion & dilation
kernel = np.ones((9,9),np.uint8) # (6,6) to get more contours (9,9) to reduce noise
thresh_one = cv2.erode(thresh_one, kernel, iterations = 3)
thresh_one = cv2.dilate(thresh_one, kernel, iterations=2)
# Invert the image
thresh_one = invert_img(thresh_one)
# To show prev img
#res = np.vstack((target,thresh,res))
#cv2.imshow('Before contours', thresh_one)
cnt_target = target.copy()
cnt_full = target.copy()
# Code to draw the contours
contours, hierarchy = cv2.findContours(thresh_one.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(contours, key = cv2.contourArea, reverse = True)
print time() - time_1
size_min = 200
size_max = 5000
cnt_target = view_all_contours(target, size_min, size_max)
cv2.drawContours(cnt_full, cnts, -1,(0,0,255),2)
res = imutils.resize(thresh_one, height = 700)
cv2.imshow('Original image', target)
cv2.imshow('Preprocessed', thresh_one)
cv2.imshow('All contours', cnt_full)
cv2.imshow('Filtered contours', cnt_target)

Detect if UIImage is blurry [duplicate]

I was wondering if there is a way to determine if an image is blurry or not by analyzing the image data.
Another very simple way to estimate the sharpness of an image is to use a Laplace (or LoG) filter and simply pick the maximum value. Using a robust measure like a 99.9% quantile is probably better if you expect noise (i.e. picking the Nth-highest contrast instead of the highest contrast.) If you expect varying image brightness, you should also include a preprocessing step to normalize image brightness/contrast (e.g. histogram equalization).
I've implemented Simon's suggestion and this one in Mathematica, and tried it on a few test images:
The first test blurs the test images using a Gaussian filter with a varying kernel size, then calculates the FFT of the blurred image and takes the average of the 90% highest frequencies:
testFft[img_] := Table[
  (
   blurred = GaussianFilter[img, r];
   fft = Fourier[ImageData[blurred]];
   {w, h} = Dimensions[fft];
   windowSize = Round[w/2.1];
   Mean[Flatten[(Abs[
       fft[[w/2 - windowSize ;; w/2 + windowSize,
         h/2 - windowSize ;; h/2 + windowSize]]])]]
   ), {r, 0, 10, 0.5}]
Result in a logarithmic plot:
The 5 lines represent the 5 test images, the X axis represents the Gaussian filter radius. The graphs are decreasing, so the FFT is a good measure for sharpness.
This is the code for the "highest LoG" blurriness estimator: It simply applies an LoG filter and returns the brightest pixel in the filter result:
testLaplacian[img_] := Table[
  (
   blurred = GaussianFilter[img, r];
   Max[Flatten[ImageData[LaplacianGaussianFilter[blurred, 1]]]]
   ), {r, 0, 10, 0.5}]
Result in a logarithmic plot:
The spread for the un-blurred images is a little better here (2.5 vs 3.3), mainly because this method only uses the strongest contrast in the image, while the FFT is essentially a mean over the whole image. The functions are also decreasing faster, so it might be easier to set a "blurry" threshold.
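
For readers without Mathematica, the "Laplacian plus robust maximum" idea from this answer can be sketched in plain NumPy. This is only an approximation of the approach: it uses the simple 4-neighbour Laplacian rather than an LoG filter, `laplacian_sharpness` is a hypothetical name, and the test image is a synthetic step edge.

```python
import numpy as np

def laplacian_sharpness(gray, q=0.999):
    """Robust maximum of the absolute 4-neighbour Laplacian:
    the q-quantile is used instead of the single largest value."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(np.quantile(np.abs(lap), q))

# Synthetic test image: a vertical step edge, then a slightly blurred copy.
sharp = np.zeros((64, 64))
sharp[:, 32:] = 255.0
padded = np.pad(sharp, ((0, 0), (1, 1)), mode="edge")
blurred = (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0

sharp_score = laplacian_sharpness(sharp)
blur_score = laplacian_sharpness(blurred)
print(sharp_score, blur_score)
```

The sharp edge should score higher than the smoothed one. Where to put the "blurry" threshold still depends on the scene and has to be tuned on real images.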
Yes, it is. Compute the Fast Fourier Transform and analyse the result. The Fourier transform tells you which frequencies are present in the image. If there is a low amount of high frequencies, then the image is blurry.
Defining the terms 'low' and 'high' is up to you.
As stated in the comments, if you want a single float representing the blurryness of a given image, you have to work out a suitable metric.
nikie's answer provides such a metric. Convolve the image with a Laplacian kernel:
 0  1  0
 1 -4  1
 0  1  0
Then use a robust maximum metric on the output to get a number that you can use for thresholding. Try to avoid smoothing the images too much before computing the Laplacian, because you will only find out that a smoothed image is indeed blurry :-).
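
The Fourier-based check suggested at the start of this answer could look roughly like the sketch below: measure how much of the spectral energy lies above some cutoff frequency. The function name `high_freq_ratio`, the cutoff of 0.25 cycles per pixel, and the box blur used for the comparison are all assumptions made for the example, since "low" and "high" are left to the reader.

```python
import numpy as np

def high_freq_ratio(gray, cutoff=0.25):
    """Fraction of spectral energy at radial frequencies above `cutoff`
    (in cycles per pixel, where 0.5 is the Nyquist limit)."""
    g = np.asarray(gray, dtype=np.float64)
    power = np.abs(np.fft.fft2(g - g.mean())) ** 2
    fy = np.fft.fftfreq(g.shape[0])[:, None]
    fx = np.fft.fftfreq(g.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    return float(power[radius > cutoff].sum() / total) if total > 0 else 0.0

def box_blur(gray, k=5):
    """Simple k x k mean filter with edge padding, used only for the demo."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
sharp = rng.uniform(0.0, 255.0, size=(64, 64))   # detail at all frequencies
ratio_sharp = high_freq_ratio(sharp)
ratio_blur = high_freq_ratio(box_blur(sharp))
print(ratio_sharp, ratio_blur)
```

A lower ratio means fewer high frequencies and therefore a blurrier image, but the absolute numbers depend heavily on image content, so a threshold would need calibrating per application.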
During some work with an auto-focus lens, I came across this very useful set of algorithms for detecting image focus. It's implemented in MATLAB, but most of the functions are quite easy to port to OpenCV with filter2D.
It's basically a survey implementation of many focus measurement algorithms. If you want to read the original papers, references to the authors of the algorithms are provided in the code. The 2012 paper by Pertuz et al., Analysis of focus measure operators for shape from focus (SFF), gives a great review of all of these measures as well as their performance (both in terms of speed and accuracy as applied to SFF).
EDIT: Added MATLAB code just in case the link dies.
function FM = fmeasure(Image, Measure, ROI)
%This function measures the relative degree of focus of
%an image. It may be invoked as:
% FM = fmeasure(Image, Method, ROI)
% Image, is a grayscale image and FM is the computed
% focus value.
% Method, is the focus measure algorithm as a string.
% see 'operators.txt' for a list of focus
% measure methods.
% ROI, Image ROI as a rectangle [xo yo width height].
% if an empty argument is passed, the whole
% image is processed.
% Said Pertuz
% Abr/2010
if ~isempty(ROI)
Image = imcrop(Image, ROI);
end
WSize = 15; % Size of local window (only some operators)
switch upper(Measure)
case 'ACMO' % Absolute Central Moment (Shirvaikar2004)
if ~isinteger(Image), Image = im2uint8(Image); end
FM = AcMomentum(Image);
case 'BREN' % Brenner's (Santos97)
[M N] = size(Image);
DH = Image;
DV = Image;
DH(1:M-2,:) = diff(Image,2,1);
DV(:,1:N-2) = diff(Image,2,2);
FM = max(DH, DV);
FM = FM.^2;
FM = mean2(FM);
case 'CONT' % Image contrast (Nanda2001)
ImContrast = inline('sum(abs(x(:)-x(5)))');
FM = nlfilter(Image, [3 3], ImContrast);
FM = mean2(FM);
case 'CURV' % Image Curvature (Helmli2001)
if ~isinteger(Image), Image = im2uint8(Image); end
M1 = [-1 0 1;-1 0 1;-1 0 1];
M2 = [1 0 1;1 0 1;1 0 1];
P0 = imfilter(Image, M1, 'replicate', 'conv')/6;
P1 = imfilter(Image, M1', 'replicate', 'conv')/6;
P2 = 3*imfilter(Image, M2, 'replicate', 'conv')/10 ...
-imfilter(Image, M2', 'replicate', 'conv')/5;
P3 = -imfilter(Image, M2, 'replicate', 'conv')/5 ...
+3*imfilter(Image, M2, 'replicate', 'conv')/10;
FM = abs(P0) + abs(P1) + abs(P2) + abs(P3);
FM = mean2(FM);
case 'DCTE' % DCT energy ratio (Shen2006)
FM = nlfilter(Image, [8 8], @DctRatio);
FM = mean2(FM);
case 'DCTR' % DCT reduced energy ratio (Lee2009)
FM = nlfilter(Image, [8 8], @ReRatio);
FM = mean2(FM);
case 'GDER' % Gaussian derivative (Geusebroek2000)
N = floor(WSize/2);
sig = N/2.5;
[x,y] = meshgrid(-N:N, -N:N);
G = exp(-(x.^2+y.^2)/(2*sig^2))/(2*pi*sig);
Gx = -x.*G/(sig^2);Gx = Gx/sum(Gx(:));
Gy = -y.*G/(sig^2);Gy = Gy/sum(Gy(:));
Rx = imfilter(double(Image), Gx, 'conv', 'replicate');
Ry = imfilter(double(Image), Gy, 'conv', 'replicate');
FM = Rx.^2+Ry.^2;
FM = mean2(FM);
case 'GLVA' % Graylevel variance (Krotkov86)
FM = std2(Image);
case 'GLLV' %Graylevel local variance (Pech2000)
LVar = stdfilt(Image, ones(WSize,WSize)).^2;
FM = std2(LVar)^2;
case 'GLVN' % Normalized GLV (Santos97)
FM = std2(Image)^2/mean2(Image);
case 'GRAE' % Energy of gradient (Subbarao92a)
Ix = Image;
Iy = Image;
Iy(1:end-1,:) = diff(Image, 1, 1);
Ix(:,1:end-1) = diff(Image, 1, 2);
FM = Ix.^2 + Iy.^2;
FM = mean2(FM);
case 'GRAT' % Thresholded gradient (Santos97)
Th = 0; %Threshold
Ix = Image;
Iy = Image;
Iy(1:end-1,:) = diff(Image, 1, 1);
Ix(:,1:end-1) = diff(Image, 1, 2);
FM = max(abs(Ix), abs(Iy));
FM = sum(FM(:))/sum(sum(FM~=0));
case 'GRAS' % Squared gradient (Eskicioglu95)
Ix = diff(Image, 1, 2);
FM = Ix.^2;
FM = mean2(FM);
case 'HELM' %Helmli's mean method (Helmli2001)
MEANF = fspecial('average',[WSize WSize]);
U = imfilter(Image, MEANF, 'replicate');
R1 = U./Image;
index = (U>Image);
FM = 1./R1;
FM(index) = R1(index);
FM = mean2(FM);
case 'HISE' % Histogram entropy (Krotkov86)
FM = entropy(Image);
case 'HISR' % Histogram range (Firestone91)
FM = max(Image(:))-min(Image(:));
case 'LAPE' % Energy of laplacian (Subbarao92a)
LAP = fspecial('laplacian');
FM = imfilter(Image, LAP, 'replicate', 'conv');
FM = mean2(FM.^2);
case 'LAPM' % Modified Laplacian (Nayar89)
M = [-1 2 -1];
Lx = imfilter(Image, M, 'replicate', 'conv');
Ly = imfilter(Image, M', 'replicate', 'conv');
FM = abs(Lx) + abs(Ly);
FM = mean2(FM);
case 'LAPV' % Variance of laplacian (Pech2000)
LAP = fspecial('laplacian');
ILAP = imfilter(Image, LAP, 'replicate', 'conv');
FM = std2(ILAP)^2;
case 'LAPD' % Diagonal laplacian (Thelen2009)
M1 = [-1 2 -1];
M2 = [0 0 -1;0 2 0;-1 0 0]/sqrt(2);
M3 = [-1 0 0;0 2 0;0 0 -1]/sqrt(2);
F1 = imfilter(Image, M1, 'replicate', 'conv');
F2 = imfilter(Image, M2, 'replicate', 'conv');
F3 = imfilter(Image, M3, 'replicate', 'conv');
F4 = imfilter(Image, M1', 'replicate', 'conv');
FM = abs(F1) + abs(F2) + abs(F3) + abs(F4);
FM = mean2(FM);
case 'SFIL' %Steerable filters (Minhas2009)
% Angles = [0 45 90 135 180 225 270 315];
N = floor(WSize/2);
sig = N/2.5;
[x,y] = meshgrid(-N:N, -N:N);
G = exp(-(x.^2+y.^2)/(2*sig^2))/(2*pi*sig);
Gx = -x.*G/(sig^2);Gx = Gx/sum(Gx(:));
Gy = -y.*G/(sig^2);Gy = Gy/sum(Gy(:));
R(:,:,1) = imfilter(double(Image), Gx, 'conv', 'replicate');
R(:,:,2) = imfilter(double(Image), Gy, 'conv', 'replicate');
R(:,:,3) = cosd(45)*R(:,:,1)+sind(45)*R(:,:,2);
R(:,:,4) = cosd(135)*R(:,:,1)+sind(135)*R(:,:,2);
R(:,:,5) = cosd(180)*R(:,:,1)+sind(180)*R(:,:,2);
R(:,:,6) = cosd(225)*R(:,:,1)+sind(225)*R(:,:,2);
R(:,:,7) = cosd(270)*R(:,:,1)+sind(270)*R(:,:,2);
R(:,:,8) = cosd(315)*R(:,:,1)+sind(315)*R(:,:,2);
FM = max(R,[],3);
FM = mean2(FM);
case 'SFRQ' % Spatial frequency (Eskicioglu95)
Ix = Image;
Iy = Image;
Ix(:,1:end-1) = diff(Image, 1, 2);
Iy(1:end-1,:) = diff(Image, 1, 1);
FM = mean2(sqrt(double(Iy.^2+Ix.^2)));
case 'TENG'% Tenengrad (Krotkov86)
Sx = fspecial('sobel');
Gx = imfilter(double(Image), Sx, 'replicate', 'conv');
Gy = imfilter(double(Image), Sx', 'replicate', 'conv');
FM = Gx.^2 + Gy.^2;
FM = mean2(FM);
case 'TENV' % Tenengrad variance (Pech2000)
Sx = fspecial('sobel');
Gx = imfilter(double(Image), Sx, 'replicate', 'conv');
Gy = imfilter(double(Image), Sx', 'replicate', 'conv');
G = Gx.^2 + Gy.^2;
FM = std2(G)^2;
case 'VOLA' % Vollath's correlation (Santos97)
Image = double(Image);
I1 = Image; I1(1:end-1,:) = Image(2:end,:);
I2 = Image; I2(1:end-2,:) = Image(3:end,:);
Image = Image.*(I1-I2);
FM = mean2(Image);
case 'WAVS' %Sum of Wavelet coeffs (Yang2003)
[C,S] = wavedec2(Image, 1, 'db6');
H = wrcoef2('h', C, S, 'db6', 1);
V = wrcoef2('v', C, S, 'db6', 1);
D = wrcoef2('d', C, S, 'db6', 1);
FM = abs(H) + abs(V) + abs(D);
FM = mean2(FM);
case 'WAVV' %Variance of Wav...(Yang2003)
[C,S] = wavedec2(Image, 1, 'db6');
H = abs(wrcoef2('h', C, S, 'db6', 1));
V = abs(wrcoef2('v', C, S, 'db6', 1));
D = abs(wrcoef2('d', C, S, 'db6', 1));
FM = std2(H)^2+std2(V)+std2(D);
case 'WAVR'
[C,S] = wavedec2(Image, 3, 'db6');
H = abs(wrcoef2('h', C, S, 'db6', 1));
V = abs(wrcoef2('v', C, S, 'db6', 1));
D = abs(wrcoef2('d', C, S, 'db6', 1));
A1 = abs(wrcoef2('a', C, S, 'db6', 1));
A2 = abs(wrcoef2('a', C, S, 'db6', 2));
A3 = abs(wrcoef2('a', C, S, 'db6', 3));
A = A1 + A2 + A3;
WH = H.^2 + V.^2 + D.^2;
WH = mean2(WH);
WL = mean2(A);
FM = WH/WL;
otherwise
error('Unknown measure %s',upper(Measure))
end
function fm = AcMomentum(Image)
[M N] = size(Image);
Hist = imhist(Image)/(M*N);
Hist = abs((0:255)-255*mean2(Image))'.*Hist;
fm = sum(Hist);
function fm = DctRatio(M)
MT = dct2(M).^2;
fm = (sum(MT(:))-MT(1,1))/MT(1,1);
function fm = ReRatio(M)
M = dct2(M);
fm = (M(1,2)^2+M(1,3)^2+M(2,1)^2+M(2,2)^2+M(3,1)^2)/(M(1,1)^2);
A few examples of OpenCV versions:
#include <opencv2/opencv.hpp>

// OpenCV port of 'LAPM' algorithm (Nayar89)
double modifiedLaplacian(const cv::Mat& src)
{
    cv::Mat M = (cv::Mat_<double>(3, 1) << -1, 2, -1);
    cv::Mat G = cv::getGaussianKernel(3, -1, CV_64F);
    cv::Mat Lx;
    cv::sepFilter2D(src, Lx, CV_64F, M, G);
    cv::Mat Ly;
    cv::sepFilter2D(src, Ly, CV_64F, G, M);
    cv::Mat FM = cv::abs(Lx) + cv::abs(Ly);
    double focusMeasure = cv::mean(FM).val[0];
    return focusMeasure;
}

// OpenCV port of 'LAPV' algorithm (Pech2000)
double varianceOfLaplacian(const cv::Mat& src)
{
    cv::Mat lap;
    cv::Laplacian(src, lap, CV_64F);
    cv::Scalar mu, sigma;
    cv::meanStdDev(lap, mu, sigma);
    double focusMeasure = sigma.val[0]*sigma.val[0];
    return focusMeasure;
}

// OpenCV port of 'TENG' algorithm (Krotkov86)
double tenengrad(const cv::Mat& src, int ksize)
{
    cv::Mat Gx, Gy;
    cv::Sobel(src, Gx, CV_64F, 1, 0, ksize);
    cv::Sobel(src, Gy, CV_64F, 0, 1, ksize);
    cv::Mat FM = Gx.mul(Gx) + Gy.mul(Gy);
    double focusMeasure = cv::mean(FM).val[0];
    return focusMeasure;
}

// OpenCV port of 'GLVN' algorithm (Santos97)
double normalizedGraylevelVariance(const cv::Mat& src)
{
    cv::Scalar mu, sigma;
    cv::meanStdDev(src, mu, sigma);
    double focusMeasure = (sigma.val[0]*sigma.val[0]) / mu.val[0];
    return focusMeasure;
}
No guarantees on whether or not these measures are the best choice for your problem, but if you track down the papers associated with these measures, they may give you more insight. Hope you find the code useful! I know I did.
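
For anyone working in Python rather than C++, two of the simpler measures can be approximated with NumPy alone. This is a loose sketch, not a line-for-line port: `glvn` follows the normalized gray-level variance idea using the population variance (as the `meanStdDev` version above does), and `energy_of_gradient` uses plain forward differences and drops the last row and column instead of copying the MATLAB border handling. Both function names are made up here.

```python
import numpy as np

def glvn(gray):
    """Normalized gray-level variance: variance divided by mean intensity.
    Assumes the image is not entirely black (mean > 0)."""
    g = np.asarray(gray, dtype=np.float64)
    return float(g.var() / g.mean())

def energy_of_gradient(gray):
    """Mean of squared horizontal and vertical forward differences."""
    g = np.asarray(gray, dtype=np.float64)
    ix = np.diff(g, axis=1)[:-1, :]   # horizontal differences
    iy = np.diff(g, axis=0)[:, :-1]   # vertical differences
    return float(np.mean(ix ** 2 + iy ** 2))

# Synthetic check: a noisy texture versus its 3x3 mean-filtered version.
rng = np.random.default_rng(0)
sharp = rng.uniform(0.0, 255.0, size=(64, 64))
blurred = sum(sharp[dy:dy + 62, dx:dx + 62]
              for dy in range(3) for dx in range(3)) / 9.0

print("GLVN:", glvn(sharp), glvn(blurred))
print("gradient energy:", energy_of_gradient(sharp), energy_of_gradient(blurred))
```

Both measures should drop after blurring. As with the C++ versions, they are relative scores for comparing images of the same scene, not absolute sharpness values.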
Building off of Nike's answer. Its straightforward to implement the laplacian based method with opencv:
short GetSharpness(char* data, unsigned int width, unsigned int height)
{
    // assumes that your image is already in planar yuv or 8 bit greyscale
    IplImage* in = cvCreateImage(cvSize(width,height),IPL_DEPTH_8U,1);
    IplImage* out = cvCreateImage(cvSize(width,height),IPL_DEPTH_16S,1);
    // copy the caller's pixels into the input image
    // (assumes rows are not padded, i.e. in->widthStep == width)
    memcpy(in->imageData, data, width*height);
    // aperture size of 1 corresponds to the correct matrix
    cvLaplace(in, out, 1);
    short maxLap = -32767;
    short* imgData = (short*)out->imageData;
    for(int i =0;i<(out->imageSize/2);i++)
    {
        if(imgData[i] > maxLap) maxLap = imgData[i];
    }
    // release both images so repeated calls do not leak
    cvReleaseImage(&in);
    cvReleaseImage(&out);
    return maxLap;
}
This returns a short indicating the maximum sharpness detected, which, based on my tests on real-world samples, is a pretty good indicator of whether a camera is in focus or not. Not surprisingly, normal values are scene dependent, but much less so than with the FFT method, which has too high a false positive rate to be useful in my application.
I came up with a totally different solution.
I needed to analyse video still frames to find the sharpest one in every (X) frames. This way, I would detect motion blur and/or out of focus images.
I ended up using Canny edge detection and got very good results with almost every kind of video (with nikie's method, I had problems with digitized VHS videos and heavily interlaced videos).
I optimized the performance by setting a region of interest (ROI) on the original image.
Using EmguCV:
//Convert image using Canny
using (Image<Gray, byte> imgCanny = imgOrig.Canny(225, 175))
{
    //Count the number of pixels representing an edge
    int nCountCanny = imgCanny.CountNonzero()[0];
    //Compute a sharpness grade:
    //< 1.5 = blurred, in motion
    //1.5 to 6 = acceptable
    //> 6 = stable, sharp
    double dSharpness = (nCountCanny * 1000.0 / (imgCanny.Cols * imgCanny.Rows));
}
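The grading arithmetic itself is independent of EmguCV. As a small sketch in Python, assuming you already have a binary edge map from whatever Canny implementation you use (the function names `sharpness_grade` and `classify` are made up here, and the 1.5 and 6 cut-offs are simply the ones quoted in the comments above):

```python
import numpy as np

def sharpness_grade(edge_map):
    """Edge pixels per thousand pixels of a binary edge map (non-zero = edge)."""
    edges = np.asarray(edge_map)
    return np.count_nonzero(edges) * 1000.0 / edges.size

def classify(grade, low=1.5, high=6.0):
    """Map a grade onto the three bands from the answer; thresholds are tunable."""
    if grade < low:
        return "blurred"
    if grade <= high:
        return "acceptable"
    return "sharp"
```

For example, an edge map of 100x100 pixels with 50 edge pixels gets a grade of 5.0, which falls in the "acceptable" band. The thresholds depend on the Canny parameters and on the footage, so treat them as a starting point.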
Thanks nikie for that great Laplace suggestion.
OpenCV docs pointed me in the same direction:
Using Python, cv2 (OpenCV 2.4.10), and NumPy:
import cv2
import numpy
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
numpy.max(cv2.convertScaleAbs(cv2.Laplacian(gray, cv2.CV_16S)))
The result is between 0 and 255. I found that anything over 200 or so is very much in focus, and by 100 it's noticeably blurry. The max never really gets much under 20, even if the image is completely blurred.
One way which I'm currently using measures the spread of edges in the image. Look for this paper:
author = {Pina Marziliano and Frederic Dufaux and Stefan Winkler and Touradj Ebrahimi},
title = {Perceptual blur and ringing metrics: Application to JPEG2000},
journal = {Signal Process. Image Commun.},
year = {2004},
pages = {163--172} }
It's usually behind a paywall but I've seen some free copies around. Basically, they locate vertical edges in an image, and then measure how wide those edges are. Averaging the width gives the final blur estimation result for the image. Wider edges correspond to blurry images, and vice versa.
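To make the edge-width idea concrete, here is a heavily simplified sketch in Python. It is not the algorithm from the paper: instead of running an edge detector and searching for local extrema around each edge pixel, it scans every row for monotonic intensity ramps whose total contrast exceeds a threshold and averages their lengths. The function name and the `min_contrast` parameter are assumptions of mine.

```python
import numpy as np

def mean_edge_width(gray, min_contrast=50.0):
    """Average length (in pixels) of monotonic horizontal ramps with enough contrast."""
    img = np.asarray(gray, dtype=np.float64)
    widths = []
    for row in img:
        n = len(row)
        signs = np.sign(np.diff(row))  # +1 rising, -1 falling, 0 flat
        start = 0
        while start < n - 1:
            s = signs[start]
            if s == 0:
                start += 1
                continue
            # extend the ramp while the intensity keeps moving in the same direction
            end = start
            while end < n - 1 and signs[end] == s:
                end += 1
            if abs(row[end] - row[start]) >= min_contrast:
                widths.append(end - start)
            start = end
    return float(np.mean(widths)) if widths else 0.0
```

A hard step edge gives a width of 1, while the same step smeared over several pixels gives a proportionally larger width. The sketch is sensitive to noise (a single noisy pixel splits a ramp), so some smoothing or a proper edge detector would be needed on real images.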
This problem belongs to the field of no-reference image quality estimation. If you look it up on Google Scholar, you'll get plenty of useful references.
Here's a plot of the blur estimates obtained for the 5 images in nikie's post. Higher values correspond to greater blur. I used a fixed-size 11x11 Gaussian filter and varied the standard deviation (using imagemagick's convert command to obtain the blurred images).
If you compare images of different sizes, don't forget to normalize by the image width, since larger images will have wider edges.
Finally, a significant problem is distinguishing between artistic blur and undesired blur (caused by focus miss, compression, relative motion of the subject to the camera), but that is beyond simple approaches like this one. For an example of artistic blur, have a look at the Lenna image: Lenna's reflection in the mirror is blurry, but her face is perfectly in focus. This contributes to a higher blur estimate for the Lenna image.
Answers above elucidated many things, but I think it is useful to make a conceptual distinction.
What if you take a perfectly in-focus picture of a blurred image?
The blur detection problem is only well posed when you have a reference. If you need to design, e.g., an auto-focus system, you compare a sequence of images taken with different degrees of blurring, or smoothing, and you try to find the point of minimum blurring within this set. In other words, you need to cross-reference the various images using one of the techniques illustrated above (basically, with various possible levels of refinement in the approach, looking for the one image with the highest high-frequency content).
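As a sketch of that relative comparison in Python: score every frame of the sequence with some focus measure and keep the one with the highest score. The measure used here (mean squared difference between neighbouring pixels) is just a stand-in I picked for high-frequency content, and the names `high_frequency_energy` and `sharpest_index` are hypothetical; any measure from this thread could be passed in instead.

```python
import numpy as np

def high_frequency_energy(gray):
    """Mean squared difference between adjacent pixels, horizontally and vertically."""
    img = np.asarray(gray, dtype=np.float64)
    dx = np.diff(img, axis=1)
    dy = np.diff(img, axis=0)
    return float(np.mean(dx * dx) + np.mean(dy * dy))

def sharpest_index(frames, measure=high_frequency_energy):
    """Return (index of the best-scoring frame, list of all scores)."""
    scores = [measure(frame) for frame in frames]
    return int(np.argmax(scores)), scores
```

The absolute scores mean little on their own; only their ordering within a set of images of the same scene is used, which is exactly the point about needing a reference.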
I tried the solution based on the Laplacian filter from this post. It didn't help me. So I tried the solution from this post, and it worked well for my case (but it is slow):
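import cv2
image = cv2.imread("test.jpeg")
height, width = image.shape[:2]
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

def px(x, y):
    # cast to int so the subtraction below cannot wrap around in uint8
    return int(gray[y, x])

# accumulate absolute differences between horizontally adjacent pixels
total = 0
for x in range(width-1):
    for y in range(height):
        total += abs(px(x, y) - px(x+1, y))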
The less blurred image has the larger sum.
You can also trade accuracy for speed by changing the step, e.g. replace this line
for x in range(width - 1):
with this one
for x in range(0, width - 1, 10):
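Since the per-pixel Python loop is the slow part, the same sum can be computed with NumPy slicing. This is a sketch that assumes `gray` is a 2-D uint8 array such as the one returned by `cv2.cvtColor` above; the function name and the `step` parameter are mine, with `step` mirroring the stride of the `range` call.

```python
import numpy as np

def horizontal_difference_sum(gray, step=1):
    """Sum of |gray[y, x] - gray[y, x+1]| over x = 0, step, 2*step, ... and all y."""
    g = np.asarray(gray, dtype=np.int64)  # widen so uint8 subtraction cannot wrap
    left = g[:, :-1:step]   # columns x
    right = g[:, 1::step]   # columns x + 1
    return int(np.abs(left - right).sum())
```

With `step=1` this gives the same value as the double loop, and `step=10` corresponds to the subsampled variant.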
MATLAB code for two methods that have been published in highly regarded journals (IEEE Transactions on Image Processing) is available here:
Check the CPBDM and JNBM algorithms. If you read the code, it is not very hard to port, and incidentally it uses Marziliano's method as its basic feature.
I implemented it using the FFT in MATLAB and checked the histogram of the FFT magnitudes, computing the mean and standard deviation; fitting a function to the histogram is also possible.
% fft2 gives the 2-D spectrum; plain fft would only transform each column
fa = abs(fftshift(fft2(sharp_img)));
fb = abs(fftshift(fft2(blured_img)));
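A comparable experiment in Python with NumPy, as a sketch only: rather than fitting a histogram, it reports the fraction of spectral energy above a cutoff frequency, which should drop as an image gets blurrier. The function name, the mean subtraction, and the default cutoff of 0.25 cycles per pixel are my assumptions, not part of the MATLAB snippet.

```python
import numpy as np

def high_frequency_ratio(gray, cutoff=0.25):
    """Fraction of (mean-removed) spectral energy at radial frequencies above cutoff."""
    img = np.asarray(gray, dtype=np.float64)
    img = img - img.mean()  # drop the DC term so brightness does not dominate
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0  # constant image: no frequency content at all
    h, w = img.shape
    # frequency of every bin in cycles per pixel, shifted to match the spectrum layout
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.hypot(fy, fx)
    return float(power[radius > cutoff].sum() / total)
```

As with the other measures, the ratio is scene dependent, so it is most useful for ranking several shots of the same subject.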
This is what I do in OpenCV to detect focus quality in a region:
Mat grad;
int scale = 1;
int delta = 0;
// CV_16S keeps negative gradients; with CV_8U they are clipped to 0 before convertScaleAbs
int ddepth = CV_16S;
Mat grad_x, grad_y;
Mat abs_grad_x, abs_grad_y;
/// Gradient X
Sobel(matFromSensor, grad_x, ddepth, 1, 0, 3, scale, delta, BORDER_DEFAULT);
/// Gradient Y
Sobel(matFromSensor, grad_y, ddepth, 0, 1, 3, scale, delta, BORDER_DEFAULT);
convertScaleAbs(grad_x, abs_grad_x);
convertScaleAbs(grad_y, abs_grad_y);
addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad);
cv::Scalar mu, sigma;
cv::meanStdDev(grad, /* mean */ mu, /*stdev*/ sigma);
double focusMeasure = mu.val[0] * mu.val[0];

