I'd like to compute a sort of direction field on a 2D image, as (poorly) illustrated in this Photoshop mockup. NOTE: This is NOT a vector field of the kind you learn about in differential equations. Instead, it is something that follows the lines one would see after computing level sets of the image.
Are there known methods of obtaining this type of direction field (red lines) of an image? It seems like it almost behaves like the normal to the gradient, but this isn't exactly it, either, since there are places where the gradient is zero and I'd like direction fields at these locations as well.

I was able to find a paper on how to do this for fingerprint processing that went into enough detail that their results were repeatable. It's unfortunately behind a paywall, but here it is for anyone interested and able to access the full text:
Systematic methods for the computation of the directional fields and singular points of fingerprints
EDIT: As requested, here is a quick and dirty summary (in Python) of how this is achieved in the above paper.
A naive approach would be to average the gradient in a small square neighborhood around the target pixel, much like the superimposed grid on the image in the question, and then compute the normal. However, if you simply average the gradient, it's possible that opposite gradients in the region will cancel each other (e.g. when computing the orientation along a ridge). Thus, it is common to compute with squared gradients, since gradients pointing in opposite directions would then be aligned. There is a clever formula for the squared gradient based on the original gradient. I won't give the derivation, but here is the formula:
Now, take the sum of squared gradients over the region (modulo some piece-wise defined compensations for the way angles work). Finally, through some arctangent magic, you'll get the orientation field.
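For reference, the construction can be written out as follows. This is reconstructed from the code further down rather than copied from the paper, so treat the symbols as assumptions: G_x and G_y are the two gradient components (grad[0] and grad[1] in the code) and W is the block around the target pixel.

```latex
(G_x + i\,G_y)^2 = (G_x^2 - G_y^2) + i\,(2\,G_x G_y)
\quad\Longrightarrow\quad
\begin{bmatrix} G_{s,x} \\ G_{s,y} \end{bmatrix}
= \begin{bmatrix} G_x^2 - G_y^2 \\ 2\,G_x G_y \end{bmatrix},
\qquad
\Phi = \tfrac{1}{2}\,\operatorname{atan2}\!\Big(\sum_{W} 2\,G_x G_y,\ \sum_{W} \big(G_x^2 - G_y^2\big)\Big)
```

Squaring doubles the angle of each gradient, so two opposite gradients (angles differing by \pi) land on the same squared vector and reinforce each other instead of cancelling. Halving the angle of the sum undoes the doubling. \Phi is then the averaged gradient direction modulo \pi, and the level-set direction is perpendicular to it.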
If you run the following code on a smooth grayscale bitmap image with the grid-size chosen appropriately and then plot the orientation field O alongside your original image, you'll see how the orientation field more or less gives the angles I asked about in my original question.
from scipy import misc
import numpy as np
import math

# Import the grayscale image.
# NOTE: scipy.misc.imread was removed in SciPy 1.2; on newer installs
# use e.g. imageio.imread('path/filename.bmp') instead.
bmp = misc.imread('path/filename.bmp')

# Compute the gradient - VERY important to convert to floats!
# grad[0] is the derivative along axis 0 (rows), grad[1] along axis 1 (columns).
grad = np.gradient(bmp.astype(float))

# Set the block size (superimposed grid on the sample image in the question).
# The value was not given in the original post; 5 is only a placeholder to tune.
blockRadius = 5

# Compute the orientation field. Result will be a matrix of angles in (0, \pi], one for each pixel in the original (grayscale) image.
O = np.zeros(bmp.shape)
for x in range(0, bmp.shape[0]):
    for y in range(0, bmp.shape[1]):
        numerator = 0.
        denominator = 0.
        for i in range(max(0, x-blockRadius), min(bmp.shape[0], x+blockRadius)):
            # The column index is bounded by shape[1], not shape[0]
            for j in range(max(0, y-blockRadius), min(bmp.shape[1], y+blockRadius)):
                numerator = numerator + 2.*grad[0][i,j]*grad[1][i,j]
                denominator = denominator + (math.pow(grad[0][i,j],2.) - math.pow(grad[1][i,j],2.))
        # Half the angle of the summed squared gradient, handled quadrant by quadrant
        if denominator == 0:
            O[x,y] = 0
        elif denominator > 0:
            O[x,y] = (1./2.)*math.atan(numerator/denominator)
        elif numerator >= 0:
            O[x,y] = (1./2.)*(math.atan(numerator/denominator)+math.pi)
        elif numerator < 0:
            O[x,y] = (1./2.)*(math.atan(numerator/denominator)-math.pi)

# Shift non-positive angles by pi so that every angle lies in (0, pi]
for x in range(0, bmp.shape[0]):
    for y in range(0, bmp.shape[1]):
        if O[x,y] <= 0:
            O[x,y] = O[x,y] + math.pi
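The nested loops above get slow on anything but small images. As a rough sketch of the same idea in vectorised form, the block sums can be taken from an integral image and the quadrant cases collapsed into arctan2. The function names box_sum and orientation_field are made up for this illustration, and the window here is a symmetric (2*radius+1) square, which differs slightly from the loop bounds above.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window, clipped at the image border."""
    # Integral image with a leading row and column of zeros
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    rows = np.arange(a.shape[0])
    cols = np.arange(a.shape[1])
    r0 = np.clip(rows - radius, 0, a.shape[0])
    r1 = np.clip(rows + radius + 1, 0, a.shape[0])
    c0 = np.clip(cols - radius, 0, a.shape[1])
    c1 = np.clip(cols + radius + 1, 0, a.shape[1])
    return ii[r1][:, c1] - ii[r0][:, c1] - ii[r1][:, c0] + ii[r0][:, c0]

def orientation_field(img, radius=5):
    """Averaged gradient direction per pixel, as an angle modulo pi."""
    g0, g1 = np.gradient(np.asarray(img, dtype=float))
    num = box_sum(2.0 * g0 * g1, radius)
    den = box_sum(g0 ** 2 - g1 ** 2, radius)
    # Half the angle of the summed squared gradient, wrapped modulo pi
    return (0.5 * np.arctan2(num, den)) % np.pi
```

Angles 0 and pi describe the same orientation, so the wrap-around convention only matters for plotting.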


Code for a multiple quadratic (or polynomial) least squares (surface fit)?

For a machine vision project I am trying to search image data for quadratic surfaces (f(x,y) = Ax^2+Bx+Cy^2+Dy+Exy+F). My plan is to iterate through regions of data, perform a surface fit, look at the error, and see if it's a continuous surface (which would probably indicate a feature in the image).
I was previously able to find quadratic curves (f(x) = Ax^2+Bx+C) in the image data by sampling lines, using the equations on this site.
This worked well and was promising, but it would be much more useful for my task to find 2-D regions that form continuous surfaces.
I see lots of articles indicating that least squares regression scales up to multiple dimensions, but I'm not able to find code for this. Hopefully there is a "closed form" (non-iterative, just compute from your data points) solution, like the one described above for 1D data. Does anybody know of some source or pseudocode that accomplishes this? Thanks.
(Sorry if my terminology is a bit off.)
I'm not sure what your background is, but if you know some linear algebra you will find linear least squares on wikipedia useful.
Let's take the following example. Say we have the following image
and we want to know how well this fits to a 2D quadratic function in a least squares sense.
Probably the most straightforward way to solve the problem is to compute the optimal coefficients in a least squares sense, then check the error.
First we need to describe the matrices.
Let X be a matrix containing every x,y coordinate in the image, taking the form
X = [x1 x1^2 y1 y1^2 x1*y1 1;
     x2 x2^2 y2 y2^2 x2*y2 1;
     ...
     xN xN^2 yN yN^2 xN*yN 1];
For the example image above, X would be a 100x6 matrix.
Let y be the image intensity values in a vector of the form
y = [img(x1,y1);
     img(x2,y2);
     ...
     img(xN,yN)];
In this case y is a 100 element column vector.
We want to minimize the least squares objective function S with respect to the vector of coefficients b
S(b) = |y - X*b|^2
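where |.| is the L2 norm and b is the vector of desired coefficients, one per column of X and in the same order as those columns (which is not the order used in the question's formula)
b = [A;
     B;
     C;
     D;
     E;
     F];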
Taking the vector derivative of S(b) with respect to b, setting to zero, and solving for b leads to the standard least squares solution.
b = inv(X'*X)*X'*y
where inv is the matrix inverse, ' is transpose, and * is matrix multiplication.
MATLAB example.
% Generate an image
% define x,y coordinates for each location in the image
[x,y] = meshgrid(1:10,1:10);
% true coefficients
b_true = [0.1 0.5 0.3 -0.4 0.4 124];
% magnitude of noise
P = 2;
% create image
img = b_true(1).*x + b_true(2).*x.^2 + b_true(3).*y + b_true(4).*y.^2 + b_true(5).*x.*y + b_true(6);
noise = P*randn(10,10);
img = img + noise;
% Begin least squares optimization
% create matrices
X = [x(:) x(:).^2 y(:) y(:).^2 x(:).*y(:) ones(size(x(:)))];
y = img(:);
% estimated coefficients
% (solve the normal equations directly; X\y gives the same least squares solution)
b = (X.'*X)\(X.'*y)
% mean square error (expected to be near P^2)
E = 1/numel(y) * sum((y - X*b).^2)
In your application you would probably want to define some threshold such that when E < threshold you accept the image (or image region) as a quadratic polynomial.
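If you would rather prototype this in Python, here is a minimal NumPy sketch of the same fit. The function name fit_quadratic_surface is hypothetical, the coordinates are 1-based only to mirror the MATLAB example, and np.linalg.lstsq is used in place of the explicit normal equations because it tends to be better behaved numerically.

```python
import numpy as np

def fit_quadratic_surface(img):
    """Least-squares fit of b0*x + b1*x^2 + b2*y + b3*y^2 + b4*x*y + b5 to img."""
    rows, cols = img.shape
    # 1-based pixel coordinates, matching meshgrid(1:10,1:10) in the MATLAB code
    x, y = np.meshgrid(np.arange(1, cols + 1), np.arange(1, rows + 1))
    x = x.ravel().astype(float)
    y = y.ravel().astype(float)
    # Same column order as the matrix X described above
    X = np.column_stack([x, x ** 2, y, y ** 2, x * y, np.ones_like(x)])
    z = np.asarray(img, dtype=float).ravel()
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    mse = np.mean((z - X @ b) ** 2)
    return b, mse

# Usage on a noisy synthetic image built from the coefficients used above
rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.arange(1, 11), np.arange(1, 11))
img = 0.1 * xx + 0.5 * xx ** 2 + 0.3 * yy - 0.4 * yy ** 2 + 0.4 * xx * yy + 124
b, mse = fit_quadratic_surface(img + 2 * rng.standard_normal(img.shape))
print(b, mse)
```

Running the function over sliding windows and comparing mse against a threshold gives the accept/reject test described above.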

OpenCV Template matching against video

Assuming I have a template image and am searching for a match in a video, what is the measure to look for?
From OpenCV tutorial here
1. loc = np.where(res >= threshold) gives me a numpy array. How do I interpret it on a scale of 1-100, where 100 refers to an exact match, 80 refers to an 80% match, and so on?
2. I am not clear on the min and max values. What do the rectangle coordinates denote?
# Apply template Matching
res = cv2.matchTemplate(img,template,method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
I'm not too familiar with Python, but I have worked with template matching and OpenCV.
Performing a template match produces a results matrix - called res in your example.
Depending on the template matching method used, the brightest/darkest (max/min) points on this result matrix are your best matches.
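In your example the method cv2.TM_SQDIFF_NORMED is used, which normalises the result matrix values to between 0 and 1. Note that for the SQDIFF methods the best match is the minimum (0 is an exact match), whereas for the correlation-based methods such as cv2.TM_CCOEFF_NORMED it is the maximum.
You can then iterate over your result matrix points and only store those points which pass a certain threshold. In the example they use 0.8, which for a normalised correlation method roughly corresponds to an 80% match.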
The last step involves marking each match onto the drawing by using the rectangle drawing function which works as follows:
cv2.rectangle(img, pt1, pt2, color, thickness=1, lineType=8, shift=0)
img - image matrix, the picture you want to draw on
pt1 - Top left point of the rectangle (x,y)
pt2 - Bottom right point of the rectangle (x,y)
color - Line colour (BGR format)
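To tie the two questions together, here is a small NumPy-only sketch that takes a result matrix such as res and returns the best match as rectangle corners plus a 0-100 score. The helper name best_match and the score mapping are assumptions made for illustration. OpenCV itself only gives you the raw values, and the mapping is only meaningful for the *_NORMED methods.

```python
import numpy as np

def best_match(res, template_shape, lower_is_better=False):
    """Return (top_left, bottom_right, percent) for the best match in `res`.

    `res` is the 2-D result array from a normalised template match and
    `template_shape` is the template's (height, width).
    """
    h, w = template_shape[:2]
    idx = np.argmin(res) if lower_is_better else np.argmax(res)
    row, col = np.unravel_index(idx, res.shape)
    score = float(res[row, col])
    if lower_is_better:
        # e.g. TM_SQDIFF_NORMED, where 0 is a perfect match
        percent = 100.0 * (1.0 - score)
    else:
        # e.g. TM_CCOEFF_NORMED, where 1 is a perfect match
        percent = 100.0 * score
    percent = min(100.0, max(0.0, percent))
    # OpenCV points are (x, y), i.e. (column, row)
    top_left = (int(col), int(row))
    bottom_right = (int(col) + w, int(row) + h)
    return top_left, bottom_right, percent
```

top_left and bottom_right are what you would pass as pt1 and pt2 when drawing the rectangle: the location in the result matrix is where the template's top-left corner sat, and the template's width and height give the opposite corner.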
I answered a similar question here and provided an example that might be of some help to you too.

Trying to understand implementation of gaussian blurring in matlab

I am trying to blur a scanned text document to the point that the text lines are blurred to black. I mean that the text blends together and all I see are black lines.
I'm new to MATLAB and even though I know the basics I cannot get the image to blur properly. I have read this: Gaussian Blurr and according to that the blur is managed/decided by the sigma function. But that is not how it works in the code I wrote.
While trying to learn Gaussian blurring in MATLAB I found out that it's achieved by using this function: fspecial('gaussian',hsize,sigma);
So apparently there are two variables: hsize specifies the number of rows or columns in the function, while sigma is the standard deviation.
Can someone please explain the significance of hsize here and why it has a much deeper effect on the result, even more than sigma?
Why is it that even if I increase sigma to a very high value the blur is not affected, but the image is distorted a lot by increasing hsize?
here is my code:
img = imread('c:\new.jpg');
h = fspecial('gaussian',hsize,sigma);
out = imfilter(img,h);
and the results are attached:
Why is it not only controlled by sigma? What role does hsize play? Why can't I get it to blur the text only rather than distort the entire image?
Thank you
hsize refers to the size of the filter. Specifically, a filter that is Nx x Ny pixels uses a pixel region Nx x Ny in size centered around each pixel when computing the response of the filter. The response is just how the pixels in that region are combined together. In the case of a gaussian filter, the intensity at each pixel around the central one is weighted according to a gaussian function prior to performing a box average over the region.
sigma refers to the standard deviation of the gaussian (see documentation for fspecial) with units in pixels. As you increase sigma (keeping the size of the filter the same) eventually you approach a simple box average with uniform weighting over the filter area around the central pixel, so you stop seeing an effect from increasing sigma.
The similarity between results obtained with gaussian blur (with large value of sigma) and a box average is shown in the left and middle images below. The right image shows the results of eroding the image, which is probably what you want.
The code:
% gaussian filter:
hsize = 5;
sigma = 10;
h = fspecial('gaussian',hsize,sigma);
out = imfilter(img,h);
% box filter:
h = fspecial('average',hsize);
out = imfilter(img,h);
% erode (se is a structuring element, e.g. one built with strel; its definition is not shown here):
out = imerode(img,se);
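The saturation effect is easy to check numerically. The sketch below is written in Python/NumPy rather than MATLAB and only approximates what fspecial('gaussian', hsize, sigma) computes. The helper gaussian_kernel is a name made up here. It builds a normalised kernel of fixed size and prints how far the kernel is from a uniform box filter as sigma grows.

```python
import numpy as np

def gaussian_kernel(hsize, sigma):
    """Normalised hsize x hsize Gaussian kernel, truncated to the window."""
    half = (hsize - 1) / 2.0
    ax = np.arange(hsize) - half
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    # Dividing by the sum makes the weights add up to 1 inside the window
    return k / k.sum()

box = np.full((5, 5), 1.0 / 25.0)
for sigma in (0.5, 2.0, 10.0, 100.0):
    k = gaussian_kernel(5, sigma)
    # Largest difference between the Gaussian weights and uniform weights
    print(sigma, np.abs(k - box).max())
```

Once sigma is several times larger than hsize, the window only sees the flat top of the Gaussian, so the only way to get more blur is to enlarge hsize along with sigma.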
Fspecial's Manual
h = fspecial('gaussian', hsize, sigma) returns a rotationally
symmetric Gaussian lowpass filter of size hsize with standard
deviation sigma (positive). hsize can be a vector specifying the
number of rows and columns in h, or it can be a scalar, in which case
h is a square matrix. The default value for hsize is [3 3]; the
default value for sigma is 0.5. Not recommended. Use imgaussfilt or
imgaussfilt3 instead.
Note that the manual says fspecial('gaussian', ...) is not recommended.
With fspecial, besides choosing the standard deviation (sigma), you still need to choose hsize, which also affects the blurring.
With imgaussfilt, you choose the standard deviation and the function takes care of the rest.
I get much better tolerance levels with imgaussfilt and imgaussfilt3 on my systems in MATLAB 2016a, for example:
im = im2double( imgGray );
sigma = 5;
simulatedPsfImage = imgaussfilt(im, sigma);
simulatedPsfImage = im2double( simulatedPsfImage );
[ measuredResolution, standardError, bestFitData ] = ...
EstimateResolutionFromPsfImage( simulatedPsfImage, [1.00 1.00] );
Note that the tolerance levels of fspecial are high [0.70 1.30] by default.

How to identify different objects in an image?

I'm intending to write a program to detect and differentiate certain objects from a nearly solid background. The foreground and the background have a high contrast difference which I would further increase to aid in the object identification process. I'm planning to use Hough transform technique and OpenCV.
Sample image
As seen in the above image, I would want to separately identify the circular objects and the square objects (or any other shape out of a finite set of shapes). Since I'm quite new to image processing I do not have an idea whether such a situation needs a neural network to be implemented and each shape to be learned beforehand. Would a technique such as template matching let me do this without a neural network?
These posts will get you started:
How to detect circles
How to detect squares
How to detect a sheet of paper (advanced square detection)
You will probably have to adjust some parameters in these codes to match your circles/squares, but the core of the technique is shown on these examples.
If you intend to detect shapes other than just circles (and from the image I assume you do), I would recommend Chamfer matching for a quick start, especially as you have good contrast.
The basic premise, explained in simple terms, is as follows:
You do an edge detection (for example, cvCanny in OpenCV).
You create a distance image, where the value of each pixel is the distance from the nearest edge.
You take the shapes you would like to detect, define sample points along the edges of each shape, and try to match these points on the distance image. Basically you just add up the values on the distance image which are "under" the coordinates of your sample points, given a specific position of your object.
Find a good minimization algorithm; the effectiveness of this depends on your application.
This basic approach is a general solution and usually works well, but without further refinements it is very slow.
Usually it's a good idea to first separate the objects of interest, so you don't always have to do the full search on the whole image. Find a good threshold, so you can separate objects. You still don't know which object it is, but you only have to do the matching itself in close proximity to this object.
Another good idea is, instead of doing the full search on the high resolution image, to first do it at a very low resolution. The result will not be very accurate, but you will know the general areas where a higher resolution search is worthwhile, so you don't waste time on areas where there is nothing of interest.
There are a number of more advanced techniques, but it's still worth taking a look at basic chamfer matching, as it is the basis of a large number of techniques.
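As a rough illustration of the basic chamfer steps, here is a toy NumPy sketch. It assumes the edge image is already available as a boolean array (for instance from a Canny detector), computes the distance image by brute force, and uses an exhaustive translation-only search as the "minimization". All function names are made up for this example, and a real implementation would use a proper distance transform and also search over scale and rotation.

```python
import numpy as np

def distance_image(edges):
    """Brute-force distance from every pixel to the nearest edge pixel."""
    ey, ex = np.nonzero(edges)
    yy, xx = np.indices(edges.shape)
    d2 = (yy[..., None] - ey) ** 2 + (xx[..., None] - ex) ** 2
    return np.sqrt(d2.min(axis=-1))

def chamfer_search(dist, points):
    """Slide template points (row, col offsets) over dist; return best offset and score."""
    pts = np.asarray(points)
    max_r = dist.shape[0] - pts[:, 0].max()
    max_c = dist.shape[1] - pts[:, 1].max()
    best, best_score = None, np.inf
    for r in range(max_r):
        for c in range(max_c):
            # Sum of the distance values "under" the shifted sample points
            score = dist[pts[:, 0] + r, pts[:, 1] + c].sum()
            if score < best_score:
                best, best_score = (r, c), score
    return best, best_score

def square_outline(size):
    """Sample points along the outline of a size x size square."""
    s = np.arange(size)
    top = np.stack([np.zeros(size, int), s], axis=1)
    bottom = np.stack([np.full(size, size - 1), s], axis=1)
    left = np.stack([s, np.zeros(size, int)], axis=1)
    right = np.stack([s, np.full(size, size - 1)], axis=1)
    return np.unique(np.concatenate([top, bottom, left, right]), axis=0)

# Toy scene: one square outline drawn at row 11, column 5
edges = np.zeros((30, 30), dtype=bool)
template = square_outline(8)
edges[template[:, 0] + 11, template[:, 1] + 5] = True
print(chamfer_search(distance_image(edges), template))
```

The exhaustive double loop is what makes the basic method slow, which is why restricting the search to thresholded regions or starting at low resolution pays off.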
With the assumption that the objects are simple shapes, here's an approach using thresholding + contour approximation. Contour approximation is based on the assumption that a curve can be approximated by a series of short line segments which can be used to determine the shape of a contour. For instance, a triangle has three vertices, a square/rectangle has four vertices, a pentagon has five vertices, and so on.
Obtain binary image. We load the image, convert to grayscale, Gaussian blur, then adaptive threshold to obtain a binary image.
Detect shapes. Find contours and identify the shape of each contour using contour approximation filtering. This can be done using arcLength to compute the perimeter of the contour and approxPolyDP to obtain the actual contour approximation.
Input image
Detected objects highlighted in green
Labeled contours
import cv2

def detect_shape(c):
    # Compute perimeter of contour and perform contour approximation
    shape = ""
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.04 * peri, True)
    # Triangle
    if len(approx) == 3:
        shape = "triangle"
    # Square or rectangle
    elif len(approx) == 4:
        (x, y, w, h) = cv2.boundingRect(approx)
        ar = w / float(h)
        # A square will have an aspect ratio that is approximately
        # equal to one, otherwise, the shape is a rectangle
        shape = "square" if 0.95 <= ar <= 1.05 else "rectangle"
    # Star
    elif len(approx) == 10:
        shape = "star"
    # Otherwise assume as circle or oval
    else:
        shape = "circle"
    return shape

# Load image, grayscale, Gaussian blur, and adaptive threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,31,3)

# Find contours and detect shape
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    # Identify shape
    shape = detect_shape(c)
    # Find centroid and label shape name
    M = cv2.moments(c)
    # Skip degenerate contours with zero area to avoid dividing by zero
    if M["m00"] == 0:
        continue
    cX = int(M["m10"] / M["m00"])
    cY = int(M["m01"] / M["m00"])
    cv2.putText(image, shape, (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (36,255,12), 2)

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
# Keep the windows open until a key is pressed
cv2.waitKey()

Rectangle detection with Hough transform

I'm trying to implement rectangle detection using the Hough transform, based on
this paper.
I programmed it using Matlab, but after the detection of parallel pair lines and orthogonal pairs, I must detect the intersection of these pairs. My question is about the quality of the two line intersection in Hough space.
I found the intersection points by solving four equation systems. Do these intersection points lie in cartesian or polar coordinate space?
For those of you wondering about the paper, it's:
Rectangle Detection based on a Windowed Hough Transform by Cláudio Rosito Jung and Rodrigo Schramm.
Now according to the paper, the intersection points are expressed as polar coordinates; obviously your implementation may be different (the only way to tell is to show us your code).
Assuming you are being consistent with the paper's notation, your peaks should be expressed as (\rho, \theta) pairs in Hough space.
You must then perform peak pairing as given by equation (3) in section 4.3 of the paper, where T_\theta represents the angular threshold corresponding to parallel lines and T_L is the normalized threshold corresponding to lines of similar length.
The accuracy of the Hough space should be dependent on two main factors.
The first is the accumulator resolution. The accumulator maps onto Hough space, and looping through the accumulator array requires that the accumulator divide Hough space into a discrete grid.
The second factor in accuracy in linear Hough space is the location of the origin in the original image. Look for a moment at what happens if you do a sweep of \theta for any given change in \rho. Near the origin, one of these sweeps will cover far fewer pixels than a sweep out near the edges of the image. This has the consequence that near the edges of the image you need a much higher \rho \theta resolution in your accumulator to achieve the same level of accuracy when transforming back to Cartesian.
The problem with increasing the resolution, of course, is that you will need more computational power and memory. Also, if you uniformly increase the accumulator resolution you waste resolution near the origin, where it is not needed.
Some ideas to help with this.
Place the origin right at the center of the image, as opposed to using the natural bottom left or top left of an image in code.
Try using the closest image you can get to a square. The more elongated an image is for a given area, the more pronounced the resolution trap becomes at the edges.
Try dividing your image into 4/9/16 etc. different accumulators, each with an origin in the center of that sub-image. It will require a little overhead to link the results of each accumulator together for rectangle detection, but it should help spread the resolution more evenly.
The ultimate solution would be to increase the resolution linearly depending on the distance from the origin. This can be achieved using the circle equation
(x-a)^2 + (y-b)^2 = \rho^2
where
- x,y are the current pixel
- a,b are your chosen origin
- \rho is the radius
Once the radius is known, adjust your accumulator resolution accordingly. You will have to keep track of the center of each \rho \theta bin for transforming back to Cartesian.
The link to the referenced paper does not work, but if you used the standard Hough transform then the four intersection points will be expressed in Cartesian coordinates. In fact, the four lines detected with the Hough transform will be expressed using the "normal parametrization":
rho = x cos(theta) + y sin(theta)
so you will have four pairs (rho_i, theta_i) that identify your four lines. After checking for orthogonality (for example just by comparing the angles theta_i) you solve four systems of equations, each of the form:
rho_j = x cos(theta_j) + y sin(theta_j)
rho_k = x cos(theta_k) + y sin(theta_k)
where x and y are the unknowns that represent the Cartesian coordinates of the intersection point.
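Each of those 2x2 systems can be solved in closed form with Cramer's rule. Here is a small Python sketch. The function name and the eps tolerance are choices made for this example, and the angles are assumed to be in radians.

```python
import math

def intersect_hough_lines(rho_j, theta_j, rho_k, theta_k, eps=1e-12):
    """Cartesian intersection (x, y) of two lines given in normal form.

    Returns None when the lines are (nearly) parallel.
    """
    # Determinant of [[cos tj, sin tj], [cos tk, sin tk]] = sin(theta_k - theta_j)
    det = math.cos(theta_j) * math.sin(theta_k) - math.sin(theta_j) * math.cos(theta_k)
    if abs(det) < eps:
        return None
    x = (rho_j * math.sin(theta_k) - rho_k * math.sin(theta_j)) / det
    y = (rho_k * math.cos(theta_j) - rho_j * math.cos(theta_k)) / det
    return x, y

# The vertical line x = 1 (theta = 0) and the horizontal line y = 2 (theta = pi/2)
print(intersect_hough_lines(1.0, 0.0, 2.0, math.pi / 2))
```

The resulting (x, y) is relative to whatever origin the Hough transform used, so if the origin was placed at the image or window center, add that offset to get pixel coordinates.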
I am not a mathematician. I am willing to stand corrected...
From Hough 2) ... any line on the xy plane can be described as p = x cos theta + y sin theta. In this representation, p is the normal distance and theta is the normal angle of a straight line, ... In practical applications, the angles theta and distances p are quantized, and we obtain an array C(p, theta).
from CRC standard math tables Analytic Geometry, Polar Coordinates in a Plane section ...
Such an ordered pair of numbers (r, theta) are called polar coordinates of the point p.
Straight lines: let p = distance of line from O, w = counterclockwise angle from OX to the perpendicular through O to the line. Normal form: r cos(theta - w) = p.
From this I conclude that the points lie in polar coordinate space.
