Assuming I have a template image and I am searching for a match in a video, what measure should I look at?
From OpenCV tutorial here
1. loc = np.where(res >= threshold) gives me a numpy array. How do I interpret it on a scale of 1-100, where 100 is an exact match, 80 is an 80% match, and so on?
2. I am not clear on the min/max values. What do the rectangle coordinates denote?
# Apply template Matching
res = cv2.matchTemplate(img,template,method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
I'm not too familiar with Python, but I have worked with template matching and OpenCV.
Performing a template match produces a results matrix - called res in your example.
Depending on the template matching method used, the brightest/darkest (max/min) points on this result matrix are your best matches.
In your example the method cv2.TM_SQDIFF_NORMED is used which will normalise the result matrix values between 0 and 1.
You can then iterate over the points of the result matrix and keep only those that pass a certain threshold; the example uses 0.8, which is equivalent to an 80% match.
The last step is to mark each match on the image using the rectangle drawing function, which works as follows:
Rectangle(img, pt1, pt2, color, thickness=1, lineType=8, shift=0)
img - image matrix, the picture you want to draw on
pt1 - Top left point of the rectangle (x,y)
pt2 - Bottom right point of the rectangle (x,y)
color - Line colour (BGR format)
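As a rough sketch of both points (not the tutorial's exact code; it assumes a normalised method where higher values mean better matches, e.g. cv2.TM_CCOEFF_NORMED, and that img and template are already loaded):
import cv2
import numpy as np
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
h, w = template.shape[:2]
threshold = 0.8                              # keep matches of 80% or better
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):                   # pt is the top-left corner (x, y)
    score_percent = res[pt[1], pt[0]] * 100  # e.g. 0.83 -> an "83% match"
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
Multiplying a normalised score by 100 gives the 1-100 scale asked about in the question; pt and (pt[0] + w, pt[1] + h) are the pt1/pt2 corners passed to the rectangle call.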
I answered a similar question here and provided an example that might be of some help to you too.
For an OpenCV project, I need to find a "perfect" homography from descriptor matching. I have pushed the parameters: many corners, a very low reprojection threshold...
It's close but I need almost pixel perfect homography.
I call "marker" the image I'm looking for, and "query" the image that contains the marker.
My idea was to refine the positions of the marker's 4 corners in the query image, and then recompute the homography from these refined corners.
I was thinking of using the Hough transform to detect lines and take their intersections, which gives corner candidates that are indeed good.
Now I need a score function to assess which of these candidates is the "best". Here is what I tried:
1/
- let's call H a homography
- test = query + H*marker, i.e. the marker warped by H and composited onto the query. So if my homography were "perfect", test would be identical to query (except for shadows...).
2/ Then I calculate the "difference" naively, as below: I sum the absolute differences and divide by the area (otherwise the smaller the region, the better the score). Not reliable... I gave more weight to large differences (cubing them) and it still fails.
It really seemed like a good idea to me, but it's an epic fail for now: the difference does not help find the "better" corners. Any ideas?
Thank you very much,
Michaël
def diffImageScore(testImage, workingImage, queryPersp):
    # area of the projected marker quadrilateral (quadrilatereArea is defined elsewhere)
    area = quadrilatereArea(queryPersp) * 1.0
    score = 0
    height, width, depth = testImage.shape
    # blur slightly, then compare in grayscale
    img1Gray = cv2.cvtColor(cv2.blur(testImage, (3, 3)), cv2.COLOR_BGR2GRAY)
    img2Gray = cv2.cvtColor(cv2.blur(workingImage, (3, 3)), cv2.COLOR_BGR2GRAY)
    for i in range(0, height):
        for j in range(0, width):
            # cubic weighting of the absolute grayscale difference
            s1 = abs(int(img1Gray[i, j]) - int(img2Gray[i, j]))
            s1 = pow(s1, 3)
            score += s1
    return score / area
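For reference, the same score can be computed without the explicit double loop. This is just a vectorized sketch of the code above; it assumes both images have the same size and takes the precomputed area as an argument instead of calling quadrilatereArea:
import cv2
import numpy as np
def diff_image_score(test_image, working_image, area):
    g1 = cv2.cvtColor(cv2.blur(test_image, (3, 3)), cv2.COLOR_BGR2GRAY).astype(np.int64)
    g2 = cv2.cvtColor(cv2.blur(working_image, (3, 3)), cv2.COLOR_BGR2GRAY).astype(np.int64)
    diff = np.abs(g1 - g2) ** 3          # same cubic weighting as the loop version
    return diff.sum() / float(area)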
I'd like to compute a sort of direction field on a 2D image, as (poorly) illustrated by this Photoshop mockup. NOTE: this is NOT a vector field as you learn about in differential equations. Instead, it is something that runs along the lines one would see if they computed level sets of the image.
Are there known methods of obtaining this type of direction field (red lines) of an image? It seems like it almost behaves like the normal to the gradient, but this isn't exactly it, either, since there are places where the gradient is zero and I'd like direction fields at these locations as well.
I was able to find a paper on how to do this for fingerprint processing that went into enough detail that their results were repeatable. It's unfortunately behind a paywall, but here it is for anyone interested and able to access the full text:
Systematic methods for the computation of the directional fields and singular points of fingerprints
EDIT: As requested, here is a quick and dirty summary (in Python) of how this is achieved in the above paper.
A naive approach would be to average the gradient in a small square neighborhood around the target pixel, much like the superimposed grid on the image in the question, and then compute the normal. However, if you simply average the gradient, opposite gradients in the region can cancel each other (e.g. when computing the orientation along a ridge). It is therefore common to work with squared gradients, since gradients pointing in opposite directions then become aligned. There is a clever formula for the squared gradient in terms of the original gradient. I won't give the derivation, but (in the notation used by the code below) the formula is: Gs_x = Gx^2 - Gy^2 and Gs_y = 2*Gx*Gy.
Now, take the sum of squared gradients over the region (modulo some piece-wise defined compensations for the way angles work). Finally, through some arctangent magic, you'll get the orientation field.
If you run the following code on a smooth grayscale bitmap image with the grid-size chosen appropriately and then plot the orientation field O alongside your original image, you'll see how the orientation field more or less gives the angles I asked about in my original question.
from scipy import misc
import numpy as np
import math
# Import the grayscale image (on newer SciPy, scipy.misc.imread is gone; use imageio.imread instead)
bmp = misc.imread('path/filename.bmp')
# Compute the gradient - VERY important to convert to floats!
grad = np.gradient(bmp.astype(float))
# Set the block size (superimposed grid on the sample image in the question)
blockRadius = 5
# Compute the orientation field. Result will be a matrix of angles in [0, \pi), one for each pixel in the original (grayscale) image.
O = np.zeros(bmp.shape)
for x in range(0, bmp.shape[0]):
    for y in range(0, bmp.shape[1]):
        numerator = 0.
        denominator = 0.
        # sum the squared gradients over the block (note the column bound uses shape[1])
        for i in range(max(0, x - blockRadius), min(bmp.shape[0], x + blockRadius)):
            for j in range(max(0, y - blockRadius), min(bmp.shape[1], y + blockRadius)):
                numerator = numerator + 2. * grad[0][i, j] * grad[1][i, j]
                denominator = denominator + (math.pow(grad[0][i, j], 2.) - math.pow(grad[1][i, j], 2.))
        # arctangent with the usual case analysis to keep the angle consistent
        if denominator == 0:
            O[x, y] = 0
        elif denominator > 0:
            O[x, y] = (1. / 2.) * math.atan(numerator / denominator)
        elif numerator >= 0:
            O[x, y] = (1. / 2.) * (math.atan(numerator / denominator) + math.pi)
        else:
            O[x, y] = (1. / 2.) * (math.atan(numerator / denominator) - math.pi)
# shift non-positive angles up by pi
for x in range(0, bmp.shape[0]):
    for y in range(0, bmp.shape[1]):
        if O[x, y] <= 0:
            O[x, y] = O[x, y] + math.pi
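To eyeball the result, a quick plotting sketch (not part of the original answer; it assumes matplotlib is available and that bmp and O come from the code above):
import matplotlib.pyplot as plt
step = 10                                    # subsample so the arrows stay readable
ys, xs = np.mgrid[0:bmp.shape[0]:step, 0:bmp.shape[1]:step]
ang = O[::step, ::step]
plt.imshow(bmp, cmap='gray')
# note: imshow puts the origin at the top-left, so the drawn angles appear mirrored vertically
plt.quiver(xs, ys, np.cos(ang), np.sin(ang), color='r', pivot='mid')
plt.show()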
Cheers!
I am developing an application where I am using SIFT + RANSAC and homography to find an object (OpenCV, C++/Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.
For this reason I would like to try what the author of SIFT said works pretty well: voting.
I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:
Location [x, y] (some call it translation)
Scale
Orientation
While with OpenCV it is easy to get a match's scale and orientation with:
cv::KeyPoint::octave
cv::KeyPoint::angle
I am having a hard time understanding how I can calculate the location.
I have found an interesting slide where with only one match we are able to draw a bounding box:
But I don't get how I could draw that bounding box with just one match. Any help?
You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.
Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1,y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2,y2), s2, and theta2 for f2.
The translation between two features is (dx,dy) = (x2-x1, y2-y1).
The scale change between two features is ds = s2 / s1.
The rotation between two features is d_theta = theta2 - theta1.
So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
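To make the voting concrete, here is a minimal Python sketch (not Dima's code; the filenames and bin counts are arbitrary choices, and it assumes an OpenCV build that provides cv2.SIFT_create):
import cv2
import numpy as np
img1 = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)   # hypothetical filenames
img2 = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
params = []
for m in matches:
    k1, k2 = kp1[m.queryIdx], kp2[m.trainIdx]
    params.append((k2.pt[0] - k1.pt[0],      # dx
                   k2.pt[1] - k1.pt[1],      # dy
                   k2.size / k1.size,        # ds (scale change)
                   k2.angle - k1.angle))     # d_theta (degrees)
params = np.array(params)
hist, edges = np.histogramdd(params, bins=(10, 10, 5, 8))   # coarse 4D Hough space
best_bin = np.unravel_index(np.argmax(hist), hist.shape)    # cell with the most votes
Matches whose parameters fall into the winning cell are the mutually consistent ones; those are the matches you would keep (or feed to a least-squares refinement).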
When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2,h/2) or with the help of central moments).
E.g. for 10 SIFT features found on the template, their relative positions according to template's centroid is a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image, matched with one of template's 10, casts a vote to its corresponding centroid.
votemap(feature.x - a, feature.y - b) += 1, where (a, b) is the offset stored for this particular feature.
If several of those features cast votes at (or near) the same point (clustering is essential), you have found an object instance.
Building the signature and voting are reverse procedures. Let's assume a stored offset V = (-20,-10). During the search in the novel image, when a match is found, we read its orientation and size and cast the corresponding vote. E.g. if the instance appears at half size and rotated by -10 degrees, the vote for the right box's centroid will be placed V' = (+20*0.5*cos(-10), +10*0.5*sin(-10)) away from the SIFT feature.
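A bare-bones sketch of that vote map (hypothetical names throughout: query_h and query_w are the query image size, and matched is assumed to yield, for each match, the query keypoint together with the stored centroid offset (a, b), the scale ratio s, and the relative rotation in degrees):
import numpy as np
votemap = np.zeros((query_h, query_w), dtype=np.int32)
for kp_query, (a, b), s, theta_deg in matched:
    t = np.deg2rad(theta_deg)
    # rotate and scale the stored offset before casting the vote
    ox = s * (a * np.cos(t) - b * np.sin(t))
    oy = s * (a * np.sin(t) + b * np.cos(t))
    cx, cy = int(round(kp_query.pt[0] - ox)), int(round(kp_query.pt[1] - oy))
    if 0 <= cx < query_w and 0 <= cy < query_h:
        votemap[cy, cx] += 1
# a strong peak (after smoothing or clustering nearby votes) marks a detected centroid
peak = np.unravel_index(np.argmax(votemap), votemap.shape)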
To complete Dima's answer, one needs to add that the 4D Hough space is quantized into a (possibly small) number of 4D boxes, where each box corresponds to the similarity given by its center.
Then, for each possible similarity obtained from a tentative matching of features, add 1 to the corresponding box (or cell) in the 4D space. The output similarity is given by the cell with the most votes.
To compute the transform from a single match, just use Dima's formulas from his answer. For several pairs of matches, you may need a least-squares fit.
Finally, the transform can be applied with the function cv::warpPerspective(), with the third row of the perspective matrix set to [0,0,1].
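For example, given the dx, dy, ds and d_theta recovered from the winning cell, one way to pack them into such a matrix and warp image 1 onto image 2 (a sketch only; here the rotation and scaling are taken about the image origin, so in practice you would compose extra translations to rotate about the matched feature or the image centre):
import cv2
import numpy as np
t = np.deg2rad(d_theta)                       # d_theta assumed to be in degrees
M = np.array([[ds * np.cos(t), -ds * np.sin(t), dx],
              [ds * np.sin(t),  ds * np.cos(t), dy],
              [0.0,             0.0,            1.0]])   # third row fixed to [0, 0, 1]
warped = cv2.warpPerspective(img1, M, (img2.shape[1], img2.shape[0]))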
I need an algorithm written in any language to find an image inside of an image, including at different scales. Does anyone know a starting point to solving a problem like this?
For example:
I have an image of 800x600 and in that image is a yellow ball measuring 180 pixels in circumference. I need to be able to find this image with a search pattern of a yellow ball having a circumference of 15 pixels.
Thanks
Here's an algorithm:
Split the image into RGB and take the blue channel. You will notice that areas that were yellow in the color image are now dark in the blue channel. This is because blue and yellow are complementary colors.
Invert the blue channel
Create a greyscale search pattern with a circle that's the same size as what's in the image (180 pixels in circumference). Make it a white circle on a black background.
Calculate the cross-correlation of the search pattern with the inverted blue channel.
The cross-correlation peak will correspond to the location of the ball.
Here's the algorithm in action (the images in the original post show the RGB image with its R channel, the G and B channels, and then the inverted B channel next to the search pattern):
Python + OpenCV code:
import cv
# NOTE: this uses the legacy OpenCV 1.x 'cv' API; see the cv2 version further below
if __name__ == '__main__':
    image = cv.LoadImage('ball-b-inv.png')
    template = cv.LoadImage('ball-pattern-inv.png')
    image_size = cv.GetSize(image)
    template_size = cv.GetSize(template)
    result_size = [s[0] - s[1] + 1 for s in zip(image_size, template_size)]
    result = cv.CreateImage(result_size, cv.IPL_DEPTH_32F, 1)
    cv.MatchTemplate(image, template, result, cv.CV_TM_CCORR)
    min_val, max_val, min_loc, max_loc = cv.MinMaxLoc(result)
    print max_loc
Result:
misha@misha-desktop:~/Desktop$ python cross-correlation.py
(72, 28)
This gives you the top-left co-ordinate of the first occurrence of the pattern in the image. Add the radius of the circle to both the x and y co-ordinates if you want to find the center of the circle.
You should take a look at OpenCV, an open source computer vision library - this would be a good starting point. Specifically check out object detection and the cvMatchTemplate method.
A version of one of the previous posts, made with OpenCV 3 and Python 3:
import cv2
import sys
res = cv2.matchTemplate(cv2.imread(sys.argv[1]), cv2.imread(sys.argv[2]), cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
print(max_loc)
save as file.py and run as:
python file.py image pattern
A simple starting point would be the Hough transform, if you want to find circles.
However, there is a whole research area around this subject, called object detection and recognition. The state of the art has advanced significantly over the past decade.
I'm trying to use OpenCV to "parse" screenshots from the iPhone game Blocked. The screenshots are cropped to look like this:
I suppose for right now I'm just trying to find the coordinates of each of the 4 points that make up each rectangle. I did see the sample file squares.c that comes with OpenCV, but when I run that algorithm on this picture, it comes up with 72 rectangles, including the rectangular areas of whitespace that I obviously don't want to count as one of my rectangles. What is a better way to approach this? I tried doing some Google research, but for all of the search results, there is very little relevant usable information.
A similar issue has already been discussed:
How to recognize rectangles in this image?
As for your data, the rectangles you are trying to find are the only black objects. So you can try a threshold binarization: black pixels are those whose three RGB values are ALL less than 40 (a value I found empirically). This simple operation makes your picture look like this:
After that you could apply the Hough transform to find lines (as discussed in the topic I referred to), or you can do something simpler. Compute integral projections of the black pixels onto the X and Y axes. (The projection onto X is a vector whose entry for x_i is the number of black pixels with x-coordinate equal to x_i.) The candidate x and y values are then the peaks of these projections. Next, look through all possible segments bounded by the found x and y values: if there are a lot of black pixels between (x_i, y_j) and (x_i, y_k), there is probably a line there. Finally, compose the line segments into rectangles!
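A short sketch of that binarization plus the integral projections (the filename is hypothetical and the 0.5-of-maximum peak cut-off is an arbitrary choice):
import cv2
import numpy as np
img = cv2.imread('screenshot.png')
black = np.all(img < 40, axis=2).astype(np.uint8)   # all three channels below the empirical threshold
proj_x = black.sum(axis=0)      # number of black pixels in each column
proj_y = black.sum(axis=1)      # number of black pixels in each row
xs = np.where(proj_x > 0.5 * proj_x.max())[0]       # candidate x positions of rectangle edges
ys = np.where(proj_y > 0.5 * proj_y.max())[0]       # candidate y positions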
Here's a complete Python solution. The main idea is:
Apply pyramid mean shift filtering to help threshold accuracy
Otsu's threshold to get a binary image
Find contours and filter using contour approximation
Each detected rectangle contour is drawn onto the image (result images omitted here):
import cv2

image = cv2.imread('1.png')
# pyramid mean shift filtering smooths colour regions and helps the threshold
blur = cv2.pyrMeanShiftFiltering(image, 11, 21)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    # keep contours whose polygonal approximation has exactly 4 vertices
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.015 * peri, True)
    if len(approx) == 4:
        x, y, w, h = cv2.boundingRect(approx)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 2)

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()
I wound up building on my original method and doing as Robert suggested in his comment on my question. After I get my list of rectangles, I run through them and calculate the average color over each rectangle. I check whether the red, green, and blue components of the average color are each within 10% of the gray and blue rectangle colors; if they are, I keep the rectangle, and if they aren't, I discard it. This process gives me something like this:
From this, it's trivial to get the information I need (orientation, starting point, and length of each rectangle, considering the game window as a 6x6 grid).
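Roughly what that per-rectangle colour check looks like (a sketch only; rects is the rectangle list from the earlier step, and the two reference colours are placeholder values, not the game's actual colours):
import cv2
import numpy as np
gray_ref = np.array([120, 120, 120], dtype=float)   # placeholder BGR reference colours
blue_ref = np.array([200, 120, 40], dtype=float)
kept = []
for (x, y, w, h) in rects:
    roi = image[y:y + h, x:x + w]
    mean_bgr = np.array(cv2.mean(roi)[:3])
    # keep the rectangle if every channel is within 10% of one of the reference colours
    if any(np.all(np.abs(mean_bgr - ref) <= 0.1 * ref) for ref in (gray_ref, blue_ref)):
        kept.append((x, y, w, h))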
The blocks look like bitmaps - why don't you use simple template matching with different templates for each block size/color/orientation?
Since your problem is the small rectangles, I would start by removing them.
Those lines are much thinner than the borders of the rectangles, so I would apply morphological operations to the image,
using a structuring element that looks like this:
element = [ 1 1
            1 1 ]
This should remove lines that are less than two pixels wide. After the small lines are removed, OpenCV's rectangle-finding algorithm will most likely do the rest of the job for you.
The erosion can be done in OpenCV with the function cvErode.
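In the newer cv2 Python API, that erosion with the 2x2 structuring element would look roughly like this (filename hypothetical):
import cv2
import numpy as np
img = cv2.imread('screenshot.png', cv2.IMREAD_GRAYSCALE)
element = np.ones((2, 2), np.uint8)       # the 2x2 structuring element shown above
cleaned = cv2.erode(img, element)         # removes lines thinner than two pixels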
Try one of the many corner detectors, like the Harris corner detector. It is also generally a good idea to try it at multiple resolutions, so do some preprocessing at varying magnifications.
It appears that you want a square dominated by a particular color. You could suppress the other colors by first using something like cvSplit and then thresholding on the color, so that only that region remains; follow that with a cropping operation. I think that could work as well.