Searching for objects from a database in an image - image-processing

Let's suppose I have a database with thousands of images of different shapes and sizes (smaller than 100 x 100 px), and it's guaranteed that each image shows exactly one object - a symbol, logo, road sign, etc. I would like to be able to take any image ("my_image.jpg") from the Internet and answer the question "Does my_image contain any object (the object can be resized, but not deformed) from my database?" - let's say with 95% reliability. To simplify things, my_images will have a white background.
I was trying to use imagehash (https://github.com/JohannesBuchner/imagehash), which would be very helpful, but to get rewarding results I think I would have to calculate (almost) every possible hash of my_image, because I don't know the object's size and location in my_image:
import imagehash
from PIL import Image

hash_list = []
MyImage = Image.open('my_image.jpg')
image_width, image_height = MyImage.size
for x_start in range(image_width):
    for y_start in range(image_height):
        for x_end in range(x_start, image_width):
            for y_end in range(y_start, image_height):
                # crop() expects a single (left, upper, right, lower) box
                hash_list.append(imagehash.phash(
                    MyImage.crop((x_start, y_start, x_end, y_end))))
...and then try to find a similar hash in the database, but when, for example, image_width = image_height = 500, these loops and the search will take ages. Of course I can optimize it a little bit, but it still looks like seppuku for bigger images:
MIN_WIDTH = 30
MIN_HEIGHT = 30
STEP = 2

hash_list = []
MyImage = Image.open('my_image.jpg')
for x_start in range(0, image_width - MIN_WIDTH, STEP):
    for y_start in range(0, image_height - MIN_HEIGHT, STEP):
        for x_end in range(x_start + MIN_WIDTH, image_width, STEP):
            for y_end in range(y_start + MIN_HEIGHT, image_height, STEP):
                hash_list.append(...)
I wonder if there is some nice way to decide which parts of my_image are worth computing hashes for - for example, crops that cut through edges look like a bad idea. And maybe there is an easier solution? It would be great if the program could give an answer in at most 20 minutes. I would be grateful for any advice.
PS: sorry for my English :)

This looks like an image retrieval problem to me. However, in your case you are more interested in a binary YES / NO answer that tells you whether the input image (my_image.jpg) shows an object which is present in your database.
The first thing I can suggest is that you resize all the images (including the input) to a fixed size, say 100 x 100. But if an object in some image is very small, or is present only in a specific region of the image (e.g., the top left), then resizing can make things worse. However, it was not clear from your question how likely this is in your case.
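For example, a minimal resize step with PIL (which the question already uses); the 100 x 100 target here is just the value suggested above:

from PIL import Image

def to_fixed_size(path, size=(100, 100)):
    # Resize to a common size; note that this does not preserve the aspect ratio
    return Image.open(path).convert('L').resize(size)

query = to_fixed_size('my_image.jpg')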
Regarding your second question about finding the object's location: I think you were considering this because your input images are large, e.g. 500 x 500? If so, then resizing is the better idea. However, if you asked because objects are localized to particular regions of the images, then you can compute a gradient image to identify background regions: since the background has no variation (it is completely white), the gradient values will be zero for pixels belonging to background regions.
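A minimal sketch of that gradient idea, assuming a plain white background and using only NumPy and PIL (the threshold value of 5.0 is a guess you would have to tune):

import numpy as np
from PIL import Image

img = np.asarray(Image.open('my_image.jpg').convert('L'), dtype=float)

# Gradient magnitude; a flat white background gives values close to zero
gy, gx = np.gradient(img)
magnitude = np.hypot(gx, gy)

# Rows/columns that contain any non-background pixel (assumes at least one exists)
mask = magnitude > 5.0
rows = np.any(mask, axis=1)
cols = np.any(mask, axis=0)
y0, y1 = np.where(rows)[0][[0, -1]]
x0, x1 = np.where(cols)[0][[0, -1]]

# Rough bounding box of the foreground object
foreground = img[y0:y1 + 1, x0:x1 + 1]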
Rather than calculating and using image hashes, I suggest you read about bag-of-visual-words based approaches (for e.g., here) for object categorization. Although your aim is not to categorize objects, it will help you come up with a different approach to your problem.
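For illustration only, a very rough sketch of the bag-of-visual-words pipeline, with ORB descriptors standing in for SIFT and OpenCV's k-means building the vocabulary (the vocabulary size and the file names are made up):

import cv2
import numpy as np

orb = cv2.ORB_create()

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des if des is not None else np.empty((0, 32), np.uint8)

# 1. Collect descriptors from all database images (paths are hypothetical)
all_des = np.vstack([descriptors(p) for p in ['db/img001.png', 'db/img002.png']])

# 2. Cluster them into a visual vocabulary of K "words"
K = 50
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(all_des.astype(np.float32), K, None,
                                criteria, 5, cv2.KMEANS_RANDOM_CENTERS)

# 3. Represent any image as a normalized histogram of its nearest visual words
def bow_histogram(path):
    des = descriptors(path).astype(np.float32)
    dists = np.linalg.norm(des[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist, _ = np.histogram(words, bins=np.arange(K + 1))
    return hist / max(hist.sum(), 1)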

In the end I found a solution that looks really nice to me, and maybe it will be useful for someone else:
I'm using SIFT to detect "best candidates" from my_image:
import numpy as np
import imutils
from operator import itemgetter

def multiscale_template_matching(template, image):
    results = []
    for scale in np.linspace(0.2, 1.4, 121)[::-1]:
        res = imutils.resize(image, width=int(image.shape[1] * scale))
        r = image.shape[1] / float(res.shape[1])
        if res.shape[0] < template.shape[0] or res.shape[1] < template.shape[1]:
            break
        ## bigger correlation <==> better matching
        ## template_matching uses SIFT to return the best correlation and coordinates
        correlation, (x, y) = template_matching(template, res)
        coordinates = (x * r, y * r)
        results.append((correlation, coordinates, r))
    results.sort(key=itemgetter(0), reverse=True)
    return results[:10]
Then, for the returned candidates, I'm calculating hashes:
ACCEPTABLE = 10

def find_best(image, template, candidates):
    template_hash = imagehash.phash(template)
    best_result = 50  ## initial value must be greater than ACCEPTABLE
    best_cand = None
    for cand in candidates:
        cand_hash = get_hash(...)
        hash_diff = template_hash - cand_hash
        if hash_diff < best_result:
            best_result = hash_diff
            best_cand = cand
    if best_result <= ACCEPTABLE:
        return best_cand, best_result
    else:
        return None, None
If the result is within ACCEPTABLE, I'm almost sure the answer is "GOT YOU!" :) This solution allows me to compare my_image with 1,000 objects in 7 minutes.

Related

Gabor filter parameters for fingerprint image enhancement?

I am a beginner in image processing and in Gabor filters, and I want to use this filter to enhance a fingerprint image.
I have read many articles about fingerprint image enhancement and I know the steps are:
read image -> normalize -> get orientation map -> Gabor filter -> binarize -> skeleton
Now I am at step 4. My question is how to get the right values of lambda and gamma for the Gabor filter.
my image :
my code :
1- read image and get the orientation map using HOG features
import cv2
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog

imgc = imread(r'C:\Users\iP\Desktop\printe.jpg', as_gray=True)
imgc = resize(imgc, (64*3, 128*3))
rows, cols = imgc.shape
offset = 24
ori = 9  # to get angles (0, 45, 90, 135) only
fd, hog_image = hog(imgc, orientations=ori, pixels_per_cell=(offset, offset),
                    cells_per_block=(1, 1), visualize=True, multichannel=None,
                    feature_vector=False)
orientation map :
2- Reshape the orientation map from (8, 16, 1, 1, 9) to (8, 16, 9):
8 -> rows, 16 -> cols, 9 -> orientations
fd = np.array(fd)
fd = np.reshape(fd, (fd.shape[0], fd.shape[1], ori))

# from (8, 16, 9) to (8, 16, 1)
# Choose the angle that has the most potential (biggest magnitude)
angels = np.zeros((fd.shape[0], fd.shape[1], 1))
for r in range(fd.shape[0]):
    for c in range(fd.shape[1]):
        bloc_prop = fd[r, c]
        angelss = bloc_prop.reshape((1, ori))
        angel = np.argmax(angelss)
        angels[r, c] = angel
angels = angels.astype(np.int32)
3- the convolve function
def conv_gabor(img, orient_map, gabor_kernel_shape):
    # loop over all pixels in the image and convolve each one with the kernel
    # for its angle in the orientation map
    roo, coo = img.shape
    # padding value for the image before convolving it with the kernels
    pad = gabor_kernel_shape - 1
    padded = np.zeros((img.shape[0] + pad, img.shape[1] + pad))  # add the extra rows and cols
    padded[int(pad/2):-int(pad/2), int(pad/2):-int(pad/2)] = img  # copy the image inside the padded one
    # result image
    dst = padded.copy()
    # start from the image inside the padding
    for r in range(int(pad/2), int(pad/2) + roo):
        for c in range(int(pad/2), int(pad/2) + coo):
            # get the angle from the orientation map
            ro = (r - int(pad/2)) // offset
            co = (c - int(pad/2)) // offset
            ang = angels[ro, co]
            real_angel = (180 / ori) * ang
            # block around the pixel to convolve
            block = padded[r-int(pad/2):r+int(pad/2)+1, c-int(pad/2):c+int(pad/2)+1]
            # get the Gabor kernel
            # here is my question ->> how to get the parameter values for lambda, gamma and psi
            ker = cv2.getGaborKernel((gabor_kernel_shape, gabor_kernel_shape), 3,
                                     np.deg2rad(real_angel), np.pi/4, 0.001, 0)
            dst[r, c] = np.sum(ker * block)
    return dst
dst=conv_gabor(imgc,angels,11)
dst :
You can see the image is very bad and I don't know why; I think it's because of the lambda and gamma values, or something else?
But when I filter with only one angle (45):
ker= cv2.getGaborKernel( (11,11), 2, np.deg2rad(45),np.pi/4,0.5,0 )
filt = cv2.filter2D(imgc,cv2.CV_64F,ker)
plt.imshow(filt,'gray')
result:
You can see that the edges oriented at 45 degrees on the left are of good quality.
Can anyone help me please, and tell me what I should do about this problem?
Thanks all :)
EDIT:
I searched for another way and found that I can use a Gabor filter bank with many orientations and take the best score among the filtered images. So how can I find the best score per pixel from the filtered images?
This is the output when I use a Gabor filter bank with angles 45, 60, 65, 90 and 135, divide the filtered images into 16x16 blocks, and for each block pick the filtered image with the highest standard deviation (best score - I use standard deviation as the score):
As you can see, there are good and bad parts in the image. I think using standard deviation alone is ineffective in some parts of the image, so my new question is: what score function would give me good output in all parts of the image?
original image :
In my opinion, weighting the filtered images might be enough for your task. Considering your filter orientations, the filters with angles 45 and 135 respond quite well in different regions of the image. So you can calculate a weighted sum to get the best filter result.
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('fingerprint.jpg', 0)
w_45 = 0.5
w_135 = 0.5
img_45 = cv2.filter2D(img, cv2.CV_64F, cv2.getGaborKernel((11, 11), 2, np.deg2rad(45), np.pi/4, 0.5, 0))
img_135 = cv2.filter2D(img, cv2.CV_64F, cv2.getGaborKernel((11, 11), 2, np.deg2rad(135), np.pi/4, 0.5, 0))
result = img_45*w_45 + img_135*w_135
result = result/np.amax(result)*255
plt.imshow(result, cmap='gray')
plt.show()
Feel free to play with the weights. The result totally depends on what your next step is.
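If you would rather let each pixel pick its own orientation instead of using fixed weights, here is a minimal sketch of a per-pixel maximum over a small filter bank; the angles and kernel parameters are simply the ones already used in this thread, not tuned values:

import cv2
import numpy as np

img = cv2.imread('fingerprint.jpg', 0)

# Filter with every orientation in the bank
angles = [45, 60, 65, 90, 135]
responses = [cv2.filter2D(img, cv2.CV_64F,
                          cv2.getGaborKernel((11, 11), 2, np.deg2rad(a), np.pi/4, 0.5, 0))
             for a in angles]

# For every pixel, keep the strongest response across the bank
stack = np.stack(responses, axis=0)
best = stack.max(axis=0)
best = best / np.amax(best) * 255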

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and then label that difference as a 0 or 1. My question is: what is the best way to compute this difference? I know about cv2.absdiff and normal subtraction of images - what is the most effective way to go about this?
About the data: The images I'm training on are screenshots that are usually the same but may have small differences. I found that normal subtraction seems to show the differences less clearly than absdiff.
This is the code I use for absdiff:
import cv2
import numpy as np

diff = cv2.absdiff(img1, img2)
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask > th
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self, imageA, imageB, image_name, path):
    subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
    mask = cv2.inRange(np.abs(subtract), (30, 30, 30), (255, 255, 255))
    th = 1
    imask = mask > th
    canvas = np.zeros_like(imageA, np.uint8)
    canvas[imask] = imageA[imask]
Thanks!
A difference can be negative or positive.
For some number types, such as uint8 (unsigned 8-bit int), which can't be negative (have no sign), a negative value wraps around and the value would make no sense anymore. Other types can be signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are okay to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a-b should be -2, right?
That value, as a uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute of the difference (-2), which is 2.
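A tiny sketch that reproduces this with NumPy and OpenCV (the pixel values are made up):

import cv2
import numpy as np

a = np.array([[4]], dtype=np.uint8)
b = np.array([[6]], dtype=np.uint8)

print(a - b)               # [[254]] -> uint8 wrap-around, not a meaningful difference
print(cv2.subtract(a, b))  # [[0]]   -> OpenCV saturates negative results to 0
print(cv2.absdiff(a, b))   # [[2]]   -> the absolute difference you usually want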

Extracting Semi-structured Text from a complex UI (Golf Simulator)

I'm fairly new to the world of OCR, OpenCV, Tesseract etc. and was hoping to get some advice or a nudge in the right direction for a project I'm working on. For context, I practice golf at an indoor simulator that is powered by Full Swing Golf. My goal is to build an app (preferably iPhone, but desktop is fine too) that will be able to grab the data provided by the simulator and process it however I'd like. The overall workflow would look something like:
Set up iPhone or laptop camera to watch the simulator screen.
Hit ball
Statistics Screen is displayed that looks more or less like:
Detect that the Statistics Screen has been displayed and grab all relevant data:
| Distance | Launch | Back Spin | Club Speed | Carry | To Pin | Direction | Ball Speed | Side Spin | Club Face | Club Path |
|----------|--------|-----------|------------|-------|--------|-----------|------------|-----------|-----------|-----------|
| 345 | 13 | 3350 | 135 | 335 | 80 | 2.4 | 190 | 350 | 4.3 | 1.6 |
5-?: Save the data to my app, keep track of it over time etc...
Attempts So Far:
It seemed like OpenCV's matchTemplate would be a simple way to find all of the headings in the image (Distance, Launch, etc.), and it does seem to work when the image and template are both at the perfect resolution. However, as this will be an iPhone app, the quality is not something I can really guarantee (within reason). Moreover, the screen will almost never be straight-on as it appears above. Most likely, the camera will be off to the side and we will have to de-skew accordingly. I've attempted to use the following image to work on my deskewing logic, to no avail:
Finding the reference points in order to deskew via getPerspectiveTransform and warpPerspective has proven to be incredibly difficult due to the above issues with matching templates.
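For reference, the deskew step itself seems straightforward once four reference points are found; this is roughly what I had in mind, with the corner coordinates hard-coded purely for illustration:

import cv2
import numpy as np

img = cv2.imread('simulator_frame.jpg')  # hypothetical frame from the camera

# Four detected corners of the statistics screen, in source-image coordinates
# (placeholder numbers - finding these reliably is the hard part)
src = np.float32([[412, 180], [1485, 210], [1460, 905], [390, 870]])

# Where those corners should land in the deskewed output
w, h = 1280, 800
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

M = cv2.getPerspectiveTransform(src, dst)
deskewed = cv2.warpPerspective(img, M, (w, h))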
I've also tried dynamically adjusting for scale with code resembling the following:
def findTemplateLocation(image_path):
    template = cv2.imread(image_path)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    w, h = template.shape[::-1]
    threshold = 0.65
    loc = []
    for scale in np.linspace(0.1, 2, 20)[::-1]:
        resized = imutils.resize(template, width=int(template.shape[1] * scale))
        w, h = resized.shape[::-1]
        # image_gray is the grayscale frame being searched (loaded elsewhere)
        res = cv2.matchTemplate(image_gray, resized, cv2.TM_CCOEFF_NORMED)
        loc = np.where(res >= threshold)
        if len(list(zip(*loc[::-1]))) > 0:
            break
    if loc and len(list(zip(*loc[::-1]))) > 0:
        adjusted_w = int(w/scale)
        adjusted_h = int(h/scale)
        print(str(adjusted_w) + " " + str(adjusted_h) + " " + str(scale))
        ret = []
        for pt in zip(*loc[::-1]):
            ret.append({'width': w, 'height': h, 'location': pt})
        return ret
    return None
This still returns a ton of false positives.
I'm hoping to get some advice on how to approach this problem with a clean slate. I'm open to any language / workflow.
If it does seem that I'm on the right track, my current code is at https://gist.github.com/naderhen/9ec8d45f13d92507131d5bce0e84fad8 . Would really appreciate any suggestions for the best next steps.
Thanks for any help you can provide!
EDIT: Additional Resources
I've uploaded a number of videos and still photos from my time at the indoor simulator this weekend: https://www.dropbox.com/sh/5vub2mi4rvunyaw/AAAY1_7Q_WBV4JvmDD0dEiTDa?dl=0
I tried to get a number of different angles, with different lighting etc. Please let me know if I can provide any other resources that may help.
So, I tried two different methods:
Contour detection - This seemed to be the most obvious method, since the statistics screen is the primary part of the image and is present in all your images. Although it does work with two of the three images, it might not be very robust with respect to the parameter choices. Here are the steps I tried for the contour approach:
First, get the image in grayscale or take the Value channel in HSV. Then, threshold the image using either Otsu or adaptive thresholding. After playing with a lot of the associated parameters, I got satisfactory results, which basically means the whole statistics screen appears in white on a black background. After this, sort the contours like this:
contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[1]
# Sort the contours to avoid unnecessary comparison in the for loop below
cntsSorted = sorted(contours, key=lambda x: cv2.contourArea(x), reverse=True)
for cnt in cntsSorted[0:20]:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    if len(approx) == 4 and peri > 10000:
        cv2.drawContours(sorted_image, cnt, -1, (0, 255, 0), 10)
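For reference, the preprocessing that produces thresh above could look roughly like this; it is only a sketch, and the blur size and the choice of Otsu are starting points rather than tuned values:

import cv2

img = cv2.imread('stats_screen.jpg')          # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # or take the V channel of HSV
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu picks the threshold automatically; adaptive thresholding is the alternative
_, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)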
Feature Detection and Matching: Since using contours wasn't robust enough, I tried another method which I've worked on for a similar problem. This method is fairly robust and much faster (I tried it on an Android phone 2 years back and it would do the job in less than a second for a 1280 x 760 image). However, after trying it on your cases, I figured that your images vary quite a bit. What I mean is that the two images in your question have fairly similar content and it works for those, but the images you posted in the comments are very different from them, so it doesn't find a suitable number of good matches (at least 10 in my case). If you can post a nice set of images that you will actually encounter, I will update this answer with my results on that set.
More importantly, the images of the scene evidently have a change in perspective, which shouldn't be an issue assuming you are able to get a very good source image (like the first one in your question). However, the change in lighting conditions can be a pain. I'd suggest using different color spaces such as HSV, Lab and Luv instead of BGR.
Here is where you can find a working example of how to implement your own feature matcher. There are some code changes required depending on the version of OpenCV you are using but I am sure you can find the solutions ( I did ;) ).
A good example:
Some suggestions:
Try getting as clean an image as possible for the image you are using to match with the others (your first image in my case). Hopefully, this would require you to do less processing.
Try using an unsharp mask before finding keypoints.
My results are from using ORB. You can also try with other detectors/descriptors like SURF, SIFT and FAST.
Finally, your approach of template matching should work in cases where there is a change only in the scaling and not the perspective.
Hope this helps! Write a comment if you have any additional questions and/or when you have a good image set ready (rubs palms). Cheers!
Edit 1: This is the code that I used for the feature detection and matching in OpenCV 3.4.3 and Python 3.4
import cv2
import numpy as np

def unsharp_mask(im):
    # This is used to sharpen images
    gaussian_3 = cv2.GaussianBlur(im, (3, 3), 3.0)
    return cv2.addWeighted(im, 2.0, gaussian_3, -1.0, 0, im)

def screen_finder2(image, source, num=0):
    def resize(im, new_width):
        r = float(new_width) / im.shape[1]
        dim = (new_width, int(im.shape[0] * r))
        return cv2.resize(im, dim, interpolation=cv2.INTER_AREA)

    width = 300
    source = resize(source, new_width=width)
    image = resize(image, new_width=width)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2LUV)
    image, u, v = cv2.split(hsv)
    hsv = cv2.cvtColor(source, cv2.COLOR_BGR2LUV)
    source, u, v = cv2.split(hsv)
    MIN_MATCH_COUNT = 10
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(source, None)
    flann = cv2.DescriptorMatcher_create(cv2.DescriptorMatcher_FLANNBASED)
    # Without the below 2 lines, matching doesn't work
    des1 = np.asarray(des1, dtype=np.float32)
    des2 = np.asarray(des2, dtype=np.float32)
    matches = flann.knnMatch(des1, des2, k=2)
    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)
    if len(good) >= MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()
        h, w = image.shape
        pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        source_bgr = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
        img2 = cv2.polylines(source_bgr, [np.int32(dst)], True, (0, 0, 255), 3,
                             cv2.LINE_AA)
        cv2.imwrite("out" + str(num) + ".jpg", img2)
    else:
        print("Not enough matches." + str(len(good)))
        matchesMask = None
    draw_params = dict(matchColor=(0, 255, 0),  # draw matches in green color
                       singlePointColor=None,
                       matchesMask=matchesMask,  # draw only inliers
                       flags=2)
    img3 = cv2.drawMatches(image, kp1, source, kp2, good, None, **draw_params)
    cv2.imwrite("ORB" + str(num) + ".jpg", img3)

match_image = unsharp_mask(cv2.imread("source.jpg"))
image_1 = unsharp_mask(cv2.imread("Screen_1.jpg"))
screen_finder2(match_image, image_1, num=1)

OpenCV: Essential Matrix Decomposition

I am trying to extract the rotation matrix and translation vector from the essential matrix.
SVD svd(E, SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; // or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); // or -svd_u.col(2)
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or other obviously wrong outputs), even though I tried the different combinations of possible R and T.
I suspect E itself. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done in the right way?
If the calculation is right, why are the internal constraints of the essential matrix not satisfied by the results?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I assume is correct. I drew the epipolar lines on both the left and right images and they all pass through the corresponding points (for all 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100. That sounds small, but translated into pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
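A minimal Python/OpenCV sketch of both suggestions, i.e. projecting E onto the essential manifold and then letting cv2.recoverPose pick the physically plausible (R, t) via the cheirality check; the matched points pts1/pts2 and the camera matrix K are assumed to come from your fundamental-matrix step:

import cv2
import numpy as np

# E, pts1, pts2 and the camera matrix K are assumed to exist already
# (pts1/pts2 are the Nx2 arrays of matched image points used to estimate F/E)

# 1. Re-project E onto the essential manifold: force singular values (1, 1, 0)
U, s, Vt = np.linalg.svd(E)
E_ideal = U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# 2. Of the four possible (R, t) decompositions, recoverPose keeps the one that
#    puts the triangulated points in front of both cameras
retval, R, t, mask = cv2.recoverPose(E_ideal, pts1, pts2, K)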

Blockproc in MATLAB with two output variables

I have the following problem. I have to compute dense SIFT interest points in a very high-resolution image (182 MP). When I run the code on the full image, MATLAB always closes suddenly, so I decided to run the code on image patches.
the code
I tried to use blockproc in MATLAB to call the C++ function that performs the dense SIFT interest point detection this way:
fun = @(block_struct) denseSIFT(block_struct.data, options);
[dsift, infodsift] = blockproc(ndvi, [1000 1000], fun);
where dsift contains the SIFT descriptors (vectors) and infodsift has the information about the interest points, such as the x and y coordinates.
the problem
The problem is that blockproc only allows one output, but I want both outputs. MATLAB gives the following error when I run the code:
Error using blockproc
Too many output arguments.
Is there a way for me to do this?
Would it be a problem for you to "hard code" a version of blockproc?
Assuming for a moment that you can divide your image into NxM smaller images, you could loop around as follows:
bigImage = someFunction();
sz = size(bigImage);
smallSize = sz ./ [N M];

dsift = cell(N,M);
infodsift = cell(N,M);

for ii = 1:N
    for jj = 1:M
        smallImage = bigImage((ii-1)*smallSize(1) + (1:smallSize(1)), (jj-1)*smallSize(2) + (1:smallSize(2)));
        [dsift{ii,jj}, infodsift{ii,jj}] = denseSIFT(smallImage, options);
    end
end
The results will then be in the two cell arrays. No real need to pre-allocate, but it's tidier if you do. If the individual matrices are the same size, you can convert into a single large matrix with
dsiftFull = cell2mat(dsift);
Almost magic. This won't work if your matrices are different sizes - but then, if they are, I'm not sure you would even want to put them all in a single one (unless you decide to horzcat them).
If you do decide you want a list of "all the columns as a giant matrix", then you can do
giantMatrix = [dsift{:}];
This will return a matrix with (in your example) 128 rows, and as many columns as there were "interest points" found. It's shorthand for
giantMatrix = [dsift{1,1} dsift{2,1} dsift{3,1} ... dsift{N,M}];
