Scale and multiscale space Theory [closed] - image-processing

First of all, this theory confuses me; could someone explain it in a few words?
Also, what does the word "scale" mean in a computer vision context? Does it mean the various sizes of objects, or the various units of measurement (metres, centimetres, etc.), or, as I suspect, the various degrees of smoothing/blurring applied to the same image of interest?
Second, multiple scales of an image are made with a smoothing/blurring operator; the one I know is the Gaussian blur. Why apply a number of smoothings to the same image? What is the point of producing several smoothed images that differ in detail/resolution but not in size for the same scene (i.e. why not apply the smoothing operator once to the image of interest at 256x256 and another time at 512x512)?
I am asking in the context of feature extraction and description.
I would be thankful if someone could clarify the subject for me; sorry for my language!

"Scale" here alludes to both the size of the image as well as the size of the objects themselves... at least for current feature detection algorithms. The reason why you construct a scale space is because we can focus on features of a particular size depending on what scale we are looking at. The smaller the scale, the coarser or smaller features we can concentrate on. Similarly, the larger the scale, the finer or larger features we can concentrate on.
You do all of this on the same image because it is a common pre-processing step for feature detection. The whole point of feature detection is to be able to detect features over multiple scales of the image; you only output those features that are reliable across all of the different scales. This is actually the basis of the Scale-Invariant Feature Transform (SIFT), where one of the objectives is to robustly detect keypoints that can be found across multiple scales of the image.
To create multiple scales, you decompose the image by repeatedly subsampling it and blurring each subsampled result with a Gaussian filter. This is what is known as a scale space. A typical scale space looks like a stack of progressively blurred, progressively smaller copies of the original image.
The choice of a Gaussian filter is fundamental to the way the scale space works. At each scale, you can think of the image produced as a more "simplified" version of the one from the previous scale. Typical blurring filters introduce new spurious structures that do not correspond to simplifications of structures present at the finer scales. I won't go into the details, but there is a whole body of scale space theory showing that scale space construction using the Gaussian blur is the canonical way to do this, because new structures are not created when going from a fine scale to any coarser scale. The Wikipedia article on scale space covers this in more detail.
Now, traditionally a scale space is created by convolving your image with Gaussian filters of various standard deviations, and the Wikipedia article has a nice pictorial representation of that. However, more recent feature detection algorithms like SURF and SIFT combine blurring with different standard deviations and subsampling of the image, which is what I described at the beginning of this post.
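As a concrete illustration, here is a minimal sketch in Python with OpenCV of how such a blur-and-subsample scale space could be built. The parameter values (sigma0 = 1.6, three scales per octave) are common defaults borrowed from SIFT-style implementations, not anything prescribed above, and the file name is a placeholder:

```python
import cv2
import numpy as np

def gaussian_scale_space(image, num_octaves=4, scales_per_octave=3, sigma0=1.6):
    """Build a simple Gaussian scale space: blur within each octave,
    then subsample by 2 to seed the next, coarser octave."""
    octaves = []
    base = image.astype(np.float32)
    k = 2.0 ** (1.0 / scales_per_octave)  # sigma multiplier between scales
    for _ in range(num_octaves):
        octave = []
        for s in range(scales_per_octave):
            sigma = sigma0 * (k ** s)
            # ksize=(0, 0) lets OpenCV derive the kernel size from sigma
            octave.append(cv2.GaussianBlur(base, (0, 0), sigma))
        octaves.append(octave)
        base = octave[-1][::2, ::2]  # subsample the most-blurred image
    return octaves

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
space = gaussian_scale_space(img)
```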
Either way, check out that Wikipedia article for more details; it covers this material in more depth than I have here.
Good luck!

Related

OpenCV compare similar hand drawn images

I am trying to compare two monochrome, basic hand-drawn images, captured electronically. The scale may be different but the essence of the images is the same. I want to compare one hand-drawn image to a saved library of images and get a relative score of how similar they are. Think of several basic geometric shapes, lines, and curves that make up a drawing.
I have tried several techniques without much luck. Pixel-based comparisons are too exact. I have tried scaling and cropping the images, and that did not produce accurate results.
I have tried OpenCV with C# and have had a little success. I have experimented with SURF, and it works for a few images, but not for others that the eye can tell are very similar.
So now my question: are there any examples of using OpenCV or commercial software that can support comparing drawings that are not exact? I prefer C#, but I am open to any solutions.
Thanks in advance for any guidance.
(I have been working on this for over a month and have searched the internet and Stack Overflow without success. I of course could have missed something)
You need to extract features from these images; after that, a basic Euclidean distance would be enough to calculate similarity. But it is not easy to extract features from hand-drawn images. For example, companies that work on face recognition generally get much lower accuracy on drawn face portraits.
I have a suggestion for you. For a machine learning homework, one of my friends was given a signature recognition assignment. I do not know everything about how he achieved high accuracy, but I do know the feature extraction part: first he converted the image to binary, then he counted each row's black pixels and used those counts as features to train a neural network (or a similar classifier).
You can use a similar approach to extract features, then use Euclidean distance to calculate similarity.
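A minimal sketch of that idea in Python with OpenCV (the file names, the 128x128 size normalization, and the fixed binarization threshold are illustrative assumptions; adding a column profile as well would make the feature more discriminative):

```python
import cv2
import numpy as np

def row_profile(path, size=(128, 128), thresh=128):
    """Binarize a drawing and return the black-pixel count per row."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size)  # normalize size so profiles are comparable
    _, binary = cv2.threshold(img, thresh, 255, cv2.THRESH_BINARY)
    return np.sum(binary == 0, axis=1).astype(np.float32)

a = row_profile("drawing_a.png")
b = row_profile("drawing_b.png")
distance = np.linalg.norm(a - b)  # smaller distance = more similar
```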

Histogram of Oriented Gradients object detection [closed]

HOG is popular in human detection. Can it be used for detecting other objects, for example a cup in an image?
I am sorry this is not a programming question, but I am trying to get an idea of whether I can use HOG to extract object features.
According to the research I have done over the past few days, I feel the answer is yes, but I am not sure.
Yes, HOG (Histogram of Oriented Gradients) can be used to detect any kind of object, since to a computer an image is just a bunch of pixels, and you can extract features from them regardless of the content. Another question, though, is how effective it is in doing so.
HOG, SIFT, and other such feature extractors are methods used to extract relevant information from an image to describe it in a more meaningful way. When you want to detect an object or person in an image with thousands (and maybe millions) of pixels, it is inefficient to simply feed a vector of millions of numbers to a machine learning algorithm, because:
It will take a large amount of time to complete.
There will be a lot of noisy information (background, blur, lighting and rotation changes) which we do not wish to regard as important.
The HOG algorithm, specifically, creates histograms of edge orientations from certain patches in images. A patch may come from an object, a person, meaningless background, or anything else, and is merely a way to describe an area using edge information. As mentioned previously, this information can then be used to feed a machine learning algorithm such as the classical support vector machines to train a classifier able to distinguish one type of object from another.
The reason HOG has had so much success with pedestrian detection is because a person can greatly vary in color, clothing, and other factors, but the general edges of a pedestrian remain relatively constant, especially around the leg area. This does not mean that it cannot be used to detect other types of objects, but its success can vary depending on your particular application. The HOG paper shows in detail how these descriptors can be used for classification.
It is worthwhile to note that for several applications, the results obtained by HOG can be greatly improved using a pyramidal scheme. This works as follows: instead of extracting a single HOG vector from an image, you successively divide the image (or patch) into several sub-images, extracting an individual HOG vector from each of these smaller divisions. The process can then be repeated. In the end, you obtain a final descriptor by concatenating all of the HOG vectors into a single vector, as sketched in the code below.
This has the advantage that in larger scales the HOG features provide more global information, while in smaller scales (that is, in smaller subdivisions) they provide more fine-grained detail. The disadvantage is that the final descriptor vector grows larger, thus taking more time to extract and to train using a given classifier.
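A rough sketch of that pyramidal scheme in Python with OpenCV (the 64x64 window, the number of levels, and the file name are assumptions for illustration; each patch is resized to the HOG window size before the descriptor is computed):

```python
import cv2
import numpy as np

# 64x64 window, 16x16 blocks, 8x8 stride and cells, 9 orientation bins:
# illustrative choices, not values mandated by the HOG paper.
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def pyramidal_hog(img, levels=2):
    """Concatenate HOG vectors from the whole image and from each
    cell of successively finer grid subdivisions."""
    descriptors = []
    for level in range(levels + 1):
        n = 2 ** level                      # n x n grid at this level
        h, w = img.shape[:2]
        for i in range(n):
            for j in range(n):
                patch = img[i * h // n:(i + 1) * h // n,
                            j * w // n:(j + 1) * w // n]
                patch = cv2.resize(patch, (64, 64))  # match HOG window size
                descriptors.append(hog.compute(patch).ravel())
    return np.concatenate(descriptors)

img = cv2.imread("cup.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
feature_vector = pyramidal_hog(img)  # feed this to e.g. an SVM
```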
In short: Yes, you can use them.

Comparing similar images as photographs -- detecting difference, image diff

The situation is a little different from anything I have been able to find asked already, and is as follows: if I took a photo of two similar images, I'd like to be able to highlight the differing features between them, as in the two halves of a children's spot-the-difference game.
The differences between the images will be bits missing/added and/or colour changes, the kind of differences that would be easily detectable from the original image files by nothing cleverer than a pixel-by-pixel comparison. However, since the photos are subject to the fluctuations of light and the imprecision of photography, I'll need a far more lenient/clever algorithm.
As you can see, the images won't necessarily line up perfectly if overlaid.
This question is tagged language-agnostic as I expect answers that point me towards relevant algorithms; however, I'd also be interested in current implementations if they exist, particularly in Java, Ruby, or C.
The following approach should work. All of these functionalities are available in OpenCV. Take a look at this example for computing homographies.
Detect keypoints in the two images using a corner detector.
Extract descriptors (SIFT/SURF) for the keypoints.
Match the keypoints and compute a homography using RANSAC that aligns the second image to the first.
Apply the homography to the second image, so that it is aligned with the first.
Now simply compute the pixel-wise difference between the two images, and the difference image will highlight everything that has changed from the first to the second.
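In Python with OpenCV, the whole pipeline could look roughly like this (ORB is used in place of SIFT/SURF because it ships with stock OpenCV; the file names and the RANSAC reprojection threshold are placeholders):

```python
import cv2
import numpy as np

img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 1-2. Detect keypoints and extract descriptors.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 3. Match descriptors and estimate a homography with RANSAC.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des2, des1)
src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 4. Warp the second image so that it is aligned with the first.
aligned = cv2.warpPerspective(img2, H, (img1.shape[1], img1.shape[0]))

# 5. The pixel-wise difference highlights everything that changed.
diff = cv2.absdiff(img1, aligned)
```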
My general approach would be to use optical flow to align both images and perform a pixel-by-pixel comparison once they are aligned.
However, standard optical flow methods (OpenCV etc.) are likely to fail if the two images differ significantly, as in your case. If that indeed fails, there are more recent optical flow techniques that are supposed to work even when the images are drastically different. For instance, you might want to look at the SIFT Flow paper by Ce Liu et al., which addresses this problem using sparse correspondences.

find mosquitos' head in the image

I have images of mosquitoes similar to these, and I would like to automatically draw a circle around the head of each mosquito in the images. They are obviously in different orientations, and there is a random number of them in each image. Some error is fine. Any ideas for algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to create your training set. For this you would extract small images with examples of what is and what is not a mosquito head.
Then you can use those images to train a classification algorithm. Be careful to have a balanced training set: if your data is skewed towards one class, it will hurt the performance of the algorithm. Since images are 2D and algorithms usually take 1D arrays as input, you will need to arrange your images into that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could do the trick too. If you decide to use support vector machines, I strongly recommend checking out libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. They also have a very easy-to-follow guide targeted at beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, the classifier should become tolerant to orientation on its own. If you don't have enough data, you can create more training rows with some samples rotated, so you have a more representative training set.
As for prediction: given an image, cut it using a grid where each cell has the same dimensions as the examples in your training set. Then pass each cell to the classifier and mark those squares where the classifier gives a positive output. If you really need circles, take the center of each such square; the radius is half the square's side length (sorry for stating the obvious).
After doing this you might still have problems with sizes (some mosquitoes might appear closer to the camera than others), since we have not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitoes at the same scale, we might miss some of them just because they don't fit in our grid perfectly. To address this, repeat the procedure (grid cut and predict) while rescaling the given image to different sizes. How many sizes? Well, you would have to determine that through experimentation.
This approach is sensitive to the size of the "window" you use; that is also something I would recommend experimenting with. A sketch of this loop is given below.
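A minimal sketch of that grid-and-rescale prediction loop in Python (the window size, step, scales, and the pre-trained clf object with a scikit-learn-style predict() method are all assumptions; in practice you would also want non-maximum suppression on overlapping hits):

```python
import cv2
import numpy as np

def detect_heads(img, clf, win=24, step=12, scales=(1.0, 0.75, 0.5)):
    """Slide a fixed-size window over the image at several scales and
    collect circles (cx, cy, radius) where the classifier says 'head'."""
    circles = []
    for s in scales:
        scaled = cv2.resize(img, None, fx=s, fy=s)
        h, w = scaled.shape[:2]
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                patch = scaled[y:y + win, x:x + win]
                row = patch.reshape(1, -1)  # flatten to a 1D row-major input
                if clf.predict(row)[0] == 1:
                    # Map the window center back to original coordinates
                    circles.append(((x + win / 2) / s,
                                    (y + win / 2) / s,
                                    (win / 2) / s))
    return circles
```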
Here is some research that may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white, unmarked background, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example, in a quick try I took the red channel, subtracted 5 times the blue channel, and applied a threshold of 80.
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be mosquitoes using connected component labeling. If a blob is large enough to be two mosquitoes, cut it out and apply some more dilation/erosion to it.
Once you have a single blob isolated, you can find the direction of the body using Principal Component Analysis (PCA). The head should be the part of the body where the cross-section is thickest.
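A sketch of that whole pipeline in Python with OpenCV (the file name, blob-area bounds, and kernel size are assumptions chosen to illustrate the steps; the 5x blue-channel weight and the threshold of 80 come from the quick try described above):

```python
import cv2
import numpy as np

img = cv2.imread("mosquitoes.png").astype(np.float32)
b, g, r = cv2.split(img)

# Channel arithmetic and threshold, as in the quick try above.
gray = np.clip(r - 5 * b, 0, 255).astype(np.uint8)
_, mask = cv2.threshold(gray, 80, 255, cv2.THRESH_BINARY)

# Morphological opening removes the thin leg structures.
kernel = np.ones((3, 3), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)

# Connected component labeling; keep blobs of plausible mosquito size.
n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
for i in range(1, n):
    if 200 < stats[i, cv2.CC_STAT_AREA] < 5000:  # size bounds are guesses
        ys, xs = np.where(labels == i)
        pts = np.column_stack([xs, ys]).astype(np.float32)
        # PCA: the first eigenvector is the body axis; walk along it to
        # find where the cross-section is thickest (the head end).
        mean, eigvecs = cv2.PCACompute(pts, mean=np.empty((0)))
        body_axis = eigvecs[0]
```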

minimum texture image dimension for effective classification

I am a beginner in image mining. I would like to know the minimum dimensions required for effective classification of textured images. My feeling is that if an image is too small, the feature extraction step will not extract enough features, and if the image goes beyond a certain size, the processing time will grow rapidly.
This is a complex question that requires a bit of thinking.
Short answer: It depends.
Long answer: it depends on the type of texture you want to classify and the type of feature your classification is based on. If the extracted feature is, say, color only, you can use a "texture" as small as 1x1 pixel (in that case, using the word "texture" is a bit of an abuse). If you want to classify, for example, characters, you can usually extract a lot of local information from edges (Hough transform, Gabor filters, etc.). The image plane just has to be big enough to hold the characters (say, 16x16 pixels for the Latin alphabet).
If you want to be able to classify arbitrary images, you can also base your classification on global information such as entropy, correlogram, energy, inertia, cluster shade, cluster prominence, color, and correlation. Such features are used for content-based image retrieval.
Off the top of my head, I would try textures as small as 32x32 pixels if the kind of texture is a priori unknown. If, on the contrary, the kind of texture is known a priori, I would choose one or more features that I know would classify the images according to my needs (1x1 pixel for color only, 16x16 pixels for characters, etc.). Again, it really depends on what you are trying to achieve; there isn't a unique answer to your question.
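As an illustration of such global texture features, here is a small sketch using scikit-image's gray-level co-occurrence matrix (GLCM) on a 32x32 patch. The chosen property list is an example; cluster shade and cluster prominence are not built into graycoprops and would need to be computed by hand:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greyco...' in skimage < 0.19

def glcm_features(patch):
    """Global texture features from a gray-level co-occurrence matrix.
    `patch` is a 2D uint8 array, e.g. a 32x32 texture sample."""
    glcm = graycomatrix(patch, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

patch = np.random.randint(0, 256, (32, 32), dtype=np.uint8)  # stand-in texture
features = glcm_features(patch)
```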
