I read the Dalal and Triggs paper on the HOG descriptor and a blog post by Chris McCormick on the same topic. The blog says that the image needs to be re-sampled at different scales in order to recognize persons of different sizes.
My question is: we already have a 64*128 window that slides over the image. Why is re-sampling needed, instead of simply sliding that window over the image to detect the persons?
Please correct me if I am wrong. Thanks in advance!
You're right that the classifier is trained on 64*128 windows, each classified as either 'person' or 'non-person'. But do all the persons in real-world images come in a handy 64*128 size?
That is where scaling comes into play. By progressively shrinking the image, the same 64*128 pixel window covers a larger and larger area of the original image, allowing people of different sizes to be detected.
Here is an example from one of my models after running the detection at multiple scales. The result shown is after applying non-maximum suppression to weed out extraneous detection windows.
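If it helps, below is a minimal Python/OpenCV sketch of the same idea, using OpenCV's built-in HOG pedestrian detector rather than my own model. detectMultiScale builds the image pyramid internally; the file name and the winStride/scale values are placeholders you would tune.

    import cv2

    # Minimal sketch: OpenCV's built-in 64x128 HOG pedestrian detector.
    # detectMultiScale resamples the image internally; `scale` is the
    # shrink factor between pyramid levels.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    img = cv2.imread("people.jpg")          # placeholder path
    rects, weights = hog.detectMultiScale(
        img,
        winStride=(8, 8),   # step of the sliding 64x128 window
        scale=1.05,         # resampling factor between pyramid levels
    )

    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)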
I have a large set of "apple" images in various shapes, sizes, lighting, colors, etc. These "apple" images were cropped out of larger images taken from different angles.
Now I want to train Darknet to detect apples in images. I don't want to go through the annotation process, as I already have ready-to-use cropped JPG images of apples.
Can I use these ready, cropped apple images to train Darknet, or do I still have to go through the annotation process?
In object detection models, you annotate the object in an image so the model learns where the object is within that image. If your entire dataset contains only apple images, the model will learn that every image it is given contains nothing but an apple. So even if you provide an orange as a test image, it might still predict "apple", because it doesn't know any class other than apple.
So there are two important points to consider:
Build the dataset so that it contains apples on their own, apples together with other fruits, and other objects. This will help the model learn clearly what an apple is.
The bounding-box coordinates are an input to the detector. You could use the full image dimensions as the bounding box for your cropped images, but the model won't learn effectively, for the reason mentioned above. Therefore, use images with multiple objects and annotate them carefully so that the model can learn well.
What you are describing relates to a process called "data augmentation". You can google how other people do it.
Since your apple images are already cropped, you can treat each one as an apple annotated by its full extent. Then collect some background images that are all larger than any of your apple images. Now you can write a tool that randomly selects an apple image and pastes it onto a randomly selected background to generate "new" apple images with backgrounds. Since you know the size of each apple image and where you pasted it, you can compute the size and position of the bounding box and generate the corresponding label file.
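As a rough sketch (not a polished tool), the compositing step plus Darknet/YOLO label generation could look like the following in Python with Pillow. The folder names, the number of generated images, and the single class id 0 are assumptions you would adapt.

    import random
    from pathlib import Path
    from PIL import Image

    # Hypothetical folder layout; adjust to your own data.
    APPLES = list(Path("apples").glob("*.jpg"))
    BACKGROUNDS = list(Path("backgrounds").glob("*.jpg"))  # all larger than the apples
    OUT = Path("dataset")
    OUT.mkdir(exist_ok=True)

    for i in range(1000):                       # arbitrary number of synthetic images
        bg = Image.open(random.choice(BACKGROUNDS)).convert("RGB")
        apple = Image.open(random.choice(APPLES)).convert("RGB")

        # Paste the apple at a random position that keeps it fully inside.
        x = random.randint(0, bg.width - apple.width)
        y = random.randint(0, bg.height - apple.height)
        bg.paste(apple, (x, y))
        bg.save(OUT / f"apple_{i}.jpg")

        # Darknet/YOLO label: "<class> <x_center> <y_center> <w> <h>",
        # all normalised to the background size; class 0 = apple.
        xc = (x + apple.width / 2) / bg.width
        yc = (y + apple.height / 2) / bg.height
        w = apple.width / bg.width
        h = apple.height / bg.height
        (OUT / f"apple_{i}.txt").write_text(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

In practice you would also want to randomly resize and rotate the apple crops, and occasionally paste several per background, so the detector sees more variation.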
Given a logo as a reference image, how can I detect/recognize it in a cluttered natural image?
The logo may be quite small in the image; it can appear on clothes, hats, shoes, a background wall, etc. I have tried SIFT features for matching without any other preprocessing, and the results are good when the logo in the image is large and clear. However, it fails in cases where the scene is cluttered and the logo is small relative to the whole image. It also seems that SIFT features are sensitive to perspective distortions.
Does anyone know better features or ideas for logo detection/recognition in natural images? For example, training a classifier to locate candidate regions first and then applying SIFT matching directly for further recognition. However, training a model requires a lot of data, in particular manually annotated logo regions in images, and it needs re-training (collecting and annotating new images) whenever I want to apply it to a new logo.
So, any suggestions? A detailed workflow, code, or references would be highly appreciated, thanks!
There are many algorithms, from shape matching to Haar classifiers. The best algorithm depends heavily on the kind of logo.
If you want to continue with feature registration, I recommend:
For detection of small logos, use tiles. Split the whole image into smaller (overlapping) tiles and perform the usual detection on each tile. This exploits the "locality" of the searched features (see the sketch after this list).
Try ASIFT for affine-invariant detection.
Use many template images for reference feature extraction, with different lighting and different backgrounds (black, white, gray).
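A minimal Python/OpenCV sketch of the tiling idea from the first point might look like this. The tile size, step, Lowe-ratio threshold and match count are assumptions to tune, and SIFT_create needs OpenCV 4.4+ (or an opencv-contrib build).

    import cv2

    sift = cv2.SIFT_create()
    bf = cv2.BFMatcher()

    logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)    # placeholder paths
    scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    kp_l, des_l = sift.detectAndCompute(logo, None)

    tile, step = 400, 200                       # overlapping 400x400 tiles
    for y in range(0, scene.shape[0] - tile + 1, step):
        for x in range(0, scene.shape[1] - tile + 1, step):
            patch = scene[y:y + tile, x:x + tile]
            kp_p, des_p = sift.detectAndCompute(patch, None)
            if des_p is None or len(des_p) < 2:
                continue
            good = []
            for pair in bf.knnMatch(des_l, des_p, k=2):
                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                    good.append(pair[0])        # Lowe ratio test
            if len(good) > 10:                  # arbitrary threshold
                print(f"possible logo in tile at ({x}, {y}): {len(good)} matches")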
SHORT: is there a function in OpenCV, or a general algorithm, that could return an index of image homogeneity?
LONG VERSION :-) I am implementing auto-focus based on evaluating image data. My images are of biological cells, which are spread at a fairly uniform density across the image area. Unfortunately, my algorithm is sometimes disturbed by dirt on the cover glass, which mostly shows up as a few bright spots. So my idea is to discard focus-function peaks caused by inhomogeneous images.
Thank you for any suggestions!
Example images as requested (not the best ones, but they should show the problem fairly well):
The left image was captured at the wrong Z-position because of dirt. The right one is OK.
Looking at the image, you could split it into parts (say a 4x4 grid of sub-images), compute the variance within each sub-image, and check whether the difference between the lowest and highest variance is large.
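A small Python/OpenCV sketch of that check, assuming a grayscale input and a 4x4 grid (both assumptions), could be:

    import cv2
    import numpy as np

    def homogeneity_index(gray, grid=4):
        # Ratio between the most and least textured tile: close to 1 means
        # homogeneous, large values suggest outliers such as bright dirt spots.
        h, w = gray.shape
        variances = []
        for i in range(grid):
            for j in range(grid):
                tile = gray[i * h // grid:(i + 1) * h // grid,
                            j * w // grid:(j + 1) * w // grid]
                variances.append(float(np.var(tile)))
        return max(variances) / (min(variances) + 1e-9)

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
    print(homogeneity_index(img))

You would then pick a threshold on this ratio empirically (for example from a set of known-good frames) and ignore focus-function peaks coming from frames above it.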
I'm trying to understand the Viola-Jones method, and I've mostly got it.
It uses simple Haar-like features boosted into strong classifiers and organized into layers (a cascade) in order to achieve better performance (by not bothering with obviously 'non-object' regions).
I think I understand the integral image, and I understand how the feature values are computed.
The only thing I can't figure out is how the algorithm deals with variations in face size.
As far as I know, they use a 24x24 sub-window that slides over the image, and within it the algorithm goes through the classifiers and tries to figure out whether there is a face/object in it or not.
And my question is: what if one face is 10x10 pixels and another is 100x100? What happens then?
I'm also dying to know what the first two features (in the first layer of the cascade) are and what they look like, keeping in mind that, according to Viola & Jones, these two features will almost never miss a face and will eliminate 60% of the incorrect windows. How??
And how is it possible to construct features that achieve these statistics for different face sizes in the image?
Am I missing something, or maybe I've figured it all wrong?
If I'm not clear enough, I'll try to explain my confusion better.
Training
The Viola-Jones classifier is trained on 24x24 images, each containing a similarly scaled face. This produces a set of feature detectors built out of two, three, or four rectangles, optimised for a face of that particular size.
Face size
Different face sizes are detected by repeating the classification at different scales. The original paper notes that good results are obtained by trying different scales a factor of 1.25 apart.
Note that the integral image means that it is easy to compute the rectangular features at any scale by simply scaling the coordinates of the corners of the rectangles.
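As a concrete illustration (not code from the paper), here is a small NumPy sketch of that rectangle-sum trick. rect_sum costs four lookups regardless of the rectangle's size, which is why only the corner coordinates need to be scaled.

    import numpy as np

    def integral_image(img):
        # Cumulative sum along both axes, padded with a zero row and column.
        return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

    def rect_sum(ii, x, y, w, h):
        # Sum of img[y:y+h, x:x+w] from the padded integral image ii.
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    img = np.arange(36).reshape(6, 6)
    ii = integral_image(img)
    assert rect_sum(ii, 1, 2, 3, 2) == img[2:4, 1:4].sum()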
Best features
The original paper contains pictures of the first two features selected in a typical cascade (see page 4).
The first feature detects the wide dark rectangle of the eyes above a wide brighter rectangle of the cheeks.
----------
----------
++++++++++
++++++++++
The second feature detects the bright thin rectangle of the bridge of the nose between the darker rectangles on either side containing the eyes.
---+++---
---+++---
---+++---
I need to sort a huge number of photos: remove the blurry images (due to camera shake) and the over/under-exposed ones, and detect whether each image was shot in landscape or portrait orientation. Can these things be done using an image processing library, or are they still beyond the realm of an algorithmic solution?
Let's look at your question as three separate questions.
Can I find blurry images?
There are several methods for finding blurry images, for example (a rough sketch of the first one follows this list):
Sharpening an image and comparing it to the original
Using wavelets to detect blurring (Link1)
Hough Transform (Link)
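Here is a rough Python/OpenCV sketch of the first idea (sharpen and compare), assuming grayscale input. A low score means there was little high-frequency detail to amplify, which suggests blur; the threshold you apply to the score is something to calibrate on your own photos.

    import cv2
    import numpy as np

    def blur_score(path):
        # Unsharp masking boosts existing edges, so a blurry image (few edges)
        # changes very little while a sharp image changes a lot.
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
        blurred = cv2.GaussianBlur(gray, (0, 0), sigmaX=3)
        sharpened = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
        return float(np.mean(np.abs(sharpened - gray)))

    # Lower score -> fewer edges to amplify -> more likely camera-shake blur.
    print(blur_score("photo_0001.jpg"))            # placeholder file name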
Can I find images that are under or over exposed?
The only check I can think of is whether the overall brightness is really high or really low. The problem is that you would have to know whether the picture was taken at night or during the day. You could compute a histogram of your image and see if it is heavily skewed one way or the other; that might be some indication of over/under-exposure.
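A minimal sketch of that histogram check in Python/OpenCV; the 5% tail width and the 25% mass cut-off are arbitrary assumptions to tune.

    import cv2

    def exposure_flag(path):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
        hist /= hist.sum()
        if hist[:13].sum() > 0.25:      # >25% of pixels in the darkest ~5% of levels
            return "possibly underexposed"
        if hist[-13:].sum() > 0.25:     # >25% of pixels in the brightest ~5% of levels
            return "possibly overexposed"
        return "ok"

    print(exposure_flag("photo_0001.jpg"))         # placeholder file name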
Can I determine the orientation of the image?
There are techniques that have been used for this, such as SVMs, color moments, edge direction histograms, and Bayesian frameworks using image cues.
Can I find images that are under or over exposed?
Here, using histograms is the recommended approach.