Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
HOG is popular in human detection. Can it also be used to detect other objects in an image, such as a cup?
I am sorry for not asking a programming question; I just want to know whether I can use HOG to extract object features.
Based on the research I have done over the past few days, I think the answer is yes, but I am not sure.
Yes, HOG (Histogram of Oriented Gradients) can be used to detect any kind of object: to a computer, an image is just a bunch of pixels, and you can extract features regardless of its contents. How effective it is at doing so, though, is another question.
HOG, SIFT, and other such feature extractors are methods used to extract relevant information from an image to describe it in a more meaningful way. When you want to detect an object or person in an image with thousands (and maybe millions) of pixels, it is inefficient to simply feed a vector with millions of numbers to a machine learning algorithm, because:
It will take a large amount of time to complete
There will be a lot of noisy information (background, blur, lighting and rotation changes) which we do not wish to regard as important
The HOG algorithm, specifically, creates histograms of edge orientations from certain patches in images. A patch may come from an object, a person, meaningless background, or anything else, and is merely a way to describe an area using edge information. As mentioned previously, this information can then be used to feed a machine learning algorithm such as the classical support vector machines to train a classifier able to distinguish one type of object from another.
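To make the "histogram of edge orientations" idea concrete, here is a minimal sketch for a single patch, assuming NumPy and a grayscale patch given as a 2D array. The function name `orientation_histogram` and the choice of 9 unsigned bins are assumptions for illustration; a full HOG implementation also splits the window into cells and normalizes over overlapping blocks, which is left out here.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude.

    Simplified sketch: one histogram for the whole patch, no cells or blocks.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)            # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)           # edge strength at every pixel
    # Unsigned orientation in [0, 180) degrees: opposite directions share a bin.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist   # L2-normalize unless the patch is flat

# A patch with a single vertical edge puts all of its weight into one bin.
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
print(orientation_histogram(patch))
```

The resulting vector is what you would hand to a classifier such as an SVM, one vector per patch.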
The reason HOG has had so much success with pedestrian detection is that a person can vary greatly in color, clothing, and other factors, but the general edges of a pedestrian remain relatively constant, especially around the leg area. This does not mean that it cannot be used to detect other types of objects, but its success can vary depending on your particular application. The HOG paper shows in detail how these descriptors can be used for classification.
It is worthwhile to note that for several applications, the results obtained by HOG can be greatly improved using a pyramidal scheme. This works as follows: Instead of extracting a single HOG vector from an image, you can successively divide the image (or patch) into several sub-images, extracting from each of these smaller divisions an individual HOG vector. The process can then be repeated. In the end, you can obtain a final descriptor by concatenating all of the HOG vectors into a single vector, as shown in the following image.
This has the advantage that in larger scales the HOG features provide more global information, while in smaller scales (that is, in smaller subdivisions) they provide more fine-grained detail. The disadvantage is that the final descriptor vector grows larger, thus taking more time to extract and to train using a given classifier.
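The pyramidal scheme described above can be sketched as follows, assuming NumPy and a grayscale image as a 2D array. The names `orientation_histogram` and `pyramid_descriptor`, the 1x1 / 2x2 / 4x4 subdivision, and the simplified per-region histogram are all assumptions made for illustration, not the exact method from any particular paper.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Simplified HOG-like histogram for one region (no cells or blocks)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def pyramid_descriptor(image, levels=3, n_bins=9):
    """Concatenate histograms from a 1x1, 2x2, 4x4, ... subdivision of the image.

    Every sub-image must be at least 2x2 pixels for the gradient to be defined.
    """
    image = np.asarray(image, dtype=float)
    parts = []
    for level in range(levels):
        splits = 2 ** level                          # grid size at this level
        for row_block in np.array_split(image, splits, axis=0):
            for region in np.array_split(row_block, splits, axis=1):
                parts.append(orientation_histogram(region, n_bins))
    return np.concatenate(parts)

image = np.random.default_rng(0).random((16, 16))
descriptor = pyramid_descriptor(image, levels=3)
print(descriptor.shape)   # (1 + 4 + 16) regions, 9 bins each
```

Note how the length grows with the number of levels, which is the cost mentioned above: each extra level multiplies the number of regions by four.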
In short: Yes, you can use them.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am interested in the importance of data augmentation (rotating at various angles, flipping the images) when preparing a dataset for a machine learning problem.
Is it really needed? Or will the CNN handle that as well, no matter how the data are transformed?
So I took a classification task with 2 classes to draw some conclusions:
Arrow shapes
Circle shapes
The idea is to train on shapes with only one orientation (I have taken arrows pointing right) and test the model on a different orientation (I have taken arrows pointing downwards) that is never shown during the training stage.
Some of the samples used in Training
Some of the samples used in Testing
This is the entire dataset I am using to create a TensorFlow model.
https://bitbucket.org/akhileshmalviya/samples/src/bab50b85d826?at=master
I am puzzled by the results I got:
(i) Except for a few downward arrows, all others are predicted correctly as arrows. Does this mean data augmentation is not needed at all?
(ii) Or is this even the right use case for understanding the importance of data augmentation?
Kindly share your thoughts. Any help would be really appreciated!
Data augmentation is a data-dependent process.
In general, you need it when your training data is complex and you have few samples.
A neural network can easily learn to extract simple patterns like arcs or straight lines and these patterns are enough to classify your data.
In your case data augmentation can barely help: the features the network will learn to extract are simple and very different from each other.
When you, instead, have to deal with complex structures (cats, dogs, airplanes, ...) you can't rely on simple features like edges, arcs, etc.
Instead, you have to show your network that the instances you're trying to classify have a high variance and that the extracted features can be combined in a lot of different ways for the same subject.
Think about a cat: it can be of any color, the picture can be taken in different light conditions, its whole body can be in any position, the picture could be taken with a certain orientation...
To correctly classify instances so different, the network must learn to extract robust features that could be learned only after seeing a lot of different inputs.
In your case, instead, simple features can completely discriminate your input, so any sort of data augmentation would help only a little.
The task you are solving can be easily solved without any NN and even without machine learning.
Because the problem is so simple, it does not really matter whether you do data augmentation or not. The need for data augmentation is task-specific and depends on many things:
how easy it is to augment the data while preserving the ability to label the class correctly. For images and sounds, which we are used to seeing and hearing, it is not a problem (we know that adding a little noise to a sound does not change its meaning, and a rotated lizard is still a lizard). For other things, augmenting while preserving the class/value is hard (for example in Go, randomly adding a stone can change the value of the position dramatically)
whether the augmented data is drawn from the same distribution you care about. Adding random stones in Go does not work, but rotating or flipping the board works and preserves the distribution. In Racing Kings (a chess variant), on the other hand, it will not help. You can't usefully flip the position (left <-> right): the evaluation stays the same, but the flipped position will never occur in a real game, so it is drawn from a different distribution and is useless
how much data you have and how expressive your model is. The more parameters your model has, the bigger the chance of overfitting and the greater your need for data. If you train a linear regression in n dimensions, you will have n + 1 parameters, and you do not really need augmentation for that. Also, if you already have 10 billion data points, augmentation will probably not be helpful.
how expensive the augmentation procedure is. Rotating/scaling an image is very cheap, but other augmentations can be computationally expensive
something else that I forgot.
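For the cheap, label-preserving case (rotating and flipping an image or a Go board), a minimal sketch with NumPy could look like this. The function name `augment_with_flips_and_rotations` is made up for illustration, and the sketch only covers 90-degree rotations and mirror images, which need no interpolation.

```python
import numpy as np

def augment_with_flips_and_rotations(image):
    """Return the 8 variants of a 2D array: 4 rotations by 90 degrees, each also mirrored."""
    image = np.asarray(image)
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # k quarter turns
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # left <-> right mirror of that rotation
    return variants

# A tiny "arrow" pointing right: the augmented set also contains the one pointing down.
right = np.array([[0, 0, 0],
                  [0, 1, 1],
                  [0, 0, 0]])
for variant in augment_with_flips_and_rotations(right):
    print(variant, end="\n\n")
```

Whether these variants are worth adding depends on the points in the list: every variant keeps its label, but some of them may never occur in the data you actually care about.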
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
First of all, this theory confuses me. Could someone explain it to me in a few words?
Also, does the word "scale" in a computer vision context mean the various sizes of objects,
the various units of measurement of objects (i.e. meters, cm, etc.), or, as I think, the various degrees of smoothing/blurring of the same image of interest?
Second, about building multiple scales of an image with a smoothing/blur operator (the one I know is the Gaussian blur operator): why do they smooth the same image several times? What is the point of making several smoothed images with different levels of detail/resolution, but not different sizes, for the same scene (i.e. one smoothing operator on the image of interest at size 256x256 and another at 512x512)?
I'm talking in the context of feature extraction and description.
I would be thankful if someone could clarify the subject for me. Sorry for my language!
"Scale" here alludes to both the size of the image as well as the size of the objects themselves... at least for current feature detection algorithms. The reason why you construct a scale space is that we can focus on features of a particular size depending on what scale we are looking at. The smaller the scale (less blurring), the finer or smaller the features we can concentrate on. Similarly, the larger the scale (more blurring), the coarser or larger the features we can concentrate on.
You do all of this on the same image because this is a common pre-processing step for feature detection. The whole point of feature detection is to be able to detect features over multiple scales of the image. You only output those features that are reliable over all of the different scales. This is actually the basis of the Scale-Invariant Feature Transform (SIFT) where one of the objectives is to be able to detect keypoints robustly that can be found over multiple scales of the image.
What you do to create multiple scales is decompose an image by repeatedly subsampling the image and blurring the image with a Gaussian filter at each subsampled result. This is what is known as a scale space. A typical example of what a scale space looks like is shown here:
The reason why you choose a Gaussian filter is fundamental to the way the scale space works. At each scale, you can think of each image produced as being a more "simplified" version of the one found from the previous scale. Other typical blurring filters introduce new spurious structures that don't correspond to simplifications of the structures at the finer scales. I won't go into the details, but there is a whole body of scale space theory showing that, in the end, scale space construction using the Gaussian blur is the most fundamental way to do this, because new structures are not created when going from a fine scale to any coarser scale. You can check out the Wikipedia article on scale space that I linked above for more details.
Now, traditionally a scale space is created by convolving your image with a Gaussian filter of various standard deviations, and that Wikipedia article has a nice pictorial representation of that. However, when you look at more recent feature detection algorithms like SURF or SIFT, they use a combination of blurring using different standard deviations as well as subsampling the image, which is what I talked about at the beginning of this post.
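Both constructions can be sketched in a few lines, assuming NumPy only and a grayscale image as a 2D array. The names `gaussian_blur`, `scale_space` and `gaussian_pyramid`, the kernel radius of about three standard deviations, and the edge padding are choices made for this sketch; SIFT and SURF use their own, more elaborate schedules of sigmas and octaves.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 standard deviations, summing to 1."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def scale_space(image, sigmas=(1.0, 2.0, 4.0)):
    """Traditional scale space: same-size images, each blurred with a larger sigma."""
    return [gaussian_blur(image, sigma) for sigma in sigmas]

def gaussian_pyramid(image, octaves=3, sigma=1.0):
    """Blur-and-subsample variant: blur, then keep every second row and column."""
    levels = [np.asarray(image, dtype=float)]
    for _ in range(octaves - 1):
        levels.append(gaussian_blur(levels[-1], sigma)[::2, ::2])
    return levels

image = np.random.default_rng(0).random((64, 64))
print([level.shape for level in scale_space(image)])        # all the same size
print([level.shape for level in gaussian_pyramid(image)])   # halves at every octave
```

In the first list every image keeps its size and only loses detail; in the second the image shrinks as well, which is closer to what the more recent detectors do.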
Either way, check out that Wikipedia article for more details. It covers this material in more depth than I have here.
Good luck!
Information:
I would like to use OpenCV's HOG detection to identify objects that can be seen in a variety of orientations. The only problem is, I can't seem to find a reasonable feature detector or classifier to detect this in a rotation- and scale-invariant way (as is needed for objects such as forearms).
Prior Work:
Let's focus on forearms for this discussion. A forearm can have multiple orientations, the primary distinct features probably being its contour edges. Forearms can point in any direction in an image, hence the complexity. So far I have done some in-depth research on using HOG descriptors to solve this problem, but I am finding that the variety of forearm poses in my positive training set leads to very low detection scores on actual images. I suspect the issue is that the gradients of the positive images do not produce very consistent results when accumulated into the histogram. I have reviewed many research papers on the topic trying to resolve or improve this, including the original from Dalal & Triggs [Link]: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf It also seems that the assumptions made for detecting whole humans do not necessarily apply to detecting individual features (in particular, the assumption that all humans are standing upright suggests that HOG is not a good route for rotation-invariant detection such as that of forearms).
Note:
If possible, I would like to steer clear of any non-free solutions such as those pertaining to SIFT, SURF, or Haar.
Question:
What is a good solution for detecting objects in an image in a rotation- and scale-invariant way? Particularly for this example, what would be a good solution for detecting forearms in all orientations in an image?
I use HOG to detect human heads and shoulders. To train a detector for a particular part, you have to give its location. If you use OpenCV, you can clip samples containing only the part you want to train on, and make sure all training samples share the same size. For example, I clip images to contain only the head and shoulders and resize them all to 64x64. Other open-source code may require you to pass the location as an input parameter, which is essentially the same.
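As a rough sketch of the "clip and resize to a common size" step, here is a NumPy-only version. The helper name `crop_and_resize`, its box parameters, and the nearest-neighbor resampling are assumptions for illustration; with OpenCV you would normally slice the array and call `cv2.resize` instead, which interpolates properly.

```python
import numpy as np

def crop_and_resize(image, top, left, height, width, size=(64, 64)):
    """Cut the annotated box out of the image and resample it to a fixed size.

    Uses nearest-neighbor sampling to stay dependency-free (hypothetical helper).
    """
    crop = np.asarray(image)[top:top + height, left:left + width]
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(size[0]) * crop.shape[0] // size[0]
    cols = np.arange(size[1]) * crop.shape[1] // size[1]
    return crop[rows][:, cols]

image = np.arange(100 * 120).reshape(100, 120)      # stand-in for a grayscale photo
sample = crop_and_resize(image, top=10, left=20, height=40, width=30)
print(sample.shape)                                  # every training sample ends up 64x64
```

Once every positive and negative sample has the same size, each one yields a HOG vector of the same length, which is what the classifier needs.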
Have you tried the discriminatively trained deformable part model? http://www.cs.berkeley.edu/~rbg/latent/
You may find answers there.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm trying to develop a system, which recognizes various objects present in an image based on their primitive features like texture, shape & color.
The first stage of this process is to extract out individual objects from an image and later on doing image processing on each one by one.
However, the segmentation algorithms I've studied so far are nowhere near perfect, i.e. nowhere near a so-called ideal image segmentation algorithm.
Segmentation accuracy will decide how well the system responds to a given query.
Segmentation should be fast as well as accurate.
Can anyone suggest a segmentation algorithm, developed or implemented so far, that won't be too complicated to implement but will be good enough to complete my project?
Any help is appreciated.
A very late answer, but might help someone searching for this in google, since this question popped up as the first result for "best segmentation algorithm".
Fully convolutional networks seem to do exactly the task you're asking for. Check the paper in arXiv, and an implementation in MatConvNet.
The following image illustrates a segmentation example from these CNNs (the paper I linked actually proposes 3 different architectures, FCN-8s being the best).
Unfortunately, the best algorithm type for facial recognition uses wavelet reconstruction. This is not easy, and almost all current algorithms in use are proprietary.
This is a late response, so maybe it's not useful to you, but one suggestion would be to use the watershed algorithm.
Beforehand, you can take a generic drawing (black and white) of a face and generate an FFT of the drawing. Call it *FFT_Face*.
Now segment your image of a person's face using the watershed algorithm. Call the segmented image *Water_Face*.
Now find the center of mass of each contour/segment.
Generate an FFT of *Water_Face*, and correlate it with *FFT_Face*. The brightest pixel in the resulting image should be the center of the face. Now you can compute the distances between this point and the centers of the segments generated earlier. The first few distances should be enough to distinguish one person from another.
I'm sure there are several improvements to the process, but the general idea should get you there.
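One possible reading of the correlation and distance steps, sketched with NumPy. It assumes the watershed segmentation has already been done elsewhere (for example with OpenCV or scikit-image) and is available as an integer label image; the function names are made up for this sketch. Also note that with this formulation the brightest pixel marks where the top-left corner of the template lands, so you would add half the template size to get its center.

```python
import numpy as np

def correlation_peak(image, template):
    """Circular cross-correlation via the FFT; returns (row, col) of the brightest pixel."""
    image = np.asarray(image, dtype=float)
    padded = np.zeros_like(image)                        # zero-pad template to image size
    template = np.asarray(template, dtype=float)
    padded[:template.shape[0], :template.shape[1]] = template
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    correlation = np.real(np.fft.ifft2(spectrum))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    return tuple(int(i) for i in peak)

def segment_centroids(labels):
    """Center of mass (row, col) of every non-zero label in a label image."""
    labels = np.asarray(labels)
    return {int(l): tuple(float(v) for v in np.argwhere(labels == l).mean(axis=0))
            for l in np.unique(labels) if l != 0}

def distances_to_peak(centroids, peak):
    """Sorted Euclidean distances from the correlation peak to each segment center."""
    return sorted(float(np.hypot(r - peak[0], c - peak[1])) for r, c in centroids.values())

# Toy example: a random 5x5 "face drawing" pasted into an empty image at (10, 7).
rng = np.random.default_rng(0)
template = rng.random((5, 5)) + 0.1
image = np.zeros((32, 32))
image[10:15, 7:12] = template
print(correlation_peak(image, template))
```

The sorted distance list is the kind of signature the answer suggests comparing between people; how many distances are needed is something you would have to find out experimentally.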
Doing a Google search turned up this paper: http://www.cse.iitb.ac.in/~sharat/papers/prim.pdf
It seems that getting it any better is a hard problem, so I think you might have to settle for what's there.
You can try the watershed segmentation algorithm.
You can also assess the accuracy of the segmentation algorithm using qualitative measures.
I want to develop an application in which the user inputs an image of a person, and the system identifies the face in that image. The system should also work if there is more than one person in the image.
I need the logic; I don't have any idea how to work on image pixel data in such a way that it identifies people's faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value across all the images (the mean face), and subtract that mean from every image vector so that the data are centered on the origin (the vectors then span a linear rather than an affine subspace). Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting it into the new vector space, and computing the distance (for example, the Euclidean distance) between it and other projected face vectors.
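The training and matching steps can be sketched with NumPy as follows, assuming the faces are already grayscale and resized to the same dimensions. The function names are made up for this sketch. Two deliberate shortcuts: the images are flattened row by row rather than column by column (this only permutes the vector entries), and the principal components are taken from an SVD of the centered data instead of an explicit covariance matrix, which gives the same directions without building a huge matrix.

```python
import numpy as np

def train_eigenfaces(faces, n_components):
    """faces: array of shape (n_images, height, width). Returns (mean_face, components)."""
    X = np.asarray(faces, dtype=float)
    X = X.reshape(X.shape[0], -1)              # one row per flattened face
    mean_face = X.mean(axis=0)
    centered = X - mean_face                   # subtract the mean face from every image
    # Rows of vt are the eigenvectors of the covariance matrix, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, components):
    """Coordinates of one face in the eigenface space."""
    return components @ (np.asarray(face, dtype=float).ravel() - mean_face)

def nearest_face(probe, gallery, mean_face, components):
    """Index of the gallery face closest to the probe, plus all Euclidean distances."""
    probe_w = project(probe, mean_face, components)
    distances = [float(np.linalg.norm(probe_w - project(g, mean_face, components)))
                 for g in gallery]
    return int(np.argmin(distances)), distances

# Toy data standing in for real face images.
rng = np.random.default_rng(0)
gallery = rng.random((6, 8, 8))
mean_face, components = train_eigenfaces(gallery, n_components=3)
probe = gallery[2] + 1e-3 * rng.random((8, 8))           # slightly perturbed copy of face 2
print(nearest_face(probe, gallery, mean_face, components)[0])
```

With real photographs the number of components, and above all the choice of training images, matters far more than anything in this code.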
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, sum the images pixel by pixel, per RGB channel, and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
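A minimal sketch of that sanity check with NumPy, assuming all images have already been resized to the same shape. The helper `average_face_gap` is a hypothetical addition: it reduces the comparison to one number, but looking at the two average faces yourself is the actual check described above.

```python
import numpy as np

def average_face(images):
    """Pixel-wise (and channel-wise) mean of a list of same-size images."""
    stack = np.stack([np.asarray(img, dtype=float) for img in images])
    return stack.mean(axis=0)

def average_face_gap(probe_images, gallery_images):
    """Mean absolute difference between the two average faces (hypothetical summary)."""
    gap = np.abs(average_face(probe_images) - average_face(gallery_images))
    return float(gap.mean())

# Toy RGB "images"; real use would load and resize photographs first.
dark = np.zeros((2, 2, 3))
bright = np.full((2, 2, 3), 2.0)
print(average_face([dark, bright]))
print(average_face_gap([dark], [bright]))
```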
Have a look at the Face Recognition Homepage - there are algorithms, papers, and even some source code.
There are many, many different algorithms out there. Basically what you are looking for is "computer vision". We did a project at university based around facial recognition and detection. What you need to do is search Google extensively and try to understand all this stuff. There is a bit of mathematics involved, so be prepared. First go to Wikipedia. Then you will want to search for PDF publications of specific algorithms.
You can go the hard way and write an implementation of all the algorithms yourself, or the easy way and use a computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work, so be brave. It is a lot harder to make software that works under different and constantly varying conditions, and that is where Google won't help you. But I suppose you don't want to go that deep.