I want to build a face detection application in Python with OpenCV that detects only my face, and I wonder if there are any rules about choosing negative samples. Should I select any images that do not contain my face (for example: roads, scenery, animals, etc.), or should I select other people's faces as negative images?
I also wonder whether the typical environment in which the sought-for object appears affects efficiency. (For example, is it good practice to select empty roads as negative images when detecting cars?)
I would really like to hear your thoughts. Could you also share any articles or documents about this? I would really appreciate it. Thanks for your help!
This is quite a difficult question, and the answer depends heavily on your chosen algorithm. If you use e.g. SIFT, then you only use positive samples. If you are using cascades, then negative samples are necessary.
Which samples to use depends on the application, and no single answer exists. You should generally provide positive and negative samples covering the situations you expect to appear. As the algorithm becomes more complex (deep learning), this becomes even more relevant.
So if you think other faces can appear in the image and you want to make sure the detector treats them as negatives, you need to provide them. If your face can appear against different backgrounds, then it is also important to include such situations.
To summarize: include samples (positive and negative) of the situations you expect to appear.
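For the cascade route, here is a minimal sketch of how negatives are usually supplied to OpenCV's opencv_traincascade tool, assuming a hypothetical negatives/ folder holding images without your face (other faces, roads, scenes, etc.):

```python
# Collect negative-sample paths into a background file for
# opencv_traincascade (passed via its -bg argument).
import os

negative_dir = "negatives"  # hypothetical folder of negative images
with open("bg.txt", "w") as f:
    for name in sorted(os.listdir(negative_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
            f.write(os.path.join(negative_dir, name) + "\n")
```

The trainer samples negative windows from these images, which is why covering the expected backgrounds and "other faces" matters.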
I have a problem statement: recognize 10 classes of variations (in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat surface of the object; this attempt was successful.
(Image: flat faces of two sample classes)
2) Instead of taking the fall into account, trained a model for possible perspective changes; this was not successful.
(Image: perspective changes of two sample classes)
What are some approaches to recognizing the object even under perspective changes? I am not constrained to a single-camera solution and am open to ideas for approaching this problem of variable perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the riffled edges, which can be seen from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
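A minimal sketch of the averaging idea, where `model` and `preprocess` are hypothetical stand-ins for your trained network and its input pipeline:

```python
# Average one classifier's class probabilities over several camera views.
import numpy as np

def predict_multi_view(model, views, preprocess):
    """Return class probabilities averaged over several images of one object."""
    probs = [model.predict(preprocess(img)) for img in views]
    return np.mean(probs, axis=0)  # final label = argmax of this vector
```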
Also, never underestimate the power of old-school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, i.e. the one where img.sum() is highest.
Another idea is to figure out the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is confused by the strong correlation with color and ignores the shape altogether.
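A rough sketch of that last idea, assuming a hypothetical sample image; the saturation cut-off is an arbitrary choice:

```python
# Recover the cap's dominant colour explicitly, then hand the network a
# grayscale image so that shape, not colour, drives the prediction.
import cv2
import numpy as np

img = cv2.imread("cap.jpg")                    # hypothetical sample image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

mask = hsv[:, :, 1] > 60                       # reasonably saturated pixels only
dominant_hue = np.median(hsv[:, :, 0][mask]) if mask.any() else 0

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # what the network now sees
print("dominant hue:", dominant_hue)
```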
I'm searching for algorithms/methods that are used to classify or differentiate between two outdoor environments. Given an image with vehicles, I need to be able to detect whether the vehicles are in a natural desert landscape, or whether they're in the city.
I've searched but can't seem to find relevant work on this. Perhaps, because I'm new to computer vision, I'm using the wrong search terms.
Any ideas? Is there any work (or related) available in this direction?
I'd suggest reading Prince's Computer Vision: Models, Learning, and Inference (free PDF available). It covers image classification, as well as many other areas of CV. I was fortunate enough to take the Machine Vision course at UCL which the book was designed for and it's an excellent reference.
Addressing your problem specifically, a simple MAP or MLE model on pixel colours will probably provide a reasonable benchmark. From there you could look at more involved models and feature engineering.
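A hedged sketch of such an MLE benchmark, fitting one colour histogram per class and scoring a test image by the log-likelihood of its pixels under each; the file names are assumptions:

```python
import cv2
import numpy as np

BINS = (8, 8, 8)  # coarse RGB histogram

def class_histogram(image_paths):
    """MLE of the per-pixel colour distribution for one class."""
    hist = np.zeros(BINS)
    for p in image_paths:
        img = cv2.imread(p)
        h, _ = np.histogramdd(img.reshape(-1, 3), bins=BINS, range=((0, 256),) * 3)
        hist += h
    hist += 1.0                  # Laplace smoothing avoids log(0)
    return hist / hist.sum()

def log_likelihood(img, hist):
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=BINS, range=((0, 256),) * 3)
    return np.sum(h * np.log(hist))

desert = class_histogram(["desert1.jpg", "desert2.jpg"])  # hypothetical files
city = class_histogram(["city1.jpg", "city2.jpg"])
test = cv2.imread("unknown.jpg")
print("desert" if log_likelihood(test, desert) > log_likelihood(test, city) else "city")
```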
Seemingly complex classifications like "civilization" vs. "nature" can often be solved with the help of simple heuristics combined with color-based classification. As Gilevi said, city scenes are sure to contain many straight lines and right angles, while desert scenes are dominated by rolling dunes, and so on.
To address this directly, you could run OpenCV's Hough line transform on the images (tuned for this problem, of course) and look at the following (a rough sketch comes after the list):
a) how many lines are fit to the image at a given threshold
b) of the lines that are fit, the expected angle between any two of them; if the angles are uniformly distributed, then chances are it's nature, but if they clump around multiples of pi/2 (more right angles and straight lines), then it is more likely a cityscape.
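A rough sketch of both checks; the Canny parameters, Hough threshold, and decision cut-offs are assumptions that would need tuning on real data:

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=120)

if lines is None or len(lines) < 10:     # (a) few fitted lines -> likely nature
    print("nature")
else:
    thetas = lines[:, 0, 1]              # angle of each line, in [0, pi)
    # (b) distance of each angle to the nearest multiple of pi/2
    dist = np.minimum(thetas % (np.pi / 2), np.pi / 2 - thetas % (np.pi / 2))
    print("city" if np.mean(dist) < 0.2 else "nature")
```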
Color components, textures, and degree of smoothness (variation or gradient of the image) may differentiate desert and city backgrounds. You may also try the Hough transform, which is used for line detection; lines can be viewed as a city feature (buildings, roads, bridges, cars, etc.).
I would recommend this research, which is very similar to your project. The article presents a comparison of different classification techniques for obtaining a scene classifier (urban, highway, and rural) based on images.
See my answer here: How to match texture similarity in images?
You can use the same method. I have solved problems like the one you describe with this method in the past.
The problem you are describing is that of scene categorization. Search for works that use the SUN database.
However, you are only working with two relatively different categories, so I don't think you need to kill yourself implementing state-of-the-art algorithms. I think taking GIST features + color features and training a non-linear SVM would do the trick.
Urban environments are usually characterized by a lot of horizontal and vertical lines, and GIST captures that information.
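A hedged sketch of that pipeline. OpenCV ships no GIST extractor, so `extract_gist` below is a hypothetical placeholder for whichever implementation you use; the colour histogram and SVM parts are standard:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def color_hist(img, bins=8):
    # Normalised 3-D colour histogram as a simple colour feature.
    h = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.normalize(h, h).flatten()

def features(path, extract_gist):
    img = cv2.imread(path)
    return np.concatenate([extract_gist(img), color_hist(img)])

# With hypothetical train_paths / train_labels:
# X = np.array([features(p, extract_gist) for p in train_paths])
# clf = SVC(kernel="rbf").fit(X, train_labels)   # non-linear SVM
```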
Information:
I would like to use OpenCV's HOG detection to identify objects that can be seen in a variety of orientations. The only problem is, I can't seem to find a reasonable feature detector or classifier that handles this in a rotation- and scale-invariant way (as is needed for objects such as forearms).
Prior Work:
Let's focus on forearms for this discussion. A forearm can have multiple orientations, its primary distinct features probably being its contour edges. Forearms can point in any direction in an image, hence the complexity. So far I have done some in-depth research on using HOG descriptors to solve this problem, but I am finding that the variety of poses produced by forearms in my positive training set produces very low detection scores on actual images. I suspect the issue is that the gradients produced by each positive image do not give very consistent results when accumulated into the histogram. I have reviewed many research papers on the topic trying to resolve or improve this, including the original from Dalal & Triggs: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf. It also seems that the assumptions made for detecting whole humans do not necessarily apply to detecting individual features (in particular, the assumption that all humans are standing up suggests HOG is not a good route for rotation-invariant detection such as that of forearms).
Note:
If possible, I would like to steer clear of any non-free solutions, such as those pertaining to SIFT, SURF, or Haar.
Question:
What is a good solution for detecting objects in a rotation- and scale-invariant way? In particular, for this example, what would be a good solution for detecting all orientations of forearms in an image?
I use HOG to detect human heads and shoulders. To train on a particular part, you have to supply its location. If you use OpenCV, you can clip samples so they contain only the part you want to train on, and make sure all training samples share the same size. For example, I clip images to contain only the head and shoulders and resize them all to 64x64. Other open-source code may require you to pass the location as an input parameter, which is essentially the same thing.
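A minimal sketch of that workflow: clip a fixed-size region around the part and compute one HOG descriptor per sample. The image path and crop coordinates are assumptions:

```python
import cv2

win = (64, 64)
hog = cv2.HOGDescriptor(win, (16, 16), (8, 8), (8, 8), 9)  # block, stride, cell, bins

img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image
x, y, w, h = 100, 50, 120, 120                          # hypothetical part location
crop = cv2.resize(img[y:y + h, x:x + w], win)           # all samples share one size
descriptor = hog.compute(crop)                          # feed these to your classifier
```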
Have you tried the discriminatively trained deformable part model? http://www.cs.berkeley.edu/~rbg/latent/
You may find answers there.
I'm currently working on a vision system for a UAV I am building. The goal of the system is to find target objects, which are rather well defined (see below), in a video stream that will be a 2-D flyover view of the ground. So far I have tried training and using a Haar-like-feature-based cascade, a la Viola-Jones, to do the detection. I am training it with 5000+ images of the targets at different angles (perspective shifts) and ranges (sizes in the frame), but only 1900 "background" images. This does not yield good results at all, as I cannot find a number of cascade stages that balances few false positives against few false negatives.
I am looking for advice from anyone who has experience in this area as to whether I should:
1) ditch the cascade in favor of something more suitable to objects defined by their outline and color (which I've read the VJ cascade is not);
2) improve my training set for the cascade, by adding positives or background frames, organizing/shooting them better, etc.; or
3) take some other approach I can't currently fathom.
A description of the targets:
Primary shapes: triangles, squares, circles, ellipses, etc.
Distinct, solid, primary (or close to) colors.
Smallest dimension between two and eight feet (big enough to be seen easily from a couple hundred feet AGL).
Large, single alphanumeric in the center of the object, with its own distinct, solid, primary or almost primary color.
My goal is to use something very fast, such as the VJ cascade, to find candidate objects and their associated bounding boxes, and then pass these on to finer processing routines to determine their properties (color of the object and the alphanumeric, value of the alphanumeric, actual shape, and GPS location). Any advice you can give me towards completing this goal would be much appreciated. The source code I currently have is a little lengthy to post here, but it is freely available should you like to see it for reference. Thanks in advance!
-JB
I would recommend ditching Haar classification, since you know a lot about your objects. In these cases you should start by checking what features you can use:
1) Overhead flight means, as you said, that you can basically treat these as fixed shapes on a 2D plane. There will be scaling, rotation, and some minor affine transformations, which depend a lot on how wide-angle your camera is; if it isn't particularly wide-angle, that part can probably be ignored. Also, you probably know your altitude, from which you can make a very good assumption about the target size (scaling).
2) You know the colors, which also makes it quite easy to find the objects. If these are well-defined primary colors, then you can just filter the image based on color and find the contours. If you want to do something a little more advanced (which to me doesn't seem necessary), you can do a backprojection, which in my experience is very effective and fast. Note: if you're creating the objects yourself, it would be better to use red, green, and blue instead of the primary colors (red, green, and yellow). Then you can simply split the image into its respective channels and use a very high threshold (sketched after the next point).
3) You know the geometric shapes. I've never done this myself, but as far as I know the options are using moments or using Hough transforms (although OpenCV only has Hough algorithms for lines and circles, so you'd have to write your own for other shapes). You might already have sufficiently good results without this step, though; a simple version is included in the sketch below.
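A rough sketch combining points 2 and 3 (OpenCV 4 calling conventions): threshold one colour channel hard to get candidate contours, then classify each contour's shape by polygon approximation. The file name, threshold, and area cut-off are assumptions:

```python
import cv2

def classify_shape(contour):
    # Approximate the contour by a polygon and count its vertices.
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * peri, True)
    if len(approx) == 3:
        return "triangle"
    if len(approx) == 4:
        return "square/rectangle"
    return "circle/ellipse"      # many vertices suggest a smooth curve

img = cv2.imread("frame.jpg")    # hypothetical flyover frame
b, g, r = cv2.split(img)
_, mask = cv2.threshold(r, 200, 255, cv2.THRESH_BINARY)   # very high threshold
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    if cv2.contourArea(c) > 100:            # ignore tiny specks
        print(classify_shape(c), "at", cv2.boundingRect(c))
```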
If you want more specific recommendations, it would be very useful to upload a couple sample images. :)
This may already be solved, but I recently came across a paper with open-source code on generic object detection using normalised gradient features: http://mmcheng.net/bing/comment-page-9/
The details of the algorithm's performance under illumination, rotation, and scale changes may require a little digging. I can't remember off the top of my head where the original paper is.
I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice at Computer Vision, so any direction would be much appreciated.
Thanks!
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes"), SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice; it's very popular these days and is worth a try. You can find some code to download here. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
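If you go the OpenCV route, here is a minimal matching sketch (assuming OpenCV >= 4.4, where SIFT lives in the main package; the file names are placeholders):

```python
import cv2

logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(logo, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Keep matches that pass Lowe's ratio test.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "good matches")  # many good matches suggest the logo is present
```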
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to your image (scaling, rotation, brightness changes, adding a bit of noise, etc.) will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging them in some data structure for your next stage:
Look-up. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart hashing techniques to do it, like the Hough transform and geometric hashing.
Now you should do some verification. You have found some places in the image which are suspect: it is probable that they contain your object. Usually you know the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check whether it is really there.
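One simple way to implement that verification step is normalised cross-correlation at the suspected location; here cv2.matchTemplate stands in for the convolution, and the file names and 0.7 threshold are assumptions:

```python
import cv2

scene = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)  # warped to the presumed pose

scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)
if max_val > 0.7:
    print("object verified at", max_loc)
```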
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
That's it, you are done!