Can (or maybe should) I use facial recognition software to detect non-faces? For example, let's say I'm trying to find a can of soda in a picture of a bedroom. If I use the haartraining algorithms on many pictures of cans of soda, will the facial recognition feature then find the can of soda in a picture of a bedroom?
Yes, you can. Those algorithms are based on extracting features from training sets. If you pass them faces, they will be good at detecting features of faces. If you pass them cans, they will be good at detecting cans.
It would be different if the algorithm were specialized for a specific object by restricting the search itself, but Haar classifiers are not like that.
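To make that concrete, here is a minimal sketch of how such a cascade would be used once trained; 'can_cascade.xml' is a hypothetical file produced by your own training run, not something shipped with OpenCV:

import cv2

# 'can_cascade.xml' is a placeholder for a cascade you trained yourself on can images
can_cascade = cv2.CascadeClassifier('can_cascade.xml')

image = cv2.imread('bedroom.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Exactly the same sliding-window API used for face detection
cans = can_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in cans:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)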
I think you are referring to face detection, not face recognition.
In general, face detection works by discriminating appearance and texture within the face area, and it implicitly assumes a mostly planar object.
You can use the same classifiers to detect other mostly-flat objects that are discriminated by appearance: facial features like eyes and mouths, book covers, and so on.
Soda cans are cylindrical, so you may have to use multiple overlapping rotated views of each can to get it to work.
Related
I am working on a hand detection project. There are many good projects on the web for this, but what I need is detection of a specific hand pose: a totally open palm, with the whole palm facing outwards, like the image below:
The first hand faces inwards, so it should not be detected; the right one faces outwards, so it should be detected. I can already detect a hand with OpenCV, but how do I tell the hand's orientation?
The problem of matching the front of the hand is a texture-classification task, a classic pattern recognition problem. I suggest you try one of the following methods:
Gabor filters: good for capturing orientation and pixel intensities (the front of the hand has distinctive features). OpenCV has a getGaborKernel function; its most important parameters are theta (orientation) and lambd (wavelength/frequency). To keep things simple, you can apply this to a cropped zone of the palm (since you have already detected the hand, it is easy to crop, for example, the thumb or a rectangular zone around the center of gravity). Then you can convolve the crop against a small database of images of the same zone to get a matching rate, or use an SVM classifier, training the SVM on a set of images by constructing the training matrix it needs (check this question and this paper).
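A minimal sketch of the Gabor idea; the crop filename and the parameter values are placeholders, not tuned settings:

import cv2
import numpy as np

# Bank of Gabor kernels at four orientations (theta); lambd sets the wavelength
kernels = [cv2.getGaborKernel((21, 21), sigma=4.0, theta=t, lambd=10.0, gamma=0.5, psi=0)
           for t in np.arange(0, np.pi, np.pi / 4)]

palm = cv2.imread('palm_crop.png', cv2.IMREAD_GRAYSCALE)  # hypothetical cropped palm zone
# Mean filter responses form a simple texture signature for matching or an SVM
signature = [cv2.filter2D(palm, cv2.CV_32F, k).mean() for k in kernels]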
Local Binary Patterns (LBP): an important feature descriptor used for texture matching. You can apply it to the whole palm image, or to a cropped zone or a single finger. It is easy to use with OpenCV, and a lot of tutorials with code are available for this method. I recommend reading this paper on Invariant Texture Classification with Local Binary Patterns; here is a good tutorial as well.
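Since the exact LBP helper depends on your library versions, here is a minimal self-contained 3x3 LBP in NumPy to illustrate the descriptor:

import numpy as np

def lbp_3x3(gray):
    # Compare each pixel to its 8 neighbours; pack the bits into one code per pixel
    c = gray[1:-1, 1:-1]
    neighbours = [gray[:-2, :-2], gray[:-2, 1:-1], gray[:-2, 2:], gray[1:-1, 2:],
                  gray[2:, 2:], gray[2:, 1:-1], gray[2:, :-2], gray[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << bit
    return codes

# The normalised histogram of codes is the texture descriptor to match or classify:
# hist, _ = np.histogram(lbp_3x3(img), bins=256, range=(0, 256), density=True)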
Haralick texture features: I've read that they work well when a set of features quantifies the entire image (global feature descriptors). They are not implemented in OpenCV but are easy to implement; check this useful tutorial.
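For example, a sketch of Haralick-style statistics via scikit-image's gray-level co-occurrence matrix, the usual building block (note these functions were spelled greycomatrix/greycoprops in older scikit-image releases):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in for a grayscale palm crop

# Co-occurrence of gray levels at distance 1, in two directions
glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
# Classic Haralick-style global texture statistics
features = [graycoprops(glcm, p).mean()
            for p in ('contrast', 'homogeneity', 'energy', 'correlation')]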
Training models: I've already suggested an SVM classifier; coupled with one of the descriptors above, it can work very well.
OpenCV also has an interesting FaceRecognizer class for face recognition. It could be worth using it with palm images instead of face images (resize and rotate them to get a single canonical palm pose). The class offers three models; one of them, Local Binary Pattern Histograms, is recommended for texture recognition, and you could also try the other two (Eigenfaces and Fisherfaces). Check this tutorial.
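A minimal sketch of that idea, assuming the opencv-contrib-python package (which provides cv2.face) and hypothetical palm image files:

import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create()  # needs opencv-contrib-python

# Hypothetical training data: grayscale palm crops, resized to one fixed size,
# with an integer label per class (0 = palm facing outwards, 1 = facing inwards)
paths = ['palm_out_01.png', 'palm_in_01.png']
images = [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), (100, 100)) for p in paths]
labels = np.array([0, 1])

recognizer.train(images, labels)
label, confidence = recognizer.predict(images[0])  # lower confidence value = closer match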
Well, if you go the MacGyver way, you can notice that the left hand has bones sticking out in a certain direction, while the right hand shows all the finger lines and a few creases in the palm.
These lines are always roughly the same, so you could try to detect them with OpenCV edge detection or Hough lines. Given the dark color of the lines, you might even be able to threshold them out. Then gather information from those lines, such as angles and regressions, see which features you can collect, and train a simple decision tree.
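A sketch of the edge + Hough lines route; the thresholds here are placeholders to tune:

import cv2
import numpy as np

gray = cv2.imread('palm.png', cv2.IMREAD_GRAYSCALE)  # hypothetical palm image
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform returns line segments as (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=30,
                        minLineLength=20, maxLineGap=5)
if lines is not None:
    # Segment angles are simple features to feed a decision tree
    angles = [np.arctan2(y2 - y1, x2 - x1) for x1, y1, x2, y2 in lines[:, 0]]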
That assumes you do not have much data. If you do, go for deep learning: take a basic InceptionV3 model and retrain the last dense layer, either to classify between two classes with a softmax, or to predict the probability of the hand being up or down with a sigmoid. Check this link; TensorFlow has your back on the training side, with ready-to-run code.
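A minimal Keras sketch of that retraining, assuming 299x299 RGB inputs and datasets you supply yourself:

import tensorflow as tf

# Frozen InceptionV3 backbone, new sigmoid head for palm-out vs. palm-in
base = tf.keras.applications.InceptionV3(weights='imagenet', include_top=False,
                                         input_shape=(299, 299, 3))
base.trainable = False  # only the new head gets trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # P(hand facing outwards)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds are yours to supply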
Questions? Ask away
Take a look at what Leap Motion has done with the Oculus Rift. I'm not sure what they use internally to segment hand poses, but there is another paper that produces hand poses effectively. If you have a stereo camera setup, you can use the methods from this paper: https://arxiv.org/pdf/1610.07214.pdf.
The only promising mono-camera solutions I've seen train on large datasets.
Use a Haar cascade classifier. You can get a trained model file and then use it as shown here: just search Google for 'Haar cascade detection of palm', or use the code below.
import cv2

cam = cv2.VideoCapture(0)
# Load a pre-trained palm cascade (e.g. from the 'haar-cascade-files' repository)
classifier = cv2.CascadeClassifier('haar-cascade-files-master/palm.xml')

while True:
    retval, image = cam.read()
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Detect palms at multiple scales in the grayscale frame
    palms = classifier.detectMultiScale(grey, scaleFactor=1.05, minNeighbors=3)
    for (x, y, w, h) in palms:
        image = cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), 2)
    cv2.imshow("Window", image)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        cv2.destroyAllWindows()
        break

cam.release()
Best of luck with the Haar cascade.
I'm trying to use the HOG detector in OpenCV to detect 3 types of objects in a video feed from a fisheye lens. The types are:
People
Books (when held by some person)
Chairs
A snapshot of the video looks like this image from this website:
I set up the HOG classifier using the default people detector and first tried to detect the people. I noticed that when the people were the size you would expect from a non-fisheye lens (what you would get with a standard 35mm lens), they were detected; otherwise they were not. This seems logical, as the classifier expects people of a standard size.
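For reference, that setup is essentially OpenCV's stock pedestrian detector; a minimal sketch (the frame filename is a placeholder):

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread('fisheye_frame.png')  # hypothetical snapshot from the feed
# winStride/padding/scale control the density of the sliding-window search
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)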
I was wondering how I could modify the classifier to detect people through a fisheye lens. The options I see are:
Undistort the fisheye effect and run the classifier - I'd rather not do this, because I'm currently not in a position to calibrate the camera and obtain the distortion coefficients.
Distort images from a people dataset to approximate the distortion of my video, and retrain the classifier - I think this would work, but I would like to understand whether it would work the way I expect.
My questions are:
What would be a valid approach to this problem? Would my option #2 work for all 3 types of objects (people, books, and chairs)?
What is a good classifier that can be trained to identify the 3 types of objects (cascade, HOG, or anything else - please suggest a library as well)? Would my method #2 of distorting and training with positive and negative examples be a good solution?
Retraining the HOG detector to the performance level of the one included with OpenCV would be a fairly involved process. You would also have to simulate the distortion of your specific lens to modify the training data.
For the quickest solution, I would recommend your first option of undistorting the image. If you are willing to put in the time and resources to retrain the classifier (which you may have to do anyway, depending on how you detect chairs and books), there are some publicly available pedestrian datasets that will be useful:
1) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
2) http://pascal.inrialpes.fr/data/human/
It's unlikely that you'll find a ready-made chair cascade, due to the variability in chair design; I would recommend training your own cascade on the specific chairs you intend to detect. I don't know of any existing cascade for books either, and a quick Google search didn't yield any promising results. If you intend to train your own cascade for books, ImageNet is a good source of data.
Context:
I have RGB-D video from a Kinect aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection on the RGB image, preprocessing by downscaling to 320x240, converting to grayscale, stretching the contrast, and equalizing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video. Those keypoints are then used to build object descriptors, which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent, but mostly they are unimportant surface details. For instance, say I have a wooden cube: SURF detects a few bits of grain on one face, then fails on the other faces. I need to detect (something like) the fact that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was to build object descriptors for each significantly different-looking orientation of an object, e.g. two descriptors for a bowling pin (standing up and lying down) and two for a cellphone (lying on its front and on its back). My recognizer needs rotational invariance and some degree of scale invariance, in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough here) but not the most important characteristic. Skew invariance would also be preferable; SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters, finding features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there anything as readily usable as SURF that uses the Kinect's depth data along with, or instead of, the RGB data?
I was doing something similar for a project and ended up using a very simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas (a sketch follows the link below). Obviously, there needs to be enough variance in area for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
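A minimal sketch of that approach; the area thresholds and filename are placeholders:

import cv2

params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True   # discriminate objects by blob area
params.minArea = 500.0       # placeholder thresholds, tune per object library
params.maxArea = 50000.0
detector = cv2.SimpleBlobDetector_create(params)

gray = cv2.imread('tabletop.png', cv2.IMREAD_GRAYSCALE)  # hypothetical top-down frame
keypoints = detector.detect(gray)
for kp in keypoints:
    print(kp.pt, kp.size)  # blob centre and diameter, the basis for area matching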
I know there are other methods out there; one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
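A short sketch of the approxPolyDP idea (OpenCV 4 return signature; the filename is a placeholder):

import cv2

gray = cv2.imread('tabletop.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    # Coarsen the contour; the vertex count hints at the geometry
    # (4 for a cube face, many for a round lollipop head)
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    print(len(approx), cv2.contourArea(cnt))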
Would love to hear about your progress on this!
I am trying to detect and track a hand in real time using OpenCV. I thought Haar cascade classifiers would yield a fair result. After training with 10k positive and 20k negative images, I obtained a classifier XML file. Unfortunately, it detects the hand only in certain positions, which suggests it works best only for rigid objects. So I am now thinking of adopting another algorithm that can track the hand once it has been detected by the Haar classifier.
My question is: if I make sure the Haar classifier detects the hand in a certain frame and position, what method would yield robust tracking of the hand from there?
I searched the web a bit and understand I could go for optical flow of the detected hand, a Kalman filter, or a particle filter, but I have also come across their respective disadvantages.
Also, if I incorporate stereo vision, would that help, since I could possibly reconstruct the hand in 3D?
You concluded rightly about Haar features - they aren't that useful when it comes to non-rigid objects.
Take a look at the following papers which use skin colour to detect hands.
Interaction between hands and wearable cameras
Markerless inspection of augmented reality objects
and this paper that uses KLT features to track the hand after the first detection:
Fast 2D hand tracking with flocks of features and multi-cue integration
I would say that a stereo camera will not help your cause much, as 3D reconstruction of non-rigid objects isn't straightforward and would require a whole lot of innovation and development. However, you can take a look at the papers in the hand pose estimation section of this page if you wish to pursue 3D tracking.
EDIT: Also take a look at this recent paper, which seems to get good results.
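To make the KLT idea concrete, here is a minimal pyramidal Lucas-Kanade tracking loop; it seeds corners from the whole first frame for brevity, whereas in practice you would seed only inside the detected hand box:

import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Seed KLT corners (restrict to the hand's bounding box in real use)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track each corner from the previous frame into the current one
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray

cap.release()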
Zhang et al.'s Real-Time Compressive Tracking does a reasonable job of tracking an object once it has been detected by some other method, provided the motion is not too fast. They have an OpenCV implementation (though it would need a bit of work to reuse).
This research paper describes a method to track hands without gloves, using a stereo camera setup.
There have been similar questions on Stack Overflow; have a look at my answer and those of others: https://stackoverflow.com/a/17375647/1463143
You can certainly get better results by avoiding Haar training and detection for deformable entities.
The CamShift algorithm is generally fast and accurate if you want to track the hand as a single entity (a sketch is below). The OpenCV documentation contains a good, easy-to-understand demo program that you can easily modify.
If you need to track fingers etc., however, further modeling will be needed.
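A condensed version of that demo, assuming the initial window comes from your hand detector; the box coordinates here are placeholders:

import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 200, 150, 100, 100  # placeholder: box from your hand detector
track_window = (x, y, w, h)

# Hue histogram of the detected hand region drives the back-projection
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # CamShift adapts window size and orientation as the hand moves
    rot_rect, track_window = cv2.CamShift(backproj, track_window, term_crit)

cap.release()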
Information:
I would like to use OpenCV's HOG detection to identify objects that can be seen in a variety of orientations. The only problem is, I can't seem to find a reasonable feature detector or classifier that detects this in a rotation- and scale-invariant way (as is needed for objects such as forearms).
Prior Work:
Let's focus on forearms for this discussion. A forearm can have multiple orientations, and its primary distinctive features are probably its contour edges. Images can contain forearms pointing in any direction, hence the complexity. So far I have done some in-depth research on using HOG descriptors to solve this problem, but I am finding that the variety of poses in my positive training set produces very low detection scores on real images. I suspect the issue is that the gradients produced by each positive image are not very consistent once accumulated into the histogram. I have reviewed many research papers on the topic trying to resolve or improve this, including the original from Dalal & Triggs: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf. It also seems that the assumptions made for detecting whole humans do not necessarily apply to detecting individual body parts (in particular, the assumption that all humans are standing up suggests HOG is not a good route for rotation-invariant detection like that of forearms).
Note:
If possible, I would like to steer clear of any non-free solutions, such as those pertaining to SIFT, SURF, or Haar.
Question:
What is a good solution for detecting objects in a rotation- and scale-invariant way? In particular for this example, what would be a good solution for detecting all orientations of forearms in an image?
I use HOG to detect human heads and shoulders. To train on a particular part, you have to give its location. If you use OpenCV, you can clip samples so they contain only the part you want to train on, and make sure all training samples share the same size. For example, I clip images to contain only the head and shoulders and resize them all to 64x64. Other open-source code may instead require you to pass the location as an input parameter, which is essentially the same.
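For instance, a sketch of computing HOG features over such 64x64 clips with OpenCV (the sample filename is a placeholder):

import cv2

# (winSize, blockSize, blockStride, cellSize, nbins) sized for 64x64 crops
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

sample = cv2.imread('head_shoulder_crop.png', cv2.IMREAD_GRAYSCALE)  # hypothetical clip
sample = cv2.resize(sample, (64, 64))   # all training samples share one size
features = hog.compute(sample)          # fixed-length vector, ready for an SVM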
Have you tried the discriminatively trained deformable part model? http://www.cs.berkeley.edu/~rbg/latent/
You may find answers there.