In the TensorFlow Object Detection API, what confidence threshold is applied to detections when calculating the COCO mAP? - object-detection-api

If you specify COCO metrics in the model config file, the requested information is printed to the terminal. The COCO mAP is calculated over a range of IoU thresholds, but what is the confidence threshold for deciding whether a bounding box counts as a detection? I couldn't track down a number in the Git repo, so I thought I would ask here.
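For readers unfamiliar with the IoU side of the question, here is a minimal sketch of what "calculated over a range of IoU thresholds" means. It assumes the standard COCO range of 0.50 to 0.95 in steps of 0.05 and boxes given as (x_min, y_min, x_max, y_max); the function and variable names are made up for illustration, and it says nothing about the confidence threshold being asked about.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Assumed COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 10, 10)
detection = (0, 0, 9, 9)
overlap = iou(ground_truth, detection)

# The same detection is a match at the looser thresholds and a miss at the stricter ones.
matched_at = [t for t in COCO_IOU_THRESHOLDS if overlap >= t]
print(overlap, matched_at)
```

AP is computed once per threshold and the results are averaged, which is why a single detection can count as correct for some thresholds and not for others.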

Related

Identification/Classification using BOW and SVM

I have been working on a project to identify diseases from a leaf. I searched and worked out a few things, but some confusion remains.
I believe the flow should be as follows (suggestions welcome):
1. Manually crop the diseased areas from leaves to build the vocabulary.
2. Use SIFT to get keypoints and descriptors.
3. Create the Bag of Words vocabulary by clustering the descriptors (k-means).
4. Train an SVM on the descriptors obtained above.
5. To evaluate/classify, take an input image of the entire leaf and crop it to extract the diseased area using a Haar cascade.
6. Use SIFT to get keypoints and descriptors, then use the SVM to predict.
My questions are:
1. Is the above workflow reasonable, or am I missing something?
2. I am confused about how the SVM learns the name of an object or disease. Where does the SVM get the name of the object it learned or detected?
3. How does the SVM output the name of the object it identified?
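To make the "where does the name come from" part concrete, here is a minimal sketch under stated assumptions. In the usual Bag of Words setup, each image is turned into a histogram of visual-word counts, the SVM is trained on those histograms with integer labels, and the names live in a lookup table kept alongside the model. The class names, the tiny 2-D "descriptors" (standing in for 128-D SIFT descriptors), and the function names below are all hypothetical, and no SVM is actually trained here.

```python
# Hypothetical label table: the SVM only ever sees the integer keys.
CLASS_NAMES = {0: "healthy", 1: "rust", 2: "blight"}

def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    dists = [sum((d - v) ** 2 for d, v in zip(descriptor, word))
             for word in vocabulary]
    return dists.index(min(dists))

def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word counts for one image."""
    counts = [0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy vocabulary (cluster centres) and toy descriptors for one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bow_histogram(descriptors, vocabulary)

# Stand-in for the integer a trained SVM's predict() would return for `hist`.
predicted_label = 1
print(hist, CLASS_NAMES[predicted_label])
```

The fixed-length histogram is what makes images with different numbers of keypoints comparable, and the name is recovered by mapping the predicted integer back through the table.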

OpenCV Face Verification

Is there a way to implement face recognition using OpenCV? I tried LBPH, training with one image. It gives a confidence score, but I am not sure how reliable it is for verification.
My question is: how can I create a face recognition system with OpenCV that tells me how similar two faces are, or whether they belong to the same person? If I'm doing this correctly, the confidence score doesn't seem to be an accurate measure.
Also, is a higher confidence score better?
Thanks
OpenCV 3 currently supports the following algorithms for face recognition:
- Eigenfaces (see createEigenFaceRecognizer())
- Fisherfaces (see createFisherFaceRecognizer())
- Local Binary Patterns Histograms (see createLBPHFaceRecognizer())
The confidence score returned by these algorithms is a distance between faces, so a lower value means a closer match. These methods are really old and perform poorly, though. I'd suggest you try this article: http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
Basically, you need to download the trained Caffe model from here: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/src/vgg_face_caffe.tar.gz
Use OpenCV to run this network as shown in this example:
http://docs.opencv.org/trunk/d5/de7/tutorial_dnn_googlenet.html#gsc.tab=0
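Then collect the 4096-float feature vector from the Caffe network. Note that in the VGG-Face model the 4096-dimensional layers are fc6 and fc7, while fc8 is the 2622-way identity classification output, so fc7 is the usual choice. Compute your similarity as the L2 distance between the two feature vectors calculated for your faces.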
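The last step can be sketched as follows, assuming the two feature vectors have already been extracted as plain lists of floats. The L2 normalisation and the function names are my own additions for illustration, not something the answer prescribes, and any decision threshold would have to be tuned on your own pairs of faces.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-D vectors standing in for 4096-D network features.
face_a = l2_normalize([0.2, 0.9, 0.1])
face_b = l2_normalize([0.25, 0.85, 0.05])
face_c = l2_normalize([0.9, 0.1, 0.4])

# Smaller distance means more similar faces.
print(l2_distance(face_a, face_b), l2_distance(face_a, face_c))
```

Because this is a distance, two images of the same person should give a smaller value than images of different people.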

OpenCV4Android SVM is not giving the correct prediction

I am new to machine learning and OpenCV. I took a set of 10 images for each emotion (neutral and happy) from the Cohn-Kanade face database. I then extracted the facial features from each image, put them in my trainingData matrix, and assigned the label for the respective emotion (for example, 0 for neutral and 1 for happy).
I used the RBF kernel with gamma = 0.1 and C = 1. Once the SVM is trained, I pass it the facial features extracted from live smartphone camera frames for prediction. The prediction always returns 1.
If I increase the number of training samples for the neutral expression (for example, 15 neutral images and 10 happy images), the prediction always returns 0; if there is an equal number of images for each expression in the training samples, the prediction always returns 1.
Why is the SVM behaving this way? How can I check whether I am using the right values for gamma and C? Also, does the SVM depend on the resolution of the training and testing images?
Could you post your SVM code so we can understand what it does? Secondly, I have used SVM before, and you need to normalize the training data and the labels. You should also make sure you are using the correct classifier, as not all classifiers are supported. Follow this link for some tutorials: http://docs.opencv.org/3.0-beta/modules/ml/doc/support_vector_machines.html
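One common way to normalize the feature data is min-max scaling, sketched below under the assumption that each sample is a list of floats. The key point is that the scaling parameters are computed on the training set only and then reused unchanged on the live camera features. The function names are hypothetical and this is not OpenCV API.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the training samples."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(row, mins, maxs):
    """Map each feature to [0, 1] using the training-set range.

    Constant features (max == min) are mapped to 0.0 to avoid dividing by zero.
    """
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)]

# Toy training data: two features on very different scales.
training = [[0.0, 100.0], [10.0, 300.0], [5.0, 200.0]]
mins, maxs = fit_min_max(training)
scaled_training = [apply_min_max(r, mins, maxs) for r in training]

# A sample from the live camera must be scaled with the SAME mins/maxs.
live_sample = apply_min_max([2.5, 250.0], mins, maxs)
print(scaled_training, live_sample)
```

With an RBF kernel, unscaled features with large ranges can push exp(-gamma * distance^2) towards zero for almost every pair of samples, which is one way a classifier can end up predicting a single class for everything.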
As for your other questions, unfortunately you have to find the best combination of gamma and C yourself, which is one of the drawbacks of SVM. https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
Yes, the SVM does depend on the resolution, since your feature vectors change with the resolution, and with them the inputs to the SVM.
P.S. This should ideally be a comment, but unfortunately I don't have enough points to do that.

opencv SVM Prediction

I am currently working on age estimation, extracting features using Gabor filters. I decided to classify ages using an SVM. Training succeeds and does not take long, since the feature vectors have only about 3000 dimensions and there are about 1000 training samples.
The problem is that when I predict on a test image, the result is always 18 (I pass the test image's feature vector to the SVM's predict function). But when I predict an age for an image from the training set, the result is always correct. I do not know why this is happening. Any help would be appreciated.
Another observation: whether or not I change or include the SVMParams, the prediction still outputs the same value.

Are HOG descriptors constructed in peopledetect.cpp?

I am new to HOG. I am using OpenCV 2.4.4 and Visual Studio 2010. The sample peopledetect.cpp from the package compiles and runs, but I want to understand the source code in detail. In peopledetect.cpp, are the HOG descriptors constructed, or are already-trained vectors (3780 values) for people detection fed into the SVM classifier? When I debug peopledetect.cpp, all I can find is that HOGDescriptor creates the HOG descriptor and detector. I basically don't understand what the HOGDescriptor API does: as far as I can see, peopledetect.cpp doesn't go through the steps of HOG processing; it loads the already-trained vectors into the SVM classifier to decide people/no people. Am I wrong? There is no documentation about this.
Can anyone please explain this briefly?
The implementation of the people detection algorithm in OpenCV is based on HOG descriptors as features and an SVM as the classifier.
1. A training database (positive samples as person, negative samples as non-person) is used to learn the SVM parameters (training computes and stores the support vectors). Cross-validation is also performed (I assume) to optimize the soft-margin parameter C and the kernel parameters (the kernel could be linear).
2. To detect people in test video data, peopledetect.cpp loads the pre-learnt SVM, computes the HOG descriptors at different positions and scales, then merges the windows with high detection scores (outputs of the binary SVM classifier).
Here is a good paper (INRIA) to start with.
To give a clearer answer: peopledetect.cpp does go through all the HOG steps.
Digging deeper made things clearer. Basically, if you debug peopledetect.cpp, it goes through these steps.
Initially the image is processed at several scales; scale0 (1.05) is the coefficient by which the detection window grows from one level to the next. For each scale of the image, features are extracted from each window and the classifier is run on that window; as described above, it follows the scale-space pyramid method. This is a big and very expensive computation, so the OpenCV team parallelised it across scales.
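To get a feel for how many levels that pyramid has, here is a rough sketch. It assumes a 64x128 detection window, scale0 = 1.05, and a cap of 64 levels, and it simply keeps multiplying the scale until the rescaled image can no longer hold one window. It is loosely modelled on the idea, not copied from the OpenCV source, and the function name is made up.

```python
def pyramid_scales(img_w, img_h, win_w=64, win_h=128, scale0=1.05, max_levels=64):
    """Scales at which a win_w x win_h window still fits the rescaled image."""
    scales = []
    scale = 1.0
    while len(scales) < max_levels:
        # Stop once the image, shrunk by `scale`, is smaller than the window.
        if round(img_w / scale) < win_w or round(img_h / scale) < win_h:
            break
        scales.append(scale)
        scale *= scale0
    return scales

scales = pyramid_scales(640, 480)
print(len(scales), scales[:3], scales[-1])
```

Each of these levels is an independent pass of window extraction and classification, which is why running the levels in parallel is a natural fit.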
I was baffled at first about why I could not step through the code. The call parallel_for_(Range(0, (int)levelScale.size()),HOGInvoker()) creates several threads, each working on one scale; how many there are depends on how many threads are created.
Because of this I was not able to debug. What I did was freeze all the other threads and debug only the main thread. For the different scales of the image, the HOG processing steps are:
In peopledetect.cpp, HOG extraction and the classifier window are effectively combined: feature extraction and classification both take place in a single 64x128 window. This is done for each scale of the image. A number of pedestrian windows of different scales are often associated with the same region; these are grouped using the groupRectangles() function.
Training an SVM consists of finding the parameters of the maximum margin between positive and negative samples.
If the same feature extraction is done for 1000+ negative and positive samples, there must be millions of features, right?
Yes. These coefficients are extracted from training databases, which you don't have. The SVM stores only the support vectors, which are sufficient to characterise the margin. See the dual form of the linear SVM, for example.
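The dual form mentioned above can be illustrated with a small sketch. For a linear SVM the weight vector is w = sum_i alpha_i * y_i * x_i over the support vectors, so however many training samples were used, the detector only needs one coefficient per feature plus a bias. The toy numbers and function names below are made up for illustration.

```python
def linear_svm_weights(support_vectors, alphas, labels):
    """Collapse support vectors into one weight vector: w = sum(alpha_i * y_i * x_i)."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        for j in range(dim):
            w[j] += alpha * y * sv[j]
    return w

def decision(w, b, x):
    """Signed score of a sample: positive on one side of the margin, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy problem: one positive and one negative support vector in 2-D.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [0.25, 0.25]
labels = [+1, -1]
bias = 0.0

w = linear_svm_weights(support_vectors, alphas, labels)
print(w, decision(w, bias, (2.0, 0.5)), decision(w, bias, (-3.0, 0.0)))
```

The length of w depends only on the descriptor length, not on the number of training images, which is why a pre-trained detector can be shipped as a short list of coefficients.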
A number of pedestrian windows of different scales are often associated with the region.
True. A merging function is applied. Different methods (such as groupRectangles(..)) are available (see here) and take as arguments the parameters given to detectMultiScale(..).
What I understood from different papers is that HOG feature extraction is done on many positive and negative images, and the extracted features are fed to a linear SVM to train it. So peopledetect.cpp uses this trained linear SVM, and the training-time feature extraction is not done by peopledetect.cpp; that is, HOGDescriptor::getDefaultPeopleDetector() holds the coefficients of the classifier trained for people detection. The features extracted from a HOG detection window (64x128) have a total length of 3780 (4 cells x 9 bins x 7 x 15 blocks = 3780). These features are then used to train a linear SVM classifier. If the same feature extraction is done for 1000+ negative and positive samples, there must be millions of features, right? How do we get these coefficients?
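The 3780 figure can be checked with a few lines of arithmetic. The sketch below assumes the usual layout for the 64x128 window: 16x16 blocks moved in steps of 8 pixels, 8x8 cells, and 9 orientation bins. The function name is made up.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                          cell=(8, 8), nbins=9):
    """Number of values in a HOG descriptor for one detection window."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1   # 7 block positions across
    blocks_y = (win[1] - block[1]) // stride[1] + 1   # 15 block positions down
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])  # 4 cells
    return blocks_x * blocks_y * cells_per_block * nbins

print(hog_descriptor_length())
```

Note that this is the length of one window's descriptor. Training on thousands of windows produces thousands of such vectors, but a linear SVM still ends up with a single weight per descriptor element.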
However, HOG descriptors are known to contain redundant information because of the different detection window sizes being used. So when the SVM classifier classifies a region as “pedestrian”, a number of pedestrian windows of different scales are often associated with that region. What peopledetect.cpp mainly does is hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2): the detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
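As an illustration of the idea of collapsing overlapping windows, here is a plain greedy non-maximum suppression sketch over (x, y, width, height) boxes. Note that this is a swapped-in, simplified technique: as mentioned elsewhere in this thread, OpenCV merges the windows with a grouping function such as groupRectangles(..), which works differently. The function names, toy boxes, and the 0.5 overlap threshold are assumptions.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xywh(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate windows on one pedestrian, plus one window elsewhere.
boxes = [(10, 10, 64, 128), (12, 12, 64, 128), (200, 50, 64, 128)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print([boxes[i] for i in kept])
```

The two heavily overlapping windows collapse into the higher-scoring one, while the distant window survives as a separate detection.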
