I am struggling to create a custom Haar classifier. I have found a couple of tutorials on the web, but they do not specify which version of OpenCV they are using. What I need is a very concise and simplified example of the steps required, along with a simple dataset of images. I also need to know the OpenCV version and the OS platform so I can get it running. I have tried a matrix of OpenCV versions on both Windows and Linux and have run into memory error after memory error. I would like to start with a known-good set of data and simple commands before expanding it to fit my problem.
Thanks for your help,
Chris
OpenCV provides two utility commands, createsamples.exe and haartraining.exe, which can generate the xml files used by Haar classifiers. That is, with the xml file output by haartraining.exe, you can use the face detection sample directly with your own xml file to detect any customized object.
For the detailed procedure for using these commands, you may consult pages 513-516 of the book "Learning OpenCV", or this tutorial.
For the internal mechanism of how the classifier works, you may consult the paper "Rapid Object Detection using a Boosted Cascade of Simple Features", which has been cited 5500+ times.
Related
I'm building an app in Android Studio that uses OpenCV to identify an object. The detection is OK, but I don't know how a simple XML file allows my program to identify my object.
All I know is that OpenCV somehow uses a convolutional neural network to do it, so I would need to train the CNN to adjust its internal parameters, but what exactly does the XML do? How does this magic thing work?
I'm not completely sure if this is what you mean, but I'm going to make a guess.
If you use an xml file inside OpenCV to do some sort of detection, you're most likely working with a Haar feature-based cascade classifier or something very similar. You can learn more about it on this page!
There is also a very good blog post on the steps OpenCV takes to make the detection happen: here
Hopefully this can dispel a bit of this magic and clear things up!
In short, the xml holds a sort of 'pattern' that was learned using machine learning and a lot of examples. Once trained, OpenCV can search images for the pattern stored in the xml.
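If it helps demystify things further, you can open the xml yourself and see that it is just stored learned parameters (stages of thresholded Haar features), not a neural network. A rough sketch, assuming the newer opencv_traincascade format (the older haartraining format uses different tag names, so treat the tags below as illustrative):

    import xml.etree.ElementTree as ET

    # Point this at any cascade xml, e.g. one from OpenCV's data/haarcascades folder
    tree = ET.parse("your_cascade.xml")
    cascade = tree.getroot().find("cascade")

    stages = cascade.find("stages")
    print("boosting stages:", len(stages))
    for i, stage in enumerate(stages):
        # each stage lists its weak classifiers (thresholded Haar features)
        print("stage", i, "weak classifiers:", stage.findtext("maxWeakCount"))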
In a paper titled "Machine Learning at the Limit," Canny et al. report substantial word2vec processing speed improvements.
I'm working with the BIDMach library used in this paper, and cannot find any resource that explains how Word2Vec is implemented or how it should be used within this framework.
There are several scripts in the repo:
getw2vdata.sh
getwv2data.ssc
I've tried running them (after building the referenced tparse2.exe file) with no success.
I've tried modifying them to get them to run, but get nothing but errors back.
I emailed the author and posted an issue on the GitHub repo, but have gotten nothing back. The only reply was from somebody else having the same trouble, who says he got it to run, but at much slower speeds than reported, on newer GPU hardware.
I've searched all over trying to find anyone that has used this library to achieve these speeds with no luck. There are multiple references floating around that point to this library as the fastest implementation out there, and cite the numbers in the paper:
Intel research references the reported numbers without running the code on GPU (they cite numbers reported in the original paper)
old reddit post pointing to BIDMach as the best (but the OP says "I haven't tested BIDMach myself yet")
SO post citing BIDMach as the best (OP doesn't actually run the library to make this claim...)
many more, not worth listing, that cite BIDMach as the best/fastest without an example, or with claims of "I haven't tested it myself..."
When I search for a similar library (gensim) and the import code required to run it, I find thousands of results and tutorials, but a similar search for the BIDMach code yields only the BIDMach repo.
This BIDMach implementation certainly carries the reputation for being the best, but can anyone out there tell me how to use it?
All I want to do is run a simple training process to compare it to a handful of other implementations on my own hardware.
Every other implementation of this concept that I can find either works with the original shell script test file, provides actual instructions, or provides shell scripts of its own to test with.
UPDATE:
The author of the library has added additional shell scripts to get the previously mentioned scripts running, but exactly what they mean or how they work is still a total mystery, and I can't figure out how to get the word2vec training procedure to run on my own data.
EDIT (for bounty)
I'll give the bounty to anyone who can explain how I'd use my own corpus (text8 would be great), train a model, and then save the output vectors and the vocabulary to files that can be read by Omer Levy's Hyperwords.
This is exactly what the original C implementation would do with arguments -binary 1 -output vectors.bin -save-vocab vocab.txt
This is also what Intel's implementation does, and other CUDA implementations, etc, so this is a great way to generate something that can be easily compared with other versions...
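For reference, this is roughly the workflow I'm after, sketched here with gensim (mentioned above purely as a comparison point, not with BIDMach's API) so there is no ambiguity about the two output files. Parameter names follow gensim 4; older gensim versions use size= instead of vector_size=:

    from gensim.models import Word2Vec
    from gensim.models.word2vec import Text8Corpus

    corpus = Text8Corpus("text8")   # path to the unzipped text8 corpus
    model = Word2Vec(corpus, vector_size=200, window=5, min_count=5, sg=1, workers=4)

    # Equivalent of "-binary 1 -output vectors.bin -save-vocab vocab.txt"
    model.wv.save_word2vec_format("vectors.bin", fvocab="vocab.txt", binary=True)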
UPDATE (bounty expired without answer)
John Canny has updated a few scripts in the repo and added an fmt.txt file, thus making it possible to run the test scripts that are packaged in the repo.
However, my attempt to run this with the text8 corpus yields near 0% accuracy on the hyperwords test.
Running the training process on the billion word benchmark (which is what the repo scripts now do) also yields well-below-average accuracy on the hyperwords test.
So either the library never yielded good accuracy on these tests, or I'm still missing something in my setup.
The issue remains open on github.
BIDMach's Word2vec is a tool for learning vector representations of words, also known as word embeddings. To use Word2vec in BIDMach, you will need to first download and install BIDMach, which is an open-source machine learning library written in Scala. Once you have BIDMach installed, you can use the word2vec function to train a Word2vec model on a corpus of text data. This function takes a number of parameters, such as the size of the word vectors, the number of epochs to train for, and the type of model to use. You can find more detailed instructions and examples of how to use the word2vec function in the BIDMach documentation.
For some time, I have been using OpenCV. It has satisfied all my needs for feature extraction, matching, clustering (k-means so far), and classification (SVM). Recently, I came across Apache Mahout. But most of the machine learning algorithms are already available in OpenCV as well. Are there any advantages to using Mahout over OpenCV if the work relates to videos and images?
This question might be put on hold since it is opinion based. I still want to add a basic comparison.
OpenCV is capable of nearly anything in vision and ML that has been researched or invented. The vision literature is based on it, and it develops according to the literature. Even newly published ML algorithms -like TLD, which originated in MATLAB- (http://www.tldvision.com/) can also be implemented using OpenCV (http://gnebehay.github.io/OpenTLD/) with some effort.
Mahout is capable too, and is specific to ML. It includes not only the well-known ML algorithms but also more specialized ones. Say you came across a paper, "Processing Apples with K-means Orientation Filtering". You can find OpenCV implementations of such a paper all around the web; the actual algorithm might even be open source and developed using OpenCV. With OpenCV it might take 500 lines of code, but with Mahout the paper might already be implemented as a single method, making everything easier.
An example of this case is http://en.wikipedia.org/wiki/Canopy_clustering_algorithm, which is harder to implement using OpenCV right now.
Since you are going to work with image data sets you will need to learn about HIPI, too.
To sum up, here is a simple pro-con table:
know-how (learning curve): OpenCV is easier, since you already know about it. Mahout+HIPI will take more time.
examples: The literature and the vision community commonly use OpenCV. Open-source algorithms are mostly written with the C++ API of OpenCV.
ml algorithms: Mahout is only about ml, whereas OpenCV is more generic. Still, OpenCV has access to basic ml algorithms (see the short sketch after this list).
development: Mahout is easier to work with in terms of coding and time complexity (I am not sure about the latter, but I reckon it is).
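To illustrate the "basic ml algorithms" point, here is a minimal sketch of OpenCV's built-in k-means from Python; the random array stands in for real feature vectors (e.g. SIFT/SURF descriptors), since no particular dataset is assumed:

    import numpy as np
    import cv2

    features = np.random.rand(500, 64).astype(np.float32)  # 500 fake 64-D descriptors

    # Stop after 20 iterations or when cluster centres move less than 1.0
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    compactness, labels, centers = cv2.kmeans(
        features, 5, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

    print("cluster sizes:", np.bincount(labels.ravel()))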
I have seen multiple haarcascade xmls in OpenCV for face detection, eye detection, ear detection, human body detection, etc., but I couldn't find proper documentation or an explanation for these xmls.
For example, in an application, if I need to detect side faces, which xml should I use, and what parameters should be passed to detectMultiScale?
In some cases, if I vary the parameters to detectMultiScale, the false detections are reduced, but I did all of these tests by trial and error. I couldn't find any definitive articles explaining the use of each xml and its parameters.
Can someone provide documentation on this, if any exists? Otherwise, an explanation would be greatly appreciated.
OpenCV has a built-in profile face classifier xml under "..\data\haarcascades". If you want to create your own cascade classifier, you should follow this procedure. Here is another link regarding that.
To learn about the detectMultiScale method, check out the documentation. To understand how the classifier and its parameters work, check out the Viola-Jones (2001) article or its explanation.
Here is a paper by Vadim Pisarevsky, one of the OpenCV developers, which may be helpful in understanding some of the parameters.
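As a rough illustration of the above (file names and paths are placeholders, so adjust them to your install), this is how the bundled profile-face xml and the parameters that most affect false detections are typically used from Python:

    import cv2

    # Copy haarcascade_profileface.xml from OpenCV's data\haarcascades folder
    cascade = cv2.CascadeClassifier("haarcascade_profileface.xml")

    gray = cv2.cvtColor(cv2.imread("people.jpg"), cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,   # image pyramid step: smaller finds more scales but is slower
        minNeighbors=5,    # raise to suppress false positives, lower to find more faces
        minSize=(30, 30))  # ignore detections smaller than this

    print("profile faces found:", len(faces))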
On the other hand, if using OpenCV is not a hard requirement, please take a look at vision.CascadeObjectDetector in the Computer Vision System Toolbox for Matlab, which provides the same functionality. It also saves you the trouble of figuring out which xml file to use for profile faces.
I'm a beginner in computer vision, but I know how to use some functions in OpenCV. I'm trying to use OpenCV for document recognition, and I would like some help finding the steps for it.
I'm thinking of using the OpenCV example find_obj.cpp, but the documents (for example, a passport) have variable fields: name, birthdate, pictures. So I need help defining the steps and, if possible, which functions to use at each step.
I'm not asking for complete code, but if anyone has an example link, or can just type out a walkthrough, it would be a great help.
There are two very different steps involved here. One is detecting your object, and the other is analyzing it.
For object detection, you're just trying to figure out whether the object is in the frame and approximately where it's located. The OpenCV features framework is great for this. For some tutorials and comprehensive sample code, see the OpenCV features2d tutorials and especially the feature matching tutorial.
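A rough sketch of that detection step, using ORB features instead of the SURF used in find_obj.cpp (SURF is patented/non-free in some builds); the file names are placeholders:

    import cv2

    template = cv2.imread("passport_template.jpg", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(1000)          # cv2.ORB() in OpenCV 2.x
    kp1, des1 = orb.detectAndCompute(template, None)
    kp2, des2 = orb.detectAndCompute(scene, None)

    # Brute-force Hamming matcher for binary ORB descriptors
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print("good matches:", len(matches[:50]))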
For analysis, you need to dig into optical character recognition (OCR). OpenCV does not include OCR libraries, but I recommend checking out tesseract-ocr, which is a great OCR library. If your documents have a fixed structure (a consistent layout of text fields), then tesseract-ocr is all you need. For more advanced analysis, check out OCRopus, which uses tesseract-ocr but adds layout analysis.
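To sketch that analysis step (the crop coordinates and file name below are made-up placeholders, and pytesseract is a separate install, not part of OpenCV): once a text field is localized, crop it and hand it to tesseract.

    import cv2
    import pytesseract

    img = cv2.imread("passport_scan.jpg")
    name_field = img[100:140, 200:600]               # fixed-layout crop for the "name" field
    name_field = cv2.cvtColor(name_field, cv2.COLOR_BGR2GRAY)

    text = pytesseract.image_to_string(name_field)
    print("recognised text:", text.strip())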