I am involved in an image processing project where I need to extract features from a given image. I am supposed to do that using wavelets and curvelets, but I cannot find any source that helps me fully understand them. I have downloaded several journal papers and publications but couldn't figure out exactly how features are extracted with them.
Can someone explain how it's done? Any tutorial that explains them simply is also welcome.
Thanks in advance.
If you are interested in image processing, you should know about the OpenCV library. It is one of the most widely used libraries for image processing.
This library includes an implementation of the Haar wavelet transform, which may interest you.
For this kind of algorithm there is another powerful source: MathWorks File Exchange, an open-source code-sharing platform for MATLAB. Even if you don't use MATLAB, you can read the source code provided on this site to understand how wavelet and curvelet transforms work.
For example, this project may interest you:
http://www.mathworks.com/matlabcentral/fileexchange/33146-feature-extraction-using-multisignal-wavelet-packet-decomposition
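To make the idea concrete, here is a minimal sketch of one common way to get features from wavelets: decompose the image with the 2D Haar transform and use the energy of each detail subband as a feature. This is a generic illustration, not the method of the linked File Exchange project. The function names are hypothetical, and curvelets, which are considerably more involved, are not covered. With PyWavelets, `pywt.dwt2(image, 'haar')` produces the same kind of subbands, up to scaling and sign conventions.

```python
def haar_2d(img):
    """One level of the 2D Haar transform (averaging normalization).

    img: list of rows with even height and width.
    Returns (approx, horiz, vert, diag) subbands, each half the size of img.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, h, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((a + b + d + e) / 4)  # local average
            h_row.append((a + b - d - e) / 4)  # top minus bottom: horizontal edges
            v_row.append((a - b + d - e) / 4)  # left minus right: vertical edges
            d_row.append((a - b - d + e) / 4)  # diagonal detail
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def wavelet_energy_features(img, levels=2):
    """Mean energy of the three detail subbands at each decomposition level.

    Image sides must be divisible by 2**levels.
    """
    features = []
    current = img
    for _ in range(levels):
        current, horiz, vert, diag = haar_2d(current)
        for band in (horiz, vert, diag):
            values = [v for row in band for v in row]
            features.append(sum(v * v for v in values) / len(values))
    return features


# Vertical stripes: all the energy ends up in the vertical-detail band of level 1.
stripes = [[0, 8, 0, 8] for _ in range(4)]
print(wavelet_energy_features(stripes))
```

The resulting vector (3 values per level) can be fed to any classifier. Texture-like images produce different energy distributions across orientations and scales, which is what makes these features discriminative.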
I am currently using OpenCV 3.0, hoping to create a program that does three things. First, it finds faces in a live video feed. Second, it extracts the locations of facial landmarks using ASM or AAM. Finally, it uses an SVM to classify the facial expression on the person's face in the video.
I have done a fair amount of research into this but can't find a suitable open-source AAM or ASM library for this. If possible, I would also like to be able to train the AAM or ASM to extract the specific facial landmarks I require, for example all the numbered points in the picture linked below:
www.imgur.com/XnbCZXf
If there are alternatives to what I have suggested that give the required functionality, feel free to suggest them.
Thanks in advance for any answers; all advice that helps me along with this project is welcome.
In the comments, I see that you are opting to train your own face landmark detector using the dlib library. You had a few questions regarding what training set dlib used to generate their provided "shape_predictor_68_face_landmarks.dat" model.
Some pointers:
The author (Davis King) stated that he used the annotated images from the iBUG 300-W dataset. This dataset has a total of 11,167 images annotated with the 68-point convention. As a standard trick, he also mirrors each image to effectively double the training set size, i.e., 11,167 × 2 = 22,334 images. Here's a link to the dataset: http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/
Note: the iBUG 300-W dataset includes two datasets that are not freely/publicly available: XM2VTS and FRGCv2. Unfortunately, these images make up the majority of iBUG 300-W (7,310 images, or 65.5%).
The original paper trained only on the HELEN, AFW, and LFPW datasets. So you ought to be able to generate a reasonably good model from only the publicly available images (HELEN, LFPW, AFW, iBUG), i.e., 3,857 images.
If you Google "one millisecond face alignment kazemi", the paper (and project page) will be the top hits.
You can read more about the details of the training procedure by reading the comments section of this dlib blog post. In particular, he briefly discusses the parameters he chose for training: http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html
With the size of the training set in mind (thousands of images), I don't think you will get acceptable results with just a handful of images. Fortunately, there are many publicly available face datasets out there, including the dataset linked above :)
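A detail worth knowing if you mirror your own training images: flipping a face is not just negating x. Left and right landmark labels must also be swapped, or the "left eye corner" ends up on the right eye. For example, in the 68-point convention the jawline points 0 to 16 map to 16 to 0.

Below is a minimal sketch of the mirroring step. It assumes pixel-index x coordinates (so x maps to width - 1 - x). `mirror_landmarks` and `flip_index` are hypothetical names, and you would need to build the full 68-entry `flip_index` table yourself.

```python
def mirror_landmarks(points, image_width, flip_index):
    """Mirror landmark coordinates to match a horizontally flipped image.

    points: list of (x, y) tuples in the original image.
    flip_index[i]: index that landmark i takes after flipping
    (its left/right counterpart, or itself for points on the midline).
    """
    if sorted(flip_index) != list(range(len(points))):
        raise ValueError("flip_index must be a permutation of the point indices")
    mirrored = [None] * len(points)
    for i, (x, y) in enumerate(points):
        # Reflect x about the vertical centre line, then relabel the point.
        mirrored[flip_index[i]] = (image_width - 1 - x, y)
    return mirrored


# Hypothetical 2-point example: point 0 is "left", point 1 is "right".
print(mirror_landmarks([(0, 0), (3, 1)], 10, [1, 0]))

# Jawline part of the 68-point scheme: indices 0..16 reverse order.
jaw_flip = list(range(16, -1, -1))
```

Because flipping twice must give back the original annotation, applying the function twice is a cheap sanity check for a hand-written `flip_index` table.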
Hope that helps!
AAM and ASM are rather old-school, and their results are somewhat disappointing.
Most facial landmark trackers use cascades of patches or deep learning. dlib performs well (and has a permissive Boost Software License); see this demo. There are others on GitHub, as well as a number of APIs, such as this one, that are free to use.
You can also take a look at my project, which uses C++/OpenCV/dlib, implements all the functionality you mentioned, and is fully operational.
Try Stasm 4.0.0. It locates 77 landmark points on the face.
I advise you to use the FaceTracker library. It is written in C++ using OpenCV 2.x. You won't be disappointed.
I have seen multiple Haar cascade XML files in OpenCV for face detection, eye detection, ear detection, human body detection, etc., but I couldn't find proper documentation or explanation for them.
For example, if an application needs to detect side (profile) faces, which XML file should I use, and what parameters should be passed to detectMultiScale?
In some cases, varying the parameters of detectMultiScale reduces false detections, but I found those values by trial and error. I couldn't find any definitive articles explaining the use of each XML file and parameter.
Can someone point me to documentation on this, if any exists? Otherwise, an explanation would be appreciated.
OpenCV has a built-in profile face classifier, haarcascade_profileface.xml, under "..\data\haarcascades". If you want to create your own cascade classifier, you should follow this procedure. Here is another link regarding that.
To learn about the detectMultiScale method, check out the documentation. To understand how the classifier and its parameters work, check out the Viola-Jones (2001) article or an explanation of it.
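The key trick in the Viola-Jones detector, which these cascade XML files implement, is the integral image. It lets the sum of any rectangle, and therefore any Haar-like feature, be computed with four lookups, regardless of the rectangle's size.

Here is a minimal pure-Python sketch of that idea. The function names are hypothetical, and OpenCV computes this internally in optimized C++.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w even).

    Large values indicate a bright-to-dark vertical edge inside the window.
    """
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


ii = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9
```

A cascade evaluates thousands of such features per window, but because each costs only a few additions, the early stages can reject most non-face windows very quickly.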
Here is a paper by Vadim Pisarevsky, one of the OpenCV developers, which may help in understanding some of the parameters.
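To build intuition for scaleFactor, minSize and maxSize, it helps to see which window sizes actually get scanned. The detector starts at the cascade's training window size (stored in the XML file) and grows it geometrically by scaleFactor. Sizes outside [minSize, maxSize] are skipped.

Below is a rough model of this behavior, not OpenCV's code. OpenCV actually shrinks the image instead of growing the window, but the set of effective scales is similar. `detection_window_sizes` is a hypothetical helper.

```python
def detection_window_sizes(base_w, base_h, image_w, image_h,
                           scale_factor=1.1, min_size=(0, 0), max_size=None):
    """Approximate list of window sizes a cascade detector scans.

    base_w, base_h: the cascade's training window size.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    if max_size is None:
        max_size = (image_w, image_h)
    sizes = []
    factor = 1.0
    while True:
        w, h = round(base_w * factor), round(base_h * factor)
        # Stop once the window no longer fits the image or exceeds max_size.
        if w > image_w or h > image_h or w > max_size[0] or h > max_size[1]:
            break
        if w >= min_size[0] and h >= min_size[1]:
            sizes.append((w, h))
        factor *= scale_factor
    return sizes


print(detection_window_sizes(24, 24, 100, 100, scale_factor=2.0))
print(len(detection_window_sizes(24, 24, 640, 480, scale_factor=1.1)))
```

A smaller scaleFactor means more scales: slower, but less likely to miss a face whose size falls between two scales. minNeighbors (not modeled here) is the number of overlapping raw detections required to keep a result. Raising it is usually the most effective way to cut false positives. Setting minSize to the smallest face you care about removes the many small scales that produce most spurious hits.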
On the other hand, if using OpenCV is not a hard requirement, please take a look at vision.CascadeObjectDetector in the Computer Vision System Toolbox for Matlab, which provides the same functionality. It also saves you the trouble of figuring out which xml file to use for profile faces.
I'm building a business card scanner for my final exam in digital image processing, and I'd like to know how to preprocess a photo of a business card so that Tesseract can recognize the text. I tried many things, such as erosion, dilation, and thresholding, but I can't get good results. Can you help me?
Thank you
Marco
If your concern is only about text recognition and not about preprocessing, consider using ScanTailor. It is an excellent pre-processing tool and it is open source.
If you want to implement the preprocessing yourself, you might want to look at this paper, especially the skew correction and the background estimation. The results of the algorithms described there are good, and ScanTailor uses some of them.
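Skew correction matters because Tesseract expects roughly horizontal text lines. One common approach, not necessarily the one in the linked paper, is projection-profile analysis. For each candidate angle, rotate the dark-pixel coordinates and build a row histogram; the correct angle gives the sharpest peaks, because every text line collapses into a few rows.

Here is a minimal sketch, assuming you have already binarized the image and collected the (x, y) coordinates of text pixels. `estimate_skew` and its parameters are hypothetical.

```python
import math


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate text skew in degrees from dark-pixel coordinates.

    For each candidate angle the points are rotated by minus that angle and
    binned by row; the angle whose row histogram has the sharpest peaks
    (largest sum of squared bin counts) is returned. Rotating the image by
    the negative of the result should make the text lines horizontal.
    """
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for k in range(n_steps + 1):
        angle = -max_angle + k * step
        t = math.radians(angle)
        sin_t, cos_t = math.sin(t), math.cos(t)
        bins = {}
        for x, y in points:
            row = round(-x * sin_t + y * cos_t)  # y after rotating by -angle
            bins[row] = bins.get(row, 0) + 1
        score = sum(c * c for c in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic example: three text "lines" rotated by 3 degrees.
a = math.radians(3.0)
pts = [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
       for y in (0, 10, 20) for x in range(100)]
print(estimate_skew(pts))
```

The brute-force search is fine for a business card. For large scans you would subsample the points or refine the angle in a second, finer pass.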
I would recommend the open-source C++ image processing library OpenCV in combination with the free, open-source optical character recognition (OCR) library Tesseract.
Since your description of the problem isn't very specific, I can only answer in general terms.
The main procedure in OCR is:
perform some kind of preprocessing on the image
text detection to get your ROI (region of interest, i.e., the region containing your text)
character recognition (take the text-only image and use it as input for Tesseract)
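For the preprocessing step, a global threshold chosen by Otsu's method is often a better starting point than a hand-picked value. Otsu's method picks the threshold that maximizes the between-class variance of the dark and light pixel groups. In OpenCV this is `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

The pure-Python sketch below shows what that computes for 8-bit grayscale values. The function name is hypothetical.

```python
def otsu_threshold(pixels):
    """Otsu threshold t for 8-bit grayscale values (ints in 0..255).

    Pixels <= t form one class and pixels > t the other; t maximizes the
    between-class variance w_bg * w_fg * (mean_bg - mean_fg) ** 2.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    sum_bg = 0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is in the background class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


pixels = [20] * 50 + [200] * 50  # dark text on a light card
t = otsu_threshold(pixels)
binary = [255 if p > t else 0 for p in pixels]
```

A single global threshold struggles with uneven lighting, which is common in phone photos of cards. In that case, estimate and remove the background first, or use an adaptive (local) threshold.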
A few words about Tesseract:
There is a lot of information about the library available online. It is an open-source library developed by Google and used for OCR in Google Books. It can also perform layout analysis on your image, but it isn't perfect at this. Doing the preprocessing yourself and using Tesseract only for the actual character recognition can therefore give better results. Feel free to ask if you still have questions, or if I misunderstood your question.
I'm a beginner in computer vision, but I know how to use some OpenCV functions. I'm trying to use OpenCV for document recognition and would like help figuring out the steps.
I'm thinking of using the OpenCV sample find_obj.cpp, but documents such as passports have variable fields: name, birth date, photo. So I need help defining the steps and, if possible, which functions to use at each step.
I'm not asking for complete code, but an example link or a written walkthrough would be a great help.
There are two very different steps involved here. One is detecting your object, and the other is analyzing it.
For object detection, you're just trying to figure out whether the object is in the frame and approximately where it is. The OpenCV features framework is great for this. For tutorials and comprehensive sample code, see the OpenCV features2d tutorials, especially the feature matching tutorial.
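A central step in that feature-matching pipeline is rejecting ambiguous matches. Lowe's ratio test keeps a match only if the nearest descriptor is clearly closer than the second nearest. In OpenCV you would typically call `BFMatcher.knnMatch(..., k=2)` and apply the test yourself.

The sketch below shows the logic in pure Python with Euclidean distance, which suits float descriptors such as SIFT. Binary descriptors such as ORB use Hamming distance instead. The function name and the 0.75 default are illustrative choices.

```python
import math


def match_descriptors(query, train, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    query, train: lists of equal-length float descriptor vectors.
    Returns (query_index, train_index) pairs whose best match is clearly
    better than the second best.
    """
    if len(train) < 2:
        raise ValueError("the ratio test needs at least two train descriptors")
    matches = []
    for qi, q in enumerate(query):
        # Distances to every train descriptor, nearest first.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


template = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
photo = [(1.0, 0.0), (5.0, 5.0)]  # second point is equally close to all three
print(match_descriptors(photo, template))
```

If enough matches survive, the document is present, and their positions tell you where it is.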
For analysis, you need to dig into optical character recognition (OCR). OpenCV does not include OCR libraries, but I recommend checking out tesseract-ocr, which is a great OCR library. If your documents have a fixed structure (a consistent layout of text fields), tesseract-ocr is all you need. For more advanced analysis, check out OCRopus, which uses tesseract-ocr but adds layout analysis.
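For fixed-layout documents, one way to connect the two steps is to use the feature matches to estimate a homography between a clean template and the photo, for example with `cv2.findHomography`. You can then map each field's box from the template into the photo, crop it, and pass only that region to tesseract-ocr.

The sketch below shows the mapping step. The field names and coordinates are hypothetical, not a real passport layout, and H is given directly instead of being estimated.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through the 3x3 homography H (list of rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def map_fields(H, fields):
    """Map each template field box (x, y, w, h) to its four corners in the photo."""
    out = {}
    for name, (x, y, w, h) in fields.items():
        corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
        out[name] = [apply_homography(H, cx, cy) for cx, cy in corners]
    return out


# Hypothetical template layout, in template pixel coordinates.
fields = {"name": (10, 20, 30, 5), "birthdate": (10, 30, 20, 5)}
# Example homography: scale by 2, then shift by (100, 50).
H = [[2, 0, 100], [0, 2, 50], [0, 0, 1]]
print(map_fields(H, fields)["name"])
```

Under perspective the mapped box is a general quadrilateral, so in practice you would warp each field back to a rectangle (e.g. with a perspective warp) before running OCR on it.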
I need to process DICOM-formatted medical images, visualize them in 3D, and also do some image processing on them in real time. Which SDK has better real-time characteristics for medical visualization and image processing?
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.
You can find details here.
Another solution would be to use or modify a 3D engine that supports volume rendering.
Moreover, for computer vision algorithms, OpenCV seems promising.
osgVolume is an add-on to the popular OpenSceneGraph library for doing this.
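As an illustration of what volume rendering computes, one of its simplest modes is the maximum intensity projection (MIP). Each output pixel is the brightest voxel along a ray through the volume, and this mode is widely used for CT angiography. Renderers such as those in VTK or osgVolume cast rays at arbitrary angles on the GPU. The axis-aligned, pure-Python version below only demonstrates the idea, and the function name is hypothetical.

```python
def max_intensity_projection(volume, axis=0):
    """Axis-aligned maximum intensity projection of a 3D volume.

    volume: nested lists indexed volume[z][y][x].
    axis=0 projects along z (result is y-by-x), 1 along y (z-by-x),
    2 along x (z-by-y).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
                for y in range(ny)]
    if axis == 1:
        return [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)]
                for z in range(nz)]
    if axis == 2:
        return [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
                for z in range(nz)]
    raise ValueError("axis must be 0, 1 or 2")


vol = [[[1, 2], [3, 4]],
       [[5, 0], [0, 8]]]
print(max_intensity_projection(vol, axis=0))
```

Real-time rendering of full CT volumes needs the GPU implementations in the toolkits mentioned here. Pure Python is far too slow for interactive use.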
Just use GDCM+VTK. In 2D simply use gdcmviewer. In 3D you need to build gdcmorthoplanes.
Ref:
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Gdcmviewer
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Using_GDCM_API
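Whatever toolkit you choose, displaying CT slices involves two steps that viewers like gdcmviewer perform for you. First, stored pixel values are converted to physical units (Hounsfield units for CT) with the Rescale Slope (0028,1053) and Rescale Intercept (0028,1052) attributes. Second, a window/level, given by Window Center (0028,1050) and Window Width (0028,1051), maps those values to display gray levels.

The sketch below uses a simplified linear window. The DICOM standard's exact VOI LUT formula adds half-pixel offsets. The function names and the example window (center 40, width 400) are illustrative.

```python
def to_hounsfield(stored_value, rescale_slope=1.0, rescale_intercept=0.0):
    """Convert a stored pixel value to modality units (HU for CT)."""
    return stored_value * rescale_slope + rescale_intercept


def window_level(value, center, width, out_max=255):
    """Simplified linear window/level.

    Maps [center - width/2, center + width/2] linearly to [0, out_max] and
    clips everything outside that range.
    """
    lo = center - width / 2
    hi = center + width / 2
    if value <= lo:
        return 0
    if value >= hi:
        return out_max
    return round((value - lo) / width * out_max)


hu = to_hounsfield(1064, rescale_slope=1.0, rescale_intercept=-1024.0)
gray = window_level(hu, center=40, width=400)
```

Changing center and width interactively is cheap, since it is a per-pixel lookup. That makes it a natural candidate for a lookup table or a GPU shader in a real-time viewer.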
You could check out MITK (http://mitk.org) which combines the already mentioned VTK with the Insight Toolkit (http://www.itk.org) for image processing. Another option to start from could be Slicer (http://www.slicer.org), but this depends on the license you need.
At university we were taught MATLAB for DICOM file processing. I think it has nice, easy-to-use toolboxes for that as well. In the end, I was able to do all kinds of DICOM image processing, filtering and so forth in MATLAB.
As you probably know, MATLAB is not an SDK but a complete environment. Nevertheless, you can write scripts to achieve normal application behavior: creating windows, buttons, images, etc.