I started learning Image Recognition a few days back and I would like to do a project in which it need to identify different brand logos in Android.
For Ex: If I take a picture of Nike logo in an Android device then it needs to display "Nike".
Low computational time would be the main criteria for me.
For this, I have done some work and started learning OpenCV sample examples.
What would be the best Image Recognition that would be used for me.
1) I came to know from Template Matching that their applicability is limited mostly by the available computational power, as identification of big and complex templates can be time consuming. (and so I don't want to use it)
2) Feature Based detectors like SIFT/SURF/STAR (As per my knowledge this would be a better option for me)
3) How about Deep Learning and Pattern recognition concepts? (I was digging on this and don't know whether it would be an option for me). Can any of you let me know whether I can use this and why it would be an better choice for me when compared with 1 and 2.
4) Haar caascade classifiers (From one of the posts in SO, I came to know that by using Haar it doesn't work in Rotation and Scale invariant and so I haven't concentrated much on this). Does this been a better Option for me If I focus up on.
I’m now running one of my pet projects and it's required face recognition – detecting the area with face on the photo, if it exists with Raspberry pi, so I’ve done some analysis about that task
And I found this approach. The key idea is in avoiding scanning entire picture to help by scanning windows of different sizes like it was in OpenCV, but by dividing an entire photo into 49 (7x7) squares and train the model not only for detecting of presenting one of classes inside each square, but also for determining the location and size of detecting object
It’s only 49 runs of trained model, so I think it's possible to execute this in less than in a second even on non state-of-the-art smartphones. Anyway, it will be a trade-off between accuracy and performance
About the model
I will use vgg –like model, probably a bit simpler than even vgg11A.
In my case ready dataset already exists. So I can train convolutional network with it
Why deep learning approach is better than 1-3 you mentioned? Because of its higher accuracy for such kind of tasks. It’s practically proven. You could check it in kaggle. Majority of the top models for classification competitions are based on convolutional networks
The only disadvantage for you – probably it would be necessary create your own dataset to train the model
Here is a post that I think can be useful for you: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition. Another one: Logo recognition in images.
2) Feature Based detectors like SIFT/SURF/STAR (As per my knowledge
this would be a better option for me)
Just remember that SIFT and SURF are both patented so you will need a license for any commercial use (free for non-commercial use).
4) Haar caascade classifiers (From one of the posts in SO, I came to know that by using Haar it doesn't work in Rotation and Scale invariant and so I haven't concentrated much on this). Does this been a better Option for me If I focus up on.
It works (if I understand your question right), much of this depends of how you trained your classifier. You could train it to detect all kind of rotations and scales. Anyways, I would discourage you to go for this option as I think the other possible solutions are better meant for the case.
I want to do pedestrian detection and tracking.
Input: Video Stream from CCTV camera.
Output:
#(no of) people going from left to right
# people going from right to left
# No. of people in the middle
What have i done so far:
For pedestrian detection I am using HOG and SVM. The detection is decent with high false positive rate. And its very slow as i am running in android platform.
Question:
After detection how to do I calculate the required values listed above. Can anyone tell me what is the tracking algorithm I have to use and any good algorithm for pedestrian detection.
Or should I use tracking algorithm? Is there a way to do without it?
Any references to codes/blogs/technical papers is appreciated.
Platform: C++ & OpenCV / android.
--Thanks
This is somehow close to a research problem.
You may want to have a look to this website which gathers a lot of references.
In particular, the work done by the group from Oxford present therein is pretty close to what you are doing, since their are using HOG for detection. (That work has been extremely illuminating for me).
EPFL and Julich have as well work done in the field.
You may also want to give a look to this review which describes several detection/tracking techniques, often involving variants of the HOG algorithm.
Along with #Acorbe response, I suggest the publications section of this (archived) website.
A recent work at the end of last year also released a code base here:
https://bitbucket.org/rodrigob/doppia
There have also been earlier pedestrian detector works that have released code as well:
https://sites.google.com/site/wujx2001/home/c4
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians
The best accurate way is to use tracking algorithm instead of statistic appearance counting of incoming people and detection occurred left right and middle..
You can use extended statistical models.. That produce how many inputs producing one of the outputs and back validate from output detection the input.
My experience is that tracking leads to better results than approach above. But is also little bit complicated. We talk about multi target tracking when the critical is match detection with tracked model which should be update based on detection. If tracking is matched with wrong model. The problems are there.
Here on youtube I developed some multi target tracker by simple LBP people detector, but multi model and kalman filter for tracking. Both capabilities are available in opencv. You need to when something is detected create new kalman filter for each object and update in case you match same detection. Predict in case detection is not here in frame and also remove the Kalman i it is not necessary to track any more.
1 Detect
2 Match detections with kalmans, hungarian algorithm and l2 norm. (for example)
3 Lot of work. Decide if kalman shoudl be established, remove, update, or results is not detected and should be predicted. This is lot of work here.
Pure statistic approach is less accurate, second one is for experience people at least one moth of coding and 3 month of tuning.. If you need to be faster and your resources are quite limited. You can by smart statistic achieve your results by pure detection much faster and little bit less accurate. People are judge the image and video tracking even multi target tracking is capable to beat human. Try to count and register each person in video and count exits point. You are not able to do this in some number of people. It is really repents on, what you want, application, customer you have, and results you show to customers. If this is 4 numbers income, left, right, middle and your error is 20 percent is still much more than one bored small paid guard should achieved by all day long counting..
https://www.youtube.com/watch?v=d-RCKfVjFI4
You can find on my BLOG Some dataset for people detection and car detection on my blog same as script for learning ideas, tutorials and tracking examples..
Opencv blog tutorials code and ideas
You can use KLT for this purpose as this will tell you the flow of person traveling from left to right then you can compute that by computing line length which in given example is drawn using cv2.line you can use input parameters of this functions to compute your case, little math involved. if there is a flow of pixels from left to right this is case 1 or right to left then case 3 and for no flow case 2. Or you can use this basic tutorial to track object movement. LINK
In brief, what are the available options for implementing the Tracking of a particular Image(A photo/graphic/logo) in webcam feed using OpenCv?In particular i am trying to collate opinion about the following:
Would HaarTraining be overkill(considering that it is not 3d objects but simply Images to be tracked) or is it the only way out?
Have tried Template Matching, Color-based detection but these don't offer reliable tracking under varying illumination/Scale/Orientation at all.
Would SIFT,SURF feature matching work as reliably in video as with static image
comparison?
Am a relative beginner to OpenCV , as is evident by my previous queries on SO (very helpful replies). Any cues or links to what could be good resources for beginning NFT implementation with OpenCV?
Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect/how much control you have over the environment. What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
1.
Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster child application of Haar cascades is Viola Jones' face detection system which is largely geared towards frontal faces (can certainly be trained for many other things)
For a tutorial on doing Haar training using OpenCV, see here.
2.
Try NCC or better yet, Lucas Kanade tracking (cvCalcOpticalFlowPyrLK which is a pyramidal as in coarse-to-fine LK - a 4 level pyramid usually works well) for a template. Usually good upto 10% scale or 10 degrees rotation without template changes. Beyond that, you can have automatically evolving templates which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
3.
SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
I'd be a bit concerned about the amount of computational time involved. If there isn't significant illumination/scale/in-plane rotation, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implmentation. Note: 3rd party, not OpenCV.
It seems that none of the methods when applied alone could bring reliable results unless it is a hobby project. Probably some adaptive algorithm would be more or less acceptable. For example see a famous opensource project where they use machine learning.
for my final thesis i am trying to build up an 3d face recognition system by combining color and depth information. the first step i did, is to realign the data-head to an given model-head using the iterative closest point algorithm. for the detection step i was thinking about using the libsvm. but i dont understand how to combine the depth and the color information to one feature vector? they are dependent information (each point consist of color (RGB), depth information and also scan quality).. what do you suggest to do? something like weighting?
edit:
last night i read an article about SURF/SIFT features i would like to use them! could it work? the concept would be the following: extracting this features out of the color image and the depth image (range image), using each feature as a single feature vector for the svm?
Concatenation is indeed a possibility. However, as you are working on 3d face recognition you should have some strategy as to how you go about it. Rotation and translation of faces will be hard to recognize using a "straightforward" approach.
You should decide whether you attempt to perform a detection of the face as a whole, or of sub-features. You could attempt to detect rotation by finding some core features (eyes, nose, etc).
Also, remember that SVMs are inherently binary (i.e. they separate between two classes). Depending on your exact application you will very likely have to employ some multi-class strategy (One-against-all or One-against-many).
I would recommend doing some literature research to see how others have attacked the problem (a google search will be a good start).
It sounds simple, but you can simply concatenate the two vectors into one. Many researchers do this.
What you arrived at is an important open problem. Yes, there are some ways to handle it, as mentioned here by Eamorr. For example you can concatenate and do PCA (or some non linear dimensionality reduction method). But it is kind of hard to defend the practicality of doing so, considering that PCA takes O(n^3) time in the number of features. This alone might be unreasonable for data in vision that may have thousands of features.
As mentioned by others, the easiest approach is to simply combine the two sets of features into one.
SVM is characterized by the normal to the maximum-margin hyperplane, where its components specify the weights/importance of the features, such that higher absolute values have a larger impact on the decision function. Thus SVM assigns weights to each feature all on its own.
In order for this to work, obviously you would have to normalize all the attributes to have the same scale (say transform all features to be in the range [-1,1] or [0,1])
A developer I am working with is developing a program that analyzes images of pavement to find cracks in the pavement. For every crack his program finds, it produces an entry in a file that tells me which pixels make up that particular crack. There are two problems with his software though:
1) It produces several false positives
2) If he finds a crack, he only finds small sections of it and denotes those sections as being separate cracks.
My job is to write software that will read this data, analyze it, and tell the difference between false-positives and actual cracks. I also need to determine how to group together all the small sections of a crack as one.
I have tried various ways of filtering the data to eliminate false-positives, and have been using neural networks to a limited degree of success to group cracks together. I understand there will be error, but as of now, there is just too much error. Does anyone have any insight for a non-AI expert as to the best way to accomplish my task or learn more about it? What kinds of books should I read, or what kind of classes should I take?
EDIT My question is more about how to notice patterns in my coworker's data and identify those patterns as actual cracks. It's the higher-level logic that I'm concerned with, not so much the low-level logic.
EDIT In all actuality, it would take AT LEAST 20 sample images to give an accurate representation of the data I'm working with. It varies a lot. But I do have a sample here, here, and here. These images have already been processed by my coworker's process. The red, blue, and green data is what I have to classify (red stands for dark crack, blue stands for light crack, and green stands for a wide/sealed crack).
In addition to the useful comments about image processing, it also sounds like you're dealing with a clustering problem.
Clustering algorithms come from the machine learning literature, specifically unsupervised learning. As the name implies, the basic idea is to try to identify natural clusters of data points within some large set of data.
For example, the picture below shows how a clustering algorithm might group a bunch of points into 7 clusters (indicated by circles and color):
(source: natekohl.net)
In your case, a clustering algorithm would attempt to repeatedly merge small cracks to form larger cracks, until some stopping criteria is met. The end result would be a smaller set of joined cracks. Of course, cracks are a little different than two-dimensional points -- part of the trick in getting a clustering algorithm to work here will be defining a useful distance metric between two cracks.
Popular clustering algorithms include k-means clustering (demo) and hierarchical clustering. That second link also has a nice step-by-step explanation of how k-means works.
EDIT: This paper by some engineers at Phillips looks relevant to what you're trying to do:
Chenn-Jung Huang, Chua-Chin Wang, Chi-Feng Wu, "Image Processing Techniques for Wafer Defect Cluster Identification," IEEE Design and Test of Computers, vol. 19, no. 2, pp. 44-48, March/April, 2002.
They're doing a visual inspection for defects on silicon wafers, and use a median filter to remove noise before using a nearest-neighbor clustering algorithm to detect the defects.
Here are some related papers/books that they cite that might be useful:
M. Taubenlatt and J. Batchelder, “Patterned Wafer Inspection Using Spatial Filtering for Cluster Environment,” Applied Optics, vol. 31, no. 17, June 1992, pp. 3354-3362.
F.-L. Chen and S.-F. Liu, “A Neural-Network Approach to Recognize Defect Spatial Pattern in Semiconductor Fabrication.” IEEE Trans. Semiconductor Manufacturing, vol. 13, no. 3, Aug. 2000, pp. 366-373.
G. Earl, R. Johnsonbaugh, and S. Jost, Pattern Recognition and Image Analysis, Prentice Hall, Upper Saddle River, N.J., 1996.
Your problem falls in the very broad field of image classification. These types of problems can be notoriously difficult, and at the end of the day, solving them is an art. You must exploit every piece of knowledge you have about the problem domain to make it tractable.
One fundamental issue is normalization. You want to have similarly classified objects to be as similar as possible in their data representation. For example, if you have an image of the cracks, do all images have the same orientation? If not, then rotating the image may help in your classification. Similarly, scaling and translation (refer to this)
You also want to remove as much irrelevant data as possible from your training sets. Rather than directly working on the image, perhaps you could use edge extraction (for example Canny edge detection). This will remove all the 'noise' from the image, leaving only the edges. The exercise is then reduced to identifying which edges are the cracks and which are the natural pavement.
If you want to fast track to a solution then I suggest you first try the your luck with a Convolutional Neural Net, which can perform pretty good image classification with a minimum of preprocessing and noramlization. Its pretty well known in handwriting recognition, and might be just right for what you're doing.
I'm a bit confused by the way you've chosen to break down the problem. If your coworker isn't identifying complete cracks, and that's the spec, then that makes it your problem. But if you manage to stitch all the cracks together, and avoid his false positives, then haven't you just done his job?
That aside, I think this is an edge detection problem rather than a classification problem. If the edge detector is good, then your issues go away.
If you are still set on classification, then you are going to need a training set with known answers, since you need a way to quantify what differentiates a false positive from a real crack. However I still think it is unlikely that your classifier will be able to connect the cracks, since these are specific to each individual paving slab.
I have to agree with ire_and_curses, once you dive into the realm of edge detection to patch your co-developers crack detection, and remove his false positives, it seems as if you would be doing his job. If you can patch what his software did not detect, and remove his false positives around what he has given you. It seems like you would be able to do this for the full image.
If the spec is for him to detect the cracks, and you classify them, then it's his job to do the edge detection and remove false positives. And your job to take what he has given you and classify what type of crack it is. If you have to do edge detection to do that, then it sounds like you are not far from putting your co-developer out of work.
There are some very good answers here. But if you are unable to solve the problem, you may consider Mechanical Turk. In some cases it can be very cost-effective for stubborn problems. I know people who use it for all kinds of things like this (verification that a human can do easily but proves hard to code).
https://www.mturk.com/mturk/welcome
I am no expert by any means, but try looking at Haar Cascades. You may also wish to experiment with the OpenCV toolkit. These two things together do face detection and other object-detection tasks.
You may have to do "training" to develop a Haar Cascade for cracks in pavement.
What’s the best approach to recognize patterns in data, and what’s the best way to learn more on the topic?
The best approach is to study pattern recognition and machine learning. I would start with Duda's Pattern Classification and use Bishop's Pattern Recognition and Machine Learning as reference. It would take a good while for the material to sink in, but getting basic sense of pattern recognition and major approaches of classification problem should give you the direction. I can sit here and make some assumptions about your data, but honestly you probably have the best idea about the data set since you've been dealing with it more than anyone. Some of the useful technique for instance could be support vector machine and boosting.
Edit: An interesting application of boosting is real-time face detection. See Viola/Jones's Rapid Object Detection using a Boosted Cascade of Simple
Features (pdf). Also, looking at the sample images, I'd say you should try improving the edge detection a bit. Maybe smoothing the image with Gaussian and running more aggressive edge detection can increase detection of smaller cracks.
I suggest you pick up any image processing textbook and read on the subject.
Particularly, you might be interested in Morphological Operations like Dilation and Erosion, which complements the job of an edge detector. Plenty of materials on the net...
This is an image processing problem. There are lots of books written on the subject, and much of the material in these books will go beyond a line-detection problem like this. Here is the outline of one technique that would work for the problem.
When you find a crack, you find some pixels that make up the crack. Edge detection filters or other edge detection methods can be used for this.
Start with one (any) pixel in a crack, then "follow" it to make a multipoint line out of the crack -- save the points that make up the line. You can remove some intermediate points if they lie close to a straight line. Do this with all the crack pixels. If you have a star-shaped crack, don't worry about it. Just follow the pixels in one (or two) directions to make up a line, then remove these pixels from the set of crack pixels. The other legs of the star will recognized as separate lines (for now).
You might perform some thinning on the crack pixels before step 1. In other words, check the neighbors of the pixels, and if there are too many then ignore that pixel. (This is a simplification -- you can find several algorithms for this.) Another preprocessing step might be to remove all the lines that are too thin or two faint. This might help with the false positives.
Now you have a lot of short, multipoint lines. For the endpoints of each line, find the nearest line. If the lines are within a tolerance, then "connect" the lines -- link them or add them to the same structure or array. This way, you can connect the close cracks, which would likely be the same crack in the concrete.
It seems like no matter the algorithm, some parameter adjustment will be necessary for good performance. Write it so it's easy to make minor changes in things like intensity thresholds, minimum and maximum thickness, etc.
Depending on the usage environment, you might want to allow user judgement do determine the questionable cases, and/or allow a user to review the all the cracks and click to combine, split or remove detected cracks.
You got some very good answer, esp. #Nate's, and all the links and books suggested are worthwhile. However, I'm surprised nobody suggested the one book that would have been my top pick -- O'Reilly's Programming Collective Intelligence. The title may not seem germane to your question, but, believe me, the contents are: one of the most practical, programmer-oriented coverage of data mining and "machine learning" I've ever seen. Give it a spin!-)
It sounds a little like a problem there is in Rock Mechanics, where there are joints in a rock mass and these joints have to be grouped into 'sets' by orientation, length and other properties. In this instance one method that works well is clustering, although classical K-means does seem to have a few problems which I have addressed in the past using a genetic algorithm to run the interative solution.
In this instance I suspect it might not work quite the same way. In this case I suspect that you need to create your groups to start with i.e. longitudinal, transverse etc. and define exactly what the behviour of each group is i.e. can a single longitudinal crack branch part way along it's length, and if it does what does that do to it's classification.
Once you have that then for each crack, I would generate a random crack or pattern of cracks based on the classification you have created. You can then use something like a least squares approach to see how closely the crack you are checking fits against the random crack / cracks you have generated. You can repeat this analysis many times in the manner of a Monte-Carlo analysis to identify which of the randomly generated crack / cracks best fits the one you are checking.
To then deal with the false positives you will need to create a pattern for each of the different types of false positives i.e. the edge of a kerb is a straight line. You will then be able to run the analysis picking out which is the most likely group for each crack you analyse.
Finally, you will need to 'tweak' the definition of different crack types to try and get a better result. I guess this could either use an automated approach or a manual approach depending on how you define your different crack types.
One other modification that sometimes helps when I'm doing problems like this is to have a random group. By tweaking the sensitivity of a random group i.e. how more or less likely a crack is to be included in the random group, you can sometimes adjust the sensitivty of the model to complex patterns that don't really fit anywhere.
Good luck, looks to me like you have a real challenge.
You should read about data mining, specially pattern mining.
Data mining is the process of extracting patterns from data. As more data are gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
A good book on the subject is Data Mining: Practical Machine Learning Tools and Techniques
(source: waikato.ac.nz) ](http://www.amazon.com/Data-Mining-Ian-H-Witten/dp/3446215336 "ISBN 0-12-088407-0")
Basically what you have to do is apply statistical tools and methodologies to your datasets. The most used comparison methodologies are Student's t-test and the Chi squared test, to see if two unrelated variables are related with some confidence.