what are the steps in object detection?

what are the steps in object detection? - image-processing

I'm new to image processing and I want to do a project in object detection. So help me by suggesting a step-by-step procedure to this project. Thanx.

Object detection is a very complex problem that includes some real hardcore math and long tuning of parameters to the computation methods involved. Your best bet is to use some freely available library for that - Google will help.

There are lot of algorithms about the theme and no one is the best of all. It's usually a mixture of them what makes the best solution to the solution.
For example, for object movement detection you could look at frame differencing and misture of gaussians.
Also, it's very dependent of your application, the environment (i.e. noise, signal quality), the processing capacity you may have available, the allowable error margin...
Besides, for it to work, most of time it's first necessary to do some kind of image processing to the input data like median filter, sobel filter, contrast enhancement and a large so on.
I think you should start reading all you can: books, google and, very important, a lot of papers about the subjects (there are many free in internet) you are interested in.
And first of all, i think it's fundamental (at least it has been for me) having a good library for testing. The one i have used/use is OpenCV. It's very complete, implement many of the actual more advanced algorithms, is very active, has a big community and it's free.
Open Computer Vision Library (OpenCV)
Have luck ;)

Take a look at AForge.NET. It's nowhere near Project Natal's levels of accuracy or usefulness, but it does give you the tools to learn the algorithms easily. It's an image processing and AI library and there are several tutorials on colored object tracking and motion detection.
Another one to look at is OpenCV from Intel. I believe it's a bit more advanced, but it's written in C.

Take a look at this. It might get you started in this complex field. The algorithm pages that it links to are interesting reading.
http://sun-valley.stanford.edu/projects/helicopters/final.html

This lecture by Jeff Hawkins, will give you an idea about the state of the art in this super-difficult field.
Seems that video disappeared... but this vid should cover similar ground.

Related

Using OpenCV to find people who wear a certain hat

I would like to use computer vision to do the following:
A camera is mounted outside a building, capturing a videostream of the street below. The camera is installed approximately 5-6 meters above the street.
Whenever a person wearing a certain kind of hat(white, round) is captured by the camera, an event should be triggered.
Which algorithm should I look into to implement this kind of behavior ?
Is this best achieved through training the algorithm with sample data or is there another way to tell it to look for this type of hat ?
Also, how do I use multiple frames of video to increase the quality of detection ?
Edit: Added a picture of the hat

Before we do everything in comments I will start an answer here.
The first link you posted describes a simple color-based detection. You can try that, but it will fail if there are other pixel clusters of similar color in the image. Your idea of combining it with tracking is good: Identify clusters, build trajectories over several images, and only accept plausible trajectories as a hit. For robust tracking you may want to look into Kalman filtering. A problem you will most likely encounter is that a "white" hat will hardly be "white" in the images your camera delivers.
The second link you refer to - boosted Classifiers Based on Haar-like Features - is for detection of more complex objects. It probably won't help you find white blobs. Invest your time and energy in learning about tracking.
I'm happy to repeat myself here: "Solving a computer vision problem" is not something like "sorting an array". OpenCV is not the C++ Standard Library. You can use an std::map without knowing anything about a red-black tree. But (IMHO) you can't use Vision APIs without knowing a good deal of the math and theory. Working solutions Computer Vision are typically heavily tuned towards the specific problem scenario. Sorry if that sounds pedantic, but it explains why your question got beaten.

Can the Hough Transform be used in commercial software?

Can the Hough Transform be used in commercial software?
I mean, it is one of those things that seem research only and unstable.
You would not put it in a commercial compositing software for example
and have the user rely on it at all times.
Any opinions?
Thanks

The Hough transform has been in use in commercial and industrial applications all over the world for years, decades even. From the wikipedia page you can see that it was first developed in 1972, based on earlier ideas from 1962. That means it is older than the CCD that you use to capture the images you use in the compositing software.
Given that it "seems research only and unstable" to you, I would suggest you spend some time learning various computer vision and image analysis algorithms and techniques, and get a good mathematical basis in the field in general before you implement the Hough transform in commercial compositing software.
And when you are done studying I'd suggest you use a well tested open source implementation.

Yes. In fact, I've written Hough transform code for a piece of commercial software that wasn't meant to be a research tool like MATLAB. Though I put a lot of time into its robustness towards a specific application, it worked great.
The Hough transform by itself can sometimes be unreliable in applications where you have some level noise, such in webcams, or when there are some distortions in the shape you need to extract. This may be what you are seeing. In this case you may need to do a little more tuning towards your application, or try some basic image preprocessing.

I'm a bit annoyed with the condescending tone in both the comment to the question (by High Performance Mark), as well as the accepted answer here.
Firstly, that programming libraries/frameworks provide an implementation of an algorithm does not mean it is used, or rather, suited for commercial applications (i.e. getting the job done, robustly, on less pristine conditions). The Hough transform is a well defined algorithm (with possible uses and limitations) which is simple enough to understand, and very commonly taught in introductory image processing courses. Not surprisingly, it has been implemented in general purpose libraries such as Matlab's, Octave's and OpenCV. I don't believe the question was intended to discuss the robustness of an implementation and possibility of inclusion in commercial image processing frameworks, but rather if the algorithm itself is well suited for end user software (an application that counts circles, or what not).
The accepted answer, as it stands, is "The algorithm is very old. Here is a book on image processing, here is a link to a image processing library that has implemented it". The other answer with zero score seems to be on topic (i.e. discussion possible applications), though isn't very specific ("worked for me").
So, why do some people get the impression that the hough transform is unreliable for shape detection? Here is a good example: Unreliable results with cv2.HoughCircles
The input seems to be very well defined circles. However, the more robust, suggested working solution doesn't use Hough transform. I've had similar experience with my own projects. Usually, the more robust way is some kind of object segmentation, distance transform, watershed and peak localization. Have I ever used Hough transform with good results? No. I think it could be useful in some cases. In particular if the shapes of the imaged objects are perfectly defined, and partially occluded.
In other words, I'm also curious as to commercial applications that ended up benefiting from Hough transform. That's how I came across this question, and subsequently disappointed in the "you wouldn't ask that question if you understood the subject better", responses.

Natural feature tracking with openCV- evaluating the options

In brief, what are the available options for implementing the Tracking of a particular Image(A photo/graphic/logo) in webcam feed using OpenCv?In particular i am trying to collate opinion about the following:
Would HaarTraining be overkill(considering that it is not 3d objects but simply Images to be tracked) or is it the only way out?
Have tried Template Matching, Color-based detection but these don't offer reliable tracking under varying illumination/Scale/Orientation at all.
Would SIFT,SURF feature matching work as reliably in video as with static image
comparison?
Am a relative beginner to OpenCV , as is evident by my previous queries on SO (very helpful replies). Any cues or links to what could be good resources for beginning NFT implementation with OpenCV?

Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect/how much control you have over the environment. What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
1.
Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster child application of Haar cascades is Viola Jones' face detection system which is largely geared towards frontal faces (can certainly be trained for many other things)
For a tutorial on doing Haar training using OpenCV, see here.
2.
Try NCC or better yet, Lucas Kanade tracking (cvCalcOpticalFlowPyrLK which is a pyramidal as in coarse-to-fine LK - a 4 level pyramid usually works well) for a template. Usually good upto 10% scale or 10 degrees rotation without template changes. Beyond that, you can have automatically evolving templates which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
3.
SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
I'd be a bit concerned about the amount of computational time involved. If there isn't significant illumination/scale/in-plane rotation, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implmentation. Note: 3rd party, not OpenCV.

It seems that none of the methods when applied alone could bring reliable results unless it is a hobby project. Probably some adaptive algorithm would be more or less acceptable. For example see a famous opensource project where they use machine learning.

learning steps for image recognition algorithm

I have decided to spend my personal time after office hours to learn the building blocks of how images jpeg type are parsed and represented in screen. My interest is on object recognition in an image.so I want to know where to start , I know there are math involved in this.so I needed step by step on what resources in Internet specifically to look at.

Need a lot more information on what you want, but take a look at OpenCV
http://sourceforge.net/projects/opencvlibrary/
To see good examples.

I'd get Ritter's book (warning: costly!) and give it serious studying. If you just want to grab existing code and go play then perhaps you should look at libraries like OpenCV (see Lou's answer).

The ultimate goal of most image processing is to extract information about some high-level and application-dependent objects from an image available in low-level (pixel) form. The objects may be of every day interest like in robotics, cosmic ray showers or particle tracks like in physics, chromosomes like in biology, houses, roads, or differently used agricultural surfaces like in aerial photography or synthetic-aperture radar, etc.
This task of pattern recognition is usually preceded by multiple steps of image restoration and enhancement, image segmentation, or feature extraction, steps which can be described in general terms. The final description in problem-dependent terms, and even more so the eventual image reconstruction, escapes such generality, and the literature of application areas has to be consulted.

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.

There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.

There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.

Well, using OpenCV you would of taking a frame of a video file and do any computations on it.
You can do several different methods of detecting a character on that image, but it's not so easy to have it as flexible so you can even get that person if it's lying on the floor for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write your code that plays the video and which takes a video frame let's say each 500ms (or other as you desire), gets a segmentation of the object you thing would be that character and compare it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation which basically shifts the reference image(s) over the frame and checks how equal both parts are. But this unfortunately is very sensitive for rotation, deformations or other noise... so you wouldn't get that person if its i.e. laying down. And I doubt you can get all those calculations done in realtime...
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...

Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?

Have you tried computer vision frameworks like Cassandra? There you can exactly do that just by some mouse clicks.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart