I am currently using a CNN-based object detection module whose detections I then feed into tracking with OpenCV. Until now the detection module produced rectangles, but I want to switch to a segmentation module like Mask-RCNN, which outputs a mask along with a rectangle for each object. Masks are a more accurate representation of an object, but all the trackers in OpenCV take rectangles as input. Is there any way to use the masks, rather than the boxes, for tracking an object? I can convert the masks to contours if that would help.
Sorry, there is no built-in, out-of-the-box solution in OpenCV for active contour models.
This segmentation model is widely used in computer vision problems (it was proposed by Kass et al. in 1988) and is the starting point for other energy-based segmentation models such as level sets, geodesic active contours, or the fuzzy-snake model.
So, if you try to perform active contour segmentation in OpenCV, there are several solutions, but I think you must understand the mathematical model in order to set the parameters properly for the context of your application.
There is a nice implementation (a bit obfuscated) by Eric Yuan.
There are also other implementations from SO that could help you link theory and implementation:
Solution 1
Solution 2
My advice:
Read the original paper to understand the parameters.
Test some examples in MATLAB to play a bit with the parameters and results.
Test some of the OpenCV implementations that are linked here.
Determine the best parameters for your problem context and test them.
Think about contributing your results to OpenCV.
Active contours can track using contours as input: https://www.ee.iitb.ac.in/uma/~krishnan/research.html
So you initialize the first frame with the contour from the CNN model, and in subsequent frames you don't need to run the expensive forward pass; instead you update the contour to a new one based on this model.
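If you keep the rectangle-based OpenCV trackers, a low-effort fallback (not a true mask tracker) is to reduce each mask to its bounding rectangle and hand that to the tracker. Here is a minimal sketch, assuming the mask is a 2D grid of 0/1 values; `mask_to_bbox` is a made-up helper name. With OpenCV, `cv2.boundingRect` on the mask's contour points should give the same kind of `(x, y, w, h)` tuple.

```python
def mask_to_bbox(mask):
    """Return (x, y, w, h) of the tightest box around nonzero pixels, or None.

    `mask` is assumed to be a list of rows holding 0/1 (or False/True) values.
    """
    # Row indices that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing to track
    # Column indices that contain at least one foreground pixel.
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x, y = cols[0], rows[0]
    return (x, y, cols[-1] - x + 1, rows[-1] - y + 1)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```

This throws away the shape information, so it only makes sense as a bridge until a contour-based update is in place.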
I'm doing research for my final project. I want to build object detection and motion classification like Amazon Go. I have read a lot of research, such as object detection with SSD or YOLO and video classification using CNN+LSTM, and I want to propose an algorithm like this:
Real-time detection of multiple objects (in my case: persons) with SSD/YOLO
Get the object's bounding box and crop the frame
Feed the cropped frame to a CNN+LSTM to predict the motion (whether the person is walking or taking items)
Is it possible to do this in a real-time environment?
Or is there a better method for real-time detection and motion classification?
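For reference, the cropping step in the list above could look roughly like this. It is only a sketch: the frame is assumed to be a plain list of pixel rows and the box an `(x, y, w, h)` tuple, and `crop_box` is a hypothetical name. The one detail worth noting is clamping, since detectors often return boxes that stick out of the frame.

```python
def crop_box(frame, box):
    """Crop box=(x, y, w, h) out of frame, clamped to the frame borders."""
    height, width = len(frame), len(frame[0])
    x, y, w, h = box
    # Clamp the corners so a box that leaves the frame still gives a valid crop.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in frame[y0:y1]]


# Toy 4x5 "frame" whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_box(frame, (3, 2, 5, 5)))  # [[23, 24], [33, 34]]
```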
If you want to use it in a real-time application, several other issues must be considered that only show up once the algorithm is deployed in a real environment.
As for your proposed 3-step method, it could already give good results, but the first step would have to be very accurate. I think it is better to combine the three steps into one, because the type of motion is itself a good feature of a person.
My idea is as follows:
1. A video classification dataset that simply tags the movement of the person or object
2. A CNN-LSTM based video classification method
This would solve your project properly.
This answer needs more detail; if you are interested, I can elaborate.
Had pretty much the same problem. Motion prediction does not work that well in complex real-life situations. Here is a simple one:
(See in action)
I'm building a 4K video processing tool (some examples). Current approach looks like the following:
do rough but super fast segmentation
extract bounding box and shape
apply some "meta vision magic"
do precise segmentation within identified area
(See in action)
As of now the approach looks way more flexible compared to motion tracking.
"Meta vision" is intended to properly track shape evolution:
(See in action)
Let's compare:
Meta vision disabled
Meta vision enabled
I am new to OpenCV, and I am guessing that this problem could be fairly simple: I am trying to detect an object of about 25 by 15 pixels in an image of 470 by 590 pixels.
I am attaching a zoomed image of this object, I have several options to go with:
1 - Detection of two close circles using the Hough transform,
2 - Histogram matching
3 - SURF feature detection
Any advice on which direction I should take? Please consider speed and real-time application. Thanks
I think it should go without saying, but there are probably hundreds of things that could be tried, and with only one example image it is quite difficult to advise. For instance, are the LEDs always green? We don't know.
That aside, imho, two good places to start would be with the ol' faithful template matching, or blob detection.
Then if that is not robust enough, you will need to look at some alternative representations of the template/blob, like the classic HoG (good for shape, maybe a bit heavy for this app), or even your own bespoke one that encodes your domain-specific knowledge of this problem.
Then if that is not robust enough, build a dataset of representative +ve and -ve examples, as big as you can, and then train a classifier such as an SVM or a boosted classifier.
Template Matching:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Blob detection:
https://code.google.com/p/cvblob/
Machine Learning:
http://docs.opencv.org/modules/ml/doc/ml.html
TIPS:
Add as much domain knowledge as possible, i.e. if they are always green, use color in the representation, like HOG on the G channel for instance. If they are always circular, try to encode that, e.g. use a log-polar grid in the template rather than a regular grid, and so on.
Machine learning is not magic: a linear classifier will essentially weight different points in the feature space, so you still need a good representation. If template matching was a total failure, then it is unlikely that simple linear ML will help, but if template matching was okay, then ML may well boost the performance to a good level.
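To make the template matching starting point concrete, here is a brute-force sketch using the sum of squared differences (SSD) on plain nested lists. The function name is made up for illustration, and it is far too slow for real-time use; in OpenCV the equivalent call would be `cv2.matchTemplate` with `cv2.TM_SQDIFF`.

```python
def match_template_ssd(image, template):
    """Slide template over image; return ((x, y), score) of the lowest SSD."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Sum of squared differences between the template and this window.
            score = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (x, y), score
    return best_pos, best_score


template = [[9, 8], [7, 6]]
image = [[0] * 6 for _ in range(5)]
image[1][2], image[1][3] = 9, 8  # paste the template at x=2, y=1
image[2][2], image[2][3] = 7, 6
print(match_template_ssd(image, template))  # ((2, 1), 0)
```

A score of 0 means a pixel-perfect match; on real images you would threshold the score instead of expecting zero.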
step 1: Remove the black background.
step 2: A snake algorithm can be used to find the boundaries of your object
I'm trying to do object detection using OpenCV, but something confuses me. Tracking and prediction algorithms like CamShift and Kalman filters can accomplish the task of tracking, while SURF matching methods can also do that.
I don't quite understand the difference between the two approaches. I have written some code based on the feature2d (SURF) and motion_analysis_and_object_tracking (CamShift) sections of the OpenCV tutorials. They seem like just two means to one end. Am I right, or am I missing some concept?
And is it a good idea to combine CamShift tracking with SURF feature matching? Maybe more could be applied, like contour matching?
Short answer is:
Detect the object of interest using keypoints (SURF) or any other approach.
Get the bounding rectangle of the object and pass it as input to an object tracker (e.g. CAMShift).
Keep using the object tracker for as long as the object is not lost.
Object tracking is the process of finding the position of an object
using the information in previous frames. The difference between tracking and
detection is that, while both processes localize the position of the object,
detection does not use any information from previous frames to localize the
object.
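The detect-then-track loop described above can be sketched as follows. `detect` and `track` are placeholders for whatever detector (e.g. SURF matching) and tracker (e.g. CAMShift) you plug in; their signatures here are assumptions for illustration, not OpenCV APIs.

```python
def detect_then_track(frames, detect, track):
    """Yield one box (or None) per frame.

    detect(frame) -> box or None            uses no history (assumed slow)
    track(frame, prev_box) -> box or None   uses the previous box (assumed fast)
    """
    box = None
    for frame in frames:
        if box is not None:
            # Tracking: localize using information from the previous frame.
            box = track(frame, box)
        if box is None:
            # Nothing tracked yet, or the tracker lost the object: detect again.
            box = detect(frame)
        yield box
```

The detector only runs on the first frame and whenever the tracker reports a loss, which is where the speed-up over per-frame detection comes from.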
Look at "Object Tracking: A Survey" by Alper Yilmaz, Omar Javed and Mubarak Shah. This paper contains a comprehensive overview of detection and tracking techniques.
I would like to know if there is any code or good documentation available for implementing HOG features. I tried to read the documentation here, but it's quite difficult to understand and it requires an SVM.
What I need is just to implement a HOG detector for objects, like what SIFT or SURF do.
Btw, I'm not interesting in this work.
Thank you..
You can take a look at
http://szproxy.blogspot.com/2010/12/testtest.html
He also published a "tutorial" for HOG on SourceForge here:
http://sourceforge.net/projects/hogtrainingtuto/?_test=beta
I know this because I'm having the same problem as you. The tutorial isn't what I would call a tutorial, though: it's a bunch of source code with no documentation, but I assume it works and can at least get you somewhere.
In the end, simplifying a bit, all you need to detect specific objects in an image is:
Localize "points of interest" to extract the patches:
To get points of interest, you can use an algorithm like the Harris corner detector, random sampling, or something as simple as sliding windows.
From these points get patches:
You will have to decide on the patch size.
From these patches compute the feature descriptor. (like HOG).
Instead of HOG you can use another feature descriptor like SIFT, SURF...
HOG's implementation is not too hard. You have to calculate the gradients of the extracted patch by applying Sobel X and Y kernels; after that, you divide the patch into NxM cells, 8x8 for instance, and compute a histogram of the gradients (angle and magnitude) for each cell. The following link has a more detailed explanation:
HOG Person Detector Tutorial
Check your feature vector against a previously trained classifier
Once you have this vector, check whether or not it is the desired object with a previously trained classifier such as an SVM. Instead of an SVM you could use a neural network, for instance.
An SVM implementation is more difficult, but there are libraries like OpenCV that you can use.
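To give an idea of the descriptor step, here is a stripped-down sketch of the per-cell gradient histograms. It makes several simplifying assumptions: centred differences instead of Sobel kernels, cells of 8x8 pixels, 9 unsigned orientation bins, hard binning, and no block normalisation, so it is not a full HOG. `hog_cells` is a made-up name.

```python
import math


def hog_cells(patch, cell=8, bins=9):
    """Concatenated per-cell histograms of gradient orientation (0-180 deg).

    `patch` is assumed to be a list of rows of grayscale values. Each pixel
    votes with its gradient magnitude into the bin of its gradient angle.
    """
    h, w = len(patch), len(patch[0])
    cells_y, cells_x = h // cell, w // cell
    hists = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    bin_width = 180.0 / bins
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            # Centred differences, with neighbour indices clamped at the border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            hists[y // cell][x // cell][b] += magnitude
    # Flatten to a single feature vector: cells in row-major order.
    return [v for row in hists for hist in row for v in hist]


# An 8x8 patch with a vertical edge: all the energy lands in the 0-degree bin.
edge = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
print(hog_cells(edge))
```

The resulting vector is what you would pass to the classifier in the last step.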
There is a function extractHOGFeatures in the Computer Vision System Toolbox for MATLAB.
I have a basic understanding of image processing and am now studying the "Digital Image Processing" book by Gonzalez in depth.
Given an image in which the approximate form of the object of interest is known (e.g. circle, triangle),
what is the best algorithm / method to find this object in the image?
The object can be slightly deformed, so a brute-force approach will not help.
You may try using Histograms of Oriented Gradients (also called Edge Orientation Histograms). We have used them for detecting road signs. http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients and the papers by Bill Triggs should get you started.
I recommend the Hough transform, which allows you to find any given pattern described by an equation. What's more, the Hough transform also works well for deformed objects.
The algorithm and implementation itself is quite simple.
More details can be found here: http://en.wikipedia.org/wiki/Hough_transform , even a source code for this algorithm is included on a referenced page (http://www.rob.cs.tu-bs.de/content/04-teaching/06-interactive/HNF.html).
I hope that helps you.
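To show how simple the voting idea is, here is a sketch of the Hough transform for straight lines in the (rho, theta) parameterisation, where each edge point votes for every line `rho = x*cos(theta) + y*sin(theta)` that passes through it. The function name and the 1-degree / 1-pixel resolution are my own choices for illustration; OpenCV ships ready-made versions as `cv2.HoughLines` and `cv2.HoughCircles`.

```python
import math
from collections import Counter


def hough_strongest_line(points, n_theta=180):
    """Return (rho, theta_degrees, votes) of the most voted line, or None."""
    if not points:
        return None
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Distance from the origin to the line through (x, y) whose
            # normal has angle theta, rounded to a 1-pixel accumulator bin.
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    (rho, t), count = votes.most_common(1)[0]
    return rho, 180.0 * t / n_theta, count


# Edge points on the vertical line x = 3.
points = [(3, y) for y in range(50)]
print(hough_strongest_line(points))  # (3, 0.0, 50)
```

Circles work the same way with a three-parameter accumulator (centre x, centre y, radius), which is why restricting the radius range matters for speed.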
I would look at your problem in two steps:
first finding your object's outer boundary:
I'm assuming your image has enough contrast that you can easily threshold it to get a binary image of your object. You then need to extract the chain code of the object's boundary.
then analyzing the boundary's shape to deduce the form (circle, polygon,...):
You can calculate the curvature in each point of the boundary chain and thus determine how many sharp angles (i.e. high curvature value) there are in your shape. Several sharp angles means you have a polygon, none means you have a circle (constant curvature).
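The curvature test can be sketched as below, using the turning angle between neighbouring boundary points as a discrete stand-in for curvature. The boundary is assumed to be an ordered, closed list of `(x, y)` points; the function names, the neighbour offset `k` and the 30-degree threshold are illustrative choices. On a real pixel chain code you would likely need a larger `k`, because single-pixel steps are jagged.

```python
import math


def turning_angles(boundary, k=1):
    """Absolute turning angle in degrees at each point of a closed boundary."""
    n = len(boundary)
    angles = []
    for i in range(n):
        x0, y0 = boundary[i - k]        # negative indices wrap around
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + k) % n]
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        turn = math.degrees(heading_out - heading_in)
        turn = (turn + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        angles.append(abs(turn))
    return angles


def count_corners(boundary, k=1, threshold=30.0):
    """Count runs of consecutive points whose turning angle exceeds threshold."""
    sharp = [a > threshold for a in turning_angles(boundary, k)]
    # A corner starts where a sharp point follows a non-sharp one.
    return sum(1 for i in range(len(sharp)) if sharp[i] and not sharp[i - 1])


square = (
    [(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
    + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)]
)
circle = [(10 * math.cos(math.radians(a)), 10 * math.sin(math.radians(a)))
          for a in range(0, 360, 10)]
print(count_corners(square), count_corners(circle))  # 4 0
```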
You can find a description on how to get your object's boundary from the binary image and ways of analysing it in Gonzalez's Digital Image Processing, chapter 11.
I also found this insightful presentation on binary image analysis (PPT) and a MATLAB script that implements some of the techniques that Gonzalez talks about in DIP.
I strongly recommend OpenCV; it's a great computer vision library that helps with almost anything related to computer vision. Their website isn't really attractive or helpful, but the API is really powerful.
A book that helped me a lot since there isn't a load of documentation on the web is Learning OpenCV. The documentation that comes with the API is good, but not great for learning how to use it.
Related to your problem, you could use a Canny edge detector to find the border of your item and then analyse it, or you could proceed with a Hough transform to search for lines and/or circles.
You could specifically look at 'face recognition', since that is a well-defined topic, or at 'face detection' and similar. EmguCV can be useful for you. It is a .NET wrapper for the Intel OpenCV image processing library.
It looks like Professor Jean Rouat from the University of Sherbrooke has found a way to find objects in images using a spiking neural network. His technology, named RN-SPIKES, seems to be available for licensing.