I'm trying to build an object detection model with Create ML. In addition to detecting the type of objects in a picture, I would like the output to include the coordinates (or position) of each detected object.
How can I get the coordinates of each detected object in the output picture?
Is it possible to do that with Create ML? And if yes, how?
Yes. The object detector returns an array of VNRecognizedObjectObservation objects, one for each object it finds. Each observation contains an array of matching labels, a confidence value and a bounding box, as well as other information.
The bounding box is normalized, so you need to convert it into pixel coordinates with VNImageRectForNormalizedRect before you use it.
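The snippet below shows what that conversion amounts to. It is written in Python rather than Swift, for illustration only. As far as I know, VNImageRectForNormalizedRect simply scales the normalized rect by the image width and height. Vision's normalized coordinates have their origin in the lower-left corner, so if you draw with a top-left origin (as UIKit does), you also have to flip the y axis. Both function names are hypothetical helpers, not Vision APIs.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def image_rect_for_normalized_rect(rect: Rect, width: int, height: int) -> Rect:
    """Scale a normalized [0, 1] rect to pixels (mirrors VNImageRectForNormalizedRect)."""
    x, y, w, h = rect
    return (x * width, y * height, w * width, h * height)


def flip_to_top_left(rect: Rect, height: int) -> Rect:
    """Convert a lower-left-origin pixel rect into a top-left-origin one."""
    x, y, w, h = rect
    return (x, height - y - h, w, h)


# Hypothetical bounding box from an observation, in a 400 x 200 image.
pixel_rect = image_rect_for_normalized_rect((0.25, 0.5, 0.5, 0.25), 400, 200)
print(pixel_rect)                         # (100.0, 100.0, 200.0, 50.0)
print(flip_to_top_left(pixel_rect, 200))  # (100.0, 50.0, 200.0, 50.0)
```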
Apple has an excellent sample app with detailed explanations which you can find here.
How can I get the actual shape of the detected object, for example, to change its color?
In the picture below, you can see that the banana is detected and placed in a rect, but how can I get its actual detected shape, for example, to make the banana pink?
Thanks
I found out that it is called semantic segmentation.
It looks like you are trying to classify objects with machine learning. But your model is only trained to recognize some pretrained objects, not to give you the shape of an object.
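A semantic segmentation model gives you a per-pixel mask instead of a box. Making the banana pink is then just a matter of blending a colour into the masked pixels. Below is a minimal Python sketch. It assumes the image is a nested list of RGB tuples and the mask is a same-sized nested list of booleans (True where the banana is). The segmentation model that would produce the mask is not shown, and the blend factor is an arbitrary choice.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
PINK: Pixel = (255, 192, 203)


def tint_masked_pixels(image: List[List[Pixel]], mask: List[List[bool]],
                       color: Pixel = PINK, alpha: float = 0.5) -> List[List[Pixel]]:
    """Blend `color` into every pixel where `mask` is True; leave the others untouched."""
    out = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, inside in zip(img_row, mask_row):
            if inside:
                # Linear blend per channel, rounded half up to an int.
                pixel = tuple(int((1 - alpha) * p + alpha * c + 0.5)
                              for p, c in zip(pixel, color))
            new_row.append(pixel)
        out.append(new_row)
    return out
```

For example, `tint_masked_pixels([[(100, 100, 100), (10, 20, 30)]], [[True, False]])` tints only the first pixel. With `alpha=1.0` the masked pixels become solid pink.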
I am working on a project where I need to figure out the real-world distance of an object relative to a fixed point, using an image.
I have detected the object in a 2D image using SURF, and my object is now inside a box. What will give me the position of the centroid of the object? How can I use this to find the real-world distance?
If I plan to incorporate stereo vision, triangulating the centroid of the object, what is the difference between the distance I obtain here and in the previous method?
On a single image, probably the best starting point to learn about estimating metrical properties is Antonio Criminisi's work (2000 vintage, but still very relevant): http://www.cs.illinois.edu/~dhoiem/courses/vision_spring10/sources/criminisi00.pdf
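For the stereo variant mentioned in the question, a calibrated and rectified pair gives the depth of the centroid directly from its disparity, using the standard pinhole relation Z = f * B / d. Below is a minimal Python sketch. It assumes rectified images, a focal length in pixels, a baseline in metres and a principal point `cx`; all values in the usage note are made up. The box centre is only an approximation of the object's true centroid.

```python
from typing import Tuple


def box_centroid(x: float, y: float, w: float, h: float) -> Tuple[float, float]:
    """Centre of an axis-aligned box: a simple stand-in for the object's centroid."""
    return (x + w / 2, y + h / 2)


def stereo_depth(focal_px: float, baseline_m: float, x_left: float, x_right: float) -> float:
    """Depth Z = f * B / d for a rectified pair, with disparity d = x_left - x_right."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity


def lateral_offset(u: float, cx: float, depth: float, focal_px: float) -> float:
    """Horizontal offset X = (u - cx) * Z / f of the point from the optical axis."""
    return (u - cx) * depth / focal_px
```

For example, a box at (100, 50) of size 40 x 20 has its centre at (120, 60). With f = 700 px, B = 0.12 m and the centroid at x = 420 in the left image and x = 385 in the right image, the depth comes out at about 2.4 m. A single-image method must instead rely on known scene geometry, as the paper above discusses.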
Hello everyone, I am pursuing an M.Tech. My project is object recognition: recognizing specific objects, such as weapons, that are not allowed at airports. The input will be scanned images of baggage/luggage, processed in MATLAB, and for now it handles static images only. I am currently using edge detection and histogram processing techniques. I have searched the internet and found ANNs, genetic algorithms and many more, but I can't put the whole picture together because each paper explains it in its own way. Please help me figure out how to proceed with object recognition using edge detection and histogram processing techniques.
If you'd like to do object recognition with only the contours, use Shape Context.
Essentially, you will have a database of shapes a priori, where you know the label of each shape (gun, something_harmless_1, knife, something_harmless_2). At query time, you take the contour of your object and compute the Shape Context Distance between your query shape and every shape in your database. The shape with the shortest Shape Context Distance is then deemed the true class of your object.
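Here is a simplified Python sketch of that nearest-neighbour scheme. It builds log-polar shape context histograms for each contour point (5 radial x 12 angular bins, with distances normalised by the mean pairwise distance). As the distance, it uses the symmetric mean of the best-match chi-squared costs. Assumption: it skips the point-correspondence and thin-plate-spline alignment steps of the full method, so treat it as a starting point. The bin counts and radial range are typical choices, not requirements.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
N_R, N_THETA = 5, 12            # log-polar grid: 5 radial x 12 angular bins
R_INNER, R_OUTER = 0.125, 2.0   # radial range, relative to the mean pairwise distance


def shape_contexts(points: Sequence[Point]) -> List[List[float]]:
    """One normalized log-polar histogram per point, describing where the other points lie."""
    n = len(points)
    mean_d = sum(math.hypot(xj - xi, yj - yi)
                 for i, (xi, yi) in enumerate(points)
                 for j, (xj, yj) in enumerate(points) if i != j) / (n * (n - 1))
    log_in = math.log(R_INNER)
    log_span = math.log(R_OUTER) - log_in
    hists = []
    for i, (xi, yi) in enumerate(points):
        h = [0.0] * (N_R * N_THETA)
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            r = math.hypot(xj - xi, yj - yi) / mean_d   # scale-normalized distance
            if r >= R_OUTER:
                continue                                # outside the outermost ring
            if r < R_INNER:
                r_bin = 0
            else:
                r_bin = min(int((math.log(r) - log_in) / log_span * N_R), N_R - 1)
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = int(theta / (2 * math.pi) * N_THETA) % N_THETA
            h[r_bin * N_THETA + t_bin] += 1
        total = sum(h)
        hists.append([v / total for v in h] if total else h)
    return hists


def chi2_cost(g: Sequence[float], h: Sequence[float]) -> float:
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)


def shape_context_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric mean of best-match costs (no alignment step)."""
    hp, hq = shape_contexts(p), shape_contexts(q)
    cost = [[chi2_cost(a, b) for b in hq] for a in hp]
    forward = sum(min(row) for row in cost) / len(hp)
    backward = sum(min(col) for col in zip(*cost)) / len(hq)
    return forward + backward


def classify(query: Sequence[Point], database: List[Tuple[str, Sequence[Point]]]) -> str:
    """Label of the database shape with the smallest shape context distance."""
    return min(database, key=lambda item: shape_context_distance(query, item[1]))[0]
```

Because distances are divided by their mean, a scaled and shifted copy of a database shape gets the same histograms and therefore a distance of (essentially) zero.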
Alternatively, if you wanted to use the histogram of the object, you could do a similar matching but with a different distance metric. Instead of using the Shape Context Distance, you could store a histogram for all objects in your database and compute the Earth Mover's Distance between your query object and all other objects in your database.
It is possible to encode both of these distances in your final result. You can come up with some weighting scheme between them that makes sense for you.
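For 1-D histograms, such as a grayscale intensity histogram, the Earth Mover's Distance has a simple closed form. Once both histograms are normalised to the same total mass, it equals the L1 distance between their cumulative distributions (with unit bin spacing). Below is a Python sketch of that, plus the plainest possible weighting scheme, a linear blend. The 0.5 default weight is arbitrary. The two distances also live on different scales, so you would likely normalise them before blending.

```python
from typing import Sequence


def emd_1d(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Earth Mover's Distance between two 1-D histograms with unit bin spacing.

    Both histograms are normalized to total mass 1; in 1-D the EMD then equals
    the L1 distance between their cumulative distributions.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    s1, s2 = sum(h1), sum(h2)
    carried, total = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a / s1 - b / s2   # mass that still has to move past this bin
        total += abs(carried)
    return total


def combined_distance(shape_dist: float, hist_dist: float, weight: float = 0.5) -> float:
    """Weighted blend of two distances; `weight` is a hypothetical tuning knob."""
    return weight * shape_dist + (1 - weight) * hist_dist
```

For example, moving all the mass from the first to the last of three bins costs 2.0, and moving it one bin costs 1.0.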
I have been using the LK algorithm to detect corners and interest points for tracking.
However, I am stuck at the point where I need something like a rectangular box to follow the tracked object. All I have now is a lot of points showing my moving objects.
Are there any methods or suggestions for that? Also, any ideas on adding a counter to the window so that objects moving in and out of the screen can be counted as well?
Thank you
There are lots of options! Within OpenCV, I'd suggest using CamShift as a starting point, since it is relatively easy to use. CamShift uses mean shift to iteratively search for an object in consecutive frames.
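The inner step of CamShift is plain mean shift. The tracker repeatedly moves a window to the weighted centroid of a per-pixel "object likelihood" map until the window stops moving. Below is a minimal Python sketch of that fixed-size-window step. It assumes `weights` is a 2-D back-projection, for example how well each pixel's colour matches the object's histogram. CamShift additionally adapts the window's size and orientation from the image moments each frame, which is not shown. In practice you would call OpenCV's `cv2.meanShift` or `cv2.CamShift` rather than this.

```python
import math
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def mean_shift(weights: List[List[float]], window: Window, max_iter: int = 20) -> Window:
    """Move a fixed-size window to the centroid of the weights under it until it stops moving."""
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for r in range(max(y, 0), min(y + h, rows)):
            for c in range(max(x, 0), min(x + w, cols)):
                v = weights[r][c]
                m00 += v
                m10 += v * c
                m01 += v * r
        if m00 == 0:
            break                                  # nothing to follow under the window
        cx, cy = m10 / m00, m01 / m00              # weighted centroid
        nx = math.floor(cx - (w - 1) / 2 + 0.5)    # re-centre the window on it
        ny = math.floor(cy - (h - 1) / 2 + 0.5)
        nx = min(max(nx, 0), cols - w)             # keep the window inside the image
        ny = min(max(ny, 0), rows - h)
        if (nx, ny) == (x, y):
            break                                  # converged
        x, y = nx, ny
    return (x, y, w, h)
```

The returned `(x, y, width, height)` is exactly the rectangle you could draw around the tracked object in each frame.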
Note that you need to seed the tracker with some kind of input. You could have the user draw a rectangle around the object, or use a detector to get the initial input. If you want to track faces, for example, OpenCV has a cascade classifier and training data for a face detector included.
I am currently looking for a way to fit a simple shape (e.g. a T or an L shape) to a 2D point cloud. What I need as a result is the position and orientation of the shape.
I have been looking at a couple of approaches but most seem very complicated and involve building and learning a sample database first. As I am dealing with very simple shapes I was hoping that there might be a simpler approach.
By saying you don't want to do any training, I am guessing you mean you don't want to do any feature matching. Feature matching is used to make good guesses about the pose (location and orientation) of the object in the image. Combined with RANSAC, it would be applicable to your problem for guessing and verifying good hypotheses about the object's pose.
The simplest approach is template matching, but this may be too computationally expensive (it depends on your use case). In template matching you simply loop over the possible locations, orientations and scales of the object. For each combination, you check how well the template (a cloud that looks like an L or a T at that location, orientation and scale) matches. Alternatively, you sample possible locations, orientations and scales randomly. Checking the template can be made fairly fast if your points are organised (or you organise them, e.g. by converting them into pixels).
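Below is a brute-force Python sketch of that search. It assumes a 2-D point cloud and a template given as points relative to its own origin. The cloud is "organised" by rounding points to integer pixels and storing them in a set, and the score is the fraction of template points that land on an occupied pixel. You choose the pose grid (`xs`, `ys`, `thetas`, `scales`). The cost grows with the product of their sizes, which is why this can be slow.

```python
import math
from typing import Iterable, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]
Pose = Tuple[float, float, float, float]  # (tx, ty, theta, scale)


def transform(points: Iterable[Point], pose: Pose) -> List[Point]:
    """Scale, rotate (about the template origin) and translate template points."""
    tx, ty, theta, s = pose
    c, si = math.cos(theta), math.sin(theta)
    return [(s * (c * x - si * y) + tx, s * (si * x + c * y) + ty) for x, y in points]


def score(template: List[Point], occupied: Set[Tuple[int, int]], pose: Pose) -> float:
    """Fraction of transformed template points that land on an occupied pixel."""
    placed = transform(template, pose)
    hits = sum((round(x), round(y)) in occupied for x, y in placed)
    return hits / len(template)


def match_template(template: List[Point], cloud: Iterable[Point],
                   xs: Sequence[float], ys: Sequence[float],
                   thetas: Sequence[float], scales: Sequence[float]
                   ) -> Tuple[float, Optional[Pose]]:
    """Exhaustively try every pose on the grid and keep the best-scoring one."""
    occupied = {(round(x), round(y)) for x, y in cloud}   # organise the cloud as pixels
    best_score, best_pose = -1.0, None
    for tx in xs:
        for ty in ys:
            for theta in thetas:
                for s in scales:
                    sc = score(template, occupied, (tx, ty, theta, s))
                    if sc > best_score:
                        best_score, best_pose = sc, (tx, ty, theta, s)
    return best_score, best_pose
```

The returned pose is the position and orientation (plus scale) that the question asks for.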
If this is too slow, there are many methods for making template matching faster, and I would recommend the Generalised Hough Transform.
Here, before starting the search for templates, you loop over the boundary of the shape you are looking for (T or L). For each point on its boundary, you look at the gradient direction, then at the angle between the gradient direction and the line to the origin of the object template, and at the distance to that origin. You add this to a table (let us call it Table A) for each boundary point. You end up with a table that maps from gradient direction to the set of possible locations of the object's origin. Next, you set up a 2D voting space, which is really just a 2D array (let us call it Table B) where each pixel contains the number of votes for the object being at that location. Then, for each point in the target image (point cloud), you check its gradient and look up the corresponding set of possible object locations in Table A. You then add one vote to each of those object locations in Table B (the Hough space).
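Below is a minimal Python sketch of those two tables. It assumes each point already comes with a gradient (edge-normal) direction, which for a raw point cloud you would have to estimate yourself, e.g. from neighbouring points. It handles translation only, storing each displacement directly as (dx, dy). That is equivalent to storing angle and distance as long as the shape is not rotated; rotation and scale would add dimensions to the voting space. Table B is kept as a sparse `Counter` instead of a dense 2-D array, which gives the same vote counts. The number of direction bins is an arbitrary choice.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (x, y, gradient direction in radians)


def angle_bin(angle: float, n_bins: int) -> int:
    """Quantize a gradient direction into one of n_bins equal sectors."""
    return round(angle / (2 * math.pi) * n_bins) % n_bins


def build_r_table(template: List[Edge], origin: Tuple[int, int],
                  n_bins: int = 36) -> Dict[int, List[Tuple[int, int]]]:
    """Table A: gradient-direction bin -> displacements from boundary point to origin."""
    table: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for x, y, angle in template:
        table[angle_bin(angle, n_bins)].append((origin[0] - x, origin[1] - y))
    return table


def hough_vote(points: List[Edge], table: Dict[int, List[Tuple[int, int]]],
               n_bins: int = 36) -> Tuple[Tuple[int, int], int]:
    """Table B (sparse): each point votes for every origin its gradient allows."""
    votes: Counter = Counter()
    for x, y, angle in points:
        for dx, dy in table.get(angle_bin(angle, n_bins), []):
            votes[(x + dx, y + dy)] += 1
    return votes.most_common(1)[0]    # ((x, y), number_of_votes)
```

The location with the most votes is the detected position of the template's origin.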
This is a very terse explanation, but now that you know to look for Template Matching and the Generalised Hough Transform, you will be able to find better explanations on the web, for example on the Wikipedia pages for Template Matching and Hough Transform.
You may need to:
1- Extract some features from the image in which you are looking for the object.
2- Extract another set of features from the image of the object.
3- Match the features (this is possible using methods like SIFT).
4- When you find a match, apply the RANSAC algorithm. It provides you with a transformation matrix (including translation and rotation information).
To use SIFT, start here. It is actually one of the best SIFT implementations available. It includes the RANSAC algorithm, so you do not need to implement it yourself.
You can read about RANSAC here.
Two common ways to detect shapes (L, T, ...) in your 2D point cloud data are OpenCV and the Point Cloud Library. I'll explain the steps you may take to detect those shapes in OpenCV. You can use the following 3 methods; which one is right depends on the shape (size, area of the shape, ...):
Hough Line Transformation
Template Matching
Finding Contours
The first step would be converting your points into a grayscale Mat object. By doing that, you basically make an image of your 2D point cloud data, so you can use other OpenCV functions. Then you may smooth the image to reduce noise. The result will be a somewhat blurry image that still contains the real edges. If your application does not need real-time processing, you can use bilateralFilter. You can find more information about smoothing here.
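Below is a minimal Python sketch of the point-cloud-to-image step. The `resolution` (point-cloud units per pixel) is an assumption you have to pick for your data. Each pixel that receives a point is set to 255 and all others to 0. Image rows grow downwards, so the result is flipped vertically relative to a y-up point cloud.

```python
from typing import List, Sequence, Tuple


def rasterize(points: Sequence[Tuple[float, float]], resolution: float) -> List[List[int]]:
    """Turn 2-D points into an 8-bit grayscale grid (255 where a point falls, else 0)."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    cols = int((max(x for x, _ in points) - min_x) / resolution) + 1
    rows = int((max(y for _, y in points) - min_y) / resolution) + 1
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        row = int((y - min_y) / resolution)   # image rows grow downwards
        col = int((x - min_x) / resolution)
        grid[row][col] = 255
    return grid
```

In Python, `numpy.array(grid, dtype=numpy.uint8)` gives an 8-bit single-channel array that OpenCV functions such as `cv2.bilateralFilter` accept.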
The next step would be choosing the method. If the shape is just made of orthogonal lines (such as an L or a T), you can use the Hough Line Transform to detect the lines. After detection, you can loop over the lines and calculate the dot product of their direction vectors. Since the lines are orthogonal, the result should be (close to) 0. You can find more information about the Hough Line Transform here.
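The orthogonality check can be sketched in Python as follows. It assumes each detected line is a segment `(x1, y1, x2, y2)`; that is the form `cv2.HoughLinesP` reports, with each segment wrapped in an extra array dimension. The direction vectors are normalised so the dot product does not depend on segment length. Because detected lines are never exactly perpendicular, a tolerance is used. The default 0.05 accepts angles within roughly 3 degrees of 90 and is an arbitrary choice.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def unit_direction(seg: Segment) -> Tuple[float, float]:
    """Unit vector pointing from the first to the second endpoint."""
    x1, y1, x2, y2 = seg
    length = math.hypot(x2 - x1, y2 - y1)
    return ((x2 - x1) / length, (y2 - y1) / length)


def orthogonal_pairs(segments: Sequence[Segment], tol: float = 0.05) -> List[Tuple[int, int]]:
    """Index pairs whose normalized direction vectors have |dot product| < tol."""
    dirs = [unit_direction(s) for s in segments]
    pairs = []
    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            dot = dirs[i][0] * dirs[j][0] + dirs[i][1] * dirs[j][1]
            if abs(dot) < tol:
                pairs.append((i, j))
    return pairs
```

For an L you would additionally check that the two segments of a pair share (roughly) an endpoint. For a T, one segment's endpoint should lie near the middle of the other.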
Another way would be detecting your shape using template matching. Basically, you make a template of your shape (L or T) and use it with the matchTemplate function. Keep in mind that the template should be at roughly the same scale as the shape appears in your image; otherwise, you may need to resize your image. More information about the algorithm can be found here.
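To show what matchTemplate computes, here is a pure-Python sketch of the squared-difference score (the formula behind OpenCV's `TM_SQDIFF` mode): R(x, y) = sum over (x', y') of (T(x', y') - I(x + x', y + y'))^2. Like matchTemplate, it slides the template over every position where it fits completely, so the result has (W - w + 1) x (H - h + 1) entries. The best match is the smallest value. This is only an illustration; on real images, use `cv2.matchTemplate`, which is far faster.

```python
from typing import List, Tuple


def match_template_sqdiff(image: List[List[float]],
                          templ: List[List[float]]) -> Tuple[List[List[float]], Tuple[int, int]]:
    """Sum of squared differences at every placement; best match = smallest value."""
    ih, iw, th, tw = len(image), len(image[0]), len(templ), len(templ[0])
    result = [[sum((templ[r][c] - image[y + r][x + c]) ** 2
                   for r in range(th) for c in range(tw))
               for x in range(iw - tw + 1)]
              for y in range(ih - th + 1)]
    best_y, best_x = min(((y, x) for y in range(len(result)) for x in range(len(result[0]))),
                         key=lambda yx: result[yx[0]][yx[1]])
    return result, (best_x, best_y)   # location as (x, y), like OpenCV's minLoc
```

This score only peaks when the template's scale (and orientation) match the shape in the image, which is why resizing may be needed.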
If the shapes enclose areas, you can find their contours using findContours, which gives you the polygons around the shapes you want to detect. For instance, if your shape is an L, its polygon would have roughly 6 sides. You can also use other filters along with findContours, such as calculating the area of the shape.
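Suppose you have a contour as an ordered list of boundary pixels. Counting its corners and measuring its area could then look like the Python sketch below. Assumption: the shape is drawn with straight pixel runs (e.g. axis-aligned), so dropping collinear points is enough to reduce the contour to its corners. For noisy contours, OpenCV's `approxPolyDP` (Douglas-Peucker) and `contourArea` are the usual tools. The area uses the shoelace formula.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def drop_collinear(contour: Sequence[Point], tol: float = 0) -> List[Point]:
    """Keep only the points where the contour changes direction (its corners)."""
    n = len(contour)
    kept = []
    for i in range(n):
        px, py = contour[i - 1]
        cx, cy = contour[i]
        nx, ny = contour[(i + 1) % n]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)   # 0 when collinear
        if abs(cross) > tol:
            kept.append(contour[i])
    return kept


def polygon_area(vertices: Sequence[Point]) -> float:
    """Shoelace formula for the area of a simple closed polygon."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice_area) / 2
```

An L-shaped contour then reduces to 6 corners, and you can combine that count with an area range to filter out other shapes.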