I'm wondering whether it is possible to use the OpenCV framework to recognise a building.
For example, if I store an image of a building, is it possible to use OpenCV to detect this building through the iPhone camera?
Thanks!
Detecting known objects such as your building in an image can be done using the features2d module in OpenCV.
It works by detecting key points in the known image and computing a set of descriptors for them; these can then be compared with the key points and descriptors computed from the unknown scene image in a process known as matching.
The find_obj.py demo in the samples/python2 folder of OpenCV shows how to detect a known object in an image.
There is also a tutorial in the user guide, see http://docs.opencv.org/doc/user_guide/ug_features2d.html
Note that some commonly used algorithms (e.g. SURF and SIFT) are in OpenCV's non-free module and may need to be licensed separately if you use them.
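As a rough illustration of the keypoint/descriptor matching described above, here is a minimal sketch using ORB, which is patent-free; the image file names are placeholders:

```python
import cv2

# Stored reference image of the building and a camera frame (placeholder paths)
ref = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors in both images
orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)

# Brute-force Hamming matcher plus a ratio test to keep only distinctive matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_ref, des_scene, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Crude heuristic: a large number of good matches suggests the building is present
print("good matches:", len(good))
```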
It is possible, but you have a long road ahead.
One way to do this is to use visual keypoints to recognise objects.
OpenCV SIFT documentation
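For what it's worth, extracting SIFT keypoints is only a few lines in recent OpenCV builds (SIFT moved into the main package once its patent expired); the image path is a placeholder:

```python
import cv2

img = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "keypoints,", descriptors.shape, "descriptor array")
```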
Related
I have an object I'd like to track using OpenCV. My detection algorithm can create bounding boxes around the objects it sees and can pick out a target object to track. The detection works well, but I want to pass this object to a tracking algorithm, and I can't quite do that without having to rewrite the detection and image display code. I'm working with an NVIDIA Jetson Nano board and an Intel RealSense camera, if that helps.
The OpenCV DNN module comes with Python samples of state-of-the-art trackers. I've heard good things about the "Siamese"-based ones. Have a look.
Also, the OpenCV contrib repo contains a whole module of various trackers. Give those a try first; they have a simple API.
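For example, a minimal sketch of that tracker API, assuming an opencv-contrib-python build and a CSRT tracker; the video path and initial box are placeholders (in your case the box would come from your detector):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")      # placeholder video source
ok, frame = cap.read()

# Initial bounding box (x, y, w, h); here hard-coded, in practice from your detector
bbox = (200, 150, 80, 120)

tracker = cv2.TrackerCSRT_create()       # some builds expose cv2.legacy.TrackerCSRT_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:      # Esc to quit
        break
```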
I am using the OpenCV library and I can detect multiple faces in a video file or from a webcam. Now I want to recognize those faces.
If anyone could guide me step by step on what I should do after detecting the faces, it would be great. I am using C and C++.
@KISHAN, you may follow a tutorial with an example of using the OpenFace deep learning network. It takes a 96x96 image of a human face and returns a 128-dimensional unit vector called an embedding. You can match two people by the dot product of these embeddings, so the network maps faces onto a multidimensional unit sphere where similar faces are mapped to nearby points.
NOTE: there is a live demo which downloads the models (~35MB) when you press the Start button.
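A minimal sketch of that matching step with OpenCV's dnn module, assuming you have the OpenFace Torch model file and two pre-cropped face images (all file names are placeholders):

```python
import cv2
import numpy as np

# OpenFace Torch model (placeholder path to nn4.small2.v1.t7)
net = cv2.dnn.readNetFromTorch("openface_nn4.small2.v1.t7")

def embedding(face_bgr):
    # OpenFace expects a 96x96 RGB face crop scaled to [0, 1]
    blob = cv2.dnn.blobFromImage(face_bgr, 1.0 / 255, (96, 96),
                                 (0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward().flatten()       # 128-dimensional unit vector

face_a = cv2.imread("person_a.png")      # placeholder face crops
face_b = cv2.imread("person_b.png")

# Because the embeddings are (approximately) unit length, their dot product is the
# cosine similarity: values close to 1.0 suggest the same person.
similarity = float(np.dot(embedding(face_a), embedding(face_b)))
print("similarity:", similarity)
```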
I am working on identifying an object using a Kinect sensor so that I can get the x, y, z coordinates of the object.
I have been trying to find related information but haven't been able to find much. I have watched videos as well, but nobody shares the details or any sample code.
This is what I want to achieve https://www.youtube.com/watch?v=nw3yix3XomY
A few people have probably asked the same question, but I am new to the Kinect and these libraries, so I need a little more guidance.
I read somewhere that object detection is not possible using Kinect v1 alone; we need to use third-party libraries like OpenCV or the Point Cloud Library (PCL).
Can somebody explain how, even using third-party libraries, I can identify an object via a Kinect sensor?
It will be really helpful.
Thank you.
As the author of the video you linked stated in the comments, following this PCL tutorial will help you. As you have already found out, this may not be possible using the standalone SDK; relying on PCL will help you avoid reinventing the wheel.
The idea there is to:
Downsample the cloud to have less data to deal with in the next steps (this also reduces noise a bit).
Identify keypoints/features (i.e. points, areas, textures that remain somehow invariant to some transformations).
Compute the keypoint descriptors, mathematical representations of these features.
For each scene keypoint descriptor, find the nearest neighbor in the model keypoint descriptor cloud and add it to the correspondences vector.
Perform clustering on the keypoints and detect the model in the scene.
The software in the tutorial needs the user to manually feed in the model and scene files; it doesn't work on a live feed like the video you linked.
The process should be pretty similar, though. I'm not sure how CPU-intensive the detection is, so it might require additional performance tweaking.
Once you have frame-by-frame detection in place, you could start thinking about actually tracking an object across the frames. But that's another topic.
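The tutorial code itself is C++ on top of PCL. Purely as a hedged Python analogue of the downsample / describe / match steps above, here is a sketch using Open3D and SciPy instead of PCL (file names, voxel size and search radii are placeholders, and the exact module paths vary between Open3D versions):

```python
import numpy as np
import open3d as o3d                    # pip install open3d scipy (assumes the 0.10+ API)
from scipy.spatial import cKDTree

voxel = 0.01                            # placeholder leaf size, depends on sensor/scene scale

model = o3d.io.read_point_cloud("model.pcd")   # placeholder file names
scene = o3d.io.read_point_cloud("scene.pcd")

# 1. Downsample both clouds to reduce data and noise
model_down = model.voxel_down_sample(voxel)
scene_down = scene.voxel_down_sample(voxel)

# 2./3. Estimate normals and compute FPFH descriptors, standing in for the
#       keypoint descriptors the PCL tutorial computes
for pcd in (model_down, scene_down):
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))

def fpfh(pcd):
    return o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))

model_feat = np.asarray(fpfh(model_down).data).T   # (N, 33) descriptor matrix
scene_feat = np.asarray(fpfh(scene_down).data).T

# 4. For each scene descriptor, find the nearest model descriptor
tree = cKDTree(model_feat)
dist, idx = tree.query(scene_feat, k=1)
correspondences = list(zip(range(len(scene_feat)), idx))

# 5. The PCL tutorial then clusters these correspondences (Hough voting or
#    geometric consistency) to decide whether and where the model is in the scene.
print(len(correspondences), "scene-to-model correspondences")
```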
Currently I'm trying to develop a program that can do skeleton tracking.
From the research I've been doing, I found out that the best way to tackle this problem is by using an RGB-Depth Camera such as Kinect.
Challenge: MS Kinect does not support such skeleton tracking, therefore I need to build a custom skeleton tracker.
First problem: how do I detect the target with an RGB-Depth camera?
What I found: use a machine learning algorithm.
Question: is machine learning the only option for detection? Do I need the depth information for detection?
You can use the depth channel to segment the target silhouette, and then extract descriptive features and classify with them.
I'm not sure whether you need to classify a target as human or animal, or to determine what type of animal it is. If it's the former, features such as aspect ratio are very simple and good separators. If you need to classify which animal, it depends on the list of classes; some cases are easier and some are harder.
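A hedged sketch of that idea with OpenCV: threshold the depth image to an assumed distance band, take the largest blob as the target silhouette, and use the bounding-box aspect ratio as a first feature (the depth file and distance limits are placeholders):

```python
import cv2
import numpy as np

# 16-bit depth frame (e.g. millimetres from the Kinect); the file name is a placeholder
depth = cv2.imread("depth_frame.png", cv2.IMREAD_UNCHANGED)

# Keep pixels in an assumed distance band around the target (here 0.5 m to 1.5 m)
mask = ((depth > 500) & (depth < 1500)).astype(np.uint8) * 255
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

# OpenCV 4.x: findContours returns (contours, hierarchy)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    target = max(contours, key=cv2.contourArea)   # assume the largest blob is the target
    x, y, w, h = cv2.boundingRect(target)
    aspect_ratio = w / float(h)
    # A standing person tends to be much taller than wide (ratio below 1), while many
    # four-legged animals are wider than tall; a crude but cheap separator.
    print("aspect ratio:", aspect_ratio)
```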
I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit or osgART for such a project, since they come with all the functions for tracking, detection, calibration, etc.? Will there be much work left from the programmer's point of view?
What do you think about using OpenCV and doing the marker detection, recognition, calibration and other steps from scratch? Would that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if you want it to return reliable results. The effort also depends on which kind of markers you plan to use. ARToolkit, for example, compares the marker content detected in the video stream to images you defined as markers beforehand: it tries to match images and returns a probability that a certain part of the video stream is a predefined marker. Depending on the threshold you use and the lighting conditions, markers are not always recognized correctly.
Then there are other markers, such as Data Matrix, QR codes and frame markers (used by QCAR), that encode an id optically, so no image matching is required; all the necessary data can be retrieved from the video stream. Finally, there are more complex approaches like natural feature tracking, where you can use predefined images, provided they offer enough contrast and points of interest to be recognized later by the tracker.
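To give a feel for the id-encoded kind, here is a hedged sketch using OpenCV's aruco module (it lives in opencv-contrib, and OpenCV 4.7+ replaces these calls with a cv2.aruco.ArucoDetector class); the frame file is a placeholder:

```python
import cv2

frame = cv2.imread("frame.png")                 # placeholder camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Each ArUco marker optically encodes an id from a predefined dictionary
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
corners, ids, rejected = cv2.aruco.detectMarkers(gray, dictionary)

# A detection gives you the marker id plus its four corner points, which is
# exactly the input needed for pose estimation and drawing virtual content.
print("detected ids:", None if ids is None else ids.flatten())
```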
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you use OpenCV; you will find high-quality algorithms and it is fast. New methods are continuously being developed, so it will soon be possible to run them in real time on mobile devices.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.
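As a rough sketch of the pose step: once you have a marker's four corner points and intrinsics from camera calibration, cv2.solvePnP gives you a rotation and translation you can pack into a modelview-style matrix for OpenGL. All numeric values below are placeholders:

```python
import cv2
import numpy as np

marker_size = 0.05  # marker side length in metres (assumption)

# 3D marker corners in the marker's own coordinate frame (z = 0 plane)
object_points = np.array([
    [-marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2, -marker_size / 2, 0],
    [-marker_size / 2, -marker_size / 2, 0],
], dtype=np.float32)

# 2D corners as returned by your marker detector (placeholder values)
image_points = np.array([[310, 220], [400, 225], [395, 310], [305, 305]], dtype=np.float32)

# Intrinsics from cv2.calibrateCamera (placeholder values)
camera_matrix = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)               # 3x3 rotation matrix
pose = np.eye(4)
pose[:3, :3], pose[:3, 3] = R, tvec.ravel()
print(pose)                              # camera-from-marker transform for rendering
```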