I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.
Related
I am working on identifying an object by using Kinect sensor so to get x,y,z coordinates of the object.
I am trying to find the related information for this but could not able to find much. I have seen the videos as well but nobody is sharing the information or any sample code?
This is what I want to achieve https://www.youtube.com/watch?v=nw3yix3XomY
Proabably, few people may asked same question but as I am new to the Kinect and these libraries due to which I need little more guidance.
I read somewhere that object detection is not possible using Kinect v1. We need to use 3rd party libraries like open CV or point-clouds (pcl).
Can somebody help me that even by using third party libraries how exactly can I identify object via a Kinect sensor?
It will be really helpful.
Thank you.
As the author of the video you linked stated in the comment, following this PCL tutorial will help you. As you found out already, realizing this may not be possible using the standalone SDK. Relying on PCL will help you not reinvent the wheel.
The idea there is to:
Downsample the cloud to have less data to deal with in the next steps (this also reduces noise a bit).
Identify keypoints/features (i.e. points, areas, textures that remain somehow invariant to some transformations).
Compute the keypoint descriptors, mathematical representations of these features.
For each scene keypoint descriptor, find nearest neighbor into the model keypoints descriptor cloud and add it to the correspondences vector.
Perform clustering on the keypoints and detect the model in the scene.
The software in the tutorial needs the user to manually feed in the model and scene files. It doesn't do that on live feed, as the video you linked.
The process should be pretty similar though. I'm not sure how cpu-intensive the detection is, so it might require additional performance tweaking.
Once you have frame-by-frame detection in place, you could start thinking about actually tracking an object across the frames. But that's another topic.
I'd like to create an app (on iPhone) which does this:
I have a template image (a logo or any object) and I'd like to find that in camera view and put a layer on the place of where it is found and tracking it!
It is a markless AR with OpenCV!
I read some docs and books and Q&A-s here, but sadly
actually i'd like to create something like this or something like this.
If anyone can send to me some source code or a really useful tutorial (step by step) i'd really be happy!!!
Thank you!
Implementing this is not trivial - it involves Augmented Reality combined with template matching and 3D rendering.
A rough outline:
Use some sort of stable feature extraction to obtain features from the input video stream. (eg. see FAST in OpenCV).
Combine these features and back-project to estimate the camera parameters and pose. (See Camera Calibration for a discussion, but note that these usually require calibration pattern such as a checkerboard.)
Use template matching to scan the image for patches of your target image, then use the features and camera parameters to determine the pose of the object.
Apply the camera and object transforms forward and render the replacement image into the scene.
Implementing all this will require much research and hard work!
There are a few articles on the web you might find useful:
Simple Augmented Reality for OpenCV
A minimal library for Augmented Reality
AR with NyartToolkit
You might like to investigate some of the AR libraries and frameworks available. Wikipedia has a good list:
AR Software
Notable is Qualcomm's toolkit, which is not FLOSS but appears highly capable.
I'm a Software Engineering student in my last year in a 4-year bachelor degree program, I'm required to work on a graduation project of my own choice.
we are trying to find a way to notify the user of any thing the gets on his/her way while walking, this will be implemented as an android application so we have the ability to use the camera, we thought of Image processing and computer vision but neither me or any of my group members have any Image processing background, we searched a little bit and we found out about OpenCv.
So my question is do I need any special background to deal with OpenCv? and is it a good choice for the objective of my project to use computer vision, if not what alternatives do u advise me to use?
I appreciate your help.. thanks in advance!
At the first glance I would use 2 standard cameras to find depth image - stereo vision (similar to MS Kinect depth sensor)
from that it would be easy to fix a threshold to some distance.
Those algorithms are very CPU hungry so I do not think it will work on Android (although I have zero experience).
I you must use Android, I would look for some depth sensor (to avoid extracting depth data from 2 images)
For prototyping I would use MATLAB (or Octave), then I would switch to OpenCV (pointers, mem. allocations, blah...)
Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.
A great starting place on processing live video is a project I found on GitHub. This is using the AVFoundation framework and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process the video. You need to determine the area in the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are:
Convert the images to B&W using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful, you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between light and dark elements, you can use Otsu's Binarization. This automatically determines the threshold values based on the histogram of the grayscale image.
This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but is fairly out of date, so I thought I'd post some additional resources that might be useful that I've found as of the time of writing this post:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but also has pre-trained models for common image processing tasks like Vision Framework. Unlike Vision Framework, this also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using cloud/SAAS API offerings from Google. I have opted to use this in my project, as the speed and accuracy of recognition seems quite good, and I also will be creating an Android app with the same functionality, so having a single cross platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
If you wanted to capture that in real time you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract by using different filters or reducing the size or quality of the incoming images.
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR
I want to build a video based tracking software. I can manage the control and display quite easily but the actual object tracking in a video stream is very difficult (color tracking is not an option).
Solutions like openCV would probably require a very long learning curve which I can't afford ATM.
Are there professional packages which expose a simple API for object tracking? C# and C++ are the preferred languages but other would be fine as well. Price is also less of an issue.
Computer Vision System Toolbox for MATLAB provides tracking functionality. Please check out the following examples:
Tracking a face
Tracking multiple objects
Generally, a lot depends on the specific problem you are trying to solve. Is the camera moving or stationary? Do you need to track a single object or multiple objects? Does your object have a distinctive color or texture? Does your object move in some predictable way?
Use OpenTLD. It tracks almost anything, but at a time, track only one thing. And code is in matlab.