Is it possible to do "Markerless motion capturing" using a normal webcam, in real time? Are there any Open source frameworks available?
I need to construct a 3d model which will be animated using the captured data.
You could start with The Artvertiser (full source code available at the link).
You could also look at using the Ferns library, I'm not sure but I think it might be a part of the latest OpenCV.
Related
I have an object I'd like to track using OpenCV. In my detection algorithm I can create bounded boxes around the objects it sees, and can create a target object to track properly. My detection algorithm works well, but I want to pass this object to a tracking algorithm.I can't quite get this done without having to re write the detection and image display issues. I'm working with an NVIDA Jetson Nanoboard with an Intel Realsense camera if that helps.
The OpenCV DNN module comes with python samples of state of the art trackers. I've heard good things about the "siamese" based ones. Have a look
Also the OpenCV contrib repo contains a whole module of various trackers. Give those a try first. They have a simple API.
Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.
A great starting place on processing live video is a project I found on GitHub. This is using the AVFoundation framework and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process the video. You need to determine the area in the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are:
Convert the images to B&W using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful, you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between light and dark elements, you can use Otsu's Binarization. This automatically determines the threshold values based on the histogram of the grayscale image.
This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but is fairly out of date, so I thought I'd post some additional resources that might be useful that I've found as of the time of writing this post:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but also has pre-trained models for common image processing tasks like Vision Framework. Unlike Vision Framework, this also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using cloud/SAAS API offerings from Google. I have opted to use this in my project, as the speed and accuracy of recognition seems quite good, and I also will be creating an Android app with the same functionality, so having a single cross platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
If you wanted to capture that in real time you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract by using different filters or reducing the size or quality of the incoming images.
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR
I want to build a video based tracking software. I can manage the control and display quite easily but the actual object tracking in a video stream is very difficult (color tracking is not an option).
Solutions like openCV would probably require a very long learning curve which I can't afford ATM.
Are there professional packages which expose a simple API for object tracking? C# and C++ are the preferred languages but other would be fine as well. Price is also less of an issue.
Computer Vision System Toolbox for MATLAB provides tracking functionality. Please check out the following examples:
Tracking a face
Tracking multiple objects
Generally, a lot depends on the specific problem you are trying to solve. Is the camera moving or stationary? Do you need to track a single object or multiple objects? Does your object have a distinctive color or texture? Does your object move in some predictable way?
Use OpenTLD. It tracks almost anything, but at a time, track only one thing. And code is in matlab.
For the people that have experience with OpenCV, are there any webcams that don't work with OpenCV.
I am looking into the feasibility of a project and I know I am going to need a high quality feed (1080p), so I am going to need a webcam that is capable of that. So does OpenCV have problems with certain cameras?
To be analysing a video feed of that resolution on the fly I am going to need a fast processor, I know this, but will I need a machine that is not consumer available...ie, will an i7 do?
Thanks.
On Linux, if it's supported by v4l2, it is probably going to work (e.g., my home webcam isn't listed, but it's v4l2 compatible and works out of the box). You can always use the camera manufacturer's driver to acquire frames, and feed them to your OpenCV code. You can even sub-class the VideoCapture class, and implement your camera driver to make it work seamlessly with OpenCV.
I would think the latest i7 series should work just fine. You may want to also check out Intel's IPP library for more optimized routines. IPP also easily integrates into OpenCV code since OpenCV was an Intel project at its inception.
If you need really fast image processing, you might want to consider adding a high performance GPU to the box, so that you have that option available to you.
Unfortunately, the page that I'm about to reference doesn't exist anymore. OpenCV evolved a lot since I first wrote this answer in 2011 and it's difficult for them to keep track of which cameras in the market are supported by OpenCV.
Anyway, here is the old list of supported cameras organized by Operating System (this list was available until the beginning of 2013).
It depends if your camera is supported by OpenCV, mainly by the driver model that your camera is using.
Quote from Getting Started with OpenCV capturing,
Currently two camera interfaces can be used on Windows: Video for Windows (VFW) and Matrox Imaging Library (MIL) and two on Linux: Video for Linux(V4L) and IEEE1394. For the latter there exists two implemented interfaces (CvCaptureCAM_DC1394_CPP and CvCapture_DC1394V2).
So if your camera is VFW or MIL compliant under Windows or suits into standard V4L or IEEE1394 driver model, then probably it will work.
But if not, like mevatron says, you can even sub-class the VideoCapture class, and implement your camera driver to make it work seamlessly with OpenCV.
I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.