Machine Learning: Question regarding processing of RGBD streams and involved components - opencv

I would like to experiment with machine learning (especially CNNs) on the aligned RGB and depth streams of either an Intel RealSense or an Orbbec Astra camera. My goal is to do some object recognition and highlight/mark the detected objects in the output video stream (as a starting point).
But after reading many articles I am still confused about which frameworks are involved and how the data flows from the camera through the software components. I just can't get a high-level picture.
This is my assumption regarding the processing flow:
Sensor => Driver => libRealSense / Astra SDK => TensorFlow
Questions
Is my assumption about the processing flow correct?
Orbbec provides an additional Astra OpenNI SDK besides the Astra SDK, whereas Intel has wrappers (?) for OpenCV and OpenNI. When or why would I need these additional libraries/support?
What would be the quickest way to get started? I would prefer C# over C++.

Your assumptions are correct: the data acquisition flow is sensor -> driver -> camera library -> other libraries built on top of it (see the OpenCV support for Intel RealSense) -> captured image. Once you have the image, you can of course do whatever you want with it.
The various libraries make it easy to work with the device. In particular, OpenCV compiled with Intel RealSense support lets you use OpenCV's standard data acquisition stream without worrying about the image format coming from the sensor and used by the Intel library. Definitely use these libraries; they make your life easier.
You can start from the OpenCV wrapper documentation for Intel RealSense (https://github.com/IntelRealSense/librealsense/tree/master/wrappers/opencv). Once you are able to capture the RGBD images, you can create the input pipeline for your model using tf.data and develop in TensorFlow any application that uses CNNs on RGBD images (just Google it and look on arXiv for ideas about possible applications).
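To make that concrete, here is a rough sketch of the capture-plus-tf.data part, assuming the pyrealsense2 Python wrapper and TensorFlow 2.4+; the resolutions, stream formats and the normalization are illustrative placeholders, not something prescribed by either library:

```python
# Sketch: stream aligned color + depth frames and wrap them in a tf.data pipeline.
# Assumes pyrealsense2 (the librealsense Python wrapper) and TensorFlow 2.4+.
import numpy as np
import pyrealsense2 as rs
import tensorflow as tf

def rgbd_frames():
    """Yield (480, 640, 4) float32 arrays: color (BGR) plus depth as a fourth channel."""
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    pipeline.start(config)
    align = rs.align(rs.stream.color)            # align depth onto the color frame
    try:
        while True:
            frames = align.process(pipeline.wait_for_frames())
            color = np.asanyarray(frames.get_color_frame().get_data())
            depth = np.asanyarray(frames.get_depth_frame().get_data())
            yield np.dstack([color, depth]).astype(np.float32)
    finally:
        pipeline.stop()

dataset = (tf.data.Dataset.from_generator(
               rgbd_frames,
               output_signature=tf.TensorSpec(shape=(480, 640, 4), dtype=tf.float32))
           .map(lambda x: x / 255.0)             # placeholder preprocessing; scale color
                                                 # and depth appropriately for a real model
           .batch(1))
```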
Once your model has been trained, just export the trained graph and use it for inference; your pipeline then becomes: sensor -> driver -> camera library -> libs -> RGBD image -> trained model -> model output.
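And a correspondingly rough sketch of the inference end, assuming the model was exported as a TensorFlow SavedModel and reusing the rgbd_frames() generator from the previous sketch; the model path, output names and box format are placeholders you would adapt to your own export:

```python
# Sketch: run the exported model on each captured RGB-D frame and mark detections.
import cv2
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("exported_rgbd_model")      # placeholder path
infer = model.signatures["serving_default"]

for rgbd in rgbd_frames():
    batch = tf.convert_to_tensor(rgbd[np.newaxis, ...])
    outputs = infer(batch)                               # output names depend on your model
    color = rgbd[..., :3].astype(np.uint8).copy()
    # Suppose the model returns normalized [y1, x1, y2, x2] boxes under "boxes".
    for box in outputs.get("boxes", tf.zeros((0, 4))).numpy().reshape(-1, 4):
        y1, x1, y2, x2 = (box * [480, 640, 480, 640]).astype(int)
        cv2.rectangle(color, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("detections", color)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```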

Related

How do I run a Tensorflow Object Detection API model in iOS?

I have just trained a model with satisfactory results and I have the frozen_inference_graph.pb. How would I go about running this on iOS? It was trained on SSD MobileNet V1, if that helps. Ideally I'd like to run it using the GPU (I know the TensorFlow API can't do that on iOS), but it would be great to just have it on CPU first.
Support was just announced for importing TensorFlow models into Core ML. This is accomplished using the tfcoreml converter, which should take in your .pb graph and output a Core ML model. From there, you can use this model with Core ML and either take in still images or video frames for processing.
At that point, it's up to you to make sure you're providing the correct input colorspace and size, then extracting and processing the SSD results correctly to get your object classes and bounding boxes.
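A minimal sketch of what that conversion step can look like with tfcoreml; the input/output tensor names and the input shape below are placeholders you would read off your own frozen graph, not values guaranteed by the converter:

```python
# Sketch: convert a frozen TensorFlow graph to a Core ML model with tfcoreml.
# Inspect your own .pb to find the real input/output tensor names.
import tfcoreml

tfcoreml.convert(
    tf_model_path="frozen_inference_graph.pb",
    mlmodel_path="ssd_mobilenet.mlmodel",
    output_feature_names=["concat:0", "concat_1:0"],        # placeholder output tensors
    input_name_shape_dict={"Preprocessor/sub:0": [1, 300, 300, 3]},
    image_input_names=["Preprocessor/sub:0"],               # treat the input as an image
)
```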

API availability to track objects other than human gestures with the Windows Kinect

The APIs shipped with the MS Windows Kinect SDK are all about programming around voice, movement and gesture recognition related to humans.
Are there any open-source or commercial APIs for tracking and recognizing dynamically moving objects, such as vehicles, for classification?
Is it feasible, and a good approach, to employ the Kinect for automated vehicle classification rather than traditional image processing approaches?
Even though image processing technologies have made remarkable advances, why is fully automated vehicle classification not used at most toll collection points?
Why are existing technologies (except the RFID approach) failing to classify vehicles (i.e., they are not yet 100% accurate), or are there other reasons apart from image processing?
You will need to use a regular image processing library to track objects that are not supported by the Kinect API. A few options:
OpenCV
Emgu CV (OpenCV in .NET)
ImageMagick
There is no library that directly supports the depth capabilities of the Kinect, to my knowledge. As a result, using the Kinect over a regular camera would be of no benefit.
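As a starting point with plain OpenCV (the same classes are available from C#/.NET through Emgu CV), here is a rough sketch of picking out moving vehicles with background subtraction and contours; the video source and the size threshold are arbitrary placeholders:

```python
# Sketch: detect moving vehicles with OpenCV background subtraction and contours.
# Works on any regular camera or video file; no depth sensor required.
import cv2

cap = cv2.VideoCapture("traffic.mp4")                  # placeholder video source
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                     # foreground mask of moving pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1500:                  # crude size filter for vehicles
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("vehicles", frame)
    if cv2.waitKey(30) & 0xFF == 27:                   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```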

Image Processing on CUDA or OpenCV?

I need to develop an image processing program for my project in which I have to count the number of cars on the road. I am using GPU programming. Should I go for an OpenCV program with the GPU processing feature, or should I develop my entire program on CUDA without any OpenCV library?
The algorithms I am using for counting the number of cars are background subtraction, segmentation and edge detection.
You can use GPU functions in OpenCV.
First, visit the introduction: http://docs.opencv.org/modules/gpu/doc/introduction.html
Secondly, I think the above-mentioned processes are already implemented in OpenCV and optimized for the GPU, so it will be much easier to develop with OpenCV.
Canny Edge Detection : http://docs.opencv.org/modules/gpu/doc/image_processing.html#gpu-canny
Per-element operations (including subtraction): http://docs.opencv.org/modules/gpu/doc/per_element_operations.html#per-element-operations
For other functions, visit OpenCV docs.
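For orientation, here is a small sketch of what those GPU-accelerated steps can look like through OpenCV's newer cv2.cuda Python interface; it assumes an OpenCV build compiled with CUDA support, and the file names and thresholds are placeholders:

```python
# Sketch: simple frame differencing (a per-element operation) and Canny edge
# detection on the GPU via OpenCV's CUDA module. Requires a CUDA-enabled build.
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

gpu_prev, gpu_curr = cv2.cuda_GpuMat(), cv2.cuda_GpuMat()
gpu_prev.upload(prev)                                       # host -> device copies
gpu_curr.upload(curr)

gpu_diff = cv2.cuda.absdiff(gpu_curr, gpu_prev)             # per-element difference on the GPU

canny = cv2.cuda.createCannyEdgeDetector(50, 150)           # low/high thresholds
gpu_edges = canny.detect(gpu_diff)

edges = gpu_edges.download()                                # device -> host copy
cv2.imwrite("edges.png", edges)
```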
OpenCV, no doubt, has the biggest collection of image processing functionality, and recently they've started porting functions to CUDA as well. There's a new GPU module in the latest OpenCV with a few functions ported to CUDA.
That being said, OpenCV is not the best option for building a CUDA-based application, as there are dedicated CUDA libraries like CUVI that beat OpenCV in performance. If you're looking for an optimized solution, you should also give them a try.

OpenCV + Webcam compatibility

For the people who have experience with OpenCV: are there any webcams that don't work with OpenCV?
I am looking into the feasibility of a project and I know I am going to need a high-quality feed (1080p), so I am going to need a webcam that is capable of that. So does OpenCV have problems with certain cameras?
To analyse a video feed of that resolution on the fly I am going to need a fast processor, I know, but will I need a machine that is not consumer-available... i.e., will an i7 do?
Thanks.
On Linux, if it's supported by v4l2, it is probably going to work (e.g., my home webcam isn't listed, but it's v4l2 compatible and works out of the box). You can always use the camera manufacturer's driver to acquire frames, and feed them to your OpenCV code. You can even sub-class the VideoCapture class, and implement your camera driver to make it work seamlessly with OpenCV.
I would think the latest i7 series should work just fine. You may want to also check out Intel's IPP library for more optimized routines. IPP also easily integrates into OpenCV code since OpenCV was an Intel project at its inception.
If you need really fast image processing, you might want to consider adding a high performance GPU to the box, so that you have that option available to you.
Unfortunately, the page that I'm about to reference doesn't exist anymore. OpenCV has evolved a lot since I first wrote this answer in 2011, and it's difficult for them to keep track of which cameras on the market are supported by OpenCV.
Anyway, here is the old list of supported cameras organized by Operating System (this list was available until the beginning of 2013).
It depends on whether your camera is supported by OpenCV, mainly on the driver model that your camera uses.
A quote from Getting Started with OpenCV capturing:
Currently two camera interfaces can be used on Windows: Video for Windows (VFW) and Matrox Imaging Library (MIL) and two on Linux: Video for Linux(V4L) and IEEE1394. For the latter there exists two implemented interfaces (CvCaptureCAM_DC1394_CPP and CvCapture_DC1394V2).
So if your camera is VFW- or MIL-compliant under Windows, or fits the standard V4L or IEEE1394 driver model on Linux, then it will probably work.
But if not, as mevatron says, you can even sub-class the VideoCapture class and implement your camera driver to make it work seamlessly with OpenCV.
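Whichever driver path applies, a quick sanity check in OpenCV itself tells you whether the camera opens and what resolution it actually delivers; this is just a sketch, and camera index 0 is simply the first device the OS exposes:

```python
# Sketch: check that OpenCV can open the webcam and negotiate a 1080p stream.
import cv2

cap = cv2.VideoCapture(0)                        # first camera the OS exposes
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)

if not cap.isOpened():
    raise RuntimeError("OpenCV could not open the camera")

ok, frame = cap.read()
print("got frame:", ok, "resolution:", frame.shape[1], "x", frame.shape[0])
cap.release()
```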

Which SDK should I use to visualize medical images in 3D?

I need to process DICOM-formatted medical images and visualize them in 3D, and also do some image processing on these images in real time. Therefore, I am asking this question to learn which SDK has better real-time characteristics for medical visualization and image processing.
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.
You can find details here.
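For orientation, here is a bare-bones sketch of volume-rendering a DICOM series with VTK's Python bindings; the directory path is a placeholder, and the transfer functions would need tuning to your modality:

```python
# Sketch: load a DICOM series with VTK and volume-render it.
import vtk

reader = vtk.vtkDICOMImageReader()
reader.SetDirectoryName("/path/to/dicom/series")        # placeholder directory
reader.Update()

mapper = vtk.vtkSmartVolumeMapper()                     # uses GPU ray casting when available
mapper.SetInputConnection(reader.GetOutputPort())

color = vtk.vtkColorTransferFunction()                  # toy grayscale ramp
color.AddRGBPoint(0, 0.0, 0.0, 0.0)
color.AddRGBPoint(1000, 1.0, 1.0, 1.0)

opacity = vtk.vtkPiecewiseFunction()                    # toy opacity ramp
opacity.AddPoint(0, 0.0)
opacity.AddPoint(1000, 0.8)

prop = vtk.vtkVolumeProperty()
prop.SetColor(color)
prop.SetScalarOpacity(opacity)
prop.ShadeOn()

volume = vtk.vtkVolume()
volume.SetMapper(mapper)
volume.SetProperty(prop)

renderer = vtk.vtkRenderer()
renderer.AddVolume(volume)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)
window.Render()
interactor.Start()
```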
Another solution would be to modify or utilize a 3D engine that supports volume rendering.
Moreover, for computer vision algorithms, OpenCV seems promising.
osgVolume is an add-on to the popular OpenSceneGraph library for doing this.
Just use GDCM+VTK. In 2D simply use gdcmviewer. In 3D you need to build gdcmorthoplanes.
Ref:
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Gdcmviewer
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Using_GDCM_API
You could check out MITK (http://mitk.org) which combines the already mentioned VTK with the Insight Toolkit (http://www.itk.org) for image processing. Another option to start from could be Slicer (http://www.slicer.org), but this depends on the license you need.
At uni we were taught Matlab for DICOM file processing. I think it has pretty nice and easy-to-use plugins for that as well. The end result was that using Matlab I was able to do all kinds of DICOM image processing, filtering and so forth.
As you probably know, Matlab is not an SDK but a complete environment. Nevertheless, you can write scripts to achieve normal application behavior: create windows, buttons, images, etc.
