How to locate objects based on specific motion in video using computer vision? - opencv

I am working on a problem where I have to track machine usage in some surveillance video (stock example video).
My question is: what modern machine learning techniques or traditional computer vision methods can I use here? I don't want a single-frame model (like per-frame object recognition), because there are lots of occlusions, and some machine parts are hard to recognize from a side viewpoint. So I want to exploit the uniform motion of the machines across the input video.

Related

Comma.ai self-driving car neural network using client/server architecture in TensorFlow, why?

In comma.ai's self-driving car software they use a client/server architecture. Two processes are started separately, server.py and train_steering_model.py.
server.py sends data to train_steering_model.py via http and sockets.
Why do they use this technique? Isn't this a complicated way of sending data? Wouldn't it be simpler to have train_steering_model.py load the dataset by itself?
The document DriveSim.md in the repository links to a paper titled Learning a Driving Simulator. In the paper, they state:
Due to the problem complexity we decided to learn video prediction with separable networks.
They also mention the frame rate they used is 5 Hz.
While that sentence is the only one that addresses your question, and it isn't exactly crystal clear, let's break down the task in question:
Grab an image from a camera
Preprocess/downsample/normalize the image pixels
Pass the image through an autoencoder to extract a representative feature vector
Pass the output of the autoencoder on to an RNN that will predict proper steering angle
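The four steps above can be sketched end to end. This is a toy with random weights standing in for the trained networks, and the shapes are illustrative, not comma.ai's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(frame):
    """Step 2: downsample and normalize pixels to [0, 1]."""
    small = frame[::4, ::4]                    # crude 4x downsample: 160x320 -> 40x80
    return (small.astype(np.float32) / 255.0).ravel()

# Step 3: a linear "encoder" standing in for the autoencoder's bottleneck.
W_enc = rng.normal(0, 0.01, (128, 40 * 80))    # 3200 pixels -> 128-d feature vector

# Step 4: a single recurrent cell predicting a steering angle from the code.
W_h = rng.normal(0, 0.01, (64, 64))
W_x = rng.normal(0, 0.01, (64, 128))
w_out = rng.normal(0, 0.01, 64)
h = np.zeros(64)

def step(frame, h):
    code = np.tanh(W_enc @ preprocess(frame))  # feature vector from the encoder
    h = np.tanh(W_h @ h + W_x @ code)          # recurrent state update
    return float(w_out @ h), h                 # steering angle, new state

# Step 1: "grab" frames -- here, random grayscale images standing in for a 5 Hz camera.
for _ in range(5):
    frame = rng.integers(0, 256, (160, 320), dtype=np.uint8)
    angle, h = step(frame, h)
```

The point of the sketch is the seam between steps 3 and 4: because the encoder's output is just a compact vector, it is cheap to ship between processes, which is one reason a client/server split is workable.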
The "problem complexity" refers to the fact that they're dealing with a long sequence of large images that are (as they say in the paper) "highly uncorrelated." There are lots of different tasks going on, so the networked approach is more modular: in addition to allowing them to work in parallel, it also allows scaling up the components without getting bottlenecked by a single piece of hardware hitting its computational limits. (And just think: this is only the steering aspect. The Logs.md file lists other components of the vehicle to worry about that aren't addressed by this neural network: gas, brakes, blinkers, acceleration, etc.)
Now let's fast forward to the practical implementation in a self-driving vehicle. There will definitely be more than one neural network operating onboard the vehicle, and each will need to be limited in size - microcomputers or embedded hardware, with limited computational power. So, there's a natural ceiling to how much work one component can do.
Tying all of this together is the fact that cars already operate using a network architecture - a CAN bus is literally a computer network inside of a vehicle. So, this work simply plans to farm out pieces of an enormously complex task to a number of distributed components (which will be limited in capability) using a network that's already in place.

Using OpenCV to create a Neural network?

I am working on a real-time image processor for a self-driving small-scale car project for uni. It uses a Raspberry Pi to gather various information and send it to the program, which bases its decisions on that data.
The only stage I have left is to create a neural network that will view the image from the camera (I already have the code to send the array of CV_32F values between 0-255, etc.).
I have been scouring the internet and cannot find any example code related to my specific issue, or to this kind of task in general (how to implement a neural network of this kind). So my question is: is it possible to create an NN of this size in C++ without hard-coding it (i.e., utilising OpenCV's capabilities)? It will need 400 input nodes (one per value of a 20x20 image) and produce 4 outputs: left, right, forward, or backward.
How would one create a neural network in opencv?
Does OpenCV provide a backpropagation (training) interface/function, or would I have to write this myself?
Once it is trained, am I correct in assuming I can load the neural network using ANN_MLP load etc.? Following this, pass the live stream frame (as an array of values) to it, and it should produce the correct output.
Edit: I have found this: OpenCV image recognition - setting up ANN MLP. It is very simple compared to what I want to do, and I am not sure how to adapt it to my problem.
OpenCV is not a neural network framework, so you won't find any advanced features in it. It's far more common to use a dedicated ANN library and combine it with OpenCV. Caffe is a great choice as a computer-vision-dedicated deep learning framework (with a C++ API), and it can be combined with OpenCV.

What is the current state of the art in Multi-View Clustering?

Many real-world datasets have representations in the form of multiple views. For example, a person can be identified by face, fingerprint, signature, and iris; an image can be represented by its color and texture features. Multi-view is basically information obtained from multiple sources. In the context of machine learning/data clustering/computer vision, what are the most relevant applications that deal with this approach?
In the context of computer vision, multi-view refers to images of the same object taken from different views/angles/positions. There are multiple applications of this strategy; 3D reconstruction from multiple views is one of the most popular examples.
The type of multi-view you are referring to is basically multi-modal data fusion to solve a single problem. As you mentioned, identifying a person from different kinds of data sources is an application of such fusion. There can be many other applications too: for example, expression estimation (identifying the mood of a person) using data from an RGB camera + 3D data from a Kinect + audio.
In the context of machine learning, multi-modal fusion is everywhere. Combining different features of an image or audio signal for classification is an example.
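The "combining different features" idea is often implemented as early fusion: per-view feature vectors of the same sample are normalized and concatenated before clustering or classification. An illustrative sketch with random placeholder features (the feature dimensions are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 10
color_feats = rng.random((n_samples, 48))    # e.g. a 48-bin color histogram per image
texture_feats = rng.random((n_samples, 16))  # e.g. 16 texture statistics per image

def fuse(views):
    """L2-normalize each view, then concatenate, so no view dominates by scale."""
    normed = [v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12) for v in views]
    return np.hstack(normed)

fused = fuse([color_feats, texture_feats])   # one row per sample, views side by side
```

Multi-view clustering methods go further than this naive concatenation (e.g. learning a shared latent space across views), but early fusion is the usual baseline.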

API availability to track other objects apart from human gesture for Windows Kinect

The APIs shipped with the MS Windows Kinect SDK are all about programming around voice, movement, and gesture recognition related to humans.
Is there any open source or commercial API for tracking and recognizing dynamically moving objects, like vehicles, for classification purposes?
Is it feasible, and a good approach, to employ the Kinect for automated vehicle classification rather than traditional image processing approaches?
Even though image processing technologies have made remarkable advances, why is fully automated vehicle classification not used at most toll collection points?
Why are existing technologies (except the RFID approach) failing to classify vehicles (i.e., why are they not yet 100% accurate), or are there other reasons apart from image processing?
You will need to use a regular image processing suite to track objects that are not supported by the Kinect API. A few being:
OpenCV
Emgu CV (OpenCV in .NET)
ImageMagick
There is no library that directly supports the depth capabilities of the Kinect, to my knowledge. As a result, using the Kinect over a regular camera would be of no benefit.

Using Augmented Reality libraries for Academic Project

I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolKit, NyARToolkit, or osgART for such a project, since they come with all the functions for tracking, detection, calibration, etc.? Would there be much work left from the programmer's point of view?
What do you think about using OpenCV and doing the marker detection, recognition, calibration, and other steps from scratch? Would that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if you want it to return reliable results. The effort also depends on which kind of markers you plan to use.
ARToolKit, for example, compares the marker content detected in the video stream to images you earlier defined as markers. It tries to match images and returns a probability that a certain part of the video stream is a predefined marker. Depending on the threshold you use and the lighting situation, markers are not always recognized correctly.
Then there are other markers, like Data Matrix, QR codes, and frame markers (used by QCAR), that encode an ID optically, so no image matching is required: all necessary data can be retrieved from the video stream. There are also more complex approaches, like natural feature tracking, where you can use predefined images, provided they offer enough contrast and points of interest to be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you use OpenCV; you will find high-quality algorithms there, and it is fast. They are continuously developing new methods, so soon it should be possible to run them in real time on mobile devices.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker-tracking project, and I've written a frame-marker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers, and extending it, but it is a great foundation for the marker-tracking portion.