Pi 3 GPU functionality with camera - OpenCV

In my project I have two main tasks: image recognition on the camera frames and saving the videos.
I am thinking of using the Pi's GPU to accelerate this.
Is it possible to use the Pi's GPU to get the frames from the camera, then convert and save them to the SD card?
And meanwhile pass the frames to the processor for image recognition?
Can someone please provide some info on how I can use the GPU and the processor separately, and what video/camera-related operations the GPU can do?
Thanks

I think you really just want to use the UMat class. It makes a lot of OpenCV's functions run on the GPU (where possible), and in some cases it frees up a lot of CPU time for other tasks.
Some OpenCV functions are also several times faster when run on a GPU.
See opencv-transparent-api
You can also easily find examples using it here on Stack Overflow.
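For illustration, here is a minimal sketch of the transparent API with a UMat-based capture loop. It assumes OpenCV was built with OpenCL support; the camera index, kernel size and thresholds are just placeholders. The only change versus a plain cv::Mat pipeline is copying each frame into a cv::UMat, after which the same function calls may be dispatched to an OpenCL device when one is available.

#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

int main() {
    cv::ocl::setUseOpenCL(true);              // ask OpenCV to use OpenCL where it can
    cv::VideoCapture cap(0);                  // placeholder camera index
    if (!cap.isOpened()) return 1;

    cv::Mat frame;
    cv::UMat uframe, gray, edges;
    while (cap.read(frame)) {
        frame.copyTo(uframe);                 // Mat -> UMat: from here the work may run on the GPU
        cv::cvtColor(uframe, gray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.5);
        cv::Canny(gray, edges, 50, 150);      // same calls as with cv::Mat
        cv::imshow("edges", edges);
        if (cv::waitKey(1) == 27) break;      // Esc to quit
    }
    return 0;
}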

Related

How to use OpenCV functions in Metal on iOS?

I have developed an Xcode project that uses OpenCV functions for image processing on the iPhone camera's live stream.
Processing one frame takes some time, so it doesn't look like real time.
Is it possible to accelerate the calculation by integrating OpenCV and Metal?
For example, the OpenCV function "grabCut" takes more than 1 second to detect certain foreground objects.
How can I reduce the processing time down to something like 10 ms using Metal?
You can't call OpenCV functions from Metal.
If you want to speed up this algorithm, you could try porting it to Metal but that's only an option if the algorithm -- or major parts of it -- are highly parallel.
Now, it looks like grabCut has a CUDA implementation (which I found by googling for "grabcut cuda"), which means that implementing this in Metal might actually be worth doing. If you can find the CUDA source code, it's usually a relatively straightforward port.

Does OpenCV UMat always reside on GPU?

I would like to know whether an OpenCV UMat always resides on the GPU side if an OpenCL-compatible GPU is available. Does "cv::ocl::setUseOpenCL(true)" make any difference?
If a UMat does sit on the GPU side, does that mean the data transfer between CPU and GPU only happens when I call umat.getMat()?
Thanks a lot!
No, a UMat does not always reside on a single hardware component such as the CPU or the GPU. UMat is backed by the OpenCL framework, which tries to harness the processing power of any capable device attached to the system: the CPU, the GPU, or even a digital signal processor on mobile devices. OpenCL then schedules the work efficiently across whichever capable devices are available. For more information follow this link
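To make the transfer question concrete, here is a small sketch (assuming an OpenCL-capable device and an OpenCV build with OpenCL enabled; the file name is a placeholder). Work on the UMat may run on the OpenCL device, and reading the data back as a cv::Mat, for example via getMat(), is where synchronization to host memory happens.

#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
#include <iostream>

int main() {
    cv::ocl::setUseOpenCL(true);
    std::cout << "OpenCL in use: " << cv::ocl::useOpenCL() << std::endl;

    cv::Mat hostImg = cv::imread("input.png");       // placeholder file name
    if (hostImg.empty()) return 1;

    cv::UMat u;
    hostImg.copyTo(u);                               // upload (or zero-copy, depending on the device)

    cv::UMat blurred;
    cv::GaussianBlur(u, blurred, cv::Size(7, 7), 2); // may execute on the OpenCL device

    // Mapping the result as a cv::Mat forces synchronization back to host memory.
    cv::Mat result = blurred.getMat(cv::ACCESS_READ);
    std::cout << "result size: " << result.size() << std::endl;
    return 0;
}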

OpenCV BackgroundSubtractorMOG2

I recently finished an algorithm for foreground extraction from video, but it processes each frame too slowly. OpenCV 3.0 has an algorithm based on a mixture-of-Gaussians model, named BackgroundSubtractorMOG2, and I find it processes each frame nearly 15 times faster than mine. I just wonder: is it accelerated by OpenCL on the GPU, or does it just run on the CPU? P.S. I've looked at some of its source code and noticed there are OpenCL blocks, but I'm not sure since I'm new to this. I would really appreciate it if anyone could help me figure it out!
If you look at the API page here, you will find the line:
The function implements a sparse iterative version of the Lucas-Kanade optical flow in pyramids. See [Bouguet00]. The function is parallelized with the TBB library.
The TBB library is a parallelization library used to "write parallel C++ programs that take full advantage of multicore performance". This means it uses more than just one CPU core at a time, which is a much quicker way of processing. This can be seen in lines like the following (Line 566):
parallel_for_(Range(0, image.rows),
              MOG2Invoker(image, fgmask,
                          (GMM*)bgmodel.data,
                          (float*)(bgmodel.data + sizeof(GMM)*nmixtures*image.rows*image.cols),
                          bgmodelUsedModes.data, nmixtures, (float)learningRate,
                          (float)varThreshold,
                          backgroundRatio, varThresholdGen,
                          fVarInit, fVarMin, fVarMax, float(-learningRate*fCT), fTau,
                          bShadowDetection, nShadowDetection));
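For what it's worth, the OpenCL blocks the questioner noticed are reachable through the transparent API: if the frames are passed as cv::UMat and OpenCL is enabled, OpenCV may take the OpenCL path instead of the TBB/CPU one. A hedged sketch (the video path is a placeholder; whether the OpenCL path is actually taken depends on the build and the device):

#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

int main() {
    cv::ocl::setUseOpenCL(true);                      // allow OpenCL dispatch where available
    cv::VideoCapture cap("input.avi");                // placeholder video file
    if (!cap.isOpened()) return 1;

    cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 = cv::createBackgroundSubtractorMOG2();

    cv::Mat frame;
    cv::UMat uframe, fgmask;
    while (cap.read(frame)) {
        frame.copyTo(uframe);                         // UMat inputs make the OpenCL path eligible
        mog2->apply(uframe, fgmask);                  // foreground mask
        cv::imshow("foreground", fgmask);
        if (cv::waitKey(1) == 27) break;
    }
    return 0;
}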

Would it work, and be faster, if I called functions from the OpenCV GPU module inside my kernel function?

OpenCV has a GPU-accelerated computer vision module (http://docs.opencv.org/modules/gpu/doc/gpu.html). There are many functions that already use GPU techniques, so I can directly use the functions OpenCV provides. But I wonder whether it would be faster if I wrote my own kernel and, inside each kernel, called functions from the OpenCV GPU module. This is for the case where I have many images: to handle each image I call an OpenCV function from the GPU module, so it would be nested parallelism.
Your question is not entirely clear to me, but I would like to say this: it's impossible to say which would be faster unless somebody has already implemented that same algorithm using the approach you have in mind and then shared a report of the benchmark tests.
There's a number of factors involved:
It depends on the type of operation you are trying to implement: techniques with high arithmetic intensity are certainly a better fit for GPUs; however, not all problems can be modeled for GPUs.
The size of the input images matters: the time spent sending data from RAM to the GPU might not pay off in the end, so running the algorithm on the CPU can be faster for small images.
The model/power of the CPU/GPU: if the computer has a really crappy GPU, then it's probably better to run the algorithms on the CPU.
What I'm saying is: don't assume OpenCV's GPU module will always run its algorithms faster than the CPU you have. Test it, measure it! The only way to know for sure is through experimentation and benchmarking.
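As a concrete way to "test it, measure it", here is a hedged sketch that times the same Gaussian blur on the CPU and through the CUDA module. It assumes an OpenCV 3.x/4.x build with the cudafilters module enabled; the image path and kernel size are placeholders, and the GPU timing deliberately includes the upload/download cost.

#include <opencv2/opencv.hpp>
#include <opencv2/cudafilters.hpp>   // requires a CUDA-enabled OpenCV build
#include <iostream>

int main() {
    cv::Mat src = cv::imread("input.png", cv::IMREAD_GRAYSCALE);   // placeholder image
    if (src.empty()) return 1;

    // CPU timing
    cv::Mat dstCpu;
    cv::TickMeter tm;
    tm.start();
    cv::GaussianBlur(src, dstCpu, cv::Size(15, 15), 3.0);
    tm.stop();
    std::cout << "CPU: " << tm.getTimeMilli() << " ms\n";

    // GPU timing (upload + filter + download, since the transfers are part of the real cost)
    cv::cuda::GpuMat gsrc, gdst;
    cv::Ptr<cv::cuda::Filter> gauss =
        cv::cuda::createGaussianFilter(src.type(), src.type(), cv::Size(15, 15), 3.0);
    tm.reset();
    tm.start();
    gsrc.upload(src);
    gauss->apply(gsrc, gdst);
    cv::Mat dstGpu;
    gdst.download(dstGpu);
    tm.stop();
    std::cout << "GPU: " << tm.getTimeMilli() << " ms\n";
    return 0;
}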

Real Time Cuda Image Processing advice

I am trying to implement an algorithm for a system where the camera captures 1000 fps, and I need to read the value of each pixel across all images and perform different calculations on the evolution of pixel[i][j] over N images, for every pixel in the image. I have the data as an (unsigned char *ptr) and I want to transfer it to the GPU and start implementing the algorithm, but I am not sure what would be the best option for real-time processing.
my system:
CPU Intel Xeon x5660 2.8Ghz(2 processors)
GPU NVIDIA Quadro 5000
I got the following questions:
Do I need to add any image processing library on top of CUDA? If yes, what do you suggest?
Can I create a matrix of pixel[i,j] values across images [1:n], for each pixel position in the image? For example, for 1000 images of size 200x200 I would end up with 40000 matrices, each
containing 1000 values for one pixel. Does CUDA give me some options like OpenCV to have matrices or vectors?
1 - Do I need to add any image processing library on top of CUDA?
Apples and oranges. Each has a different purpose. An image processing library like OpenCV offers a lot more than simple accelerated matrix computations. Maybe you don't need OpenCV to do the processing in this project as you seem to rather use CUDA directly. But you could still have OpenCV around to make it easier to load and write different image formats from the disk.
2 - Does CUDA give me some options like OpenCV to have matrices?
Absolutely. Some time ago I wrote a simple (educational) application that used OpenCV to load an image from disk and CUDA to convert it to its grayscale version. The project is named cuda-grayscale. I haven't tested it with CUDA 4.x, but the code shows how to do the basics when combining OpenCV and CUDA.
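In the same spirit as that cuda-grayscale example (this is not its code, just a hedged sketch using OpenCV's own CUDA colour conversion, assuming a build with the cudaimgproc module; the file names are placeholders):

#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>   // requires a CUDA-enabled OpenCV build

int main() {
    cv::Mat bgr = cv::imread("input.png");          // OpenCV handles reading the file from disk
    if (bgr.empty()) return 1;

    cv::cuda::GpuMat d_bgr, d_gray;
    d_bgr.upload(bgr);                              // host -> device
    cv::cuda::cvtColor(d_bgr, d_gray, cv::COLOR_BGR2GRAY);   // conversion runs on the GPU

    cv::Mat gray;
    d_gray.download(gray);                          // device -> host
    cv::imwrite("gray.png", gray);
    return 0;
}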
It sounds like you will have 40000 independent calculations, where each calculation works only within one (temporal) pixel. If so, this should be a good task for the GPU. Your 352 core Fermi GPU should be able to beat your 12 hyperthreaded Xeon cores.
Is the algorithm you plan to run a common operation? It sounds like it might not be, in which case you will likely have to write your own kernels.
Yes, you can have arrays of elements of any type in CUDA.
Having this be a "streaming oriented" approach is good for a GPU implementation, in that it maximizes the number of calculations relative to transfers over the PCIe bus. It might also introduce difficulties: if you want to process the 1000 values for a given pixel in a specific order (oldest to newest, for instance), you will probably want to avoid continuously shifting all the frames in memory to make room for the newest frame. It will slightly complicate your addressing of the pixel values, but the best approach to avoid shifting the frames may be to overwrite the oldest frame with the newest frame each time a new frame arrives. That way, you end up with a "stack of frames" that is fairly well ordered but has a discontinuity between old and new frames somewhere within it.
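A hedged sketch of that overwrite-the-oldest-frame idea as a circular buffer in device memory (plain CUDA runtime API; the frame size, history depth and frame source are placeholders). Each incoming frame overwrites the oldest slot, and the temporal order is recovered with modular arithmetic instead of shifting frames around.

#include <cuda_runtime.h>

const int W = 200, H = 200, N = 1000;         // frame size and history depth (placeholders)
const size_t FRAME_BYTES = (size_t)W * H;     // one 8-bit grayscale frame

int main() {
    unsigned char* d_frames = nullptr;        // ring buffer: N frames stored back-to-back
    cudaMalloc(&d_frames, FRAME_BYTES * N);

    unsigned char h_frame[W * H];             // would be filled by the camera driver
    long long frameCount = 0;

    // For each captured frame, overwrite the oldest slot instead of shifting everything.
    for (int i = 0; i < 5; ++i) {             // pretend 5 frames arrive
        int slot = (int)(frameCount % N);
        cudaMemcpy(d_frames + (size_t)slot * FRAME_BYTES, h_frame,
                   FRAME_BYTES, cudaMemcpyHostToDevice);
        ++frameCount;
        // To walk one pixel's history oldest -> newest inside a kernel once the buffer
        // is full, index the frames as (slot + 1 + t) % N for t = 0 .. N-1.
    }

    cudaFree(d_frames);
    return 0;
}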
Do I need to add any image processing library on top of CUDA?
If yes, what do you suggest?
Disclosure: my company develops & markets CUVILib
There are very few options when it comes to GPU-accelerated imaging libraries that also offer general-purpose functionality. CUVILib is one of those options, and it offers the following, which is very well suited to your specific needs:
A CuviImage object which holds your image data as a 2D matrix
You can write your own GPU function and use CuviImage as a 2D GPU matrix.
CUVILib already provides a rich set of imaging functionality like color operations, image statistics, feature detection, motion estimation, FFT, image transforms, etc., so chances are that you will find your desired functionality.
As for the question of whether GPUs are suited for your application: Yes! Imaging is one of those domains which are ideal for parallel computation.
Links:
CUVILib: http://www.cuvilib.com
TunaCode: http://www.tunacode.com
