I have a lot of code based on OpenCV, but the Arm Compute Library improves performance in many ways, so I'd like to integrate some Arm Compute Library code into my project. Has anyone tried converting between the two corresponding image structures? If so, what did you do? Or is there a way to share a pointer to the underlying data buffer, setting strides and flags appropriately, without copying the image data?
I was able to configure an arm_compute::Image corresponding to my cv::Mat properties, allocate the memory, and point it to the data portion of my cv::Mat.
This way, I can process my image efficiently using arm_compute and keep the OpenCV infrastructure I had for the rest of my project.
// cv::Mat mat defined and initialized above
arm_compute::Image image;
image.allocator()->init(arm_compute::TensorInfo(mat.cols, mat.rows, Format::U8));
image.allocator()->allocate();
image.allocator()->import_memory(Memory(mat.data));
Update for ACL 18.05 or newer
You need to implement the IMemoryRegion interface (IMemoryRegion.h).
I have created a gist for that: link
OpenCV currently has no support for the TGA format.
I know there is a single-header library named stb_image that lets you read and write TGA images.
But examples of using it with OpenCV are scarce on the Internet (it is more common to see it used with OpenGL).
The second method I found:
There is a short piece of code in the answer to this topic:
Loading a tga/bmp file in C++/OpenGL
Someone used this code to read a TGA file into a cv::Mat, as in the code below.
Tga tgaImg = Tga("/tmp/test.tga");
Mat img(tgaImg.GetHeight(), tgaImg.GetWidth(), CV_8UC4);
memcpy(img.data, tgaImg.GetPixels().data(), tgaImg.GetHeight() * tgaImg.GetWidth() * 4);
But this only covers reading. I wonder whether stb_image can do the same thing as the code above; the image data structures might be different (I have not looked into them yet).
I would like to ask people who have run into this before. Since the DDS and TGA image formats are popular for game textures, someone must already have found a way to read and write TGA from OpenCV code.
Thanks.
To save an OpenCV image as TGA, use stbi_write_tga (from stb_image_write). This function takes a pointer to the image data as an argument, which is img.data for a cv::Mat.
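One detail to watch when passing img.data to stbi_write_tga: stb_image_write expects channels in RGB(A) order, while OpenCV stores them as BGR(A), so red and blue would come out swapped. The sketch below shows one way to reorder the channels first. The helper name swap_red_blue is hypothetical (it is not part of OpenCV or stb), and the code assumes 8-bit images with 3 or 4 channels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a BGR/BGRA buffer into a tightly packed
// RGB/RGBA buffer by swapping the first and third channel of each pixel.
// stride_bytes is the distance between row starts (cv::Mat::step), which
// may be larger than width * channels for non-continuous matrices.
std::vector<std::uint8_t> swap_red_blue(const std::uint8_t* src, int width, int height,
                                        int channels, std::size_t stride_bytes) {
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height * channels);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + y * stride_bytes;
        std::uint8_t* out = dst.data() + static_cast<std::size_t>(y) * width * channels;
        for (int x = 0; x < width; ++x) {
            const std::uint8_t* p = row + x * channels;
            std::uint8_t* q = out + x * channels;
            q[0] = p[2];                      // red   <- third channel
            q[1] = p[1];                      // green stays in place
            q[2] = p[0];                      // blue  <- first channel
            if (channels == 4) q[3] = p[3];   // alpha is copied unchanged
        }
    }
    return dst;
}

// With OpenCV and stb_image_write available, usage might look like:
//   auto rgba = swap_red_blue(img.data, img.cols, img.rows, img.channels(), img.step);
//   stbi_write_tga("out.tga", img.cols, img.rows, img.channels(), rgba.data());
```

cv::cvtColor with a BGR-to-RGB conversion code would do the same job inside OpenCV; the manual loop is only there to make the byte layout explicit.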
I am using IMFSourceReader with hardware acceleration enabled to decode videos and read them into my application. After the ReadSample call, I get hold of the IDirect3DSurface9 from the IMFSample. At this point, I use the LockRect() call to access the raw bytes and copy them into my application's buffer.
I would like to perform additional operations on the GPU, such as a transpose and possibly a conversion of the image data from row-major order to column-major order.
Is there a Blt operation I can set up to do this?
I came across the ID3DXBaseEffect interface but I am not sure that is applicable in my case.
Would appreciate any inputs.
Dinesh
With IDirect3DSurface9, you can use a shader (ID3DXBaseEffect).
To do it directly on the GPU, before copying the raw bytes to your application, I would try this:
Call IMFSourceReader::GetServiceForStream to query MR_VIDEO_ACCELERATION_SERVICE for the IDirect3DDeviceManager9 interface.
Use IDirect3DDeviceManager9 to obtain the IDirect3DDevice9 (IDirect3DDeviceManager9::LockDevice).
Use the IDirect3DDevice9, the IDirect3DSurface9, a new render target and a shader, as usual with DirectX.
Copy the raw bytes from the final render target (after the shader has been applied).
EDIT
See here: mofo7777 github
Under MediaFoundationTransform > MFTDirectxAware > MFTVideoShaderEffect, I show the concept.
I am working on a custom device that supports OpenCL 1.2 Embedded Profile and does not have image support or texture memory. I have to pass an image through a Sobel filter and then a median filter. What would be the best (fastest) way of doing this? Can I avoid sending the image back to the host after the Sobel filter and then writing it to the device again for the median filter? Where should the intermediate image be stored: global memory, local memory or elsewhere?
You can keep the buffer in the global memory of the device between kernel calls to avoid the extra copies. When you create the buffer, make sure you use the flag 'CL_MEM_READ_WRITE'; this allows the Sobel kernel to write to it and the median kernel to read from it afterward. You can get away with two buffers, but I would use three if memory is not a restriction.
Create 3 buffers. Call them whatever you'd like (originalBuff, middleBuff, finalBuff).
Copy the image data to originalBuff.
Optionally set the other buffers to an all-zero state (this can be done on the device by the kernels which write to these buffers).
Call the Sobel filter kernel with params (originalBuff, middleBuff).
Call the median kernel with params (middleBuff, finalBuff).
Read finalBuff back to the host.
I left out the other steps, such as creating the context, program, queue, etc., in order to focus on the answer to your question.
Read about clCreateBuffer here.
EDIT:
I have not tried the flag 'CL_MEM_HOST_NO_ACCESS' before, but I think it is worth a try. In my example, middleBuff might benefit from this flag. Like most OpenCL features, any possible benefit would be implementation-dependent.
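The three-buffer data flow described above can be mirrored on the CPU, which is handy as a reference for checking kernel output. The following is a plain C++ sketch, not OpenCL code: the function names, the 3x3 window sizes, the |gx| + |gy| magnitude and the choice to leave border pixels at zero are all assumptions made for illustration. On the device, the three vectors would correspond to originalBuff, middleBuff and finalBuff, with middleBuff never leaving global memory.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, one byte per pixel

// Stage 1: 3x3 Sobel, |gx| + |gy| clamped to 255. Border pixels stay 0.
void sobel(const Image& in, Image& out, int w, int h) {
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            auto p = [&](int dx, int dy) {
                return static_cast<int>(in[(y + dy) * w + (x + dx)]);
            };
            int gx = -p(-1, -1) - 2 * p(-1, 0) - p(-1, 1)
                     + p(1, -1) + 2 * p(1, 0) + p(1, 1);
            int gy = -p(-1, -1) - 2 * p(0, -1) - p(1, -1)
                     + p(-1, 1) + 2 * p(0, 1) + p(1, 1);
            out[y * w + x] =
                static_cast<std::uint8_t>(std::min(255, std::abs(gx) + std::abs(gy)));
        }
    }
}

// Stage 2: 3x3 median. Border pixels stay 0.
void median3x3(const Image& in, Image& out, int w, int h) {
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            std::array<std::uint8_t, 9> win{};
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = in[(y + dy) * w + (x + dx)];
            std::nth_element(win.begin(), win.begin() + 4, win.end());
            out[y * w + x] = win[4];  // middle of the 9 sorted values
        }
    }
}

// original -> middle -> final, with the intermediate result staying in place
// between the two stages (the role middleBuff plays on the device).
Image run_pipeline(const Image& original, int w, int h) {
    Image middle(original.size(), 0);
    Image final_img(original.size(), 0);
    sobel(original, middle, w, h);
    median3x3(middle, final_img, w, h);
    return final_img;
}
```

Comparing the device result read back from finalBuff against run_pipeline on the same input is one way to confirm that skipping the intermediate host round trip did not change the output, provided the kernels use the same border handling.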
I am working on a project that needs a lot of OpenCL code. I am using OpenCV's ocl module to develop my project faster but there are some functions not implemented and I will have to write my own OpenCL code.
My question is this: what is the quickest and cheapest way to transfer data from Mat and/or oclMat to a cl_mem array. Re-wording this, is there a good way to transfer or enqueue (clEnqueueWriteBuffer) data from oclMat or Mat?
Currently, I am using a for-loop to read data from Mat (or download from oclMat and then use for-loops) and then enqueuing it. This is turning out to be costly, hence my question.
Thanks to anyone who sees this question :)
I've written a set of interop functions for the Boost.Compute library which ease the use of OpenCL and OpenCV. Take a look at the opencv_copy_mat_to_buffer() function.
There are also functions for copying from an OpenCL buffer back to the host cv::Mat and for copying cv::Mat to OpenCL image2d objects.
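Independently of which wrapper is used, the per-element for-loop from the question can usually be replaced by whole-row copies, or by no copy at all when the matrix is continuous. Below is a minimal sketch of that idea in plain C++. HostImage is a hypothetical stand-in for the relevant cv::Mat fields (data, rows, step, and cols * elemSize()), and pack_rows is an illustrative name, not an OpenCV or Boost.Compute function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the cv::Mat fields that matter for a transfer.
struct HostImage {
    const std::uint8_t* data;  // first byte of the first row (cv::Mat::data)
    int rows;
    std::size_t row_bytes;     // payload per row: cols * elemSize()
    std::size_t step;          // bytes between row starts, >= row_bytes
};

// Packs possibly padded rows into one contiguous block using one memcpy
// per row instead of one assignment per element.
std::vector<std::uint8_t> pack_rows(const HostImage& img) {
    std::vector<std::uint8_t> packed(static_cast<std::size_t>(img.rows) * img.row_bytes);
    for (int r = 0; r < img.rows; ++r) {
        std::memcpy(packed.data() + r * img.row_bytes,
                    img.data + r * img.step,
                    img.row_bytes);
    }
    return packed;
}

// If step == row_bytes (cv::Mat::isContinuous() returns true), no packing is
// needed: mat.data can be handed to a single clEnqueueWriteBuffer call with a
// size of rows * row_bytes.
```

Either way the result is a single enqueue of one contiguous block, which avoids the per-pixel overhead described in the question.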
Calculate the memory bandwidth you actually achieve over the host-device interconnect.
If you get about 60% or more of the maximum bandwidth, there is nothing left to do; the memory transfer is as fast as it can be. But if your measured bandwidth is lower than 55% - 60% of the theoretical maximum, try using multiple command queues with non-blocking operations (don't forget to sync at the end). Also, pay attention to the average image size. Small data transfers usually have a high overhead.
If your device uses shared memory, use memory mapping instead of read/write; this may save a dramatic amount of time. If the device has its own memory, apply the pinned memory technique, which is well described in the NVIDIA OpenCL Best Practices Guide.
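The bandwidth check described above is simple arithmetic: bytes transferred divided by elapsed time, compared with the interconnect's theoretical peak. A small sketch follows; the function names are made up for illustration, and the peak value has to come from your hardware's specification.

```cpp
#include <cstddef>

// Achieved bandwidth of one transfer in GB/s (1 GB = 1e9 bytes).
double achieved_gb_per_s(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Fraction of the theoretical peak that the transfer reached (0.6 = 60%).
double bandwidth_fraction(std::size_t bytes, double seconds, double peak_gb_per_s) {
    return achieved_gb_per_s(bytes, seconds) / peak_gb_per_s;
}
```

For example, a 1920 x 1080 image with 4 bytes per pixel is 8,294,400 bytes. If the transfer takes 2 ms, that is about 4.15 GB/s; against an assumed peak of 8 GB/s this is roughly 52%, which falls below the 55% - 60% threshold mentioned above.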
The documentation of oclMat states that it exposes the underlying OpenCL buffer:
//! pointer to the data(OCL memory object)
uchar *data;
If you already have the oclMat on the device, you can simply copy from oclMat.data to your own buffer. But you will have to hack the memory a little, accessing some private members of the oclMat.
Something like:
clEnqueueCopyBuffer(command_queue, (cl_mem)oclMat.data, dst_buffer, 0, 0, size, 0, NULL, NULL);
NOTE: Take care with the casting, maybe you have to cast another pointer.
Regarding your comment, that is right. The oclMat data can be used as a cl_mem (void *) on the device, since it was allocated by the OpenCL device.
Additionally, you can create SVM memory first (for example void* svmdata) and then construct a Mat on top of it, like: Mat A(rows, cols, CV_32FC1, svmdata).
Now you can process the Mat A on both host and device without a memory copy.
(PS. SVM memory is a newer OpenCL feature; it can be created with clSVMAlloc.)
I'm using the cv::PCA class for a face recognition project. I convert photos of faces to single-row vectors, concatenate them into one big array and feed it to PCA to obtain a new space in which I can try to use distance for recognition. The problem is that calculating the PCA from scratch each time I start the program is really time-consuming (almost five minutes). I figured I need to save the calculated PCA to the hard drive and load it when I start the program again. And here is the problem. As far as I can see, all cv::Mat objects in cv::PCA are of type CV_32F. When I try to save one as a normal picture, it's converted to an 8-bit image and some data is lost. When I use XML/YAML persistence, the generated file is really big, and data is also lost (I saved it, loaded it into another structure and ran cerr<<sum(pca_orginal.mean==pca_loaded.mean)[0]<<endl to check how big the difference is). Right now I'm trying to use std::ofstream::write with the std::ofstream::binary flag, and istream::read, but there are some type issues (out.write(_pca.mean.data,_pca.mean.rows*_pca.mean.cols*4/*CV_32F->4*CV_8U*/); generates error: no matching function for call to ‘std::basic_ofstream<char, std::char_traits<char> >::write(uchar*&, int)’). I've also heard about the OpenEXR library and its file format, but I would rather avoid using additional libraries. I'm using OpenCV 2.3.1 and OpenCV 2.2.
edit:
I'm sorry for the confusion. I misread the description of cv::Mat operator== and thought it works the opposite way from how it actually does, so sum(pca_orginal.mean==pca_loaded.mean)[0] giving 0 is the worst possible result, not the best. It means that XML/YML works fine apart from generating huge files. Also, after using C-style casting I was able to make the binary streams work, but the generated files are also big (over 150MB).
In the C interface, there are functions cvSave and cvLoad for saving arbitrary matrices. There are probably C++ interface counterparts, too.
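For the binary-stream route from the question, the compile error comes from std::ostream::write expecting a const char* while cv::Mat::data is a uchar*, so a cast is required. Below is a minimal sketch of a raw float32 round trip using only the standard library. FloatMatrix is a hypothetical stand-in for a continuous CV_32F cv::Mat, and the file layout (rows, cols, then the values) is an assumption of this example, not an OpenCV format.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a continuous CV_32F cv::Mat.
struct FloatMatrix {
    std::int32_t rows = 0;
    std::int32_t cols = 0;
    std::vector<float> values;  // rows * cols elements, row-major
};

// Writes rows, cols and the raw float data. write() takes const char*,
// hence the reinterpret_cast (the same cast fixes the uchar* error).
bool save_matrix(const std::string& path, const FloatMatrix& m) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&m.rows), sizeof m.rows);
    out.write(reinterpret_cast<const char*>(&m.cols), sizeof m.cols);
    out.write(reinterpret_cast<const char*>(m.values.data()),
              static_cast<std::streamsize>(m.values.size() * sizeof(float)));
    return static_cast<bool>(out);
}

// Reads the header first so the buffer can be sized before the bulk read.
bool load_matrix(const std::string& path, FloatMatrix& m) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&m.rows), sizeof m.rows);
    in.read(reinterpret_cast<char*>(&m.cols), sizeof m.cols);
    if (!in || m.rows < 0 || m.cols < 0) return false;
    m.values.resize(static_cast<std::size_t>(m.rows) * m.cols);
    in.read(reinterpret_cast<char*>(m.values.data()),
            static_cast<std::streamsize>(m.values.size() * sizeof(float)));
    return static_cast<bool>(in);
}
```

A raw dump like this needs 4 bytes per element, so the file size follows directly from the matrix dimensions; a 150MB file mostly reflects how large the eigenvector matrix is. Keeping fewer components (the maxComponents argument of cv::PCA) is the main way to shrink it. The format is also not portable across machines with different endianness.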