I'm using ColorPacket *GetImageHistogram(const Image *image, ...) to extract a histogram. I looked at the ImageMagick sources and found that GetImageHistogram allocates memory via:
histogram=(ColorPacket *) AcquireQuantumMemory((size_t) cube_info->colors,
sizeof(*histogram));
How should I free this memory?
To free memory allocated with AcquireQuantumMemory, use RelinquishMagickMemory:
histogram = RelinquishMagickMemory(histogram);
As the API documentation notes, RelinquishMagickMemory always returns NULL, so the assignment conveniently resets the pointer after the free.
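A minimal sketch of the full allocate-use-free pattern (untested; assumes MagickCore has been initialized, a valid image, and the legacy magick/ header layout, which varies by ImageMagick version):

```c
#include <magick/MagickCore.h>

/* Sketch only: extract the histogram, use it, then release it with
 * RelinquishMagickMemory, which pairs with AcquireQuantumMemory. */
void dump_histogram(Image *image, ExceptionInfo *exception)
{
    size_t number_colors = 0;
    ColorPacket *histogram =
        GetImageHistogram(image, &number_colors, exception);
    if (histogram == NULL)
        return;

    /* ... use histogram[0 .. number_colors-1] ... */

    histogram = (ColorPacket *) RelinquishMagickMemory(histogram);
    /* histogram is now NULL */
}
```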
Say I have a big block of mapped memory I have finished using. It came from mmapping anonymous memory or using MAP_PRIVATE. I could munmap it, and then malloc would mmap again the next time I make a big enough allocation.
Could I instead give the memory to malloc directly? Could I say "Hey malloc, here's an address range I mapped. Go use it for heap space. Feel free to mprotect, mremap, or even munmap it as you wish."?
I'm using glibc on Linux.
glibc malloc calls __morecore (a function pointer declared in <malloc.h>) to obtain more memory. However, this will not work in general, because the implementation assumes the function behaves like sbrk and returns memory from a single, larger memory region. In practice, with glibc malloc, the only realistic way to make memory available for reuse by malloc is to call munmap.
Other malloc implementations do allow donating memory (in some cases only through internal interfaces). For example, musl's malloc has a function called __malloc_donate which should do what you are asking for.
I need to be able to allocate 2 MB or 4 MB sized pages of memory in a kernel module.
In the Linux kernel you can allocate physically contiguous memory with:
__get_free_pages(flags, order);
where flags are the usual GFP allocation flags and order determines the number of pages allocated: number of pages = 2 ^ order. You can use this function as a proxy between the kernel and your calling code.
Another approach is to allocate a huge page, if that is possible on your system.
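A kernel-side sketch of the __get_free_pages route (not testable from user space; error handling trimmed). With 4 KiB base pages, 2 MiB is 2^9 pages, so the order is 9; get_order() computes this from a byte count:

```c
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define BUF_BYTES (2 * 1024 * 1024)   /* 2 MiB, i.e. order 9 */

static unsigned long buf;

/* Allocate 2 MiB of physically contiguous memory. */
static int alloc_2mb(void)
{
    buf = __get_free_pages(GFP_KERNEL, get_order(BUF_BYTES));
    if (!buf)
        return -ENOMEM;
    return 0;
}

/* Free it with the matching order. */
static void free_2mb(void)
{
    free_pages(buf, get_order(BUF_BYTES));
}
```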
I have a kernel module that handles IOCTL calls from userspace. One of the calls needs to return a variable-length buffer from the kernel into userspace. From the module, I can kmalloc( ..., GFP_USER) a buffer for the userspace process to use. But my question is: can this buffer be freed from userspace, or does it need to be freed from kernel space?
Alternatively, is there a better way to handle transferring variable-length data?
No, user space can't free kernel memory. Your module would have to offer another call / ioctl to let user space tell your kernel code to free the memory. You would also have to track your allocations to make sure to free them when the user space process exits, so as not to leak memory. Also, kernel memory is not swappable; if user space makes you allocate memory again and again, it could run the kernel out of memory, so you have to guard against that, too.
The easier method is to just let user space offer the buffer from its own memory. Include a maximum length argument in the call so that you won't write more than user space expects and return partial data or an error if the size is too small, as appropriate.
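A kernel-side sketch of that user-supplied-buffer pattern (hypothetical struct and function names; not testable outside a kernel build). User space passes a pointer plus a maximum length, and the driver copies at most that much and reports the actual size:

```c
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* Hypothetical request layout, shared with user space via a header. */
struct my_read_req {
    void __user *buf;   /* user-space destination buffer */
    __u32 max_len;      /* capacity of buf */
    __u32 out_len;      /* filled in: bytes actually copied */
};

static long my_ioctl_read(struct my_read_req __user *ureq,
                          const void *data, size_t data_len)
{
    struct my_read_req req;

    if (copy_from_user(&req, ureq, sizeof(req)))
        return -EFAULT;

    /* Never write more than user space said it can hold. */
    req.out_len = min_t(size_t, data_len, req.max_len);
    if (copy_to_user(req.buf, data, req.out_len))
        return -EFAULT;

    /* Report the size back; the caller can detect truncation by
     * comparing out_len with what it expected (or you could return
     * -E2BIG instead, as appropriate). */
    if (copy_to_user(ureq, &req, sizeof(req)))
        return -EFAULT;
    return 0;
}
```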
GFP_USER means the allocation is kernel-space memory that userspace can be allowed to access (it is used as a marker for shared kernel/user pages). Note that the allocation can sleep/block, so it may only be made in process context.
However, memory that is allocated in kernel space must always be freed in kernel space, and likewise for user space.
I am working on a project that needs a lot of OpenCL code. I am using OpenCV's ocl module to develop my project faster but there are some functions not implemented and I will have to write my own OpenCL code.
My question is this: what is the quickest and cheapest way to transfer data from a Mat and/or oclMat to a cl_mem array? In other words, is there a good way to transfer or enqueue (clEnqueueWriteBuffer) data from an oclMat or Mat?
Currently, I am using a for loop to read data from the Mat (or download from the oclMat and then use for loops) and then enqueuing it. This is turning out to be costly, hence my question.
Thanks to anyone who sees this question :)
I've written a set of interop functions for the Boost.Compute library which ease the use of OpenCL and OpenCV. Take a look at the opencv_copy_mat_to_buffer() function.
There are also functions for copying from an OpenCL buffer back to the host cv::Mat, and for copying a cv::Mat to OpenCL image2d objects.
Calculate the memory bandwidth you actually achieve on host-device transfers.
If you get ~60% or more of the maximal bandwidth, there is nothing left to do: the memory transfer is as fast as it can be. But if your results are below 55-60% of the theoretical maximum, try using multiple command queues with non-blocking operations (don't forget to synchronize at the end). Also, pay attention to the average image size: small data transfers usually carry a large overhead.
If your device uses shared (host) memory, use memory mapping instead of read/write operations; this can save a dramatic amount of time. If the device has its own memory, apply the pinned-memory technique, which is well described in the NVIDIA OpenCL Best Practices Guide.
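A sketch of the memory-mapping technique (error handling trimmed; assumes the queue exists and buf was created with clCreateBuffer(..., CL_MEM_ALLOC_HOST_PTR, ...), which asks the runtime for host-accessible, often pinned, memory):

```c
#include <CL/cl.h>
#include <string.h>

/* Fill an OpenCL buffer through map/unmap instead of an explicit
 * clEnqueueWriteBuffer copy. */
void upload_mapped(cl_command_queue queue, cl_mem buf,
                   const void *src, size_t size)
{
    cl_int err;

    /* Blocking map: on completion, host points at the buffer. */
    void *host = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                    0, size, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS || host == NULL)
        return;

    memcpy(host, src, size);             /* write the data in place */

    /* Unmap to hand the region back to the device. */
    clEnqueueUnmapMemObject(queue, buf, host, 0, NULL, NULL);
}
```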
The documentation of oclMat states that it exposes the underlying OpenCL memory object through its data member:
//! pointer to the data(OCL memory object)
uchar *data;
If you have the oclMat already on the device, you can simply perform a buffer copy from oclMat.data to your cl_mem buffer. But you will have to hack a little, accessing some private members of the oclMat.
Something like:
clEnqueueCopyBuffer(command_queue, (cl_mem) oclMat.data, dst_buffer, 0, 0, size, 0, NULL, NULL);
NOTE: Take care with the cast: data is declared as uchar *, but it actually holds a cl_mem handle.
Regarding your comment, that's right: the oclMat's data can be used as a cl_mem (void *) on the device side, since it was allocated by the OpenCL device.
Additionally, you can create SVM memory first (for example, void *svmdata) and then wrap it in a Mat like: Mat A(rows, cols, CV_32FC1, svmdata).
Now you can process the Mat A between host and device without any memory copy.
(PS: SVM, shared virtual memory, is a new feature of OpenCL 2.0; such memory can be created with clSVMAlloc.)
I want to allocate all the available shared memory of an SM to one block. I am doing this because I don't want multiple blocks to be assigned to the same SM.
My GPU card has 64 KB of (shared + L1) memory per SM. In my current configuration, 48 KB is assigned to shared memory and 16 KB to L1.
I wrote the following code to use up all of the available Shared memory.
__global__ void foo()
{
__shared__ char array[49152];
...
}
I have two questions:
How can I make sure that all of the shared memory space is used up?
I can increase the 48K to a much higher value (without getting any error or warning). Can anyone explain why?
Thanks in advance,
Iman
You can read the size of the available per-block shared memory from cudaDeviceProp::sharedMemPerBlock, which you can obtain by calling cudaGetDeviceProperties.
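A minimal host-side query using the CUDA runtime C API (requires a CUDA toolchain; device 0 is assumed):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;

    /* Fill prop with the properties of device 0. */
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;

    /* e.g. 49152 bytes (48 KB) on the configuration described above */
    printf("shared memory per block: %zu bytes\n",
           (size_t) prop.sharedMemPerBlock);
    return 0;
}
```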
You do not have to specify the size of your array at compile time. Instead, you can declare it as extern __shared__ and pass the shared memory size dynamically as the third kernel launch configuration parameter.
The "clock" CUDA SDK sample illustrates how you can specify shared memory size at launch time.