I have a kernel module that handles IOCTL calls from userspace. One of the calls needs to return a variable length buffer from the kernel into userspace. From the module, I can kmalloc( ..., GFP_USER) a buffer for the userspace process to use. But, my question is, can this buffer be free'd from userspace or does it need to be free'd from kernel space?
Alternatively, is there a better way to handle data transfer with variable length data?
No, user space can't free kernel memory. Your module would have to offer another call / ioctl to let user space tell your kernel code to free the memory. You would also have to track your allocations so you can free them when the user space process exits, so as not to leak memory. Also, kernel memory is not swappable: if user space makes you allocate memory again and again, it could run the kernel out of memory, so you have to guard against that, too.
The easier method is to just let user space offer the buffer from its own memory. Include a maximum length argument in the call so that you won't write more than user space expects, and return partial data or an error if the size is too small, as appropriate.
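For illustration, a minimal sketch of that pattern in an ioctl handler. The request struct and the mydev_data_available / mydev_data helpers are hypothetical stand-ins for the driver's real data source (a real interface would also use fixed-size types such as __u64 instead of size_t for 32/64-bit compatibility):

#include <linux/fs.h>       /* struct file */
#include <linux/uaccess.h>  /* copy_from_user / copy_to_user */

/* Hypothetical request passed by user space: it supplies the buffer
   and its capacity; the driver reports how many bytes it wrote. */
struct mydev_read_req {
    void __user *buf;   /* user-supplied destination buffer */
    size_t len;         /* capacity of that buffer */
    size_t out_len;     /* bytes actually copied back */
};

static long mydev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct mydev_read_req req;
    size_t avail;

    if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
        return -EFAULT;

    avail = mydev_data_available();          /* hypothetical helper */
    if (avail > req.len)
        avail = req.len;                     /* never overrun the caller's buffer */

    if (copy_to_user(req.buf, mydev_data(), avail))   /* hypothetical helper */
        return -EFAULT;

    req.out_len = avail;                     /* report partial data, if any */
    if (copy_to_user((void __user *)arg, &req, sizeof(req)))
        return -EFAULT;
    return 0;
}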
GFP_USER means the allocation is kernel-space memory that the user can be allowed to access (it is used as a marker for shared kernel/user pages). Note that the allocation can sleep/block, so it may only be used in process context.
However, memory that gets allocated in kernel space always gets freed in kernel space, and vice versa for user space.
Say I have a big block of mapped memory I finished using. It came from mmaping anonymous memory or using MAP_PRIVATE. I could munmap it, then have malloc mmap again the next time I make a big enough allocation.
Could I instead give the memory to malloc directly? Could I say "Hey malloc, here's an address range I mapped. Go use it for heap space. Feel free to mprotect, mremap, or even munmap it as you wish."?
I'm using glibc on Linux.
glibc malloc calls __morecore (a function pointer) to obtain more memory. See <malloc.h>. However, this will not work in general because the implementation assumes that the function behaves like sbrk and returns memory from a single, larger memory region. In practice, with glibc malloc, the only realistic way to make memory available for reuse by malloc is calling munmap.
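For completeness, a minimal sketch of the munmap route under those assumptions (an anonymous private mapping on Linux):

#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS */
#include <sys/mman.h>
#include <stdlib.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;           /* a "big block" */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* ... use the block, then hand the pages back to the kernel ... */
    munmap(p, len);

    /* A later, sufficiently large allocation may mmap that range again. */
    void *q = malloc(len);
    free(q);
    return 0;
}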
Other malloc implementations allow donating memory (in some cases as internal interfaces). For example, musl's malloc has a function called __malloc_donate which should do what you are asking for.
I am working on a Linux module to interface with a third-party device. When this device is ready to give my module information, it writes directly to the RAM memory address 0x900000.
When I check /proc/iomem, I get:
00000000-3fffffff: System RAM
00008000-00700fff: Kernel code
00742000-007a27b3: Kernel data
From my understanding, this means that the device is writing to an address floating out in the middle of user space.
I know that this is not an optimal situation and it would be better to be able to use memory-mapped addresses/registers, but I don’t have the option of changing the way it works right now.
How do I have my kernel module safely claim the user space memory space from 0x900000 to 0x901000?
I tried mmap and ioremap, but those are really for memory-mapped registers, not for accessing memory that already 'exists' in user space. I believe that I can read/write from the address by just using the pointer, but that doesn't prevent corruption if that region is allocated to another process.
You can tell the kernel to restrict the memory it manages by setting the mem parameter in the boot arguments:
mem=1M@0x900000 --> instructs the kernel to use 1M starting from 0x900000
You can have multiple mem entries in the boot args, for example:
mem=1M@0x900000 mem=1M@0xA00000
The following command should tell you the memory regions allocated to the kernel:
cat /proc/iomem | grep System
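Once the region has been carved out of the kernel's memory map this way, a module still has to claim and map it before touching it. A minimal sketch, assuming the 4 KB region from the question and a hypothetical driver name "mydev" (since the range is ordinary RAM rather than device registers, memremap is used instead of ioremap):

#include <linux/module.h>
#include <linux/ioport.h>   /* request_mem_region / release_mem_region */
#include <linux/io.h>       /* memremap / memunmap */

#define DEV_PHYS 0x900000   /* start of the region the device writes to */
#define DEV_SIZE 0x1000     /* 4 KB */

static void *dev_base;

static int __init mydev_init(void)
{
    /* Record our claim in /proc/iomem so no other driver grabs it. */
    if (!request_mem_region(DEV_PHYS, DEV_SIZE, "mydev"))
        return -EBUSY;

    /* Plain RAM reserved via mem=, so a cacheable mapping is fine. */
    dev_base = memremap(DEV_PHYS, DEV_SIZE, MEMREMAP_WB);
    if (!dev_base) {
        release_mem_region(DEV_PHYS, DEV_SIZE);
        return -ENOMEM;
    }
    return 0;   /* read the device's data through dev_base */
}

static void __exit mydev_exit(void)
{
    memunmap(dev_base);
    release_mem_region(DEV_PHYS, DEV_SIZE);
}

module_init(mydev_init);
module_exit(mydev_exit);
MODULE_LICENSE("GPL");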
I am confused about __local memory in OpenCL.
I read some spec saying that the data flow has to be from host to __global, and then to __local.
But I also see some kernel function like this:
__kernel void foo(__local float * a)
I was wondering how the data is transferred directly into __local memory in this way?
Thanks.
It is not possible to fill a local buffer from the host side, therefore you have to follow the flow host -> __global -> __local.
A local buffer can be created either on the host side, in which case it is passed as a kernel parameter, or on the GPU side inside the kernel.
Creating the local buffer on the host side gives you the advantage of deciding its size before the kernel is run, which can be important if the local buffer size needs to be different each time the kernel is run.
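Concretely, for the kernel signature from the question, "creating" the buffer on the host side just means passing a size with a NULL data pointer; a sketch, assuming argument 0 is the __local float * parameter:

#include <CL/cl.h>

/* Reserve n floats of __local memory for argument 0 of the kernel
   (the  __local float *a  parameter). The data pointer must be NULL;
   only the size is meaningful for __local arguments. */
cl_int set_local_buffer(cl_kernel kernel, size_t n)
{
    return clSetKernelArg(kernel, 0, n * sizeof(cl_float), NULL);
}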
Local memory is not visible to anything but a single work-group, and may be allocated as the work-group is dispatched by hardware on many architectures. Hardware that can mix multiple work-groups from different kernels on each CU will allow the scheduling component to chunk up the local memory for each of the groups being issued. It doesn't exist before the group is launched, and does not exist after the group terminates. The size of this region is what you pass in as other answers have pointed out.
The result of this is that the only way on many architectures for filling local memory from the host would be for kernel code to be inserted by the compiler that would copy data in from global memory. Given that as the basis, it isn't any worse in terms of performance for the programmer to do it manually, and gives more control over exactly what happens. You do not end up in a situation where the compiler always generates copy code and ends up copying more than was really necessary because the API didn't make it clear what memory was copy-in and what was not.
In summary, you cannot fill local memory in any automated way. In practice you will rarely want to, because doing it manually gives you the opportunity to only put the result of a first stage into local, removing extra copy operations, or to transform the data on the way in to local, allowing padding or data transposition to remove bank conflicts and so on.
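As a sketch of that manual fill, an illustrative kernel in which each work-group cooperatively stages a tile of global memory into local memory before using it:

__kernel void stage(__global const float *src,
                    __local float *tile,
                    uint tile_len)
{
    /* Each work-item copies a strided share of the tile. */
    for (uint i = get_local_id(0); i < tile_len; i += get_local_size(0))
        tile[i] = src[get_group_id(0) * tile_len + i];

    /* All writes to local memory must finish before anyone reads. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* ... work on tile[] instead of global memory ... */
}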
As @doqtor said, the size of a local memory kernel parameter can be specified with a clSetKernelArg call.
Fortunately, OpenCL 1.2+ supports variable length arrays (VLAs), so a local memory kernel parameter is not required any more.
I am trying to find some useful information on the malloc function.
When I call this function, it allocates memory dynamically and returns a pointer (i.e. the address) to the beginning of the allocated block.
The questions:
1. How is the returned address used to read/write into the allocated memory block (through indirect addressing registers, or how)?
2. If it is not possible to allocate a block of memory, malloc returns NULL. What is NULL in terms of hardware?
3. In order to allocate memory on the heap, we need to know which parts of memory are already occupied. Where is this information stored (if, for example, we use a small RISC microcontroller)?
Q3 The usual way that heaps are managed is through a linked list. In the simplest case, the malloc function retains a pointer to the first free-space block in the heap, and each free-space block has a header that points to the next free-space block in the heap. So the heap is in effect self-defining in terms of knowing what is not occupied (and by inference what is therefore occupied); this minimizes the amount of overhead RAM needed to manage the heap.
When new space is needed via a malloc call, a large enough free-space block is found by traversing the linked list. That block is given to the malloc caller (prefixed with a small hidden header), and if needed, a smaller free-space block holding the residual space, the difference between the original free-space block and what the malloc call asked for, is inserted into the linked list.
When a heap block is released by the application, its block is just formatted with the linked-list header, and added to the linked list, usually with some extra logic to combine consecutive free-space blocks into one larger free-space block.
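A minimal first-fit sketch of that scheme (illustrative only; the names and the split threshold are made up, and real allocators also coalesce neighbors and grow the heap):

#include <stddef.h>

/* One header per free block, threaded through the heap itself. */
struct free_block {
    size_t size;              /* usable bytes in this block */
    struct free_block *next;  /* next free block, or NULL */
};

static struct free_block *free_list;  /* head of the free-space list */

/* First-fit allocation: walk the list for a big enough block. */
void *tiny_malloc(size_t want)
{
    struct free_block **link = &free_list;
    for (struct free_block *b = free_list; b; link = &b->next, b = b->next) {
        if (b->size < want)
            continue;
        if (b->size >= want + sizeof(struct free_block) + 16) {
            /* Split: carve the tail into a new, smaller free block. */
            struct free_block *rest =
                (struct free_block *)((char *)(b + 1) + want);
            rest->size = b->size - want - sizeof(struct free_block);
            rest->next = b->next;
            *link = rest;
            b->size = want;
        } else {
            *link = b->next;  /* close enough: use the whole block */
        }
        return b + 1;         /* caller's memory starts after the header */
    }
    return NULL;              /* no fit; a real allocator would grow the heap */
}

void tiny_free(void *p)
{
    /* The hidden header sits just before the returned pointer. */
    struct free_block *b = (struct free_block *)p - 1;
    b->next = free_list;      /* real allocators also coalesce neighbors */
    free_list = b;
}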
Debugging versions of malloc usually do more, including retaining linked-lists of the allocated areas too, "guard zones" around the allocated heap areas to help detect memory overflows, etc. These take up extra heap space (making the heap effectively smaller in terms of usable space for the applications), but are extremely helpful when debugging.
Q2 A NULL pointer is effectively just zero, which if used attempts to access memory starting at location 0 of RAM, which is almost always reserved memory of the OS. This is the cause of a significant quantity of memory-violation aborts, all caused by programmers' lack of error checking for NULL returns from functions that allocate memory.
Because accessing memory location 0 from a non-OS application is never what is wanted, most hardware aborts any attempt to access location 0 by non-OS software. Even with page mapping, where the application's memory space (including location 0) is never mapped to real RAM location 0, most CPUs will still abort attempts to access location 0, since NULL is always zero and such an access is assumed to go through a pointer that contains NULL.
Given your RISC processor, you will need to read its documentation to see how it handles attempts to access memory location 0.
Q1 There are many high-level language ways to use allocated memory, primarily through pointers, strings, and arrays.
In terms of assembly language and the hardware itself, the allocated heap block address just gets put into a register that is being used for memory indirection. You will need to see how that is handled in the RISC processor. However if you use C or C++ or such higher level language, then you don't need to worry about registers; the compiler handles all that.
Since you are using malloc, can we assume you are using C?
If so, you assign the result to a pointer variable, and then you can access the memory by dereferencing that variable. You don't really know how this is implemented in assembly; that depends on the CPU you are using. malloc returns 0 if it fails, and since NULL is usually defined as 0, you can test for NULL. You don't care how malloc tracks the free memory; if you really need this information, you should look at the source of glibc's malloc, available on the net.
char *c = malloc(10); // allocate 10 bytes
if (c == NULL) {
    // handle error case
} else {
    *c = 'a'; // write 'a' into the first byte of the block
}
I know I can reserve virtual memory using VirtualAlloc.
e.g. I can claim 1GB of virtual memory and then call in the first MB of that to put my growing array into.
When the array grows beyond 1MB I call in the 2nd MB and so on.
This way I don't need to move the array around in memory when it grows, it just stays in place and the Intel/AMD virtual memory manager takes care of my problems.
However does FastMM support this structure, so I don't have to do my own memory management?
Pseudo code:
type
PBigArray = ^TBigArray;
TBigArray = array[0..0] of SomeRecord;
....
begin
VirtualMem:= FastMM.ReserveVirtualMemory(1GB);
PBigArray:= FastMM.ClaimPhysicalMemory(VirtualMem, 1MB);
....
procedure GrowBigArray
begin
FastMM.ClaimMorePhysicalMemory(PBigArray, 1MB {extra});
//will generate OOM exception when claim exceeds 1GB
Does FastMM support this?
No, FastMM4 (as of the latest version I looked at) does not explicitly support this. It's really not functionality you would expect in a general-purpose memory manager, as it's trivially simple to do with VirtualAlloc calls.
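For reference, a sketch of that reserve-then-commit pattern with raw VirtualAlloc calls (Win32, error handling abbreviated; the sizes mirror the question):

#include <windows.h>

int main(void)
{
    const SIZE_T reserve = (SIZE_T)1 << 30;  /* 1 GB of address space */
    const SIZE_T chunk   = (SIZE_T)1 << 20;  /* commit 1 MB at a time */

    /* Reserve address space only; no physical pages back it yet. */
    char *base = VirtualAlloc(NULL, reserve, MEM_RESERVE, PAGE_NOACCESS);
    if (base == NULL)
        return 1;

    /* Commit the first megabyte; the array lives at a fixed address. */
    SIZE_T committed = chunk;
    if (VirtualAlloc(base, committed, MEM_COMMIT, PAGE_READWRITE) == NULL)
        return 1;

    /* When the array outgrows the committed region, commit the next
       chunk in place; the data never moves. */
    if (VirtualAlloc(base + committed, chunk, MEM_COMMIT, PAGE_READWRITE))
        committed += chunk;

    VirtualFree(base, 0, MEM_RELEASE);  /* release the whole reservation */
    return 0;
}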
NexusMM4 (which is part of NexusDB) does something that gives you a similar result, but without wasting all the address space before it is needed in the background.
If you make an initial large allocation (directly via GetMem, or indirectly via a dynamic array or such) the memory is allocated in just the size needed, via VirtualAlloc.
But if that allocation is then resized to a larger size, NexusMM will use a different way to allocate memory, which allows it to simply unmap the allocation from the address space and remap it again, at a larger size, when further reallocs take place.
This prevents the 2 major problems that most general purpose memory managers have when reallocating:
during a normal realloc the existing and new allocation need to be present in the address space at the same time, temporarily doubling the address space and physical memory requirements
during a normal realloc, the whole contents of the existing allocation needs to be copied
So with NexusMM you would get all the advantages of what you showed in your pseudo code (with the exceptions that the first realloc will involve a copy, and that growing your array might change its address) by simply using normal GetMem/ReallocMem/FreeMem calls.