How can I use Hugepage memory from kernel space?

I need to be able to allocate 2MB or 4MB sized pages of memory in a kernel module.

In the Linux kernel, you can allocate physically contiguous memory with the function:
__get_free_pages(flags, order);
where flags are the usual GFP flags and order determines the number of allocated pages: number of pages = 2 ^ order. You can use this function as a proxy between the kernel and your calling code.
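A minimal sketch of this in a module (assuming a 4 KB base page size, so order 9 gives 2^9 * 4 KB = 2 MB; the names grab_2mb/release_2mb are purely illustrative):
#include <linux/errno.h>
#include <linux/gfp.h>
static unsigned long buf;
static int grab_2mb(void)
{
    /* order 9: 2^9 physically contiguous 4 KB pages = 2 MB */
    buf = __get_free_pages(GFP_KERNEL, 9);
    if (!buf)
        return -ENOMEM;
    return 0;
}
static void release_2mb(void)
{
    /* must free with the same order used for allocation */
    free_pages(buf, 9);
}
Note that the order is bounded by the kernel's MAX_ORDER, so very large contiguous allocations can still fail.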
Another approach is to allocate a huge page, if that is possible.

Related

Can i give mapped memory to malloc?

Say I have a big block of mapped memory I have finished using. It came from mmapping anonymous memory or using MAP_PRIVATE. I could munmap it, then have malloc mmap it again the next time I make a big enough allocation.
Could I instead give the memory to malloc directly? Could I say "Hey malloc, here's an address range I mapped. Go use it for heap space. Feel free to mprotect, mremap, or even munmap it as you wish."?
I'm using glibc on Linux.
glibc malloc calls __morecore (a function pointer) to obtain more memory. See <malloc.h>. However, this will not work in general because the implementation assumes that the function behaves like sbrk and returns memory from a single, larger memory region. In practice, with glibc malloc, the only realistic way to make memory available for reuse by malloc is calling munmap.
Other malloc implementations allow donating memory (in some cases as internal interfaces). For example, musl's malloc has a function called __malloc_donate which should do what you are asking for.
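In practice, then, the supported round trip with glibc looks like the following minimal sketch: unmap the region yourself and let a later large malloc create a fresh mapping of its own.
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(void)
{
    size_t len = 64 * 1024 * 1024;
    /* a big anonymous mapping we are done with */
    void *big = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (big == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* return it to the kernel; glibc malloc cannot adopt it directly */
    munmap(big, len);
    /* a sufficiently large malloc will mmap a fresh region of its own */
    void *p = malloc(len);
    printf("malloc returned %p\n", p);
    free(p);
    return 0;
}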

How is there internal fragmentation in paging and no external fragmentation?

Please explain it nicely. Don't just write a definition. Also explain what it does and how it is different from segmentation.
Fragmentation needs to be considered with memory allocation techniques. Paging is basically not a memory allocation technique, but rather a means of providing virtual address spaces.
Considering the comparison with segmentation, what you're probably asking about is the difference between a memory allocation technique using fixed size blocks (like the pages of paging, assuming 4KB page size here) and a technique using variable size blocks (like the segments used for segmentation).
Now, assume that you directly use the page allocation interface to implement memory management, that is, you have two functions for dealing with memory:
alloc_page, which allocates a single page and returns a pointer to the beginning of the newly available address space, and
free_page, which frees a single, allocated page.
Now suppose all of your currently available virtual memory is used, but you need to store 1 additional byte. You call alloc_page and get a 4KB block of memory. You only use 1 byte of that huge block, but also the other 4095 bytes are, from the perspective of the allocator, used. If this happens multiple times eventually all pages will be allocated, so further calls to alloc_page will fail. Even if you just need another additional byte (which could be one of the 4095 that got wasted above) the allocator will tell you that you're out of memory. This is internal fragmentation.
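You can mimic this in user space; here is a small sketch (emulating the two functions above with mmap; illustrative, not from the original answer):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
static size_t page_size;
/* the two functions from the text, emulated with mmap */
static void *alloc_page(void)
{
    void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
static void free_page(void *p)
{
    munmap(p, page_size);
}
int main(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE); /* typically 4096 */
    char *p = alloc_page();
    if (!p)
        return 1;
    p[0] = 'x'; /* 1 byte used; the other page_size - 1 bytes are wasted */
    printf("internal fragmentation: %zu bytes\n", page_size - 1);
    free_page(p);
    return 0;
}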
If, on the other hand, you would use variable sized blocks (like in segmentation), then you're vulnerable to external fragmentation: Suppose you manage 6 bytes of memory (F means "free"):
FFFFFF
You first allocate 3 bytes for a, then 1 for b and finally 2 bytes for c:
aaabcc
Now you free both a and c, leaving only b allocated:
FFFbFF
You now have 5 bytes of unused memory, but if you try to allocate a block of 4 bytes (which is less than the available memory) the allocation will fail due to the unfavorable placement of the memory for b. This is external fragmentation.
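A toy first-fit allocator makes this concrete; the following sketch (illustrative code, not from the original answer) replays the example above and shows the 4-byte request failing:
#include <stdio.h>
#include <string.h>
#define ARENA 6
static char mem[ARENA]; /* 'F' = free, any other byte = in use */
static int alloc_block(char tag, int len)
{
    for (int start = 0; start + len <= ARENA; start++) {
        int run_free = 1;
        for (int j = start; j < start + len; j++)
            if (mem[j] != 'F') { run_free = 0; break; }
        if (run_free) {
            memset(&mem[start], tag, len);
            return start;
        }
    }
    return -1; /* no contiguous run of len free bytes */
}
int main(void)
{
    memset(mem, 'F', ARENA);
    alloc_block('a', 3);
    int b = alloc_block('b', 1);
    alloc_block('c', 2); /* memory is now aaabcc */
    memset(mem, 'F', 3); /* free a */
    memset(&mem[b + 1], 'F', 2); /* free c */
    printf("%.6s\n", mem); /* prints FFFbFF */
    printf("alloc 4 -> %d\n", alloc_block('d', 4)); /* -1: external fragmentation */
    return 0;
}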
Now, if you extend your page allocator to be able to allocate multiple pages and add alloc_multiple_pages, you have to deal with both internal and external fragmentation.
There is no external fragmentation in paging but internal fragmentation exists.
First, we need to understand what external fragmentation is. External fragmentation occurs when we have enough total memory to accommodate a process, but it is not contiguous.
How does it not occur in paging?
Paging divides virtual memory, i.e. all processes, into equal-sized pages, and physical memory into fixed-size frames. So you are always fitting equal-sized blocks called pages into equal-sized slots called frames. Try to visualize this and you can conclude that there can never be external fragmentation.
In the case of segmentation, we divide the virtual address space into blocks of different sizes; that is why it may happen that blocks in main memory must be compacted to make space for a new process. I hope this helps!
When a process is divided into fixed-size pages, there is generally some leftover space in the last page (internal fragmentation). When there are many processes, the unused areas of their last pages could add up to the size of one page or more. Now, even though the total free space is one page or more, you cannot use it to load a new page, because a page has to be contiguous. External fragmentation has happened. So, I don't think external fragmentation is completely zero in paging.
EDIT: It is all about how external fragmentation is defined. The accumulated internal fragmentation does not contribute to external fragmentation. External fragmentation is contributed by the empty space which is EXTERNAL to a partition (or page). So suppose there are only two frames in main memory, say of size 16 B, each occupied by only 1 B of data. The internal fragmentation in each frame is 15 B, and the total unused space is 30 B. Now if you want to load one new page of some process, you will see that you do not have any frame available. You are unable to load a new page even though you have 30 B of unused space. Would you call this external fragmentation? The answer is no, because those 15 B of unused space are INTERNAL to the pages. So in paging, internal fragmentation is possible but not external fragmentation.
Paging allows a process to be allocated physical memory in a non-contiguous fashion. I will explain why external fragmentation cannot occur in paging.
External fragmentation occurs when a process that was allocated contiguous memory is unloaded from physical memory, which creates a hole (free space) in the memory.
Now if a new process arrives that requires more memory than this hole, we won't be able to allocate contiguous memory to that process due to the non-contiguous nature of the free memory; this is called external fragmentation.
The problem above originates from the constraint of allocating contiguous memory to the process. This is exactly what paging solves, by allowing a process to use non-contiguous physical memory.
In paging, the probability of having external fragmentation is very low although internal fragmentation may occur.
In the paging scheme, both main memory and virtual memory are divided into fixed-size slots, which are called pages (in the case of virtual memory) and page frames (in the case of main memory, RAM, or physical memory). So whenever a process executes in main memory, it occupies the entire space of a page frame. Let us say the main memory has 4096 page frames, each with a size of 4096 bytes. Suppose there is a process P1 which requires 3000 bytes of space for its execution. To execute P1, it is brought from virtual memory into main memory and placed in a page frame (F1). But P1 requires only 3000 bytes, so 4096 - 3000 = 1096 bytes of space in the page frame F1 are wasted. In other words, this is internal fragmentation in the page frame F1.
Again, external fragmentation may occur if some space of the main memory could not be included in a page frame. But this case is very rare, as usually the size of main memory, the size of a page frame, and the total number of page frames in main memory can all be expressed as powers of 2.
As far as I've understood, I would answer your question like so:
Why is there internal fragmentation with paging?
Because a page has a fixed size, but processes may request more or less space. Say a page is 32 units, and a process requests 20 units. Then when a page is given to the requesting process, that page is no longer usable despite having 12 units of free "internal" space.
Why is there no external fragmentation with paging?
Because in paging, a process is allowed to be allocated spaces that are non-contiguous in the physical memory. Meanwhile, the logical representation of those blocks will be contiguous in the virtual memory. This is what I mean:
A process requires 128 units of space. With 32-unit pages as in the previous example, this is 4 pages. Regardless of the actual page numbers (formally, frame numbers) in the physical memory, you give those pages the numbers 0, 1, 2, and 3. This virtual representation is the defining characteristic of paging itself. Those pages may be 21, 213, 23, and 234 in the actual physical memory, but they can really be anything, contiguous or non-contiguous. Therefore, even if paging leaves small free spaces in between used spaces, those small free spaces can still be used together as if they were one contiguous block of space. That's why external fragmentation won't happen.
Frames are allocated as units. If the memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full.
For example, if the page size is 2,048 bytes, a process of 72,766 bytes will need 35 pages plus 1,086 bytes. It will be allocated 36 frames, resulting in internal fragmentation of 2,048 - 1,086 = 962 bytes. In the worst case, a process would need n pages plus 1 byte. It would be allocated n + 1 frames, resulting in internal fragmentation of almost an entire frame.
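The arithmetic is easy to check with a few lines of C:
#include <stdio.h>
#define PAGE_SIZE 2048
int main(void)
{
    unsigned long need = 72766;
    unsigned long frames = (need + PAGE_SIZE - 1) / PAGE_SIZE; /* round up: 36 */
    unsigned long internal = frames * PAGE_SIZE - need; /* 2048 - 1086 = 962 */
    printf("%lu frames, %lu bytes of internal fragmentation\n", frames, internal);
    return 0;
}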

Can a userspace process kfree() memory with GFP_USER?

I have a kernel module that handles IOCTL calls from userspace. One of the calls needs to return a variable-length buffer from the kernel into userspace. From the module, I can kmalloc(..., GFP_USER) a buffer for the userspace process to use. But, my question is, can this buffer be freed from userspace, or does it need to be freed from kernel space?
Alternatively, is there a better way to handle data transfer with variable length data?
No, user space can't free kernel memory. Your module would have to offer another call / ioctl to let user space tell your kernel code to free the memory. You would also have to track your allocations and free them when the user space process exits, so as not to leak memory… Also, kernel memory is not swappable; if user space makes you allocate memory again and again, it could run the kernel out of memory, so you have to guard against that, too.
The easier method is to just let user space offer the buffer from its own memory. Include a maximum length argument in the call so that you won't write more than user space expects and return partial data or an error if the size is too small, as appropriate.
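A minimal sketch of that approach in the ioctl handler (the struct layout and names here are hypothetical, purely for illustration):
#include <linux/errno.h>
#include <linux/uaccess.h>
/* hypothetical request layout: user space owns buf and frees it itself */
struct my_req {
    void __user *buf; /* user space buffer */
    size_t len;       /* its capacity: the maximum length argument */
};
static long copy_out(const struct my_req *req, const void *data, size_t data_len)
{
    size_t n = data_len < req->len ? data_len : req->len;
    if (copy_to_user(req->buf, data, n))
        return -EFAULT;
    return n; /* report how many bytes were actually written */
}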
GFP_USER means the allocation is kernel-space memory that you may allow user space to access (it is used as a marker for shared kernel/user pages). Note that the allocation may sleep/block and can only be done in process context.
However, memory that is allocated in kernel space must always be freed from kernel space, and vice versa for user space.

How to allocate all of the available shared memory to a single block in CUDA?

I want to allocate all the available shared memory of an SM to one block. I am doing this because I don't want multiple blocks to be assigned to the same SM.
My GPU card has 64KB (Shared+L1) memory. In my current configuration, 48KB is assigned to the Shared memory and 16KB to the L1.
I wrote the following code to use up all of the available Shared memory.
__global__ void foo()
{
__shared__ char array[49152];
...
}
I have two questions:
How can I make sure that all of the shared memory space is used up?
I can increase "48K" to a much higher value (without getting any error or warning). Can anyone explain this?
Thanks in advance,
Iman
You can read the amount of available shared memory per block from cudaDeviceProp::sharedMemPerBlock, which you can obtain by calling cudaGetDeviceProperties.
You do not have to specify the size of your array statically. Instead, you can pass the shared memory size dynamically as the third kernel launch parameter.
The "clock" CUDA SDK sample illustrates how you can specify shared memory size at launch time.

Maximum memory allocation on openCL CPU

I have read that there is a limit on the maximum memory allocation of around 60% of device memory, and that this can be changed by modifying the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables for GPUs.
I am wondering if the AMD SDK has something similar for the CPU, in case I want to raise the limit of memory allocation.
For my current configuration, it returns the following:
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 2973.37MB
CL_DEVICE_GLOBAL_MEM_SIZE = 11893.5MB
Thanks.
I was able to change this on my system. I don't know if this method was possible when you originally asked the question.
Set the environment variable CPU_MAX_ALLOC_PERCENT to the percentage of total memory you want to be able to allocate for a single global buffer. I have 8GB of system memory, and after setting CPU_MAX_ALLOC_PERCENT to 80, clinfo reports the following:
Max memory allocation: 6871207116
Success! 6.399GB
You can also use GPU_MAX_ALLOC_PERCENT in the same way for your GPU devices.
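For reference, these limits can also be queried programmatically; a minimal sketch using the OpenCL API (error handling omitted):
#include <stdio.h>
#include <CL/cl.h>
int main(void)
{
    cl_platform_id platform;
    cl_device_id dev;
    cl_ulong max_alloc, global_mem;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof max_alloc, &max_alloc, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof global_mem, &global_mem, NULL);
    printf("CL_DEVICE_MAX_MEM_ALLOC_SIZE = %.2fMB\n", max_alloc / (1024.0 * 1024.0));
    printf("CL_DEVICE_GLOBAL_MEM_SIZE = %.2fMB\n", global_mem / (1024.0 * 1024.0));
    return 0;
}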
