I have hooked up external SRAM in my project. What I want to do is use malloc() to store data in either external or internal memory at runtime. How can I decide during code execution in which memory malloc() places heap data? I know I have to edit the linker script, but after that it will place ALL heap data in external memory.
Is there any linker command that can tell the next malloc() to allocate in external or internal memory? For statically allocated data we can use the __attribute__((section("name"))) variable attribute, but is there anything for the heap?
Thank you!
malloc() from your C library can generally only use memory from one region. If you use newlib, then it finds this memory using _sbrk. The default implementation of _sbrk depends on the symbol end (or _end) being defined by the linker script, but you can also implement your own.
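For reference, a minimal _sbrk along those lines might look like this. This is only a sketch, assuming newlib on a GCC toolchain; end and _estack are the conventional linker-script symbol names, so check your own script, and signature conventions vary between newlib versions:

#include <errno.h>
#include <stddef.h>

extern char end;       /* set by the linker: first address past .bss */
extern char _estack;   /* set by the linker: top of RAM / initial stack pointer */

void *_sbrk(ptrdiff_t incr)
{
    static char *heap_end = &end;
    char *prev = heap_end;

    /* Refuse to grow the heap into the stack region (1 KB safety margin). */
    if (heap_end + incr > &_estack - 0x400) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_end += incr;
    return prev;
}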
You will have to pick one location for malloc to access, and use your own custom function to allocate memory from somewhere else.
Many libraries and RTOS implementations do this; see for example mem_malloc in LwIP or rt_alloc_mem in Keil RTX.
There are many schemes you can use to decide which memory to use, for example pools of fixed-size blocks for a particular purpose. I tend to point malloc at the fastest internal SRAM because the heap will become quite fragmented. I then make sure to use malloc only for small things, and custom functions for larger allocations.
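As a rough sketch of the fixed-size-pool idea (the section name .ext_sram, the block geometry, and the function names are all assumptions of mine; your linker script must map that section onto the external SRAM, and a real version would need interrupt protection):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  512
#define BLOCK_COUNT 64

/* Pool placed in external SRAM via a linker section; ".ext_sram" is a
   placeholder for whatever output section your script defines. */
static uint8_t pool[BLOCK_COUNT][BLOCK_SIZE] __attribute__((section(".ext_sram")));
static uint8_t used[BLOCK_COUNT];

void *ext_alloc(void)
{
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        if (!used[i]) {
            used[i] = 1;
            return pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

void ext_free(void *p)
{
    size_t i = ((uint8_t *)p - &pool[0][0]) / BLOCK_SIZE;
    if (i < BLOCK_COUNT)
        used[i] = 0;
}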
Related
I cannot deallocate memory on the host that I've allocated on the device, or deallocate memory on the device that I allocated on the host. I'm using CUDA 5.5 with VS2012 and Nsight. Is it because the host heap is never transferred to the device heap (or the other way around), so dynamic allocations are unknown between host and device?
If this is in the documentation, it is not easy to find. It's also important to note that no error was thrown until I ran the program with CUDA debugging and the Memory Checker enabled. The problem did not cause a crash outside of CUDA debugging, but it would've caused problems later if I hadn't checked for memory issues retroactively. If there's a handy way to copy the heap/stack from host to device, that'd be fantastic... hopes and dreams.
Here's an example for my question:
__global__ void kernel(char *ptr)
{
    free(ptr);  // the operation in question: in-kernel free() of a runtime allocation
}

int main(void)
{
    char *ptr;
    cudaMalloc((void **)&ptr, sizeof(char *));  // cudaMalloc takes only a pointer and a size
    kernel<<<1, 1>>>(ptr);
    cudaDeviceSynchronize();
    return 0;
}
No, you can't do this.
This topic is specifically covered in the programming guide:
Memory allocated via malloc() cannot be freed using the runtime (i.e., by calling any of the free memory functions from Device Memory).
Similarly, memory allocated via the runtime (i.e., by calling any of the memory allocation functions from Device Memory) cannot be freed via free().
It's in section B.18.2 of the programming guide, within section B.18, "Dynamic Global Memory Allocation and Operations".
The basic reason for it is that the mechanism used to make allocations with the runtime (e.g. cudaMalloc, cudaFree) is separate from the device-code allocator, and in fact they reserve from logically separate regions of global memory.
You may want to read the entire B.18 section of the programming guide, which covers these topics on device dynamic memory allocation.
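To make the pairing rule concrete: each allocation must be released by the allocator that created it. A minimal sketch of the corrected idea (error checking omitted; in-kernel malloc/free require a device of compute capability 2.0 or later):

#include <cstdio>

__global__ void kernel_alloc_and_free()
{
    // Device-heap allocation: paired with in-kernel free().
    char *p = (char *)malloc(64);
    if (p)
        free(p);
}

int main()
{
    char *ptr;
    cudaMalloc((void **)&ptr, 64);   // runtime allocation...
    kernel_alloc_and_free<<<1, 1>>>();
    cudaDeviceSynchronize();
    cudaFree(ptr);                   // ...paired with cudaFree, never with in-kernel free()
    return 0;
}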
Here is my solution for mixing dynamic memory allocation on the host using the CRT with the host's CUDA API and with the in-kernel memory functions. First off, as mentioned above, they must all be managed separately, using a strategy that never expects dynamic allocations to move directly between system and device without prior communication and coordination. Manual data copies are required that do not validate against the kernel's device heap, as noted in Robert's answer/comments.
I also suggest keeping track of (auditing) the number of bytes allocated and deallocated through the three different memory-management APIs. For instance, every time system:malloc, host:cudaMalloc, device:malloc, or one of the associated frees is called, use a variable to hold the number of bytes currently allocated in each heap, i.e. system, host, and device. This helps with tracking leaks when debugging.
The process of dynamically allocating, managing, and auditing memory across the system, host, and device perspectives for deep dynamic structure copies is complex. Here is a strategy that works; suggestions are welcome:
1. Allocate system memory using cudaMallocHost or malloc for a structure type that contains pointers, on the system heap;
2. Allocate device memory for the struct from the host, and copy the structure to the device (i.e. cudaMalloc, cudaMemcpy, etc.);
3. From within a kernel, use malloc to create a memory allocation managed by the device heap, and save the pointer(s) in the structure that exists on the device from step 2;
4. Communicate what was allocated by the kernel to the system by exchanging the size of the allocation for each of the pointers in the struct;
5. The host performs the same allocation on the device using the CUDA API (i.e. cudaMalloc) as the kernel did on the device; it is recommended to have a separate pointer variable in the structure for this;
6. At this point, the memory allocated dynamically by the kernel in device memory can be manually copied to the location dynamically allocated by the host in device memory (i.e. not using host:memcpy, device:memcpy or cudaMemcpy; a sketch follows below);
7. The kernel cleans up its memory allocations; and,
8. The host uses cudaMemcpy to move the structure from the device; a strategy similar to the one outlined in the above answer's comments can be used as necessary for deep copies.
Note: cudaMallocHost and system:malloc both allocate host (system) memory, making the system heap and host heap the same and interoperable, as mentioned in the CUDA guide referenced above. Therefore, only the system heap and the device heap are mentioned.
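As an illustration of the copy in step 6 (the kernel name and launch geometry here are hypothetical, not part of the strategy above), a simple copy kernel can move the bytes between the two device-memory regions entirely in device code:

// Hypothetical copy kernel for step 6: src was allocated by in-kernel
// malloc(), dst by cudaMalloc(); both pointers refer to device global memory.
__global__ void device_copy(char *dst, const char *src, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}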
I suspect the answer to my question is language-specific, so I'd like to know about both C and C++. When I call free() on a buffer or use delete[], how does the program know how much memory to free?
Where is the size of the buffer or of the dynamically allocated array stored and why isn't it available to the programmer as well?
Each implementation will be different, but typically the runtime allocates a bit more than asked for, and uses some hidden fields at the start of the block to remember the allocated size. The address returned to the caller is therefore offset a bit from the start of the memory claimed from the heap.
It isn't available to the caller because the true amount of memory claimed from the heap is an implementation detail, and will vary between compilers and platforms. As for knowing how much the caller asked for, rather than how much was allocated from the heap... well, the language designers assume that the programmer is capable of remembering this if needed.
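A toy version of the hidden-header scheme makes the idea concrete (purely illustrative; my_malloc/my_free are made-up names, and a real allocator also handles alignment, free lists, and much more):

#include <stdlib.h>

void *my_malloc(size_t n)
{
    size_t *block = malloc(sizeof(size_t) + n);  /* room for header + payload */
    if (!block)
        return NULL;
    block[0] = n;          /* remember the requested size */
    return block + 1;      /* caller sees only the payload */
}

void my_free(void *p)
{
    if (!p)
        return;
    size_t *block = (size_t *)p - 1;  /* step back to the hidden header */
    /* block[0] holds the size my_malloc recorded */
    free(block);
}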
The heap keeps track of all memory blocks, both allocated and free, specifically for that purpose. A typical (if naive) implementation allocates memory, uses several bytes at the beginning for bookkeeping, and returns the address just past those bytes. On subsequent operations (free/realloc), it subtracts those few bytes to get back to the bookkeeping area.
Some heap implementations (say, Windows' GlobalAlloc()) let you query the block size given the starting address, but the C/C++ RTL heap offers no such service.
Note that malloc() sometimes overallocates memory, so information about the allocated block size would be of limited utility. C++ new[]'ed arrays are another matter entirely: for those, knowing the exact array size is essential for array destruction to work properly. Still, there's no such thing in C++ as a dynamic_sizeof operator.
The memory allocator that gave you that chunk of memory is responsible for all that maintenance data. Typically it's stored in the beginning of the chunk (right before the actual address you use) so it's easy to access on freeing.
Regarding your other question: why should your app know about it? It's not really your concern. Keeping it hidden decouples memory-allocation management from the app, so you can use different allocators (for performance or debugging reasons).
It's stored internally in a location dependent on the language/compiler/OS.
Sometimes it is available (.Length in C# for example), though that may only refer to how much memory you're allowed to use, and not the object's total size.
Usually because the size to free is stored somewhere within the allocated buffer. A common technique is to have the size stored in memory just previous to the returned pointer.
Why isn't such information available to the programmer? I don't really know. I guess it's because an implementation might be able to provide memory allocation without actually needing to store the size, and such an implementation, if it exists, shouldn't be penalized by the others.
It's not so much language specific. It's all done by the memory manager.
How it knows depends on how the memory manager manages memory. The general idea is that the memory manager allocates more memory than you ask for and stores extra data about the allocated blocks in that extra space. When you release the memory, it locates that data from the given pointer and figures out how much actual memory to stop managing.
Don't confuse deallocation with destruction.
free() knows the size of the memory because of some internal magic ("implementation-defined"), e.g. the allocator could keep a list of all the allocated memory regions indexed by their corresponding pointers and just look up the pointer to know what to deallocate; or that information could be stored next to the allocated memory itself in some hidden block of data.
The array-delete expression delete[] arr; not only deallocates memory but also invokes all the destructors. For that purpose it is not sufficient to know the memory size; we also need to know the number of elements. For that reason, new T[N] actually allocates more than sizeof(T) * N bytes of memory, so the array-deleter knows how many destructors to call. All that memory is properly deallocated by the corresponding delete operator.
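You can watch that over-allocation happen by replacing the global array allocation function. The exact extra amount is an implementation detail; the figure in the comment assumes a typical 64-bit Itanium-ABI platform:

#include <cstdio>
#include <cstdlib>
#include <new>

struct T {
    char data[8];
    ~T() {}   // a non-trivial destructor forces the element count to be stored
};

void *operator new[](std::size_t bytes) {
    std::printf("operator new[] asked for %zu bytes\n", bytes);
    if (void *p = std::malloc(bytes))
        return p;
    throw std::bad_alloc();
}

void operator delete[](void *p) noexcept {
    std::free(p);
}

int main() {
    T *arr = new T[4];  // often prints 40, not 32: 8 extra bytes hold the count
    delete[] arr;       // the runtime reads that count to run 4 destructors
}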
I am a student and want to know more about dynamic memory management. In C++, calling operator new() allocates a memory block on the heap (free store). In fact, I don't have a full picture of how this is achieved.
There are a few questions:
1) What mechanism does the OS use to allocate a memory block? As I know, there are some basic memory-allocation schemes like first-fit, best-fit, and worst-fit. Does the OS use one of them to allocate memory dynamically on the heap?
2) Do different platforms like Android, iOS, Windows, and so on use different memory-allocation algorithms to allocate a memory block?
3) In C++, when I call operator new() or malloc(), does the memory allocator allocate a memory block at a random place in the heap?
Hope anyone can help me.
Thanks
malloc is not a system call; it is a library (libc) routine which goes through some of its internal structures to give you the address of a free piece of memory of the required size. It only makes a system call if the process's data segment (i.e. the virtual memory it can use) is not "big enough" according to the logic of the malloc in question. (On Linux, the system call to enlarge the data segment is brk.)
Simply said, malloc provides fine-grained memory management, while the OS manages the coarser, big chunks of memory made available to that process.
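On Linux you can observe this by sampling the program break around a couple of allocations (a sketch; the exact numbers depend on the libc and its settings, and the second delta is often zero because the first brk grabbed a larger chunk):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);   /* current program break */
    char *p = malloc(1000);   /* may extend the break once... */
    void *mid = sbrk(0);
    char *q = malloc(1000);   /* ...while this one is likely served from the same chunk */
    void *after = sbrk(0);

    printf("first malloc moved the break by %ld bytes\n",
           (long)((char *)mid - (char *)before));
    printf("second malloc moved it by %ld more bytes\n",
           (long)((char *)after - (char *)mid));
    free(p);
    free(q);
    return 0;
}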
Not only do different platforms use different malloc implementations, but so do different libraries; some programs (e.g. Python) use their own internal allocator instead, since they know their own usage patterns and can increase performance that way.
There is a lengthy article about malloc on Wikipedia.
I have 3 questions concerning memory allocation that I thought better to put into one question than 3.
When memory is allocated, as I understand it, it is allocated on the heap, which is just 16 MB. How then do programs such as video games or modern browsers manage to use over 1 GB?
Since it is obviously possible for this much memory to be used, why can it not be allocated at the start? I have found the most I can allocate in High Level Assembly language is around 100 MB. This is a lot more than 16 MB, and far less than I have available, so where does this limitation come from?
Why allocate memory in the first place, rather than allocating variables and letting the compiler/system handle it?
When memory is allocated, as I understand it, it is allocated on the heap, which is just 16 MB. How then do programs such as video games or modern browsers manage to use over 1 GB?
The heap can grow. It isn't limited to any such value, and certainly not 16 MB. You can easily allocate 1 GB of heap; just write a test program and you'll see.
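For example, something like this allocates and touches a full 1 GiB from the heap on an ordinary 64-bit system (assuming enough RAM/swap is available):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t size = (size_t)1 << 30;   /* 1 GiB */
    char *buf = malloc(size);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, 1, size);            /* touch every page so it is really committed */
    printf("allocated and touched %zu bytes\n", size);
    free(buf);
    return 0;
}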
Since it is obviously possible for this much memory to be used, why can it not be allocated at the start? I have found the most I can allocate in High Level Assembly language is around 100 MB. This is a lot more than 16 MB, and far less than I have available, so where does this limitation come from?
I'm not sure why your OS isn't filling larger allocation requests; perhaps it is due to memory fragmentation. It's going to be a problem specific to your setup, which you didn't share. I can allocate much more memory than that without an issue.
You can try the mmap system call if malloc (which uses the brk system call) is having some sort of issue. Note that for GNU libc, malloc actually uses mmap instead of brk when the allocation is large enough (over 128 KB, I think).
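A direct mmap call looks roughly like this (a Linux-specific sketch; the anonymous-mapping pattern is essentially what glibc's malloc does internally for large requests):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t size = (size_t)256 << 20;   /* 256 MiB */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mapped %zu bytes at %p\n", size, p);
    munmap(p, size);
    return 0;
}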
Why allocate memory in the first place, rather than allocating variables and letting the compiler/system handle it?
Variables must live in memory somewhere. What you are really asking is: "Why manage memory manually? Why can't some algorithm do that for me?" It is actually very common for the compiler and a runtime component to handle allocation and freeing; it's called garbage collection.
I have heard that many libraries such as JXTA and PJSIP have small footprints. Does this refer to low resource consumption or to something else?
Footprint designates the size occupied by your application in the computer's RAM.
Footprint can have different meanings when speaking about memory consumption.
In my experience, a reported memory footprint often doesn't include memory allocated on the heap (dynamic memory) or resources loaded from disk, etc. This is because dynamic allocations are not constant and vary with how the application or module is used. When reporting a "low footprint" or "high footprint", a constant or worst-case measure of the required space is usually wanted.
If, for example, dynamic memory were included in the footprint report of an image editor, the footprint would depend entirely on the size of the image the user loads into the application.
In the context of a third-party library, the library author can minimize the static memory footprint of the library by ensuring that no more code is linked into your application binary than absolutely needed. A common method for doing this in C, for instance, is to distribute the library's functions across separate .c files. Most C linkers link all code from an object file into your application, not just the function you call. So if you put a single function in a .c file, that is all the linker will incorporate into your application when you call it; if you put five functions in the .c file, the linker will probably link all of them into your app even if you use only one of them.
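A hypothetical layout showing the one-function-per-file idea (all file and function names here are made up for illustration):

/* file: lib_sum.c -- one function per translation unit */
int lib_sum(int a, int b) { return a + b; }

/* file: lib_max.c */
int lib_max(int a, int b) { return a > b ? a : b; }

/* file: app.c */
#include <stdio.h>
int lib_sum(int, int);

int main(void)
{
    /* With a static library built as: ar rcs libmini.a lib_sum.o lib_max.o
       the linker extracts only lib_sum.o here; lib_max.o never enters the
       binary because nothing references it. */
    printf("%d\n", lib_sum(2, 3));
    return 0;
}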
All this being said, the general (academic) definition of footprint includes all kinds of memory/storage aspects.
From Wikipedia Memory footprint article:
Memory footprint refers to the amount of main memory that a program uses or references while running.
This includes all sorts of active memory regions like code segment containing (mostly) program instructions (and occasionally constants), data segment (both initialized and uninitialized), heap memory, call stack, plus memory required to hold any additional data structures, such as symbol tables, debugging data structures, open files, shared libraries mapped to the current process, etc., that the program ever needs while executing and will be loaded at least once during the entire run.
Generally it's the amount of memory a program takes up: the 'footprint' it leaves in memory when running. However, it can also refer to how much space it takes up on your hard drive, although these days that's less of an issue.
If you're writing an app and have memory limitations, consider running a profiler to keep track of how much your program is using.
It does refer to resources, particularly memory: the library requires a smaller amount of memory when running.
Yes, resources such as memory or disk.
In computing, the footprint of a program, process, or piece of code refers to the device memory it occupies.