Memory management of Metal resources [closed]

Memory management of Metal resources [closed] - ios

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 1 year ago.
Improve this question
There some MTLTextures of which histogram is to be calculated, this textures changes at every loop iteration so calculation of histogram is carried out in loop using MPSImageHistogram.
There is a single command queue on which a new command buffer is created at every loop iteration to perform histogram calculation.
Memory footprints (from allocation instrument) keeps on increasing due to new command buffer creation in the loop.
The question is how to clear the allocated memory of the command buffer once it's executed? Or is there any way for restructuring the calculation scheme?
In short, how to deallocate the memory consumed by metal objects like command buffer, compute pipeline, encoders etc.

I would suggest looking at autoreleasepool. It is intended to be used for iterations, it frees memory each loop.
Supposing your render function is called performHistogram you can do something like this:
var myTexture: MTLTexture? = nil
autoreleasepool {
myTexture = performHistogram()
}
//myTexture has been updated and memory should be freed.

Related

FreeRTOS - The reason for the stack increase?

I have created several tasks, the body of which is the same function. Inside the function, there is a delay that is the same for every task. So, when this delay is large enough, the stack of each task is filled with 6 words less than when the delay is less.
As far as I understand, the stack of tasks increases when there is a situation with several tasks with the status Ready.
1.In this situation, some additional 6-word context is written to the task stack?
2.In my example, it turns out 6 words (24 bytes), can this value change somehow?
3.What else can affect the increase in the stack, such as jumping into an interrupt handler?
int main(void){
xTaskCreate(Task_PrintCountString, "Task_1", mySTACK_SIZE, &param1, 1, NUUL);
xTaskCreate(Task_PrintCountString, "Task_2", mySTACK_SIZE, &param2, 1, NUUL);
xTaskCreate(Task_PrintCountString, "Task_3", mySTACK_SIZE, &param3, 1, NUUL);
vTaskStartScheduler();
}
#define myDELAY 100
void Task_PrintCountString(void *pParams){
uint16_t c=0;
for(;;){
if(xSemaphoreTake(WriteCountMutex, portMAX_DELAY) == pdTRUE){
PrintCountString(*(uint8_t *)pParams, c++);
xSemaphoreGive(WriteCountMutex);
}
vTaskDelay(myDELAY/portTICK_PERIOD_MS);//When myDELAY=1, the task stack is 6 words more than when myDELAY= 100!
}
}
Address 0x20000158 is the top of the stack for one of the tasks. Similarly, others!
The only function PrintCountString always the same depth. But at the same time, the stack can grow to its maximum knowledge in several stages, going through dozens of iterations of the task cycle! It turns out that not only the context is saved to the stack, but something else?
P.S.
I use ARM CM0 port and heap_1.
I made an observation:
In the PrintCountString function there was a block for waiting for the SPI flag - while(). I replaced the DMA transfer method and removed the while() loop. The stack of both tasks began to fill up immediately to the maximum value, regardless of delays.

FreeRTOS is just C code so, with one exception, the stack usage is determined by the compiler, including the selected optimisation level. The one exception be that when a task is not running its context is saved to its stack. The context size is fixed in all cases that matter. The size of the context depends on which FreeRTOS port you are using (there are more than 40).
There are a few things in your post I'm not sure about.
when this delay is large enough, the stack of each task is filled with
6 words less than when the delay is less
As above, the C code doesn't change depending on how long a delay you have - the compiler knows nothing about FreeRTOS or the concept of a delay. You may see different stack depths depending on where the code was in the call tree when an interrupt occurs (including interrupts that cause context switches).
I look at the top of the stack and count the remaining number of bytes
to be 0xA5
You don't say which port you are using, so I don't know if the stack grows up or down. In any case the stack is filled with 0xa5 when the task is created but never touched by the kernel again so the number of 0xa5s left shows the maximum amount of stack the task has used since it was created. That value is returned by calling uxTaskGetStackHighWaterMark().

In general the stack memory is used by the microcontroller itself. There are several instructions which writes/reads some data to/from the stack.
The PUSH instruction copies the content of one register to the stack and modifies the stack pointer register
The POP instruction reads a word from the stack into a register and modifies the stack pointer register
Both instructions are managed by the compiler. If a functions is executed the compiler creates some PUSH instructions to "free" some registers to work with. At the end of the function the original register state will be restored using the POP instructions to guarantee the upcoming control flow.
and each branch instruction stores several register values to the stack
The amount of used stack memory depends on the call hierarchy.
According to your example, if the SemaphoreTake function is called and finds a released semaphore the call hierarchy is different from a situation where the semaphore is blocked. This creates a different stack usage.

Calculation of stack size in FreeRtos or TI rtos

Recently I was working with Rtos and created some tasks to perform my required actions. Although it seems like every time when I create new task with xTaskCreate or TI GUI configuration, I simply try to keep my stack size as much so that the stack must not get overflowed.
Is there any way to calculate the maximum stack size used by my task with respect to these events?
1. Stack used by global and local variable
2. Stack used by the maximum number of recursion of the function
3. Including the interrupt context switching

The compiler, compiler optimisation level, CPU architecture, local variable allocations and function call nesting depth all have a large impact on the stack size. The RTOS has minimal impact. For example, FreeRTOS will add approximately 60 bytes to the stack on a Cortex-M - which is used to store the task's context when the task is not running. Whichever method you use to calculate stack usage in your non-RTOS project can be used in your RTOS project too - then add approximately 60 bytes.
You can calculate these things, and that can be important in safety critical applications, but in other cases a more pragmatic approach is to try it and see - use the features of the RTOS to measure how much stack is actually being used and use the stack overflow detection - then adjust until you find something optimal.
http://www.freertos.org/Stacks-and-stack-overflow-checking.html
http://www.freertos.org/uxTaskGetStackHighWaterMark.html

I'm used this code:
TaskHandle_t cipTask;
UBaseType_t uxHighWaterMark;
/* Print actual size of stack has used */
for (;;) {
uxHighWaterMark = uxTaskGetStackHighWaterMark(cipTask);
Serial.println(uxHighWaterMark);
}

DirectX 12 Updating the Descriptor Heap

I'm currently writing my own graphics framework for DirectX12 (I've already written several DirectX 11 frameworks for personal game engines), and I'm currently trying to copy the methods used in the recent Hitman game for resource binding.
I'm confused about the best way to handle per-object resource binding for the SRV/CBV/UAV heap. I've watched several GDC presentations, and they all seem to gloss over this.
Only 1 SRV/CBV/UAV heap can be bound at a time, and switching the currently-bound heap in the middle of a command list can be bad for performance on some hardware by forcing a flush. Because of this, what is the best way to handle updating the heap with new descriptors? To me, it seems like each command list would:
Get a hold of a SRV/CBV/UAV heap for itself.
For each object in a subset of objects, create descriptors on the heap pointing to per-object data that was placed into a separate upload heap.
Afterwards, another command list takes this filled descriptor heap and binds it, then issues draw calls mixed with SetGraphicsRootDescriptorTable in order to move through the current descriptor heap.
This being said, several sources online (including another SO post) suggest using one large SRV/CBV/UAV heap and copying into it using CPU-visible heaps. I'm assuming they're not attempting to use the asynchronous CopyDescriptors, but rather CopyBufferRegion. I tried using CopyBufferRegion to update data per-object, but to me this seems under-performant with so many transitions between D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER and D3D12_RESOURCE_STATE_COPY_DEST. Am I misunderstanding something? Any clarity would be appreciated.

CopyDescriptors is not asynchronous, it is a CPU operation that is immediate on the CPU. It can happen anytime before a command list is executed for volatile descriptor ( after the command list operation using it is recorded ), or have to be ready at the usage for static descriptor ( root signature 1.1 ).
The usual approach is to have a large descriptor heap, keep a portion for static descriptors, then use the rest as a ring buffer, allocating descriptor table offset on demand to copy and use the needed descriptor for any draw/compute operation.
CopyBufferRegion has nothing to do here, remember that mapping buffers is also an immediate operation, so you also ring buffer a big chunk of memory for your per objet constant buffers, and you cycle into it. The only thing is that you need to make sure you do not overwrite memory or descriptor while they may still be in use, so you have to fence to prevent the case.

opencl asked about size of type Memory Model

how can know the maximum size of all type memory model for opencl (gpu & cpu)
Global Memory,Constant Memory,Local Memory,Private Memory.
and what of this type is the faster ???

"All of the memory" isn't really a relevant phrase for opencl. You use devices separately, and you poll them for their specs separately. use clGetDeviceInfo and pass in the constant/code you want to ask from the driver. Read more about this here: clGetDeviceInfo
If you want to know about total memory, you need to sum up the values for the memory types you ask about.
As for the speed of memory types, I answered a related question about this not too long ago: stack overflow.

What is the purpose of each of the memory locations, stack, heap, etc? (lost in technicalities)

Ok, I asked the difference between Stackoverflow and bufferoverflow yesterday and almost getting voted down to oblivion and no new information.
So it got me thinking and I decided to rephrase my question in the hopes that I get reply which actually solves my issue.
So here goes nothing.
I am aware of four memory segments(correct me if I am wrong). The code, data, stack and heap. Now AFAIK the the code segment stores the code, while the data segment stores the data related to the program. What seriously confuses me is the purpose of the stack and the heap!
From what I have understood, when you run a function, all the related data to the function gets stored in the stack and when you recursively call a function inside a function, inside of a function... While the function is waiting on the output of the previous function, the function and its necessary data don't pop out of the stack. So you end up with a stack overflow. (Again please correct me if I am wrong)
Also I know what the heap is for. As I have read someplace, its for dynamically allocating data when a program is executing. But this raises more questions that solves my problems. What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap? Where do arrays get stored? Is it that after my code executes all that was in my heap gets erased? All in all, please tell me about heap in a more simplified manner than just, its for malloc and alloc because I am not sure I completely understand what those terms are!
I hope people when answering don't get lost in the technicalities and can keep the terms simple for a layman to understand (even if the concept to be described is't laymanish) and keep educating us with the technical terms as we go along. I also hope this is not too big a question, because I seriously think they could not be asked separately!

What is the stack for?
Every program is made up of functions / subroutines / whatever your language of choice calls them. Almost always, those functions have some local state. Even in a simple for loop, you need somewhere to keep track of the loop counter, right? That has to be stored in memory somewhere.
The thing about functions is that the other thing they almost always do is call other functions. Those other functions have their own local state - their local variables. You don't want your local variables to interfere with the locals in your caller. The other thing that has to happen is, when FunctionA calls FunctionB and then has to do something else, you want the local variables in FunctionA to still be there, and have their same values, when FunctionB is done.
Keeping track of these local variables is what the stack is for. Each function call is done by setting up what's called a stack frame. The stack frame typically includes the return address of the caller (for when the function is finished), the values for any method parameters, and storage for any local variables.
When a second function is called, then a new stack frame is created, pushed onto the top of the stack, and the call happens. The new function can happily work away on its stack frame. When that second function returns, its stack frame is popped (removed from the stack) and the caller's frame is back in place just like it was before.
So that's the stack. So what's the heap? It's got a similar use - a place to store data. However, there's often a need for data that lives longer than a single stack frame. It can't go on the stack, because when the function call returns, it's stack frame is cleaned up and boom - there goes your data. So you put it on the heap instead. The heap is a basically unstructured chunk of memory. You ask for x number of bytes, and you get it, and can then party on it. In C / C++, heap memory stays allocated until you explicitly deallocate. In garbage collected languages (Java/C#/Python/etc.) heap memory will be freed when the objects on it aren't used anymore.
To tackle your specific questions from above:
What's the different between a stack overflow and a buffer overflow?
They're both cases of running over a memory limit. A stack overflow is specific to the stack; you've written your code (recursion is a common, but not the only, cause) so that it has too many nested function calls, or you're storing a lot of large stuff on the stack, and it runs out of room. Most OS's put a limit on the maximum size the stack can reach, and when you hit that limit you get the stack overflow. Modern hardware can detect stack overflows and it's usually doom for your process.
A buffer overflow is a little different. So first question - what's a buffer? Well, it's a bounded chunk of memory. That memory could be on the heap, or it could be on the stack. But the important thing is you have X bytes that you know you have access to. You then write some code that writes X + more bytes into that space. The compiler has probably already used the space beyond your buffer for other things, and by writing too much, you've overwritten those other things. Buffer overruns are often not seen immediately, as you don't notice them until you try to do something with the other memory that's been trashed.
Also, remember how I mentioned that return addresses are stored on the stack too? This is the source of many security issues due to buffer overruns. You have code that uses a buffer on the stack and has an overflow vulnerability. A clever hacker can structure the data that overflows the buffer to overwrite that return address, to point to code in the buffer itself, and that's how they get code to execute. It's nasty.
What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap?
I'm going to talk from a C / C++ perspective here. Assuming you've got a variable declaration:
int i;
That reserves (typically) four bytes on the stack. If instead you have:
char *buffer = malloc(100);
That actually reserves two chunks of memory. The call to malloc allocates 100 bytes on the heap. But you also need storage for the pointer, buffer. That storage is, again, on the stack, and on a 32-bit machine will be 4 bytes (64-bit machine will use 8 bytes).
Where do arrays get stored...???
It depends on how you declare them. If you do a simple array:
char str[128];
for example, that'll reserve 128 bytes on the stack. C never hits the heap unless you explicitly ask it to by calling an allocation method like malloc.
If instead you declare a pointer (like buffer above) the storage for the pointer is on the stack, the actual data for the array is on the heap.
Is it that after my code executes all that was in my heap gets erased...???
Basically, yes. The OS will clean up the memory used by a process after it exits. The heap is a chunk of memory in your process, so the OS will clean it up. Although it depends on what you mean by "clean it up." The OS marks those chunks of RAM as now free, and will reuse it later. If you had explicit cleanup code (like C++ destructors) you'll need to make sure those get called, the OS won't call them for you.
All in all, please tell me about heap in a more simplified manner than just, its for malloc and alloc?
The heap is, much like it's name, a bunch of free bytes that you can grab a piece at a time, do whatever you want with, then throw back to use for something else. You grab a chunk of bytes by calling malloc, and you throw it back by calling free.
Why would you do this? Well, there's a couple of common reasons:
You don't know how many of a thing
you need until run time (based on
user input, for example). So you
dynamically allocate on the heap as
you need them.
You need large data structures. On
Windows, for example, a thread's
stack is limited by default to 1
meg. If you're working with large
bitmaps, for example, that'll be a
fast way to blow your stack and get
a stack overflow. So you grab that
space of the heap, which is usually
much, much larger than the stack.
The code, data, stack and heap?
Not really a question, but I wanted to clarify. The "code" segment contains the executable bytes for your application. Typically code segments are read only in memory to help prevent tampering. The data segment contains constants that are compiled into the code - things like strings in your code or array initializers need to be stored somewhere, the data segment is where they go. Again, the data segment is typically read only.
The stack is a writable section of memory, and usually has a limited size. The OS will initialize the stack and the C startup code calls your main() function for you. The heap is also a writable section of memory. It's reserved by the OS, and functions like malloc and free manage getting chunks out of it and putting them back.
So, that's the overview. I hope this helps.

With respect to stack... This is precicely where the parameters and local variables of the functions / procedures are stored. To be more precise, the params and local variables of the currently executing function is only accessible from the stack... Other variables that belong to chain of functions that were executed before it will be in stack but will not be accessible until the current function completed its operations.
With respect global variables, I believe these are stored in data segment and is always accessible from any function within the created program.
With respect to Heap... These are additional memories that can be made allotted to your program whenever you need them (malloc or new)... You need to know where the allocated memory is in heap (address / pointer) so that you can access it when you need. Incase you loose the address, the memory becomes in-accessible, but the data still remains there. Depending on the platform and language this has to be either manually freed by your program (or a memory leak occurs) or needs to be garbage collected. Heap is comparitively huge to stack and hence can be used to store large volumes of data (like files, streams etc)... Thats why Objects / Files are created in Heap and a pointer to the object / file is stored in stack.

In terms of C/C++ programs, the data segment stores static (global) variables, the stack stores local variables, and the heap stores dynamically allocated variables (anything you malloc or new to get a pointer to). The code segment only stores the machine code (the part of your program that gets executed by the CPU).

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart