Memory padding and alignment for local variables in functions in C/C++

I know what padding and alignment are and how they are used in structs in C/C++.
My question is: do I need to minimize padding for the local variables inside functions as well, or is their memory overhead negligible?
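For illustration, here is a minimal C sketch (assuming a typical 64-bit target where double is 8-byte aligned; exact sizes and addresses are implementation-defined) showing the padding inside a struct and that locals follow the same alignment rules. In practice the compiler may reorder or eliminate locals, so hand-tuning their padding rarely pays off:

    #include <stdio.h>
    #include <stdalign.h>

    /* 'c' is typically followed by 7 bytes of padding so that 'd' stays
       8-byte aligned; sizeof(struct padded) is usually 16, not 9. */
    struct padded {
        char   c;   /* 1 byte            */
                    /* ~7 bytes padding  */
        double d;   /* 8 bytes           */
    };

    int main(void)
    {
        /* Locals obey the same alignment rules, but the compiler may
           reorder them or optimize them away entirely. */
        char   c = 'x';
        double d = 1.0;

        printf("sizeof(struct padded) = %zu\n", sizeof(struct padded));
        printf("alignof(double)       = %zu\n", alignof(double));
        printf("&c = %p, &d = %p\n", (void *)&c, (void *)&d);
        return 0;
    }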

Related

Stack Pointer is decremented to allocate space for local variables when a function is called

I read somewhere that Stack Pointer is decremented to allocate space for local variables when a function is called.
I don't understand why this is true, because in my understanding it should be incremented. Can somebody please explain?
First, it does not really matter whether it is increased or decreased - the difference is only whether the CPU increases or decreases SP on push/pop operations. This does not affect at all what the essence of a stack is: we read data from it in exactly the opposite order in which we put it in.
The reason for it is historical: on machines without paging-based virtual memory support, we have a fixed address space. The code, the heap and the stack all have to be placed in it somehow - without overwriting each other.
The code part of the program typically does not change (except for self-modifying code at the ASM level - nearly surreal today, and rare even long ago). The size of the heap (data) segment sometimes grows, sometimes shrinks, and it also fragments. The stack only grows or shrinks, but it does not fragment.
This resulted in the typical memory layout of a process address space being:
code (at the beginning)
heap (right after the code, but note: its size varies and it cannot overlap with the stack!)
stack (because we do not know how the heap will grow, it needs to be placed as far away from the data as possible).
This means the stack has to be at the end of the address space, and to make its growth possible, the stack pointer has to be decreased on data insertion.
Other memory layouts were also possible; there were CPUs where the stack grew upwards on data insertion.
Later, with the appearance of paging-based memory virtualization, this problem was largely solved (although a downward-growing stack was still better if the virtual address space was not big enough). But there was no need to break compatibility for a zero-to-little improvement.
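As an aside, you can informally observe which way the stack grows on your platform with a small C sketch like the one below. Comparing addresses of unrelated objects is not strictly defined by the C standard, and inlining or optimization can blur the picture, so treat this as a diagnostic rather than portable code:

    #include <stdio.h>
    #include <stdint.h>

    /* If the callee's local lives at a lower address than the caller's,
       the stack grows downwards (the common case on x86 and ARM).
       Compile without aggressive optimization so the frames stay separate. */
    static void callee(uintptr_t caller_local)
    {
        int x = 0;
        printf("caller local at %p, callee local at %p -> stack grows %s\n",
               (void *)caller_local, (void *)&x,
               (uintptr_t)&x < caller_local ? "downwards" : "upwards");
    }

    int main(void)
    {
        int y = 0;
        callee((uintptr_t)&y);
        return 0;
    }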

Min and Max Stack Sizes in Delphi

I came across an option in the Delphi 6 IDE: the Min and Max stack sizes.
How does changing the memory stack sizes here affect the IDE? If I increase this value would there be more memory available for the IDE?
No, the stack size does not affect how the IDE works.
This is a linker option; it defines how much stack will be available to your compiled program - at most the max stack size.
The stack is used to hold local variables and sometimes function arguments. You seldom need to increase the stack sizes if the application design is reasonably good. A stack overflow, if it happens, is usually the result of unbounded recursion caused by a logical mistake, or of defining too-large local variables (for example, static arrays).
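For illustration, here is a sketch of those two failure modes in C (the Delphi equivalents would be an unbounded recursive routine and a large static array declared as a local variable). The functions are only defined, not called, since actually running them would crash:

    #include <string.h>

    /* 1. Unbounded recursion: every call adds another frame until the
          stack limit is hit. */
    static unsigned long long runaway(unsigned n)
    {
        return n + runaway(n + 1);      /* no base case */
    }

    /* 2. One oversized local: a single frame larger than the whole stack. */
    static void huge_local(void)
    {
        char buffer[8 * 1024 * 1024];   /* 8 MB local on a ~1 MB stack */
        memset(buffer, 0, sizeof buffer);
    }

    int main(void)
    {
        /* Deliberately not calling either function; on a typical 1 MB
           stack both would end in a stack overflow. */
        (void)runaway; (void)huge_local;
        return 0;
    }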
P.S. What problem are you going to solve?

Why does the stack overflow?

OK, so my understanding of how executables are laid out in memory is... imagine a square box that represents the memory accessible by your app.
The program code resides at the bottom of memory, the stack is allocated to a spot just beyond the program code and grows upwards, and the heap starts at the top of memory and grows downwards.
If this is the case, why is it possible to allocate more heap memory than stack memory?
Because even on modern systems with lots of virtual memory available, the maximum size of the call stack is usually deliberately limited to, say, 1MB.
This is not usually a fundamental limit; it's possible to modify this (using e.g. setrlimit() in Linux, or the -Xss flag for Java). But needing to do so usually indicates an abnormal program; if you have large data-sets, they should normally be stored on the heap.
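As a sketch of the Linux side of that: a process can query its own stack limit, and try to raise the soft limit up to the hard limit, with getrlimit()/setrlimit(). Whether a raised limit helps an already-running main thread is platform-dependent, so setting it before startup (e.g. with ulimit -s) is the more usual approach:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("soft stack limit: %llu, hard limit: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

        rl.rlim_cur = 16ULL * 1024 * 1024;      /* ask for a 16 MB soft limit   */
        if (rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;          /* cannot exceed the hard limit */
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            perror("setrlimit");

        return 0;
    }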

Why can NAND flash memory cells only be directly written to when they are empty?

I'm trying to understand why you have to erase cells before writing to them with respect to SSDs and how they slow down over time.
Here is how writing to NAND and erasing works:
When a block is erased, all the bits are set to 1. To change bits from 1 to 0, bits are programmed (written to). Programming cannot change bits from 0 to 1.
Let's suppose you have to store the value 11001100. First, the block needs to be erased so that it is all 1s (11111111). Then the particular bits are programmed (11001100). Now the same memory location cannot be reprogrammed to 11111100, because programming cannot change a 0 to a 1.
This is why NAND finds a free/empty page that is all 1s and then changes the specific bits from 1 to 0. The conventional idea that writing can change 1s to 0s and 0s to 1s is not true for NAND flash. The fact that the NAND programming operation can only change bits from 1 to 0 means that we need an erased page before we start writing.
The following figure shows the relationship between pages and blocks, from an article on flashdba.com:
For more, consult this introduction to NAND flash by Micron.
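If it helps, that 1-to-0 rule can be modelled in a few lines of C: programming behaves like a bitwise AND with the existing contents, and only an erase brings the 1s back. This is a toy model of the rule itself, not of how a real controller is driven:

    #include <stdio.h>
    #include <stdint.h>

    static uint8_t cell = 0xFF;                 /* erased: all bits 1 */

    static void program(uint8_t data) { cell &= data; }  /* can only clear bits */
    static void erase(void)           { cell = 0xFF; }   /* back to all 1s      */

    int main(void)
    {
        program(0xCC);                              /* 11111111 -> 11001100 */
        printf("after program 0xCC: 0x%02X\n", cell);

        program(0xFC);                              /* try for 11111100...  */
        printf("after program 0xFC: 0x%02X\n", cell);   /* still 0xCC       */

        erase();                                    /* erase restores the 1s */
        program(0xFC);
        printf("after erase+program: 0x%02X\n", cell);  /* now 0xFC          */
        return 0;
    }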
Why erasure before write?
Erasing a cell means removing most electrons from its floating gate. No electrons in the floating gate commonly represents binary 1:
This and the next illustration are from "How Does Flash Memory Work? (SSD)" by BLITZ.
Assuming one cell represents one bit (a single-level cell, or SLC for short), you could informally say that erasing is the write operation for a 1: you set the cell/bit to 1 by erasing the cell, which means removing most electrons from the floating gate. Filling the floating gate with electrons, on the other hand, is how setting the bit to 0 is physically implemented:
A non-empty cell with a valid amount of electrons—representing a zero for SLC—is referred to as a programmed cell.
Of significance here is that erasure (setting bits to 1) is rather coarse-grained in flash memory: you erase whole blocks, which typically consist of about 40,000 cells. Erasing cells in bulk is faster than erasing individual cells. This whole-block erasing strategy is what differentiates flash memory from EEPROM. Tangentially, it is also what gave flash memory its name:
According to Toshiba, the name "flash" was suggested by Masuoka's [the inventor of flash] colleague, Shōji Ariizumi, because the erasure process of the memory contents reminded him of the flash of a camera. — From Wikipedia
When one cell represents multiple bits via the amount of electrons it contains, I can think of two reasons why you'd want to empty the cell before putting electrons into it:
You can only increase the amount of electrons in an individual cell. The only way of lowering the electron count is erasing, and that means removing (nearly) all electrons from the whole block of cells. I'm not aware of any SSD taking the shortcut of representing a different state by "just" increasing the amount of electrons in a cell without emptying it first.
As described in this video lecture by Jisung Park of ETH Zürich: writing to cells is unreliable; getting exactly the right amount of electrons in is difficult and takes multiple tries (see also incremental step-pulse programming). Starting with an empty cell and filling it gradually until it has the right amount of electrons is more reliable than trying to add the correct amount with other electrons already present in the cell. And, as mentioned above, you'd only be able to increase, not decrease, the electrons in individual cells.
Slowdown of cells over time
Moving electrons in and out of the floating gate physically damages the insulating barriers around it. After enough writes and erasures, the silicon dioxide (SiO₂) starts leaking the electrons stored in the floating gate. Left uncorrected, those leaks cause data corruption: for example, a cell's value changes from 101 to 011 by itself due to lost electrons. To avoid this, the SSD regularly refreshes such cells.
Another problem with aging cells is that reading from or writing to them may take longer, since they become less reliable after too many electrons have been forced through the silicon dioxide insulators around the floating gate.
As shown in the following annotated screenshot from an excellent video by Branch Education, those silicon dioxide (SiO₂) insulators can be fewer than 100 atoms, about 8 nanometres, wide:
The Wikipedia article seems to at least hint at the answer. It appears that "tunnel injection" is used for writing and "tunnel release" for erasing. I'll leave it to the physicists to explain exactly what the implications of that are.
I'm trying to understand why you have to erase cells before writing to them
You don't have to erase a flash memory cell before writing to it. However, you can only write to one entire block of cells at a time. Typically these blocks of cells are at least 128KB in size.
So suppose you are writing a 4KB file to your SSD. Well, you have to write one 128KB block at a time. If there is already data in that 128KB block, the drive firmware has to read the 128KB block into its memory, modify the 4KB section you are writing to, and then write the entire 128KB block back out to the flash memory.
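Roughly, that read-modify-write cycle looks like the sketch below. It is a toy in-RAM model with made-up function names, just to show the sequence; real drives add wear levelling and usually write the updated data to a different, already-erased block rather than erasing in place:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE (128 * 1024)     /* erase-block size from the answer */
    #define NUM_BLOCKS 4

    static uint8_t flash[NUM_BLOCKS][BLOCK_SIZE];   /* simulated flash array */

    static void nand_read_block(unsigned b, uint8_t *buf)  { memcpy(buf, flash[b], BLOCK_SIZE); }
    static void nand_erase_block(unsigned b)               { memset(flash[b], 0xFF, BLOCK_SIZE); }
    static void nand_program_block(unsigned b, const uint8_t *buf)
    {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            flash[b][i] &= buf[i];      /* programming can only clear bits */
    }

    /* Overwrite 'len' bytes at 'offset' in a block that already holds data:
       read the whole block, patch the small piece, erase, program it back. */
    static void rewrite_in_place(unsigned b, size_t offset,
                                 const uint8_t *data, size_t len)
    {
        static uint8_t buf[BLOCK_SIZE];
        nand_read_block(b, buf);          /* 1. read 128 KB into RAM      */
        memcpy(buf + offset, data, len);  /* 2. modify the 4 KB piece     */
        nand_erase_block(b);              /* 3. erase: all bits back to 1 */
        nand_program_block(b, buf);       /* 4. program the whole block   */
    }

    int main(void)
    {
        uint8_t file[4096];
        memset(file, 0xAB, sizeof file);  /* the "4 KB file" being written */
        nand_erase_block(0);
        rewrite_in_place(0, 8192, file, sizeof file);
        printf("byte at offset 8192: 0x%02X\n", flash[0][8192]);  /* 0xAB */
        return 0;
    }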
The way modern flash chips are designed, it's easier to program a cell in one direction than the other. If a chip holding 16,777,216 bytes in 256 blocks of 65,536 bytes each can only be erased as a unit, then it will require ~128 million "little" circuits to allow programming of the individual bits, and 256 "large" circuits to erase those blocks. For the chip to allow pages of 256 bytes to be erased would require 65,536 of those "large" circuits. I'm not sure what fraction of the chip would be used up by that many page-erase circuits, but it would be significant. Using larger erase blocks allows chips to be manufactured more cheaply; for many applications, a cheaper chip with large erase blocks is preferable to a more costly chip with smaller ones.
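Spelling out the arithmetic from that example (same figures, just computed):

    #include <stdio.h>

    int main(void)
    {
        long chip_bytes  = 16777216L;   /* 16 MiB chip                    */
        long block_bytes = 65536L;      /* one erase block                */
        long page_bytes  = 256L;        /* hypothetical small erase page  */

        printf("bit-program circuits : %ld\n", chip_bytes * 8);           /* ~128 million */
        printf("block-erase circuits : %ld\n", chip_bytes / block_bytes); /* 256          */
        printf("page-erase circuits  : %ld\n", chip_bytes / page_bytes);  /* 65,536       */
        return 0;
    }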

memory allocation in small memory devices

Some systems, such as Symbian, insist that people use the heap instead of the stack when allocating
big objects (such as pathnames, which may be more than 512 bytes). Is there any specific reason for this?
Generally the stack on an embedded device is fixed and quite small; for example, 8KB is the default stack size on Symbian.
Consider that a maximum-length filename is 256 bytes; double that for Unicode and you are already at 512 bytes (1/16th of your whole stack) for just one filename. So you can imagine that it is quite easy to use up the stack if you're not careful.
Most Symbian devices do come with an MMU but, until very recently, did not support paging. This means that physical RAM is committed for every running process. Each thread on Symbian (usually) has a fixed 8KB stack. If each thread has a stack, then increasing its size from 8KB to, say, 32KB would have a large impact on the memory requirements of the device.
The heap is global. Increasing its size, if you need to do so, has far less impact. So, on Symbian, the stack is for small data items only - allocate larger ones from the heap.
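As an illustration in plain C (rather than Symbian C++, and with a made-up MAX_PATH_BYTES constant and path), putting a pathname buffer on the heap instead of the stack looks roughly like this; the caller's frame stays tiny and the 512 bytes come out of the shared heap:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_PATH_BYTES 512          /* illustrative: 256 chars * 2 for Unicode */

    /* Build a path in a heap buffer; the caller owns and frees it. */
    static int build_path_on_heap(const char *name, char **out)
    {
        char *path = malloc(MAX_PATH_BYTES);    /* costs heap, not stack */
        if (path == NULL)
            return -1;
        snprintf(path, MAX_PATH_BYTES, "\\private\\app\\%s", name);
        *out = path;
        return 0;
    }

    int main(void)
    {
        char *p;
        if (build_path_on_heap("settings.ini", &p) == 0) {
            printf("%s\n", p);
            free(p);
        }
        return 0;
    }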
Embedded devices often have a fixed-size stack. Since a subroutine call in C only needs to push a few words onto the stack, a few hundred bytes may suffice (if you avoid recursive function calls).
Most embedded devices don't come with a memory management unit, so there is no way for the OS to grow the stack space automatically and transparently to the programmer. Even assuming a growable stack, you would have to manage it yourself, which is no better than heap allocation and defeats the purpose of using a stack in the first place.
The stack on embedded devices usually resides in a very small amount of high-speed memory. If you allocate large objects on the stack of such a device, you risk a stack overflow.
