Why is there a stack and a heap? - memory

Why do assembly languages use both a stack and a heap? They seem redundant.

They're not redundant. Each of them has strengths and weaknesses: A stack is faster if used right, because memory allocation is trivial (push / pop). The downside is that you can only add and remove items at the top (hence the name, stack). Also, total stack space is limited, and when you run out, you have a... well, stack overflow. The heap, by contrast, allows random allocation and deallocation, and you can store large amounts of data there, but the downside is that allocation carries more overhead - for each allocated block of memory, a suitable free portion must be found, and in the long run, fragmentation of the free space needs to be avoided, and the system must track where the free blocks are.
You use the stack to pass around small, short-lived values, e.g. local counter variables, function arguments, return values, etc.; these lend themselves to a push/pop allocation style. For larger or longer-lived data structures, you use the heap.
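A rough illustration of that split in C (the function and variable names here are made up for illustration): the local counter lives on the stack and vanishes when its function returns, while the malloc'd block lives on the heap until it is explicitly freed.

#include <stdio.h>
#include <stdlib.h>

int *make_counter_on_heap(void)
{
    int *p = malloc(sizeof *p);   /* heap: survives after this function returns */
    if (p != NULL)
        *p = 0;
    return p;                     /* the caller is responsible for free() */
}

int main(void)
{
    int counter = 0;              /* stack: allocated by moving the stack pointer,
                                     released automatically when main returns */
    int *heap_counter = make_counter_on_heap();
    if (heap_counter == NULL)
        return 1;

    counter++;
    (*heap_counter)++;
    printf("stack counter: %d, heap counter: %d\n", counter, *heap_counter);

    free(heap_counter);           /* heap memory must be released explicitly */
    return 0;
}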

You could certainly construct a computing system that utilised either one of them as its only memory model. However, they both have rather different properties each with its own good and bad points. Most systems utilise both so as to get the benefits from each of them.
Stacks
A stack can be thought of as a pile of plates: you write a value on a plate and put it on top of the pile; this is called a push operation and stores a value on the stack. You can obviously also remove the top plate from the stack; this is called a pop operation. But new allocations must always be made at the top of the stack.
The stack tends to be used for local variables and for passing values between functions. Generally, stacks have the following awesome properties:
Requires only a handful of pointers to manage
Very easy to implement in hardware; most processors have built-in hardware support for a stack, making it even faster.
Very quick to allocate memory
The problem with the stack comes from the fact that items can only be added or removed at the top of the stack. Now this makes great sense when traversing up and down through function calls: pop the function's inputs from the stack, allocate space for local variables on the stack, run the function, clear the local variables from the top of the stack and push the return value onto the stack. If, on the other hand, I want to allocate some memory and, say, pass it to another thread, or in general free it far away from where it was allocated, then all of a sudden I have a problem: the stack is not in the correct position when I want to free the memory.
You could say the stack facilitates fast sequential memory allocation.
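A minimal sketch of why that is cheap, assuming a fixed buffer and made-up names: allocation is just bumping one index, freeing is moving it back, and it only works if blocks are released in reverse order of allocation.

#include <stddef.h>
#include <stdio.h>

#define STACK_SIZE 4096

static unsigned char stack_mem[STACK_SIZE];
static size_t stack_top = 0;               /* the only bookkeeping required */

/* "push": allocate by bumping the top index (real stacks also keep things aligned) */
void *stack_alloc(size_t n)
{
    if (stack_top + n > STACK_SIZE)
        return NULL;                       /* out of space: the stack overflow case */
    void *p = &stack_mem[stack_top];
    stack_top += n;
    return p;
}

/* "pop": free by moving the top index back; only valid in LIFO order */
void stack_free(size_t n)
{
    stack_top -= n;
}

int main(void)
{
    char *a = stack_alloc(100);            /* push */
    char *b = stack_alloc(200);            /* push */
    printf("a=%p b=%p\n", (void *)a, (void *)b);
    stack_free(200);                       /* pop b first... */
    stack_free(100);                       /* ...then a, in reverse order */
    return 0;
}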
Heap
Now the heap is different: each allocation is generally tracked separately. This causes a lot of overhead for allocations and deallocations, but each one can be handled independently of other memory allocations, well, until you run out of memory.
There are numerous algorithms for accomplishing this, and it is probably a bit unwise to witter on about them here, but here is a link that talks about a few good, simple heap allocation algorithms: Alternatives to malloc and new
So the heap facilitates random memory allocation, but this comes with a runtime penalty; however, that penalty is often smaller than what would be incurred if you had to handle the same situation using just the stack.
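A trivial C illustration of that flexibility: unlike a stack, heap blocks can be freed in any order, which is exactly why the allocator has to track each block separately.

#include <stdlib.h>

int main(void)
{
    char *a = malloc(100);
    char *b = malloc(2000);
    char *c = malloc(30);

    /* Unlike a stack, the middle block can be released first; the allocator
       must track every block individually to make this possible, which is
       where the extra overhead (and, over time, fragmentation) comes from. */
    free(b);
    free(a);
    free(c);
    return 0;
}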

It is about how memory is handled and managed.
The x86 architecture has different types of registers, and it offers hardware-supported memory management (segmentation, paging) and so on.
The stack is addressed through the stack pointer register; in some applications the heap lives in the data segment.
To read more, I advise you to read the following links:
http://en.wikipedia.org/wiki/Data_segment
http://en.wikipedia.org/wiki/X86_memory_segmentation
"A memory model allows a compiler to perform many important
optimizations" - Wikipedia

Related

Memory collision in Stacks

So I understand what a stack overflow is, when memory collides (and the title of this website), but what I do not understand is why new entries to the stack are placed at decreasing memory addresses. Why are they not at random memory addresses? Would that not make more sense, so that memory collision is not an issue? I am guessing there is some sort of optimization reason behind that?
** EDIT **
What I did not realize is a stack is given x amount of address space. Makes sense now but brings me to a follow-up question. Can I explicitly state how much memory I want to allocate to a stack?
"Memory collides" would better suit the term of "buffer overflow", where you write outside of the predestined space, but where it is likely to be within a different allocated memory block.
A stack overflow is not about writing outside of one's memory allocation into another memory allocation. It's just about writing outside of one's stack memory allocation. Most likely there is a guard page just outside the stack, not allocated for anything, which causes a fault on any read or write attempt.
And assigning a random address for each value pushed on the stack makes it hard to find data on the stack (and it's not a stack anymore). When the compiler or programmer knows that subsequent elements occupy subsequent addresses, then it's easy to compute those addresses just from the base pointer of the stack frame.
The answer to this question is probably complex, but basically stack operations are considered to be very primitive functions that the processor does as part of normal execution of code. (Saving return addresses and other stuff.)
So where do you put the memory management code? Where do you track the allocated addresses or add code to allocate new addresses? There really isn't anywhere to do this as these are basic operations performed by the processor itself.
Similar to the memory that holds the code itself, the stack is assumed to be set up before the code runs (and pointed to by the stack register). There really isn't any place to add complex memory management to stack memory. And so, yes, if not enough memory was provided, the stack will overflow.
Stack overflow is when you have used up all available stack space. The space available for the stack is, in most cases just an arbitrary limit chosen by the system designers. It is possible to alter this, but on modern systems, it's not really an issue - code that needs several megabytes of stack, unless the system is REALLY huge, is probably not correctly designed.
The stack grows towards zero by custom - it has to go in a defined direction or it would be very hard to follow what is going on, and a lower address is just as good as a higher one. It used to be that the stack and the heap grew towards each other, which allowed code that uses a lot of stack and not much heap to work in the same amount of memory as code that uses a smaller amount of stack and a larger amount of heap. But these days there is typically enough address space that the heap can be placed somewhere completely separate from the stack. Instead, stack overflow is detected by having a region of "reserved" memory just beyond the end of the stack that is not usable, so the OS gets a "trap" when that unavailable memory is touched, and the application can be killed.
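A small, implementation-specific way to watch the growth direction is to print the address of a local variable at increasing call depth; on most mainstream systems the addresses decrease, but the C standard guarantees no particular direction, so treat this purely as a demonstration.

#include <stdio.h>

void probe(int depth)
{
    int local;
    printf("depth %d: local at %p\n", depth, (void *)&local);
    if (depth < 4)
        probe(depth + 1);
}

int main(void)
{
    probe(0);   /* on most systems each deeper frame prints a lower address */
    return 0;
}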

Is stack in CPU or RAM?

I have a few question about stack.
Is stack in CPU or RAM?
Is stack a place to run OPcode?
Is EIP in CPU or RAM?
The stack is always in RAM. There is a stack pointer, kept in a register in the CPU, that points to the top of the stack, i.e., the address of the location at the top of the stack.
The stack is found within the RAM, not within the CPU. A segment of the process's address space is dedicated to the stack.
From Wiki:
The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory.
Which CPU are you talking about?
Some might contain memory that is used for callstacks, some contain memory that can be used for callstacks but require the OS to implement the callstack management code, and others contain no writable memory at all. For example, the x86 architecture tends to have one or more code caches and data caches built into the CPU.
Some CPUs or OSes implement operations that make specific areas of memory non-executable. To prevent stack-based buffer overflows, for example, many OSes use hardware and/or software-based data execution prevention, which might prevent stack memory from being executed as code. Some don't; it's entirely possible that an x86 CPU data cache line might be used to store both the callstack and code to be executed in faster memory.
EIP sounds like a register from the IA-32 CPU architecture. If you're referring to IA-32, then yes, EIP lives in the CPU, though many OSes will save it to and restore it from RAM in order to emulate multi-tasking.
In modern architectures the stack is mapped in RAM.
Programming languages such as C, C++ and Pascal can allocate memory in RAM; this is called heap allocation, while the variables that live within functions are stack allocated.
This has led processors and operating systems to treat the stack as mapped within a RAM segment, and for processors with a Memory Management Unit it can be anywhere in RAM. However, the Intel 8080 had a status signal indicating when it was reading from or writing to the stack, so the stack could in principle have been physically separate from RAM. I do not know whether such a machine was ever built, but think of the situation: what memory would a C pointer point to, the heap or the stack?
Should stack separation ever gain popularity, we would need both a stack pointer and a heap pointer in modern programming languages.

memory management and segmentation faults in modern day systems (Linux)

In modern-day operating systems, memory is available as an abstracted resource. A process is exposed to a virtual address space (which is independent from address space of all other processes) and a whole mechanism exists for mapping any virtual address to some actual physical address.
My questions are:
If each process has its own address space, then it should be free to access any address within it. So apart from permission-restricted sections like .data, .bss, .text, etc., one should be free to change the value at any address. But this usually gives a segmentation fault - why?
To acquire dynamic memory, we need to do a malloc. If the whole virtual address space is made available to a process, then why can't it access it directly?
Different runs of a program result in different addresses for variables (both on the stack and the heap). Why is that, when the environment for each run is the same? Does it not affect the amount of addressable memory available for use? (Does it have something to do with address space randomization?)
Some links on memory allocation (e.g. in heap).
The information available in different places is very confusing, as it mixes old and modern systems, often without distinguishing between them. It would be helpful if someone could clear these doubts up while keeping modern systems in mind, say Linux.
Thanks.
Technically, the operating system is able to allocate any memory page on access, but there are important reasons why it shouldn't or can't:
Different memory regions serve different purposes:
code. It can be read and executed, but shouldn't be written to.
literals (strings, const arrays). This memory is read-only and should be.
the heap. It can be read and written, but not executed.
the thread stack. There is no reason for two threads to access each other's stack, so the OS might as well forbid that. Moreover, the thread stack can be de-allocated when the thread ends.
memory-mapped files. Any changes to this region should affect a specific file. If the file is open for reading, the same memory page may be shared between processes because it's read-only.
the kernel space. Normally the application should not (or can not) access that region - only kernel code can. It's basically a scratch space for the kernel and it's shared between processes. The network buffer may reside there, so that it's always available for writes, no matter when the packet arrives.
...
The OS might assume that all unrecognised memory access is an attempt to allocate more heap space, but:
if an application touches the kernel memory from user code, it must be killed. On 32-bit Windows, all memory above 1<<31 (top bit set) or above 3<<30 (top two bits set) is kernel memory. You should not assume any unallocated memory region is in the user space.
if an application thinks about using a memory region but doesn't tell the OS, the OS may allocate something else to that memory (OS: sure, your file is at 0x12341234; App: but I wanted to store my data there). You could tell the OS by touching the end of your array (which is unreliable anyways), but it's easier to just call an OS function. It's just a good idea that the function call is "give me 10MB of heap", not "give me 10MB of heap starting at 0x12345678"
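As a hedged sketch of what "just call an OS function" looks like on Linux: mmap (which malloc implementations typically use under the hood for large requests) with a NULL address means "give me this much memory wherever you like", while MAP_FIXED at a hard-coded address would be the "starting at 0x12345678" variant, and is rarely a good idea.

#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    size_t len = 10 * 1024 * 1024;                 /* "give me 10MB of heap" */

    /* NULL = let the kernel choose the address; MAP_FIXED at a hard-coded
       address would be the "starting at 0x12345678" variant. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("the kernel placed the region at %p\n", p);
    munmap(p, len);
    return 0;
}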
If an application allocated memory simply by using it, it would typically never de-allocate at all. That is problematic, because the OS still has to hold the unused pages (though the Java Virtual Machine does not de-allocate either, so hey).
Different runs of a program result in different addresses for variables
This is called address space layout randomisation (ASLR) and is used, alongside proper permissions (stack space is not executable), to make buffer overflow attacks much more difficult. You can still kill the app, but not execute arbitrary code.
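You can observe the randomisation yourself on a typical Linux system with ASLR enabled: the sketch below prints a couple of addresses, and running it twice gives different values for the stack and heap (a demonstration only, nothing to rely on).

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int local;
    void *heap = malloc(16);

    /* With ASLR, the stack and heap addresses printed here change between runs. */
    printf("stack: %p  heap: %p\n", (void *)&local, heap);
    free(heap);
    return 0;
}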
Some links on memory allocation (e.g. in heap).
Do you mean, what algorithm the allocator uses? The easiest algorithm is to always allocate at the first available position, linking from each memory block to the next and storing a flag saying whether it's a free block or a used block. More advanced algorithms allocate blocks whose size is a power of two or a multiple of some fixed size to limit memory fragmentation (lots of small free blocks), or link the blocks into different structures so that a free block of sufficient size can be found faster.
An even simpler approach is to never de-allocate: just point to the first (and only) free block and hold its size. If the remaining space is too small, throw it away and ask the OS for a new one.
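A sketch of the "link each block to the next, with a used/free flag" idea, using a fixed pool and first-fit search; this is purely illustrative (real allocators also coalesce free neighbours, handle alignment more carefully and grow the pool).

#include <stddef.h>

#define POOL_SIZE 65536

struct block {
    size_t size;            /* payload size of this block */
    int    used;            /* 1 = allocated, 0 = free */
    struct block *next;     /* next block in the pool, in address order */
};

static _Alignas(struct block) unsigned char pool[POOL_SIZE];
static struct block *head = NULL;

void *simple_alloc(size_t n)
{
    /* round the request up so every header stays suitably aligned */
    n = (n + sizeof(struct block) - 1) / sizeof(struct block) * sizeof(struct block);

    if (head == NULL) {                          /* first call: one big free block */
        head = (struct block *)pool;
        head->size = POOL_SIZE - sizeof *head;
        head->used = 0;
        head->next = NULL;
    }
    for (struct block *b = head; b != NULL; b = b->next) {
        if (!b->used && b->size >= n) {          /* first fit */
            if (b->size >= n + 2 * sizeof *b) {  /* split off the remainder */
                struct block *rest = (struct block *)((unsigned char *)(b + 1) + n);
                rest->size = b->size - n - sizeof *rest;
                rest->used = 0;
                rest->next = b->next;
                b->size = n;
                b->next = rest;
            }
            b->used = 1;
            return b + 1;                        /* payload starts after the header */
        }
    }
    return NULL;                                 /* pool exhausted */
}

void simple_free(void *p)
{
    if (p != NULL)
        ((struct block *)p - 1)->used = 0;       /* mark free; no coalescing here */
}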
There's nothing magical about memory allocators. All they do is:
ask the OS for a large region, and
partition it into smaller chunks,
without
wasting too much space or
taking too long.
Anyways, the Wikipedia article about memory allocation is http://en.wikipedia.org/wiki/Memory_management .
One interesting algorithm is called "(binary) buddy blocks". It holds several pools of a power-of-two size and splits them recursively into smaller regions. Each region is then either fully allocated, fully free or split in two regions (buddies) that are not both fully free. If it's split, then one byte suffices to hold the size of the largest free block within this block.

In what situations Static Allocation fares better than Dynamic Allocation?

I was going through some of the decisions made in Xara Xtreme, an open source SVG graphics application. Their memory management decision was quite intriguing to me, since I had naively taken it for granted that on-demand dynamic allocation was the way to write an object-oriented application.
The explanation from the documentation is:
How on earth can static allocations be efficient?
If you are used to large dynamic data structures, this may seem strange to you. Firstly, all our objects (and thus allocation size) are far smaller (on average) than each dynamic area allocation within a program such as Impression. This means that though there are likely to be many holes within memory, they are small. Also, we have far more allocated objects within memory, and thus these holes quickly get filled. Furthermore, virtual memory managers will free up any pages of memory that contain no allocations and give this memory back to the operating system so that it may be used again (either by us, or by another task).
We benefit greatly from the fact that whenever we allocate memory in this manner, we do not have to move any memory about. This proved a bottleneck in ArtWorks which also had many small allocations being used concurrently. more
In brief, the presence of plenty of small objects and the need to avoid moving memory around are the reasons given for choosing static allocation. I don't have a clear understanding of the reasons mentioned.
Though this talks about static allocation, what I see from a cursory look at the code is that a block of memory is dynamically allocated at application start and kept alive until the application ends, roughly simulating static allocation.
Could you explain in what situations static allocation fares better than on-demand dynamic allocation, such that it is worth considering as the main mode of allocation in a serious application?
It's quicker because you avoid the overhead of calling a system routine to manage your storage. malloc() maintains a heap, so every request requires a scan for an appropriately-sized block, possibly resizing the block, updating the block list to mark this block as used, etc. If you're allocating a lot of small objects, this overhead can be excessive. With static allocation you can create an allocation pool and just maintain a simple bitmap to show which areas are in use. This assumes that each object is the same size, so you commonly create one pool per object type.
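A sketch of such a pool in C; the Point type and pool size are made up for illustration, and a real implementation would likely keep a free list or a search cursor instead of scanning the bitmap from the start each time.

#include <stddef.h>

#define POOL_OBJECTS 1024

typedef struct { double x, y; } Point;             /* hypothetical object type */

static Point pool[POOL_OBJECTS];                   /* one up-front block, never moved */
static unsigned char used[POOL_OBJECTS / 8];       /* 1 bit of bookkeeping per slot */

Point *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_OBJECTS; i++) {
        if (!(used[i / 8] & (1u << (i % 8)))) {    /* found a free slot */
            used[i / 8] |= (unsigned char)(1u << (i % 8));
            return &pool[i];
        }
    }
    return NULL;                                   /* pool exhausted */
}

void pool_free(Point *p)
{
    size_t i = (size_t)(p - pool);
    used[i / 8] &= (unsigned char)~(1u << (i % 8));
}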
In short, there's really no such thing as static allocation other than the space allocated for your functions themselves and other read-only kinds of memory. (Do an assemble-only "gcc -S" and look for all the memory blocks, if you're interested.) If you're making and breaking objects, you're dynamically allocating. That being said, there's nothing to stop you from tightly controlling the allocation mechanism itself.
That's what functions like mallinfo() and mallopt() do for controlling how malloc() does its magic. However, that might not even be good enough for you. If you know all your chunks are going to be the same size, you can allocate and deallocate much more efficiently. And if you know you have 3 sizes of stuff, you can keep 3 arenas of memory each with their own allocator.
On top of this, you have the situation at runtime where the process doesn't have enough room and needs to ask the OS for more - that involves a system call that is more expensive than just incrementing an array index. On Unix, it's usually brk() or sbrk() or the like. And that can take valuable time.
Another, rarer, situation is when you need to multiply-allocate things: say, three threads need to share information and it only gets freed when all three release it. That's something nonstandard and not generally covered by typical mallopt() options or even pthread-specific memory or mutex/semaphore-locked chunks.
So if you have high speed optimization issues or you are running on an embedded system where you need to squeeze all you can out of the available memory, then "static allocation", or at least controlling the allocation mechanism, may be the way to go.

When does the stack really overflow?

Is infinite recursion the only case or can it happen for other reasons?
Doesn't the stack size grow as needed same as heap?
Sorry if this question has been asked before, would appreciate links to them if that is the case.
I can't speak for all platforms, but as it happens, I've just spent some time working with Windows .exe files (I mean, actually studying the binary format of them - I know in a sense all of us here work with executable files ;) ). I'm betting that most other platforms have similar capabilities, but I'm not immediately familiar with them.
Part of the file format itself includes two values relevant to the current discussion:
typedef struct _IMAGE_OPTIONAL_HEADER {
...
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
...
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
From MSDN:
SizeOfStackReserve
The number of bytes to reserve for the stack. Only the memory specified by the SizeOfStackCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfStackCommit
The number of bytes to commit for the stack.
In other words, the linker specifies a maximum size for the program's stack. If you hit the maximum size, you overflow - no matter how you hit the maximum size. You could write a simple program to do it in one line of code just by allocating a single stack variable (say, an array) that's bigger than the maximum stack size. Or you could do it via infinite (or finite, but very deep) recursion, or just by allocating too many stack variables.
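For example, on a system with a typical default stack of a few megabytes, something like this is likely (though not guaranteed - the exact behaviour depends on the platform) to die with a stack overflow as soon as the array is touched:

#include <string.h>

int main(void)
{
    char big[16 * 1024 * 1024];    /* one local far larger than the usual stack limit */
    memset(big, 0, sizeof big);    /* touch it so the pages are actually used */
    return big[0];
}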
The Microsoft linker sets this value to 1MB by default on X86 platforms (4MB on Itanium systems). This seems small on the face of it, for a modern system. However, more modern versions of Windows interpret these values slightly differently. Instead of completely limiting the stack, it limits the physical memory the stack will use. If your stack grows beyond this, virtual memory will get involved, so you should still be good... assuming you have enough virtual memory.
Remember, it is possible to run out of memory, even on modern systems with huge amounts of RAM and plenty of virtual memory on disk. You just need to allocate really big amounts of data.
So, long story short: is it possible to overflow the stack without infinite recursion? Definitely. Is it likely? Not really, unless you're allocating really huge objects.
The stack overflows when the stack pointer is pushed out of the memory block the operating system has allocated for the stack. Some operating systems will resize the stack as it grows (IIRC Linux does this) while in others the stack size is fixed at the start of the process or thread (IIRC Windows does this).
Possible reasons for overflowing the stack:
An unbounded number of stack frames (e.g. from unbounded recursion)
Attempting to allocate large blocks from the stack
Buffer overflows for buffers allocated on the stack
There are probably other reasons as well that I can't think of off the top of my head.
This question doesn't specify which stack is "the" stack. So, here are a few answers:
Call Stack
The call stack gets overflowed whenever the number of calls on the stack overruns the amount of memory it has. The most common way is infinite recursion, but it's quite possible to have recursion that's excessive but not infinite. For example, computing the Ackermann function naively will tax any computer.
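For instance, a naive C version of the Ackermann function terminates for every input, yet its recursion depth grows so explosively that even small arguments exhaust any realistic call stack (the exact point of failure depends on the platform's stack size):

#include <stdio.h>

/* Naive Ackermann function: defined (and finite) for all m, n >= 0,
   but the recursion depth grows explosively with m. */
unsigned long ack(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;
    if (n == 0)
        return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}

int main(void)
{
    printf("ack(2, 3) = %lu\n", ack(2, 3));   /* 9: no problem */
    printf("ack(3, 3) = %lu\n", ack(3, 3));   /* 61: still fine */
    printf("ack(4, 2) = %lu\n", ack(4, 2));   /* recursion this deep will overflow
                                                 the stack long before it returns */
    return 0;
}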
Languages
Stack-based languages
Some languages, like Postscript and Forth, and some virtual machines, like the Java virtual machine, are stack-based. In these languages, it may be possible to make expressions so complex that they overflow the stack.
Context-free languages
Context-free languages are often implemented using a stack. If the strings of code in these languages get too complex, it's possible to overflow the stack.
On a laptop or desktop machine it may be unusual to overflow the stack without infinite (or very deeply nested) recursion when running from the main thread... however, stack overflows are not uncommon for:
Threaded code in which the thread has been allotted a small, fixed-sized stack.
Signal handling code in which the signal handling context has a small, fixed-sized stack.
Code executing on embedded devices, where memory is generally scarce.
As an example, if you register a signal handler using sigaction and the signal handler does anything complex (i.e. deeply nested operations), it is very easy to get a stack overflow on a number of operating systems, since signal handlers are usually allotted a small, fixed-sized stack. Similarly, if you spawn a thread with pthread_create but specify a small stack size with pthread_attr_setstacksize, then it is very easy to attain a stack overflow. On very memory-limited devices such as wireless sensors, it is an art to avoid stack overflows.
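A hedged sketch of the pthread case on a POSIX system (compile with -pthread), which also answers the earlier follow-up about choosing a stack size explicitly: the thread below is given a deliberately small fixed-size stack, so even moderately deep recursion or large locals in worker() could overflow it. PTHREAD_STACK_MIN and the exact behaviour on overflow are platform-specific.

#include <pthread.h>
#include <limits.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    char buf[4096];          /* even modest locals matter on a tiny stack */
    buf[0] = 0;              /* touch the buffer so it isn't optimised away */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Request a deliberately small fixed-size stack for the new thread;
       PTHREAD_STACK_MIN is the smallest size the platform allows. */
    pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN + 64 * 1024);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}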
My day job involves a lot of work with LotusScript in Lotus Notes, which has fixed stack limits for various scopes. E.g. most variables in a procedure/function must fit in a 32kB stack, except that the content of class variables is stored on the heap.
If fixed-size variables exceed the stack size, code won't compile.
Run-time stack overflows can occur with recursion. This is easy to achieve in LotusScript as it limits recursion of any single function to a 32kB stack. I gave up on using a recursive QuickSort years ago because of this.
If your program exceeds its alloted stack space without any infinite recursion going on, then you're doing something wrong.
Though it can happen if you leave off some asterisks and try to pass some huge buffers by value.
The memory allocated for the stack does generally grow as needed within reasonable boundaries - I'm not sure what the upper limit is on various systems.
