On computer memory, say IA32, what is the range of stack in general? I know an address like 0xffff1234 is probably on the stack. Is it possible for the stack to grow to 0x0800abcd, for example? How about the heap? I know the heap address is always lower than the stack address, but what is normally its range? Also what is the area below heap?
The stack - The memory the program uses to actually run the program. This contains local variables, call-back data (for example when you call a function, the stack stores the state and place you were in the code before you entered the new function), and some other little things of that nature. You usually don't control the stack directly, the variables and data are destroyed, created when you move in and out function scopes.
The heap - The "dynamic" memory of the program. Each time you create a new object or variable dynamically, it is stored on the heap. This memory is controlled by the programmer directly, you are supposed to take care of the creation AND deletion of the objects there.
Thanks a lot!
Stack:
You can define the size of your stack in linking time.
As I know, windows app default stack size is 2MB.
You can change the size of stack in your project setting. But when App is built, stack size is fixed.
And OS would set guard page for stack overflow. If any operation try to access the guard page would trigger EXCEPTION.
Heap:
Default heap size i guess also can be changed in project settings.
Because in your App, you can create your own heap, or use CRT heap, Win32 heap. So there should be lots of heaps.
When your try to allocate memory, Heap Manager based on algorithm to allocate memory. If there's not enough memory, Heap Manager would apply for memory from Virtual Memory Manager. Until there's not enough memory in User Address Space, throw exception: Out of Memory.
There's several definitions: HeapNode, HeapSegment, LFH, LEA, BEA.
And you can use Windbg: !heap -s, !heap -a, these commands to check the structure of Windows Heap.
Related
I have several doubts about processes and memory management. List the main. I'm slowly trying to solve them by myself but I would still like some help from you experts =).
I understood that the data structures associated with a process are more or less these:
text, data, stack, kernel stack, heap, PCB.
If the process is created but the LTS decides to send it to secondary memory, are all the data structures copied for example on SSD or maybe just text and data (and PCB in kernel space)?
Pagination allows you to allocate processes in a non-contiguous way:
How does the kernel know if the process is trying to access an illegal memory area? After not finding the index on the page table, does the kernel realize that it is not even in virtual memory (secondary memory)? If so, is an interrupt (or exception) thrown? Is it handled immediately or later (maybe there was a process switch)?
If the processes are allocated non-contiguously, how does the kernel realize that there has been a stack overflow since the stack typically grows down and the heap up? Perhaps the kernel uses virtual addresses in PCBs as memory pointers that are contiguous for each process so at each function call it checks if the VIRTUAL pointer to the top of the stack has touched the heap?
How do programs generate their internal addresses? For example, in the case of virtual memory, everyone assumes starting from the address 0x0000 ... up to the address 0xffffff ... and is it then up to the kernel to proceed with the mapping?
How did the processes end? Is the system call exit called both in case of normal termination (finished last instruction) and in case of killing (by the parent process, kernel, etc.)? Does the process itself enter kernel mode and free up its associated memory?
Kernel schedulers (LTS, MTS, STS) when are they invoked? From what I understand there are three types of kernels:
separate kernel, below all processes.
the kernel runs inside the processes (they only change modes) but there are "process switching functions".
the kernel itself is based on processes but still everything is based on process switching functions.
I guess the number of pages allocated the text and data depend on the "length" of the code and the "global" data. On the other hand, is the number of pages allocated per heap and stack variable for each process? For example I remember that the JVM allows you to change the size of the stack.
When a running process wants to write n bytes in memory, does the kernel try to fill a page already dedicated to it and a new one is created for the remaining bytes (so the page table is lengthened)?
I really thank those who will help me.
Have a good day!
I think you have lots of misconceptions. Let's try to clear some of these.
If the process is created but the LTS decides to send it to secondary memory, are all the data structures copied for example on SSD or maybe just text and data (and PCB in kernel space)?
I don't know what you mean by LTS. The kernel can decide to send some pages to secondary memory but only on a page granularity. Meaning that it won't send a whole text segment nor a complete data segment but only a page or some pages to the hard-disk. Yes, the PCB is stored in kernel space and never swapped out (see here: Do Kernel pages get swapped out?).
How does the kernel know if the process is trying to access an illegal memory area? After not finding the index on the page table, does the kernel realize that it is not even in virtual memory (secondary memory)? If so, is an interrupt (or exception) thrown? Is it handled immediately or later (maybe there was a process switch)?
On x86-64, each page table entry has 12 bits reserved for flags. The first (right-most bit) is the present bit. On access to the page referenced by this entry, it tells the processor if it should raise a page-fault. If the present bit is 0, the processor raises a page-fault and calls an handler defined by the OS in the IDT (interrupt 14). Virtual memory is not secondary memory. It is not the same. Virtual memory doesn't have a physical medium to back it. It is a concept that is, yes implemented in hardware, but with logic not with a physical medium. The kernel holds a memory map of the process in the PCB. On page fault, if the access was not within this memory map, it will kill the process.
If the processes are allocated non-contiguously, how does the kernel realize that there has been a stack overflow since the stack typically grows down and the heap up? Perhaps the kernel uses virtual addresses in PCBs as memory pointers that are contiguous for each process so at each function call it checks if the VIRTUAL pointer to the top of the stack has touched the heap?
The processes are allocated contiguously in the virtual memory but not in physical memory. See my answer here for more info: Each program allocates a fixed stack size? Who defines the amount of stack memory for each application running?. I think stack overflow is checked with a page guard. The stack has a maximum size (8MB) and one page marked not present is left underneath to make sure that, if this page is accessed, the kernel is notified via a page-fault that it should kill the process. In itself, there can be no stack overflow attack in user mode because the paging mechanism already isolates different processes via the page tables. The heap has a portion of virtual memory reserved and it is very big. The heap can thus grow according to how much physical space you actually have to back it. That is the size of the swap file + RAM.
How do programs generate their internal addresses? For example, in the case of virtual memory, everyone assumes starting from the address 0x0000 ... up to the address 0xffffff ... and is it then up to the kernel to proceed with the mapping?
The programs assume an address (often 0x400000) for the base of the executable. Today, you also have ASLR where all symbols are kept in the executable and determined at load time of the executable. In practice, this is not done much (but is supported).
How did the processes end? Is the system call exit called both in case of normal termination (finished last instruction) and in case of killing (by the parent process, kernel, etc.)? Does the process itself enter kernel mode and free up its associated memory?
The kernel has a memory map for each process. When the process dies via abnormal termination, the memory map is crossed and cleared off of that process's use.
Kernel schedulers (LTS, MTS, STS) when are they invoked?
All your assumptions are wrong. The scheduler cannot be called otherwise than with a timer interrupt. The kernel isn't a process. There can be kernel threads but they are mostly created via interrupts. The kernel starts a timer at boot and, when there is a timer interrupt, the kernel calls the scheduler.
I guess the number of pages allocated the text and data depend on the "length" of the code and the "global" data. On the other hand, is the number of pages allocated per heap and stack variable for each process? For example I remember that the JVM allows you to change the size of the stack.
The heap and stack have portions of virtual memory reserved for them. The text/data segment start at 0x400000 and end wherever they need. The space reserved for them is really big in virtual memory. They are thus limited by the amount of physical memory available to back them. The JVM is another thing. The stack in JVM is not the real stack. The stack in JVM is probably heap because JVM allocates heap for all the program's needs.
When a running process wants to write n bytes in memory, does the kernel try to fill a page already dedicated to it and a new one is created for the remaining bytes (so the page table is lengthened)?
The kernel doesn't do that. On Linux, the libstdc++/libc C++/C implementation does that instead. When you allocate memory dynamically, the C++/C implementation keeps track of the allocated space so that it won't request a new page for a small allocation.
EDIT
Do compiled (and interpreted?) Programs only work with virtual addresses?
Yes they do. Everything is a virtual address once paging is enabled. Enabling paging is done via a control register set at boot by the kernel. The MMU of the processor will automatically read the page tables (among which some are cached) and will translate these virtual addresses to physical ones.
So do pointers inside PCBs also use virtual addresses?
Yes. For example, the PCB on Linux is the task_struct. It holds a field called pgd which is an unsigned long*. It will hold a virtual address and, when dereferenced, it will return the first entry of the PML4 on x86-64.
And since the virtual memory of each process is contiguous, the kernel can immediately recognize stack overflows.
The kernel doesn't recognize stack overflows. It will simply not allocate more pages to the stack then the maximum size of the stack which is a simple global variable in the Linux kernel. The stack is used with push pops. It cannot push more than 8 bytes so it is simply a matter of reserving a page guard for it to create page-faults on access.
however the scheduler is invoked from what I understand (at least in modern systems) with timer mechanisms (like round robin). It's correct?
Round-robin is not a timer mechanism. The timer is interacted with using memory mapped registers. These registers are detected using the ACPI tables at boot (see my answer here: https://cs.stackexchange.com/questions/141870/when-are-a-controllers-registers-loaded-and-ready-to-inform-an-i-o-operation/141918#141918). It works similarly to the answer I provided for USB (on the link I provided here). Round-robin is a scheduler priority scheme often called naive because it simply gives every process a time slice and executes them in order which is not currently used in the Linux kernel (I think).
I did not understand the last point. How is the allocation of new memory managed.
The allocation of new memory is done with a system call. See my answer here for more info: Who sets the RIP register when you call the clone syscall?.
The user mode process jumps into a handler for the system call by calling syscall in assembly. It jumps to an address specified at boot by the kernel in the LSTAR64 register. Then the kernel jumps to a function from assembly. This function will do the stuff the user mode process requires and return to the user mode process. This is often not done by the programmer but by the C++/C implementation (often called the standard library) that is a user mode library that is linked against dynamically.
The C++/C standard library will keep track of the memory it allocated by, itself, allocating some memory and by keeping records. Then, if you ask for a small allocation, it will use the pages it already allocated instead of requesting new ones using mmap (on Linux).
In a typical memory layout there are 4 items:
code/text (where the compiled code of the program itself resides)
data
stack
heap
I am new to memory layouts so I am wondering if v8, which is a JIT compiler and dynamically generates code, stores this code in the "code" segment of the memory, or just stores it in the heap along with everything else. I'm not sure if the operating system gives you access to the code/text so not sure if this is a dumb question.
The below is true for the major operating systems running on the major CPUs in common use today. Things will differ on old or some embedded operating systems (in particular things are a lot simpler on operating systems without virtual memory) or when running code without an OS or on CPUs with no support for memory protection.
The picture in your question is a bit of a simplification. One thing it does not show is that (virtual) memory is made up of pages provided to you by the operating system. Each page has its own permissions controlling whether your process can read, write and/or execute the data in that page.
The text section of a binary will be loaded onto pages that are executable, but not writable. The read-only data section will be loaded onto pages that are neither writable nor executable. All other memory in your picture ((un)initialized data, heap, stack) will be stored on pages that are writable, but not executable.
These permissions prevent security flaws (such as buffer overruns) that could otherwise allow attackers to execute arbitrary code by making the program jump into code provided by the attacker or letting the attacker overwrite code in the text section.
Now the problem with these permissions, with regards to JIT compilation, is that you can't execute your JIT-compiled code: if you store it on the stack or the heap (or within a global variable), it won't be on an executable page, so the program will crash when you try to jump into the code. If you try to store it in the text area (by making use of left-over memory on the last page or by overwriting parts of the JIT-compilers code), the program will crash because you're trying to write to read-only memory.
But thankfully operating systems allow you to change the permissions of a page (on POSIX-systems this can be done using mprotect and on Windows using VirtualProtect). So your first idea might be to store the generated code on the heap and then simply make the containing pages executable. However this can be somewhat problematic: VirtualProtect and some implementations of mprotect require a pointer to the beginning of a page, but your array does not necessarily start at the beginning of a page if you allocated it using malloc (or new or your language's equivalent). Further your array may share a page with other data, which you don't want to be executable.
To prevent these issues, you can use functions, such as mmap on Unix-like operating systems and VirtualAlloc on Windows, that give you pages of memory "to yourself". These functions will allocate enough pages to contain as much memory as you requested and return a pointer to the beginning of that memory (which will be at the beginning of the first page). These pages will not be available to malloc. That is, even if you array is significantly smaller than the size of a page on your OS, the page will only be used to store your array - a subsequent call to malloc will not return a pointer to memory in that page.
So the way that most JIT-compilers work is that they allocate read-write memory using mmap or VirtualAlloc, copy the generated machine instructions into that memory, use mprotect or VirtualProtect to make the memory executable and non-writable (for security reasons you never want memory to be executable and writable at the same time if you can avoid it) and then jump into it. In terms of its (virtual) address, the memory will be part of the heap's area of the memory, but it will be separate from the heap in the sense that it won't be managed by malloc and free.
Heap and stack are the memory regions where programs can allocate at runtime. This is not specific to V8, or JIT compilers. For more detail, I humbly suggest that you read whatever book that illustration came from ;-)
So I understand what a stack overflow is, when memory collides (and the title of this website) but what I do not understand is why new entries to the stack are in a decremental memory address. Why are they not in a random memory address, would it not make more sense so that memory collision is not an issue? I am guessing there is some sort of optimizing reason behind that?
** EDIT **
What I did not realize is a stack is given x amount of address space. Makes sense now but brings me to a follow-up question. Can I explicitly state how much memory I want to allocate to a stack?
"Memory collides" would better suit the term of "buffer overflow", where you write outside of the predestined space, but where it is likely to be within a different allocated memory block.
A stack overflow is not about writing outside of one's memory allocation into another memory allocation. It's just about writing outside of one's stack memory allocation. Most likely outside of the stack there's a guard memory page, that is not allocated for anything and which causes a fault on a read or write attempt.
And assigning a random address for each value pushed on the stack makes it hard to find data on the stack (and it's not a stack anymore). When the compiler or programmer knows that subsequent elements occupy subsequent addresses, then it's easy to compute those addresses just from the base pointer of the stack frame.
The answer to this question is probably complex, but basically stack operations are considered to be very primitive functions that the processor does as part of normal execution of code. (Saving return addresses and other stuff.)
So where do you put the memory management code? Where do you track the allocated addresses or add code to allocate new addresses? There really isn't anywhere to do this as these are basic operations performed by the processor itself.
Similar to the memory that holds the code itself, the stack is assumed to be setup before the code runs (and pointed to by the stack register). There really isn't any place to add complex memory management to stack memory. And so, yes, if not enough memory was provided, the stack will overflow.
Stack overflow is when you have used up all available stack space. The space available for the stack is, in most cases just an arbitrary limit chosen by the system designers. It is possible to alter this, but on modern systems, it's not really an issue - code that needs several megabytes of stack, unless the system is REALLY huge, is probably not correctly designed.
The stack grows towards zero from "custom" - it has to go in a defined direction or it would be very hard to follow what is going on, and lower adddress is just as good as higher address. It used to be that stack and heap grew towards each other, which would allow code that uses a lot of stack and not so much heap to work in the same amount of memory as something that uses a smaller amount of stack and a larger amount of heap. But these days, there is typically enough memory (space) that the heap can be defined to be somewhere completely separate from the stack. Instead the stack overflow is detected by having a region of "reserved" memory just at the top of the stack that is not usable - so the OS gets a "trap" for using memory that isn't available, and the application can be killed.
In modern-day operating systems, memory is available as an abstracted resource. A process is exposed to a virtual address space (which is independent from address space of all other processes) and a whole mechanism exists for mapping any virtual address to some actual physical address.
My doubt is:
If each process has its own address space, then it should be free to access any address in the same. So apart from permission restricted sections like that of .data, .bss, .text etc, one should be free to change value at any address. But this usually gives segmentation fault, why?
For acquiring the dynamic memory, we need to do a malloc. If the whole virtual space is made available to a process, then why can't it directly access it?
Different runs of a program results in different addresses for variables (both on stack and heap). Why is it so, when the environments for each run is same? Does it not affect the amount of addressable memory available for usage? (Does it have something to do with address space randomization?)
Some links on memory allocation (e.g. in heap).
The data available at different places is very confusing, as they talk about old and modern times, often not distinguishing between them. It would be helpful if someone could clarify the doubts while keeping modern systems in mind, say Linux.
Thanks.
Technically, the operating system is able to allocate any memory page on access, but there are important reasons why it shouldn't or can't:
different memory regions serve different purposes.
code. It can be read and executed, but shouldn't be written to.
literals (strings, const arrays). This memory is read-only and should be.
the heap. It can be read and written, but not executed.
the thread stack. There is no reason for two threads to access each other's stack, so the OS might as well forbid that. Moreover, the tread stack can be de-allocated when the tread ends.
memory-mapped files. Any changes to this region should affect a specific file. If the file is open for reading, the same memory page may be shared between processes because it's read-only.
the kernel space. Normally the application should not (or can not) access that region - only kernel code can. It's basically a scratch space for the kernel and it's shared between processes. The network buffer may reside there, so that it's always available for writes, no matter when the packet arrives.
...
The OS might assume that all unrecognised memory access is an attempt to allocate more heap space, but:
if an application touches the kernel memory from user code, it must be killed. On 32-bit Windows, all memory above 1<<31 (top bit set) or above 3<<30 (top two bits set) is kernel memory. You should not assume any unallocated memory region is in the user space.
if an application thinks about using a memory region but doesn't tell the OS, the OS may allocate something else to that memory (OS: sure, your file is at 0x12341234; App: but I wanted to store my data there). You could tell the OS by touching the end of your array (which is unreliable anyways), but it's easier to just call an OS function. It's just a good idea that the function call is "give me 10MB of heap", not "give me 10MB of heap starting at 0x12345678"
If the application allocates memory by using it then it typically does not de-allocate at all. This can be problematic as the OS still has to hold the unused pages (but the Java Virtual Machine does not de-allocate either, so hey).
Different runs of a program results in different addresses for variables
This is called memory layout randomisation and is used, alongside of proper permissions (stack space is not executable), to make buffer overflow attacks much more difficult. You can still kill the app, but not execute arbitrary code.
Some links on memory allocation (e.g. in heap).
Do you mean, what algorithm the allocator uses? The easiest algorithm is to always allocate at the soonest available position and link from each memory block to the next and store the flag if it's a free block or used block. More advanced algorithms always allocate blocks at the size of a power of two or a multiple of some fixed size to prevent memory fragmentation (lots of small free blocks) or link the blocks in a different structures to find a free block of sufficient size faster.
An even simpler approach is to never de-allocate and just point to the first (and only) free block and holds its size. If the remaining space is too small, throw it away and ask the OS for a new one.
There's nothing magical about memory allocators. All they do is to:
ask the OS for a large region and
partition it to smaller chunks
without
wasting too much space or
taking too long.
Anyways, the Wikipedia article about memory allocation is http://en.wikipedia.org/wiki/Memory_management .
One interesting algorithm is called "(binary) buddy blocks". It holds several pools of a power-of-two size and splits them recursively into smaller regions. Each region is then either fully allocated, fully free or split in two regions (buddies) that are not both fully free. If it's split, then one byte suffices to hold the size of the largest free block within this block.
Why do assembly languages use both a stack and a heap? They seem redundant.
They're not redundant. Each of them has strengths and weaknesses: A stack is faster if used right, because memory allocation is trivial (push / pop). The downside is that you can only add and remove items at the top (hence the name, stack). Also, total stack space is limited, and when you run out, you have a... well, stack overflow. The heap, by contrast, allows random allocation and deallocation, and you can store large amounts of data there, but the downside is that allocation carries more overhead - for each allocated block of memory, a suitable free portion must be found, and in the long run, fragmentation of the free space needs to be avoided, and the system must track where the free blocks are.
You use the stack to pass around small short-lived values, e.g. local counter variables, function arguments, return values, etc.; these lend themselves to push/pop allocation style. For larger or long-lived data structures, you use the heap.
You could certainly construct a computing system that utilised either one of them as its only memory model. However, they both have rather different properties each with its own good and bad points. Most systems utilise both so as to get the benefits from each of them.
Stacks
A stack can be thought of as a pile of plates, you write a value on a plate and put it on the top of the stack this is called a push operation and stores a value on the stack. You can obviously also remove the top plate from the stack this is called a pop operation. But new allocations must always be at the top of the stack.
The stack tend to be used for local variables and passing values between functions. Generally stacks have the following awesome properties:
Requires only a handful of pointers to manage
Very easy to implement in hardware, most processors have built in hardware support for a stack making it even faster.
Very quick to allocate memory
The problem with the stack comes from the fact items can only be added/removed from the top of the stack. Now this makes great sense when traversing up and down through function calls: pop functions inputs from the stack, allocate space for local variables on the stack, run function, clear local variables from the top of the stack and push the return value onto the stack. If on the other hand I want to allocate some memory and say pass it to another thread or in general free it far away from where it was allocated all of a sudden I have a problem, the stack is not in the correct position when I want to free the memory.
You could say the stack facilitates fast sequential memory allocation.
Heap
Now the heap is different each allocation is generally tracked separately. This causes a lot of overhead for allocations and deallocations, but each one can be handled independently of other memory allocations, well until you run out of memory.
There are numerous algorithms for accomplishing this and it is probably a bit unwise to twitter on about them here but here is a link that talks about a few good simple heap allocation algorithms: Alternatives to malloc and new
So the heap facilitates random memory allocation but this comes with a runtime penalty, however that penalty is often small that what would be incurred if you had to handle the situation using just the stack.
It is about the memory handling and managing.
There are different type of registers of x86 architectures.
There are possibilities of hardware supported memory management on x86 architecture and so on.
Stack is used by instruction pointer, Heap is for data segment in some applications.
To read more I advice you read the following links:
http://en.wikipedia.org/wiki/Data_segment
http://en.wikipedia.org/wiki/X86_memory_segmentation
"A memory model allows a compiler to perform many important
optimizations" - Wikipedia