Stack size required for bios interrupt call - stack

I am working on a small bootloader for learning purposes. Is there any specification/information available about the (free) stack size required for a bios interrupt call?

before entering the interrupt handler, all registers are pushed to the stack along with the far return address, sum your registers sizes up and add the space needed to store the return address to get the minimal stack size .
take note that you will need some more space if you are pushing more data into the stack while in the interrupt handler

From http://www.o3one.org/hwdocs/bios_doc/pci_bios_21.pdf ("Calling Conventions" on page 3), it looks like the BIOS call can use up to 1024 bytes of stack space. My googling hasn't turned up any other sources.

I've noticed that if you are using int 0x13 you should have a stack that is at least 4096 bytes. Modern BIOSes often have an AHCI compatible int 0x13 handler, and since AHCI is quite complicated the BIOS int 0x13 requires a lot of stack space.
In the perfect world the BIOS should have it's own stack, but many BIOSes rely on the stack you provide.

The simple answer is that the stack that the BIOS used to make interrupt calls (including the int 13h to load the boot sector from the usb flash drive) prior to loading the boot sector is sufficient for boot sector use.
The happy answer is that the BIOS interrupts (except for the newer bloated PCI) are designed to execute in minimal space so there is no need to setup a stack in the boot sector.

Related

Operating Systems: Processes, Pagination and Memory Allocation doubts

I have several doubts about processes and memory management. List the main. I'm slowly trying to solve them by myself but I would still like some help from you experts =).
I understood that the data structures associated with a process are more or less these:
text, data, stack, kernel stack, heap, PCB.
If the process is created but the LTS decides to send it to secondary memory, are all the data structures copied for example on SSD or maybe just text and data (and PCB in kernel space)?
Pagination allows you to allocate processes in a non-contiguous way:
How does the kernel know if the process is trying to access an illegal memory area? After not finding the index on the page table, does the kernel realize that it is not even in virtual memory (secondary memory)? If so, is an interrupt (or exception) thrown? Is it handled immediately or later (maybe there was a process switch)?
If the processes are allocated non-contiguously, how does the kernel realize that there has been a stack overflow since the stack typically grows down and the heap up? Perhaps the kernel uses virtual addresses in PCBs as memory pointers that are contiguous for each process so at each function call it checks if the VIRTUAL pointer to the top of the stack has touched the heap?
How do programs generate their internal addresses? For example, in the case of virtual memory, everyone assumes starting from the address 0x0000 ... up to the address 0xffffff ... and is it then up to the kernel to proceed with the mapping?
How did the processes end? Is the system call exit called both in case of normal termination (finished last instruction) and in case of killing (by the parent process, kernel, etc.)? Does the process itself enter kernel mode and free up its associated memory?
Kernel schedulers (LTS, MTS, STS) when are they invoked? From what I understand there are three types of kernels:
separate kernel, below all processes.
the kernel runs inside the processes (they only change modes) but there are "process switching functions".
the kernel itself is based on processes but still everything is based on process switching functions.
I guess the number of pages allocated the text and data depend on the "length" of the code and the "global" data. On the other hand, is the number of pages allocated per heap and stack variable for each process? For example I remember that the JVM allows you to change the size of the stack.
When a running process wants to write n bytes in memory, does the kernel try to fill a page already dedicated to it and a new one is created for the remaining bytes (so the page table is lengthened)?
I really thank those who will help me.
Have a good day!
I think you have lots of misconceptions. Let's try to clear some of these.
If the process is created but the LTS decides to send it to secondary memory, are all the data structures copied for example on SSD or maybe just text and data (and PCB in kernel space)?
I don't know what you mean by LTS. The kernel can decide to send some pages to secondary memory but only on a page granularity. Meaning that it won't send a whole text segment nor a complete data segment but only a page or some pages to the hard-disk. Yes, the PCB is stored in kernel space and never swapped out (see here: Do Kernel pages get swapped out?).
How does the kernel know if the process is trying to access an illegal memory area? After not finding the index on the page table, does the kernel realize that it is not even in virtual memory (secondary memory)? If so, is an interrupt (or exception) thrown? Is it handled immediately or later (maybe there was a process switch)?
On x86-64, each page table entry has 12 bits reserved for flags. The first (right-most bit) is the present bit. On access to the page referenced by this entry, it tells the processor if it should raise a page-fault. If the present bit is 0, the processor raises a page-fault and calls an handler defined by the OS in the IDT (interrupt 14). Virtual memory is not secondary memory. It is not the same. Virtual memory doesn't have a physical medium to back it. It is a concept that is, yes implemented in hardware, but with logic not with a physical medium. The kernel holds a memory map of the process in the PCB. On page fault, if the access was not within this memory map, it will kill the process.
If the processes are allocated non-contiguously, how does the kernel realize that there has been a stack overflow since the stack typically grows down and the heap up? Perhaps the kernel uses virtual addresses in PCBs as memory pointers that are contiguous for each process so at each function call it checks if the VIRTUAL pointer to the top of the stack has touched the heap?
The processes are allocated contiguously in the virtual memory but not in physical memory. See my answer here for more info: Each program allocates a fixed stack size? Who defines the amount of stack memory for each application running?. I think stack overflow is checked with a page guard. The stack has a maximum size (8MB) and one page marked not present is left underneath to make sure that, if this page is accessed, the kernel is notified via a page-fault that it should kill the process. In itself, there can be no stack overflow attack in user mode because the paging mechanism already isolates different processes via the page tables. The heap has a portion of virtual memory reserved and it is very big. The heap can thus grow according to how much physical space you actually have to back it. That is the size of the swap file + RAM.
How do programs generate their internal addresses? For example, in the case of virtual memory, everyone assumes starting from the address 0x0000 ... up to the address 0xffffff ... and is it then up to the kernel to proceed with the mapping?
The programs assume an address (often 0x400000) for the base of the executable. Today, you also have ASLR where all symbols are kept in the executable and determined at load time of the executable. In practice, this is not done much (but is supported).
How did the processes end? Is the system call exit called both in case of normal termination (finished last instruction) and in case of killing (by the parent process, kernel, etc.)? Does the process itself enter kernel mode and free up its associated memory?
The kernel has a memory map for each process. When the process dies via abnormal termination, the memory map is crossed and cleared off of that process's use.
Kernel schedulers (LTS, MTS, STS) when are they invoked?
All your assumptions are wrong. The scheduler cannot be called otherwise than with a timer interrupt. The kernel isn't a process. There can be kernel threads but they are mostly created via interrupts. The kernel starts a timer at boot and, when there is a timer interrupt, the kernel calls the scheduler.
I guess the number of pages allocated the text and data depend on the "length" of the code and the "global" data. On the other hand, is the number of pages allocated per heap and stack variable for each process? For example I remember that the JVM allows you to change the size of the stack.
The heap and stack have portions of virtual memory reserved for them. The text/data segment start at 0x400000 and end wherever they need. The space reserved for them is really big in virtual memory. They are thus limited by the amount of physical memory available to back them. The JVM is another thing. The stack in JVM is not the real stack. The stack in JVM is probably heap because JVM allocates heap for all the program's needs.
When a running process wants to write n bytes in memory, does the kernel try to fill a page already dedicated to it and a new one is created for the remaining bytes (so the page table is lengthened)?
The kernel doesn't do that. On Linux, the libstdc++/libc C++/C implementation does that instead. When you allocate memory dynamically, the C++/C implementation keeps track of the allocated space so that it won't request a new page for a small allocation.
EDIT
Do compiled (and interpreted?) Programs only work with virtual addresses?
Yes they do. Everything is a virtual address once paging is enabled. Enabling paging is done via a control register set at boot by the kernel. The MMU of the processor will automatically read the page tables (among which some are cached) and will translate these virtual addresses to physical ones.
So do pointers inside PCBs also use virtual addresses?
Yes. For example, the PCB on Linux is the task_struct. It holds a field called pgd which is an unsigned long*. It will hold a virtual address and, when dereferenced, it will return the first entry of the PML4 on x86-64.
And since the virtual memory of each process is contiguous, the kernel can immediately recognize stack overflows.
The kernel doesn't recognize stack overflows. It will simply not allocate more pages to the stack then the maximum size of the stack which is a simple global variable in the Linux kernel. The stack is used with push pops. It cannot push more than 8 bytes so it is simply a matter of reserving a page guard for it to create page-faults on access.
however the scheduler is invoked from what I understand (at least in modern systems) with timer mechanisms (like round robin). It's correct?
Round-robin is not a timer mechanism. The timer is interacted with using memory mapped registers. These registers are detected using the ACPI tables at boot (see my answer here: https://cs.stackexchange.com/questions/141870/when-are-a-controllers-registers-loaded-and-ready-to-inform-an-i-o-operation/141918#141918). It works similarly to the answer I provided for USB (on the link I provided here). Round-robin is a scheduler priority scheme often called naive because it simply gives every process a time slice and executes them in order which is not currently used in the Linux kernel (I think).
I did not understand the last point. How is the allocation of new memory managed.
The allocation of new memory is done with a system call. See my answer here for more info: Who sets the RIP register when you call the clone syscall?.
The user mode process jumps into a handler for the system call by calling syscall in assembly. It jumps to an address specified at boot by the kernel in the LSTAR64 register. Then the kernel jumps to a function from assembly. This function will do the stuff the user mode process requires and return to the user mode process. This is often not done by the programmer but by the C++/C implementation (often called the standard library) that is a user mode library that is linked against dynamically.
The C++/C standard library will keep track of the memory it allocated by, itself, allocating some memory and by keeping records. Then, if you ask for a small allocation, it will use the pages it already allocated instead of requesting new ones using mmap (on Linux).

Change Stack size in Contiki

Is there a way to programmatically change the stack size in Contiki?
I know on Linux systems I'm able to call:
ulimit -s SIZE
But I'm currently using Contiki as a flashed binary, and don't really have access to a traditional terminal. I've tried executing the command from C using system() and popen() calls to no avail.
Perhaps there's a CFLAG or LDFLAG I can leverage? Or modifying something in the makefile?
FYI I'm flashing the binary to a Texas Instruments cc2650, which has a 32 bit processor.
CC2650 does not have a MPU (Memory Protection Unit), which implies that no one checks for the boundaries of the stack region during runtime, which in turn implies that there is no way to "reserve" stack in the same sense that stack memory is reserved on Linux.
Essentially, if you keep allocating new things on stack, the stack will keep to grow even after it reaches other memory regions - usually the .data region, which contains dynamically allocated memory, if any, and static/global variables. The growth of stack will corrupt the memory in those other regions in a way that you might even fail to notice, leading to hard-to-find bugs.
There are a couple of things to do. One is to reserve bigger stack memory during compile time. This will not limit the stack region, but will limit the extent of the data region. To do that, change the CC2650 linker script in cpu/cc26xx-cc13xx/cc26xx.ld:
_Min_Stack_Size = 0x100; /* 256 bytes by default for the stack */
The other thing is to use Contiki-NG recent revisions, which have built-in stackoverflow checks. There is still no way to change the stack region size during runtime, but you will get an error if stack overflow happens.

What is the real memory structure? [duplicate]

This question already has answers here:
Why do stacks typically grow downwards?
(11 answers)
Closed 5 years ago.
I have a stupid doubt about something related to memory. My doubt is: Why in memory, the higher addresses are considered in the "bottom", and the lowest addresses are considered in the "top"? Im going to explain in more detail:
The stack memory starts in high addresses and grows to lower addresses. So far this is what I understood, but why does the stack grow "up"? Why are the lower addresses located in the top of the memory?
I've seen various, contradictory memory structures: ones which consider the lower addresses at the bottom of the memory, and ones which consider the lower addresses at the top of the memory. Does it depend on the processor?
Thank you in advance.
It depends upon the instruction set architecture and the ABI. They both influence the organization (and growth direction) of the call stack.
Usually the call stack grows downwards, but there have been ISAs where that is not the case (see this). And some ISAs (e.g. IBM serie Z mainframes) don't have any hardware call stack (then, the call stack is just an ABI convention about register usage).
Most application software (e.g. your game, word processor, compiler, ...) are running above some operating system, in some process having some virtual address space (so in virtual memory).
Read some book on OSes, e.g. Operating Systems: Three Easy Pieces.
In practice (unless you code some operating system kernel managing virtual memory) you care mostly about the virtual address space (often made of numerous discontinuous segments). On Linux, use /proc/ (see proc(5)) to explore it (e.g. try cat /proc/$$/maps in your terminal). And notice that for a multi-threaded application each thread has its own call stack. Then "top" or "bottom" of the virtual address space don't really matter and don't have much sense.
If (as most people) you are writing (in any programming language other than the assembler) some application software above some OS, you don't care (as a developer) about real memory, but about virtual memory and resident set size. You don't care about the stack growth (it is managed by the OS, compiler, ISA, ... for the automatic variables of your code). You need to avoid stack overflow. It often happens that some pages (e.g. of your code segment[s]), perhaps those containing never used code, never get into RAM and stay paged out. And in practice most of the (virtual) memory of some process is not for its call stack: you usually allocate memory in the heap. My firefox browser (on my Linux desktop) has a virtual address space of 2.3 gigabytes (in more than a thousand of segments), but only 124 kilobytes of stack. Read about memory management. The call stack is often limited (e.g. to a few megabytes).

What is the address space in Go(lang)?

I try to understand the basics of concurrent programming in Go. Almost all articles use the term "address space", for example: "All goroutines share the same address space". What does it mean?
I've tried to understand the following topics from wiki, but it wasn't successful:
http://en.wikipedia.org/wiki/Virtual_memory
http://en.wikipedia.org/wiki/Memory_segmentation
http://en.wikipedia.org/wiki/Page_(computer_memory)
...
However at the moment it's difficult to understand for me, because my knowledges in areas like memory management and concurrent programming are really poor. There are many unknown words like segments, pages, relative/absolute addresses, VAS etc.
Could anybody explain to me the basics of the problem? May be there are some useful articles, that I can't find.
Golang spec:
A "go" statement starts the execution of a function call as an independent concurrent thread of control, or goroutine, within the same address space.
Could anybody explain to me the basics of the problem?
"Address space" is a generic term which can apply to many contexts:
Address spaces are created by combining enough uniquely identified qualifiers to make an address unambiguous (within a particular address space)
Dave Cheney's presentation "Five things that make Go fast" illustrates the main issue addressed by having goroutine within the same process address space: stack management.
Dave's qualifies the "address space", speaking first of thread:
Because a process switch can occur at any point in a process’ execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use.
This lead to the development of threads, which are conceptually the same as processes, but share the same memory space.
(so this is about memory)
Then Dave illustrates the stack within a process address space (the addresses managed by a process):
Traditionally inside the address space of a process,
the heap is at the bottom of memory, just above the program (text) and grows upwards.
The stack is located at the top of the virtual address space, and grows downwards.
See also "What and where are the stack and heap?".
The issue:
Because the heap and stack overwriting each other would be catastrophic, the operating system usually arranges to place an area of unwritable memory between the stack and the heap to ensure that if they did collide, the program will abort.
With threads, that can lead to restrict the heap size of a process:
as the number of threads in your program increases, the amount of available address space is reduced.
goroutine uses a different approach, while still sharing the same process address space:
what about the stack requirements of those goroutines ?
Instead of using guard pages, the Go compiler inserts a check as part of every function call to check if there is sufficient stack for the function to run. If there is not, the runtime can allocate more stack space.
Because of this check, a goroutines initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources.
Go 1.3 introduces a new way of managing those stacks:
Instead of adding and removing additional stack segments, if the stack of a goroutine is too small, a new, larger, stack will be allocated.
The old stack’s contents are copied to the new stack, then the goroutine continues with its new larger stack.
After the first call to H the stack will be large enough that the check for available stack space will always succeed.
When you application runs on the RAM, addresses in RAM are allocated to your application by the memory manager. This is refered to as address space.
Concept:
the processor (CPU) executes instructions in a Fetch-Decode-Execute
cycle. It executes instructions in an applicaiton by fetching it to
the RAM (Random Acces Memory). This is done because it is very
in-efficient to get it all the way from disk. Some-one needs to keep
track of memory usage, so the operating system implements a memory
manager. Your appication, consists of some program, in your case this
is written in Go programming language. When you execute your script,
the OS executes the instructions in the above mentioned fashion.
Reading your post i can empathize. The terms you mentioned will become familiar to you as program more and more.
I first encountered these terms from the operating systems book, a.k.a the dinosaur book.
Hope this helps you.

When does the stack really overflow?

Is infinite recursion the only case or can it happen for other reasons?
Doesn't the stack size grow as needed same as heap?
Sorry if this question has been asked before, would appreciate links to them if that is the case.
I can't speak for all platforms, but as it happens, I've just spent some time working with Windows .exe files (I mean, actually studying the binary format of them - I know in a sense all of us here work with executable files ;) ). I'm betting that most other platforms have similar capabilities, but I'm not immediate familiar with them.
Part of the file format itself includes two values relevant to the current discussion:
typedef struct _IMAGE_OPTIONAL_HEADER {
...
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
...
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
From MSDN:
SizeOfStackReserve
The number of bytes to reserve for the
stack. Only the memory specified by
the SizeOfStackCommit member is
committed at load time; the rest is
made available one page at a time
until this reserve size is reached.
SizeOfStackCommit
The number of bytes to commit for the
stack.
In other words, the linker specifies a maximum size for the program's stack. If you hit the maximum size, you overflow - no matter how you hit the maximum size. You could write a simple program to do it in one line of code just by allocating a single stack variable (say, an array) that's bigger than the maximum stack size. Or you could do it via infinite (or finite, but very deep) recursion, or just by allocating too many stack variables.
The Microsoft linker sets this value to 1MB by default on X86 platforms (4MB on Itanium systems). This seems small on the face of it, for a modern system. However, more modern versions of Windows interpret these values slightly differently. Instead of completely limiting the stack, it limits the physical memory the stack will use. If your stack grows beyond this, virtual memory will get involved, so you should still be good... assuming you have enough virtual memory.
Remember, it is possible to run out of memory, even on modern systems with huge amounts of RAM and plenty of virtual memory on disk. You just need to allocate really big amounts of data.
So, long story short: is it possible to overflow the stack without infinite recursion? Definitely. Is it likely? Not really, unless you're allocating really huge objects.
The stack overflows when the stack pointer is pushed out of the memory block the operating system has allocated for the stack. Some operating systems will resize the stack as it grows (IIRC Linux does this) while in others the stack size is fixed at the start of the process or thread (IIRC Windows does this).
Possible reasons for overflowing the stack:
An unbounded number of stack frames (e.g. from unbounded recursion)
Attempting to allocate large blocks from the stack
Buffer overflows for buffers allocated on the stack
There are probably other reasons as well that I can't think of off the top of my head.
This question doesn't specify which stack is "the" stack. So, here are a few answers:
Call Stack
The call stack gets overflowed whenever the number of calls on the stack overruns the amount of memory it has. The most common way is infinite recursion, but it's quite possible to have recursion that's excessive but not infinite. For example, computing the Ackermann function naively will tax any computer.
Languages
Stack-based languages
Some languages, like Postscript and Forth, and some virtual machines, like the Java virtual machine, are stack-based. In these languages, it may be possible to make expressions so complex that they overflow the stack.
Context-free languages
Context-free languages are often implemented using a stack. If the strings for the code of these languages gets too complex, it's possible to overflow the stack.
On a laptop or desktop machine it may be unusual to overflow the stack without infinite (or very deeply nested) recursion when running from the main thread... however, stack overflows are not uncommon for:
Threaded code in which the thread has been allotted a small, fixed-sized stack.
Signal handling code in which the signal handling context has a small, fixed-sized stack.
Code executing on embedded devices, where memory is generally scarce.
As an example, if you register a signal handler using sigaction, if the signal handler does any complex (i.e. deeply nested operations) it is very easy to get a stack overflow on a number of operating systems, since signal handlers are usually allotted a small, fixed-sized stack. Similarly, if you spawn a thread with pthread_create, but you specify a small stacksize with pthread_attr_setstacksize, then it is very easy to attain a stack overflow. On very memory-limited devices such wireless sensors, it is an art to avoid stack overflows.
My day job involves a lot of work with LotusScript in Lotus Notes, which has fixed stack limits for various scopes. E.g. most variables in a procedure/function must fit in a 32kB stack, except that the content of class variables is stored on the heap.
If fixed-size variables exceed the stack size, code won't compile.
Run-time stack overflows can occur with recursion. This is easy to achieve in LotusScript as it limits recursion of any single function to a 32kB stack. I gave up on using a recursive QuickSort years ago because of this.
If your program exceeds its alloted stack space without any infinite recursion going on, then you're doing something wrong.
Though it can happen if you leave off some asterisks and try to pass some huge buffers by value.
The memory allocated for the stack does generally grow as needed within reasonable boundaries - I'm not sure what the upper limit is on various systems.

Resources