What is a stack pointer used for in microprocessors?

I am preparing for a microprocessor exam. If the use of a program counter is to hold the address of the next instruction, what is the use of the stack pointer?

A stack is a LIFO data structure (last in, first out, meaning last entry you push on to the stack is the first one you get back when you pop). It is typically used to hold stack frames (bits of the stack that belong to the current function).
This may include, but is not limited to:
the return address.
a place for a return value.
passed parameters.
local variables.
You push items onto the stack and pop them off. In a microprocessor, the stack can be used for both user data (such as local variables and passed parameters) and CPU data (such as return addresses when calling subroutines).
The actual implementation of a stack depends on the microprocessor architecture. It can grow up or down in memory, and the pointer can be adjusted either before or after the data is stored or retrieved.
Operations which typically affect the stack are:
subroutine calls and returns.
interrupt calls and returns.
code explicitly pushing and popping entries.
direct manipulation of the stack pointer register, sp.
Consider the following program in my (fictional) assembly language:
Addr  Opcodes   Instructions    ; Comments
----  --------  --------------  ----------
                                ; 1: pc<-0000, sp<-8000
0000  01 00 07  load r0,7       ; 2: pc<-0003, r0<-7
0003  02 00     push r0         ; 3: pc<-0005, sp<-7ffe, (sp:7ffe)<-0007
0005  03 00 00  call 000b       ; 4: pc<-000b, sp<-7ffc, (sp:7ffc)<-0008
0008  04 00     pop r0          ; 7: pc<-000a, r0<-(sp:7ffe[0007]), sp<-8000
000a  05        halt            ; 8: pc<-000a
000b  06 01 02  load r1,[sp+2]  ; 5: pc<-000e, r1<-(sp+2:7ffe[0007])
000e  07        ret             ; 6: pc<-(sp:7ffc[0008]), sp<-7ffe
Now let's follow the execution, describing the steps shown in the comments above:
1. This is the starting condition, where pc (the program counter) is 0 and sp is 8000 (all these numbers are hexadecimal).
2. This simply loads register r0 with the immediate value 7 and moves pc to the next instruction (I'll assume that you understand the default behavior will be to move to the next instruction unless otherwise specified).
3. This pushes r0 onto the stack by reducing sp by two, then storing the value of the register to that location.
4. This calls a subroutine. What would have been pc in the next step is pushed onto the stack in a similar fashion to r0 in the previous step, then pc is set to its new value. This is no different from a user-level push other than the fact that it's done more as a system-level thing.
5. This loads r1 from a memory location calculated from the stack pointer; it shows a way to pass parameters to functions.
6. The return statement extracts the value from where sp points and loads it into pc, adjusting sp up at the same time. This is like a system-level pop instruction (see the next step).
7. Popping r0 off the stack involves extracting the value from where sp currently points, then adjusting sp up.
8. The halt instruction simply leaves pc where it is, an infinite loop of sorts.
Hopefully from that description, it will become clear. Bottom line is: a stack is useful for storing state in a LIFO way and this is generally ideal for the way most microprocessors do subroutine calls.
Unless you're a SPARC of course, in which case you use a circular buffer for your stack :-)
Update: Just to clarify the steps taken when pushing and popping values in the above example (whether explicitly or by call/return), see the following examples:
LOAD R0,7
PUSH R0
Adjust sp Store val
sp-> +--------+ +--------+ +--------+
| xxxx | sp->| xxxx | sp->| 0007 |
| | | | | |
| | | | | |
| | | | | |
+--------+ +--------+ +--------+
POP R0
Get value Adjust sp
+--------+ +--------+ sp->+--------+
sp-> | 0007 | sp->| 0007 | | 0007 |
| | | | | |
| | | | | |
| | | | | |
+--------+ +--------+ +--------+

The stack pointer stores the address of the most recent entry that was pushed onto the stack.
To push a value onto the stack, the stack pointer is adjusted to the next free memory address (incremented or decremented, depending on which direction the stack grows on that architecture), and the new value is copied to that address in memory.
To pop a value from the stack, the value is copied from the address held in the stack pointer, and the stack pointer is then adjusted in the opposite direction, leaving it pointing at the next item in the stack.
The most typical use of a hardware stack is to store the return address of a subroutine call. When the subroutine is finished executing, the return address is popped off the top of the stack and placed in the Program Counter register, causing the processor to resume execution at the next instruction following the call to the subroutine.
http://en.wikipedia.org/wiki/Stack_%28data_structure%29#Hardware_stacks

You've got more preparing [for the exam] to do ;-)
The Stack Pointer is a register which holds the address of the next available spot on the stack.
The stack is an area of memory reserved to hold a LIFO (Last In, First Out) container, where we store the local variables and return addresses, allowing simple management of the nesting of function calls in a typical program.
See this Wikipedia article for a basic explanation of the stack management.

For 8085: Stack pointer is a special purpose 16-bit register in the Microprocessor, which holds the address of the top of the stack.
The stack pointer register in a computer is made available for general purpose use by programs executing at lower privilege levels than interrupt handlers. A set of instructions in such programs, excluding stack operations, stores data other than the stack pointer, such as operands, and the like, in the stack pointer register. When switching execution to an interrupt handler on an interrupt, return address data for the currently executing program is pushed onto a stack at the interrupt handler's privilege level. Thus, storing other data in the stack pointer register does not result in stack corruption. Also, these instructions can store data in a scratch portion of a stack segment beyond the current stack pointer.
Read this one for more info.
General purpose use of a stack pointer register

The Stack is an area of memory for keeping temporary data. The stack is used by the CALL instruction to keep the return address for procedures; the RET instruction gets this value from the stack and returns to that offset. The same thing happens when an INT instruction calls an interrupt: it stores on the stack the flag register, code segment and offset, and the IRET instruction is used to return from the interrupt call.
The Stack is a Last In First Out (LIFO) memory. Data is placed onto the stack with a PUSH instruction and removed with a POP instruction. The stack memory is maintained by two registers: the Stack Pointer (SP) and the Stack Segment (SS) register. When a word of data is pushed onto the stack, the high-order 8-bit byte is placed in location SP-1 and the low-order 8-bit byte is placed in location SP-2; SP is then decremented by 2. SP is added to (SS x 10H) to form the physical stack memory address. The reverse sequence occurs when data is popped from the stack: the low-order byte is taken from the location SP points to and the high-order byte from the location above it, and SP is then incremented by 2.

The stack pointer holds the address of the top of the stack. A stack allows functions to pass arguments stored on the stack to each other, and to create scoped variables. Scope in this context means that the variable is popped off the stack when the stack frame is gone, and/or when the function returns. Without a stack, you would need to use explicit memory addresses for everything. That would make it impossible (or at least severely difficult) to design high-level programming languages for the architecture.
Also, each CPU mode usually has its own banked stack pointer. So when exceptions occur (interrupts, for example), the exception handler routine can use its own stack without corrupting the user process.

Should you ever crave deeper understanding, I heartily recommend Patterson and Hennessy as an intro and Hennessy and Patterson as an intermediate to advanced text. They're pricey, but truly non-pareil; I just wish either or both were available when I got my Masters' degree and entered the workforce designing chips, systems, and parts of system software for them (but, alas!, that was WAY too long ago;-). Stack pointers are so crucial (and the distinction between a microprocessor and any other kind of CPU so utterly meaningful in this context... or, for that matter, in ANY other context, in the last few decades...!-) that I doubt anything but a couple of thorough from-the-ground-up refreshers can help!-)

On some CPUs, there is a dedicated set of registers for the stack. When a call instruction is executed, one register is loaded with the program counter at the same time as a second register is loaded with the contents of the first, a third register is loaded with the second, and a fourth with the third, etc. When a return instruction is executed, the program counter is latched with the contents of the first stack register at the same time as that register is latched from the second; that second register is loaded from a third, etc. Note that such hardware stacks tend to be rather small (many of the smaller PIC series micros, for example, have a two-level stack).
While a hardware stack does have some advantages (push and pop don't add any time to a call/return, for example) having registers which can be loaded with two sources adds cost. If the stack gets very big, it will be cheaper to replace the push-pull registers with an addressable memory. Even if a small dedicated memory is used for this, it's cheaper to have 32 addressable registers and a 5-bit pointer register with increment/decrement logic, than it is to have 32 registers each with two inputs. If an application might need more stack than would easily fit on the CPU, it's possible to use a stack pointer along with logic to store/fetch stack data from main RAM.

A stack pointer is a small register whose purpose is to hold the address of the top of the stack.

Related

In memory, should stack bottom and heap bottom have the same address?

I'm using a tm4c123gh6pm MCU with this linker script. Going to the bottom, I see:
...
...
.bss (NOLOAD):
{
_bss = .;
*(.bss*)
*(COMMON)
_ebss = .;
} > SRAM
_heap_bottom = ALIGN(8);
_heap_top = ORIGIN(SRAM) + LENGTH(SRAM) - _stack_size;
_stack_bottom = ALIGN(8);
_stack_top = ORIGIN(SRAM) + LENGTH(SRAM);
It seems that heap and stack bottoms are the same. I have double checked it:
> arm-none-eabi-objdump -t mcu.axf | grep -E "(heap|stack)"
20008000 g .bss 00000000 _stack_top
20007000 g .bss 00000000 _heap_top
00001000 g *ABS* 00000000 _stack_size
20000558 g .bss 00000000 _heap_bottom
20000558 g .bss 00000000 _stack_bottom
Is this correct? As far as I can see, the stack could overwrite the heap, is this the case?
If I flash this FW it 'works' (at least for now), but I expect it to fail once the stack gets big enough and I use dynamic memory. I have noticed, though, that nothing in my code or the startup script uses the heap and stack bottom symbols, so maybe everything keeps working even if I use the stack and heap. (Unless these are special symbols used by something I can't see; is that the case?)
I want to change the last part by:
_heap_bottom = ALIGN(8);
_heap_top = ORIGIN(SRAM) + LENGTH(SRAM) - _stack_size;
_stack_bottom = ORIGIN(SRAM) + LENGTH(SRAM) - _stack_size + 4; // or _heap_top + 4
_stack_top = ORIGIN(SRAM) + LENGTH(SRAM);
Is the above correct?
If you write your own linker script then it is up to you how stack and heap are arranged.
One common approach is to have stack and heap in the same block, with stack growing downwards from the highest address towards the lowest, and heap growing upwards from a lower address towards the highest.
The advantage of this approach is that you don't need to calculate how much heap or stack you need separately. As long as the total of stack and heap used at any one instant is less than the total memory available, then everything will be ok.
The disadvantage of this approach is that when you allocate more memory than you have, your stack will overflow into your heap or vice-versa, and your program will fail in a variety of ways which are very difficult to predict or to identify when they happen.
The linker script in your question uses this approach, but appears to have a mistake detailed below.
Note that using the names top and bottom when talking about stacks on ARM is very unhelpful because when you push something onto the stack the numerical value of the stack pointer decreases. When the stack is empty the stack pointer has its highest value, and when it is full the stack pointer has its lowest value. It is ambiguous whether "top" refers to the highest address or the location of the current pointer, and whether bottom refers to the lowest address or the address where the first item is pushed.
In the CMSIS example linker scripts the lower and upper bounds of the heap are called __heap_base and __heap_limit, and the lower and upper bounds of the stack are called __stack_limit and __initial_sp respectively.
In this script the symbols have the following meanings:
_heap_bottom is the lowest address of the heap.
_heap_top is the upper address that the heap must not grow beyond if you want to leave at least _stack_size bytes for the stack.
For _stack_bottom, it appears that the script author probably mistakenly thought that ALIGN(8) would align the most recently assigned value, and so they wanted _stack_bottom to be an aligned version of _heap_top, which would make it the value of the stack pointer when _stack_size bytes are pushed to it. In fact ALIGN(8) aligns the value of ., which still has the same value as _heap_bottom as you have observed.
Finally _stack_top is the highest address in memory, it is the value the stack pointer will start with when the stack is empty.
Having an incorrect value for the stack limit almost certainly does absolutely nothing at all, because this symbol is probably never used in the code. On this ARMv7M processor the push and pop instructions and other accesses to the stack by hardware assume that the stack is an infinite resource. Compilers using all the normal ABIs also generate code which does not check before growing the stack either. The reason for this is that it is one of the most common operations performed, and so adding extra instructions would cripple performance. The next generation ARMv8M does have hardware support for a stack limit register, though.
My advice to you is just delete the line. If anything is using it, then you are basically losing the whole benefit of sharing your stack and heap space. If you do want to calculate and check for it, then your suggestion is correct except that you don't need to add + 4. This would create a 4 byte gap which is not usable as either heap or stack.
As an aside, I personally prefer to put the stack at the bottom of memory and the heap at the top, growing away from each other. That way, if either of them gets bigger than it should, it runs into an unallocated address space which can be configured to cause a bus fault straight away, without any software checking the values all the time.

I'm failing to understand how the stack works

I'm building an emulator for the MOS6502 processor, and at the moment I'm trying to simulate the stack in code, but I'm really failing to understand how the stack works in the context of the 6502.
One of the features of the 6502's stack structure is that when the stack pointer reaches the end of the stack it will wrap around, but I don't get how this feature even works.
Let's say we have a stack with 64 maximum values. If we push the values x, y and z onto the stack, we now have the structure below, with the stack pointer pointing at address 0x62, because that was where the last value was pushed onto the stack.
+-------+
| x | 0x64
+-------+
| y | 0x63
+-------+
| z | 0x62 <-SP
+-------+
| | ...
+-------+
All well and good. But if we pop those three values off the stack, we now have an empty stack, with the stack pointer pointing at address 0x64.
+-------+
| | 0x64 <-SP
+-------+
| | 0x63
+-------+
| | 0x62
+-------+
| | ...
+-------+
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00?? There's nothing in the stack, so what's the point in wrapping the stack pointer around????
I can understand this process when pushing values, if the stack is full and a value needs to be pushed to the stack it'll overwrite the oldest value present on the stack. This doesn't work for popping.
Can someone please explain this because it makes no sense.
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00?? There's nothing in the stack, so what's the point in wrapping the stack pointer around????
It is not done for a functional reason. The 6502 architecture was designed so that pushing and popping could be done by incrementing an 8 bit SP register without any additional checking. Checks for overflow or underflow of the SP register would involve more silicon to implement them, more silicon to implement the stack overflow / underflow handling ... and extra gate delays in a critical path.
The 6502 was designed to be cheap and simple using 1975-era chip technology1. Not fast. Not sophisticated. Not easy to program2.
1 - According to Wikipedia, the original design had ~3200 or ~3500 transistors. One of the selling points of the 6502 was that it was cheaper than its competitors. Fewer transistors meant smaller dies, better yields and lower production costs.
2 - Of course, this is relative. Compared to some ISAs, the 6502 is easy because it is simple and orthogonal, and you have so few options to choose from. But compared to others, the limitations that make it simple actually make it difficult. For example, the fact that there are at most 256 bytes in the stack page that have to be shared by everything. It gets awkward if you are implementing threads or coroutines. Compare this with an ISA where the SP is a 16-bit register or the stack can be anywhere.

What does 'return from subroutine' mean?

I'm trying to build my first ever CHIP-8 emulator from scratch using C. While writing necessary code for the instructions, I came across this opcode:
00EE - RET
Return from a subroutine.
The interpreter sets the program counter to the address at the top of the stack, then subtracts 1 from the stack pointer.
(http://devernay.free.fr/hacks/chip8/C8TECH10.HTM)
I know that a subroutine is basically a function, but what does it mean to 'return' from a subroutine? And what is happening to the program counter, stack, and the stack pointer respectively?
(One additional question): If I created an array that can hold 16 values to represent the stack, will the 'top of the stack' be STACK[0] or STACK[15]? And where should my stack pointer be?
To return from a subroutine is to return code execution to the point it was at before the subroutine was called.
Therefore, given that calling a subroutine pushes the address of the next instruction, PC+2 (+2 to jump past the 2-byte call opcode), onto the stack, returning from a subroutine returns execution to that address by popping it from the stack (e.g. pc = stack[sp]; sp -= 1;, following the quoted spec).
As for the additional question, it really depends on whether you define your stack as being ascending or descending. For the CHIP-8 the choice is not specified.

Initial stack pointer not starting at the required offset (where are the extra byte offsets coming from?)

I have this following sample code in the start-up file for Cortex-M3 taken from Keil(compiling it with Microlib).
; <h> Stack Configuration
; <o> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>
EXPORT __initial_sp
Stack_Size EQU 0x00000100
AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem SPACE Stack_Size
__initial_sp
And this area is finally placed into a RAM region starting at address 0x20000000 with size of the executable region say 0x400 in the scatter file.
When I get into the debugger I see the that the value at memory address 0x0 is 0x20000118 which is the initial stack pointer and even the register window shows the msp register as 0x20000118.
But my understanding was that the start of the stack would be from 0x20000100 because that is what the above code snippet is doing.
I am unable to get from where are these extra 0x18 bytes coming from.
Also, if I switch off the Microlib mode, I now see the initial stack pointer is 0x20000120.
Again, where are these extra 0x20 bytes of offset to the stack pointer coming from?
Why isn't the stack starting where I want it to (0x20000100), instead of having some extra offset?
No, this code snippet doesn't say that initial stack pointer will be at 0x20000100.
Firstly, it EXPORTs the symbol "__initial_sp". This only declares the symbol as "global" (accessible from other files). Next, the value 0x100 is assigned to the symbol "Stack_Size". The following instructions create a dummy "STACK" section of Stack_Size bytes.
The initial stack pointer value will (usually) be calculated by the linker script. You also need to look at the source code of the vector table (in most cases it will be in a file called startup.s or similar) and see which symbol is used as its first entry (is it really "__initial_sp"?).
Note, if you have (for example) 32KB of RAM and your RAM starts at 0x20000000, then you want (usually) your initial SP to be at 0x20008000 (end of RAM). If "stack size" is equal to "0x100" it means that you don't expect SP to be less than 0x20007F00. But, you can also have initial stack pointer at address that depends on size of other sections (for instance .heap or .data). This is why you can see differences when linking to standard library (it will change size of other sections).

x86 memory ordering: Loads Reordered with Earlier Stores vs. Intra-Processor Forwarding

I am trying to understand section 8.2 of Intel's System Programming Guide (that's Vol 3 in the PDF).
In particular, I see two different reordering scenarios:
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
and
8.2.3.5 Intra-Processor Forwarding Is Allowed
However, I do not understand the difference between these scenarios from the observable-effects point of view. The examples provided in those sections seem interchangeable to me. The 8.2.3.4 example can be explained by the 8.2.3.5 rule just as well as by its own rule. And the converse seems true to me as well, although I am not as sure in that case.
So here is my question: are there better examples or explanations how the observable effects of 8.2.3.4 are different from observable effects of 8.2.3.5?
The example at 8.2.3.5 should be "surprising" if you expect memory ordering to be all strict and clean, even if you acknowledge that 8.2.3.4 allows loads to reorder with stores to different addresses.
Processor 0 | Processor 1
--------------------------------------
mov [x],1 | mov [y],1
mov R1, [x] | mov R3,[y]
mov R2, [y] | mov R4,[x]
Note that the key part is that the newly added loads in the middle both return 1 (store-to-load forwarding makes that possible in the uarch without stalling). So in theory, you would expect that both stores have been "observed" globally by the time both these loads completed (that would have been the case with sequential consistency, where there is a unique ordering between stores and all cores see it).
However, having later R2 = R4 = 0 as a valid outcome proves this is not the case - the stores are in fact observed locally first. In other words, allowing this outcome means that processor 0 sees the stores as time(x) < time(y), while processor 1 sees the opposite.
This is a very important observation about the consistency of this memory model, which the previous example doesn't prove. This nuance is the biggest difference between Sequential Consistency and Total Store Ordering - the second example breaks SC, the first one doesn't.
