Where in the CPU is the Assembly Stack located? - memory

I'm studying assembly with this image:
In Assembly you'll use the stack with commands like:
push EAX
pop EBP
sub esp, 4
...
Where is this stack exactly? From the picture, the only place it could be is the Memory, but surely that's not the case, right? Won't that slow down the entire cycle?

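It's not in the CPU. The stack is, in fact, in memory. The stack pointer, however, is a register in the CPU that holds the address of the top of the stack. In practice this is not as slow as it sounds, because recently used stack memory is usually held in the CPU's cache.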
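A minimal sketch of that split, assuming a toy 32-bit machine: the `Cpu` class, the 64-byte memory and the register dictionary are hypothetical names used only for illustration. The registers (including ESP) belong to the CPU object, while the pushed values end up in the separate `memory` array.

```python
class Cpu:
    """Toy 32-bit CPU: registers live here, the stack lives in `memory`."""

    def __init__(self, memory_size=64):
        self.memory = bytearray(memory_size)  # RAM, outside the CPU
        # An empty stack: ESP starts just past the highest address.
        self.regs = {"eax": 0, "ebp": 0, "esp": memory_size}

    def push(self, reg):
        # push: move ESP down by 4, then store the register at [ESP].
        self.regs["esp"] -= 4
        esp = self.regs["esp"]
        self.memory[esp:esp + 4] = self.regs[reg].to_bytes(4, "little")

    def pop(self, reg):
        # pop: load the register from [ESP], then move ESP up by 4.
        esp = self.regs["esp"]
        self.regs[reg] = int.from_bytes(self.memory[esp:esp + 4], "little")
        self.regs["esp"] += 4

    def sub_esp(self, n):
        # sub esp, n: reserve n bytes without writing anything.
        self.regs["esp"] -= n


cpu = Cpu()
cpu.regs["eax"] = 0xDEADBEEF
cpu.push("eax")                  # push EAX
esp_after_push = cpu.regs["esp"]
cpu.pop("ebp")                   # pop EBP
cpu.sub_esp(4)                   # sub esp, 4
```

Only the 4-byte ESP value is CPU state here; everything that was pushed stays in `memory` even after the pop.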

Related

Does Rust allocate on the stack to initialize an Option reference with Some?

I've been working with Rust in a constrained embedded environment (on the STM32F303 MCU), and I noticed that some of my functions were allocating an unexpectedly large amount of stack space.
In this environment, I do not have an allocator, and need to allocate large, mutable, static data structures on the stack.
Eventually, I found that some functions were unexpectedly allocating space on the stack, and causing my memory-constrained stack to overflow.
I am looking to understand how much memory needs to be allocated on the stack for the following mutate function.
I've been searching for answers to this problem on this site, but none seem to give an answer for this more general problem.
I understand that for this code block below I can look at the LLVM/assembly to see where the allocations are made, but I'm trying to understand how to predict when stack allocations are made for a general class of problems, independent of compiler options and the optimizer.
The following problem is a toy example which mimics the pattern I'm using (and having issues with) in my embedded Rust program.
Question: How much memory does mutate below need to allocate on the stack?
struct Parent {
data: Option<[u32; 1024]>,
}
impl Parent {
pub fn new() -> Self {
Parent { data: None }
}
// Ideally, this function should allocate negligible memory on the stack
pub fn mutate(&mut self) {
let arr = [0u32; 1024];
self.data = Some(arr);
for i in 0..self.data.unwrap().len() {
self.data.unwrap()[i] = i as u32;
}
}
}
fn main() {
let mut it = Parent::new();
it.mutate();
}
Considerations
Other posts seem to suggest this behavior is up to the optimizer. If this is the case, is there a way I can rewrite the code above so that each function's stack allocation size is obvious to the reader of the code?
Does the let keyword (first line in the mutate function) have any effect on whether the large array is allocated on the stack?
I am sure I could dip into unsafe rust to ensure this function doesn't need to make any allocations/memcpys (like I could in C). Is there a better way to do this without using unsafe?
Thanks in advance for any help on this issue.
Question: How much memory does mutate below need to allocate on the stack?
4kB (32 / 8 * 1024)? You're literally creating an array local on the stack, and Rust doesn't have placement new so even if you did not create an array local on the stack it's basically up to the optimiser where it handles the allocation.
Also note that your Parent struct necessarily always takes 4kB as well, in fact it likely takes 4kB + 4 or 8 bytes (not sure what the alignment requirements are for an array of u32) for the Option's tag as the array has no invalid value which rustc could use for niche variant optimisation.
edit: oh wait no, you're also copying the array back into the function with the self.data.unwrap() calls, so that's 2 more of those, so at least 12kB. In fact plugging this into Compiler Explorer it tells me:
example::Parent::mutate:
mov eax, 28856
so that's 28k, not quite sure where the extras come from.
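As a rough cross-check of the figures above, the arithmetic can be written out. This is only a sketch: the 4-byte tag size is an assumption about how rustc lays out Option<[u32; 1024]>, not something the language guarantees, and the last line is plain division rather than an explanation of where the extra space goes.

```python
U32_BYTES = 32 // 8                      # a u32 is 32 bits
ARRAY_BYTES = U32_BYTES * 1024           # [u32; 1024]
TAG_BYTES = 4                            # assumption: tag padded to u32 alignment
OPTION_BYTES = ARRAY_BYTES + TAG_BYTES   # Option<[u32; 1024]> under that assumption

three_copies = 3 * ARRAY_BYTES           # the local plus two unwrap() copies

reported_frame = 28856                   # from `mov eax, 28856` in the unoptimised build
copies_that_fit = reported_frame // OPTION_BYTES

print(ARRAY_BYTES, OPTION_BYTES, three_copies, copies_that_fit)
```

Under those assumptions the unoptimised frame is large enough for about seven Option-sized values, well above the three copies visible in the source.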
Anyway activating -O, it looks like it all gets optimised away so… lucky?:
example::Parent::mutate:
push rax
mov dword ptr [rdi], 1
add rdi, 4
mov edx, 4096
xor esi, esi
call qword ptr [rip + memset@GOTPCREL]
pop rax
ret

Local variables: are they always on the stack?

In the following procedure, will the array be allocated on the stack?
procedure One:
var
arr: array[0..1023] of byte;
begin
end;
What is the largest item that can go on the stack?
Is there a speed difference between accessing a variable on the stack and on the heap?
In the following procedure, will the array be allocated on the stack?
Yes, provided that the local variable is not captured by an anonymous method. Captured local variables reside on the heap.
What is the largest item that can go on the stack?
It depends on how large the stack is, and how much of the stack has already been used, and how much of the stack is used by calls made by the function itself. The stack is a fixed size, determined when the thread is created. The stack overflows if it grows beyond that size. On Windows at least, the default stack size is 1MB, so I would not expect you to encounter problems with a 1KB array as can be seen here.
Is there a speed difference between accessing variable on the stack and on the heap?
By and large no, but again this depends. Variables on the stack are probably accessed more frequently, and so are more likely to be in cache. But for a decently sized object, like the 1KB array we can see here, I would not expect there to be any difference in access time. In terms of the underlying memory architecture, there's no difference between stack and heap; it's all just memory.
Now, where there is a difference in performance is in allocation. Heap allocation is more expensive than stack allocation. And especially if you have a multi-threaded application, heap allocation can be a bottleneck. In particular, the default Delphi memory manager does not scale well in multi-threaded use.

What do the general purpose registers contain?

I included the iOS tag, but I'm running in the simulator on a Core i7 MacBook Pro (x86-64, right?), so I think that's immaterial.
I'm currently debugging a crash in Flurry's video ads. I have a breakpoint set on Objective-C exceptions. When the breakpoint is hit I am in objc_msgSend. The callstack contains a mix of private Flurry and iOS methods, nothing public and nothing that I've written. Calling register read from the objc_msgSend stack frame outputs the following:
(lldb) register read
General Purpose Registers:
eax = 0x1ac082d0
ebx = 0x009600b5 "spaceWillDismiss:interstitial:"
ecx = 0x03e2cddb "makeKeyAndVisible"
edx = 0x0000003f
edi = 0x0097c6f3 "removeWindow"
esi = 0x00781e65 App`-[FlurryAdViewController removeWindow] + 12
ebp = 0xbfffd608
esp = 0xbfffd5e8
ss = 0x00000023
eflags = 0x00010202 App`-[FeedTableCell setupVisibleCommentAndLike] + 1778 at FeedTableCell.m:424
eip = 0x049bd09b libobjc.A.dylib`objc_msgSend + 15
cs = 0x0000001b
ds = 0x00000023
es = 0x00000023
fs = 0x00000000
gs = 0x0000000f
I've got a few questions about this output.
I assumed $ebx contains the selector that caused the crash and $edi is the last executing method. Is that the case?
$eip is where I crashed. Is that usually the case?
$eflags references an instance method that, as far as I know, has nothing to do with this crash. What is that?
Is there any other information I can pry out of these registers?
I can't speak to iOS/Objective-C frame layouts specifically, so I can't answer your question about EBX and EDI. But I can help you regarding EIP and EFLAGS and give you some general hints about ESP/EBP and the selector registers. (By the way, the simulator is simulating a 32-bit x86 environment; you can tell because your registers are 32 bits long.)
EIP is the instruction pointer register, also known as the program counter, which contains the address of the currently executing machine instruction. Thus it will point to where your program crashed, or more generally, where your program is when it hits a breakpoint, dumps core etc.
EIP is saved and restored to implement function calls (at the machine code level -- inlining may result in high-level language calls not performing actual calls). In memory-unsafe languages, a stack buffer overflow can overwrite the saved value of the instruction pointer, causing the return instruction to return to the wrong place. If you're lucky, the overwritten value will trigger a segfault on the next memory fetch, but the value of EIP will be arbitrary and unhelpful in debugging the problem. If you're unlucky, an attacker crafted the new EIP to point to useful code, so many environments use "stack cookies" or "canaries" to detect these overwrites before restoring the saved/overwritten EIP, in which case the EIP value may be useful.
EFLAGS isn't a memory address, and arguably isn't a general purpose register. Each bit of EFLAGS is a flag that can be set or tested by various instructions. The most important flags are the carry, zero and sign flags, which are set by arithmetic instructions and used for conditional branching. Your debugger is misinterpreting it as a memory address and displaying it as the closest function, but that isn't actually related to your crash. (The + 1778 is the giveaway: this means EFLAGS points 1778 bytes into the function, but the function is unlikely to actually be 1778 bytes long.)
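To make the point concrete, the EFLAGS value from the register dump can be decoded bit by bit. This is a small sketch: the bit positions are the architectural ones for x86, while the helper name `decode_eflags` is made up for the example.

```python
# Architectural bit positions of the named EFLAGS flags on x86.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM", 18: "AC",
}


def decode_eflags(value):
    """Return the names of the flags that are set, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


flags = decode_eflags(0x00010202)  # value of eflags in the dump above
print(flags)  # ['IF', 'RF']
```

Bit 1 is also set in 0x00010202, but that bit is reserved and always reads as 1. The carry, zero and sign flags are all clear, and nothing in the value refers to FeedTableCell.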
ESP is the stack pointer and EBP is (usually) the frame pointer (also called the base pointer). These registers bound the current frame on the call stack. Your debugger usually can show you the values of stack variables and the current call stack based on these pointers. In case of corruption, sometimes you can manually inspect the stack to recover EBP and manually unwind the call stack. Note that code can be compiled without frame pointers (frame pointer omission), freeing EBP for other uses; this is common on x86 because there are so few general-purpose registers.
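Manual unwinding with frame pointers can be sketched as follows. It assumes the conventional 32-bit x86 frame layout, where [EBP] holds the caller's saved EBP and [EBP+4] holds the return address; the memory contents and return addresses are invented for illustration, and only the EBP/ESP values come from the dump.

```python
def walk_frames(memory, ebp, max_depth=16):
    """Follow saved-EBP links and collect the return address of each frame."""
    return_addresses = []
    while ebp != 0 and len(return_addresses) < max_depth:
        saved_ebp = memory[ebp]                    # [ebp]   -> caller's EBP
        return_addresses.append(memory[ebp + 4])   # [ebp+4] -> return address
        ebp = saved_ebp
    return return_addresses


# EBP - ESP from the register dump: bytes between the two pointers.
frame_bytes = 0xBFFFD608 - 0xBFFFD5E8

# Hypothetical stack contents: three frames, ending with a null saved EBP.
memory = {
    0xBFFFD608: 0xBFFFD648, 0xBFFFD60C: 0x1111,
    0xBFFFD648: 0xBFFFD688, 0xBFFFD64C: 0x2222,
    0xBFFFD688: 0x00000000, 0xBFFFD68C: 0x3333,
}
chain = walk_frames(memory, 0xBFFFD608)
print(frame_bytes, [hex(a) for a in chain])
```

If the code was built with frame pointer omission, EBP does not point at such a link and this walk produces garbage, which is why debuggers fall back on unwind tables in that case.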
SS, CS, DS, ES, FS and GS hold segment selectors, used in the bad old days before paging to implement segmentation. Today FS and GS are commonly used by operating systems to locate process and thread state blocks; in 64-bit mode they are the only segment registers whose base addresses still take effect. The selector registers are generally not helpful for debugging.

Does the system allocate memory from high->low or the reverse?

IIRC it should be high->low, but according to this image, it's low->high.
I'm now confused: which is the case?
It seems the code is also executed from low->high:
0x0000000000400498 <main+0>: push %rbp
0x0000000000400499 <main+1>: mov %rsp,%rbp
0x000000000040049c <main+4>: sub $0x10,%rsp
0x00000000004004a0 <main+8>: movl $0x6,-0x4(%rbp)
On Intel x86/x64, which are the most popular architectures that run Windows, the stack "grows" towards the lower addresses. I.e., pushing onto the stack involves subtracting from the stack pointer (ESP), and popping from the stack involves adding to the stack pointer.
The stack grows from the top to the bottom in your example. This is the function's prologue, and it uses the SUB instruction to allocate stack space for local variables. You might be confusing the stack with the memory in which your program is stored -- in that area, the CPU executes instructions sequentially, from low to high addresses, until a branch (e.g. JMP) instruction is encountered.
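The two directions can be seen side by side by replaying the prologue by hand. This is a sketch only: the starting values of `rsp` and `rbp` are hypothetical, and memory is modelled as a plain dictionary.

```python
# Instruction addresses from the disassembly: they only ever go up.
code_addresses = [0x400498, 0x400499, 0x40049C, 0x4004A0]

rsp = 0x7FFFFFFFE000    # hypothetical stack pointer on entry to main
rbp = 0x7FFFFFFFE040    # hypothetical caller frame pointer
memory = {}
rsp_trace = [rsp]

# push %rbp: move RSP down by 8, store the old RBP there
rsp -= 8
memory[rsp] = rbp
rsp_trace.append(rsp)

# mov %rsp,%rbp: the new frame starts at the current top of stack
rbp = rsp

# sub $0x10,%rsp: reserve 16 bytes for locals below the frame pointer
rsp -= 0x10
rsp_trace.append(rsp)

# movl $0x6,-0x4(%rbp): store a local just below RBP
memory[rbp - 4] = 6

print([hex(a) for a in rsp_trace])
```

The code addresses increase from one instruction to the next while every change to `rsp` moves it to a lower address.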

What is a stack pointer used for in microprocessors?

I am preparing for a microprocessor exam. If the use of a program counter is to hold the address of the next instruction, what is the use of the stack pointer?
A stack is a LIFO data structure (last in, first out, meaning last entry you push on to the stack is the first one you get back when you pop). It is typically used to hold stack frames (bits of the stack that belong to the current function).
This may include, but is not limited to:
the return address.
a place for a return value.
passed parameters.
local variables.
You push items onto the stack and pop them off. In a microprocessor, the stack can be used for both user data (such as local variables and passed parameters) and CPU data (such as return addresses when calling subroutines).
The actual implementation of a stack depends on the microprocessor architecture. It can grow up or down in memory and can move either before or after the push/pop operations.
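The four combinations (up or down, adjust before or after) can be sketched with one small class. The class name `ToyStack` and its parameters are hypothetical; `step` is -1 for a stack that grows down and +1 for one that grows up, and `pre_adjust` selects whether the pointer moves before the store on a push.

```python
class ToyStack:
    """Stack over a dict 'memory', parameterised by direction and timing."""

    def __init__(self, sp, step, pre_adjust):
        self.memory = {}
        self.sp = sp
        self.step = step              # -1: grows down, +1: grows up
        self.pre_adjust = pre_adjust  # True: sp points at the last item pushed

    def push(self, value):
        if self.pre_adjust:
            self.sp += self.step      # move first, then store
            self.memory[self.sp] = value
        else:
            self.memory[self.sp] = value  # store first, then move
            self.sp += self.step

    def pop(self):
        if self.pre_adjust:
            value = self.memory[self.sp]  # read first, then move back
            self.sp -= self.step
        else:
            self.sp -= self.step          # move back first, then read
            value = self.memory[self.sp]
        return value


def round_trip(step, pre_adjust):
    stack = ToyStack(sp=0x100, step=step, pre_adjust=pre_adjust)
    for value in (1, 2, 3):
        stack.push(value)
    return [stack.pop() for _ in range(3)], stack.sp


for step in (-1, +1):
    for pre_adjust in (True, False):
        print(step, pre_adjust, round_trip(step, pre_adjust))
```

All four variants give the same last-in, first-out behaviour; they differ only in which addresses get used and in what the pointer refers to between operations.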
Operations which typically affect the stack are:
subroutine calls and returns.
interrupt calls and returns.
code explicitly pushing and popping entries.
direct manipulation of the stack pointer register, sp.
Consider the following program in my (fictional) assembly language:
Addr  Opcodes   Instructions    ; Comments
----  --------  --------------  ----------
                                ; 1: pc<-0000, sp<-8000
0000  01 00 07  load r0,7       ; 2: pc<-0003, r0<-7
0003  02 00     push r0         ; 3: pc<-0005, sp<-7ffe, (sp:7ffe)<-0007
0005  03 00 00  call 000b       ; 4: pc<-000b, sp<-7ffc, (sp:7ffc)<-0008
0008  04 00     pop r0          ; 7: pc<-000a, r0<-(sp:7ffe[0007]), sp<-8000
000a  05        halt            ; 8: pc<-000a
000b  06 01 02  load r1,[sp+2]  ; 5: pc<-000e, r1<-(sp+2:7ffe[0007])
000e  07        ret             ; 6: pc<-(sp:7ffc[0008]), sp<-7ffe
Now let's follow the execution, describing the steps shown in the comments above:
1. This is the starting condition where pc (the program counter) is 0 and sp is 8000 (all these numbers are hexadecimal).
2. This simply loads register r0 with the immediate value 7 and moves pc to the next instruction (I'll assume that you understand the default behavior will be to move to the next instruction unless otherwise specified).
3. This pushes r0 onto the stack by reducing sp by two then storing the value of the register to that location.
4. This calls a subroutine. What would have been pc in the next step is pushed on to the stack in a similar fashion to r0 in the previous step, then pc is set to its new value. This is no different to a user-level push other than the fact it's done more as a system-level thing.
5. This loads r1 from a memory location calculated from the stack pointer - it shows a way to pass parameters to functions.
6. The return statement extracts the value from where sp points and loads it into pc, adjusting sp up at the same time. This is like a system-level pop instruction (see next step).
7. Popping r0 off the stack involves extracting the value from where sp currently points, then adjusting sp up.
8. The halt instruction simply leaves pc where it is, an infinite loop of sorts.
Hopefully from that description, it will become clear. Bottom line is: a stack is useful for storing state in a LIFO way and this is generally ideal for the way most microprocessors do subroutine calls.
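The walkthrough can be replayed with a small interpreter for the fictional machine. This is a sketch under stated assumptions: instructions are kept as tuples keyed by address rather than decoded from the opcode bytes, the operation names such as `load_imm` and `load_sp` are invented, and stack entries are 16-bit words stored in a dictionary.

```python
# address -> (operation, register, argument, instruction length in bytes)
PROGRAM = {
    0x0000: ("load_imm", "r0", 7, 3),       # load r0,7
    0x0003: ("push", "r0", None, 2),        # push r0
    0x0005: ("call", None, 0x000B, 3),      # call 000b
    0x0008: ("pop", "r0", None, 2),         # pop r0
    0x000A: ("halt", None, None, 1),        # halt
    0x000B: ("load_sp", "r1", 2, 3),        # load r1,[sp+2]
    0x000E: ("ret", None, None, 1),         # ret
}


def run(program):
    regs = {"r0": 0, "r1": 0}
    stack = {}                   # address -> 16-bit word
    pc, sp = 0x0000, 0x8000      # step 1: starting condition
    trace = []
    while True:
        op, reg, arg, length = program[pc]
        next_pc = pc + length    # default: fall through to the next instruction
        if op == "load_imm":
            regs[reg] = arg
        elif op == "push":
            sp -= 2              # adjust sp, then store the value
            stack[sp] = regs[reg]
        elif op == "call":
            sp -= 2              # push the return address, then jump
            stack[sp] = next_pc
            next_pc = arg
        elif op == "load_sp":
            regs[reg] = stack[sp + arg]
        elif op == "ret":
            next_pc = stack[sp]  # pop the return address into pc
            sp += 2
        elif op == "pop":
            regs[reg] = stack[sp]  # get the value, then adjust sp
            sp += 2
        elif op == "halt":
            trace.append((op, pc, sp))  # pc stays where it is
            return regs, trace
        pc = next_pc
        trace.append((op, pc, sp))


regs, trace = run(PROGRAM)
for op, pc, sp in trace:
    print(f"{op:8} pc<-{pc:04x} sp={sp:04x}")
```

Each trace entry shows pc and sp after the instruction has executed, so the values can be compared directly with the numbered comments in the listing.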
Unless you're a SPARC of course, in which case you use a circular buffer for your stack :-)
Update: Just to clarify the steps taken when pushing and popping values in the above example (whether explicitly or by call/return), see the following examples:
LOAD R0,7
PUSH R0
                     Adjust sp       Store val
sp-> +--------+      +--------+      +--------+
     |  xxxx  |  sp->|  xxxx  |  sp->|  0007  |
     |        |      |        |      |        |
     |        |      |        |      |        |
     |        |      |        |      |        |
     +--------+      +--------+      +--------+
POP R0
                     Get value       Adjust sp
     +--------+      +--------+  sp->+--------+
sp-> |  0007  |  sp->|  0007  |      |  0007  |
     |        |      |        |      |        |
     |        |      |        |      |        |
     |        |      |        |      |        |
     +--------+      +--------+      +--------+
The stack pointer stores the address of the most recent entry that was pushed onto the stack.
To push a value onto the stack, the stack pointer is incremented to point to the next physical memory address, and the new value is copied to that address in memory. (This describes a stack that grows upward; on architectures where the stack grows downward, such as x86, the pointer is decremented instead.)
To pop a value from the stack, the value is copied from the address held in the stack pointer, and the stack pointer is decremented, pointing it to the next available item in the stack.
The most typical use of a hardware stack is to store the return address of a subroutine call. When the subroutine is finished executing, the return address is popped off the top of the stack and placed in the Program Counter register, causing the processor to resume execution at the next instruction following the call to the subroutine.
http://en.wikipedia.org/wiki/Stack_%28data_structure%29#Hardware_stacks
You got more preparing [for the exam] to do ;-)
The Stack Pointer is a register which holds the address of the next available spot on the stack.
The stack is an area in memory which is reserved to store a stack, that is, a LIFO (Last In First Out) type of container, where we store the local variables and return addresses, allowing simple management of the nesting of function calls in a typical program.
See this Wikipedia article for a basic explanation of the stack management.
For 8085: Stack pointer is a special purpose 16-bit register in the Microprocessor, which holds the address of the top of the stack.
The stack pointer register in a computer is made available for general purpose use by programs executing at lower privilege levels than interrupt handlers. A set of instructions in such programs, excluding stack operations, stores data other than the stack pointer, such as operands, and the like, in the stack pointer register. When switching execution to an interrupt handler on an interrupt, return address data for the currently executing program is pushed onto a stack at the interrupt handler's privilege level. Thus, storing other data in the stack pointer register does not result in stack corruption. Also, these instructions can store data in a scratch portion of a stack segment beyond the current stack pointer.
Read this one for more info.
General purpose use of a stack pointer register
The Stack is an area of memory for keeping temporary data. The stack is used by the CALL instruction to keep the return address for procedures. The RET instruction gets this value from the stack and returns to that offset. The same thing happens when an INT instruction calls an interrupt: it stores the flag register, code segment and offset on the stack. The IRET instruction is used to return from an interrupt call.
The Stack is a Last In First Out (LIFO) memory. Data is placed onto the Stack with a PUSH instruction and removed with a POP instruction. The Stack memory is maintained by two registers: the Stack Pointer (SP) and the Stack Segment (SS) register. When a word of data is PUSHED onto the stack, the high-order 8-bit byte is placed in location SP-1 and the low-order 8-bit byte is placed in location SP-2. The SP is then decremented by 2. The SP is added to (SS x 10H) to form the physical stack memory address. The reverse sequence occurs when data is POPPED from the stack: the low-order 8-bit byte is read from location SP and the high-order 8-bit byte from location SP+1, and the SP is then incremented by 2.
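A short sketch of 8086-style PUSH and POP with segmented addressing follows. The function names and the SS and SP values are hypothetical, and memory is a dictionary of bytes. On a pop the word is read back from SP (low byte) and SP+1 (high byte) before SP is incremented.

```python
def physical(ss, sp):
    """20-bit physical address: SS x 10H plus the 16-bit offset."""
    return ((ss << 4) + sp) & 0xFFFFF


def push_word(mem, ss, sp, value):
    """Store a 16-bit word below SP and return the new SP."""
    mem[physical(ss, (sp - 1) & 0xFFFF)] = (value >> 8) & 0xFF  # high byte at SP-1
    mem[physical(ss, (sp - 2) & 0xFFFF)] = value & 0xFF         # low byte at SP-2
    return (sp - 2) & 0xFFFF


def pop_word(mem, ss, sp):
    """Read the 16-bit word at SP and return (value, new SP)."""
    low = mem[physical(ss, sp)]                    # low byte at SP
    high = mem[physical(ss, (sp + 1) & 0xFFFF)]    # high byte at SP+1
    return (high << 8) | low, (sp + 2) & 0xFFFF


mem = {}
ss, sp = 0x2000, 0x0100              # hypothetical register values
top = physical(ss, sp)               # 0x20100
sp = push_word(mem, ss, sp, 0x1234)
sp_after_push = sp
value, sp = pop_word(mem, ss, sp)
print(hex(top), hex(sp_after_push), hex(value), hex(sp))
```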
The stack pointer holds the address of the top of the stack. A stack allows functions to pass arguments stored on the stack to each other, and to create scoped variables. Scope in this context means that the variable is popped off the stack when the stack frame is gone, and/or when the function returns. Without a stack, you would need to use explicit memory addresses for everything. That would make it impossible (or at least severely difficult) to design high-level programming languages for the architecture.
Also, each CPU mode usually has its own banked stack pointer. So when exceptions occur (interrupts for example), the exception handler routine can use its own stack without corrupting the user process.
Should you ever crave deeper understanding, I heartily recommend Patterson and Hennessy as an intro and Hennessy and Patterson as an intermediate to advanced text. They're pricey, but truly non-pareil; I just wish either or both were available when I got my Masters' degree and entered the workforce designing chips, systems, and parts of system software for them (but, alas!, that was WAY too long ago;-). Stack pointers are so crucial (and the distinction between a microprocessor and any other kind of CPU so utterly meaningful in this context... or, for that matter, in ANY other context, in the last few decades...!-) that I doubt anything but a couple of thorough from-the-ground-up refreshers can help!-)
On some CPUs, there is a dedicated set of registers for the stack. When a call instruction is executed, one register is loaded with the program counter at the same time as a second register is loaded with the contents of the first, a third register is loaded with the second, and a fourth with the third, etc. When a return instruction is executed, the program counter is latched with the contents of the first stack register at the same time as that register is latched from the second; that second register is loaded from a third, etc. Note that such hardware stacks tend to be rather small (many of the smaller PIC series micros, for example, have a two-level stack).
While a hardware stack does have some advantages (push and pop don't add any time to a call/return, for example) having registers which can be loaded with two sources adds cost. If the stack gets very big, it will be cheaper to replace the push-pull registers with an addressable memory. Even if a small dedicated memory is used for this, it's cheaper to have 32 addressable registers and a 5-bit pointer register with increment/decrement logic, than it is to have 32 registers each with two inputs. If an application might need more stack than would easily fit on the CPU, it's possible to use a stack pointer along with logic to store/fetch stack data from main RAM.
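The register-shifting behaviour can be sketched with a list standing in for the dedicated stack registers. The class `HardwareStack` is hypothetical, and what the deepest register holds after a return is an assumption here (it simply keeps its old value); real parts differ on that detail.

```python
class HardwareStack:
    """Return-address stack built from a fixed number of shifting registers."""

    def __init__(self, depth=2):
        self.pc = 0
        self.slots = [0] * depth   # slots[0] is the first stack register

    def call(self, target, return_address):
        # All registers latch from their neighbour at once;
        # the value in the deepest register is lost.
        self.slots = [return_address] + self.slots[:-1]
        self.pc = target

    def ret(self):
        # pc latches from the first register while the others shift up.
        self.pc = self.slots[0]
        self.slots = self.slots[1:] + [self.slots[-1]]


# Two nested calls fit in a two-level stack.
cpu = HardwareStack(depth=2)
cpu.call(0x100, return_address=0x011)
cpu.call(0x200, return_address=0x101)
cpu.ret()
first_return = cpu.pc
cpu.ret()
second_return = cpu.pc

# A third nested call silently overwrites the oldest return address.
deep = HardwareStack(depth=2)
deep.call(0x100, return_address=0x011)
deep.call(0x200, return_address=0x101)
deep.call(0x300, return_address=0x201)
returns = []
for _ in range(3):
    deep.ret()
    returns.append(deep.pc)
print(hex(first_return), hex(second_return), [hex(r) for r in returns])
```

With three nested calls on a two-level stack the outermost return address (0x011 here) is never recovered, which is the practical limit of such small hardware stacks.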
A stack pointer is a small register that stores the address of the top of the stack.

Resources