Understanding how call return works - callstack

I am somewhat confused about exactly when the context of a called function gets deleted. I have read that the stack frame of the called function is popped off when it returns. I am trying to apply that knowledge to the following scenario, where function foo calls function bar, which returns a structure. The code could look something like this:
//...
struct Bill
{
    float amount;
    int id;
    char address[100];
};
//...
Bill bar(int);
//...
void foo() {
    // ...
    Bill billRecord = bar(56);
    //...
}
Bill bar(int i) {
    //...
    Bill bill = {123.56, 2347890, "123 Main Street"};
    //...
    return bill;
}
The memory for the bill object in bar comes from bar's stack frame, which is popped off when bar returns. Yet it appears to still be valid when the returned structure is assigned to billRecord in foo.
So does this mean that the stack frame for bar is not deleted the instant it returns, but only after the value returned by bar has been used in foo?

You're right, there's this "hole" between when bar returns and when foo copies the return value somewhere with an assignment operation. The way this works is that there is notionally some "return value" space where the return value lives while being returned. So in the execution model there are two copies of the return value: from the local in bar to the return space, and from the return space to billRecord in foo.
Exactly how this works depends on the calling conventions. On x86_64, the "return value space" is in registers for small return values and in some memory controlled by the caller for larger return values. If the return value is larger than two registers worth, then the caller must pass a 'hidden' extra argument with a pointer to the space where the return value should be stored. bar will then copy its local variable into that space before deleting its stack frame and returning.
So when compiling foo, the compiler knows it needs to provide that extra hidden argument and knows it needs to allocate some space for it. If it is smart (and you enable optimization), it will simply reuse the space for billRecord (passing a pointer to billRecord as the hidden argument), and the assignment in foo will then be a no-op, as it knows bar will do all the work [1].
If the compiler is smart when compiling bar, it might do "return value optimization" and, realizing it is just going to return the local variable bill, allocate that local variable in the return value space it got from its caller, rather than in its own stack frame.
[1] Of course, it can only do this if it knows there's no way for bar to access billRecord directly. This requires what is known as "escape analysis": if the location of billRecord "escapes" from foo (for example, by taking its address and storing it somewhere or passing it as an argument somewhere), this optimization can't be done, and foo will need to allocate additional space in its stack frame for the return space in addition to that used by billRecord.
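The hidden-argument scheme described above can be sketched in plain C by writing the pointer out explicitly. This is only an illustration of the idea, not actual compiler output: bar_lowered and return_space are hypothetical names, and the real lowering depends on the calling convention in use.

```c
#include <assert.h>
#include <string.h>

struct Bill {
    float amount;
    int id;
    char address[100];
};

/* What the programmer writes: return a large struct by value. */
struct Bill bar(int i) {
    (void)i;  /* unused in this sketch */
    struct Bill bill = {123.56f, 2347890, "123 Main Street"};
    return bill;
}

/* Roughly what the compiler arranges for a struct too large for registers:
 * the caller passes a hidden pointer to the space that receives the result.
 * (bar_lowered and return_space are hypothetical names.) */
void bar_lowered(struct Bill *return_space, int i) {
    (void)i;
    assert(return_space != NULL);
    struct Bill bill = {123.56f, 2347890, "123 Main Street"};
    *return_space = bill;  /* copy out before this frame is released */
}
```

With this shape, foo can pass &billRecord directly as return_space, which is the "reuse the space for billRecord" optimization mentioned above.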

The stack frame is "deleted" right before the function returns. Usually the return value is copied from the stack into a register and then the function returns. Let's look under the hood a bit to see what's really going on. I'm going to omit the actual disassembly of your code since it's a bit overwhelming, but I'll summarize the key points here. Typically a function written in C does the following (I'm using x86-64 assembly as an example; the process is similar on other architectures, but the register names will be different).
First, to use a function, it must be CALLed.
CALL bar
Doing so pushes the return address onto the stack, that is, the value of the rip register, which can be thought of as representing "what line of code we're going to run next."
bar:
push rbp
mov rbp,rsp
Most functions emitted by a C compiler start out like this. The purpose is to create a stack frame. The contents of rbp are stored on the stack for safekeeping. Then the value of the stack pointer (rsp) is copied to rbp. The reason the compiler does this is simple: rsp is altered by instructions such as push, pop, call, and ret, whereas rbp is not, so rbp gives a fixed reference point for addressing locals. In addition, the free space just past the top of the stack that hasn't been used yet can be used by the called function.
Next, our local variables are stored onto the stack. One of them was the 56 we passed to bar. The compiler chose to put the value 56 into the esi register prior to calling the function (presumably because rdi, the usual first-argument register, is taken by the hidden pointer to the return space, since Bill is too large to return in registers).
mov DWORD PTR [rbp-12], esi
This basically means "take the contents of the esi register and store them 12 bytes below the address held in rbp." This is guaranteed to be free space, thanks to the push rbp / mov rbp,rsp sequence from earlier.
Once the function has done what it needs to do, the return value is placed in rax (for a struct as large as Bill, rax holds the address of the caller-provided space the struct was written to), and then we do the exit sequence.
pop rbp
ret
As for the stack frame, it wasn't actually "deleted" per se. Those values are temporarily still there until they are overwritten by another function. However, for all intents and purposes they are considered deleted, as that stack space is now considered "free" and can be used by anything (such as interrupt or signal handlers). Therefore, after a function returns, there is no guarantee that any of its local values are still there if you try to access them. (C will let you try, for example by returning a pointer to a local variable, but dereferencing such a pointer is undefined behavior, so you shouldn't.)

Related

How does Assembly Work with Stack Correctly

I have a question about how stack usage is worked out. For example, when a function has more than 8 parameters on arm64, the callee actually uses the stack area of the calling function. After BL enters the function, it adds an offset to SP to reach the parameters, which amounts to reaching across into the caller's stack frame. How does it avoid corrupting the caller's stack in this case? Thank you for your answer.
You are correct: the function arguments which do not fit in registers will be pushed onto the stack before calling your function. Therefore, they will be at addresses with positive offsets from SP on entry to your function, and I can see why you might be concerned that it is not safe to access this memory. However, this memory is in fact "yours".
The ARM Procedure Call Standard section 6.4.2 states "A callee is permitted to modify any stack space used for receiving parameter values from the caller". So, there is no need to worry. The caller is expecting you to access this memory, and even to modify it if you want, and nothing will break if you do.
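A small C sketch of what this means in practice. It assumes a platform where only the first eight integer arguments travel in registers (as on AArch64), so the last two parameters of the hypothetical sum10 are received on the stack; the C code itself is portable, and the behavior shown does not depend on where the arguments actually live.

```c
#include <assert.h>

/* Ten integer parameters: on AArch64 the first eight are expected in
 * x0-x7, and i and j in the stack area the caller set up for the call
 * (an assumption about the platform ABI, not something C guarantees). */
long sum10(long a, long b, long c, long d, long e,
           long f, long g, long h, long i, long j) {
    /* The callee is free to overwrite its own parameter slots. */
    i *= 2;
    j *= 2;
    return a + b + c + d + e + f + g + h + i + j;
}

/* The caller's variables are unaffected, because the slots hold copies. */
long call_sum10(void) {
    long ninth = 9, tenth = 10;
    long total = sum10(1, 2, 3, 4, 5, 6, 7, 8, ninth, tenth);
    assert(ninth == 9 && tenth == 10);
    return total;
}
```

The parameter slots are scratch space handed over for the duration of the call, so modifying them cannot damage anything the caller still relies on.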

What does 'return from subroutine' mean?

I'm trying to build my first ever CHIP-8 emulator from scratch using C. While writing necessary code for the instructions, I came across this opcode:
00EE - RET
Return from a subroutine.
The interpreter sets the program counter to the address at the top of the stack, then subtracts 1 from the stack pointer.
(http://devernay.free.fr/hacks/chip8/C8TECH10.HTM)
I know that a subroutine is basically a function, but what does it mean to 'return' from a subroutine? And what is happening to the program counter, stack, and the stack pointer respectively?
(One additional question): If I created an array that can hold 16 values to represent the stack, will the 'top of the stack' be STACK[0] or STACK[15]? And where should my stack pointer be?
To return from a subroutine is to return code execution to the point it was at before the subroutine was called.
Calling a subroutine pushes the address of the next instruction, PC+2 (+2 to skip past the call instruction itself), onto the stack. Returning from a subroutine pops that address off the stack and resumes execution there (e.g. pc=stack[sp]; sp-=1; if the stack is an array of 16-bit addresses, matching the "subtracts 1" in the description you quoted).
As for the additional question, it really depends on whether you define your stack as ascending or descending. For the CHIP-8 the choice is not specified, so either works as long as call and return agree with each other.
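A minimal sketch of call and return for an emulator written in C. The struct and function names are hypothetical, and it assumes one particular convention: the stack fills upward from stack[0], sp counts the stored return addresses, and pc still points at the CALL instruction when op_call runs.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 16

/* Hypothetical emulator state; the field names are illustrative. */
struct chip8 {
    uint16_t pc;                 /* program counter */
    uint8_t  sp;                 /* number of return addresses stored */
    uint16_t stack[STACK_SIZE];  /* holds return addresses only */
};

/* 2nnn - CALL addr: remember where to resume, then jump. */
void op_call(struct chip8 *c, uint16_t addr) {
    assert(c->sp < STACK_SIZE);                   /* overflow guard */
    c->stack[c->sp++] = (uint16_t)(c->pc + 2);    /* instruction after CALL */
    c->pc = addr;
}

/* 00EE - RET: resume at the most recently saved address. */
void op_ret(struct chip8 *c) {
    assert(c->sp > 0);                            /* underflow guard */
    c->pc = c->stack[--c->sp];
}
```

Under this convention the first return address lands in stack[0] and the "top of the stack" is stack[sp - 1]. A descending layout starting at stack[15] would work equally well.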

How does cpu obtain return address from stack

How does the CPU obtain the return address that the calling function pushed onto the stack? How does it know that the value is a return address and not something else?
I had to look it up, but it's sufficiently explained on Wikipedia
So the callee (the called subroutine) is itself responsible for popping everything it put on the stack (its own local variables) and for jumping to the return address that the caller provided. The CPU does not check that the value really is a return address: the return instruction simply loads whatever is at the top of the stack into the program counter.
The return address is, for example, the stack entry that follows the callee's local variables once they have been popped (at least in the Wikipedia example; there may be differences on different architectures).
The frame pointer would be a hint for the location of the return address, but it can be omitted for performance, so you can't rely on it.
I don't know whether the callee is responsible for removing the parameters that were passed by the caller; this may be architecture-dependent.
Update: an assembly example
At the end of a function (callee), variables that got saved on the stack (i.e. some register values and the return address to the caller) are popped back into the corresponding registers:
pop {r4, r5, r6, pc}
On ARM, this gets the four next words on the stack into those registers.
One is the return address which is popped into $PC (program counter).
Thus the execution continues with the instruction at the return address which is popped into $PC.
I can't say exactly how the link register works. It's supposed to contain a return address (but for nested function calls we of course still need the stack to store several return addresses).

Go memory layout compared to C++/C

In Go, it seems there are no constructors, but it is suggested that you allocate an object of a struct type using a function, usually named by "New" + TypeName, for example
func NewRect(x, y, width, height float64) *Rect {
    return &Rect{x, y, width, height}
}
However, I am not sure about the memory layout of Go. In C/C++, this kind of code means you return a pointer to a temporary object: the variable is allocated on the stack, so it may contain garbage after the function returns. In Go, do I have to worry about this kind of thing? There seems to be no standard that states which data will be allocated on the stack and which on the heap.
In Java, there seems to be a specific rule that basic types such as int and float are allocated on the stack, while objects derived from Object are allocated on the heap. Does Go have a specific statement about this?
The Composite Literal section mentions:
Taking the address of a composite literal (§Address operators) generates a unique pointer to an instance of the literal's value.
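That means the pointer returned by the New function will be a valid one. (Because the address is returned, the value escapes the function, so the compiler places it on the heap rather than the stack; see the FAQ excerpt below.)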
Calls:
In a function call, the function value and arguments are evaluated in the usual order.
After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution.
The return parameters of the function are passed by value back to the calling function when the function returns.
You can see more in this answer and this thread.
As mentioned in "Stack vs heap allocation of structs in Go, and how they relate to garbage collection":
It's worth noting that the words "stack" and "heap" do not appear anywhere in the language spec.
The blog post "Escape Analysis in Go" details what happens, mentioning the FAQ:
When possible, the Go compilers will allocate variables that are local to a function in that function's stack frame.
However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors.
Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.
The blog post adds:
The code that does the “escape analysis” lives in src/cmd/gc/esc.c.
Conceptually, it tries to determine if a local variable escapes the current scope; the only two cases where this happens are when a variable’s address is returned, and when its address is assigned to a variable in an outer scope.
If a variable escapes, it has to be allocated on the heap; otherwise, it’s safe to put it on the stack.
Interestingly, this applies to new(T) allocations as well.
If they don’t escape, they’ll end up being allocated on the stack. Here’s an example to clarify matters:
var intPointerGlobal *int = nil
func Foo() *int {
    anInt0 := 0
    anInt1 := new(int)
    anInt2 := 42
    intPointerGlobal = &anInt2
    anInt3 := 5
    return &anInt3
}
Above, anInt0 and anInt1 do not escape, so they are allocated on the stack;
anInt2 and anInt3 escape, and are allocated on the heap.
See also "Five things that make Go fast":
Unlike C, which forces you to choose if a value will be stored on the heap, via malloc, or on the stack, by declaring it inside the scope of the function, Go implements an optimisation called escape analysis.
Go’s optimisations are always enabled by default.
You can see the compiler’s escape analysis and inlining decisions with the -gcflags=-m switch.
Because escape analysis is performed at compile time, not run time, stack allocation will always be faster than heap allocation, no matter how efficient your garbage collector is.
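For comparison, here is a sketch of the explicit choice that the quote above attributes to C, using a rectangle type like the one in the question. All names (struct rect, new_rect, make_rect, demo_total_area) are hypothetical and only illustrate the two safe options a C programmer has.

```c
#include <assert.h>
#include <stdlib.h>

struct rect { double x, y, width, height; };

/* Heap allocation: the object outlives the call; the caller must free() it. */
struct rect *new_rect(double x, double y, double width, double height) {
    struct rect *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->x = x; r->y = y; r->width = width; r->height = height;
    return r;
}

/* Returning by value is also safe: the caller receives its own copy. */
struct rect make_rect(double x, double y, double width, double height) {
    struct rect r = {x, y, width, height};
    return r;
}

/* Usage: both forms give the caller a valid object after the call returns. */
double demo_total_area(void) {
    struct rect *heap = new_rect(0, 0, 2, 3);
    assert(heap != NULL);  /* sketch only: real code should handle failure */
    struct rect copy = make_rect(0, 0, 4, 5);
    double total = heap->width * heap->height + copy.width * copy.height;
    free(heap);
    return total;
}
```

What C does not allow is returning the address of the local r in make_rect. That is the case where Go's escape analysis quietly moves the variable to the heap instead.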

Implementation of closures in Lua?

I have a question about how closures are implemented.
Say this is in a file named test.lua:
local a = 'asdf'
local function b()
    return a
end
a = 10
return b
And another file does
a = require 'test'
a()
it will print
10
If a is a pointer on the stack to 'asdf' (on the heap I assume, but it doesn't matter), and the closure b is created so presumably the address that was in a is saved for b to use, how does a = 10 change the pointer inside the closure as well?
Wikipedia says quite well what is perplexing me:
A language implementation cannot easily support full closures if its run-time memory model allocates all local variables on a linear stack1. In such languages, a function's local variables are deallocated when the function returns.
I was thinking that perhaps b really didn't save a pointer to 'asdf' but a stack offset to a, so that you can change a and the stack offset will get you to a which points to the last thing you set a to, but then how does this work when a (the pointer) is popped off the stack and the stack offset becomes invalid?
1 I know Lua doesn't allocate the values on the stack, but it allocates local pointers on the stack to values in the heap, doesn't it?
I really wish you had named these variables a bit more reasonably. So I will:
local inner = 'asdf'
local function b()
    return inner
end
inner = 10
return b
and
func = require 'test'
func()
OK, now that we know what we're talking about, I can proceed.
The Lua chunk test has a local variable called inner. Within that chunk you create a new function b. Since this is a new function, it has a scope within the scope of the chunk test.
Since it is within a function, it has the right to access local variables declared outside of that function. But because it is inside of a function, it does not access those variables like it would one of its own locals. The compiler detects that inner is a local variable declared outside of the function's scope, so it converts it into what Lua calls an "upvalue".
Functions in Lua can have an arbitrary number of values (up to 255) associated with them, called "upvalues". Functions created in C/C++ can store some number of upvalues by using lua_pushcclosure. Functions created by the Lua compiler use upvalues to provide lexical scoping.
A scope is everything that happens within a fixed block of Lua code. So:
if(...) then
    --yes
else
    --no
end
The yes block has a scope, and the no block has a different scope. Any local variables declared in the yes block cannot be accessed from the no block, because they are outside of the scope of the no block.
The Lua constructs that define a scope are if/then/else/end, while/do/end, repeat/until, do/end, for/end, and function/end. Also, each script, called a Lua "chunk", has a scope.
Scopes are nested. From within one scope, you can access local variables declared in a higher scope.
A "stack" represents all variables declared as local within a particular scope. So if you have no local variables in a certain scope, the stack for that scope is empty.
In C and C++, the "stack" that you are familiar with is just a pointer. When you call a function, the compiler has predetermined how many bytes of space that the function's stack needs. It advances the pointer by that amount. All stack variables used in the function are just byte offsets from the stack pointer. When the function exits, the stack pointer is decreased by the stack amount.
In Lua, things are different. The stack for a particular scope is an object, not merely a pointer. For any particular scope, there are some number of local variables defined for it. When the Lua interpreter enters a scope, it "allocates" a stack of the size necessary to access those local variables. All references to local variables are just offsets into that stack. Access to local variables from higher scopes (previously defined) simply access a different stack object.
So in Lua, you conceptually have a stack of stacks (which I will refer to as the "s-stack" for clarity). Each scope creates a new stack and pushes it, and when you leave a scope, it pops the stack off of the s-stack.
When the Lua compiler encounters a reference to a local variable, it converts that reference into an index into the s-stack, and an offset into that particular stack. So if it accesses a variable in the current local stack, the index into the s-stack refers to the top of the s-stack, and the offset is the offset into that stack where the variable is.
That's fine for most Lua constructs that access scopes. But function/end don't just create a new scope; they create a new function. And this function is allowed to access stacks that aren't just the local stack of that function.
Stacks are objects. And in Lua, objects are subject to garbage collection. When the interpreter enters a scope, it allocates a stack object and pushes it. So long as the stack object is pushed onto the s-stack, it cannot be destroyed. The stack of stacks refers to the object. However, once the interpreter exits the scope, it pops the stack off of the s-stack. So since it is no longer referenced, it is subject to being collected.
However, a function that accesses variables outside of its own local scope can still be referencing that stack. When the Lua compiler sees a reference to a local variable that is not within the function's local scope, it alters the function. It figures out which stack the local it is referencing belongs to, and then stores that stack as an upvalue in the function. It converts the reference to that variable to an offset into that particular upvalue, rather than an offset into a stack that is currently on the s-stack.
So as long as the function object continues to exist, so too will the stack(s) that it references.
Remember that stacks are dynamically created and destroyed as the Lua interpreter enters and exits the scope of functions. So if you were to run test twice, by calling loadfile and executing the returned function twice, you would get two separate functions that refer to two separate stacks. Neither function will see the value from the other.
Note that this may not be exactly how it's implemented, but that's the general idea behind it.
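The general idea can be sketched in C, under heavy simplification. This is not how Lua is implemented: every name below is hypothetical, the captured value is reduced to an int, and a single heap cell stands in for the storage that the closure keeps alive.

```c
#include <assert.h>
#include <stdlib.h>

/* A captured variable lives in its own heap cell, not in a C stack frame. */
struct cell { int value; };

/* A closure pairs code with the storage it captured. */
struct closure {
    int (*code)(struct closure *self);
    struct cell *upvalue;
};

static int b_code(struct closure *self) {
    return self->upvalue->value;   /* "return inner" */
}

/* Plays the role of the chunk in test.lua. */
struct closure *run_test_chunk(void) {
    struct cell *inner = malloc(sizeof *inner);
    struct closure *b = malloc(sizeof *b);
    assert(inner != NULL && b != NULL);  /* sketch only */
    inner->value = 0;      /* local inner = <initial value> */
    b->code = b_code;      /* local function b() ... end */
    b->upvalue = inner;    /* b refers to the storage, not to a copy */
    inner->value = 10;     /* inner = 10 writes to the shared storage */
    return b;              /* the storage outlives this call */
}
```

Because b holds a reference to the storage rather than a snapshot of the value, the later assignment is visible through the closure, and each run of the chunk produces an independent closure with its own storage.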

Resources