How does a program look in memory? - memory

How is a program (e.g. C or C++) arranged in computer memory? I kind of know a little about segments, variables etc, but basically I have no solid understanding of the entire structure.
Since the in-memory structure may differ, let's assume a C++ console application on Windows.
Some pointers to what I'm after specifically:
Outline of a function, and how is it called?
Each function has a stack frame, what does that contain and how is it arranged in memory?
Function arguments and return values
Global and local variables?
const static variables?
Thread local storage..
Links to tutorial-like material and such is welcome, but please no reference-style material assuming knowledge of assembler etc.

Might this be what you are looking for:
http://en.wikipedia.org/wiki/Portable_Executable
The PE file format is the binary file structure of windows binaries (.exe, .dll etc). Basically, they are mapped into memory like that. More details are described here with an explanation how you yourself can take a look at the binary representation of loaded dlls in memory:
http://msdn.microsoft.com/en-us/magazine/cc301805.aspx
Edit:
Now I understand that you want to learn how source code relates to the binary code in the PE file. That's a huge field.
First, you have to understand the basics about computer architecture which will involve learning the general basics of assembly code. Any "Introduction to Computer Architecture" college course will do. Literature includes e.g. "John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach" or "Andrew Tanenbaum, Structured Computer Organization".
After reading this, you should understand what a stack is and its difference to the heap. What the stack-pointer and the base pointer are and what the return address is, how many registers there are etc.
Once you've understood this, it is relatively easy to put the pieces together:
A C++ object contains code and data, i.e., member variables. A class
class SimpleClass {
int m_nInteger;
double m_fDouble;
double SomeFunction() { return m_nInteger + m_fDouble; }
}
will be 4 + 8 consecutives bytes in memory. What happens when you do:
SimpleClass c1;
c1.m_nInteger = 1;
c1.m_fDouble = 5.0;
c1.SomeFunction();
First, object c1 is created on the stack, i.e., the stack pointer esp is decreased by 12 bytes to make room. Then constant "1" is written to memory address esp-12 and constant "5.0" is written to esp-8.
Then we call a function that means two things.
The computer has to load the part of the binary PE file into memory that contains function SomeFunction(). SomeFunction will only be in memory once, no matter how many instances of SimpleClass you create.
The computer has to execute function SomeFunction(). That means several things:
Calling the function also implies passing all parameters, often this is done on the stack. SomeFunction has one (!) parameter, the this pointer, i.e., the pointer to the memory address on the stack where we have just written the values "1" and "5.0"
Save the current program state, i.e., the current instruction address which is the code address that will be executed if SomeFunction returns. Calling a function means pushing the return address on the stack and setting the instruction pointer (register eip) to the address of the function SomeFunction.
Inside function SomeFunction, the old stack is saved by storing the old base pointer (ebp) on the stack (push ebp) and making the stack pointer the new base pointer (mov ebp, esp).
The actual binary code of SomeFunction is executed which will call the machine instruction that converts m_nInteger to a double and adds it to m_fDouble. m_nInteger and m_fDouble are found on the stack, at ebp - x bytes.
The result of the addition is stored in a register and the function returns. That means the stack is discarded which means the stack pointer is set back to the base pointer. The base pointer is set back (next value on the stack) and then the instruction pointer is set to the return address (again next value on the stack). Now we're back in the original state but in some register lurks the result of the SomeFunction().
I suggest, you build yourself such a simple example and step through the disassembly. In debug build the code will be easy to understand and Visual Studio displays variable names in the disassembly view. See what the registers esp, ebp and eip do, where in memory your object is allocated, where the code is etc.

What a huge question!
First you want to learn about virtual memory. Without that, nothing else will make sense. In short, C/C++ pointers are not physical memory addresses. Pointers are virtual addresses. There's a special CPU feature (the MMU, memory management unit) that transparently maps them to physical memory. Only the operating system is allowed to configure the MMU.
This provides safety (there is no C/C++ pointer value you can possibly make that points into another process's virtual address space, unless that process is intentionally sharing memory with you) and lets the OS do some really magical things that we now take for granted (like transparently swap some of a process's memory to disk, then transparently load it back when the process tries to use it).
A process's address space (a.k.a. virtual address space, a.k.a. addressable memory) contains:
a huge region of memory that's reserved for the Windows kernel, which the process isn't allowed to touch;
regions of virtual memory that are "unmapped", i.e. nothing is loaded there, there's no physical memory assigned to those addresses, and the process will crash if it tries to access them;
parts the various modules (EXE and DLL files) that have been loaded (each of these contains machine code, string constants, and other data); and
whatever other memory the process has allocated from the system.
Now typically a process lets the C Runtime Library or the Win32 libraries do most of the super-low-level memory management, which includes setting up:
a stack (for each thread), where local variables and function arguments and return values are stored; and
a heap, where memory is allocated if the process calls malloc or does new X.
For more about the stack is structured, read about calling conventions. For more about how the heap is structured, read about malloc implementations. In general the stack really is a stack, a last-in-first-out data structure, containing arguments, local variables, and the occasional temporary result, and not much more. Since it is easy for a program to write straight past the end of the stack (the common C/C++ bug after which this site is named), the system libraries typically make sure that there is an unmapped page adjacent to the stack. This makes the process crash instantly when such a bug happens, so it's much easier to debug (and the process is killed before it can do any more damage).
The heap is not really a heap in the data structure sense. It's a data structure maintained by the CRT or Win32 library that takes pages of memory from the operating system and parcels them out whenever the process requests small pieces of memory via malloc and friends. (Note that the OS does not micromanage this; a process can to a large extent manage its address space however it wants, if it doesn't like the way the CRT does it.)
A process can also request pages directly from the operating system, using an API like VirtualAlloc or MapViewOfFile.
There's more, but I'd better stop!

For understanding stack frame structure you can refer to
http://en.wikipedia.org/wiki/Call_stack
It gives you information about structure of call stack, how locals , globals , return address is stored on call stack

Another good illustration
http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.pdf

It might not be the most accurate information, but MS Press provides some sample chapters of of the book Inside Microsoft® Windows® 2000, Third Edition, containing information about processes and their creation along with images of some important data structures.
I also stumbled upon this PDF that summarizes some of the above information in an nice chart.
But all the provided information is more from the OS point of view and not to much detailed about the application aspects.

Actually - you won't get far in this matter with at least a little bit of knowledge in Assembler. I'd recoomend a reversing (tutorial) site, e.g. OpenRCE.org.

Related

How does IDA calculate stack variable offsets?

IDA is able to read the machine instructions of a subroutine and display the relative offsets of each variable that gets stored on the stack.
I'm writing a program that analyzes stack memory, and I would like to be able to put the values stored in the stack memory into their respective variable types. What logic is going on under the hood that enables IDA to display the stack variable offsets?
Thanks for your time.
It infers that from the function's code by looking on where and how stack addresses are used. Like, loading into a 4-byte register and doing arithmetics probably mean that stack memory from which the load was performed belongs to a some int variable.
If you want details on IDA's algorithm, I doubt you can found it. You can look at Avast's Retargetable Decompiler open source project, that performs analysis much like IDA and study its code.

eheap allocated in erlang

I use recon_alloc:memory(allocated_types) and get info like below.
34> recon_alloc:memory(allocated_types).
[{binary_alloc,1546650440},
{driver_alloc,21504840},
{eheap_alloc,28704768840},
{ets_alloc,526938952},
{fix_alloc,145359688},
{ll_alloc,403701800},
{sl_alloc,688968},
{std_alloc,67633992},
{temp_alloc,21504840}]
The eheap_alloc is using 28G. But sum up with heap_size of all process
>lists:sum([begin {_, X}=process_info(P, heap_size), X end || P <- processes()]).
683197586
Only 683M !Any idea where is the 28G ?
You are not comparing the right values. From erlang:process_info
{heap_size, Size}
Size is the size in words of youngest heap generation of the
process. This generation currently include the stack of the process.
This information is highly implementation dependent, and may change if
the implementation change.
recon_alloc:memory(allocated_types) is in bytes by default. You can change it using set_unit. It is not the memory that is currently used but it is the memory reserved by the VM grouped into different allocators. You can use recon_alloc:memory(used) instead. More details in allocator() - Recon Library
Searching through the Erlang source code for the eheap_alloc keyword I didn't come up with much. The most relevant piece of code was this XML code from erts_alloc.xml (https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7604d4098cdc9de4/erts/doc/src/erts_alloc.xml#L46):
<tag><c>eheap_alloc</c></tag>
<item>Allocator used for Erlang heap data, such as Erlang process heaps.</item>
This says that process heaps are stored in eheap_alloc but it doesn't say what else is stored in eheap_alloc. The eheap_alloc stores everything your application needs to run along with some extra memory along with some additional space, so the VM doesn't have to request more memory from the OS every time something needs to be added. There are things the VM must keep in memory that aren't associated with a specific process. For example, large binaries, even though they may used within a process, are not stored inside that processes heap. They are stored in a shared process binary heap called binary_alloc. The binary heap, along with the process heaps and some extra memory, are what make up eheap_alloc.
In your case it looks like you have a lot of memory in your binary_alloc. binary_alloc is probably using a significant portion of your eheap_alloc.
For more details on binary handling checkout these pages:
http://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-love-hate
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#id65224

What is the purpose of each of the memory locations, stack, heap, etc? (lost in technicalities)

Ok, I asked the difference between Stackoverflow and bufferoverflow yesterday and almost getting voted down to oblivion and no new information.
So it got me thinking and I decided to rephrase my question in the hopes that I get reply which actually solves my issue.
So here goes nothing.
I am aware of four memory segments(correct me if I am wrong). The code, data, stack and heap. Now AFAIK the the code segment stores the code, while the data segment stores the data related to the program. What seriously confuses me is the purpose of the stack and the heap!
From what I have understood, when you run a function, all the related data to the function gets stored in the stack and when you recursively call a function inside a function, inside of a function... While the function is waiting on the output of the previous function, the function and its necessary data don't pop out of the stack. So you end up with a stack overflow. (Again please correct me if I am wrong)
Also I know what the heap is for. As I have read someplace, its for dynamically allocating data when a program is executing. But this raises more questions that solves my problems. What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap? Where do arrays get stored? Is it that after my code executes all that was in my heap gets erased? All in all, please tell me about heap in a more simplified manner than just, its for malloc and alloc because I am not sure I completely understand what those terms are!
I hope people when answering don't get lost in the technicalities and can keep the terms simple for a layman to understand (even if the concept to be described is't laymanish) and keep educating us with the technical terms as we go along. I also hope this is not too big a question, because I seriously think they could not be asked separately!
What is the stack for?
Every program is made up of functions / subroutines / whatever your language of choice calls them. Almost always, those functions have some local state. Even in a simple for loop, you need somewhere to keep track of the loop counter, right? That has to be stored in memory somewhere.
The thing about functions is that the other thing they almost always do is call other functions. Those other functions have their own local state - their local variables. You don't want your local variables to interfere with the locals in your caller. The other thing that has to happen is, when FunctionA calls FunctionB and then has to do something else, you want the local variables in FunctionA to still be there, and have their same values, when FunctionB is done.
Keeping track of these local variables is what the stack is for. Each function call is done by setting up what's called a stack frame. The stack frame typically includes the return address of the caller (for when the function is finished), the values for any method parameters, and storage for any local variables.
When a second function is called, then a new stack frame is created, pushed onto the top of the stack, and the call happens. The new function can happily work away on its stack frame. When that second function returns, its stack frame is popped (removed from the stack) and the caller's frame is back in place just like it was before.
So that's the stack. So what's the heap? It's got a similar use - a place to store data. However, there's often a need for data that lives longer than a single stack frame. It can't go on the stack, because when the function call returns, it's stack frame is cleaned up and boom - there goes your data. So you put it on the heap instead. The heap is a basically unstructured chunk of memory. You ask for x number of bytes, and you get it, and can then party on it. In C / C++, heap memory stays allocated until you explicitly deallocate. In garbage collected languages (Java/C#/Python/etc.) heap memory will be freed when the objects on it aren't used anymore.
To tackle your specific questions from above:
What's the different between a stack overflow and a buffer overflow?
They're both cases of running over a memory limit. A stack overflow is specific to the stack; you've written your code (recursion is a common, but not the only, cause) so that it has too many nested function calls, or you're storing a lot of large stuff on the stack, and it runs out of room. Most OS's put a limit on the maximum size the stack can reach, and when you hit that limit you get the stack overflow. Modern hardware can detect stack overflows and it's usually doom for your process.
A buffer overflow is a little different. So first question - what's a buffer? Well, it's a bounded chunk of memory. That memory could be on the heap, or it could be on the stack. But the important thing is you have X bytes that you know you have access to. You then write some code that writes X + more bytes into that space. The compiler has probably already used the space beyond your buffer for other things, and by writing too much, you've overwritten those other things. Buffer overruns are often not seen immediately, as you don't notice them until you try to do something with the other memory that's been trashed.
Also, remember how I mentioned that return addresses are stored on the stack too? This is the source of many security issues due to buffer overruns. You have code that uses a buffer on the stack and has an overflow vulnerability. A clever hacker can structure the data that overflows the buffer to overwrite that return address, to point to code in the buffer itself, and that's how they get code to execute. It's nasty.
What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap?
I'm going to talk from a C / C++ perspective here. Assuming you've got a variable declaration:
int i;
That reserves (typically) four bytes on the stack. If instead you have:
char *buffer = malloc(100);
That actually reserves two chunks of memory. The call to malloc allocates 100 bytes on the heap. But you also need storage for the pointer, buffer. That storage is, again, on the stack, and on a 32-bit machine will be 4 bytes (64-bit machine will use 8 bytes).
Where do arrays get stored...???
It depends on how you declare them. If you do a simple array:
char str[128];
for example, that'll reserve 128 bytes on the stack. C never hits the heap unless you explicitly ask it to by calling an allocation method like malloc.
If instead you declare a pointer (like buffer above) the storage for the pointer is on the stack, the actual data for the array is on the heap.
Is it that after my code executes all that was in my heap gets erased...???
Basically, yes. The OS will clean up the memory used by a process after it exits. The heap is a chunk of memory in your process, so the OS will clean it up. Although it depends on what you mean by "clean it up." The OS marks those chunks of RAM as now free, and will reuse it later. If you had explicit cleanup code (like C++ destructors) you'll need to make sure those get called, the OS won't call them for you.
All in all, please tell me about heap in a more simplified manner than just, its for malloc and alloc?
The heap is, much like it's name, a bunch of free bytes that you can grab a piece at a time, do whatever you want with, then throw back to use for something else. You grab a chunk of bytes by calling malloc, and you throw it back by calling free.
Why would you do this? Well, there's a couple of common reasons:
You don't know how many of a thing
you need until run time (based on
user input, for example). So you
dynamically allocate on the heap as
you need them.
You need large data structures. On
Windows, for example, a thread's
stack is limited by default to 1
meg. If you're working with large
bitmaps, for example, that'll be a
fast way to blow your stack and get
a stack overflow. So you grab that
space of the heap, which is usually
much, much larger than the stack.
The code, data, stack and heap?
Not really a question, but I wanted to clarify. The "code" segment contains the executable bytes for your application. Typically code segments are read only in memory to help prevent tampering. The data segment contains constants that are compiled into the code - things like strings in your code or array initializers need to be stored somewhere, the data segment is where they go. Again, the data segment is typically read only.
The stack is a writable section of memory, and usually has a limited size. The OS will initialize the stack and the C startup code calls your main() function for you. The heap is also a writable section of memory. It's reserved by the OS, and functions like malloc and free manage getting chunks out of it and putting them back.
So, that's the overview. I hope this helps.
With respect to stack... This is precicely where the parameters and local variables of the functions / procedures are stored. To be more precise, the params and local variables of the currently executing function is only accessible from the stack... Other variables that belong to chain of functions that were executed before it will be in stack but will not be accessible until the current function completed its operations.
With respect global variables, I believe these are stored in data segment and is always accessible from any function within the created program.
With respect to Heap... These are additional memories that can be made allotted to your program whenever you need them (malloc or new)... You need to know where the allocated memory is in heap (address / pointer) so that you can access it when you need. Incase you loose the address, the memory becomes in-accessible, but the data still remains there. Depending on the platform and language this has to be either manually freed by your program (or a memory leak occurs) or needs to be garbage collected. Heap is comparitively huge to stack and hence can be used to store large volumes of data (like files, streams etc)... Thats why Objects / Files are created in Heap and a pointer to the object / file is stored in stack.
In terms of C/C++ programs, the data segment stores static (global) variables, the stack stores local variables, and the heap stores dynamically allocated variables (anything you malloc or new to get a pointer to). The code segment only stores the machine code (the part of your program that gets executed by the CPU).

How does a stackless language work?

I've heard of stackless languages. However I don't have any idea how such a language would be implemented. Can someone explain?
The modern operating systems we have (Windows, Linux) operate with what I call the "big stack model". And that model is wrong, sometimes, and motivates the need for "stackless" languages.
The "big stack model" assumes that a compiled program will allocate "stack frames" for function calls in a contiguous region of memory, using machine instructions to adjust registers containing the stack pointer (and optional stack frame pointer) very rapidly. This leads to fast function call/return, at the price of having a large, contiguous region for the stack. Because 99.99% of all programs run under these modern OSes work well with the big stack model, the compilers, loaders, and even the OS "know" about this stack area.
One common problem all such applications have is, "how big should my stack be?". With memory being dirt cheap, mostly what happens is that a large chunk is set aside for the stack (MS defaults to 1Mb), and typical application call structure never gets anywhere near to using it up. But if an application does use it all up, it dies with an illegal memory reference ("I'm sorry Dave, I can't do that"), by virtue of reaching off the end of its stack.
Most so-called called "stackless" languages aren't really stackless. They just don't use the contiguous stack provided by these systems. What they do instead is allocate a stack frame from the heap on each function call. The cost per function call goes up somewhat; if functions are typically complex, or the language is interpretive, this additional cost is insignificant. (One can also determine call DAGs in the program call graph and allocate a heap segment to cover the entire DAG; this way you get both heap allocation and the speed of classic big-stack function calls for all calls inside the call DAG).
There are several reasons for using heap allocation for stack frames:
If the program does deep recursion dependent on the specific problem it is solving,
it is very hard to preallocate a "big stack" area in advance because the needed size isn't known. One can awkwardly arrange function calls to check to see if there's enough stack left, and if not, reallocate a bigger chunk, copy the old stack and readjust all the pointers into the stack; that's so awkward that I don't know of any implementations.
Allocating stack frames means the application never has to say its sorry until there's
literally no allocatable memory left.
The program forks subtasks. Each subtask requires its own stack, and therefore can't use the one "big stack" provided. So, one needs to allocate stacks for each subtask. If you have thousands of possible subtasks, you might now need thousands of "big stacks", and the memory demand suddenly gets ridiculous. Allocating stack frames solves this problem. Often the subtask "stacks" refer back to the parent tasks to implement lexical scoping; as subtasks fork, a tree of "substacks" is created called a "cactus stack".
Your language has continuations. These require that the data in lexical scope visible to the current function somehow be preserved for later reuse. This can be implemented by copying parent stack frames, climbing up the cactus stack, and proceeding.
The PARLANSE programming language I implemented does 1) and 2). I'm working on 3). It is amusing to note that PARLANSE allocates stack frames from a very fast-access heap-per-thread; it costs typically 4 machine instructions. The current implementation is x86 based, and the allocated frame is placed in the x86 EBP/ESP register much like other conventional x86 based language implementations. So it does use the hardware "contiguous stack" (including pushing and poppping) just in chunks. It also generates "frame local" subroutine calls the don't switch stacks for lots of generated utility code where the stack demand is known in advance.
Stackless Python still has a Python stack (though it may have tail call optimization and other call frame merging tricks), but it is completely divorced from the C stack of the interpreter.
Haskell (as commonly implemented) does not have a call stack; evaluation is based on graph reduction.
There is a nice article about the language framework Parrot. Parrot does not use the stack for calling and this article explains the technique a bit.
In the stackless environments I'm more or less familiar with (Turing machine, assembly, and Brainfuck), it's common to implement your own stack. There is nothing fundamental about having a stack built into the language.
In the most practical of these, assembly, you just choose a region of memory available to you, set the stack register to point to the bottom, then increment or decrement to implement your pushes and pops.
EDIT: I know some architectures have dedicated stacks, but they aren't necessary.
Call me ancient, but I can remember when the FORTRAN standards and COBOL did not support recursive calls, and therefore didn't require a stack. Indeed, I recall the implementations for CDC 6000 series machines where there wasn't a stack, and FORTRAN would do strange things if you tried to call a subroutine recursively.
For the record, instead of a call-stack, the CDC 6000 series instruction set used the RJ instruction to call a subroutine. This saved the current PC value at the call target location and then branches to the location following it. At the end, a subroutine would perform an indirect jump to the call target location. That reloaded saved PC, effectively returning to the caller.
Obviously, that does not work with recursive calls. (And my recollection is that the CDC FORTRAN IV compiler would generate broken code if you did attempt recursion ...)
There is an easy to understand description of continuations on this article: http://www.defmacro.org/ramblings/fp.html
Continuations are something you can pass into a function in a stack-based language, but which can also be used by a language's own semantics to make it "stackless". Of course the stack is still there, but as Ira Baxter described, it's not one big contiguous segment.
Say you wanted to implement stackless C. The first thing to realize is that this doesn't need a stack:
a == b
But, does this?
isequal(a, b) { return a == b; }
No. Because a smart compiler will inline calls to isequal, turning them into a == b. So, why not just inline everything? Sure, you will generate more code but if getting rid of the stack is worth it to you then this is easy with a small tradeoff.
What about recursion? No problem. A tail-recursive function like:
bang(x) { return x == 1 ? 1 : x * bang(x-1); }
Can still be inlined, because really it's just a for loop in disguise:
bang(x) {
for(int i = x; i >=1; i--) x *= x-1;
return x;
}
In theory a really smart compiler could figure that out for you. But a less-smart one could still flatten it as a goto:
ax = x;
NOTDONE:
if(ax > 1) {
x = x*(--ax);
goto NOTDONE;
}
There is one case where you have to make a small trade off. This can't be inlined:
fib(n) { return n <= 2 ? n : fib(n-1) + fib(n-2); }
Stackless C simply cannot do this. Are you giving up a lot? Not really. This is something normal C can't do well very either. If you don't believe me just call fib(1000) and see what happens to your precious computer.
Please feel free to correct me if I'm wrong, but I would think that allocating memory on the heap for each function call frame would cause extreme memory thrashing. The operating system does after all have to manage this memory. I would think that the way to avoid this memory thrashing would be a cache for call frames. So if you need a cache anyway, we might as well make it contigous in memory and call it a stack.

How does a virtual machine work?

I've been looking into how programming languages work, and some of them have a so-called virtual machines. I understand that this is some form of emulation of the programming language within another programming language, and that it works like how a compiled language would be executed, with a stack. Did I get that right?
With the proviso that I did, what bamboozles me is that many non-compiled languages allow variables with "liberal" type systems. In Python for example, I can write this:
x = "Hello world!"
x = 2**1000
Strings and big integers are completely unrelated and occupy different amounts of space in memory, so how can this code even be represented in a stack-based environment? What exactly happens here? Is x pointed to a new place on the stack and the old string data left unreferenced? Do these languages not use a stack? If not, how do they represent variables internally?
Probably, your question should be titled as "How do dynamic languages work?."
That's simple, they store the variable type information along with it in memory. And this is not only done in interpreted or JIT compiled languages but also natively-compiled languages such as Objective-C.
In most VM languages, variables can be conceptualized as pointers (or references) to memory in the heap, even if the variable itself is on the stack. For languages that have primitive types (int and bool in Java, for example) those may be stored on the stack as well, but they can not be assigned new types dynamically.
Ignoring primitive types, all variables that exist on the stack have their actual values stored in the heap. Thus, if you dynamically reassign a value to them, the original value is abandoned (and the memory cleaned up via some garbage collection algorithm), and the new value is allocated in a new bit of memory.
The VM has nothing to do with the language. Any language can run on top of a VM (the Java VM has hundreds of languages already).
A VM enables a different kind of "assembly language" to be run, one that is more fit to adapting a compiler to. Everything done in a VM could be done in a CPU, so think of the VM like a CPU. (Some actually are implemented in hardware).
It's extremely low level, and in many cases heavily stack based--instead of registers, machine-level math is all relative to locations relative to the current stack pointer.
With normal compiled languages, many instructions are required for a single step. a + might look like "Grab the item from a point relative to the stack pointer into reg a, grab another into reg b. add reg a and b. put reg a into a place relative to the stack pointer.
The VM does all this with a single, short instruction, possibly one or two bytes instead of 4 or 8 bytes PER INSTRUCTION in machine language (depending on 32 or 64 bit architecture) which (guessing) should mean around 16 or 32 bytes of x86 for 1-2 bytes of machine code. (I could be wrong, my last x86 coding was in the 80286 era.)
Microsoft used (probably still uses) VMs in their office products to reduce the amount of code.
The procedure for creating the VM code is the same as creating machine language, just a different processor type essentially.
VMs can also implement their own security, error recovery and memory mechanisms that are very tightly related to the language.
Some of my description here is summary and from memory. If you want to explore the bytecode definition yourself, it's kinda fun:
http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html
The key to many of the 'how do VMs handle variables like this or that' really comes down to metadata... The meta information stored and then updated gives the VM a much better handle on how to allocate and then do the right thing with variables.
In many cases this is the type of overhead that can really get in the way of performance. However, modern day implementations, etc have come a long way in doing the right thing.
As for your specific questions - treating variables as vanilla objects / etc ... comes down to reassigning / reevaluating meta information on new assignments - that's why x can look one way and then the next.
To answer a part of your questions, I'd recommend a google tech talk about python, where some of your questions concerning dynamic languages are answered; for example what a variable is (it is not a pointer, nor a reference, but in case of python a label).

Resources