x86 Can push/pop be less than 4 bytes? [duplicate]

x86 Can push/pop be less than 4 bytes? [duplicate] - memory

This question already has answers here:
Why is it not possible to push a byte onto a stack on Pentium IA-32?
(4 answers)
Closed 7 years ago.
Hi I am reading a guide on x86 by the University of Virginia and it states that pushing and popping the stack either removes or adds a 4-byte data element to the stack.
Why is this set to 4 bytes? Can this be changed, could you save memory on the stack by pushing on smaller data elements?
The guide can be found here if anyone wishes to view it:
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

Short answer: Yes, 16 or 32 bits. And, for x86-64, 64 bits.
The primary reasons for a stack are to return from nested function calls and to save/restore register values. It is also typically used to pass parameters and return function results. Except for the smallest parameters, these items usually have the same size by the design of the processor, namely, the size of the instruction pointer register. For 8088/8086, it is 16-bits. For 80386 and successors, it is 32-bits. Therefore, there is little value in having stack instructions that operate on other sizes.
There is also the consideration of the size of the data on the memory bus. It takes the same amount of time to retrieve or store a word as it does a byte. (Except for 8088 which has 16-bit registers but an 8-bit data bus.) Alignment also comes into play. The stack should be aligned on word boundaries so each value can be retrieved as one memory operation. The trade-off is usually taken to save time over saving memory. To pass one byte as a parameter, one word is usually used. (Or, depending on the optimization available to the compiler, one word-sized register would be used, avoiding the stack altogether.)

Related

why stack pointer is initialized to the maximum value?

why stack pointer is initialized to the maximum value?
I only knows that It is the tiny register which stores the last program request’s address in a stack. It is the particular kind of buffer that stores the information in the order of top-down. but can anyone explain me why initially it's having max value.

Without more detail on the actual microprocessor, there isn't a single exact answer; but in general, each architecture handles stack pointer initialization a bit differently. For example, the version of ARM used in many microprocessors initializes the stack pointer (also R13) from the vector table (the first entry). Other architectures either initialize the register to 0 or some other value; so it's not always all 1s as you mention in your question. If the hardware itself doesn't initialize the stack pointer so somewhere meaningful, some of the first instructions usually does. And usually this value is near or at the top of memory as you mention, the stack typically grows down from higher addresses to lower ones; so a value of all 1s or some other larger value might make sense depending on how memory is laid out and managed.
One thing also worth mentioning is that you say the stack stores the "last program request's address" which if I understand correctly is one of the things the stack stores. In more architectures, the stack can store much more than just the return address of a call but also local variables, saved context when a call is made or a context is swapped (either by an OS or by an exception/interrupt) and anything else the program might want to push onto it.
So the short answer is: it isn't always set to the maximum value but it's usually set to some high value as the stack will grow down to lower addresses as things like data and addresses are pushed on it.

How do stack machines efficiently store data types of different sizes?

Suppose I have the following primitive stack implementation for a virtual machine:
unsigned long stack[512];
unsigned short top = 0;
void push(unsigned long qword) {
stack[top] = qword;
top++;
}
void pop() {
top--;
}
unsigned long get() {
return top-1;
}
This stack actually works fine (except that it doesn't check for an overflow) but I now have the following problem: It is quite inefficient.
Here is an example:
Let's say I want to push a byte onto the stack. I would now have to cast it to a long and then push it onto the stack. But now a whole 7 bytes are not being used. This feels kind of wrong.
So now I have the following question:
How do stack machines efficiently store data types of different sizes? Do they do the same as in this implementation?

There are different metrics of efficiency. Using an eight bytes long to store a single byte will raise the memory consumption. On the other hand, memory is not the major concern on most of today’s machines. Further, a stack is a pre-allocated amount of memory, typically. So as long as not the entire memory block has been exhausted, it is entirely irrelevant whether the unused seven bytes are within that long or on the other side of the location marked by top.
In terms of CPU time, you don’t gain any advantage of transferring a quantity smaller than the hardware’s bus size. In the best case, it makes no difference. In the worst case, transferring a single byte boils down to reading a long from memory, manipulating one byte of it and writing the long back. In the latter case, it would be more efficient to expand the byte to long, to overwrite all eight bytes explicitly.
This is reflected by the design of the Java bytecode, for example. It does not only drop support for pushing and popping quantities smaller than 32 bit, it doesn’t even support arithmetic instructions for them¹. So for most use cases, you don’t even know that a quantity could be a byte before pushing. Only formal parameter types and array types may refer to byte.
But note that a JVM isn’t even a stack engine in the narrowest sense. There is no support for pushing and popping arbitrary numbers of items. As explained in this answer, expressing the intent using a stack allows very compact instructions. But Java bytecode doesn’t allow branching to code locations with a different number of items on the stack. So it doesn’t support pushing or popping items in a loop. In other words, for each instruction, the actual offset into the stack is predictable and also the operand types are known. So it’s always possible to transform Java bytecode to an IR not using a stack straight-forwardly. Such transformed code could use instructions with arbitrary operand sizes, if that has a benefit on the particular target architecture.
¹ And that was accounting for hardware in use a quarter century ago

There's no "one true" way of doing this, and the Java VM uses a few different strategies. All types less than 32-bits in size are widened to 32-bits. Pushing 1 byte to the stack effectively pushes 4 bytes to the stack. The benefit is simplicity when there are fewer native value sizes to deal with.
Another strategy is used for 64-bit values. They occupy two stack slots instead of one. The JVM has specific opcodes which indicate which type of value they expect on the stack, and the verifier ensures that no opcode is attempting to access a variable off the stack that doesn't match the type that should be there.
A third strategy is used for object references. The actual pointer size can be 32 bits or 64 bits, depending on the CPU capabilities, whether the JVM is running in 64-bit mode, etc. The JVM has specific opcodes for handling object references, and the verifier checks this too.

MIPS architecture padding

How do data types get allocated in stack in MIPS architecture?
If i have 2 char and 1 int data, is stack going to allocate them in 8 byte form(2 chars are in same memory segment and 1 int is in another memory segment) or 12 byte form(memory segment for each chars and 1 memory segment for int)? I am trying to understand 32 bit MIPS architecture.

To the question, it matters if the data types being allocated are for local variables vs. for parameter passing.
For locals you can allocate whatever you want as long as the int is aligned on 4 byte boundary.  The total stack allocation is rounded up to 8 bytes (though some don't bother with this, e.g. for homework, and, is only strictly necessary if your function calls other functions that may rely on the expected 8 byte alignment of the stack.)
For parameters you should follow the documented calling convention — there are several so you have to know which you're working with.  See here to see some of them; look for "MIPS EABI 32-bit Calling Convention" and/vs. "MIPS O32 32-bit Calling Convention".
What they have in common is that the first four parameters are passed in registers, which effectively means that chars take a full 32-bit word; char parameters passed on the stack also follow that form, so take a full 32-bit word each.

Stack data size

I cannot find answer to this question. What is the size of data pushed on stack?
Let's say that you push some data on the stack. For example an int. Then value of stack pointer decreases by 4 bytes.
Until today I thought that the biggest data that can be pushed on stack cannot be larger than pointer size. But I did a little experiment. I wrote a simple app in C#:
int i = 0; //0x68 <-- address of variable
int j = 1; //0x64
ulong k = 2; //0x5C
int l = 3; //0x58
Due to my predictions, ulong should be allocated on heap, because it needs 8 bytes, while I am working on 32 bit system (so pointer size is 4 bytes). And that would be very odd, because this is simple local variable.
But it was pushed on stack.
So I think that there is something wrong with the way I think of stack. Because how would stack pointer "know" if he must be changed 4 bytes (if we push int) or 8 bytes (if we push long or double) or just 1 byte?
Or maybe values of local variables are on heap but addresses of those variables are on stack. But this would make no sense, because that's how objects are processed. When I create object (using new keyword) object is allocated on heap and address to this object is pushed on stack, right?
So could someone tell me (or give a link to article) how it really works?

The compiler or run-time environment knows the kind of data you are working with, and knows how to structure into units that will fit onto the stack. So, for example, on an architecture where the stack pointer only moves in 32-bit chunks, the compiler or run-time will know how to format a data element that is longer that 32-bits as multiple 32-bit chunks to push them. It will know, similarly, how to pop them off the stack and reconstruct a variable of the appropriate type.
The problem here is not really different for the stack than it is for any other kind of memory. The CPU architecture will be optimised to work with a small number of data elements of a particular size, but programming languages typically provide a much wider range of data types. The compiler or the run-time environment has to handle the packing and unpacking of data this situation requires.

Memory Locations of Variables when Using IA-32 Assembly Language

Quick question on memory locations in IA-32 assembly language that i cannot seem to find the answer for anywhere else.
On IA-32 each memory address is 4 bytes long (e.g. 0x0040120e). Each of these addresses points to a 1 byte value (or in the case of a larger value, the first byte of it). Now look at these two simple IA-32 assembly language statements:
var1 db 2
var2 db 3
This will place var1 and var2 in adjacent memory cells (let's say 0x0040120e and 0f). Now I realize that the define directive db allocates 1 byte to the value. But, in the case above I have two values (2 and 3) that in fact only requires two bits each, to be stored.
Questions:
When using the db directive, do these two values still consume a full byte, even though they are smaller than 1 byte?
Is using a full byte for values that could get away with less, still the common way to go (as we have so much memory that we don't care)?
Does integers 0 to 255 then generally take up 1 byte and integers 256 to (2^16 - 1) take up 2 bytes (a word), etc.?
Thank you,
Magnus
EDIT 1: Made questions more clear (apologies for the back and forth)
EDIT 2: Added a structured reply below, based on other posters' input

yes. the B in DB is for Byte.
You could use a nibble for each, like so:
combined db 0x23
but you'd have to
a) shift the result for 4 bits right if you need the "2".
b) mask the leftmost 4 bits if you need the "3".
Hardly worth the effort these days ;-)

Yes, since the architecture is byte-addressable and cannot address anything smaller than a byte.
This means that data requiring less than one byte will need to share its address with other data.
In practice this means that you're going to have to know which bits in the pointed-out byte are used for this particular value.
For hardware registers this sort of mapping is very common.
EDIT: Ah, you seem to mean "values of the same variable" when you said "2 and 3". I thought you meant 2-bit and 3-bit values. You need to decide how many bits are needed at most for a particular variable, for all the values you need that variable to be able to store. There are variable-length encodings for integers of course, but that's generally rarely used in assembly and not what you'd typically use for some general-purpose variable.
You generally should expect to reserve all bits required for all values that a variable need to hold, up front. Otherwise, if you're worried about "wasting memory", you would need to move all other variables as soon as you get some "vacant bits" somewhere. That would end up costing fantastically much. Also, knowing the size of a variable is constant makes it possible to generate (or write) the proper code to handle it, otherwise you would of course also need to explicitly store somewhere "the size of the value held in variable x is now y bits". That becomes extremely painful very very quickly.

My initial question was a bit unstructured, so for the benefit of other searchers stopping by here I will use the answers received from #unwind and #geert3 to create a structured response. Again, this was my fault due to the initial poor structuring and creds for the answers goes to #unwind and #geert3.
When using the db directive you allocate 1 byte to the variable, and even if the variable takes up less space than 1 byte, it will still consume that full 1-byte address spot. As one might guess, that wastes a few bits of memory, but that is okay as you have enough memory and not too bothered about wasting a couple of bits. The reason you want to use the full 1-byte memory location is that it is easier to reference the variable when it is alone in the address slot (see #geert3's note on how to access it if you use less than a byte), and additionally, in case you want to reuse the variable later, it is great to know you have space for any number up to 255.
Yes, see answer to 1
Yes, you would normally allocate multiples of a byte to a variable, in a byte-addressable system

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart