x86 mov / add Instructions & Memory Addressing - memory

I'm learning x86 assembly in class and I'm very lost as to how you differentiate between what a register operand does and what a memory reference does. There are several points of confusion I was hoping to clear up.
The following code is what my textbook says is the long way to do push and pop respectively:
subl $4, %esp
movl %ebp, (%esp)
movl (%esp), %eax
addl $4, %esp
So in the subl instruction, can we always expect %esp to hold an address value?
Also what is the difference between the two movl functions? Can the first one be written as
movl (%ebp), %esp
? And for the second movl, does that move the address of %esp or does it move the value pointed to by %esp?
As a follow-up then, why can't we have the source and destination be memory references like so?
movw (%eax), 4(%esp)
And lastly, for the following code:
movb (%esp, %edx, 4), %dh
if the source is more than 1 byte (the size of %dh), then what happens? Does it just truncate the value?
Sorry, this was a ton of questions but any help would be greatly appreciated.

The following code is what my textbook says is the long way to do push
and pop respectively:
subl $4, %esp
movl %ebp, (%esp)
movl (%esp), %eax
addl $4, %esp
So in the subl instruction, can we always expect %esp to hold an
address value?
Yes. The ESP register holds the memory address of the last value pushed onto the stack.
Also what is the difference between the two movl functions? Can the
first one be written as
movl (%ebp), %esp
? And for the second movl, does that move the address of %esp or does
it move the value pointed to by %esp?
The MOV instruction, in AT&T syntax, expects two operands: source and destination. MOV copies data (in this case 32 bits, as denoted by the L suffix) from the first operand, written on the left, to the second operand, written on the right. If one of them is enclosed in parentheses, that operand is a memory operand, and the value in parentheses is its memory address rather than the actual value.
So, movl %ebp,(%esp) means: copy the value of register EBP into memory, at the address held in register ESP.
What you meant with movl (%ebp),%esp is: copy 32 bits of data, starting at the memory address held in EBP, into the ESP register.
So you are changing the direction of the movement.
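To make the direction of each movl concrete, here is a small Python model of the four-instruction sequence. It is only a sketch: the `regs` dictionary, the `mem` byte array, the helper names `store32`/`load32` and the starting values are all made up for illustration, and memory is assumed to be little-endian as on x86.

```python
# Toy model: registers and memory are two separate things.
regs = {"esp": 0x40, "ebp": 0x12345678, "eax": 0}   # hypothetical starting values
mem = bytearray(0x80)                                # hypothetical 128-byte memory

def store32(addr, value):
    """Models movl %reg, (addr): write 4 bytes to memory, little-endian."""
    mem[addr:addr + 4] = value.to_bytes(4, "little")

def load32(addr):
    """Models movl (addr), %reg: read 4 bytes from memory, little-endian."""
    return int.from_bytes(mem[addr:addr + 4], "little")

# "push %ebp" the long way
regs["esp"] -= 4                     # subl $4, %esp
store32(regs["esp"], regs["ebp"])    # movl %ebp, (%esp)  -> memory is written

# "pop %eax" the long way
regs["eax"] = load32(regs["esp"])    # movl (%esp), %eax  -> memory is read
regs["esp"] += 4                     # addl $4, %esp
```

In this model ESP is only ever used as an index into `mem`, while EBP and EAX are plain values; that is the whole difference between a register operand and a memory operand.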
As a follow-up then, why can't we have the source and destination be
memory references like so?
movw (%eax), 4(%esp)
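Short answer: because the instruction encoding used by Intel doesn't allow it. Long answer: an ordinary x86 instruction has room to encode only one memory operand (a single ModR/M byte, plus an optional SIB byte and displacement). That design goes back to the good old 8086, where the resources to calculate two effective addresses for one instruction were not available. To copy memory to memory, go through a register.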
And lastly, for the following code:
movb (%esp, %edx, 4), %dh
if the source is more than 1 byte (the size of %dh), then what
happens? Does it just truncate the value?
The source is the same size as the destination. This is imposed both by the B suffix and by the fact that the destination is an 8-bit register. The value in parentheses is then the address of a single byte in memory. By the way, this address is ESP+EDX*4.
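As a rough illustration of that address calculation and the one-byte read, here is a Python sketch. The register values and the memory contents are invented for the example, and `effective_address` is just a helper name chosen here.

```python
def effective_address(base, index, scale, disp=0):
    """Address of disp(base, index, scale) in AT&T syntax, wrapped to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF

esp, edx = 0x1000, 3    # hypothetical register values
# Hypothetical memory: the dword 0x12345678 stored little-endian at 0x100C.
mem = {0x100C: 0x78, 0x100D: 0x56, 0x100E: 0x34, 0x100F: 0x12}

addr = effective_address(esp, edx, 4)    # (%esp, %edx, 4)
dh = mem[addr]                           # movb reads exactly one byte at that address
edx = (edx & 0xFFFF00FF) | (dh << 8)     # %dh is bits 8..15 of %edx; the other bits are unchanged
```

Nothing is truncated: the neighbouring bytes at 0x100D..0x100F are simply never read.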

So in the subl instruction, can we always expect %esp to hold an address value?
By definition yes, because the next instruction uses it as an address. Whatever junk was in there is now an address. If it's some crap like 0xDEADBEEF .. well that's an address now. It may or may not be a valid address (hopefully valid, an invalid stack is kind of a bad thing), but that's another matter.
Also what is the difference between the two movl functions?
The one used in the "push" writes to memory, the one used in the "pop" reads. Parentheses around the right-hand operand mean it's a write; around the left-hand operand, it's a read. So can you switch them around? Obviously not, it would do something different. movl (%ebp), %esp would read something from memory and then change where the stack is.
As a follow-up then, why can't we have the source and destination be memory references like so?
Because you can't, there is no such instruction. It wouldn't fit the normal operand encoding. I suppose an instruction like that could have existed, but it doesn't, so you can't use it.
movb (%esp, %edx, 4), %dh if the source is more than 1 byte (the size of %dh), then what happens?
It isn't. By definition, that instruction reads exactly 1 byte.

Related

register and memory, risc-v

I'm studying computer architecture at my university and I think I'm missing some basic computer-system and C-language concepts. A few things really confuse me; I kept searching for answers but couldn't find what I wanted and only got more confused, so I'm posting my question here.
1. I thought a register holds an instruction, a storage address, or any other kind of data in the CPU. I also learned the memory layout:
------------------
stack
dynamic data
static data
text
reserved part
------------------
Then register is having this memory layout in CPU? Or am I just confusing it with the memory layout of one of the computer's 5 components (input, output, memory, control, datapath)? I thought this was the layout of one of those 5 components.
RISC-V (while loop in C)
Loop:
slli x10, x22, 3
add x10, x10, x25
ld x9, 0(x10)
bne x9, x24, Exit
addi x22, x22, 1
beq x0, x0, Loop
Exit:...
Then where does this operation happen? In registers?
I learned RISC-V Registers like below.
x0: the constant value 0
x1: return address
...
x5-x7, x28-x31: temporaries
...
If the registers are in the memory layout I drew above, then where are x0, x1 and the rest stored? It stops making sense from here, so I'm confused about how I should picture a register.
Everything is so abstract in my mind that the question probably sounds a bit weird. If anything is unclear, please leave a comment.
Then register is having this memory layout in CPU?
No, that makes zero sense, your thinking is on the wrong track here.
The register file is its own separate space, not part of the memory address space. It's not indexable with a variable, only by hard-coding register numbers into instructions, so there's not really any sense in which x2 is the "next register after x1" or anything; e.g. you can't loop over registers. x1 and x2 are just two separate 32- or 64-bit storage locations that software can use however it wants.
The natural categories to break them into are based on software / calling conventions:
stack pointer
call-preserved registers (function calls don't modify them, or conversely if you want to use one in a function you have to save/restore it)
call-clobbered registers (function calls must be assumed to step on them, and conversely can be used without saving/restoring)
the zero register.
Also arg-passing vs. return-value registers.
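One way to picture the separation is a toy Python model of the loop from the question. This is only a sketch: the array contents, its base address and the value of k are invented, and using a Python list for the register file is just a modelling convenience (the code below still only names registers by fixed number, as real instructions do).

```python
# Register file: 32 slots named by number, held inside the CPU.
x = [0] * 32
# Main memory: a separate, byte-addressed space (modelled as address -> 64-bit value).
memory = {}

base = 0x1000                               # hypothetical address of the array
for n, value in enumerate([7, 7, 7, 5]):    # hypothetical array contents
    memory[base + 8 * n] = value

x[22], x[24], x[25] = 0, 7, base            # i, k, and the array's base address

while True:
    x[10] = x[22] << 3         # slli x10, x22, 3    (registers only)
    x[10] = x[10] + x[25]      # add  x10, x10, x25  (registers only)
    x[9] = memory[x[10]]       # ld   x9, 0(x10)     (the only memory access)
    if x[9] != x[24]:          # bne  x9, x24, Exit
        break
    x[22] = x[22] + 1          # addi x22, x22, 1
                               # beq  x0, x0, Loop   (always taken)
```

The stack / dynamic data / static data / text layout from the question describes `memory` only; `x` has no such layout.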

LC-3: BLKW how to specify memory location to store data at?

In LC-3 when you use BLKW how do you initialize the block of memory to be at location x3102 instead of the next available memory location?
First, let's make a side note that all the memory of the 16-bit address space is there for you to use, according to the memory map (e.g. at least in the range x3000-xFDFF), and, it is initialized to zero; you can use it at will.
Generally speaking, the LC-3 assemblers don't allow multiple .ORIG directives throughout the file, instead, they require one at the beginning of the file. If they did allow subsequent .ORIG directives, this would be a way to accomplish what you're asking about.
But even if they did, frequently, we'd run into instruction offset encoding limitations. So there's an alternate solution I'll show below.
But first, let's look at the instruction offset/immediate encoding limitations.
The usual data memory access instruction formats have a very limited offset, only 9 bits worth (+/- about 256), and the offset is pc-relative. So, for example, the following won't work:
.ORIG x3000
LEA R0, TESTA
LD R1, TESTA
LEA R2, TESTB ; will get an error due to instruction offset encoding limitation
LD R3, TESTB ; will get an error due to instruction offset encoding limitation
HALT
TESTA
.FILL #1234
.BLKW x1FA ; exactly enough padding to relocate TESTB to x3200
TESTB
.FILL #4321 ; which we can initialize with a non-zero value
.END
This is illustrative: while this will successfully place TESTB at x3200, it cannot be reached by either the LEA or the LD instruction, because the 9-bit pc-relative displacement only spans -256 to +255 words from the incremented PC.
(There is also the other practical limitation that as instructions are added the .BLKW operand has to shrink in size, which is clearly painful — this aspect would have been eliminated by supporting a .ORIG directive within.)
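The reach of a 9-bit pc-relative offset can be checked with a few lines of Python. This is a sketch under the usual LC-3 rule that the offset is added to the incremented PC; the helper name `pcoffset9` and the example addresses are chosen here for illustration.

```python
def pcoffset9(instr_addr, target_addr):
    """Return the 9-bit PC-relative offset, or None if the target is out of range."""
    offset = target_addr - (instr_addr + 1)    # relative to the incremented PC
    return offset if -256 <= offset <= 255 else None

near = pcoffset9(0x3002, 0x3005)    # a label a few words away
far = pcoffset9(0x3002, 0x3200)     # a label hundreds of words away
```

Here `near` is a small positive offset that the assembler can encode, while `far` comes back as `None`, which corresponds to the assembler error.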
So, the alternative for large blocks and other such is to resort to using zero-initialized memory, and referencing this other memory using pointer variables nearby: using LD to load an address, rather than LEA, and LDI to access a value rather than LD.
.ORIG x3000
LEA R0, TESTA
LD R1, TESTA
LD R2, TESTBRef ; will put x3100 into R2
LDI R3, TESTBRef ; will access the memory value at address x3100..
HALT
TESTA
.FILL #1234
TESTBRef ; a nearby data pointer, that points using the full 16-bits
.FILL x3100
.END
In the latter, above, there is no declaration to reserve storage at x3100, nor can we initialize that storage at x3100 with non-zero values (e.g. no strings, no pre-defined arrays).
A data-to-data pointer, TESTBRef, is used. Unlike code-to-code/data references (i.e. instructions referencing code or data), data-to-code/data pointers (i.e. data referencing code or data) have all 16 bits available for pointing.
So, once we use this approach of simply using other memory, we forgo the automatic placement of labels after other labels (for those other areas), and also forgo non-zero initialization.
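The difference between LD and LDI on such a pointer word can be sketched in Python. The address used for TESTBRef assumes the layout of the second listing above, and the value 42 at x3100 is invented (in the real program that location starts out as zero).

```python
mem = [0] * 0x10000       # the whole 16-bit address space, zero-initialized

TESTBRef = 0x3006         # assumed address of the pointer word in the listing above
mem[TESTBRef] = 0x3100    # .FILL x3100
mem[0x3100] = 42          # hypothetical value stored there at run time

def ld(label_addr):
    """LD: load the word stored at the label itself."""
    return mem[label_addr]

def ldi(label_addr):
    """LDI: treat the word at the label as an address and load from there."""
    return mem[mem[label_addr]]

r2 = ld(TESTBRef)     # the pointer value, x3100
r3 = ldi(TESTBRef)    # the value stored at x3100
```

Only TESTBRef has to be within pc-relative reach of the instruction; the word it holds can point anywhere in the 16-bit space.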
Some LC-3 assemblers will allow multiple files, and these each allow their own .ORIG directive — so by using multiple files, we can place code&data at varied locations in the address space. However, the code-to-data instruction offset encoding limits still apply, and so, you'll likely end up managing other such memory areas manually anyway, and using data pointers as well.
Note that the JSR instruction has an 11-bit offset so code-to-code references can reach farther than code-to-data references.

Assembly memory addressing in big endian format

Kinda stuck here and was hoping for a pointer on memory addressing.
In theory, these represent R1 through R4. I assume 0x60 is R1, and 0x6C is R4, incrementing by a word each time. Is that the case?
If I wanted to run
ADD R1, R2
Would it store the result of the addition of 0x60 and 0x6C in memory location 0x60? Or am I looking at this wrong?
ARM registers do not correspond to any memory location. In some contexts ("spill slots" on the stack, "task state" used for multitasking) there will be memory locations reserved to save the contents of some or all registers, but they must be explicitly copied back and forth.
The problem you're trying to do is poorly worded, but I think the table gives the values of memory locations 0x60 through 0x6C, and, separately, the text ("[R1] = ..., [R2] = ..., etc") gives the values of the registers. If I'm reading this right, the instruction labeled (a) will copy the low byte of the value at memory location 0x62, which is either 0x9A or 0x90, I'm not sure which, into register R1, sign-extending it. I hope that's enough to get you unstuck.
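For the sign-extension step, a minimal Python sketch may help. It assumes a 32-bit destination register; the helper name and the second sample byte are made up for the example.

```python
def sign_extend_byte(b):
    """Widen an 8-bit value to 32 bits, copying bit 7 into the upper 24 bits."""
    b &= 0xFF
    return (b | 0xFFFFFF00) if (b & 0x80) else b

negative = sign_extend_byte(0x9A)    # bit 7 set: upper bits become all ones
positive = sign_extend_byte(0x1A)    # bit 7 clear (hypothetical value): upper bits stay zero
```

So if the byte loaded is 0x9A, the register ends up holding 0xFFFFFF9A rather than 0x0000009A.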

Memory access using __m128i address

I'm working on a project that uses SSE in unconventional ways. One of its quirks is that addresses of memory locations are kept duplicated in a __m128i variable.
My task is to get a value from memory using this address and do it as fast as possible. The value that we want to get from memory is also 128 bits long. I know that keeping an address in __m128i is an abuse of SSE, but it cannot be done any other way. Addresses have to be duplicated.
My current implementation:
Get the lower 64 bits of the duplicated address using MOVQ
Having the address, use MOVAPS to get the value from memory
In assembly it looks like this:
MOVQ %xmm1, %rax
MOVAPS (%rax), %xmm2
Question: can it be done faster? Maybe some optimizations can be applied if we do this multiple times in a row?
That movq / dereference sequence is your best bet if you have addresses stored in xmm registers.
Haswell's gather implementation is slower than manually loading things, so using VGATHERQPS (qword indices -> float data) is unlikely to be a win. Maybe with a future CPU design that has a much faster gather.
But the real question is why would you have addresses in XMM registers in the first place? Esp. duplicated into both halves of the register. This just seems like a bad idea that would take extra time to set up, and take extra time to use. (esp. on AMD hardware, where move between GP and vector registers takes 5 or 10 cycles, vs. 1 for Intel.) It would be better to load addresses from RAM directly into GP registers.
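For readers who want to see what the two-instruction sequence does to the data, here is a byte-level Python model. It says nothing about speed; the flat `memory` array, the address 32 and the data bytes are all invented for the sketch.

```python
import struct

memory = bytearray(64)              # hypothetical flat memory
memory[32:48] = bytes(range(16))    # 16 bytes of data at address 32

address = 32
xmm1 = struct.pack("<QQ", address, address)    # the address duplicated in both 64-bit halves

rax = struct.unpack_from("<Q", xmm1, 0)[0]     # MOVQ %xmm1, %rax: take the low 64 bits
if rax % 16 != 0:                              # MOVAPS needs a 16-byte-aligned address
    raise ValueError("unaligned address")
xmm2 = bytes(memory[rax:rax + 16])             # MOVAPS (%rax), %xmm2: load 128 bits
```

The upper copy of the address in `xmm1` is never touched by this sequence.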

What does CLD do here?

I have seen code
procedure FillDWord(var Dest; Count, What: dword); assembler ;
asm
PUSH EDI
MOV EDI, Dest
MOV EAX, What
MOV ECX, Count
CLD
REP STOSD
POP EDI
end;
I googled CLD and it says it clears the direction flag... so is it important here? After I removed it, the function seems to work fine.
The direction flag controls whether, during the execution of REP STOSD, the EDI register is incremented or decremented.
In case of a cleared direction flag (e.g. after execution of CLD) the pointer will be incremented, so the function does a memory fill.
The CLD is in this code because the programmer probably was not able to guarantee that the direction flag was cleared. Therefore he made sure that it is cleared before executing REP STOSD.
If the code works when CLD is removed, then the direction flag was already clear on entry to the function. Unless the calling convention in use guarantees that, this was just luck. It could be the other way around next time, and in that case your program will very likely crash.
Clearing/setting the flag is a very fast operation, so it's good practice to add them to the assembler code. This also makes it easy for other programmers to understand your function because the state of the direction flag is explicitly defined.
The stosd command can work either up through memory, incrementing EDI, or down through memory, decrementing it. This depends on the value of the direction ("D") flag. If the flag is set to 1 upon function entrance and never explicitly cleared, it'll misbehave wildly. There's no convention on the default value of that flag; so the function plays it safe.
EDIT: Egor says Delphi has a convention :) Still, better safe than sorry.
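The effect of the flag on REP STOSD can be modelled in a few lines of Python. This is a simplified sketch: memory is a list of dwords, EDI is a byte address assumed to be a multiple of 4, and the function name and sample values are made up here.

```python
def rep_stosd(mem, edi, eax, ecx, df):
    """Model REP STOSD on a list of dwords; edi is a byte address, df the direction flag."""
    step = -4 if df else 4     # DF=0 (after CLD): ascending; DF=1 (after STD): descending
    while ecx != 0:
        mem[edi // 4] = eax    # store EAX at [EDI]
        edi += step
        ecx -= 1
    return edi

forward = [0] * 8
rep_stosd(forward, edi=8, eax=0xAA, ecx=3, df=0)     # fills dwords 2, 3, 4

backward = [0] * 8
rep_stosd(backward, edi=8, eax=0xAA, ecx=3, df=1)    # fills dwords 2, 1, 0
```

With the flag set, the same call overwrites whatever lies below Dest instead of filling the buffer, which is why FillDWord issues CLD first.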

Resources