Kinda stuck here and was hoping for a pointer on memory addressing.
In theory, these represent R1 through R4. I assume 0x60 is R1, and 0x6C is R4, incrementing by a word each time. Is that the case?
If I wanted to run
ADD R1, R2
Would it store the result of the addition of 0x60 and 0x6C in memory location 0x60? Or am I looking at this wrong?
ARM registers do not correspond to any memory location. In some contexts ("spill slots" on the stack, "task state" used for multitasking) there will be memory locations reserved to save the contents of some or all registers, but they must be explicitly copied back and forth.
The problem you're trying to do is poorly worded, but I think the table gives the values of memory locations 0x60 through 0x6C, and, separately, the text ("[R1] = ..., [R2] = ..., etc") gives the values of the registers. If I'm reading this right, the instruction labeled (a) will copy the low byte of the value at memory location 0x62, which is either 0x9A or 0x90, I'm not sure which, into register R1, sign-extending it. I hope that's enough to get you unstuck.
Related
I'm studying computer architecture in my university and I guess I don't know the basic of computer system and C language concepts, few things really confuse me and I was kept searching bout it but couldn't find answer what I want and make me more confuse so upload question here.
1. I thought register is holding an instruction, storage address or any kind of data in CPU. And I also learned memory layout.
------------------
stack
dynamic data
static data
text
reserved part
------------------
Then register is having this memory layout in CPU? Or am I just confusing it with computer's 5 components(input, output, memory, control, datapath)'s memory's layout. I thought this is one of this 5 component's layout.
RISC-V (while loop in C)
Loop:
slli x10, x22, 3
add x10, x10, x25
ld x9, 0(x10)
bne x9, x24, Exit
addi x22, x22, 1
beq x0, x0, Loop
Exit:...
Then where does this operation happens? Register?
I learned RISC-V Registers like below.
x0: the constant value 0
x1: return address
...
x5-x7, x28-x31: temporaries
...
If register is in that memory layout what I draw above, then that x0, x1 stuffs are contained in where? It doesn't make sense from here. So I'm confusing how do I have to think register looks like.
Everything is so abstract in my mind so I guess question sounds bit weird. If anything is not cleared, comment me please.
Then register is having this memory layout in CPU?
No, that makes zero sense, your thinking is on the wrong track here.
The register file is its own separate space, not part of memory address space. It's not indexable with a variable, only by hard-coding register numbers into instructions, so there's not really any sense in which x2 is the "next register after x1" or anything. e.g. you can't loop over registers. They're just two separate 32 or 64-bit data-storage spaces that software can use however they want.
The natural categories to break them up are based on software / calling conventions:
stack pointer
call-preserved registers (function calls don't modify them, or conversely if you want to use one in a function you have to save/restore it)
call-clobbered registers (function calls must be assumed to step on them, and conversely can be used without saving/restoring)
the zero register.
Also arg-passing vs. return-value registers.
In LC-3 when you use BLKW how do you initialize the block of memory to be at location x3102 instead of the next available memory location?
First, let's make a side note that all the memory of the 16-bit address space is there for you to use, according to the memory map (e.g. at least in the range x3000-xFDFF), and, it is initialized to zero; you can use it at will.
Generally speaking, the LC-3 assemblers don't allow multiple .ORIG directives throughout the file, instead, they require one at the beginning of the file. If they did allow subsequent .ORIG directives, this would be a way to accomplish what you're asking about.
But even if they did, frequently, we'd run into instruction offset encoding limitations. So there's an alternate solution I'll show below.
But first, let's look at the instruction offset/immediate encoding limitations.
The usual data memory access instruction formats have a very limited offset, only 9 bits worth (+/- about 256), and the offset is pc-relative. So, for example, the following won't work:
.ORIG x3000
LEA R0, TESTA
LD R1, TESTA
LEA R2, TESTB ; will get an error due to instruction offset encoding limitation
LD R3, TESTB ; will get an error due to instruction offset encoding limitation
HALT
TESTA
.FILL #1234
.BLKW xFA ; exactly enough padding to relocate TESTB to x3100
TESTB
.FILL #4321 ; which we can initialize with a non-zero value
.END
This illustrative: while this will successfully place TESTB at x3100, it cannot be reached by the either the LEA or the LD instructions due to the limited 9-bit pc-relative displacement.
(There is also the other practical limitation that as instructions are added the .BLKW operand has to shrink in size, which is clearly painful — this aspect would have been eliminated by supporting a .ORIG directive within.)
So, the alternative for large blocks and other such is to resort to using zero-initialized memory, and referencing this other memory using pointer variables nearby: using LD to load an address, rather than LEA, and LDI to access a value rather than LD.
.ORIG x3000
LEA R0, TESTA
LD R1, TESTA
LD R2, TESTBRef ; will put x3100 into R3
LDI R3, TESTBRef ; will access the memory value at address x3100..
HALT
TESTA
.FILL #1234
TESTBRef ; a nearby data pointer, that points using the full 16-bits
.FILL x3100
.END
In the latter, above, there is no declaration to reserve storage at x3100, nor can we initialize that storage at x3100 with non-zero initialization (e.g. no strings, no pre-defined arrays).
A data-to-data pointer, TESTBRef is used. Unlike code-to-code/data references (i.e. instructions referencing code or data), data-to-code/data
(i.e. data referencing code or data) pointers have all 16-bits available for pointing.
So, once we use this approach, of simply using other memory, we forgo the automatic placement of labels after other labels (for those other areas), and also forgo non-zero initialization.
Some LC-3 assemblers will allow multiple files, and these each allow their own .ORIG directive — so by using multiple files, we can place code&data at varied locations in the address space. However, the code-to-data instruction offset encoding limits still apply, and so, you'll likely end up managing other such memory areas manually anyway, and using data pointers as well.
Note that the JSR instruction has an 11-bit offset so code-to-code references can reach farther than code-to-data references.
I'm working on one project that uses SSE in non-conventional ways. One of the things about it, is that addresses of memory locations are kept duplicated in __m128i variable.
My task is to get value from memory using this address and do it as fast as possible. Value that we want to get from memory is also 128 bit long. I know that keeping address in __m128i is an abuse of SSE, but it cannot be done other way. Addresses have to be duplicated.
My current implementation:
Get lower 64 bit of duplicated address using MOVQ
Having address, use MOVAPS to get value from the memory
In assembly it looks like this:
MOVQ %xmm1, %rax
MOVAPS (%rax), %xmm2
Question: can it be done faster? May be some optimizations can be applied if we do this multiple times in a row?
That movq / dereference sequence is your best bet if you have addresses stored in xmm registers.
Haswell's gather implementation is slower than manually loading things, so using VGATHERQPS (qword indices -> float data) is unlikely to be a win. Maybe with a future CPU design that has a much faster gather.
But the real question is why would you have addresses in XMM registers in the first place? Esp. duplicated into both halves of the register. This just seems like a bad idea that would take extra time to set up, and take extra time to use. (esp. on AMD hardware, where move between GP and vector registers takes 5 or 10 cycles, vs. 1 for Intel.) It would be better to load addresses from RAM directly into GP registers.
I'm creating a 64-bit model of IA-32 and am representing memory as a 0-based array of 2**64 bytes (the language I'm modeling this in uses ** as the exponentiation operator). This means that valid indices into the array are from 0 to 2**64-1. Now, to model the possible modes of accessing that memory, one can treat one element as an 8-bit number, two elements as a (little-endian) 16-bit number, etc.
My question is, what should my model do if they ask for a 16-bit (or 32-bit, etc.) number from location 2**64-1? Right now, what the model does is say that the returned value is Memory(2**64-1) + (8 * Memory(0)). I'm not updating any flags (which feels wrong). Is wrapping like this the correct behavior? Should I be setting any flags when the wrapping happens?
I have a copy of Intel-64-ia-32-ISA.pdf which I'm using as a reference, but it's 1,479 pages, and I'm having a hard time finding the answer to this particular question.
The answer is in Volume 3A, section 5.3: "Limit checking."
For ia-32:
When the effective limit is FFFFFFFFH (4 GBytes), these accesses [which extend beyond the end of the segment] may or may not cause the indicated exceptions. Behavior is implementation-specific and may vary from one execution to another.
For ia-64:
In 64-bit mode, the processor does not perform rumtime limit checking on code or data segments. Howver, the processor does check descriptor-table limits.
I tested it (did anyone expect that?) for 64bit numbers with this code:
mov dword [0], 0xDEADBEEF
mov dword [-4], 0x01020304
mov rdi, [-4]
call writelonghex
In a custom OS, with pages mapped as appropriate, running in VirtualBox. writelonghex just writes rdi to the screen as a 16-digit hexadecimal number. The result:
So yes, it does just wrap. Nothing funny happens.
No flags should be affected (though the manual doesn't say that no flags should be set for address wrapping, it does say that mov reg, [mem] doesn't affect them ever, and that includes this case), and no interrupt/trap/whatever happens (unless of course one or both pages touched are not present).
I am working with the MIPS architecture (not sure if this is relevant since we are dealing with memory).
I am told that A 32-bit integer is in memory at physical address 0x00A0CE48.
I assume that number is 00000000111111110000000011111111.
The system is byte-addressable, what value would be at memory address 0x00A0?
I wasn't sure if the first 8 bits were at address 0x00, next 8 bits at 0x00A0, next 8 bits at 0x00A0CE, and the last 8 bits at 0x00A0CE48? I'm asking because I have to manipulate a value in 0x00A0, but im not sure what's there.
Part of the problem is to 1st assume big endian is used, then little endian.
A 32-bit integer resides in memory at physical address 0x00A0CE48. The bits within the 32-bit word are numbered 0 to 31 from least significant bit to most significant bit. The code below extracts a single bit from this 32 bit pattern and places the bit into $t4.
lui $t0,0x00A0
ori $t0,$t0,0xCE48
lbu $t4,2($t0)
srl $t4,$t4,5
andi $t4,$t4,1
The next question in my assignment is to indicate the number of the bit (0 through 31) within the 32-bit word that is left in $t4 if the memory order used is little-endian or big-endian.
On a big endian system, the most significant byte is stored first. So, assuming the value is 0x12345678, 0x12 will be stored at address 0x00A0CE48, 0x34 will be stored at address 0x00A0CE49, 0x56 will be stored at address 0x00A0CE4A, and 0x78 will be stored at address 0x00A0CE4B.
On the other hand, on a little endian system, the least significant byte will be stored first. So, 0x78 would be stored at 0x00A0CE48, and so on.
Note that if a 32-bit word is stored at address 0x00A0CE48, the next word will be four bytes later, at address 0x00A0CE4C. The arithmetic should be performed on the address as a whole. You cannot consider the bytes making up the address separately when reading from memory.
In the assembly you've posted, lui (which stands for "load upper immediate") will shift the immediate value 16 bits to the left and store it in $t0. After that instruction, the value in $t0 will be 0x00A00000. The next instruction will OR the contents of $t0 with 0xCE48 and store the results in $t0. After that, $t0 will contain your full address, 0x00A0CE48.