Memory ordering. Two processors. - memory

This example comes from 64-IA-32 Architectures Software Developers Manual.
And I cannot understand why store of CPU #0 must be visible for other processors? In that concrete example ( when r1 = 1, r2 = 1) I agree that it is obvious that store made by CPU #0 must be retired firstly- I mean the 1 value must be in memory/cache in fact ( not in store buffer).
But what about general case? Is it possible the following situation:
CPU#0 stored: mov [_x], 1 but it was stored in CPU#0's store buffer, and then CPU#1 loaded: mov r1, [_x] and in fact, r1 is equal to 0?

Related

Detecting boundaries and resetting circular buffer pointer in both directions

I am working with an 8051 microcontroller, but my question is more algorithm specific.
I have created a circular buffer in memory for random incoming data from external sources. Suppose the buffer is 32 bytes and I received 34 bytes of data. Yes I'll handle that two bytes are dropped, but if I wanted to read the last 5 bytes then I'll have to somehow wrap around to the end of the buffer again to read more than 2 bytes.
Here's an example in 8051 code of what I'm trying to achieve:
BUFFER equ 40h ;our buffer = 40-5Fh (32 bytes)
BUFFERMASK equ 5Fh ;Mask so buffer doesn't go past 32nd byte
initialization:
mov R1,#BUFFER ;R1=our buffer pointer
mov #R1,#xxh ;Add some incoming data
inc R1
anl R1,#BUFFERMASK
mov #R1,#xxh ;Add some incoming data
inc R1
anl R1,#BUFFERMASK
...
mov #R1,#xxh ;Add some incoming data
inc R1
anl R1,#BUFFERMASK
;At this point we filled a large chunk of the buffer with data.
;Lets assume the buffer wrapped around and address is 41h
;and we want to read the data in reverse
mov A,#R1 ;Get last byte at 41h
dec R1
??? R1,??? (anl won't work here :( )
mov A,#R1 ;Get byte at 40h
dec R1
??? R1,??? (anl won't work here :( )
mov A,#R1 ;Get byte at 5Fh (how do we jump with a logic statement?)
dec R1
??? R1,??? (anl won't work here :( )
I understand that I could get away with a CJNE (compare and jump if not equal), but the disadvantages with that statement are: 1.) a need for a label for each CJNE, 2.) and the carry flag being modified after execution, and 3.) an extra clock cycle wasted if the boundary is hit.
Is there any way I could pull this off with simple anl/orl (AND or OR) logic? I am willing to change the memory address of the cyclic buffer if that creates an advantage in my situation.

Fastest way of storing non-adjacent d registers with NEON intrinsics

I am porting 32bit NEON asm code to NEON intrinsics, and I am wondering if this code can be written in a concise way using intrinsics:
vst4.32 {d0[0], d2[0], d4[0], d6[0]}, [%[v1]]!
1) The previous code operates on q registers, but when it comes to storage, instead of using q0, q1, q2 and q3, it has to recreate vectors which have each part in one of the d registers, e.g. v1[0] = d0[0], v1[1] = d2[0] ... v2[0] = d0[1], v2[1] = d2[1] ... v3[0] = d1[0], v3[1] = d3[0] ... etc.
This operation is a one-liner in asm, but with intrinsics I don't know if I can do that without first splitting high and low bits and building a new float32x4x4_t variable to feed to vst4_f32.
Is that possible?
2) I'm not entirely sure of what [%[v1]]! does (yes, I googled quite a bit): it should be a reference to a variable named v1 and the exclamation mark will do writeback, which should mean the pointer is increased by the same amount that was written by the instruction on the same line.
Correct? Any way of replicating that with intrinsics?
After some more investigation I found this specific instruction to store a specific lane of an array of 4 vectors, so no need to split into high and low bits variables:
float32x4x4_t u = { q0, q1, q2, q3 };
vst4q_lane_f32(v1, u, 0);
v1 += 4;
Writeback is just an increased pointer, as #charlesbaylis wrote.
In principle, a sufficiently smart compiler could use the instruction you want for the vst4_f32 intrinsic, but in practice, no compiler is that good.
To get the post-index writeback, you can write
vst4_f32(ptr, v);
ptr += 4;
Some compilers will recognise this. GCC 5.1 (when released) will do this in at least some cases.
[Edit: misread the question, vst4q_lane_f32 does map to the required instruction perfectly]
It seems to be inline assembly.
Anyway, the answers are:
1) No
2) Yes

x86 Assembly: Writing a Program to Test Memory Functionality for Entire 1MB of Memory

Goal:
I need to write a program that tests the write functionality of an entire 1MB of memory on a byte by byte basis for a system using an Intel 80186 microprocessor. In other words, I need to write a 0 to every byte in memory and then check if a 0 was actually written. I need to then repeat the process using a value of 1. Finally, any memory locations that did not successfully have a 0 or 1 written to them during their respective write operation needs to be stored on the stack.
Discussion:
I am an Electrical Engineering student in college (Not Computer Science) and am relatively new to x86 assembly language and MASM611. I am not looking for a complete solution. However, I am going to need some guidance.
Earlier in the semester, I wrote a program that filled a portion of memory with 0's. I believe that this will be a good starting point for my current project.
Source Code For Early Program:
;****************************************************************************
;Program Name: Zeros
;File Name: PROJ01.ASM
;DATE: 09/16/14
;FUNCTION: FILL A MEMORY SEGMENT WITH ZEROS
;HISTORY:
;AUTHOR(S):
;****************************************************************************
NAME ZEROS
MYDATA SEGMENT
MYDATA ENDS
MYSTACK SEGMENT STACK
DB 0FFH DUP(?)
End_Of_Stack LABEL BYTE
MYSTACK ENDS
ASSUME SS:MYSTACK, DS:MYDATA, CS:MYCODE
MYCODE SEGMENT
START: MOV AX, MYSTACK
MOV SS, AX
MOV SP, OFFSET End_Of_Stack
MOV AX, MYDATA
MOV DS, AX
MOV AX, 0FFFFh ;Moves a Hex value of 65535 into AX
MOV BX, 0000h ;Moves a Hex value of 0 into BX
CALL Zero_fill ;Calls procedure Zero_fill
MOV AX, 4C00H ;Performs a clean exit
INT 21H
Zero_fill PROC NEAR ;Declares procedure Zero_fill with near directive
MOV DX, 0000h ;Moves 0H into DX
MOV CX, 0000h ;Moves 0H into CX. This will act as a counter.
Start_Repeat: INC CX ;Increments CX by 1
MOV [BX], DX ;Moves the contents of DX to the memory address of BX
INC BX ;Increments BX by 1
CMP CX, 10000h ;Compares the value of CX with 10000H. If equal, Z-flag set to one.
JNE Start_Repeat ;Jumps to Start_Repeat if CX does not equal 10000H.
RET ;Removes 16-bit value from stack and puts it in IP
Zero_fill ENDP ;Ends procedure Zero_fill
MYCODE ENDS
END START
Requirements:
1. Employ explicit segment structure.
2. Use the ES:DI register pair to address the test memory area.
3. Non destructive access: Before testing each memory location, I need to store the original contents of the byte. Which needs to be restored after testing is complete.
4. I need to store the addresses of any memory locations that fail the test on the stack.
5. I need to determine the highest RAM location.
Plan:
1. In a loop: Write 0000H to memory location, Check value at that mem location, PUSH values of ES and DI to the stack if check fails.
2. In a loop: Write FFFFH to memory location, Check value at that mem location, PUSH values of ES and DI to the stack if check fails.
Source Code Implementing Preliminary Plan:
;****************************************************************************
;Program Name: Memory Test
;File Name: M_TEST.ASM
;DATE: 10/7/14
;FUNCTION: Test operational status of each byte of memory between a starting
; location and an ending location
;HISTORY: Template code from Assembly Project 1
;AUTHOR(S):
;****************************************************************************
NAME M_TEST
MYDATA SEGMENT
MYDATA ENDS
MYSTACK SEGMENT STACK
DB 0FFH DUP(?)
End_Of_Stack LABEL BYTE
MYSTACK ENDS
ESTACK SEGMENT COMMON
ESTACK ENDS
ASSUME SS:MYSTACK, DS:MYDATA, CS:MYCODE, ES:ESTACK
MYCODE SEGMENT
START: MOV AX, MYSTACK
MOV SS, AX
MOV SP, OFFSET End_Of_Stack
MOV AX, MYDATA
MOV DS, AX
MOV AX, FFFFH ;Moves a Hex value of 65535 into AX
MOV BX, 0000H ;Moves a Hex value of 0 into BX
CALL M_TEST ;Calls procedure M_TEST
MOV AX, 4C00H ;Performs a clean exit
INT 21H
M_TEST PROC NEAR ;Declares procedure M_TEST with near directive
MOV DX, 0000H ;Fill DX with 0's
MOV AX, FFFFH ;Fill AX with 1's
MOV CX, 0000H ;Moves 0H into CX. This will act as a counter.
Start_Repeat: MOV [BX], DX ;Moves the contents of DX to the memory address of BX
CMP [BX], 0000H ;Compare value at memory location [BX] with 0H. If equal, Z-flag set to one.
JNE SAVE ;IF Z-Flag NOT EQUAL TO 0, Jump TO SAVE
MOV [BX], AX ;Moves the contents of AX to the memory address of BX
CMP [BX], FFFFH ;Compare value at memory location [BX] with FFFFH. If equal, Z-flag set to one.
JNE SAVE ;IF Z-Flag NOT EQUAL TO 0, Jump TO SAVE
INC CX ;Increments CX by 1
INC BX ;Increments BX by 1
CMP CX, 10000H ;Compares the value of CX with 10000H. If equal, Z-flag set to one.
JNE Start_Repeat ;Jumps to Start_Repeat if CX does not equal 10000H.
SAVE: PUSH ES
PUSH DI
RET ;Removes 16-bit value from stack and puts it in IP
M_TEST ENDP ;Ends procedure Zero_fill
MYCODE ENDS
END START
My commenting might not be accurate.
Questions:
1. How do I use ES:DI to address the test memory area?
2. What is the best way to hold on to the initial memory value so that I can replace it when I'm done testing a specific memory location? I believe registers AX - DX are already in use.
Also, if I have updated code and questions, should I post it on this same thread, or should I create a new post with a link to this one?
Any other advice would be greatly appreciated.
Thanks in advance.
How do I use ES:DI to address the test memory area?
E.g. mov al, es:[di]
What is the best way to hold on to the initial memory value so that I can replace it when I'm done testing a specific memory location? I believe registers AX - DX are already in use.
Right. You could use al to store the original value and have 0 and 1 pre-loaded in bl and cl and then do something like this (off the top of my head):
mov al, es:[di] // load/save original value
mov es:[di], bl // store zero
cmp bl, es:[di] // check that it sticks
jne #pushbad // jump if it didn't
mov es:[di], cl // same for 'one'
cmp cl, es:[di]
jne #pushbad
mov es:[di], al // restore original value
jmp #nextAddr
#pushbad:
mov es:[di], al // restore original value (may be redundant as the mem is bad)
push es
push di
#nextAddr:
...
Some words about for to test also the memory location that our own routine is claimed. We can copy and run our routine into the framebuffer of the display device.
..
Note: If we want to store or compare a memory location with an immediate value, then we have to specify how many bytes we want to access. (But in opposite of it with using a register as a source, or a target, the assembler already knows the size of it, so we do not need to specify.)
Accessing one byte of one address (with an immediate value):
CMP BYTE[BX], 0 ; with NASM (Netwide Assembler)
MOV BYTE[BX], 0
CMP BYTE PTR[BX], 0 ; with MASM (Microsoft Macro Assembler)
MOV BYTE PTR[BX], 0
Accessing two bytes of two adresses together
(executing faster, if the target address is even aligned):
CMP WORD[BX], 0 ; with NASM
MOV WORD[BX], 0
CMP WORD PTR[BX], 0 ; with MASM
MOV WORD PTR[BX], 0
If you start with the assumption that any location in RAM might be faulty; then this means you can't use RAM to store your code or your data. This includes temporary usage - for example, you can't temporarily store your code in RAM and then copy it to display memory, because you risk copying corrupted code from RAM to display memory.
With this in mind; the only case where this makes sense is code in ROM testing the RAM - e.g. during the firmware's POST (Power On Self Test). Furthermore; this means that you can't use the stack at all - not for keeping track of faulty areas, or even for calling functions/routines.
Note that you might assume that you can test a small area (e.g. find the first 1 KiB that isn't faulty) and then use that RAM for storing results, etc. This would be a false assumption.
For RAM faults there are many causes. The first set of causes is "open connection" and "shorted connection" on either the address bus or the data bus. For a simple example, if address line 12 happens to be open circuit, the end result will be that the first 4 KiB always has identical contents to the second 4 KiB of RAM. You can test the first 4 KiB of RAM as much as you like and decide it's "good", but then when you test the second 4 KiB of RAM you trash the contents of the first 4 KiB of RAM.
There is a "smart sequence" of tests. Specifically, test the address lines from highest to lowest (e.g. write different values to 0x000000 and 0x800000 and check that they're both correct; then do the same for 0x000000 and 0x400000, then 0x000000 and 0x200000, and so on until you get to addresses 0x000000 and 0x000001). However, the way RAM chips are connected to the CPU is not necessarily as simple as a direct mapping. For example, maybe the highest bit of the address selects which bank of RAM; and in that case you'd have to test both 0x000000 and 0x400000 and also 0x800000 and 0xC00000 to test both banks.
Once you're sure the address lines work; then you can do similar for data lines and the RAM itself. The most common test is called a "walking ones" test; where you store 0x01, then 0x02, and so on (up to 0x80). This detects things like "sticky bits" (e.g. where a bit's state happens to be "stuck" to its neighbour's state). If you only write (e.g.) 0x00 and test it then write 0xFF and test it, then you will miss most RAM faults.
Also; be very careful with "open connection". On some machines bus capacitance can play tricks on you, where you write a value and the bus capacitance "stores" the previous value, so that when you read it back it looks like it's correct even when there's no connection. To avoid this risk, you need to write a different value in between - e.g. write 0x55 to the address you're testing, then write 0xAA somewhere else, then read the original value back (and hope you get 0x55 because the RAM works, and not 0xAA). With this in mind (for performance) you may consider doing "walking ones" in one area of RAM while also doing "walking zeros" in the next area of RAM; so that you're always alternating between reading a value from one area and reading the inverted value from the other.
Finally, some RAM problems depend on noise, temperature, etc. In these cases you can do extremely thorough RAM tests, say that it's all perfect, then suffer from RAM corruption 2 minutes afterwards. This is why (e.g.) the typical advice is to run something like "memtest" for 8 hours or so if you really want to test RAM properly.

Memory address wont load

I am trying to move the value 0 into the address stored in ax (assume that this is writable for now).
mov ax, 0EC7 ; assume writable
mov BYTE [ax], 0
But, nasm is giving me this error:
error: invalid effective address
Any ideas?
16-bit addressing modes are quite limited. You can use an (optional) offset (a plain number), plus an (optional) base register (bx or bp), plus an (optional) index register (si or di). That's it.
In 32-bit addressing modes, any register can be a base register and any register but esp can be an index register. 32-bit addressing also introduces an (optional) scale (1, 2, 4, or 8) to be multiplied by the index register.
[eax] will work - even in 16-bit code. The assembler generates an "address size override prefix" byte (0x67). If the value in eax exceeds the segment limit (usually 64k), an exception is generated (not handled in real DOS - it just hangs), so be careful with it.

Storing data in RAM without avr-libc. How can I configure the proper memory sections using a custom linker script and initialization code?

Here's my linker script:
MEMORY {
text (rx) : ORIGIN = 0x000000, LENGTH = 64K
data (rw!x) : ORIGIN = 0x800100, LENGTH = 0xFFA0
}
SECTIONS {
.vectors : AT (0x0000) { entry.o (.vectors); }
.text : AT (ADDR (.vectors) + SIZEOF(.vectors)) { * (.text.startup); * (.text); * (.progmem.data); _etext = .; }
.data : AT (ADDR (.text) + SIZEOF (.text)) { PROVIDE (__data_start = .); * (.data); * (.rodata); * (.rodata.str1.1); PROVIDE (__data_end = .); } > data
.bss : AT (ADDR (.bss)) { PROVIDE (__bss_start = .); * (.bss); PROVIDE (__bss_end = .); } > data
__data_load_start = LOADADDR(.data);
__data_load_end = __data_load_start + SIZEOF(.data);
}
And this is my initialization code. init is called at reset.
.section .text,"ax",#progbits
/* Handle low level hardware initialization. */
.global init
init: eor r1, r1
out 0x3f, r1
ldi r28, 0xFF
ldi r29, 0x02
out 0x3e, r29
out 0x3d, r28
rjmp __do_copy_data
rjmp __do_clear_bss
jmp main
/* Handle copying data into RAM. */
.global __do_copy_data
__do_copy_data: ldi r17, hi8(__data_end)
ldi r26, lo8(__data_start)
ldi r27, hi8(__data_start)
ldi r30, lo8(__data_load_start)
ldi r31, hi8(__data_load_start)
rjmp .L__do_copy_data_start
.L__do_copy_data_loop: lpm r0, Z+
st X+, r0
.L__do_copy_data_start: cpi r26, lo8(__data_end)
cpc r27, r17
brne .L__do_copy_data_loop
rjmp main
/* Handle clearing the BSS. */
.global __do_clear_bss
__do_clear_bss: ldi r17, hi8(__bss_end)
ldi r26, lo8(__bss_start)
ldi r27, hi8(__bss_start)
rjmp .L__do_clear_bss_start
.L__do_clear_bss_loop: st X+, r1
.L__do_clear_bss_start: cpi r26, lo8(__bss_end)
cpc r27, r17
brne .L__do_clear_bss_loop
The problem is that the initialization code hangs sometime during the copying process. Here's an edited dump of my symbol table, if it's helpful to anyone.
00000000 a __tmp_reg__
...
00000000 t reset
...
00000001 a __zero_reg__
...
0000003d a __SP_L__
...
00000074 T main
0000009a T init
000000ae T __do_copy_data
000000c6 T __do_clear_bss
...
00000446 A __data_load_start
00000446 T _etext
0000045b A __data_load_end
00800100 D __data_start
00800100 D myint
00800115 B __bss_start
00800115 D __data_end
00800115 b foobar.1671
00800135 B ticks
00800139 B __bss_end
C is designed to work on von Neumann architectures. AVR is Harvard based. This means that C expects strings to be in RAM. As a consequence, if you ever take a look at the disassembly for any elf binary that would eventually be copied as a hex to your AVR chip, you will see two sections: __do_copy_data and __do_clear_bss [when required]. These routines, which are added in the linking stages, take care of the basic needs of the C language. As a consequence, what you are seeing here with your pointers is likely that they are pointing to the wrong addresses. In other words, they are either pointing to an address in program space but you are reading from data space [different address bus]. Or you are purposely reading from data space but have not copied the strings over.
See:
avr/pgmspace.h
FAQ: ROM Array and scroll down for strings
and of course, the AVR instruction set as provided by Atmel, especifically, the instruction to copy program memory over to data memory
Edited to reflect new question and comment: Your assembly for both sections look ok to me. I will have to take a look at your linker scripts with closer scrutinity to check if there any funny businesses going on there. Since you are writing a bootloader, do you mind if I ask if you have taken a look at bootloader support on AVR-libc?
http://deans-avr-tutorials.googlecode.com/svn/trunk/Progmem/Output/Progmem.pdf
This document may help to clarify the use of flash/ram memories of AVR.
I actually got it to work. All I had to do was enable reading from and writing to external RAM in the SREG. It was mindblowingly obvious and simple, I know, but it was buried in the data sheet and it was poorly documented.
This resource was helpful, but didn't have any documentation for my chip. If you're having this problem, look in the data sheet for your AVR and see how it is related. The process is not the same for all variations of AVRs.
How to use external RAM.

Resources