XOR instruction not working as expected (Intel 8086) - memory

I am studying a topic I am fascinated with: reverse engineering. But I have run into a little speed bump. I know the bitwise XOR operator and what it does to bits, but it doesn't seem to work correctly when I watch it execute in the disassembler. The small segment of code I am dealing with is:
MOV EAX, 0040305D
XOR DWORD PTR [EAX], 1234567
Before the XOR takes place, the value at location 0040305D is 1234, or 31323334 in hexadecimal (it is ASCII because it was taken from user input, and it resides in memory as the bytes 31 32 33 34). When I used an online XOR calculator to check that I was doing everything right on paper, I got 30117653 hexadecimal as the result. But when I run the operation in the disassembler, it replaces the memory at the location held in EAX with 56771035.
What just happened? Am I missing something here? I checked the XOR calculation on several calculators and I cannot get 56771035. Can someone give me a hand and tell me what I am doing wrong?
-Dan

The numbers displayed are all in hex, and you have forgotten to account for endianness. If the user input was the ASCII string 1234, memory contains the bytes 31 32 33 34. Since x86 is little-endian, the operand 1234567 is stored as the byte sequence 67 45 23 01. Performing the XOR byte by byte (31^67=56, 32^45=77, 33^23=10, 34^01=35) gives the byte sequence 56 77 10 35, which is exactly what you see.
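If you want to verify this outside the debugger, here is a minimal C sketch (assuming a little-endian host, which any x86 machine is) that reproduces the result:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    unsigned char mem[4] = {0x31, 0x32, 0x33, 0x34}; /* ASCII "1234" as it sits in memory */
    uint32_t dword;
    memcpy(&dword, mem, 4);   /* little-endian load: dword == 0x34333231 */
    dword ^= 0x01234567;      /* the XOR the instruction performs */
    memcpy(mem, &dword, 4);   /* store back: bytes become 56 77 10 35 */
    for (int i = 0; i < 4; i++)
        printf("%02X ", mem[i]);  /* prints: 56 77 10 35 */
    printf("\n");
    return 0;
}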

Related

x86: accessing unaligned pixel bytes of BMP image

I am working on a program in C and x86 assembly (NASM) which rotates and scales an image. To do that, it goes through the pixels of the destination image one by one and calculates the corresponding pixel in the source image.
The relevant part of the assembly code:
; buffer operations
push ebx
fistp dword loc ; store the number of src pixel
mov dword ebx, loc ; move it to ebx
imul ebx, 3 ; 3 bytes per pixel, so multiply pixel number by 3
mov dword eax, [ebx+esi]; load that pixel's color bytes ; ERROR, SIGSEGV
mov dword [edi], eax ; draw a pixel
pop ebx
particularly the line marked 'ERROR, SIGSEGV' generates a segmentation fault.
I reckon that is due to the fact that I'm trying to access the unaligned memory address.
That said, the BMP file pixel buffer is organised so that each pixel's B, G, R bytes are stored one after another, 3 bytes per pixel, and each pixel's first byte can be at a memory position that is not divisible by 4 (e.g. pixel one: 0.B, 1.G, 2.R; pixel two: 3.B, 4.G, 5.R - so I must access address 3 to get to the second pixel).
The question is: how can I access a pixel's data if I'm not allowed to access an unaligned memory location, and how is it usually done when working with BMP files?
The OP assumed that the x86 architecture cannot access unaligned data (data at an address which is not a multiple of the size of the register being used), which is a common restriction on Reduced Instruction Set Computing (RISC) processors. For example:
mov ebx, 0x12345677 ; Note the odd address
mov eax, [ebx] ; Load a 32-bit register from an odd address
In a RISC architecture like ARM or PowerPC, this can indeed cause problems: either always (classic ARM) or unless disabled (PowerPC). With the x86 architecture, unaligned accesses have always been possible (albeit sometimes at a speed penalty), and it's only since the 486 that they could even be checked (the AC flag); that check is almost always off.
It turned out that the problem was elsewhere:
mov dword eax, [ebx+esi]; load that pixel's color bytes ; ERROR, SIGSEGV
The OP hadn't confirmed that esi held the desired value.
Note, though, that an unaligned access on x86 can still cause problems. All segments are defined to have a certain size (even if that size is 4 GiB). An access near the top of a segment can "overflow" the segment, and unaligned accesses are the easiest way to trigger this.
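As for the "how is it usually done" part: in C you would normally just index the buffer byte by byte, since 24bpp pixels are not naturally aligned anyway. A minimal sketch (names are hypothetical; note that real BMP rows are padded to a multiple of 4 bytes, which this accounts for):

#include <stdint.h>
#include <stddef.h>

/* Fetch the B, G, R bytes of pixel (x, y) from a 24bpp BMP pixel buffer.
   Each row is padded to a multiple of 4 bytes, so compute the stride first. */
void get_pixel_bgr(const uint8_t *buf, int width, int x, int y,
                   uint8_t *b, uint8_t *g, uint8_t *r) {
    size_t stride = ((size_t)width * 3 + 3) & ~(size_t)3; /* row length rounded up to 4 */
    const uint8_t *p = buf + (size_t)y * stride + (size_t)x * 3;
    *b = p[0];  /* single-byte loads are always sufficiently aligned */
    *g = p[1];
    *r = p[2];
}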

How can I calculate stuff bits in a standard CAN 2.0A data frame?

I have one standard CAN 2.0A frame which contains 8 bytes of data, e.g.:
CAN frame data: "00 CA 22 FF 55 66 AA DF" (8 bytes)
Now I want to check how many stuff bits would be added to this CAN frame (bit stuffing). The standard formula for the worst-case bit-stuffing scenario is:
64 + 47 + floor((34 + 64 - 1)/4) = 64 + 47 + 24 = 135 bits, where 64 is the number of data bits and 47 the number of overhead bits for a 2.0A frame.
How can I calculate the real number of stuffed bits in this sample CAN message?
Any comment or suggestion would be warmly welcome.
There is no way to mathematically "calculate" the stuffed bits with a closed-form formula. You need to construct the frame at the bit level, traverse the bits, and count.
You can read more about bit stuffing at the link below.
https://en.wikipedia.org/wiki/CAN_bus#Bit_stuffing
Basic principle (a counting sketch follows the list):
1. Construct the CAN frame at the bit level.
2. Start at the frame's start bit. Whenever 5 consecutive bits of the same polarity are found, insert a bit of opposite polarity.
3. Continue up to the CRC delimiter (the CRC delimiter itself is excluded).
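A minimal C sketch of that counting loop, assuming you have already serialized the frame (SOF through CRC, excluding the CRC delimiter) into an array of 0/1 values:

#include <stddef.h>

/* Count how many stuff bits the CAN controller would insert into `bits`
   (the unstuffed frame from SOF up to, but not including, the CRC delimiter).
   Note: an inserted stuff bit itself counts toward the next run of 5. */
int count_stuff_bits(const unsigned char *bits, size_t n) {
    int stuffed = 0;
    int run = 0;    /* length of the current run of identical bits */
    int prev = -1;  /* polarity of the current run; -1 = none yet */
    for (size_t i = 0; i < n; i++) {
        if (bits[i] == prev) {
            run++;
        } else {
            prev = bits[i];
            run = 1;
        }
        if (run == 5) {    /* 5 identical bits: a stuff bit of opposite polarity goes in */
            stuffed++;
            prev = !prev;  /* the stuff bit starts a new run of the opposite polarity */
            run = 1;
        }
    }
    return stuffed;
}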

SIMD zero vector test

Does there exist a quick way to check whether a SIMD vector is a zero vector (all components equal +/-0)? I am currently using an algorithm, based on shifts, that runs in log2(N) time, where N is the dimension of the vector. Does there exist anything faster? Note that my question is broader (see the tags) than the proposed answer, and it refers to vectors of all types (integer, float, double, ...).
How about this straightforward AVX code? I think it's O(N), and I don't know how you could possibly do better without making assumptions about the input data: you have to actually read every value to know whether it's 0, so it's about doing as much of that as possible per cycle.
You should be able to massage the code to your needs. It treats both +0 and -0 as zero. It will work for unaligned memory addresses, but aligning to 32-byte addresses will make the loads faster. You may need to add something to deal with the remaining elements if size isn't a multiple of 8.
#include <immintrin.h>
#include <stdint.h>

uint64_t num_non_zero_floats(const float *mem_address, int size) {
    uint64_t num_non_zero = 0;
    __m256 zeros = _mm256_setzero_ps();
    for (int i = 0; i != size; i += 8) {
        __m256 vec = _mm256_loadu_ps(mem_address + i);
        __m256 comparison_out = _mm256_cmp_ps(zeros, vec, _CMP_EQ_OQ); // 3 cycles latency, throughput 1; +0.0 == -0.0 here
        uint32_t bits_zero = _mm256_movemask_ps(comparison_out);       // 2-3 cycles latency; one bit per zero lane
        num_non_zero += __builtin_popcountll(~bits_zero & 0xFF);       // count the lanes that were NOT zero
    }
    return num_non_zero;
}
If you want to test floats for +/- 0.0, then you can check for all the bits being zero, except the sign bit. Any set-bits anywhere except the sign bit mean the float is non-zero. (http://www.h-schmidt.net/FloatConverter/IEEE754.html)
Agner Fog's asm optimization guide points out that you can test a float or double for zero using integer instructions:
; Example 17.4b
mov eax, [rsi]
add eax, eax ; shift out the sign bit
jz IsZero
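The same trick expressed in C, for a single float (a sketch; memcpy does the type pun without violating strict aliasing):

#include <stdint.h>
#include <string.h>

/* Returns 1 if f is +0.0f or -0.0f: shift out the sign bit and test the rest. */
int float_is_zero(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* reinterpret the float's bits as an integer */
    return (u << 1) == 0;      /* the add eax, eax above is the same 1-bit shift */
}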
For vectors, though, using ptest with a mask of the non-sign bits is better than using paddd to get rid of the sign bit. Actually, test dword [rsi], 0x7fffffff may be more efficient than Agner Fog's load/add sequence, but the 32-bit immediate probably stops the load from micro-fusing on Intel, and it has a larger code size.
x86 PTEST (SSE4.1) does a bitwise AND and sets flags based on the result.
movdqa xmm0, [mask]   ; 0x7FFFFFFF in each dword: every bit except the sign bits
.loop:
ptest xmm0, [rsi+rcx]
jnz nonzero
add rcx, 16           ; count up towards zero,
jl .loop              ; with rsi pointing just past the end of the array
...
nonzero:
Or cmov could be useful to consume the flags set by ptest.
IDK if it'd be possible to use a loop-counter instruction that didn't set the zero flag, so you could do both tests with one jump instruction or something. Probably not. And the extra uop to merge the flags (or the partial-flags stall on earlier CPUs) would cancel out the benefit.
@Iwillnotexist Idonotexist: re one of your comments on the OP: you can't just movemask without doing a pcmpeq first, or a cmpps. The non-zero bit might not be in the high bit! You probably knew that, but one of your comments seemed to leave it out.
I do like the idea of ORing together multiple vectors before actually testing. You're right that sign bits would OR with other sign bits, and then you ignore them the same way you would if you were testing one at a time. A loop that PORs 4 or 8 vectors before each PTEST would probably be faster. (PTEST is 2 uops and can't macro-fuse with a jcc.)
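A sketch of that idea with SSE4.1 intrinsics (assuming the length is a multiple of 16 floats; the 0x7FFFFFFF mask makes -0.0 count as zero):

#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 */
#include <stddef.h>

/* Returns 1 if every float in the array is +0.0 or -0.0. */
int all_zero(const float *p, size_t n) {
    const __m128i mask = _mm_set1_epi32(0x7FFFFFFF); /* all bits except the sign bit */
    for (size_t i = 0; i < n; i += 16) {
        /* OR four vectors together, then do a single PTEST on the result */
        __m128i acc = _mm_loadu_si128((const __m128i *)(p + i));
        acc = _mm_or_si128(acc, _mm_loadu_si128((const __m128i *)(p + i + 4)));
        acc = _mm_or_si128(acc, _mm_loadu_si128((const __m128i *)(p + i + 8)));
        acc = _mm_or_si128(acc, _mm_loadu_si128((const __m128i *)(p + i + 12)));
        if (!_mm_testz_si128(acc, mask))  /* PTEST: ZF set iff (acc & mask) == 0 */
            return 0;
    }
    return 1;
}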

Why are memory addresses incremented by 4 in MIPS?

If something is stored at 0x10010000, the next thing is stored at 0x10010004. And if I'm correct, the memory cells in a 32-bit architecture are 32 bits each. So would 0x10010002 point to the second half of the 32 bits?
First of all, memory addresses in the MIPS architecture are not incremented by 4. MIPS uses byte addressing, so you can address any byte in memory (see e.g. lb and lbu to read a single byte, lh and lhu to read a half-word).
The fact is that if you read words, which are 32 bits (4 bytes, lw) in length, then two consecutive words will be 4 bytes away from each other. In this case, you add 4 to the address of the first word to get the address of the next word.
Besides this, if you read words they have to be aligned on multiples of 4; otherwise you will get an alignment exception.
In your example, if the first word is stored at 0x10010000, then the next word will be at 0x10010004, and the first and second half-words will be at 0x10010000 and 0x10010002 (which half is which depends on the endianness of the architecture).
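The same byte-addressing behaviour is visible from C, where pointer arithmetic scales by the element size (a small illustration, not MIPS-specific):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t words[2];
    uint8_t *bytes = (uint8_t *)words;
    /* consecutive 32-bit words are 4 byte-addresses apart... */
    printf("%p %p\n", (void *)&words[0], (void *)&words[1]);
    /* ...but every byte in between has its own address too */
    printf("%p %p\n", (void *)&bytes[0], (void *)&bytes[2]);
    return 0;
}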
You seem to have answered this one yourself! 32 bits make 4 bytes, so if you're e.g. pushing to a stack, where all elements are pushed with the same size, each next item will be 4 bytes after (or before) the previous one.

Bits in a memory address

While debugging on Windows XP 32-bit using the immunity debugger, I see the following on the stack:
Address   Value
00ff2254  ff090045
00ff2258  00000002
My understanding is that every address location contains 8 bits.
Is this correct?
If I'm understanding your question correctly, the answer is yes, every individual memory location contains 8 bits.
The debugger is showing you 4 bytes (32 bits) at a time, to make the display more compact (and because many data types take up 32 bits, so it's often useful to see 32-bit values). That's why the addresses in the left column are 4 locations apart.
If the debugger showed one byte (8 bits) at a time, the display would look like this:
Address   Value
00ff2254 45
00ff2255 00
00ff2256 09
00ff2257 ff
00ff2258 02
00ff2259 00
00ff225a 00
00ff225b 00
(assuming you're on a "little-endian" machine, which most modern desktop PCs are.)
I think the main problem with your question is that you ask for one thing, but I detect a different question lurking in the shadows.
First and foremost, the addressable units of a computer's memory are bytes, which are 8 bits each, so yes, each address can be said to refer to 8 bits, or one byte.
However, you can easily group more bytes together to form bigger and more complex data structures.
If your question is really "Why am I seeing an 8-digit value as the contents at an address in my stack dump", then the reason for that is that it dumps 32-bit (4 bytes) values.
In other words, you can take the address, the address+1, the address+2, and the address+3, grab the bytes from each of those, and combine to a 32-bit value.
Is that really your question?
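If so, here is a minimal C sketch of that combining step, assuming the little-endian byte order of the x86 machines XP runs on:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* the four bytes at 00ff2254..00ff2257 from the dump */
    uint8_t b[4] = {0x45, 0x00, 0x09, 0xff};
    uint32_t value = (uint32_t)b[0]
                   | (uint32_t)b[1] << 8
                   | (uint32_t)b[2] << 16
                   | (uint32_t)b[3] << 24;
    printf("%08x\n", value);  /* prints ff090045, the dword the debugger shows */
    return 0;
}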
To complete RH's answer: you may be surprised to see so many hex digits for a given address.
You should consider:
Address Byte (8 bits)
00ff2254 45
00ff2255 00
00ff2256 09
00ff2257 ff
00ff2258 02
...
(on the CPU architecture used by XP)
Each memory address refers to a single byte, and consecutive addresses refer to consecutive bytes in memory. So you can only address memory at one-byte granularity, and a byte is 8 bits wide.
