Decode hex string encoding - character-encoding

I have a .bin saved with a VB program, the .bin format is:
String bytes | String
06 00 | C0 E1 E0 E8 F1 E0
The problem is I don't know how the string is encoded. I know what the string is supposed to be: Abaira
Can anyone recognize the encoding used?

I'm not aware of any standard character encoding for this. It is neither ASCII nor EBCDIC.
It seems to be some trivial encryption applied to an 8-bit (non-Unicode) ASCII (perhaps ANSI) string. Compare your unknown encoding with ASCII:
      Unknown            ASCII
Char  Hex  MSB   LSB     Hex  MSB   LSB
A     C0   1100  0000    41   0100  0001
b     E1   1110  0001    62   0110  0010
a     E0   1110  0000    61   0110  0001
i     E8   1110  1000    69   0110  1001
r     F1   1111  0001    72   0111  0010
a     E0   1110  0000    61   0110  0001
Let's define:
MSB: First nibble = most significant 4 bits
LSB: Second nibble = least significant 4 bits
_U: of Unknown
_A: of ASCII
Then you find:
MSB_U = MSB_A Xor 0x8 (maybe MSB_A Or 0x8), i.e. the top bit of the whole byte is flipped (Xor 0x80 on the byte)
LSB_U = LSB_A - 1 (to tell how underflow is handled I would need to see an ASCII character whose low nibble is 0, such as 'P' or 'p')
Then U is the concatenation MSB_U & LSB_U.
Further examples, ASCII to Unknown:
ASCII  Hex  MSB   LSB   MSB Xor 0x8  LSB - 1  Concatenated  Hex
H      48   0100  1000  1100         0111     1100 0111     C7
e      65   0110  0101  1110         0100     1110 0100     E4
r      72   0111  0010  1111         0001     1111 0001     F1 (as you have shown)
b      62   0110  0010  1110         0001     1110 0001     E1 (ditto)
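The rule can be checked against the bytes from the question. The following is a minimal Python sketch, not a confirmed description of the VB program's format: it flips the top bit of the high nibble and subtracts 1 from the low nibble, as the columns of the comparison table show. The function names are made up for illustration, and the wrap-around of the low nibble is an assumption, since the sample contains no character whose low nibble is 0.

```python
def encode_char(ch: str) -> int:
    """Apply the guessed rule to one ASCII character and return the byte."""
    code = ord(ch)
    high = (code >> 4) ^ 0x8          # flip the top bit of the high nibble
    low = ((code & 0xF) - 1) & 0xF    # low nibble minus 1 (wrap-around is assumed)
    return (high << 4) | low


def decode_byte(value: int) -> str:
    """Invert encode_char: undo both nibble changes and return the character."""
    high = (value >> 4) ^ 0x8
    low = ((value & 0xF) + 1) & 0xF
    return chr((high << 4) | low)


sample = bytes.fromhex("C0 E1 E0 E8 F1 E0")   # string bytes from the question
decoded = "".join(decode_byte(b) for b in sample)
print(decoded)  # Abaira
```

If the rule is right, encoding "Abaira" with `encode_char` should give back exactly the six bytes in the file.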

Related

Option rom: PXE design

While studying the PCI Firmware Specification and looking at existing implementations of PXE boot agents, I ran into something I don't understand about how this is supposed to work.
According to the PCI Firmware Specification, during POST the BIOS should map the Option ROM into UMB memory (0xC000-0xF000), then call the "Init" entry point at offset 0x3; after that the BIOS can disable the Option ROM.
A PXE option ROM binary consists of three parts: "Initialization code", "Base code" and "UNDI code".
The BIOS loads only the "Initialization code" into UMB. The Base code and UNDI code are loaded into memory later by copying them directly from flash memory (from the PCI Flash BAR; BAR1, according to Intel's specifications).
The question: why is it designed to work this way?
Why don't vendors use the BIOS mechanisms to load the entire Extension ROM into memory, instead of copying from the Flash BAR?
A monolithic PXE option ROM was a single unit, but most PXE option ROMs now have a split architecture (split into a UNDI option ROM and a BC option ROM). The BC ROM, however, is typically embedded in the BIOS and may not even appear as an option ROM.
Nowadays the NIC has only one option ROM, the UNDI option ROM.
Option ROM Header: 0x000DA000
55 AA 08 E8 76 10 CB 55 BC 01 00 00 00 00 00 00 U...v..U........
00 00 00 00 00 00 20 00 40 00 60 00 ...... .#.`.
Signature 0xAA55
Length 0x08 (4096 bytes)
Initialization entry 0xCB1076E8 //call then far return
Reserved 0x55 0xBC 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Reserved 0x00 0x00 0x00 0x00 0x00
PXEROMID Offset 0x0020 //RWEverything didn't pick it up as a separate field and made it part of the reserved section so I separated it.
PCI Data Offset 0x0040
Expansion Header Offset 0x0060
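As a rough illustration of the field breakdown above, the same header bytes can be unpacked with Python's struct module. This is only a sketch: the variable names are made up, and the field positions (0x16, 0x18, 0x1A) are simply where the three offsets sit in this particular dump.

```python
import struct

# First 28 bytes of the option ROM, copied from the dump above.
header = bytes.fromhex(
    "55 AA 08 E8 76 10 CB 55 BC 01 00 00 00 00 00 00 "
    "00 00 00 00 00 00 20 00 40 00 60 00"
)

# Little-endian 16-bit signature followed by the length in 512-byte units.
signature, length_units = struct.unpack_from("<HB", header, 0x00)

# Three little-endian 16-bit offsets, as they appear at 0x16, 0x18 and 0x1A here.
pxe_romid_off, pci_data_off, pnp_off = struct.unpack_from("<HHH", header, 0x16)

print(hex(signature))                                        # 0xaa55
print(length_units * 512)                                    # 4096
print(hex(pxe_romid_off), hex(pci_data_off), hex(pnp_off))   # 0x20 0x40 0x60
```

The three offsets are relative to the start of the ROM image, which is why the structures in this dump show up at 0x000DA020, 0x000DA040 and 0x000DA060.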
UNDI ROM ID Structure: 0x000DA020 //not recognised by RW Everything so I parsed it myself
55 4E 44 49 16 08 00 00 01 02 32 0D 00 08 B0 C4 UNDI......2...
80 46 50 43 49 52 ¦-ÇFPCIR
Signature UNDI
StructLength 0x16
Checksum 0x08
StructRev 0x00
UNDIRev 0x00 0x01 0x02
UNDI Loader Offset 0x0D32
StackSize 0x0800
DataSize 0xC4B0
CodeSize 0x4680
BusType PCIR
PCI Data Structure: 0x000DA040
50 43 49 52 EC 10 68 81 00 00 1C 00 03 00 00 02 PCIR..h.........
08 00 01 02 00 80 08 00 ........
Signature PCIR
Vendor ID 0x10EC - Realtek Semiconductor
Device ID 0x8168
Product Data 0x0000
Structure Length 0x001C
Structure Revision 0x03
Class Code 0x00 0x00 0x02
Image Length 0x0008
Revision Level 0x0201
Code Type 0x00
Indicator 0x80
Reserved 0x0008
PnP Expansion Header: 0x000DA060
24 50 6E 50 01 02 00 00 00 D7 00 00 00 00 AF 00 $PnP............
92 01 02 00 00 E4 00 00 00 00 C1 0B 00 00 00 00 ................
Signature $PnP
Revision 0x01
Length 0x02 (32 bytes)
Next Header 0x0000
Reserved 0x00
Checksum 0xD7
Device ID 0x00000000
Manufacturer 0x00AF - Intel Corporation
Product Name 0x0192 - Realtek PXE B02 D00
Device Type Code 0x02 0x00 0x00
Device Indicators 0xE4
Boot Connection Vector 0x0000
Disconnect Vector 0x0000
Bootstrap Entry Vector 0x0BC1 // will be at 0xDABC1
Reserved 0x0000
Resource info. vector 0x0000

H.264 NALU Byte Alignment

I am trying to get my head around the H.264 NALU headers in the following data stored in a mov container.
Example from file:
00 00 00 02 09 30 00 00 00 0E 06 01 09 00 02 08
24 68 00 00 03 00 01 80 00 00 2B 08 21 9A 01 01
64 47 D4 B2 5C 45 76 DA 72 E4 3B F3 AE A9 56 91
B2 3F FE CE 87 1A 48 13 14 A9 E0 12 C8 AD E9 22
...
So far I have assumed that the bit-stream is not byte aligned due to the start code sequence offset to the left by one bit:
0x00 0x00 0x00 0x02 -> 00000000 00000000 00000000 00000010
So I have shifted these and the subsequent bytes to the right by one bit, which results in the following start code sequence and header bits for the first header:
0000000 00000000 00000000 00000001 [0 00 00100]
However, I am coming unstuck when I reach the following byte sequence in the example:
0x00 0x00 0x00 0x0E
I am assuming it is another start sequence code but with a different byte alignment.
00000000 00000000 00000000 00001110 00000110 00000001 00001001 00000000
After byte alignment I am getting the following header byte:
00000 00000000 00000000 00000001 [1 10 00000]
The first bit in the header (the forbidden_zero_bit) is non-zero, which violates the rule that it must be zero.
Where am I tripping up?
Am I making the wrong assumptions here?
As was already answered, a MOV container (or MP4) doesn't use Annex B encoding with start codes. It uses MP4-style encoding, where each NAL unit is prefixed with a NALUnitLength field. This field can have different sizes (the size is signaled elsewhere in the container), but usually it is 4 bytes. In your case NALUnitLength is probably 4 bytes, and the 3 NAL units in your dump have sizes of 2 bytes (00 00 00 02), 14 bytes (00 00 00 0E) and 11016 bytes (00 00 2B 08).
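To make the length-prefixed reading concrete, here is a small Python sketch that walks the first 32 bytes of the dump from the question. It assumes a 4-byte big-endian NALUnitLength, which is only the likely case described above and not something read from the container; the function name is made up for illustration.

```python
def iter_nal_units(data: bytes, length_size: int = 4):
    """Yield (declared_length, payload) for each length-prefixed NAL unit."""
    pos = 0
    while pos + length_size <= len(data):
        declared = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        # The payload may be shorter than declared if the input is truncated.
        yield declared, data[pos:pos + declared]
        pos += declared


dump = bytes.fromhex(
    "00 00 00 02 09 30 00 00 00 0E 06 01 09 00 02 08 "
    "24 68 00 00 03 00 01 80 00 00 2B 08 21 9A 01 01"
)

for declared, payload in iter_nal_units(dump):
    header = payload[0]                 # first byte of the NAL unit
    forbidden_zero_bit = header >> 7
    nal_ref_idc = (header >> 5) & 0x3
    nal_unit_type = header & 0x1F
    print(declared, forbidden_zero_bit, nal_ref_idc, nal_unit_type)
# 2 0 0 9
# 14 0 0 6
# 11016 0 1 1
```

Read this way, every header byte has a forbidden_zero_bit of 0 and no bit shifting is needed.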
Start codes are used in the "byte stream format" (H.264 Annex B) and are themselves byte aligned. A decoder is supposed to identify a start code by comparing byte sequences, without any bit shifts.
MOV and MP4 containers don't use start codes. Instead they have their own structure (atoms, boxes): parameter set NAL units are stored, without prefixes, in the sample description atoms, and the data itself is stored separately, again as the original NAL units.
What you quoted is presumably a fragment of MOV atoms, which corresponds to file structure bytes and not NAL units.

Why would it be natural to think of a Unicode encoding as an array of 32-bit integers?

I was reading the Python guide about Unicode. In this section, it says:
To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 to 0x10ffff. This sequence needs to be represented as a set of bytes (meaning, values from 0-255) in memory. The rules for translating a Unicode string into a sequence of bytes are called an encoding.
The first encoding you might think of is an array of 32-bit integers. In this representation, the string “Python” would look like this:
   P           y           t           h           o           n
0x 50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
   0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23
Why might we think of 32-bit integers if code points are numbers from 0 to 0x10ffff? Is it perhaps assuming that we are on a 32-bit system?
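For reference, the byte layout quoted from the guide can be reproduced in Python. This is just an illustrative sketch: it uses the little-endian UTF-32 codec without a byte order mark, which matches the byte order shown in the quote, and it prints how many bits the largest code point needs.

```python
text = "Python"
encoded = text.encode("utf-32-le")   # little-endian, no byte order mark

print(encoded.hex(" "))
# 50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00

print(len(encoded) // len(text))     # 4 bytes per code point
print((0x10FFFF).bit_length())       # 21 bits for the largest code point
```

A 21-bit value does not fit in a 16-bit integer, and 32 bits is the next commonly available integer width.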

24 bit-address in hex

How many hex digits does a 24-bit memory address have?
One hex digit corresponds to 4 binary digits (bits).
24 bits are 3 bytes (8 bits each), which makes 6 hex digits.
8 bits = 1 byte
24 bits = 3 bytes
1 byte = 2 hex characters
2 bytes = 4 hex characters
3 bytes = 6 hex characters
Each hex digit handles four bits, so a 24-bit address requires six hex digits. You can see the relationship between hex and binary here:
Hex Binary Hex Binary
--- ------ --- ------
0 0000 8 1000
1 0001 9 1001
2 0010 A 1010
3 0011 B 1011
4 0100 C 1100
5 0101 D 1101
6 0110 E 1110
7 0111 F 1111
Every hex digit is 4 bits:
each digit in base 16 takes one of 2^4 values,
hence 4 digits in base 2 per hex digit, and 24 / 4 = 6 hex digits.
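The same arithmetic as a quick Python check (a small sketch; the helper name is made up for illustration):

```python
def hex_digits(bits: int) -> int:
    """Number of hex digits needed for a value that is `bits` bits wide."""
    return (bits + 3) // 4            # each hex digit covers 4 bits, round up


print(hex_digits(24))                 # 6
print(format(2**24 - 1, "X"))         # FFFFFF, the highest 24-bit address
```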

How to calculate Internet checksum?

I have a question regarding how the Internet checksum is calculated. I couldn't find any good explanation from the book, so I ask it here.
Have a look at the following example.
The following two messages are sent: 10101001 and 00111001. The checksum is calculated with 1's complement. That much I understand. But how is the sum calculated? At first I thought it might be XOR, but that does not seem to be the case.
10101001
00111001
--------
Sum 11100010
Checksum: 00011101
Then they check whether the message arrived OK. Once again, how is the sum calculated?
10101001
00111001
00011101
--------
Sum 11111111
Complement 00000000 means that the pattern is O.K.
It uses addition, hence the name "sum". 10101001 + 00111001 = 11100010.
For example:
+------------+-----+----+----+----+---+---+---+---+--------+
| bin value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | result |
+------------+-----+----+----+----+---+---+---+---+--------+
| value 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 169 |
| value 2 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 57 |
| sum/result | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 226 |
+------------+-----+----+----+----+---+---+---+---+--------+
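The numbers from the question can be reproduced with ordinary integer addition. A minimal Python sketch follows; this particular example produces no carry out of the top bit, so no wrap-around step is needed here.

```python
a = 0b10101001   # 169
b = 0b00111001   # 57

total = a + b                    # plain addition, not XOR
checksum = total ^ 0xFF          # 1's complement: flip all 8 bits

print(format(total, "08b"))      # 11100010
print(format(checksum, "08b"))   # 00011101

# Receiver side: adding both messages and the checksum gives all ones.
print(format(a + b + checksum, "08b"))   # 11111111
```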
If by internet checksum you mean TCP Checksum there's a good explanation here and even some code.
When you're calculating the checksum remember that it's not just a function of the data but also of the "pseudo header" which puts the source IP, dest IP, protocol, and length of the TCP packet into the data to be checksummed. This ties the TCP meta data to some data in the IP header.
TCP/IP Illustrated Vol 1 is a good reference for this and explains it all in detail.
The calculation of the Internet checksum uses ones' complement arithmetic. Consider the data being checksummed as a sequence of 8-bit integers. First add them using ones' complement arithmetic, then take the ones' complement of the result.
NOTE: When adding numbers in ones' complement arithmetic, a carry out of the MSB needs to be added back to the result. Consider, e.g., adding the 4-bit ones' complements of 3 (0011) and 5 (0101).
3'->1100
5'->1010
0110 with a carry of 1
Adding the carry back in, we have 0111 (the bitwise complement of 1000, i.e. of 3 + 5 = 8).
The checksum is the 1's complement of the result obtained in the previous step. Hence we have 1000. If no carry exists, we just complement the result obtained in the summing stage.
The UDP checksum is created on the sending side by summing all the 16-bit words in the segment, with any overflow wrapped around; then the 1's complement is taken and the result is placed in the checksum field inside the segment.
At the receiver side, all words in the packet are added, including the checksum. If the result is 1111 1111 1111 1111 the segment is valid; otherwise the segment has an error.
Example:
0110 0110 0110 0000
0101 0101 0101 0101
1000 1111 0000 1100
--------------------
1 0100 1010 1100 0001 //there is an overflow, so we wrap it around, i.e. add it back to the sum
the sum = 0100 1010 1100 0010
now let's take the 1's complement
checksum = 1011 0101 0011 1101
at the receiver the sum is calculated and then added to the checksum
0100 1010 1100 0010
1011 0101 0011 1101
----------------------
1111 1111 1111 1111 //this should be the result; if it isn't, there is an error
Reference: Computer Networking: A Top-Down Approach (Kurose and Ross)
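The wrap-around step in this example can be sketched in Python as follows. The helper name is made up for illustration; it adds the words one at a time and folds any carry out of bit 15 back into the low 16 bits.

```python
def ones_complement_sum(words, bits=16):
    """Add words with end-around carry (ones' complement addition)."""
    mask = (1 << bits) - 1
    total = 0
    for word in words:
        total += word
        total = (total & mask) + (total >> bits)   # wrap any carry back in
    return total


words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]

total = ones_complement_sum(words)
checksum = total ^ 0xFFFF                          # 1's complement of the sum

print(format(total, "016b"))      # 0100101011000010
print(format(checksum, "016b"))   # 1011010100111101

# Receiver side: the words plus the checksum add up to all ones.
print(format(ones_complement_sum(words + [checksum]), "016b"))  # 1111111111111111
```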
Here's a complete example with a real header of an IPv4 packet.
In the following example, I use bc, printf and here strings to calculate the header checksum and verify it. Consequently, it should be easy to reproduce the results on Linux by copy-pasting the commands.
These are the twenty bytes of our example packet header:
45 00 00 34 5F 7C 40 00 40 06 [00 00] C0 A8 B2 14 C6 FC CE 19
The sender hasn't calculated the checksum yet. The two bytes in square brackets are where the checksum will go. The checksum's value is initially set to zero.
We can mentally split up this header as a sequence of ten 16-bit values: 0x4500, 0x0034, 0x5F7C, etc.
Let's see how the sender of the packet calculates the header checksum:
Add all 16-bit values to get 0x42C87: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + 0000 + C0A8 + B214 + C6FC + CE19'
The leading digit 4 is the carry count. We add it to the rest of the number to get 0x2C8B: bc <<< 'obase=16;ibase=16;2C87 + 4'
Invert¹ 0x2C8B to get the checksum: 0xD374
Finally, insert the checksum into the header:
45 00 00 34 5F 7C 40 00 40 06 [D3 74] C0 A8 B2 14 C6 FC CE 19
Now the header is ready to be sent.
The recipient of the IPv4 packet then creates the checksum of the received header in the same way:
Add all 16-bit values to get 0x4FFFB: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + D374 + C0A8 + B214 + C6FC + CE19'
Again, there's a carry count so we add that to the rest to get 0xFFFF: bc <<< 'obase=16;ibase=16;FFFB + 4'
If the result is 0xFFFF, as in our case, the IPv4 header is intact.
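The same sender and receiver steps can also be written as a short Python sketch operating on the raw header bytes. The function name is made up for illustration, and padding an odd-length input with a zero byte is an extra convenience that this 20-byte header does not need. Note that the function returns the inverted sum, so an intact header yields 0x0000 here, which corresponds to the folded sum 0xFFFF in the steps above.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                   # invert, keep 16 bits


# Header from the example, with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex(
    "45 00 00 34 5F 7C 40 00 40 06 00 00 C0 A8 B2 14 C6 FC CE 19"
)

checksum = internet_checksum(header)
print(f"{checksum:04X}")                     # D374

# Receiver side: recompute over the header with the checksum filled in.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(f"{internet_checksum(filled):04X}")    # 0000
```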
See the Wikipedia entry for more information.
¹Inverting the hexadecimal number means converting it to binary, flipping the bits, and converting it to hexadecimal again. You can do this online or with Bash: hex_nr=0x2C8B; hex_len=$(( ${#hex_nr} - 2 )); inverted=$(printf '%X' "$(( ~ hex_nr ))"); trunc_inverted=${inverted: -hex_len}; echo $trunc_inverted
