SHA-256 padding algorithm when the padded message is more than 512 bits

I would be very grateful if anyone could answer my question.
My question is: in SHA-256, if the message is exactly 512 bits long, how do I do the padding? Because in SHA-256 l + 1 + k = 448 mod 512, the padding will overflow the 512-bit block. What I mean is that for the hash computation we need to divide each 512-bit block into 16 x 32-bit words, and the padded message is the l message bits, followed by a '1' bit, then k '0' bits, then the binary representation of l (the length of the message). My point is that, starting from 512 bits, after the '1' bit, the k '0' bits and the bit representation of l, we will get more than 512 bits, so how are we going to divide that into 16 x 32-bit words?

You don't need to pad your data to be a certain length in order to get a sha256 checksum value; the algorithm will handle a few bytes as well as megabytes of data:
Linux> echo 'hey' | sha256sum
4e955fea0268518cbaa500409dfbec88f0ecebad28d84ecbe250baed97dba889 -
Linux> echo '' | sha256sum
01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b -
Linux> ls -oh L0_2018_08_28_Disk1_8mtbno4o_3_1.bkp
-rw-r-----. 1 nfsnobody 348M Aug 28 23:48 L0_2018_08_28_Disk1_8mtbno4o_3_1.bkp
Linux> sha256sum L0_2018_08_28_Disk1_8mtbno4o_3_1.bkp
34438a1bc45f613b7be8797b1139aaa66a60d73844f2d7554184b17c621b4576 L0_2018_08_28_Disk1_8mtbno4o_3_1.bkp
If you want to learn about the actual algorithm, one open-source implementation says
Algorithm specification can be found here:
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
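To address the original question directly: the padded message simply grows to the next multiple of 512 bits and is processed as more than one block. For l = 512, the rule l + 1 + k = 448 mod 512 gives k = 447, so the padded length is 512 + 1 + 447 + 64 = 1024 bits, i.e. two 512-bit blocks, each of which is split into 16 x 32-bit words as usual. Here is a minimal C sketch of that length calculation (the helper name sha256_padded_bits is mine, not from any particular library):
#include <stdio.h>
#include <stdint.h>

/* Padded length in bits for an l-bit message: the l data bits, one '1' bit,
 * k zero bits chosen so that l + 1 + k == 448 (mod 512), then the 64-bit length. */
static uint64_t sha256_padded_bits(uint64_t l)
{
    uint64_t k = (448 + 512 - (l + 1) % 512) % 512;   /* zero bits needed */
    return l + 1 + k + 64;
}

int main(void)
{
    for (uint64_t l = 0; l <= 1024; l += 256) {
        uint64_t padded = sha256_padded_bits(l);
        printf("message = %4llu bits -> padded = %4llu bits (%llu block(s))\n",
               (unsigned long long)l,
               (unsigned long long)padded,
               (unsigned long long)(padded / 512));
    }
    return 0;
}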

Related

Little Endian vs. Big Endian architectures

I have a question, a kind of disagreement with a university professor, about endianness. I didn't find any other way to resolve it and find the right answer, so I'm asking and opening a discussion with the Stack Overflow community.
Let's say that we have the number 0x11FF1 defined as an integer; in C++, for example, that would be int num = 0x11FF1. I say that the number will be represented in memory on a little-endian machine as:
addr[0] is f1 addr[1] is 1f addr[2] is 01 addr[3] is 00
in binary : 1111 0001 0001 1111 0000 0001 0000 0000
since the compiler treats 0x11ff1 as 0x00011ff1 and considers 00 to be the 1st (most significant) byte of the value, 01 the 2nd byte, and so on. For big endian I believe it will look like:
addr[0] is 00 addr[1] is 01 addr[2] is 1f addr[3] is f1
in binary : 0000 0000 0000 0001 0001 1111 1111 0001
But he has another opinion. He gave two images, labelled "Little Endian" and "Big Endian", showing a different representation (the images are not reproduced here).
Honestly, I don't see anything logical in his representation, so I hope the developers here can resolve this disagreement. Thanks in advance.
Your hex and binary numbers are correct.
Your (professor's?) French image for little-endian makes no sense at all: none of the 3 representations is consistent with either of the other 2.
73713 is 0x11ff1 in hex, so there aren't any 0xFF bytes (binary 11111111).
In 32-bit little-endian, the bytes are F1 1F 01 00 in order of increasing memory address.
You can get that by taking pairs of hex digits (bytes / octets) from the low end of the full hex value, then fill with zeros once you've consumed the value.
It looks like they maybe padded the wrong side of the hex value with 0s to zero-extend to 32 bits as 0x11ff1000, not 0x00011ff1. Note these are full hex values of the whole number, not an attempt to break it down into separate hex bytes in any order.
But the hex and binary don't match each other; their binary ends with an all-ones byte, so it has FF as the high byte, not the 3rd byte. I didn't check if that matches their hex in PDP (mixed) endian.
They broke up their hex column into 4 byte-sized groups, which would seem to indicate that it's showing bytes in memory order. But that column is the same between their big- and little-endian images, so apparently that's not what they're doing, and they really did just extend it to 32 bits by left shifting (padding with low instead of high zero).
Also, the binary fields in the big- vs. little-endian images aren't the reverse of each other. To flip from big to little endian, you reverse the order of the bytes within the integer, keeping each byte value the same (like x86 bswap). Their 11111111 (FF) byte is 2nd in their big-endian version, but last in little-endian.
TL:DR: unfortunately, nothing about those images makes any sense that I can see.
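A quick way to settle this empirically is to inspect the bytes of the integer in memory. A minimal C sketch (on a little-endian machine such as x86 it prints f1 1f 01 00, matching the layout in the question):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t num = 0x11FF1;          /* zero-extended to 32 bits: 0x00011FF1 */
    unsigned char bytes[sizeof num];

    memcpy(bytes, &num, sizeof num); /* copy out the in-memory representation */
    for (size_t i = 0; i < sizeof num; i++)
        printf("addr[%zu] is %02x\n", i, bytes[i]);
    return 0;
}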

Assembly Memory Diagram Verification

Given this data,
.data
Alpha WORD 0022h, 45h
Beta BYTE 56h
Gamma DWORD 4567h
Delta BYTE 23h
Assuming that the data segment begins at 0x00404000, can anyone verify how correct this table is?
Address Variable Data
00404000 Alpha 22
00404001 Alpha + 1 00
00404002 Alpha + 2 45
00404003 Beta 56
00404004 Gamma 67
00404005 Gamma+1 45
00404006 Delta 23
Impossible to answer without knowing the addressing of the processor in question (and how the assembler views the addressing). Nonetheless, you'd need a pretty unusual system for it to be correct.
Alpha is defined as having the type "word". You're showing the first word as allocating two bytes (fairly reasonable), but the second only one byte. This is much less reasonable--a word might be one byte or it might be two, but its size is normally going to at least be consistent.
For the moment, let's assume a word is two bytes, and a dword is four bytes. In that case, I'd expect something more like:
Alpha 22h
alpha+1 00h
alpha+2 45h
Alpha+3 00h
Beta 56h
Gamma 67h
Gamma+1 45h
Gamma+2 00h
Gamma+3 00h
Delta 23h
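For reference, here is a worked address listing under the same assumptions (data segment starting at 0x00404000, word = 2 bytes, dword = 4 bytes, and no alignment padding inserted by the assembler):
Address Variable Data
00404000 Alpha 22
00404001 Alpha+1 00
00404002 Alpha+2 45
00404003 Alpha+3 00
00404004 Beta 56
00404005 Gamma 67
00404006 Gamma+1 45
00404007 Gamma+2 00
00404008 Gamma+3 00
00404009 Delta 23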

How come the largest value that can be stored in a byte is 127?

The decimal number 128 is 10000000 in binary. Isn't this 8 bits? How come a byte's highest value is 127 then? Thank you!!
In two's complement representation, you have to allow for negative numbers as well.
Eight bits will give you 256 distinct values, -128 thru 127 inclusive.
00000000 - 01111111 0 to 127
10000000 - 11111111 -128 to -1 (or 128 to 255 for unsigned).
Note that there are other encoding schemes, such as ones' complement or sign/magnitude, which have slightly different properties. Both those have a positive and negative zero so the range is -127..127.
Counting is zero based - it starts at 0.
Hence 0 to 127 is 128 items and the maximum value is 127.
Note that this assumes that you are talking about signed 8 bit bytes/integers.
For unsigned 8 bit bytes/integers the maximum value that can be represented is 255 (0-255 is 256 items).
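A small C sketch that prints those ranges from <limits.h> (assuming CHAR_BIT is 8, as on virtually all modern platforms):
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Signed 8-bit range vs. unsigned 8-bit range */
    printf("signed char  : %d to %d\n", SCHAR_MIN, SCHAR_MAX);   /* -128 to 127 */
    printf("unsigned char: 0 to %u\n", (unsigned)UCHAR_MAX);     /* 0 to 255    */

    /* 128 needs the 8th bit; in a signed char that bit is the sign bit,
     * so converting 128 to signed char is implementation-defined (typically -128). */
    signed char sc = (signed char)128;
    printf("(signed char)128 = %d\n", sc);
    return 0;
}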

Assembly Language: Memory Bytes and Offsets

I am confused as to how memory is stored when declaring variables in assembly language. I have this block of sample code:
val1 db 1,2
val2 dw 1,2
val3 db '12'
From my study guide, it says that the total number of bytes required in memory to store the data declared by these three data definitions is 8 bytes (in decimal). How do I go about calculating this?
It also says that the offset into the data segment of val3 is 6 bytes and the hex byte at offset 5 is 00. I'm lost as to how to calculate these bytes and offsets.
Also, reading val1 into memory will produce 0102 but reading val3 into memory produces 3132. Are the apostrophes represented by the 3, or where does it come from? How would val2 be read into memory?
You have two bytes, 0x01 and 0x02. That's two bytes so far.
Then you have two words, 0x0001 and 0x0002. That's another four bytes, making six to date.
Then you have two more bytes making up the characters of the string '12', which are 0x31 and 0x32 in ASCII (a). That's another two bytes bringing the grand total to eight.
In little-endian format (which is what you're looking at here based on the memory values your question states), they're stored as:
offset value
------ -----
0 0x01
1 0x02
2 0x01
3 0x00
4 0x02
5 0x00
6 0x31
7 0x32
(a) The character set you're using in this case is the ASCII one (you can follow that link for a table describing all the characters in that set).
The byte values 0x30 thru 0x39 are the digits 0 thru 9, just as the bytes 0x41 thru 0x5A represent the upper-case alpha characters. The pseudo-op:
db '12'
is saying to insert the bytes for the characters '1' and '2'.
Similarly:
db 'Pax is a really cool guy',0
would give you the hex-dump representation:
addr +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F +0123456789ABCDEF
0000 50 61 78 20 69 73 20 61 20 72 65 61 6C 6C 79 20 Pax is a really
0010 63 6F 6F 6C 20 67 75 79 00 cool guy.
val1 is two consecutive bytes, 1 and 2. db means "define byte". val2 is two consecutive words, i.e. 4 bytes, again 1 and 2; in memory they will be 1, 0, 2, 0, assuming you're on a little-endian machine. val3 is a two-byte string. 0x31 and 0x32 are 49 and 50 in decimal; they are the ASCII codes for the characters "1" and "2".
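If it helps to see the layout concretely, here is a small C sketch that builds the same eight bytes by hand and dumps them; on a little-endian host it prints exactly the offset/value table above (the array names simply mirror the assembly labels):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    /* Rough C equivalent of the assembly data definitions:
     * val1 db 1,2   -> two bytes
     * val2 dw 1,2   -> two 16-bit words (byte order depends on the host)
     * val3 db '12'  -> two ASCII bytes */
    unsigned char val1[2] = { 1, 2 };
    uint16_t      val2[2] = { 1, 2 };
    char          val3[2] = { '1', '2' };

    unsigned char data[8];
    memcpy(data + 0, val1, 2);
    memcpy(data + 2, val2, 4);
    memcpy(data + 6, val3, 2);

    for (int i = 0; i < 8; i++)
        printf("offset %d: 0x%02X\n", i, data[i]);
    return 0;
}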

How to calculate Internet checksum?

I have a question regarding how the Internet checksum is calculated. I couldn't find any good explanation in the book, so I'm asking it here.
Have a look at the following example.
The following two messages are sent: 10101001 and 00111001. The checksum is calculated with 1's complement. So far I understand. But how is the sum calculated? At first I thought it might be XOR, but that doesn't seem to be the case.
10101001
00111001
--------
Sum 11100010
Checksum: 00011101
And then they check whether the message arrived OK. Once again, how is the sum calculated?
10101001
00111001
00011101
--------
Sum 11111111
Complement 00000000 means that the pattern is O.K.
It uses addition, hence the name "sum". 10101001 + 00111001 = 11100010.
For example:
+------------+-----+----+----+----+---+---+---+---+--------+
| bin value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | result |
+------------+-----+----+----+----+---+---+---+---+--------+
| value 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 169 |
| value 2 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 57 |
| sum/result | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 226 |
+------------+-----+----+----+----+---+---+---+---+--------+
If by Internet checksum you mean the TCP checksum, there's a good explanation here and even some code.
When you're calculating the checksum remember that it's not just a function of the data but also of the "pseudo header" which puts the source IP, dest IP, protocol, and length of the TCP packet into the data to be checksummed. This ties the TCP meta data to some data in the IP header.
TCP/IP Illustrated Vol 1 is a good reference for this and explains it all in detail.
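For reference, here is a sketch of the IPv4 pseudo-header fields that are prepended to the TCP segment for checksum purposes only (the struct and field names are mine; the field layout follows RFC 793):
#include <stdint.h>

/* IPv4 pseudo-header used only when computing the TCP checksum;
 * it is never actually transmitted on the wire. */
struct tcp_pseudo_header {
    uint32_t source_address;       /* from the IP header */
    uint32_t destination_address;  /* from the IP header */
    uint8_t  zero;                 /* always 0 */
    uint8_t  protocol;             /* 6 for TCP */
    uint16_t tcp_length;           /* TCP header + payload length, in bytes */
};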
The calculation of the Internet checksum uses ones' complement arithmetic. Treat the data being checksummed as a sequence of integers (16-bit words in the real protocol; 4-bit values in the toy example below). First add them using ones' complement arithmetic, then take the ones' complement of the result.
NOTE: when adding numbers in ones' complement arithmetic, a carry out of the MSB needs to be added back into the result (end-around carry). Consider, for example, adding 1100 and 1010:
1100
1010
gives 0110 with a carry of 1; adding the carry back in, we have 0111.
The checksum is the 1's complement of the result obtained in the previous step, hence 1000. If no carry exists, we just complement the result obtained in the summing stage.
The UDP checksum is created on the sending side by summing all the 16-bit words in the segment, with any overflow being wrapped around; then the 1's complement is taken and the result is placed in the checksum field inside the segment.
At the receiver side, all the words inside the packet (including the checksum) are added; if the result is 1111 1111 1111 1111 then the segment is valid, else the segment has an error.
Example:
0110 0110 0110 0000
0101 0101 0101 0101
1000 1111 0000 1100
--------------------
1 0100 1010 1100 0001 //there is an overflow so we wrap it around, i.e. add it back into the sum
the sum = 0100 1010 1100 0010
now let's take the 1's complement
checksum = 1011 0101 0011 1101
at the receiver the sum is calculated and then added to the checksum
0100 1010 1100 0010
1011 0101 0011 1101
----------------------
1111 1111 1111 1111 //clearly this should be the answer, if it isn't then there is an error
Reference: Computer Networking: A Top-Down Approach (Kurose & Ross).
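Here is a minimal C sketch of that 16-bit ones' complement checksum, fed with the three example words above (the function name checksum16 is mine, not a standard API):
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Ones' complement sum of 16-bit words, with carries wrapped back in,
 * then complemented to produce the checksum. */
static uint16_t checksum16(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += words[i];
    while (sum >> 16)                      /* wrap any overflow around */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void)
{
    uint16_t words[] = { 0x6660, 0x5555, 0x8F0C };   /* the example above */
    uint16_t csum = checksum16(words, 3);
    printf("checksum = 0x%04X\n", csum);             /* 0xB53D = 1011 0101 0011 1101 */

    /* Receiver side: summing data + checksum gives 1111 1111 1111 1111,
     * so checksum16() over everything returns 0 when the segment is intact. */
    uint16_t all[] = { 0x6660, 0x5555, 0x8F0C, csum };
    printf("verify   = 0x%04X (0 means OK)\n", checksum16(all, 4));
    return 0;
}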
Here's a complete example with a real header of an IPv4 packet.
In the following example, I use bc, printf and here strings to calculate the header checksum and verify it. Consequently, it should be easy to reproduce the results on Linux by copy-pasting the commands.
These are the twenty bytes of our example packet header:
45 00 00 34 5F 7C 40 00 40 06 [00 00] C0 A8 B2 14 C6 FC CE 19
The sender hasn't calculated the checksum yet. The two bytes in square brackets are where the checksum will go. The checksum's value is initially set to zero.
We can mentally split up this header as a sequence of ten 16-bit values: 0x4500, 0x0034, 0x5F7C, etc.
Let's see how the sender of the packet calculates the header checksum:
Add all 16-bit values to get 0x42C87: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + 0000 + C0A8 + B214 + C6FC + CE19'
The leading digit 4 is the carry count; we add this to the rest of the number to get 0x2C8B: bc <<< 'obase=16;ibase=16;2C87 + 4'
Invert¹ 0x2C8B to get the checksum: 0xD374
Finally, insert the checksum into the header:
45 00 00 34 5F 7C 40 00 40 06 [D3 74] C0 A8 B2 14 C6 FC CE 19
Now the header is ready to be sent.
The recipient of the IPv4 packet then creates the checksum of the received header in the same way:
Add all 16-bit values to get 0x4FFFB: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + D374 + C0A8 + B214 + C6FC + CE19'
Again, there's a carry count so we add that to the rest to get 0xFFFF: bc <<< 'obase=16;ibase=16;FFFB + 4'
If the checksum is 0xFFFF, as in our case, the IPv4 header is intact.
See the Wikipedia entry for more information.
¹Inverting the hexadecimal number means converting it to binary, flipping the bits, and converting it to hexadecimal again. You can do this online or with Bash: hex_nr=0x2C8B; hex_len=$(( ${#hex_nr} - 2 )); inverted=$(printf '%X' "$(( ~ hex_nr ))"); trunc_inverted=${inverted: -hex_len}; echo $trunc_inverted

Resources