I am looking for a checksum algorithm where for a large block of data the checksum is equal to the sum of checksums from all the smaller component blocks. Most of what I have found is from RFCs 1624/1141 which do provide this functionality. Does anyone have any experience with these checksumming techniques or a similar one?
If it's just a matter of quickly combining the checksums of the smaller blocks to get to the checksums of the larger message (not necessarily by a plain summation) you can do this with a CRC-type (or similar) algorithm.
The CRC-32 algorithm is as simple as this:
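#include <stdint.h>

/* Feed one message bit into the 32-bit CRC state. */
uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1)
        state = (state << 1) ^ 0x04C11DB7;  /* shift, then reduce by the generator */
    else
        state = (state << 1);
    return state;
}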
Mathematically, the state represents a polynomial over the field GF2 that is always reduced modulo the generator polynomial. Given a new bit b the old state is transformed into the new state like this
state --> (state * x^1 + b * x^32) mod G
where G is the generator polynomial and addition is done in GF2 (xor). This checksum is linear in the sense that you can write the message M as a sum (xor) of messages A,B,C,... like this
10110010 00000000 00000000 = A = a 00000000 00000000
00000000 10010001 00000000 = B = 00000000 b 00000000
00000000 00000000 11000101 = C = 00000000 00000000 c
-------------------------------------------------------------
= 10110010 10010001 11000101 = M = a b c
with the following properties
M = A + B + C
checksum(M) = checksum(A) + checksum(B) + checksum(C)
Again, I mean the + in GF2 which you can implement with a binary XOR.
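The linearity claim can be checked numerically. Below is a minimal Python sketch, assuming the bitwise update() above is used with an initial state of 0 and no final XOR. The helper name checksum and the most-significant-bit-first order are assumptions made for illustration; standard CRC-32 variants such as the one in zlib add an initial value and a final XOR, so they are not strictly linear in this sense.

```python
POLY = 0x04C11DB7   # generator polynomial without the x^32 term
MASK = 0xFFFFFFFF   # keep the state at 32 bits


def update(state, bit):
    """Feed one message bit into the state (port of the C function above)."""
    if ((state >> 31) ^ bit) & 1:
        return ((state << 1) ^ POLY) & MASK
    return (state << 1) & MASK


def checksum(data, state=0):
    """Hypothetical helper: run update() over every bit, MSB first."""
    for byte in data:
        for i in range(7, -1, -1):
            state = update(state, (byte >> i) & 1)
    return state


a, b, c = 0b10110010, 0b10010001, 0b11000101
A = bytes([a, 0, 0])
B = bytes([0, b, 0])
C = bytes([0, 0, c])
M = bytes([a, b, c])

# "+" in GF(2) is XOR, so the three partial checksums are XORed together.
linear = checksum(M) == checksum(A) ^ checksum(B) ^ checksum(C)
print(linear)
```

With a zero initial state, leading zero bytes leave the state at 0, which is why checksum(B) only depends on b and the number of trailing zeros.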
Finally, it's possible to compute checksum(B) based on checksum(b) and the position of the subblock b relative to B. The simple part is leading zeros. Leading zeros don't affect the checksum at all. So checksum(0000xxxx) is the same as checksum(xxxx). If you want to compute the checksum of a zero-padded (to the right -> trailing zeros) message given the checksum of the non-padded message it is a bit more complicated. But not that complicated:
zero_pad(old_check_sum, number_of_zeros)
:= ( old_check_sum * x^{number_of_zeros} ) mod G
= ( old_check_sum * (x^{number_of_zeros} mod G) ) mod G
So, getting the checksum of a zero-padded message is just a matter of multiplying the "checksum polynomial" of the non-padded message with some other polynomial (x^{number_of_zeros} mod G) that only depends on the number of zeros you want to add. You could precompute this in a table or use the square-and-multiply algorithm to quickly compute this power.
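As a rough illustration of the zero-padding rule, here is a Python sketch that multiplies polynomials over GF(2) modulo G and uses square-and-multiply for x^{number_of_zeros}. The names mulmod and x_pow_mod are made up for this example, and the checksum helper again assumes an initial state of 0 with no final XOR.

```python
G = 0x104C11DB7     # full generator polynomial, including the x^32 term
MASK = 0xFFFFFFFF


def update(state, bit):
    """One step of the bitwise CRC from the answer above."""
    if ((state >> 31) ^ bit) & 1:
        return ((state << 1) ^ (G & MASK)) & MASK
    return (state << 1) & MASK


def checksum(data, state=0):
    """Hypothetical helper: feed all bits of data, MSB first."""
    for byte in data:
        for i in range(7, -1, -1):
            state = update(state, (byte >> i) & 1)
    return state


def mulmod(p, q):
    """Multiply two polynomials over GF(2) and reduce the product modulo G."""
    result = 0
    while q:
        if q & 1:
            result ^= p
        q >>= 1
        p <<= 1
        if p >> 32:          # degree reached 32: subtract (XOR) G
            p ^= G
    return result


def x_pow_mod(n):
    """Compute x^n mod G by square-and-multiply."""
    result, base = 1, 2      # the polynomials "1" and "x"
    while n:
        if n & 1:
            result = mulmod(result, base)
        base = mulmod(base, base)
        n >>= 1
    return result


def zero_pad(old_check_sum, number_of_zeros):
    """Checksum of the message followed by number_of_zeros zero bits."""
    return mulmod(old_check_sum, x_pow_mod(number_of_zeros))


a, b, c = bytes([0b10110010]), bytes([0b10010001]), bytes([0b11000101])
# Combine the checksums of the sub-blocks a, b, c into the checksum of "a b c".
combined = zero_pad(checksum(a), 16) ^ zero_pad(checksum(b), 8) ^ checksum(c)
print(combined == checksum(a + b + c))
```

The padding cost grows with the logarithm of the number of zeros, so combining the checksums of large blocks stays cheap.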
Suggested reading: Painless Guide to CRC Error Detection Algorithms
I have only used Adler/Fletcher checksums which work as you describe.
There is a nice comparison of crypto++ hash/checksum implementations here.
To answer Amigable Clark Kent's bounty question, for file identity purposes you probably want a cryptographic hash function, which tries to guarantee that any two given files have an extremely low probability of producing the same value, as opposed to a checksum which is generally used for error detection only and may provide the same value for two very different files.
Many cryptographic hash functions, such as MD5 and SHA-1, use the Merkle–Damgård construction, in which there is a computation to compress a block of data into a fixed size, and then combine that with a fixed size value from the previous block (or an initialization vector for the first block). Thus, they are able to work in a streaming mode, incrementally computing as they go along.
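The streaming behaviour can be seen with Python's standard hashlib module. This is only a small sketch; the 4096-byte chunk size and the sample data are arbitrary choices for the example.

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog" * 1000

# Hash everything in one call.
one_shot = hashlib.sha1(data).hexdigest()

# Hash the same bytes incrementally, one chunk at a time.
h = hashlib.sha1()
for i in range(0, len(data), 4096):
    h.update(data[i:i + 4096])
streamed = h.hexdigest()

print(one_shot == streamed)
```

Note that this is incremental hashing of one stream, not a way to combine independently computed hashes of separate blocks.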
I'm looking for an SSE Bitwise OR between components of same vector. (Editor's note: this is potentially an X-Y problem, see below for the real comparison logic.)
I am porting some SIMD logic from SPU intrinsics. It has an instruction
spu_orx(a)
Which according to the docs
spu_orx: OR word across d = spu_orx(a) The four word elements of
vector a are logically Ored. The result is returned in word element 0
of vector d. All other elements (1,2,3) of d are assigned a value of
zero.
How can I do that with SSE2 through SSE4 using the minimum number of instructions? _mm_or_ps is what I have so far.
UPDATE:
Here is the scenario from SPU based code:
qword res = spu_orx(spu_or(spu_fcgt(x, y), spu_fcgt(z, w)))
So it first ORs two 'greater' comparisons, then ORs its result.
Later couples of those results are ANDed to get final comparison value.
This is effectively doing (A||B||C||D||E||F||G||H) && (I||J||K||L||M||N||O||P) && ... where A..D are the 4x 32-bit elements of the fcgt(x,y) and so on.
Obviously vertical _mm_or_ps of _mm_cmp_ps results is a good way to reduce down to 1 vector, but then what? Shuffle + OR, or something else?
UPDATE 1
Regarding "but then what?"
I perform
qword res = spu_orx(spu_or(spu_fcgt(x, y), spu_fcgt(z, w)))
On SPU it goes like this:
qword aRes = si_and(res, res1);
qword aRes1 = si_and(aRes, res2);
qword aRes2 = si_and(aRes1 , res3);
return si_to_uint(aRes2 );
several times on different inputs, then AND those all into a single result, which is finally cast to integer 0 or 1 (false/true test).
SSE4.1 PTEST
bool any_nonzero = !_mm_testz_si128(v,v);
That would be a good way to horizontal OR + booleanize a vector into a 0/1 integer. It will compile to multiple instructions, and ptest same,same is 2 uops on its own. But once you have the result as a scalar integer, scalar AND is even cheaper than any vector instruction, and you can branch on the result directly because it sets integer flags.
#include <immintrin.h>
bool any_nonzero(__m128i v) {
    return !_mm_testz_si128(v,v);
}
On Godbolt with gcc9.1 -O3 -march=nehalem:
any_nonzero(long long __vector(2)):
ptest xmm0, xmm0 # 2 uops
setne al # 1 uop with false dep on old value of RAX
ret
This is only 3 uops on Intel for a horizontal OR into a single bit in an integer register. AMD Ryzen ptest is only 1 uop so it's even better.
The only risk here is if gcc or clang creates false dependencies by not xor-zeroing eax before doing a setcc into AL. Usually gcc is pretty fanatical about spending extra uops to break false dependencies so I don't know why it doesn't here. (I did check with -march=skylake and -mtune=generic in case it was relying on Nehalem partial-register renaming for -march=nehalem. Even -march=znver1 didn't get it to xor-zero EAX before the ptest.)
It would be nice if we could avoid the _mm_or_ps and have PTEST do all the work. But even if we consider inverting the comparisons, the vertical-AND / horizontal-OR behaviour doesn't let us check something about all 8 elements of 2 vectors, or about any of those 8 elements.
e.g. Can PTEST be used to test if two registers are both zero or some other condition?
// NOT USEFUL
// 1 if all the vertical pairs AND to zero.
// but 0 if even one vertical AND result is non-zero
_mm_testz_si128( _mm_castps_si128(_mm_cmpngt_ps(x,y)),
_mm_castps_si128(_mm_cmpngt_ps(z,w)));
I mention this only to rule it out and save you the trouble of considering this optimization idea. (@chtz suggested it in comments. Inverting the comparison is a good idea that can be useful for other ways of doing things.)
Without SSE4.1 / delaying the horizontal OR
We might be able to delay horizontal ORing / booleanizing until after combining some results from multiple vectors. This makes combining more expensive (imul or something), but saves 2 uops in the vector -> integer stage vs. PTEST.
x86 has cheap vector mask->integer bitmap with _mm_movemask_ps. Especially if you ultimately want to branch on the result, this might be a good idea. (But x86 doesn't have a || instruction that booleanizes its inputs either so you can't just & the movemask results).
One thing you can do is integer multiply movemask results: x * y is non-zero iff both inputs are non-zero. Unlike x & y, which can be false for 0b0101 & 0b1010, for example. (Our inputs are 4-bit movemask results and unsigned is 32-bit, so we have some room before we overflow.) AMD Bulldozer family has an integer multiply that isn't fully pipelined, so this could be a bottleneck on old AMD CPUs. Using just 32-bit integers is also good for some low-power CPUs with slow 64-bit multiply.
This might be good if throughput is more of a bottleneck than latency, although movmskps can only run on one port.
I'm not sure if there are any cheaper integer operations that let us recover the logical-AND result later. Adding doesn't work; the result is non-zero even if only one of the inputs was non-zero. Concatenating the bits together (shift+or) is also of course like an OR if we eventually just test for any non-zero bit. We can't just bitwise AND because 2 & 1 == 0, unlike 2 && 1.
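The claims about which scalar operations preserve the logical AND can be checked exhaustively for 4-bit movemask values. This Python sketch only models the integer arithmetic, not the SIMD instructions, and the helper names are made up for the example.

```python
from itertools import product

MASKS = range(16)  # every possible 4-bit movemask result


def logical_and(x, y):
    """What we actually want: both masks have at least one bit set."""
    return x != 0 and y != 0


def matches(op):
    """True if (op(x, y) != 0) agrees with logical_and for all 4-bit inputs."""
    return all((op(x, y) != 0) == logical_and(x, y)
               for x, y in product(MASKS, repeat=2))


multiply_ok = matches(lambda x, y: x * y)
bitand_ok = matches(lambda x, y: x & y)         # fails, e.g. 0b0101 & 0b1010 == 0
add_ok = matches(lambda x, y: x + y)            # fails, e.g. 1 + 0 != 0
concat_ok = matches(lambda x, y: (x << 4) | y)  # fails for the same reason as add

# Room before overflow: eight 4-bit masks multiplied together still fit in 32 bits.
fits_in_32_bits = 15 ** 8 < 2 ** 32

print(multiply_ok, bitand_ok, add_ok, concat_ok, fits_in_32_bits)
```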
Keeping it in the vector domain
Horizontal OR of 4 elements takes multiple steps.
The obvious way is _mm_movehl_ps + OR, then another shuffle+OR. (See Fastest way to do horizontal float vector sum on x86 but replace _mm_add_ps with _mm_or_ps)
But since we don't actually need an exact bitwise-OR when our inputs are compare results, we just care if any element is non-zero. We can and should think of the vectors as integer, and look at integer instructions like 64-bit element ==. One 64-bit element covers/aliases two 32-bit elements.
__m128i cmp = _mm_castps_si128(cmpps_result); // reinterpret: zero instructions
// SSE4.1 pcmpeqq 64-bit integer elements
__m128i cmp64 = _mm_cmpeq_epi64(cmp, _mm_setzero_si128()); // -1 if both elements were zero, otherwise 0
__m128i swap = _mm_shuffle_epi32(cmp64, _MM_SHUFFLE(1,0, 3,2)); // copy and swap, no movdqa instruction needed even without AVX
__m128i bothzero = _mm_and_si128(cmp64, swap); // both halves have the full result
After this logical inversion, ORing together multiple bothzero results will give you the AND of multiple conditions you're looking for.
Alternatively, SSE4.1 _mm_minpos_epu16(cmp64) (phminposuw) will tell us in 1 uop (but 5 cycle latency) if either qword is zero. It will place either 0 or 0xFFFF in the lowest word (16 bits) of the result in this case.
If we inverted the original compares, we could use phminposuw on that (without pcmpeqq) to check if any are zero. So basically a horizontal AND across the whole vector. (Assuming that it's elements of 0 / -1). I think that's a useful result for inverted inputs. (And saves us from using _mm_xor_si128 to flip the bits).
An alternative to pcmpeqq (_mm_cmpeq_epi64) would be SSE2 psadbw against a zeroed vector to get 0 or non-zero results in the bottom of each 64-bit element. It won't be a mask, though, it's 0xFF * 8. Still, it's always that or 0 so you can still AND it. And it doesn't invert.
I was reading through some code examples and came across a & on Oracle's website on their Bitwise and Bit Shift Operators page. In my opinion it didn't do a very good job of explaining the bitwise &. I understand that it performs an operation directly on the bits, but I am not sure what kind of operation, and I am wondering what that operation is. Here is a sample program I got off of Oracle's website: http://docs.oracle.com/javase/tutorial/displayCode.html?code=http://docs.oracle.com/javase/tutorial/java/nutsandbolts/examples/BitDemo.java
An integer is represented as a sequence of bits in memory. For interaction with humans, the computer has to display it as decimal digits, but all the calculations are carried out as binary. 123 in decimal is stored as 1111011 in memory.
The & operator is a bitwise "And". The result is the bits that are turned on in both numbers. 1001 & 1100 = 1000, since only the first bit is turned on in both.
The | operator is a bitwise "Or". The result is the bits that are turned on in either of the numbers. 1001 | 1100 = 1101, since only the second bit from the right is zero in both.
There are also the ^ and ~ operators, that are bitwise "Xor" and bitwise "Not", respectively. Finally there are the <<, >> and >>> shift operators.
Under the hood, 123 is stored as either 01111011 00000000 00000000 00000000 or 00000000 00000000 00000000 01111011 depending on the system. Using the bitwise operators, which representation is used does not matter, since both representations are treated as the logical number 00000000000000000000000001111011. Stripping away leading zeros leaves 1111011.
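The examples above can be reproduced in a few lines. This sketch uses Python rather than Java purely for brevity; the &, | and ^ operators behave the same way on these small non-negative values.

```python
x, y = 0b1001, 0b1100

and_result = format(x & y, "04b")   # bits set in both numbers
or_result = format(x | y, "04b")    # bits set in either number
xor_result = format(x ^ y, "04b")   # bits set in exactly one number

# 123 in binary, with leading zeros stripped
binary_123 = format(123, "b")

# The two possible byte layouts of the 32-bit value 123
little = (123).to_bytes(4, "little")
big = (123).to_bytes(4, "big")

# Both layouts decode to the same logical number, so bitwise results agree.
same_value = int.from_bytes(little, "little") == int.from_bytes(big, "big") == 123

print(and_result, or_result, xor_result, binary_123, same_value)
```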
It's a binary AND operator. It performs an AND operation that is a part of Boolean Logic which is commonly used on binary numbers in computing.
For example:
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
You can also perform this on multiple-bit numbers:
01 & 00 = 00
11 & 00 = 00
11 & 01 = 01
1111 & 0101 = 0101
11111111 & 01101101 = 01101101
...
If you look at two numbers represented in binary, a bitwise & creates a third number that has a 1 in each place that both numbers have a 1. (Everywhere else there are zeros).
Example:
0b10011011 &
0b10100010 =
0b10000010
Note that ones only appear in a place when both arguments have a one in that place.
Bitwise ands are useful when each bit of a number stores a specific piece of information.
You can also use them to delete/extract certain sections of numbers by using masks.
If you expand the two variables according to their hex code, these are:
bitmask : 0000 0000 0000 1111
val: 0010 0010 0010 0010
Now, a simple bitwise AND operation results in the number 0000 0000 0000 0010, which in decimal units is 2. I'm assuming you know about the fundamental Boolean operations and number systems, though.
It's a logical operation on the input values. To understand it, convert the values into binary form; where both bits in position n are 1, the result has a 1. At the end, convert back.
For example with those example values:
0x2222 = 10001000100010
0x000F = 00000000001111
result = 00000000000010 => 0x0002 or just 2
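Masking can be sketched as follows, using the same 0x2222 and 0x000F values. The shift-then-mask and clear-with-inverted-mask lines are common idioms added here for illustration and are not taken from the Oracle sample.

```python
val = 0x2222
bitmask = 0x000F

# Keep only the lowest four bits.
low_nibble = val & bitmask

# Extract the second-lowest group of four bits: shift it down, then mask.
second_nibble = (val >> 4) & 0xF

# Clear the lowest four bits by ANDing with the inverted mask.
cleared = val & ~bitmask & 0xFFFF

print(low_nibble, second_nibble, hex(cleared))
```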
Knowing how Bitwise AND works is not enough. Important part of learning is how we can apply what we have learned. Here is a use case for applying Bitwise AND.
Example:
ANDing any even number with 1 gives zero, because every even number has 0 as its last bit (reading left to right), and the only set bit in 1 is that last bit.
Suppose you were asked to write a function that takes a number as an argument and returns true for an even number, without using addition, multiplication, division, subtraction, or modulo, and without converting the number to a string.
This function is a perfect use case for Bitwise AND, as explained above. Show me the code? Here is the Java code.
/**
 * <p> Helper function </p>
 * @param number
 * @return 0 for even otherwise 1
 */
private int isEven(int number){
    return (number & 1);
}
& does a logical AND digit by digit, so for example 4 & 1 becomes
100 & 001 = 1x0,0x0,0x1 = 000 = 0
n & 1 is used for checking even numbers, since if a number is even the result of the operation will always be 0
import java.util.Scanner;

public class Test {
    public static void main(String[] args) {
        int rmv, rmv1;
        // this R.M.VIVEK complete bitwise program for java
        Scanner vivek = new Scanner(System.in);
        System.out.println("ENTER THE X value");
        rmv = vivek.nextInt();
        System.out.println("ENTER THE y value");
        rmv1 = vivek.nextInt();
        // printf (not println) is required for format specifiers
        System.out.printf("AND table based\t(&) rmv=%d, rmv1=%d: %d\n", rmv, rmv1, rmv & rmv1); // 11=1,10=0
        System.out.printf("OR table based\t(|) rmv=%d, rmv1=%d: %d\n", rmv, rmv1, rmv | rmv1);  // 10=1,00=0
        System.out.printf("XOR table based\t(^) rmv=%d, rmv1=%d: %d\n", rmv, rmv1, rmv ^ rmv1);
        System.out.printf("LEFT SHIFT based %d<<4=%d\n", rmv, rmv << 4);
        System.out.printf("RIGHT SHIFT based %d>>2=%d\n", rmv, rmv >> 2);
        for (int v = 1; v <= 10; v++)
            System.out.printf("LEFT SHIFT based (NEGATIVE VALUE) -1<<%d=%d\n", v, -1 << v);
    }
}
I have to partition a multiset into two sets whose sums are equal. For example, given the multiset:
1 3 5 1 3 -1 2 0
I would output the two sets:
1) 1 3 3
2) 5 -1 2 1 0
both of which sum to 7.
I need to do this using Z3 (smt2 input format) and "Linear Arithmetic Logic", which is defined as:
formula : formula /\ formula | (formula) | atom
atom : sum op sum
op : = | <= | <
sum : term | sum + term
term : identifier | constant | constant identifier
I honestly don't know where to begin with this and any advice at all would be appreciated.
Regards.
Here is an idea:
1- Create a 0-1 integer variable c_i for each element. The idea is that c_i is 0 if the element is in the first set, and 1 if it is in the second set. You can accomplish that by saying that 0 <= c_i and c_i <= 1.
2- The sum of the elements in the first set can be written as 1*(1 - c_1) + 3*(1 - c_2) + ...
3- The sum of the elements in the second set can be written as 1*c_1 + 3*c_2 + ...
4- Assert that the two sums are equal; any satisfying assignment of the c_i is a valid partition.
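To make the encoding concrete, here is a small Python sketch that enumerates the 0-1 variables by brute force and checks that both sums are equal. It is not an SMT encoding and does not call Z3; it only illustrates what a satisfying assignment of the c_i looks like. The function name find_partition is made up for this example.

```python
from itertools import product

elements = [1, 3, 5, 1, 3, -1, 2, 0]


def find_partition(values):
    """Try every 0-1 assignment c and return the first with equal sums."""
    for c in product((0, 1), repeat=len(values)):
        first_sum = sum(v * (1 - ci) for v, ci in zip(values, c))
        second_sum = sum(v * ci for v, ci in zip(values, c))
        if first_sum == second_sum:
            return c
    return None  # no equal-sum partition exists


c = find_partition(elements)
first_set = [v for v, ci in zip(elements, c) if ci == 0]
second_set = [v for v, ci in zip(elements, c) if ci == 1]
print(first_set, second_set)
```

A solver searches for the same assignment symbolically instead of enumerating all 2^n candidates.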
While SMT-Lib2 is quite expressive, it's not the easiest language to program in. Unless you have a hard requirement that you have to code directly in SMTLib2, I'd recommend looking into other languages that have higher-level bindings to SMT solvers. For instance, both Haskell and Scala have libraries that allow you to script SMT solvers at a much higher level. Here's how to solve your problem using Haskell, for instance: https://gist.github.com/1701881.
The idea is that these libraries allow you to code at a much higher level, and then perform the necessary translation and querying of the SMT solver for you behind the scenes. (If you really need to get your hands onto the SMTLib encoding of your problem, you can use these libraries as well, as they typically come with the necessary API to dump the SMTLib they generate before querying the solver.)
While these libraries may not offer everything that Z3 gives you access to via SMTLib, they are much easier to use for most practical problems of interest.
I need a base converter function for Lua. I need to convert from base 10 to base 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 36. How can I do this?
In the string to number direction, the function tonumber() takes an optional second argument that specifies the base to use, which may range from 2 to 36 with the obvious meaning for digits in bases greater than 10.
In the number to string direction, this can be done slightly more efficiently than Nikolaus's answer by something like this:
local floor,insert = math.floor, table.insert
function basen(n,b)
    n = floor(n)
    if not b or b == 10 then return tostring(n) end
    local digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    local t = {}
    local sign = ""
    if n < 0 then
        sign = "-"
        n = -n
    end
    repeat
        local d = (n % b) + 1
        n = floor(n / b)
        insert(t, 1, digits:sub(d,d))
    until n == 0
    return sign .. table.concat(t,"")
end
This creates fewer garbage strings to collect by using table.concat() instead of repeated calls to the string concatenation operator (..). Although it makes little practical difference for strings this small, this idiom should be learned because otherwise building a buffer in a loop with the concatenation operator will actually tend to O(n^2) performance while table.concat() has been designed to do substantially better.
There is an unanswered question as to whether it is more efficient to push the digits on a stack in the table t with calls to table.insert(t,1,digit), or to append them to the end with t[#t+1]=digit, followed by a call to string.reverse() to put the digits in the right order. I'll leave the benchmarking to the student. Note that although the code I pasted here does run and appears to get correct answers, there may be other opportunities to tune it further.
For example, the common case of base 10 is culled off and handled with the built in tostring() function. But similar culls can be done for bases 8 and 16 which have conversion specifiers for string.format() ("%o" and "%x", respectively).
Also, neither Nikolaus's solution nor mine handle non-integers particularly well. I emphasize that here by forcing the value n to an integer with math.floor() at the beginning.
Correctly converting a general floating point value to any base (even base 10) is fraught with subtleties, which I leave as an exercise to the reader.
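For comparison, the append-then-reverse variant mentioned above can be sketched in Python. This is an illustrative port rather than the Lua answer's code: the digits are appended to a list and reversed once at the end, and the range check on the base is an addition for the example.

```python
import math

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def basen(n, b=10):
    """Convert an integer to a string in base b (2..36)."""
    if not 2 <= b <= 36:
        raise ValueError("base must be between 2 and 36")
    n = math.floor(n)            # force an integer, as the Lua version does
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while True:
        n, d = divmod(n, b)
        out.append(DIGITS[d])    # least significant digit first
        if n == 0:
            break
    return sign + "".join(reversed(out))


print(basen(255, 16), basen(1234, 8), basen(-10, 2))
```

Python's built-in int(text, base) plays the role of Lua's tonumber(text, base) for the reverse direction, which makes round-trip checks easy.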
You can use a loop to convert an integer into a string containing the required base. For bases below 10 use the following code; if you need a base larger than that, you need to add a line that maps the result of x % base to a character (using an array, for example).
x = 1234
r = ""
base = 8
while x > 0 do
r = "" .. (x % base ) .. r
x = math.floor(x / base)
end
print( r );
Is there any input that SHA-1 will compute to a hex value of forty zeros, i.e. "0000000000000000000000000000000000000000"?
Yes, it's just incredibly unlikely. I.e. one in 2^160, or 0.00000000000000000000000000000000000000000000006842277657836021%.
Also, because SHA1 is cryptographically strong, it would also be computationally infeasible (at least with current computer technology -- all bets are off for emergent technologies such as quantum computing) to find out what data would result in an all-zero hash until it occurred in practice. If you really must use the "0" hash as a sentinel, be sure to include an appropriate assertion (that you did not just hash input data to your "zero" hash sentinel) that survives into production. It is a failure condition your code will permanently need to check for. WARNING: Your code will permanently be broken if it does.
Depending on your situation (if your logic can cope with handling the empty string as a special case in order to forbid it from input) you could use the SHA1 hash ('da39a3ee5e6b4b0d3255bfef95601890afd80709') of the empty string. Also possible is using the hash for any string not in your input domain such as sha1('a') if your input has numeric-only as an invariant. If the input is preprocessed to add any regular decoration then a hash of something without the decoration would work as well (eg: sha1('abc') if your inputs like 'foo' are decorated with quotes to something like '"foo"').
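A minimal Python sketch of both ideas, using the standard hashlib module. The names ZERO_SENTINEL and checked_sha1 are hypothetical, and the check uses an explicit exception rather than an assert statement because Python strips asserts when run with -O.

```python
import hashlib

# SHA-1 of the empty string, usable as a sentinel if empty input is forbidden.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()

# The all-zero sentinel from the question: forty hex zeros.
ZERO_SENTINEL = "0" * 40


def checked_sha1(data):
    """Hash data, refusing to return a digest equal to the sentinel."""
    digest = hashlib.sha1(data).hexdigest()
    if digest == ZERO_SENTINEL:
        # This check is meant to stay enabled in production.
        raise RuntimeError("input hashed to the sentinel value")
    return digest


print(EMPTY_SHA1)
print(checked_sha1(b"foo") != ZERO_SENTINEL)
```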
I don't think so.
There is no easy way to show why it's not possible. If there was, then this would itself be the basis of an algorithm to find collisions.
Longer analysis:
The preprocessing makes sure that there is always at least one 1 bit in the input.
The loop over w[i] will leave the original stream alone, so there is at least one 1 bit in the input (words 0 to 15). Even with clever design of the bit patterns, at least some of the values from 0 to 15 must be non-zero since the loop doesn't affect them.
Note: leftrotate is circular, so no 1 bits will get lost.
In the main loop, it's easy to see that the factor k is never zero, so temp can't be zero for the reason that all operands on the right hand side are zero (k never is).
This leaves us with the question whether you can create a bit pattern for which (a leftrotate 5) + f + e + k + w[i] returns 0 by overflowing the sum. For this, we need to find values for w[i] such that w[i] = 0 - ((a leftrotate 5) + f + e + k)
This is possible for the first 16 values of w[i] since you have full control over them. But the words 16 to 79 are again created by xoring the first 16 values.
So the next step could be to unroll the loops and create a system of linear equations. I'll leave that as an exercise to the reader ;-) The system is interesting since we have a loop that creates additional equations until we end up with a stable result.
Basically, the algorithm was chosen in such a way that you can create individual 0 words by selecting input patterns but these effects are countered by xoring the input patterns to create the 64 other inputs.
Just an example: To make temp 0, we have
a = h0 = 0x67452301
f = (b and c) or ((not b) and d)
= (h1 and h2) or ((not h1) and h3)
= (0xEFCDAB89 & 0x98BADCFE) | (~0xEFCDAB89 & 0x10325476)
= 0x98badcfe
e = 0xC3D2E1F0
k = 0x5A827999
Then (a leftrotate 5) + f + e + k = 0x9fb498b3 (mod 2^32), which gives us w[0] = 0 - 0x9fb498b3 = 0x604b674d, etc. This value is then used in the words 16, 19, 22, 24-25, 27-28, 30-79.
Word 1, similarly, is used in words 1, 17, 20, 23, 25-26, 28-29, 31-79.
As you can see, there is a lot of overlap. If you calculate the input value that would give you a 0 result, that value influences at least 32 other input values.
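The arithmetic for the first round can be reproduced with a short Python sketch. It assumes the standard SHA-1 initial values h0..h4 and the first-round constant k, and only evaluates the first step of the main loop; it is not a SHA-1 implementation.

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2^32


def leftrotate(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


# SHA-1 initial state
h0, h1, h2, h3, h4 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
a, b, c, d, e = h0, h1, h2, h3, h4
k = 0x5A827999

f = ((b & c) | (~b & d)) & MASK

# Everything in temp except w[0]
partial = (leftrotate(a, 5) + f + e + k) & MASK

# The w[0] that cancels the partial sum modulo 2^32
w0 = (-partial) & MASK

temp = (partial + w0) & MASK
print(hex(f), hex(partial), hex(w0), temp)
```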
The post by Aaron is incorrect. It is getting hung up on the internals of the SHA1 computation while ignoring what happens at the end of the round function.
Specifically, see the pseudo-code from Wikipedia. At the end of the round, the following computation is done:
h0 = h0 + a
h1 = h1 + b
h2 = h2 + c
h3 = h3 + d
h4 = h4 + e
So an all 0 output can happen if h0 == -a, h1 == -b, h2 == -c, h3 == -d, and h4 == -e going into this last section, where the computations are mod 2^32.
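As a small illustration of that condition, the following Python sketch computes the values a..e that would be needed at the end of the rounds. It assumes a single-block message, so that h0..h4 still hold the standard SHA-1 initial constants. Whether any input actually drives the rounds to these values is exactly the open question.

```python
MASK = 0xFFFFFFFF  # additions are modulo 2^32

# SHA-1 initial state (h0..h4), valid for the first and only block
H = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

# Values of a, b, c, d, e required after the 80 rounds: the negation of each h
needed = tuple((-h) & MASK for h in H)

# The final additions then wrap around to zero in every word
final = tuple((h + v) & MASK for h, v in zip(H, needed))

print([hex(v) for v in needed], final)
```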
To answer your question: nobody knows whether there exists an input that produces all zero outputs, but cryptographers expect that there are based upon the simple argument provided by daf.
Without any knowledge of SHA-1 internals, I don't see why any particular value should be impossible (unless explicitly stated in the description of the algorithm). An all-zero value is no more or less probable than any other specific value.
Contrary to all of the current answers here, nobody knows that. There's a big difference between a probability estimation and a proof.
But you can safely assume it won't happen. In fact, you can safely assume that just about ANY value won't be the result (assuming it wasn't obtained through some SHA-1-like procedures). You can assume this as long as SHA-1 is secure (it actually isn't anymore, at least theoretically).
People don't seem to realize just how improbable it is (if all humanity focused all of its current resources on finding a zero hash by brute-forcing, it would take about xxx... ages of the current universe to crack it).
If you know the function is safe, it's not wrong to assume it won't happen. That may change in the future, so assume some malicious inputs could give that value (e.g. don't erase user's HDD if you find a zero hash).
If anyone still thinks it's not "clean" or something, I can tell you that nothing is guaranteed in the real world, because of quantum mechanics. You assume you can't walk through a solid wall just because of an insanely low probability.
[I'm done with this site... My first answer here, I tried to write a nice answer, but all I see is a bunch of downvoting morons who are wrong and can't even tell the reason why are they doing it. Your community really disappointed me. I'll still use this site, but only passively]
Contrary to all answers here, the answer is simply No.
The hash value always contains bits set to 1.