I'm trying to use Dafny as a CAS to check that some algebraic calculations are correct.
Dafny does a good job, except that it gets unstable for longer ones, failing to verify some very easy steps even when spoon-fed.
calc == {
// a dozen lines...
k * (a + b) * (a + b);
k * (a * a + b * b + 2 * a * b); // fails
}
When Dafny fails there, I try to help it, in vain.
calc == {
// a dozen lines...
k * (a + b) * (a + b);
calc == {
(a + b) * (a + b);
a * a + b * b + 2 * a * b;
}
k * (a * a + b * b + 2 * a * b); // still fails
}
I've also tried to put the substituted expression in a variable or even a lemma, with the same result.
Is there some way to tell Dafny which part of an expression we want to substitute?
I was attempting to do JIT compilation on a PyTorch-based module from an NLP library, and I saw that one of the generated fused CUDA kernel implementations mentions the number 1.000000015047466e+30:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void fused_add_mul_mul_sub(float* tattn_mask1_1, float* tac_2, float* tbd_2, float* output_1, float* aten_mul_1) {
{
if (blockIdx.x<1ll ? 1 : 0) {
if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<169ll ? 1 : 0) {
if (blockIdx.x<1ll ? 1 : 0) {
float v = __ldg(tattn_mask1_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
aten_mul_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = v * 1.000000015047466e+30f;
} } }if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<2028ll ? 1 : 0) {
float v_1 = __ldg(tac_2 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
float v_2 = __ldg(tbd_2 + 12ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 169ll) + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 169ll);
float v_3 = __ldg(tattn_mask1_1 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 169ll);
output_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (v_1 + v_2) * 0.125f - v_3 * 1.000000015047466e+30f;
}}
}
It feels like this should be some sort of FLOAT_MAX constant, but I don't know any numerical type that would have this as a limit.
Google searching for this constant just yields a handful of results that seem to suggest it may be a physical constant of some kind:
This MATLAB forum link suggests it may be a default value for some physical phenomena
This Github repo on whole cell electrophysiology seems to use it as a max limit of some kind.
This Opensea NFT link shows that the number is somehow used as a parameter for some kind of fractal art?
I'm absolutely baffled because I'm doing natural language processing using CUDA and have absolutely no clue why this number is appearing in my CUDA kernel code and why it's also being used in hard science research code. Does this number have some special floating point properties or something? Is it serving as a numerical stability factor somehow? Any leads would be greatly appreciated.
1.000000015047466e+30f is 10^30 rounded to the nearest value representable in float (IEEE-754 binary32, also called "single precision"), then rounded to 16 decimal digits, then formatted as a float literal.
Thus, it likely originated as 1e30 or another representation of 10^30 that was subsequently converted to float and then used to produce new source code.
(The float nearest 10^30 is exactly 1000000015047466219876688855040.)
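This is easy to check by round-tripping 1e30 through binary32, e.g. with Python's standard library (a quick sanity check, not part of the original kernel):

```python
import struct

# Round-trip 1e30 through IEEE-754 binary32 ("float" in C/CUDA):
# pack it as a 4-byte float, then unpack it back into a Python double.
f32 = struct.unpack('<f', struct.pack('<f', 1e30))[0]

# The binary32 value is an exact integer, so we can compare it
# against the value quoted above.
print(int(f32))  # 1000000015047466219876688855040
```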
I've been using lunadry to reformat my code for me, but I've run into errors, namely, this happens when I try it:
lua: ./lunadry.lua:322: assertion failed!
stack traceback:
[C]: in function 'assert'
./lunadry.lua:322: in main chunk
[C]: in ?
Now I've gone through a large chunk of code I had and tracked down the source of this error to this specific function...
function e.insertvalues(e,...)g(1,e,'table')local n,t
if y('#',...)==1 then
n,t=#e+1,...else
n,t=...end
if#t>0 then
for n=#e,n,-1 do
e[n+#t]=e[n]end
local i=1-n
for n=n,n+#t-1 do
e[n]=t[n+i]end
end
return e
end
(yes, it's supposed to look ugly formatted like that).
And even more specifically, taking out this bit of code makes it work again:
if y('#',...)==1 then
n,t=#e+1,...else
n,t=...end
It is the ...else and ...end bits that cause it to mess up.
I've been trying to get it to reformat that code so it looks pretty, but it causes the error. For all I know this could simply be one instance of a sea of errors in the author's code, but I hope not. Here is the source of the file which does the magic: click me.
Could someone take a look at this and tell me what needs to be changed, to solve this very annoying bug? Thank you!
This is caused by lunadry matching ... as a keyword. Instances in lunadry.lua of:
K "..."
should instead be
C "..."
Use this patch:
diff --git a/lunadry.lua b/lunadry.lua
index e056140..19d714b 100755
--- a/lunadry.lua
+++ b/lunadry.lua
@@ -201,7 +201,7 @@ local lua = lpeg.locale {
K "true" +
V "Number" +
V "String" +
- K "..." +
+ C "..." +
V "function" +
V "tableconstructor" +
V "functioncall" +
@@ -251,8 +251,8 @@ local lua = lpeg.locale {
funcbody = C "(" * V "whitespace" * (V "parlist" * V "whitespace")^-1 * C ")" * INDENT_INCREASE(V "block" * V "whitespace") * INDENT * K "end";
- parlist = V "namelist" * (V "whitespace" * C "," * SPACE * V "whitespace" * K "...")^-1 +
- K "...";
+ parlist = V "namelist" * (V "whitespace" * C "," * SPACE * V "whitespace" * C "...")^-1 +
+ C "...";
tableconstructor = FLATTEN(C "{" * (INDENT_INCREASE(V "filler" * V "fieldlist" * V "filler") * INDENT + V "filler") * C "}");
I will commit the fix later today.
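The reason the K/C distinction matters: a keyword matcher typically asserts a word boundary after the match, and ... is pure punctuation, so that assertion can never succeed. lunadry actually uses LPeg, but the same failure mode can be sketched with regexes (the K/C names here mirror the grammar's combinators for illustration only):

```python
import re

def K(word):
    # Keyword matcher: the token must end at a word boundary,
    # as real keywords like "end" or "then" do.
    return re.compile(re.escape(word) + r'\b')

def C(text):
    # Plain literal matcher: no boundary assertion.
    return re.compile(re.escape(text))

print(K("end").match("end "))  # matches
print(K("...").match("... "))  # None: no word boundary after '.'
print(C("...").match("... "))  # matches
```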
For example I want to calculate (reasonably efficiently)
2^1000003 mod 12321
And finally I want to do (2^1000003 - 3) mod 12321. Is there any feasible way to do this?
Basic modulo properties tell us that
1) a + b (mod n) is (a (mod n)) + (b (mod n)) (mod n), so you can split the operation in two steps
2) a * b (mod n) is (a (mod n)) * (b (mod n)) (mod n), so you can use modulo exponentiation (pseudocode):
x = 1
for (1000003 times) {
x = (x * 2) % 12321; # x will never grow beyond 12320
}
Of course, you shouldn't do 1000003 iterations, just remember that 2^1000003 = 2 * 2^1000002, and 2^1000002 = (2^500001)^2, and so on...
In some reasonably C- or java-like language:
def modPow(Long base, Long exponent, Long modulus) = {
if (exponent < 0) {complain or throw or whatever}
else if (exponent == 0) {
return 1;
} else if (exponent & 1 == 1) { // odd exponent
return (base * modPow(base, exponent - 1, modulus)) % modulus;
} else {
Long halfexp = modPow(base, exponent / 2, modulus);
return (halfexp * halfexp) % modulus;
}
}
This requires that modulus is small enough that both (modulus - 1) * (modulus - 1) and base * (modulus - 1) won't overflow whatever integer type you're using. If modulus is too large for that, then there are some other techniques to compensate a bit, but it's probably just easier to attack it with some arbitrary-precision integer arithmetic library.
Then, what you want is:
(modPow(2, 1000003, 12321) + (12321 - 3)) % 12321
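As a cross-check, here is the same square-and-multiply idea written iteratively in Python, compared against Python's built-in three-argument pow, which performs exactly this kind of modular exponentiation:

```python
def mod_pow(base, exponent, modulus):
    # Iterative square-and-multiply: walk the exponent bit by bit,
    # reducing mod `modulus` at every step so numbers stay small.
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # odd exponent: fold in one factor of base
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result

answer = (mod_pow(2, 1000003, 12321) + (12321 - 3)) % 12321
print(answer)  # same value as (2**1000003 - 3) % 12321
```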
Well in Java there's an easy way to do this:
Math.pow(2, 1000003) % 12321;
For languages without the Math.* functions built in it'd be a little harder. Can you clarify which language this is supposed to be in?
thanks in advance for your help in figuring this out. I'm taking an algorithms class and I'm stuck on something. According to the professor, the following holds true where C(1)=1 and n is a power of 2:
C(n) = 2 * C(n/2) + n resolves to C(n) = n * lg(n) + n
C(n) = 2 * C(n/2) + lg(n) resolves to C(n) = 3 * n - lg(n) - 2
The first one I completely grok. As I understand the form, what's stated is that C(n) resolves to two sub-problems, each of which requires n/2 work to solve, plus an additional n amount of work to split and merge everything. As such, after k splits the leading constant 2 has become 2^k, the divisor in n/2 has likewise become 2^k, and the final n contributes another n of work at each of the lg(n) levels of splitting.
My confusion stems from the second relation. Given that the first and second relations are almost identical, why isn't the result of the second something like n * lg(n) + (lg(n))^2?
The general result is the Master Theorem
But in this specific case, you can work out the math for a power of 2:
C(2^k)
= 2 * C(2^(k-1)) + lg(2^k)
= 4 * C(2^(k-2)) + lg(2^k) + 2 * lg(2^(k-1))
= ... repeat ...
= 2^k * C(1) + sum (from i=1 to k) 2^(k-i) * lg 2^i
= 2^k + sum (from i=1 to k) 2^(k-i) * i
= 2^k + 2^(k+1) - k - 2
= 3 * 2^k - k - 2
= 3 * n - lg(n) - 2
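Both closed forms are easy to sanity-check numerically for powers of 2 (a quick verification script, not part of the coursework):

```python
from math import log2

def C_n(n):
    # C(n) = 2*C(n/2) + n, with C(1) = 1
    return 1 if n == 1 else 2 * C_n(n // 2) + n

def C_lg(n):
    # C(n) = 2*C(n/2) + lg(n), with C(1) = 1
    return 1 if n == 1 else 2 * C_lg(n // 2) + log2(n)

for k in range(11):
    n = 2 ** k
    assert C_n(n) == n * log2(n) + n          # first closed form
    assert C_lg(n) == 3 * n - log2(n) - 2     # second closed form
```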
I've been struggling with the following code. It's an F# implementation of the Forward-Euler algorithm used for modelling stars moving in a gravitational field.
let force (b1:Body) (b2:Body) =
let r = (b2.Position - b1.Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
if (b1 = b2) then
VectorFloat.Zero
else
r * (b1.Mass * b2.Mass) / (Math.Sqrt((float)rm) * (float)rm)
member this.Integrate(dT, (bodies:Body[])) =
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let f = force bodies.[i] bodies.[j]
bodies.[i].Acceleration <- bodies.[i].Acceleration + (f / bodies.[i].Mass)
bodies.[j].Acceleration <- bodies.[j].Acceleration - (f / bodies.[j].Mass)
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
While this works, it isn't exactly "functional". It also suffers from horrible performance: it's 2.5 times slower than the equivalent C# code. bodies is an array of structs of type Body.
The thing I'm struggling with is that force() is an expensive function so usually you calculate it once for each pair and rely on the fact that Fij = -Fji. But this really messes up any loop unfolding etc.
Suggestions gratefully received! No this isn't homework...
Thanks,
Ade
UPDATED: To clarify Body and VectorFloat are defined as C# structs. This is because the program interops between F#/C# and C++/CLI. Eventually I'm going to get the code up on BitBucket but it's a work in progress I have some issues to sort out before I can put it up.
[StructLayout(LayoutKind.Sequential)]
public struct Body
{
public VectorFloat Position;
public float Size;
public uint Color;
public VectorFloat Velocity;
public VectorFloat Acceleration;
// ...
}
[StructLayout(LayoutKind.Sequential)]
public partial struct VectorFloat
{
public System.Single X { get; set; }
public System.Single Y { get; set; }
public System.Single Z { get; set; }
}
The vector defines the sort of operators you'd expect for a standard Vector class. You could probably use the Vector3D class from the .NET framework for this case (I'm actually investigating cutting over to it).
UPDATE 2: Improved code based on the first two replies below:
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let r = ( bodies.[j].Position - bodies.[i].Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
let f = r / (Math.Sqrt((float)rm) * (float)rm)
bodies.[i].Acceleration <- bodies.[i].Acceleration + (f * bodies.[j].Mass)
bodies.[j].Acceleration <- bodies.[j].Acceleration - (f * bodies.[i].Mass)
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
The branch in the force function to cover the b1 = b2 case is the worst offender. You don't need it if softeningLength is always non-zero, even if it's very small (epsilon). This optimization was in the C# code but not the F# version (doh!).
Math.Pow(x, -1.5) seems to be a lot slower than 1 / (Math.Sqrt(x) * x). Essentially this algorithm is slightly odd in that its performance is dictated by the cost of this one step.
Moving the force calculation inline and getting rid of some divides also gives some improvement, but the performance was really being killed by the branching and is dominated by the cost of Sqrt.
WRT using classes over structs: There are cases (CUDA and native C++ implementations of this code and a DX9 renderer) where I need to get the array of bodies into unmanaged code or onto a GPU. In these scenarios being able to memcpy a contiguous block of memory seems like the way to go. Not something I'd get from an array of class Body.
I'm not sure if it's wise to rewrite this code in a functional style. I've seen some attempts to write pair interaction calculations in a functional manner and each one of them was harder to follow than two nested loops.
Before looking at structs vs. classes (I'm sure someone else has something smart to say about this), maybe you can try optimizing the calculation itself?
You're calculating two acceleration deltas, let's call them dAi and dAj:
dAi = r*m1*m2/(rm*sqrt(rm)) / m1
dAj = r*m1*m2/(rm*sqrt(rm)) / m2
[note: m1 = bodies.[i].mass, m2=bodies.[j].mass]]
The division by mass cancels out like this:
dAi = rm2 / (rmsqrt(rm))
dAj = rm1 / (rmsqrt(rm))
Now you only have to calculate r/(rmsqrt(rm)) for each pair (i,j).
This can be optimized further, because 1/(rmsqrt(rm)) = 1/(rm^1.5) = rm^-1.5, so if you let r' = r * (rm ** -1.5), then Edit: no it can't, that's premature optimization talking right there (see comment). Calculating r' = 1.0 / (r * sqrt r) is fastest.
dAi = m2 * r'
dAj = m1 * r'
Your code would then become something like
member this.Integrate(dT, (bodies:Body[])) =
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let r = (bodies.[j].Position - bodies.[i].Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
let r' = r * (rm ** -1.5)
bodies.[i].Acceleration <- bodies.[i].Acceleration + r' * bodies.[j].Mass
bodies.[j].Acceleration <- bodies.[j].Acceleration - r' * bodies.[i].Mass
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
Look, ma, no more divisions!
Warning: untested code. Try at your own risk.
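The cancellation itself is pure algebra and can be sanity-checked with a throwaway script (scalar stand-ins for one vector component and hypothetical sample values, not the F# code itself):

```python
import math

# Hypothetical sample values: one component of r, the two masses,
# and the softened squared distance rm.
r, m1, m2, rm = 2.0, 3.0, 5.0, 4.0

# Original form: full force divided back down by each body's own mass.
dAi_full = r * m1 * m2 / (rm * math.sqrt(rm)) / m1
dAj_full = r * m1 * m2 / (rm * math.sqrt(rm)) / m2

# Cancelled form: compute r' once per pair, scale by the *other* body's mass.
r_prime = r / (rm * math.sqrt(rm))
assert math.isclose(dAi_full, m2 * r_prime)
assert math.isclose(dAj_full, m1 * r_prime)
```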
I'd like to play around with your code, but it's difficult since the definitions of Body and FloatVector are missing, and they also seem to be missing from the original blog post you point to.
I'd hazard a guess that you could improve your performance and rewrite in a more functional style using F#'s lazy computations:
http://msdn.microsoft.com/en-us/library/dd233247(VS.100).aspx
The idea is fairly simple: you wrap any expensive computation that could be repeatedly calculated in a lazy ( ... ) expression; then you can force the computation as many times as you like, and it will only ever be calculated once.
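The pattern itself is language-agnostic; here is a minimal Python stand-in for F#'s lazy (illustrative only, F#'s built-in Lazy<'T> is the real thing):

```python
class Lazy:
    """Wrap a thunk; evaluate it at most once, on the first force()."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._evaluated = False
        self._value = None

    def force(self):
        if not self._evaluated:
            self._value = self._thunk()
            self._evaluated = True
        return self._value

calls = []

def expensive():
    calls.append(1)  # count how many times the body actually runs
    return 42

lazy_val = Lazy(expensive)
print(lazy_val.force(), lazy_val.force())  # 42 42
print(len(calls))                          # 1: computed only once
```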