Minimizing memory usage in a Julia function

This function is a workhorse which I want to optimize. Any ideas on how its memory usage can be limited would be great.
function F(len, rNo, n, ratio = 0.5)
    s = zeros(len); m = copy(s); d = copy(s);
    s[rNo] = 1
    rNo ≤ len-1 && (m[rNo+1] = s[rNo+1] = -n[rNo])
    rNo > 1 && (m[rNo-1] = s[rNo-1] = n[rNo-1])
    r = 1
    while true
        for i ∈ 2:len-1
            d[i] = (n[i]*m[i+1] - n[i-1]*m[i-1])/(r+1)
        end
        d[1] = n[1]*m[2]/(r+1)
        d[len] = -n[len-1]*m[len-1]/(r+1)
        for i ∈ 1:len
            s[i] += d[i]
        end
        sum(abs.(d))/sum(abs.(m)) < ratio && break # converged
        m = copy(d); r += 1
    end
    return reshape(s, 1, :)
end
It calculates rows of a special matrix exponential which I stack later.
Although the full method is considerably faster than the built-in exp thanks to the special properties, it uses far more memory, as measured by @time.
Since I am a noob at memory management and at Julia, I am sure it can be optimized quite a bit.
Am I doing something obviously wrong?

I think most of your allocations come from sum(abs.(d))/sum(abs.(m)) < ratio && break # converged. If you replace it with sum(abs, d)/sum(abs, m) < ratio && break # converged, those allocations should go away, since the two-argument form of sum applies abs element by element without materializing temporary arrays (it will also be a speed boost).
Your other allocations can be removed by replacing m = copy(d) with m .= d, which copies element-wise into the existing array instead of allocating a new one.
There are also a couple of style things that I think would make this a nicer function to read and use. My changes would be as follows:
function F(rNo, v, ratio = 0.5)
    len = length(v)
    s = zeros(len+1); m = copy(s); d = copy(s);
    s[rNo] = 1
    rNo ≤ len && (m[rNo+1] = s[rNo+1] = -v[rNo])
    rNo > 1 && (m[rNo-1] = s[rNo-1] = v[rNo-1])
    r = 1
    while true
        for i ∈ 2:len
            d[i] = (v[i]*m[i+1] - v[i-1]*m[i-1]) / (r+1)
        end
        d[1] = v[1]*m[2]/(r+1)
        d[end] = -v[end]*m[end-1]/(r+1)
        s .+= d
        sum(abs, d)/sum(abs, m) < ratio && break # converged
        m .= d; r += 1
    end
    return reshape(s, 1, :)
end
The most notable change is removing len from the arguments. Including an array-length argument is common in C (and probably other languages) where finding the length of an array is hard, but in Julia length is cheap (O(1)), and extra arguments are just more clutter and confusion for the people using the function. I also made use of the fact that Julia turns s[end] into s[length(s)] to make this a little cleaner. Also, in general when using Julia you should look for ways to use dotted operations rather than writing for loops. The for loops will be fast, but why take three lines to do what you could in one shorter line? (I also renamed n to v since to me n is a number and v is a vector, but that is pure preference.)
I hope this helps.


Can somebody help to model this function (polynomial function) in SMT solver Z3?

F(x1) > a;
F(x2) < b;
∀t, F'(x) >= 0 (derivative) ;
F(x) = ∑ ci*x^i; (i ∈ [0, n]; the ci are constants)
Your question is quite ambiguous, and Stack Overflow works best if you show what you tried and what problems you ran into.
Nevertheless, here's how one can code your problem for a specific function F = 2x^3 + 3x + 4, using the Python interface to z3:
from z3 import *

# Represent F as a function. Here we have 2x^3 + 3x + 4
def F(x):
    return 2*x*x*x + 3*x + 4

# Similarly, the derivative of F: 6x^2 + 3
def dF(x):
    return 6*x*x + 3

x1, x2, a, b = Ints('x1 x2 a b')

s = Solver()
s.add(F(x1) > a)
s.add(F(x2) < b)

t = Int('t')
s.add(ForAll([t], dF(t) >= 0))

r = s.check()
if r == sat:
    print(s.model())
else:
    print("Solver said: %s" % r)
Note that I translated your ∀t, F'(x) >= 0 condition as ∀t. F'(t) >= 0. I assume you had a typo there in the bound variable.
When I run this, I get:
[x1 = 0, x2 = 0, b = 5, a = 3]
This method can be generalized to arbitrary polynomials with constant coefficients in the obvious way, but that's mostly about programming and not z3. (Note that doing so in SMTLib is much harder. This is where the facilities of host languages like Python and others come into play.)
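For example, a sketch of that generalization, still in Python (poly, dpoly, and the coefficient list cs are illustrative names of mine; the coefficients are concrete integers and the degree is fixed up front):
from z3 import *

# Hypothetical helpers: build F(x) = sum_i cs[i]*x^i and its derivative
# from a concrete list of integer coefficients cs.
def poly(cs, x):
    return Sum([c * x**i for (i, c) in enumerate(cs)])

def dpoly(cs, x):
    # derivative: sum_i i*cs[i]*x^(i-1), skipping the constant term
    return Sum([i * c * x**(i - 1) for (i, c) in enumerate(cs) if i >= 1])

cs = [4, 3, 0, 2]  # 2x^3 + 3x + 4, the same F as above

x1, x2, a, b = Ints('x1 x2 a b')
s = Solver()
s.add(poly(cs, x1) > a)
s.add(poly(cs, x2) < b)

t = Int('t')
s.add(ForAll([t], dpoly(cs, t) >= 0))

print(s.check())
The constraint-building is all ordinary Python; only the final formula handed to the solver changes as cs does.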
Note that this problem is essentially non-linear. (Variables are being multiplied with variables.) So, SMT solvers may not be the best choice here, as they don't deal all that well with non-linear operations. But you can deal with those problems as they arise later on. Hope this gets you started!

Reducing an integer set in z3 over addition

I'm experimenting with (and failing at) reducing sets in z3 over operations like addition. The idea is eventually to prove things about arbitrary reductions over reasonably sized, fixed-size sets.
The first of the two examples below seems like it should yield unsat, but it doesn't. The second does work, but I would prefer not to use it, as it requires incrementally fiddling with the model.
from z3 import *

def test_reduce():
    LIM = 5
    VARS = 10
    poss = [Int('i%d' % x) for x in range(VARS)]
    i = Int('i')
    s = Solver()
    arr = Array('arr', IntSort(), BoolSort())
    s.add(arr == Lambda(i, And(i < LIM, i >= 0)))
    a = arr
    for x in range(len(poss)):
        s.add(Implies(a != EmptySet(IntSort()), arr[poss[x]]))
        a = SetDel(a, poss[x])

    def final_stmt(l):
        if len(l) == 0:
            return 0
        return If(Not(arr[l[0]]), 0, l[0] + (0 if len(l) == 1 else final_stmt(l[1:])))

    sm = final_stmt(poss)
    s.push()
    s.add(sm == 1)
    assert s.check() == unsat
Interestingly, the example below works much better, but I'm not sure why...
from uuid import uuid4
from z3 import *

def test_reduce_with_loop_model():
    s = Solver()
    i = Int('i')
    arr = Array('arr', IntSort(), BoolSort())
    LIM = 1000
    s.add(arr == Lambda(i, And(i < LIM, i >= 0)))
    sm = 0
    f = Int(str(uuid4()))
    while True:
        s.push()
        s.add(arr[f])
        chk = s.check()
        if chk == unsat:
            s.pop()
            break
        tmp = s.model()[f]
        sm = sm + tmp
        s.pop()
        s.add(f != tmp)
    s.push()
    s.add(sm == sum(range(LIM)))
    assert s.check() == sat
    s.pop()
    s.push()
    s.add(sm == 11)
    assert s.check() == unsat
Note that your call to f = Int(str(uuid4())) is inside the loop in the first case and outside the loop in the second. So the second case works with just one variable and thus converges quickly, while the first keeps creating variables and produces a much harder problem for z3. It's not surprising at all that these two behave significantly differently, as they encode entirely different constraints.
As a general note, reducing an array of elements with an operation is just not going to be an easy problem for z3. First, you have to assume an upper bound on the number of elements. And if that's the case, then why bother with Lambda or Array at all? Simply create a Python list of that many variables and ignore the array logic completely. That is:
elts = [Int("s%d"%i) for i in range(100)]
And then to access the elements of your 'array', simply use Python accessor notation elts[12].
Note that this only works if all your accesses use a constant integer index; i.e., your index cannot be symbolic. But if you're looking to prove reduction properties, that should suffice, and it will be much more efficient.
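A rough sketch of what that looks like (the size N, the 0..1 bounds, and the property checked are all made up for illustration):
from z3 import *

N = 10
elts = [Int('s%d' % i) for i in range(N)]

s = Solver()
# illustrative constraint: each element is 0 or 1
s.add([And(e >= 0, e <= 1) for e in elts])

# the reduction is just a symbolic sum built on the Python side
total = Sum(elts)

# e.g. the sum of N such elements can never exceed N
s.add(total > N)
print(s.check())  # unsat
No Array, no Lambda, no set deletion: the reduction is ordinary Python iteration over z3 terms, which keeps the solver's job purely arithmetic.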

Efficient way to extract and collect a random subsample of a generator in Julia

Consider a generator in Julia that, if collected, will take a lot of memory:
g = (x^2 for x = 1:9999999999999999)
I want to take a small random subsample of it (say 1%), but I do not want to collect() the object because that would take a lot of memory.
Until now the trick I was using was this:
temp = collect(((rand() > 0.01 ? nothing : x) for x in g))
random_sample = temp[temp .!= nothing]
But this is not efficient for generators with a lot of elements; collecting something with so many nothing elements doesn't seem right.
Any idea is highly appreciated. I guess the trick is to be able to get random elements from the generator without having to allocate memory for all of it.
Thank you very much
You can use a generator with an if condition, like this:
[v for v in g if rand() < 0.01]
or, if you want a slightly faster but more verbose approach (I have hardcoded 0.01 and the element type of g, and I assume that your generator supports length; otherwise you can remove the sizehint! line):
function collect_sample(g)
    r = Int[]
    sizehint!(r, round(Int, length(g) * 0.01))
    for v in g
        if rand() < 0.01
            push!(r, v)
        end
    end
    r
end
EDIT
Here are examples of a self-avoiding sampler and a reservoir sampler, both giving you a fixed output size. The smaller the fraction of the input you want, the better the self-avoiding sampler performs:
function self_avoiding_sampler(source_size, ith, target_size)
    rng = 1:source_size
    idx = rand(rng)
    x1 = ith(idx)
    r = Vector{typeof(x1)}(undef, target_size)
    r[1] = x1
    s = Set{Int}(idx)
    sizehint!(s, target_size)
    for i = 2:target_size
        # resample until we hit an index we have not used yet
        while idx in s
            idx = rand(rng)
        end
        @inbounds r[i] = ith(idx)
        push!(s, idx)
    end
    r
end
function reservoir_sampler(g, target_size)
    r = Vector{Int}(undef, target_size)
    for (i, v) in enumerate(g)
        if i <= target_size
            @inbounds r[i] = v
        else
            # keep the new element with probability target_size/i
            j = rand(1:i)
            if j <= target_size
                @inbounds r[j] = v
            end
        end
    end
    r
end
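For the original example, usage would look something like reservoir_sampler(g, 100) to draw 100 values in a single pass over g, or self_avoiding_sampler(9999999999999999, i -> i^2, 100), where i -> i^2 recomputes the generator's i-th element directly (the target size of 100 is arbitrary here).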

Optimization with F#

I'm quite new to F# and have a problem.
I want to solve a nonlinear, constrained optimization problem.
The goal is to minimize a function minFunc with six parameters a, b, c, d, gamma and rho_infty (the function is quite long, so I don't post it here), subject to the additional conditions:
a + d > 0,
d > 0,
c > 0,
gamma > 0,
0 <= gamma <= -ln(rho_infty),
0 < rho_infty <= 1.
I've tried it with the Nelder-Mead solver from the Microsoft Solver Foundation, but I don't know how to add the nonlinear conditions a + d > 0 and 0 <= gamma <= -ln(rho_infty).
My code so far:
open System
open Microsoft.SolverFoundation.Common
open Microsoft.SolverFoundation.Solvers

let funcFindParameters (startValues:float list) minimizationFunc =
    let xInitial = startValues |> List.toArray
    let lowerBound = [| -infinity; -infinity; 0.0; 0.0; 0.0; 0.0 |]
    let upperBound = [| infinity; infinity; infinity; infinity; infinity; 1.0 |]
    let solution =
        NelderMeadSolver.Solve(
            Func<float [], _>(fun parameters ->
                minimizationFunc parameters.[0] parameters.[1] parameters.[2]
                                 parameters.[3] parameters.[4] parameters.[5]),
            xInitial, lowerBound, upperBound)
    solution
where parameters.[0] = a, and so on...
Is there perhaps some way to solve it with the Nelder-Mead solver, or with some other solver?
One comment: I would stay away from Microsoft.SolverFoundation; I have wasted hours of my life on bad algorithms coded there. The R type provider is much better.
With that said, a common hack is simply to reparameterize the model to handle the constraints. For example, set:
e = a + d
as the parameter, and inside the optimization calculate d as:
d = e - a
And now you just have to satisfy the constraint e > 0, which is a simple fixed bound. You can do something similar for the gamma parameter.
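If it helps to see the shape of this trick outside F#, here is a minimal sketch in Python using scipy's Nelder-Mead (the quadratic min_func is a placeholder for the real objective, and the penalty for the remaining gamma/rho condition is my addition, not something from the question):
import numpy as np
from scipy.optimize import minimize

# Stand-in for the real minFunc; any smooth function of the six
# parameters would do here.
def min_func(a, b, c, d, gamma, rho):
    return (a - 1)**2 + b**2 + (c - 2)**2 + d**2 + gamma**2 + (rho - 0.5)**2

def objective(p):
    a, b, c, e, gamma, rho = p   # reparameterized: e = a + d
    d = e - a                    # recover d inside the objective
    # Nelder-Mead itself is unconstrained, so reject infeasible points
    # with a large penalty (a common companion to reparameterization).
    if e <= 0 or d <= 0 or c <= 0 or not (0 < rho <= 1):
        return 1e12
    if not (0 <= gamma <= -np.log(rho)):
        return 1e12
    return min_func(a, b, c, d, gamma, rho)

# feasible starting point: d = e - a = 0.5, -ln(0.5) ~ 0.693 > gamma
x0 = np.array([0.5, 0.0, 1.0, 1.0, 0.1, 0.5])
res = minimize(objective, x0, method="Nelder-Mead")
print(res.x)
The point is only the reparameterization: the solver never sees d, so the awkward coupled constraint a + d > 0 collapses to the simple bound e > 0.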

Unwrapping nested loops in F#

I've been struggling with the following code. It's an F# implementation of the Forward-Euler algorithm used for modelling stars moving in a gravitational field.
let force (b1:Body) (b2:Body) =
    let r = (b2.Position - b1.Position)
    let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
    if (b1 = b2) then
        VectorFloat.Zero
    else
        r * (b1.Mass * b2.Mass) / (Math.Sqrt((float)rm) * (float)rm)

member this.Integrate(dT, (bodies:Body[])) =
    for i = 0 to bodies.Length - 1 do
        for j = (i + 1) to bodies.Length - 1 do
            let f = force bodies.[i] bodies.[j]
            bodies.[i].Acceleration <- bodies.[i].Acceleration + (f / bodies.[i].Mass)
            bodies.[j].Acceleration <- bodies.[j].Acceleration - (f / bodies.[j].Mass)
        bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
        bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
While this works, it isn't exactly "functional". It also suffers from horrible performance; it's 2.5 times slower than the equivalent C# code. bodies is an array of structs of type Body.
The thing I'm struggling with is that force() is an expensive function, so usually you calculate it once per pair and rely on the fact that F_ij = -F_ji. But this really messes up any loop unfolding etc.
Suggestions gratefully received! No, this isn't homework...
Thanks,
Ade
UPDATED: To clarify, Body and VectorFloat are defined as C# structs. This is because the program interops between F#/C# and C++/CLI. Eventually I'm going to get the code up on BitBucket, but it's a work in progress; I have some issues to sort out before I can put it up.
[StructLayout(LayoutKind.Sequential)]
public struct Body
{
    public VectorFloat Position;
    public float Size;
    public uint Color;
    public VectorFloat Velocity;
    public VectorFloat Acceleration;
    // ...
}

[StructLayout(LayoutKind.Sequential)]
public partial struct VectorFloat
{
    public System.Single X { get; set; }
    public System.Single Y { get; set; }
    public System.Single Z { get; set; }
}
The vector defines the sort of operators you'd expect for a standard Vector class. You could probably use the Vector3D class from the .NET framework for this case (I'm actually investigating cutting over to it).
UPDATE 2: Improved code based on the first two replies below:
for i = 0 to bodies.Length - 1 do
    for j = (i + 1) to bodies.Length - 1 do
        let r = (bodies.[j].Position - bodies.[i].Position)
        let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
        let f = r / (Math.Sqrt((float)rm) * (float)rm)
        bodies.[i].Acceleration <- bodies.[i].Acceleration + (f * bodies.[j].Mass)
        bodies.[j].Acceleration <- bodies.[j].Acceleration - (f * bodies.[i].Mass)
    bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
    bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
The branch in the force function covering the b1 == b2 case was the worst offender. You don't need it if softeningLength is always non-zero, even if it's very small (Epsilon). This optimization was in the C# code but not the F# version (doh!).
Math.Pow(x, -1.5) seems to be a lot slower than 1.0 / (Math.Sqrt(x) * x). Essentially this algorithm is slightly odd in that its performance is dictated by the cost of this one step.
Moving the force calculation inline and getting rid of some divides also gives some improvement, but the performance was really being killed by the branching and is dominated by the cost of Sqrt.
WRT using classes over structs: there are cases (CUDA and native C++ implementations of this code, and a DX9 renderer) where I need to get the array of bodies into unmanaged code or onto a GPU. In these scenarios, being able to memcpy a contiguous block of memory seems like the way to go, and that's not something I'd get from an array of class Body.
I'm not sure it's wise to rewrite this code in a functional style. I've seen some attempts to write pair-interaction calculations in a functional manner, and each one of them was harder to follow than two nested loops.
Before looking at structs vs. classes (I'm sure someone else has something smart to say about this), maybe you can try optimizing the calculation itself?
You're calculating two acceleration deltas, let's call them dAi and dAj:
dAi = r * m1 * m2 / (rm * sqrt(rm)) / m1
dAj = r * m1 * m2 / (rm * sqrt(rm)) / m2
[note: m1 = bodies.[i].Mass, m2 = bodies.[j].Mass]
The division by mass cancels out like this:
dAi = r * m2 / (rm * sqrt(rm))
dAj = r * m1 / (rm * sqrt(rm))
Now you only have to calculate r / (rm * sqrt(rm)) for each pair (i,j).
This can be optimized further, because 1 / (rm * sqrt(rm)) = 1 / rm^1.5 = rm^-1.5, so you could let r' = r * (rm ** -1.5). Edit: no you can't, that's premature optimization talking right there (see comment). Calculating r' = r * (1.0 / (rm * sqrt rm)) is fastest.
dAi = m2 * r'
dAj = m1 * r'
Your code would then become something like
member this.Integrate(dT, (bodies:Body[])) =
    for i = 0 to bodies.Length - 1 do
        for j = (i + 1) to bodies.Length - 1 do
            let r = (bodies.[j].Position - bodies.[i].Position)
            let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
            let r' = r * (rm ** -1.5)
            bodies.[i].Acceleration <- bodies.[i].Acceleration + r' * bodies.[j].Mass
            bodies.[j].Acceleration <- bodies.[j].Acceleration - r' * bodies.[i].Mass
        bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
        bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
Look, ma, no more divisions!
Warning: untested code. Try at your own risk.
I'd like to play around with your code, but it's difficult since the definitions of Body and VectorFloat are missing; they also seem to be missing from the original blog post you point to.
I'd hazard a guess that you could improve your performance and rewrite in a more functional style using F#'s lazy computations:
http://msdn.microsoft.com/en-us/library/dd233247(VS.100).aspx
The idea is fairly simple: you wrap any expensive computation that could be repeatedly calculated in a lazy ( ... ) expression; you can then force the computation as many times as you like, and it will only ever be calculated once.
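To make that concrete, here is the same caching behaviour sketched in Python rather than F# (the Lazy class and expensive function are illustrative stand-ins, not library code):
# A thunk that caches its result: forcing it repeatedly only ever
# runs the underlying computation once, which is the property the
# F# lazy ( ... ) expression gives you.
class Lazy:
    def __init__(self, thunk):
        self._thunk = thunk
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
        return self._value

def expensive():
    print("computing...")
    return 42

v = Lazy(expensive)
print(v.force())  # prints "computing..." then 42
print(v.force())  # prints 42 only; the thunk is not re-run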
