Dask: what function variable is best to choose for visualize()

I am trying to understand Dask delayed more deeply, so I decided to work through the examples here. I modified some of the code to reflect how I want to use Dask (see below), but the results are different from what I expected, i.e. a tuple instead of a list. When I try to apply '.visualize()' to see what the execution graph looks like, I get nothing.
I worked through all the examples in 'delayed.ipynb' and they all work properly including all the visualizations. I then modified the 'for' loop for one example:
for i in range(256):
    x = inc(i)
    y = dec(x)
    z = add(x, y)
    zs.append(z)
into a function call that uses a list comprehension. The result is a variation on the original working example.
%%time
import time
import random
from dask import delayed, compute, visualize
zs = []
@delayed
def inc(x):
    time.sleep(random.random())
    return x + 1
@delayed
def dec(x):
    time.sleep(random.random())
    return x - 1
@delayed
def add(x, y):
    time.sleep(random.random())
    return x + y
def myloop(x):
    x.append([add(inc(i), dec(inc(i))) for i in range(8)])
    return x
result = myloop(zs)
final = compute(*result)
print(final)
I have tried printing out 'result' (the function call's return value), which gives the expected list of delayed calls, but when I print the result of 'compute' I unexpectedly get the desired list wrapped in a tuple. Why don't I get just a list?
When I try to 'visualize' the execution graph I get nothing at all. I was expecting to see as many nodes as are in the generated list.
I did not think I made any significant modifications to the example so what am I not understanding?

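The visualize function has the same call signature as compute, so if your compute(*result) call works, try visualize(*result).
As for the tuple: compute always returns a tuple with one entry per argument you pass it. Because myloop appends the whole list comprehension to zs as a single element, result is a one-element list containing your list of eight delayed values, so compute(*result) returns a one-element tuple wrapping the computed list.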
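The nesting can be reproduced without Dask. In this sketch, `fake_compute` is a hypothetical stand-in that, like `dask.compute`, returns a tuple with one entry per positional argument, and the strings stand in for delayed objects; `myloop_append` and `myloop_extend` are illustrative names, not part of the original code.

```python
def fake_compute(*args):
    # Hypothetical stand-in for dask.compute: one output per positional argument
    return tuple(args)

def myloop_append(x):
    # append adds the whole list as ONE element, giving [[...]]
    x.append([f"task-{i}" for i in range(3)])
    return x

def myloop_extend(x):
    # extend adds each element individually, giving [...]
    x.extend(f"task-{i}" for i in range(3))
    return x

nested = myloop_append([])
flat = myloop_extend([])

print(nested)                 # [['task-0', 'task-1', 'task-2']]
print(fake_compute(*nested))  # (['task-0', 'task-1', 'task-2'],)
print(fake_compute(*flat))    # ('task-0', 'task-1', 'task-2')
```

With Dask, the analogous change is to use `zs.extend(...)` (or simply return the comprehension) so that `compute(*result)` and `visualize(*result)` receive eight separate delayed objects. If you want a plain list back, note that `compute(some_list)` returns a one-element tuple holding a list of results, so `final, = compute(result)` unpacks it.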

Related

Creating an 'add' computation expression

I'd like the example computation expression and values below to return 6. For some reason the numbers aren't yielding like I'd expect. What's the step I'm missing to get my result? Thanks!
type AddBuilder() =
    let mutable x = 0
    member _.Yield i = x <- x + i
    member _.Zero() = 0
    member _.Return() = x
let add = AddBuilder()
(* Compiler tells me that each of the numbers in add don't do anything
and suggests putting '|> ignore' in front of each *)
let result = add { 1; 2; 3 }
(* Currently the result is 0 *)
printfn "%i should be 6" result
Note: This is just for creating my own computation expression to expand my learning. Seq.sum would be a better approach. I'm open to the idea that this example completely misses the value of computation expressions and is no good for learning.
There is a lot wrong here.
First, let's start with mere mechanics.
In order for the Yield method to be called, the code inside the curly braces must use the yield keyword:
let result = add { yield 1; yield 2; yield 3 }
But now the compiler will complain that you also need a Combine method. See, the semantics of yield is that each of them produces a finished computation, a resulting value. And therefore, if you want to have more than one, you need some way to "glue" them together. This is what the Combine method does.
Since your computation builder doesn't actually produce any results, but instead mutates its internal variable, the ultimate result of the computation should be the value of that internal variable. So that's what Combine needs to return:
member _.Combine(a, b) = x
But now the compiler complains again: you need a Delay method. Delay is not strictly necessary in principle, but the compiler requires it in order to mitigate performance pitfalls. When the computation consists of many "parts" (like in the case of multiple yields), it's often the case that some of them should be discarded. In these situations, it would be inefficient to evaluate all of them and then discard some. So the compiler inserts a call to Delay: it receives a function which, when called, evaluates a "part" of the computation, and Delay has the opportunity to put this function in some sort of deferred container, so that later Combine can decide which of those containers to discard and which to evaluate.
In your case, however, since the result of the computation doesn't matter (remember: you're not returning any results, you're just mutating the internal variable), Delay can just execute the function it receives to have it produce the side effects (namely, mutating the variable):
member _.Delay(f) = f ()
And now the computation finally compiles, and behold: its result is 6. This result comes from whatever Combine is returning. Try modifying it like this:
member _.Combine(a, b) = "foo"
Now suddenly the result of your computation becomes "foo".
And now, let's move on to semantics.
The above modifications will let your program compile and even produce the expected result. However, I think you misunderstood the whole idea of computation expressions in the first place.
The builder isn't supposed to have any internal state. Instead, its methods are supposed to manipulate complex values of some sort, some methods creating new values, some modifying existing ones. For example, the seq builder1 manipulates sequences. That's the type of values it handles. Different methods create new sequences (Yield) or transform them in some way (e.g. Combine), and the ultimate result is also a sequence.
In your case, it looks like the values that your builder needs to manipulate are numbers. And the ultimate result would also be a number.
So let's look at the methods' semantics.
The Yield method is supposed to create one of those values that you're manipulating. Since your values are numbers, that's what Yield should return:
member _.Yield x = x
The Combine method, as explained above, is supposed to combine two of such values that got created by different parts of the expression. In your case, since you want the ultimate result to be a sum, that's what Combine should do:
member _.Combine(a, b) = a + b
Finally, the Delay method should just execute the provided function. In your case, since your values are numbers, it doesn't make sense to discard any of them:
member _.Delay(f) = f()
And that's it! With these three methods, you can add numbers:
type AddBuilder() =
    member _.Yield x = x
    member _.Combine(a, b) = a + b
    member _.Delay(f) = f ()
let add = AddBuilder()
let result = add { yield 1; yield 2; yield 3 }
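To see why all three methods are needed, here is a rough Python model of how the F# compiler translates `add { yield 1; yield 2; yield 3 }`: each `yield` becomes a `Yield` call, each `;` between parts becomes a `Combine` whose second argument is wrapped in `Delay`, and the whole body is wrapped in one more `Delay`. The Python class and its method names (`yield_`, `combine`, `delay`) are illustrative only; the real translation is done by the F# compiler, and this sketch just mimics its shape for the stateless builder above.

```python
class AddBuilder:
    """Python model of the stateless F# AddBuilder (illustrative only)."""

    def yield_(self, x):
        # Yield: turn a single value into a result (here: the number itself)
        return x

    def combine(self, a, b):
        # Combine: glue the results of two parts together (here: add them)
        return a + b

    def delay(self, f):
        # Delay: receives a thunk for part of the body; this builder runs it immediately
        return f()


b = AddBuilder()

# Hand translation of: add { yield 1; yield 2; yield 3 }
result = b.delay(lambda: b.combine(
    b.yield_(1),
    b.delay(lambda: b.combine(
        b.yield_(2),
        b.delay(lambda: b.yield_(3))))))

print(result)  # 6
```

If `delay` returned `f` unevaluated and `combine` decided whether to call it, the builder would gain the ability to skip parts of the body, which is the performance point made about Delay earlier.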
I think numbers are not a very good example for learning about computation expressions, because numbers lack the inner structure that computation expressions are supposed to handle. Try instead creating a maybe builder to manipulate Option<'a> values.
As an added bonus, there are already implementations you can find online and use for reference.
1 seq is not actually a computation expression. It predates computation expressions and is treated in a special way by the compiler. But good enough for examples and comparisons.

Funny behavior with numba - guvectorized functions using argmax()

Consider the following script:
from numba import guvectorize, u1, i8
import numpy as np
@guvectorize([(u1[:],i8)], '(n)->()')
def f(x, res):
    res = x.argmax()
x = np.array([1,2,3],dtype=np.uint8)
print(f(x))
print(x.argmax())
print(f(x))
When running it, I get the following:
4382569440205035030
2
2
Why is this happening? Is there a way to get it right?
Assigning to a parameter name in Python does not write into the caller's object: res = ... simply rebinds the local name res to a new value and leaves the output array that Numba allocated untouched. I believe that output array holds uninitialized memory, which is why your first call returns a seemingly random value.
The fix is to write into the output with slice assignment (res[:] = ...), which mutates res in place instead of rebinding it; for that to work, the output type must also be declared as an array (i8[:]). A working function is:
@guvectorize([(u1[:], i8[:])], '(n)->()')
def f(x, res):
    res[:] = x.argmax()
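The underlying Python rule can be seen without Numba. In this sketch a one-element list stands in for the output array that Numba passes in; `rebind` and `mutate` are hypothetical names used only for illustration.

```python
def rebind(res):
    # Rebinds the local name only; the caller's list is untouched
    res = 42

def mutate(res):
    # Writes into the object the caller passed in (like res[:] = ... in Numba)
    res[:] = [42]

out = [0]
rebind(out)
print(out)  # [0]

mutate(out)
print(out)  # [42]
```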

Dask Delayed ignores name for dependent variables

When creating a graph of calculations using delayed I'm trying to assign names so that if I visualize the graph it's readable. However, for delayed variables that are dependent on functions the name parameter doesn't seem to affect the key. Here's a toy example:
import numpy as np
import pandas as pd
from dask import delayed
def calc_avg(a, b):
    return pd.concat([a, b], axis=1).mean(axis=1)
def calc_ratio(a, b):
    return a / b
a = delayed(pd.Series(np.random.rand(10)), name='a')
b = delayed(pd.Series(np.random.rand(10)), name='b')
c = delayed(pd.Series(np.random.rand(10)), name='c')
x = delayed(calc_avg, name='avg_result')(a,b)
y = delayed(calc_ratio, name='ratio_result')(x,c)
y.visualize()
You can see the visualization here (I can't embed images), but rather than seeing 'avg_result' I see 'calc_avg-#0' and rather than see 'ratio_result' I see 'calc_ratio-#1'. If I look at x.key or y.key they do not match the names that I provided. Is this the expected behavior?
The key of a dask result needs to be unique for every combination of the delayed function and the inputs you give it. What you see above is the expected behaviour: you are naming the function, but calling it with different inputs would produce a different output, so each call needs a different key.
You can specify the key you'd like associated not when you define the delayed function, but when you call it:
x = delayed(calc_avg)(a, b, dask_key_name='avg_result')
y = delayed(calc_ratio)(x, c, dask_key_name='ratio_result')
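To illustrate the uniqueness requirement without Dask, here is a simplified, hypothetical keying scheme. As far as I know, dask.delayed builds a call's key from the function name plus a token: a hash of the function and its arguments for pure=True calls, and a random UUID for the default impure calls, while dask_key_name bypasses both. `pure_key` and `impure_key` below are hypothetical helpers, not dask's actual tokenize implementation.

```python
import hashlib
import uuid

def pure_key(func_name, *args):
    # Hypothetical deterministic key: function name plus a hash of the inputs
    # (in the spirit of pure=True keys; not dask's real hashing)
    digest = hashlib.sha256(repr(args).encode()).hexdigest()[:16]
    return f"{func_name}-{digest}"

def impure_key(func_name):
    # Hypothetical random key, in the spirit of the default impure delayed call
    return f"{func_name}-{uuid.uuid4()}"

k1 = pure_key("calc_avg", 1, 2)
k2 = pure_key("calc_avg", 1, 2)
k3 = pure_key("calc_avg", 3, 4)
print(k1 == k2)  # True: same function and inputs give the same key
print(k1 == k3)  # False: different inputs give a different key
```

A single name attached to the function cannot satisfy either scheme for more than one call, which is why the name is supplied per call via dask_key_name instead.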

Losing the local client when calling dask inside of a distributed-spawned function

I am trying to perform some dask operations inside of a function which is sent to a worker through distributed. A simplified version of the code is:
client = Client(...)
X_ = dask.array.from_array(...)
X = dask.persist(X_)
def func(X, b):
    with distributed.local_client() as c:
        with dask.set_options(get=c.get):
            return dask.lu_solve(X, b)
client.persist(dask.do(func)(X, b))
The problem is that when doing this for several X, b instances, sometimes it works and sometimes I get the exception Exception: Client not running. Status: closed
Any idea how to address this?
When you pass the inputs dask.array X and b to a dask.delayed function they arrive as numpy arrays. I recommend just using NumPy functions instead.
Alternatively, maybe you're trying to accomplish something else?
If you want to call a dask.array function on dask.arrays you can do it from your normal python session. There is no reason to use a local_client.

Can Z3 call python function during decision making of variables?

I am trying to solve a problem: for example, I have 4 points and each pair of points has a cost between them. Now I want to find a sequence of nodes whose total cost is less than a bound. I have written some code, but it does not seem to work. The main problem is that I have defined a Python function and am trying to call it within a constraint.
Here is my code: I have a function def getVal(n1,n2): where n1 and n2 are of Int sort. The line Nodes = [ Int("n_%s" % (i)) for i in range(totalNodeNumber) ] defines 4 points as Int sort, and when I add the constraint s.add(getVal(Nodes[0], Nodes[1]) + getVal(Nodes[1], Nodes[2]) < 100), it calls the getVal function immediately. Instead, I want the function to be called once Z3 has decided values for Nodes[0], Nodes[1], Nodes[2], Nodes[3], to get the cost between two points.
from z3 import *
import random
totalNodeNumber = 4
Nodes = [ Int("n_%s" % (i)) for i in range(totalNodeNumber) ]
def getVal(n1,n2):
    # I need the values that Z3 assigns to n1 and n2
    cost = random.randint(1,20)
    print(cost)
    return IntVal(cost)
s = Solver()
#constraint: Each Nodes value should be distinct
nodes_index_distinct_constraint = Distinct(Nodes)
s.add(nodes_index_distinct_constraint)
#constraint: Each Nodes value should be between 0 and totalNodeNumber
def get_node_index_value_constraint(i):
    return And(Nodes[i] >= 0, Nodes[i] < totalNodeNumber)
nodes_index_constraint = [ get_node_index_value_constraint(i) for i in range(totalNodeNumber)]
s.add(nodes_index_constraint)
#constraint: Problem with this constraint
# Here is the problem: Python calls getVal right away, before Z3 has assigned values to Nodes[0], Nodes[1], Nodes[2]
# But I want Z3 to call the Python function while it is deciding the variable values
s.add(getVal(Nodes[0], Nodes[1]) + getVal(Nodes[1], Nodes[2]) + getVal(Nodes[2], Nodes[3]) < 100)
if s.check() == sat:
    print("SAT")
    print("Model: ")
    m = s.model()
    nodeIndex = [ m.evaluate(Nodes[i]) for i in range(totalNodeNumber) ]
    print(nodeIndex)
else:
    print("UNSAT")
    print("No solution found !!")
If this is not the right way to solve the problem, could you please suggest an alternative? Can I encode this kind of problem, finding an optimal sequence of waypoints, using the Z3 solver?
I don't understand what problem you need to solve. Definitely, the way getVal is formulated does not make sense. It does not use the arguments n1, n2. If you want to examine values produced by a model, then you do this after Z3 returns from a call to check().
I don't think you can use a Python function in your SMT logic. What you could do instead is define getVal as an uninterpreted Function, like this:
getVal = Function('getVal',IntSort(),IntSort(),IntSort())
and constrain the edge weights as:
s.add(And(getVal(0,1)==1,getVal(1,2)==2,getVal(0,2)==3))
The first two sorts in the declaration are the arguments of getVal (the node ids), and the last one is its result sort (the weight).
