Why is this (very simple) vectorized code orders of magnitude slower than NumPy?

I am well aware of Julia's philosophy concerning vectorization and the like, and I can accept my code running even ten times slower than NumPy, but why does this code run much, much slower than that? Maybe there is a mistake in the code; I can't imagine the issue being related to using vectorization rather than loops.
I am using vectorization because my performance requirements are not demanding; furthermore, the memory allocation doesn't seem excessive (and it proves to be very fast with NumPy). The code is also easy to read.
The code computes a generalized continued fraction over a part of the complex plane. The continued fraction is defined by two functions; I use this code for plotting pictures on this blog: https://kettenreihen.wordpress.com/
The Python code below computes 640x640 values in about 2 seconds:
import numpy as np

def K(a, b, C):
    s = C.shape
    # np.complex is an alias for the built-in complex, i.e. the concrete
    # 128-bit complex dtype (the alias was removed in NumPy 1.24)
    P1 = np.ones(s, dtype=np.complex)
    P2 = np.zeros(s, dtype=np.complex)
    Q1 = np.zeros(s, dtype=np.complex)
    Q2 = np.ones(s, dtype=np.complex)
    for n in range(1, 65):
        A, B = a(C, n), b(C, n)
        P = A*P2 + B*P1
        Q = A*Q2 + B*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    return P2/Q2
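For reference, here is one hypothetical way to call this function; the coefficient functions a and b and the grid bounds below are made up purely for illustration:
import numpy as np

# Made-up coefficient functions: any pair of functions mapping (C, n)
# to an array of C's shape will do here.
a = lambda C, n: np.ones_like(C)   # partial denominators
b = lambda C, n: C                 # partial numerators

# A 640x640 grid covering part of the complex plane
x = np.linspace(-2.0, 2.0, 640)
y = np.linspace(-2.0, 2.0, 640)
C = x[None, :] + 1j * y[:, None]

F = K(a, b, C)   # one complex convergent per grid point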
The following Julia code should do the same, but it takes 2 or 3 minutes to compute the same thing.
function K(a, b, C)
    s = size(C)
    P1 = ones(Complex, s)
    P2 = zeros(Complex, s)
    Q1 = zeros(Complex, s)
    Q2 = ones(Complex, s)
    for n = 1:64
        println(n)
        A, B = a(C, n), b(C, n)
        P = A.*P2 + B.*P1
        Q = A.*Q2 + B.*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    end
    return P2./Q2
end

You are allocating matrices with an abstract element type: the Complex type is an abstract supertype for all specific Complex{T} types. What you want here is arrays of some concrete element type like Complex128 == Complex{Float64} or Complex64 == Complex{Float32}. Presumably in NumPy dtype=np.complex refers to a specific complex type, probably the equivalent of Complex128.
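You can sanity-check the NumPy side of that claim quickly; np.complex was just an alias for Python's built-in complex, which NumPy maps to a concrete 128-bit dtype:
import numpy as np

# The built-in complex corresponds to the concrete complex128 dtype,
# i.e. the equivalent of Julia's Complex128
print(np.ones((2, 2), dtype=complex).dtype)   # complex128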
If you want to write generic code that works for different kinds of C matrices, then, assuming C is a complex matrix whose element type and shape you want to match, you can just call the ones and zeros functions on C to get matrices with the right element type and shape:
function K(a, b, C)
    P1 = ones(C)
    P2 = zeros(C)
    Q1 = zeros(C)
    Q2 = ones(C)
    for n = 1:64
        println(n)
        A, B = a(C, n), b(C, n)
        P = A.*P2 + B.*P1
        Q = A.*Q2 + B.*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    end
    return P2./Q2
end
Hopefully that helps the performance. You might be able to get even more performance by pre-allocating three matrices and doing the operations in place, rotating through the matrices. However, the convenient syntax support for that hasn't made it into a stable release yet, so on Julia 0.5 it would still be a little verbose, but it could give you a performance boost over the vectorized version.

Related

How to use soft constraints in Z3-Python to express 'abstract' biases in SAT search: such as 'I prefer half of the literals to be true and half false'

I previously asked in How to bias Z3's (Python) SAT solving towards a criteria, such as 'preferring' to have more negated literals whether there was a way in Z3 (Python) to 'bias' the SAT search towards a 'criterion'.
In that post, we learnt 'simple biases' such as: I would like Z3 to obtain a model, but not just any model: if possible, give me a model that has a large number of negated literals.
This was done using a new solver (Optimize instead of Solver) and soft constraints (add_soft instead of add). Concretely, for each literal lit_n and an optimizing solver o_solver, this was added: o_solver.add_soft(Not(lit_n)); or, alternatively: o_solver.add_soft(Or(Not(lit1), Not(lit2), ...)).
However, I would like to express slightly more complicated biases. Concretely: if possible, I prefer models with half of the literals set to True and half to False.
Is there any way I can express this and similar 'biases' using the Optimize tool?
Here’s a simple idea that might help. Count the number of positive literals and subtract the count of negative ones. Take the absolute value of the difference and minimize it using the optimizer.
That should find a solution where the counts of positive and negative literals are as close as possible, satisfying your “about half” soft constraint.
Here's a simple example. Let's say you have six literals and you want to satisfy their disjunction. The idiomatic solution would be:
from z3 import *
a, b, c, d, e, f = Bools("a b c d e f")
s = Solver()
s.add(Or(a, b, c, d, e, f))
print(s.check())
print(s.model())
If you run this, you'll get:
sat
[f = False,
b = False,
a = True,
c = False,
d = False,
e = False]
So, z3 simply made a True and all others False. But you wanted to have roughly the same count of positive and negative literals. So, let's encode that:
from z3 import *
a, b, c, d, e, f = Bools("a b c d e f")
s = Optimize()
s.add(Or(a, b, c, d, e, f))
def count(ref, xs):
    s = 0
    for x in xs:
        s += If(x == ref, 1, 0)
    return s
def sabs(x):
    return If(x > 0, x, -x)
lits = [a, b, c, d, e, f]
posCount = count(True, lits)
negCount = count(False, lits)
s.minimize(sabs(posCount - negCount))
print(s.check())
print(s.model())
Note how we "symbolically" count the negative and positive literals and ask z3 to minimize the absolute value of the difference. If you run this, you'll get:
sat
[a = True,
b = False,
c = False,
d = True,
e = False,
f = True]
With 3 positive and 3 negative literals. If you had 7 literals to start with, it'd have found a 4-3 split. If you prefer more positive literals than negatives, you can additionally add a soft constraint of the form:
s.add_soft(posCount >= negCount)
to bias the solver that way. Hope this gets you started.
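If you'd rather verify the split programmatically than by eye, here is a small check, reusing s and lits from the example above:
m = s.model()
pos = sum(1 for lit in lits if is_true(m[lit]))
print(pos, len(lits) - pos)   # 3 3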

Using closure results for Turing-decidable languages

I have a language L1 = {w in {0,1}* | w contains the same number of 1's and 0's}, and I have a TM M that decides L1.
I want to prove that L2 = {w in {0,1}* | w contains more 1's than 0's} is Turing-decidable.
I have used the "closed under complement" approach and proven that M' decides the complement of L1 (~L1).
My question is: can I assume that ~L1 = (L2 or ~L2) and conclude that, since M' decides ~L1, both L2 and ~L2 are decidable languages?
Thank you for any advice.
(Sorry, haven't figured out how to use LaTeX here yet...)
I just want to flesh out Wellbog's answer. Here is L1 (read n1(w) as "the number of 1's in w"):
L1 = {w ∈ {0,1}* : n1(w) = n0(w)}
And here is L2:
L2 = {w ∈ {0,1}* : n1(w) > n0(w)}
On the other hand, the complement L1-bar is:
L1-bar = {w ∈ {0,1}* : n1(w) > n0(w) OR n1(w) < n0(w)}
And clearly, L1-bar and L2 are different: L1-bar also contains all words with more 0's than 1's. So a decider for L1-bar does not by itself decide L2, and the assumption ~L1 = (L2 or ~L2) does not hold (L2 or ~L2 would be all of {0,1}*). L2 is still decidable, but you need a direct argument, e.g. a TM that counts the 1's and 0's in w and accepts iff the first count is larger.

Currying and multiple integrals

I am interested in learning an elegant way to use currying in a functional programming language to numerically evaluate multiple integrals. My language of choice is F#.
If I want to integrate f(x,y,z)=8xyz on the region [0,1]x[0,1]x[0,1] I start by writing down a triple integral of the differential form 8xyz dx dy dz. In some sense, this is a function of three ordered arguments: a (float -> float -> float -> float).
I take the first integral and the problem reduces to the double integral of 4xy dx dy on [0,1]x[0,1]. Conceptually, we have curried the function to become a (float -> float -> float).
After the second integral I am left to take the integral of 2x dx, a (float -> float), on the unit interval.
After three integrals I am left with the result, the number 1.0.
Ignoring optimizations of the numeric integration, how could I succinctly execute this? I would like to write something like:
let diffForm = (fun x y z -> 8.0 * x * y * z)
let result =
    diffForm
    |> Integrate 0.0 1.0
    |> Integrate 0.0 1.0
    |> Integrate 0.0 1.0
Is this doable, if perhaps impractical? I like the idea of how closely this would capture what is going on mathematically.
"I like the idea of how closely this would capture what is going on mathematically."
I'm afraid your premise is false: The pipe operator threads a value through a chain of functions and is closely related to function composition. Integrating over an n-dimensional domain however is analogous to n nested loops, i.e. in your case something like
for x in x_grid_nodes do
    for y in y_grid_nodes do
        for z in z_grid_nodes do
            integral <- integral + ... // details depend on integration scheme
You cannot easily map that to a chain of three independent calls to some Integrate function, and thus the composition integrate x1 x2 >> integrate y1 y2 >> integrate z1 z2 is actually not what you do when you integrate f. That is why Tomas' solution (if I understood it correctly, and I am not sure about that...) essentially evaluates your function on an implicitly defined 3D grid and passes that to the integration function. I suspect that is as close as you can get to your original question.
You did not ask for it, but if you do want to evaluate an n-dimensional integral in practice, look into Monte Carlo integration, which avoids another problem commonly known as the "curse of dimensionality", i.e. the fact that the number of required sample points grows exponentially with n under classic integration schemes.
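To give a flavor of how little machinery that needs, here is a minimal Monte Carlo sketch (in Python rather than F#, and with an arbitrary sample count) for the example integrand 8xyz on the unit cube:
import random

def mc_integrate(f, dim, samples=100000):
    # Sample the unit hypercube [0, 1]^dim uniformly; the estimate is
    # the sample mean of f times the volume of the domain (here 1).
    total = 0.0
    for _ in range(samples):
        total += f(*[random.random() for _ in range(dim)])
    return total / samples

f = lambda x, y, z: 8.0 * x * y * z
print(mc_integrate(f, 3))   # ~1.0, up to sampling error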
Update
You can implement iterated integration, but not with a single integrate function, because the type of the function to be integrated is different at each step of the integration (each step turns an n-ary function into an (n - 1)-ary one):
let f = fun x y z -> 8.0 * x * y * z

// numerically integrate f on [x1, x2]
let trapRule f x1 x2 = (x2 - x1) * (f x1 + f x2) / 2.0

// uniform step size for simplicity
let h = 0.1

// integrate a unary function f on a given discrete grid
let integrate grid f =
    let mutable integral = 0.0
    for (x1, x2) in Seq.zip grid (Seq.skip 1 grid) do
        integral <- integral + trapRule f x1 x2
    integral

// integrate a 3-ary function f with respect to its last argument
let integrate3 lower upper f =
    let grid = seq { lower .. h .. upper }
    fun x y -> integrate grid (f x y)

// integrate a 2-ary function f with respect to its last argument
let integrate2 lower upper f =
    let grid = seq { lower .. h .. upper }
    fun x -> integrate grid (f x)

// integrate a unary function f on [lower, upper]
let integrate1 lower upper f =
    integrate (seq { lower .. h .. upper }) f
With your example function f
f |> integrate3 0.0 1.0 |> integrate2 0.0 1.0 |> integrate1 0.0 1.0
yields 1.0.
I'm not entirely sure how you would implement this in a normal way, so this might not fully solve the problem, but here are some ideas.
To do the numerical integration, you'll (I think?) need to call the original function diffForm at various points as specified by the Integrate calls in the pipeline. But you actually need to call it at the product of the ranges: if I wanted to evaluate it only at the borders, I would still need to call it 2x2x2 times to cover all possible combinations (diffForm 0 0 0, diffForm 0 0 1, diffForm 0 1 0, etc.) and then do some calculation on the 8 results you get.
The following sample (at least) shows how to write similar code that calls the specified function with all combinations of the argument values that you specify.
The idea is to use continuations which can be called multiple times (and so when we get a function, we can call it repeatedly at multiple different points).
// Our original function
let diffForm x y z = 8.0 * x * y * z

// At the first step, we just pass the function to a continuation 'k' (once)
let diffFormK k = k diffForm

// This function takes a function that returns a function via a continuation
// (like diffFormK); it fixes the first argument of the function
// to 'lo' and 'hi' and calls its own continuation with both options
let range lo hi func k =
    // When called for the first time, 'f' will be your 'diffForm'
    // and here we call it twice with 'lo' and 'hi' and pass the
    // two results (float -> float -> float) to the next in the pipeline
    func (fun f -> k (f lo))
    func (fun f -> k (f hi))

// At the end, we end up with a function that takes a continuation
// and calls the continuation with all combinations of results
// (This is where you need to do something tricky to aggregate the results :-))
let integrate result =
    result (printfn "%f")

// Now, we pass our function to 'range' for every argument and
// then pass the result to 'integrate', which just prints all results
let result =
    diffFormK
    |> range 0.0 1.0
    |> range 0.0 1.0
    |> range 0.0 1.0
    |> integrate
This might be pretty confusing (because continuations take a lot of time to get used to), but perhaps you (or someone else here?) can find a way to turn this first attempt into a real numerical integration :-)

Dynamic programming in F#

What is the most elegant way to implement dynamic programming algorithms that solve problems with overlapping subproblems? In imperative programming one would usually create an array indexed (at least in one dimension) by the size of the problem, and then the algorithm would start from the simplest problems and work towards more complicated ones, using the results already computed.
The simplest example I can think of is computing the Nth Fibonacci number:
int Fibonacci(int N)
{
    var F = new int[N+1];
    F[0] = 1;
    F[1] = 1;
    for (int i = 2; i <= N; i++)
    {
        F[i] = F[i-1] + F[i-2];
    }
    return F[N];
}
I know you can implement the same thing in F#, but I am looking for a nice functional solution (which is O(N) as well obviously).
One technique that is quite useful for dynamic programming is called memoization. For more details, see for example the blog post by Don Syme or the introduction by Matthew Podwysocki.
The idea is that you write a (naive) recursive function and then add a cache that stores previous results. This lets you write the function in the usual functional style, but get the performance of an algorithm implemented using dynamic programming.
For example, a naive (inefficient) function for calculating a Fibonacci number looks like this:
let rec fibs n =
    if n < 2 then 1 else
    (fibs (n - 1)) + (fibs (n - 2))
This is inefficient, because fibs 3 ends up calling fibs 1 twice (and many more repeated calls happen if you call, for example, fibs 6). The idea behind memoization is that we add a cache that stores the results of fibs 1, fibs 2, and so on, so repeated calls just pick the pre-calculated value from the cache.
A generic function that does the memoization can be written like this:
open System.Collections.Generic
let memoize(f) =
    // Create a (mutable) cache that is used for storing results
    // for function arguments that were already calculated
    let cache = new Dictionary<_, _>()
    (fun x ->
        // The returned function first performs a cache lookup
        let succ, v = cache.TryGetValue(x)
        if succ then v else
        // If the value was not found, calculate & cache it
        let v = f(x)
        cache.Add(x, v)
        v)
To write a more efficient Fibonacci function, we can now call memoize and give it the function that performs the calculation as an argument:
let rec fibs = memoize (fun n ->
    if n < 2 then 1 else
    (fibs (n - 1)) + (fibs (n - 2)))
Note that this is a recursive value - the body of the function calls the memoized fibs function.
Tomas's answer is a good general approach. In more specific circumstances, there may be other techniques that work well - for example, in your Fibonacci case you really only need a finite amount of state (the previous 2 numbers), not all of the previously calculated values. Therefore you can do something like this:
let fibs = Seq.unfold (fun (i,j) -> Some(i,(j,i+j))) (1,1)
let fib n = Seq.nth n fibs
You could also do this more directly (without using Seq.unfold):
let fib =
    let rec loop i j = function
        | 0 -> i
        | n -> loop j (i+j) (n-1)
    loop 1 1
Another variation on the Seq.unfold approach, using bigint and caching the sequence so that already-computed elements are not recomputed:
let fibs =
    (1I, 1I)
    |> Seq.unfold (fun (n0, n1) -> Some (n0, (n1, n0 + n1)))
    |> Seq.cache
Taking inspiration from Tomas' answer here, and in an attempt to resolve the warning in my comment on said answer, I propose the following updated solution.
open System.Collections.Generic
let fib n =
    let cache = new Dictionary<_, _>()
    let memoize f c =
        let succ, v = cache.TryGetValue c
        if succ then v else
        let v = f c
        cache.Add(c, v)
        v
    let rec inner n =
        match n with
        | 1
        | 2 -> bigint n
        | n -> memoize inner (n - 1) + memoize inner (n - 2)
    inner n
This solution internalizes the memoization, and in doing so allows fib and inner to be defined as functions, instead of fib being a recursive object, which allows the compiler to (I think) properly reason about the viability of the function calls.
I also return a bigint instead of an int, as int quickly overflows with even a small n.
Edit: I should mention, however, that this solution still runs into stack overflow exceptions for sufficiently large values of n.

Permutation generator function F#

I need to generate a list of all distinct permutations of 1..n x 1..n where the first value does not equal the second,
e.g. generate 3 -> [(3,2); (3,1); (2,3); (2,1); (1,3); (1,2)].
The exact scenario is: you have a pool of objects (cards) and one is dealt to each player. If a player is dealt a card, no other player can be dealt that card. (Ignore suits for the time being; if I have to, I will make a LUT mapping 1-52 to the actual cards.)
I came up with the following, which seems messy at best:
let GenerateTuples (numcards: int) =
    let rec cellmaker (cardsleft: int) (cardval: int) =
        if cardval = cardsleft then
            (if cardval <= 0 then [] else cellmaker cardsleft (cardval - 1))
        elif cardval <= 0 then []
        else (cardsleft, cardval) :: cellmaker cardsleft (cardval - 1)
    let rec generatelists (cardsleft: int) =
        cellmaker cardsleft numcards @ (if cardsleft > 1 then generatelists (cardsleft - 1) else [])
    generatelists numcards
is there a better way of doing this?
You can do it easily using list comprehensions:
let GenerateTuples (n: int) =
    [for i in 1..n do for j in 1..n do if i <> j then yield (i, j)]
The problem is best seen as a matrix problem, and the nested "for" loops of the imperative solution can be done functionally.
let Permute n =
    let rec Aux (x, y) =
        if (x, y) = (n, n) then
            []
        else
            let nextTuple = if y = n then ((x + 1), 1) else (x, (y + 1))
            if x = y then
                Aux nextTuple
            else
                (x, y) :: (Aux nextTuple)
    Aux (1, 1)
This is not tail-recursive, so it gets a stack overflow at approximately n = 500 on my machine. It is almost trivial to make this function tail-recursive.
The timings for this were very interesting. This function (the tail-recursive version) took 50% longer than the original, and the imperative solution took approximately three times as long again. Yes: the original functional solution is the fastest, this is the next fastest, and the imperative list comprehension was the slowest, at a ratio of roughly 1 : 1.5 : 4. Tested on a wide variety of datasets.
