Binary Tree Maximum Path in Erlang - erlang

The Binary Tree Maximum Path problem can be solved in using DFS.
Here is a possible solution using this approach in Python.
def maxPathSum(self, root):
def maxSum(root):
if not root:
return 0
l_sum = maxSum(root.left)
r_sum = maxSum(root.right)
l = max(0, l_sum)
r = max(0, r_sum)
res[0] = max(res[0], root.val + l + r)
return root.val + max(l, r)
res = [-float('inf')]
maxSum(root)
return res[0]
I am trying to use the same approach in Erlang. Assuming a node would look like:
{Value, Left, Right}
I came up with:
max_sum(undefined) -> 0;
max_sum({Value, Left, Right}) ->
LeftSum = max(0, max_sum(Left)),
RightSum = max(0, max_sum(Right)),
%% Where to store the max? Should I use the process dictionary?
%% Should I send a message?
Value + max(LeftSum, RightSum).
max_path_sum(Root) ->
%% Bonus question: how to represent -infinity in Erlang?
max_sum(Root)
There are no global variables in Erlang. How can I keep track of the maximum during DFS? The only things that come to my mind are to use the process dictionary or an ETS table or maybe have a different process that can keep the maximum, but maybe I am overthinking and there is a more simple and idiomatic way?

The most "erlangish" way would be to pass the global maximum as a second parameter, and return it along with the local maximum:
max_sum(undefined, GlobalMax) -> {0, GlobalMax};
max_sum({Value, Left, Right}, GlobalMax0) ->
{LeftSum, GlobalMax1} = max(0, max_sum(Left, GlobalMax0)),
{RightSum, GlobalMax2} = max(0, max_sum(Right, GlobalMax1)),
NewGlobalMax =
case GlobalMax2 of
undefined ->
Value + LeftSum + RightSum
_ ->
max(GlobalMax2, Value + LeftSum + RightSum)
end,
{Value + max(LeftSum, RightSum), NewGlobalMax}.
max_path_sum(Root) ->
{_, GlobalMax} = max_sum(Root, undefined),
GlobalMax.
Erlang doesn't support infinity values in floats, so I used the atom undefined to represent a smallest value instead.

Related

Is there a way to return a pair of integers using the let construct in standard ML?

I am trying to return a pair of sums using the let construct in sml. Every way I have tried will only return one value. I have tried creating a list by using cons (::) and then returning the list, but that gives an error as well.
val t = [(3,4), (4,5), (5,6)];
fun sumPairs(nil) = 0
| sumPairs((x,y)::zs) =
let
val sumFirst = x + sumPairs(zs)
val sumSecond = y + sumPairs(zs)
in
(sumFirst, sumSecond) <how would I return this as a tuple or list?>
end;
sumPairs(t);
The problem is not with (sumFirst, sumSecond) or with let specifically, but with the rest of your code.
The base case and the recursions say that sumPairs produces an int, not a pair of ints.
Because of this, there is a conflict when you try produce a pair.
Your base case should be (0,0), not 0, since it must be a pair.
You also need to deconstruct the result from the recursion since that produces a pair, not an integer.
Like this
fun sumPairs nil = (0, 0)
| sumPairs ((x,y)::zs) =
let
val (sumFirst, sumSecond) = sumPairs zs
in
(x + sumFirst, y + sumSecond)
end;

Why doesn't this Fibonacci Number function work in O(log N)?

So the Fibonacci number for log (N) — without matrices.
Ni // i-th Fibonacci number
= Ni-1 + Ni-2 // by definition
= (Ni-2 + Ni-3) + Ni-2 // unwrap Ni-1
= 2*Ni-2 + Ni-3 // reduce the equation
= 2*(Ni-3 + Ni-4) + Ni-3 //unwrap Ni-2
// And so on
= 3*Ni-3 + 2*Ni-4
= 5*Ni-4 + 3*Ni-5
= 8*Ni-5 + 5*Ni-6
= Nk*Ni-k + Nk-1*Ni-k-1
Now we write a recursive function, where at each step we take k~=I/2.
static long N(long i)
{
if (i < 2) return 1;
long k=i/2;
return N(k) * N(i - k) + N(k - 1) * N(i - k - 1);
}
Where is the fault?
You get a recursion formula for the effort: T(n) = 4T(n/2) + O(1). (disregarding the fact that the numbers get bigger, so the O(1) does not even hold). It's clear from this that T(n) is not in O(log(n)). Instead one gets by the master theorem T(n) is in O(n^2).
Btw, this is even slower than the trivial algorithm to calculate all Fibonacci numbers up to n.
The four N calls inside the function each have an argument of around i/2. So the length of the stack of N calls in total is roughly equal to log2N, but because each call generates four more, the bottom 'layer' of calls has 4^log2N = O(n2) Thus, the fault is that N calls itself four times. With only two calls, as in the conventional iterative method, it would be O(n). I don't know of any way to do this with only one call, which could be O(log n).
An O(n) version based on this formula would be:
static long N(long i) {
if (i<2) {
return 1;
}
long k = i/2;
long val1;
long val2;
val1 = N(k-1);
val2 = N(k);
if (i%2==0) {
return val2*val2+val1*val1;
}
return val2*(val2+val1)+val1*val2;
}
which makes 2 N calls per function, making it O(n).
public class fibonacci {
public static int count=0;
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int i = scan.nextInt();
System.out.println("value of i ="+ i);
int result = fun(i);
System.out.println("final result is " +result);
}
public static int fun(int i) {
count++;
System.out.println("fun is called and count is "+count);
if(i < 2) {
System.out.println("function returned");
return 1;
}
int k = i/2;
int part1 = fun(k);
int part2 = fun(i-k);
int part3 = fun(k-1);
int part4 = fun(i-k-1);
return ((part1*part2) + (part3*part4)); /*RESULT WILL BE SAME FOR BOTH METHODS*/
//return ((fun(k)*fun(i-k))+(fun(k-1)*fun(i-k-1)));
}
}
I tried to code to problem defined by you in java. What i observed is that complexity of above code is not completely O(N^2) but less than that.But as per conventions and standards the worst case complexity is O(N^2) including some other factors like computation(division,multiplication) and comparison time analysis.
The output of above code gives me information about how many times the function
fun(int i) computes and is being called.
OUTPUT
So including the time taken for comparison and division, multiplication operations, the worst case time complexity is O(N^2) not O(LogN).
Ok if we use Analysis of the recursive Fibonacci program technique.Then we end up getting a simple equation
T(N) = 4* T(N/2) + O(1)
where O(1) is some constant time.
So let's apply Master's method on this equation.
According to Master's method
T(n) = aT(n/b) + f(n) where a >= 1 and b > 1
There are following three cases:
If f(n) = Θ(nc) where c < Logba then T(n) = Θ(nLogba)
If f(n) = Θ(nc) where c = Logba then T(n) = Θ(ncLog n)
If f(n) = Θ(nc) where c > Logba then T(n) = Θ(f(n))
And in our equation a=4 , b=2 & c=0.
As case 1 c < logba => 0 < 2 (which is log base 2 and equals to 2) is satisfied
hence T(n) = O(n^2).
For more information about how master's algorithm works please visit: Analysis of Algorithms
Your idea is correct, and it will perform in O(log n) provided you don't compute the same formula
over and over again. The whole point of having N(k) * N(i-k) is to have (k = i - k) so you only have to compute one instead of two. But if you only call recursively, you are performing the computation twice.
What you need is called memoization. That is, store every value that you already have computed, and
if it comes up again, then you get it in O(1).
Here's an example
const int MAX = 10000;
// memoization array
int f[MAX] = {0};
// Return nth fibonacci number using memoization
int fib(int n) {
// Base case
if (n == 0)
return 0;
if (n == 1 || n == 2)
return (f[n] = 1);
// If fib(n) is already computed
if (f[n]) return f[n];
// (n & 1) is 1 iff n is odd
int k = n/2;
// Applying your formula
f[n] = fib(k) * fib(n - k) + fib(k - 1) * fib(n - k - 1);
return f[n];
}

Swit map: error: cannot invoke 'map' with an argument list of type '((_) -> _)'

I can't understand why this one works:
var arr = [4,5,6,7]
arr.map() {
x in
return x + 2
}
while this one not
arr.map() {
x in
var y = x + 2
return y
}
with error
Playground execution failed: MyPlayground.playground:13:5: error:
cannot invoke 'map' with an argument list of type '((_) -> _)'
arr.map() {
The problem here is there error message. In general, when you see something like cannot invoke .. with ... it means that the compiler's type inference has just not worked.
In this case, you've run up against one of the limitations of inference within closures. Swift can infer the type of single-statement closures only, not multiple-statement ones. In your first example:
arr.map() {
x in
return x + 2
}
There's actually only one statement: return x + 2. However, in the second:
arr.map() {
x in
var y = x + 2
return y
}
There's an assignment statement (var y = x + 2), and then the return. So the error is a little misleading: it doesn't mean you "can't invoke map() with this type of argument", what it means to say is "I can't figure out what type x or y is".
By the way, in single-statement closures, there are two other things that can be inferred. The return statement:
arr.map() {
x in
x + 2
}
And the variable name itself:
arr.map() { $0 + 2 }
It all produces the same compiled code, though. So it's really a matter of taste which one you choose. (For instance, while I think the inferred return looks clean and easier to read, I don't like the $0, so I generally always put x in or something, even for very short closures. It's up to you, though, obviously.)
One final thing: since this is all really just syntax stuff, it's worth noting that the () isn't needed either:
arr.map { x in x + 2 }
As #MartinR pointed out, the compiler can infer some types from outer context as well:
let b: [Int] = arr.map { x in
var y = x + 2
return y
}
Which is worth bearing in mind. (it seems that the "one-statement" rule only applies when there's no other type info available)
Swift can't infer type every time. Even though it should see that y = x + 2 means y is an Int too. My guess is that Swift parses the closure in a certain order that makes it not aware of the return type ahead of time in your case.
This works:
arr.map() {
x -> Int in
var y = x + 2
return y
}

using Array.Parallel.map for decreasing running time

Hello everyone
I have converted a project in C# to F# that paints the Mandelbrot set.
Unfortunately does it take around one minute to render a full screen so I am try to find some ways to speed it up.
It is one call that take almost all of the time:
Array.map (fun x -> this.colorArray.[CalcZ x]) xyArray
xyArray (double * double) [] => (array of tuple of double)
colorArray is an array of int32 length = 255
CalcZ is defined as:
let CalcZ (coord:double * double) =
let maxIterations = 255
let rec CalcZHelper (xCoord:double) (yCoord:double) // line break inserted
(x:double) (y:double) iters =
let newx = x * x + xCoord - y * y
let newy = 2.0 * x * y + yCoord
match newx, newy, iters with
| _ when Math.Abs newx > 2.0 -> iters
| _ when Math.Abs newy > 2.0 -> iters
| _ when iters = maxIterations -> iters
| _ -> CalcZHelper xCoord yCoord newx newy (iters + 1)
CalcZHelper (fst coord) (snd coord) (fst coord) (snd coord) 0
As I only use around half of the processor capacity is an idea to use more threads and specifically Array.Parallel.map, translates to system.threading.tasks.parallel
Now my question
A naive solution, would be:
Array.Parallel.map (fun x -> this.colorArray.[CalcZ x]) xyArray
but that took twice the time, how can I rewrite this to take less time, or can I take some other way to utilize the processor better?
Thanks in advance
Gorgen
---edit---
the function that is calling CalcZ looks like this:
let GetMatrix =
let halfX = double bitmap.PixelWidth * scale / 2.0
let halfY = double bitmap.PixelHeight * scale / 2.0
let rect:Mandelbrot.Rectangle =
{xMax = centerX + halfX; xMin = centerX - halfX;
yMax = centerY + halfY; yMin = centerY - halfY;}
let size:Mandelbrot.Size =
{x = bitmap.PixelWidth; y = bitmap.PixelHeight}
let xyList = GenerateXYTuple rect size
let xyArray = Array.ofList xyList
Array.map (fun x -> this.colorArray.[CalcZ x]) xyArray
let region:Int32Rect = new Int32Rect(0,0,bitmap.PixelWidth,bitmap.PixelHeight)
bitmap.WritePixels(region, GetMatrix, bitmap.PixelWidth * 4, region.X, region.Y);
GenerateXYTuple:
let GenerateXYTuple (rect:Rectangle) (pixels:Size) =
let xStep = (rect.xMax - rect.xMin)/double pixels.x
let yStep = (rect.yMax - rect.yMin)/double pixels.y
[for column in 0..pixels.y - 1 do
for row in 0..pixels.x - 1 do
yield (rect.xMin + xStep * double row,
rect.yMax - yStep * double column)]
---edit---
Following a suggestion from kvb (thanks a lot!) in a comment to my question, I built the program in Release mode. Building in the Relase mode generally speeded up things.
Just building in Release took me from 50s to around 30s, moving in all transforms on the array so it all happens in one pass made it around 10 seconds faster. At last using the Array.Parallel.init brought me to just over 11 seconds.
What I learnt from this is.... Use the release mode when timing things and using parallel constructs...
One more time, thanks for the help I have recieved.
--edit--
by using SSE assember from a native dll I have been able to slash the time from around 12 seconds to 1.2 seconds for a full screen of the most computational intensive points. Unfortunately I don't have a graphics processor...
Gorgen
Per the comment on the original post, here is the code I wrote to test the function. The fast version only takes a few seconds on my average workstation. It is fully sequential, and has no parallel code.
It's moderately long, so I posted it on another site: http://pastebin.com/Rjj8EzCA
I'm suspecting that the slowdown you are seeing is in the rendering code.
I don't think that the Array.Parallel.map function (which uses Parallel.For from .NET 4.0 under the cover) should have trouble parallelizing the operation if it runs a simple function ~1 million times. However, I encountered some weird performance behavior in a similar case when F# didn't optimize the call to the lambda function (in some way).
I'd try taking a copy of the Parallel.map function from the F# sources and adding inline. Try adding the following map function to your code and use it instead of the one from F# libraries:
let inline map (f: 'T -> 'U) (array : 'T[]) : 'U[]=
let inputLength = array.Length
let result = Array.zeroCreate inputLength
Parallel.For(0, inputLength, fun i ->
result.[i] <- f array.[i]) |> ignore
result
As an aside, it looks like you're generating an array of coordinates and then mapping it to an array of results. You don't need to create the coordinate array if you use the init function instead of map: Array.Parallel.init 1000 (fun y -> Array.init 1000 (fun x -> this.colorArray.[CalcZ (x, y)]))
EDIT: The following may be inaccurate:
Your problem could be that you call a tiny function a million times, causing the scheduling overhead to overwhelm that actual work you're doing. You should partition the array into much larger chunks so that each individual task takes a millisecond or so. You can use an array of arrays so that you would call Array.Parallel.map on the outer arrays and Array.map on the inner arrays. That way each parallel operation will operate on a whole row of pixels instead of just a single pixel.

Unwrapping nested loops in F#

I've been struggling with the following code. It's an F# implementation of the Forward-Euler algorithm used for modelling stars moving in a gravitational field.
let force (b1:Body) (b2:Body) =
let r = (b2.Position - b1.Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
if (b1 = b2) then
VectorFloat.Zero
else
r * (b1.Mass * b2.Mass) / (Math.Sqrt((float)rm) * (float)rm)
member this.Integrate(dT, (bodies:Body[])) =
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let f = force bodies.[i] bodies.[j]
bodies.[i].Acceleration <- bodies.[i].Acceleration + (f / bodies.[i].Mass)
bodies.[j].Acceleration <- bodies.[j].Acceleration - (f / bodies.[j].Mass)
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
While this works it isn't exactly "functional". It also suffers from horrible performance, it's 2.5 times slower than the equivalent c# code. bodies is an array of structs of type Body.
The thing I'm struggling with is that force() is an expensive function so usually you calculate it once for each pair and rely on the fact that Fij = -Fji. But this really messes up any loop unfolding etc.
Suggestions gratefully received! No this isn't homework...
Thanks,
Ade
UPDATED: To clarify Body and VectorFloat are defined as C# structs. This is because the program interops between F#/C# and C++/CLI. Eventually I'm going to get the code up on BitBucket but it's a work in progress I have some issues to sort out before I can put it up.
[StructLayout(LayoutKind.Sequential)]
public struct Body
{
public VectorFloat Position;
public float Size;
public uint Color;
public VectorFloat Velocity;
public VectorFloat Acceleration;
'''
}
[StructLayout(LayoutKind.Sequential)]
public partial struct VectorFloat
{
public System.Single X { get; set; }
public System.Single Y { get; set; }
public System.Single Z { get; set; }
}
The vector defines the sort of operators you'd expect for a standard Vector class. You could probably use the Vector3D class from the .NET framework for this case (I'm actually investigating cutting over to it).
UPDATE 2: Improved code based on the first two replies below:
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let r = ( bodies.[j].Position - bodies.[i].Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
let f = r / (Math.Sqrt((float)rm) * (float)rm)
bodies.[i].Acceleration <- bodies.[i].Acceleration + (f * bodies.[j].Mass)
bodies.[j].Acceleration <- bodies.[j].Acceleration - (f * bodies.[i].Mass)
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
The branch in the force function to cover the b1 == b2 case is the worst offender. You do't need this if softeningLength is always non-zero, even if it's very small (Epsilon). This optimization was in the C# code but not the F# version (doh!).
Math.Pow(x, -1.5) seems to be a lot slower than 1/ (Math.Sqrt(x) * x). Essentially this algorithm is slightly odd in that it's perfromance is dictated by the cost of this one step.
Moving the force calculation inline and getting rid of some divides also gives some improvement, but the performance was really being killed by the branching and is dominated by the cost of Sqrt.
WRT using classes over structs: There are cases (CUDA and native C++ implementations of this code and a DX9 renderer) where I need to get the array of bodies into unmanaged code or onto a GPU. In these scenarios being able to memcpy a contiguous block of memory seems like the way to go. Not something I'd get from an array of class Body.
I'm not sure if it's wise to rewrite this code in a functional style. I've seen some attempts to write pair interaction calculations in a functional manner and each one of them was harder to follow than two nested loops.
Before looking at structs vs. classes (I'm sure someone else has something smart to say about this), maybe you can try optimizing the calculation itself?
You're calculating two acceleration deltas, let's call them dAi and dAj:
dAi = r*m1*m2/(rm*sqrt(rm)) / m1
dAj = r*m1*m2/(rm*sqrt(rm)) / m2
[note: m1 = bodies.[i].mass, m2=bodies.[j].mass]]
The division by mass cancels out like this:
dAi = rm2 / (rmsqrt(rm))
dAj = rm1 / (rmsqrt(rm))
Now you only have to calculate r/(rmsqrt(rm)) for each pair (i,j).
This can be optimized further, because 1/(rmsqrt(rm)) = 1/(rm^1.5) = rm^-1.5, so if you let r' = r * (rm ** -1.5), then Edit: no it can't, that's premature optimization talking right there (see comment). Calculating r' = 1.0 / (r * sqrt r) is fastest.
dAi = m2 * r'
dAj = m1 * r'
Your code would then become something like
member this.Integrate(dT, (bodies:Body[])) =
for i = 0 to bodies.Length - 1 do
for j = (i + 1) to bodies.Length - 1 do
let r = (b2.Position - b1.Position)
let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
let r' = r * (rm ** -1.5)
bodies.[i].Acceleration <- bodies.[i].Acceleration + r' * bodies.[j].Mass
bodies.[j].Acceleration <- bodies.[j].Acceleration - r' * bodies.[i].Mass
bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT
Look, ma, no more divisions!
Warning: untested code. Try at your own risk.
I'd like to play arround with your code, but it's difficult since the definition of Body and FloatVector is missing and they also seem to be missing from the orginal blog post you point to.
I'd hazard a guess that you could improve your performance and rewrite in a more functional style using F#'s lazy computations:
http://msdn.microsoft.com/en-us/library/dd233247(VS.100).aspx
The idea is fairly simple you wrap any expensive computation that could be repeatedly calculated in a lazy ( ... ) expression then you can force the computation as many times as you like and it will only ever be calculated once.

Resources