Why does my program run using so much memory? - erlang

I'm reading a book on Erlang and I make simple example from the book.
%% exrs.erl
-module(exrs).
-export([sum/1]).
sum(0) -> 0;
sum(N) -> N + sum(N - 1).
When I run this example for large number (i.g. 1000000000) it use 16Gb RAM and 48Gb swap file on my PC for calculation this function.
1> exrs:sum(1000000000).
Is this a usual behavior for Erlang VM? And how to avoid the problem like that?
PS:
10> erlang:system_info(version).
"11.1"
11> erlang:system_info(otp_release).
"23"
OS: Win10 x64

As said in other answers, your recursion is not tail optimized. What happens in your code is that erlang evaluates right side of expression and recursively appends new function call to stack. Like below
1_000_000 + sum(999_999 + sum(999_998 + sum(....)))
That is what eats your memory. The proper way is to write function that accepts accumulator as second argument of sum function, like this
-module(exrs).
-export([sum/2]).
sum(0, ACC) -> ACC;
sum(N, ACC) -> sum(N - 1, ACC + N).

Your recursive function can't make use of tail-call optimisation, so it's using a stack frame for each recursive call.
1,000,000,000 recursive calls is a lot of stack frames.
See, for example, this section of "Learn you some Erlang", for more details.

Related

F#: inefficient sequence processing

I have the following code to perform the Sieve of Eratosthenes in F#:
let sieveOutPrime p numbers =
numbers
|> Seq.filter (fun n -> n % p <> 0)
let primesLessThan n =
let removeFirstPrime = function
| s when Seq.isEmpty s -> None
| s -> Some(Seq.head s, sieveOutPrime (Seq.head s) (Seq.tail s))
let remainingPrimes =
seq {3..2..n}
|> Seq.unfold removeFirstPrime
seq { yield 2; yield! remainingPrimes }
This is excruciatingly slow when the input to primesLessThan is remotely large: primes 1000000 |> Seq.skip 1000;; takes nearly a minute for me, though primes 1000000 itself is naturally very fast because it's just a Sequence.
I did some playing around, and I think that the culprit has to be that Seq.tail (in my removeFirstPrime) is doing something intensive. According to the docs, it's generating a completely new sequence, which I could imagine being slow.
If this were Python and the sequence object were a generator, it would be trivial to ensure that nothing expensive happens at this point: just yield from the sequence, and we've cheaply dropped its first element.
LazyList in F# doesn't seem to have the unfold method (or, for that matter, the filter method); otherwise I think LazyList would be the thing I wanted.
How can I make this implementation fast by preventing unnecessary duplications/recomputations? Ideally primesLessThan n |> Seq.skip 1000 would take the same amount of time regardless of how large n was.
Recursive solutions and sequences don't go well together (compare the answers here, it's very much the same pattern you are using). You might want to inspect the generated code, but I'd just consider this a rule of thumb.
LazyList (as defined in FSharpX) does of course come with unfold and filter defined, it would have been quite bizarre if it didn't. Typically in F# code this sort of functionality is provided in separate modules rather than as instance members on the type itself, a convention that does seem to confuse most of those documentation systems.
As you probably know Seq is a lazily evaluated collection. Sieve algorithm is about filtering out non-primes from a sequence so that you don't have to consider them again.
However, when you combine Sieve with a lazily evaluated collection you end up do the filtering of the same non-primes over and over again.
You see much better performance if you switch from Seq to Array or List because of the non-lazy aspect of those collections means that you only filter non-primes once.
One way to improve performance in your code is to introduce caching.
let removeFirstPrime s =
let s = s |> Seq.cache
match s with
| s when Seq.isEmpty s -> None
| s -> Some(Seq.head s, sieveOutPrime (Seq.head s) (Seq.tail s))
I implemented a LazyList that works alot like Seq that allows me to count the number of evaluations:
For all primes up to 2000.
Without caching: 14753706 evaluations
With caching: 97260 evaluations
Of course if you really need performance you use a mutable array implementation.
PS. Performance metrics
Running 'seq' ...
it took 271 ms with cc (16, 4, 0), the result is: 1013507
Running 'list' ...
it took 14 ms with cc (16, 0, 0), the result is: 1013507
Running 'array' ...
it took 14 ms with cc (10, 0, 0), the result is: 1013507
Running 'mutable' ...
it took 0 ms with cc (0, 0, 0), the result is: 1013507
This is Seq with caching. Seq in F# has rather high overhead, there are interesting lazy alternatives to Seq like Nessos.
List and Array run roughly similar but because of the more compact memory layout the GC metrics are better for Array (10 cc0 collections for Array vs 16 cc0 collections for List). Seq has worse GC metrics in that it forced 4 cc1 collections.
The mutable implementation of sieve algorithm has better memory and performance metrics by a large margin.

Make pairs out of a list in erlang

I'm trying to do a process on items in a sorted set in erlang, I call ZRANGE KEY 0 -1 WITHSCORES with eredis, the problem is it returns something like [<<"item1">>, <<"100">>, <<"item2">>, <<"200">>]. How can I run a function f on these items efficiently so that these calls occur: f(<<"item1">>, <<"100">>), f(<<"item2">>, <<"200">>)?
I solved it with something like this
f([X,Y|T]) -> [do_the_job(X,Y)|f(T)];
f([]) -> [].
then calling:
f(List).
Is there a more efficient way for doing so?
An optimized way is using tail-recursion. You can pass your list into do/1 function and it generates an empty list for storing the result of applying f/2 function on each two head items of the given list and then return the results:
do(List) ->
do(List, []).
do([X,Y | Tail], Acc) ->
do(Tail, [f(X, Y) | Acc]);
do([], Acc) ->
lists:reverse(Acc).
f(X, Y) ->
{X, Y}.
A note from Erlang documentation about tail-recursive efficiency:
In most cases, a recursive function uses more words on the stack for each recursion than the number of words a tail-recursive would allocate on the heap. As more memory is used, the garbage collector is invoked more frequently, and it has more work traversing the stack.

Is it possible to create an unbound variable in Erlang?

I'm a completely new to erlang. As an exercise to learn the language, I'm trying to implement the function sublist using tail recursion and without using reverse. Here's the function that I took from this site http://learnyousomeerlang.com/recursion:
tail_sublist(L, N) -> reverse(tail_sublist(L, N, [])).
tail_sublist(_, 0, SubList) -> SubList;
tail_sublist([], _, SubList) -> SubList;
tail_sublist([H|T], N, SubList) when N > 0 ->
tail_sublist(T, N-1, [H|SubList]).
It seems the use of reverse in erlang is very frequent.
In Mozart/Oz, it's very easy to create such the function using unbound variables:
proc {Sublist Xs N R}
if N>0 then
case Xs
of nil then
R = nil
[] X|Xr then
Unbound
in
R = X|Unbound
{Sublist Xr N-1 Unbound}
end
else
R=nil
end
end
Is it possible to create a similar code in erlang? If not, why?
Edit:
I want to clarify something about the question. The function in Oz doesn't use any auxiliary function (no append, no reverse, no anything external or BIF). It's also built using tail recursion.
When I ask if it's possible to create something similar in erlang, I'm asking if it's possible to implement a function or set of functions in erlang using tail recursion, and iterating over the initial list only once.
At this point, after reading your comments and answers, I'm doubtful that it can be done, because erlang doesn't seem to support unbound variables. It seems that all variables need to be assigned to value.
Short Version
No, you can't have a similar code in Erlang. The reason is because in Erlang variables are Single assignment variables.
Unbound Variables are simply not allowed in Erlang.
Long Version
I can't imagine a tail recursive function similar to the one you presenting above due to differences at paradigm level of the two languages you are trying to compare.
But nevertheless it also depends of what you mean by similar code.
So, correct me if I am wrong, the following
R = X|Unbound
{Sublist Xr N-1 Unbound}
Means that the attribution (R=X|Unbound) will not be executed until the recursive call returns the value of Unbound.
This to me looks a lot like the following:
sublist(_,0) -> [];
sublist([],_) -> [];
sublist([H|T],N)
when is_integer(N) ->
NewTail = sublist(T,N-1),
[H|NewTail].
%% or
%%sublist([H|T],N)
%% when is_integer(N) -> [H|sublist(T,N-1)].
But this code isn't tail recursive.
Here's a version that uses appends along the way instead of a reverse at the end.
subl(L, N) -> subl(L, N, []).
subl(_, 0, Accumulator) ->
Accumulator;
subl([], _, Accumulator) ->
Accumulator;
subl([H|T], N, Accumulator) ->
subl(T, N-1, Accumulator ++ [H]).
I would not say that "the use of reverse in Erlang is very frequent". I would say that the use of reverse is very common in toy problems in functional languages where lists are a significant data type.
I'm not sure how close to your Oz code you're trying to get with your "is it possible to create a similar code in Erlang? If not, why?" They are two different languages and have made many different syntax choices.

Choosing between continuation passing style and memoization

While writing up examples for memoization and continuation passing style (CPS) functions in a functional language, I ended up using the Fibonacci example for both. However, Fibonacci doesn't really benefit from CPS, as the loop still has to run exponentially often, while with memoization its only O(n) the first time and O(1) every following time.
Combining both CPS and memoization has a slight benefit for Fibonacci, but are there examples around where CPS is the best way that prevents you from running out of stack and improves performance and where memoization is not a solution?
Or: is there any guideline for when to choose one over the other or both?
On CPS
While CPS is useful as an intermediate language in a compiler, on the source language level it is mostly a device to (1) encode sophisticated control flow (not really performance-related) and (2) transform a non-tail-call consuming stack space into a continuation-allocating tail-call consuming heap space. For example when you write (code untested)
let rec fib = function
| 0 | 1 -> 1
| n -> fib (n-1) + fib (n-2)
let rec fib_cps n k = match n with
| 0 | 1 -> k 1
| n -> fib_cps (n-1) (fun a -> fib_cps (n-2) (fun b -> k (a+b)))
The previous non-tail-call fib (n-2), which allocated a new stack frame, is turned into the tail-call fib (n-2) (fun b -> k (a+b)) which allocates the closure fun b -> k (a+b) (on the heap) to pass it as argument.
This does not asymptotically reduce the memory usage of your program (some further domain-specific optimizations might, but that's another story). You're just trading stack space for heap space, which is interesting on systems where stack space is severely limited by the OS (not the case with some implementations of ML such as SML/NJ, which track their call stack on the heap instead of using the system stack), and potentially performance-degrading because of the additional GC costs and potential locality decrease.
CPS transformation is unlikely to improve performances much (though details of your implementation and runtime systems might make it so), and is a generally applicable transformation that allows to avoid the snark "Stack Overflow" of recursive functions with a deep call stack.
On Memoization
Memoization is useful to introduce sharing of subcalls of recursive functions. A recursive function typically solves a "problem" ("compute fibonacci of n", etc.) by decomposing it into several strictly simpler "sub-problems" (the recursive subcalls), with some base cases for which the problem is solvable right away.
For any recursive function (or recursive formulation of a problem), you can observe the structure of the subproblem space. Which simpler instances of Fib(k) will Fib(n) need to return its result? Which simpler instances will those instances in turn need?
In the general case, this space of subproblem is a graph (generally acyclic for termination purposes): there are some nodes that have several parents, that is several distinct problems for which they are subproblems. For example, Fib(n-2) is a subproblem both of Fib(n) and Fib(n-2). The amount of node sharing in this graph depends on the particular problem / recursive functions. In the case of Fibonacci, all nodes are shared between two parents, so there is a lot of sharing.
A direct recursive call without memoization will not be able observe this sharing. The structure of the calls of a recursive function is a tree, not a graph, and shared subproblems such as Fib(n-2) will be fully visited several times (as many as there are paths from the starting node to the subproblem node in the graph). Memoization introduces sharing by letting some subcalls return directly with "we've already computed this node and here is the result". For problems that have a lot of sharing, this can result in dramatic reduction of (useless) computation: Fibonacci moves from exponential to linear complexity when memoization is introduced -- note that there are other ways to write the function, without using memoization but a different subcalls structure, to have a linear complexity.
let rec fib_pair = function
| 0 -> (1,1)
| n -> let (u,v) = fib_pair (n-1) in (v,u+v)
The technique of using some form of sharing (usually through large tables storing the results) to avoid useless duplication of subcomputations is well-known in the algorithmic community, it is called Dynamic Programming. When you recognize that a problem is amenable to this treatment (you notice the sharing among the subproblems), this can provide large performance benefits.
Does a comparison make sense?
The two seems mostly independent of each other.
There are a lot of problems where memoization is not applicable, because the subproblem graph structure does not have any sharing (it is a tree). On the contrary, CPS transformation is applicable for all recursive functions, but does not by itself lead to performance benefits (other than potential constant factors due to the particular implementation and runtime system you're using, though they're likely to make the code slower rather than faster).
Improving performances by inspecting non-tail contexts
There are optimizations technique that is related to CPS that can improve performance of recursive functions. They consist in looking at the computations "left to be done" after the recursive calls (what would be turned into a function in straightforward CPS style) and finding an alternate, more efficient representation for it, that does not result in systematic closure allocation. Consider for example:
let rec length = function
| [] -> 0
| _::t -> 1 + length t
let rec length_cps li k = match li with
| [] -> k 0
| _::t -> length_cps t (fun a -> k (a + 1))
You can notice that the context of the non-recursive call, namely [_ + 1], has a simple structure: it adds an integer. Instead of representing this as a function fun a -> k (a+1), you may just store the integer to be added corresponding to several application of this function, making k an integer instead of a function.
let rec length_acc li k = match li with
| [] -> k + 0
| _::t -> length_acc t (k + 1)
This function runs in constant, rather than linear, space. By turning the representation of the tail contexts from functions to integers, we have eliminated the allocation step that made memory usage linear.
Close inspection of the order in which the additions are performed will reveal that they are now performed in a different direction: we are adding the 1's corresponding to the beginning of the list first, while the cps version was adding them last. This order-reversal is valid because _ + 1 is an associative operation (if you have several nested contexts, foo + 1 + 1 + 1, it is valid to start compute them either from the inside, ((foo+1)+1)+1, or the outside, foo+(1+(1+1))). The optimization above can be used for all such "associative" contexts around a non-tail-call.
There are certainly other optimizations available based on the same idea (I'm no expert on such optimizations), which is to look at the structure of the continuations involved and represent them under a more efficient form than functions allocated on the heap.
This is related to the transformation of "defunctionalization" that changes the representation of continuations from functions to data-structures, without changing the memory consumption (a defunctionalized program will allocate a data node when a closure would have been allocated in the original program), but allows to express the tail-recursive CPS version in a first-order language (without first-class functions) and can be more efficient on systems where data structures and pattern-matching is more efficient than closure allocation and indirect calls.
type length_cont =
| Linit
| Lcons of length_cont
let rec length_cps_defun li k = match li with
| [] -> length_cont_eval 0 k
| _::t -> length_cps_defun t (Lcons k)
and length_cont_eval acc = function
| Linit -> acc
| Lcons k -> length_cont_eval (acc+1) k
let length li = length_cps_defun li Linit
type fib_cont =
| Finit
| Fminus1 of int * fib_cont
| Fminus2 of fib_cont * int
let rec fib_cps_defun n k = match n with
| 0 | 1 -> fib_cont_eval 1 k
| n -> fib_cps_defun (n-1) (Fminus1 (n, k))
and fib_cont_eval acc = function
| Finit -> acc
| Fminus1 (n, k) -> fib_cps_defun (n-2) (Fminus2 (k, acc))
| Fminus2 (k, acc') -> fib_cont_eval (acc+acc') k
let fib n = fib_cps_defun n Finit
One benefit of CPS is error handling. If you need to fail you just call your failure method.
I think the biggest situation is when you are not talking about calculations, where memoization is great. If you are instead talking about IO or other operations, the benefits of CPS are there but memoization doesn't work.
As to an instance where CPS and memoization are both applicable and CPS is better, I am not sure since I consider them different pieces of functionality.
Finally CPS is a bit diminished in F# since tail recursion makes the whole stack overflow thing a non-issue already.

How do I know if a function is tail recursive in F#

I wrote the follwing function:
let str2lst str =
let rec f s acc =
match s with
| "" -> acc
| _ -> f (s.Substring 1) (s.[0]::acc)
f str []
How can I know if the F# compiler turned it into a loop? Is there a way to find out without using Reflector (I have no experience with Reflector and I Don't know C#)?
Edit: Also, is it possible to write a tail recursive function without using an inner function, or is it necessary for the loop to reside in?
Also, Is there a function in F# std lib to run a given function a number of times, each time giving it the last output as input? Lets say I have a string, I want to run a function over the string then run it again over the resultant string and so on...
Unfortunately there is no trivial way.
It is not too hard to read the source code and use the types and determine whether something is a tail call by inspection (is it 'the last thing', and not in a 'try' block), but people second-guess themselves and make mistakes. There's no simple automated way (other than e.g. inspecting the generated code).
Of course, you can just try your function on a large piece of test data and see if it blows up or not.
The F# compiler will generate .tail IL instructions for all tail calls (unless the compiler flags to turn them off is used - used for when you want to keep stack frames for debugging), with the exception that directly tail-recursive functions will be optimized into loops. (EDIT: I think nowadays the F# compiler also fails to emit .tail in cases where it can prove there are no recursive loops through this call site; this is an optimization given that the .tail opcode is a little slower on many platforms.)
'tailcall' is a reserved keyword, with the idea that a future version of F# may allow you to write e.g.
tailcall func args
and then get a warning/error if it's not a tail call.
Only functions that are not naturally tail-recursive (and thus need an extra accumulator parameter) will 'force' you into the 'inner function' idiom.
Here's a code sample of what you asked:
let rec nTimes n f x =
if n = 0 then
x
else
nTimes (n-1) f (f x)
let r = nTimes 3 (fun s -> s ^ " is a rose") "A rose"
printfn "%s" r
I like the rule of thumb Paul Graham formulates in On Lisp: if there is work left to do, e.g. manipulating the recursive call output, then the call is not tail recursive.

Resources