Why does this F# expression stack overflow?

let ints = [1..40000]
// create [{1};{2};.....{40000}]
let a1 = ints |> List.map Seq.singleton
// tail recursively append all the inner list
let a2 = a1 |> List.fold Seq.append Seq.empty
// tail recursively loop through them
let a3 = a2 |> Seq.forall (fun x -> true) // stack overflow...why?
My reason for asking is a concern that I have code that will recursively append, and I need to be sure it won't blow up, so I ran this example in order to establish what was going on, both in debug and running as an app.

The first thing to note is that the function causing the SO exception is:
let a2 = a1 |> List.fold Seq.append Seq.empty
but you don't see the SO until you evaluate the next line because sequences are lazily evaluated.
Because you are using Seq.append, each new item you add to your sequence creates a new sequence which contains the previous sequence. You can construct a similar sequence directly like so:
> seq {
    yield! seq {
        yield! seq {
            yield 1
        }
        yield 2
    }
    yield 3
  }
val it : seq<int> = seq [1; 2; 3]
Notice how, to get to the very first item (1) you have to go to depth 3 of the sequence. In your case that would be depth 40000. The sequence isn't tail recursive, so each level of the sequence ends up as a stack frame when iterating it.
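One way to avoid the nesting described above is to flatten the inner sequences in a single step rather than appending them one at a time. The following is a minimal sketch, assuming the goal is simply one sequence containing every inner element; the names `flat` and `allOk` are made up for illustration.

```fsharp
let ints = [1..40000]

// Seq.concat flattens all inner sequences in one step, so the result
// is a single flat sequence rather than 40000 nested appends.
let flat = ints |> List.map Seq.singleton |> Seq.concat

// Enumerating the flat sequence does not recurse through nested enumerators.
let allOk = flat |> Seq.forall (fun _ -> true)
```

`Seq.collect Seq.singleton ints` would build the same flat sequence without the intermediate list.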

Related

In F#, how to get head/tail of a seq without re-evaluating the seq

I'm reading a file and I want to do something with the first line, and something else with all the other lines
let lines = System.IO.File.ReadLines "filename.txt" |> Seq.map (fun r -> r.Trim())
let head = Seq.head lines
let tail = Seq.tail lines
Problem: the call to tail fails because the TextReader is closed.
This means the Seq is evaluated twice: once to get the head and once to get the tail.
How can I get the first line and the remaining lines, while keeping a Seq and without re-evaluating it?
The signature could be, for example:
let fn: ('a -> seq<'a> -> 'b) -> seq<'a> -> 'b
The easiest thing to do is probably just using Seq.cache to wrap your lines sequence:
let lines =
    System.IO.File.ReadLines "filename.txt"
    |> Seq.map (fun r -> r.Trim())
    |> Seq.cache
Of note from the documentation:
This result sequence will have the same elements as the input sequence. The result can be enumerated multiple times. The input sequence is enumerated at most once and only as far as is necessary. Caching a sequence is typically useful when repeatedly evaluating items in the original sequence is computationally expensive or if iterating the sequence causes side-effects that the user does not want to be repeated multiple times.
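A function with the shape the question asks for could be sketched on top of Seq.cache as follows. The name `withHeadAndTail` and the counting sequence are hypothetical, used only to show that the source is walked once; with a file, the underlying reader is assumed to stay open until the cached sequence has been read to the end.

```fsharp
// Hypothetical helper: passes the first element and the remaining
// elements to f, enumerating the source at most once via Seq.cache.
let withHeadAndTail (f: 'a -> seq<'a> -> 'b) (source: seq<'a>) : 'b =
    let cached = Seq.cache source
    f (Seq.head cached) (Seq.tail cached)

// Usage with an in-memory sequence that counts how many items it produces.
let produced = ref 0
let numbers =
    seq {
        for i in 1..3 do
            produced.Value <- produced.Value + 1
            yield i
    }

let result = withHeadAndTail (fun h t -> h, List.ofSeq t) numbers
```

Here `result` is the head paired with the tail as a list, and `produced` ends at 3 because each source element is generated only once.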
I generally use a seq expression in which the Stream is scoped inside the expression. That will allow you to enumerate the sequence fully before the stream is disposed. I usually use a function like this:
open System.IO

let readLines file =
    seq {
        use stream = File.OpenText file
        while not stream.EndOfStream do
            yield stream.ReadLine().Trim()
    }
Then you should be able to call Seq.head and get the first line in the file, and Seq.last to get the last line in the file. I think this will technically create two different enumerators though. If you want to only read the file exactly one time, then materializing the sequence to a list or using a function like Seq.cache will be your best option.
I had an important use case for this, where I am using Seq.unfold to read a large number of blocks with REST reads, and sequentially processing each block, with further REST reads.
The reading of the sequence had to be both "lazy" but also cached to avoid duplicate re-evaluation (with every Seq.tail operation).
Hence finding this question and the accepted answer (Seq.cache). Thanks!
I experimented with Seq.cache and discovered that it worked as claimed (i.e., lazy and avoiding re-evaluation), but with one noteworthy condition: the first five elements of the sequence are always read first (and retained by the cache), so experiments on five or fewer elements won't show lazy evaluation. However, after five, lazy evaluation kicks in for each element.
This code can be used to experiment. Try it for 5 and see no lazy evaluation, then for 10 and see each element after the fifth being read lazily, as required. Also remove Seq.cache to see the problem we are addressing (re-evaluation).
// Get a sequence of numbers.
let getNums n = seq { for i in 1..n do printfn "Yield { %d }" i; yield i }
// Unfold a sequence of numbers
let unfoldNums (nums : int seq) =
    nums
    |> Seq.unfold
        (fun (nums : int seq) ->
            printfn "unfold: nums = { %A }" nums
            if Seq.isEmpty nums then
                printfn "Done"
                None
            else
                let num = Seq.head nums // Value to yield
                let tl = Seq.tail nums // Next state. CAUSES RE-EVALUATION!
                printfn "Yield: < %d >, tl = { %A }" num tl
                Some (num,tl))
// Get n numbers as a sequence, then unfold them as a sequence
// Observe that with 'Seq.cache' input is not re-evaluated unnecessarily,
// and also that lazy evaluation kicks in for n > 5
let experiment n =
    getNums n
    |> Seq.cache
    // Without cache, Seq.tail causes the sequence to be re-evaluated
    |> unfoldNums
    |> Seq.iter (fun x -> printfn "Process: %d" x)

Does Seq.groupBy preserve order within groups?

I want to group a sequence and then take the first occurrence of each element in the group. When I try this
Seq.groupBy f inSeq
|> Seq.map (fun (k,s) -> (k,s|>Seq.take 1|>Seq.exactlyOne))
I find that sometimes I get a different element from s. Is this expected?
Looking at the source of the groupBy implementation -
here's the relevant bit:
// Build the groupings
seq |> iter (fun v ->
    let safeKey = keyf v
    let mutable prev = Unchecked.defaultof<_>
    match dict.TryGetValue (safeKey, &prev) with
    | true -> prev.Add v
    | false ->
        let prev = ResizeArray ()
        dict.[safeKey] <- prev
        prev.Add v)
It iterates through the source sequence and adds each value to the list for its key. The order of each subsequence is therefore directly determined by the order of the input sequence. For the same input sequence, we can expect groupBy to return identical output sequences. This is how the tests for groupBy are written.
If you're seeing variations in the resulting sequences, check the input sequence.
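The behaviour described above can be checked with a small pure input. This is only an illustrative sketch; the input list and the names `groups`, `firstOdd` and `firstEven` are made up for the example.

```fsharp
let input = [5; 2; 8; 3; 6; 1]

// Group by parity and materialize each group so it can be inspected.
let groups =
    input
    |> Seq.groupBy (fun x -> x % 2)
    |> Seq.map (fun (key, values) -> key, List.ofSeq values)
    |> Map.ofSeq

// Within each group, elements keep the order they had in the input,
// so the first element of a group is its first occurrence in the input.
let firstOdd = List.head groups.[1]
let firstEven = List.head groups.[0]
```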
Yes, this is expected. Sequences (seq) aren't guaranteed to be pure. You can define a sequence that will yield different values every time you iterate over them. If you call Seq.take 1 twice, you can get different results.
Consider, as an example, this sequence:
open System
let r = Random ()
let s = seq { yield r.Next(0, 9) }
If you call Seq.take 1 on that, you may get different results:
> s |> Seq.take 1;;
val it : seq<int> = seq [4]
> s |> Seq.take 1;;
val it : seq<int> = seq [1]
Using Seq.head isn't going to help you either:
> s |> Seq.head;;
val it : int = 2
> s |> Seq.head;;
val it : int = 6
If you want to guarantee deterministic behaviour, use a List instead.
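As a minimal sketch of that advice, the impure sequence can be materialized once and the resulting list reused. The name `snapshot` is hypothetical.

```fsharp
let r = System.Random()

// An impure sequence: each enumeration may yield a different value.
let s = seq { yield r.Next(0, 9) }

// Enumerate once and keep the values; the list never re-runs the generator.
let snapshot = List.ofSeq s

let first = List.head snapshot
let again = List.head snapshot
```

Both `first` and `again` read the same stored element, so they always agree, whereas two calls to `Seq.head s` may not.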

Self-reference in F# sequence expression

Is there a way to have a self-reference in F# sequence expression? For example:
[for i in 1..n do if _f(i)_not_in_this_list_ do yield f(i)]
which prevents inserting duplicate elements.
EDIT: In general case, I would like to know the contents of this_list before applying f(), which is very computationally expensive.
EDIT: I oversimplified in the example above. My specific case is a computationally expensive test T (T: int -> bool) having a property T(i) => T(n*i) so the code snippet is:
[for i in 1..n do if _i_not_in_this_list_ && T(i) then for j in i..i..n do yield j]
The goal is to reduce the number of T() applications and use concise notation. I accomplished the former by using a mutable helper array:
let notYet = Array.create (n + 1) true // indices 1..n are used, so n + 1 slots are needed
[for i in 1..n do if notYet.[i] && T(i) then for j in i..i..n do yield j; notYet.[j] <- false]
You can have recursive sequence expression e.g.
open System.IO

let rec allFiles dir =
    seq { yield! Directory.GetFiles dir
          for d in Directory.GetDirectories dir do
              yield! allFiles d }
but circular reference is not possible.
An alternative is to use Seq.distinct from Seq module:
seq { for i in 1..n -> f i }
|> Seq.distinct
or to convert the sequence to a set using Set.ofSeq before consumption, as per @John's comment.
You may also decide to maintain information about the previously generated elements in an explicit way; for example:
let genSeq n =
    let elems = System.Collections.Generic.HashSet()
    seq {
        for i in 1..n do
            if not (elems.Contains(i)) then
                elems.Add(i) |> ignore
                yield i
    }
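One thing to watch with this approach is that a set created outside the seq block is shared by every enumeration, so a second pass over the sequence would find all elements already recorded. The following variation is a sketch that creates the set inside the expression and filters on the computed value; `genDistinct` and its `f` parameter are hypothetical names, and `f` is assumed to map int to int.

```fsharp
open System.Collections.Generic

// Yields f i for i in 1..n, skipping values that were already produced.
let genDistinct (f: int -> int) n =
    seq {
        // Created per enumeration, so the sequence can be iterated repeatedly.
        let seen = HashSet<int>()
        for i in 1..n do
            let y = f i
            // HashSet.Add returns false when the value is already present.
            if seen.Add y then yield y
    }

let firstPass = genDistinct (fun i -> i % 3) 10 |> List.ofSeq
let secondPass = genDistinct (fun i -> i % 3) 10 |> List.ofSeq
```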
There are several considerations here.
First, you can't check if f(i) is in a list or not before actually computing f(i). So I guess you meant that your check function is expensive, not f(i) itself. Correct me if I'm wrong.
Second, if check is indeed very computationally expensive, you may look for a more effective algorithm. There's no guarantee you will find one for every sequence, but they often exist. Then your code will be nothing but a single Seq.unfold.
Third. When there's no such optimization, you may take another approach. Within [for...yield], you only build a current element and you can't access prior ones. Instead of returning an element, building an entire list manually seems to be the way to go:
// a simple algorithm checking if some F(x) exists in a sequence somehow
// (el is annotated so that String.Contains resolves to the string overload)
let check (x:string) xs = Seq.forall (fun (el: string) -> not (x.Contains el)) xs
// a converter i -> something else
let f (i: int) = i.ToString()
let generate f xs =
    let rec loop ys = function
        | [] -> List.rev ys
        | x::t ->
            let y = f x
            loop (if check y ys then y::ys else ys) t
    loop [] xs
// usage
[0..3..1000] |> generate f |> List.iter (printf "%O ")

Should this sequence expression be tail-recursive?

This F# seq expression looks tail-recursive to me, but I'm getting stack overflow exceptions (with tail-calls enabled). Does anybody know what I'm missing?
let buildSecondLevelExpressions expressions =
    let initialState = vector expressions |> randomize
    let rec allSeq state = seq {
        for partial in state do
            if count partial = 1
            then yield Seq.head partial
            if count partial > 1 || (count partial = 1 && depth (Seq.head partial) <= MAX_DEPTH) then
                let allUns = partial
                             |> pick false 1
                             |> Seq.collect (fun (el, rr) -> (createExpUnaries el |> Seq.map (fun bn -> add rr bn)))
                let allBins = partial // Careful: this case alone produces result recursively only if |numbers| is even (rightly!).
                              |> pick false 2
                              |> Seq.collect (fun (el, rr) -> (createExpBinaries el |> Seq.map (fun bn -> add rr bn)))
                yield! allSeq (interleave allBins allUns)
    }
    allSeq initialState
If you're wondering, though it shouldn't be important, pick is used to generate combinations of elements in a sequence and interleave interleaves elements from 2 sequences. vector is a constructor for a ResizeArray.
As Gideon pointed out, this is not tail-recursive, because you still have other elements in the 'state' list to process. Making this tail-recursive isn't straightforward, because you need some queue of elements that should be processed.
The following pseudo-code shows one possible solution. I added work parameter that stores the remaining work to be done. At every call, we process just the first element. All other elements are added to the queue. When we finish, we pick more work from the queue:
let rec allSeq state work = seq {
    match state with
    | partial::rest ->
        // Yield single thing to the result - this is fine
        if count partial = 1 then yield Seq.head partial
        // Check if we need to make more recursive calls...
        if count partial > 1 || (* ... *) then
            let allUns, allBins = // ...
            // Tail-recursive call to process the current state. We add 'rest' to
            // the collected work to be done after the current state is processed
            yield! allSeq (interleave allBins allUns) (rest :: work)
        else
            // No more processing for current state - let's take remaining
            // work from the 'work' list and run it (tail-recursively)
            match work with
            | state::rest -> yield! allSeq state rest
            | [] -> () //completed
    | _ ->
        // This is the same thing as in the 'else' clause above.
        // You could use clever pattern matching to handle both cases at once
        match work with
        | state::rest -> yield! allSeq state rest
        | [] -> () } //completed
I cannot find a definition of which calls inside a sequence expression are in tail position in F# so I would strongly recommend not writing code that depends upon the semantics of the current implementation, i.e. this is undefined behaviour.
For example, trying to enumerate (e.g. applying Seq.length) the following sequence causes a stack overflow:
let rec xs() = seq { yield! xs() }
but, as Tomas pointed out, the following does actually work:
let rec xs n = seq { yield n; yield! xs(n+1) }
My advice is to always replace recursive sequence expressions with Seq.unfold instead. In this case, you probably want to accumulate the work to be done (e.g. when you recurse into a left branch you push the right branch onto the stack in the accumulator).
FWIW, even the F# language reference gets this wrong. It gives the following code for flattening a tree:
type Tree<'a> =
    | Tree of 'a * Tree<'a> * Tree<'a>
    | Leaf of 'a
let rec inorder tree =
    seq {
        match tree with
        | Tree(x, left, right) ->
            yield! inorder left
            yield x
            yield! inorder right
        | Leaf x -> yield x
    }
Their own code kills F# interactive with a stack overflow when fed a deep tree on the left.
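Following the advice to prefer Seq.unfold, the same in-order traversal could be sketched with the pending work kept in an explicit list rather than on the call stack. The `Work` type and the `step` function are hypothetical names introduced for this illustration; this is one possible arrangement, not the language reference's code.

```fsharp
type Tree<'a> =
    | Tree of 'a * Tree<'a> * Tree<'a>
    | Leaf of 'a

// Pending work: either a subtree still to visit or a value ready to emit.
type Work<'a> =
    | Visit of Tree<'a>
    | Emit of 'a

let inorder tree =
    // step is tail-recursive: it rewrites the work list until a value is ready.
    let rec step stack =
        match stack with
        | [] -> None
        | Emit x :: rest -> Some (x, rest)
        | Visit (Leaf x) :: rest -> Some (x, rest)
        | Visit (Tree (x, left, right)) :: rest ->
            // Left subtree first, then the node's value, then the right subtree.
            step (Visit left :: Emit x :: Visit right :: rest)
    Seq.unfold step [Visit tree]

// A tree that is 100000 levels deep on the left.
let deep = List.fold (fun acc i -> Tree (i, acc, Leaf 0)) (Leaf 0) [1..100000]
let deepCount = inorder deep |> Seq.length
```

The depth of the tree now costs heap space in the work list instead of stack frames.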
This is not going to be tail recursive because you could be calling recursively multiple times. To translate to a pseudo-code:
allSeq(state)
{
    foreach (partial in state)
    {
        if (...)
        {
            yield ...
        }
        if (...)
        {
            ...
            //this could be reached multiple times
            yield! allSeq(...)
        }
    }
}

Avoiding stack overflow (with F# infinite sequences of sequences)

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).
let printList l =
    Seq.iter (fun n -> printf "%i" n) l
    printfn ""
let rec morris s =
    let next str = seq {
        let cnt = ref 1 // Stack overflow is below when enumerating
        for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
            if cur.[0] <> cur.[1] then
                yield!( [!cnt ; cur.[0]] )
                cnt := 0
            incr cnt
    }
    seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
    }
// "main"
// Print the nth iteration
let _ = [1] |> morris |> Seq.nth 3125 |> printList
You can pick off the nth iteration using Seq.nth, but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion, and it in essence builds a linked set of enumerators. That's not where the problem is. The problem occurs when the enumerator is requested on, say, the 4000th sequence. (Note that's with F# 1.9.6.16; the previous version topped out above 14000.) It's because of the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1, which calls seq n-2, and so forth to get the first item (the very first # is the worst case).
I understand that [|0|] |> Seq.append str |> Seq.windowed 2, is making my problem worse and I could triple the # I could generate if I eliminated that. Practically speaking the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.
The problem I'm really trying to solve is how to retain the lazy evaluation and have no limit based on stack size for the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size.
Update Oct '10
After learning F# a bit better, a tiny bit of Haskell, and thinking about and investigating this problem for over a year, I can finally answer my own question. But as always with difficult problems, the problem starts with it being the wrong question. The problem isn't sequences of sequences; it's really because of a recursively defined sequence. My functional programming skills are a little better now, so it's easier to see what's going on with the version below, which still gets a stack overflow:
let next str =
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd
let morris = Seq.unfold(fun sq -> Some(sq,next sq))
That basically creates a really long chain of Seq processing function calls to generate the sequences. The Seq module that comes with F# is what can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.
So this will work
let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;
And this one will get a stackoverflow.
let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;
To prove the F# library was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continuations, and now I can begin generating and printing out the 50,000th sequence without a problem (it'll never finish since it's over 10^5697 digits long).
Some additional notes:
Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
The lazy list version sort of works by luck (maybe by design, but that's beyond my current ability to determine): the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy that it needs continuations to avoid a stack overflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only list management.
You should definitely check out
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html
but I will try to post a more comprehensive answer later.
UPDATE
Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.
The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:
it is lazy (evaluation of the nth element will not happen until it is first demanded)
it does not recompute (re-evaluation of the nth element on the same object instance will not recompute it - it caches each element after it's first computed)
you can 'forget' prefixes (as you 'tail' into the list, the no-longer-referenced prefix is available for garbage collection)
The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.
Without further ado, the code:
// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs
// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) =
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))
// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s
// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35
UPDATE2
If you just want to count, that's fine too:
let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N
let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    |> LLLength |> printfn "%A"
Main()
The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.
I believe there are two main problems here:
Laziness is very inefficient so you can expect a lazy functional implementation to run orders of magnitude slower. For example, the Haskell implementation described here is 2,400× slower than the F# I give below. If you want a workaround, your best bet is probably to amortize the computations by bunching them together into eager batches where the batches are produced on-demand.
The Seq.append function is actually calling into C# code from IEnumerable and, consequently, its tail call doesn't get eliminated and you leak a bit more stack space every time you go through it. This shows up when you come to enumerate over the sequence.
The following is over 80× faster than your implementation at computing the length of the 50th subsequence but perhaps it is not lazy enough for you:
let next (xs: ResizeArray<_>) =
    let ys = ResizeArray()
    let add n x =
        if n > 0 then
            ys.Add n
            ys.Add x
    let mutable n = 0
    let mutable x = 0
    for i=0 to xs.Count-1 do
        let x' = xs.[i]
        if x=x' then
            n <- n + 1
        else
            add n x
            n <- 1
            x <- x'
    add n x
    ys
let morris =
    Seq.unfold (fun xs -> Some(xs, next xs)) (ResizeArray [1])
The core of this function is a fold over a ResizeArray that could be factored out and used functionally without too much performance degradation if you used a struct as the accumulator.
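A rough sketch of that factoring, written as a fold over an immutable list, might look like the following. It uses a plain tuple as the accumulator rather than a struct, and the names `nextFolded` and `morrisFolded` are hypothetical; no performance claim is made for this version.

```fsharp
// One "see and say" step as a fold over a list (eager, no mutation).
let nextFolded (xs: int list) : int list =
    match xs with
    | [] -> []
    | first :: rest ->
        // Accumulator: (run length, current value, output built in reverse).
        let (count, value, acc) =
            rest
            |> List.fold (fun (n, x, acc) y ->
                if y = x then (n + 1, x, acc)
                else (1, y, x :: n :: acc)) (1, first, [])
        // Flush the final run and restore the original order.
        List.rev (value :: count :: acc)

// Each element of the outer sequence is produced on demand from the previous one.
let morrisFolded = Seq.unfold (fun xs -> Some (xs, nextFolded xs)) [1]

let sixth = morrisFolded |> Seq.item 5
```

Because every step is computed eagerly from the previous list, enumerating the outer sequence does not build a chain of nested lazy enumerators.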
Just save the previous element that you looked for.
let morris2 data = seq {
    let cnt = ref 0
    let prev = ref (data |> Seq.nth 0)
    for cur in data do
        if cur <> !prev then
            yield! [!cnt; !prev]
            cnt := 1
            prev := cur
        else
            cnt := !cnt + 1
    yield! [!cnt; !prev]
}
let rec morrisSeq2 cur = seq {
    yield cur
    yield! morrisSeq2 (morris2 cur)
}
