Self-reference in F# sequence expression

Is there a way to have a self-reference in F# sequence expression? For example:
[for i in 1..n do if _f(i)_not_in_this_list_ then yield f(i)]
which prevents inserting duplicate elements.
EDIT: In the general case, I would like to know the contents of this_list before applying f(), which is computationally very expensive.
EDIT: I oversimplified in the example above. My specific case is a computationally expensive test T (T: int -> bool) with the property T(i) => T(k*i) for every positive integer k, so the code snippet is:
[for i in 1..n do if _i_not_in_this_list_ && T(i) then for j in i..i..n do yield j]
The goal is to reduce the number of T() applications and keep the notation concise. I accomplished the former by using a helper array that is updated as elements are emitted:
let notYet = Array.create (n + 1) true   // indices 0..n; index 0 is unused
[ for i in 1..n do
    if notYet.[i] && T i then
        for j in i..i..n do
            notYet.[j] <- false
            yield j ]

You can have a recursive sequence expression, e.g.
let rec allFiles dir =
    seq { yield! Directory.GetFiles dir
          for d in Directory.GetDirectories dir do
              yield! allFiles d }
but a circular reference is not possible.
An alternative is to use Seq.distinct from the Seq module:
seq { for i in 1..n -> f i }
|> Seq.distinct
or to convert the sequence to a set using Set.ofSeq before consumption, as per @John's comment.

You may also decide to maintain information about the previously generated elements in an explicit way; for example:
let genSeq n =
    seq {
        // created inside the expression, so every enumeration starts with an empty set
        let elems = System.Collections.Generic.HashSet<int>()
        for i in 1..n do
            if not (elems.Contains(i)) then
                elems.Add(i) |> ignore
                yield i
    }
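Applied to the T(i) case from the question, the same explicit bookkeeping might look like the sketch below. The test T here is only a cheap stand-in for the expensive one (an assumption for illustration), and the names sieve and calls are hypothetical; the counter is there just to show how many T applications are saved.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for the expensive test; it satisfies T(i) => T(k*i).
let calls = ref 0
let T (i: int) =
    calls.Value <- calls.Value + 1
    i % 6 = 0

// Yields every j in 1..n that is a multiple of some i with T(i).
// T is skipped for any i that has already been emitted.
let sieve n =
    seq {
        let seen = HashSet<int>()
        for i in 1..n do
            if not (seen.Contains i) && T i then
                for j in i..i..n do
                    // HashSet.Add returns false when j was already present
                    if seen.Add j then yield j
    }

let result = sieve 20 |> Seq.toList
printfn "%A after %d calls to T" result calls.Value
```

With this stand-in, T is skipped for 12 and 18 because they were already emitted as multiples of 6.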

There are several considerations here.
First, you can't check if f(i) is in a list or not before actually computing f(i). So I guess you meant that your check function is expensive, not f(i) itself. Correct me if I'm wrong.
Second, if check is indeed very computationally expensive, you may look for a more effective algorithm. There's no guarantee you will find one for every sequence, but they often exist. Then your code will be nothing but a single Seq.unfold.
Third. When there's no such optimization, you may take another approach. Within [for...yield], you only build a current element and you can't access prior ones. Instead of returning an element, building an entire list manually seems to be the way to go:
// a simple algorithm checking if some F(x) exists in a sequence somehow
let check (x: string) xs = Seq.forall (fun (el: string) -> not (x.Contains el)) xs
// a converter i -> something else
let f (i: int) = i.ToString()
let generate f xs =
    let rec loop ys = function
        | [] -> List.rev ys
        | x :: t ->
            let y = f x
            loop (if check y ys then y :: ys else ys) t
    loop [] xs
// usage
[0..3..1000] |> generate f |> List.iter (printf "%O ")

Related

How do I do in F# what would be called compression in APL?

In APL one can use a bit vector to select out elements of another vector; this is called compression. For example 1 0 1/3 5 7 would yield 3 7.
Is there an accepted term for this in functional programming in general and F# in particular?
Here is my F# program:
let list1 = [|"Bob"; "Mary"; "Sue"|]
let list2 = [|1; 0; 1|]
[<EntryPoint>]
let main argv =
    0 // return an integer exit code
What I would like to do is compute a new string[] which would be [|"Bob"; "Sue"|]
How would one do this in F#?
Array.zip list1 list2 // [|("Bob",1); ("Mary",0); ("Sue",1)|]
|> Array.filter (fun (_,x) -> x = 1) // [|("Bob", 1); ("Sue", 1)|]
|> Array.map fst // [|"Bob"; "Sue"|]
The pipe operator |> does function application syntactically reversed, i.e., x |> f is equivalent to f x. As mentioned in another answer, replace Array with Seq to avoid the construction of intermediate arrays.
I expect you'll find many APL primitives missing from F#. For lists and sequences, many can be constructed by stringing together primitives from the Seq, Array, or List modules, like the above. For reference, here is an overview of the Seq module.
I think the easiest is to use an array sequence expression, something like this:
let compress (bits: int[]) (values: 'a[]) =
    [|
        for i = 0 to bits.Length - 1 do
            if bits.[i] = 1 then
                yield values.[i]
    |]
If you only want to use combinators, this is what I would do:
Seq.zip bits values
|> Seq.choose (fun (bit, value) ->
    if bit = 1 then Some value else None)
|> Array.ofSeq
I use Seq functions instead of Array functions in order to avoid building intermediate arrays, but the Array versions would be correct too.
One might say this is more idiomatic:
Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None) list1 list2
|> Seq.choose id
|> Seq.toArray
EDIT (for the pipe lovers)
(list1, list2)
||> Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None)
|> Seq.choose id
|> Seq.toArray
Søren Debois' solution is good but, as he pointed out, we can do better. Let's define a function based on Søren's code:
let compressArray vals idx =
    Array.zip vals idx
    |> Array.filter (fun (_, x) -> x = 1)
    |> Array.map fst
compressArray ends up creating a new array in each of the 3 lines. This can take some time if the input arrays are long (1.4 seconds for 10M values in my quick test).
We can save some time by working on sequences and creating an array only at the end:
let compressSeq vals idx =
    Seq.zip vals idx
    |> Seq.filter (fun (_, x) -> x = 1)
    |> Seq.map fst
This function is generic and will work on arrays, lists, etc. To generate an array as output:
compressSeq sq idx |> Seq.toArray
The latter saves about 40% of computation time (0.8s in my test).
As ildjarn commented, the function argument to filter can be rewritten to snd >> (=) 1, although that causes a slight performance drop (< 10%), probably because of the extra function call that is generated.
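To make the two predicate styles concrete, here is a small sketch that runs both on the arrays from the question. The names viaLambda and viaComposition are made up for this example; the point is only that snd >> (=) 1 is the same predicate as the explicit lambda.

```fsharp
let names = [| "Bob"; "Mary"; "Sue" |]
let bits = [| 1; 0; 1 |]

// Explicit lambda, as in compressSeq above.
let viaLambda =
    Seq.zip names bits
    |> Seq.filter (fun (_, x) -> x = 1)
    |> Seq.map fst
    |> Seq.toArray

// Point-free predicate: take the second component, then compare it with 1.
let viaComposition =
    Seq.zip names bits
    |> Seq.filter (snd >> (=) 1)
    |> Seq.map fst
    |> Seq.toArray

printfn "%A %A" viaLambda viaComposition
```

Both pipelines select "Bob" and "Sue"; which one reads better is largely a matter of taste.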

Getting every nth Element of a Sequence

I am looking for a way to create a sequence consisting of every nth element of another sequence, but don't seem to find a way to do that in an elegant way. I can of course hack something, but I wonder if there is a library function that I'm not seeing.
The sequence functions whose names end in -i seem to be quite good for figuring out when an element is the nth one (or a multiple-of-n-th one), but I can only see iteri and mapi, neither of which really lends itself to the task.
Example:
let someseq = [1;2;3;4;5;6]
let partial = Seq.magicfunction 3 someseq
Then partial should be [3;6]. Is there anything like it out there?
Edit:
If I am not quite as ambitious and allow for the n to be constant/known, then I've just found that the following should work:
let rec thirds lst =
    match lst with
    | _::_::x::t -> x::thirds t // corrected after Tomas' comment
    | _ -> []
Would there be a way to write this shorter?
Seq.choose works nicely in these situations because it allows you to do the filtering within the mapi lambda.
let everyNth n elements =
    elements
    |> Seq.mapi (fun i e -> if i % n = n - 1 then Some(e) else None)
    |> Seq.choose id
Similar to here.
You can get the behavior by composing mapi with other functions:
let everyNth n seq =
    seq |> Seq.mapi (fun i el -> el, i)              // Add index to element
        |> Seq.filter (fun (el, i) -> i % n = n - 1) // Take every nth element
        |> Seq.map fst                               // Drop index from the result
The solution using options and choose as suggested by Annon would use only two functions, but the body of the first one would be slightly more complicated (but the principle is essentially the same).
A more efficient version using the IEnumerator object directly isn't too difficult to write:
let everyNth n (input:seq<_>) =
    seq { use en = input.GetEnumerator()
          // Call MoveNext at most 'n' times (or return false earlier)
          let rec nextN n =
              if n = 0 then true
              else en.MoveNext() && (nextN (n - 1))
          // While we can move n elements forward...
          while nextN n do
              // Return each nth element
              yield en.Current }
EDIT: The snippet is also available here: http://fssnip.net/1R
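If FSharp.Core 4.0 or later is available, chunking is another way to get the same result with library functions only. This is a different technique from the answers above, and the name everyNthChunked is made up for this sketch.

```fsharp
// Assumes FSharp.Core 4.0 or later, where Seq.chunkBySize and Array.last exist.
let everyNthChunked n (source: seq<'a>) =
    source
    |> Seq.chunkBySize n                          // arrays of up to n elements
    |> Seq.filter (fun chunk -> chunk.Length = n) // drop a short trailing chunk
    |> Seq.map Array.last                         // keep the nth element of each chunk

let partial = everyNthChunked 3 [1; 2; 3; 4; 5; 6; 7] |> Seq.toList
printfn "%A" partial
```

It stays lazy like the mapi versions, at the cost of allocating one small array per group of n elements.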

Tail recursive copy of a seq to a list in F#

I am trying to build a list from a sequence by recursively appending the first element of the sequence to the list:
open System
let s = seq[for i in 2..4350 -> i,2*i]
let rec copy s res =
    if (s|>Seq.isEmpty) then
        res
    else
        let (a,b) = s |> Seq.head
        Console.WriteLine(string a)
        let newS = s |> Seq.skip(1)|> Seq.cache
        let newRes = List.append res ([(a,b)])
        copy newS newRes
copy s ([])
Two problems:
. getting a stack overflow, which means my tail-recursive ploy sucks
and
. why is the code 100x faster when I put |> Seq.cache here: let newS = s |> Seq.skip(1)|> Seq.cache.
(Note this is just a little exercise, I understand you can do Seq.toList etc.. )
Thanks a lot
One way that works is ( the two points still remain a bit weird to me ):
let toList (s:seq<_>) =
    let rec copyRev res (enum:Collections.Generic.IEnumerator<_*_>) =
        let somethingLeft = enum.MoveNext()
        if not(somethingLeft) then
            res
        else
            let curr = enum.Current
            Console.WriteLine(string curr)
            let newRes = curr::res
            copyRev newRes enum
    let enumerator = s.GetEnumerator()
    (copyRev ([]) (enumerator)) |>List.rev
You say it's just an exercise, but it's useful to point to my answer to
While or Tail Recursion in F#, what to use when?
and reiterate that you should favor more applicative/declarative constructs when possible. E.g.
let rec copy2 s = [
    for tuple in s do
        System.Console.WriteLine(string(fst tuple))
        yield tuple
]
is a nice and performant way to express your particular function.
That said, I'd feel remiss if I didn't also say "never create a list that big". For huge data, you want either array or seq.
In my short experience with F#, it is not a good idea to use Seq.skip 1 the way you would use tail with lists. Seq.skip creates a new IEnumerable/sequence rather than just skipping n elements. Therefore your function will be A LOT slower than Seq.toList. You should probably do it imperatively with
s.GetEnumerator()
and iterate through the sequence, holding a list onto which you cons every single element.
In this question
Take N elements from sequence with N different indexes in F#
I started to do something similar to what you do, but found out it is very slow. See my method for inspiration on how to do it.
Addition: I have written this:
let seqToList (xs : seq<'a>) =
    let e = xs.GetEnumerator()
    let mutable res = []
    while e.MoveNext() do
        res <- e.Current :: res
    List.rev res
And found out that the built-in method actually does something very similar (including the reverse part). It does, however, check whether the sequence you have supplied is in fact a list or an array.
You can also make the code entirely functional (which I also did now - couldn't resist ;-)):
let seqToList (xs : seq<'a>) =
    Seq.fold (fun state t -> t :: state) [] xs |> List.rev
Your function is properly tail recursive, so the recursive calls themselves are not what is overflowing the stack. Instead, the problem is that Seq.skip is poorly behaved when used recursively, as others have pointed out. For instance, this code overflows the stack on my machine:
let mutable s = seq { 1 .. 20001 }
for i in 1 .. 20000 do
    s <- Seq.skip 1 s
let v = Seq.head s
Perhaps you can see the vague connection to your own code, which also eventually takes the head of a sequence which results from repeatedly applying Seq.skip 1 to your initial sequence.
Try the following code.
Warning: Before running this code you will need to enable tail call generation in Visual Studio. This can be done through the Build tab on the project properties page. If this is not enabled the code will StackOverflow processing the continuation.
open System
open System.Collections.Generic
let s = seq[for i in 2..1000000 -> i,2*i]
let rec copy (s : (int * int) seq) =
    use e = s.GetEnumerator()
    let rec inner cont =
        if e.MoveNext() then
            let (a,b) = e.Current
            printfn "%d" b
            inner (fun l -> cont (b :: l))
        else cont []
    inner (fun x -> x)
let res = copy s
printfn "Done"

Return value in F# - incomplete construct

I'm trying to learn F#. I'm a complete beginner, so this might be a walkover for you guys :)
I have the following function:
let removeEven l =
    let n = List.length l;
    let list_ = [];
    let seq_ = seq { for x in 1..n do if x % 2 <> 0 then yield List.nth l (x-1)}
    for x in seq_ do
        let list_ = list_ @ [x];
    list_;
It takes a list and returns a new list containing all the elements that are placed at an odd position in the original list, so removeEven [x1;x2;x3] = [x1;x3]
However, I get my already favourite error-message: Incomplete construct at or before this point in expression...
If I add a print to the end of the line, instead of list_:
...
print_any list_;
the problem is fixed. But I do not want to print the list, I want to return it!
What causes this? Why can't I return my list?
To answer your question first, the compiler complains because there is a problem inside the for loop. In F#, let serves to declare values (that are immutable and cannot be changed later in the program). It isn't a statement as in C# - let can only be used as part of another expression. For example:
let n = 10
n + n
This actually means that you want the n symbol to refer to the value 10 in the expression n + n. The problem with your code is that you're using let without any expression (probably because you want to use mutable variables):
for x in seq_ do
    let list_ = list_ @ [x] // This isn't assignment!
list_
The problematic line is an incomplete expression - using let in this way isn't allowed, because it isn't followed by any expression (the new list_ value is not accessed by any code). You can use a mutable variable to correct your code:
let mutable list_ = [] // declared as 'mutable'
let seq_ = seq { for x in 1..n do if x % 2 <> 0 then yield List.nth l (x-1)}
for x in seq_ do
    list_ <- list_ @ [x] // assignment using '<-'
Now, this should work, but it isn't really functional, because you're using imperative mutation. Moreover, appending elements using @ is a really inefficient thing to do in functional languages. So, if you want to make your code functional, you'll probably need to use a different approach. Both of the other answers show a great approach, although I prefer the example by Joel, because indexing into a list (in the solution by Chaos) also isn't very functional (there is no pointer arithmetic, so it will also be slower).
Probably the most classical functional solution would be to use the List.fold function, which aggregates all elements of the list into a single result, walking from the left to the right:
[1;2;3;4;5]
|> List.fold (fun (flag, res) el ->
    if flag then (not flag, el::res) else (not flag, res)) (true, [])
|> snd |> List.rev
Here, the state used during the aggregation is a tuple. Its first element is a Boolean flag specifying whether to include the next element (during each step, we flip the flag by returning not flag). The second element is the list aggregated so far (we add an element with el::res only when the flag is set). After fold returns, we use snd to get the second element of the tuple (the aggregated list) and reverse it using List.rev, because it was collected in reversed order (this is more efficient than appending to the end using res@[el]).
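To watch the flag flip step by step, one option is to run the same folding function through List.scan, which returns every intermediate state. The name step is introduced only for this sketch.

```fsharp
// Same folding function as above, given a name so it can be reused.
let step (flag, res) el =
    if flag then (not flag, el :: res) else (not flag, res)

// List.scan returns the initial state followed by the state after each element.
let states = [1; 2; 3] |> List.scan step (true, [])
// states = [(true, []); (false, [1]); (true, [1]); (false, [3; 1])]

let kept = [1; 2; 3; 4; 5] |> List.fold step (true, []) |> snd |> List.rev
// kept = [1; 3; 5]

printfn "%A" states
printfn "%A" kept
```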
Edit: If I understand your requirements correctly, here's a version of your function done functional rather than imperative style, that removes elements with odd indexes.
let removeEven list =
    list
    |> Seq.mapi (fun i x -> (i, x))
    |> Seq.filter (fun (i, x) -> i % 2 = 0)
    |> Seq.map snd
    |> List.ofSeq
> removeEven ['a'; 'b'; 'c'; 'd'];;
val it : char list = ['a'; 'c']
I think this is what you are looking for.
let removeEven list =
    let maxIndex = (List.length list) - 1;
    seq { for i in 0..2..maxIndex -> list.[i] }
    |> Seq.toList
Tests
val removeEven : 'a list -> 'a list
> removeEven [1;2;3;4;5;6];;
val it : int list = [1; 3; 5]
> removeEven [1;2;3;4;5];;
val it : int list = [1; 3; 5]
> removeEven [1;2;3;4];;
val it : int list = [1; 3]
> removeEven [1;2;3];;
val it : int list = [1; 3]
> removeEven [1;2];;
val it : int list = [1]
> removeEven [1];;
val it : int list = [1]
You can try a pattern-matching approach. I haven't used F# in a while and I can't test things right now, but it would be something like this:
let rec curse sofar ls =
    match ls with
    | even :: odd :: tl -> curse (even :: sofar) tl
    | even :: [] -> curse (even :: sofar) []
    | [] -> List.rev sofar
curse [] [ 1; 2; 3; 4; 5 ]
This recursively picks off the even elements. I think. I would probably use Joel Mueller's approach though. I don't remember if there is an index-based filter function, but that would probably be the ideal to use, or to make if it doesn't exist in the libraries.
But in general lists aren't really meant as index-type things. That's what arrays are for. If you consider what kind of algorithm would require a list having its even elements removed, maybe it's possible that in the steps prior to this requirement, the elements can be paired up in tuples, like this:
[ (1,2); (3,4) ]
That would make it trivial to get the even-"indexed" elements out:
thelist |> List.map fst // take first element from each tuple
There's a variety of options if the input list isn't guaranteed to have an even number of elements.
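One such option, sketched here under the assumption that FSharp.Core 4.0 or later is available, is to chunk the list into groups of two instead of tuples, so a trailing single element is kept. The name evenIndexed is made up for this example.

```fsharp
// Assumes FSharp.Core 4.0 or later for List.chunkBySize.
let evenIndexed (xs: 'a list) =
    xs
    |> List.chunkBySize 2   // [[x0; x1]; [x2; x3]; ...]; the last chunk may hold one element
    |> List.map List.head   // every chunk is non-empty, so List.head is safe

let fromEven = evenIndexed [1; 2; 3; 4]      // [1; 3]
let fromOdd = evenIndexed [1; 2; 3; 4; 5]    // [1; 3; 5]
printfn "%A %A" fromEven fromOdd
```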
Yet another alternative, which (by my reckoning) is slightly slower than Joel's, but it's shorter :)
let removeEven list =
    list
    |> Seq.mapi (fun i x -> (i, x))
    |> Seq.choose (fun (i,x) -> if i % 2 = 0 then Some(x) else None)
    |> List.ofSeq

Avoiding stack overflow (with F# infinite sequences of sequences)

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).
let printList l =
    Seq.iter (fun n -> printf "%i" n) l
    printfn ""
let rec morris s =
    let next str = seq {
        let cnt = ref 1 // Stack overflow is below when enumerating
        for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
            if cur.[0] <> cur.[1] then
                yield!( [!cnt ; cur.[0]] )
                cnt := 0
            incr cnt
    }
    seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
    }
// "main"
// Print the nth iteration
let _ = [1] |> morris |> Seq.nth 3125 |> printList
You can pick off the nth iteration using Seq.nth, but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion, and it in essence builds a linked set of enumerators. That's not where the problem is. It's when "enum" is called on, say, the 4000th sequence. (Note that's with F# 1.9.6.16; the previous version topped out above 14000.) It's because of the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1, which calls seq n-2, and so forth to get the first item (the very first # is the worst case).
I understand that [|0|] |> Seq.append str |> Seq.windowed 2 is making my problem worse, and I could triple the # I could generate if I eliminated that. Practically speaking, the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.
The problem I'm really trying to solve is how to retain the lazy evaluation and have no stack-size-based limit on the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size instead.
Update Oct '10
After learning F# a bit better and a tiny bit of Haskell, and thinking about and investigating this problem for over a year, I can finally answer my own question. But as always with difficult problems, the trouble starts with asking the wrong question. The problem isn't sequences of sequences - it's really the recursively defined sequence. My functional programming skills are a little better now, so it's easier to see what's going on with the version below, which still gets a stack overflow:
let next str =
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd
let morris = Seq.unfold(fun sq -> Some(sq,next sq))
That basically creates a really long chain of Seq processing function calls to generate the sequences. The Seq module that comes with F# can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.
So this will work
let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;
And this one will get a stackoverflow.
let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;
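For this toy example, one way to sidestep the problem is to carry the state in Seq.unfold and apply the transformation once, outside the recursion. This is only a sketch of that idea; it does not help when, as with morris, each new sequence is itself built by transforming the previous one.

```fsharp
// The state is carried by Seq.unfold instead of by nested recursive sequences.
let ints = Seq.unfold (fun n -> Some(n, n + 1)) 0

// Seq.map is applied once to the whole sequence, so the chain of enumerators
// does not grow no matter how far we enumerate.
// Seq.item is the FSharp.Core 4.0+ name for Seq.nth.
let value = ints |> Seq.map id |> Seq.item 100000

printfn "%A" value
```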
To prove the F# library was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continuations, and now I can begin generating and printing out the 50,000th sequence without a problem (it'll never finish since it's over 10^5697 digits long).
Some additional notes:
Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
The lazy list version sort of works by luck (maybe by design, but that's beyond my current ability to determine) - the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy that it needs continuations to avoid a stack overflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only for list management.
You should definitely check out
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html
but I will try to post a more comprehensive answer later.
UPDATE
Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.
The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:
it is lazy (evaluation of the nth element will not happen until it is first demanded)
it does not recompute (re-evaluation of the nth element on the same object instance will not recompute it - it caches each element after it's first computed)
you can 'forget' prefixes (as you 'tail' into the list, the no-longer-referenced prefix is available for garbage collection)
The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.
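The second property (no recomputation) can be observed without the PowerPack by using Seq.cache from the core library as a stand-in. This sketch is not LazyList itself and says nothing about the third property; the counter evaluations is a hypothetical name used only to count how often the generator body runs.

```fsharp
// Counts how many times the body of the sequence expression is executed.
let evaluations = ref 0

let squares =
    seq {
        for i in 1..5 do
            evaluations.Value <- evaluations.Value + 1
            yield i * i
    }

// Seq.cache remembers elements that have already been computed.
let cached = Seq.cache squares

let first = cached |> Seq.toList    // forces all five elements
let second = cached |> Seq.toList   // served from the cache, nothing is recomputed

printfn "%A computed with %d evaluations" second evaluations.Value
```

Enumerating squares directly twice would run the body ten times; through the cache it runs five times.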
Without further ado, the code:
// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs
// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) =
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))
// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s
// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35
UPDATE2
If you just want to count, that's fine too:
let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N
let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    |> LLLength |> printfn "%A"
Main()
The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.
I believe there are two main problems here:
Laziness is very inefficient so you can expect a lazy functional implementation to run orders of magnitude slower. For example, the Haskell implementation described here is 2,400× slower than the F# I give below. If you want a workaround, your best bet is probably to amortize the computations by bunching them together into eager batches where the batches are produced on-demand.
The Seq.append function is actually calling into C# code from IEnumerable and, consequently, its tail call doesn't get eliminated and you leak a bit more stack space every time you go through it. This shows up when you come to enumerate over the sequence.
The following is over 80× faster than your implementation at computing the length of the 50th subsequence but perhaps it is not lazy enough for you:
let next (xs: ResizeArray<_>) =
    let ys = ResizeArray()
    let add n x =
        if n > 0 then
            ys.Add n
            ys.Add x
    let mutable n = 0
    let mutable x = 0
    for i=0 to xs.Count-1 do
        let x' = xs.[i]
        if x=x' then
            n <- n + 1
        else
            add n x
            n <- 1
            x <- x'
    add n x
    ys
let morris =
    Seq.unfold (fun xs -> Some(xs, next xs)) (ResizeArray [1])
The core of this function is a fold over a ResizeArray that could be factored out and used functionally without too much performance degradation if you used a struct as the accumulator.
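A rough sketch of what factoring out that fold might look like is below. It uses a plain tuple as the accumulator rather than a struct, and it still writes into a ResizeArray, so it is only a halfway step; the names nextFolded, flush and step are made up for this example.

```fsharp
let nextFolded (xs: seq<int>) : ResizeArray<int> =
    let ys = ResizeArray<int>()
    // Emit a finished run as (count, value); the empty initial run is skipped.
    let flush (n, x) =
        if n > 0 then
            ys.Add n
            ys.Add x
    // Accumulator: (length of the current run, value of the current run).
    let step (n, x) x' =
        if x = x' then (n + 1, x)
        else
            flush (n, x)
            (1, x')
    let last = Seq.fold step (0, 0) xs
    flush last
    ys

let third = nextFolded [2; 1] |> List.ofSeq
printfn "%A" third
```

Like the loop it is derived from, this relies on the input never starting with 0, which holds for the Morris sequence.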
Just save the previous element that you looked for.
let morris2 data = seq {
    let cnt = ref 0
    let prev = ref (data |> Seq.nth 0)
    for cur in data do
        if cur <> !prev then
            yield! [!cnt; !prev]
            cnt := 1
            prev := cur
        else
            cnt := !cnt + 1
    yield! [!cnt; !prev]
}
let rec morrisSeq2 cur = seq {
    yield cur
    yield! morrisSeq2 (morris2 cur)
}

Resources