Tested on F# 3.1 on Windows 7:
fsi.PrintLength <- 5000;;
[1..5000];;
Process is terminated due to StackOverflowException.
Session termination detected. Press Enter to restart.
On Mono (F# 4.0), there doesn't seem to be such a limitation.
I think this is a bug in the formatting module that takes care of pretty printing to F# Interactive.
There are some non-tail-recursive functions that use PrintLength, e.g. boundedUnfoldL in this line. The implementation of boundedUnfoldL is indeed not tail-recursive:
let boundedUnfoldL
        (itemL : 'a -> layout)
        (project : 'z -> ('a * 'z) option)
        (stopShort : 'z -> bool)
        (z : 'z)
        maxLength =
    let rec consume n z =
        if stopShort z then [wordL "..."] else
        match project z with
        | None -> [] // exhausted input
        | Some (x,z) -> if n<=0 then [wordL "..."] // hit print_length limit
                        else itemL x :: consume (n-1) z // cons recursive...
    consume maxLength z
I don't know why it doesn't blow up on Mono. It would be surprising if F# Interactive on Mono can handle length > 5000 successfully.
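One way to avoid the deep recursion is to collect the items in reverse in an accumulator and flip them at the end. The following is only a sketch of that idea, not the actual fix in the F# sources: boundedUnfoldTail and uncons are hypothetical names, and plain strings stand in for the internal layout type and wordL, which are not available outside the formatting module.

```fsharp
// Sketch: tail-recursive variant of the unfold, with strings standing in for `layout`.
let boundedUnfoldTail
        (itemL : 'a -> string)
        (project : 'z -> ('a * 'z) option)
        (stopShort : 'z -> bool)
        (z : 'z)
        (maxLength : int) : string list =
    // `acc` holds the rendered items in reverse order.
    // Every recursive call to `consume` is in tail position.
    let rec consume n z acc =
        if stopShort z then List.rev ("..." :: acc)
        else
            match project z with
            | None -> List.rev acc                       // exhausted input
            | Some (x, z') ->
                if n <= 0 then List.rev ("..." :: acc)   // hit the length limit
                else consume (n - 1) z' (itemL x :: acc)
    consume maxLength z []

// Hypothetical projection that walks an ordinary F# list.
let uncons xs =
    match xs with
    | [] -> None
    | x :: rest -> Some (x, rest)

let truncated = boundedUnfoldTail (sprintf "%d") uncons (fun _ -> false) [1 .. 10] 3
// truncated = ["1"; "2"; "3"; "..."]
let full = boundedUnfoldTail (sprintf "%d") uncons (fun _ -> false) [1 .. 100000] 100000
printfn "%A, %d items" truncated (List.length full)
```

The extra List.rev costs one more pass over the result, but the stack depth no longer grows with the print length.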
You can report this as a bug to https://visualfsharp.codeplex.com/workitem/list/basic.
A simple example, inspired by this question:
module SimpleExample =
    let fooFold projection folder state source =
        source |> List.map projection |> List.fold folder state
    // val fooFold :
    //   projection:('a -> 'b) ->
    //     folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c
    let fooReduce projection reducer source =
        source |> List.map projection |> List.reduce reducer
    // val fooReduce :
    //   projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b
    let game = [0, 5; 10, 15]
    let minX, maxX = fooReduce fst min game, fooReduce fst max game
    let minY, maxY = fooReduce snd min game, fooReduce snd max game
What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.
mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:
val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list
-> 'Result list * 'State
Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.
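To make both points concrete, here is a small sketch. The first part shows what List.mapFold returns; the second shows the projection from fooFold folded into the folder itself. The sample data mirrors the game list from the example above, and the choice of Int32.MinValue as the initial state is an assumption made for illustration.

```fsharp
// List.mapFold threads a state through the list and also returns the mapped items.
let doubled, total =
    List.mapFold (fun acc x -> x * 2, acc + x) 0 [1; 2; 3]
// doubled : int list = [2; 4; 6]
// total   : int      = 6

// fooFold without a separate map step: the projection (fst) moves into the folder.
let game = [0, 5; 10, 15]
let maxX = game |> List.fold (fun acc p -> max acc (fst p)) System.Int32.MinValue
// maxX : int = 10

printfn "%A %d %d" doubled total maxX
```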
And MapReduce:
MapReduce is an algorithm for processing huge datasets on certain
kinds of distributable problems using a large number of nodes
Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys.
The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.
module ComplexExample =
    let fooFold keySelection folder state source =
        source |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) ->
            k, Seq.fold folder state xs)
    // val fooFold :
    //   keySelection:('a -> 'b) ->
    //     folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality
    let fooReduce keySelection projection reducer source =
        source |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) ->
            k, xs |> Seq.map projection |> Seq.reduce reducer)
    // val fooReduce :
    //   keySelection:('a -> 'b) ->
    //     projection:('a -> 'c) ->
    //       reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality
    type Project = { name : string; state : string; cost : decimal }
    let projects =
        [ { name = "build roads"; state = "CA"; cost = 1000000M }
          { name = "fight crime"; state = "IL"; cost = 100000M }
          { name = "help farmers"; state = "IL"; cost = 2000000M }
          { name = "help farmers"; state = "CA"; cost = 200000M } ]
    fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
    fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
What would be the natural name for the functions fooFold and fooReduce here?
I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.
For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted a version of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.
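As a rough sketch of how such a family of names could sit side by side, assuming the grouped functions are built on Seq.groupBy as in the question (the module name GroupedNaming and the sample data are made up for illustration):

```fsharp
module GroupedNaming =
    // Fold each group of `source`, where groups are formed by `keySelection`.
    let foldBy keySelection folder state source =
        source
        |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) -> k, Seq.fold folder state xs)

    // Reduce each group without mapping its elements first.
    let reduceBy keySelection reducer source =
        source
        |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) -> k, Seq.reduce reducer xs)

    // Map the elements of each group, then reduce the mapped values.
    let mapAndReduceBy keySelection projection reducer source =
        source
        |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) -> k, xs |> Seq.map projection |> Seq.reduce reducer)

let words = ["apple"; "avocado"; "banana"; "blueberry"; "cherry"]

// Longest word length per initial letter.
let longestByInitial =
    words
    |> GroupedNaming.mapAndReduceBy (fun (w: string) -> w.[0]) String.length max
    |> Seq.toList
// [('a', 7); ('b', 9); ('c', 6)]

// Sum of the numbers 1..5, grouped by parity.
let sumsByParity =
    [1 .. 5] |> GroupedNaming.foldBy (fun x -> x % 2) (+) 0 |> Seq.toList
// [(1, 9); (0, 6)]
```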
More generally, the issue when comparing names with Python is that Python allows overloading whereas F# functions do not. This means that you need to have a unique name for functions that would have multiple overloads. This means that you just need to come up with a consistent naming scheme that will not make the names unbearably long.
(I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. You can see for example the aggregation functions in Deedle for an example - there is a pattern in the naming to deal with the fact that each function needs a unique name.)
I have a different opinion than Thomas.
First, I think that not having overloads is a good thing, and giving every operation a unique name is also
a good thing. I would also say that giving long names to rarely used functions is even more important
and should not be avoided.
Writing longer names is rarely a problem, as we programmers usually use an IDE with auto-completion.
But reading and understanding is different. Knowing what a function does because of its long descriptive name
is better than having a short name.
A long descriptive function name gets more important the less often a function is used. It helps with reading and
understanding the code. A short and less descriptive function name that is rarely used causes confusion. The
confusion would only increase if it were just an overload of another function name.
Yes, naming things can be hard; that's the reason why it's important and shouldn't be avoided.
As for what you describe, I would have named them mapFold and mapReduce, as those names describe exactly what the functions do.
There is already a mapFold in F#, and in my opinion, the F# devs fucked up either with the naming, the arguments or the
output of the function. But anyhow, they just fucked up.
I would have expected mapFold to do a map and then a fold. Actually it does, but it also returns the intermediate
list that is created along the way, something I would not expect it to return. And I would also expect it to take two
functions instead of one.
Now to Thomas's suggestion of naming it mapAndFold or mapThenFold: I would expect different behaviour
from those two functions. mapThenFold tells exactly what it does: map, and then fold on the result. I think the "then" is
not important. That's also why I would name it mapFold or mapReduce. Writing it this way already suggests a "then".
But mapAndFold or mapAndReduce does not say anything about the order of execution. It just says it does two things
or somehow returns this AND that.
With that in mind, I would say that the F# library should either have named its mapFold mapAndFold, or changed the return
value to just the fold result (and taken two function arguments instead of one). But hey, it's fucked up now, we cannot change it anymore.
As for mapReduce, I think you are a little bit mistaken. The MapReduce algorithm is named that way because it just does
a map and then a reduce. And that's it.
But functional programming, with its stateless and more descriptive operations, sometimes has additional benefits. Technically
a map is less powerful than a for/fold, as it only describes how values are changed, without the order or the position in the list
mattering. But because of this limitation, you can run it in parallel, even on a big computer cluster. And that's all
the MapReduce algorithm you cite does.
But that doesn't mean a mapReduce must always run its operation on a big cluster or in parallel. In my opinion you could
just name it mapReduce and that's fine. Everybody will know what it does, and I think nobody expects it to suddenly run on
a cluster.
In general I think the mapFold that F# provides is silly, here are 4 examples how I think it should have been provided.
let double x = x * 2
let add x y = x + y
mapFold double add 0 [1..10] // 110
mapAndFold double add 0 [1..10] // [2;4;6;8;10;12;14;16;18;20] * 110
mapReduce double add [1..10] // Some (110)
mapAndReduce double add [1..10] // Some ([2;4;6;8;10;12;14;16;18;20] * 110)
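A minimal sketch of how these four functions could be written is shown here. They are placed in a hypothetical module named Proposed so that they do not clash with the real List.mapFold, and tuples are written with a comma where the comments above use `*`.

```fsharp
module Proposed =
    // Map, then fold; only the folded result is returned.
    let mapFold mapper folder state xs =
        List.fold (fun acc x -> folder acc (mapper x)) state xs

    // Map and fold; both the mapped list and the folded result are returned.
    let mapAndFold mapper folder state xs =
        let mapped = List.map mapper xs
        mapped, List.fold folder state mapped

    // Map, then reduce; None for an empty list.
    let mapReduce mapper reducer xs =
        match xs with
        | [] -> None
        | x :: rest ->
            Some (List.fold (fun acc y -> reducer acc (mapper y)) (mapper x) rest)

    // Map and reduce; returns the mapped list together with the reduced result.
    let mapAndReduce mapper reducer xs =
        match List.map mapper xs with
        | [] -> None
        | mapped -> Some (mapped, List.reduce reducer mapped)

let double x = x * 2
let add x y = x + y

let folded = Proposed.mapFold double add 0 [1..10]        // 110
let both = Proposed.mapAndFold double add 0 [1..10]       // ([2; 4; ...; 20], 110)
let reduced = Proposed.mapReduce double add [1..10]       // Some 110
let bothReduced = Proposed.mapAndReduce double add [1..3] // Some ([2; 4; 6], 12)
```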
Well, mapFold doesn't work that way, so you have the following options:
1. Implement mapReduce the way you have it, and ignore the inconsistency with mapFold.
2. Provide mapAndReduce and mapReduce.
3. Make your mapReduce return the same crap as the default implementation of mapFold does and provide mapThenReduce.
4. Like (3) but also add mapThenFold.
Option 4 is the most compatible with what already exists in F# and with the expectations it sets. But that doesn't mean you must do it that way.
In my opinion I would just:
- implement mapReduce returning the result of map and then reduce.
- not care about a mapAndReduce version that returns a list and the result.
- provide a mapThenFold expecting two function arguments and returning just the result of fold.
As a general note: implementing mapReduce just by calling map and then reduce is somewhat pointless. I would
expect it to have a more low-level implementation that does both things while traversing the data structure only once.
If not, I can just call map and then reduce anyway.
So an implementation should look like:
let mapReduce mapper reducer xs =
    let rec loop state xs =
        match xs with
        | [] -> state
        | x::xs -> loop (reducer state (mapper x)) xs
    match xs with
    | [] -> ValueNone
    | [x] -> ValueSome (mapper x)
    | x::xs -> ValueSome (loop (mapper x) xs)
let double x = x * 2
let add x y = x + y
let some110 = mapReduce double add [1..10]
I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
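To see the laziness at work, the same pipeline can be run over an infinite input. This is a small sketch; the input sequence 1, 2, 3, ... is an assumption chosen for illustration.

```fsharp
// An endless input: 1, 2, 3, ...
let input = Seq.initInfinite (fun i -> i + 1)

// Running sums, cut off as soon as a sum reaches 1000.
let partialSums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0
    |> Seq.takeWhile (fun sum -> sum < 1000)
    |> Seq.toList

let lastSum = List.last partialSums
// lastSum = 990 (1 + 2 + ... + 44); adding 45 would give 1035
printfn "%d sums, last one is %d" (List.length partialSums) lastSum
```

Seq.toList terminates even though the input never ends, because Seq.takeWhile stops pulling elements from the scan once the condition fails.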
I have an issue concerning F# on Mono. I'm taking a course in functional programming at my university. In the course we are using F#, and I use Xamarin as my editor.
We had a lesson on tail recursion as a tool for efficiency. When you are not able to write your function tail-recursively, we were told to use continuations, so that we use the heap and not the stack.
This seems not to work on Mono 3.10.0 with F# 3.1: I get a System.StackOverflowException. This should be impossible, since the continuations should use the heap.
let rec fibC n c =
    match n with
    |0 -> c 0
    |1 -> c 1
    |n -> fibC (n-1) (fun v1 -> fibC (n-2) (fun v2 -> c(v1+v2)))
I tested a Fibonacci implementation passing an accumulator instead of a function (continuation) like this:
let fib n =
    let rec _fib i (a,b) =
        match i with
        | 0 -> a
        | _ -> _fib (i-1) (b, a+b)
    _fib n (0,1)
which worked fine on Mono, i.e. no stack overflow.
So I guess it's only an issue with TCO when using continuations. There's a Xamarin ticket from June 2013 addressing this.
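For reference, the two formulations can be checked against each other on a small input. In this sketch the continuation version is started with the identity function id as the final continuation; the input 20 is an arbitrary choice that is small enough not to depend on tail-call behaviour.

```fsharp
// Continuation-passing version from the question.
let rec fibC n c =
    match n with
    | 0 -> c 0
    | 1 -> c 1
    | n -> fibC (n - 1) (fun v1 -> fibC (n - 2) (fun v2 -> c (v1 + v2)))

// Accumulator-passing version from this answer.
let fib n =
    let rec _fib i (a, b) =
        match i with
        | 0 -> a
        | _ -> _fib (i - 1) (b, a + b)
    _fib n (0, 1)

let viaContinuation = fibC 20 id   // 6765
let viaAccumulator = fib 20        // 6765
printfn "%d %d" viaContinuation viaAccumulator
```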
These are 2 functions, fun1 takes 1 parameter, fun2 takes 4 extra useless parameters. When I targeted for x64, fun1 takes 4s but fun2 takes less than 1s. If I targeted for anycpu, then both take less than 1s.
There is a similar question I asked here
why Seq.iter is 2x faster than for loop if target is for x64?
It is compiled in .Net 4.5 Visual Studio 2012, F# 3.0, run in windows 7 x64
open System
open System.Diagnostics
type Position =
    {
        a: int
        b: int
    }
[<EntryPoint>]
let main argv =
    let fun1 (pos: Position[]) = //<<<<<<<< here
        let functionB x y z = 4
        Array.fold2 (fun acc x y -> acc + int64 (functionB x x y)) 0L pos pos
    let fun2 (pos: Position[]) u v w x = //<<<<<<<< here
        let functionB x y z = 4
        Array.fold2 (fun acc x y -> acc + int64 (functionB x x y)) 0L pos pos
    let s = {a=2;b=3}
    let pool = [|s;s;s|]
    let test1 n =
        let mutable x = 0L
        for i in 1 .. n do
            x <- fun1 pool
    let test2 n =
        let mutable x = 0L
        for i in 1 .. n do
            x <- fun2 pool 1 2 3 4
    let sw = new Stopwatch()
    sw.Start()
    test2 10000000
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    sw.Restart()
    test1 10000000
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    0 // return an integer exit code
This isn't a complete answer; it is a first diagnosis of the problem.
I can reproduce the behaviour with the same configuration. If you turn on F# Interactive 64-bit in Tools -> Options -> F# Tools -> F# Interactive, you can observe the same behaviour there.
Unlike in the other question, the x64 JIT compiler isn't the problem. It turns out that the "Generate tail calls" option in the project properties causes a considerable slowdown of test1 compared to test2. If you turn off that option, the two cases run at similar speeds.
On the other hand, you can use the inline keyword on fun1 so that a tail call isn't needed. The two examples are then comparable in execution time again, whether or not fun2 is inlined.
That said, it is weird that adding the tail. opcode to fun1 makes it much slower than doing the same with fun2. You may contact the F# team for further investigation.
The difference is almost certainly a quirk of the JITer. It also explains the inconsistent results. This is a common problem with micro-benchmarking tests like this. Perform one or more redundant executions of the methods in order to compile the whole thing behind the scenes, and time the last one. They will be identical.
You can get more bizarre results than this due to this quirk.
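A minimal sketch of the warm-up approach described above: run the code a few times untimed, then time one more run. The helper name timeAfterWarmup, the warm-up count of 3 and the sumTo workload are all assumptions for illustration, not part of the original benchmark.

```fsharp
open System.Diagnostics

// Hypothetical helper: run `f` a few times untimed so that it has been JIT-compiled,
// then time a single additional run.
let timeAfterWarmup (warmups: int) (f: unit -> 'a) : 'a * System.TimeSpan =
    for _i in 1 .. warmups do
        f () |> ignore
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed

// Stand-in workload: sum the integers 1..n as int64.
let sumTo n = Array.fold (fun acc x -> acc + int64 x) 0L [| 1 .. n |]

let result, elapsed = timeAfterWarmup 3 (fun () -> sumTo 1000000)
printfn "%d in %O" result elapsed
```

For the benchmark in the question, this would mean wrapping test1 and test2 in such a helper rather than timing their first execution.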
In almost all examples, a y-combinator in ML-type languages is written like this:
let rec y f x = f (y f) x
let factorial = y (fun f -> function 0 -> 1 | n -> n * f(n - 1))
This works as expected, but it feels like cheating to define the y-combinator using let rec ....
I want to define this combinator without using recursion, using the standard definition:
Y = λf·(λx·f (x x)) (λx·f (x x))
A direct translation is as follows:
let y = fun f -> (fun x -> f (x x)) (fun x -> f (x x));;
However, F# complains that it can't figure out the types:
let y = fun f -> (fun x -> f (x x)) (fun x -> f (x x));;
--------------------------------^
C:\Users\Juliet\AppData\Local\Temp\stdin(6,33): error FS0001: Type mismatch. Expecting a
'a
but given a
'a -> 'b
The resulting type would be infinite when unifying ''a' and ''a -> 'b'
How do I write the y-combinator in F# without using let rec ...?
As the compiler points out, there is no type that can be assigned to x so that the expression (x x) is well-typed (this isn't strictly true; you can explicitly type x as obj->_ - see my last paragraph). You can work around this issue by declaring a recursive type so that a very similar expression will work:
type 'a Rec = Rec of ('a Rec -> 'a)
Now the Y-combinator can be written as:
let y f =
    let f' (Rec x as rx) = f (x rx)
    f' (Rec f')
Unfortunately, you'll find that this isn't very useful because F# is a strict language,
so any function that you try to define using this combinator will cause a stack overflow.
Instead, you need to use the applicative-order version of the Y-combinator (\f.(\x.f(\y.(x x)y))(\x.f(\y.(x x)y))):
let y f =
    let f' (Rec x as rx) = f (fun y -> x rx y)
    f' (Rec f')
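Put together with the recursive type, the applicative-order version can be used like the let rec based combinator from the question. The sketch below repeats the definitions so that it is self-contained; the parameter name self is just an illustrative choice.

```fsharp
type 'a Rec = Rec of ('a Rec -> 'a)

// Applicative-order Y combinator; no `let rec` is used anywhere.
let y f =
    let f' (Rec x as rx) = f (fun y -> x rx y)
    f' (Rec f')

// `self` stands for the function being defined.
let factorial = y (fun self -> function 0 -> 1 | n -> n * self (n - 1))

printfn "%d" (factorial 5)   // 120
```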
Another option would be to use explicit laziness to define the normal-order Y-combinator:
type 'a Rec = Rec of ('a Rec -> 'a Lazy)
let y f =
    let f' (Rec x as rx) = lazy f (x rx)
    (f' (Rec f')).Value
This has the disadvantage that recursive function definitions now need an explicit force of the lazy value (using the Value property):
let factorial = y (fun f -> function | 0 -> 1 | n -> n * (f.Value (n - 1)))
However, it has the advantage that you can define non-function recursive values, just as you could in a lazy language:
let ones = y (fun ones -> LazyList.consf 1 (fun () -> ones.Value))
As a final alternative, you can try to better approximate the untyped lambda calculus by using boxing and downcasting. This would give you (again using the applicative-order version of the Y-combinator):
let y f =
    let f' (x:obj -> _) = f (fun y -> x x y)
    f' (fun x -> f' (x :?> _))
This has the obvious disadvantage that it will cause unneeded boxing and unboxing, but at least this is entirely internal to the implementation and will never actually lead to failure at runtime.
I would say it's impossible, and asked why, I would handwave and invoke the fact that simply typed lambda calculus has the normalization property. In short, all terms of the simply typed lambda calculus terminate (consequently Y can not be defined in the simply typed lambda calculus).
F#'s type system is not exactly the type system of simply typed lambda calculus, but it's close enough. F# without let rec comes really close to the simply typed lambda calculus -- and, to reiterate, in that language you cannot define a term that does not terminate, and that excludes defining Y too.
In other words, in F#, "let rec" needs to be a language primitive at the very least because even if you were able to define it from the other primitives, you would not be able to type this definition. Having it as a primitive allows you, among other things, to give a special type to that primitive.
EDIT: kvb shows in his answer that type definitions (one of the features absent from the simply typed lambda-calculus but present in let-rec-less F#) allow to get some sort of recursion. Very clever.
Case and let statements in ML derivatives are what make them Turing complete. I believe they're based on System F and not on the simply typed lambda calculus, but the point is the same.
System F cannot assign a type to any fixed-point combinator; if it could, it would not be strongly normalizing.
What strongly normalizing means is that every reduction sequence of an expression terminates, so every expression has exactly one normal form, where a normal form is an expression that cannot be reduced any further. This differs from the untyped calculus, where every expression has at most one normal form and may have no normal form at all.
If typed lambda calculi could construct a fixed-point operator in whatever way, it would be possible for an expression to have no normal form.
Another famous theorem, the Halting Problem, implies that strongly normalizing languages are not Turing complete: it says that it's impossible to decide (as opposed to prove), for a Turing-complete language, which of its programs will halt on which input. If a language is strongly normalizing, halting is decidable, because every program halts. Our algorithm to decide this is the program: true;.
To overcome this, ML derivatives extend System F with case and let (rec). Functions can thus refer to themselves in their definitions again, which in effect means these languages are no longer lambda calculi at all: it's no longer possible to rely on anonymous functions alone for all computable functions. They can thus enter infinite loops again and regain their Turing completeness.
Short answer: You can't.
Long answer:
The simply typed lambda calculus is strongly normalizing. This means it's not Turing equivalent. The reason for this basically boils down to the fact that a Y combinator must either be primitive or defined recursively (as you've found). It simply cannot be expressed in System F (or simpler typed calculi). There's no way around this (it's been proven, after all). The Y combinator you can implement works exactly the way you want, though.
I would suggest you try scheme if you want a real Church-style Y combinator. Use the applicative version given above, as other versions won't work, unless you explicitly add laziness, or use a lazy Scheme interpreter. (Scheme technically isn't completely untyped, but it's dynamically typed, which is good enough for this.)
See this for the proof of strong normalization:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.1794
After thinking some more, I'm pretty sure that adding a primitive Y combinator that behaves exactly the way the letrec defined one does makes System F Turing complete. All you need to do to simulate a Turing machine then is implement the tape as an integer (interpreted in binary) and a shift (to position the head).
Simply define a function taking its own type as a record, like in Swift (there it's a struct) :)
Here, Y (uppercase) is semantically defined as a function that can be called with its own type. In F# terms, it is defined as a record containing a function named call, so for calling a y defined as this type, you have to actually call y.call :)
type Y = { call: Y -> (int -> int) }
let fibonacci n =
    let makeF f: int -> int =
        fun x ->
            if x = 0 then 0 else if x = 1 then 1 else f(x - 1) + f(x - 2)
    let y = { call = fun y -> fun x -> (makeF (y.call y)) x }
    (y.call y) n
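The same record trick can be factored out into a reusable combinator. This is only a sketch under a few assumptions: the names Self and fix are made up, and the record is made generic in the argument and result types instead of being fixed to int -> int.

```fsharp
// A record wrapping a function that takes a value of its own record type.
type Self<'a, 'b> = { call : Self<'a, 'b> -> 'a -> 'b }

// Fixed-point combinator built without `let rec`.
// The inner `fun ... x -> ...` delays the self-application until an argument arrives,
// which keeps it from looping forever under strict evaluation.
let fix (makeF : ('a -> 'b) -> 'a -> 'b) : 'a -> 'b =
    let y = { call = fun self x -> makeF (self.call self) x }
    y.call y

let fibonacci = fix (fun f x -> if x < 2 then x else f (x - 1) + f (x - 2))
let factorial = fix (fun f n -> if n = 0 then 1 else n * f (n - 1))

printfn "%d %d" (fibonacci 10) (factorial 5)   // 55 120
```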
It's not supremely elegant to read but it doesn't resort to recursion for defining a y combinator that is supposed to provide recursion all by itself ^^