I am learning F#, and the use cases of the |>, >>, and << operators confuse me. I get that everything (if statements, functions, etc.) acts like a variable, but how do these work?
Usually we (the community) say the pipe operator |> is just a way to write the last argument of a function before the function call. For example,
f x y
can be written
y |> f x
but strictly speaking, this is not true. It just passes the next argument to a function. So you could even write:
y |> (x |> f)
All of this, and every other kind of operator, works because in F# all functions are curried by default. This means that there are only functions with one argument. Functions with many arguments are implemented as functions that return another function.
You could also write
(f x) y
for example. The function f is a function that takes x as argument and returns another function. This then gets y passed as an argument.
This process is automatically done by the language. So if you write
let f x y z = x + y + z
it is the same as:
let f = fun x -> fun y -> fun z -> x + y + z
Currying is, by the way, the reason why parentheses are not required in an ML-like language, unlike in a Lisp-like language. Otherwise you would have needed to write:
(((f 1) 2) 3)
to execute a function f with three arguments.
The pipe operator itself is just another function, it is defined as
let (|>) x f = f x
It takes a value x as its first argument and a function f as its second argument. Because operators are written "infix" (between two operands) instead of "prefix" (before the arguments, the normal way), the operand on the left of the operator is its first argument.
In my opinion, |> is used too much by most F# people. It makes sense to use piping if you have a chain of operations, one after another. Typically for example if you have multiple list operations.
Let's say you want to square all numbers in a list and then keep only the even ones. Without piping you would write:
List.filter isEven (List.map square [1..10])
Here the second argument to List.filter is a list that is returned by List.map. You can also write it as
List.map square [1..10]
|> List.filter isEven
Piping is function application: you execute/run a function, so it computes and returns a value as its result.
In the above example List.map is executed first, and the result is passed to List.filter. That's true with and without piping. But sometimes you want to create another function instead of executing/running one. Let's say you want to create a function from the above. The two versions you could write are:
let evenSquares xs = List.filter isEven (List.map square xs)
let evenSquares xs = List.map square xs |> List.filter isEven
You could also write it as function composition.
let evenSquares = List.filter isEven << List.map square
let evenSquares = List.map square >> List.filter isEven
The << operator resembles function composition in the "normal" way, how you would write a function call with parentheses. And >> is the "backwards" composition, how it would be written with |>.
The F# documentation names them the other way around (>> is "forward" and << is "backward"), but I think the F# language creators are wrong.
The function composition operators are defined as:
let (<<) f g x = f (g x)
let (>>) f g x = g (f x)
As you see, the operator technically has three arguments. But remember currying: when you write f << g, the result is another function that expects the last argument x. Passing fewer arguments than needed is often called partial application.
Function composition is used less often in F#, because the compiler sometimes has problems with type inference if the function arguments are generic.
Theoretically you could write a program without ever defining a variable, just through function composition. This is also named Point-Free style.
I would not recommend it; it often makes code harder to read and/or understand. But it is sometimes used if you want to pass a function to another higher-order function, that is, a function that takes another function as an argument, like List.map, List.filter and so on.
Pipes and composition operators have simple definitions but are difficult to grasp. But once we have understood them, they are super useful, and we miss them when we get back to C#.
Here are some explanations, but you will get the best feedback from your own experiments. Have fun!
Pipe right operator |>
val |> fn ≡ fn val
Utility:
Building a pipeline, to chain calls to functions: x |> f |> g ≡ g (f x).
Easier to read: just follow the data flow
No intermediary variables
Natural language order in English: Subject Verb.
It's the usual order in object-oriented code: myObject.do()
In F#, the "subject" is usually the last parameter: List.map f list. Using |>, we get back the natural "Subject Verb" order: list |> List.map f
Last but not least, it helps type inference:
let items = ["a"; "bb"; "ccc"]
let longestKo = List.maxBy (fun x -> x.Length) items // ❌ Error FS0072
// ~~~~~~~~
let longest = items |> List.maxBy (fun x -> x.Length) // ✅ return "ccc"
Pipe left operator <|
fn <| expression ≡ fn (expression)
Less used than |>
✅ Small benefit: avoiding parentheses
❌ Major drawback: it goes against the natural English "left to right" reading order and against the execution order (because of left-associativity)
printf "%i" 1+2 // 💥 Error
printf "%i" (1+2) // With parentheses
printf "%i" <| 1+2 // With pipe left
What about this kind of expression: x |> fn <| y ❓
In theory, it allows using fn in infix position, equivalent to fn x y
In practice, it can be very confusing for readers who are not used to it.
👉 It's probably better to avoid using <|
Forward composition operator >>
Binary operator placed between 2 functions:
f >> g ≡ fun x -> g (f x) ≡ fun x -> x |> f |> g
Result of the 1st function is used as argument for the 2nd function
→ types must match: f: 'T -> 'U and g: 'U -> 'V → f >> g :'T -> 'V
let add1 x = x + 1
let times2 x = x * 2
let add1Times2 x = times2(add1 x) // 😕 Style explicit but heavy
let add1Times2' = add1 >> times2 // 👍 Style concise
Backward composition operator <<
f >> g ≡ g << f
Less used than >>, except to get terms in English order:
let even x = x % 2 = 0
// even not 😕
let odd x = x |> even |> not
// "not even" is easier to read 👍
let odd = not << even
☝ Note: << is the mathematical function composition ∘: g ∘ f ≡ fun x -> g (f x) ≡ g << f.
It's confusing in F# because it's >> that is usually called the "composition operator" ("forward" being usually omitted).
On the other hand, the symbols used for these operators are super useful to remember the order of execution of the functions: f >> g means apply f then apply g. Even if argument is implicit, we get the data flow direction:
>> : from left to right → f >> g ≡ fun x -> x |> f |> g
<< : from right to left → f << g ≡ fun x -> f <| (g <| x)
(Edited after good advice from David)
I am learning the basics of functional programming and Erlang, and I've implemented three versions of the factorial function: using recursion with guards, using recursion with pattern matching, and using tail recursion.
I am trying to compare the performance of each factorial implementation (Erlang/OTP 22 [erts-10.4.1]):
%% Simple factorial code:
fac(N) when N == 0 -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
%% Using pattern matching:
fac_pattern_matching(0) -> 1;
fac_pattern_matching(N) when N > 0 -> N * fac_pattern_matching(N - 1).
%% Using tail recursion (and pattern matching):
tail_fac(N) -> tail_fac(N, 1).
tail_fac(0, Acc) -> Acc;
tail_fac(N, Acc) when N > 0 -> tail_fac(N - 1, N * Acc).
Timer helper:
-define(PRECISION, microsecond).
execution_time(M, F, A, D) ->
    StartTime = erlang:system_time(?PRECISION),
    Result = apply(M, F, A),
    EndTime = erlang:system_time(?PRECISION),
    io:format("Execution took ~p ~ps~n", [EndTime - StartTime, ?PRECISION]),
    if
        D =:= true -> io:format("Result is ~p~n", [Result]);
        true -> ok
    end.
Execution results:
Recursive version:
3> mytimer:execution_time(factorial, fac, [1000000], false).
Execution took 1253949667 microseconds
ok
Recursive with pattern matching version:
4> mytimer:execution_time(factorial, fac_pattern_matching, [1000000], false).
Execution took 1288239853 microseconds
ok
Tail recursive version:
5> mytimer:execution_time(factorial, tail_fac, [1000000], false).
Execution took 1405612434 microseconds
ok
I was expecting the tail-recursive version to perform better than the other two but, to my surprise, it performs worse. These results are the exact opposite of what I was expecting.
Why?
The problem is the function you chose. Factorial grows very fast. Erlang implements big integer arithmetic, so the result will not overflow, which means you are effectively measuring how good the underlying big integer implementation is. 1000000! is a huge number: about 8.26×10^5565708, which is roughly 5.6 MB when written out in decimal. Your fac/1 and tail_fac/1 differ in how quickly they reach the big numbers where the big integer implementation kicks in, and in how fast the number grows. Your fac/1 effectively computes 1*2*3*4*...*N, while your tail_fac/1 computes N*(N-1)*(N-2)*(N-3)*...*1. Do you see the issue there? You can write the tail call implementation in a different way:
tail_fac2(N) when is_integer(N), N >= 0 ->
tail_fac2(N, 0, 1).
tail_fac2(X, X, Acc) -> Acc;
tail_fac2(N, X, Acc) ->
Y = X + 1,
tail_fac2(N, Y, Y*Acc).
It will work much better. I'm not as patient as you are, so I will measure with somewhat smaller numbers, but the new fact:tail_fac2/1 should outperform fact:fac/1 every single time:
1> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7743768
2> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7629604
3> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7651739
4> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7229662
5> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7104056
6> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6491195
7> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6506565
8> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6519624
As you can see, for N = 100000 fact:tail_fac2/1 takes 6.5 s, fact:tail_fac/1 takes 7.2 s and fact:fac/1 takes 7.6 s. Even the faster growth of the accumulator doesn't overturn the tail call benefit, so the tail call version is still faster than the body-recursive one, and the slower growth of the accumulator in fact:tail_fac2/1 clearly shows its impact.
If you choose a different function for testing tail call optimization, you can see its impact more clearly. For example, sum:
sum(0) -> 0;
sum(N) when N > 0 -> N + sum(N-1).
tail_sum(N) when is_integer(N), N >= 0 ->
tail_sum(N, 0).
tail_sum(0, Acc) -> Acc;
tail_sum(N, Acc) -> tail_sum(N-1, N+Acc).
And speed is:
1> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
970749
2> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
126288
3> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
113115
4> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
104371
5> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
125857
6> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92282
7> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92634
8> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
68047
9> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
87748
10> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
94233
As you can see, here we can easily use N = 10000000 and it still runs pretty fast. Even so, the body-recursive function is significantly slower: about 110 ms vs 85 ms. Notice that the first run of fact:sum/1 took about 9x longer than the other runs. That is because the body-recursive function consumes stack. You will not see such an effect with the tail-recursive counterpart. (Try it.) You can see the difference if you run each measurement in a separate process:
1> F = fun(G, N) -> spawn(fun() -> {T, _} = timer:tc(fun()-> fact:G(N) end), io:format("~p took ~bus and ~p heap~n", [G, T, element(2, erlang:process_info(self(), heap_size))]) end) end.
#Fun<erl_eval.13.91303403>
2> F(tail_sum, 10000000).
<0.88.0>
tail_sum took 70065us and 987 heap
3> F(tail_sum, 10000000).
<0.90.0>
tail_sum took 65346us and 987 heap
4> F(tail_sum, 10000000).
<0.92.0>
tail_sum took 65628us and 987 heap
5> F(tail_sum, 10000000).
<0.94.0>
tail_sum took 69384us and 987 heap
6> F(tail_sum, 10000000).
<0.96.0>
tail_sum took 68606us and 987 heap
7> F(sum, 10000000).
<0.98.0>
sum took 954783us and 22177879 heap
8> F(sum, 10000000).
<0.100.0>
sum took 931335us and 22177879 heap
9> F(sum, 10000000).
<0.102.0>
sum took 934536us and 22177879 heap
10> F(sum, 10000000).
<0.104.0>
sum took 945380us and 22177879 heap
11> F(sum, 10000000).
<0.106.0>
sum took 921855us and 22177879 heap
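The fresh-process measurement above can also be packaged as a small module instead of a shell fun. This is only a sketch: the module name bench and the function run/3 are made up for illustration, and the heap figure is whatever erlang:process_info/2 reports for heap_size (in words) at the end of the call.

```erlang
-module(bench).
-export([run/3]).

%% Hypothetical helper (not part of the original answer):
%% runs M:F(Args) in a freshly spawned process and returns
%% {ok, Microseconds, HeapWords}, or {error, Reason} if the call crashes.
run(M, F, Args) ->
    Parent = self(),
    {Pid, MRef} = spawn_monitor(
        fun() ->
            %% timer:tc/3 returns {ElapsedMicroseconds, Result}
            {Micros, _Result} = timer:tc(M, F, Args),
            {heap_size, Words} = erlang:process_info(self(), heap_size),
            Parent ! {self(), Micros, Words}
        end),
    receive
        {Pid, T, H} ->
            %% Drop the monitor and any pending 'DOWN' message
            erlang:demonitor(MRef, [flush]),
            {ok, T, H};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

With the fact module from this answer compiled, bench:run(fact, tail_sum, [10000000]) and bench:run(fact, sum, [10000000]) should show the same kind of gap in time and heap size as the shell session above.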
-module(count).
-export([count/1]).
count(L) when is_list(L) ->
do_count(L, #{});
count(_) ->
error(badarg).
do_count([], Acc) -> Acc;
do_count([H|T], #{}) -> do_count(T, #{ H => 1 });
do_count([H|T], Acc = #{ H := C }) -> do_count(T, Acc#{ H := C + 1});
do_count([H|T], Acc) -> do_count(T, Acc#{ H => 1 }).
In this example, the third clause where the map key "H" exists and has a count associated with it, will not compile. The compiler complains:
count.erl:11: variable 'H' is unbound
Why is H unbound?
This works by the way:
do_count([], Acc) -> Acc;
do_count([H|T], Acc) -> do_count(T, maps:update_with(H, fun(C) -> C + 1 end, 1, Acc)).
But it seems like the pattern match ought to work and it doesn't.
The answer is pretty much the same as the one I recently gave here:
https://stackoverflow.com/a/46268109/240949.
When you use the same variable multiple times in a pattern, as with H in this case:
do_count([H|T], Acc = #{ H := C }) -> ...
the semantics of pattern matching in Erlang say that this is as if you had written
do_count([H|T], Acc = #{ H1 := C }) when H1 =:= H -> ...
that is, they are first bound separately, then compared for equality. But a key in a map pattern needs to be known - it can't be a variable like H1, hence the error (exactly as for field size specifiers in binary patterns, in the answer I linked to).
The main difference in this question is that you have a function head with two separate arguments, and you might think that the pattern [H|T] should be matched first, binding H before the second pattern is tried, but there is no such ordering guarantee; it's just as if you had used a single argument with a tuple pattern {[H|T], #{ H := C }}.
Because that kind of match occurs out of context for unification. In fact, though the docs don't explicitly forbid this, they do explicitly state only that matches with literals will work in function heads. I believe that there is an effort under way to make this construction work, but not yet.
The issues surrounding unification vs. assignment in different contexts within function heads are related to another question, about matching internal size values within binaries in function heads, that came up the other day.
(Remember, the function head is not just doing assignment, it is also trying to efficiently pick a path of execution. So this isn't actually a straightforward issue.)
All that said, a more Erlangish (and simpler) version of your count/1 function could be:
count(Items) ->
count(Items, #{}).
count([], A) ->
A;
count([H | T], A) ->
NewA = maps:update_with(H, fun(V) -> V + 1 end, 1, A),
count(T, NewA).
The case you are writing against was foreseen by the stdlib, and we have a nifty solution in the maps module called maps:update_with/4.
Note that we didn't give count/2 a new name. Unless necessary in the program, it is usually easier to give a helper function with a different arity the same name when doing explicit recursion. A function's identity is Name/Arity, so these are two totally separate functions whether or not the label is the same. Also, notice that we didn't check the argument type, because we have an explicit match in count/2 that can only ever match a list, so a non-list argument will crash with a function_clause error anyway.
Sometimes you will want polymorphic arguments in Erlang, and typechecking is appropriate. You almost never want defensive code in Erlang, though.
Session with a module called foo:
1> c(foo).
{ok,foo}
2> foo:count([1,3,2,4,4,2,2,2,4,4,1,2]).
#{1 => 2,2 => 5,3 => 1,4 => 4}
BUT
We want to avoid explicit recursion unless there is a call for it, as we have all these nifty listy functional abstractions laying about in the stdlib. What you are really doing is trying to condense a list of values into an arbitrarily aggregated single value and that is by definition a fold. So we could rewrite the above perhaps more idiomatically as:
count2(Items) ->
Count = fun(I, A) -> maps:update_with(I, fun(V) -> V + 1 end, 1, A) end,
lists:foldl(Count, #{}, Items).
And we get:
3> foo:count2([1,3,2,4,4,2,2,2,4,4,1,2]).
#{1 => 2,2 => 5,3 => 1,4 => 4}
Regarding case...
What I wrote about unification in a function head holds -- for function heads because they are a completely blank unification context. Richard's answer provides just the best shorthand for remembering why this is crazy:
f(A, #{A := _})
is equivalent to
f(A, #{B := _}) when B =:= A
And that's just not going to fly. His comparison to tuple matching is spot on.
...but...
In a case where the primary objects have already been assigned this all works just fine. Because, as Richard helpfully mentioned in a comment, there is only one A in the case below.
1> M = #{1 => "one", 2 => "two"}.
#{1 => "one",2 => "two"}
2> F =
2> fun(A) ->
2> case M of
2> #{A := B} -> B;
2> _ -> "Oh noes! Not a key!"
2> end
2> end.
#Fun<erl_eval.6.87737649>
3> F(1).
"one"
4> F(2).
"two"
5> F(3).
"Oh noes! Not a key!"
So that may feel a bit idiosyncratic, but it makes sense based on the rules of matching/unification. And it means you can write your do_count/2 the way you did above using a case inside of a function, but not as a set of function heads.
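As a rough sketch of that case-based variant (the module name count_case is made up here, and the is_list guard is kept from the question), the map key H is already bound by the time the case pattern is evaluated, so the compiler accepts it:

```erlang
-module(count_case).
-export([count/1]).

%% Hypothetical module name; counts occurrences of each list element.
count(L) when is_list(L) ->
    do_count(L, #{}).

do_count([], Acc) ->
    Acc;
do_count([H | T], Acc) ->
    %% H was bound by the function head, so it may be used as a
    %% map key inside this case pattern.
    case Acc of
        #{H := C} -> do_count(T, Acc#{H := C + 1});
        _         -> do_count(T, Acc#{H => 1})
    end.
```

Calling count_case:count/1 on the list from the session above should produce the same map as foo:count/1.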
I made up this rule for myself: when using maps in the head of a function clause, the order of matching is not guaranteed. As a result, in your example you can't count on a [H|T] match to provide a value for H.
Several features of maps look like they should work, and Joe Armstrong says they should work, but they don't. It's a dumb part of Erlang. Witness my incredulity here: https://bugs.erlang.org/browse/ERL-88
Simpler examples:
do_stuff(X, [X|Y]) ->
io:format("~w~n", [Y]).
test() ->
do_stuff(a, [a,b,c]).
4> c(x).
{ok,x}
5> x:test().
[b,c]
ok
But:
-module(x).
-compile(export_all).
do_stuff(X, #{X := Y}) ->
io:format("~w~n", [Y]).
test() ->
do_stuff(a, #{a => 3}).
8> c(x).
x.erl:4: variable 'X' is unbound
I need to convert an Elixir function into an Erlang function.
In Elixir I have:
Enum.map(0..n, fn i-> fun(i) end)
And I need to rewrite it in Erlang.
Any ideas? Thanks
Erlang doesn't have a single generic function that can handle mapping over any data structure like Enum.map in Elixir. The simplest way to do this would be to use lists:seq to generate the list and lists:map:
1> lists:map(fun(X) -> X * X end, lists:seq(0, 10)).
[0,1,4,9,16,25,36,49,64,81,100]
Using list comprehensions:
[ F(X) || X <- lists:seq(0, 10) ].
or, with the function body written inline:
[ X*X || X <- lists:seq(0, 10) ].
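Wrapped up as a function, the translation of Enum.map(0..n, ...) could look roughly like this. The names range_map and map_range/2 are invented for the example, and the guard assumes N is a non-negative integer (an Elixir range such as 0..-1 counts downwards, which this sketch does not try to reproduce):

```erlang
-module(range_map).
-export([map_range/2]).

%% Hypothetical helper: applies F to every integer from 0 to N inclusive
%% and returns the results as a list, in order.
map_range(F, N) when is_function(F, 1), is_integer(N), N >= 0 ->
    [F(I) || I <- lists:seq(0, N)].
```

For example, range_map:map_range(fun(X) -> X * X end, 10) gives the same list of squares as the lists:map call above.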