Matching Patterns of Objects in F# - f#

I have zero experience with F#. I started reading F# for C# Developers this weekend.
I had a problem recently (C# Code) where I had to search a list of .NET objects for subsets of objects that matched a certain pattern.
I described it as "regex style matching for objects" in the question. And I did get a great solution that I was happy with, and which seems to perform quite well.
But my recent interest in F# lead me to wonder whether functional programming might have an even better solution.
How would you solve this problem in F#?
Regex Style Pattern Matching in .NET Object Lists

One interesting functional concept that might be useful here is parser combinators. The idea is that you use composable functions that describe how to read some input and then compose parsers for complex patterns from a couple of primitives.
Parsers normally work over strings, but there is no reason why you couldn't use the same method to read a sequence of chromosomes.
With parser combinators, you would rewrite the "regex" as a composition of F# functions. Something like:
sequence [
zeroOrMore (chromosomeType R)
repeat 3 (chromosomeType B)
zeroOrMore (chromosomeType D) ]
This would give you a function that takes a list of chromosome objects and returns the parsed result - the subset of the list. The idea is that functions like zeroOrMore build parsers that detect certain patterns and you compose functions to build a parser - you get a function that you can just run on the input to parse it.
Explaining parser combinators is a bit too long for a SO answer, but it would probably be the most idiomatic F# approach to solving the problem.

This is a comment posted as an answer because it is to long for a comment.
Since it seems that Regular Expressions are working for your problem and that you might suspect that F# pattern matching or active patterns might be a better solution I will add some understanding.
The input R+B{3}D+ as I see it is a formal grammar, and as such will require a parser and evaluation engine, or the creation of something like a finite state machine. Since you know already that .Net Regex can solve this problem, why go through all of the trouble to recreate this with F#. Even using F# pattern matching and active patterns will not be easier than using RegEx.
Thus the problem is basically to convert the C# code to F# code and make use of RegEx. So you are asking us to translate your C# to F# and that is not a valid SO question.
EDIT
As Mark Seemann noted in a comment the only input we have is R+B{3}D+. So if your actual grammar is more complicated than what RegEx can handle then there might be a better solution in F#.
Hopefully this helps you understand what you are asking.

Building on the other answers: Start by implementing a parser combinator allowing composition of the type Parser<'a list>. Add parsers for the minimal grammar required by R+B{3}D+.
type Result<'T> = Success of 'T | Failure
type Parser<'T> = Parser of ('T -> Result<'T * 'T>)
module ListParser =
let (.>>.) (Parser f1) (Parser f2) =
Parser <| fun input ->
match f1 input with
| Failure -> Failure
| Success(value1, rest1) ->
match f2 rest1 with
| Failure -> Failure
| Success(value2, rest2) ->
Success(value1 # value2, rest2)
let oneOrMore what =
let rec aux gotOne acc = function
| x::xs when what = x -> aux true (x::acc) xs
| xss when gotOne -> Success(List.rev acc, xss)
| _ -> Failure
Parser <| aux false []
let exactly what n =
let rec aux i acc = function
| xss when i = 0 -> Success(List.rev acc, xss)
| x::xs when what = x -> aux (i - 1) (x::acc) xs
| _ -> Failure
Parser <| aux n []
Finally create a function which will run the parser repeatedly until the input list is exhausted.
open ListParser
let runForall (Parser f) xss =
let rec aux n acc xss =
match xss, f xss with
| [], _ -> List.rev acc
| _::xs, Failure -> aux (n + 1) acc xs
| _, Success(value, rest) ->
aux (n + List.length value) ((n + 1, value)::acc) rest
aux 0 [] xss
type ChromosomeType = R | B | D
[D;R;R;B;B;B;D;D;B;R;R;B;B;B;D;D;R;R;B;B;B;D;D]
|> runForall (oneOrMore R .>>. exactly B 3 .>>. oneOrMore D)
// val it : (int * ChromosomeType list) list =
// [(2, [R; R; B; B; B; D; D]); (10, [R; R; B; B; B; D; D]);
// (17, [R; R; B; B; B; D; D])]

Related

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was myfirst attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
let rec mapRec f state list results =
match list with
| [] -> (List.rev results, state)
| item :: tail ->
let (result, newState) = f state item
match newState with
| Some x -> mapRec f newState tail (result :: results)
| None -> ([], None)
mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
match x with
| [] -> []
| [h] -> [f h]
| (h::t) ->
let cons = freshConsNoTail (f h)
mapToFreshConsTail cons f t
cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not particularly good example. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.

Composing 2 (or n) ('a -> unit) functions with same arg type

Is there some form of built-in / term I don't know that kinda-but-its-different 'composes' two 'a -> unit functions to yield a single one; e.g.:
let project event =
event |> logDirections
event |> stashDirections
let dispatch (batch:EncodedEventBatch) =
batch.chooseOfUnion () |> Seq.iter project
might become:
let project = logDirections FOLLOWEDBY stashDirections
let dispatch (batch:EncodedEventBatch) =
batch.chooseOfUnion () |> Seq.iter project
and then:
let dispatch (batch:EncodedEventBatch) =
batch.chooseOfUnion () |> Seq.iter (logDirections FOLLOWEDBY stashDirections)
I guess one might compare it to tee (as alluded to in FSFFAP's Railway Oriented Programming series).
(it needs to pass the same arg to both and I'm seeking to run them sequentially without any exception handling trickery concerns etc.)
(I know I can do let project fs arg = fs |> Seq.iter (fun f -> f arg) but am wondering if there is something built-in and/or some form of composition lib I'm not aware of)
The apply function from Klark is the most straightforward way to solve the problem.
If you want to dig deeper and understand the concept more generally, then you can say that you are lifting the sequential composition operation from working on values to work on functions.
First of all, the ; construct in F# can be viewed as sequential composition operator. Sadly, you cannot quite use it as one, e.g. (;) (because it is special and lazy in the second argument) but we can define our own operator instead to explore the idea:
let ($) a b = a; b
So, printfn "hi" $ 1 is now a sequential composition of a side-effecting operation and some expression that evaluates to 1 and it does the same thing as printfn "hi"; 1.
The next step is to define a lifting operation that turns a binary operator working on values to a binary operator working on functions:
let lift op g h = (fun a -> op (g a) (h a))
Rather than writing e.g. fun x -> foo x + bar x, you can now write lift (+) foo bar. So you have a point-free way of writing the same thing - just using operation that works on functions.
Now you can achieve what you want using the lift function and the sequential composition operator:
let seq2 a b = lift ($) a b
let seq3 a b c = lift ($) (lift ($) a b) c
let seqN l = Seq.reduce (lift ($)) l
The seq2 and seq3 functions compose just two operations, while seqN does the same thing as Klark's apply function.
It should be said that I'm writing this answer not because I think it is useful to implement things in F# in this way, but as you mentioned railway oriented programming and asked for deeper concepts behind this, it is interesting to see how things can be composed in functional languages.
Can you just apply an array of functions to a given data?
E.g. you can define:
let apply (arg:'a) (fs:(('a->unit) seq)) = fs |> Seq.iter (fun f -> f arg)
Then you will be able to do something like this:
apply 1 [(fun x -> printfn "%d" (x + 1)); (fun y -> printfn "%d" (y + 2))]

When executed will this be a tail call?

Once compiled and ran will this behave as a tail call?
let rec f accu = function
| [] -> accu
| h::t -> (h + accu) |> f <| t
Maybe there is an easy way to test behavior that I'm not aware of, but that might be another question.
I think this is much easier to see if you do not use the pipelining operator. In fact, the two pipelining operators are defined as inline, so the compiler will simplify the code to the following (and I think this version is also more readable and simpler to understand, so I would write this):
let rec f accu = function
| [] -> accu
| h::t -> f (h + accu) t
Now, you can read the definition of tail-call on Wikipedia, which says:
A tail call is a subroutine call that happens inside another procedure as its final action; it may produce a return value which is then immediately returned by the calling procedure.
So yes, the call to f on the last line is a tail-call.
If you wanted to analyze the original expression (h + accu) |> f <| t (without knowing that the pipelining operators are inlined), then this actually becomes ((h + accu) |> f) <| t. This means that the expression calls the <| operator with two arguments and returns the result - so the call to <| is a tail call.
The <| operator is defined like this:
let (<|) f a = f a
Now, the call to f inside the pipelining operator is also a tail-call (and similarly for the other pipelining operator).
In summary, if the compiler did not do inlining, you would have a sequence of three tail-calls (compiled using the .NET .tail instruction to make sure that .NET performs a tail-call). However, since the compiler performs inlining, it will see that you have a recursive tail-call (f calling f) and this can be more efficiently compiled into a loop. (But calls across multiple functions or operators cannot use loops that easily.)
If you look at the answer here, you will notice that f <| t is the same as f t (it only makes a difference if you put an expression in place of t which requires parenthesis).
Likewise x |> y is the same as y x.
This results in an equivalent expression which looks like this: f (h + accu) t, So (assuming the compiler doesn't have a bug or some such), your function should be tail recursive and most likely will be compiled down to a loop of some sort.

Return item at position x in a list

I was reading this post While or Tail Recursion in F#, what to use when? were several people say that the 'functional way' of doing things is by using maps/folds and higher order functions instead of recursing and looping.
I have this function that returns the item at position x in a list:
let rec getPos l c = if c = 0 then List.head l else getPos (List.tail l) (c - 1)
how can it be converted to be more functional?
This is a primitive list function (also known as List.nth).
It is okay to use recursion, especially when creating the basic building blocks. Although it would be nicer with pattern matching instead of if-else, like this:
let rec getPos l c =
match l with
| h::_ when c = 0 -> h
| _::t -> getPos t (c-1)
| [] -> failwith "list too short"
It is possible to express this function with List.fold, however the result is less clear than the recursive version.
I'm not sure what you mean by more functional.
Are you rolling this yourself as a learning exercise?
If not, you could just try this:
> let mylist = [1;2;3;4];;
> let n = 2;;
> mylist.[n];;
Your definition is already pretty functional since it uses a tail-recursive function instead of an imperative loop construct. However, it also looks like something a Scheme programmer might have written because you're using head and tail.
I suspect you're really asking how to write it in a more idiomatic ML style. The answer is to use pattern matching:
let rec getPos list n =
match list with
| hd::tl ->
if n = 0 then hd
else getPos tl (n - 1)
| [] -> failWith "Index out of range."
The recursion on the structure of the list is now revealed in the code. You also get a warning if the pattern matching is non-exhaustive so you're forced to deal with the index too big error.
You're right that functional programming also encourages the use of combinators like map or fold (so called points-free style). But too much of it just leads to unreadable code. I don't think it's warranted in this case.
Of course, Benjol is right, in practice you would just write mylist.[n].
If you'd like to use high-order functions for this, you could do:
let nth n = Seq.take (n+1) >> Seq.fold (fun _ x -> Some x) None
let nth n = Seq.take (n+1) >> Seq.reduce (fun _ x -> x)
But the idea is really to have basic constructions and combine them build whatever you want. Getting the nth element of a sequence is clearly a basic block that you should use. If you want the nth item, as Benjol mentioned, do myList.[n].
For building basic constructions, there's nothing wrong to use recursion or mutable loops (and often, you have to do it this way).
Not as a practical solution, but as an exercise, here is one of the ways to express nth via foldr or, in F# terms, List.foldBack:
let myNth n xs =
let step e f = function |0 -> e |n -> f (n-1)
let error _ = failwith "List is too short"
List.foldBack step xs error n

Is my rec function tail recursive?

Is this function tail-recursive ?
let rec rec_algo1 step J =
if step = dSs then J
else
let a = Array.init (Array2D.length1 M) (fun i -> minby1J i M J)
let argmin = a|> Array.minBy snd |> fst
rec_algo1 (step+1) (argmin::J)
In general, is there a way to formally check it ?
Thanks.
This function is tail-recursive; I can tell by eyeballing it.
In general it is not always easy to tell. Perhaps the most reliable/pragmatic thing is just to check it on a large input (and make sure you are compiling in 'Release' mode, as 'Debug' mode turns off tail calls for better debugging).
Yes, you can formally prove that a function is tail-recursive. Every expression reduction has a tail-position, and if all recursions are in tail-positions then the function is tail-recursive. It's possible for a function to be tail-recursive in one place, but not in another.
In the expression let pat = exprA in exprB only exprB is in tail-position. That is, while you can go evaluate exprA, you still have to come back to evaluate exprB with exprA in mind. For every expression in the language, there's a reduction rule that tells you where the tail position is. In ExprA; ExprB it's ExprB again. In if ExprA then ExprB else ExprC it's both ExprB and ExprC and so on.
The compiler of course knows this as it goes. However the many expressions available in F# and the many internal optimizations carried out by the compiler as it goes, e.g. during pattern match compiling, computation expressions like seq{} or async{} can make knowing which expressions are in tail-position non-obvious.
Practically speaking, with some practice it's easy for small functions to determine a tail call by just looking at your nested expressions and checking the slots which are NOT in tail positions for function calls. (Remember that a tail call may be to another function!)
You asked how we can formally check this so I'll have a stab. We first have to define what it means for a function to be tail-recursive. A recursive function definition of the form
let rec f x_1 ... x_n = e
is tail recursive if all calls of f inside e are tail calls - ie. occur in a tail context. A tail context C is defined inductively as a term with a hole []:
C ::= []
| e
| let p = e in C
| e; C
| match e with p_1 -> C | ... | p_n -> C
| if e then C else C
where e is an F# expression, x is a variable and p is a pattern. We ought to expand this to mutually recursive function definitions but I'll leave that as an exercise.
Lets now apply this to your example. The only call to rec_algo1 in the body of the function is in this context:
if step = dSs then J
else
let a = Array.init (Array2D.length1 M) (fun i -> minby1J i M J)
let argmin = a|> Array.minBy snd |> fst
[]
And since this is a tail context, the function is tail-recursive. This is how functional programmers eyeball it - scan the body of the definition for recursive calls and then verify that each occurs in a tail context. A more intuitive definition of a tail call is when nothing else is done with the result of the call apart from returning it.

Resources