How to avoid multiple iterations as a pattern? - f#

In functional languages (using F#), I am struggling to find a balance between the advantages of functional composition with single-responsibility and getting the performance of single iteration over sequences. Any code pattern suggestions / examples for achieving both?
I don't have a solid background in computational theory and I run into this general pattern over and over: Iterating over a collection and wanting to do side-effects while iterating to avoid further iterations over the same collection or its result set.
A typical example is a "reduce" or "filter" function: There are many times while filtering that I want to take an additional step based on the filter's result, but I'd like to avoid a second enumeration of the filtered results.
Let's take input validation as a simple problem statement:
Named input array
Piped to an "isValid" function filter
Side-effect: Log invalid input names
Pipe valid inputs to further execution
Problem Example
In F#, I might initially write:
inputs
// how to log invalid or other side-effects without messing up isValid??
|> Seq.filter isValid
|> execution
Solution Example #1
With an in-line side-effect, I need something like:
inputs
|> Seq.filter (fun (name,value) ->
    let valid = isValid (name,value)
    // side-effect
    if not valid then
        printfn "Invalid argument %s" name
    valid)
|> execution
Solution Example #2
I could use tuples to do a more pure separation of concerns, but requiring a second iteration:
let validationResults =
    inputs
    // initial iteration
    |> Seq.map (fun (name,value) ->
        let valid = isValid (name,value)
        (name,value,valid))
// (the valid inputs still have to be piped to execution afterwards)
// one example of a 2nd iteration...
validationResults
|> Seq.filter (fun (_,_,valid) -> not valid)
|> Seq.map (fun (name,_,_) -> printfn "Invalid argument %s" name)
|> ignore
// another example of a 2nd iteration...
for (name, _, valid) in validationResults do
    if not valid then
        printfn "Invalid argument %s" name
Update 2014-07-23 per Answer
I used this as the solution per the answer. The pattern was to use an aggregate function containing the conditional. There are probably even more elegantly concise ways to express this...
open System
let inputs = [("name","my name");("number","123456");("invalid","")]
let isValidValue (name,value) =
    not (String.IsNullOrWhiteSpace(value))
let logInvalidArg (name,value) =
    printfn "Invalid argument %s" name
let execution (name,value) =
    printfn "Valid argument %s: %s" name value
let inputPipeline input =
    match isValidValue input with
    | true -> execution input
    | false -> logInvalidArg input
inputs |> Seq.iter inputPipeline

Following up on my other answer regarding composition of logging and other side-effects in F#, in this example, you can write a higher-level function for logging, like this:
let log f (name, value) =
    let valid = f (name, value)
    if not valid then
        printfn "Invalid argument %s" name
    valid
It has this signature:
f:(string * 'a -> bool) -> name:string * value:'a -> bool
So you can now compose it with the 'real' isValid function like this:
inputs
|> Seq.filter (log isValid)
|> execution
Since the isValid function has the signature name:'a * value:int -> bool it fits the f argument for the log function, and you can partially apply the log function as above.
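As a rough, self-contained sketch of the same composition: the `isValid` rule, the `logWith` name, its `report` parameter and the `rejected` collection below are all assumptions made for illustration, not part of the answer above. Passing the reporting action in as an argument keeps the filter to a single pass while letting the side effect be swapped out (here it collects names instead of printing them).

```fsharp
open System

// Assumed validation rule: a value is valid when it is not blank.
let isValid (_name: string, value: string) =
    not (String.IsNullOrWhiteSpace value)

// Hypothetical sink for rejected names, used instead of printfn.
let rejected = ResizeArray<string>()

// Same shape as `log` above, but the reporting action is a parameter.
let logWith (report: string -> unit) f (name, value) =
    let valid = f (name, value)
    if not valid then report name
    valid

let inputs = [ ("name", "my name"); ("number", "123456"); ("invalid", "") ]

// Single pass: filtering and reporting happen in the same enumeration.
let validInputs =
    inputs
    |> Seq.filter (logWith rejected.Add isValid)
    |> Seq.toList
```

Because `Seq.filter` is lazy, the reporting only happens once something (here `Seq.toList`) enumerates the result.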

This doesn't address your concern of iterating the sequence only once (which, for an array, is very cheap anyway), but is, I think, easier to read and clearer:
let valid, invalid = Array.partition isValid inputs
for name, _ in invalid do printfn "Invalid argument %s" name
execution valid

Related

Why does F# not like the type ('a list list) as input?

*I edited my original post to include more info.
I'm working on an F# assignment where I'm supposed to create a function that takes an "any list list" as input and outputs an "any list". It should be able to concatenate a list of lists into a single list.
Here's what my function looks like:
let llst = [ [1] ; [2;3] ; ['d';'e';'f'] ]
let concat (llst:'a list list) : 'a list =
    List.concat llst
List.iter (fun elem -> printf "%d " elem) concat
This solution is more or less copied directly from Microsoft's example of using the List.concat function, the only exception being the specification of input/output types.
When I run the code I get this error:
concat.fsx(7,43): error FS0001: This expression was expected to have type
''a list'
but here has type
''b list list -> 'b list'
So it appears that concat is turning my llst into a character list, which I don't understand.
Can anyone help me understand why I'm getting this type error and how I can write a function that takes the types that I need?
The problem is somewhere in your implementation of the concat function. It is hard to say where exactly without seeing your code, but since this is an assignment, it is actually perhaps better to explain what the error message is telling you, so that you can find the issue yourself.
The error message is telling you that the F# type inference algorithm found a place in your code where the actual type of what you wrote does not match the type that is expected in that location. It also tells you what the two mismatching types are. For example, say you write something like this:
let concat (llst:'a list list) : 'a list =
    llst
You will get the error you are getting on the second line, because the type of llst is 'a list list (the compiler knows this from the type annotation you give on line 1), but the expected type is the same as the result type of the function which is 'a list - also specified by your type annotation.
So, to help you find the issue: look at the exact place where you are getting the error, try to work out why the compiler thinks that the actual type is 'a list list, and try to understand why it expects 'a list as the type that should be in this place.
This is correct:
let concat (llst:'a list list) : 'a list =
    List.concat llst
However, it's really equivalent to let concat = List.concat
This doesn't compile, though, because all the elements of the lists need to be of the same type:
let llst = [ [1] ; [2;3] ; ['d';'e';'f'] ]
This also is problematic:
List.iter (fun elem -> printf "%d " elem) concat
List.iter takes two arguments, and the second one needs to be a list. However, in your case you are (as per the compiler error) providing your concat function, which is an 'a list list -> 'a list.
What I suspect you meant to do is apply the concat function to your llst first:
List.iter (fun elem -> printf "%d " elem) (concat llst)
// or
llst
|> concat
|> List.iter (fun elem -> printf "%d " elem)
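Putting the corrections together, a minimal version of the script might look like the sketch below. The only assumption is the sample data: the character sublist is replaced with integers so that every inner list has the same element type.

```fsharp
// All inner lists must share one element type (int here).
let llst = [ [1]; [2; 3]; [4; 5; 6] ]

let concat (llst: 'a list list) : 'a list =
    List.concat llst

// Apply concat to the data first, then iterate over the flattened list.
let flat = concat llst
flat |> List.iter (fun elem -> printf "%d " elem)
```

The `%d` format specifier is what ties this particular call to integers; `concat` itself stays generic.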
However, all of this is perhaps missing the point of the exercise. What you perhaps need to do is implement some simple recursion based on the empty / non-empty state of your list, i.e. fill in the blanks from here:
let rec myconcat acc inlist =
    match inlist with
    | [] -> ??
    | elt :: tail -> ??

F# pattern matching on records with optional fields

F#'s 'options' seem a nice way of using the type system to separate data that's known to be present from data which may or may not be present, and I like the way that the match expression enforces that all cases are considered:
match str with
| Some s -> functionTakingString(s)
| None -> "abc" // The compiler would helpfully complain if this line wasn't present
It's very useful that s (as opposed to str) is a string rather than a string option.
However, when working with records that have optional fields...
type SomeRecord =
    {
        field1 : string
        field2 : string option
    }
...and those records are being filtered, a match expression feels unnecessary, because there's nothing sensible to do in the None case, but this...
let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.filter (fun record -> record.field2.IsSome)
    |> Seq.map (fun record -> functionTakingString record.field2) // Won't compile
... won't compile, because although records where field2 isn't populated have been filtered out, the type of field2 is still string option, not string.
I could define another record type, where field2 isn't optional, but that approach seems complicated, and may be unworkable with many optional fields.
I've defined an operator that raises an exception if an option is None...
let inline (|?!) (value : 'a option) (message : string) =
    match value with
    | Some x -> x
    | None -> invalidOp message
...and have changed the previous code to this...
let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.filter (fun record -> record.field2.IsSome)
    |> Seq.map (fun record -> functionTakingString (record.field2 |?! "Should never happen")) // Now compiles
...but it doesn't seem ideal. I could use Unchecked.defaultof instead of raising an exception, but I'm not sure that's any better. The crux of it is that the None case isn't relevant after filtering.
Are there any better ways of handling this?
EDIT
The very interesting answers have brought record pattern matching to my attention, which I wasn't aware of, and Value, which I'd seen but misunderstood (I see that it throws a NullReferenceException if None). But I think my example may have been poor, as my more complex, real-life problem involves using more than one field from the record. I suspect I'm stuck with something like...
|> Seq.map (fun record -> functionTakingTwoStrings record.field1 record.field2.Value)
unless there's something else?
In this example, you could use:
let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.choose (fun { field2 = v } -> v)
    |> Seq.map functionTakingString
Seq.choose allows us to filter items based on optional results. Here we use pattern matching on records for more concise code.
I think the general idea is to manipulate option values using combinators and higher-order functions until you want to transform them into values of other types (e.g. using Seq.choose in this case). Using your |?! is discouraged because it is a partial operator (it throws exceptions in some cases). You can argue that it's safe to use in this particular case, but the F# type system can't detect that, and it won't warn you about unsafe uses either.
On a side note, I would recommend taking a look at the Railway-Oriented Programming series at http://fsharpforfunandprofit.com/posts/recipe-part2/. The series shows you type-safe and composable ways to handle errors while keeping diagnostic information along the way.
UPDATE (upon your edit):
A revised version of your function is written as follows:
let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.choose (fun { field1 = v1; field2 = v2 } ->
        v2 |> Option.map (functionTakingTwoStrings v1))
It demonstrates the general idea I mentioned, where you manipulate option values using higher-order functions (Option.map) and transform them at a final step (Seq.choose).
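To make that concrete, here is a small runnable sketch. The body of `functionTakingTwoStrings` and the sample records are assumptions made purely for illustration; the question does not show them.

```fsharp
type SomeRecord =
    { field1 : string
      field2 : string option }

// Hypothetical stand-in for the asker's two-argument function.
let functionTakingTwoStrings (a: string) (b: string) = a + ":" + b

let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.choose (fun { field1 = v1; field2 = v2 } ->
        // Option.map keeps None as None and applies the function inside Some.
        v2 |> Option.map (functionTakingTwoStrings v1))

// Assumed sample data: the middle record has no field2.
let results =
    [ { field1 = "a"; field2 = Some "x" }
      { field1 = "b"; field2 = None }
      { field1 = "c"; field2 = Some "z" } ]
    |> functionTakingSequenceOfRecords
    |> Seq.toList
```

The record with `field2 = None` simply drops out, and no partial function such as `.Value` is needed.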
Since you have found the IsSome property, you might have seen the Value property as well.
let functionTakingSequenceOfRecords mySeq =
    mySeq
    |> Seq.filter (fun record -> record.field2.IsSome)
    |> Seq.map (fun record -> functionTakingString record.field2.Value )
There's an alternative with pattern matching:
let functionTakingSequenceOfRecords' mySeq =
    mySeq
    |> Seq.choose (function
        | { field2 = Some v } -> functionTakingString v |> Some
        | _ -> None )
The problem, as I interpret it, is that you want the type system to reflect at all times the fact that the records in the collection actually contain a string in field2.
I mean, you can certainly use choose to filter out the records you don't care about, but you will still end up with a collection of records with an optional string, even though you know all of them are Some string.
One alternative is to create a generic record like this:
type SomeRecord<'T> =
    {
        field1 : string
        field2 : 'T
    }
But then you can't use a record expression to clone the record and change the generic type of the record at the same time. You will need to create the new record by hand, which is not a major problem if there are not many other fields and the structure is stable.
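A minimal sketch of that "by hand" conversion, assuming the generic record above; the helper name `withPresentField2` and the sample data are hypothetical.

```fsharp
type SomeRecord<'T> =
    { field1 : string
      field2 : 'T }

// Rebuild each record explicitly so the result type carries a plain string.
let withPresentField2 (records: seq<SomeRecord<string option>>) : seq<SomeRecord<string>> =
    records
    |> Seq.choose (fun r ->
        r.field2
        |> Option.map (fun v -> { field1 = r.field1; field2 = v }))

// Assumed sample data.
let populated =
    [ { field1 = "a"; field2 = Some "x" }
      { field1 = "b"; field2 = None } ]
    |> withPresentField2
    |> Seq.toList
```

After this step the type system knows that `field2` is a `string`, at the cost of listing every field when the record is rebuilt.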
The other option is to wrap the record in a tuple with the desired value, here's an example:
let functionTakingSequenceOfRecords mySeq =
    let getField2 record =
        match record with
        | {field2 = Some value} -> Some (value, record)
        | _ -> None
    mySeq
    |> Seq.choose getField2
    |> Seq.map (fun (f2, {field1 = f1}) -> functionTakingTwoStrings f1 f2)
So here you ignore the content of field2 and instead use the first value in the tuple.
That is, unless I misunderstood your problem and you don't mind pattern matching again, doing an incomplete match with a warning or a #nowarn directive, or using the .Value property of the option as shown in the other answers.

Passing partial active patterns as arguments?

I'm learning F# by writing a recursive descent parser using active patterns.
Since all my rules are partial active patterns, I need to combine them in different ways, but I'm getting really frustrated with the syntax for passing active patterns as parameters.
The following example shows the trouble I'm having:
// Combines two patterns by chaining them.
let (|Chain|_|) (|Pattern1|_|) (* Should I use pipes here? *) (|Pattern2|_|) data =
    match data with
    |Pattern1 result ->
        match result with
        |Pattern2 result2 -> Some result2
        |_ -> None
    |_ -> None
// Stupid test patterns
let (|IfBiggerThan10ThenDouble|_|) value = if value > 10 then Some (value*2) else None
let (|IfLessThan100ThenDouble|_|) value = if value < 100 then Some (value*2) else None
match 20 with
// Do I need pipes here?
|Chain (IfBiggerThan10ThenDouble IfLessThan100ThenDouble) value -> printfn "%A" value // Should print 80
| _ -> printfn "Did not match"
My main confusion seems to be about the '|' operator. Sometimes it seems to be a part of the type of the pattern and sometimes part of the name.
You do not really need to implement your own chaining of patterns, because you can directly nest the patterns which gives you the required result:
match 20 with
| IfBiggerThan10ThenDouble(IfLessThan100ThenDouble value) -> printfn "%A" value
| _ -> printfn "Did not match"
This will first call the IfBiggerThan10ThenDouble pattern which calculates 20*2 and passes the value to the nested pattern IfLessThan100ThenDouble. This again doubles the value and binds it to the value symbol (when it succeeds).
That said, your implementation of the Chain pattern actually works and can be called like this:
match 20 with
| Chain (|IfBiggerThan10ThenDouble|_|) (|IfLessThan100ThenDouble|_|) value ->
    printfn "%A" value // Should print 80
| _ -> printfn "Did not match"
In general, an active pattern (|P|_|) is really just a function with a special name. You can treat it as an ordinary function and call it by writing (|P|_|) argument, or you can treat it as a value and pass it as an argument to other functions or parameterized active patterns. Your code would work if you implemented Chain as a pattern taking ordinary functions:
let (|Chain|_|) f g data =
    f data |> Option.bind (fun r -> g r)
Then Chain <arg1> <arg2> <pat> is just calling the parameterized active pattern with two functions as an argument. When called, it binds the result to the pattern <pat>. In the above example, the two arguments are function values representing the patterns (these could be ordinary functions, but not lambda functions because of syntactic restrictions).
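The following sketch puts those pieces into one runnable snippet. The `describe` helper is hypothetical and exists only so the result can be inspected as a value; `Option.bind g` is used as a shorter spelling of feeding the first pattern's result into the second.

```fsharp
// Chain takes two ordinary functions returning options and composes them.
let (|Chain|_|) f g data =
    f data |> Option.bind g

let (|IfBiggerThan10ThenDouble|_|) value = if value > 10 then Some (value * 2) else None
let (|IfLessThan100ThenDouble|_|) value = if value < 100 then Some (value * 2) else None

// Hypothetical helper: returns the chained result instead of printing it.
let describe n =
    match n with
    | Chain (|IfBiggerThan10ThenDouble|_|) (|IfLessThan100ThenDouble|_|) value -> Some value
    | _ -> None

let fromTwenty = describe 20 // 20 -> 40 -> 80
let fromFive = describe 5    // fails the first pattern
let fromSixty = describe 60  // 60 -> 120, which fails the second pattern
```

Note that the second pattern sees the output of the first, not the original input, which is what makes this a chain.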

How to invoke the function in Seq.whatever without "printf"?

I'm new to F# and I tried to write a program that goes through all files in a given directory and, for each file of type ".txt", appends an ID number and "DONE" to the file.
my program:
open System.IO
//const:
[<Literal>]
let notImportantString = "blahBlah"
let mutable COUNT = 1.0
//funcs:
//addNumber --> add the sequence number COUNT to each file.
let addNumber (file : string) =
    let mutable str = File.ReadAllText(file)
    printfn "%s" str //just for check
    let num = COUNT.ToString()
    let str4 = str + " " + num + "\n\n\n DONE"
    COUNT <- COUNT + 1.0
    let str2 = File.WriteAllText(file,str4)
    file
//matchFunc --> check if is ".txt"
let matchFunc (file : string) =
    file.Contains(".txt")
//allFiles --> go through all files of a given dir
let allFiles dir =
    seq {
        for file in Directory.GetFiles(dir) do
            yield file
    }
////////////////////////////
let dir = "D:\FSharpTesting"
let a =
    allFiles dir
    |> Seq.filter(matchFunc)
    |> Seq.map(addNumber)
printfn "%A" a
My question:
If I do not write the last line (printfn "%A" a), the files will not change. (If I do write this line, it works and changes the files.)
When I use the debugger I see that it doesn't really compute the value of 'a' when it arrives at the line "let a =......"; it continues to the printfn line, and then when it "sees" the 'a' there it goes back and computes the value of 'a'.
Why is that, and how can I "start" the function without printing?
Also, can someone tell me why I have to add file as a return type of the function "addNumber"?
(I added this because that's how it works, but I don't really understand why....)
Last question:
If I write the COUNT variable right after the line of the [<Literal>] definition,
it gives an error and says that a constant cannot be "mutable", but if I add another line before it (like the string, which is why I did so) it "forgets" the mistake and works.
Why is that? And if you really cannot have a mutable const, how can I do a static variable?
if I do not write the last line (printfn "%A" a) the files will not change.
F# sequences are lazy. So to force evaluation, you can execute some operation that does not return a sequence. For example, you can call Seq.iter (has side effects, returns ()), Seq.length (returns an int, the length of the sequence) or Seq.toList (returns a list, an eager data structure), etc.
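A small sketch of this behaviour, using an assumed call counter (`calls`) in place of the file writes so the effect is easy to observe:

```fsharp
// Counts how many times the mapping function has actually run.
let calls = ref 0

// Building the pipeline does not run the mapping function yet.
let mapped =
    [ 1; 2; 3 ]
    |> Seq.map (fun x ->
        calls.Value <- calls.Value + 1
        x * 10)

let callsBeforeForcing = calls.Value // still 0: nothing has been enumerated

// Seq.toList returns an eager list, so it enumerates the whole sequence.
let forced = Seq.toList mapped

let callsAfterForcing = calls.Value // 3: one call per element
```

Replacing `Seq.toList` with `Seq.iter ignore` would force the side effects in the same way without keeping the results.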
Can some one tells me why do I have to add file : string as a return type of the function "addNumber"?
Method and property access don't play nicely with F# type inference. The type checker works from left to right, from top to bottom. When you write file.Contains, it doesn't know which type with a Contains member file should be. Therefore, your type annotation is a good hint to the F# type checker.
if I write the COUNT variable right after the line of the [<Literal>] definition
it gives an error and says that a constant cannot be "mutable"
Quoting from MSDN:
Values that are intended to be constants can be marked with the Literal attribute. This attribute has the effect of causing a value to be compiled as a constant.
A mutable value can change its value at some point in your program, so the compiler complains for a good reason. You can simply delete the [<Literal>] attribute.
To elaborate on Alex's answer -- F# sequences are lazily evaluated. This means that each element in the sequence is generated "on demand".
The benefit of this is that you don't waste computation time and memory on elements you don't ever need. Lazy evaluation does take a little getting used to though -- specifically because you can't assume order of execution (or that execution will even happen at all).
Your problem has a simple fix: just use Seq.iter to force execution/evaluation of the sequence, and pass the 'ignore' function to it since we don't care about the values returned by the sequence.
let a =
    allFiles dir
    |> Seq.filter(matchFunc)
    |> Seq.map(addNumber)
    |> Seq.iter ignore // Forces the sequence to execute
Seq.map is intended to map one value to another, not generally to mutate a value. seq<_> represents a lazily generated sequence so, as Alex pointed out, nothing will happen until the sequence is enumerated. This is probably a better fit for codereview, but here's how I would write this:
Directory.EnumerateFiles(dir, "*.txt")
|> Seq.iteri (fun i path ->
    let text = File.ReadAllText(path)
    printfn "%s" text
    let text = sprintf "%s %d\n\n\n DONE" text (i + 1)
    File.WriteAllText(path, text))
Seq.map requires a return type, as do all expressions in F#. If a function performs an action, as opposed to computing a value, it can return unit: (). Regarding COUNT, a value cannot be both mutable and [<Literal>] (const in C#). Those are precise opposites. For a static variable, use a module-scoped let mutable binding:
module Counter =
    let mutable count = 1
open Counter
count <- count + 1
But you can avoid global mutable data by making count a function with a counter variable as a part of its private implementation. You can do this with a closure:
let count =
    let i = ref 0
    fun () ->
        incr i
        !i
let one = count()
let two = count()
F# is evaluated from top to bottom, but you are creating only lazy values until you call printfn. So printfn is actually the first thing that gets executed, which in turn executes the rest of your code. I think you can do the same thing if you tack a printfn on after Seq.map(addNumber) and call toList on it, which will force evaluation as well.
This is the general behaviour of lazy sequences. You get the same in, say, C# using IEnumerable, for which seq is an alias.
In pseudocode:
var lazyseq = "abcdef".Select(a => print a); //does not do anything
var b = lazyseq.ToArray(); //will evaluate the sequence
ToArray triggers the evaluation of the sequence.
This illustrates the fact that a sequence is just a description and does not tell you when it will be enumerated: that is under the control of the consumer of the sequence.
To go a bit further on the subject, you might want to look at this page from the F# wikibook:
let isNebraskaCity_bad city =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
    cities.Contains(city)
let isNebraskaCity_good =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
    fun city -> cities.Contains(city)
Most notably, sequences are not cached (although you can make them so). You can see that the distinction between the description and the runtime behaviour can have important consequences, as the sequence itself is recomputed on each enumeration, which can incur a very high cost and introduce a quadratic number of operations if each value itself takes linear time to compute!
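As an illustration of the caching point, the sketch below counts how often the body of a sequence expression runs, first without and then with `Seq.cache`. The counter and the squares sequence are assumptions chosen only to make the recomputation visible.

```fsharp
// Counts how many elements have been generated so far.
let evaluations = ref 0

let source =
    seq {
        for i in 1 .. 3 do
            evaluations.Value <- evaluations.Value + 1
            yield i * i
    }

// Enumerating the uncached sequence twice regenerates every element twice.
let uncachedTotal = Seq.sum source + Seq.sum source
let uncachedEvaluations = evaluations.Value

// Seq.cache remembers elements once they have been generated.
evaluations.Value <- 0
let cached = Seq.cache source
let cachedTotal = Seq.sum cached + Seq.sum cached
let cachedEvaluations = evaluations.Value
```

Both totals are the same, but the cached version generates each element only once, trading memory for the repeated work.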

Exception handling in pipeline sequence

I'm working on a basic 2D CAD engine, and the pipeline operator significantly improved my code. Basically, several functions start with a point (x,y) in space and compute a final position after a number of move operations:
let finalPosition =
    startingPosition
    |> moveByLengthAndAngle x1 a1
    |> moveByXandY x2 y2
    |> moveByXandAngle x3 a3
    |> moveByLengthAndAngle x4 a4
    // etc...
This is incredibly easy to read and I'd like to keep it that way. The various x1, a1, etc. obviously have meaningful names in the real code.
Now the new requirement is to introduce exception handling. A big try/with around the whole operation chain is not enough because I'd like to know which line caused the exception. I need to know which argument is invalid, so that the user knows what parameter must be changed.
For example, if the first line (moveByLengthAndAngle x1 a1) raises an exception, I'd like to say something like "Hey, -90 is an invalid value for a1! a1 must be between 45 and 90!". Given that many operations of the same type can be used in the sequence, it's not enough to define a different exception type for each operation (in this example I wouldn't be able to tell whether the error was in the first or the last move).
The obvious solution would be to split the chain in single let statements, each within its respective try/with. This however would make my beautiful and readable code a bit messy, not so readable anymore.
Is there a way to satisfy this requirement without sacrificing the readability and elegance of the current code?
(Note: right now every moveBy function raises an exception in case of errors, but I'm free to change this, e.g. to return an option, a bigger tuple, or anything else if needed.)
The solution that Rick described is only going to handle exceptions that are raised when evaluating the arguments of the functions in the pipeline. However, it will not handle the exceptions that are raised by the pipelined functions (as described in answer to your other question).
For example, let's say you have these simple functions:
let times2 n = n * 2
let plus a b = a + b
let fail n = failwith "inside fail"
10 // This will handle exception that happens when evaluating arguments
|> try plus (failwith "evaluating args") with _ -> (fun _ -> 0)
|> times2
|> try fail with _ -> (fun _ -> 0) // This will not handle the exception from 'fail'!
To solve this, you can write a function that wraps any other function in an exception handler. The idea is that the protect function takes a function (such as times2 or fail) and returns a new function that takes the input from the pipeline (a number) and passes it to the wrapped function (times2 or fail), but does so inside an exception handler:
let protect msg f =
    fun n ->
        try
            f n
        with _ ->
            // Report error and return 0 to the pipeline (do something smarter here!)
            printfn "Error %s" msg
            0
Now you can protect each function in the pipeline and it will also handle exceptions that happen when evaluating these functions:
let n =
    10 |> protect "Times" times2
       |> protect "Fail" fail
       |> protect "Plus" (plus 5)
How about folding over Choices? Let's say that instead of pipelining the actions, you represent them like this:
let startingPosition = 0. ,0.
let moveByLengthAndAngle l a (x,y) = x,y // too lazy to do the math
let moveByXandY dx dy (x,y) =
    //failwith "oops"
    x+dx, y+dy
let moveByXandAngle dx a (x,y) = x+dx, y
let actions =
    [
        moveByLengthAndAngle 0. 0., "failed first moveByLengthAndAngle"
        moveByXandY 1. 2., "failed moveByXandY"
        moveByXandY 3. 4., "failed moveByXandY"
        moveByXandAngle 3. 4., "failed moveByXandAngle"
        moveByLengthAndAngle 4. 5., "failed second moveByLengthAndAngle"
    ]
i.e. actions is of type ((float * float -> float * float) * string) list.
Now, using FSharpx, we lift the actions to Choice and fold/bind over them (I'm not sure what to call it; this is similar to foldM in Haskell):
let folder position (f,message) =
    Choice.bind (Choice.protect f >> Choice.mapSecond (konst message)) position
let finalPosition = List.fold folder (Choice1Of2 startingPosition) actions
finalPosition is of type Choice<float * float, string> , i.e. it's either the final result of all those functions, or an error (as defined in the table above).
Explanation for this last snippet:
Choice.protect is similar to Tomas' protect, except that when it finds an exception, it returns the exception wrapped in a Choice2Of2. When there's no exception, it returns the result wrapped in a Choice1Of2.
Choice.mapSecond changes this potential exception in Choice2Of2 with the error message defined in the table of actions. Instead of (konst message) this could also be a function that builds the error message using the exception.
Choice.bind runs this "protected" action against the current position. It will not run the actual action if the current position is in error (i.e. a Choice2Of2).
Finally, the fold applies all actions threading along / accumulating the resulting Choice (either the current position or an error).
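For readers without FSharpx, the same idea can be sketched with the `Result` type and `Result.bind` from FSharp.Core. This is a swapped-in substitute, not the FSharpx API described above: the `protect` and `run` helpers, the `failingMove` function and the sample actions are all assumptions made for illustration.

```fsharp
// Hypothetical stand-ins for the move functions.
let moveByXandY dx dy (x, y) = (x + dx, y + dy)
let failingMove (_: float * float) : float * float = failwith "oops"

// Turn an exception raised by a step into Error carrying the step's message.
let protect message f position =
    try
        Ok (f position)
    with _ ->
        Error message

// Fold over the actions; Result.bind skips every step after the first Error.
let run start actions =
    actions
    |> List.fold (fun state (f, message) -> Result.bind (protect message f) state) (Ok start)

let succeeded =
    run (0.0, 0.0)
        [ moveByXandY 1.0 2.0, "failed first move"
          moveByXandY 3.0 4.0, "failed second move" ]

let failed =
    run (0.0, 0.0)
        [ moveByXandY 1.0 2.0, "failed first move"
          failingMove, "failed second move"
          moveByXandY 3.0 4.0, "failed third move" ]
```

Here `succeeded` carries the final position and `failed` carries the message of the first step that threw, so the caller can tell which move was at fault.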
So now we just have to pattern match to handle each case (correct result or error):
match finalPosition with
| Choice1Of2 (x,y) ->
    printfn "final position: %f,%f" x y
| Choice2Of2 error ->
    printfn "error: %s" error
If you uncomment failwith "oops" above, finalPosition will be a Choice2Of2 "failed moveByXandY"
There are many ways to approach this; the easiest would be to just wrap each call in a try-with block:
let finalPosition =
    startingPosition
    |> (fun p -> try moveByLengthAndAngle x1 a1 p with ex -> failwith "failed moveByLengthAndAngle")
    |> (fun p -> try moveByXandY x2 y2 p with ex -> failwith "failed moveByXandY")
    |> (fun p -> try moveByXandAngle x3 a3 p with ex -> failwith "failed moveByXandAngle")
    |> (fun p -> try moveByLengthAndAngle x4 a4 p with ex -> failwith "failed moveByLengthAndAngle")
    // etc...
Behold the power of expression oriented programming :).
Unfortunately, if you're pipelining over a sequence it becomes much more difficult as:
What happens in the pipeline (for Seqs) is composition, not execution.
Exception handling inside an IEnumerable is undefined and so depends on the implementation of the Enumerator.
The only safe way is to make sure the internals of each sequence operation are wrapped.
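The sketch below illustrates the composition-versus-execution point for sequences. The `step` function and the sample data are assumptions; the snippet only shows where an exception surfaces, not a recommended design.

```fsharp
// Assumed step that fails for one particular input.
let step x = if x = 2 then failwith "boom" else x * 10

// A try/with around the composition catches nothing: no element is computed yet.
let composed =
    try
        Ok (Seq.map step [ 1; 2; 3 ])
    with ex ->
        Error ex.Message

// The exception only surfaces when the sequence is enumerated.
let enumerated =
    match composed with
    | Ok s ->
        try
            Ok (Seq.toList s)
        with ex ->
            Error ex.Message
    | Error message -> Error message

// Wrapping the internals of the operation handles each element individually.
let safeStep x =
    try
        Ok (step x)
    with ex ->
        Error ex.Message

let perElement = [ 1; 2; 3 ] |> Seq.map safeStep |> Seq.toList
```

With the wrapped version, one failing element no longer aborts the enumeration of the others.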
Edit: Wow, I can't believe I messed that up. It's fixed now but I do think that the two other solutions are cleaner.
I am unclear why the debugger isn't sufficient for this requirement:
"Now the new requirement is to introduce exception handling. A big try/with around the whole operation chain is not enough because I'd like to know which line caused the exception. I need to know which argument is invalid, so that the user knows what parameter must be changed."
This sounds like a design-time bug in the user's code; each of these methods might throw ArgumentException and nothing would handle it (it would crash the app), and the programmer would debug and see the method/stack that threw the exception, and the exception text would have the argument name.
(Or maybe this is FSI/scripting typically?)
Why not just put the exception handling in the function calls and throw the exceptions from there? Wouldn't this break the code? Then, in your function that calls this, catch the error and display it to the user.

Resources