I have the following Discriminated Union (DU) declaration:
type Book =
| Dictionary of string[]
| Novel of int[]
| Comics of bool[]
An example:
let x = Dictionary [|"a"; "b"|]
How can I extract the length of the array inside without doing pattern matching and without caring about the data type of the array (in this case: string, int, bool). Note: I have no control over the DU declaration; as a result, I can't write new member method within Book, like getArrayLength()
Of course, we can do it in some way as followed:
match x with
| Dictionary (x: _[]) -> x |> Array.length
| Novel (x: _[]) -> x |> Array.length
| Comics (x: _[]) -> x |> Array.length
But typing x |> Array.length a lot is incovenient. This is a simple example, but we can think of a general problem:
type Animal =
| Dog of DogClass
| Cat of CatClass
| Cow of CowClass
...
... and DogClass, CatClass, etc. may share something. We want to get that shared thing. E.g. those classes inherit from AnimalClass, within which there is countLegs() method. Suppsed there are many animals, pattern matching for all of them while the code block after -> is almost the same. I love the principle DRY (Don't Repeat Yourself).
Is there any convenient way to tackle such problem?
==
EDITED 21.10.2019
I was also looking for some syntax like:
let numEles =
match x with
| _ (arr: _[]) -> x |> Array.Length
| _ -> failwith "No identifiers with fields as Array."
let numLegs =
match anAnimall with
| _ (animal: ?> Animal) -> animal.countLegs()
| _ -> failwith "Can't count legs because of not being an animal."
I think this still follows the spirit of matching, but seem like this approach is not supported.
Realistically, there's no getting around pattern matching here. DUs were, in a way, built for it. Since you don't control the type, you can always add a type extension:
type Book with
member this.Length =
match this with
| Dictionary d -> d.Length
| Novel n -> n.Length
| Comics c -> c.Length
let x = Dictionary [|"a"; "b"|]
printfn "%d" x.Length // Prints 2
Though it's also equally valid to define a Book module with a length function on it if you prefer that:
module Book =
let length b =
match b with
| Dictionary d -> d.Length
| Novel n -> n.Length
| Comics c -> c.Length
let x = Dictionary [|"a"; "b"|]
printfn "%d" (x |> Book.length) // prints 2
But you'll need to write a pattern match expression on the Book type at least once. The fact that every case is made up of data that all has the same property doesn't really help the fact that you need to still identify every case individually.
Poeple often use
for i in [0 .. 10] do something
but afaik that creates a list which is then iterated through, it appears to me it would make more sense to use
for i = 0 to 10 do something
without creating that unnecessary list but having the same behaviour.
Am I missing something? (I guess that's the case)
You are correct, writing for i in [0 .. 10] do something generates a list and it does have a significant overhead. Though you can also omit the square brackets, in which case it just builds a lazy sequence (and, it turns out that the compiler even optimizes that case). I generally prefer writing in 0 .. 100 do because it looks the same as code that iterates over a sequence.
Using the #time feature of F# interactive to do a simple analysis:
for i in [ 0 .. 10000000 ] do // 3194ms (yikes!)
last <- i
for i in 0 .. 10000000 do // 3ms
last <- i
for i = 0 to 10000000 do // 3ms
last <- i
for i in seq { 0 .. 10000000 } do // 709ms (smaller yikes!)
last <- i
So, it turns out that the compiler actually optimizes the in 0 .. 10000000 do into the same thing as the 0 to 10000000 do loop. You can force it to create the lazy sequence explicitly (last case) which is faster than a list, but still very slow.
Giving a somewhat different kind of answer but hopefully interesting to some
You are correct in that the F# compiler fails to apply the fast-for-loop optimization in this case. Good news, the F# compiler is open source and it's possible for us to improve upon it's behavior.
So here's a freebie from me:
fast-for-loop optimization happens in tastops.fs. It's rather primitive at the moment, great opportunity for us to improve upon.
// Detect the compiled or optimized form of a 'for <elemVar> in <startExpr> .. <finishExpr> do <bodyExpr>' expression over integers
// Detect the compiled or optimized form of a 'for <elemVar> in <startExpr> .. <step> .. <finishExpr> do <bodyExpr>' expression over integers when step is positive
let (|CompiledInt32ForEachExprWithKnownStep|_|) g expr =
match expr with
| Let (_enumerableVar, RangeInt32Step g (startExpr, step, finishExpr), _,
Let (_enumeratorVar, _getEnumExpr, spBind,
TryFinally (WhileLoopForCompiledForEachExpr (_guardExpr, Let (elemVar,_currentExpr,_,bodyExpr), m), _cleanupExpr))) ->
let spForLoop = match spBind with SequencePointAtBinding(spStart) -> SequencePointAtForLoop(spStart) | _ -> NoSequencePointAtForLoop
Some(spForLoop,elemVar,startExpr,step,finishExpr,bodyExpr,m)
| _ ->
None
let DetectFastIntegerForLoops g expr =
match expr with
| CompiledInt32ForEachExprWithKnownStep g (spForLoop,elemVar,startExpr,step,finishExpr,bodyExpr,m)
// fast for loops only allow steps 1 and -1 steps at the moment
when step = 1 || step = -1 ->
mkFastForLoop g (spForLoop,m,elemVar,startExpr,(step = 1),finishExpr,bodyExpr)
| _ -> expr
The problem here is that RangeInt32Step only detects patterns like 0..10 and 0..1..10. It misses for instance [0..10]
Let's introduce another active pattern SeqRangeInt32Step that matches these kind of expressions:
let (|SeqRangeInt32Step|_|) g expr =
match expr with
// detect '[n .. m]'
| Expr.App(Expr.Val(toList,_,_),_,[TType_var _],
[Expr.App(Expr.Val(seq,_,_),_,[TType_var _],
[Expr.Op(TOp.Coerce, [TType_app (seqT, [TType_var _]); TType_var _],
[RangeInt32Step g (startExpr, step, finishExpr)], _)],_)],_)
when
valRefEq g toList (ValRefForIntrinsic g.seq_to_list_info) &&
valRefEq g seq g.seq_vref &&
tyconRefEq g seqT g.seq_tcr ->
Some(startExpr, step, finishExpr)
| _ -> None
How do you figure out that this is what you need to pattern match for? The approach I often take is that I do a simple F# program with the right properties and put a breakpoint during compilation to inspect the expression. From that I create the pattern to match for:
Let's put the two patterns together:
let (|ExtractInt32Range|_|) g expr =
match expr with
| RangeInt32Step g range -> Some range
| SeqRangeInt32Step g range -> Some range
| _ -> None
CompiledInt32ForEachExprWithKnownStep is updated to use ExtractInt32Range over RangeInt32Step
The complete solution would be something like this:
let (|SeqRangeInt32Step|_|) g expr =
match expr with
// detect '[n .. m]'
| Expr.App(Expr.Val(toList,_,_),_,[TType_var _],
[Expr.App(Expr.Val(seq,_,_),_,[TType_var _],
[Expr.Op(TOp.Coerce, [TType_app (seqT, [TType_var _]); TType_var _],
[RangeInt32Step g (startExpr, step, finishExpr)], _)],_)],_)
when
valRefEq g toList (ValRefForIntrinsic g.seq_to_list_info) &&
valRefEq g seq g.seq_vref &&
tyconRefEq g seqT g.seq_tcr ->
Some(startExpr, step, finishExpr)
| _ -> None
let (|ExtractInt32Range|_|) g expr =
match expr with
| RangeInt32Step g range -> Some range
| SeqRangeInt32Step g range -> Some range
| _ -> None
// Detect the compiled or optimized form of a 'for <elemVar> in <startExpr> .. <finishExpr> do <bodyExpr>' expression over integers
// Detect the compiled or optimized form of a 'for <elemVar> in <startExpr> .. <step> .. <finishExpr> do <bodyExpr>' expression over integers when step is positive
let (|CompiledInt32ForEachExprWithKnownStep|_|) g expr =
match expr with
| Let (_enumerableVar, ExtractInt32Range g (startExpr, step, finishExpr), _,
Let (_enumeratorVar, _getEnumExpr, spBind,
TryFinally (WhileLoopForCompiledForEachExpr (_guardExpr, Let (elemVar,_currentExpr,_,bodyExpr), m), _cleanupExpr))) ->
let spForLoop = match spBind with SequencePointAtBinding(spStart) -> SequencePointAtForLoop(spStart) | _ -> NoSequencePointAtForLoop
Some(spForLoop,elemVar,startExpr,step,finishExpr,bodyExpr,m)
| _ ->
None
Using a simple test program
let print v =
printfn "%A" v
[<EntryPoint>]
let main argv =
for x in [0..10] do
print x
0
Before the optimization the corresponding C# code would look something like this (IL code is better to inspect but can be a bit hard to understand if one is unused to it):
// Test
[EntryPoint]
public static int main(string[] argv)
{
FSharpList<int> fSharpList = SeqModule.ToList<int>(Operators.CreateSequence<int>(Operators.OperatorIntrinsics.RangeInt32(0, 1, 10)));
IEnumerator<int> enumerator = ((IEnumerable<int>)fSharpList).GetEnumerator();
try
{
while (enumerator.MoveNext())
{
Test.print<int>(enumerator.Current);
}
}
finally
{
IDisposable disposable = enumerator as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
return 0;
}
F# creates a list and then uses the enumerator to iterate over it. No wonder it's rather slow compared to a classic for-loop.
After the optimization is applied we get this code:
// Test
[EntryPoint]
public static int main(string[] argv)
{
for (int i = 0; i < 11; i++)
{
Test.print<int>(i);
}
return 0;
}
A significant improvement.
So steal this code, post a PR to https://github.com/Microsoft/visualfsharp/ and bask in glory. Of course you need to add unit tests and emitted IL code tests which can be somewhat tricky to find the right level for, check this commit for inspiration
PS. Probably should support [|0..10|] as well seq {0..10} as well
PS. In addition for v in 0L..10L do print v as well as for v in 0..2..10 do print v is also inefficiently implemented in F#.
The former form requires a special construct in the language (for var from ... to ... by), it is the way followed by ancient programming languages :
'do' loop in Fortran
for var:= expr to expr in Pascal
etc.
The latter form (for var in something) is more général. It works on plain lists, but also with generators (like in python) etc. A construction of the full list may not be needed before running the list. This allows to write loops on potentially infinite lists.
Anyway, a decent compiler/interpreter should recognize the rather frequent special case [expr1..expr2] and avoid the computation and storage of the intermediate list.
following prints "false"
let e1 = <## let d = 1 in d+1 ##>
let e2 = <## let d = 1 in d+1 ##>
printfn "%A" (e1 = e2)
The reason is that Var nodes are compared by pointer reference and not by structural equality.
Is there already implemented a way to compare quotations intuitively?
There are many reasons why comparing quotations does not "work" by default:
Quotations can contain references to values for which comparison may not be defined (e.g. if you create a quotation that captures some .NET object that does not support comparison).
Quotations contain information about the location in the source code - so your two quotations are different simply because they are on different lines!
There is a question whether you want to treat (fun x -> x) and (fun y -> y) as the same - logically, they are, but syntactically, they are not.
So, if you want to check whether quotations are equal, you'll just have to implement your own check. Something like this does the trick for the basic case you have in the example, but it does not handle all the cases:
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns
let rec equal = function
| Let(v1, ea1, eb1), Let(v2, ea2, eb2) ->
v1.Name = v2.Name && equal (ea1, ea2) && equal (eb1, eb2)
| Var(v1), Var(v2) -> v1.Name = v2.Name
| Lambda(v1, e1), Lambda(v2, e2) -> v1.Name = v2.Name && equal (e1, e2)
| Call(None, m1, es1), Call(None, m2, es2) ->
m1 = m2 && (List.zip es1 es2 |> List.forall equal)
| Value(v1), Value(v2) -> v1 = v2
| _ -> false
I've been trying to get my head round various bits of F# (I'm coming from more of a C# background), and parsers interest me, so I jumped at this blog post about F# parser combinators:
http://santialbo.com/blog/2013/03/24/introduction-to-parser-combinators
One of the samples here was this:
/// If the stream starts with c, returns Success, otherwise returns Failure
let CharParser (c: char) : Parser<char> =
let p stream =
match stream with
| x::xs when x = c -> Success(x, xs)
| _ -> Failure
in p //what does this mean?
However, one of the things that confused me about this code was the in p statement. I looked up the in keyword in the MSDN docs:
http://msdn.microsoft.com/en-us/library/dd233249.aspx
I also spotted this earlier question:
Meaning of keyword "in" in F#
Neither of those seemed to be the same usage. The only thing that seems to fit is that this is a pipelining construct.
The let x = ... in expr allows you to declare a binding for some variable x which can then be used in expr.
In this case p is a function which takes an argument stream and then returns either Success or Failure depending on the result of the match, and this function is returned by the CharParser function.
The F# light syntax automatically nests let .. in bindings, so for example
let x = 1
let y = x + 2
y * z
is the same as
let x = 1 in
let y = x + 2 in
y * z
Therefore, the in is not needed here and the function could have been written simply as
let CharParser (c: char) : Parser<char> =
let p stream =
match stream with
| x::xs when x = c -> Success(x, xs)
| _ -> Failure
p
The answer from Lee explains the problem. In F#, the in keyword is heritage from earlier functional languages that inspired F# and required it - namely from ML and OCaml.
It might be worth adding that there is just one situation in F# where you still need in - that is, when you want to write let followed by an expression on a single line. For example:
let a = 10
if (let x = a * a in x = 100) then printfn "Ok"
This is a bit funky coding style and I would not normally use it, but you do need in if you want to write it like this. You can always split that to multiple lines though:
let a = 10
if ( let x = a * a
x = 100 ) then printfn "Ok"
I'm new to F# and functional and am working on some HTML parsing code. I want to remove from a HTML document elements that match some criteria. Here I have a sequence of objects (HtmlNodes) and want to remove them from the document.
Is this idiomatic way of using pattern matching? Also as HtmlNode.Remove() has a side-effect on the original HtmlDocument object, is there any particular way of structuring the code to make the side-effect obvious or how should this be handled. You can be as pedantic as you like with the code.
open HtmlAgilityPack
let removeNodes (node : HtmlNode) =
let (|HiddenNodeCount|) (n : HtmlNode) =
match n.SelectNodes("*[#style[contains(.,'visibility:hidden')]]") with
| null -> 0
| _ as x -> Seq.length x
match node with
| x when x.Name.ToLower() = "script" -> node.Remove()
| x when x.NodeType = HtmlNodeType.Comment -> node.Remove()
| HiddenNodeCount x when x > 0 -> node.Remove()
| _ -> ()
let html = "some long messy html code would be here"
let dom = new HtmlDocument(OptionAutoCloseOnEnd=true)
dom.LoadHtml(html)
let nodes = dom.DocumentNode.DescendantNodes()
nodes |> Seq.toArray |> Array.iter removeNodes
Personally, I prefer if elif else over pattern matching when you don't have a data structure to decompose (it's just less typing, and may also serve to differentiate between when a structure is being decomposed versus simpler case testing).
There are some odd things in your code. The Active Pattern isn't very helpful here for two reasons: first, its scope is limited to removeNodes so it is only used once. I'll address the second issues later, but first I will show how I would write this by eliminating the Active Pattern and, for me at least, making the side-effects more obvious (by separating the code which tests whether a node should be removed from the code that does the removing):
let shouldRemoveNode (node : HtmlNode) =
if node.Name.ToLower() = "script" then true
elif node.NodeType = HtmlNodeType.Comment then true
else match node.SelectNodes("*[#style[contains(.,'visibility:hidden')]]") with
| null -> false
| x -> Seq.length x > 0
let removeNode (node: HtmlNode) =
if shouldRemoveNode(node) then node.Remove() else ()
Notice I do use a pattern match in the visibility hidden query since I do get to match against null and bind to x otherwise (rather than binding to x, and then testing x with if else).
The second odd thing with your Active Pattern is that you are using it for converting a node to an int, but the length you obtain isn't immediately useful (you still need to perform a test against it). Whereas the more powerful use of an Active Pattern here would be to carve up nodes into different kinds (assuming this isn't ad-hoc, which was may first point). So you could have:
//expand to encompass several other kinds of nodes
let (|Script|Comment|Hidden|Other|) (node : HtmlNode) =
if node.Name.ToLower() = "script" then Script
elif node.NodeType = HtmlNodeType.Comment then Comment
else match node.SelectNodes("*[#style[contains(.,'visibility:hidden')]]") with
| null -> Other
| x -> if Seq.length x > 0 then Hidden
else Other
let removeNode (node: HtmlNode) =
match node with
| Script | Comment | Hidden -> node.Remove()
| Other -> ()
Edit:
#Pascal made the observation in the comments that shouldRemoveNode can be further condensed into one big boolean expression:
let shouldRemoveNode (node : HtmlNode) =
node.Name.ToLower() = "script" ||
node.NodeType = HtmlNodeType.Comment ||
match node.SelectNodes("*[#style[contains(.,'visibility:hidden')]]") with
| null -> false
| x -> Seq.length x > 0
It isn't clear to me that this is any better than using functions and if-then-else, e.g.
let HiddenNodeCount (n : HtmlNode) =
match n.SelectNodes("*[#style[contains(.,'visibility:hidden')]]") with
| null -> 0
| x -> Seq.length x
if node.Name.ToLower() = "script" then
node.Remove()
elif node.NodeType = HtmlNodeType.Comment then
node.Remove()
elif HiddenNodeCount node > 0 then
node.Remove()