Here is my problem: I'm trying to write a parser leveraging the power of active patterns in F#. The basic signature of a parsing function is the following:
LazyList<Token> -> 'a * LazyList<Token>
That is, it takes a lazy list of tokens and returns the result of the parse together with the remaining tokens, in keeping with functional design.
As a next step, I can define active patterns that let me match some constructs directly in match expressions:
let inline (|QualName|_|) token_stream =
    match parse_qualified_name token_stream with
    | Some id_list, new_stream -> Some (id_list, new_stream)
    | None, new_stream -> None

let inline (|Tok|_|) token_stream =
    match token_stream with
    | Cons (token, tail) -> Some (token.variant, tail)
    | _ -> None
and then match parse results in a high-level fashion this way:
let parse_subprogram_profile = function
    | Tok (Kw (KwProcedure | KwFunction),
           QualName (qual_name,
                     Tok (Punc (OpeningPar), stream_tail))) as token_stream ->
        // some code
    | token_stream -> None, token_stream
The problem I have with this code is that every newly matched construct is nested, which is not readable, especially if you have a long chain of results to match. I'd like to be able to define a matching operator such as the :: operator for lists, which would let me write the following:
let parse_subprogram_profile = function
    | Tok (Kw (KwProcedure | KwFunction)) ::
      QualName (qual_name) ::
      Tok (Punc (OpeningPar)) :: stream_tail as token_stream ->
        // some code
    | token_stream -> None, token_stream
But I don't think such a thing is possible in F#. I would even accept a design in which I have to call a specific "ChainN" active pattern, where N is the number of elements I want to parse, but I don't know how to design such a function, if it is possible at all.
Any advice or directions regarding this? Is there an obvious design I didn't see?
I had something like this in mind, too, but actually gave up going for this exact design. Something you can do is to use actual lists.
In that case, you would have a CombinedList made of (firstly) a normal list acting as a buffer and (secondly) a lazy list.
When you want to match against a pattern, you can do:
match tokens.EnsureBuffer(4) with
| el1 :: el2 :: el3 :: el4 :: remaining -> (el1.v - el2.v + el3.v - el4.v, tokens.SetBuffer(remaining))
| el1 :: el2 :: remaining -> (el1.v + el2.v, tokens.SetBuffer(remaining))
where EnsureBuffer and SetBuffer may either mutate "tokens" and return it, return it unchanged if no change is required, or return a new instance otherwise.
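To make the buffer idea concrete, here is a minimal immutable sketch. Everything in it is an assumption for illustration: the record layout, the module functions ensureBuffer and setBuffer (standing in for the EnsureBuffer and SetBuffer methods above), and the use of a plain seq as the lazy part. Seq.skip re-enumerates the source, so this only suits pure, repeatable sequences.

```fsharp
// Hypothetical sketch: an eager buffer in front of a lazily evaluated tail.
type CombinedList<'T> =
    { Buffer : 'T list   // tokens already pulled from the source
      Rest : seq<'T> }   // tokens not yet evaluated

module CombinedList =
    let ofSeq (source : seq<'T>) = { Buffer = []; Rest = source }

    // Pull tokens from the lazy tail until the buffer holds n items
    // (or the input runs out). Returns the input unchanged if nothing is needed.
    let ensureBuffer n (tokens : CombinedList<'T>) =
        let missing = n - List.length tokens.Buffer
        if missing <= 0 then tokens
        else
            let taken = tokens.Rest |> Seq.truncate missing |> List.ofSeq
            { Buffer = tokens.Buffer @ taken
              Rest = Seq.skip (List.length taken) tokens.Rest }

    // Replace the buffer with whatever is left after a successful match.
    let setBuffer remaining (tokens : CombinedList<'T>) =
        { tokens with Buffer = remaining }

// Ordinary list patterns now work on the buffered prefix.
let tryParse (tokens : CombinedList<int>) =
    let tokens = CombinedList.ensureBuffer 4 tokens
    match tokens.Buffer with
    | a :: b :: c :: d :: remaining ->
        Some (a - b + c - d, CombinedList.setBuffer remaining tokens)
    | a :: b :: remaining ->
        Some (a + b, CombinedList.setBuffer remaining tokens)
    | _ -> None
```

The longer pattern is listed first because a two-element pattern would also match any buffer of four or more tokens.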
Would that solve your problem?
François
I have been working with some F# parsers and some streaming software, and I find myself using this pattern more and more. I find it to be a natural alternative to sequences, and it has some natural advantages.
Here are some example functions using the type.
type foldedSequence<'a> =
    | Empty
    | Value of 'a * (unit -> 'a foldedSequence)

let rec createFoldedSequence fn state =
    match fn state with
    | None -> Empty
    | Some(value, nextState) ->
        Value(value, (fun () -> createFoldedSequence fn nextState))

let rec filter predicate =
    function
    | Empty -> Empty
    | Value(value, nextValue) ->
        let next() = filter predicate (nextValue())
        if predicate value then Value(value, next)
        else next()

let toSeq<'t> =
    Seq.unfold<'t foldedSequence, 't>(function
        | Empty -> None
        | Value(value, nextValue) -> Some(value, nextValue()))
It has been very helpful. I would like to know if it has a name, so I can research some tips and tricks for it.
To add to the existing answers, I think Haskellers might call a generalised version of this a list monad transformer. The idea is that your type definition looks almost like the ordinary F# list, except that there is some additional aspect to it. You can imagine writing this as:
type ListTransformer<'T> =
    | Empty
    | Value of 'T * M<ListTransformer<'T>>
By supplying specific M, you can define a number of things:
M<'T> = 'T gives you the ordinary F# list type
M<'T> = unit -> 'T gives you your sequence that can be evaluated lazily
M<'T> = Lazy<'T> gives you LazyList (which caches already evaluated elements)
M<'T> = Async<'T> gives you asynchronous sequences
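F# has no higher-kinded types, so M cannot be left as a real type parameter; each choice has to be written out by hand. As a rough sketch, here is the M<'T> = Lazy<'T> instance. The names LazyListT, countFrom and take are made up for this example and are not taken from any library.

```fsharp
// The "list transformer" shape with M<'T> = Lazy<'T>:
// the tail is computed on first access and then cached.
type LazyListT<'T> =
    | Empty
    | Value of 'T * Lazy<LazyListT<'T>>

// An infinite stream n, n+1, n+2, ...; only the head is built eagerly.
let rec countFrom n = Value (n, lazy (countFrom (n + 1)))

// Forces just enough tails to collect the first n elements.
let rec take n xs =
    match n, xs with
    | 0, _ | _, Empty -> []
    | n, Value (x, rest) -> x :: take (n - 1) rest.Value
```

Swapping Lazy<...> for (unit -> ...) gives the uncached variant from the question.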
It is also worth noting that in this definition ListTransformer<'T> is not itself a delayed/lazy/async value. This can cause problems in some cases - e.g. when you need to perform some async operation to decide whether the stream is empty - and so a better definition is:
type ListTransformer<'T> = M<ListTransformerInner<'T>>
and ListTransformerInner<'T> =
    | Empty
    | Value of 'T * ListTransformer<'T>
This sounds like LazyList which used to be in the "powerpack" and I think now lives here:
http://fsprojects.github.io/FSharpx.Collections/reference/fsharpx-collections-lazylist-1.html
https://github.com/fsprojects/FSharpx.Collections/blob/master/src/FSharpx.Collections/LazyList.fs
Your type is close to how an iteratee would be defined, and since you already mention streaming, this might be the concept you're looking for.
Iteratee IO is an approach to lazy IO outlined by Oleg Kiselyov. Apart from Haskell, implementations exist for major functional languages, including F# (as part of FSharpx.Extras).
This is how FSharpx defines an Iteratee:
type Iteratee<'Chunk,'T> =
    | Done of 'T * Stream<'Chunk>
    | Error of exn
    | Continue of (Stream<'Chunk> -> Iteratee<'Chunk,'T>)
See also this blog post: Iteratee in F# - part 1. Note that there doesn't seem to be a part 2.
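To give a feel for how such a type is driven, here is a small self-contained sketch. The Stream type below is a simplified stand-in defined just for this example (it is not claimed to match the FSharpx definition), and sumChunks, feed and finish are hypothetical helpers.

```fsharp
// Simplified stand-in for the stream type (assumption for this sketch).
type Stream<'Chunk> =
    | Chunk of 'Chunk
    | EOF

type Iteratee<'Chunk,'T> =
    | Done of 'T * Stream<'Chunk>
    | Error of exn
    | Continue of (Stream<'Chunk> -> Iteratee<'Chunk,'T>)

// An iteratee that keeps asking for input and sums every chunk it is given.
let sumChunks : Iteratee<int list, int> =
    let rec step acc = function
        | Chunk xs -> Continue (step (acc + List.sum xs))
        | EOF -> Done (acc, EOF)
    Continue (step 0)

// Push chunks into an iteratee for as long as it wants more input.
let rec feed chunks iteratee =
    match chunks, iteratee with
    | chunk :: rest, Continue k -> feed rest (k (Chunk chunk))
    | _ -> iteratee

// Signal end of input so a waiting iteratee can produce its result.
let finish iteratee =
    match iteratee with
    | Continue k -> k EOF
    | finished -> finished
```

The consumer never pulls; the producer pushes chunks in, which is the main difference from the pull-based sequence type in the question.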
Let us have a type definition for a tree with several types of binary nodes, among other types of nodes, i.e.
type Tree =
    | BinaryNodeA of Tree * Tree
    | BinaryNodeB of Tree * Tree
    | [Other stuff...]
I want to manipulate this tree using a recursive function that could, e.g., swap the subnodes of any kind of binary node (by constructing a new node). The problem that is driving me crazy: how do I match all BinaryNodes so that the node flavor becomes "a parameter", giving a generic swap that can be applied to any BinaryNode flavor and returns a swapped node of that same flavor?
I know how to match all Trees that are BinaryNodes by using an active pattern:
let (|BinaryNode|_|) (tree : Tree) =
    match tree with
    | BinaryNodeA _ | BinaryNodeB _ -> Some(tree)
    | _ -> None
But that's not good enough because the following does not seem achievable:
match tree with
| [cases related to unary nodes..]
| BinaryNode a b -> BinaryNode b a
In other words, I have not found a way to use the BinaryNode flavor as if it were a parameter like a and b. Instead, it seems I have to match each BinaryNode flavor separately. This could have practical significance if there were a large number of binary node flavors. The Tree type is the AST for an Fsyacc/Fslex-generated parser/lexer, which limits the options for restructuring it. Any ideas?
You just need to change the definition of your active pattern:
type Flavor = A | B

let (|BinaryNode|_|) (tree : Tree) =
    match tree with
    | BinaryNodeA(x,y) -> Some(A,x,y)
    | BinaryNodeB(x,y) -> Some(B,x,y)
    | _ -> None

let mkBinaryNode f t1 t2 =
    match f with
    | A -> BinaryNodeA(t1,t2)
    | B -> BinaryNodeB(t1,t2)
Then you can achieve what you want like this:
match tree with
| [cases related to unary nodes..]
| BinaryNode(f,t1,t2) -> mkBinaryNode f t2 t1
But if this is a common need then it might make sense to alter the definition of Tree to include flavor instead of dealing with it using active patterns.
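For comparison, here is roughly what that alternative could look like when the type can be changed. The Leaf case is a hypothetical placeholder for the "other stuff", and swap is just one example of a flavor-agnostic transformation.

```fsharp
type Flavor = A | B

// Flavor is stored as data instead of being encoded in the case name.
type Tree =
    | Leaf of int                          // hypothetical non-binary node
    | BinaryNode of Flavor * Tree * Tree

// Swaps the children of every binary node, whatever its flavor.
let rec swap tree =
    match tree with
    | BinaryNode (flavor, left, right) -> BinaryNode (flavor, swap right, swap left)
    | Leaf _ -> tree
```

With this layout no active pattern or mkBinaryNode helper is needed, since the ordinary union case already exposes the flavor.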
I'm rather new to F# so the question may be fairly elementary. Still, I couldn't find any suggestion on SO.
I'm playing with an algorithmic task in F#. As a first step I want to create a collection of integers from user console input. The number of inputs is not defined, and I don't want to use any while loops. I would prefer an approach that is as idiomatic as possible.
In a recursive function I'm reading the result and parsing it with Int32.TryParse. I match the bool result using match ... with. If successful then I attach a new value to a collection. Otherwise I return the collection.
Below is my code:
let rec getNumList listSoFar =
    let ok, num = Int32.TryParse(Console.ReadLine())
    match ok with
    | false -> listSoFar
    | true -> getNumList num::listSoFar

let l = getNumList []
And the error I get:
Type mismatch. Expecting a
'a
but given a
'a list
I'm aware I'm using types incorrectly, though I don't understand what exactly is wrong. Any explanation highly appreciated.
In the match branch
    | true -> getNumList num::listSoFar
you should use parentheses:
    | true -> getNumList (num::listSoFar)
because function application has higher precedence than the :: operator.
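Put differently, getNumList num::listSoFar is read as (getNumList num) :: listSoFar, which is why the compiler ends up comparing 'a with 'a list. Here is a small variant of the function with the parentheses in place. It takes its input as a list of strings instead of reading the console, purely so that it can be run without stdin; that change is an assumption of this sketch, not part of the question.

```fsharp
open System

// Parses lines until the first one that is not an integer.
// Note the parentheses around (num :: listSoFar).
let rec getNumList (lines : string list) listSoFar =
    match lines with
    | [] -> listSoFar
    | line :: rest ->
        match Int32.TryParse(line) with
        | true, num -> getNumList rest (num :: listSoFar)
        | false, _ -> listSoFar
```

Because each number is consed onto the front, the result comes back in reverse input order; apply List.rev at the end if the original order matters.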
Have a look at this F#/OCaml code:
type AllPossible =
    | A of int
    | B of int*int
    | ...
    | Z of ...

let foo x =
    ....
    match x with
    | A(value) | B(value,_) -> (* LINE 1 *)
        (* do something with the first (or only, in the case of A) value *)
        ...
        (* now do something that is different in the case of B *)
        let possibleData =
            match x with
            | A(a) -> bar1(a)
            | B(a,b) -> bar2(a+b)
            | _ -> raise Exception (* the problem - read below *)
        (* work with possibleData *)
        ...
    | Z -> ...
So what is the problem?
In function foo, we pattern match against a big list of types.
Some of the types share functionality - e.g. they have common
work to do, so we use "|A | B ->" in LINE 1, above.
We read the only integer (in the case of A), or the first integer
(in the case of B) and do something with it.
Next, we want to do something that is completely different, depending
on whether we work on A or B (i.e. call bar1 or bar2).
We now have to pattern match again, and here's the problem: In this
nested pattern match, unless we add a 'catchAll' rule (i.e. '_'),
the compiler complains that we are missing cases - i.e. it doesn't
take into account that only A and B can happen here.
But if we add the catchAll rule, then we have a far worse problem:
if at some point we add more types to the list in LINE 1
(i.e. in the line '|A | B ->'), then the compiler will NOT help
us in the nested match - the '_' will catch them, and a bug will
be detected at RUNTIME. One of the most important powers of
pattern matching - i.e. detecting such errors at compile-time - is lost.
Is there a better way to write this kind of code, without having
to repeat whatever work is shared amongst A and B in two separate
rules for A and B? (or putting the A-and-B common work in a function
solely created for the purpose of "local code sharing" between A and B?)
EDIT: Note that one could argue that the F# compiler's behaviour is buggy in this case -
it should be able to detect that there's no need for matching beyond A and B
in the nested match.
If the datatype is set in stone - I would also prefer local function.
Otherwise, in OCaml you could also enjoy open (aka polymorphic) variants:
type t = [`A | `B | `C]
let f = function
  | (`A | `B as x) ->
    let s = match x with `A -> "a" | `B -> "b" in
    print_endline s
  | `C -> print_endline "ugh"
I would just put the common logic in a local function; it should be both faster and more readable. Matches nested that way are pretty hard to follow, and putting the common logic in a local function allows you to ditch the extra matching in favour of something that'll get inlined anyway.
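A minimal sketch of that local-function layout follows. The third case C, the helper name common and the arithmetic are all invented for illustration; only the overall shape is the point.

```fsharp
type AllPossible =
    | A of int
    | B of int * int
    | C of string          // hypothetical stand-in for the remaining cases

let foo x =
    // Work shared by A and B lives in one local function.
    let common first = first * 2
    match x with
    | A a -> common a + 1              // A-specific part
    | B (a, b) -> common a + (a + b)   // B-specific part
    | C s -> s.Length
```

Every case is matched exactly once, so there is no nested match and no catch-all; adding a new case to AllPossible produces an incomplete-match warning on this match.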
Hmm, it looks like you need to design the data type a bit differently, such as:
type AorB =
    | A of int
    | B of int * int

type AllPossible =
    | AB of AorB
    | C of int
    .... other values

let foo x =
    match x with
    | AB(v) ->
        match v with
        | A(i) -> () //Do whatever need for A
        | B(i,v) -> () // Do whatever need for B
    | _ -> ()
Perhaps the better solution is that rather than
type All =
    | A of int
    | B of int * int
you have
type All =
    | AorB of int * (int option)
If you bind the data in different ways later on, you might be better off using an active pattern rather than a type, but the result would be basically the same.
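Here is one way the active-pattern variant might look, keeping the original type and exposing the "int plus optional int" view through a total active pattern. The case C, the pattern name (|AorB|Other|) and the function describe are assumptions made up for this sketch.

```fsharp
type All =
    | A of int
    | B of int * int
    | C of string   // hypothetical extra case

// Total active pattern: views A and B as "first int plus optional second int".
// The match inside is exhaustive over All, so a new case is flagged here.
let (|AorB|Other|) x =
    match x with
    | A a -> AorB (a, None)
    | B (a, b) -> AorB (a, Some b)
    | C s -> Other s

let describe x =
    match x with
    | AorB (first, second) ->
        let shared = first * 10      // work common to A and B
        match second with            // exhaustive without a catch-all
        | None -> shared
        | Some b -> shared + b
    | Other s -> s.Length
```

The nested match is over an option, so the compiler can check it completely, and the only place that enumerates the cases of All is the active pattern itself.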
I don't really agree that this should be seen as a bug - although it would definitely be convenient if the case was handled by the compiler.
The C# compiler doesn't complain about the following, and you wouldn't expect it to:
var b = true;
if (b)
    if (!b)
        Console.WriteLine("Can never be reached");
I'm trying to write a string processing function in F#, which looks like this:
let rec Process html =
    match html with
    | '-' :: '-' :: '>' :: tail -> ("→" |> List.ofSeq) @ Process tail
    | head :: tail -> head :: Process tail
    | [] -> []
My pattern matching expression against several elements is a bit ugly (the whole '-' :: '-' :: '>' thing). Is there any way to make it better? Also, is what I'm doing efficient if I were to process large texts? Or is there another way?
Clarification: what I mean is, e.g., being able to write something like this:
match html with
| "-->" :: tail ->
I agree with others that using a list of characters for doing serious string manipulation is probably not ideal. However, if you'd like to continue to use this approach, one way to get something close to what you're asking for is to define an active pattern. For instance:
let rec (|Prefix|_|) (s : string) l =
    if s = "" then
        Some l
    else
        match l with
        | c :: (Prefix (s.Substring(1)) xs) when c = s.[0] -> Some xs
        | _ -> None
Then you can use it like:
let rec Process html =
    match html with
    | Prefix "-->" tail -> ("→" |> List.ofSeq) @ Process tail
    | head :: tail -> head :: Process tail
    | [] -> []
Is there any way to make it better?
Sure:
let process (s: string) = s.Replace("-->", "→")
Also, is what I'm doing efficient if I were to process large texts?
No, it is incredibly inefficient. Allocation and garbage collection are expensive, and you're incurring both for every single character.
Or is there another way?
Try the Replace member. If that doesn't work, try a regular expression. If that doesn't work, write a lexer (e.g. using fslex). Ultimately, what you want for efficiency is a state machine processing a stream of chars and outputting its result by mutating in-place.
I think you should avoid using list<char> and use strings instead, e.g. String.Replace, String.Contains, etc. System.String and System.StringBuilder will be much better for manipulating text than list<char>.
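If String.Replace is not flexible enough, a hand-written single pass over the string with a StringBuilder is one option. The following is only a sketch of that idea for the "-->" case; the function name replaceArrows is made up here.

```fsharp
open System.Text

// Single pass over the input; output is accumulated in one StringBuilder
// instead of allocating a list cell per character.
let replaceArrows (html : string) =
    let sb = StringBuilder(html.Length)
    let mutable i = 0
    while i < html.Length do
        if i + 2 < html.Length
           && html.[i] = '-' && html.[i + 1] = '-' && html.[i + 2] = '>' then
            sb.Append('→') |> ignore
            i <- i + 3          // skip the whole "-->" token
        else
            sb.Append(html.[i]) |> ignore
            i <- i + 1
    sb.ToString()
```

For this particular replacement it should give the same result as s.Replace("-->", "→"); the loop only becomes worthwhile once several rewrite rules have to be handled in the same pass.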
For simple problems, using String and StringBuilder directly as Brian mentioned is probably the best way. For more complicated problems, you may want to check out some sophisticated parsing library like FParsec for F#.
This question may be of some help in giving you ideas for another way of approaching your problem: using list<> to contain lines, but using String functions within each line.