F# fslex fsyacc mature for production code?

After reading a two-year-old webpage really ripping into fslex/fsyacc (buggy, slow, stupid, etc., compared to their OCaml counterparts), I wonder what one's best bet for lexing/parsing needs would be.
I've used ANTLR before with C# bindings, but I'm currently in the process of learning F# and was excited to see it comes with a parser generator. Since F# is now officially released and seems to be something Microsoft is really aiming to support and develop: would you say fslex and fsyacc are worth it for production code?

Fslex and fsyacc are used by the F# compiler, so they do work. I used them a few years ago and they were good enough for my needs.
However, my experience is that lex/yacc is much less mature in F# than in OCaml. Many people in the OCaml community have used them for years, including many students (writing a small interpreter/compiler with them seems to be a common exercise). I don't think many F# developers have used them, and I don't think the F# team has done a lot of work on these tools recently (for instance, VS integration has not been a priority). If you're not too demanding, fslex and fsyacc could be enough for you.
A solution could be to adapt Menhir (a camlyacc replacement with several nice features) so that it can be used with F#. I have no idea how much work that would be.
Personally, I now use FParsec every time I need to write a parser. It's quite different to use, but it's also much more flexible and it generates good parse error messages. I've been very happy with it and its author has always been very helpful when I had questions.
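To give a flavor of it, here is a minimal sketch (assuming the FParsec package is referenced; the parser and its name are purely illustrative):
open FParsec

// Parse a comma-separated list of floats, e.g. "1.0, 2.5, 3"
let listOfFloats = sepBy (pfloat .>> spaces) (pstring "," .>> spaces)

match run listOfFloats "1.0, 2.5, 3" with
| Success (result, _, _) -> printfn "Parsed: %A" result
| Failure (message, _, _) -> printfn "Error: %s" message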

Fslex and fsyacc are certainly ready for production use. After all, they are used in Microsoft Visual Studio 2010, because the F# lexer and parser are written using them (the F# compiler source code is also a good example that demonstrates how to use them efficiently).
I'm not sure how fslex/fsyacc compare to their OCaml equivalents or to ANTLR. However, Frederik Holmstrom has an article that compares ANTLR with a hand-written F# parser used in IronJS. Unfortunately, he doesn't have an fslex/fsyacc version, so there is no direct comparison.
To answer some specific concerns: you can get MSBuild tasks for running fslex/fsyacc as part of the build, so it integrates quite well. You don't get syntax highlighting, but I don't think that's such a big deal. It may be slower than the OCaml version, but that affects compilation only when you change the parser - I made some modifications to the F# parser and didn't find the compilation time a problem.
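For readers who haven't seen the tools, an fsyacc grammar looks roughly like this - a from-memory sketch in the ocamlyacc-style syntax that fsyacc follows (the token names and rules are illustrative; check the fsyacc documentation for the exact directives):
%token <int> INT
%token PLUS LPAREN RPAREN EOF
%start start
%type <int> start
%%
start: expr EOF          { $1 }
expr:
  | INT                  { $1 }
  | expr PLUS expr       { $1 + $3 }
  | LPAREN expr RPAREN   { $2 }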

The fslex and fsyacc tools were specifically written for the F# compiler and were not intended for wider use. That said, I have managed to get significant code bases ported from OCaml to F# thanks to these tools but it was laborious due to the complete lack of VS integration on the F# side (OCaml has excellent integration with syntax highlighting, jump to definition and error throwback). In particular, I moved as much of the F# code out of the lexer and parser as possible.
We have often needed to write parsers and have asked Microsoft to add official support for fslex and fsyacc but I do not believe this will happen.
My advice would be to use fslex and fsyacc only if you are facing translating a large legacy OCaml code base that uses ocamllex and ocamlyacc. Otherwise, write a parser from scratch.
I am personally not a fan of parser combinator libraries and prefer to write parsers using active patterns that look something like this s-expression parser:
let alpha = set['A'..'Z'] + set['a'..'z']
let numeric = set['0'..'9']
let alphanumeric = alpha + numeric

let (|Empty|Next|) (s: string, i) =
    if i < s.Length then Next(s.[i], (s, i+1)) else Empty

let (|Char|_|) alphabet = function
    | Empty -> None
    | s, i when Set.contains s.[i] alphabet -> Some(s, i+1)
    | _ -> None

let rec (|Chars|) alphabet = function
    | Char alphabet (Chars alphabet it)
    | it -> it

let sub (s: string, i0) (_, i1) =
    s.Substring(i0, i1 - i0)

let rec (|SExpr|_|) = function
    | Next ((' ' | '\n' | '\t'), SExpr(f, it)) -> Some(f, it)
    | Char alpha (Chars alphanumeric it1) as it0 -> Some(box(sub it0 it1), it1)
    | Next ('(', SExprs(fs, Next(')', it))) -> Some(fs, it)
    | _ -> None
and (|SExprs|) = function
    | SExpr(f, SExprs(fs, it)) -> box(f, fs), it
    | it -> null, it
This approach does not require any VS integration because it is just vanilla F# code. I find it easy to read and maintainable. Performance has been more than adequate in my production code.


Cost of implementing the pipeline operator

I'm following a language called 'elm', which is an attempt to bring a Haskell-esque syntax and FRP to JavaScript. There has been some discussion here about implementing the pipeline operator from F#, but the language designer has concerns about the increased cost (I assume in increased compilation time or compiler implementation complexity) over the more standard (in other FP languages, at least) reverse pipeline operator (which elm already implements). Can anyone speak to this? [Feel free to post directly to that thread as well, or I will paste back the best answers if no one else does.]
https://groups.google.com/forum/?fromgroups=#!topic/elm-discuss/Kt0MbDyRpO4
Thanks!
In the discussion you reference, I see Evan poses two challenges:
Show me some F# project that uses it
Find some credible F# programmer talking about why it is a good idea and what costs come with it (blog post or something).
I'd answer as follows:
The forward-pipe idiom is very common in F# programming, both for stylistic (we like it) and practical (it helps type inference) reasons. Just about any F# project you'll find will use it frequently. Certainly all of my open source projects use it (Unquote, FsEye, NL found here). No doubt you'll find the same in the F# projects hosted on GitHub, including the F# compiler source itself.
Brian, a developer on the F# compiler team at Microsoft, blogged about Pipelining in F# back in 2008 - a still very interesting and relevant post which relates F# pipes to POSIX pipes. In my own estimation, there is very little cost to implementing a pipe operator. In the F# compiler this is certainly true in every sense (it's a one-line, inline function definition).
The pipeline operator is actually incredibly simple - here is the standard definition:
let inline (|>) a b = b a
Also, the . operator discussed in the thread is the reverse pipe operator in F# (<|) which enables you to eliminate some brackets.
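To make both operators concrete, here is a small sketch (sqr is a hypothetical helper defined inline):
let sqr x = x * x

// These three expressions are all equivalent:
let a = sqr (sqr 2)       // plain application
let b = 2 |> sqr |> sqr   // forward pipe: feed the value through a chain
let c = sqr <| sqr 2      // reverse pipe: eliminates the brackets

printfn "%d %d %d" a b c  // 16 16 16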
I don't think adding pipeline operators would have a significant impact on complexity.
In addition to the excellent answers already given here, I'd like to add a couple more points.
Firstly, one of the reasons the pipeline operator is common in F# is that it helps to circumvent a shortcoming in the way type inference is currently done. Specifically, if you apply an aggregate operation to a collection using a lambda function that accesses a member of the element (OOP-style), type inference will typically fail. For example:
Seq.map (fun z -> z.Real) zs
This fails because F# does not yet know the type of z when it encounters the property Real, so it refuses to compile this code. The idiomatic fix is to use the pipeline operator:
zs |> Seq.map (fun z -> z.Real)
This is strictly uglier (IMO) but it works.
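A self-contained version of the example (assuming System.Numerics.Complex as the element type, which exposes a Real property) shows the difference:
open System.Numerics

let zs = [ Complex(1.0, 2.0); Complex(3.0, 4.0) ]

// Compiles: the type of zs flows left-to-right into the lambda
let reals = zs |> Seq.map (fun z -> z.Real)

// Does not compile: the type of 'z' is unknown when 'z.Real' is checked
// let broken = Seq.map (fun z -> z.Real) zs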
Secondly, the F# pipe operator is nice to a point but you cannot currently get the inferred type of an intermediate result. For example:
x
|> h
|> g
|> f
If there is a type error at f then the programmer will want to know the type of the value being fed into f in case the problem was actually with h or g but this is not currently possible in Visual Studio. Ironically, this was easy in OCaml with the Tuareg mode for Emacs because you could get the inferred type of any subexpression, not just an identifier.

What to keep in mind while learning F#, having learned Scheme

I'm quite interested in learning F#.
My only experience with functional languages has been 2 introductory courses on Scheme in college.
Are there any things that I should keep in mind while learning F#, having previously learned Scheme? Any differences in methodologies, gotchas or other things that might give me trouble?
Static typing is the major difference between Scheme and F#. This facilitates a style called typeful programming where the type system is used to encode constraints about functions and data such that the compiler proves these aspects of the program correct at compile time and any violations of the constraints are caught immediately.
For example, a sequence of one or more elements of the same type might be conveyed by a value of the following type:
type list1<'a> = List1 of 'a * 'a list
let xs = List1(1, [])
let ys = List1(2, [3; 4])
The compiler now guarantees that any attempt to use an empty one of these sequences will be caught at compile time as an error.
Now, the reduce function makes no sense on an empty sequence so the built-in implementation for lists barfs at run-time with an exception if it encounters an empty sequence:
> List.reduce (+) [];;
System.ArgumentException: The input list was empty.
Parameter name: list
at Microsoft.FSharp.Collections.ListModule.Reduce[T](FSharpFunc`2 reduction, FSharpList`1 list)
at <StartupCode$FSI_0271>.$FSI_0271.main#()
Stopped due to error
With our new sequence of one or more elements, we can now write a reduce function that never barfs at run-time with an exception because its input is guaranteed by the type system to be non-empty:
let rec reduce f = function
    | List1(x, []) -> x
    | List1(x0, x1::xs) -> f x0 (reduce f (List1(x1, xs)))
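For example, reduce (+) ys (with the ys = List1(2, [3; 4]) defined above) evaluates to 9, and reduce (+) xs evaluates to 1; there is no empty case for it to fail on.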
This is a great way to improve the reliability of software by eliminating sources of run-time errors and it is something that dynamically typed languages like Scheme cannot even begin to do.
Scheme is a nice functional language; learning it in school should provide a good foundation for functional programming.
F# is statically-typed whereas Scheme is dynamic, so that is one obvious difference. If you have experience with other static languages (especially .NET languages like C#) then that will not be a big deal, but if most of your experience is dynamic, that will be a change.
Learning the names of the main F# functional programming functions (things like List.map) is important; most every functional language has the same basic set but often with different names (I don't recall the main Scheme names to compare).
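As a rough cheat sheet (the Scheme names in the comments follow the usual R6RS/SRFI-1 conventions), the common F# list functions line up like this:
let xs = [1; 2; 3; 4]

let squares = List.map (fun x -> x * x) xs         // Scheme: map
let evens = List.filter (fun x -> x % 2 = 0) xs    // Scheme: filter (SRFI-1)
let total = List.fold (fun acc x -> acc + x) 0 xs  // Scheme: fold-left
List.iter (printfn "%d") xs                        // Scheme: for-each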
If you have old Scheme 'programming assignments' with sample inputs/outputs handy, it may be useful to re-code them in F# as a way to 'warm up' with the language.
I suggest considering Haskell too; it is roughly in the same family as F# and ML, and it contains a lot of interesting functional concepts not found elsewhere.
Take a look at tryhaskell.org for an interactive online tutorial.

What is the shortest way to write a parser for my language?

P.S. Where can I read about parsing theory?
Summary: the shortest is probably Antlr.
It's tempting to go to the Dragon Book to learn about parsing theory. But I don't think the Dragon Book and you have the same idea of what "theory" means. The Dragon Book describes how to build hand-written parsers, parser generators, etc., but you almost certainly want to use a parser-generation tool instead.
A few people have suggested Bison and Flex (or their older versions Yacc and Lex).
Those are the old stalwarts, but they are not very usable tools.
Their documentation is not poor per se; it just doesn't quite help in dealing with the accidental complexity of using them.
Their internal data is not well encapsulated, and it is very hard to do anything advanced with them. As an example, in phc we still do not have correct line numbers because it is very difficult. They got better when we modified our grammar to include no-op statements, but that is an incredible hack which should not be necessary.
Ostensibly, Bison and Flex work together, but the interface is awkward. Worse, there are many versions of each, which only play nicely with some specific versions of the other. And, last I checked at least, the documentation of which versions went with which was pretty poor.
Writing a recursive descent parser is straightforward, but can be tedious. Antlr can do that for you, and it seems to be a pretty good toolset, with the benefit that what you learn on this project can be applied to lots of other languages and platforms (Antlr is very portable). There are also lots of existing grammars to learn from.
It's not clear what language you're working in, but some languages have excellent parsing frameworks. In particular, the Haskell Parsec library seems very elegant. If you use C++ you might be tempted to use Spirit. I found it very easy to get started with, and difficult - but still possible - to do advanced things with it. This matches my experience of C++ in general. I say I found it easy to start, but then I had already written a couple of parsers and studied parsing in a compiler class.
Long story short: Antlr, unless you've a very good reason.
It's always a good idea to read the Dragon Book. But be aware that if your language is not trivial, there's not really a "short" way to do it.
It rather depends on your language. Some very simple languages take very little parsing, so they can be hand-coded; other languages use PEG generators such as Rats! (a PEG is a parsing expression grammar, which sits between a regex and an LR parser) or conventional parser generators such as Antlr and Yacc. Less formal languages require probabilistic techniques such as link grammars.
Write a Recursive Descent Parser. This is sometimes easier than YACC/BISON, and usually more intuitive.
Douglas Crockford has an approachable example of a parser written in JavaScript.
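To show how direct the technique is, here is a minimal recursive-descent sketch in F# (the thread doesn't fix a language, so this is purely illustrative; the grammar - sums and differences of single digits - is hypothetical):
// Grammar: expr ::= digit (('+' | '-') digit)*
let parse (s: string) =
    let pos = ref 0
    let peek () = if !pos < s.Length then Some s.[!pos] else None
    let advance () = pos := !pos + 1
    // each production in the grammar becomes one function
    let digit () =
        match peek () with
        | Some c when System.Char.IsDigit c -> advance (); int c - int '0'
        | _ -> failwithf "digit expected at position %d" !pos
    let rec rest acc =
        match peek () with
        | Some '+' -> advance (); rest (acc + digit ())
        | Some '-' -> advance (); rest (acc - digit ())
        | _ -> acc
    let result = rest (digit ())
    if !pos <> s.Length then failwithf "unexpected input at position %d" !pos
    result

printfn "%d" (parse "1+2-3")  // prints 0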
YACC; there are various implementations for different languages.
Good luck with your language ;-)
I used the GOLD Parsing System, because it seemed easier to use than ANTLR for a novice like me, while still being sufficiently full-featured for my needs. The web site includes documentation (including instructions on Writing Grammars, which is half the work) as well as software.
Try Bison for parsing and Flex for lexing
The bison definition of your language is in the form of a context-free grammar. The Wikipedia article on this topic is quite good, and is probably a good place to start.
Using a parser generator for your host language is the fastest way, combined with parsing theory from a book such as the Dragon Book or the Modern Compiler Implementation in {C,ML} series.
If you use C, yacc and the GNU version bison are the standard generators. Antlr is widely used in many languages, supporting Java, C#, and C++ as far as I know. There are also many others in almost any language.
My personal favorite at present is Menhir, an excellent parser generator for OCaml. ML-style languages (OCaml, Standard ML, etc.) in general are very good for building compilers and interpreters.
ANTLR is the easiest for someone without a compiler theory background because of:
ANTLRWORKS (visual parsing and AST debugging)
The ANTLR book (no compiler theory background required)
A single syntax for both lexer and parser.
If you are happy with parsing expression grammars, your own parser can be incredibly short. Here is a simple Packrat parser that accepts a reasonable subset of PEG:
import functools

class peg_parse:
    def __init__(self, grammar):
        # Normalize each alternative to a tuple of parts
        self.grammar = {k: [tuple(l) for l in rules] for k, rules in grammar.items()}

    @functools.lru_cache(maxsize=None)  # memoization is what makes this "packrat"
    def unify_key(self, key, text, at=0):
        # A key that is not a nonterminal is matched literally against the text
        if key not in self.grammar:
            return (at + len(key), (key, [])) if text[at:].startswith(key) \
                   else (at, None)
        rules = self.grammar[key]
        for rule in rules:
            l, res = self.unify_rule(rule, text, at)
            if res is not None: return l, (key, res)
        return (0, None)

    def unify_rule(self, parts, text, tfrom):
        # Match each part of the rule in sequence, threading the position through
        results = []
        for part in parts:
            tfrom, res = self.unify_key(part, text, tfrom)
            if res is None: return tfrom, None
            results.append(res)
        return tfrom, results
It accepts grammars in the form of a Python dictionary, with nonterminals as keys and alternatives as elements of the array; each alternative is a sequence of expressions. Below is an example grammar.
term_grammar = {
    'expr': [
        ['term', 'add_op', 'expr'],
        ['term']],
    'term': [
        ['fact', 'mul_op', 'term'],
        ['fact']],
    'fact': [
        ['digits'],
        ['(', 'expr', ')']],
    'digits': [
        ['digit', 'digits'],
        ['digit']],
    'digit': [[str(i)] for i in list(range(10))],
    'add_op': [['+'], ['-']],
    'mul_op': [['*'], ['/']]
}
Here is the driver:
import sys

def main(to_parse):
    result = peg_parse(term_grammar).unify_key('expr', to_parse)
    assert (len(to_parse) - result[0]) == 0
    print(result[1])

if __name__ == '__main__': main(sys.argv[1])
Which can be invoked thus:
python3 parser.py '1+2'
('expr',
[('term',
[('fact',
[('digits', [('digit', [('1', [])])])])]),
('add_op', [('+', [])]),
('expr',
[('term', [('fact', [('digits', [('digit', [('2', [])])])])])])])
Parsing expression grammars take some care to write: the ordering of alternatives is important (unlike in a context-free grammar, the alternatives are an ordered choice - the first choice is tried first, and the second is tried only if the first did not match). In the grammar above, for instance, 'expr' must list ['term', 'add_op', 'expr'] before ['term']; with the order reversed, parsing '1+2' would succeed on the bare 'term' and stop after '1'. However, PEGs can represent all known context-free grammars.
If, on the other hand, you decide to go with a context-free grammar, the Earley parser is one of the simplest.

F#: is it OK for developing theorem provers?

Please advise. I am a lawyer; I work in the field of Law Informatics. I have been a programmer for a long time (Basic, RPG, Fortran, Pascal, Cobol, VB.NET, C#). I am currently interested in F#, but I'd like some advice. My concern is that F# seems to be fit for math applications, and what I want would require a lot of Boolean math operations and natural language processing of text and, if successful, speech. I am worried about the text processing.
I received a revolutionary PROLOG source code (revolutionary in the field of Law and in particular Dispute Resolution). The program solves disputes by evaluating Yes-No (true-false) arguments advanced by two debating parties. Now, I am learning PROLOG so I can take the program to another level: evaluating the strength of arguments when they are neither a yes nor a no, but a persuasive element in the argumentation process.
So, the program handles the dialectics aspect of argumentation, I want it to begin processing the rhetoric aspect of argumentation, or at least some aspects.
Currently the program can manage formal logic. What I want is to begin managing some aspects of informal logic, and for that I would need to parse strings (long strings, maybe MS Word documents) to detect text markers - words like "but", "therefore", "however", "since", etc. - just a long list of words I have to look up in any speech (verbal or written) and mark, and then evaluate the left side and right side of the mark. Depending on the mark, the sides are deemed strong or weak.
Initially, I thought of porting the Prolog program to C# and using a Prolog library. Then it occurred to me that maybe it could be better in pure F#.
First, the project you describe sounds (and I believe this is the correct legal term) totally freaking awesome.
Second, while F# is a good choice for math applications, it's also extremely well-suited for any application that performs a lot of symbolic processing. It's worth noting that F# is part of the ML family of languages, which were originally designed for the specific purpose of developing theorem provers. It sounds like you're writing an application that appeals directly to the niche ML languages are geared for.
I would personally recommend writing any theorem-proving applications you have in F# rather than C# - if only because the resulting F# code will be about 1/10th the size of the C# equivalent. I posted this sample demonstrating how to evaluate propositional logic in C# and F#; you can see the difference for yourself.
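I can't reproduce that sample here, but to give a feel for why the F# version comes out so compact, here is a minimal propositional-logic evaluator of my own (an illustrative sketch, not the linked sample):
type Prop =
    | True
    | False
    | Not of Prop
    | And of Prop * Prop
    | Or of Prop * Prop

let rec eval = function
    | True -> true
    | False -> false
    | Not p -> not (eval p)
    | And (p, q) -> eval p && eval q
    | Or (p, q) -> eval p || eval q

printfn "%b" (eval (Or (True, Not True)))  // true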
F# has many features that make this type of logic processing natural. To get a feel for what the language looks like, here is one possible way to decide which side of an argument has won, and by how much. Uses a random result for the argument, since the interesting (read "very hard to impossible") part will be parsing out the argument text and deciding how persuasive it would be to an actual human.
/// Declare a 'weight' unit-of-measure, so the compiler can do static typechecking
[<Measure>] type weight

/// Type of tokenized argument
type Argument = string

/// Type of argument reduced to side & weight
type ArgumentResult =
    | Pro of float<weight>
    | Con of float<weight>
    | Draw

/// Convert a tokenized argument into a side & weight
/// Presently returns a random side and weight
let ParseArgument =
    let rnd = System.Random()
    let nextArg() = rnd.NextDouble() * 1.0<weight>
    fun (line: string) ->
        // The REALLY interesting code goes here!
        match rnd.Next(0, 3) with
        | 1 -> Pro(nextArg())
        | 2 -> Con(nextArg())
        | _ -> Draw

/// Tally the arguments scored
let Score args =
    // Sum up all pro & con scores, and keep track of the count for the average calculation
    let totalPro, totalCon, count =
        args
        |> Seq.map ParseArgument
        |> Seq.fold
            (fun (pros, cons, count) arg ->
                match arg with
                | Pro(w) -> (pros + w, cons, count + 1)
                | Con(w) -> (pros, cons + w, count + 1)
                | Draw -> (pros, cons, count + 1))
            (0.0<weight>, 0.0<weight>, 0)
    let fcount = float(count)
    let avgPro, avgCon = totalPro / fcount, totalCon / fcount
    let diff = avgPro - avgCon
    match diff with
    // consider < 1% a draw
    | d when abs d < 0.01<weight> -> Draw
    | d when d > 0.0<weight> -> Pro(d)
    | d -> Con(-d)

let testScore =
    ["yes"; "no"; "yes"; "no"; "no"; "YES!"; "YES!"]
    |> Score

printfn "Test score = %A" testScore
Porting from Prolog to F# won't be that straightforward. While they are both non-imperative languages, Prolog is a logic language and F# is functional. I have never used the C# Prolog libraries, but I think using one will be easier than converting the whole thing to F#.
It sounds like the functional aspects of F# are appealing to you, but you wonder if it can handle the non-functional aspects. You should know that F# has the entire .NET Framework at its disposal. It also is not a purely functional language; you can write imperative code in it if you want to.
Finally, if there are still things you want to do from C#, it is possible to call F# functions from C#, and vice versa.
While F# is certainly more suitable than C# for this kind of application - since there are going to be several algorithms that F# allows you to express in a very concise and elegant way - you should consider the difference between functional, OO, and logic programming. In fact, porting to F# will most likely require you to use a solver (or implement your own), and that might take you some time to get used to. Otherwise you should consider making a library with your Prolog code and accessing it from .NET (see more about interop at this page, and remember that everything you can access from C# you can also access from F#).
F# does not support logic programming the way Prolog does. You might want to check out the P# compiler.

F# - Should I learn with or without #light?

I'm in the process of learning F# and am enjoying it so far. Almost all of the examples online use the lightweight syntax (#light); however, most of them also include a comment noting that it is turned on for that example.
Is it better to learn F# with #light enabled or disabled? I'm planning on eventually learning it without #light, but am curious whether it would be better to start that way from the beginning or to work on applying it after I know the core language better.
I'd definitely prefer learning F# with the #light syntax. The non-light version is sometimes useful for understanding some tricks of the F# syntax, but the #light syntax gives you a much more pleasant experience.
For example - using #light
let add a b c =
    let ab = a + b
    printfn "%d" ab
    c - ab
Using non-light you can write the same thing like this:
let add a b c =
    let ab = a + b in  // 'in' keyword specifies where the binding (value 'ab') is valid
    printfn "%d" ab;   // ';' is the operator for sequencing expressions
    c - ab;;           // ';;' marks the end of a function declaration
This, for example, shows that you cannot write something like:
let doNothing a b =
    let sum = a + b in
There is an 'in' keyword at the end, but the function doesn't have any body (because there is no expression following 'in'). In cases like this, the non-light syntax is sometimes useful for understanding what's going on... But as you can see, the #light code is a lot simpler.
The "#light" will probably become the default in a future release of the language, so I would learn it that way. I think it's rare for anyone to use the heavier syntax except for OCaml-compatibility (either when cross-compiling, or because the human sitting at the keyboard knows OCaml and is making a smoother transition to F#).
Because I learned F# from an OCaml book (and I use an OCaml mode for Emacs to edit F# code), I prefer to use the "heavy" syntax. I have worked with #light code, and of course most of the F# examples are written using the light syntax so having some general familiarity is useful. That said, it's quite a bit easier to switch from heavy to light than the other way around, so it's certainly not a bad idea to learn it using the heavy syntax.
I have come across the occasional annoying bug with heavy syntax being treated as a second class citizen (combine was broken for computation expressions a couple releases back), but these are pretty rare. Generally speaking, I don't think the differences are very significant and I need to look close to determine which syntax is being used when looking at code in isolation. YMMV.
If I remember correctly, the book "Expert F#" mentions that #light will be the default when F# ships and that the non-light syntax is intended for compatibility only.
