I know that several tools allow to define "easily" context-free grammars so as to define DSL for example.
But is there a tool that can analyse non-context-free grammars?
Definite clause grammars can be used to generate parsers for non-context-free grammars. There are also several parser generators for context-sensitive grammars, such as LuZc.
Related
I'm reading my notes for my comparative languages class and I'm a bit confused...
What is the difference between a context-free grammar and a deterministic context-free grammar? I'm specifically reading about how parsers are O(n^3) for CFGs and compilers are O(n) for DCFGs, and don't really understand how the difference in time complexities could be that great (not to mention I'm still confused about what the characteristics that make a CFG a DCFG).
Thank you so much in advance!
Conceptually they are quite simple to understand. The context free grammars are those which can be expressed in BNF. The DCFGs are the subset for which a workable parser can be written.
In writing compilers we are only interested in DCFGs. The reason is that 'deterministic' means roughly that the next rule to be applied at any point in the parse is determined by the input so far and a finite amount of lookahead. Knuth invented the LR() compiler back in the 1960s and proved it could handle any DCFG. Since then some refinements, especially LALR(1) and LL(1), have defined grammars that can be parsed in limited memory, and techniques by which we can write them.
We also have techniques to derive parsers automatically from the BNF, if we know it's one of these grammars. Yacc, Bison and ANTLR are familiar examples.
I've never seen a parser for a NDCFG, but at any point in the parse it would potentially need to consider the whole of the input string and every possible parse that could be applied. It's not hard to see why that would get rather large and slow.
I should point out that many real languages are imperfect, in that they are not entirely context free, not unambiguous or otherwise depart from the ideal DCFG. C/C++ is a good example, but there are many others. These languages are usually handled by special purpose rules such as semantic or syntactic predicates, special case backtracking or other 'tricks' with no effect on performance.
The comments point out that certain kinds of NDCFG are common and many tools provide a way to handle them. One common problem is ambiguity. It is relatively easy to parse an ambiguous grammar by introducing a simple local semantic rule, but of course this can only ever generate one of the possible parse trees. A generalised parser for NDCFG would potentially produce all parse trees, and could perhaps allow those trees to be filtered on some arbitrary condition. I don't know any of those.
Left recursion is not a feature of NDCFG. It presents a particular challenge to the design of LL() parsers but no problems for LR() parsers.
I've been using lex/yacc and now I'm trying to switch to ANTLR. The major concern is that ANTLR is an LL(*) parser unlike yacc which is LALR. I'm used to thinking bottom-up and I don't exactly know what the advantage of LL grammars is. People say that LL grammars are easier to understand and more popular these days. But it seems that LR parsers are more powerful e.g. LL parsers are incapable of dealing with left-recursions, although there seems to be some workarounds.
So the question is what is the advantage of LL grammars over LALR? I'd appreciate it if somebody could give me some examples. Links to useful articles would be great, too.
Thanks for your help in advance!
(I see this is a great resource: What advantages do LL parsers have over LR parsers?, but it would've been better with some examples.)
LR parsers are strictly more powerful than LL parsers, and in addition, LALR parsers can run in O(n) like LL parsers. So you won't find any functional advantages of LL over LR.
Thus, the only advantage of LL is that LR state machines are quite a bit more complex and difficult to understand, and LR parsers themselves are not especially intuitive. On the other hand, LL parser code which is automatically generated can be very easy to understand and debug.
The greatest advantage I see to LL parsers is that they are so easy to understand and implement! You can hand write recursive descent parsers with code that closely matches the grammar.
LR are generally considered more powerful and also much faster BUT there are a few trade offs that I know:
LR parsers can only use synthesized attributes; they can not pass inherited attributes.
Actions in an LR grammar can cause grammar nondeterminism but not in LL.
However, you will find that LL(*) are also very powerful.
I'm currently in the process of writing a parser for some language. I've been given a grammar for this language, but this grammar has some left recursions and non-LL(*) constructs, so ANTLR doesn't do very well, even with backtracking.
Because removing these left recursions and non-LL(*) constructs is harder than it looked at first glance, I now want to try a LR(k) or LALR(k) parser generator. The higher k the better.
Can anyone recommend me a parser generator fulfilling these requirements?
The generated parser is preferably a LR(k) parser with some high (or even arbitrary) k, or at least a LALR(k) parser with some high k.
The generated parser is written in C or C++, and if it is written in C, it is linkable to C++-Code.
A feature set similar to ANTLR (especially the AST rewriting) would be nice.
Performance is not the most pressing issue, the generated parser is intended to
be used on desktop machines with much memory and cpu power.
Thanks and greetings,
Jost
PS: I'm not asking because I can't google myself, but because there is no time left to test some generators myself. So please only answer if you have experience with the recommended parser generators.
You might consider LRSTAR.
I have no experience with the tool itself, but I've met the author and he seems like a pretty competent guy. (I do build parsing engines and related technology for a living).
LRSTAR 10.0 is available now. On the comparison page, there is a comparison of LRSTAR, ANTLR and Bison. LRSTAR now reads ANTLR's style notation using the same EBNF operators (:, |, *, +, ?). It's a C++ based system generating LR(k) parsers in C++. The parsers do automatic AST construction and traversal. The new version 10.0 reads Yacc/Bison grammars if there is no action code in the grammar.
I have now decided to use DParser, which is a GLR-Parser generator capable of recognizing any context free language. It seems to be well programmed (look at the tests in the source distribution), but lacks a lot of the features ANTLR provides, most notably the AST-Construction tools.
As a plus, it mostly reuses ANTLRs grammar file format, which was the format my grammar is in.
I'm looking for a LL(1) parser generator in OCaml... Can anybody help me with this?
Well, LALR parsers can parse a strict superset of the languages which can be parsed by LL parsers. So I would advise simply using ocamlyacc which ships with Ocaml and is an LALR(1) parser generator. This may require some minor rewriting of the grammar, but it shouldn't be too hard.
Planck LL(n) parser combinator library: https://bitbucket.org/camlspotter/planck/overview
It has started as my toy project, and there is no actual users, but I could implement OCaml syntax lexer/parser with Planck which are 100% compatible with the originals.
I do not recommend to use it but if you are interested... try it.
Stream parser as included in camlp4 are (at best of my knowledge) LL(1) parser. see
http://caml.inria.fr/pub/docs/manual-camlp4/manual003.html
I have heard good things about Menhir
The home page says at the top:
Menhir is a LR(1) parser generator for the OCaml programming language. That is, Menhir compiles LR(1) grammar specifications down to OCaml code. Menhir was designed and implemented by François Pottier and Yann Régis-Gianas.
Menhir is 90% compatible with ocamlyacc. Legacy ocamlyacc grammar specifications are accepted and compiled by Menhir. The resulting parsers run and produce correct parse trees.
My question is about the Scala Parsers:
Which ones are available (in the Standard library and outside),
what's the difference between them,
do they share a common API and
can different Parsers be combined to parse one input string?
I found at least these:
Scala's "standard" parser (seems to be an LL parser)
Scala's Packrat parser (since 2.8, is a LALR parser)
The Parboiled parser (PEG parser?)
Spiewak's GLL parser combinator
There's also Dan Spiewak's implementation of GLL parser combinators.
It's worth noting that Scala's standard parser combinators are not LL, nor are Packrat combinators LALR. Parser combinators are a form of recursive descent with infinite backtracking. You can think of them a bit like "LL(*)". The class of languages supported by this technique is precisely the class of unambiguous context-free languages, or the same class as LALR(1) and Packrat. However, the class of grammar is quite a bit different, with the most notable weakness being non-support for left-recursion.
Packrat combinators do support left-recursion, but they still fail to support many other, more subtle features of LALR. This weakness generally stems from the ordered choice operator, which can lead to some devilishly tricky grammar bugs, as well as prevents certain nice grammatical formulations. The most often-seen example of these bugs happens when you accidentally order ambiguous choices as shortest first, resulting in a greedy match that prevents the correct branch from ever being tried. LALR doesn't have this problem, since it simply tries all possible branches at once, deferring the decision point until the end of the production.
There is also a new approach known as "parsing with derivatives". The approach is described here. There is an implementation in Scala by Daniel Spiewak.
Just wanted to update this answer with a pointer to the latest iteration of the parboiled project, called parboiled2:
https://github.com/sirthias/parboiled2
parboiled2 targets only Scala (as opposed to Scala + Java), makes use of Scala macros, and is very actively maintained.