Function definition by induction principles in Agda

When playing around with proof verification in Agda, I realised that for some types I used induction principles explicitly, while in other cases I used pattern matching instead.
I finally found some text about pattern matching and induction principles on Wikipedia: "In Agda, dependently typed pattern matching is a primitive of the language, the core language doesn't have the induction/recursion principles that pattern matching translates to."
So are type-theoretic induction and recursion principles (used to define functions on types) completely redundant in Agda because of Agda's pattern matching? Something like path induction would then have only didactic value.
http://en.wikipedia.org/wiki/Agda_%28programming_language%29#Dependently_typed_pattern_matching

I'm not familiar with Agda, but I think that on this particular point, it is similar to Coq, and I can answer the corresponding question for Coq. Both languages are based on intuitionistic type theory with inductive types.
In intuitionistic type theory, it is possible to derive recursion principles from a sufficiently general fixpoint combinator, plus dependent pattern matching. Pattern matching isn't enough: while this allows an inductive type to be destructed, pattern matching alone doesn't allow writing recursive functions on that type. Taking the example from Wikipedia:
add zero n = n
add (suc n) m = suc (add n m)
The fact that this definition is well-formed doesn't require an induction principle on ℕ. But in addition to the pattern matching performed on the first argument of add, it requires a rule that states that the recursive call in the second case is well-formed.
With pattern matching and recursion, induction principles can be defined as first-class objects:
nat_ind f_zero f_suc zero = f_zero
nat_ind f_zero f_suc (suc n) = f_suc (nat_ind f_zero f_suc n)
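In a non-dependently-typed setting the same recursion principle is just a fold over the naturals. A rough Python analogue (the names are mine, mirroring the equations above) makes the computational content visible:

```python
def nat_ind(f_zero, f_suc, n):
    # Recursion principle for the naturals, mirroring the equations above:
    # replace `zero` by f_zero and every `suc` by an application of f_suc.
    if n == 0:
        return f_zero
    return f_suc(nat_ind(f_zero, f_suc, n - 1))

# Addition defined through the recursion principle instead of pattern matching:
def add(m, n):
    return nat_ind(n, lambda r: r + 1, m)
```

Here add(2, 3) unfolds to two applications of the successor function to 3, just as the pattern-matching definition of add would.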

I would not say they are completely irrelevant: they are combinators and, as such, can help you structure your developments and the way you think about them, and help you write less (repetitive) code.
Look at Haskell for instance: pattern-matching is also primitive in Haskell but more often than not you will resort to fmap, bind, fold or traverse to write less code or give a more generic, more robust implementation.

Related

Understanding and Writing Parsers

I'm writing a program that requires me to create my first real, somewhat complicated parser. I would like to understand what parsing algorithms exist, as well as how to create a "grammar". So my questions are as follows:
1) How does one create a formal grammar that a parser can understand? What are the basic components of a grammar?
2) What parsing algorithms exist, and what kind of input does each excel at parsing?
3) In light of the broad nature of the questions above, what are some good references I can read through to understand the answer to questions 1 and 2?
I'm looking for more of a broad overview with the keywords/topic areas I need so I can look into the details myself. Thanks everybody!
You generally write a context-free grammar G that describes a certain formal language L (e.g. the set of all syntactically valid C programs) which is simply a set of strings over a certain alphabet (think of all well-formed C programs; or of all well-formed HTML documents; or of all well-formed MARKDOWN posts; all of these are sets of finite strings over certain subsets of the ASCII character set). After that you come up with a parser for the given grammar---that is, an algorithm that, given a string w, decides whether the string w can be derived by the grammar G. (For example, the grammar of the C11 language describes the set of all well-formed C programs.)
Some types of grammars admit simple-to-implement parsers. One class of grammars often used in practice is the LL grammars. A special subset of LL grammars, the LL(1) grammars, have parsers that run in linear time (linear in the length of the string being parsed).
There are more general parsing algorithms---most notably the Earley parser and the CYK algorithm---that take as input a string w and a grammar G and decide in time O(|w|^3) whether the string w is derivable by the grammar G. (Notice how cool this is: the algorithm takes the grammar as an argument. But I don't think this is used much in practice.)
I implemented the Earley parser in Java some time ago. If you're interested, the code is available on GitHub.
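To give a feel for the algorithm, a minimal Earley recognizer fits in a few lines of Python. This is a sketch under my own grammar encoding (nonterminals map to lists of productions, each production a tuple of symbols), not the Java implementation mentioned above:

```python
def earley_recognize(grammar, start, tokens):
    """Decide whether `tokens` is derivable from `start`.
    grammar: dict mapping each nonterminal to a list of productions (tuples)."""
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        changed = True
        while changed:  # iterate predict/scan/complete to a fixpoint at position i
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:          # predict
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); changed = True
                elif dot < len(rhs) and i < n and tokens[i] == rhs[dot]:  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
                elif dot == len(rhs):                               # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); changed = True
    # accept iff some production of `start` spans the whole input
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[n])
```

For example, with the balanced-parenthesis grammar discussed below, encoded as `{"S": [("(", "S", ")"), ("S", "S"), ()]}`, `earley_recognize` accepts `list("(())()")` and rejects `list("(()")`.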
For a concrete example of the whole process, consider the language of all balanced strings of parentheses: (), (()), ((()))()(())(), etc. We can describe them with the following context-free grammar:
S -> (S) | SS | eps
where eps denotes the empty string. For example, we can derive the string (())() as follows: S => SS => (S)S => ((S))S => (())S => (())(S) => (())(). We can easily implement a parser for this grammar (left as exercise :-).
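For completeness, here is one way to do the exercise in Python. Rather than following the (ambiguous) grammar rule by rule, this sketch uses the standard counter trick, which recognizes exactly the same language:

```python
def balanced(s):
    # depth counts currently open parentheses; it must never go negative
    # and must return to zero at the end of the string
    depth = 0
    for c in s:
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
        else:
            return False       # the alphabet is only '(' and ')'
    return depth == 0
```

So balanced("(())()") returns True, while balanced("(()") and balanced(")(") return False.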
A very good reference is the so-called dragon book: Compilers: Principles, Techniques, and Tools by Aho et al. It covers all the essential topics. Another good reference is the classic book Introduction to Automata Theory, Languages, and Computation by Hopcroft et al.

Assumptions in Z3 or Z3Py

Is there a way to express assumptions in Z3 (I am using the Z3Py library) such that the engine does not check their validity but takes them as underlying theories, just as in theorem proving?
For example, let's say that I have two unary functions whose argument is of type Real. I would like to tell the Z3 engine that for all input values, f1(t) is equal to f2(t).
Encoded in Z3Py that would look something like the following:
t = Real("t")
assumption1 = ForAll(t, f1(t) == f2(t))
The problem with the presented code is that my assertion set is quite big and I use quantifiers (I am trying to prove satisfiability of a real-time system). If I add the above assertion to the set of the other assertions the checking procedure does not terminate.
is there a way to express assumptions in Z3 (I am using the Z3Py library) such that the engine does not check their validity but takes them as underlying theories, just like in theorem proving?
In fact, all assertions you add to Z3 are treated as what you call assumptions. Z3 checks satisfiability of the assertions; it does not check validity. To check validity of a formula F, you assert (not F) and check for satisfiability of (not F). If (not F) is unsat, then F is valid. If you have background axioms, you are essentially checking validity of Background => F, so you can check satisfiability of Background & (not F).
Whether Z3 terminates on your query depends on which combination of theories and quantifiers you use. The more features your queries combine the tougher it is.
For formulas over pure linear arithmetic or polynomial real arithmetic (called LRA, LIA and NRA in the SMT-LIB classification; see smtlib.org), Z3 uses specialized decision procedures that have recently been added.
Yes, that's possible just as you describe it, but you will end up with quantifiers, which does of course mean that you're solving a harder problem and Z3 will behave differently (it's possible you end up using completely different solvers that don't even share much source code).
For the particular example given, it's possible to eliminate the quantifier cheaply because it has the form of a function definition (ForAll x . f(x) = ...), i.e., we can just replace all occurrences of f with the right hand side and then the quantifier is trivially satisfied. In Z3, this is done by the macro finder, which may be applied as a tactic (with name "macro-finder"), or if you are using the "smt" tactic (implicitly via others or directly), then you can set smt.macro_finder=true.

Generalized Bottom up Parser Combinators in Haskell

I wonder why there are no generalized parser combinators for bottom-up parsing in Haskell, like the Parsec combinators for top-down parsing.
(I could find some research work from around 2004, but nothing after:
https://haskell-functional-parsing.googlecode.com/files/Ljunglof-2002a.pdf
http://www.di.ubi.pt/~jpf/Site/Publications_files/technicalReport.pdf )
Is there any specific reason for not achieving it?
This is because of referential transparency. Just as no function can tell the difference between
let x = 1:x
let x = 1:1:1:x
let x = 1:1:1:1:1:1:1:1:1:... -- if this were writeable
no function can tell the difference between a grammar which is a finite graph and a grammar which is an infinite tree. Bottom-up parsing algorithms need to be able to see the grammar as a graph, in order to enumerate all the possible parsing states.
The fact that top-down parsers can treat their grammars as infinite trees allows them to be more powerful, since a tree can be computationally more complex than any graph could be; for example,
numSequence n = string (show n) *> option () (numSequence (n+1))
accepts any finite ascending sequence of numbers starting at n. This has infinitely many different parsing states. (It might be possible to represent this in a context-free way, but it would be tricky and require more understanding of the code than a parsing library is capable of, I think)
A bottom up combinator library could be written, though it is a bit ugly, by requiring all parsers to be "labelled" in such a way that
the same label always refers to the same parser, and
there is only a finite set of labels
at which point it begins to look a lot more like a traditional specification of a grammar than a combinatory specification. However, it could still be nice; you would only have to label recursive productions, which would rule out any infinitely-large rules such as numSequence.
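Outside Haskell the labelling idea is easy to picture. In this rough Python analogue (the encoding is mine), recursion goes through a finite set of labels rather than through object references, so the grammar is an enumerable graph instead of an infinite unfolding:

```python
# A grammar as a finite map from labels to rules; rules mention labels,
# not parser objects, so the recursion in "S" is an observable back-edge.
grammar = {
    "S": [("(", "S", ")"), ("S", "S"), ()],
}

def reachable_labels(grammar, start):
    # Because the label set is finite, we can enumerate the whole grammar
    # as a graph -- exactly what a bottom-up parser generator needs to do
    # to build its parsing states.
    seen, todo = set(), [start]
    while todo:
        label = todo.pop()
        if label in seen:
            continue
        seen.add(label)
        for rule in grammar[label]:
            for sym in rule:
                if sym in grammar:
                    todo.append(sym)
    return seen
```

The same enumeration is impossible for a combinator like numSequence, whose "label set" would be infinite.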
As luqui's answer indicates, a bottom-up parser combinator library is not realistic. On the chance that someone gets to this page just looking for Haskell's bottom-up parser generator, what you are looking for is called the Happy parser generator. It is like yacc for Haskell.
As luqui said above: Haskell's treatment of recursive parser definitions does not permit the definition of bottom-up parsing libraries. Bottom-up parsing libraries are possible though if you represent recursive grammars differently. With apologies for the self-promotion, one (research) parser library that uses such an approach is grammar-combinators. It implements a grammar transformation called the uniform Paull transformation that can be combined with the top-down parser algorithm to obtain a bottom-up parser for the original grammar.
luqui essentially says that there are cases in which sharing is unobservable. However, that is not the case in general: many approaches to observable sharing exist. E.g. http://www.ittc.ku.edu/~andygill/papers/reifyGraph.pdf mentions a few different methods to achieve observable sharing and proposes its own new method:
This looping structure can be used for interpretation, but not for further analysis, pretty printing, or general processing. The challenge here, and the subject of this paper, is how to allow trees extracted from Haskell hosted deep DSLs to have observable back-edges, or more generally, observable sharing. This is a well-understood problem, with a number of standard solutions.
Note that the "ugly" solution of luqui is mentioned by the paper under the name of explicit labels. The solution proposed by the paper is still "ugly", as it uses so-called "stable names", but other solutions such as http://www.cs.utexas.edu/~wcook/Drafts/2012/graphs.pdf (which relies on PHOAS) may work.

LR(k) to LR(1) grammar conversion

I am confused by the following quote from Wikipedia:
In other words, if a language was reasonable enough to allow an efficient one-pass parser, it could be described by an LR(k) grammar. And that grammar could always be mechanically transformed into an equivalent (but larger) LR(1) grammar. So an LR(1) parsing method was, in theory, powerful enough to handle any reasonable language. In practice, the natural grammars for many programming languages are close to being LR(1).
This means that a parser generator like bison is very powerful (since it can effectively handle LR(k) grammars), if one is able to convert an LR(k) grammar to an LR(1) grammar. Do some examples of this exist, or a recipe on how to do it? I'd like to know because I have a shift/reduce conflict in my grammar; I think it is an LR(2) grammar and I would like to convert it to an LR(1) grammar. Side question: is C++ an unreasonable language, since I've read that bison-generated parsers cannot parse it?
For references on the general purpose algorithm to find a covering LR(1) grammar for an LR(k) grammar, see Real-world LR(k > 1) grammars?
The general purpose algorithm produces quite large grammars; in fact, I'm pretty sure that the resulting PDA is the same size as the LR(k) PDA would be. However, in particular cases it's possible to come up with simpler solutions. The general principle applies, though: you need to defer the shift/reduce decision by unconditionally shifting until the decision can be made with a single lookahead token.
One example: Is C#'s lambda expression grammar LALR(1)?
Without knowing more details about your grammar, I can't really help more than that.
With regard to C++, the things that make it tricky to parse are the preprocessor and some corner cases in parsing (and lexing) template instantiations. The fact that the parse of an expression depends on the "kind" (not type) of a symbol (in the context in which the symbol occurs) makes precise parsing with bison complicated. [1] "Unreasonable" is a value judgement which I'm not comfortable making; certainly, tool support (like accurate syntax colourizers and tab-completers) would have been simpler with a different grammar, but the evidence is that it is not that hard to write (or even read) good C++ code.
Notes:
[1] The classic tricky parse, which also applies to C, is (a)*b, which is a cast of a dereference if a represents a type, and otherwise a multiplication. If you were to write it in the context c/(a)*b, it would be clear that an AST cannot be constructed without knowing whether it's a cast or a product, since that affects the shape of the AST.
A more C++-specific issue is: x<y>(z) (or x<y<z>>(3)) which parse (and arguably tokenise) differently depending on whether x names a template or not.

Converting a context-free grammar into an LL(1) grammar

First off, I have checked similar questions and none has quite the information I seek.
I want to know if, given a context-free grammar, it is possible to:
Know whether or not an equivalent LL(1) grammar exists. Equivalent in the sense that it should generate the same language.
Convert the context-free grammar to the equivalent LL(1) grammar, provided it exists. The conversion should succeed whenever an equivalent LL(1) grammar exists. It is OK if it does not terminate when no equivalent exists.
If the answer to those questions is yes, are such algorithms or tools available somewhere? My own research has been fruitless.
Another answer mentions that the Dragon Book has algorithms to eliminate left recursion and to left-factor a context-free grammar. I have access to the book and checked it, but it is unclear to me whether the resulting grammar is guaranteed to be LL(1). The restrictions imposed on the context-free grammar (no null productions and no cycles) are acceptable to me.
From the university compiler courses I took, I remember that every LL(1) grammar is context-free, but the class of context-free grammars is much bigger than LL(1). There is no general, efficient algorithm to check whether a context-free grammar can be transformed to LL(1), and to perform the conversion when it can.
Applying the bag of tricks, like removing left recursion, removing first-follow conflicts, left-factoring, etc., is similar to the mathematical transformations needed when you want to integrate a function: you need experience, and it is sometimes very close to an art. The transformations are often inverses of each other.
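One of those tricks, eliminating immediate left recursion (A → Aα | β becomes A → βA', A' → αA' | ε), is mechanical enough to sketch in Python. The grammar encoding here is my own (nonterminals map to lists of productions, each a tuple of symbols):

```python
def eliminate_direct_left_recursion(grammar):
    # grammar: dict nonterminal -> list of productions (tuples of symbols)
    new = {}
    for a, prods in grammar.items():
        recursive = [p[1:] for p in prods if p and p[0] == a]  # A -> A alpha
        rest = [p for p in prods if not p or p[0] != a]        # A -> beta
        if recursive:
            a2 = a + "'"                                       # fresh helper name
            new[a] = [p + (a2,) for p in rest]                 # A  -> beta A'
            new[a2] = [p + (a2,) for p in recursive] + [()]    # A' -> alpha A' | eps
        else:
            new[a] = prods
    return new
```

For instance, E → E + T | T becomes E → T E' and E' → + T E' | ε, the textbook expression-grammar transformation. Note this only handles immediate left recursion; indirect left recursion needs the fuller Dragon Book algorithm.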
One reason why LR-type grammars are now used a lot for generated parsers is that they cover a much wider spectrum of context-free grammars than LL(1).
Btw., the grammar of C can be expressed as LL(1), but C#'s cannot (e.g. the lambda function x -> x + 1 comes to mind, where you cannot decide whether you are seeing a parameter of a lambda or a known variable).
