Definite Clause Grammar - Prolog - parsing

Below I have a Definite Clause Grammar that should accept the string aabccc, however when I tested the grammar, the only string I was able to get accepted was abc.
I'm pretty sure I've gotten rid of left-hand recursion, so I'm not sure what's going wrong.
s --> x, y, z.
x --> [a], x.
x --> [a].
y --> [b], y.
y --> [b].
z --> [c], z.
z --> [c].
Also, would I be able to define the above grammar as...
s --> x, y, z.
x --> [a], x; [a].
y --> [b], y; [b].
z --> [c], z; [c].

Both version of your grammar work as expected:
?- phrase(s, [a,a,b,b,c,c,c], R).
R = [] .
?- phrase(s, [a,a,b,b,c,c,c]).
true .
Maybe the issue was that you're trying to call it not with e.g. a [a,a,b,b,c,c,c] list of tokens but with a aabccc atom? If that's the case, you can use the standard atom_chars/2 built-in predicate to convert an atom into a list of characters:
?- atom_chars(aabccc, Tokens).
Tokens = [a, a, b, c, c, c].

Related

Why does cubical agda choose the particular two component homogeneous path composition operator it does?

In the implementation of the prelude for cubical agda, there is a definition of 3 component path composition
_∙∙_∙∙_ : w ≡ x → x ≡ y → y ≡ z → w ≡ z
This definition feels reasonably natural and clean to me. But then to get the 2 component path composition operator there are 3 choices: One for fixing each of the arguments as refl.
The standard ∙ is done by fixing the first argument, and another implementation (∙') is given for fixing the third argument. There is then a proof that they are the same. But the version fixing the 2nd argument is not discussed.
It seems to me that the 2nd argument version (call it ∘) has a nice property that
sym (p1 ∘ p2) is equal to sym p2 ∘ sym p1 definitionally. This seems like it can reduce the book keeping in some proofs.
Are there reasons why this version is not the standard version? Are there other computational properties that are better with the standard version?

How to recover intermediate computation results from a function using "with"?

I wrote a function on the natural numbers that uses the operator _<?_ with the with-abstraction.
open import Data.Maybe
open import Data.Nat
open import Data.Nat.Properties
open import Relation.Binary.PropositionalEquality
open import Relation.Nullary
fun : ℕ → ℕ → Maybe ℕ
fun x y with x <? y
... | yes _ = nothing
... | no _ = just y
I would like to prove that if the result of computing with fun is nothing then the original two values (x and y) fulfill x < y.
So far all my attempts fall short to prove the property:
prop : ∀ (x y)
→ fun x y ≡ nothing
→ x < y
prop x y with fun x y
... | just _ = λ()
... | nothing = λ{refl → ?} -- from-yes (x <? y)}
-- This fails because the pattern matching is incomplete,
-- but it shouldn't. There are no other cases
prop' : ∀ (x y)
→ fun x y ≡ nothing
→ x < y
prop' x y with fun x y | x <? y
... | nothing | yes x<y = λ{refl → x<y}
... | just _ | no _ = λ()
--... | _ | _ = ?
In general, I've found that working with the with-abstraction is painful. It is probably due to the fact that with and | hide some magic in the background. I would like to understand what with and | really do, but the "Technical details" currently escape my understanding. Do you know where to look for to understand how to interpret them?
Concrete solution
You need to case-split on the same element on which you case-split in your function:
prop : ∀ x y → fun x y ≡ nothing → x < y
prop x y _ with x <? y
... | yes p = p
In the older versions of Agda, you would have had to write the following:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
prop-old x y _ with x <? y
prop-old _ _ refl | yes p = p
prop-old _ _ () | no _
But now you are able to completely omit a case when it leads to a direct contradiction, which is, in this case, that nothing and just smth can never be equal.
Detailed explanation
To understand how with works you first need to understand how definitional equality is used in Agda to reduce goals. Definitional equality binds a function call with its associated expression depending on the structure of its input. In Agda, this is easily seen by the use of the equal sign in the definition of the different cases of a function (although since Agda builds a tree of cases some definitional equalities might not hold in some cases, but let's forget this for now).
Let us consider the following definition of the addition over naturals:
_+_ : ℕ → ℕ → ℕ
zero + b = b
(suc a) + b = suc (a + b)
This definition provides two definitional equalities that bind zero + b with b and (suc a) + b with suc (a + b). The good thing with definitional equalities (as opposed to propositional equalities) is that Agda automatically uses them to reduce goals whenever possible. This means that, for instance, if in a further goal you have the element zero + p for any p then Agda will automatically reduce it to p.
To allow Agda to do such reduction, which is fundamental in most cases, Agda needs to know which of these two equalities can be exploited, which means a case-split on the first argument of this addition has to be made in any further proof about addition for a reduction to be possible. (Except for composite proofs based on other proofs which use such case-splits).
When using with you basically add additional definitional equalities depending on the structure of the additional element. This only makes sense, understanding that, that you need to case-split on said element when doing proofs about such a function, in order for Agda once again to be able to make use of these definitional equalities.
Let us take your example and apply this reasoning to it, first without the recent ability to omit impossible cases. You need to prove the following statement:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
Introducing parameters in the context, you write the following line:
prop-old x y p = ?
Having written that line, you need to provide a proof of x < y with the elements in the context. x and y are just natural so you expect p to hold enough information for this result to be provable. But, in this case, p is just of type fun x y ≡ nothing which does not give you enough information. However, this type contains a call to function fun so there is hope ! Looking at the definition of fun, we can see that it yields two definitional equalities, which depend on the structure of x <? y. This means that adding this parameter to the proof by using with once more will allow Agda to make use of these equalities. This leads to the following code:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
prop-old x y p with x <? y
prop-old _ _ p | yes q = ?
prop-old _ _ p | no q = ?
At that point, not only did Agda case-split on x <? y, but it also reduced the goal because it is able, in both cases, to use a specific definitional equality of fun. Let us take a closer look at both cases:
In the yes q case, p is now of type nothing ≡ nothing and q is of type x < y which is exactly what you want to prove, which means the goal is simply solved by:
prop-old _ _ p | yes q = q
I the no q case, something more interesting happens, which is somewhat harder to understand. After reduction, p is now of type just y ≡ nothing because Agda could use the second definitional equality of fun. Since _≡_ is a data type, it is possible to case-split on p which basically asks Agda: "Look at this data type and give me all the possible constructors for an element of type just y ≡ nothing". At first, Agda only finds one possible constructor, refl, but this constructor only builds an element of a type where both sides of the equality are the same, which is not the case here by definition because just and nothing are two distinct constructors from the same data type, Maybe. Agda then concludes that there are no possible constructors that could ever build an element of such type, hence this case is actually not possible, which leads to Agda replacing p with the empty pattern () and dismissing this case. This line is thus simply:
prop-old _ _ () | no _
In the more recent versions of Agda, as I explained earlier, some of these steps are done directly by Agda which allows us to directly omit impossible cases when the emptiness of a pattern can be deduced behind the curtain, which leads to the prettier:
prop : ∀ x y → fun x y ≡ nothing → x < y
prop x y _ with x <? y
... | yes p = p
But it is the same process, just done a bit more automatically. Hopefully, these elements will be of some use in your journey towards understanding Agda.

elimination of indirect left recursion

I'm having problems understanding an online explanation of how to remove the left recursion in this grammar. I know how to remove direct recursion, but I'm not clear how to handle the indirect. Could anyone explain it?
A --> B x y | x
B --> C D
C --> A | c
D --> d
The way I learned to do this is to replace one of the offending non-terminal symbols with each of its expansions. In this case, we first replace B with its expansions:
A --> B x y | x
B --> C D
becomes
A --> C x y | D x y | x
Now, we do the same for non-terminal symbol C:
A --> C x y | D x y | x
C --> A | c
becomes
A --> A x y | c x y | D x y | x
The only other remaining grammar rule is
D --> d
so you can also make that replacement, leaving your entire grammar as
A --> A x y | c x y | d x y | x
There is no indirect left recursion now, since there is nothing indirect at all.
Also see here.
To eliminate left recursion altogether (not merely indirect left recursion), introduce the A' symbol from your own materials (credit to OP for this clarification and completion):
A -> x A'
A' -> xyA' | cxyA' | dxyA' | epsilon
Response to naomik's comments
Yes, grammars have interesting properties, and you can characterize certain semantic capabilities in terms of constraints on grammar rules. There are transformation algorithms to handle certain types of parsing problems.
In this case, we want to remove left-recursion: one desirable property of a grammar is that the use of any rule must consume at least one input token (terminal symbol). Left-recursion opens a door to infinite recursion in the parser.
I learned these things in my "Foundations of Computing" and "Compiler Construction" classes many years ago. Instead of writing a parser to adapt to a particular grammar, we'd transform the grammar to fit the parser style we wanted.

How to transform a term into a list in Prolog

How can i transform a term like: 3 * y * w * t^3 in a list made of: List = [3, *, y,...], without using the following predicate:
t2l(Term, List) :-
t2l_(Term, List-X),
X = [].
t2l_(Term, [F|X]-X) :-
Term =.. [F],
!.
t2l_(Term, L1-L4) :-
Term =.. [F, A1, A2],
t2l_(A1, L1-L2),
L2 = [F|L3],
t2l_(A2, L3-L4).
Is there a simple way?
In Prolog, everything that can be expressed by pattern matching should be expressed by pattern matching.
In your case, this is difficult, because you cannot collectively distinguish the integers from other arising terms by pattern matching with the defaulty representation you are using.
In the following, I am not solving the task completely for you, but I am showing how you can solve it once you have a clean representation.
As always when describing a list in Prolog, consider using dcg notation:
term_to_list(y) --> [y].
term_to_list(w) --> [w].
term_to_list(t) --> [t].
term_to_list(i(I)) --> [I].
term_to_list(A * B) -->
term_to_list(A),
[*],
term_to_list(B).
term_to_list(A^B) -->
term_to_list(A),
[^],
term_to_list(B).
In this example, I am using i(I) to symbolically represent the integer I.
Sample query and result:
?- phrase(term_to_list(i(3)*y*w*t^i(3)), Ls).
Ls = [3, *, y, *, w, *, t, ^, 3].
I leave converting the defaulty representation to a clean one as an easy exercise.
Thanks mat for answering, i forgot to close the question. However i have created a new predicate that solve the problem:
term_string(Term, X),
string_codes(X, AList),
ascii_to_list(AList, Y).
ascii_to_list([X | Xs], [Y | Out]) :-
X >= 48,
X =< 57,
!,
number_codes(Y, [X]),
ascii_to_list(Xs, Out).
ascii_to_list([X | Xs], [Y | Out]) :-
char_code(Y, X),
ascii_to_list(Xs, Out).
ascii_to_list([], []).

How do I rewrite a context free grammar so that it is LR(1)?

For the given context free grammar:
S -> G $
G -> PG | P
P -> id : R
R -> id R | epsilon
How do I rewrite the grammar so that it is LR(1)?
The current grammar has shift/reduce conflicts when parsing the input "id : .id", where "." is the input pointer for the parser.
This grammar produces the language satisfying the regular expression (id:(id)*)+
It's easy enough to produce an LR(1) grammar for the same language. The trick is finding one which has a similar parse tree, or at least from which the original parse tree can be recovered easily.
Here's a manually generated grammar, which is slightly simplified from the general algorithm. In effect, we rewrite the regular expression:
(id:id*)+
to:
id(:id+)*:id*
which induces the grammar:
S → id G $
G → P G | P'
P' → : R'
P → : R
R' → ε | id R'
R → ε | id R
which is LALR(1).
In effect, we've just shifted all the productions one token to the right, and there is a general algorithm which can be used to create an LR(1) grammar from an LR(k+1) grammar for any k≥1. (The version of this algorithm I'm using comes from Parsing Theory by S. Sippu & E. Soisalon-Soininen, Vol II, section 6.7.)
The non-terminals of the new grammar will have the form (x, V, y) where V is a symbol from the original grammar (either a terminal or a non-terminal) and x and y are terminal sequences of maximum length k such that:
y ∈ FOLLOWk(V)
x ∈ FIRSTk(Vy)
(The lengths of y and consequently x might be less than k if the end of input is included in the follow set. Some people avoid this issue by adding k end symbols, but I think this version is just as simple.)
A non-terminal (x, V, y) will generate the x-derivative of the strings derived from Vy from the original grammar. Informally, the entire grammar is shifted k tokens to the right; each non-terminal matches a string which is missing the first k tokens but is augmented with the following k tokens.
The productions are generated mechanically from the original productions. First, we add a new start symbol, S' with productions:
S' → x (x, S, ε)
for every x ∈ FIRSTk(S). Then, for every production
T → V0 V1 … Vm
we generate the set of productions:
(x0,T,xm+1) → (x0,V0,x1) (x1,V1,x2) … (xm,Vm,xm+1)
and for every terminal A we generate the set of productions
(Ax,A,xB) → B if |x| = k
(Ax,A,x) → ε if |x| ≤ k
Since there is an obvious homomorphism from the productions in the new grammar to the productions in the old grammar, we can directly create the original parse tree, although we need to play some tricks with the semantic values in order to correctly attach them to the parse tree.

Resources