Finding FIRST sets in a grammar - parsing

Today I am reading about how to find the FIRST and FOLLOW sets of a grammar. I came across this grammar:
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
The claim is that
FIRST(S) = FIRST(ACB) U FIRST(CbB) U FIRST(Ba)
= {d, g, h, ε} U {h, b} U {g, a}
= {d, g, h, ε, b, a}
I don't understand how a and b are in this set. Can anyone explain this?

Notice that B and C both are nullable (they can produce ε). This means that from the production
S → CbB
we get that b ∈ FIRST(S): if we use the production C → ε, we obtain a sentential form of S that starts with b.
Similarly, note that
S → Ba
is a production, so we get a ∈ FIRST(S) because we can use the production B → ε to get an a at the front of a string derivable from S.
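The reasoning above is just the standard fixed-point computation of FIRST sets. Here is a Python sketch of it (the grammar encoding and function names are my own):

```python
# FIRST-set computation for the grammar in the question.
EPS = "ε"

# Each production is a list of symbols; [] encodes an ε-production.
grammar = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}

def first_sets(grammar):
    """Iterate to a fixed point: each FIRST(X) grows until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                nullable = True  # does the prefix scanned so far derive ε?
                for sym in prod:
                    if sym not in grammar:  # terminal: add it and stop
                        if sym not in first[nt]:
                            first[nt].add(sym)
                            changed = True
                        nullable = False
                        break
                    new = (first[sym] - {EPS}) - first[nt]
                    if new:
                        first[nt] |= new
                        changed = True
                    if EPS not in first[sym]:  # sym is not nullable: stop
                        nullable = False
                        break
                if nullable and EPS not in first[nt]:
                    first[nt].add(EPS)
                    changed = True
    return first
```

Running `first_sets(grammar)["S"]` yields {d, g, h, ε, b, a}, with b and a contributed exactly by the two nullable-prefix cases described above.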
Hope this helps!


Representing beta-equality in Agda
I've recently asked what is the proper way to represent beta-equality in a proof language such as Agda. The accepted answer points out that a standard way to do it is by defining its congruence closure:
data _~_ {n} : Tm n → Tm n → Set where
  β      : ∀ {t u} → app (lam t) u ~ sub u t
  app    : ∀ {t t' u u'} → t ~ t' → u ~ u' → app t u ~ app t' u'
  lam    : ∀ {t t'} → t ~ t' → lam t ~ lam t'
  ~refl  : ∀ {t} → t ~ t
  ~sym   : ∀ {t t'} → t ~ t' → t' ~ t
  ~trans : ∀ {t t' t''} → t ~ t' → t' ~ t'' → t ~ t''
Which, if I understand correctly, specifies that: 1. the application ((λx.t) u) is equal to t[u/x]; 2. the function/argument of an application or the body of a function can be replaced by equal terms; 3. reflexivity, symmetry and transitivity hold. The answer also suggests an alternative: one can define a one-step reduction relation between terms, then define a multi-step reduction relation, and, finally, define that two terms are equal if they can eventually be reduced to an identical term. Both of those alternatives make sense.
Another alternative
While I was waiting for the answer, I was looking at this definition:
data _~_ : Term → Term → Set where
  refl : (a : Term) → a ~ a
  red₁ : (a b : Term) → (f : Term → Term) → f a ~ b → f (redex a) ~ b
  red₂ : (a b : Term) → (f : Term → Term) → a ~ f b → a ~ f (redex b)
  amp₁ : (a b : Term) → (f : Term → Term) → f (redex a) ~ b → f a ~ b
  amp₂ : (a b : Term) → (f : Term → Term) → a ~ f (redex b) → a ~ f b
Where redex a performs a single substitution when a is a λ-application. This says that terms are equivalent if they are identical, or if they can be made identical by reducing/de-reducing any of their sub-expressions. One can prove sym, trans and cong:
sym : (a : Term) -> (b : Term) -> a ~ b -> b ~ a
trans : (a : Term) → (b : Term) → (c : Term) → a ~ b → b ~ c → a ~ c
cong : (f : Term → Term) → (a : Term) → (b : Term) → a ~ b → f a ~ f b
The complete source is available here. Now, for curiosity's sake, I'd like to know: is the third solution also a valid representation? If so, what is its relationship with the previous two? If not, why?
A fatal problem with this attempt is that the relation is inconsistent:
oops : var 0 ~ var 1
oops = red₂
  (var 0)
  (app id id)
  (λ { (lam typ (var 0)) -> var 1; t -> var 0 })
  (refl (var zero))
Since we're able to apply an arbitrary Agda function to b, then, as long as we have an a that reduces to b, we're able to tell the two apart within Agda and substitute arbitrary, non-equal values. Thanks to pgiarrusso on #agda on Freenode IRC for pointing this out.

Relational Database Design decomposition and closure?

I have been trying to solve these two questions, but haven't had much luck.
Question 1: Show that the decomposition rule:
A → BC implies A → B and A → C,
is a sound rule, namely, that the functional dependencies A → B and A → C
are logically implied by the functional dependency A → BC.
Question 2: Let F be the following collection of functional dependencies
for relation schema R = (A, B, C, D, E):
D → A
BA → C
C → E
E → DB
a) Compute the closure F⁺ of F.
b) What are the candidate keys for R? List all of them.
c) List the dependencies in the canonical cover of the above set of dependencies F (in other words, compute Fc, as we have seen in class).
Any input will be helpful.
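Attribute closure is mechanical enough to brute-force. Here is a Python sketch (the encoding and function names are mine) that computes closures and candidate keys for Question 2:

```python
from itertools import combinations

# The functional dependencies from Question 2, as (lhs, rhs) attribute sets.
FDS = [({"D"}, {"A"}), ({"B", "A"}, {"C"}), ({"C"}, {"E"}), ({"E"}, {"D", "B"})]
ATTRS = {"A", "B", "C", "D", "E"}

def closure(attrs, fds):
    """Attribute closure: repeatedly fire every FD whose LHS is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of R (smallest first,
    so supersets of an already-found key are pruned)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys
```

With these FDs the candidate keys come out as C, E, AB and BD; for part (a), F⁺ consists of every dependency X → Y with Y ⊆ closure(X).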

Problems about LL(1) grammar transformation

I am having some trouble transforming the following non-LL(1) grammar into an LL(1) grammar. Is it possible?
A ::= B | A ; B
B ::= C | [ A ]
C ::= D | C , D
D ::= x | (C)
where ;, x, (, ), [, ] are terminals.
The main problems here are the productions
A → A ; B
and
C → C, D
which are left-recursive. In both cases, these productions generate a list of items separated by some kind of delimiter (a semicolon in the first case, a comma in the second), so you can rewrite them like this:
A → B ; A
C → D, C
This gives the grammar
A → B | B; A
B → C | [A]
C → D | D, C
D → x | (C)
The problem now is that there are productions for A and C that have a common prefix. But that's nothing to worry about: we can left-factor them like this:
A → B H
H → ε | ; A
B → C | [A]
C → D I
I → ε | , C
D → x | (C)
I believe that this grammar is now LL(1).
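One way to convince yourself is to implement the final grammar as a recursive-descent parser, one function per nonterminal, where a single token of lookahead always picks the production. A Python sketch (token handling and names are mine):

```python
def parse(text):
    """Return True iff text is derivable from A in the left-factored grammar."""
    toks = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t!r} at position {pos}")
        pos += 1

    def A():  # A → B H
        B(); H()

    def H():  # H → ε | ; A
        if peek() == ";":
            eat(";"); A()

    def B():  # B → C | [ A ]
        if peek() == "[":
            eat("["); A(); eat("]")
        else:
            C()

    def C():  # C → D I
        D(); I()

    def I():  # I → ε | , C
        if peek() == ",":
            eat(","); C()

    def D():  # D → x | ( C )
        if peek() == "(":
            eat("("); C(); eat(")")
        else:
            eat("x")

    try:
        A()
        return pos == len(toks)
    except SyntaxError:
        return False
```

Each branch is decided by inspecting only the next token, which is exactly the LL(1) property.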

Expressing a theorem about idempotent substitutions

I'm working on a simple library containing definitions and properties of substitutions for simple types. I'm using the following encoding for types:
data Ty (n : Nat) : Set where
  var : Fin n -> Ty n
  con : Ty n
  app : Ty n -> Ty n -> Ty n
so, since variables are represented as finite sets, substitutions are just vectors:
Subst : Nat -> Nat -> Set
Subst n m = Vec (Ty m) n
The complete development is in the following paste: http://lpaste.net/107751
Using this encoding, I have defined several lemmas about such substitutions, but I do not know how to define a theorem that specifies that substitutions are idempotent. I believe that I must use some property like weakening in order to express this, but I can't figure out how.
Could someone give any directions or clues?
Thanks in advance.
Substitutions that produce expressions over fresh variables are indeed idempotent. But in order to express this theorem, you have to consider your substitution Subst n m as one operating on the joint variable set Subst (n + m) (n + m). Here is a variant that uses arbitrary variable sets A and B instead of Fin n and Fin m.
open import Relation.Binary.PropositionalEquality using (_≡_; refl)
-- Disjoint union.
data _+_ (A B : Set) : Set where
  left  : A → A + B
  right : B → A + B
-- A set of variable names can be any type.
Names = Set
-- Simple expressions over a set of names.
data Tm (A : Names) : Set where
  var : A → Tm A
  app : Tm A → Tm A → Tm A
-- Substitute all variables from set A by terms over a name set B.
Subst : Names → Names → Set
Subst A B = A → Tm B
subst : ∀{A B} → Subst A B → Tm A → Tm B
subst σ (var x) = σ x
subst σ (app t u) = app (subst σ t) (subst σ u)
-- Rename all variables from set A to names from set B.
Rename : Names → Names → Set
Rename A B = A → B
rename : ∀{A B} → Rename A B → Tm A → Tm B
rename σ (var x) = var (σ x)
rename σ (app t u) = app (rename σ t) (rename σ u)
-- In order to speak about idempotency of substitutions whose domain A
-- is disjoint from the variable set B used in the range, we have to promote
-- them to work on the union A+B of variable sets.
-- The promoted substitution is the identity on B,
-- and the original substitution on A, but with the result transferred from B to A + B.
promote : ∀{A B} → Subst A B → Subst (A + B) (A + B)
promote σ (left x) = rename right (σ x)
promote σ (right x) = var (right x)
module _ {A B : Set} (σ₀ : Subst A B) where
  -- Now assume a promoted substitution.
  σ = promote σ₀
  -- A promoted substitution has no effect on terms with variables only in B.
  lemma : ∀ t → subst σ (rename right t) ≡ rename right t
  lemma (var x) = refl
  lemma (app t u) rewrite lemma t | lemma u = refl
  -- Hence, it is idempotent.
  idempotency : ∀ x → subst σ (σ x) ≡ σ x
  idempotency (right x) = refl
  idempotency (left x) = lemma (σ₀ x)
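To see why promotion forces idempotency without the proof machinery, the same construction can be modeled in Python (the term encoding and tag names are mine): after one application of the promoted substitution, every variable carries the "range" tag, on which the substitution acts as the identity.

```python
# Terms: ("var", name) or ("app", t, u), mirroring the Agda Tm type.

def subst(sigma, t):
    """Apply a substitution (a function from names to terms) to a term."""
    if t[0] == "var":
        return sigma(t[1])
    return ("app", subst(sigma, t[1]), subst(sigma, t[2]))

def rename(f, t):
    """Rename every variable of a term with the function f."""
    if t[0] == "var":
        return ("var", f(t[1]))
    return ("app", rename(f, t[1]), rename(f, t[2]))

def promote(sigma0):
    """Promote sigma0 : Subst A B to a substitution on A + B.

    Variables are tagged ("L", x) for the domain A and ("R", y) for the
    range B; the promoted substitution is sigma0 (re-tagged) on "L"
    variables and the identity on "R" variables."""
    def sigma(name):
        tag, x = name
        if tag == "L":
            return rename(lambda y: ("R", y), sigma0(x))
        return ("var", ("R", x))
    return sigma
```

Applying the promoted substitution twice gives the same result as applying it once, which is the idempotency statement above.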

Multiple entries in an LL(1) parsing table?

Given this grammar:
S → S1 S2
S1 → a | ε
S2 → ab | ε
Therefore, we have
FIRST(S1) = { a, ε }
FOLLOW(S1) = { a }
Does that mean that the parsing table will have multiple entries in the row for S1 and the column for a?
Yes, that's correct. (However, note that your FOLLOW set is incomplete; it also contains the end-of-input marker $.) The issue here is that if the parser sees an a, it can't tell whether that's because it wants to use the derivation
S → S1S2 → a S2
Or the derivation
S → S1S2 → S2 → ab
To fix this, note that your grammar generates exactly the strings { ε, a, ab, aab } (choosing ε for both S1 and S2 yields the empty string, which is easy to overlook). Therefore, you can build an LL(1) grammar that directly produces those four strings:
S → ε | a T
T → ε | b | a b
Each non-ε alternative for T begins with a distinct terminal, and the ε-alternatives are chosen only on the end-of-input marker $, so no table cell gets more than one entry.
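Since S1 and S2 each contribute independently, the whole language can be enumerated with a two-line Python sanity check:

```python
# Enumerate every string the original grammar S → S1 S2 can generate.
S1 = ["a", ""]   # S1 → a | ε
S2 = ["ab", ""]  # S2 → ab | ε
language = sorted({x + y for x in S1 for y in S2})
```

This yields the four strings ε, a, aab and ab, confirming the enumeration.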
Hope this helps!
