How to tell whether parentheses are necessary or not? - parsing

I have written a parser in Haskell, which parses formulas in the form of string inputs and produces a Haskell data type defined by the BNF below.
formula ::= true
          | false
          | var
          | formula & formula
          | ∀ var . formula
          | ( formula )
var     ::= letter { letter | digit }*
Now I would like to create an instance of Show so that I can nicely print the formulas defined by my types (I don't want to use deriving (Show)). My question is: How do I define my function so that it can tell when parentheses are necessary? I don't want too many parentheses, nor too few.
For example, given the formula ∀ X . (X & Y) & (∀ Y . Y) & false which, when parsed, produces the data structure
And (And (Forall "X" (And (Var "X") (Var "Y"))) (Forall "Y" (Var "Y"))) False
we have
Too few parentheses: ∀ X . X & Y & ∀ Y . Y & false
Too many parentheses: (∀ X . (((X) & (Y)))) & (∀ Y . (Y)) & (false)
Just right: ∀ X . (X & Y) & (∀ Y . Y) & false
Is there a way to gauge how many parentheses are necessary so that the semantics is never ambiguous? I appreciate any feedback.

Untested pseudocode:
instance Show Formula where
  showsPrec _p True  = showString "true"
  showsPrec _p False = showString "false"
  showsPrec _p (Var x) = showString x
  showsPrec p (And f1 f2) = showParen (p > 5) $
    showsPrec 5 f1 . showString " & " . showsPrec 5 f2
  showsPrec p (Forall x f) = showParen (p > 8) $
    showString ("forall " ++ x ++ " . ") . showsPrec 8 f
(Note that showString is used rather than raw ++ because showsPrec builds a ShowS, i.e. a function String -> String, not a String.)
Above, the integer p represents the precedence of the context where we are showing the current formula. For example, if we are showing f inside f & ... then p will have the precedence level of &.
If we need to print a symbol in a context which has higher precedence, we need to add parentheses. E.g. if f is a | b we can't write a | b & ..., otherwise it is interpreted as a | (b & ...). We need to put parentheses around a | b. This is done by the showParen (p > ...).
When we recurse, we pass the precedence level of the symbol at hand to the subterms.
Above, I chose the precedence levels randomly. You need to adjust them to your taste. You should also check that the levels you choose play well with the standard libraries. E.g. printing Just someFormula should not generate things like Just a & b, but should add parentheses.
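For instance, derived Show instances print constructor arguments at precedence 11, so a quick way to check the interplay (a hypothetical session, assuming the instance above and the asker's Formula constructors) is:
main :: IO ()
main = do
  print (And (Var "a") (Var "b"))         -- prints: a & b
  print (Just (And (Var "a") (Var "b")))  -- prints: Just (a & b)
  -- Just's derived Show calls showsPrec 11 on its argument,
  -- so showParen (11 > 5) supplies the parentheses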


How to recover intermediate computation results from a function using "with"?

I wrote a function on the natural numbers that uses the operator _<?_ with the with-abstraction.
open import Data.Maybe
open import Data.Nat
open import Data.Nat.Properties
open import Relation.Binary.PropositionalEquality
open import Relation.Nullary
fun : ℕ → ℕ → Maybe ℕ
fun x y with x <? y
... | yes _ = nothing
... | no _ = just y
I would like to prove that if the result of computing with fun is nothing then the original two values (x and y) fulfill x < y.
So far all my attempts fall short of proving the property:
prop : ∀ (x y)
     → fun x y ≡ nothing
     → x < y
prop x y with fun x y
... | just _  = λ()
... | nothing = λ{refl → ?} -- from-yes (x <? y)}
-- This fails because the pattern matching is incomplete,
-- but it shouldn't. There are no other cases
prop' : ∀ (x y)
      → fun x y ≡ nothing
      → x < y
prop' x y with fun x y | x <? y
... | nothing | yes x<y = λ{refl → x<y}
... | just _  | no _    = λ()
--... | _ | _ = ?
In general, I've found that working with the with-abstraction is painful. It is probably due to the fact that with and | hide some magic in the background. I would like to understand what with and | really do, but the "Technical details" section of the documentation currently escapes my understanding. Do you know where to look to understand how to interpret them?
Concrete solution
You need to case-split on the same element on which you case-split in your function:
prop : ∀ x y → fun x y ≡ nothing → x < y
prop x y _ with x <? y
... | yes p = p
In the older versions of Agda, you would have had to write the following:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
prop-old x y _ with x <? y
prop-old _ _ refl | yes p = p
prop-old _ _ () | no _
But now you are able to completely omit a case when it leads to a direct contradiction, which is, in this case, that nothing and just smth can never be equal.
Detailed explanation
To understand how with works, you first need to understand how definitional equality is used in Agda to reduce goals. A definitional equality relates a function call to its defining expression, depending on the structure of its input. In Agda, this is easily seen in the use of the equals sign in the definition of the different cases of a function (although, since Agda compiles definitions to a tree of cases, some definitional equalities might not hold in some cases; let's set that aside for now).
Let us consider the following definition of the addition over naturals:
_+_ : ℕ → ℕ → ℕ
zero + b = b
(suc a) + b = suc (a + b)
This definition provides two definitional equalities that bind zero + b with b and (suc a) + b with suc (a + b). The good thing with definitional equalities (as opposed to propositional equalities) is that Agda automatically uses them to reduce goals whenever possible. This means that, for instance, if in a further goal you have the element zero + p for any p then Agda will automatically reduce it to p.
For Agda to perform such a reduction, which is fundamental in most cases, it needs to know which of these two equalities can be exploited. This means that, in any further proof about addition, a case-split on the first argument of the addition has to be made for a reduction to be possible (except for composite proofs built on other proofs which already use such case-splits).
When using with you basically add additional definitional equalities depending on the structure of the additional element. Understanding that, it only makes sense that you need to case-split on said element when doing proofs about such a function, so that Agda is once again able to make use of these definitional equalities.
Let us take your example and apply this reasoning to it, first without the recent ability to omit impossible cases. You need to prove the following statement:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
Introducing parameters in the context, you write the following line:
prop-old x y p = ?
Having written that line, you need to provide a proof of x < y from the elements in the context. x and y are just naturals, so you expect p to hold enough information for this result to be provable. But, in this case, p is just of type fun x y ≡ nothing, which on its own does not give you enough information. However, this type contains a call to the function fun, so there is hope! Looking at the definition of fun, we can see that it yields two definitional equalities, which depend on the structure of x <? y. This means that adding this element to the proof by using with once more will allow Agda to make use of these equalities. This leads to the following code:
prop-old : ∀ x y → fun x y ≡ nothing → x < y
prop-old x y p with x <? y
prop-old _ _ p | yes q = ?
prop-old _ _ p | no q = ?
At that point, not only did Agda case-split on x <? y, but it also reduced the goal because it is able, in both cases, to use a specific definitional equality of fun. Let us take a closer look at both cases:
In the yes q case, p is now of type nothing ≡ nothing and q is of type x < y, which is exactly what you want to prove, so the goal is simply solved by:
prop-old _ _ p | yes q = q
In the no q case, something more interesting happens, which is somewhat harder to understand. After reduction, p is now of type just y ≡ nothing, because Agda could use the second definitional equality of fun. Since _≡_ is a data type, it is possible to case-split on p, which basically asks Agda: "Look at this data type and give me all the possible constructors for an element of type just y ≡ nothing". At first, Agda only finds one candidate constructor, refl, but this constructor only builds an element of a type where both sides of the equality are the same, which is not the case here by definition, because just and nothing are two distinct constructors of the same data type, Maybe. Agda then concludes that there is no constructor that could ever build an element of such a type, hence this case is actually impossible. This leads to Agda replacing p with the empty pattern () and dismissing this case. This line is thus simply:
prop-old _ _ () | no _
In the more recent versions of Agda, as I explained earlier, some of these steps are performed directly by Agda, which allows us to omit impossible cases entirely when the emptiness of a pattern can be deduced behind the curtain. This leads to the prettier:
prop : ∀ x y → fun x y ≡ nothing → x < y
prop x y _ with x <? y
... | yes p = p
But it is the same process, just done a bit more automatically. Hopefully, these elements will be of some use in your journey towards understanding Agda.

Avoiding code duplication for data type with lots of similar constructors

I'm working on writing a simple parser in Haskell and have this datatype which holds the results of the parse.
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Show, Eq)
The problem comes when I want to map over the tree to replace variable names with their reference numbers using a map. I have to write this code:
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = case d M.!? s of
  Just n  -> Just (Arg n)
  Nothing -> Nothing
refVars _ (Imm n) = Just $ Imm n
refVars _ (Arg n) = Just $ Arg n
refVars d (Add a1 a2) = Add <$> refVars d a1 <*> refVars d a2
refVars d (Sub a1 a2) = Sub <$> refVars d a1 <*> refVars d a2
refVars d (Mul a1 a2) = Mul <$> refVars d a1 <*> refVars d a2
refVars d (Div a1 a2) = Div <$> refVars d a1 <*> refVars d a2
This seems incredibly redundant. Ideally I'd want to have one pattern which matches any (op a1 a2), but Haskell won't allow that. Any suggestions?
As proposed in the comments, the fix for your immediate problem is to move the information about the operator type out of the constructor:
data Op = Add | Sub | Mul | Div

data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
This datatype has one constructor for all of the binary operations, so you only need one line to take it apart:
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = Arg <$> d M.!? s
refVars _ (Imm n) = Just $ Imm n
refVars _ (Arg n) = Just $ Arg n
refVars d (Op op a1 a2) = Op op <$> refVars d a1 <*> refVars d a2
You can handle all different types of binary operators without modifying refVars, but if you add different syntactic forms to your AST you'll have to add clauses to refVars.
data AST = -- other constructors as before
         | Ternary AST AST AST
         | List [AST]
         | Call AST [AST] -- function and args

refVars -- other clauses as before
refVars d (Ternary cond tt ff) = Ternary <$> refVars d cond <*> refVars d tt <*> refVars d ff
refVars d (List l) = List <$> traverse (refVars d) l
refVars d (Call f args) = Call <$> refVars d f <*> traverse (refVars d) args
So it's still tedious - all this code does is traverse the tree to the leaves, whereupon refVars can scrutinise whether the leaf is an ArgName or otherwise. The interesting part of refVars is the one ArgName line; the remaining six lines of the function are pure boilerplate.
It'd be nice if we could define "traverse the tree" separately from "handle ArgNames". This is where generic programming comes in. There are lots of generic programming libraries out there, each with its own style and approach, but I'll demonstrate using lens.
The Control.Lens.Plated module defines a Plated class for types which know how to access their children. The deal is: you show lens how to access your children (by passing them to a callback g), and lens can recursively apply that to access the children's children and so on.
instance Plated AST where
  plate g (Op op a1 a2) = Op op <$> g a1 <*> g a2
  plate g (Ternary cond tt ff) = Ternary <$> g cond <*> g tt <*> g ff
  plate g (List l) = List <$> traverse g l
  plate g (Call f args) = Call <$> g f <*> traverse g args
  plate _ a = pure a
Aside: you might object that even writing plate clause-by-clause is rather too much boilerplate. The compiler should be able to locate the AST's children for you. lens has your back: there's a default implementation of plate for any type which is an instance of Data, so you should be able to slap deriving Data onto AST and leave the Plated instance empty.
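A minimal sketch of that alternative (assuming the lens package; AST abbreviated to the core constructors, the others are analogous; DeriveDataTypeable is the extension that enables deriving Data):
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Lens.Plated (Plated)
import Data.Data (Data)

data Op  = Add | Sub | Mul | Div deriving (Show, Eq, Data)
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
         deriving (Show, Eq, Data)

instance Plated AST  -- left empty: the default plate is derived via Data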
Now we can implement refVars using transformM :: (Monad m, Plated a) => (a -> m a) -> a -> m a.
{-# LANGUAGE LambdaCase #-}
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d = transformM $ \case
  ArgName s -> Arg <$> d M.!? s
  x -> Just x
transformM takes a (monadic) transformation function and applies that to every descendant of the AST. Our transformation function searches for ArgName nodes and replaces them with Arg nodes, leaving any non-ArgNames unchanged.
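For example, a hypothetical call (assuming a derived Show instance on AST for display):
>>> refVars (M.fromList [("x", 0)]) (Op Add (ArgName "x") (Imm 1))
Just (Op Add (Arg 0) (Imm 1))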
For a more detailed explanation, see this paper (or the accompanying slides, if you prefer) by Neil Mitchell. It's what the Plated module is based on.
Here's how you could do it with Edward Kmett's recursion-schemes package:
{-# LANGUAGE DeriveTraversable, TemplateHaskell, TypeFamilies #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
import qualified Data.Map as M
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Show, Eq)

makeBaseFunctor ''AST

refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = case d M.!? s of
  Just n  -> Just (Arg n)
  Nothing -> Nothing
refVars d a = fmap embed . traverse (refVars d) . project $ a
This works because your refVars function recurses just like a traverse. Doing makeBaseFunctor ''AST creates an auxiliary type based on your original type that has a Traversable instance. We then use project to switch to the auxiliary type, traverse to do the recursion, and embed to switch back to your type.
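Roughly, the auxiliary type generated by makeBaseFunctor ''AST looks like this (a sketch: the real Template Haskell output suffixes constructor names with F, but may differ in details):
data ASTF r = ImmF Integer
            | ArgNameF String
            | ArgF Integer
            | AddF r r
            | SubF r r
            | MulF r r
            | DivF r r
            deriving (Functor, Foldable, Traversable)
-- project :: AST -> ASTF AST   -- expose one layer as the base functor
-- embed   :: ASTF AST -> AST   -- fold one layer back into AST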
Side note: you can simplify the ArgName case to just refVars d (ArgName s) = Arg <$> d M.!? s.

Haskell Parsing: x ++ y : z

In Haskell, why is x ++ y : z parsed as x ++ (y : z) and not as (x ++ y) : z?
For instance, [1] ++ 2 : [3] evaluates to [1,2,3].
Both (++) and (:) are right-associative with precedence 5.
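You can check the declared fixities in GHCi (output abridged):
ghci> :info (:)
infixr 5 :
ghci> :info (++)
infixr 5 ++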
The fact that they are right-associative means that these operators are parsed "right-to-left", so to speak: x ⊕ y ⊕ z is parsed as x ⊕ (y ⊕ z). So x ++ y : z is indeed parsed as x ++ (y : z).
There are good reasons to make both (:) and (++) right-associative. For the "cons" operator (:), it means that we can write 1 : 4 : 2 : [], since it is parsed as 1 : (4 : (2 : [])), which is correct in terms of the types. If it were parsed as ((1 : 4) : 2) : [], then 1 : 4 would be wrong, since (:) expects an item as its first operand and a list of such items as its second. We could of course still write such lists with explicit parentheses under a left-associative (:), but that would mean a large number of extra parentheses.
For (++) it is also better to parse right-associatively, for performance reasons: x ++ y takes linear time in the size of x. So if we parse x ++ (y ++ z), evaluating it takes |x| + |y| steps. If we parsed it as (x ++ y) ++ z, it would take 2×|x| + |y| steps, since the first application x ++ y runs in the size of x, and then (x ++ y) ++ z runs in the size of x ++ y. This means that if we concatenate n lists, each of size m, it runs not in O(n×m) but in O(n²×m).
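To see the asymptotic difference directly, compare right-nested and left-nested concatenation (concatR and concatL are names introduced for this sketch):
concatR, concatL :: [[a]] -> [a]
concatR = foldr (++) []  -- x1 ++ (x2 ++ (... ++ []))    : O(n×m)
concatL = foldl (++) []  -- ((([] ++ x1) ++ x2) ++ ...)  : O(n²×m)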

SMT let expression binding scope

I'm using a simple let expression to shorten my SMT formula. I want bindings to use previously defined bindings as follows, but if I remove the commented line and have n refer to s it doesn't work:
;;;;;;;;;;;;;;;;;;;;;
;                   ;
; This is our state ;
;                   ;
;;;;;;;;;;;;;;;;;;;;;
(declare-datatypes ((State 0))
  (((rec (myArray String)
         (index Int))))
)
;;;;;;;;;;;;;;;;;;;;;;;;;;
;                        ;
; This is our function f ;
;                        ;
;;;;;;;;;;;;;;;;;;;;;;;;;;
(define-fun f ((in State)) State
  (let ((s (myArray in))
        (n (str.len (myArray in))))
        ;;;;;;;;;; (n (str.len s)))
    (rec (str.substr s 1 n) 1733))
)
I looked at the documentation here, and it's not clear whether it's indeed forbidden to have bindings refer to other (previously defined) bindings:
The whole let construct is entirely equivalent to replacing each new parameter by its expression in the target expression, eliminating the new symbols completely (...)
I guess it's a "shallow" replacement?
From Section 3.6.1 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf:
Let. The let binder introduces and defines one or more local variables in parallel. Semantically, a term of the form

(let ((x1 t1) ··· (xn tn)) t)    (3.3)

is equivalent to the term t[t1/x1, ..., tn/xn] obtained from t by simultaneously replacing each free occurrence of xi in t by ti, for each i = 1, ..., n, possibly after a suitable renaming of t's bound variables to avoid capturing any variables in t1, ..., tn. Because of the parallel semantics, the variables x1, ..., xn in (3.3) must be pairwise distinct.

Remark 3 (No sequential version of let). The language does not have a sequential version of let. Its effect is achieved by nesting lets, as in (let ((x1 t1)) (let ((x2 t2)) t)).
As indicated in Remark 3, if you want to refer to an earlier definition you have to nest the let-expressions.
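Applied to the function from the question, the nesting would look like this (an untested sketch reusing the question's declarations):
(define-fun f ((in State)) State
  (let ((s (myArray in)))
    (let ((n (str.len s)))        ; n can now refer to s
      (rec (str.substr s 1 n) 1733))))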

Proving MC/DC unique cause definition compliance

I'm reading the following paper on MC/DC: http://shemesh.larc.nasa.gov/fm/papers/Hayhurst-2001-tm210876-MCDC.pdf.
I have the source code: Z := (A or B) and (C or D) and the following test cases:
-----------------
| A | F F T F T |
| B | F T F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
I want to prove that the mentioned test cases comply with unique cause definition.
I started by eliminating masked tests:
A or B = F T T T T, meaning it masks the first test case of C or D, since F and (C or D) = F.
C or D = T T F T T, meaning it masks the third test case of A or B, since (A or B) and F = F.
I then determined MC/DC:
Required test cases for A or B:
F F (first case)
T F (fifth case)
F T (second or fourth case)
Required test cases for C or D:
F F (third case)
T F (fourth or fifth case)
F T (second case)
Required test cases for (A or B) and (C or D):
T T (second, fourth or fifth case)
F T (first case)
T F (third case)
According to the paper, this example doesn't comply with the unique cause definition. Instead, they propose changing the second test case from F T F T to T F F T.
-----------------
| A | F T T F T |
| B | F F F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
I determined MC/DC for A or B again:
F F (first case)
T F (fifth case)
F T (fourth case)
Then, they introduce the following independence pairs table that shows the difference between both examples (on page 38):
I understand that, for the first example, the independence pair they show changes two variables instead of one; however, I don't understand how they are computing the independence pairs.
In the A column, I can infer they take F F T F from the test cases table's A row, and they compute the independence pair as the same test case with only A changed (T F T F).
In B's column, however, they pick F F T F again. According to my thinking, this should instead be B's values from the test cases table: F T F T.
The rest of the letters show the same dilemma.
Also, for D's column in the first example, they show that the independence pair of F T F T is T F F F, which ruins my theory that they compute the independence pair from the first value, and proves that they pick it from somewhere else.
Can someone explain how (and from where) they construct such an independence pair table?
First the let’s re-read the definitions:
(From www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-10.pdf)
DO-178B/ED-12B includes the following definitions:
Condition
A Boolean expression containing no Boolean operators.
Decision
A Boolean expression composed of conditions and zero or more Boolean operators.
A decision without a Boolean operator is a condition.
If a condition appears more than once in a decision, each occurrence is a
distinct condition.
Decision Coverage
Every point of entry and exit in the program has been invoked at least once
and every decision in the program has taken on all possible outcomes at least once.
Modified Condition/Decision Coverage
Every point of entry and exit in the program has been invoked at least once,
every condition in a decision in the program has taken all possible outcomes
at least once, every decision in the program has taken all possible outcomes
at least once, and each condition in a decision has been shown to independently
affect that decision's outcome.
A condition is shown to independently affect a decision's outcome by varying just
that condition while holding fixed all other possible conditions.
So, for the decision '(A or B) and (C or D)' we have four conditions: A, B, C and D.
For each condition we must find a pair of test vectors that shows that the condition 'independently affects that decision's outcome'.
For unique cause MC/DC, only the value of the condition under consideration may vary between the two test vectors of the pair.
For example, let's consider condition A. The following pair of test vectors covers condition A:
(A or B) and (C or D) = Z
 T    F       T    F    T
 F    F       T    F    F
With this pair of test vectors (TFTF, FFTF), only the values of A and Z (the decision) change.
We then search for such pairs for conditions B, C and D.
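To make that search mechanical, here is a small hypothetical Haskell sketch (decision and uniqueCausePairs are names introduced here) that brute-forces the unique-cause pairs for a given condition:
import Control.Monad (replicateM)

decision :: [Bool] -> Bool
decision [a, b, c, d] = (a || b) && (c || d)
decision _            = error "expected exactly four conditions"

-- A unique-cause pair for condition i: the two vectors differ only at
-- position i, and flipping that single condition flips the decision.
uniqueCausePairs :: Int -> [([Bool], [Bool])]
uniqueCausePairs i =
  [ (v, w)
  | v <- replicateM 4 [False, True]         -- all 16 test vectors
  , let w = [ if j == i then not x else x   -- flip only condition i
            | (j, x) <- zip [0 ..] v ]
  , decision v /= decision w
  ]
For condition A (i = 0) this finds, among others, exactly the pair (TFTF, FFTF) shown above.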
Using the RapiCover GUI (a qualifiable code coverage tool from Rapita Systems - www.rapitasystems.com/products/rapicover) we can see the full set of test vectors (observed or missing) needed to fully cover all conditions of the decision.
RapiCover screenshot
Vector V3 (in yellow in the screenshot above) isn't used in any independence pair.
Vector V6 (in red in the screenshot) is missing for MC/DC coverage of condition D.
This is for the definition of 'unique cause' MC/DC.
Now for 'masking MC/DC':
For 'masking MC/DC', the requirement that only the value of a single condition may vary in a pair of test vectors is relaxed, provided that any other change is masked by the boolean operators in the expression.
For example, let's consider the pair of vectors for condition D:
(A or B) and (C or D) = Z
 T    F       F    T    T
 T    F       F    F    F
We can represent these two test vectors on the expression tree:
         and
        /   \
     or1     or2
    /   \   /   \
   A     B C     D

         and                         and
         [T]                         [F]
        /   \                       /   \
     or1     or2                 or1     or2
     [T]     [T]                 [T]     [F]
    /   \   /   \               /   \   /   \
   A     B C     D             A     B C     D
  [T]   [F][F]  [T]           [T]   [F][F]  [F]
This is a pair for unique cause MC/DC.
Let's now consider a new pair of test vectors for condition D:
(A or B) and (C or D) = Z
 F    T       F    T    T
 T    F       F    F    F
Again we can represent these two test vectors on the expression tree:
         and                         and
         [T]                         [F]
        /   \                       /   \
     or1     or2                 or1     or2
     [T]     [T]                 [T]     [F]
    /   \   /   \               /   \   /   \
   A     B C     D             A     B C     D
  [F]   [T][F]  [T]           [T]   [F][F]  [F]
This is a pair for masking MC/DC because, although the values of three conditions (A, B and D) have changed, the change in conditions A and B is masked by the boolean operator 'or1' (i.e. the value of 'A or B' is unchanged).
So, for masking MC/DC, the independence pairs for condition D can be:
RapiCover screenshot
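Continuing the hypothetical sketch from above (reusing its imports, with decision as before; maskingPairsD is an introduced name), a masking-pair search for condition D in this particular decision could look like this:
maskingPairsD :: [([Bool], [Bool])]
maskingPairsD =
  [ (v, w)
  | v@[a1, b1, c1, d1] <- replicateM 4 [False, True]
  , w@[a2, b2, c2, d2] <- replicateM 4 [False, True]
  , d1 /= d2                   -- D varies
  , not c1 && not c2           -- C stays F, so or2 = D
  , (a1 || b1) && (a2 || b2)   -- or1 stays T: changes in A, B are masked
  , decision v /= decision w
  ]
Among its results is the pair (FTFT, TFFF) used in the masking example above.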
