Removing ambiguities in grammar - parsing

I have a simple grammar I've constructed with a problem I'm not sure how to solve:
start syntax Prog = prog: Type Id;
syntax Dot = {Id "."}+ ;
syntax Id =
id: [A-Z_a-z] !>> [0-9A-Z_a-z] \ KW
| Rv
;
syntax Rv = rv: "$" [a-z_A-Z][a-z_A-Z0-9]* ;
syntax Type =
Rv
| Ref
;
syntax Ref =
Dot
| s: "str"
;
keyword KW = "str" ;
layout LAYOUTLIST = LAYOUT* !>> [\t-\n\r\ /] ;
lexical LAYOUT = [\t-\n\r\ ] ;
The problem is that we can resolve to rv in three different ways: (Id -> Rv), (Type -> Rv), and (Type -> Ref -> Dot -> Id -> Rv). The issue is that I need both Types and Ids to be able to be Rvs. So, given a simple program:
$a x
My thought was that I could use priorities to fix this (but I guess I don't really understand what they do) by changing the rule for Type to be:
syntax Type =
Rv
> Ref
;
In the hopes that the parser would associate an Rv with Type before checking if it could be resolved in the Ref branch. I've run some ambiguity diagnostics but I'm not exactly sure what to make of these:
info(
"Ambiguity cluster with 2 alternatives",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the one alternative: Type = Rv ;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the other alternative: Dot = {Id \".\"}+ ;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the other alternative: {Id \".\"}+;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the other alternative: Ref = Dot ;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the other alternative: Type = Ref ;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"Production unique to the other alternative: Id = Rv ;",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
info(
"The alternatives have different productions at the top, one has \n Type = Rv \nwhile the other has\n Type = Ref ",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
warning(
"You should give this production a good label: \n Ref = Dot ]",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>)),
error(
"To fix this issue, you could restrict the nesting of\n Ref = Dot \nunder\n Type = Ref \nusing the ! operator on argument 0: !labelX\nHowever, you should realize that you are introducing a restriction that makes the language smaller",
|cwd:///grammar.txt|(0,3,<1,0>,<1,3>))
So, what I need is some way for Rvs to be both Types and Ids without ambiguity when parsing. Is this possible?
Thanks!

In this case, if you control the syntax of the language, you might opt for a non-ambiguous language by adding an explicit operator or some form of unique brackets or separator syntax. In other words, I don't think this is a case of an accidentally ambiguous grammar for a language that is not inherently ambiguous, where a non-ambiguous grammar should exist. I think the language as currently envisioned does not have a non-ambiguous context-free grammar.
An example fix:
syntax Type
= Rv
| "&" Ref
;
Another way would be to try and hack your way out of this using Rascal's disambiguation mechanisms, but they do not offer any ordered backtracking behaviour by design. It would make the semantics of the grammar dependent on the parsing algorithm (which governs execution order of the grammar rules).
You could try to use the !>>, \ and ! operators to exclude certain sentences and derivations, but if this could be done without losing sentences from the language, then the diagnostics tool would have given you the suggestion. So these hacks might solve the ambiguity, but they would also change the design of the language and make it harder for the programmer to predict what is allowed and what is not allowed.
The priorities mechanism does not help either, because it is designed only to give mutually recursive rules a different priority, like 1 + (2 * 3) vs (1 + 2) * 3.
In short: I would opt for one or two changes in the language such that the syntax becomes unambiguous both for a general parser and for the human eye as well.

Related

Does a priority declaration disambiguate between alternative lexicals?

In my previous question, there was a priority > declaration in the example. It turned out not to matter because the solution there did not actually invoke priority but rather avoided it by making the alternatives disjoint. In this question, I'm asking whether priority can be used to select one lexical production over another. In the example below, the language of the production WordInitialDigit is intentionally a subset of that of WordAny. The production Word looks like it should disambiguate between the two properly, but the resulting parse tree has an ambiguity node at the top. Is a priority declaration able to decide between different lexical reductions, or does it require there to be a basis of common lexical elements? Or something else?
The example is contrived (there are no actions in the grammar), but the situations it arises from are not. For example, I'd like to use something like this for error recovery, where I can recognize a natural boundary for a unit of syntax and write a production for it. This generic production would be the last element in a priority chain; if it reduces, it means that there was no valid parse. More generally, I need to be able to select lexical elements based on syntactic context. I had hoped, since Rascal is scannerless, that this would be seamless. Perhaps it is, though I don't see it at the moment.
I'm on the unstable branch, version 0.10.0.201807050853.
EDIT: This question is not about > for defining an expression grammar. The documentation for priority declarations talks mostly about expressions, but the very first sentence provides what looks like a perfectly clear definition:
Priority declarations define a partial ordering between the productions within a single non-terminal.
So the example has two productions, an ordering declared between them, and yet the parser is still generating an ambiguity node in the clear presence of a disambiguation rule. So to put a finer point on my question, it looks like I don't know which of two situations pertains. Either (1) if this isn't supposed to work, then there's a defect in the language definition as documented, a deficiency in error reporting of the compiler, and a language design decision that's somewhere between counter-intuitive and user-hostile. Or (2) if this is supposed to work, there's a defect in the compiler and/or parser (presumably because the focus was initially on expressions) and at some point the example will pass its tests.
module ssce
import analysis::grammars::Ambiguity;
import ParseTree;
import IO;
import String;
lexical WordChar = [0-9A-Za-z] ;
lexical Digit = [0-9] ;
lexical WordInitialDigit = Digit WordChar* !>> WordChar;
lexical WordAny = WordChar+ !>> WordChar;
syntax Word =
WordInitialDigit
> WordAny
;
test bool WordInitialDigit_0() = parseAccept( #Word, "4foo" );
test bool WordInitialDigit_1() = parseAccept( #WordInitialDigit, "4foo" );
test bool WordInitialDigit_2() = parseAccept( #WordAny, "4foo" );
bool verbose = false;
bool parseAccept( type[&T<:Tree] begin, str input )
{
try
{
parse(begin, input, allowAmbiguity=false);
}
catch ParseError(loc _):
{
return false;
}
catch Ambiguity(loc l, str a, str b):
{
if (verbose)
{
println("[Ambiguity] #<a>, \"<b>\"");
Tree tt = parse(begin, input, allowAmbiguity=true) ;
iprintln(tt);
list[Message] m = diagnose(tt) ;
println( ToString(m) );
}
fail;
}
return true;
}
bool parseReject( type[&T<:Tree] begin, str input )
{
try
{
parse(begin, input, allowAmbiguity=false);
}
catch ParseError(loc _):
{
return true;
}
return false;
}
str ToString( list[Message] msgs ) =
( ToString( msgs[0] ) | it + "\n" + ToString(m) | m <- msgs[1..] );
str ToString( Message msg)
{
switch(msg)
{
case error(str s, loc _): return "error: " + s;
case warning(str s, loc _): return "warning: " + s;
case info(str s, loc _): return "info: " + s;
}
return "";
}
Excellent questions.
TL;DR:
The rule priority mechanism is not capable of an algorithmic ordering of a non-terminal's alternatives. Although some kind of partial order is involved in the additional grammatical constraints that a priority declaration generates, there is no "trying" one rule first, before the other. So it simply can't do that. The good news is that the priority mechanism has a formal semantics independent of any parsing algorithm; it is defined purely in terms of context-free grammar rules and reduction traces.
Using ambiguous rules for error recovery or "robust parsing" is a good idea. However, if there are too many such rules, the parser will eventually start showing quadratic or even cubic behavior, and tree building after parsing might even have higher polynomials. I believe the generated parser algorithm should have a (parameterized) mode for error recovery rather than expressing this at the grammar level.
Accepting ambiguity at parse time, and filtering/choosing trees after parsing, is the recommended way to go.
All this talk of "ordering" in the documentation is misleading. Disambiguation is a minefield of confusing terminology. For now, I recommend this SLE paper which has some definitions: https://homepages.cwi.nl/~jurgenv/papers/SLE2013-1.pdf
Details
priority mechanism not capable of choosing among alternatives
The use of the > operator and left, right generates a partial order between mutually recursive rules, such as found in expression languages, and limited to specific item positions in each rule: namely the left-most and right-most recursive positions which overlap. Rules which are lower in the hierarchy are not allowed to be grammatically expanded as "children" of rules which are higher in the hierarchy. So in E "*" E, neither E may be expanded to E "+" E if E "*" E > E "+" E.
The additional constraints do not choose for any E which alternative to try first. No, they simply disallow certain expansions, assuming the other expansion is still valid and thus the ambiguity is solved.
The reason for the limitation at specific positions is that for these positions the parser generator can "prove" that they will generate ambiguity, and thus filtering one of the two alternatives by disallowing certain nestings will not result in additional parse errors. (Consider a rule for array indexing: E "[" E "]", which should not have additional constraints for the second E.) This is a so-called "syntax-safe" disambiguation mechanism.
All in all it is a pretty weak mechanism algorithmically, and specifically tailored for mutually recursive combinator/expression-like languages. The end goal of the mechanism is to make sure we have to use only one non-terminal for the entire expression language, with the parse trees looking very much akin in shape to abstract syntax trees. Rascal inherited all these considerations from SDF, via SDF2, by the way.
Current implementations actually "factor" the grammar or the parse table in some fashion invisibly to get the same effect, as if somebody had factored the grammar completely; however, these implementations under the hood are very specific to the parsing algorithm in question. The GLR version is quite different from the GLL version, which again is quite different from the DataDependent version.
Post-parse filtering
Of course any tree, including ambiguous parse forests produced by the parser, can be manipulated by Rascal programs using pattern matching, visit, etc. You could write any algorithm to remove the trees you want. However, this requires the entire forest to be constructed first. It's possible and often fast enough, but there is a faster alternative.
Since the tree is built in a bottom-up fashion from the parse graph after parsing, we can also apply "rewrite rules" during the construction of the tree, and remove certain alternatives.
For example:
Tree amb({Tree a, *Tree others}) = amb(others) when weDoNotWant(a);
Tree amb({Tree a}) = a;
The first rule matches on the ambiguity cluster for all trees, and removes all alternatives which weDoNotWant. The second rule removes the cluster if only one alternative is left and lets the last tree "win".
If you want to choose among alternatives:
Tree amb({Tree a, Tree b, *Tree others}) = amb({a, *others}) when weFindPreferable(a, b);
If you don't want to use Tree but a more specific non-terminal like Statement that should also work.
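The same filtering idea can be sketched outside Rascal. The following Python snippet is only an illustration under stated assumptions: `Node`, `Amb`, `resolve` and the `unwanted` predicate are hypothetical names invented here, not part of any Rascal API. It rebuilds a forest bottom-up, drops unwanted alternatives from each ambiguity cluster, and collapses a cluster once a single alternative is left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An ordinary tree node, identified by the rule that produced it."""
    rule: str
    children: tuple = ()


@dataclass(frozen=True)
class Amb:
    """An ambiguity cluster: a set of alternative trees for the same input."""
    alternatives: frozenset


def resolve(tree, unwanted):
    """Rebuild the tree bottom-up, filtering every ambiguity cluster."""
    if isinstance(tree, Amb):
        # Resolve nested clusters first, then drop the unwanted alternatives.
        kept = {resolve(alt, unwanted) for alt in tree.alternatives}
        kept = {alt for alt in kept
                if not (isinstance(alt, Node) and unwanted(alt))}
        if len(kept) == 1:
            # Only one alternative is left: it "wins" and the cluster goes away.
            return next(iter(kept))
        # Zero or several alternatives remain; the caller has to deal with it.
        return Amb(frozenset(kept))
    return Node(tree.rule, tuple(resolve(c, unwanted) for c in tree.children))


# Hypothetical forest for "4foo" from the Word example in the question.
forest = Amb(frozenset({
    Node("Word = WordInitialDigit", (Node("4foo"),)),
    Node("Word = WordAny", (Node("4foo"),)),
}))
result = resolve(forest, lambda n: n.rule == "Word = WordAny")
```

Here `result` is the single `Word = WordInitialDigit` node. Note that the filter does not order anything: it only removes trees after both alternatives have been built.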
This example module uses @prefer tags in syntax definitions to "prefer" rules which have been tagged over the other rules, as post-parse rewrite rules:
https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/sdf2/filters/PreferAvoid.rsc
Hacking around with additional lexical constraints
Next to priority disambiguation and post-parse rewriting, we still have the lexical level disambiguation mechanisms in the toolkit:
NT \ Keywords: rejecting finite (keyword) languages from a non-terminal
CC << NT, NT >> CC, CC !<< NT, NT !>> CC: follow and precede restrictions (where CC stands for a character class and NT for a non-terminal)
Solving other kinds of ambiguity apart from the operator precedence stuff can be tried with these, in particular if the length of different sub-sentences is shorter/longer between the different alternatives, !>> can do the "maximal munch" or "longest match" thing. So I was thinking out loud:
lexical C = A? B?;
where A is one lexical alternative and B is the other. With the proper !>> restrictions on A and !<< restrictions on B the grammar might be tricked into always wanting to put all characters in A, unless they don't fit into A as a language, in which case they would default to B.
The obvious/annoying advice
Think harder about an unambiguous and simpler grammar.
Sometimes this means abstracting and allowing more sentences in the grammar, avoiding use of the grammar for "type checking" the tree. It's often better to over-approximate the syntax of the language and then use (static) semantic analysis (over simpler trees) to get what you want, rather than staring at a complex ambiguous grammar.
A typical example: C blocks with declarations only at the start are much harder to define unambiguously than C blocks where declarations are allowed everywhere. And for a C90 mode, all you have to do is flag declarations which are not at the start of a block.
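As a rough illustration of the C90 check, here is a minimal Python sketch. It assumes the parser has already produced a block as a flat list of `(kind, text)` pairs where `kind` is either `"decl"` or `"stmt"`; that representation and the function name `late_declarations` are made up for this example.

```python
def late_declarations(block):
    """Return the indices of declarations that appear after a statement.

    `block` is a list of (kind, text) pairs, with kind "decl" or "stmt".
    The grammar accepts declarations anywhere; this pass flags the ones
    a C90 mode would reject.
    """
    seen_statement = False
    late = []
    for index, (kind, _text) in enumerate(block):
        if kind == "stmt":
            seen_statement = True
        elif kind == "decl" and seen_statement:
            late.append(index)
    return late


block = [
    ("decl", "int x;"),
    ("stmt", "x = 1;"),
    ("decl", "int y;"),  # fine in C99, flagged in a C90 mode
]
flagged = late_declarations(block)  # [2]
```

The grammar stays simple and unambiguous, and the restriction becomes a few lines of semantic checking.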
This particular example
lexical WordChar = [0-9A-Za-z] ;
lexical Digit = [0-9] ;
lexical WordInitialDigit = Digit WordChar* !>> WordChar;
lexical WordAny = WordChar+ !>> WordChar;
syntax Word =
WordInitialDigit
| [0-9] !<< WordAny // this would help!
;
wrap up
Great question, thanks for the patience. Hope this helps!
The > disambiguation mechanism is for recursive definitions, like for example an expression grammar.
So it's to solve the following ambiguity:
syntax E
= [0-9]+
| E "+" E
| E "-" E
;
The string 1 + 3 - 4 can be parsed as either 1 + (3 - 4) or (1 + 3) - 4.
The > gives an order to this grammar, which production should be at the top of the tree.
layout L = " "*;
syntax E
= [0-9]+
| E "+" E
> E "-" E
;
this now only allows the (1 + 3) - 4 tree.
To finish this story, how about 1 + 1 + 1? That could be 1 + (1 + 1) or (1 + 1) + 1.
This is what we have left, right, and non-assoc for. They define how recursion in the same production should be handled.
syntax E
= [0-9]+
| left E "+" E
> left E "-" E
;
will now enforce: (1 + 1) + 1.
When you take an operator precedence table, like for example this C operator precedence table, you can almost literally copy it.
Note that these two disambiguation features are not exactly opposite to each other. The first ambiguity could also have been solved by putting both productions in a left group like this:
syntax E
= [0-9]+
| left (
E "+" E
| E "-" E
)
;
As the left side of the tree is favored, 1 + 3 - 4 still parses as (1 + 3) - 4, but 1 - 3 + 4 now parses as (1 - 3) + 4, whereas the > version above gives 1 - (3 + 4). So it makes a difference, but it all depends on what you want.
More details can be found in the tutor pages on disambiguation
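To see that these declarations filter trees rather than order alternatives, here is a small Python sketch. It is a simplified model, not Rascal's implementation: it enumerates every binary tree over a flat token list and rejects trees in which a forbidden operator appears as the direct left or right child of another. All names (`trees`, `allowed`, `parses`, `PRIORITY`, `LEFT_GROUP`) are invented for this example.

```python
from itertools import product


def trees(tokens):
    """Yield every binary tree over an operand/operator/operand/... list."""
    if len(tokens) == 1:
        yield tokens[0]
        return
    for i in range(1, len(tokens), 2):  # each operator may be the root
        lefts = list(trees(tokens[:i]))
        rights = list(trees(tokens[i + 1:]))
        for left, right in product(lefts, rights):
            yield (left, tokens[i], right)


def op(tree):
    """Operator at the root of a tree, or None for a plain operand."""
    return tree[1] if isinstance(tree, tuple) else None


def allowed(tree, no_left, no_right):
    """Reject trees containing a forbidden (parent, child) operator pair."""
    if not isinstance(tree, tuple):
        return True
    left, parent, right = tree
    if (parent, op(left)) in no_left or (parent, op(right)) in no_right:
        return False
    return allowed(left, no_left, no_right) and allowed(right, no_left, no_right)


def show(tree):
    if not isinstance(tree, tuple):
        return tree
    left, operator, right = tree
    return f"({show(left)} {operator} {show(right)})"


def parses(text, no_left, no_right):
    return [show(t) for t in trees(text.split()) if allowed(t, no_left, no_right)]


# E "+" E > E "-" E: "-" may not be a direct child of "+", on either side.
PRIORITY = {("+", "-")}
# left ( E "+" E | E "-" E ): no group member as the right child of another.
LEFT_GROUP = {(p, c) for p in "+-" for c in "+-"}

unfiltered = parses("1 + 3 - 4", set(), set())
with_priority = parses("1 - 3 + 4", PRIORITY, PRIORITY)
with_left_group = parses("1 - 3 + 4", set(), LEFT_GROUP)
```

In this model `unfiltered` holds both trees, `with_priority` is `["(1 - (3 + 4))"]` and `with_left_group` is `["((1 - 3) + 4)"]`. Both filters leave `"((1 + 3) - 4)"` as the only tree for `1 + 3 - 4`.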

Preferring one alternative

An excerpt of my ANTLR v4 grammar looks like this:
expression:
expression BINARY_OPERATOR expression
| unaryExpression
| nularExpression
;
unaryExpression:
ID expression
;
nularExpression:
ID
| NUMBER
| STRING
;
My goal is to match the language without knowing all the necessary keywords and therefore I'm simply matching keywords as IDs.
However, there are binary operators that take an argument on both sides of the keyword (e.g. <arg1> keyword <arg2>) and therefore they need "special treatment". As you can see, I already included this "special treatment" in the expression rule.
The actual problem now consists of the fact that some of these binary operators can be used as unary operators (=normal keywords) as well meaning that the left argument does not have to be specified.
The above grammar can't handle this case, because every time I tried to implement this I ended up with every binary operator being consumed as a unary operator.
Example:
Let's assume count is a binary operator.
Possible syntaxes are <arg1> count <arg2> and count <arg>
All my attempts to implement the above mentioned case ended up grouping myArgument count otherArgument like (myArgument (count (otherArgument) ) ) instead of (myArgument) count (otherArgument)
My brain tells me that the solution to this problem is to tell the parser always to take two arguments for a binary operator and, if this fails, to try to consume the binary operator as a unary one.
Does anybody know how to accomplish this?
How about something like this:
lower_precedence_expression
: ID higher_precedence_expression
| higher_precedence_expression
;
higher_precedence_expression
: higher_precedence_expression ID lower_precedence_expression
| ID
| NUMBER
| STRING
;
?

Layout in Rascal

When I import the Lisra recipe,
import demo::lang::Lisra::Syntax;
This creates the syntax:
layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+ !>> [0-9];
lexical AtomExp = (![0-9()\t-\n\r\ ])+ !>> ![0-9()\t-\n\r\ ];
start syntax LispExp
= IntegerLiteral
| AtomExp
| "(" LispExp* ")"
;
Through the start syntax-definition, layout should be ignored around the input when it is parsed, as is stated in the documentation: http://tutor.rascal-mpl.org/Rascal/Declarations/SyntaxDefinition/SyntaxDefinition.html
However, when I type:
rascal>(LispExp)` (something)`
This gives me a concrete syntax fragment error (or a ParseError when using the parse-function), in contrast to:
rascal>(LispExp)`(something)`
Which successfully parses. I tried this both with one of the latest versions of Rascal as well as the Eclipse plugin version. Am I doing something wrong here?
Thank you.
Ps. Lisra's parse-function:
public Lval parse(str txt) = build(parse(#LispExp, txt));
Also fails on the example:
rascal>parse(" (something)")
|project://rascal/src/org/rascalmpl/library/ParseTree.rsc|(10329,833,<253,0>,<279,60>): ParseError(|unknown:///|(0,1,<1,0>,<1,1>))
at *** somewhere ***(|project://rascal/src/org/rascalmpl/library/ParseTree.rsc|(10329,833,<253,0>,<279,60>))
at parse(|project://rascal/src/org/rascalmpl/library/demo/lang/Lisra/Parse.rsc|(163,3,<7,44>,<7,47>))
at $shell$(|stdin:///|(0,13,<1,0>,<1,13>))
When you define a start non-terminal, Rascal defines two non-terminals in one go:
rascal>start syntax A = "a";
ok
One non-terminal is A, the other is start[A]. Given a layout non-terminal in scope, say L, the latter is automatically defined by (something like) this rule:
syntax start[A] = L before A top L after;
If you call a parser or wish to parse a concrete fragment, you can use either non-terminal:
parse(#start[A], " a ") // parse using the start non-terminal and extra layout
parse(#A, "a") // parse only an A
(start[A]) ` a ` // concrete fragment for the start-non-terminal
(A) `a` // concrete fragment for only an A
[start[A]] " a "
[A] "a"

How to resolve ambiguity in the definition of an LR(1) grammar?

I am writing a Golang compiler in OCaml, and argument lists are causing me a bit of a headache. In Go, you can group consecutive parameter names of the same type in the following way:
func f(a, b, c int) === func f(a int, b int, c int)
You can also have a list of types, without parameter names:
func g(int, string, int)
The two styles cannot be mix-and-matched; either all parameters are named or none are.
My issue is that when the parser sees a comma, it doesn't know what to do. In the first example, is a the name of a type or the name of a variable with more variables coming up? The comma has a dual role and I am not sure how to fix this.
I am using the Menhir parser generator tool for OCaml.
Edit: at the moment, my Menhir grammar follows exactly the rules as specified at http://golang.org/ref/spec#Function_types
As written, the Go grammar is not LALR(1). In fact, it is not LR(k) for any k. It is, however, unambiguous, so you could successfully parse it with a GLR parser, if you can find one (I'm pretty sure that there are several GLR parser generators for OCaml, but I don't know enough about any of them to recommend one).
If you don't want to (or can't) use a GLR parser, you can do it the same way Russ Cox did in the gc compiler, which uses bison. (bison can generate GLR parsers, but Cox doesn't use that feature.) His technique does not rely on the scanner distinguishing between type-names and non-type-names.
Rather, it just accepts parameter lists whose elements are either name_or_type or name name_or_type (actually, there are more possibilities than that, because of the ... syntax, but it doesn't change the general principle.) That's simple, unambiguous and LALR(1), but it is overly-accepting -- it will accept func foo(a, b int, c), for example -- and it does not produce the correct abstract syntax tree because it doesn't attach the type to the list of parameters being declared.
What that means is that once the argument list is fully parsed and is about to be inserted into the AST attached to some function declaration (for example), a semantic scan is performed to fix it up and, if necessary, produce an error message. That scan is done right-to-left over the list of declaration elements, so that the specified type can be propagated to the left.
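The right-to-left fix-up can be sketched as follows. This is a Python illustration of the idea only, not the actual compiler code: it assumes the parser hands over each list element as a 1-tuple `(name_or_type,)` or a 2-tuple `(name, type)`, it ignores the `...` syntax, and the function name `fix_up_parameters` is made up.

```python
def fix_up_parameters(entries):
    """Turn loosely parsed parameter entries into (name, type) pairs.

    Each entry is (name_or_type,) or (name, type). If no entry carries an
    explicit type, every entry is itself a type and the parameters are
    unnamed. Otherwise the list is scanned right-to-left so that a type
    propagates to the bare names on its left.
    """
    if all(len(entry) == 1 for entry in entries):
        return [(None, entry[0]) for entry in entries]

    params = []
    current_type = None
    for entry in reversed(entries):
        if len(entry) == 2:
            name, current_type = entry
        elif current_type is None:
            # A bare entry with no type to its right, e.g. `c` in (a, b int, c).
            raise ValueError(
                f"mixed named and unnamed parameters near {entry[0]!r}")
        else:
            name = entry[0]
        params.append((name, current_type))
    params.reverse()
    return params


grouped = fix_up_parameters([("a",), ("b",), ("c", "int")])
unnamed = fix_up_parameters([("int",), ("string",), ("int",)])
```

With these inputs, `grouped` becomes `[("a", "int"), ("b", "int"), ("c", "int")]` and `unnamed` becomes `[(None, "int"), (None, "string"), (None, "int")]`, while the entries for `func foo(a, b int, c)` raise an error.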
It's worth noting that the grammar in the reference manual is also overly-accepting, because it does not express the constraint that "either all parameters are named or none are". That constraint could be expressed in an LR(1) grammar -- I'll leave that as an exercise for readers -- but the resulting grammar would be a lot more difficult to understand.
You don't have ambiguity. The fact that the standard Go parser is LALR(1) proves that.
is a the name of a type or the name of a variable with more variables coming up?
So basically your grammar and the parser as a whole should be completely disconnected from the symbol table; don't be C – your grammar is not ambiguous therefore you can check the type name later in the AST.
These are the relevant rules (from http://golang.org/ref/spec); they are already correct.
Parameters = "(" [ ParameterList [ "," ] ] ")" .
ParameterList = ParameterDecl { "," ParameterDecl } .
ParameterDecl = [ IdentifierList ] [ "..." ] Type .
IdentifierList = identifier { "," identifier } .
I'll explain them to you:
IdentifierList = identifier { "," identifier } .
The curly braces represent the Kleene closure (in POSIX regular expression notation it's the asterisk). This rule says "an identifier name, optionally followed by a literal comma and an identifier, optionally followed by a literal comma and an identifier, etc… ad infinitum".
ParameterDecl = [ IdentifierList ] [ "..." ] Type .
The square brackets denote optionality; this means that that part may or may not be present (in POSIX regular expression notation it's the question mark). So you have "maybe an IdentifierList, followed by maybe an ellipsis, followed by a type".
ParameterList = ParameterDecl { "," ParameterDecl } .
You can have several ParameterDecl in a list like e.g. func x(a, b int, c, d string).
Parameters = "(" [ ParameterList [ "," ] ] ")" .
This rule defines that a ParameterList is optional, is surrounded by parentheses, and may include an optional final comma literal, useful when you write something like:
func x(
a, b int,
c, d string, // <- note the final comma
)
The Go grammar is portable and can be parsed by any bottom-up parser with one token of lookahead.
Edit regarding "don't be C": I said this because C is context-sensitive and the way they solve this problem in many (all?) compilers is by wiring the symbol table to the lexer and lexing tokens differently depending on if they are defined as type names or variables. This is a hack and should not be done for unambiguous grammars!

Bison: how to fix reduce/reduce conflict

Below is a Bison grammar which illustrates my problem. The actual grammar that I'm using is more complicated.
%glr-parser
%%
s : e | p '=' s;
p : fp | p ',' fp;
fp : 'x';
e : te | e ';' te;
te : fe | te ',' fe;
fe : 'x';
Some examples of input would be:
x
x = x
x,x = x,x
x,x = x;x
x,x,x = x,x;x,x
x = x,x = x;x
What I'm after is for the x's on the left side of an '=' to be parsed differently than those on the right. However, the set of legal "expressions" which may appear on the right of an '='-sign is larger than those on the left (because of the ';').
Bison prints the message (input file was test.y):
test.y: conflicts: 1 reduce/reduce.
There must be some way around this problem. In C, you have a similar situation. The program below passes through gcc with no errors.
int main(void) {
int x;
int *px;
x;
*px;
*px = x = 1;
}
In this case, the 'px' and 'x' get treated differently depending on whether they appear to the left or right of an '='-sign.
You're using %glr-parser, so there's no need to "fix" the reduce/reduce conflict. Bison just tells you there is one, so that you know your grammar might be ambiguous and that you might need to add ambiguity resolution with %dprec or %merge directives. But in your case, the grammar is not ambiguous, so you don't need to do anything.
A conflict is NOT an error; it's just an indication that your grammar is not LALR(1).
The reduce-reduce conflict in your grammar comes from the context:
... = ... x ,
At this point, the parser has to decide whether x is an fe or an fp, and it cannot know with one symbol of lookahead. Indeed, it cannot know with any finite lookahead: you could have any number of repetitions of x , following that point without encountering a =, a ; or the end of the input, any of which would reveal the answer.
This is not quite the same as the C issue, which can be resolved with single-symbol lookahead. However, the C example is a classic illustration of why SLR(1) grammars are less powerful than LALR(1) grammars (it's used for that purpose in the dragon book), and a similarly problematic grammar is an example of the difference between LALR(1) and LR(1); it can be found in the bison manual:
def: param_spec return_spec ',';
param_spec: type | name_list ':' type;
return_spec: type | name ':' type;
type: "id";
name: "id";
name_list: name | name ',' name_list;
(The bison manual explains how to resolve this issue for LALR(1) grammars, although using a GLR grammar is always a possibility.)
The key to resolving such conflicts without using a GLR grammar is to avoid forcing the parser to make premature decisions.
For example, it is traditional to distinguish syntactically between lvalues and rvalues, and some languages continue to do so. C and C++ do not, however; and this turns out to be an extremely powerful feature in C++ because it allows the definition of functions which can act as lvalues.
In C, I think it's just to simplify the grammar a bit: the C grammar allows the result of any unary operator to appear on the left-hand side of an assignment operator, but unary operators are actually a mix of lvalues (*v, v[expr]) and rvalues (sizeof v, f(expr)). The grammar could have distinguished between the two kinds of unary operators, but it could not resolve the actual restriction, which is that only modifiable lvalues may appear on the left side of an assignment operator.
C++ allows an arbitrary expression to appear on the left-hand side of an assignment operator (although some need to be parenthesized); consequently, the following is totally legal:
(predicate(x) ? *some_pointer : some_variable) = 42;
In your case, you could resolve the conflict syntactically by replacing te with p, since both non-terminals produce the same set of derivations. That's probably not the general solution, unless it is really the case in your full grammar that left-side expressions are a strict subset of right-side expressions. In a full grammar, you might end up with three types of expression (left-only, right-only, common), which could considerably complicate the grammar, and leaving the resolution for semantic analysis might prove to be easier (and even, as in the case of C++, surprisingly useful).
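A minimal Python sketch of "leave the resolution for semantic analysis", under loud assumptions: it uses plain string splitting instead of a real parser, does not validate the individual items, and the function name `parse_statement` is made up. Every side of an '=' is read with the same, larger expression shape (';'-separated groups of ','-separated items), and only afterwards are the left-hand sides checked for the stricter form.

```python
def parse_statement(text):
    """Read every '='-separated segment as a full expression, then check
    in a second pass that the left-hand sides contain no ';'."""
    segments = text.split("=")
    parsed = [
        [[item.strip() for item in group.split(",")]
         for group in segment.split(";")]
        for segment in segments
    ]
    *targets, value = parsed
    for target in targets:
        if len(target) != 1:
            # Accepted by the permissive first pass, rejected semantically.
            raise SyntaxError("';' is not allowed on the left of '='")
    # Each target has exactly one ','-separated group; unwrap it.
    return [target[0] for target in targets], value


targets, value = parse_statement("x,x = x;x")
```

For `x,x = x;x` this gives `targets == [["x", "x"]]` and `value == [["x"], ["x"]]`, while `x;x = x` is rejected in the second pass. The parser never has to decide early whether an x belongs to a left side or a right side.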

Resources