Using a stack, we can easily convert infix to postfix and infix to prefix, but I can't work out how to convert prefix to postfix.
Yes, we can convert prefix to infix and then infix to postfix, but I want a direct conversion from prefix to postfix.
Is there a feasible solution (other than building an expression tree)?
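One commonly used direct approach is to scan the prefix expression from right to left with a stack: operands are pushed, and each operator pops its two operands and pushes the combined postfix fragment. The following is only a minimal sketch of that idea; it assumes the input is already split into tokens and that every operator is binary, and the function name and the operator set are illustrative choices, not something taken from the question.

```python
def prefix_to_postfix(tokens, operators=frozenset("+-*/^")):
    """Convert a list of prefix tokens into a list of postfix tokens.

    Assumption: every operator is binary. Raises ValueError on malformed input.
    """
    stack = []
    # Scan right to left so that both operands are already on the stack
    # by the time their operator is reached.
    for tok in reversed(tokens):
        if tok in operators:
            if len(stack) < 2:
                raise ValueError("operator %r is missing an operand" % tok)
            left = stack.pop()   # nearer operand, i.e. the left one
            right = stack.pop()
            stack.append(left + right + [tok])
        else:
            stack.append([tok])
    if len(stack) != 1:
        raise ValueError("input is not a single well-formed prefix expression")
    return stack[0]


if __name__ == "__main__":
    print(" ".join(prefix_to_postfix("* + A B - C D".split())))
```

No expression tree is built explicitly, although the stack entries play a similar role: each one holds the postfix form of an already-converted subexpression.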
I am currently writing a basic parser for an XML flavor. As an exercise, I am implementing an LL table-driven parser.
This is my example of BNF grammar:
%token name data string
%% /* LL(1) */
doc : elem
elem : "<" open_tag
open_tag : name attr close_tag
close_tag : ">" elem_or_data "</" name ">"
| "/>"
;
elem_or_data : "<" open_tag elem_or_data
| data elem_or_data
| /* epsilon */
;
attr : name ":" string attr
| /* epsilon */
;
Is this grammar correct?
Each terminal literal is between quotes. The abstract terminals are specified by %token.
I am coding a hand-written lexer to convert my input into a list of tokens. How would I tokenize the abstract terminals?
The classic approach would be to write a regular expression (or other recogniser) for each possible terminal.
What you call "abstract" terminals, which are perfectly concrete, are actually terminals whose associated patterns recognise more than one possible input string. The string actually recognised (or some computed function of that string) should be passed to the parser as the semantic value of the token.
Nominally, at each point in the input string, the tokeniser will run all recognisers and choose the one with the longest match. (This is the so-called "maximal munch" rule.) This can usually be optimised, particularly if all the patterns are regular expressions. (F)lex will do that optimisation for you, for example.
A complication in your case is that the tokenisation of your language is context-dependent. In particular, when the target is elem_or_data, the only possible tokens are <, </ and "data". However, inside a tag, "data" is not possible, and "name" and "string" tokens are possible (among others).
It is also possible that the value of an attribute could have the same lexical form as the key (i.e. a name). In XML itself, the attribute value must be a quoted string and the use of an unquoted string will be flagged as an error, but there are certainly "XML-like" languages (such as HTML) in which attribute values without whitespace can be inserted unquoted.
Since the lexical analysis depends on context, the lexical analyser must be passed (or have access to) an additional piece of information defining the lexical context. This is usually represented as a single enumeration value, which could be computed based on the last few tokens returned, or based on the FIRST set of the current parser stack.
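As a rough illustration of the idea, here is a small Python sketch of a tokeniser that keeps a lexical context, derives it from the last token returned, and applies the longest-match rule within that context. Everything in it is an assumption made for the example: the names Ctx, PATTERNS and tokenize, the two contexts, and in particular the regular expressions chosen for name, string and data, which the grammar above does not define.

```python
import re
from enum import Enum


class Ctx(Enum):
    CONTENT = 1  # between tags: only "<", "</" and data can occur
    TAG = 2      # inside a tag: names, strings and punctuation


# Hypothetical token patterns; the real definitions depend on your language.
PATTERNS = {
    Ctx.CONTENT: [("</", r"</"), ("<", r"<"), ("data", r"[^<]+")],
    Ctx.TAG: [("/>", r"/>"), (">", r">"), (":", r":"),
              ("name", r"[A-Za-z_][\w.-]*"), ("string", r'"[^"]*"')],
}
COMPILED = {ctx: [(kind, re.compile(rx)) for kind, rx in rules]
            for ctx, rules in PATTERNS.items()}


def tokenize(text):
    """Return a list of (kind, matched_text) pairs."""
    pos, ctx, tokens = 0, Ctx.CONTENT, []
    while pos < len(text):
        if ctx is Ctx.TAG and text[pos].isspace():
            pos += 1  # whitespace is only skipped inside tags
            continue
        # Run every recogniser valid in this context; keep the longest match.
        best = None
        for kind, rx in COMPILED[ctx]:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError("unexpected character at offset %d" % pos)
        kind, m = best
        tokens.append((kind, m.group()))
        pos = m.end()
        # The next context is computed from the token just returned.
        if kind in ("<", "</"):
            ctx = Ctx.TAG
        elif kind in (">", "/>"):
            ctx = Ctx.CONTENT
    return tokens


if __name__ == "__main__":
    for token in tokenize('<a x:"1">hi<b/></a>'):
        print(token)
```

The matched text is kept alongside the token kind so that it can be handed to the parser as the semantic value. A table-driven parser could instead pass the context in explicitly, based on what it currently expects.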
Let's say I have Lexer rules like this:
EMPTY_LITERAL: '\'' '\'';
LITERAL: '\'' (ESCAPED_SEQ|.)*? '\'' ;
fragment ESCAPED_SEQ: '\\\'' | '\\\\' ;
and a parser rule like this:
literal: EMPTY_LITERAL #EmptyLiteral | LITERAL #LiteralWithContent;
I want to get the content of LITERAL without the quotes in the parser. I can strip the quotes myself, of course, but I would prefer to get the unquoted string directly.
If I move the inner rule in the LITERAL the rule will not match properly (will match only 1 char). If I move LITERAL as a parser rule, I can match ESCAPED_SEQ but this is not what I want. Is there a way to name the inner rule in the lexer?
Is there a way to name the inner rule in the lexer?
No, there is not. It's not possible to name or access specific parts of a token in ANTLR 4, nor is there a sensible way to turn LITERAL into a parser rule.
So stripping the quotes from the token's text yourself is your only option.
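For illustration, the stripping step could look like the following Python sketch. It works on the token's text as a plain string; the helper name literal_content is hypothetical, and the decision to also decode the two escape sequences from ESCAPED_SEQ is an assumption about what is wanted, not something ANTLR does for you.

```python
import re

# Matches the two escape sequences allowed by ESCAPED_SEQ: \' and \\
_ESCAPE = re.compile(r"\\(['\\])")


def literal_content(token_text):
    """Return the text of a LITERAL token without its surrounding quotes.

    Hypothetical helper: also replaces \\' with ' and \\\\ with \\.
    """
    if len(token_text) < 2 or token_text[0] != "'" or token_text[-1] != "'":
        raise ValueError("not a quoted literal: %r" % token_text)
    inner = token_text[1:-1]
    return _ESCAPE.sub(r"\1", inner)


if __name__ == "__main__":
    print(literal_content(r"'it\'s'"))
```

With a helper like this, the separate EMPTY_LITERAL case simply yields an empty string.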
I need an algorithm that checks whether a given expression is in infix, postfix or prefix notation.
I have tried checking the first or last 2 terms of the string, e.g.
+AB: if there is an operator at the very first index of the string, then it's prefix
AB+: if there is an operator at the very last index of the string, then it's postfix
otherwise it's infix.
But this doesn't feel adequate, so kindly suggest a better algorithm.
(1) If it starts with a valid operator it's prefix, unless you're going to allow unary operators in infix notation.
(2) If it ends with a valid postfix operator it's postfix.
(3) Otherwise it is either infix or invalid.
Note that (3) includes the case you mentioned in comments of an expression in parentheses. There are no parentheses in prefix or postfix. That's why they exist. (3) also includes the degenerate case of a single term, e.g. 1, but in that case it doesn't matter how you parse it.
You can only detect an invalid expression by parsing it fully.
If you're going to allow unary operators in infix notation I can only suggest that you try all three parses and stop when you get a success. Very possibly this is the strategy you should follow anyway.
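A minimal Python sketch of the "try all three parses" strategy follows. It assumes the expression is already tokenised, that all operators are binary (so unary operators are not handled), and that only infix notation may contain parentheses. The function names and the operator set are made up for the example, and the infix check validates structure only, ignoring precedence.

```python
OPS = frozenset("+-*/^")
PARENS = frozenset("()")


def is_postfix(tokens):
    """Valid postfix: each operator finds two operands; one value remains."""
    depth = 0
    for tok in tokens:
        if tok in PARENS:
            return False
        if tok in OPS:
            if depth < 2:
                return False
            depth -= 1  # two operands consumed, one result produced
        else:
            depth += 1
    return depth == 1


def is_prefix(tokens):
    # With binary operators only, a prefix expression read right to left
    # has the same stack behaviour as a postfix expression.
    return is_postfix(list(reversed(tokens)))


def is_infix(tokens):
    """Structure check only: term (op term)*, term = operand | ( expr )."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            if not expr() or pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        if tok in OPS or tok == ")":
            return False
        pos += 1
        return True

    def expr():
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] in OPS:
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)


def classify(tokens):
    """Try each parse in turn and stop at the first success."""
    checks = (("prefix", is_prefix), ("postfix", is_postfix), ("infix", is_infix))
    for name, check in checks:
        if check(tokens):
            return name
    return "invalid"


if __name__ == "__main__":
    for text in ("+ A B", "A B +", "( A + B ) * C", "A + +"):
        print(text, "->", classify(text.split()))
```

A single operand passes all three checks, so the sketch reports it as prefix simply because that check runs first.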
Check the first elements of the string (assuming a valid expression with binary operators and no parentheses):
1- if the first element is an operator, then it is a prefix expression
2- else, check the second element; if it is an operator, then it is an infix expression
3- else, it is a postfix expression
Playing around a little bit with infix operators, I was surprised about the following:
let (>~~~) = function null -> String.Empty | s -> s // compiles fine
match >~~~ input with .... // error: Unexpected infix operator in expression
Changing the first character of the operator (to !~~~, for instance) fixes it. It is rather odd that the error says the infix operator is unexpected. Hovering shows the definition to be string -> string.
I'm not too surprised about the error: F# requires (iirc) that the first character of a prefix operator be one of the predefined prefix operators. But why does the definition compile just fine, while the compiler complains only when I use it?
Update: the F# compiler seems to know in other cases just fine when I use an invalid character in my operator definition, it says "Invalid operator definition. Prefix operator definitions must use a valid prefix operator name."
The rules for custom operators in F# are quite tight, so even though you can define custom operators, there are a lot of rules about how they will behave and you cannot change those. In particular:
Only some operators (mainly those with ! and ~) can be used as prefix operators. With ~ you can also overload unary operators +, -, ~ and ~~, so if you define an operator named ~+., you can then use it as e.g. +. 42.
Other operators (including those starting with >) can only be used as infix. You can turn any operator into ordinary function using parentheses, which is why e.g. (+) 1 2 is valid.
The ? symbol is special (it is used for dynamic invocation) and cannot appear as the first symbol of a custom operator.
I think the most intuitive way of thinking about this is that custom operators will behave like standard F# operators, but you can add additional symbols after the standard operator name.
let tolerance = 0.00000001
let (~=) x1 x2 = abs(x1 - x2) < tolerance
This throws the error:
"invalid operator definition. Prefix operator definitions must use a valid prefix operator name"
This isn't even a prefix operator, I don't get why it thinks so.
However the following is fine:
let (=~) x1 x2 = abs(x1 - x2) < tolerance
I only switched the order, so "=" comes before "~".
Is there any document online stating some rules regarding this?
I'm using Visual Studio 2013 with "F# 2013". The interactive console says "F# Interactive version 12.0.21005.1"
You can't define an infix operator starting with ~ in F#.
The F# 3.0 specification, section Categorization of Symbolic Operators explains the reason quite clearly:
The operators +, -, +., -., %, %%, &, && can be used as both prefix and infix operators. When these operators are used as prefix operators, the tilde character is prepended internally to generate the operator name so that the parser can distinguish such usage from an infix use of the operator. For example, -x is parsed as an application of the operator ~- to the identifier x. This generated name is also used in definitions for these prefix operators. Consequently, the definitions of the following prefix operators include the ~ character.
In F#, the ~ character in first position denotes a prefix operator. For example, (~-) is the prefix "opposite" operator: (~-) 3 is equivalent to - 3.