What is the difference between
do Application.Run(form)
and, simply:
Application.Run(form) ?
What is the role of the do keyword in the first statement?
Whereas 'do' was a required keyword in many places in the language in some of the earlier releases, nowadays you rarely need 'do'. The remaining exceptions that I can think of are that 'do' is still part of loop syntax (e.g. "while e1 do e2") and if you want to put an assembly-level attribute or an attribute on the startup method, you can put the attribute before the explicit 'do' of a final code block in a module. Often times in F# samples you'll see
[<STAThread>]
do Application.Run(form)
as the last two lines of a file, and I think the 'do' is still required there in order to be able to attach the attribute on the line above it.
I think it's just a holdover - like how you can still CALL a sub or SET a variable instead of just doing those things directly, as in:
SET varname = 5
CALL mysub()
Versus just:
varname = 5
mysub()
In other words, I don't think it matters, and the compiler just discards it.
I'm new to Lex and I'm confused on how to declare the following macro, keyword. I want keyword to consist of either "if", "then", "else", or "while."
I typed this in lex:
keyword "if" | "then" | "else" | "while"
but the compiler is giving me an "unrecognized rule error". When I instead do
keyword "if"
It compiles ok.
Is this just a limitation of Lex? I know in jflex you can do what I did above and it'll work fine. Or am I doing it incorrectly?
Thanks
I can't test this right now, but off the top of my head:
Try putting the values in parentheses (before the first %%)
keyword ("if"|"then"|"else"|"while")
And then use it in rules like this (between %% and %%):
{keyword} { /* action */ }
This is how you make a class in lex, so in the rest of the code you can use {keyword} and it will be recognized as the regex you've assigned in the definition section (before the first %%).
Also, you can use a class as a part of other regexs:
{keyword}\{[^\}]*\} { /* action */ }
This recognizes a whole block of code. (but it doesn't check the syntax inside the block, I leave that to you :) )
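For readers more at home with a regular-expression library than with Lex, the same grouping idea can be sketched in Python. This is only an analogy: Python's re module is not Lex, and the names KEYWORD, BLOCK, and is_keyword are made up for illustration.

```python
import re

# Hypothetical names. The group plays the same role as the parentheses
# in the Lex definition: it keeps the alternation together as one unit.
KEYWORD = r"(?:if|then|else|while)"

# Reuse the named pattern inside a larger one: a keyword followed by a
# brace-delimited block that contains no nested braces.
BLOCK = re.compile(KEYWORD + r"\{[^}]*\}")


def is_keyword(text):
    """Return True if the whole string is one of the four keywords."""
    return re.fullmatch(KEYWORD, text) is not None
```

Without the group, the alternation would bind more loosely than the concatenation that follows it, so only the last alternative would be joined to the block pattern.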
I noticed that the following code compiles and works in VS 2013:
let f() =
    do Console.WriteLine(41)
    42
But when looking at the F# 3.0 specification I can't find any mention of do being used this way. As far as I can tell, do can have the following uses:
As part of a loop (e.g. while expr do expr done); that's not the case here.
Inside computation expressions, e.g.:
seq {
    for i in 1..2 do
        do Console.WriteLine(i)
        yield i * 2
}
That's not the case here either, f doesn't contain any computation expressions.
Though what confuses me here is that according to the specification, do should be followed by in. That in should be optional due to lightweight syntax, but adding it here causes a compile error (“Unexpected token 'in' or incomplete expression”).
Statement inside a module or class. This is also not the case here, the do is inside a function, not inside a module or a class.
I also noticed that with #light "off", the code doesn't compile (“Unexpected keyword 'do' in binding”), but I didn't find anything that would explain this in the section on lightweight syntax either.
Based on all this, I would assume that using do inside a function this way should not compile, but it does. Did I miss something in the specification? Or is this actually a bug in the compiler or in the specification?
From the documentation on MSDN:
A do binding is used to execute code without defining a function or value.
Even though the spec doesn't contain a comprehensive list of the places it is allowed, it is merely an expression asserted to be of type unit. Some examples:
if ((do ()); true) then ()
let x: unit = do ()
It is generally omitted. Each of the preceding examples is valid without do. Therefore, do serves only to assert that an expression is of type unit.
Going through the F# 3.0 specification, the grammar lists do expr as an alternative of class-function-or-value-defn (types) [Ch 8, A.2.5] and module-function-or-value-defn (modules) [Ch 10, A.2.1.1].
I don't actually see where the spec says that a function-defn can have more than one expression, as long as all but the last one evaluate to unit, or that all but the last expression are ignored in determining the function's return value.
So, it seems this is an oversight in the documentation.
I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, which anyone who knows Prolog will recognize.
The idea is that after a cut (!) symbol has been parsed, then no alternative options should be attempted at the same level.
expre = '(' ! list ')' | atom.
Means that after the ( is seen, the parsing must succeed, or fail without trying the second option.
I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut exception that would abort the enclosing choice, but that didn't work.
Any pointers to how this functionality is implemented in other parser generators would be helpful.
Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:
cut_seen = False
try:
    self.token('(')
    cut_seen = True
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
        raise FailedCut(e)
    raise
Then the code generated for the choice (|) operator will skip the following choices if it catches a FailedCut. What I mean by lack of locality is that the choice catching the FailedCut may be far up the call stack, so its effect is too difficult to discern.
Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so that error reporting refers to that location in the source instead of to another location where some other option might have been available.
It just occurred to me that if the code generated for a rule/predicate catches FailedCut and translates it into a normal FailedParse exception, then the cuts will have the right scope.
In reference to @false's question, here's a complete example of what I want to work:
start = expre ;
expre = named | term ;
named = word ':' ! term;
term = word ;
In that grammar, word can be reached through named or term, but I would like the parser to commit to the named branch after it has seen the :.
The Solution
To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.
In the final solution, sequences are enclosed with this context manager:
@contextmanager
def _sequence(self):
    self._push_cut()
    try:
        yield
    except FailedParse as e:
        if self._cut():
            self.error(e, FailedCut)
        else:
            raise
    finally:
        self._pop_cut()
And options in a choice function are enclosed with this:
@contextmanager
def _option(self):
    p = self._pos
    try:
        self._push_ast()
        try:
            yield
            ast = self.ast
        finally:
            self._pop_ast()
        self.ast.update(ast)
    except FailedCut as e:
        self._goto(p)
        raise e.nested
    except FailedParse:
        self._goto(p)
Which forces an exit out of the choice instead of a return to try the next option.
The cuts themselves are implemented thus:
def _cut(self):
    self._cut_stack[-1] = True
The full source code may be found on Bitbucket.
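For anyone who wants to experiment without the full source, here is a minimal self-contained sketch of the same idea applied to the named/term grammar from the question. It is not the Grako code: MiniParser and its methods are hypothetical names invented for this illustration, and FailedCut is assumed to be a subclass of FailedParse that keeps the original failure in .nested.

```python
import re
from contextlib import contextmanager


class FailedParse(Exception):
    pass


class FailedCut(FailedParse):
    """A failure that happened after a cut; wraps the original error."""
    def __init__(self, nested):
        super().__init__(str(nested))
        self.nested = nested


class MiniParser:  # hypothetical, for illustration only
    WORD = re.compile(r"\w+")

    def __init__(self, text):
        self.text, self.pos, self.cut_stack = text, 0, []

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def token(self, tok):
        self._skip_ws()
        if not self.text.startswith(tok, self.pos):
            raise FailedParse(f"expected {tok!r} at {self.pos}")
        self.pos += len(tok)

    def word(self):
        self._skip_ws()
        m = self.WORD.match(self.text, self.pos)
        if not m:
            raise FailedParse(f"expected word at {self.pos}")
        self.pos = m.end()
        return m.group()

    def cut(self):
        self.cut_stack[-1] = True  # mark the innermost sequence as committed

    @contextmanager
    def sequence(self):
        self.cut_stack.append(False)
        try:
            yield
        except FailedCut:
            raise  # already committed further down; pass it along
        except FailedParse as e:
            if self.cut_stack[-1]:
                raise FailedCut(e)  # failure after a cut: no backtracking
            raise
        finally:
            self.cut_stack.pop()

    def choice(self, *options):
        for option in options:
            start = self.pos
            try:
                return option()
            except FailedCut as e:
                self.pos = start
                raise e.nested  # leave the choice instead of trying the rest
            except FailedParse:
                self.pos = start  # ordinary failure: try the next option
        raise FailedParse(f"no option matched at {self.pos}")

    def named(self):  # named = word ':' ! term
        with self.sequence():
            name = self.word()
            self.token(":")
            self.cut()
            return (name, self.term())

    def term(self):  # term = word
        return self.word()

    def expre(self):  # expre = named | term
        return self.choice(self.named, self.term)
```

With this sketch, MiniParser("a : b").expre() parses as a named pair and MiniParser("a").expre() falls back to term, while MiniParser("a : ").expre() raises FailedParse about the missing word after the colon rather than silently accepting "a" as a term.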
In a Prolog with ISO Prolog's exception handling (catch/3 and throw/1), a cut could be implemented as:
cut. % Simply succeeds
cut :-
    throw(cut). % on backtracking throws an exception
This would require catching that exception at appropriate places. For example, each goal (that is non-terminal) of a user defined predicate could now be wrapped with:
catchcut(Goal) :-
    catch(Goal,cut,fail).
This is not the most efficient way to implement cut since it does not free resources upon success of !, but it might be sufficient for your purposes. Also, this method now might interfere with user-defined uses of catch/3. But you probably do not want to emulate the entire Prolog language in any case.
Also, consider using Prolog's DCG grammars directly. There is a lot of fine print that is not evident when implementing this in another language.
The solution proposed at the end of my question worked:
cut_seen = False
try:
    self.token('(')
    cut_seen = True
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
        raise FailedCut(e)
    raise
Then, any time a choice or optional is evaluated, the code looks like this:
p = self.pos
try:
    # code for the expression
except FailedCut:
    raise
except FailedParse:
    self.goto(p)
Edit
The actual solution required keeping a "cut stack". The source code is on Bitbucket.
Just read it.
I'd suggested a deep cut_seen (like with modifying parser's state) and a save and restore state with local variables. This uses the thread's stack as "cut_seen stack".
But you have another solution, and I'm pretty sure you're fine already.
BTW: nice compiler. It's just the opposite of what I'm doing with pyPEG, so I can learn a lot ;-)
A high precedence application expression is one in which an identifier is immediately followed by a left paren without intervening whitespace, e.g., f(g). Parentheses are required when passing these as function arguments: func (f(g)).
Section 15.2 of the spec states the grammar and precedence rules allow the unparenthesized form -- func f(g) -- but an additional check prevents this.
Why is this intentionally prohibited? It would obviate the need for excessive parentheses and piping, and generally make the code much cleaner.
A common example is
raise <| IndexOutOfRangeException()
or
raise (IndexOutOfRangeException())
could become simply
raise IndexOutOfRangeException()
I agree that the need for writing the additional parentheses is a bit annoying. I think that the main reason why it is not allowed to omit them is that adding a whitespace would then change the meaning of your code in quite a significant way:
// Call 'foo' with the result of 'bar()' as an argument
foo bar()
// Call 'foo' with 'bar' as the first argument and '()' as the second
foo bar ()
There are still some rough edges where adding parens changes the evaluation (see this forum post), but that "just" changes the evaluation order. This would change the meaning of your code!
I'm trying to understand what identifiers represent and what they don't represent.
As I understand it, an identifier is a name for a method, a constant, a variable, a class, a package/module. It covers a lot. But what can you not use it for?
Every language differs in terms of what entities/abstractions can or cannot be named and reused in that language.
In most languages, you can't use an identifier for infix arithmetic operations.
For example, plus is an identifier and you can make a function named plus. But while you can write a = b + c;, there's no way to define an operator named plus to make a = b plus c; work, because the language grammar simply does not allow an identifier there.
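Python is one language where this can be checked directly. The following is a small sketch; plus and is_valid_syntax are names chosen for this example, and the built-in compile is used only to ask the parser whether a piece of source is syntactically valid.

```python
def plus(x, y):
    """An ordinary function whose name is the identifier 'plus'."""
    return x + y


def is_valid_syntax(source):
    """Return True if the parser accepts the source, without running it."""
    try:
        compile(source, "example", "exec")
        return True
    except SyntaxError:
        return False


call_form_ok = is_valid_syntax("a = plus(b, c)")   # identifier used as a function name
infix_form_ok = is_valid_syntax("a = b plus c")    # identifier used as an infix operator
```

Here call_form_ok is True and infix_form_ok is False: the grammar accepts plus as a name to call but has no rule that lets an identifier sit between two operands.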
An identifier allows you to assign a name to some data, so that you can reference it later. That is the limit of what identifiers do; you cannot "use" it for anything other than a reference to some data.
That said, there are a lot of implications that come from this, some subtle. For example, in most languages functions are, to some degree or another, considered to be data, and so a function name is an identifier. In languages where functions are values, but not "first-class" values, you can't use an identifier for a function in every place you could use an identifier for something else. In some languages, there will even be separate namespaces for functions and other data, and so what is textually the same identifier might refer to two different things, and they would be distinguished by the context in which they are used.
An example of what you usually (i.e., in most languages) cannot use an identifier for is as a reference to a language keyword. For example, this sort of thing generally can't be done:
let during = while;
during (true) { print("Hello, world."); }
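The same restriction can be observed in Python, shown here only as one concrete instance of the general point. The standard keyword module reports which words are reserved; can_be_identifier is a helper name invented for this sketch.

```python
import keyword


def can_be_identifier(name):
    """True if the word is a legal, non-reserved identifier in Python."""
    return name.isidentifier() and not keyword.iskeyword(name)


def try_alias_keyword():
    """Attempt the equivalent of 'let during = while;' and report the result."""
    try:
        compile("during = while", "example", "exec")
        return "accepted"
    except SyntaxError:
        return "rejected"
```

can_be_identifier("during") is True and can_be_identifier("while") is False, and try_alias_keyword() returns "rejected" because a keyword is not a value that a name can be bound to.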
You could say it's used for everything that you'll want to refer to multiple times, or maybe even once (but use it to clarify the referent's purpose).
What can/can't be named differs per language, it's often quite intuitive, IMHO.
An "Anonymous" entity is something which is not named, although referred to somehow.
#!/usr/bin/perl
$subroutine = sub { return "Anonymous subroutine returning this text"; }
In Perl-speak, this is anonymous - the subroutine is not named, but it is referred to by the reference variable $subroutine.
PS: In Perl, the subroutine would be named like this:
sub NAME_HERE {
# some code...
}
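A rough Python counterpart of the two Perl snippets, as a sketch only (the variable and function names are illustrative): a lambda has no name of its own and is reachable only through the variable that refers to it, while a def statement gives the function a name.

```python
# Anonymous: the function itself is unnamed; 'anonymous' is just a
# variable that happens to refer to it.
anonymous = lambda: "Anonymous function returning this text"


# Named: the def statement both creates the function and names it.
def named():
    return "Named function returning this text"


anonymous_name = anonymous.__name__  # a placeholder, not a real name
named_name = named.__name__          # "named"
```

Both are called the same way, anonymous() and named(); the difference is only in whether the function carries a name of its own.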
Say, in Java you cannot write something like:
Object myIf = if;
myIf (a == b) {
    System.out.println("True!");
}
So, you cannot name some code statement, giving it an alias. While in REBOL it is perfectly possible:
myIf: if
myIf a = b [print "True!"]
What can and what can't be named depends on language, as you see.
As its name implies, an identifier is used to identify something, so you can use an identifier for anything that can be identified uniquely. A literal (e.g. a string literal), however, is not unique, so you can't use an identifier for it. You can, however, create a variable and assign a string literal to it.
Making soup out of them is rather foul.
In languages such as Lisp, an identifier exists in its own right as a symbol, whereas in languages which are not introspective identifiers don't exist in the runtime.
You write a literal identifier/symbol by putting a single quote in front of it:
[1]> 'a
A
You can create a variable and assign a symbol literal to it:
[2]> (setf a 'Hello)
HELLO
[3]> a
HELLO
[4]> (print a)
HELLO
HELLO
You can set two variables to the same symbol
[10]> (setf b a)
HELLO
[11]> b
HELLO
[12]> a
HELLO
[13]> (eq b a)
T
[14]> (eq b 'Hello)
T
Note that the values bound to b and a are the same, and the value is the literal symbol 'Hello
You can bind a function to the symbol
[15]> (defun hello () (print 'hello))
HELLO
and call it:
[16]> (hello)
HELLO
HELLO
In Common Lisp, the variable binding and the function binding are distinct:
[19]> (setf hello 'goodbye)
GOODBYE
[20]> hello
GOODBYE
[21]> (hello)
HELLO
HELLO
but in Scheme or JavaScript the bindings are in the same namespace.
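Python also keeps functions and other values in a single namespace, so it can serve as a sketch of the contrast with the Common Lisp session above. The names mirror that session; nothing here is specific to Scheme or JavaScript.

```python
def hello():
    return "HELLO"


first = hello()    # the name refers to the function, so this call works

hello = "goodbye"  # rebinding the same name replaces the function binding

try:
    hello()
    outcome = "still callable"
except TypeError:
    # The name now refers to a string, and strings are not callable.
    outcome = "no longer callable"
```

Unlike the Common Lisp session, where (hello) still printed HELLO after (setf hello 'goodbye), here outcome ends up as "no longer callable".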
There are many other things you can do with identifiers if they are reified as symbols. I suspect that someone more knowledgeable than me in Lisp will be able to demonstrate that many of the things you 'can't do with identifiers' can in fact be done.
But even Lisp can not make identifier soup.
Sort of a left-field thought, but JSON has all those quotations in it to eliminate the danger of a JavaScript keyword messing up the parsing.