Lua Semicolon Conventions - lua

I was wondering if there is a general convention for the usage of semicolons in Lua, and if so, where and why I should use them. I come from a programming background, so ending statements with a semicolon seems intuitively correct. However, I was curious as to why they are "optional" when it's generally accepted that semicolons end statements in other programming languages. Perhaps there is some benefit?
For example: From the Lua programming guide, these are all acceptable, equivalent, and syntactically valid:
a = 1
b = a*2
a = 1;
b = a*2;
a = 1 ; b = a*2
a = 1 b = a*2 -- ugly, but valid
The author also mentions: Usually, I use semicolons only to separate two or more statements written in the same line, but this is just a convention.
Is this generally accepted by the Lua community, or is there another way that is preferred by most? Or is it as simple as my personal preference?

Semicolons in Lua are generally only needed when writing multiple statements on one line, and even there they mainly aid readability.
So for example:
local a,b=1,2; print(a+b)
Alternatively written as:
local a,b=1,2
print(a+b)
Off the top of my head, I can't remember any other time in Lua where I had to use a semi-colon.
Edit: looking in the Lua 5.2 reference manual, I see one other common place where you need a semicolon to avoid ambiguity: where a statement is followed by a statement that starts with an open parenthesis, such as a parenthesized function call. Here is the example from the manual:
--[[ Function calls and assignments can start with an open parenthesis. This
possibility leads to an ambiguity in the Lua grammar. Consider the
following fragment: ]]
a = b + c
(print or io.write)('done')
-- The grammar could see it in two ways:
a = b + c(print or io.write)('done')
a = b + c; (print or io.write)('done')

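Another place is in local variable and function definitions. Here I compare two quite similar code samples to illustrate my point.
local f; f = function() function-body end
local f = function() function-body end
These two functions can behave differently when function-body refers to the variable "f". In the first form, the local f is already in scope inside the body, so the function can call itself recursively. In the second form, the local is not in scope until the statement is complete, so f inside the body refers to a global f.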

Many programming languages that do not require semicolons (including Lua) have a convention of not using them, except to separate multiple statements on the same line.
JavaScript is an important exception: semicolons are generally used there by convention.
Kotlin is also technically an exception. The Kotlin documentation says not only to avoid semicolons on non-batched statements, but also to
"Omit semicolons whenever possible."

In local variable definitions, I have seen unexpected results from time to time:
local a, b = string.find("hello world", "hello") --> a = nil, b = nil
while at other times a and b are assigned the right values, 1 and 5.
So I found no choice but to follow one of these two approaches:
local a, b; a, b = string.find("hello world", "hello") --> a, b = 1, 5
local a, b
a, b = string.find("hello world", "hello") --> a, b = 1, 5

Semicolons are useful for putting more than one statement on a line. For example:
c=5
a=1+c
print(a) -- 6
could be shortened to:
c=5; a=1+c; print(a) -- 6
It is also worth noting that if you are used to JavaScript, or a similar language where you habitually end every line with a semicolon, you do not have to remove those semicolons in Lua. Trust me: I'm used to JavaScript too, and I keep forgetting that I don't need the semicolon every time I write a new line!

Related

Performant way to lex INDENT and DEDENT to pass to Earley?

Continuing from this GitHub issue:
I need to match on indent or dedent, and I'm using Earley. Earley has no built-in support for indentation, but I'd like to be able to use indentation instead of braces in my language.
Example input:
func foo(a: A, b: B): C =
    theresSomeIndentRequiredHere(a, b)
func noMoreIndentMeansNoMoreFoo: D = ???
This would parse theresSomeIndentRequiredHere as part of foo, but noMoreIndentMeansNoMoreFoo would not get parsed as part of foo.
How can I do this without losing a ton of speed?
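One common approach, and the one Python's own tokenizer takes, is to handle indentation in a cheap pre-lexing pass that keeps a stack of indentation widths and emits explicit INDENT and DEDENT tokens, so the parser only ever sees ordinary terminals. Below is a minimal sketch of that pass in Python. The token names, the lex_indentation helper, and the assumption of space-only indentation are all illustrative, and wiring the tokens into a particular Earley implementation is not shown.

```python
def lex_indentation(source):
    """Yield (kind, text) tokens, adding INDENT/DEDENT based on leading spaces."""
    indent_stack = [0]  # widths of the indentation levels currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > indent_stack[-1]:
            # deeper than the current level: open one new block
            indent_stack.append(width)
            yield ("INDENT", "")
        while width < indent_stack[-1]:
            # shallower: close blocks until we are back at a known level
            indent_stack.pop()
            yield ("DEDENT", "")
        if width != indent_stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {line!r}")
        yield ("LINE", line.strip())
    # close any blocks still open at end of input
    while len(indent_stack) > 1:
        indent_stack.pop()
        yield ("DEDENT", "")


SOURCE = """\
func foo(a: A, b: B): C =
    theresSomeIndentRequiredHere(a, b)
func noMoreIndentMeansNoMoreFoo: D = ???
"""

tokens = list(lex_indentation(SOURCE))
for kind, text in tokens:
    print(kind, text)
```

The pass visits each line once, so its cost is linear in the input size, and the grammar can then treat INDENT and DEDENT exactly like opening and closing braces.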

How to invoke Erlang function with variable?

4> abs(1).
1
5> X = abs.
abs
6> X(1).
** exception error: bad function abs
7> erlang:X(1).
1
8>
Is there any particular reason why I have to use the module name when I invoke a function with a variable? This isn't going to work for me because, well, for one thing it is just way too much syntactic garbage and makes my eyes bleed. For another thing, I plan on invoking functions out of a list, something like (off the top of my head):
[X(1) || X <- [abs, f1, f2, f3...]].
Attempting to tack on various module names here is going to make the verbosity go through the roof, when the whole point of what I am doing is to reduce verbosity.
EDIT: Look here: http://www.erlangpatterns.org/chain.html The guy has made some pipe-forward function. He is invoking functions the same way I want to above, but his code doesn't work when I try to use it. But from what I know, the guy is an experienced Erlang programmer - I saw him give some keynote or whatever at a conference (well I saw it online).
Did this kind of thing used to work but not anymore? Surely there is a way I can do what I want - invoke these functions without all the verbosity and boilerplate.
EDIT: If I am reading the documentation right, it seems to imply that my example at the top should work (section 8.6) http://erlang.org/doc/reference_manual/expressions.html
I know abs is an atom, not a function. [...] Why does it work when the module name is used?
The documentation explains that (slightly reorganized):
ExprM:ExprF(Expr1,...,ExprN)
each of ExprM and ExprF must be an atom or an expression that
evaluates to an atom. The function is said to be called by using the
fully qualified function name.
ExprF(Expr1,...,ExprN)
ExprF
must be an atom or evaluate to a fun.
If ExprF is an atom the function is said to be called by using the implicitly qualified function name.
When using fully qualified function names, Erlang expects atoms or expressions that evaluate to atoms. In other words, you have to bind X to an atom: X = atom. That's exactly what you provide.
But in the second form, Erlang expects either an atom or an expression that evaluates to a function. Notice that last word. In other words, if you do not use a fully qualified function name, you have to bind X to a function: X = fun module:function/arity.
In the expression X = abs, abs is not a function but an atom. If you want to bind a function instead, you can do it like this:
D = fun erlang:abs/1.
or so:
X = fun(X)->abs(X) end.
Try:
X = fun(Number) -> abs(Number) end.
Updated:
After looking at the discussion more, it seems like you're wanting to apply multiple functions to some input.
There are two projects that I haven't used personally, but I've starred on Github that may be what you're looking for.
Both of these projects use parse transforms:
fun_chain https://github.com/sasa1977/fun_chain
pipeline https://github.com/stolen/pipeline
Pipeline is unique because it uses a special syntax:
Result = [fun1, mod2:fun2, fun3] (Arg1, Arg2).
Of course, it could also be possible to write your own function to do this using a list of {module, function} tuples and applying the function to the previous output until you get the result.
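The "write your own function" idea is essentially a fold over the list of functions. Here is a minimal sketch of it in Python rather than Erlang, just to show the shape. The names pipe, double and increment are made up for illustration. In Erlang the same thing would presumably be a lists:foldl over a list of funs such as fun erlang:abs/1.

```python
from functools import reduce


def pipe(value, functions):
    """Feed value through each function in turn, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def double(n):  # hypothetical helper, for illustration only
    return n * 2


def increment(n):  # hypothetical helper, for illustration only
    return n + 1


# Chain the functions: abs(-3) -> 3, double(3) -> 6, increment(6) -> 7
chained = pipe(-3, [abs, double, increment])
print(chained)

# The comprehension from the question: apply each function to the same input
applied = [fn(-3) for fn in [abs, double, increment]]
print(applied)
```

Note that the list holds function values, not names. That is the same distinction the answers above draw between the atom abs and fun erlang:abs/1.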

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol, there is no "path operator", in fact. The x/y is a syntactic element called path. As opposed to that the standalone / (delimited by spaces) is not a path, it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code ; == word!
etc.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
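To make the whitespace rule concrete, here is a toy sketch in Python (not Rebol, and nothing like Rebol's real lexer) that splits on whitespace first and only then decides what each token is. The classify function and its three categories are a deliberate simplification for illustration: it ignores numbers, strings, blocks, and words such as // that merely contain slashes.

```python
def classify(token):
    """Very rough stand-in for Rebol's lexical types (illustrative only)."""
    if token.startswith("/") and len(token) > 1:
        return "refinement!"  # e.g. /foo
    if "/" in token and token != "/":
        return "path!"  # e.g. x/y
    return "word!"  # e.g. x, y, or a lone /


source = "x/y x / y"
tokens = source.split()  # whitespace is the only delimiter in this sketch
kinds = [classify(token) for token in tokens]
for token, kind in zip(tokens, kinds):
    print(token, kind)
```

The point is only that x/y arrives as a single token while x / y arrives as three, so the slash never has to be disambiguated after the fact.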
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
>> probe data/x/y
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
>> probe quote data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any definitive written clarification, I did find that +, -, * and others are valid characters in a word, so an operator clearly requires surrounding spaces.
x*y
is a valid identifier, while
x * y
performs multiplication. It looks like the path operator is just another case of this.

When do you put double semicolons in F#?

This is a stupid question. I've been reading a couple books on F# and can't find anything that explains when you put ;; after a statement, nor can I find a pattern in the reading. When do you end a statement with double semi-colons?
In non-interactive F# code that isn't meant to be compatible with OCaml, you should never need a double semicolon. In the OCaml-compatible mode, you would use it at the end of a top-level function declaration (in recent versions, you can switch to this mode by using files with the .ml extension or by adding #light "off" to the top of the file).
If you're using the command-line fsi.exe tool or F# Interactive in Visual Studio then you'd use ;; to end the current input for F#.
When I'm posting code samples here at StackOverflow (and in the code samples from my book), I use ;; in the listing when I also want to show the result of evaluating the expression in F# interactive:
Listing from F# interactive
> "Hello" + " world!";;
val it : string = "Hello world!"
> 1 + 2;;
val it : int = 3
Standard F# source code
let n = 1 + 2
printf "Hello world!"
Sometimes it is also useful to show the output as part of the listing, so I find this notation quite useful, but I never explained it anywhere, so it's great that you asked!
Are you talking about F# proper or about running F# functions in F# Interactive? In F# Interactive, ;; forces execution of the code just entered. Other than that, ;; does not have any special meaning that I know of.
In F#, the only place ;; is required is to end expressions in the interactive mode.
;; is left over from the transition from OCaml, where in turn it is left over from Caml Light. Originally ;; was used to end top-level "phrases"--that is, let, type, etc. OCaml made ;; optional since the typical module consists of a series of let statements with maybe one statement at the end to call the main function. If you deviate from this pattern, you need to separate the statements with ;;. Unfortunately, in OCaml, when ;; is optional versus required is hard to learn.
However, F# introduces two relevant modifications to OCaml syntax: indentation and do. Top-level statements have to go inside a do block, and indentation is required for blocks, so F# always knows that each top-level statement begins with do and an indent and ends with an outdent. No more ;; required.
Overall, all you need to know is that [O']Caml's syntax sucks, and F# fixes a lot of its problems, but maintains a lot of confusing backward compatibility. (I believe that F# can still compile a lot of OCaml code.)
Note: This answer was based on my experience with OCaml and the link Adam Gent posted (which is unfortunately not very enlightening unless you know OCaml).
Symbol and Operator Reference (F#)
http://msdn.microsoft.com/en-us/library/dd233228(v=VS.100).aspx
Semi Colon:
•Separates expressions (used mostly in verbose syntax).
•Separates elements of a list.
•Separates fields of a record.
Double Semi Colon:
http://www.ffconsultancy.com/products/fsharp_journal/free/introduction.html
Articles in The F#.NET Journal quote F# code as it would appear in an interactive session. Specifically, the interactive session provides a > prompt, requires a double semicolon ;; identifier at the end of a code snippet to force evaluation, and returns the names (if any) and types of resulting definitions and values.
I suspect that you have seen F# code written when #light syntax wasn't enabled by default (#light syntax is on by default for the May 2009 CTP and later ones as well as for Visual Studio 2010) and then ;; means the end of a function declaration.
So what is #light syntax? It comes with the #light declaration:
The #light declaration makes
whitespace significant. Allowing the
developer to omit certain keywords
such as in, ;, ;;, begin, and end.
Here's some code written without #light syntax:
let halfWay a b =
    let dif = b - a in
    let mid = dif / 2 in
    mid + a;;
and with light syntax it becomes:
#light
let halfWay a b =
    let dif = b - a
    let mid = dif / 2
    mid + a
As said you can omit the #light declaration now (which should be the case if you're on a recent CTP or Visual Studio 2010).
See also this thread if you want to know more about the #light syntax: F# - Should I learn with or without #light?
The double semi-colon is used to mark the end of a block of code that is ready for evaluation in F# interactive when you are typing directly into the interactive session. For example, when using it as a calculator.
This is rarely seen in F# because you typically write code into a script file, highlight it and use ALT+ENTER to have it evaluated, with Visual Studio effectively injecting the ;; at the end for you.
OCaml is the same.
Literature often quotes code written as it would appear if it had been typed into an interactive session because this is a clear way to convey not only the code but also its inferred type. For example:
> [1; 2; 3];;
val it : int list = [1; 2; 3]
This means that you type the expression [1; 2; 3] into the interactive session followed by the ;; denoting the end of a block of code that is ready to be evaluated interactively and the compiler replies with val it : int list = [1; 2; 3] describing that the expression evaluated to a value of the type int list.
The double semicolon most likely comes from OCaml since that is what the language is based on.
Basically it's there for historical reasons, and you need it for the evaluator (REPL) if you use it.
There is no purpose for double semi-colons (outside of F# interactive). The semi-colon, according to MSDN:
Separates expressions (used mostly in verbose syntax).
Separates elements of a list.
Separates fields of a record.
Therefore, in the first instance, ;; would be separating the expression before the first semi-colon from the empty expression after it but before the second semi-colon, and separating that empty expression from whatever came after the second semi-colon (just as in, say C# or C++).
In the instance of the list, I suspect you'd get an error for defining an empty list element.
With regards to the record, I suspect it would be similar to separating expressions, with the empty space between the semi-colons effectively being ignored.
F# interactive executes the entered F# on seeing a double semi-colon.
(Updated to cover F# interactive, courtesy of mfeingold)
The history of the double semicolon can be traced back to the beginnings of ML when semicolons were used as a separator in lists instead of commas. In this ICFP 2010 - Tribute to Robin Milner video around 50:15 Mike Gordon mentions:
There was a talk on F# where someone asked "Why is there double semicolon on the end of F# commands?" The reason is the separator in lists in the original ML is semicolons, so if you wanted a list 1;2;3; and put it on separate lines- if you ended a line with semicolon you were not ending the phrase, so using double semicolon meant the end of the expression. Then in Standard ML the separator for lists became comma, so that meant you could use single semicolons to end lists.

What can you NOT use an identifier for?

I'm trying to understand what identifiers represent and what they don't represent.
As I understand it, an identifier is a name for a method, a constant, a variable, a class, a package/module. It covers a lot. But what can you not use it for?
Every language differs in terms of what entities/abstractions can or cannot be named and reused in that language.
In most languages, you can't use an identifier for infix arithmetic operations.
For example, plus is an identifier and you can make a function named plus. But while you can write a = b + c;, there's no way to define an operator named plus to make a = b plus c; work, because the language grammar simply does not allow an identifier there.
An identifier allows you to assign a name to some data, so that you can reference it later. That is the limit of what identifiers do; you cannot "use" it for anything other than a reference to some data.
That said, there are a lot of implications that come from this, some subtle. For example, in most languages functions are, to some degree or another, considered to be data, and so a function name is an identifier. In languages where functions are values, but not "first-class" values, you can't use an identifier for a function in every place you could use an identifier for something else. In some languages, there will even be separate namespaces for functions and other data, and so what is textually the same identifier might refer to two different things, and they would be distinguished by the context in which they are used.
An example of what you usually (i.e., in most languages) cannot use an identifier for is as a reference to a language keyword. For example, this sort of thing generally can't be done:
let during = while;
during (true) { print("Hello, world."); }
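The same restriction can be checked mechanically in a language that exposes its keyword list. As a small illustration in Python (my choice of language, not something from the pseudocode above), the standard keyword module reports reserved words, and compile shows that a keyword is rejected where an expression is expected. The compiles helper is made up for this sketch.

```python
import keyword


def compiles(source):
    """Return True if source parses as Python code, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")  # parses only, does not run it
        return True
    except SyntaxError:
        return False


# "while" is reserved; "during" is an ordinary identifier
print(keyword.iskeyword("while"))
print(keyword.iskeyword("during"))

# Aliasing a function works, aliasing a keyword does not even parse
alias_function_ok = compiles("during = print")
alias_keyword_ok = compiles("during = while")
print(alias_function_ok, alias_keyword_ok)
```

An ordinary function such as print is a value and can be given another name, while the keyword is part of the grammar and has no value to bind.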
You could say it's used for everything that you'll want to refer to multiple times, or maybe even once (but use it to clarify the referent's purpose).
What can/can't be named differs per language, it's often quite intuitive, IMHO.
An "Anonymous" entity is something which is not named, although referred to somehow.
#!/usr/bin/perl
$subroutine = sub { return "Anonymous subroutine returning this text"; };
In Perl-speak, this is anonymous - the subroutine is not named, but it is referred to by the reference variable $subroutine.
PS: In Perl, the subroutine would be named like this:
sub NAME_HERE {
    # some code...
}
Say, in Java you cannot write something like:
Object myIf = if;
myIf (a == b) {
    System.out.println("True!");
}
So, you cannot name some code statement, giving it an alias. While in REBOL it is perfectly possible:
myIf: if
myIf a = b [print "True!"]
What can and what can't be named depends on language, as you see.
As its name implies, an identifier is used to identify something, so you can use an identifier for anything that can be identified uniquely. A literal (e.g. a string literal), however, is not unique, so you can't use an identifier for it. You can, though, create a variable and assign a string literal to it.
Making soup out of them is rather foul.
In languages such as Lisp, an identifier exists in its own right as a symbol, whereas in languages which are not introspective, identifiers don't exist at runtime.
You write a literal identifier/symbol by putting a single quote in front of it:
[1]> 'a
A
You can create a variable and assign a symbol literal to it:
[2]> (setf a 'Hello)
HELLO
[3]> a
HELLO
[4]> (print a)
HELLO
HELLO
You can set two variables to the same symbol
[10]> (setf b a)
HELLO
[11]> b
HELLO
[12]> a
HELLO
[13]> (eq b a)
T
[14]> (eq b 'Hello)
T
Note that the values bound to b and a are the same, and the value is the literal symbol 'Hello
You can bind a function to the symbol
[15]> (defun hello () (print 'hello))
HELLO
and call it:
[16]> (hello)
HELLO
HELLO
In Common Lisp, the variable binding and the function binding are distinct:
[19]> (setf hello 'goodbye)
GOODBYE
[20]> hello
GOODBYE
[21]> (hello)
HELLO
HELLO
but in Scheme or JavaScript the bindings are in the same namespace.
There are many other things you can do with identifiers if they are reified as symbols. I suspect that someone more knowledgeable than me in Lisp will be able to demonstrate that most of the things you "can't do with identifiers" can in fact be done.
But even Lisp can not make identifier soup.
Sort of a left-field thought, but JSON has all those quotations in it to eliminate the danger of a JavaScript keyword messing up the parsing.

Resources