What are the naming conventions in Rascal? It seems that modules, but not intermediate paths, tend to be upper case, and so do variable names. Does it make a difference? What is the convention and the rationale behind it?
We are working towards the convention that:
identifiers for functions, variables, constructors, and field names of tuples and constructors start with a lowercase letter and continue in camelCase.
user-defined types, such as alias, data, syntax, and lexical non-terminals, start with an uppercase letter and continue in CamelCase.
The rationale is that in Rascal syntax definitions we currently need a syntactic difference between type names and label names to prevent ambiguity, and we chose to have one start with an uppercase letter and the other with a lowercase letter. The above convention continues in that vein for the rest of the language for the sake of consistency, but is yet to be formalized.
I have an app that (among other things) supports plain-text searches and searches using Lua patterns. As a convenience, the app supports case-insensitive searches. Here is an image snippet:
The code that transforms the given Lua pattern into a case-insensitive Lua pattern isn't too pretty. It basically worries about whether or not a character is preceded by an odd or even number of escapes (%) and whether or not it is located inside of square brackets. The pattern shown in the image becomes %a[bB][bB]%%[cC][%abB%%cC]
I haven't had a chance to learn LPeg yet, and I suppose this could be my motivator.
My question is whether this is something that LPeg could have handled easily?
Yes, but for an easier entry into the LPeg world, consider LPeg's "re" module, which gives you a regex-like syntax in which you can specify a set of rules, as in a grammar (think Yacc, etc.). You'd basically write rules for escaped characters, bracket groups and regular characters. Then you could associate functions with the rules that emit either the same text they consumed as input or the case-insensitive modified version.
The structure of your rules would take care of the even-odd distinction automatically, bracket context, etc. LPeg uses "ordered choice", so if you add your escape rule first, it will handle %[ correctly and avoid mixing it up with the brackets rule, for example.
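The three rules can be tried out without LPeg. What follows is only a minimal Python sketch of the same idea, not LPeg and not the asker's code; the function name is hypothetical. It tries the escape rule first, as ordered choice would, and it deliberately ignores ranges such as [a-z], a leading ] inside a set, and the %b and %f items, all of which would need rules of their own.

```python
def case_insensitive_lua_pattern(pattern: str) -> str:
    """Hypothetical helper: rewrite a Lua pattern so ASCII letters match either case.

    Simplifying assumptions: no ranges like [a-z], no %b / %f items,
    and no literal ']' as the first character of a set.
    """
    out = []
    in_set = False  # True while we are between '[' and ']'
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "%" and i + 1 < n:
            # Rule 1 (tried first): '%' plus the next character is copied verbatim.
            # Consuming both characters is what makes '%%' and '%[' come out right.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[" and not in_set:
            # Rule 2: entering a bracket set.
            in_set = True
            out.append(ch)
        elif ch == "]" and in_set:
            in_set = False
            out.append(ch)
        elif ch.isascii() and ch.isalpha():
            # Rule 3: a plain letter becomes both cases, wrapped in
            # brackets only when we are not already inside a set.
            pair = ch.lower() + ch.upper()
            out.append(pair if in_set else "[" + pair + "]")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(case_insensitive_lua_pattern("%abb%%c[%ab%%c]"))
```

Because the escape rule always consumes two characters, the odd/even count of % signs never has to be tracked explicitly.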
I am trying to understand how to use EBNF to define a formal grammar, in particular a sequence of words separated by a space, something like
<non-terminal> [<word>[ <word>[ <word>[ ...]]] <non-terminal>
What is the correct way to define a word terminal?
What is the correct way to represent required whitespace?
How are optional, repetitive lists represented?
Are there any show-by-example tutorials on EBNF anywhere?
Many thanks in advance!
You have to decide whether your lexical analyzer is going to return a token (terminal) for the spaces. You also have to decide how it (the lexical analyzer) is going to define words, or whether your grammar is going to do that (in which case, what is the lexical analyzer going to return as terminals?).
For the rest, it is mostly a question of understanding the niceties of EBNF notation, which is an ISO standard (ISO 14977:1996 — and it is available as a free download from Freely Available Standards, which you can also get to from ISO), but it is a standard that is largely ignored in practice. (The languages I deal with — C, C++, SQL — use a BNF notation in the defining documents, but it is not EBNF in any of them.)
A word is whatever you want the correct definition of a word to be. You need to think about how you'd want to treat the name P. J. O'Neill, for example. What tokens will the lexical analyzer return for that?
Required whitespace is closely related to the previous issue: which terminals is the lexical analyzer going to return?
Optional repetitive lists are enclosed in { and } braces, or you can use the Kleene Star notation.
There is a paper Extended BNF — A generic base standard by R. S. Scowen that explains EBNF. There's also the Wikipedia entry on EBNF.
I think that a non-empty, space-separated word list might be defined using:
non_empty_word_list = word { space word }
where all the names there are non-terminals. You'd need to define those in terms of the relevant terminals of your system.
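To make the rule concrete, here is a small Python sketch of a recognizer for it. The definitions of word (one or more letters) and space (exactly one space character) are assumptions made for illustration, as is the function name; your own lexical rules may well differ.

```python
def parse_word_list(text):
    """Return the list of words if text matches: word { space word }.

    Assumed terminals (not from the grammar above):
      word  = one or more alphabetic characters
      space = exactly one ' ' character
    Returns None when the text does not match.
    """

    def word_end(start):
        # Scan a word starting at 'start'; return the index just past it,
        # or None if there is no letter at 'start'.
        end = start
        while end < len(text) and text[end].isalpha():
            end += 1
        return end if end > start else None

    # The mandatory first word.
    end = word_end(0)
    if end is None:
        return None
    words = [text[:end]]
    pos = end

    # The repetition { space word }: zero or more times.
    while pos < len(text) and text[pos] == " ":
        end = word_end(pos + 1)
        if end is None:
            return None  # a space must be followed by a word
        words.append(text[pos + 1:end])
        pos = end

    # Anything left over means the whole input did not match.
    return words if pos == len(text) else None


print(parse_word_list("alpha beta gamma"))
```

Under these assumed terminals, a name such as O'Neill is rejected, which is exactly the kind of decision the definition of word has to settle.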
I wonder if there is a relationship between untyped/typed code quotations in F# and the hygiene of macro systems. Do they solve the same issues in their respective languages or are they separate concerns?
The meta-programming aspect is the only similarity, and even in that regard there is a big difference. You can think of a macro's transformer as a function from syntax to syntax, much as you manipulate quotations, but the transformers are globally coordinated so that names used as binders follow a specific protocol:
1) Binders may not be the same as any free name in input to the macro (unless you use an unhygienic escape hatch)
2) Names bound in a macro definition's context that are free in the macro's expansion must point to the same thing at macro use time. (this needs global coordination)
Choices for names are made so that expansion does not fail if you used the wrong name (unless it turns out that name is unbound).
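Rule 1 can be illustrated with a toy example. The Python sketch below is not a macro system: it expands a hypothetical "swap" macro by building source text, first with a fixed binder name and then with a freshly generated one (gensym-style renaming, which only approximates what a hygienic expander does automatically). All names in it are made up for illustration.

```python
import itertools

_fresh_ids = itertools.count()  # source of unique suffixes for generated names


def swap_unhygienic(a, b):
    # The binder 'tmp' is fixed, so it is captured if the caller passes 'tmp'.
    return f"tmp = {a}\n{a} = {b}\n{b} = tmp\n"


def swap_hygienic(a, b):
    # Pick a binder name that is assumed not to occur in the caller's code.
    fresh = f"_tmp_{next(_fresh_ids)}"
    return f"{fresh} = {a}\n{a} = {b}\n{b} = {fresh}\n"


def run(code, env):
    # Execute the expanded code with 'env' as its local namespace.
    exec(code, {}, env)
    return env


bad = run(swap_unhygienic("tmp", "other"), {"tmp": 1, "other": 2})
good = run(swap_hygienic("tmp", "other"), {"tmp": 1, "other": 2})
print(bad["tmp"], bad["other"])    # 2 2: the original value of tmp was lost
print(good["tmp"], good["other"])  # 2 1: the values were swapped
```

A real hygienic expander does this renaming for every binder and also enforces rule 2, which this sketch does not attempt.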
Transformers of typed quotations do not have this definition time context idea. You manipulate quotations to form a program that does not refer to any names in your program. They are not meant to provide a syntactic abstraction mechanism. Arbitrary shapes of syntax? Nope. It all has to be core AST shapes.
Open code in typed quotation systems can be closed with anything that fits the type structure of the expected context - there is no coordinated composition of several open components into a coherent structure.
Quotations are a form of meta-programming. They allow you to manipulate abstract syntax trees programmatically, which can in turn be spliced into code and evaluated.
Typed quotations embed the reified type of the AST in the host language's type system, so they ensure you cannot generate ill-typed fragments of code. Untyped quotations do not offer that guarantee (it may fail with a runtime error).
As an aside, typed quotations are strongly similar to Template Haskell quasiquotations.
Hygienic macros in Lisp-like languages are related, in that they exist to support meta-programming. The hygiene, however, only addresses name capture, something that typed quasi quotations already avoid (and more).
So yes, they are similar, in that they are mechanisms for meta-programming in typed and untyped languages, respectively. Both typed quasi quotes and hygienic macros add safety to fully untyped, unsound meta-programming. The level of guarantee they offer the programmer is different, though. The typed quotes are strictly stronger.
I learned to program with Delphi, and I always liked the Object Pascal code style; it looks very intuitive and clean.
When you look at a variable declaration, you know what you are dealing with.
A fast summary:
Kind               Prefix  Example
Exception          E       EMyError
Classes and types  T       TMyClass
Fields in classes  f       fVisible
Events             On      OnMouseDown
Pointer types      P       PMyRecord
Property getter    Get     GetSomething
Property setter    Set     SetSomething
Is it bad to use this identifier naming style in C++, C#, Java, or any other language?
Aside from taste and cultural issues (as already pointed out by Mason):
There might be reasons why a convention is tied to a certain language, and the other languages might also have reasons for theirs.
I can only quickly think of a few examples though:
In languages that don't require a pointer type to be defined before use (like most non-Borland Pascals, C etc.), the "P" prefix is rarely necessary.
Other languages might also have additional means of disambiguating (as in C, where types are often upper-cased and the variables or fields get the lowercase identifier), and so do not need "T". (Strictly speaking, Delphi doesn't either, at least for fields, since identifier lookup is somewhat context dependent (like separate namespaces for fields and types), but the convention is older than that feature.)
BTW, you forgot "I" for interfaces, and enum names being prefixed with some prefix derived from the base type name (e.g.
TStringsDefined = set of (sdDelimiter, sdQuoteChar, sdNameValueSeparator,
sdLineBreak, sdStrictDelimiter)
)
etc.
Hmm, this is another language-specific bit, since Object Pascal always adds enum names to the global space (instead of requiring enumtype.enumname). With a prefix there is less pollution of the global space.
That is one of my pet peeves with Delphi, by the way: the lack of import control (Modula-2 style IMPORT QUALIFIED, FROM xxx IMPORT; Extended Pascal also has some of this).
As far as I know, the T, E, F, and P prefixes are only commonly used in Delphi programming. They're a standard part of the idiom here, but in C# or Java they'd look out of place.
Get and Set are pretty standard across object-oriented programming. Not sure about the On prefix, but it wouldn't surprise me to find that that's common in any event-driven framework.
I come across the following in code.
_name1
_name2
smeEGiGross:
In general, what does _name1 underscore mean in Delphi 4?
I think it's just a common practice to begin variable names with underscores.
Rules for variable (and component) names in Delphi:
A name can be any length, but Delphi uses only the first 255 characters.
The first character must be a letter or an underscore, not a number.
You cannot use any special characters such as a question mark (?), but you can use an underscore (_).
No spaces are allowed in the name of a variable.
Reserved words (such as begin, end, if, program) cannot be used as variables.
Delphi is case insensitive: it does not matter whether capital letters are used or not. Just make sure the way variables or components are used is consistent throughout the program.
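The rules above are mechanical enough to check in a few lines. This is a rough Python sketch, not a Delphi tool: the function names are hypothetical, and the reserved-word set contains only the four words mentioned above, whereas the real list is much longer.

```python
import re

# Only the examples named in the rules above; Delphi reserves many more words.
RESERVED = {"begin", "end", "if", "program"}

# A letter or underscore first, then letters, digits or underscores.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name):
    """Check the shape of the name and that it is not a (known) reserved word."""
    return bool(_IDENT.fullmatch(name)) and name.lower() not in RESERVED


def same_identifier(a, b):
    """Names are equal if they match case-insensitively in the first 255 characters."""
    return a[:255].lower() == b[:255].lower()


print(is_valid_identifier("_name1"))       # True: a leading underscore is allowed
print(is_valid_identifier("1name"))        # False: starts with a digit
print(same_identifier("MyVar", "myvar"))   # True: case does not matter
```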
It's a convention to help determine scope of a variable by its name, like private class members. The original author probably uses C++ as well.
In Delphi, I prefer to prefix fields with "F", method parameters with "a" (argument) and local variables with "l".
Update:
Another place you might see underscores is after certain identifiers in code generated with WSDLImp or TLBImp to avoid confusion with existing Delphi identifiers. For example, unless you specify otherwise, "Name" is renamed (no pun intended) to "Name_".
To add to the answer of phoenix:
* You can use reserved words as identifiers, but you must prefix them with an & sign: &then, &unit.
I only use this if no other name is appropriate.
I associate leading underscores with C/C++, not with Delphi. The C syntax is more character oriented (like {, }, || and &&), so the underscores fit perfectly, while the Delphi/Pascal syntax is more text oriented (begin, end, or, and), so the underscores look a bit strange because you don't expect them there.
It's regularly used for scope.
I declare all my private variables with a _ at the beginning, which is more common in C#/C++.
It's for readability and doesn't necessarily mean anything.
I cannot say what the author of the code you have in mind was thinking, but an underscore prefix has a fairly well recognised convention in relation to COM.
Members in a COM interface that have an underscore prefix are (or were) hidden by default by interface browsers/inspectors, such as VB's library browser etc. This is used when members are published by necessity but are not intended to be used directly.
The _AddRef and _Release members of the IUnknown interface are perhaps the most obvious example of this.
I've adopted this convention myself, so that for example if I declare a unit implementation variable that is exposed through the unit interface via an accessor function I will name the variable with an underscore prefix. It's not strictly necessary in that sort of example but it acts as documentation for myself (and anyone else reading my code that is aware of the convention, which is fairly self-evident from the context).
Another example is when I have a function pointer that may refer to different functions according to runtime conditions so the actual functions that the pointer may refer to are not intended to be called directly but are instead intended to be invoked via the function pointer.
So speaking for myself, I use it as a warning/reminder: proceed with caution, you are not expected to reference this symbol directly.