COBOL sections and paragraphs are subdivided into sentences. In older COBOL versions (without explicit scope delimiters, e.g. END-IF), defining multiple sentences per section/paragraph was required to limit the scope of conditional statements (e.g. IF).
Are there any use cases where defining multiple sentences is required in newer COBOL versions? Or are sentences just there for historical reasons?
As Bill Woodger says, sentences only exist now for backwards compatibility.
There is now only one place where multiple sentences must be used: in DECLARATIVES, where the USE statement must be in its own sentence.
DECLARATIVES.
a-file-error SECTION.
    *> The USE statement must be a sentence of its own,
    *> terminated by a period.
    USE AFTER STANDARD ERROR PROCEDURE ON a-file.
    *> The statements that follow form a second sentence.
    DISPLAY "Oops"
    .
END DECLARATIVES.
Identifiers typically consist of underscores, digits, and uppercase and lowercase letters, where the first character is not a digit. When writing lexers, it is common to have helper functions such as is_digit or is_alnum. If one were to implement such a function to scan a character used in an identifier, what would it be called? Clearly, is_identifier is wrong, as that would be the entire token that the lexer scans and not the individual character. I suppose is_alnum_or_underscore would be accurate, though quite verbose. For something as common as this, I feel like there should be a single word for it.
Unicode Annex 31 (Unicode Identifier and Pattern Syntax, UAX31) defines a framework for the definition of the lexical syntax of identifiers, which is probably as close as we're going to come to a standard terminology. UAX31 is used (by reference) by Python and Rust, and has been approved for C++23. So I guess it's pretty well mainstream.
UAX31 defines three sets of identifier characters, which it calls Start, Continue and Medial. All Start characters are also Continue characters; no Medial character is a Continue character.
That leads to the simple regular expression (UAX31-D1 Default Identifier Syntax):
<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*
A programming language which claims conformance with UAX31 does not need to accept the exact membership of each of these sets, but it must explicitly spell out the deviations in what's called a "profile". (There are seven other requirements, which are not relevant to this question. See the document if you want to fall down a very deep rabbit hole.)
That can be simplified even more, since neither UAX31 nor (as far as I know) the profile for any major language places any characters in Medial. So you can go with the flow and just define two categories: identifier-start and identifier-continue, where the first one is a subset of the second one.
You'll see that in a number of grammar documents:
Python:
    identifier ::= xid_start xid_continue*

Rust:
    IDENTIFIER_OR_KEYWORD : XID_Start XID_Continue*
                          | _ XID_Continue+

C++:
    identifier:
        identifier-start
        identifier identifier-continue
So that's what I'd suggest. But there are many other possibilities:
Swift: calls the sets identifier-head and identifier-characters.
Java: calls them JavaLetter and JavaLetterOrDigit.
C: defines identifier-nondigit and identifier-digit; Continue would be the union of the two sets.
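Following that suggestion, here is a minimal sketch of the two helpers in Python (the names and the scan_identifier driver are my own; the character classes only approximate XID_Start/XID_Continue):

def is_identifier_start(ch: str) -> bool:
    # Start set: letters plus underscore. str.isalpha is Unicode-aware,
    # so this is a rough approximation of XID_Start, not an exact match.
    return ch.isalpha() or ch == "_"

def is_identifier_continue(ch: str) -> bool:
    # Continue set: every Start character plus the decimal digits.
    return is_identifier_start(ch) or ch.isdigit()

def scan_identifier(text: str, pos: int) -> str:
    # Consume <Start> <Continue>* beginning at pos; "" if nothing matches.
    if pos >= len(text) or not is_identifier_start(text[pos]):
        return ""
    end = pos + 1
    while end < len(text) and is_identifier_continue(text[end]):
        end += 1
    return text[pos:end]

print(scan_identifier("foo_bar42 + 1", 0))  # prints foo_bar42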
I am trying to build a parser (using Yacc) for first-order logic languages, but I am a bit confused. FOL admits many languages: each language can have its own sets of functions, relations, and named variables. Small languages may be general and include very few symbols, like animal, human, object. But rich languages can be very specific and include a great many symbols, like cat, dog, monkey, John, Peter, table, spoon, etc.
Is there a way to introduce a parameterized set of symbols that can be almost infinite, like f(1), f(2), f(3), f(i), ..., and then have both specific grammar rules for particular symbols and general grammar rules, such as ones that apply to f(i) for even i and to f(j) for odd j? I have heard about https://www.grammaticalframework.org/ and categorial grammars, which have categories, but they have a fixed set of constants too.
So I guess that, if I would like to make my system open and ready to accommodate a growing set of symbols, I should be prepared to update the grammar, recompile the parser with Yacc, and define some generalized classes?
Is there an approach to handling evolving grammars in software? Can Yacc do that?
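As an illustration of the parameterized-symbol idea (this sketch is mine, not an answer from the thread): a scanner can map the unbounded family f(1), f(2), f(3), ... onto a small, fixed set of token classes, so the grammar itself never has to grow. In Python:

import re

# Any member of the open-ended symbol family matches one pattern.
TOKEN_RE = re.compile(r"f\((\d+)\)")

def classify(symbol: str) -> str:
    """Return a fixed token class for any member of the symbol family."""
    m = TOKEN_RE.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not in the symbol family: {symbol!r}")
    index = int(m.group(1))
    # "Generalized rules" key on a property of the index, not its value,
    # e.g. the even/odd distinction mentioned in the question.
    return "F_EVEN" if index % 2 == 0 else "F_ODD"

print(classify("f(2)"))   # F_EVEN
print(classify("f(17)"))  # F_ODD

A Yacc grammar would then mention only the token classes (here F_EVEN and F_ODD), and new symbols never require recompiling the parser.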
In Java it's possible to write 1_000_000 instead of 1000000 for better readability. Is there something equivalent in F#?
This question was already asked on the feature request page, and the current status of this change request is "planned" and "approved in principle". So it may be implemented in one of the next releases.
You can find more information about this request (such as a summary of the feature, the motivation, and suggested implementation details) on the GitHub page for F#:
Summary
Allow underscores between any digits in numeric literals. This feature enables you, for example, to separate groups of digits in numeric literals, which can improve the readability of your code.
For instance, if your code contains numbers with many digits, you can use an underscore character to separate digits in groups of three, similar to how you would use a punctuation mark like a comma, or a space, as a separator.
Motivation
This is a popular feature in other languages. Some other languages with a similar feature:
Perl
Ruby
Java 7
C++11 (use single quote)
just to name a few...
Detailed design
You can place underscores only between digits. You cannot place underscores in the following places:
At the beginning or end of a number
Adjacent to a decimal point in a floating point literal
Prior to an F or L or other suffix
In positions where a string of digits is expected
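For comparison, the same placement rules are already implemented in Python (PEP 515, Python 3.6+), so the proposed design can be tried out directly; this example is my own, not part of the F# proposal:

# Underscores are legal only between digits.
population = 1_000_000        # same value as 1000000
mask = 0xFF_FF_00_00          # grouping also works in hex literals
assert population == 1000000

# Each of these violates the rules above and raises a SyntaxError:
#   1000_     (at the end of a number)
#   1_.0      (adjacent to the decimal point)
#   10_j      (prior to a suffix)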
I am confused by the words syntax and grammar. Is there a reason that, for computer languages, we always use the word syntax to describe the word order, and not the word grammar?
The terms "syntax" and "grammar" both come from the field of linguistics. In linguistics, syntax refers to the rules by which sentences are constructed. Grammar refers to how the rules of the language relate to one another.
Grammar actually covers syntax, morphology, and phonology. Morphology is the set of rules for how words can be modified to add meaning or context. Phonology is the set of rules for how words should sound (which in turn governs how spelling works in that language).
So, how did concepts from linguistics get adopted by programmers?
If you look at really old papers and publications related to computing, for example Turing's seminal work on computability (Turing machines) or, even older, Babbage's publications describing his Analytical Engine and Ada Lovelace's publications on programming, you'll find that they don't refer to computer programs as languages. Instead, they were just referred to as instructions or, if you want to get fancy, algorithms.
It was partly, perhaps mostly, the work of Noam Chomsky that related languages to programming.
Looking for a new way to study languages and how to extract meaning from sentences, Chomsky created the Chomsky hierarchy. His idea was to start with the most general system that could process a string of "stuff" (sounds, letters, words), the Turing machine, and to classify the grammars a Turing machine can recognize as type 0. He then defined the progressively more restricted grammar types 1, 2 and 3, hoping that by understanding how complexity gets introduced we would end up with a parser for human languages such as English or Swahili.
Most programming languages have type-2 (context-free) grammars. Indeed, we have built parsers for types 0, 1 and 2 in the form of language interpreters and CPU designs.
Inheriting Chomsky's work, we have defined "syntax" in computing to mean how symbols are arranged to implement a language feature and "grammar" to mean the collection of syntax rules.
Because a language has only "one" syntax (the set of strings it will accept), but probably very many grammars, even if we exclude trivial variants.
This may be clearer if you think about the phrase, "the language syntax allows stuff". This phrase is independent of any grammars that might be used to describe the syntax.
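To make that concrete, here is a hypothetical pair of grammars, in the same informal notation used earlier, that describe exactly the same language (sums of single digits) and therefore the same syntax:

<sum> := <sum> "+" <digit> | <digit>      (left-recursive)
<sum> := <digit> "+" <sum> | <digit>      (right-recursive)

Both accept precisely the same strings, such as "1+2+3", yet they are different grammars and yield different parse trees.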
Our application is being translated into a number of languages, and we need to have a combo box that lists the possible languages. We'd like to use the name of the language in that language (e.g. Français for French).
Is there any "proper" order for listing these languages? Do we alphabetize them based on their English names?
Update:
Here is my current list (I want to explore the Unicode Collation Algorithm that Brian Campbell mentioned):
"العربية",
"中文",
"Nederlands",
"English",
"Français",
"Deutsch",
"日本語",
"한국어",
"Polski",
"Русский язык",
"Español",
"ภาษาไทย"
Update 2: Here is the list generated by the ICU Demonstration tool, sorting for an en-US locale.
Deutsch
English
Español
Français
Nederlands
Polski
Русский язык
العربية
ภาษาไทย
한국어
中文
日本語
This is a tough question without a single, easy answer. First of all, by default you should use the user's preferred language, as given to you by the operating system, if that is one of your available languages (for example, in Windows, you would use GetUserPreferredUILanguages, and find the first one on that list that you have a translation for).
If the user still needs to select a language (you would like them to be able to override their default language, or to select another language if you don't support their preferred one), then you'll need to worry about how to sort the languages. If you have 5 or 10 languages, the order probably doesn't matter that much; you might go for sorting them in alphabetical order. For a longer list, I'd put your most common languages at the top, perhaps along with the user's preferred languages, and then sort the rest in alphabetical order after that.
Of course, this brings up how to sort alphabetically when languages might be written in different scripts. For instance, how does Ελληνικά (Ellinika, Greek) compare to 日本語 (Nihongo, Japanese)? There are a few possible solutions. You could sort each script together, with, for instance, Roman-based scripts coming first, followed by Cyrillic, Greek, Han, Hangul, and so on. Or you could sort non-Roman scripts by their English name, or by a Roman transliteration of their native name. Probably the first or third solution should be preferred; people may not know the English name for their language, but many languages have English transliterations that people may know about. The first solution (each script sorted separately) is how the Mac OS X language selection works; the third (sorted by Roman transliteration) appears to be how Wikipedia sorts languages.
I don't believe that there is a standard for this particular usage, though there is the Unicode Collation Algorithm which is probably the most common standard for sorting text in mixed scripts in a relatively language-neutral way.
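As a rough illustration (mine, not from the answer): the Unicode Collation Algorithm is implemented by ICU, so the list from the question can be sorted with the PyICU binding, assuming it is installed (pip install PyICU):

from icu import Collator, Locale

languages = [
    "العربية", "中文", "Nederlands", "English", "Français", "Deutsch",
    "日本語", "한국어", "Polski", "Русский язык", "Español", "ภาษาไทย",
]

# Build a collator for the en-US locale; non-Latin scripts fall back
# to ICU's language-neutral root collation rules.
collator = Collator.createInstance(Locale("en_US"))
for name in sorted(languages, key=collator.getSortKey):
    print(name)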
I would say it depends on the length of your list.
If you have 5 languages (or any number that easily fits into the dropdown without scrolling), then I'd say put your most common language at the top and alphabetize the rest... though just alphabetizing them all wouldn't make it any less user-friendly, IMHO.
If you have enough that you'd need to scroll, I would put your top 3 or 5 (or some appropriate number of) most common languages at the top, bold them in the list, and then alphabetize the rest of the options.
For a long list I would probably list common languages twice.
That is, "English" would appear at the top of the list and at the point in the alphabetized list where you'd expect.
EDIT: I think you would still want to alphabetize them according to how they're listed... that is, "Español" would appear in the E's, not in the S's as if it were "Spanish".
Users will be able to pick up on the fact that languages are listed according to their translated name.
EDIT2: Now that you've edited to show the languages you're interested in, I can see how a sort routine would be a bit more challenging!
The ISO has codes for languages (here's the Library of Congress description), which are offered in order by the code, by the English name, and by the French name.
It's tricky. I think as a user I would expect any list to be ordered based on how the items are represented in the list. So as much as possible, I would use alphabetical order based on the names you are actually displaying.
Now, you can't always do that, as many will use other alphabets. In those cases there may be a Roman-alphabet way of transliterating the name (for example, the Pinyin system for Mandarin Chinese), and it could make sense to alphabetize based on that. However, romanization isn't a simple subject; there are at least a dozen ways of romanizing Arabic, for example.
You could alphabetize them based on their ISO 639 language code.
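For instance, a minimal sketch (the code-to-name mapping is mine) that sorts the question's list by ISO 639-1 code:

languages = {
    "ar": "العربية", "de": "Deutsch", "en": "English", "es": "Español",
    "fr": "Français", "ja": "日本語", "ko": "한국어", "nl": "Nederlands",
    "pl": "Polski", "ru": "Русский язык", "th": "ภาษาไทย", "zh": "中文",
}

# Sorting by the two-letter code gives a stable, script-independent order.
for code in sorted(languages):
    print(code, languages[code])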