Documentation: list of operators - rascal

Is there somewhere I can find a list of operators?
While "+", "-", "/", "%", "==", "<=", "<", ... and even "<-" are common to most languages, some, like "::", "| (ADT)", "| (Set)" or "in" (for ADT alternative), are specific to Rascal.
Something like what can be found here for Java.
I could not find them in the documentation, which is otherwise really good.

Most of the operators are described here: http://tutor.rascal-mpl.org/Rascal/Expressions/Expressions.html#/Rascal/Expressions/Operators/Operators.html
There are some more types of expressions which can also be seen as operators here: http://tutor.rascal-mpl.org/Rascal/Expressions/Expressions.html#/Rascal/Expressions/Expressions.html

Related

how to tokenize input based on a set of ABNF grammar rules

I have read the RFC on the ABNF specification and am having difficulty understanding how a set of ABNF rules could be used to reliably extract tokens from some input string that matches the grammar. It seems that the specification doesn't ever mention tokens or ASTs, so it may not concern itself with that, but I believe that would be the ultimate goal of applying any BNF grammar, unless I am mistaken.
In the specification, they list example rules for parsing a postal-address:
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]
There is also a list of core rules that I won't post here describing expected common-usage rules.
Ultimately, what I would like to do is figure out the rules necessary for taking the input
John H. Doe
12345 Fakestreet
Springfield, IL 55555
and generating what I believe would be the correct token sequence which is:
["John", " ", "H", ".", "Doe", "\r\n",
"12345", " ", "Fakestreet", "\r\n",
"Springfield", ",", " ", "IL", " ", "55555", "\r\n"]
(I believe the spaces and CRLFs need to be returned as "tokens" because they are specified as requirements in certain rules)
Some problems I am considering:
It makes sense that "Fakestreet" should be its own token, but according to the definition it is a variable repetition of the visible-character core rule. Ideally I would not like to read out each letter as its own token ("F", "a", "k", and so on), so (assuming core-rules can be treated as terminals?) any potential token string would need to be checked against the entire, theoretically infinite, rule definition 1*VCHAR to see if it is a match. And some rules are more complicated than that, like zip-code's 5DIGIT ["-" 4DIGIT], but any potential token needs to be checked against this rule as well ("12345" and "12345-6789" are both valid tokens). So it seems like entire rule element concatenations need to be checked completely as well, unless "12345-6789" should rather be tokenized as ["12345", "-", "6789"] which... may be correct?
I'd assume we would not want to completely check rules that reference other rules, otherwise we may end up tokenizing the entire postal-address as a single token of type "postal-address". Maybe rules that reference other rules shouldn't be checked? Maybe there is such a thing as a "terminal-rule" that includes no rule refs (excluding core rules)?
Occasionally in the rules, terminal values are combined with rule references, for instance in the definition of "personal-part", the literal "." is defined. So, while we may not want to match any potential token string against the entire "personal-part" rule definition, it seems we do want to try to match it against the literal "." because it is a required token for parsing a personal-part. Maybe in non-terminal rules, terminal values listed there should be considered?
I realize this is a lengthy question, but it seems BNF supersets like EBNF and ABNF are being used for this kind of thing but I cannot find a standard specification for how to tokenize from ABNF grammar.
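One possible starting point, offered as a rough sketch rather than a standard procedure: rules that ultimately reference only core rules and literals describe regular languages, so they can be translated by hand into regular expressions. The Python snippet below does this for zip-part, with town-name, state and zip-code inlined. The function name tokenize_zip_part and the group names are made up for illustration; the spaces, the comma and the CRLF come back as separate tokens, as proposed above.

```python
import re

# zip-part = town-name "," SP state 1*2SP zip-code CRLF
# Each named group is one hand-translated ABNF element (hypothetical names).
ZIP_PART = re.compile(
    r"(?P<town_name>[A-Za-z ]+)"             # town-name = 1*(ALPHA / SP)
    r"(?P<comma>,)"                          # literal ","
    r"(?P<sp> )"                             # SP
    r"(?P<state>[A-Za-z]{2})"                # state = 2ALPHA
    r"(?P<spaces> {1,2})"                    # 1*2SP
    r"(?P<zip_code>[0-9]{5}(?:-[0-9]{4})?)"  # zip-code = 5DIGIT ["-" 4DIGIT]
    r"(?P<crlf>\r\n)"                        # CRLF
)


def tokenize_zip_part(line):
    """Return the tokens of one zip-part line, or None if it does not match."""
    match = ZIP_PART.fullmatch(line)
    if match is None:
        return None
    return list(match.groups())


tokens = tokenize_zip_part("Springfield, IL 55555\r\n")
```

In this sketch the whole zip-code rule is one token, so "55555-1234" stays together instead of being split at the "-". Whether that is the right granularity is a design choice the ABNF specification itself does not make.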

How do I create a pattern to only match a specific set of characters?

I'm looking for a way to search in a string for a very specific set of characters: "(),:;<>#[\]
specialChar = str:find("[\"][%(][%)][,][:][;][<][>][#][%[][%]][\\]")
I'm thinking that there will be no pattern to satisfy my need because of the Limitations of Lua patterns.
I have read the Lua Manual pattern matching section pretty thoroughly but still can't seem to figure it out.
Does anyone know a way I can identify if a given string contains any of these characters?
Note, I do not need to know anything about which character or where in the string it is, if that helps.
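To check if a string contains ", (, ), ,, :, ;, <, >, #, [, \ or ], put all of them into a single set. The pattern in the question chains twelve one-character sets, so it only matches when all twelve characters appear next to each other in that exact order. With one set you may use
function ContainsSpecialChar(input)
  return string.find(input, "[\"(),:;<>#[\\%]]")
end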
See the Lua demo
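For comparison only, here is the same idea as a small Python sketch using the re module rather than Lua patterns. It contrasts one character class holding every special character with a chain of twelve one-character classes. The names ANY_SPECIAL, ALL_IN_SEQUENCE and contains_special_char are invented for this illustration.

```python
import re

# One class containing every special character: matches if ANY one of them occurs.
ANY_SPECIAL = re.compile(r'["(),:;<>#\[\]\\]')

# Twelve one-character classes in a row: matches only if ALL twelve
# characters occur consecutively, in exactly this order.
ALL_IN_SEQUENCE = re.compile(r'["][(][)][,][:][;][<][>][#][\[][\]][\\]')


def contains_special_char(text):
    """True if text contains at least one of the special characters."""
    return ANY_SPECIAL.search(text) is not None
```

The escaping rules differ between the two engines: Lua patterns escape with %, while Python's re escapes with a backslash.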

Is there a guide/convention to the markup being used in the lua official documentation?

I consistently have a really hard time reading official documentation when it's related to coding. I generally don't understand it unless it's paired with an example. I am seeking clarification on what kind of conventions are in place when reading docs, if any. Take the example below from the Lua manual (https://www.lua.org/manual/5.1/manual.html#2.1):
stat ::= if exp then block {elseif exp then block} [else block] end
The first word, Stat, is defined as a statement and "this set includes assignments, control structures, function calls, and variable declarations."
::= Is not defined in the docs, it can be googled thankfully.
Exp is linked and explained.
Block has a section as well.
But then they do {} and []. They literally stated "Square brackets are used to index a table" just a few lines above. And that squiggly brackets are for writing a table. So what am I supposed to deduce from this? That {} and [] are being used to denote separate sections as a markup to make it easier to see certain components? Or that {elseif exp then block} is a table with those values inside of itself and [else block] is a key-value indexing a table? If I was writing a doc where that was indeed the case, wouldn't I write it this way?
Then I see
var ::= prefixexp `[´ exp `]´
' ' defines a string, but I have to make the assumption that '[' ']' is used as a way to highlight the fact that, because they were talking about what square brackets do in the previous section, they are simply highlighting their position and this should not be included in the code. I only know to make this assumption, though, because I know it doesn't work when you put them in there.
But then I see this:
chunk ::= {stat [`;´]}
Similarly, they are talking about the placement of the semicolon before listing that code, but the entirety of the line of code was also newly explained and being talked about. Why would I assume that it's written without the parentheses if it's written with the parentheses? And I see they are using {} and [] again, and I have no idea what they are referencing because it's not stated explicitly that we're talking about a table. It's simply using the code itself to explain whether it's talking about a table or not with the {}, but we have that first set of code where {} is being used and it's not talking about a table.
What is the convention being used? What are they actually trying to do/show by using {} and [] in the first line of code?
As stated both at the beginning of the Lua documentation and in the section on the Lua grammar, Lua presents its grammar in extended BNF format.
EBNF has its own punctuation with its own meaning, like ::= as you discovered. But as a grammar, there needs to be a distinction between the EBNF meaning of a piece of punctuation and "this punctuation appears in the language defined by the grammar". The former meaning is therefore always assumed; the latter meaning can only be achieved by quoting the punctuation.
So this:
var ::= prefixexp `[´ exp `]´
Means a prefixexp followed by an open bracket followed by exp followed by a close bracket.
By contrast, this:
funcname ::= Name {`.´ Name} [`:´ Name]
Means Name followed by zero or more sub-sequences of . followed by Name, followed by an optional sub-sequence of : followed by Name. Because those are what {} and [] mean to EBNF.
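As a rough illustration of that reading, the funcname rule can be translated into a regular expression, where {...} becomes "zero or more" and [...] becomes "optional". This is a simplified Python sketch: the NAME pattern is an assumed approximation of a Lua identifier, whitespace between tokens is ignored, and is_funcname is a made-up helper name.

```python
import re

# Assumed approximation of a Lua Name (identifier).
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# funcname ::= Name {`.´ Name} [`:´ Name]
#   {`.´ Name}  ->  (?:\.NAME)*   zero or more repetitions
#   [`:´ Name]  ->  (?::NAME)?    optional, at most once
FUNCNAME = re.compile(rf"{NAME}(?:\.{NAME})*(?::{NAME})?")


def is_funcname(text):
    """True if text has the shape described by the funcname rule."""
    return FUNCNAME.fullmatch(text) is not None
```

So a.b.c:d and plain a are accepted, while a:b:c is rejected because the optional [...] part may appear only once.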

Regex for validating price in Rails [duplicate]

What is this?
This is a collection of common Q&A. This is also a Community Wiki, so everyone is invited to participate in maintaining it.
Why is this?
regex is suffering from "give me ze code" type questions and poor answers with no explanation. This reference is meant to provide links to quality Q&A.
What's the scope?
This reference is meant for the following languages: php, perl, javascript, python, ruby, java, .net.
This might be too broad, but these languages share the same syntax. For specific features there's the tag of the language behind it, example:
What are regular expression Balancing Groups? .net
The Stack Overflow Regular Expressions FAQ
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
RegexOne ↪
Regular Expressions Info ↪
Quantifiers
Zero-or-more: *:greedy, *?:reluctant, *+:possessive
One-or-more: +:greedy, +?:reluctant, ++:possessive
?:optional (zero-or-one)
Min/max ranges (all inclusive): {n,m}:between n & m, {n,}:n-or-more, {n}:exactly n
Differences between greedy, reluctant (a.k.a. "lazy", "ungreedy") and possessive quantifier:
Greedy vs. Reluctant vs. Possessive Quantifiers
In-depth discussion on the differences between greedy versus non-greedy
What's the difference between {n} and {n}{m}?
Can someone explain Possessive Quantifiers to me? php, perl, java, ruby
Emulating possessive quantifiers .net
Non-Stack Overflow references: From Oracle, regular-expressions.info
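A minimal Python sketch of the greedy versus reluctant distinction and of an inclusive {n,m} range, using only the standard re module. The sample text and variable names are arbitrary.

```python
import re

TEXT = "<a><b>"

# Greedy: .+ grabs as much as possible, then backs off to the last ">".
greedy = re.search(r"<.+>", TEXT).group()

# Reluctant (lazy): .+? grabs as little as possible, stopping at the first ">".
reluctant = re.search(r"<.+?>", TEXT).group()

# {2,4} is inclusive on both ends: two, three or four digits.
in_range = re.fullmatch(r"\d{2,4}", "123") is not None
too_long = re.fullmatch(r"\d{2,4}", "12345") is not None
```

Here greedy is the whole string "<a><b>", while reluctant is only "<a>".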
Character Classes
What is the difference between square brackets and parentheses?
[...]: any one character, [^...]: negated/any character but
[^] matches any one character including newlines javascript
[\w-[\d]] / [a-z-[qz]]: set subtraction .net, xml-schema, xpath, JGSoft
[\w&&[^\d]]: set intersection java, ruby 1.9+
[[:alpha:]]:POSIX character classes
[[:<:]] and [[:>:]] Word boundaries
Why do [^\\D2], [^[^0-9]2], [^2[^0-9]] get different results in Java? java
Shorthand:
Digit: \d:digit, \D:non-digit
Word character (Letter, digit, underscore): \w:word character, \W:non-word character
Whitespace: \s:whitespace, \S:non-whitespace
Unicode categories (\p{L}, \P{L}, etc.)
Escape Sequences
Horizontal whitespace: \h:space-or-tab, \t:tab
Newlines:
\r, \n:carriage return and line feed
\R:generic newline php java-8
Negated whitespace sequences: \H:Non horizontal whitespace character, \V:Non vertical whitespace character, \N:Non line feed character pcre php5 java-8
Other: \v:vertical tab, \e:the escape character
Anchors
anchor | matches | flavors
^ | Start of string | Common*
^ | Start of line | Common (m)
$ | End of line | Common (m)
$ | End of text | Common* except javascript
$ | Very end of string | javascript*, php (D)
\A | Start of string | Common except javascript
\Z | End of text | Common except javascript python
\Z | Very end of string | python
\z | Very end of string | Common except javascript python
\b | Word boundary | Common
\B | Not a word boundary | Common
\G | End of previous match | Common except javascript, python

Term | Definition
Start of string | At the very start of the string.
Start of line | At the very start of the string, and after a non-terminal line terminator.
Very end of string | At the very end of the string.
End of text | At the very end of the string, and at a terminal line terminator.
End of line | At the very end of the string, and at a line terminator.
Word boundary | At a word character not preceded by a word character, and at a non-word character not preceded by a non-word character.
End of previous match | At a previously set position, usually where a previous match ended. At the very start of the string if no position was set.
"Common" refers to the following: icu java javascript .net objective-c pcre perl php python swift ruby
* Default
m Multi-line mode.
D Dollar end only mode.
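To make the anchor terminology concrete, here is a small Python sketch. It covers only the python flavor: ^ and $ in default and multi-line mode, and \Z as "very end of string". The sample text is arbitrary.

```python
import re

TEXT = "first\nsecond\n"

# Default mode: ^ matches only at the start of the string.
start_default = re.findall(r"^\w+", TEXT)

# Multi-line mode: ^ also matches after each line terminator.
start_multiline = re.findall(r"^\w+", TEXT, re.M)

# Default $ is "end of text": it also matches just before a final newline.
dollar_end_of_text = re.search(r"second$", TEXT) is not None

# \Z in Python is "very end of string": the trailing newline gets in the way.
z_very_end = re.search(r"second\Z", TEXT) is not None

# $ only matches at the end of the first line in multi-line mode.
first_default = re.search(r"first$", TEXT) is not None
first_multiline = re.search(r"first$", TEXT, re.M) is not None
```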
Groups
(...):capture group, (?:):non-capture group
Why is my repeating capturing group only capturing the last match?
\1:backreference and capture-group reference, $1:capture group reference
What's the meaning of a number after a backslash in a regular expression?
\g<1>123:How to follow a numbered capture group, such as \1, with a number?: python
What does a subpattern (?i:regex) mean?
What does the 'P' in (?P<group_name>regexp) mean?
(?>):atomic group or independent group, (?|):branch reset
Equivalent of branch reset in .NET/C# .net
Named capture groups:
General named capturing group reference at regular-expressions.info
java: (?<groupname>regex): Overview and naming rules (Non-Stack Overflow links)
Other languages: (?P<groupname>regex) python, (?<groupname>regex) .net, (?<groupname>regex) perl, (?P<groupname>regex) and (?<groupname>regex) php
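A short Python sketch of the group features listed above: a numbered backreference, named groups in Python's (?P<name>...) spelling, and \g<1> for following a group reference with digits. The patterns and sample strings are made up for illustration.

```python
import re

# (\w+) captures a word; \1 then requires the same text again.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")
doubled = DOUBLED_WORD.search("it is is fine").group(1)

# Named groups, Python spelling: (?P<name>...).
PRICE = re.compile(r"(?P<dollars>\d+)\.(?P<cents>\d{2})")
price = PRICE.fullmatch("19.99")
dollars, cents = price.group("dollars"), price.group("cents")

# \g<1>123 means "group 1, then the literal text 123";
# a plain \1123 would be read as a reference to a much higher group number.
followed = re.sub(r"(a)", r"\g<1>123", "a")
```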
Lookarounds
Lookaheads: (?=...):positive, (?!...):negative
Lookbehinds: (?<=...):positive, (?<!...):negative
Lookbehind limits in:
Lookbehinds need to be constant-length php, perl, python, ruby
Lookarounds of limited length {0,n} java
Variable length lookbehinds are allowed .net
Lookbehind alternatives:
Using \K php, perl (Flavors that support \K)
Alternative regex module for Python python
The hacky way
JavaScript negative lookbehind equivalents External link
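The lookaround forms can be tried out with a few lines of Python, again as an illustrative sketch only. The helper name compiles is invented; it shows the python-flavor restriction that a lookbehind must have a fixed width.

```python
import re

# Positive lookahead: digits only if followed by " USD" (not part of the match).
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR")

# Positive lookbehind: digits only if preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

# Negative lookbehind: digits NOT preceded by "$" or by another digit.
no_dollar = re.findall(r"(?<![\$\d])\d+", "$5 and 7 and $12")


def compiles(pattern):
    """True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


fixed_width_ok = compiles(r"(?<=aa)b")     # constant-length lookbehind
variable_width_ok = compiles(r"(?<=a+)b")  # variable-length lookbehind
```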
Modifiers
flag | modifier | flavors
a | ASCII | python
c | current position | perl
e | expression | php perl
g | global | most
i | case-insensitive | most
m | multiline | php perl python javascript .net java
m | (non)multiline | ruby
o | once | perl ruby
r | non-destructive | perl
S | study | php
s | single line | ruby
U | ungreedy | php r
u | unicode | most
x | whitespace-extended | most
y | sticky ↪ | javascript
How to convert preg_replace e to preg_replace_callback?
What are inline modifiers?
What is '?-mix' in a Ruby Regular Expression
Other:
|:alternation (OR) operator, .:any character, [.]:literal dot character
What special characters must be escaped?
Control verbs (php and perl): (*PRUNE), (*SKIP), (*FAIL) and (*F)
php only: (*BSR_ANYCRLF)
Recursion (php and perl): (?R), (?0) and (?1), (?-1), (?&groupname)
Common Tasks
Get a string between two curly braces: {...}
Match (or replace) a pattern except in situations s1, s2, s3...
How do I find all YouTube video ids in a string using a regex?
Validation:
Internet: email addresses, URLs (host/port: regex and non-regex alternatives), passwords
Numeric: a number, min-max ranges (such as 1-31), phone numbers, date
Parsing HTML with regex: See "General Information > When not to use Regex"
Advanced Regex-Fu
Strings and numbers:
Regular expression to match a line that doesn't contain a word
How does this PCRE pattern detect palindromes?
Match strings whose length is a fourth power
How does this regex find triangular numbers?
How to determine if a number is a prime with regex?
How to match the middle character in a string with regex?
Other:
How can we match a^n b^n?
Match nested brackets
Using a recursive pattern php, perl
Using balancing groups .net
“Vertical” regex matching in an ASCII “image”
List of highly up-voted regex questions on Code Golf
How to make two quantifiers repeat the same number of times?
An impossible-to-match regular expression: (?!a)a
Match/delete/replace this except in contexts A, B and C
Match nested brackets with regex without using recursion or balancing groups?
Flavor-Specific Information
(Except for those marked with *, this section contains non-Stack Overflow links.)
Java
Official documentation: Pattern Javadoc ↪, Oracle's regular expressions tutorial ↪
The differences between functions in java.util.regex.Matcher:
matches(): The match must be anchored to both input-start and -end
find(): A match may be anywhere in the input string (substrings)
lookingAt(): The match must be anchored to input-start only
(For anchors in general, see the section "Anchors")
The only java.lang.String functions that accept regular expressions: matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), split(s,i)
*An (opinionated and) detailed discussion of the disadvantages of and missing features in java.util.regex
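The three anchoring behaviours of matches(), find() and lookingAt() described above have close analogues in Python's re module, which makes them easy to try out. This sketch assumes the rough correspondence fullmatch to matches(), search to find(), and match to lookingAt(); it is not Java code, and the sample strings are arbitrary.

```python
import re

DIGITS = re.compile(r"\d+")

# Like Java's matches(): the whole input must match.
whole = DIGITS.fullmatch("abc123") is not None

# Like Java's find(): a match may start anywhere in the input.
anywhere = DIGITS.search("abc123") is not None

# Like Java's lookingAt(): anchored at the start only.
at_start_fail = DIGITS.match("abc123") is not None
at_start_ok = DIGITS.match("123abc") is not None
```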
.NET
How to read a .NET regex with look-ahead, look-behind, capturing groups and back-references mixed together?
Official documentation:
Boost regex engine: General syntax, Perl syntax (used by TextPad, Sublime Text, UltraEdit, ...???)
JavaScript general info and RegExp object
.NET, MySQL, Oracle, Perl5 version 18.2
PHP: pattern syntax, preg_match
Python: Regular expression operations, search vs match, how-to
Rust: crate regex, struct regex::Regex
Splunk: regex terminology and syntax and regex command
Tcl: regex syntax, manpage, regexp command
Visual Studio Find and Replace
General information
(Links marked with * are non-Stack Overflow links.)
Other general documentation resources: Learning Regular Expressions, *Regular-expressions.info, *Wikipedia entry, *RexEgg, Open-Directory Project
DFA versus NFA
Generating Strings matching regex
Books: Jeffrey Friedl's Mastering Regular Expressions
When to not use regular expressions:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (blog post written by Stack Overflow's founder)*
Do not use regex to parse HTML:
Don't. Please, just don't
Well, maybe...if you're really determined (other answers in this question are also good)
Examples of regex that can cause regex engine to fail
Why does this regular expression kill the Java regex engine?
Tools: Testers and Explainers
(This section contains non-Stack Overflow links.)
Online (* includes replacement tester, + includes split tester):
Debuggex (Also has a repository of useful regexes) javascript, python, pcre
*Regular Expressions 101 php, pcre, python, javascript, java
Regex Pal, regular-expressions.info javascript
Rubular ruby
RegExr
Regex Hero dotnet
*+ regexstorm.net .net
*RegexPlanet: Java java, Go go, Haskell haskell, JavaScript javascript, .NET dotnet, Perl perl, php PCRE php, Python python, Ruby ruby, XRegExp xregexp
freeformatter.com xregexp
*+regex.larsolavtorvik.com php PCRE and POSIX, javascript
Offline:
Microsoft Windows: RegexBuddy (analysis), RegexMagic (creation), Expresso (analysis, creation, free)
MySQL 8.0: Various syntax changes were made. Note especially the doubling of backslashes in some contexts. (This Answer needs further editing to reflect the differences.)

Any Lua pattern alternative to the regex (\\.|.)?

There's a common idiom for traversing a string whose characters may be escaped with a backslash by using the regex (\\.|.), like this:
alert( "some\\astring".replace(/(\\.|.)/g, "[$1]") )
That's in JavaScript. This code changes the string some\astring to [s][o][m][e][\a][s][t][r][i][n][g].
I know that Lua patterns don't support the OR operator, so we can't translate this regular expression directly into a Lua pattern.
Yet, I was wondering: is there an alternative way to do this (traversing possibly-escaped characters) in Lua, using a Lua pattern?
You can try
(\\?.)
and replace with [$1]
See it on Regexr.
? is a shortcut quantifier for 0 or 1 occurrences, so the pattern above matches any character together with an optional backslash before it. Lua patterns do support ? after a single character, and the replacement string would use %1 instead of $1. ({0,1} is the long form of ? in most regex flavors, but Lua patterns do not support it.)
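To check that dropping the alternation changes nothing for the example string, both patterns can be compared in a regex engine that supports them. The sketch below uses Python's re module, not Lua, and the variable names are arbitrary.

```python
import re

TEXT = "some\\astring"  # the characters: some\astring

# Original idiom: an escaped pair, or else any single character.
with_alternation = re.sub(r"(\\.|.)", r"[\1]", TEXT)

# Alternation-free form: an optional backslash followed by any character.
with_optional = re.sub(r"(\\?.)", r"[\1]", TEXT)
```

Both calls give [s][o][m][e][\a][s][t][r][i][n][g], keeping the backslash and the character after it in one bracket.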

Resources