I am trying to add thousands separators to my labels in Tableau by using REGEXP_REPLACE:
'$' + REGEXP_REPLACE(STR(ROUND(SUM([Metrcs num]),0)),"(\d)(?=(\d{3})+$)", "$0,")
but the following error occurs: Invalid regular expression: '(\d)(?=(\d{3})+$)', no argument for repetition operator: ?
I am wondering how I can fix it.
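For reference, the pattern is valid in engines that support lookahead. The wording of the error suggests that the engine behind REGEXP_REPLACE here rejects the (?= group, though that is an inference from the message and not something the post confirms. A minimal Python sketch of what the expression is meant to do (add_thousands is a hypothetical helper name, and \g<0> is Python's spelling of the $0 reference):

```python
import re

# Pattern from the question: a digit that is followed by one or more
# complete groups of three digits running up to the end of the string.
PATTERN = re.compile(r"(\d)(?=(\d{3})+$)")


def add_thousands(digits: str) -> str:
    """Insert a comma after every digit matched by PATTERN."""
    return PATTERN.sub(r"\g<0>,", digits)


print(add_thousands("1234567"))
print(add_thousands("123"))
```

If the engine in use has no lookahead, the formatting has to be done some other way; this sketch only shows the intended result.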
I am using ply (a popular python implementation of Lex and Yacc) to create a simple compiler for a custom language.
Currently my lexer looks as follows:
reserved = {
    'begin': 'BEGIN',
    'end': 'END',
    'DECLARE': 'DECL',
    'IMPORT': 'IMP',
    'Dow': 'DOW',
    'Enddo': 'ENDW',
    'For': 'FOR',
    'FEnd': 'ENDF',
    'CASE': 'CASE',
    'WHEN': 'WHN',
    'Call': 'CALL',
    'THEN': 'THN',
    'ENDC': 'ENDC',
    'Object': 'OBJ',
    'Move': 'MOV',
    'INCLUDE': 'INC',
    'Dec': 'DEC',
    'Vibration': 'VIB',
    'Inclination': 'INCLI',
    'Temperature': 'TEMP',
    'Brightness': 'BRI',
    'Sound': 'SOU',
    'Time': 'TIM',
    'Procedure': 'PROC'
}
tokens = ["INT", "COM", "SEMI", "PARO", "PARC", "EQ", "NAME"] + list(reserved.values())
t_COM = r'//'
t_SEMI = r";"
t_PARO = r'\('
t_PARC = r'\)'
t_EQ = r'='
t_NAME = r'[a-z][a-zA-Z_&!0-9]{0,9}'
def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t
def t_error(t):
    print("Syntax error: Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
Per the documentation, I am creating a dictionary for reserved keywords and then adding them to the tokens list, rather than adding individual rules for them. The documentation also states that precedence is decided following these 2 rules:
All tokens defined by functions are added in the same order as they appear in the lexer file.
Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
The problem I'm having is that when I test the lexer using this test string
testInput = "// ; begin end DECLARE IMPORT Dow Enddo For FEnd CASE WHEN Call THEN ENDC (asdf) = Object Move INCLUDE Dec Vibration Inclination Temperature Brightness Sound Time Procedure 985568asdfLYBasdf ; Alol"
The lexer returns the following error:
LexToken(COM,'//',1,0)
LexToken(SEMI,';',1,2)
LexToken(NAME,'begin',1,3)
Syntax error: Illegal character ' '
LexToken(NAME,'end',1,9)
Syntax error: Illegal character ' '
Syntax error: Illegal character 'D'
Syntax error: Illegal character 'E'
Syntax error: Illegal character 'C'
Syntax error: Illegal character 'L'
Syntax error: Illegal character 'A'
Syntax error: Illegal character 'R'
Syntax error: Illegal character 'E'
(That's not the whole error, but it's enough to see what's happening.)
For some reason, Lex is parsing NAME tokens before parsing the keywords. Even after it's done parsing NAME tokens, it doesn't recognize the DECLARE reserved keyword. I have also tried to add reserved keywords with the rest of the tokens, using regular expressions but I get the same result (also the documentation advises against doing so).
Does anyone know how to fix this problem? I want the Lexer to identify reserved keywords first and then to attempt to tokenize the rest of the input.
Thanks!
EDIT:
I get the same result even when using the t_ID function exemplified in the documentation:
def t_NAME(t):
    r'[a-z][a-zA-Z_&!0-9]{0,9}'
    t.type = reserved.get(t.value,'NAME')
    return t
The main problem here is that you are not ignoring whitespace; all the errors are a consequence. Adding a t_ignore definition to your grammar will eliminate those errors.
But the grammar won't work as expected even if you fix the whitespace issue, because you seem to be missing an important aspect of the documentation, which tells you how to actually use the dictionary reserved:
To handle reserved words, you should write a single rule to match an identifier and do a special name lookup in a function like this:
reserved = {
    'if' : 'IF',
    'then' : 'THEN',
    'else' : 'ELSE',
    'while' : 'WHILE',
    ...
}
tokens = ['LPAREN','RPAREN',...,'ID'] + list(reserved.values())
def t_ID(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    t.type = reserved.get(t.value,'ID') # Check for reserved words
    return t
(In your case, it would be NAME and not ID.)
Ply knows nothing about the dictionary reserved and it also has no idea how you produce the token names enumerated in tokens. The only point of tokens is to let Ply know which symbols in the grammar represent tokens and which ones represent non-terminals. The mere fact that some word is in tokens does not serve to define the pattern for that token.
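To make the two fixes concrete without depending on Ply being installed, here is a rough stand-in built only on Python's re module. It is not Ply's implementation: the SKIP entry plays the role that a t_ignore = ' \t' definition plays in Ply, and the lookup in the NAME branch is what the t_NAME function does. TOKEN_SPEC, MASTER and tokenize are hypothetical names. One assumption on my part: the identifier pattern is widened to accept an uppercase first letter, because with [a-z] as the first character a word like DECLARE can never match the rule, so it never reaches the reserved lookup.

```python
import re

# Subset of the question's reserved table (keyword text -> token type).
reserved = {"begin": "BEGIN", "end": "END", "DECLARE": "DECL", "Dow": "DOW"}

# Ordered (type, pattern) pairs. The NAME pattern allows an uppercase first
# letter (an assumption; the question's pattern starts with [a-z] only).
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("COM", r"//"),
    ("SEMI", r";"),
    ("NAME", r"[A-Za-z][a-zA-Z_&!0-9]{0,9}"),
    ("SKIP", r"[ \t]+"),   # whitespace to ignore, like Ply's t_ignore
    ("ERROR", r"."),       # anything else is an illegal character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (type, value) pairs for text."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"Illegal character {value!r} at {m.start()}")
        if kind == "NAME":
            # Reserved words are recognised by lookup, not by separate rules.
            kind = reserved.get(value, "NAME")
        tokens.append((kind, value))
    return tokens


print(tokenize("// ; begin end DECLARE Dow asdf 42"))
```

The design point is the same as in the documentation excerpt: one identifier rule, then a dictionary lookup to turn some identifiers into keyword tokens.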
I am currently trying to improve and fix bugs in an existing grammar that someone else created.
We have our own language, for which we have created an editor in the Eclipse IDE.
Some example grammar rules:
calc : choice INTEGER INTEGER
choice : add|sub|div|mul
INTEGER : ('0'..'9')+
In my editor, if I type
calc add 2 aaa
ANTLR's error handling recognizes this as an error, since it expects an integer and we typed a string, and reports a message such as
extraneous input 'aaa' expecting {'{', INTEGER}
(I have a class that extends BaseErrorListener, where I create markers for these errors.)
Similarly, I have such grammar defined for my editor.
Now the question: in all of these cases it identifies that something is wrong in the syntax and reports errors. But what about input that is not part of the grammar at all?
If I type any garbage value such as
abc add 2 3
or
just_type_junk_in_editor
it does not report any error, since 'abc' or 'just_type_junk_in_editor' is not in my grammar.
So is there a way to make ANTLR's error handling report input that is not part of the grammar as an error?
Without having seen the full grammar, I think your problem is the missing EOF token in your main rule. ANTLR4 consumes as much input as it can, but once the main rule can match nothing further, it ignores the rest, which explains why you don't see an error. By adding EOF you tell ANTLR4 that all input must be matched:
calc: choice INTEGER INTEGER EOF;
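The effect of the EOF requirement can be illustrated with a toy hand-written matcher in Python. This is not ANTLR's generated code, and parse_calc, CHOICES and the error texts are made up for the illustration; it only shows the difference between "match a prefix and stop" and "match everything".

```python
import re

CHOICES = {"add", "sub", "div", "mul"}


def parse_calc(tokens, require_eof):
    """Match `choice INTEGER INTEGER` at the start of tokens.

    Returns the number of tokens consumed. With require_eof=True any
    leftover token is an error, which is the effect of adding EOF.
    """
    if len(tokens) < 3 or tokens[0] not in CHOICES:
        raise SyntaxError("expecting choice INTEGER INTEGER")
    for tok in tokens[1:3]:
        if not re.fullmatch(r"[0-9]+", tok):
            raise SyntaxError(f"extraneous input {tok!r} expecting INTEGER")
    if require_eof and len(tokens) > 3:
        raise SyntaxError(f"extraneous input {tokens[3]!r} expecting <EOF>")
    return 3


tokens = "add 2 3 just_type_junk_in_editor".split()
print(parse_calc(tokens, require_eof=False))  # trailing junk is silently ignored
try:
    parse_calc(tokens, require_eof=True)
except SyntaxError as exc:
    print("error:", exc)
```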
PEG-based parser generators usually provide limited error reporting on invalid inputs. From what I read, the parse dialect of rebol is inspired by PEG grammars extended with regular expressions.
For example, typing the following in JavaScript:
d8> function () {}
gives the following error, because no identifier was provided in declaring a global function:
(d8):1: SyntaxError: Unexpected token (
function () {}
         ^
The parser is able to pinpoint exactly the position during parsing where an expected token is missing. The character position of the expected token is used to position the arrow in the error message.
Does the parse dialect in Rebol provide built-in facilities to report the line and column of errors on invalid inputs?
Otherwise, are there examples out there of custom-rolled parse rules that provide such error reporting?
I've done very advanced Rebol parsers which manage live and mission-critical TCP servers, and doing proper error reporting was a requirement. So this is important!
Probably one of the most unique aspects of Rebol's PARSE is that you can include direct evaluation within the rules. So you can set variables to track the parse position, or the error messages, etc. (It's very easy because the nature of Rebol is that mixing code and data as the same thing is a core idea.)
So here's the way I did it. Before each match rule is attempted, I save the parse position into "here" (by writing here:) and then also save an error into a variable using code execution (by putting (error: {some error string}) in parentheses so that the parse dialect runs it). If the match rule succeeds, we don't need to use the error or position...and we just go on to the next rule. But if it fails we will have the last state we set to report after the failure.
Thus the pattern in the parse dialect is simply:
; use PARSE dialect handling of "set-word!" instances to save parse
; position into variable named "here"
here:
; escape out of the parse dialect using parentheses, and into the DO
; dialect to run arbitrary code. Here we run code that saves an error
; message string into a variable named "error"
(error: "<some error message relating to rule that follows>")
; back into the PARSE dialect again, express whatever your rule is,
; and if it fails then we will have the above to use in error reporting
what: (ever your) [rule | {is}]
That's basically what you need to do. Here is an example for phone numbers:
digit: charset "0123456789"
phone-number-rule: [
    here:
    (error: "invalid area code")
    ["514" | "800" | "888" | "916" | "877"]
    here:
    (error: "expecting dash")
    "-"
    here:
    (error: "expecting 3 digits")
    3 digit
    here:
    (error: "expecting dash")
    "-"
    here:
    (error: "expecting 4 digits")
    4 digit
    (error: none)
]
Then you can see it in action. Notice that we set error to none if we reach the end of the parse rules. PARSE will return false if there is still more input to process, so if we notice there is no error set but PARSE returns false anyway... we failed because there was too much extra input:
input: "800-22r2-3333"
if not parse input phone-number-rule [
    if none? error [
        error: "too much data for phone number"
    ]
]
either error [
    column: length? copy/part input here
    print rejoin ["error at position:" space column]
    print error
    print input
    print rejoin [head insert/dup "" space column "^^"]
    print newline
][
    print {all good}
]
The above will print the following:
error at position: 4
expecting 3 digits
800-22r2-3333
    ^
Obviously, you could do much more potent stuff, since whatever you put in parens will be evaluated just like normal Rebol source code. It's really flexible. I even have parsers which update progress bars while loading huge datasets... :-)
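For readers who don't have Rebol at hand, the same bookkeeping (remember the position and the pending error message before each rule, report them on failure) can be sketched in Python. This is only an analogue of the approach, not a translation of PARSE; STEPS and check_phone are hypothetical names.

```python
import re

# Each step pairs the error to report if it fails with the pattern to match,
# mirroring the (error: ...) / rule pairs in the Rebol version.
STEPS = [
    ("invalid area code", r"514|800|888|916|877"),
    ("expecting dash", r"-"),
    ("expecting 3 digits", r"\d{3}"),
    ("expecting dash", r"-"),
    ("expecting 4 digits", r"\d{4}"),
]


def check_phone(text):
    """Return (error, position); error is None when the input is valid."""
    here = 0
    for error, pattern in STEPS:
        m = re.compile(pattern).match(text, here)
        if m is None:
            return error, here      # last saved position and message
        here = m.end()
    if here != len(text):
        return "too much data for phone number", here
    return None, here


sample = "800-22r2-3333"
error, column = check_phone(sample)
if error:
    print("error at position:", column)
    print(error)
    print(sample)
    print(" " * column + "^")
else:
    print("all good")
```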
Here is a simple example of finding the position while parsing a string, which could be used to do what you ask.
Let us say that our code is only valid if it contains nothing but a and b characters; anything else is illegal input.
code-rule: [
    some [
        "a" |
        "b"
    ]
    [ end | mark: (print [ "Failed at position" index? mark ]) ]
]
Let's check that with some valid code
>> parse "aaaabbabb" code-rule
== true
Now we can try again with some invalid input
>> parse "aaaabbXabb" code-rule
Failed at position 7
== false
This is a rather simplified example language, but it should be easy to extend to a more complex one.
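The idea of consuming everything that is valid and then reporting where the remainder starts carries over to other tools as well. A small Python sketch under the same a/b-only assumption (first_invalid_position is a hypothetical name; the index is 1-based to line up with index? above):

```python
import re


def first_invalid_position(code):
    """Return the 1-based index of the first character that is not
    'a' or 'b', or None when the whole string is valid."""
    valid_prefix = re.match(r"[ab]*", code)
    if valid_prefix.end() == len(code):
        return None
    return valid_prefix.end() + 1


print(first_invalid_position("aaaabbabb"))
print(first_invalid_position("aaaabbXabb"))
```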
I have a simple little grammar which keeps giving a multiple alternatives error when I try to generate Xtext artefacts.
The grammar is:
grammar org.xtext.example.hyrule.HyRule with org.eclipse.xtext.xbase.Xbase
generate hyRule (You can only use links to eclipse.org sites while you have fewer than 25 messages )
Start:
    rules+=Rule+
;
Rule:
    'FOR' 'PAYLOAD' payload=PAYLOAD 'ELEMENTS' elements+=JvmFormalParameter+ 'CONSTRAINED' 'BY' expressions+=XExpression*;
PAYLOAD:
    "Stacons"|"PFResults"|"any"
;
And the exact error I get is:
warning(200): ../org.xtext.example.hyrule/src-gen/org/xtext/example/hyrule/parser/antlr/internal/InternalHyRule.g:3197:2: Decision can match input such as "{RULE_ID, '=>', '('}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): ../org.xtext.example.hyrule/src-gen/org/xtext/example/hyrule/parser/antlr/internal/InternalHyRule.g:3197:2: The following alternatives can never be matched: 2
I have attached the syntax diagram for the generated ANTLR grammar in ANTLRWorks, and I can clearly see the multiple alternatives (JvmFormalParameter can match RULE_ID via the JvmTypeReference or the ValidID rule).
So it looks as if JvmFormalParameter is ambiguous. Apologies for my stupidity, but could someone point out what I'm missing? Is there some way of overcoming this ambiguity when using the JvmFormalParameter rule in my grammar?
The rule JvmFormalParameter is defined as
JvmFormalParameter returns types::JvmFormalParameter:
(parameterType=JvmTypeReference)? name=ValidID;
so the type of the parameter is optional. If you use elements+=JvmFormalParameter+, you allow multiple parameters without a delimiter, so the parser cannot decide how to read the input sequence
String s
since String and s could be the names of two parameters, or String s could be a single parameter with the type String and the name s. You should use a delimiter like
elements+=JvmFormalParameter (',' elements+=JvmFormalParameter)*
or use the rule FullJvmFormalParameter which is defined with a mandatory type reference:
FullJvmFormalParameter returns types::JvmFormalParameter:
parameterType=JvmTypeReference name=ValidID;
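To see the ambiguity concretely, here is a small Python sketch that enumerates every way a token sequence can be read as a run of "optional type, then name" parameters with no delimiter. It simplifies type references to single identifiers, which is an assumption for the illustration, and parameter_groupings is a hypothetical name.

```python
def parameter_groupings(tokens):
    """List every way to split tokens into parameters of the form [type] name."""
    if not tokens:
        return [[]]
    results = []
    # Alternative 1: the next token alone is an untyped parameter name.
    for rest in parameter_groupings(tokens[1:]):
        results.append([(None, tokens[0])] + rest)
    # Alternative 2: the next two tokens are a type followed by a name.
    if len(tokens) >= 2:
        for rest in parameter_groupings(tokens[2:]):
            results.append([(tokens[0], tokens[1])] + rest)
    return results


for grouping in parameter_groupings(["String", "s"]):
    print(grouping)
```

Two readings come back for String s, which is the conflict the generator reports. With a mandatory type (as in FullJvmFormalParameter) only the second alternative exists, and with a ',' delimiter the parameter boundaries are fixed by the input.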
I've tried to read the Apple documentation, but I have never used regular expressions and I do not understand how to solve this.
I need to write a regular expression to check whether the string the user has selected matches these rules:
the first character may be s or S, and it is optional
the second character may be a . (dot), and it is optional
then 1 to 4 digits, which must always be present
after the digits there may be a single character from a-z, which is optional
after that there may be a . (dot), which is optional
the last part is optional, but if present it must be exactly 2 digits
I tried to write it this way:
NSString *regexStr = @"(s|S)?(\\.)?(\\d+){,4}([a-z]?(\\.)?(\\d+){2}?";
but the console give this error
Error making regex: Error Domain=NSCocoaErrorDomain Code=2048
"The operation couldn’t be completed. (Cocoa error 2048.)" UserInfo=0x6d3e9b0
{NSInvalidValue=(s|S)?(\.)?(\d+){,4}([a-z]?(\.)?(\d+){2}?}
Could anyone help me?
Thanks!
Here you go:
(s|S)?(\.)?(\d){0,4}[a-z]?(\.)?(\d){2}?
You had an extra opening parenthesis that made the syntax invalid, and + signs after the \ds that made the regexp accept an unlimited number of digits.
Regexpal is great for debugging regexps: http://regexpal.com/
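A quick way to check a pattern against the stated rules is to run it over a few sample strings. The Python sketch below does that for the pattern above, unchanged, next to a stricter variant. The variant is my assumption about the intent ("1 to 4 digits required, trailing two digits optional as a pair"), and the results are those of Python's re module; I am assuming NSRegularExpression treats {0,4} and {2}? the same way. Two things to watch for: {0,4} also allows zero digits, and {2}? is a lazy "exactly two", not "optionally two". ANSWER, STRICT and the sample strings are made up for the illustration.

```python
import re

# The pattern from the answer, unchanged.
ANSWER = re.compile(r"(s|S)?(\.)?(\d){0,4}[a-z]?(\.)?(\d){2}?")

# A stricter variant under one reading of the rules (an assumption):
# 1 to 4 digits are required, and the final two digits are optional as a pair.
STRICT = re.compile(r"[sS]?\.?\d{1,4}[a-z]?\.?(\d{2})?")

for sample in ["S.1234a.56", "s.a.12", "S.1234a", "1"]:
    print(sample, bool(ANSWER.fullmatch(sample)), bool(STRICT.fullmatch(sample)))
```

Both patterns need to be applied to the whole string (fullmatch here, or ^...$ anchors), otherwise a match on just part of the selection counts as success.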