Can a Neo4j identifier start with an integer?

MATCH (p:Product {id:'19134046594'})-[r]-> (o:Attributes {4g:'network'}) return o
I received this exception:
Exception in thread "main" java.lang.RuntimeException: org.neo4j.driver.v1.exceptions.ClientException: Invalid input ':': expected an identifier character, whitespace or '}' (line 1, column 66 (offset: 65))
It's complaining about '4g'. Is '4g' an invalid property key identifier in Neo4j? How can I work around the issue?

You can use the backtick (`) character to quote property names that start with an otherwise illegal character.
For example, this would work:
CREATE (o:Attributes {`4g`: 'network'})
RETURN o;
And this also works:
MATCH (o:Attributes) WHERE o.`4g` = 'network'
RETURN o;

According to the naming rules section of the documentation, names you use (which includes property keys):
Must begin with an alphabetic letter.
and
Can contain numbers, but not as the first character.
You can, however, start it with an underscore, so _4g:'network' would work.
I'm guessing this is just for example purposes, but it does seem to me that it would be better the other way around: network:'4g'.
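For example, this variant (shown here for illustration) avoids backticks entirely:
CREATE (o:Attributes {_4g: 'network'})
RETURN o;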

Related

apoc.merge.node with special identifier fails

I tried to merge a node with apoc.merge.node, but my ident property keys contain a special character (:) and get double-escaped. Did I miss something, or does a workaround exist?
If I replace the ":" with "_", everything works as expected.
Neo4j 4.2.1 community and APOC 4.2.0
CALL apoc.merge.node(["test"], apoc.map.fromPairs([["i:d","123"]])) YIELD node return node
Error
Failed to invoke procedure `apoc.merge.node`: Caused by: org.neo4j.exceptions.SyntaxException: Invalid input 'i': expected "}" (line 1, column 17 (offset: 16))
"MERGE (n:test{``i:d``:$identProps.``i:d``}) ON CREATE SET n += $onCreateProps ON MATCH SET n += $onMatchProps RETURN n"
EDIT
It seems there is a bug in APOC which causes the identifier to be encoded twice.
First with Util::quote https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.1/core/src/main/java/apoc/util/Util.java#L674
And then in the merge procedure https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.1/core/src/main/java/apoc/merge/Merge.java#L85
I've filed an issue: https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/1783
In Neo4j, you can use backticks (`) around a key that contains special characters:
CALL apoc.merge.node(["test"], apoc.map.fromPairs([["`i:d`","123"]]))
YIELD node
return node
The same is true everywhere in Cypher syntax, for example escaping a label that contains whitespace:
MERGE (n:`Land Vehicle` {id: "land-rover-1"})

Flex expression required for validating certain expression based upon the first three characters only

For my parser, for the purpose of this question, any line starting with a single lowercase letter among a set of lowercase letters, followed by the character '=' followed by any other character is a valid line. So, the following are valid lines (all starting from first column):
a=20
b=50 70
q=20 Hello There
z=-
Any other line is not valid. My need is to match the complement. How do I write a flex expression to match the invalid lines? My confusion arises from ^, which means both start of line and complementation of an expression.
I thought ^[abq][=].+ would match the acceptable lines, so merely complementing it with ^ would do. But ^ at the start of an expression always means match at the start of the line. I made a few other attempts, but those did not work either. Though not relevant, the expression is used as the first step to discard invalid SDP lines. See here for details from the relevant SDP RFC, if it matters.
The simplest approach is to always match entire lines (or use different start conditions to lexically analyse the rest of valid lines). Although flex does not have a negation operator (the [^…] negative character class is not an operator), in this case the expressions are pretty simple and can be expressed easily enough. Note that it doesn't matter that the various "invalid line" patterns are not disjoint, since it doesn't matter which one matches a particular invalid line. So here are three patterns which I believe collectively match all invalid lines:
[^abqz\n].* { /* Starts with the wrong letter */ }
.[^=\n] { /* Second character not = */ }
.$ { /* Only one character in line */ }
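For reference, here is one way the pieces might fit together as a standalone scanner. This is my own sketch, not part of the original answer: each rule consumes through the newline, so every match starts at a line boundary, and a final catch-all handles leftovers (such as "a=") that the three patterns above don't cover.
%{
#include <stdio.h>
%}
%option noyywrap
%%
[abqz]=.+\n   { printf("valid:   %s", yytext); }
[^abqz\n].*\n { printf("invalid: %s", yytext); /* wrong first letter */ }
.[^=\n].*\n   { printf("invalid: %s", yytext); /* second character not '=' */ }
.*\n          { printf("invalid: %s", yytext); /* too short: "a", "a=", or empty */ }
%%
int main(void) { yylex(); return 0; }
Because flex prefers the longest match and then the earliest rule, a valid line like "q=20 Hello There" is claimed by the first rule even though the catch-all also matches it.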

Lex doesn't seem to follow precedence order

I am using ply (a popular python implementation of Lex and Yacc) to create a simple compiler for a custom language.
Currently my lexer looks as follows:
reserved = {
    'begin': 'BEGIN',
    'end': 'END',
    'DECLARE': 'DECL',
    'IMPORT': 'IMP',
    'Dow': 'DOW',
    'Enddo': 'ENDW',
    'For': 'FOR',
    'FEnd': 'ENDF',
    'CASE': 'CASE',
    'WHEN': 'WHN',
    'Call': 'CALL',
    'THEN': 'THN',
    'ENDC': 'ENDC',
    'Object': 'OBJ',
    'Move': 'MOV',
    'INCLUDE': 'INC',
    'Dec': 'DEC',
    'Vibration': 'VIB',
    'Inclination': 'INCLI',
    'Temperature': 'TEMP',
    'Brightness': 'BRI',
    'Sound': 'SOU',
    'Time': 'TIM',
    'Procedure': 'PROC'
}
tokens = ["INT", "COM", "SEMI", "PARO", "PARC", "EQ", "NAME"] + list(reserved.values())
t_COM = r'//'
t_SEMI = r";"
t_PARO = r'\('
t_PARC = r'\)'
t_EQ = r'='
t_NAME = r'[a-z][a-zA-Z_&!0-9]{0,9}'
def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print("Syntax error: Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
Per the documentation, I am creating a dictionary for reserved keywords and then adding them to the tokens list, rather than adding individual rules for them. The documentation also states that precedence is decided following these 2 rules:
All tokens defined by functions are added in the same order as they appear in the lexer file.
Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
The problem I'm having is that when I test the lexer using this test string
testInput = "// ; begin end DECLARE IMPORT Dow Enddo For FEnd CASE WHEN Call THEN ENDC (asdf) = Object Move INCLUDE Dec Vibration Inclination Temperature Brightness Sound Time Procedure 985568asdfLYBasdf ; Alol"
The lexer returns the following error:
LexToken(COM,'//',1,0)
LexToken(SEMI,';',1,2)
LexToken(NAME,'begin',1,3)
Syntax error: Illegal character ' '
LexToken(NAME,'end',1,9)
Syntax error: Illegal character ' '
Syntax error: Illegal character 'D'
Syntax error: Illegal character 'E'
Syntax error: Illegal character 'C'
Syntax error: Illegal character 'L'
Syntax error: Illegal character 'A'
Syntax error: Illegal character 'R'
Syntax error: Illegal character 'E'
(That's not the whole output, but it's enough to see what's happening.)
For some reason, Lex is matching NAME tokens before the keywords. Even after it's done matching NAME tokens, it doesn't recognize the DECLARE reserved keyword. I have also tried adding the reserved keywords alongside the rest of the tokens, using regular expressions, but I get the same result (and the documentation advises against doing so).
Does anyone know how to fix this problem? I want the Lexer to identify reserved keywords first and then to attempt to tokenize the rest of the input.
Thanks!
EDIT:
I get the same result even when using the t_ID function exemplified in the documentation:
def t_NAME(t):
    r'[a-z][a-zA-Z_&!0-9]{0,9}'
    t.type = reserved.get(t.value, 'NAME')
    return t
The main problem here is that you are not ignoring whitespace; all the errors are a consequence. Adding a t_ignore definition to your grammar will eliminate those errors.
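For example (one line; PLY treats t_ignore as a set of characters to skip between tokens):
t_ignore = ' \t'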
But the grammar won't work as expected even if you fix the whitespace issue, because you seem to be missing an important aspect of the documentation, which tells you how to actually use the dictionary reserved:
To handle reserved words, you should write a single rule to match an identifier and do a special name lookup in a function like this:
reserved = {
    'if' : 'IF',
    'then' : 'THEN',
    'else' : 'ELSE',
    'while' : 'WHILE',
    ...
}

tokens = ['LPAREN','RPAREN',...,'ID'] + list(reserved.values())

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    t.type = reserved.get(t.value,'ID')    # Check for reserved words
    return t
(In your case, it would be NAME and not ID.)
Ply knows nothing about the dictionary reserved and it also has no idea how you produce the token names enumerated in tokens. The only point of tokens is to let Ply know which symbols in the grammar represent tokens and which ones represent non-terminals. The mere fact that some word is in tokens does not serve to define the pattern for that token.
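Putting both fixes together, here is a minimal sketch of the corrected lexer (reserved dictionary abbreviated). Note that I've also widened the NAME pattern to accept an uppercase first letter and dropped the {0,9} length cap, since keywords like 'DECLARE' and 'Inclination' could otherwise never reach the reserved-word lookup; adjust the pattern to your language's real rules.
import ply.lex as lex

reserved = {
    'begin': 'BEGIN',
    'end': 'END',
    'DECLARE': 'DECL',
    # ... the rest of the keywords from the original dictionary
}

tokens = ['INT', 'COM', 'SEMI', 'PARO', 'PARC', 'EQ', 'NAME'] + list(reserved.values())

t_ignore = ' \t'   # skip spaces and tabs between tokens
t_COM = r'//'
t_SEMI = r';'
t_PARO = r'\('
t_PARC = r'\)'
t_EQ = r'='

def t_NAME(t):
    r'[a-zA-Z][a-zA-Z_&!0-9]*'
    t.type = reserved.get(t.value, 'NAME')   # reclassify reserved words
    return t

def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print("Syntax error: Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("begin DECLARE (asdf) = 42 end")
for tok in lexer:
    print(tok)
With this version, 'begin' and 'DECLARE' come out as BEGIN and DECL tokens, while 'asdf' remains a NAME.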

How to delimit this Cypher query because there is a syntax error caused by the label name

This Cypher statement causes a syntax error:
CREATE (mediawiki-1.27:Schema { key: mediawiki-1.27, name:mediawiki-1.27})
The error seems to be caused by the - character in the node label:
Invalid input '1': expected whitespace, [ or '-' (line 1, column 19 (offset: 18))
"CREATE (mediawiki-1.27:Schema { key: mediawiki-1.27, name:mediawiki-1.27})"
Dashes and dots are not allowed in variable names. You can surround the variable name with backticks to escape it.
Also I'm guessing your key and name values are strings, in which case make sure to surround them with quotes:
CREATE (`mediawiki-1.27`:Schema { key:'mediawiki-1.27', name:'mediawiki-1.27'})
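If you need to refer to the escaped variable again in the same statement, repeat the backticks:
CREATE (`mediawiki-1.27`:Schema { key:'mediawiki-1.27', name:'mediawiki-1.27'})
RETURN `mediawiki-1.27`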

Write a Lex rule to parse Integer and Float

I am writing a parser for a scripting language.
I need to recognize strings, integers and floats.
I successfully recognize strings with the rule:
[a-zA-Z0-9_]+ {return STRING;}
But I have problem recognizing Integers and Floats. These are the (wrong) rules I wrote:
["+"|"-"][1-9]{DIGIT}* { return INTEGER;}
["+"|"-"]["0." | [1-9]{DIGIT}*"."]{DIGIT}+ {return FLOAT;}
How can I fix them?
Furthermore, since "abc123" is a valid string, how can I make sure that it is recognized as a string and not as the concatenation of a string ("abc") and an integer ("123")?
First problem: There's a difference between (...) and [...]. Your regular expressions don't do what you think they do because you're using the wrong punctuation.
Beyond that:
No numeric rule recognizes 0.
Both numeric rules require an explicit sign.
Your STRING rule recognizes integers.
So, to start:
[...] encloses a set of individual characters or character ranges. It matches a single character which is a member of the set.
(...) encloses a regular expression. The parentheses are used for grouping, as in mathematics.
"..." encloses a sequence of individual characters, and matches exactly those characters.
With that in mind, let's look at
["+"|"-"][1-9]{DIGIT}*
The first bracket expression ["+"|"-"] is a set of individual characters or ranges. In this case, the set contains: ", +, " (again, which has no effect, because a set contains only one instance of each member), |, and the range "-", a range whose endpoints are the same character and which consequently includes only that character, ", already in the set. In short, it is equivalent to ["+|]. It will match one of those three characters; in fact, it requires one of them.
The second bracket expression [1-9] matches one character in the range 1-9, so it probably does what you expected. Again, it matches exactly one character.
Finally, {DIGIT} matches the expansion of the name DIGIT. I'll assume that you have the definition:
DIGIT [0-9]
somewhere in your definitions section. (In passing, I note that you could have just used the POSIX character class [[:digit:]], which would have been unambiguous, and you would not have needed to define it.) It's followed by a *, which means that it will match zero or more repetitions of the {DIGIT} definition.
Now, an example of a string which matches that pattern:
|42
And some examples of strings which don't match that pattern:
-7 # The pattern must start with |, + or "
42 # Again, the pattern must start with |, + or "
+0 # The character following the + must be in the range [0-9]
Similarly, your float pattern, once the [...] expressions are simplified, becomes (writing out the individual pieces one per line, to make it more obvious):
["+|] # i.e. the set " + |
["0.|[1-9] # i.e. the set " 0 | [ 1 2 3 4 5 6 7 8 9
{DIGIT}* # Any number of digits
"." # A single period
] # A single ]
{DIGIT}+ # one or more digits
So here's a possible match:
"..]3
I'll skip over writing out the solution because I think you'll benefit more from doing it yourself.
Now, the other issues:
Some rule should match 0. If you don't want to allow leading zeros, you'll need to add it as a separate rule.
Use the optional operator (?) to indicate that the preceding object is optional. E.g., "foo"? matches either the three characters f, o, o (in order) or the empty string. You can use that to make the sign optional.
The problem is not the matching of abc123, as in your question. (F)lex always gives you the longest possible match, and the only rule which could match the starting character a is the string rule, so it will allow the string rule to continue as long as it can. It will always match all of abc123. However, it will also match 123, which you would probably prefer to be matched by your numeric rule. Here, the other (f)lex matching criterion comes into play: when there are two or more rules which could match exactly the same string, and none of the rules can match a longer string, (f)lex chooses the first rule in the file. So if you want to give numbers priority over strings, you have to put the number rule earlier in your (f)lex file than the string rule.
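To make the ordering concrete, here is a small sketch of my own (simplified patterns, not a complete solution to the question), with the number rule placed first so that it wins ties:
[-+]?[0-9]+      { return INTEGER; }
[a-zA-Z0-9_]+    { return STRING; }
With these rules, 123 is matched by both patterns at the same length, so the earlier INTEGER rule wins; abc123 can only start with the STRING rule, which then consumes the whole token. (Note this simplified INTEGER pattern also matches 0 and allows leading zeros and a sign, unlike the rules discussed above.)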
I hope that gives you some ideas about how to fix things.
