{"SyntaxException: Invalid input '{': expected an identifier character, whitespace - neo4j

I have a VB.NET program that loads a few thousand nodes and sets up relationships between them. I'm using the Neo4j C# client (version 1.1.0.8).
One of the commands is
TheConnection.GraphClient.Cypher.Match(
"(user1:StartingPoint)",
"(user2:Committee)"
).Where(
Function(user1 As StartingPoint)
user1.Id = KnowsID
).AndWhere(
Function(user2 As Committee)
user2.Id = KnownID
).Create(
"user1-[r: Knows ]->user2"
).ExecuteWithoutResults()
For various business logic reasons I want to match the nodes by FECIDNumber (it's actually a string, for example 'C00530767') instead of ID. So I changed KnownID from Long to String, and changed
user2.Id = KnownID
to
user2.FECIDNumber = KnownID
This gives me the following query
TheConnection.GraphClient.Cypher.Match(
"(user1:StartingPoint)",
"(user2:Committee)"
).Where(
Function(user1 As StartingPoint)
user1.Id = KnowsID
).AndWhere(
Function(user2 As Committee) user2.FECIDNumber = KnownID
).Create(
"user1-[r: Knows ]->user2"
).ExecuteWithoutResults()
When executed it throws
{"SyntaxException: Invalid input '{':expected an identifier character, whitespace, '?', '!', '.', node labels, '[', ""=~"", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', ""<>"", ""!="", '<', '>', ""<="", "">="", AND, XOR, OR or ')' (line 3, column 23 (offset: 95))" & vbLf & """AND (user2.FECIDNumber{p1}{p2} = {p3})" & vbCr & """" & vbLf & " ^"}
When I go into the Neo4J Browser and run
MATCH (user:Committee) WHERE user.FECIDNumber = "C00530766" RETURN user
it returns the node as expected.
I think the important part of the error seems to be
(line 3, column 23 (offset: 95))
" & vbLf & """AND (user2.FECIDNumber{p1}{p2} = {p3})" & vbCr & """" & vbLf & "
It looks like the Neo4J C# Client is tossing in a second parameter {p2}, but that's just a guess.
Any suggestions?
Edit 1
(I didn't know I could even pull the raw query text)
It's returning
MATCH (user1:StartingPoint), (user2:Committee)
WHERE (user1.Id = 1)
AND (user2.FECIDNumber"C00530766"false = 0)
CREATE user1-[r: Knows ]->user2
Clearly the problem is that
user2.FECIDNumber = KnownID).Create("user1-[r: Knows ]->user2")
is somehow generating
user2.FECIDNumber"C00530766"false = 0
Ideas? Is there a different syntax I should be using? Do I need to convert FECIDNumber to a different type?
Edit 2
The same code now generates
MATCH (user1:StartingPoint), (user2:Committee)
WHERE (user1.Id = 1)
AND (user2.FECIDNumber = "C00530766")
CREATE user1-[r: Knows ]->user2
And it creates the relationship as expected.
Winner.....

I have published a version (1.1.0.26) which should resolve this for you. It'll take a few minutes for NuGet to index it, so give it half an hour or so from when this is posted...
Let me know!

Related

apoc.merge.node with special identifier fails

I tried to merge a node with apoc.merge.node, but my ident property keys contain a special character (:) and get double-escaped. Did I miss something, or does a workaround exist?
If I replace the ":" with "_", everything works as expected.
Neo4j 4.2.1 community and APOC 4.2.0
CALL apoc.merge.node(["test"], apoc.map.fromPairs([["i:d","123"]])) YIELD node return node
Error
Failed to invoke procedure `apoc.merge.node`: Caused by: org.neo4j.exceptions.SyntaxException: Invalid input 'i': expected "}" (line 1, column 17 (offset: 16))
"MERGE (n:test{``i:d``:$identProps.``i:d``}) ON CREATE SET n += $onCreateProps ON MATCH SET n += $onMatchProps RETURN n"
EDIT
It seems there is a bug in APOC which causes the identifier to be encoded twice.
First with Util::quote https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.1/core/src/main/java/apoc/util/Util.java#L674
And then in the merge procedure https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.1/core/src/main/java/apoc/merge/Merge.java#L85
I've filed an issue: https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/1783
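The double encoding can be reproduced outside Neo4j. Below is a minimal Python sketch, assuming a quoting helper that wraps any name that is not a plain identifier in backticks. The function quote is a hypothetical stand-in for illustration, not APOC's actual code:

```python
import re

def quote(name: str) -> str:
    """Hypothetical helper: wrap a name in backticks unless it is a plain identifier."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    return "`" + name + "`"

once = quote("i:d")    # quoted a single time
twice = quote(once)    # quoting the already-quoted name wraps it again

print(once)   # `i:d`
print(twice)  # ``i:d``
```

The second result has the same doubled backticks that appear in the generated MERGE statement in the error message.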
In Neo4j, you can use backticks ` around a key that contains special characters:
CALL apoc.merge.node(["test"], apoc.map.fromPairs([["`i:d`","123"]]))
YIELD node
return node
The same is true everywhere in the Cypher syntax, for example when escaping a label that contains whitespace:
MERGE (n:`Land Vehicle` {id: "land-rover-1"})

Capture group in Lua pattern matches literal digit character instead of capture group

I want to extract the VALUE of lines containing key="VALUE", and I am trying to use a simple Lua pattern to solve this.
It works for all lines except those which contain a literal 1 in the VALUE. It seems the pattern parser is confusing my capture group for an escape sequence.
> return string.find('... key = "PHONE2" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
5 18 " PHONE2
> return string.find('... key = "PHONE1" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
nil
>
You do not need to use the [^%1] at all. Just use .- as it, by definition, matches the smallest possible string.
Also, you can use multiline string syntax, to not have to escape the quotes in your pattern:
> s=[[... key = "PHONE1" ...]]
> return s:find [[key%s*=%s*(["'])(.-)%1]]
5 18 " PHONE1
Inside a character set, %1 is not treated as a back-reference. The set [^%1] simply excludes the literal character 1, which is why values containing a 1 fail to match.
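For comparison, the same idea (a lazy match closed by a back-reference to the opening quote) can be sketched with Python's re module. The helper name extract_value is only illustrative:

```python
import re

# (["']) captures the opening quote, (.*?) lazily captures the value,
# and \1 requires the same quote character to close it.
PATTERN = re.compile(r'''key\s*=\s*(["'])(.*?)\1''')

def extract_value(line):
    """Return the quoted value following key=, or None if there is no match."""
    m = PATTERN.search(line)
    return m.group(2) if m else None

print(extract_value('... key = "PHONE1" ...'))  # PHONE1
```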

Replace each pattern in regexp

I'm having some trouble finding the right pattern to get the string I want.
My starting string is :
,,,,C3:,D3,E3,F3,,
I would like to have
        C3:  [D3,E3,F3]
Replace each leading comma with a double space
Replace the comma after the colon with a double space and a left square bracket
Replace the trailing commas with a right square bracket
For now, I tried this :
> a = ",,,,C3:,D3,E3,F3,,"
=> ",,,,C3:,D3,E3,F3,,"
> b = a.gsub(/^,*/, "  ").gsub(/(?<=:),/, "  [").gsub(/[,]*$/,"" ).gsub(/[ ]*$/, "]")
=> "  C3:  [D3,E3,F3]"
> b == "        C3:  [D3,E3,F3]"
=> false
I can't manage to replace each leading comma with a double space, which would give 8 spaces in this case.
Could you help me find the right regexp and, if possible, improve my code, please?
To replace each starting comma with a double space, you need to use the \G anchor, i.e. .gsub(/\G,/, '  '). That anchor tells the regex engine to match at the start of the string and then after each successful match. So, you only replace each consecutive comma at the beginning of the string with .gsub(/\G,/, '  ').
Then, you can add other replacements:
s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')
See the IDEONE demo
s = ",,,,C3:,D3,E3,F3,,"
puts s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')
Output:
        C3:  [D3,E3,F3]
To construct the desired string, one needs to know:
the number of leading commas (the size of the string comprised of the leading commas)
the string following the leading commas up to and including the colon
the string between the comma following the colon and two or more commas
It is a simple matter to construct a regex that saves each of these three strings to a capture group:
r = /
(,*) # match leading commas in capture group 1
(.+:) # match up and including colon in capture group 2
, # match comma
(.+) # match any number of any characters in capture group 3
,, # match two commas
/x # extended/free-spacing regex definition mode
",,,,C3:,D3,E3,F3,," =~ r
We can now form the desired string from the contents of the three capture groups:
"#{'  '*$1.size}#{$2}  [#{$3}]"
#=> "        C3:  [D3,E3,F3]"
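The capture-group approach carries over to other regex engines. Here is a rough Python sketch, assuming the input always has the shape "leading commas, key with colon, comma, values, trailing commas". It emits two spaces per leading comma and two spaces before the bracket, as the question describes. The name reformat is only illustrative:

```python
import re

# Group 1: leading commas, group 2: text up to and including the colon,
# group 3: the values, followed by any trailing commas.
PATTERN = re.compile(r"(,*)(.+?:),(.+?),*")

def reformat(s):
    """Rebuild the line from the three captured parts; return s unchanged if it does not fit."""
    m = PATTERN.fullmatch(s)
    if m is None:
        return s
    leading, key, values = m.groups()
    return "  " * len(leading) + key + "  [" + values + "]"

print(repr(reformat(",,,,C3:,D3,E3,F3,,")))
```

Because fullmatch must consume the whole string, the lazy third group is forced to extend up to the trailing commas, which the final ,* then absorbs.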

Why does *any* not backtrack in this example?

I'm trying to understand why in the following example I do not get a match on f2. Contrast that with f1 which does succeed as expected.
import 'package:petitparser/petitparser.dart';
import 'package:petitparser/debug.dart';
main() {
showIt(p, s, [tag = '']) {
var result = p.parse(s);
print('''($tag): $result ${result.message}, ${result.position} in:
$s
123456789123456789
''');
}
final id = letter() & word().star();
final f1 = id & char('(') & letter().star() & char(')');
final f2 = id & char('(') & any().star() & char(')');
showIt(f1, 'foo(a)', 'as expected');
showIt(f2, 'foo(a)', 'why any not matching a?');
final re1 = new RegExp(r'[a-zA-Z]\w*\(\w*\)');
final re2 = new RegExp(r'[a-zA-Z]\w*\(.*\)');
print('foo(a)'.contains(re1));
print('foo(a)'.contains(re2));
}
The output:
(as expected): Success[1:7]: [f, [o, o], (, [a], )] null, 6 in:
foo(a)
123456789123456789
(why any not matching a?): Failure[1:7]: ")" expected ")" expected, 6 in:
foo(a)
123456789123456789
true
true
I'm pretty sure the reason has to do with the fact that any matches the closing paren. But when it then looks for the closing paren and can't find it, shouldn't it:
backtrack the last character
assume the any().star() succeeded with just the 'a'
accept the final paren and succeed
Also, in contrast, I show the analogous regexps that do this.
As you analyzed correctly, the any parser in your example consumes the closing parenthesis. And the star parser wrapping the any parser is eagerly consuming as much input as possible.
Backtracking as you describe is not automatically done by PEGs (parsing expression grammars). Only the ordered choice backtracks automatically.
To fix your example there are multiple possibilities. The most straightforward one is to not let any match the closing parenthesis:
id & char('(') & char(')').neg().star() & char(')')
or
id & char('(') & pattern('^)').star() & char(')')
Alternatively you can use the starLazy operator. Its implementation is using the star and ordered choice operators. An explanation can be found here.
id & char('(') & any().starLazy(char(')')) & char(')')
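The difference between the greedy and lazy repetition can be seen in a toy model. The following Python sketch is not PetitParser's implementation. It is a set of hypothetical combinators in which a parser is a function taking (text, position) and returning the new position, or None on failure:

```python
def char(c):
    # Match exactly the character c.
    return lambda s, i: i + 1 if i < len(s) and s[i] == c else None

def any_char():
    # Match any single character.
    return lambda s, i: i + 1 if i < len(s) else None

def neg(p):
    # Match one character only if p fails at this position.
    return lambda s, i: i + 1 if i < len(s) and p(s, i) is None else None

def seq(*parsers):
    # Run parsers one after another; fail as soon as one fails (no backtracking).
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

def star(p):
    # Greedy repetition: consume as long as p succeeds and never give anything back.
    def parse(s, i):
        while True:
            j = p(s, i)
            if j is None:
                return i
            i = j
    return parse

def star_lazy(p, limit):
    # Lazy repetition: stop as soon as limit would succeed at the current position.
    def parse(s, i):
        while limit(s, i) is None:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

greedy = seq(char('('), star(any_char()), char(')'))
negated = seq(char('('), star(neg(char(')'))), char(')'))
lazy = seq(char('('), star_lazy(any_char(), char(')')), char(')'))

print(greedy('(a)', 0))   # None: star(any) swallowed the ')' as well
print(negated('(a)', 0))  # 3
print(lazy('(a)', 0))     # 3
```

In this model the greedy version fails for the reason described above: once star has consumed the closing parenthesis, nothing in the sequence asks it to give a character back.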

Funny CSV format help

I've been given a large file with a funny CSV format to parse into a database.
The separator character is a semicolon (;). If one of the fields contains a semicolon it is "escaped" by wrapping it in doublequotes, like this ";".
I have been assured that there will never be two adjacent fields with trailing/ leading doublequotes, so this format should technically be ok.
Now, for parsing it in VBScript I was thinking of
Replacing each instance of ";" with a GUID,
Splitting the line into an array by semicolon,
Running back through the array, replacing the GUIDs with ";"
It seems to be the quickest way. Is there a better way? I guess I could use substrings but this method seems to be acceptable...
Your method sounds fine with the caveat that there's absolutely no possibility that your GUID will occur in the text itself.
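The placeholder idea, including the check that the placeholder does not already occur in the line, might look like this in Python. This is a sketch rather than VBScript, and the function name split_with_placeholder is made up for illustration:

```python
import uuid

def split_with_placeholder(line):
    """Split on ';' while treating the sequence ";" (with quotes) as an escaped semicolon."""
    token = uuid.uuid4().hex
    # Regenerate the token in the unlikely case that it already appears in the data.
    while token in line:
        token = uuid.uuid4().hex
    protected = line.replace('";"', token)
    return [field.replace(token, ';') for field in protected.split(';')]

print(split_with_placeholder('Pax;is;a;good;guy";" so;says;his;wife.'))
```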
One approach I've used for this type of data before is to just split on the semicolons regardless and then, if two adjacent fields end and start with a quote, combine them.
For example:
Pax;is;a;good;guy";" so;says;his;wife.
becomes:
0 Pax
1 is
2 a
3 good
4 guy"
5 " so
6 says
7 his
8 wife.
Then, when you discover that fields 4 and 5 end and start (respectively) with a quote, you combine them by replacing the field 4 closing quote with a semicolon and removing the field 5 opening quote (and joining them of course).
0 Pax
1 is
2 a
3 good
4 guy; so
5 says
6 his
7 wife.
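The split-then-merge steps above can be sketched in Python as follows. The function name split_and_merge is only illustrative, and the sketch relies on the stated guarantee that quotes at field boundaries appear only as part of an escaped semicolon:

```python
def split_and_merge(line):
    """Split on every ';', then rejoin neighbours that end and start with a quote."""
    fields = []
    for part in line.split(';'):
        if fields and fields[-1].endswith('"') and part.startswith('"'):
            # Drop the closing quote, restore the semicolon, drop the opening quote.
            fields[-1] = fields[-1][:-1] + ';' + part[1:]
        else:
            fields.append(part)
    return fields

print(split_and_merge('Pax;is;a;good;guy";" so;says;his;wife.'))
# ['Pax', 'is', 'a', 'good', 'guy; so', 'says', 'his', 'wife.']
```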
In pseudo-code, given:
input: A string, first character is input[0]; last
       character is input[length-1]. Further, assume one dummy
       character, input[length]. It can be anything except
       ; and ". This string is one line of the "CSV" file.
length: positive integer, number of characters in input
Do this:
set start = 0
if input[0] = ';':
    you have a blank field in the beginning; do whatever with it
    set start = 1
endif
for each c between 1 and length-1:
    next iteration unless input[c] = ';'
    if input[c-1] ≠ '"' or input[c+1] ≠ '"': // test for escape sequence ";"
        found field consisting of half-open range [start,c); do whatever
        with it. Note that in the case of empty fields, start = c, leaving
        an empty range
        set start = c+1
    endif
end foreach
the last field is the half-open range [start,length); do whatever with it
Untested, of course. Debugging code like this is always fun….
The special case of input[0] is to make sure we don't ever look at input[-1]. If you can make input[-1] safe, then you can get rid of that special case. You can also put a dummy character in input[0] and then start your data—and your parsing—from input[1].
One option would be to find instances of the regex:
[^"];[^"]
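and then break the string apart with substring:
List<string> ret = new List<string>();
Regex r = new Regex(@"[^""];[^""]");
Match m;
while ((m = r.Match(line)).Success)
{
    // m.Index is the character just before the separator; keep everything up to and including it
    ret.Add(line.Substring(0, m.Index + 1));
    // continue after the separator
    line = line.Substring(m.Index + 2);
}
// whatever remains is the last field
ret.Add(line);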
(Sorry about the C#, I don't know VBScript)
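A related regex-based split can be sketched in Python using lookarounds, so that the characters next to the separator are not consumed. This is a variation on the idea above, not a translation of the C# code, and the name split_line is hypothetical:

```python
import re

# A semicolon is a separator unless it is both preceded and followed by a double quote.
SEPARATOR = re.compile(r'(?<!");|;(?!")')

def split_line(line):
    """Split on unescaped semicolons, then turn each escaped ";" back into a plain semicolon."""
    return [field.replace('";"', ';') for field in SEPARATOR.split(line)]

print(split_line('Pax;is;a;good;guy";" so;says;his;wife.'))
```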
Using quotes is normal for .csv files. If you have quotes in the field then you may see opening and closing and the embedded quote all strung together two or three in a row.
If you're using SQL Server you could try using T-SQL to handle everything for you.
SELECT * INTO MyTable FROM OPENDATASOURCE('Microsoft.JET.OLEDB.4.0',
'Data Source=F:\MyDirectory;Extended Properties="text;HDR=No"')...
[MyCsvFile#csv]
That will create and populate "MyTable". Read more on this subject here on SO.
I would recommend using RegEx to break up the strings.
Find every ';' that is not a part of
";" and change it to something else
that does not appear in your fields.
Then go through and replace ";" with ;
Now you have your fields with the correct data.
Most importers can swap out separator characters pretty easily.
This is basically your GUID idea. Just make sure the GUID is unique to your file before you start and you will be fine. I tend to start using 'Z'. After enough 'Z's, you will be unique (sometimes as few as 1-3 will do).
Jacob

Resources