Why is Neo4J telling me there is no spoon? - neo4j

I am using Neo4J to represent texts; in the simplest case a text is a sequence of words joined by the relationship LEMMA_TEXT.
I am trying to find the Nth word after a known word, with a query that looks something like this.
MATCH (anchor)-[:LEMMA_TEXT*32]->(word)
WHERE id(anchor) = 3275
RETURN word
In one particular case, if I increase the path length to 33, I get this error:
Neo.DatabaseError.Statement.ExecutionFailure: There is no spoon.
And yet the following query returns the correct result.
MATCH (anchor)-[:LEMMA_TEXT*32]->(word)-[:LEMMA_TEXT]->(next)
WHERE id(anchor) = 3275
RETURN next
which demonstrates that the node I want exists and is reachable.
Where is the section of the manual that tells me how to bend the spoon with my mind? More importantly, what does this actually mean?!

If something breaks at a number like 33, it suggests there is a limit of 32. Why 32? It is 2^5.
It is no coincidence that many such limits are powers of 2: a MongoDB document cannot be larger than 16 MB, a collection can have at most 64 indexes, and so on.
As for why 32 hops followed by one more hop works: up to 32 hops can be handled in one operation, and the final hop is treated as a separate operation. It cannot do 33 in a single operation.
Most of these restrictions are essentially sanity checks rather than hard technical boundaries.
As for why they are almost always powers of 2, I don't know; I'd like someone else to answer that.

Have you tried splitting the landing (anchor lookup) and the search into two statements?
Also, you should add the label for the text word nodes (for performance).
Example:
MATCH (anchor)
WHERE id(anchor) = 3275
WITH anchor
MATCH (anchor)-[:LEMMA_TEXT*32]->(word)
RETURN word
Do you get the same error?

Related

table size difference. are both examples identical?

tNum={[2]=true , [3]=true,[4]=true, [5]=true ,[6]=true }
#tNum-->0
tNum={}
tNum[2]=true
tNum[3]=true
tNum[4]=true
tNum[5]=true
tNum[6]=true
#tNum-->6
Why such a difference in size?
Are both examples identical?
Your two tables are semantically identical, but using # on them is ambiguous. Both 0 and 6 are correct lengths. Here's an abridged version of the docs:
The length operator applied on a table returns a border in that table. A border in a table t is any natural number that satisfies the following condition:
(border == 0 or t[border] ~= nil) and t[border + 1] == nil
A table with exactly one border is called a sequence.
When t is not a sequence, #t can return any of its borders. (The exact one depends on details of the internal representation of the table, which in turn can depend on how the table was populated and the memory addresses of its non-numeric keys.)
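To make the border definition concrete, here is a small sketch in Python (not Lua) that models a table as a dict and lists every border. The helper name borders and the dict model are assumptions for illustration only; a missing key stands in for nil.

```python
def borders(t):
    """Return every border of t, a dict standing in for a Lua table.

    A missing key plays the role of nil. Nothing is non-nil beyond the
    largest positive integer key, so no border can lie past it.
    """
    int_keys = [k for k in t if isinstance(k, int) and k > 0]
    upper = max(int_keys, default=0)
    found = []
    for b in range(0, upper + 1):
        left_ok = (b == 0) or (b in t)   # border == 0 or t[border] ~= nil
        right_ok = (b + 1) not in t      # t[border + 1] == nil
        if left_ok and right_ok:
            found.append(b)
    return found


# The table from the question: keys 2..6 set, key 1 absent.
t_num = {2: True, 3: True, 4: True, 5: True, 6: True}
print(borders(t_num))   # [0, 6]

# With key 1 present there is exactly one border, so the table is a sequence.
t_seq = {1: False, 2: True, 3: True, 4: True, 5: True, 6: True}
print(borders(t_seq))   # [6]
```

Both 0 and 6 satisfy the condition for the question's table, which is why either answer from # is permitted.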
This is an example of undefined behavior (UB). (That may not be the right word, because the behavior is partially defined. UB in Lua can't launch nuclear weapons, as it can in C.) Undefined behavior is important, because it gives the devs the freedom to choose the fastest possible algorithm without worrying about what happens when a user violates their assumptions.
To find a length, Lua makes, at most, log n guesses instead of looking at every element to find an unambiguous length. For large arrays, this speeds things up a lot.
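The idea of finding a border with a logarithmic number of probes can be sketched as follows. This is a simplified Python illustration, not Lua's actual implementation: it ignores the split between the array part and the hash part, so it does not reproduce which border real Lua reports for a given table. The name find_border is hypothetical.

```python
def find_border(t):
    """Find some border of t (a dict standing in for a Lua table).

    Uses doubling followed by binary search, so the number of probes
    grows with log n rather than n.
    """
    if 1 not in t:
        return 0                      # t[1] is nil, so 0 is a border
    lo, hi = 1, 2
    while hi in t:                    # double until we hit a nil slot
        lo, hi = hi, hi * 2
    # Invariant: lo is present, hi is absent; a border lies in between.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid in t:
            lo = mid
        else:
            hi = mid
    return lo


print(find_border({k: True for k in range(1, 7)}))      # 6
print(find_border({k: True for k in range(2, 7)}))      # 0
print(find_border({k: True for k in range(1, 1001)}))   # 1000
```

When the table has a hole, a search like this lands on whichever border its probes happen to bracket, which is the price of not scanning every element.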
The issue is that when you define a table as starting at index [2], the length operator breaks because it assumes that tables start at index [1].
The following code works as intended:
tNum = {[1]=false, [2]=true, [3]=true, [4]=true, [5]=true, [6]=true}
#tNum => 6
The odd behaviour comes from how the table is populated. When you create the table with tNum={} and then assign the indices one by one, every index starts out as nil (absent), and for a table built this way Lua happens to report the border 6.
Conversely, when you build the table in one step with tNum={[2]=true, ...}, tNum[1] is absent from the start, and Lua happens to report the border 0. In both cases tNum[1] is nil, so the array effectively begins at index 2 and the length calculation is not reliable.
For a more thorough explanation, see this section of the lua wiki near the bottom where it explains:
For those that really want their arrays starting at 0, it is not difficult to write the following:
days = {[0]="Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"}
Now, the first value, "Sunday", is at index 0. That zero does not affect the other fields, but "Monday" naturally goes to index 1, because it is the first list value in the constructor; the other values follow it. Despite this facility, I do not recommend the use of arrays starting at 0 in Lua. Remember that most functions assume that arrays start at index 1, and therefore will not handle such arrays correctly.
The Length operator assumes your array will begin at index [1], and since it does not, it doesn't work correctly.
I hope this was helpful, good luck with your code!

How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j?

Using Cypher in Neo4j, I can query the Panama database for the countries of three types of identity holders (my own term): Entities (companies), Officers (shareholders) and Intermediaries (middle companies), as three attributes/columns. Each column has one or two entries separated by a semicolon (e.g. British Virgin Islands;Russia). We want to combine the countries in these columns into a set of unique countries and then obtain the number of countries as a new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant parts are the SET clause and the DISTINCT count in the last line. The query fails with an error that makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means I should use COLLECT, but we tried that as well and got the same error with 'u' and 'n' swapped. Please help us obtain the output we want; it would make our job a lot easier. Thanks in advance!
EDIT: Since I had not defined the variable, as @Cybersam pointed out, I tried CREATE as follows, but it shows the error "Invalid input 'R':" at the RETURN clause. I can't make sense of this. Help is really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the new ones under examination.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity), (BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.
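Note that the size of the split list counts a country once for every column it appears in. If the goal is the number of unique countries, as the question describes, the intended result can be sketched outside Cypher like this. The Python below is only an illustration of the desired output; the function name unique_countries and every sample value other than "British Virgin Islands;Russia" are made up.

```python
def unique_countries(*columns):
    """Merge semicolon-separated country strings into a sorted unique list."""
    seen = set()
    for column in columns:
        if not column:                 # skip None or empty strings
            continue
        for name in column.split(";"):
            name = name.strip()
            if name:
                seen.add(name)
    return sorted(seen)


# Hypothetical values for the entity, officer and intermediary columns.
countries = unique_countries("British Virgin Islands;Russia", "Russia", "Belize")
print(countries)        # ['Belize', 'British Virgin Islands', 'Russia']
print(len(countries))   # 3
```

Here "Russia" appears in two columns but is counted once, whereas splitting the concatenated string alone would give a size of 4.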

How to find nodes being contained in a node's properties interval?

I'm currently developing a kind of configurator using neo4j as a backend. I have run into a problem that I don't know how best to solve.
I've got nodes created like this:
(A:Product {name:'ProductA', minWidth:20, maxWidth:200, minHeight:10, maxHeight:400})
(B:Product {name:'ProductB', minWidth:40, maxWidth:100, minHeight:20, maxHeight:300})
...
There is an interface where the user can input a desired width and height, e.g. Width=30, Height=250. Now I'd like to check which products match the input criteria. As the input might be any long value, the approach used in http://neo4j.com/blog/modeling-a-multilevel-index-in-neoj4/ with dates doesn't seem suitable for me. How can I run a Cypher query that gives me all the nodes matching the input criteria?
I'm not sure I fully understand what you are asking, but if I do, here is a simple query for it.
Assuming the user wants width = 30 and height = 50:
MATCH (p:Product)
WHERE
p.minWidth <= 30 AND p.maxWidth >= 30 AND
p.minHeight <= 50 AND p.maxHeight >= 50
RETURN
p
The comparisons are inclusive so that a value exactly equal to a product's minimum or maximum still matches. If this is not what you are looking for, feel free to say so in a comment.
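The same range check can be tried on the two sample products from the question. This is a plain-Python sketch of the filter logic only, not a Neo4j client call; the function name matching_products is hypothetical, and treating the bounds as inclusive is an assumption.

```python
products = [
    {"name": "ProductA", "minWidth": 20, "maxWidth": 200, "minHeight": 10, "maxHeight": 400},
    {"name": "ProductB", "minWidth": 40, "maxWidth": 100, "minHeight": 20, "maxHeight": 300},
]


def matching_products(products, width, height):
    """Return names of products whose width and height ranges contain the input."""
    return [
        p["name"]
        for p in products
        if p["minWidth"] <= width <= p["maxWidth"]
        and p["minHeight"] <= height <= p["maxHeight"]
    ]


# Width=30, Height=250 are the example inputs from the question.
print(matching_products(products, 30, 250))   # ['ProductA']
```

ProductB is excluded because the requested width of 30 is below its minWidth of 40.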

Rails – assign an order number to each record

So I am importing passages from a book into my application. I am giving all the passages in a given book the class Passage. i.e. Passage.all
I do have many books so I also have a class Book. Therefore, when I am finding all the passages from one book I call:
Passage.where(book_id: self.book_id)
When I use the where method, does it preserve the "natural order" that Passage.all would generally return? If not, I could change the code to:
Passage.where(book_id: self.book_id).order("created_at ASC")
Anyway, I then proceed to write this code:
a = Passage.where(book_id: self.book_id)
b = a.index(self)+1
self.passage_number = b
[first line: returns all the passages in the book]
[second line: returns their number in the array + 1 to account for the 0 starting value thing (pardon the colloquialism)]
[third line: assigns that index value to the passage number]
Ultimately, I am trying to compute passage numbers, without having to hard code them.
SO WHAT'S MY ISSUE? Right now I am getting three passage #3's, and two passage #4's. My last passage is this:
Passage.last.passage_number = 217
Passage.where(book_id: 5).count = 241
It is skipping numbers and incorrectly indexing, so I think I need to code a better method! What's a better way to index an array in this context?
There is no such thing as "natural order": without an order clause Passage.all may return things in any order the database wants (which could depend on things like location of items on disk, query plan etc).
The first and last methods are special in that they order by id if your relation does not already have an order applied to it.
If you need things in a specific order then add an order clause.
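As an illustration of why an explicit order matters for numbering, here is a small sketch in plain Python (a stand-in for the ordered relation, not ActiveRecord). The records, the integer created_at values and the function name number_passages are all hypothetical; the id is used as a tie-breaker so the order is fully determined.

```python
# Hypothetical rows, deliberately stored out of order.
passages = [
    {"id": 12, "book_id": 5, "created_at": 3},
    {"id": 10, "book_id": 5, "created_at": 1},
    {"id": 11, "book_id": 7, "created_at": 2},
    {"id": 13, "book_id": 5, "created_at": 2},
]


def number_passages(passages, book_id):
    """Map passage id -> 1-based position within its book, by creation order."""
    in_book = [p for p in passages if p["book_id"] == book_id]
    # An explicit, total ordering: created_at first, then id as a tie-breaker.
    in_book.sort(key=lambda p: (p["created_at"], p["id"]))
    return {p["id"]: n for n, p in enumerate(in_book, start=1)}


print(number_passages(passages, 5))   # {10: 1, 13: 2, 12: 3}
```

Because the order is fixed before numbering, each passage gets a distinct number and none are skipped, regardless of how the rows were stored.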

How to sort a list of 1million records by the first letter of the title

I have a table with 1 million+ records that contain names. I would like to be able to sort the list by the first letter in the name.
.. ABCDEFGHIJKLMNOPQRSTUVWXYZ
What is the most efficient way to set up the db table to allow searching by the first character of the table.name field?
The best idea so far is to add an extra field that stores the first character of the name (maintained by an observer), index that field, and then sort by it. The problem is that the result is no longer necessarily alphabetical.
Any suggestions?
You said in a comment:
so lets ignore the first letter part. How can I all records that start with A? All A's no B...z ? Thanks – AnApprentice Feb 21 at 15:30
I assume you meant "How can I RETURN all records...".
This is the answer:
select * from t
where substr(name, 1, 1) = 'A'
I agree with the questions above as to why you would want to do this -- a regular index on the whole field is functionally equivalent. PostgreSQL (with some new ones in v. 9) has some rather powerful indexing capabilities for special cases which you might want to read about here http://www.postgresql.org/docs/9.1/interactive/sql-createindex.html
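One way a regular index on the whole column can serve a first-letter lookup is a range predicate on the name itself. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it only demonstrates that the two predicates return the same rows; whether a given database actually uses the index depends on its planner and collation settings. The table t and column name come from the query above; the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("CREATE INDEX idx_t_name ON t (name)")   # plain index on the full column
conn.executemany(
    "INSERT INTO t (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Aaron",), ("Zed",), ("Avery",)],
)

# The predicate from the answer: compare the first character.
by_substr = conn.execute(
    "SELECT name FROM t WHERE substr(name, 1, 1) = 'A' ORDER BY name"
).fetchall()

# An equivalent range on the column itself: everything from 'A' up to, but not including, 'B'.
by_range = conn.execute(
    "SELECT name FROM t WHERE name >= 'A' AND name < 'B' ORDER BY name"
).fetchall()

print(by_range)                # [('Aaron',), ('Alice',), ('Avery',)]
print(by_substr == by_range)   # True
```

The range form also returns the rows in full alphabetical order, which addresses the concern that sorting by a separate first-letter field is no longer alphabetical.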
