Neo4j Cypher Unknown Error - neo4j

So I created a date dimension from this article
a link
I modified it and added datestamp to Day node which is Month/Day/Year (string)
I added indexes on Year.year, Month.month, Day.day && day.datestamp
When I run this query:
MATCH p=(day2:Day {datestamp:'1/1/2015'})-[:NEXT*]->(day {day:2})
return length(p)
limit 5
It takes 1667 ms to execute
When I modify the query to this:
MATCH p=(day2:Day {datestamp:'1/1/2015'})-[:NEXT*]->(day {datestamp:'1/2/2015'})
return length(p)
After it runs for about a minute, it ends in the Unknown Error message.
My schema is:
Indexes
ON :Day(day) ONLINE
ON :Day(datestamp) ONLINE
ON :Month(month) ONLINE
ON :Year(year) ONLINE
No constraints
Any ideas what I'm doing wrong?

I think I figured it out.
Looks like the 1st query that runs 1667ms only runs and completes because of limit 5, it finds 5 records and stops further execution.
While the other keeps going and going until it runs out of juice.
I think solution in this case is constraint that indicates datestamp is unique which should prevent further execution.
Still interesting, considering there's about 2600+ records connected with HAS_NEXT so traveling through those relationships shouldn't be taking this long to find out that there's only 1 record that matches that query.

Related

Neo4j poor performance on ORDER BY

I have query like this
MATCH (p:Person)-[s:KNOWS]->(t:Person) WHERE s.state = "blocked"
WITH DISTINCT (t) AS user
SKIP 0
LIMIT 10
MATCH (user)<-[r:KNOWS { state: "blocked" }]-(p:Person)
RETURN user.username, SIZE(COLLECT(p.username)) as count
First problem is when I have SKIP for example 100, it's getting slower, any idea why?
Seconds problem is, when I try to add ORDER BY, for example ORDER BY p.createdAt which is date (indexed field), it's always timing out.
You might have better performance with a tweak to your initial MATCH:
MATCH (t:Person)
WHERE ()-[:KNOWS {state:"blocked"}]->(t)
WITH t AS user // no longer need DISTINCT here
...
For better performance you might consider creating a fulltext schema index on :KNOWS relationships by their state property, and use the fulltext index query procedure to do this initial lookup (this is assuming :KNOWS relationships always connect two :Person nodes).

Find first/last event according to time tree in Neo4j

I created a time tree (Day-Month-Year) and assigned events to it. Now I try to find the first and the last event for a user, who causes the events. This is my code to find the last event (assuming all events happen in the same month):
match (day:Day)<--(event:Event)-->(user:User{userID:"007"})
with MAX(day.Day) AS max
match (day) where day.Day=max
return day
But this query returns ALL days, and not only the one with the highest .Day-Property.
After finding the node, i will process with it, so solutions as the following are not suitable
RETURN ... ORDER BY ... DESC LIMIT 1
Thanks a lot!
Note: Time-Tree-Model is designed is shown in the picture.
Source: graphaware.com
that works:
match (day:Day)<--(event:Event)-->(user:User{UserID:"007"})
with MAX(day.Day) AS max, collect(day) as days
match (day) where day in days anD day.Day=max
return day

neo4j cypher - Differing query plan behavior

Nodes with the Location node label have an index on Label.name
Profiling the following query gives me a smart plan, with a NodeHashJoin between the two sides of the graph on either side of Trip nodes. Very clever. Works great.
PROFILE MATCH (rosen:Location)<-[:OCCURS_AT]-(ev:Event)<-[:HAS]-(trip:Trip)-[:OPERATES_ON]->(date:Date)
WHERE rosen.name STARTS WITH "U Rosent" AND
ev.scheduled_departure_time > "07:45:00" AND
date.date = '2015-11-20'
RETURN rosen.name, ev.scheduled_departure_time, trip.headsign
ORDER BY ev.scheduled_departure_time
LIMIT 20;
However, just changing one line of the query from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE id(rosen) = 4752371 AND
seems to alter the entire behavior of the query plan, which now appears to become more "sequential", losing the parallel execution of (Trip)-[:OPERATES_ON]->(Date)
Much slower. 6x more DB hits in total.
Question
Why does changing the retrieval of one, seemingly-unrelated Location node via a different index/mechanism alter the behavior of the whole query?
(I'm not sure how best to convey more information about the graph model, but please advise, and I'd be happy to add details that are missing)
Edit:
It gets better. Changing that query line from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE rosen.name = "U Rosenthaler Platz." AND
results in the same loss of parallelism in the query plan!
Seems odd that a LIKE query is faster than an = ?

Neo4j cyper performance on simple match

I have a very simple cypher which give me a poor performance.
I have approx. 2 million user and 60 book category with relation from user to category around 28 million.
When I do this cypher:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN distinct(bc.id);
It returns me 8.5k rows within 2 - 2.5 (First time) minutes
And when I do this cypher:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN u.id, u.email, read.timestamp;
It return 55k rows within 3 to 6 (First time) minutes.
I already have index on User id and email, but still I don't think this performance is acceptable. Any idea how can I improve this?
First of all, you can profile your query, to find what happens under the hood.
Currently looks like that query scans all nodes in database to complete query.
Reasons:
Neo4j support indexes only for '=' operation (or 'IN')
To complete query, it traverses all nodes, one by one, checking each node if it has valid timestamp
There is no straightforward way to deal with this problem.
You should look into creating proper graph structure, to deal with Time-specific queries more efficiently. There are several ways how to represent time in graph databases.
You can take look on graphaware/neo4j-timetree library.
Can you explain your model a bit?
Where are the books and the "reading"-Event in it?
Afaik all you want to know, which book categories have been recently read (in the last month)?
You could create a second type of relationship thats RECENTLY_READ which expires (is deleted) by a batch job it is older than 30 days. (That can be two simple cypher statements which create and delete those relationships).
WITH (1000*60*60*24*30) as month
MATCH (a:User)-[read:READ]->(b:BookCategory)
WHERE read.timestamp >= timestamp() - month
MERGE (a)-[rr:RECENTLY_READ]->(b)
WHERE coalesce(rr.timestamp,0) < read.timestamp
SET rr.timestamp = read.timestamp;
WITH (1000*60*60*24*30) as month
MATCH (a:User)-[rr:RECENTLY_READ]->(b:BookCategory)
WHERE rr.timestamp < timestamp() - month
DELETE rr;
There is another way to achieve what you exactly want to do here, but it's unfortunately not possible in Cypher.
With a relationship-index on timestamp on your read relationship you can run a Lucene-NumericRangeQuery in Neo4j's Java API.
But I wouldn't really recommend to go down this route.

Neo4j Cypher query fails and return with an Unknown Error

I'm trying to build a Cypher query to test if a specific structure exists so I can relate dates to it.
Running Neo4j 2.1.0-M01 on a Linux server, but the same issue occurred with Neo4j 2.0.1
We're starting with a clean database, 0 nodes.
First I'm running this MATCH query to prove that it runs.
Obviously this query is not going to return any nodes. But after creating the nodes, it will
fail with an 'unknown error'.
It seems like a bug to me, since a query with fewer nodes will return. Does anyone have suggestions how to rewrite this query for now?
Sorry for the large amount of code in this post.
Thanks,
-Edwin
Cypher Query:
MATCH (c:Cluster{cluster_name:'mycluster',cluster_uuid:'7bd4f66d-5faf-11db-8d0d-000e0cba569c'})
,(sc1:Controller{serialnumber:'7000610071',system_id:'1873784171',hostname:'node01',node_uuid:'7cd70205-66ae-11e0-a4a9-0deba859517d'})
,(sc1)-[:IS_PART_OF_CLUSTER]->(c)
,(sc2:Controller{serialnumber:'7000606111',system_id:'1873778118',hostname:'node02',node_uuid:'b954f0a1-6682-11e0-b8a0-517da924923d'})
,(sc2)-[:IS_PART_OF_CLUSTER]->(c)
,(sc3:Controller{serialnumber:'7000561878',system_id:'1873772083',hostname:'node03',node_uuid:'ac293586-6690-11e0-b8a0-517da924923d'})
,(sc3)-[:IS_PART_OF_CLUSTER]->(c)
,(sc4:Controller{serialnumber:'800000075807',system_id:'1873784143',hostname:'node04',node_uuid:'e8d6c7e5-663e-11e0-b8a0-517da924923d'})
,(sc4)-[:IS_PART_OF_CLUSTER]->(c)
,(sc5:Controller{serialnumber:'7000477261',system_id:'1873745662',hostname:'node05',node_uuid:'1d1ecc64-728c-11e0-bf0e-3d25383f7ed3'})
,(sc5)-[:IS_PART_OF_CLUSTER]->(c)
,(sc6:Controller{serialnumber:'7000477273',system_id:'1873745654',hostname:'node06',node_uuid:'140fb0f9-728c-11e0-afeb-49fcf0b6e6c3'})
,(sc6)-[:IS_PART_OF_CLUSTER]->(c)
,(sc7:Controller{serialnumber:'7000474908',system_id:'1873745665',hostname:'node07',node_uuid:'edbf9c62-728b-11e0-bf0e-3d25383f7ed3'})
,(sc7)-[:IS_PART_OF_CLUSTER]->(c)
,(sc8:Controller{serialnumber:'7000474910',system_id:'1873745695',hostname:'node08',node_uuid:'20dbfe67-7832-11e0-afeb-49fcf0b6e6c3'})
,(sc8)-[:IS_PART_OF_CLUSTER]->(c)
,(sc9:Controller{serialnumber:'7000609864',system_id:'1873802756',hostname:'node09',node_uuid:'a8b75397-6690-11e0-b8a0-517da924923d'})
,(sc9)-[:IS_PART_OF_CLUSTER]->(c)
,(sc10:Controller{serialnumber:'7000610021',system_id:'1873791030',hostname:'node10',node_uuid:'f6cf0705-6670-11e0-b8a0-517da924923d'})
,(sc10)-[:IS_PART_OF_CLUSTER]->(c)
,(sc11:Controller{serialnumber:'7000610057',system_id:'1873784128',hostname:'node11',node_uuid:'551b1bf8-663d-11e0-b8a0-517da924923d'})
,(sc11)-[:IS_PART_OF_CLUSTER]->(c)
,(sc12:Controller{serialnumber:'7000609981',system_id:'1873778164',hostname:'node12',node_uuid:'f6062d6c-663e-11e0-9b53-cd0ece6aa2ce'})
,(sc12)-[:IS_PART_OF_CLUSTER]->(c)
,(sc13:Controller{serialnumber:'7000610033',system_id:'1873778186',hostname:'node13',node_uuid:'ed0e61c5-6670-11e0-b07f-933da0385fdc'})
,(sc13)-[:IS_PART_OF_CLUSTER]->(c)
,(sc14:Controller{serialnumber:'7000610069',system_id:'1873784175',hostname:'node14',node_uuid:'8623ea28-66ae-11e0-ab9d-5fdc7f30dee7'})
,(sc14)-[:IS_PART_OF_CLUSTER]->(c)
,(sc15:Controller{serialnumber:'7000606109',system_id:'1873778197',hostname:'node15',node_uuid:'b4349a83-6682-11e0-ab9d-5fdc7f30dee7'})
,(sc15)-[:IS_PART_OF_CLUSTER]->(c)
,(sc16:Controller{serialnumber:'7000610045',system_id:'1873784157',hostname:'node16',node_uuid:'67d80db8-663d-11e0-9b53-cd0ece6aa2ce'})
,(sc16)-[:IS_PART_OF_CLUSTER]->(c)
,(sc17:Controller{serialnumber:'7001246085',system_id:'2014175904',hostname:'node19',node_uuid:'1c792588-ff4e-11db-94fe-3b91dd7dd242'})
,(sc17)-[:IS_PART_OF_CLUSTER]->(c)
,(sc18:Controller{serialnumber:'7001246097',system_id:'2014176797',hostname:'node20',node_uuid:'3ae9e7ae-ff44-11db-864d-af2af862bba3'})
,(sc18)-[:IS_PART_OF_CLUSTER]->(c)
RETURN c,sc1,sc2,sc3,sc4,sc5,sc6,sc7,sc8,sc9,sc10,sc11,sc12,sc13,sc14,sc15,sc16,sc17,sc18
Rows returned: 0
Cypher Query for creating the nodes:
CREATE (c:Cluster{cluster_name:'mycluster',cluster_uuid:'7bd4f66d-5faf-11db-8d0d-000e0cba569c'})
,(sc1:Controller{serialnumber:'7000610071',system_id:'1873784171',hostname:'node01',node_uuid:'7cd70205-66ae-11e0-a4a9-0deba859517d'})
,(sc1)-[:IS_PART_OF_CLUSTER]->(c)
,(sc2:Controller{serialnumber:'7000606111',system_id:'1873778118',hostname:'node02',node_uuid:'b954f0a1-6682-11e0-b8a0-517da924923d'})
,(sc2)-[:IS_PART_OF_CLUSTER]->(c)
,(sc3:Controller{serialnumber:'7000561878',system_id:'1873772083',hostname:'node03',node_uuid:'ac293586-6690-11e0-b8a0-517da924923d'})
,(sc3)-[:IS_PART_OF_CLUSTER]->(c)
,(sc4:Controller{serialnumber:'800000075807',system_id:'1873784143',hostname:'node04',node_uuid:'e8d6c7e5-663e-11e0-b8a0-517da924923d'})
,(sc4)-[:IS_PART_OF_CLUSTER]->(c)
,(sc5:Controller{serialnumber:'7000477261',system_id:'1873745662',hostname:'node05',node_uuid:'1d1ecc64-728c-11e0-bf0e-3d25383f7ed3'})
,(sc5)-[:IS_PART_OF_CLUSTER]->(c)
,(sc6:Controller{serialnumber:'7000477273',system_id:'1873745654',hostname:'node06',node_uuid:'140fb0f9-728c-11e0-afeb-49fcf0b6e6c3'})
,(sc6)-[:IS_PART_OF_CLUSTER]->(c)
,(sc7:Controller{serialnumber:'7000474908',system_id:'1873745665',hostname:'node07',node_uuid:'edbf9c62-728b-11e0-bf0e-3d25383f7ed3'})
,(sc7)-[:IS_PART_OF_CLUSTER]->(c)
,(sc8:Controller{serialnumber:'7000474910',system_id:'1873745695',hostname:'node08',node_uuid:'20dbfe67-7832-11e0-afeb-49fcf0b6e6c3'})
,(sc8)-[:IS_PART_OF_CLUSTER]->(c)
,(sc9:Controller{serialnumber:'7000609864',system_id:'1873802756',hostname:'node09',node_uuid:'a8b75397-6690-11e0-b8a0-517da924923d'})
,(sc9)-[:IS_PART_OF_CLUSTER]->(c)
,(sc10:Controller{serialnumber:'7000610021',system_id:'1873791030',hostname:'node10',node_uuid:'f6cf0705-6670-11e0-b8a0-517da924923d'})
,(sc10)-[:IS_PART_OF_CLUSTER]->(c)
,(sc11:Controller{serialnumber:'7000610057',system_id:'1873784128',hostname:'node11',node_uuid:'551b1bf8-663d-11e0-b8a0-517da924923d'})
,(sc11)-[:IS_PART_OF_CLUSTER]->(c)
,(sc12:Controller{serialnumber:'7000609981',system_id:'1873778164',hostname:'node12',node_uuid:'f6062d6c-663e-11e0-9b53-cd0ece6aa2ce'})
,(sc12)-[:IS_PART_OF_CLUSTER]->(c)
,(sc13:Controller{serialnumber:'7000610033',system_id:'1873778186',hostname:'node13',node_uuid:'ed0e61c5-6670-11e0-b07f-933da0385fdc'})
,(sc13)-[:IS_PART_OF_CLUSTER]->(c)
,(sc14:Controller{serialnumber:'7000610069',system_id:'1873784175',hostname:'node14',node_uuid:'8623ea28-66ae-11e0-ab9d-5fdc7f30dee7'})
,(sc14)-[:IS_PART_OF_CLUSTER]->(c)
,(sc15:Controller{serialnumber:'7000606109',system_id:'1873778197',hostname:'node15',node_uuid:'b4349a83-6682-11e0-ab9d-5fdc7f30dee7'})
,(sc15)-[:IS_PART_OF_CLUSTER]->(c)
,(sc16:Controller{serialnumber:'7000610045',system_id:'1873784157',hostname:'node16',node_uuid:'67d80db8-663d-11e0-9b53-cd0ece6aa2ce'})
,(sc16)-[:IS_PART_OF_CLUSTER]->(c)
,(sc17:Controller{serialnumber:'7001246085',system_id:'2014175904',hostname:'node19',node_uuid:'1c792588-ff4e-11db-94fe-3b91dd7dd242'})
,(sc17)-[:IS_PART_OF_CLUSTER]->(c)
,(sc18:Controller{serialnumber:'7001246097',system_id:'2014176797',hostname:'node20',node_uuid:'3ae9e7ae-ff44-11db-864d-af2af862bba3'})
,(sc18)-[:IS_PART_OF_CLUSTER]->(c)
RETURN c,sc1,sc2,sc3,sc4,sc5,sc6,sc7,sc8,sc9,sc10,sc11,sc12,sc13,sc14,sc15,sc16,sc17,sc18
19 nodes created, 18 relationships.
Now when I run the first query again, it takes one minute and will eventually return with 'Unknown error'.
Cypher Query:
MATCH (c:Cluster{cluster_name:'mycluster',cluster_uuid:'7bd4f66d-5faf-11db-8d0d-000e0cba569c'})
,(sc1:Controller{serialnumber:'7000610071',system_id:'1873784171',hostname:'node01',node_uuid:'7cd70205-66ae-11e0-a4a9-0deba859517d'})
,(sc1)-[:IS_PART_OF_CLUSTER]->(c)
,(sc2:Controller{serialnumber:'7000606111',system_id:'1873778118',hostname:'node02',node_uuid:'b954f0a1-6682-11e0-b8a0-517da924923d'})
,(sc2)-[:IS_PART_OF_CLUSTER]->(c)
,(sc3:Controller{serialnumber:'7000561878',system_id:'1873772083',hostname:'node03',node_uuid:'ac293586-6690-11e0-b8a0-517da924923d'})
,(sc3)-[:IS_PART_OF_CLUSTER]->(c)
,(sc4:Controller{serialnumber:'800000075807',system_id:'1873784143',hostname:'node04',node_uuid:'e8d6c7e5-663e-11e0-b8a0-517da924923d'})
,(sc4)-[:IS_PART_OF_CLUSTER]->(c)
,(sc5:Controller{serialnumber:'7000477261',system_id:'1873745662',hostname:'node05',node_uuid:'1d1ecc64-728c-11e0-bf0e-3d25383f7ed3'})
,(sc5)-[:IS_PART_OF_CLUSTER]->(c)
,(sc6:Controller{serialnumber:'7000477273',system_id:'1873745654',hostname:'node06',node_uuid:'140fb0f9-728c-11e0-afeb-49fcf0b6e6c3'})
,(sc6)-[:IS_PART_OF_CLUSTER]->(c)
,(sc7:Controller{serialnumber:'7000474908',system_id:'1873745665',hostname:'node07',node_uuid:'edbf9c62-728b-11e0-bf0e-3d25383f7ed3'})
,(sc7)-[:IS_PART_OF_CLUSTER]->(c)
,(sc8:Controller{serialnumber:'7000474910',system_id:'1873745695',hostname:'node08',node_uuid:'20dbfe67-7832-11e0-afeb-49fcf0b6e6c3'})
,(sc8)-[:IS_PART_OF_CLUSTER]->(c)
,(sc9:Controller{serialnumber:'7000609864',system_id:'1873802756',hostname:'node09',node_uuid:'a8b75397-6690-11e0-b8a0-517da924923d'})
,(sc9)-[:IS_PART_OF_CLUSTER]->(c)
,(sc10:Controller{serialnumber:'7000610021',system_id:'1873791030',hostname:'node10',node_uuid:'f6cf0705-6670-11e0-b8a0-517da924923d'})
,(sc10)-[:IS_PART_OF_CLUSTER]->(c)
,(sc11:Controller{serialnumber:'7000610057',system_id:'1873784128',hostname:'node11',node_uuid:'551b1bf8-663d-11e0-b8a0-517da924923d'})
,(sc11)-[:IS_PART_OF_CLUSTER]->(c)
,(sc12:Controller{serialnumber:'7000609981',system_id:'1873778164',hostname:'node12',node_uuid:'f6062d6c-663e-11e0-9b53-cd0ece6aa2ce'})
,(sc12)-[:IS_PART_OF_CLUSTER]->(c)
,(sc13:Controller{serialnumber:'7000610033',system_id:'1873778186',hostname:'node13',node_uuid:'ed0e61c5-6670-11e0-b07f-933da0385fdc'})
,(sc13)-[:IS_PART_OF_CLUSTER]->(c)
,(sc14:Controller{serialnumber:'7000610069',system_id:'1873784175',hostname:'node14',node_uuid:'8623ea28-66ae-11e0-ab9d-5fdc7f30dee7'})
,(sc14)-[:IS_PART_OF_CLUSTER]->(c)
,(sc15:Controller{serialnumber:'7000606109',system_id:'1873778197',hostname:'node15',node_uuid:'b4349a83-6682-11e0-ab9d-5fdc7f30dee7'})
,(sc15)-[:IS_PART_OF_CLUSTER]->(c)
,(sc16:Controller{serialnumber:'7000610045',system_id:'1873784157',hostname:'node16',node_uuid:'67d80db8-663d-11e0-9b53-cd0ece6aa2ce'})
,(sc16)-[:IS_PART_OF_CLUSTER]->(c)
,(sc17:Controller{serialnumber:'7001246085',system_id:'2014175904',hostname:'node19',node_uuid:'1c792588-ff4e-11db-94fe-3b91dd7dd242'})
,(sc17)-[:IS_PART_OF_CLUSTER]->(c)
,(sc18:Controller{serialnumber:'7001246097',system_id:'2014176797',hostname:'node20',node_uuid:'3ae9e7ae-ff44-11db-864d-af2af862bba3'})
,(sc18)-[:IS_PART_OF_CLUSTER]->(c)
RETURN c,sc1,sc2,sc3,sc4,sc5,sc6,sc7,sc8,sc9,sc10,sc11,sc12,sc13,sc14,sc15,sc16,sc17,sc18
Returns Unknown Error
Your MATCH query is way to complicated. There's no need to specify every node. The following query returns the cluster and its related controllers:
MATCH (c:Cluster{cluster_name:'mycluster',cluster_uuid:'7bd4f66d-5faf-11db-8d0d-000e0cba569c'})<-[:IS_PART_OF_CLUSTER]-(sc:Controller)
WITH c, collect(sc) as controllers
RETURN c as cluster, controllers

Resources