simple cypher query unreasonably slow - what am I doing wrong? - neo4j

I'm trying to get all the relationships connected to a given node that also have a property called 'name'. This is my Cypher query:
MATCH (starting { number:'123' })<-[r]-() WHERE HAS(r.name) RETURN r
This is unimaginably slow! It takes Neo4j ages to compute, even though there are only a few return values and not many relationships connected to the node (1 to 10 at most).
Am I doing something wrong here?
Other Cypher queries work fine.
Thanks!

The number of relationships on that one node may matter less if you have not told Neo4j enough about your graph structure.
First use labels, and second use indexes. The statement below creates an index on the number property of nodes labeled YourLabel.
CREATE INDEX ON :YourLabel(number)
Then hit the index to start the query, and use a type on your relationship too.
MATCH (:YourLabel{number:'123'})<-[r:RELATIONSHIP_TYPE]-()
WHERE HAS (r.name)
RETURN r
Now, instead of scanning every node for a number property with the value '123', the query does a single index lookup.
To use the label, create your nodes like this (they will be added to the index automatically):
CREATE (s1:YourLabel{number:"1"})

Neo4j and Cypher - How can I create/merge chained sequential node relationships (and even better time-series)?

To keep things simple, as part of the ETL on my time-series data, I added a sequence number property to each row corresponding to 0..370365 (370,366 nodes, 5,555,490 properties - not that big). I later added a second property; the two are named "outeseq" (the original) and "ineseq" (the second), to see whether basing the relationship on a plain equality might speed things up a bit.
I can get both of the following queries to run properly on up to ~30k nodes (LIMIT 30000), but past that it's just an endless wait. My JVM has 16 GB max (if it can even use it on a Windows box):
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.outeseq-1
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
or
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.ineseq
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
I also added these in hopes of speeding things up:
CREATE CONSTRAINT ON (a:BOOK)
ASSERT a.outeseq IS UNIQUE
CREATE CONSTRAINT ON (b:BOOK)
ASSERT b.ineseq IS UNIQUE
I can't get the relationships created for the entire data set! Help!
Alternatively, I can also get bits of the relationships built with parameters, but haven't figured out how to parameterize the sequence over all of the node-to-node sequential relationships, at least not in a semantically general enough way to do this.
I profiled the query, but didn't see any reason for it to "blow up".
Another question: I would like each relationship to have a property representing the difference in the timestamps of its two nodes (delta-t). Is there a way to take the difference between the values in two sequential nodes and assign it to the relationship, for all of the relationships at the same time?
The last question, if you have the time: I'd really like to use the raw data and just chain the directed relationships from one node's timestamp to the next nearest node with the minimum delta, but I didn't go straight at this for fear that it would cause a scan of all the nodes to build each relationship.
Before anyone suggests that I look to KDB or other db's for time series, let me say I have a very specific reason to want to use a DAG representation.
It seems like this should be so easy...it probably is and I'm blind. Thanks!
Creating Relationships
Since your queries work on 30k nodes, I'd suggest running them page by page over all the nodes. That seems feasible because outeseq and ineseq are unique and numeric, so you can sort the nodes by those properties and run the query against one slice at a time.
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq = b.outeseq-1
WITH a, b ORDER BY a.outeseq SKIP {offset} LIMIT 30000
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
You will need to run the query about 13 times, changing {offset} each time, to cover all the data. It would be convenient to write a script for this in any language that has a Neo4j client.
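As a rough illustration of where "about 13" comes from, the sketch below computes the {offset} values such a script would iterate over. The function name page_offsets is hypothetical, and the row count assumes that 370,366 consecutively numbered nodes give 370,365 (a, b) pairs; it does not talk to Neo4j at all.

```python
import math


def page_offsets(total_rows, page_size):
    """Return the SKIP offsets needed to cover total_rows in slices of page_size."""
    pages = math.ceil(total_rows / page_size)
    return [page * page_size for page in range(pages)]


# Assumption: nodes numbered 0..370365 produce 370,365 consecutive pairs.
offsets = page_offsets(370_365, 30_000)
print(len(offsets))             # 13
print(offsets[0], offsets[-1])  # 0 360000
```

Each value would be passed as the {offset} parameter of the paged query, one run per value.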
Updating Relationship's Properties
You can assign the timestamp delta to the relationships using a SET clause after the MATCH. Assuming that the timestamp is a long:
MATCH (a:BOOK)-[s:FORWARD_SEQ]->(b:BOOK)
SET s.delta = abs(b.timestamp - a.timestamp);
Chaining Nodes With Minimal Delta
Once the relationships carry the delta property, the graph becomes a weighted graph, so we can use a weighted shortest-path approach over the deltas. Then we just save the length of the shortest path (the sum of deltas) on a relationship between the first and the last node.
MATCH p=(a:BOOK)-[:FORWARD_SEQ*1..]->(b:BOOK)
WITH p AS shortestPath, a, b,
reduce(weight=0, r in relationships(p) | weight+r.delta) AS totalDelta
ORDER BY totalDelta ASC
LIMIT 1
MERGE (a)-[nearest:NEAREST {delta: totalDelta}]->(b)
RETURN nearest;
Disclaimer: the queries above are not guaranteed to work as written; they only hint at possible approaches to the problem.
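For the raw-data variant of the question, a different technique from the shortest-path query above may be worth considering: sort the nodes by timestamp once and link each node to its successor, which is by construction the nearest later node. The sketch below shows that idea on an in-memory dict; the function name chain_by_timestamp and the sample ids are hypothetical, and writing the resulting links back to Neo4j is left out.

```python
def chain_by_timestamp(timestamps):
    """Link each node to the next one in time.

    timestamps: dict mapping node id -> numeric timestamp (hypothetical input).
    Returns a list of (from_id, to_id, delta) triples in time order.
    """
    ordered = sorted(timestamps.items(), key=lambda item: item[1])
    return [
        (a_id, b_id, b_ts - a_ts)
        for (a_id, a_ts), (b_id, b_ts) in zip(ordered, ordered[1:])
    ]


links = chain_by_timestamp({"n1": 100, "n3": 175, "n2": 130})
print(links)  # [('n1', 'n2', 30), ('n2', 'n3', 45)]
```

One sort replaces a per-node search for the nearest neighbour, which is the full scan the question was worried about.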

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind: they are different positions in a game. Each of the relationships has a property "GameID", which is used to identify the right relationship when you traverse the graph along a path. So if you start traversing the graph at a node and follow the relationships with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of incoming and outgoing relationships; others have only a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relative to the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.
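To make the intended traversal concrete, here is a minimal in-memory sketch of "follow only the relationship with the right GameID at every hop". It is not Cypher and not a Neo4j API; the function name follow_game and the dict layout are assumptions made for illustration.

```python
def follow_game(moves, start, game_id):
    """Walk from start, taking at each position the one edge tagged game_id.

    moves: dict mapping position -> list of (game_id, next_position) pairs,
    a hypothetical stand-in for the outgoing 'move' relationships.
    """
    path = [start]
    visited = {start}
    current = start
    while True:
        # Per the question, at most one outgoing edge carries this GameID.
        nxt = next(
            (target for gid, target in moves.get(current, []) if gid == game_id),
            None,
        )
        # Stop at the end of the game, or if the data loops unexpectedly.
        if nxt is None or nxt in visited:
            break
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path


moves = {
    3: [("3", 10), ("7", 11)],
    10: [("3", 12), ("9", 13)],
    12: [("5", 14)],
}
print(follow_game(moves, 3, "3"))  # [3, 10, 12]
```

The work done is proportional to the length of the one game path plus the relationships inspected on its nodes, rather than to every path of that depth.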

Neo4j: Java API to compute intersection multiple properties

I'm very new to Neo4j and have a question regarding the computation of intersections of nodes.
Let's suppose, I have the three properties A,B,C and I want to select only the nodes that have all three properties.
I created an index for the properties and thus, I can get all nodes having one of the properties. However, afterwards I have to merge the IndexHits. Is there a way to select directly all nodes having the three properties?
My second idea was to create a node for each property and connect other nodes by relationships. I can then iterate over all relationships and get for each property a list of nodes which are connected. But again, I have to compute the intersection afterwards.
Is there a function I'm missing here? I suppose it's a standard problem.
Thanks a lot,
Benny
Do you also have the values you are looking for? You would start with the property that narrows down the matching nodes the most.
MATCH (a:Label {property1:{value1}})
WHERE a.property2 = {value2} AND a.property3 = {value3}
RETURN a
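For the Java API and Lucene indexes (note the explicit AND: Lucene's default operator between terms is OR, which would return the union rather than the intersection):
gdb.index().forNodes("foo").query("p1:value1 AND p2:value2 AND p3:value3")
See the Lucene query syntax documentation for details.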
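If the hits do have to be merged by hand, the "start with the most selective property" advice can be sketched as follows. This is plain Python over sets of node ids, not the Neo4j Java API; nodes_with_all and the sample data are hypothetical.

```python
def nodes_with_all(index_hits):
    """Intersect per-property hit sets, smallest set first.

    index_hits: dict mapping property name -> set of node ids that have it
    (a hypothetical stand-in for the per-property IndexHits).
    """
    if not index_hits:
        return set()
    ordered = sorted(index_hits.values(), key=len)
    result = set(ordered[0])
    for hits in ordered[1:]:
        result &= hits
        if not result:
            break  # nothing can survive further intersections
    return result


hits = {"A": {1, 2, 3, 4}, "B": {2, 3}, "C": {3, 4, 5}}
print(nodes_with_all(hits))  # {3}
```

Starting from the smallest set keeps the intermediate result small, which is the same reasoning behind putting the most selective property in the MATCH pattern.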

Find all subgraphs containing at least one node having a certain property

My graph is composed of multiple subgraphs that are disconnected from one another. Each subgraph is composed of nodes that are connected by a given relationship type.
I would like to get (for example) the list of subgraphs that contain at least one node whose "name" property equals "John".
It's equivalent to finding one node per subgraph having this property.
One solution would be to find all the nodes having this property and loop through this list to only pick the ones that are not connected to the previously picked ones. But that would be ugly and quite heavy. Is there an elegant way to do that with Cypher?
I'm trying something along these lines, but have had no success so far:
START source=node:user('name:"John"')
MATCH source-[r?:KNOWS*]-target
WHERE r is null
RETURN source
Try this one; it may help:
START source=node:user('name:"John"')
MATCH source-[r:KNOWS]-()-[r2:KNOWS]-target
WHERE NOT(source-[r:KNOWS]-target)
RETURN target
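For comparison, the "pick one match per subgraph" loop described in the question can be done in a single pass outside Cypher by marking each component as soon as one match is found in it. The sketch below works on an in-memory edge list; one_match_per_subgraph and the sample ids are hypothetical, and loading the KNOWS pairs from Neo4j is not shown.

```python
from collections import deque


def one_match_per_subgraph(edges, matches):
    """Return one matching node per connected component.

    edges: iterable of (u, v) pairs, treated as undirected KNOWS links.
    matches: node ids whose name is "John", in any order.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    seen = set()
    picked = []
    for node in matches:
        if node in seen:
            continue  # its component already has a representative
        picked.append(node)
        seen.add(node)
        queue = deque([node])
        while queue:  # mark the whole component as covered
            current = queue.popleft()
            for other in neighbours.get(current, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return picked


print(one_match_per_subgraph([(1, 2), (2, 3), (4, 5)], [1, 3, 5, 6]))  # [1, 5, 6]
```

Each node is visited at most once, so this avoids re-checking connectivity against every previously picked node.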

How to get count for all nodes/edges downstream of some node in Neo4J

I'm wondering whether there is a way in Cypher to get a count of all nodes downstream of some node x.
For my particular use-case I have a number of graphs, which are separate entities, but stored in the same instance. I would like to find out, for each graph, what the node and relationship count is.
I already have this for relationships:
start r=rel(*) return count(r)
and this for nodes:
start n=node(*) return count(n)
for everything in the database.
Many thanks,
Eamonn
If you have some "reference" or root node per subgraph you can use path expressions to find all nodes:
start root=node:roots(id="xx")
match root-[*..5]->end
return count(distinct end)
It makes sense to limit the depth of your search.
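As an illustration of what the depth-limited count computes, here is a small in-memory equivalent using a breadth-first walk over outgoing links. The name count_downstream and the adjacency dict are assumptions for the example; unlike the Cypher pattern, this sketch never counts the root itself, even if a cycle leads back to it.

```python
from collections import deque


def count_downstream(outgoing, root, max_depth=5):
    """Count distinct nodes reachable from root within max_depth outgoing hops.

    outgoing: dict mapping node -> list of target nodes (hypothetical input).
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for target in outgoing.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return len(seen) - 1  # exclude the root


outgoing = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
print(count_downstream(outgoing, "r", max_depth=2))  # 3
print(count_downstream(outgoing, "r"))               # 4
```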
You must index all the properties on your nodes/relationships. Then you start at these indexes to get the counts and, if necessary, sum them together for each graph.
Let's assume we have 2 graphs, a book-author type and a car-color type. Then, to get the overall sum of nodes for each graph in Cypher:
start g1=node:node_auto_index('bookName:*'), g11=node:node_auto_index('authorName:*'),
g2=node:node_auto_index('carName:*'), g22=node:node_auto_index('carColor:*')
return count(distinct g1)+count(distinct g11) as graph1, count(distinct g2)+count(distinct g22) as graph2
Similarly for all relationships. I don't know of any Cypher solution that could simply group by an undefined property; that would solve the problem easily.
