Inconsistent querying using Cypher, Neo4j

I have come across a very strange problem with my data and cypher.
I run this query
match (sm:Small)-->(a:Activity) return distinct a
and it returns all 7 Activity nodes that exist and are connected to Small nodes (which is all of them).
I run this query
match (a:Activity) return distinct a
and it returns only 1 of the Activity nodes that exist.
If I run (same as first one except no direction)
match (sm:Small)--(a:Activity) return distinct a
I also only get one single Activity node
The one Activity node that always shows up was the first one added to the database. The ones that are missing were added later in a separate query.
If I run a query in the browser that returns all of them and their outgoing relationships to Small nodes, the visualization of them all looks the same and correct.
Is there any way to investigate this further?

Related

How Does Neo4j Browser Also Show Relationships When Return Field Is Just Nodes?

In Neo4j Browser you see all edges between resultant nodes when you only queried for nodes.
For example,
MATCH (n:Greeting) RETURN n
shows nodes with the label 'Greeting' as well as all relationships between them.
This is true for other more complicated queries like
MATCH (n:Greeting {message: 'Hello'}) RETURN n
But when you use REST API you get only the nodes.
Actually this is the correct behavior and is obvious on resultant Table/Text/Code view.
That means Neo4j browser does some additional action when the result view is set to 'Graph.'
How can this be done?
Can we do this by just adding another statement, without modification of the original query?

neo4j CYPHER - Relationship Query doesn't finish

In a 14 GB database I have a few CITES relationships:
MATCH p=()-[r:CITES]->() RETURN count(r)
91
However, when I run
MATCH ()-[r:CITES]-() RETURN count(r)
it loads forever and eventually crashes with a browser window reload (neo4j desktop)
You can see the differences in how each of those queries will execute if you prefix each query with EXPLAIN.
The pattern used for the first query is such that the planner will find that count in the counts store, a transactionally updated store of counts of various things. This is a fast constant time lookup.
The other pattern, when omitting the direction, will not use the count store lookup and will actually have to traverse the graph (starting from every node in the graph), and that will take a long time as your graph grows.
As for what this returns, it should be twice the number of :CITES relationships in your graph. Without a direction on the relationship, each relationship is found twice, because the same path with the start and end nodes switched also fits the pattern.
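The doubling can be illustrated with a small in-memory model. This is only a sketch in Python, not how Neo4j executes the query: the edge list and the two helper functions are made up for illustration, and it assumes the graph has no self-loops.

```python
# Toy in-memory model (not Neo4j): each tuple is one directed CITES relationship.
cites = [("A", "B"), ("A", "C"), ("B", "C")]


def count_directed(rels):
    # Models ()-[r:CITES]->(): each relationship matches once,
    # in its stored direction.
    return sum(1 for _ in rels)


def count_undirected(rels):
    # Models ()-[r:CITES]-(): each relationship matches once per choice of
    # which end is the left-hand node, i.e. as (start, end) and (end, start).
    matches = []
    for start, end in rels:
        matches.append((start, end))
        matches.append((end, start))
    return len(matches)


directed = count_directed(cites)
undirected = count_undirected(cites)
print(directed, undirected)  # prints: 3 6
```

With 91 CITES relationships, the undirected form would therefore be expected to report 182, if it finishes.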
Neo4j always chooses nodes as the starting points for query execution. In your query, the query engine is probably touching the whole graph, since you are not adding any restrictions on node properties, labels, etc.
I think you should specify a label at least in your first node in the pattern.
MATCH (:Article)-[r:CITES]-() RETURN count(r)

Neo4j more specific query slower than more generic one

I'm trying to count all values collected in one subtree of my graph. I thought that the more descriptive path from the root node I provide, the faster the query will run. Unfortunately this isn't true in my case and I can't figure out why.
Original, slow query:
MATCH (s:Sandbox {name: "sandbox"})<--(root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)
PROFILE returns 38397 total db hits in 2203 ms.
However, without matching the top-level node labeled Sandbox, the query is more than 10 times faster:
MATCH (root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)
PROFILE returns 38478 total db hits in 159 ms
To make this clear, in this case the result is the same as I have just one Sandbox.
What is wrong with my first query? How should I model and query a hierarchy like that? I could save the sandbox name as a property on the Metric node, which executes faster, but it seems uglier to me.
Because the 2 queries are not identical.
(For reader visual difference)
MATCH (s:Sandbox {name: "sandbox"})<--(root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)
MATCH (root)-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value) return count(v)
In the second query, Neo4j doesn't care about (root). You never use root, and its existence is already implied by [:has_metric], so Neo4j can skip straight to finding ()-[:has_metric]->(n:Metric)-[:most_recent|:prev*0..]->(v:Value). In the first query, it also has to find the Sandbox nodes, and root has to be connected to one of them, so Neo4j does extra work to verify that. The extra pattern element can also add more rows to the intermediate results, which may add more checks on the rest of the query.
In short, the first query is slower because it does more validation work, and its results are a subset of the second query's.

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every path depth from the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate your own unique IDs, store them in your nodes, and use them for your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this will automatically create an index for them.
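The traversal the question describes, following only the one relationship per node whose GameID matches, can be sketched outside Neo4j as follows. This is a hypothetical in-memory illustration in Python, not the Neo4j API: the `moves` adjacency list and the `follow_game` helper are invented for the example, and it assumes each node has at most one outgoing relationship per GameID.

```python
# Toy adjacency list (hypothetical data, not Neo4j):
# node id -> list of (game_id, next_node) for outgoing "move" relationships.
moves = {
    3: [("3", 10), ("7", 11)],
    10: [("3", 12), ("9", 11)],
    11: [("7", 12)],
    12: [("3", 13)],
    13: [],
}


def follow_game(graph, start, game_id):
    """Return the nodes reached from start by following only game_id moves."""
    path = []
    current = start
    visited = {start}
    while True:
        # Pick the single outgoing relationship with the matching GameID, if any.
        nxt = next(
            (dst for gid, dst in graph.get(current, []) if gid == game_id),
            None,
        )
        # Stop at the end of the game, or on a cycle so the loop terminates.
        if nxt is None or nxt in visited:
            break
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path


positions = follow_game(moves, 3, "3")
print(positions)  # prints: [10, 12, 13]
```

Because only one relationship is followed at each step, the work grows with the length of the game rather than with the number of relationships on each node, which is the effect the index-based query above is trying to achieve.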

Neo4j Spatial- two nodes created for every spatially indexed node

I am using Neo4j 1.8.2 with Neo4j Spatial 0.9 for 1.8.2 (http://m2.neo4j.org/content/repositories/releases/org/neo4j/neo4j-spatial/0.9-neo4j-1.8.2/)
Followed the example code from here http://architects.dzone.com/articles/neo4jcypher-finding-football with one change- instead of SpatialIndexProvider.SIMPLE_WKT_CONFIG, I used SpatialIndexProvider.SIMPLE_POINT_CONFIG_WKT
Everything works fine until you execute the following query:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
RETURN n.name, n.wkt;
n.name is null. When I explored the graph, I found this data:
Node[80]{lon:-2.20024,lat:53.483,id:79,gtype:1,bbox:[-2.20024,53.483,-2.20024,53.483]}
Node[168]{lon:-2.29139,lat:53.4631,id:167,gtype:1,bbox:[-2.29139,53.4631,-2.29139,53.4631]}
For Node 80 returned, it looks like this is the node created for the spatial record, which contains a property id:79. Node 79 is the actual stadium record from the example.
As per the source of IndexProviderTest, the comments
//We not longer need this as the node we get back already a 'Real' node
// Node node = db.getNodeById( (Long) spatialRecord.getProperty( "id" ) );
seem to indicate that this feature isn't available in the version I am using.
My question is, what is the recommended way to use withinDistance with other match conditions? There are a couple of other conditions to be fulfilled but I can't seem to get a handle on the actual node to actually match them.
Should I explicitly create relations? Not use Cypher and use the core API to do a traversal? Split the queries?
Two options:
a) Use GeoPipeline.startNearestNeighborLatLonSearch to get a starting set of nodes, then supply them to a subsequent Cypher query to do the matching/filtering on other properties
b) Since my lat/longs are common across many entities [using centroid of an area], I can create a relation from the spatial node to all entities that are located in that area and then use one Cypher query such as:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
MATCH (n)<-[:LOCATED_IN]-(something)
WHERE something.someProp=5
RETURN something
As advised by Peter, went with option b.
Note, though, that there is no way to get the spatially indexed node back so that you can create relations from it. I had to do a withinDistance query with a distance of 0.0.
Can you execute the enhanced test case I did at https://github.com/neo4j/spatial/blob/2803093d544f56d7dfe8f1d122e049fa73489d8a/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L199 ? It shows how to find a location and traverse with Cypher to the next node.
