I created a spatial index in Neo4j, but when searching for nearby places I only get one result.
My query is:
START n=node:geom('withinDistance:[63.36, 10.35, 50.0]') RETURN n
I have 3 nodes in the spatial index with these coordinates:
Node 1 lat,lon: 63.3654, 10.3578
Node 2 lat,lon: 63.3654, 10.3577
Node 3 lat,lon: 63.3654, 10.3578 (same as node 1)
Theoretically the three nodes are in the same area.
Any idea?
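A quick sanity check that the radius is not the problem. This is a sketch that assumes the third `withinDistance` parameter is a radius in kilometres, and it uses the haversine formula as a stand-in for whatever distance Neo4j Spatial computes internally:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


query = (63.36, 10.35)  # the point used in withinDistance:[63.36, 10.35, 50.0]
nodes = {
    "Node 1": (63.3654, 10.3578),
    "Node 2": (63.3654, 10.3577),
    "Node 3": (63.3654, 10.3578),
}

distances = {name: haversine_km(*query, *coords) for name, coords in nodes.items()}
for name, d in distances.items():
    print(f"{name}: {d:.3f} km")
```

Each node comes out at roughly 0.7 km from the query point, far inside a 50.0 radius, so the single result points at how the nodes were indexed rather than at the distance.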
UPDATE
I performed these steps to use spatial (all executed from the Neo4j Browser against the REST API):
1) Index creation
:POST /db/data/index/node/
{
"name" : "geom",
"config" : {
"provider" : "spatial",
"geometry_type" : "point",
"lat" : "lat",
"lon" : "lon"
}
}
2) Node creation (all nodes are created the same way)
:POST /db/data/node
{
"name":"Franciscatos Pizza",
"lat": 63.3654,
"lon": 10.3578
}
3) Node to spatial index
:POST /db/data/index/node/geom
{
"value":"dummy",
"key":"dummy",
"uri":"http://localhost:7474/db/data/node/8"
}
4) Node to layer
:POST /db/data/ext/SpatialPlugin/graphdb/addNodeToLayer
{
"layer":"geom",
"node":"http://localhost:7474/db/data/node/8"
}
All API responses are OK, and all indexed nodes have the :RTREE_REFERENCE relationship.
Depending on the distance parameter in the query, it returns different nodes, but always only one...
Darios,
First thing, don't do step 3). Steps 3) and 4) are somewhat redundant, but step 3) makes a copy of the geometry information in the node and creates a second node that is stored in the layer. Instead, do this new step 3).
START n = NODE(8)
SET n.id = ID(n)
This Cypher code adds an 'id' property to the node that contains the Neo4j node number. Once you do this, you can use the Cypher spatial index query. Note that the node number in the first line will be different for each node. This 'id' property is self-referential.
Alternatively, do your step 3), but don't do step 4). But then you won't get what you expect if you do a REST geometry query.
See if your results improve.
Grace and peace,
Jim
PS.
Michael,
There are actually two competing approaches in play with spatial right now. If you use addNodeToLayer to add your node to a layer (as in step 4), the node is linked into the RTree graph directly and Cypher queries won't find the node. This is also true if you are using Java. You can query via REST using findGeometriesWithinDistance and findGeometriesInBBox.
If you use the 'add the node to the spatial index' method to add your node to a layer (as in step 3), it doesn't actually add your node to the layer. A new node is made that contains a copy of the geometry properties on the original node and an 'id' property that contains the Neo4j node number of the original node, and this copy node is added to the RTree graph. The 'spatial index' does not actually contain a list of nodes. It is an access point for the spatial extension code. When you do a Cypher spatial query, the spatial extension finds the copy nodes that satisfy the query, then dereferences the 'id' properties on each to build a return list of original nodes.
It's the lack of the 'id' property to dereference that causes Cypher spatial index queries to fail if you add a node to a layer using step 4) alone. By adding the 'id' property, the dereference succeeds, and you get results from your query.
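The mechanism described above can be mimicked with a toy in-memory model. This is plain Python, not Neo4j Spatial code; the method names `add_node_to_layer`, `add_to_spatial_index` and `cypher_spatial_query` are hypothetical, and it assumes (consistent with the "no results" reports in this thread) that an R-tree hit without an 'id' is simply skipped:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    props: dict = field(default_factory=dict)


class ToyGraph:
    """Toy model (not Neo4j Spatial code): the RTree holds geometry nodes,
    and a Cypher spatial index query dereferences each hit's 'id' property
    back to the node it returns."""

    def __init__(self):
        self.nodes = {}
        self.rtree = []  # ids of nodes linked into the RTree graph

    def create_node(self, **props):
        node = Node(len(self.nodes), dict(props))
        self.nodes[node.node_id] = node
        return node

    def add_node_to_layer(self, node):
        # Like step 4): link the node itself into the RTree.
        self.rtree.append(node.node_id)

    def add_to_spatial_index(self, node):
        # Like step 3): create a copy node with the geometry plus an 'id'
        # pointing back to the original, and link the copy instead.
        copy = self.create_node(lat=node.props["lat"], lon=node.props["lon"],
                                id=node.node_id)
        self.rtree.append(copy.node_id)
        return copy

    def cypher_spatial_query(self):
        # Distance filtering is omitted; every RTree hit counts as a match.
        # Hits without an 'id' have nothing to dereference and are skipped.
        results = []
        for hit_id in self.rtree:
            ref = self.nodes[hit_id].props.get("id")
            if ref is not None:
                results.append(self.nodes[ref])
        return results


# Step 4) alone: the node is in the RTree but has no 'id' to dereference.
g = ToyGraph()
pizza = g.create_node(name="Franciscatos Pizza", lat=63.3654, lon=10.3578)
g.add_node_to_layer(pizza)
before = g.cypher_spatial_query()   # []
pizza.props["id"] = pizza.node_id   # like SET n.id = ID(n)
after = g.cypher_spatial_query()    # [pizza]

# Step 3) alone: the query works, but the node count doubles.
g2 = ToyGraph()
original = g2.create_node(name="Franciscatos Pizza", lat=63.3654, lon=10.3578)
g2.add_to_spatial_index(original)
```

The second graph ends up with two nodes for one place, which is the duplication problem discussed next.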
The shapefile importer links nodes directly into the RTree, and if you want to be able to do Cypher spatial index queries, you need to add the 'id' property to each node as I described. The OSM importer builds related 'domain' and geometry nodes, but I don't think it makes them accessible to Cypher-based queries. If you add the 'id' property to each geometry node, then they will be.
I may have missed it, but I haven't seen anyone point out that if you use the 'add the node to the spatial index' method, you double the number of nodes you have, as well as the number of geometry properties stored in your database. Since no relationship is built between the original nodes and the copy nodes, there is no way to access the geometry properties in the copy nodes, so you can't really delete the geometry properties from the original nodes.
As a result, I find it more desirable to add my nodes to the RTree graph directly and make them queryable through the Cypher spatial index by adding self-referential 'id' properties.
As for deleting nodes, there is no REST SpatialPlugin method for removing a node from a layer. If you add the node to the RTree graph using the REST spatial index method, then the REST call
:DELETE /db/data/index/node/geom/{ID}
will remove the node from the RTree, but there is a catch: you must use the Neo4j node number of the copy node for this to work, and there is no straightforward way to obtain it. If you do manage to obtain the node number of the copy node, the call removes it from the RTree, but the copy node itself is not deleted.
Somewhat ironically, if you add the node to the RTree using addNodeToLayer and don't add the 'id' property, the call to remove the node from the index removes the node from the RTree. If you add the self-referential 'id' property and then remove the node from the index, the node is deleted. So every approach is flawed.
I am using Neo4j 2.3 and found that step 3) is unnecessary, but step 4) is not. Also, if you do not copy the node ID into an 'id' property, the Cypher query no longer works (it returns no results).
When I say Stateful Node, I mean a node that carries ‘state info,’ such as the path that leads to this node. E.g. R1 is a node, and
state1: link coming from path 1
state2: link coming from path 2
Is there any way I could create such a node in Neo4j? While traversing such a node, I expect it to behave like this:
if state 1 and input is x, then [:has] node1
if state 1 and input is y, then stop
if state 2 and input is z, then [:has] node2
I want to convert node R1 into a stateful node that keeps the information described above. Does Neo4j support such nodes? If so, could you point me to a resource? Also, does Cypher support this 'stateful' approach, so that I can set the state according to the path by which R1 is reached?
In Neo4j's storage architecture, each relationship record stores pointers to its start and end nodes and is part of a doubly linked list of relationships for each of those nodes.
It sounds like what you're looking to do is create nodes that store that same information for all relationships that touch it, and then have behavior based on how the graph reaches them.
This is more akin to logic control, and Cypher handles that through filters on relationship type, node labels, and properties.
However, you can always set properties of nodes based on queries. For example:
MATCH (:AUTH_T)-[:HAS]->(n:R1)
SET n.reached_by = "HAS"
Then you could use that property later, for example to check whether node n was reached by another route.
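Once a property such as reached_by records how R1 was reached, the behaviour from the question reduces to a lookup table that application code evaluates during traversal. A minimal Python sketch; the reached_by values "path1"/"path2" and the rule that unlisted combinations mean "stop" are hypothetical choices, not anything Neo4j provides:

```python
# Transition rules from the question: (state, input) -> (relationship type, target),
# or None meaning "stop".
TRANSITIONS = {
    ("state1", "x"): ("has", "node1"),
    ("state1", "y"): None,
    ("state2", "z"): ("has", "node2"),
}

# Hypothetical values of a 'reached_by' property written by earlier queries.
STATE_BY_PATH = {"path1": "state1", "path2": "state2"}


def next_step(r1_props, user_input):
    """Decide what to do at R1 given how it was reached and the current input.
    Combinations not listed in TRANSITIONS are treated as "stop"."""
    state = STATE_BY_PATH[r1_props["reached_by"]]
    return TRANSITIONS.get((state, user_input))


print(next_step({"reached_by": "path1"}, "x"))  # ('has', 'node1')
print(next_step({"reached_by": "path1"}, "y"))  # None
print(next_step({"reached_by": "path2"}, "z"))  # ('has', 'node2')
```

Keeping the rules in data rather than in the graph keeps the node itself plain; only the reached_by property has to be maintained by queries.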
Say I have the following path in the graph:
(:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()<-[*]-(centernode)-[*]->()-[:RelType2]->(:Type2)-[:RelType1]->(:Type1)
Given the <id> of the (:Type1) node on the left side, I am able to MATCH the above path and get the corresponding (:Type1) node on the right side (notice that the path is symmetric and its center is node (centernode)). In my use case, we get the <id> of a (:Type1) node, get the corresponding (:Type1) node on the other side, and then process it further.
However, it may happen that I get the <id>s of both (:Type1) nodes. In that case, separate queries will be fired, each starting at one of the nodes and evaluating to the (:Type1) node on the other side, so further execution will continue on both nodes.
Q1. How can I avoid processing both nodes? That is, given the <id>s of two (:Type1) nodes at opposite ends of the same path, how can I ensure that only one of the two queries (each starting at one node and matching the node on the other side) is executed, so that only one node is processed further and the other is held in a temporary buffer to be processed afterwards (if processing of the first node fails)?
Added fact: above I show a single path with two (:Type1) nodes at its ends. There may be three or more paths emanating from (centernode) and ending in a (:Type1) node. I want only one of those (:Type1) nodes to be processed first; the next (:Type1) node should be processed only if the earlier processing fails.
Q2. Is this scenario even possible with pure Cypher, or do I have to use the Neo4j Traversal API? If the latter, how can this be done, given that I have to ensure uniqueness of the nodes/relationships visited across two different traversals?
Q3. How can I add path expander in Traversal API to match path of type (:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-(). Should I be doing something like this:
at each traversal `next()`
if (node is of Type1)
follow <-[:RelType1]-
if (node is of Type2)
follow <-[:RelType2]-
(The above is pseudocode. I am new to the Traversal API. I have gone through all the docs and examples, so I am guessing that inside the expander I have to put if() filters that check the current node's type and decide which relationship type and direction to expand next. The pseudocode above is meant to indicate that.)
Is this how such a Cypher pattern can be written with the Traversal API? Or is there a better way?
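The per-label decision in the pseudocode can be sketched as a rule table consulted at each step. This is plain Python over a hypothetical toy graph, not the Neo4j Traversal API; in Java I would expect this logic to live in a custom PathExpander implementation, but check the API docs for your version:

```python
# Hypothetical toy graph: a label per node and typed, directed edges (start, type, end).
LABELS = {1: "Type1", 2: "Type2", 3: "Other", 6: "Other", 7: "Other"}
EDGES = [
    (2, "RelType1", 1),  # (1:Type1)<-[:RelType1]-(2:Type2)
    (3, "RelType2", 2),  # (2:Type2)<-[:RelType2]-(3)
    (6, "RelType2", 1),  # wrong type for a Type1 node: must not be followed
    (7, "RelType1", 2),  # wrong type for a Type2 node: must not be followed
]

# Expansion rule per label: which relationship type to follow, and in which direction.
RULES = {"Type1": ("RelType1", "INCOMING"), "Type2": ("RelType2", "INCOMING")}


def expand(node):
    """Return the neighbours to visit next, chosen by the current node's label."""
    rule = RULES.get(LABELS[node])
    if rule is None:
        return []  # no rule for this label: the path ends here
    rel_type, direction = rule
    if direction == "INCOMING":
        return [s for (s, t, e) in EDGES if t == rel_type and e == node]
    return [e for (s, t, e) in EDGES if t == rel_type and s == node]


def paths_from(start):
    """Depth-first walk returning every maximal path the rules allow.
    (Assumes the rules cannot loop; a real traversal would also track visited nodes.)"""
    nexts = expand(start)
    if not nexts:
        return [[start]]
    return [[start] + tail for n in nexts for tail in paths_from(n)]


print(paths_from(1))  # [[1, 2, 3]]
```

Putting the rules in a table rather than in nested if() blocks keeps the expander generic: matching a different pattern only means changing RULES.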
An old trick is to use node IDs to order pairs (ID(a) < ID(b)), which filters out "duplicate" results. So if you feed all your source IDs into a single query, you can make use of this trick to filter out duplicates:
WITH [1, 2, 3, 4] AS sourceIds
UNWIND sourceIds AS sourceId
MATCH (source:Type1)
WHERE ID(source) = sourceId
MATCH
(source)<-[:RelType1]-(:Type2)<-[:RelType2]-
()<-[*]-(centernode)-[*]->()
-[:RelType2]->(:Type2)-[:RelType1]->(target:Type1)
WHERE ID(source) < ID(target)
RETURN source, target
Could this work for your use case?
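The effect of the ID ordering can be checked on hypothetical (source, target) rows in plain Python. One caveat worth testing against real data: if only the higher-ID end of a path is among the source IDs, the plain filter drops that path entirely. The variant below also keeps rows whose target is not itself a source; in Cypher that should correspond to something like `WHERE ID(source) < ID(target) OR NOT ID(target) IN sourceIds`, which I have not run:

```python
def dedupe_plain(rows):
    """The answer's filter: keep a (source, target) row only if ID(source) < ID(target)."""
    return [(s, t) for (s, t) in rows if s < t]


def dedupe_keep_lone_sources(source_ids, rows):
    """Variant: also keep rows whose target is not itself one of the sources,
    so a path whose only given end has the higher id is not lost."""
    sources = set(source_ids)
    return [(s, t) for (s, t) in rows if s < t or t not in sources]


# Hypothetical input: ids 1 and 4 are both ends of one path; id 8 is the only
# given end of another path whose far end is node 5.
source_ids = [1, 4, 8]
rows = [(1, 4), (4, 1), (8, 5)]

print(dedupe_plain(rows))                          # [(1, 4)]
print(dedupe_keep_lone_sources(source_ids, rows))  # [(1, 4), (8, 5)]
```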
I'm using Neo4j to store information about maps and sensors. Every time the map or sensor layout changes, I need to keep a copy. I can imagine querying and manually creating that copy, but I'm wondering if it's possible to build a Neo4j query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and timestamp info on a given snapshot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
SET r=r1
WITH r1
DELETE r1
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.
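To see what the Cypher above does to the chain, here is a small in-memory model in Python (not Neo4j code; `Version`, `HeadRef` and `push_version` are hypothetical names). The point it illustrates: the old head link's properties are preserved on the link between the new and the old node (SET r=r1), so relationship state is versioned along with node state:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    props: dict
    rel_to_previous: Optional[dict] = None  # props on (this)-[:SOME_TYPE]->(previous)
    previous: Optional["Version"] = None


@dataclass
class HeadRef:
    rel_props: dict                          # props on (HeadRef)-[:SOME_TYPE]->(head)
    head: Optional[Version] = None


def push_version(ref, node_props, rel_props):
    """Mirror the Cypher above: create a new head node (a2) linked to the old
    head (a1), copy the old head link's properties onto that new link (SET r=r1),
    and replace the old head link with one carrying rel_props (r2, DELETE r1)."""
    new = Version(dict(node_props), rel_to_previous=dict(ref.rel_props),
                  previous=ref.head)
    ref.head = new
    ref.rel_props = dict(rel_props)
    return new


def history(ref):
    """Walk the chain from newest to oldest, returning each version's properties."""
    out, cur = [], ref.head
    while cur is not None:
        out.append(cur.props)
        cur = cur.previous
    return out


ref = HeadRef(rel_props={"x": "abc"}, head=Version({"a": 1}))
push_version(ref, {"a": 123, "b": True}, {"x": "ghi"})
print(history(ref))              # [{'a': 123, 'b': True}, {'a': 1}]
print(ref.head.rel_to_previous)  # {'x': 'abc'}
```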
I have a performance issue with a modifying Cypher query. I have an origin node with a huge number of outgoing relationships to child nodes. These child nodes all have a key property. The goal is to create new nodes between the origin and the child nodes that group all child nodes sharing the same value of the key property. A plot of that idea can be found at the Neo4j console: http://console.neo4j.org/?id=vinntj
I use the query together with spring-data-neo4j 2.2.2.RELEASE and neo4j 1.9.2 embedded. The parameter for that query must be a node id and the result of that query should be the modified root node.
The query currently looks like (a bit more complex than in the linked neo4j console):
START root=node({0})
MATCH (root)-[r:LEAF]->(child)
SET root.__type__='my.GroupedRoot'
DELETE r
WITH child.`custom-GROUP` AS groupingKey, root AS origin, child AS leaf
CREATE UNIQUE (origin)-[:GROUP]->(group{__type__:'my.Group',key:'GROUP',value:groupingKey,origin:ID(origin)})-[:LEAF]->(leaf)
RETURN DISTINCT origin
The property custom-GROUP is the key to group by. In SDN it is represented by a DynamicProperties object. I annotated it to be indexed as well as the groupingKey and origin property of the created group node.
With 5000 child nodes it takes ~50 s to group them, for 10000 nodes ~90 s, for 20000 nodes ~380 s, and for 30000 nodes > 50 min! This looks like clearly superlinear growth to me, but my goal is O(n) scaling, processing 500000+ child nodes in under 30 min. I assume the CREATE UNIQUE part of the query causes the problem: for each new group node it always needs to check which group nodes have already been created, and the amount to check grows with the number of already grouped child nodes.
Does anyone have an idea how to make this query faster, or how to do the same thing faster with another query?
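The suspicion above can be illustrated with a toy cost model in Python. It assumes, as the question does, that each row's uniqueness check scans the groups created so far and then the chosen group's existing LEAF links; I have not verified that Neo4j 1.9 implements CREATE UNIQUE this way. It compares that against grouping with a hash map first:

```python
from collections import defaultdict


def unique_check_cost(keys):
    """Toy model of a per-row uniqueness check: scan the groups created so far
    for a matching key, then scan that group's existing LEAF links to make sure
    the leaf is not already attached. Returns the number of comparisons."""
    group_keys = []                      # groups in creation order
    leaves_per_group = defaultdict(int)
    comparisons = 0
    for key in keys:
        for existing in group_keys:      # look for an existing group
            comparisons += 1
            if existing == key:
                break
        else:
            group_keys.append(key)       # no match: "create" the group
        comparisons += leaves_per_group[key]  # check existing LEAF links
        leaves_per_group[key] += 1
    return comparisons


def single_pass_cost(keys):
    """Group with a hash map first: one lookup per child."""
    groups = defaultdict(list)
    lookups = 0
    for i, key in enumerate(keys):
        groups[key].append(i)
        lookups += 1
    return lookups


small = [i % 10 for i in range(1000)]  # 10 distinct group keys
large = [i % 10 for i in range(2000)]
print(unique_check_cost(large) / unique_check_cost(small))  # roughly 3.8: near quadratic
print(single_pass_cost(large) / single_pass_cost(small))    # 2.0: linear
```

Under this model, doubling the children roughly quadruples the work, which is the kind of growth the timings show.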
If the CREATE UNIQUE is indeed the problem, then this will first create the groups, then map to them.
START root=node(*)
MATCH (root)-[r:LEAF]->(child)
WHERE HAS (root.key) AND root.key='root'
WITH DISTINCT child.key AS groupingKey, root as origin
CREATE UNIQUE (origin)-[:GROUP]->(intermediate { key:groupingKey,origin:ID(origin)})
WITH groupingKey, origin, intermediate
MATCH (origin)-[r:LEAF]->(leaf)
WHERE leaf.key = groupingKey
DELETE r
CREATE (intermediate)-[:LEAF]->(leaf)
RETURN DISTINCT origin
The console is not letting me view the execution plan for either of our queries for some reason so I don't know for sure if it will help.
You might also consider indexing the roots so that you aren't having to do a "WHERE" on all of the nodes. You could just check an index for key=root.
EDIT: An alternative to the above query, which avoids matching the leaf nodes a second time by using COLLECT:
START root=node(*)
MATCH (root)-[r:LEAF]->(child)
WHERE HAS (root.key) AND root.key='root'
DELETE r
WITH DISTINCT child.key AS groupingKey, root as origin, COLLECT(child) as children
CREATE UNIQUE (origin)-[:GROUP]->(intermediate { key:groupingKey,origin:ID(origin)})
WITH groupingKey, origin, intermediate, children
FOREACH(leaf IN children : CREATE (intermediate)-[:LEAF]->(leaf))
RETURN DISTINCT origin
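The COLLECT variant changes the row count: after aggregation there is one row per distinct key, so CREATE UNIQUE runs once per group rather than once per leaf, and each LEAF link is a plain CREATE inside FOREACH. A small Python model of that aggregation, with hypothetical leaf ids and keys:

```python
from collections import defaultdict

# Hypothetical leaves of one root: (leaf id, key property)
leaves = [(10, "a"), (11, "b"), (12, "a"), (13, "c"), (14, "b")]

# Equivalent of: WITH child.key AS groupingKey, root AS origin, COLLECT(child) AS children
rows = defaultdict(list)
for leaf_id, key in leaves:
    rows[key].append(leaf_id)

# One CREATE UNIQUE per row (i.e. per distinct key) instead of one per leaf,
# then one plain CREATE per LEAF link inside FOREACH.
create_unique_calls = len(rows)
leaf_links = [(key, leaf_id) for key, ids in rows.items() for leaf_id in ids]

print(create_unique_calls)  # 3
print(dict(rows))           # {'a': [10, 12], 'b': [11, 14], 'c': [13]}
```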
In the end, I stopped using this kind of Cypher query on such a large amount of data. I implemented the same functionality using the Traversal API to extract the groupable items and Neo4jTemplate to create the new nodes and relationships. Now 50000 items can be grouped in 5474 ms instead of ~1 h with the previous Cypher query, which is a very big improvement.
I am using Neo4j 1.8.2 with Neo4j Spatial 0.9 for 1.8.2 (http://m2.neo4j.org/content/repositories/releases/org/neo4j/neo4j-spatial/0.9-neo4j-1.8.2/)
I followed the example code from http://architects.dzone.com/articles/neo4jcypher-finding-football with one change: instead of SpatialIndexProvider.SIMPLE_WKT_CONFIG, I used SpatialIndexProvider.SIMPLE_POINT_CONFIG_WKT.
Everything works fine until I execute the following query:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
RETURN n.name, n.wkt;
n.name is null. When I explored the graph, I found this data:
Node[80]{lon:-2.20024,lat:53.483,id:79,gtype:1,bbox:[-2.20024,53.483,-2.20024,53.483]}
Node[168]{lon:-2.29139,lat:53.4631,id:167,gtype:1,bbox:[-2.29139,53.4631,-2.29139,53.4631]}
Node 80, which was returned, appears to be the node created for the spatial record; it contains a property id:79. Node 79 is the actual stadium record from the example.
As per the source of IndexProviderTest, the comments
//We not longer need this as the node we get back already a 'Real' node
// Node node = db.getNodeById( (Long) spatialRecord.getProperty( "id" ) );
seem to indicate that this feature isn't available in the version I am using.
My question is, what is the recommended way to use withinDistance with other match conditions? There are a couple of other conditions to be fulfilled but I can't seem to get a handle on the actual node to actually match them.
Should I explicitly create relations? Not use Cypher and use the core API to do a traversal? Split the queries?
Two options:
a) Use GeoPipeline.startNearestNeighborLatLonSearch to get a starting set of nodes, then pass them to a subsequent Cypher query that matches/filters on other properties
b) Since my lat/longs are common across many entities [using centroid of an area], I can create a relation from the spatial node to all entities that are located in that area and then use one Cypher query such as:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
MATCH (n)<-[:LOCATED_IN]-(something)
WHERE something.someProp=5
RETURN something
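What option b computes can be modelled as a simple join in plain Python. The data is hypothetical (the spatial node ids are borrowed from the dump above only for illustration), and the distance step is represented by a precomputed set of hits:

```python
# Hypothetical data. spatial_hits stands for the node ids a withinDistance
# query returned.
spatial_hits = {80, 168}
located_in = [(1, 80), (2, 80), (3, 168), (4, 999)]  # (entity id, spatial node id)
entities = {1: {"someProp": 5}, 2: {"someProp": 7},
            3: {"someProp": 5}, 4: {"someProp": 5}}


def option_b(hits):
    """Model of: MATCH (n)<-[:LOCATED_IN]-(something)
                 WHERE something.someProp=5 RETURN something"""
    return sorted(e for (e, s) in located_in
                  if s in hits and entities[e]["someProp"] == 5)


print(option_b(spatial_hits))  # [1, 3]
```

Entity 2 fails the property filter and entity 4 is linked to a spatial node outside the search area, so only 1 and 3 remain.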
As advised by Peter, I went with option b.
Note, though, that there is no way to get the spatially indexed node back so that you can create relationships from it. I had to do a withinDistance query with a distance of 0.0.
Can you run the enhanced test case I wrote at https://github.com/neo4j/spatial/blob/2803093d544f56d7dfe8f1d122e049fa73489d8a/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L199? It shows how to find a location and traverse with Cypher to the next node.