I have a Neo4j query that searches across multiple entities, and I would like to pass parameters in batch using the nodes object. However, the query does not execute very quickly. How can I optimize this query and improve its performance?
WITH $nodes as nodes
UNWIND nodes AS node
with node.id AS id, node.lon AS lon, node.lat AS lat
MATCH
(m:Member)-[mtg_r:MT_TO_MEMBER]->(mt:MemberTopics)-[mtt_r:MT_TO_TOPIC]->(t:Topic),
(t1:Topic)-[tt_r:GT_TO_TOPIC]->(gt:GroupTopics)-[tg_r:GT_TO_GROUP]->(g:Group)-[h_r:HAS]->
(e:Event)-[a_r:AT]->(v:Venue)
WHERE mt.topic_id = gt.topic_id AND
distance(point({ longitude: lon, latitude: lat}),point({ longitude: v.lon, latitude: v.lat })) < 4000 AND
mt.member_id = id
RETURN
distinct id as member_id,
lat as member_lat,
lon as member_lon,
g.group_name as group_name,
e.event_name as event_name,
v.venue_name as venue_name,
v.lat as venue_lat,
v.lon as venue_lon,
distance(point({ longitude: lon, latitude: lat }), point({ longitude: v.lon, latitude: v.lat })) as distance
Query profiling looks like this:
So, your current plan has 3 parallel threads. One we can ignore for now because it has 0 DB hits.
The biggest hit you are taking is the match for (mt:MemberTopics) ... WHERE mt.member_id = id. I'm guessing member_id is a unique id, so you will want to create an index on it: CREATE INDEX ON :MemberTopics(member_id) (in Neo4j 4.x and later the syntax is CREATE INDEX FOR (mt:MemberTopics) ON (mt.member_id)). That will allow Cypher to do an index lookup instead of a node scan, which will reduce the DB hits from ~30 million to ~1. (Also, in some cases in-lining property matches is faster for more complex queries, so (mt:MemberTopics {member_id: id}) is better. It makes explicit that this condition must always be true while matching, and reinforces the use of the index lookup.)
The second biggest hit is the point-distance check. Right now it is being done independently, because the node scan takes so long. Once you make the changes for MemberTopics, the planner should switch to finding all connected Venues first and then only doing the distance check on those, so that should become cheaper as well.
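To make the planner argument concrete, here is a small Python analogy (not Neo4j code) of the two plans over in-memory data. All names and data (member_topics, topic_venues, the sample coordinates) are hypothetical. A full scan checks every MemberTopics row against the member id, while a dict keyed on member_id plays the role of the index; distance is then only computed for venues reachable from that member's topics. The haversine formula with a mean Earth radius is an approximation of a geodesic distance in meters, and Neo4j's own distance() may differ slightly.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical data: (member_id, topic_id) rows, and topic -> venues reachable via groups/events.
member_topics = [(m, t) for m in range(1000) for t in (m % 7, m % 11)]
topic_venues = {
    t: [(f"venue-{t}-{i}", 52.52 + 0.01 * i, 13.40 + 0.01 * t) for i in range(3)]
    for t in range(11)
}

def scan_plan(member_id):
    """Full scan: every row is compared with member_id (like a label scan)."""
    checks, topics = 0, []
    for m, t in member_topics:
        checks += 1
        if m == member_id:
            topics.append(t)
    return topics, checks

# Build the "index" once: member_id -> topic ids.
index = defaultdict(list)
for m, t in member_topics:
    index[m].append(t)

def indexed_plan(member_id):
    """Index lookup: a single dict access instead of a scan."""
    return index.get(member_id, []), 1

def nearby_venues(member_id, lat, lon, max_m, plan):
    topics, checks = plan(member_id)
    hits, distance_calls = [], 0
    for t in topics:
        for name, vlat, vlon in topic_venues.get(t, []):
            distance_calls += 1  # only venues connected to the member are measured
            d = haversine_m(lat, lon, vlat, vlon)
            if d < max_m:
                hits.append((name, round(d)))
    return hits, checks, distance_calls
```

Both plans return the same venues; the difference is 2000 row checks for the scan versus one lookup for the index, and in both cases the distance is only evaluated for the handful of connected venues.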
Also, it looks like mt and gt are linked by a topic, and you are using a topic id to align them. If t and t1 are supposed to be the same Topic node, you could just use t for both to enforce that, and then you don't need the id check to link mt and gt. If t and t1 are not the same node, then the use of a foreign key in your node's properties is a sign that you should have a relationship between the two nodes and just travel along that edge. (Relationships can have properties too, but the context looks a lot like t and t1 are supposed to be the same node. You could also enforce this by saying WHERE t = t1, but at that point you should just use t for both nodes.)
Lastly, depending on the number of rows your query returns, you may want to use LIMIT and SKIP to page your results. This looks like information going to a user, and I doubt they need the full dump, so only return the top results and only process the rest if the user wants to see more (useful as results approach a metric ton). Since you only have 21 results so far, this won't be an issue right now, but keep it in mind as you scale to 100,000+ results.
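SKIP/LIMIT paging only gives consistent pages when rows come back in a defined order, so in Cypher it is normally paired with ORDER BY (for example on distance). The Python sketch below mimics ORDER BY ... SKIP ... LIMIT on an in-memory list; the row shape and page size are hypothetical.

```python
def page(results, page_number, page_size, key):
    """Return one page of rows, like ORDER BY <key> SKIP n LIMIT m."""
    ordered = sorted(results, key=key)  # a fixed order makes pages repeatable
    skip = page_number * page_size
    return ordered[skip:skip + page_size]

rows = [{"venue": f"v{i}", "distance": d} for i, d in enumerate([300, 100, 200, 500, 400])]
first_page = page(rows, 0, 2, key=lambda r: r["distance"])
```

Here first_page holds the two nearest venues; asking for page 2 returns the single remaining row, and any page past the end is simply empty.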
I want to replace the value of the 'Amount' key in a map (literal) with the sum of the existing 'Amount' value plus the new 'Amount' value where both the 'type' and 'Price' match. The structure I have so far is:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,
{type:2, Order:{Price:11,Amount:50}} as NewOrder
(I'm trying to get it to:)
RETURN [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:250},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as CombinedOrders
If no existing entry matches NewOrder.type and NewOrder.Price, then it should insert the new record rather than add the amounts together.
Sorry, this is possibly really straightforward, but I'm not very good at this yet.
Thanks
Edit:
I should add that I have been able to get this working for a simpler map structure:
WITH [{type:1, Amount:100},{type:2, Amount:200},{type:3, Amount:300}] as ExistingOrders,
{type:2, Amount:50} as NewValue
RETURN reduce(map = [p IN ExistingOrders WHERE NOT p.type = NewValue.type],
       x IN [([p2 IN ExistingOrders WHERE p2.type = NewValue.type])[0]] |
       CASE WHEN x IS NULL THEN NewValue
            ELSE {type: x.type, Amount: x.Amount + NewValue.Amount}
       END + map) as CombinedOrders
But I'm struggling, I think because of the Orders array in my first example.
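For reference, the logic of the query on the flat structure can be mirrored in plain Python to check the expected output (a sketch for reasoning about the query, not something Neo4j runs). functools.reduce stands in for Cypher's reduce(), and an explicit None stands in for the null that indexing an empty list returns. The merged entry ends up at the front, because the query prepends it with CASE ... END + map.

```python
from functools import reduce

def combine_flat(existing, new_value):
    # Orders whose type does not match are kept unchanged.
    others = [p for p in existing if p["type"] != new_value["type"]]
    # First matching order, or None (like list[0] on an empty list in Cypher).
    matches = [p for p in existing if p["type"] == new_value["type"]]
    first = matches[0] if matches else None

    def step(acc, x):
        # No match: insert the new value; otherwise add the amounts together.
        if x is None:
            merged = new_value
        else:
            merged = {"type": x["type"], "Amount": x["Amount"] + new_value["Amount"]}
        return [merged] + acc  # prepend, like CASE ... END + map

    return reduce(step, [first], others)

existing = [{"type": 1, "Amount": 100}, {"type": 2, "Amount": 200}, {"type": 3, "Amount": 300}]
combined = combine_flat(existing, {"type": 2, "Amount": 50})
```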
I believe you are just trying to update the value of the appropriate Amount in ExistingOrders.
The following query is legal Cypher, and should normally work:
WITH ExistingOrders, NewOrder, [x IN ExistingOrders WHERE x.type = NewOrder.type | x.Orders] AS eo
FOREACH (y IN eo |
SET y.Amount = y.Amount + CASE WHEN y.Price = NewOrder.Order.Price THEN NewOrder.Order.Amount ELSE 0 END
)
However, the above query produces a (somewhat) funny ThisShouldNotHappenError with the message:
Developer: Stefan claims that: This should be a node or a relationship
What the message is trying to say (in an obtuse fashion) is that you are not using the Neo4j DB in the right way. Your properties are way too complicated, and should be separated out into nodes and relationships.
So, I will propose a data model that does just that. Here is how you can create nodes and relationships that represent the same data as ExistingOrders:
CREATE (t1:Type {id:1}), (t2:Type {id:2}), (t3:Type {id:3}),
(t1)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t1)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t1)-[:HAS_ORDER]->(:Order {Price:12,Amount:300}),
(t2)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t2)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t2)-[:HAS_ORDER]->(:Order {Price:12,Amount:300}),
(t3)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t3)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t3)-[:HAS_ORDER]->(:Order {Price:12,Amount:300});
And here is a query that will update the correct Amount:
WITH {type:2, Order:{Price:11,Amount:50}} as NewOrder
MATCH (t:Type)-[:HAS_ORDER]->(o:Order)
WHERE t.id = NewOrder.type AND o.Price = NewOrder.Order.Price
SET o.Amount = o.Amount + NewOrder.Order.Amount
RETURN t.id, o.Price, o.Amount;
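To see why the node-and-relationship model makes the update easy, here is an in-memory Python analogue (the data layout is illustrative only): each HAS_ORDER relationship becomes a (type id, order) pair, and MATCH ... WHERE ... SET becomes a filter plus an in-place update. Like the Cypher query, it only updates existing orders: if nothing matches, no rows come back and nothing changes.

```python
def build_graph():
    """One (type_id, order) pair per HAS_ORDER relationship."""
    return [(t, {"Price": price, "Amount": amount})
            for t in (1, 2, 3)
            for price, amount in ((10, 100), (11, 200), (12, 300))]

def apply_new_order(has_order, new_order):
    """Mirror MATCH (t)-[:HAS_ORDER]->(o) WHERE ... SET o.Amount = o.Amount + ..."""
    rows = []
    for type_id, order in has_order:
        if type_id == new_order["type"] and order["Price"] == new_order["Order"]["Price"]:
            order["Amount"] += new_order["Order"]["Amount"]  # SET on the matched node
            rows.append((type_id, order["Price"], order["Amount"]))  # RETURN t.id, o.Price, o.Amount
    return rows

graph = build_graph()
result = apply_new_order(graph, {"type": 2, "Order": {"Price": 11, "Amount": 50}})
```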
There are two parts to your question: one with a simple answer, and a second part that doesn't make sense. Let me take the simple one first!
As far as I can tell, you're asking how to concatenate a new map onto a collection of maps, that is, how to add a new item to an array. Just use +, as in this simple example:
return [{item:1}, {item:2}] + [{item:3}];
Note that the single item we're adding at the end isn't a map, but a collection with only one item.
So for your query:
RETURN [
{type:1, Orders:[{Price:10,Amount:100},
{Price:11,Amount:200},
{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},
{Price:11,Amount:250},
{Price:12,Amount:300}]}]
+
[{type:3, Orders:[{Price:10,Amount:100},
{Price:11,Amount:200},{Price:12,Amount:300}]}]
as CombinedOrders
Should do the trick.
Or you could maybe do it a bit cleaner, like this:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,
{type:2, Order:{Price:11,Amount:50}} as NewOrder
RETURN ExistingOrders + [NewOrder];
OK, now for the part that doesn't make sense. In your example, it looks like you want to modify a map inside the collection. But you have two {type:2} maps in there, and you're looking to merge them into a single {type:2} map in the output you're asking for. If you need to deconflict map entries and change what a map entry ought to be, Cypher might not be your best choice for that kind of query.
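The point about duplicates is easy to see in a Python analogue of the same data (a sketch, not Cypher): concatenating with + only appends and never merges, so the result has four entries, two of them with type 2, and a separate merge step is still needed.

```python
existing_orders = [
    {"type": t, "Orders": [{"Price": 10, "Amount": 100},
                           {"Price": 11, "Amount": 200},
                           {"Price": 12, "Amount": 300}]}
    for t in (1, 2, 3)
]
new_order = {"type": 2, "Order": {"Price": 11, "Amount": 50}}

# Equivalent of RETURN ExistingOrders + [NewOrder]
combined = existing_orders + [new_order]
type_2_entries = [o for o in combined if o["type"] == 2]
print(len(combined), len(type_2_entries))  # prints: 4 2
```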
I figured it out:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
      {type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
      {type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,
     {type:2, Orders:[{Price:11,Amount:50}]} as NewOrder
RETURN
reduce(map = [p IN ExistingOrders WHERE NOT p.type = NewOrder.type],
  x IN [([p2 IN ExistingOrders WHERE p2.type = NewOrder.type])[0]] |
  CASE
    WHEN x IS NULL THEN NewOrder
    ELSE {type: x.type, Orders:
      reduce(map2 = [p3 IN x.Orders WHERE NOT p3.Price = (NewOrder.Orders[0]).Price],
        x2 IN [([p4 IN x.Orders WHERE p4.Price = (NewOrder.Orders[0]).Price])[0]] |
        CASE
          WHEN x2 IS NULL THEN NewOrder.Orders[0]
          ELSE {Price: x2.Price, Amount: x2.Amount + (NewOrder.Orders[0]).Amount}
        END + map2)}
  END + map) as CombinedOrders
...using nested reduce functions.
To start with, it combines the orders whose type doesn't match with the order (there is only one) whose type does match. For that matching order, the nested reduce does the same thing on Price: it combines the non-matching Prices with the matching Price, adding the Amounts in the latter case. If nothing matches at either level, the new entry is inserted instead.
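As a cross-check of the nested reduce, here is the same merge written as ordinary Python functions (a sketch, not Cypher). Because each reduce runs over a one-element list, it boils down to "prepend either the new entry or the summed entry to the non-matching entries", which the helper upsert does at both levels. The NewOrder shape ({type, Orders: [...]} with a single order) follows the query above; the helper names are illustrative.

```python
def upsert(items, key, new_item, merge):
    """Keep items whose key differs; prepend either new_item or the merged match."""
    others = [i for i in items if i[key] != new_item[key]]
    match = next((i for i in items if i[key] == new_item[key]), None)
    return [new_item if match is None else merge(match)] + others

def combine_orders(existing_orders, new_order):
    new_line = new_order["Orders"][0]

    def merge_type(existing_type):
        # Same idea one level down: keyed on Price, summing Amount.
        orders = upsert(existing_type["Orders"], "Price", new_line,
                        lambda o: {"Price": o["Price"], "Amount": o["Amount"] + new_line["Amount"]})
        return {"type": existing_type["type"], "Orders": orders}

    return upsert(existing_orders, "type", new_order, merge_type)

existing = [{"type": t, "Orders": [{"Price": 10, "Amount": 100},
                                   {"Price": 11, "Amount": 200},
                                   {"Price": 12, "Amount": 300}]}
            for t in (1, 2, 3)]
result = combine_orders(existing, {"type": 2, "Orders": [{"Price": 11, "Amount": 50}]})
```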
In my graph I have data like the following.
Here a, b, c, d are nodes and r1, r2, r3, r4 are relationships.
a-r1->b
b-r2->a
b-r2->c
c-r1->b
d-r3->a
a-r1->d, and so on.
I am using the following Cypher to get paths with a max depth of 3.
MATCH p=(n)-[r*1..3]-(m) WHERE n.id=1 and m.id=2 RETURN p
Here p is the path, and I want to display it in text format like this.
Example: suppose the path length is 3:
a-r1->b-r2->c in text format.
Is this possible?
Sort of. I'll give you most of the answer, but I myself can't complete the answer. Maybe another cypher wizard will come along and improve on the answer, but here's what I've got for you.
match p=(n)-[r*1..3]-(m)
WHERE id(n)=1 AND id(m)=2
WITH extract(node in nodes(p) | coalesce(node.label, "")) as nodeLabels,
extract(rel in relationships(p) | type(rel)) as relationshipLabels
WITH reduce(nodePath="", nodeLabel in nodeLabels | nodePath + nodeLabel + "-") as nodePath,
reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + "-") as relPath
RETURN nodePath, relPath
LIMIT 1;
EDIT: one small note. In your question you specify the WHERE criteria n.id=1 and m.id=2. Note that this is probably not what you want (unless your nodes carry their own id property). Node IDs are usually checked with WHERE id(n)=1 AND id(m)=2. The id isn't technically a node property, so I changed that.
OK, so we're going to match the path. Then we use the extract function to pull out the label property from the nodes and create a collection called nodeLabels. We do the same for the relationship types. What reduce does here is accumulate the individual strings in each of those collections into a single string. So if your nodes are a, b, and c, you'd get a nodePath string that looks like a-b-c-. Similarly, your relationship string would look like r1-r2-.
Now, I know you want those interleaved, and you'd prefer output like a-r1-b-r2-c. Here's the problem I see with that...
Normally, the way I'd approach that is to use FOREACH to iterate over the node label collection. Since a path always has one fewer relationship than nodes, ideally (in pseudocode) I'd want to do something like this:
buffer = ""
foreach idx in range(0, length(relationshipLabels) - 1) |
    buffer = buffer + nodeLabels[idx] + "-" + relationshipLabels[idx] + "->"
buffer = buffer + nodeLabels[length(nodeLabels) - 1]
This would be a way of reducing to the string that you want. You can't use the reduce function here, because it doesn't give you a way of knowing which index you're at in the collection. That means you can iterate over one of the collections, but not over the other at the same time. This FOREACH pseudocode will not work either, because I believe the body of FOREACH has to be a mutating operation on the graph; you can't just use it to accumulate a string like I did here, or like the extract function does.
So as far as I can tell, you might kinda be stuck here. Hopefully someone will prove me wrong on this - I am not 100% sure.
Finally, another way to go after this: if there were a path function that extracted node/relationship pairs, rather than just nodes() or relationships() individually as I used them above, then you could use that function to iterate over one collection, rather than juggling two collections as my code above attempts and fails to do. Sadly, I don't think there's any such path function, so that's just more reason why I think you might be up a creek.
Now, practically speaking, you could always execute this query from Java or some other language, return the path, and then use the full power of that language to build up this string. But pure Cypher? I'm doubtful.
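For the client-side route, the interleaving is straightforward once you walk an index range rather than one list: a path with n relationships has n + 1 nodes, so node i and relationship i sit side by side, and the last node is appended at the end. The Python sketch below assumes you have already pulled the node names and relationship types out of the returned path; the function name is illustrative. It always writes "->", so for an undirected match like -[r*1..3]- the arrows show path order, not the stored relationship direction. (The same index-range idea might also be expressible in Cypher by reducing over range(0, length(p) - 1), but treat that as something to test.)

```python
def path_to_text(node_names, rel_types):
    """Interleave nodes and relationships, e.g. a-r1->b-r2->c."""
    if len(node_names) != len(rel_types) + 1:
        raise ValueError("a path has exactly one more node than relationships")
    parts = []
    for i, rel in enumerate(rel_types):
        parts.append(f"{node_names[i]}-{rel}->")  # node i followed by relationship i
    parts.append(node_names[-1])  # the end node has no outgoing relationship in the path
    return "".join(parts)

text = path_to_text(["a", "b", "c"], ["r1", "r2"])
```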
Here is what I ended up doing. I hope somebody else finds it useful in the future.
MATCH p=(n)-[r*1..3]->(m)
WHERE n.id=1 AND m.id=4
WITH extract(rel in relationships(p) | STARTNODE(rel).name + '->' + type(rel)) as relationshipLabels, m.name as endnodename
WITH reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + '->') as relPath, endnodename
RETURN distinct relPath + endnodename
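The trick here is that each relationship contributes its own start node, so there is no need to line up two lists; the directed pattern -[r*1..3]-> matters because it guarantees each relationship's start node is the previous node on the path. Below is a Python analogue, assuming each relationship is available as a (start node name, type) pair in path order (names are illustrative). Note the output format is a->r1->b->r2->c, with arrows on both sides of the type, slightly different from the a-r1->b format in the question.

```python
from functools import reduce

def rel_path_text(rels, end_node_name):
    """rels: (start_node_name, rel_type) pairs in path order."""
    labels = [start + "->" + rel_type for start, rel_type in rels]  # the extract(...) step
    rel_path = reduce(lambda acc, label: acc + label + "->", labels, "")  # the reduce(...) step
    return rel_path + end_node_name  # RETURN relPath + endnodename

# Path a-[:r1]->b-[:r2]->c from the question's graph
text = rel_path_text([("a", "r1"), ("b", "r2")], "c")
```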
I am using the spatial server plugin for Neo4j 2.0 and have managed to add Users and Cities with their geo properties lat/lon to a spatial index "geom". Unfortunately I cannot get the syntax right to get them back via Neo4jClient :( What I want is basically:
1. Translate the Cypher query START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n; to Neo4jClient syntax so I can get all the users within a given distance from a specified point.
2. Even more helpful would be if it were possible to return the nodes with their respective distance to the point.
3. Is there any way to get the nearest user or city from a given point without specifying a distance?
UPDATE
After some trial and error I have solved question 1 and the problem communicating with Neo4j Spatial through Neo4jClient. The Neo4jClient query below returns 1 user, but only the nearest one, even though the database contains 2 users who should be returned. I have also tried plain Cypher through the web interface without any luck. Have I completely misunderstood what withinDistance is supposed to do? :) Is there really no one who can give a little insight into questions 2 and 3 above? It would be very much appreciated!
var queryString = string.Format("withinDistance:[" + latitude + ", " + longitude + ", " + distance + "]");
var graphResults = graphClient.Cypher
.Start(new { user = Node.ByIndexQuery("geom", queryString) })
.Return((user) => new
{
EntityList = user.CollectAsDistinct<UserEntity>()
}).Results;
The client won't let you do this using the fluent syntax; the closest you could get would be something like:
var geoQuery = client.Cypher
.Start( new{n = Node.ByIndexLookup("geom", "withindistance", "[60.0,15.0, 100.0]")})
.Return(n => n.As<????>());
but that generates Cypher like:
START n=node:`geom`(withindistance = [60.0,15.0, 100.0]) RETURN n
which wouldn't work. Unfortunately that means you have two options:
1. Get the code and create a pull request adding this in.
2. Go dirty and use the IRawGraphClient interface. Now this is VERY frowned upon, and I wouldn't normally suggest it, but I don't see you having much choice if you want to use the client as-is. To do this you need to do something like this (sorry Tatham):
((IRawGraphClient)client).ExecuteGetCypherResults<Node<string>>(new CypherQuery("START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n", null, CypherResultMode.Projection));
I don't know the spatial system, so you'll have to wait for someone who does to get back to you on the other questions. I also have no idea what is returned (hence the Node<string> return type), but if you get that worked out, you should change that to a proper POCO.
After some trial and error and help from the experts in the Neo4j google group all my problems are now solved :)
1. Neo4jClient can be used to query withinDistance as below. Unfortunately withinDistance couldn't handle attaching parameters in the normal way, so you will probably want to validate your latitude, longitude and distance before using them. Also, those values have to be doubles for the query to work.
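// Requires: using System.Globalization; the invariant culture keeps '.' as the decimal separator
var queryString = string.Format(CultureInfo.InvariantCulture, "withinDistance:[{0}, {1}, {2}]", latitude, longitude, distance);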
var graphResults = graphClient.Cypher
.Start(new { city = Node.ByIndexQuery("geom", queryString) })
.Where("city:City")
.Return((city) => new
{
Entity = city.As<CityEntity>()
})
.Limit(1)
.Results;
2. Cypher cannot be used to return the distance; you have to calculate it yourself. In principle you should be able to use REST http://localhost:7474/db/data/index/node/geom?query=withinDistance:[60.0,15.0,100.0]&ordering=score to get the score (distance), but I didn't get that working and I want to use Cypher.
3. No, there isn't, but limiting the result to 1 as in the query above works fine.
A last note on this subject: you should not add your nodes to the spatial layer, just to the spatial index. I had a lot of problems and strange exceptions before figuring this one out.
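Since the distance has to be computed client-side, here is a minimal Python sketch of doing that for the nodes a withinDistance query returns, and of answering question 3 the same way by taking the minimum. It uses the haversine formula with a mean Earth radius of 6371 km, which is an approximation and not necessarily what Neo4j Spatial uses internally. The node dicts with lat/lon keys are hypothetical, modelled on the lat/lon properties mentioned in the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def with_distances(nodes, lat, lon):
    """Attach a distance to each node and sort nearest first."""
    return sorted(((n, haversine_km(lat, lon, n["lat"], n["lon"])) for n in nodes),
                  key=lambda pair: pair[1])

def nearest(nodes, lat, lon):
    """Nearest node without any distance cutoff (None if there are no nodes)."""
    return min(nodes, key=lambda n: haversine_km(lat, lon, n["lat"], n["lon"]), default=None)

cities = [{"name": "A", "lat": 60.0, "lon": 15.0},
          {"name": "B", "lat": 61.0, "lon": 15.0},
          {"name": "C", "lat": 60.0, "lon": 17.0}]
ranked = with_distances(cities, 60.1, 15.0)
```

Computing the nearest node this way scans every candidate, which is fine for small result sets; for large data sets the spatial index is what keeps the candidate list short.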