How use Cypher with Merge to create a unique sub graph path - neo4j

in Neo4j 2.0 M06 I understand that CREATE UNIQUE is depreciated and replaced with MERGE and MATCH instead, but I am finding it hard to see how this can be used to create a unique path.
as an example, I want to create a
MERGE root-[:HAS_CALENDER]->(cal:Calender{name:'Booking'})-[:HAS_YEAR]->(year:Year{value:2013})-[:HAS_MONTH]-(month:Month{value:'January'})-[:HAS_DAY]->(day:Day{value:1})
ON CREATE cal
SET cal.created = timestamp()
ON CREATE year
SET year.created = timestamp()
ON CREATE month
SET month.created = timestamp()
ON CREATE day
SET day.created = timestamp()
intention is that when I try to add a new days to my calender, it should only create the year, and month when it does not exist else just add to the existing path. Now when i run the query, i get an STATEMENT_EXECUTION_ERROR
MERGE only supports single node patterns
should I be executing multiple statements here to achieve this.
So the question is what's the best way in Neo4j to handle cases like this?
Edit
I did change my approach a bit and now even after making multiple calls, I think my merge is happening at a label level and not trying to restrict to the start node I provide as a result I am ending up with nodes that are shared across years and month which is not what I was expecting
I would really appreciate if some one can suggest me how to get a proper graph like below
my c# code is somewhat like this:
var qry = GraphClient.Cypher
.Merge("(cal:CalendarType{ Name: {calName}})")
.OnCreate("cal").Set("cal = {newCal}")
.With("cal")
.Start(new { root = GraphClient.RootNode})
.CreateUnique("(root)-[:HAS_CALENDAR]->(cal)")
.WithParams(new { calName = newCalender.Name, newCal = newCalender })
.Return(cal => cal.Node<CalenderType>());
var calNode = qry.Results.Single();
var newYear = new Year { Name = date.Year.ToString(), Value = date.Year }.RunEntityHousekeeping();
var qryYr = GraphClient.Cypher
.Merge("(year:Year{ Value: {yr}})")
.OnCreate("year").Set("year = {newYear}")
.With("year")
.Start(new { calNode })
.CreateUnique("(calNode)-[:HAS_YEAR]->(year)")
.WithParams(new { yr = newYear.Value, newYear = newYear })
.Return(year => year.Node<Year>());
var yearNode = qryYr.Results.Single();
var newMonth = new Month { Name = date.Month.ToString(), Value = date.Month }.RunEntityHousekeeping();
var qryMonth = GraphClient.Cypher
.Merge("(mon:Month{ Value: {mnVal}})")
.OnCreate("mon").Set("mon = {newMonth}")
.With("mon")
.Start(new { yearNode })
.CreateUnique("(yearNode)-[:HAS_MONTH]->(mon)")
.WithParams(new { mnVal = newMonth.Value, newMonth = newMonth })
.Return(mon => mon.Node<Month>());
var monthNode = qryMonth.Results.Single();
var newDay = new Day { Name = date.Day.ToString(), Value = date.Day, Date = date.Date }.RunEntityHousekeeping();
var qryDay = GraphClient.Cypher
.Merge("(day:Day{ Value: {mnVal}})")
.OnCreate("day").Set("day = {newDay}")
.With("day")
.Start(new { monthNode })
.CreateUnique("(monthNode)-[:HAS_DAY]->(day)")
.WithParams(new { mnVal = newDay.Value, newDay = newDay })
.Return(day => day.Node<Day>());
var dayNode = qryDay.Results.Single();
Regards
Kiran

Nowhere on the documentation page does it say that CREATE UNIQUE has been deprecated.
MERGE is just a new approach that's available to you. It enables some new scenarios (matching based on labels, and ON CREATE and ON MATCH triggers) but also does not cover more complex scenarios (more than a single node).
It sounds like you're already familiar with CREATE UNIQUE. For now, I think you should still be using that.

It seems to me the picture of what you want your graph to look like has the order imposed by relationships, but your code models the order with nodes. If you want that graph, you will need to use relationship types like [2010], [2011] instead of a pattern like [HAS_YEAR]->({value:2010}).
Another way to say the same thing: you are trying to constitute uniqueness for a node intrinsically, by a combination of label and property, e.g. (unique:Day {value:4}). Assuming you have the relevant constraints, this would be database wide uniqueness, so only one fourth-day-of-the-month for all the months to share. What you want is extrinsic local uniqueness, uniqueness established and extended transitively by a hierarchy of relationships. Uniqueness for a node is then not in its internal properties but in its external 'position' or 'order' in relation to its parent. The locally unique pattern (month)-[:locally_unique_rel]->(day) is made unique for a wider scope when the month is made unique, and the month is made unique, not by property and label, but extrinsically by its 'order' or 'position' under its year. Hence the transitivity. I think this is a strength of modeling with graphs, among other things it allows you to continue to partition your structure. If for instance you want to split some of your days into AM and PM or into hours, you can easily do so.
So, in your graph, [HAS_DAY] makes all days equally related to their month, and cannot therefore be used to differentiate between them. You have solved this locally under a month, since the property value differentiates, but since the fourth-day-of-the-month in
(november)-[:HAS_DAY]->(4th)` and `(december)-[:HAS_DAY]->(4th)
are not distinct by property value or label, they are the same node in your graph. Locally, under a month say, unique nodes can be achieved equally with
[11]->()-[4]->(unique1), [11]->()-[5]->(unique2)
and
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique1 {value:4}),
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique2 {value:5})
The difference is that with the former extrinsic local uniqueness, you have the benefit of transitivity. Since the months are unique in a year, as (november) in [11]->(november) is locally unique, therefore the days of November are also unique in that year - the (fourth) node is distinct between
[11]->(november)-[4]->(fourth)
and
[12]-(december)->[4]->(fourth)
What this amounts to is transferring more of your semantic model to your relationships, leaving the nodes for storing data. The node identifiers in the picture you posted are only pedagogical, replacing them with x,y,z or empty parentheses would perhaps better reveal the structure or scaffolding of the graph.
If you want to keep the relationship types intact, adding an ordering property to each relationship to create a pattern like (november)-[:HAS_DAY {order:4}]->(4th) will also work. This may be less performant for querying, but you may have other concerns that make it worth it.

This code allows you to create calendar graphs on demand upon creation of an event for a specific day. You'll want to modify it to allow events on multiple days, but it seems more like your issue is creating unique paths, right? And you'd probably want to modify this to use parameters in your language of choice.
First I create the root:
CREATE (r:Root {id:'root'})
Then use this reusable MERGE query to successively match or create subgraphs for the calendar. I pass along the root so I can display the graph at the end:
MATCH (r:Root)
MERGE r-[:HAS_CAL]->(cal:Calendar {id:'General'})
WITH r,cal MERGE (cal)-[:HAS_YEAR]->(y:Year {id:2011})
WITH r,y MERGE (y)-[:HAS_MONTH]->(m:Month {id:'Jan'})
WITH r,m MERGE (m)-[:HAS_DAY]->(d:Day {id:1})
CREATE d-[:SCHEDULED_EVENT]->(e:Event {id:'ev3', t:timestamp()})
RETURN r-[*1..5]-()
Creates a graph like this when called multiple times:
Does this help?

Related

Neo4J Insertion taking time

I have a query which is taking the long time to insert in neo4j roughly the query looks like following :
create index on :symaccess_symdev(dir_port);
create index on :symaccess_symdev(host_lun);
create index on :symaccess_symdev(ini_tiator_group_name);
create index on :symaccess_symdev(sym_dev);
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (accesssymdev:symaccess_symdev {
sym_dev: symdevs.sym_dev,
host_lun: symdevs.host_lun,
ini_tiator_group_name: symdevs.ini_tiator_group_name,
dir_port: symdevs.dir_port
})
ON CREATE SET
accesssymdev.attr_percentage = symdevs.attr_percentage,
accesssymdev.cap_mb = toFloat(symdevs.cap_mb),
accesssymdev.physicaldevicename = symdevs.physicaldevicename;
Assuming that the sym_dev property value is unique for every symaccess_symdev node, then this query may be faster:
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (a:symaccess_symdev {sym_dev: symdevs.sym_dev})
ON CREATE SET
a.host_lun = symdevs.host_lun,
a.ini_tiator_group_name = symdevs.ini_tiator_group_name,
a.dir_port = symdevs.dir_port,
a.attr_percentage = symdevs.attr_percentage,
a.cap_mb = toFloat(symdevs.cap_mb),
a.physicaldevicename = symdevs.physicaldevicename;
A MERGE will only use at most one index, so your current query will cause the Cypher planner to pick one index (out of the 4 that are applicable). After using that index to generate a set of candidate nodes, it would still need to check the other 3 properties for each candidate node. If it had picked an index that is not very selective (because there tends to be many nodes with the same property value), then a lot of work would need to be done per MERGE.
Assuming that the sym_dev property value is unique, the above query simplifies the MERGE so that it will quickly discover whether the wanted symaccess_symdev node existed, and without needing to check any other properties.

Neo4J Match & Set Multiple Relationships/Nodes in One Query

I'm currently running the following query to update the properties on two nodes and relationships.
I'd like to be able to update 1,000 nodes and the corresponding relationships in one query.
MATCH (p1:Person)-[r1:OWNS_CAR]->(c1:Car) WHERE id(r1) = 3018
MATCH (p2:Person)-[r2:OWNS_CAR]->(c2:Car) WHERE id(r2) = 3019
SET c1.serial_number = 'SERIAL027436', c1.signature = 'SIGNATURE728934',
r1.serial_number = 'SERIAL78765', r1.signature = 'SIGNATURE749532',
c2.serial_number = 'SERIAL027436', c2.signature = 'SIGNATURE728934',
r2.serial_number = 'SERIAL78765', r2.signature = 'SIGNATURE749532'
This query has issues when you run it in larger quantities. Is there a better way?
Thank you.
You could work with a LOAD CSV. Your input would contain the keys (not the ids, using the ids is not recommended) for Person and Car and whatever properties you need to set. For example
personId, carId, serial_number, signature
00001, 00045, SERIAL78765, SIGNATURE728934
00002, 00046, SERIAL78665, SIGNATURE724934
Your query would then be something like :
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
MATCH (p:Person {personId: row.PersonId})-[r:OWNS_CAR]->(c:Car {carId: row.carId})
SET r.serial_number = row.serialnumber, c.signature = row.signature
Note that you should have unique constraints on Person and Car to make that work. You can do thousands (even millions) like that very quickly ...
Hope this helps,
Tom

Neo4j: Maintaining node counts

I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used, in addition, if this is the first time this customer-->card relationship has been seen, update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work, however I think it will re-evaluate the counts every time the relationship is seen, not just on the create. Is it possible to use the ON MATCH and ON CREATE in this statement to limit the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), I also want to return something back that will allow me to kick off a bespoke dijkstra based search of the neighborhood. Any ideas how I'd return some variable based on whether this was a new or existing relationship?
There is no need you to even have the card_ct or card_count properties.
Since neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).
How about something like this. Create a counter on a MATCH and if the counter is greater than zero then it is an existing relationship. Otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship
Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis if only 1 customer is associated with 1 card so I've excluded this as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship

Merging a set of nodes onto one (only in query)

I am currently investigating how to model a bitemporal graph in neo4j. Unfortunately noone seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store in a new node only those values that have changed and then express a query that would merge all those values ordered by a given timestamp:
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can get a history for a given point in time so far, e.g. like
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH {date: x.date, data:d} as data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say only in query, because I found a refactor function in apoc which does merge nodes, but that procedure actually merges and persists the node, while I would just want to query it).
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
{name: COALESCE(y.name, s.name),
wage: COALESCE(y.wage, s.wage)})
AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH(data IN datas|
SET t += data)
DELETE t
RETURN t
This approach does create a node, but if you DELETE it right before you RETURN it, it won't persist at all. += ensures that pre-existing properties aren't removed, only overwritten if the data node has existing values.

Neo4j: Sum relationship properties where node properties equal Value A and Value B (intersection)

Basically my question is: how do I sum relationship properties where there is a related nodes that have properties equal to Value A and Value B?
For example:
I have a simple DB has the following relationship:
(site)-[:HAS_MEMBER]->(user)-[:POSTED]->(status)-[:TAGGED_WITH]->(tag)
On [:TAGGED_WITH] I have a property called "TimeSpent". I can easily SUM up all the time spent for a particular day and user by using the following query:
MATCH (user)-[:POSTED]->(updates)-[r:TAGGED_WITH]->(tags)
WHERE user.name = "Josh Barker" AND updates.date = 20141120
RETURN tags.name, SUM(r.TimeSpent) as totalTimeSpent;
This returns to me a nice table with tags and associated time spent on each. (i.e. #Meeting 4.5). However, the question arises if I want to do some advanced searches and say "Show me all the meetings for ProjectA" (i.e. #Meeting #ProjectA). Basically, I am looking for a query that I can get all of the relationships where a single status has BOTH tags (and only if it has both). Then I can SUM that number up to get a count for how many meetings I spent in #ProjectA.
How do I do this?
MATCH (updates)-[r:TAGGED_WITH]->(tag1 {name: 'Meeting'}),
(updates)-[r:TAGGED_WITH]->(tag2 {name: 'ProjectA'})
RETURN SUM(r.TimeSpent) as totalTimeSpent, count(updates);
This should find all updates tagged with both of those things, and sum all time spent across all of those updates.
To create a generic solution where you may want one or more tags you could use something like this, passing in the array of tags as a parameter (and using the length of the array instead of the hard coded 2.
MATCH (user)-[:POSTED]->(update)-[r:TAGGED_WITH]->(tag)
WHERE user.name = "Josh Barker" AND updates.date = 20141120 AND tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent
As long as tag.name is indexed, this should be fast.
Edit - Remove User constraint
MATCH (update)-[r:TAGGED_WITH]->(tag)
WHERE tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent

Resources