Neo4j: Maintaining node counts - neo4j

I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used, in addition, if this is the first time this customer-->card relationship has been seen, update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work, however I think it will re-evaluate the counts every time the relationship is seen, not just on the create. Is it possible to use the ON MATCH and ON CREATE in this statement to limit the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), I also want to return something back that will allow me to kick off a bespoke dijkstra based search of the neighborhood. Any ideas how I'd return some variable based on whether this was a new or existing relationship?

There is no need you to even have the card_ct or card_count properties.
Since neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).

How about something like this. Create a counter on a MATCH and if the counter is greater than zero then it is an existing relationship. Otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship

Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis if only 1 customer is associated with 1 card so I've excluded this as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship

Related

How to create random relationships between nodes?

I'm trying to create random transaction between bank accounts. I have created the following query:
//Create transactions
CALL apoc.periodic.iterate("
match (a:BANK_ACCOUNT)
WITH apoc.coll.randomItem(collect(a)) as sender
return sender", "
MATCH (b:BANK_ACCOUNT)
WHERE NOT sender = b
WITH apoc.coll.randomItem(collect(b)) as receiver
MERGE (sender)-[r:HAS_TRANSFERED {time: datetime()}]->(receiver)
set r.amount = rand()*1000",
{batchSize:100, parallel:false});
I would assume that it would create 100 random transactions between random bank accounts. Instead it creates 1 new bank account and 1 new relationship. What am I doing wrong and what should I do?
Thanks for your help !
The following query uses apoc.coll.randomItems to get 200 different random accounts at one time (which is much faster than getting one random account 200 times):
MATCH (ba:BankAccount)
WITH apoc.coll.randomItems(COLLECT(ba), 200) AS accts
WHERE SIZE(accts) > 1
UNWIND RANGE(0, SIZE(accts)/2*2-1, 2) AS i
WITH accts[i] AS sender, accts[i+1] AS receiver
CREATE (sender)-[:TRANSFERED_TO {time: datetime()}]->(receiver)
Notes:
This query uses CREATE instead of MERGE because it is unlikely that a TRANSFERED_TO relationship already exists with the current time as the time value. (You can choose to use MERGE anyway, if duplication is still possible.)
The WHERE SIZE(accts) > 1 test avoids errors when there are not at least 2 accounts.
SIZE(accts)/2*2-1 calculation prevents the RANGE function from generating a list index (i) that exceeds the last valid index for a sender account.

Neo4J Insertion taking time

I have a query which is taking the long time to insert in neo4j roughly the query looks like following :
create index on :symaccess_symdev(dir_port);
create index on :symaccess_symdev(host_lun);
create index on :symaccess_symdev(ini_tiator_group_name);
create index on :symaccess_symdev(sym_dev);
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (accesssymdev:symaccess_symdev {
sym_dev: symdevs.sym_dev,
host_lun: symdevs.host_lun,
ini_tiator_group_name: symdevs.ini_tiator_group_name,
dir_port: symdevs.dir_port
})
ON CREATE SET
accesssymdev.attr_percentage = symdevs.attr_percentage,
accesssymdev.cap_mb = toFloat(symdevs.cap_mb),
accesssymdev.physicaldevicename = symdevs.physicaldevicename;
Assuming that the sym_dev property value is unique for every symaccess_symdev node, then this query may be faster:
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (a:symaccess_symdev {sym_dev: symdevs.sym_dev})
ON CREATE SET
a.host_lun = symdevs.host_lun,
a.ini_tiator_group_name = symdevs.ini_tiator_group_name,
a.dir_port = symdevs.dir_port,
a.attr_percentage = symdevs.attr_percentage,
a.cap_mb = toFloat(symdevs.cap_mb),
a.physicaldevicename = symdevs.physicaldevicename;
A MERGE will only use at most one index, so your current query will cause the Cypher planner to pick one index (out of the 4 that are applicable). After using that index to generate a set of candidate nodes, it would still need to check the other 3 properties for each candidate node. If it had picked an index that is not very selective (because there tends to be many nodes with the same property value), then a lot of work would need to be done per MERGE.
Assuming that the sym_dev property value is unique, the above query simplifies the MERGE so that it will quickly discover whether the wanted symaccess_symdev node existed, and without needing to check any other properties.

Set degree of Nodes

I have a graph in which I am keeping the degree of a node as a property called "degree" in the node.
What I need is when I create an edge between two nodes, I need to increment the degree of the two nodes.
For creating unique edges I am using "CREATE UNIQUE" for the edges. So if I need to increment the property "degree" of the corresponding nodes, I need to use "ON CREATE" and "ON MATCH" as it is for "MERGE".
But I can't use the ON CREATE and ON MATCH with CREATE UNIQUE. So whats the proper way of using ON CREATE and ON MATCH with CREATE UNIQUE?
This is the way I am trying:
MATCH (n1:PER {Node_Id:"X"}), (n2:PER {Node_Id:"Y"}) WHERE n1.Node_Id<>n2.Node_Id CREATE UNIQUE (n1)-[r:PER_PER {Doc_Id:"st_new", Event_Class:"EC_1", Event_Instance:"EI_1"}]-(n2) ON CREATE SET n1.degree = n1.degree + 1, n2.degree = n2.degree + 1
Not sure why you want to store the degree as a property. Neo4j has a getDegree() function on API level, see http://neo4j.com/docs/stable/javadocs/org/neo4j/graphdb/Node.html#getDegree(). Cypher is not yet using this everywhere possible but for some patterns it already does.
Nevertheless to answer your question: just use MERGE instead of CREATE UNIQUE for eventually establishing the relationship:
MATCH (n1:PER {Node_Id:"X"}), (n2:PER {Node_Id:"Y"})
WHERE n1.Node_Id<>n2.Node_Id
MERGE (n1)-[r:PER_PER {Doc_Id:"st_new", Event_Class:"EC_1", Event_Instance:"EI_1"}]-(n2)
ON CREATE SET n1.degree = n1.degree + 1, n2.degree = n2.degree + 1
If you do have a good reason to store node degrees as a property, please have a look at one of our modules called RelCount, which has been build exactly for what you need to do. It is described in detail in my thesis.
However, as Stefan points out, have a go with getDegree() first and only if that isn't fast enough, or you need to get degree based on some relationship property values as well, use RelCount.

Neo4j: Sum relationship properties where node properties equal Value A and Value B (intersection)

Basically my question is: how do I sum relationship properties where there is a related nodes that have properties equal to Value A and Value B?
For example:
I have a simple DB has the following relationship:
(site)-[:HAS_MEMBER]->(user)-[:POSTED]->(status)-[:TAGGED_WITH]->(tag)
On [:TAGGED_WITH] I have a property called "TimeSpent". I can easily SUM up all the time spent for a particular day and user by using the following query:
MATCH (user)-[:POSTED]->(updates)-[r:TAGGED_WITH]->(tags)
WHERE user.name = "Josh Barker" AND updates.date = 20141120
RETURN tags.name, SUM(r.TimeSpent) as totalTimeSpent;
This returns to me a nice table with tags and associated time spent on each. (i.e. #Meeting 4.5). However, the question arises if I want to do some advanced searches and say "Show me all the meetings for ProjectA" (i.e. #Meeting #ProjectA). Basically, I am looking for a query that I can get all of the relationships where a single status has BOTH tags (and only if it has both). Then I can SUM that number up to get a count for how many meetings I spent in #ProjectA.
How do I do this?
MATCH (updates)-[r:TAGGED_WITH]->(tag1 {name: 'Meeting'}),
(updates)-[r:TAGGED_WITH]->(tag2 {name: 'ProjectA'})
RETURN SUM(r.TimeSpent) as totalTimeSpent, count(updates);
This should find all updates tagged with both of those things, and sum all time spent across all of those updates.
To create a generic solution where you may want one or more tags you could use something like this, passing in the array of tags as a parameter (and using the length of the array instead of the hard coded 2.
MATCH (user)-[:POSTED]->(update)-[r:TAGGED_WITH]->(tag)
WHERE user.name = "Josh Barker" AND updates.date = 20141120 AND tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent
As long as tag.name is indexed, this should be fast.
Edit - Remove User constraint
MATCH (update)-[r:TAGGED_WITH]->(tag)
WHERE tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent

How use Cypher with Merge to create a unique sub graph path

in Neo4j 2.0 M06 I understand that CREATE UNIQUE is depreciated and replaced with MERGE and MATCH instead, but I am finding it hard to see how this can be used to create a unique path.
as an example, I want to create a
MERGE root-[:HAS_CALENDER]->(cal:Calender{name:'Booking'})-[:HAS_YEAR]->(year:Year{value:2013})-[:HAS_MONTH]-(month:Month{value:'January'})-[:HAS_DAY]->(day:Day{value:1})
ON CREATE cal
SET cal.created = timestamp()
ON CREATE year
SET year.created = timestamp()
ON CREATE month
SET month.created = timestamp()
ON CREATE day
SET day.created = timestamp()
intention is that when I try to add a new days to my calender, it should only create the year, and month when it does not exist else just add to the existing path. Now when i run the query, i get an STATEMENT_EXECUTION_ERROR
MERGE only supports single node patterns
should I be executing multiple statements here to achieve this.
So the question is what's the best way in Neo4j to handle cases like this?
Edit
I did change my approach a bit and now even after making multiple calls, I think my merge is happening at a label level and not trying to restrict to the start node I provide as a result I am ending up with nodes that are shared across years and month which is not what I was expecting
I would really appreciate if some one can suggest me how to get a proper graph like below
my c# code is somewhat like this:
var qry = GraphClient.Cypher
.Merge("(cal:CalendarType{ Name: {calName}})")
.OnCreate("cal").Set("cal = {newCal}")
.With("cal")
.Start(new { root = GraphClient.RootNode})
.CreateUnique("(root)-[:HAS_CALENDAR]->(cal)")
.WithParams(new { calName = newCalender.Name, newCal = newCalender })
.Return(cal => cal.Node<CalenderType>());
var calNode = qry.Results.Single();
var newYear = new Year { Name = date.Year.ToString(), Value = date.Year }.RunEntityHousekeeping();
var qryYr = GraphClient.Cypher
.Merge("(year:Year{ Value: {yr}})")
.OnCreate("year").Set("year = {newYear}")
.With("year")
.Start(new { calNode })
.CreateUnique("(calNode)-[:HAS_YEAR]->(year)")
.WithParams(new { yr = newYear.Value, newYear = newYear })
.Return(year => year.Node<Year>());
var yearNode = qryYr.Results.Single();
var newMonth = new Month { Name = date.Month.ToString(), Value = date.Month }.RunEntityHousekeeping();
var qryMonth = GraphClient.Cypher
.Merge("(mon:Month{ Value: {mnVal}})")
.OnCreate("mon").Set("mon = {newMonth}")
.With("mon")
.Start(new { yearNode })
.CreateUnique("(yearNode)-[:HAS_MONTH]->(mon)")
.WithParams(new { mnVal = newMonth.Value, newMonth = newMonth })
.Return(mon => mon.Node<Month>());
var monthNode = qryMonth.Results.Single();
var newDay = new Day { Name = date.Day.ToString(), Value = date.Day, Date = date.Date }.RunEntityHousekeeping();
var qryDay = GraphClient.Cypher
.Merge("(day:Day{ Value: {mnVal}})")
.OnCreate("day").Set("day = {newDay}")
.With("day")
.Start(new { monthNode })
.CreateUnique("(monthNode)-[:HAS_DAY]->(day)")
.WithParams(new { mnVal = newDay.Value, newDay = newDay })
.Return(day => day.Node<Day>());
var dayNode = qryDay.Results.Single();
Regards
Kiran
Nowhere on the documentation page does it say that CREATE UNIQUE has been deprecated.
MERGE is just a new approach that's available to you. It enables some new scenarios (matching based on labels, and ON CREATE and ON MATCH triggers) but also does not cover more complex scenarios (more than a single node).
It sounds like you're already familiar with CREATE UNIQUE. For now, I think you should still be using that.
It seems to me the picture of what you want your graph to look like has the order imposed by relationships, but your code models the order with nodes. If you want that graph, you will need to use relationship types like [2010], [2011] instead of a pattern like [HAS_YEAR]->({value:2010}).
Another way to say the same thing: you are trying to constitute uniqueness for a node intrinsically, by a combination of label and property, e.g. (unique:Day {value:4}). Assuming you have the relevant constraints, this would be database wide uniqueness, so only one fourth-day-of-the-month for all the months to share. What you want is extrinsic local uniqueness, uniqueness established and extended transitively by a hierarchy of relationships. Uniqueness for a node is then not in its internal properties but in its external 'position' or 'order' in relation to its parent. The locally unique pattern (month)-[:locally_unique_rel]->(day) is made unique for a wider scope when the month is made unique, and the month is made unique, not by property and label, but extrinsically by its 'order' or 'position' under its year. Hence the transitivity. I think this is a strength of modeling with graphs, among other things it allows you to continue to partition your structure. If for instance you want to split some of your days into AM and PM or into hours, you can easily do so.
So, in your graph, [HAS_DAY] makes all days equally related to their month, and cannot therefore be used to differentiate between them. You have solved this locally under a month, since the property value differentiates, but since the fourth-day-of-the-month in
(november)-[:HAS_DAY]->(4th)` and `(december)-[:HAS_DAY]->(4th)
are not distinct by property value or label, they are the same node in your graph. Locally, under a month say, unique nodes can be achieved equally with
[11]->()-[4]->(unique1), [11]->()-[5]->(unique2)
and
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique1 {value:4}),
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique2 {value:5})
The difference is that with the former extrinsic local uniqueness, you have the benefit of transitivity. Since the months are unique in a year, as (november) in [11]->(november) is locally unique, therefore the days of November are also unique in that year - the (fourth) node is distinct between
[11]->(november)-[4]->(fourth)
and
[12]-(december)->[4]->(fourth)
What this amounts to is transferring more of your semantic model to your relationships, leaving the nodes for storing data. The node identifiers in the picture you posted are only pedagogical, replacing them with x,y,z or empty parentheses would perhaps better reveal the structure or scaffolding of the graph.
If you want to keep the relationship types intact, adding an ordering property to each relationship to create a pattern like (november)-[:HAS_DAY {order:4}]->(4th) will also work. This may be less performant for querying, but you may have other concerns that make it worth it.
This code allows you to create calendar graphs on demand upon creation of an event for a specific day. You'll want to modify it to allow events on multiple days, but it seems more like your issue is creating unique paths, right? And you'd probably want to modify this to use parameters in your language of choice.
First I create the root:
CREATE (r:Root {id:'root'})
Then use this reusable MERGE query to successively match or create subgraphs for the calendar. I pass along the root so I can display the graph at the end:
MATCH (r:Root)
MERGE r-[:HAS_CAL]->(cal:Calendar {id:'General'})
WITH r,cal MERGE (cal)-[:HAS_YEAR]->(y:Year {id:2011})
WITH r,y MERGE (y)-[:HAS_MONTH]->(m:Month {id:'Jan'})
WITH r,m MERGE (m)-[:HAS_DAY]->(d:Day {id:1})
CREATE d-[:SCHEDULED_EVENT]->(e:Event {id:'ev3', t:timestamp()})
RETURN r-[*1..5]-()
Creates a graph like this when called multiple times:
Does this help?

Resources