Neo4J Match & Set Multiple Relationships/Nodes in One Query - neo4j

I'm currently running the following query to update the properties on two nodes and relationships.
I'd like to be able to update 1,000 nodes and the corresponding relationships in one query.
MATCH (p1:Person)-[r1:OWNS_CAR]->(c1:Car) WHERE id(r1) = 3018
MATCH (p2:Person)-[r2:OWNS_CAR]->(c2:Car) WHERE id(r2) = 3019
SET c1.serial_number = 'SERIAL027436', c1.signature = 'SIGNATURE728934',
r1.serial_number = 'SERIAL78765', r1.signature = 'SIGNATURE749532',
c2.serial_number = 'SERIAL027436', c2.signature = 'SIGNATURE728934',
r2.serial_number = 'SERIAL78765', r2.signature = 'SIGNATURE749532'
This query has issues when you run it in larger quantities. Is there a better way?
Thank you.

You could use LOAD CSV. Your input file would contain the keys for Person and Car (not the internal ids; relying on those is not recommended) plus whatever properties you need to set. For example:
personId,carId,serial_number,signature
00001,00045,SERIAL78765,SIGNATURE728934
00002,00046,SERIAL78665,SIGNATURE724934
Your query would then be something like:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
MATCH (p:Person {personId: row.personId})-[r:OWNS_CAR]->(c:Car {carId: row.carId})
SET r.serial_number = row.serial_number, c.signature = row.signature
Note that you should have unique constraints on Person and Car (on the key properties used in the MATCH) to make that work. You can do thousands (even millions) of updates like that very quickly.
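If the data already lives in your application rather than in a file, the same idea can be sketched with a parameterised UNWIND instead of LOAD CSV. The snippet below is only a sketch under assumptions: the Cypher text is modelled on the query above, `$rows` is the parameter syntax of Neo4j 3.0 and later, and `chunked` and `UPDATE_QUERY` are hypothetical names. It only prepares the batches; how you send them depends on your driver.

```python
# Cypher text is an assumption modelled on the LOAD CSV query above;
# $rows is a query parameter supplied by whichever driver you use.
UPDATE_QUERY = """
UNWIND $rows AS row
MATCH (p:Person {personId: row.personId})-[r:OWNS_CAR]->(c:Car {carId: row.carId})
SET r.serial_number = row.serial_number, c.signature = row.signature
"""


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [
    {"personId": "00001", "carId": "00045",
     "serial_number": "SERIAL78765", "signature": "SIGNATURE728934"},
    {"personId": "00002", "carId": "00046",
     "serial_number": "SERIAL78665", "signature": "SIGNATURE724934"},
]

# One parameter dict per query execution, 1,000 rows at a time.
batches = [{"rows": batch} for batch in chunked(rows, 1000)]
```

Each dict in `batches` would be passed as the parameters of one run of `UPDATE_QUERY`, so 1,000 relationships are updated per round trip.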
Hope this helps,
Tom

Related

Neo4J Insertion taking time

I have a query that is taking a long time to insert into Neo4j. Roughly, the query looks like the following:
create index on :symaccess_symdev(dir_port);
create index on :symaccess_symdev(host_lun);
create index on :symaccess_symdev(ini_tiator_group_name);
create index on :symaccess_symdev(sym_dev);
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (accesssymdev:symaccess_symdev {
sym_dev: symdevs.sym_dev,
host_lun: symdevs.host_lun,
ini_tiator_group_name: symdevs.ini_tiator_group_name,
dir_port: symdevs.dir_port
})
ON CREATE SET
accesssymdev.attr_percentage = symdevs.attr_percentage,
accesssymdev.cap_mb = toFloat(symdevs.cap_mb),
accesssymdev.physicaldevicename = symdevs.physicaldevicename;
Assuming that the sym_dev property value is unique for every symaccess_symdev node, then this query may be faster:
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (a:symaccess_symdev {sym_dev: symdevs.sym_dev})
ON CREATE SET
a.host_lun = symdevs.host_lun,
a.ini_tiator_group_name = symdevs.ini_tiator_group_name,
a.dir_port = symdevs.dir_port,
a.attr_percentage = symdevs.attr_percentage,
a.cap_mb = toFloat(symdevs.cap_mb),
a.physicaldevicename = symdevs.physicaldevicename;
A MERGE will use at most one index, so your current query will cause the Cypher planner to pick one index (out of the 4 that are applicable). After using that index to generate a set of candidate nodes, it would still need to check the other 3 properties for each candidate node. If it had picked an index that is not very selective (because there tend to be many nodes with the same property value), then a lot of work would need to be done per MERGE.
Assuming that the sym_dev property value is unique, the above query simplifies the MERGE so that it will quickly discover whether the wanted symaccess_symdev node existed, and without needing to check any other properties.
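To make the selectivity argument concrete, here is a toy Python model. It is not Neo4j's planner: the 1,000-node data set and the `dir_port` values are made up for illustration, and `build_index` and `candidates_checked` are hypothetical helpers. It counts how many candidate nodes a single index lookup hands back, each of which would still need its remaining properties compared.

```python
from collections import defaultdict


def build_index(nodes, prop):
    """Map each value of `prop` to the list of nodes carrying it."""
    index = defaultdict(list)
    for node in nodes:
        index[node[prop]].append(node)
    return index


def candidates_checked(index, wanted, prop):
    """Number of candidate nodes returned by an index lookup on `prop`."""
    return len(index.get(wanted[prop], []))


# Made-up data: sym_dev is unique, dir_port has only two distinct values.
nodes = [
    {"sym_dev": "%04d" % i, "dir_port": "FA-1A" if i % 2 else "FA-2B"}
    for i in range(1000)
]
wanted = nodes[10]

by_port = candidates_checked(build_index(nodes, "dir_port"), wanted, "dir_port")
by_dev = candidates_checked(build_index(nodes, "sym_dev"), wanted, "sym_dev")
```

In this toy data set the non-selective property leaves half of all nodes as candidates, while the unique property leaves exactly one.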

Neo4j: Maintaining node counts

I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used, in addition, if this is the first time this customer-->card relationship has been seen, update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work, however I think it will re-evaluate the counts every time the relationship is seen, not just on the create. Is it possible to use the ON MATCH and ON CREATE in this statement to limit the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), I also want to return something back that will allow me to kick off a bespoke dijkstra based search of the neighborhood. Any ideas how I'd return some variable based on whether this was a new or existing relationship?
There is no need for you to even have the card_ct or card_count properties.
Since neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).
How about something like this. Create a counter on a MATCH and if the counter is greater than zero then it is an existing relationship. Otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship
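The counter idea can be reasoned about with a small in-memory model. This is only a sketch of the MERGE / ON MATCH behaviour in plain Python: `merge_has_card` is a hypothetical helper, a dict stands in for the graph, the second timestamp is invented for the example, and nothing here talks to Neo4j.

```python
def merge_has_card(rels, customer_id, card_hash, when):
    """Mimic MERGE (c)-[r:has_card]->(a) with an ON MATCH counter.

    Returns True when the relationship was created by this call.
    """
    key = (customer_id, card_hash)
    rel = rels.get(key)
    if rel is None:
        rel = rels[key] = {}                  # MERGE created the relationship
    else:
        rel["num"] = rel.get("num", 0) + 1    # ON MATCH SET r.num = coalesce(r.num, 0) + 1
    rel["last_transaction"] = when            # unconditional SET
    return rel.get("num", 0) == 0             # CASE WHEN r.num > 0 THEN false ELSE true


rels = {}
first = merge_has_card(rels, "12345678", "45uIic...", "30-NOV-2016 07:58:42")
second = merge_has_card(rels, "12345678", "45uIic...", "30-NOV-2016 08:01:10")
```

The first call reports a new relationship and every later call for the same customer and card reports an existing one, which is the flag the Python caller would use to decide whether to start the neighbourhood search.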
Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis if only 1 customer is associated with 1 card so I've excluded this as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship

How to correctly use conditionals like IF or CASE in Cypher query language (Neo4J) to successfully create relationships?

I failed to create relationships in Neo4J and I would like to encourage anyone who has successfully done it to help me.
The desired result is to have a detailed visualisation of who is a brother to whom, who is whose mother and so on. I want to extract the data from single parent-child relationships. That means setting a relationship like [:relatedTo {:how['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
It is a file created with Excel, where I put the information into a table to create a database. It was then converted to a .csv file, which can be imported into Neo4J. I have successfully installed Neo4J and am now at the point of writing the Cypher script to load it. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the CASE command? Neo4J does not allow me to use anything but the CREATE command (it expects the letter R after C, not an A).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.
Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END
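The same gender-to-role mapping can be checked outside the database before loading. The sketch below rests on assumptions: the header row is inferred from the field names in the LOAD CSV query, the Parent1/Parent2 columns are taken to hold personID values (as the sample rows suggest) rather than names, and `parent_links` is a hypothetical helper. Only three of the sample rows are included.

```python
import csv
import io

# Assumed header; the field names come from the LOAD CSV query above.
SAMPLE = """personID;name;G;Year;Parent1;Parent2
1;Jakub Hančin;M;1994;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
"""


def parent_links(text):
    """Return (parent name, 'mother' or 'father', child name) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    people = {row["personID"]: row for row in reader}
    links = []
    for child in people.values():
        for field in ("Parent1", "Parent2"):
            parent = people.get(child[field])
            if parent is None:
                continue  # column empty, or parent not present in the file
            # Same rule as the CASE expression: 'F' -> mother, otherwise father.
            how = "mother" if parent["G"] == "F" else "father"
            links.append((parent["name"], how, child["name"]))
    return links


links = parent_links(SAMPLE)
```

Each tuple corresponds to one (b)-[:isRelatedTo {how: ...}]->(n) relationship that the Cypher would create.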

connecting the nodes with relationships in py2neo

I have the following python code to make a graph in neo4j. I am using py2neo version 2.0.3.
import json
from py2neo import neo4j, Node, Relationship, Graph
graph = neo4j.Graph("http://localhost:7474/db/data/")
with open("example.json") as f:
    for line in f:
        while True:
            try:
                file = json.loads(line)
                break
            except ValueError:
                # Not yet a complete JSON value
                line += next(f)
        # Now creating the node and relationships
        news, = graph.create(Node("Mainstream_News", id=unicode(file["_id"]), entry_url=unicode(file["entry_url"]),
                                  title=unicode(file["title"])))  # Comma unpacks length-1 tuple.
        authors, = graph.create(
            Node("Authors", auth_name=unicode(file["auth_name"]), auth_url=unicode(file["auth_url"]),
                 auth_eml=unicode(file["auth_eml"])))
        graph.create(Relationship(news, "hasAuthor", authors))
I can create a graph with Mainstream_News and Authors nodes joined by a 'hasAuthor' relationship. My problem is that each Mainstream_News node ends up with its own Authors node, whereas in reality one author has more than one Mainstream_News item. I would like to use the auth_name property of the Authors nodes as an index to connect them with the Mainstream_News nodes. Any suggestions would be great.
You are creating a new Authors node each time through your loop, even if an Author node (with the same properties) already exists.
First, I think you should create uniqueness constraints on Authors(auth_name) and Mainstream_News(id), to enforce what seem to be your requirements. This only needs to be done once. A uniqueness constraint also creates an index for you automatically, which is a bonus.
graph.schema.create_uniqueness_constraint("Authors", "auth_name")
graph.schema.create_uniqueness_constraint("Mainstream_News", "id")
But you will probably have to empty out your DB first (at least of all Authors and Mainstream_News nodes and their relationships), since I presume it currently has a lot of duplicate nodes.
Then, you can use the merge_one and create_unique APIs to prevent duplicate nodes and relationships:
news = graph.merge_one("Mainstream_News", "id", unicode(file["_id"]))
news.properties["entry_url"] = unicode(file["entry_url"])
news.properties["title"] = unicode(file["title"])
authors = graph.merge_one("Authors", "auth_name", unicode(file["auth_name"]))
# The author details belong on the Authors node, not on the news node
authors.properties["auth_url"] = unicode(file["auth_url"])
authors.properties["auth_eml"] = unicode(file["auth_eml"])
graph.create_unique(Relationship(news, "hasAuthor", authors))
This is what I normally do, as I find it easier to know what's happening. As far as I know, there is a bug when you call create_unique with only a Node, and there is no need to create the nodes separately when you also have to create an edge.
I don't have the database on this computer, so please bear with me if there are some typos; I'll correct them in the morning, but I guess you'd rather have a fast answer. :-)
# Pass the id as a Cypher parameter instead of formatting it into the string
news = graph.cypher.execute_one('MATCH (m:Mainstream_News) '
                                'WHERE m.id = {id} '
                                'RETURN m', {"id": unicode(file["_id"])})
if not news:
    news = Node("Mainstream_News")
    news.properties['id'] = unicode(file["_id"])
    news.properties['entry_url'] = unicode(file["entry_url"])
    news.properties['title'] = unicode(file["title"])
# You can make a for-loop here
authors = Node("Authors")
authors.properties['auth_name'] = unicode(file["auth_name"])
authors.properties['auth_url'] = unicode(file["auth_url"])
authors.properties['auth_eml'] = unicode(file["auth_eml"])
rel = Relationship(news, "hasAuthor", authors)
graph.create_unique(rel)
# For-loop should end here
I've included the first three lines to make it more generic. The lookup returns a node object or None.
EDIT:
@cybersam's use of the schema is cool; implement that too. I'll try to use it myself as well. :-)
You can read more about it here:
http://neo4j.com/docs/stable/query-constraints.html
http://py2neo.org/2.0/schema.html

How use Cypher with Merge to create a unique sub graph path

In Neo4j 2.0 M06 I understand that CREATE UNIQUE is deprecated and replaced with MERGE and MATCH, but I am finding it hard to see how this can be used to create a unique path.
As an example, I want to create the following:
MERGE root-[:HAS_CALENDER]->(cal:Calender{name:'Booking'})-[:HAS_YEAR]->(year:Year{value:2013})-[:HAS_MONTH]-(month:Month{value:'January'})-[:HAS_DAY]->(day:Day{value:1})
ON CREATE cal
SET cal.created = timestamp()
ON CREATE year
SET year.created = timestamp()
ON CREATE month
SET month.created = timestamp()
ON CREATE day
SET day.created = timestamp()
The intention is that when I add a new day to my calendar, it should only create the year and month when they do not exist, and otherwise just add to the existing path. Now when I run the query, I get a STATEMENT_EXECUTION_ERROR:
MERGE only supports single node patterns
Should I be executing multiple statements to achieve this?
So the question is: what's the best way in Neo4j to handle cases like this?
Edit
I changed my approach a bit, and now, even after making multiple calls, I think my merge is happening at the label level and is not restricted to the start node I provide. As a result I am ending up with nodes that are shared across years and months, which is not what I was expecting.
I would really appreciate it if someone could suggest how to get a proper graph like the one below.
My C# code is somewhat like this:
var qry = GraphClient.Cypher
.Merge("(cal:CalendarType{ Name: {calName}})")
.OnCreate("cal").Set("cal = {newCal}")
.With("cal")
.Start(new { root = GraphClient.RootNode})
.CreateUnique("(root)-[:HAS_CALENDAR]->(cal)")
.WithParams(new { calName = newCalender.Name, newCal = newCalender })
.Return(cal => cal.Node<CalenderType>());
var calNode = qry.Results.Single();
var newYear = new Year { Name = date.Year.ToString(), Value = date.Year }.RunEntityHousekeeping();
var qryYr = GraphClient.Cypher
.Merge("(year:Year{ Value: {yr}})")
.OnCreate("year").Set("year = {newYear}")
.With("year")
.Start(new { calNode })
.CreateUnique("(calNode)-[:HAS_YEAR]->(year)")
.WithParams(new { yr = newYear.Value, newYear = newYear })
.Return(year => year.Node<Year>());
var yearNode = qryYr.Results.Single();
var newMonth = new Month { Name = date.Month.ToString(), Value = date.Month }.RunEntityHousekeeping();
var qryMonth = GraphClient.Cypher
.Merge("(mon:Month{ Value: {mnVal}})")
.OnCreate("mon").Set("mon = {newMonth}")
.With("mon")
.Start(new { yearNode })
.CreateUnique("(yearNode)-[:HAS_MONTH]->(mon)")
.WithParams(new { mnVal = newMonth.Value, newMonth = newMonth })
.Return(mon => mon.Node<Month>());
var monthNode = qryMonth.Results.Single();
var newDay = new Day { Name = date.Day.ToString(), Value = date.Day, Date = date.Date }.RunEntityHousekeeping();
var qryDay = GraphClient.Cypher
.Merge("(day:Day{ Value: {mnVal}})")
.OnCreate("day").Set("day = {newDay}")
.With("day")
.Start(new { monthNode })
.CreateUnique("(monthNode)-[:HAS_DAY]->(day)")
.WithParams(new { mnVal = newDay.Value, newDay = newDay })
.Return(day => day.Node<Day>());
var dayNode = qryDay.Results.Single();
Regards
Kiran
Nowhere on the documentation page does it say that CREATE UNIQUE has been deprecated.
MERGE is just a new approach that's available to you. It enables some new scenarios (matching based on labels, and ON CREATE and ON MATCH triggers) but also does not cover more complex scenarios (more than a single node).
It sounds like you're already familiar with CREATE UNIQUE. For now, I think you should still be using that.
It seems to me the picture of what you want your graph to look like has the order imposed by relationships, but your code models the order with nodes. If you want that graph, you will need to use relationship types like [2010], [2011] instead of a pattern like [HAS_YEAR]->({value:2010}).
Another way to say the same thing: you are trying to constitute uniqueness for a node intrinsically, by a combination of label and property, e.g. (unique:Day {value:4}). Assuming you have the relevant constraints, this would be database wide uniqueness, so only one fourth-day-of-the-month for all the months to share. What you want is extrinsic local uniqueness, uniqueness established and extended transitively by a hierarchy of relationships. Uniqueness for a node is then not in its internal properties but in its external 'position' or 'order' in relation to its parent. The locally unique pattern (month)-[:locally_unique_rel]->(day) is made unique for a wider scope when the month is made unique, and the month is made unique, not by property and label, but extrinsically by its 'order' or 'position' under its year. Hence the transitivity. I think this is a strength of modeling with graphs, among other things it allows you to continue to partition your structure. If for instance you want to split some of your days into AM and PM or into hours, you can easily do so.
So, in your graph, [HAS_DAY] makes all days equally related to their month, and cannot therefore be used to differentiate between them. You have solved this locally under a month, since the property value differentiates, but since the fourth-day-of-the-month in
(november)-[:HAS_DAY]->(4th) and (december)-[:HAS_DAY]->(4th)
are not distinct by property value or label, they are the same node in your graph. Locally, under a month say, unique nodes can be achieved equally with
[11]->()-[4]->(unique1), [11]->()-[5]->(unique2)
and
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique1 {value:4}),
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique2 {value:5})
The difference is that with the former extrinsic local uniqueness, you have the benefit of transitivity. Since the months are unique in a year, as (november) in [11]->(november) is locally unique, therefore the days of November are also unique in that year - the (fourth) node is distinct between
[11]->(november)-[4]->(fourth)
and
[12]->(december)-[4]->(fourth)
What this amounts to is transferring more of your semantic model to your relationships, leaving the nodes for storing data. The node identifiers in the picture you posted are only pedagogical, replacing them with x,y,z or empty parentheses would perhaps better reveal the structure or scaffolding of the graph.
If you want to keep the relationship types intact, adding an ordering property to each relationship to create a pattern like (november)-[:HAS_DAY {order:4}]->(4th) will also work. This may be less performant for querying, but you may have other concerns that make it worth it.
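The difference between database-wide and parent-scoped uniqueness can be illustrated with a small in-memory tree. This is only a sketch in plain Python with hypothetical names (`Node`, `merge_child`, `merge_day`); it models the idea of finding or creating a child under a specific parent, not Neo4j itself.

```python
class Node:
    def __init__(self, label, value):
        self.label = label
        self.value = value
        self.children = {}   # (rel_type, label, value) -> child Node


def merge_child(parent, rel_type, label, value):
    """Find or create a child that is unique only under `parent`."""
    key = (rel_type, label, value)
    if key not in parent.children:
        parent.children[key] = Node(label, value)
    return parent.children[key]


def merge_day(root, year, month, day):
    """Walk root -> year -> month -> day, creating only the missing steps."""
    y = merge_child(root, "HAS_YEAR", "Year", year)
    m = merge_child(y, "HAS_MONTH", "Month", month)
    return merge_child(m, "HAS_DAY", "Day", day)


root = Node("Root", "root")
nov_4 = merge_day(root, 2013, 11, 4)
dec_4 = merge_day(root, 2013, 12, 4)
nov_4_again = merge_day(root, 2013, 11, 4)
```

Because each lookup is keyed under its parent, the fourth of November and the fourth of December are distinct nodes even though they carry the same label and value, while repeating a path returns the node that already exists.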
This code allows you to create calendar graphs on demand upon creation of an event for a specific day. You'll want to modify it to allow events on multiple days, but it seems more like your issue is creating unique paths, right? And you'd probably want to modify this to use parameters in your language of choice.
First I create the root:
CREATE (r:Root {id:'root'})
Then use this reusable MERGE query to successively match or create subgraphs for the calendar. I pass along the root so I can display the graph at the end:
MATCH (r:Root)
MERGE (r)-[:HAS_CAL]->(cal:Calendar {id:'General'})
WITH r,cal MERGE (cal)-[:HAS_YEAR]->(y:Year {id:2011})
WITH r,y MERGE (y)-[:HAS_MONTH]->(m:Month {id:'Jan'})
WITH r,m MERGE (m)-[:HAS_DAY]->(d:Day {id:1})
CREATE (d)-[:SCHEDULED_EVENT]->(e:Event {id:'ev3', t:timestamp()})
RETURN (r)-[*1..5]-()
Creates a graph like this when called multiple times:
Does this help?

Resources