Neo4J Insertion taking time - neo4j

I have a query which is taking the long time to insert in neo4j roughly the query looks like following :
create index on :symaccess_symdev(dir_port);
create index on :symaccess_symdev(host_lun);
create index on :symaccess_symdev(ini_tiator_group_name);
create index on :symaccess_symdev(sym_dev);
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (accesssymdev:symaccess_symdev {
sym_dev: symdevs.sym_dev,
host_lun: symdevs.host_lun,
ini_tiator_group_name: symdevs.ini_tiator_group_name,
dir_port: symdevs.dir_port
})
ON CREATE SET
accesssymdev.attr_percentage = symdevs.attr_percentage,
accesssymdev.cap_mb = toFloat(symdevs.cap_mb),
accesssymdev.physicaldevicename = symdevs.physicaldevicename;

Assuming that the sym_dev property value is unique for every symaccess_symdev node, then this query may be faster:
CALL apoc.load.json('file:////root/output/1530115956414/dev.json') YIELD
value AS row UNWIND row.symdev AS symdevs
MERGE (a:symaccess_symdev {sym_dev: symdevs.sym_dev})
ON CREATE SET
a.host_lun = symdevs.host_lun,
a.ini_tiator_group_name = symdevs.ini_tiator_group_name,
a.dir_port = symdevs.dir_port,
a.attr_percentage = symdevs.attr_percentage,
a.cap_mb = toFloat(symdevs.cap_mb),
a.physicaldevicename = symdevs.physicaldevicename;
A MERGE will only use at most one index, so your current query will cause the Cypher planner to pick one index (out of the 4 that are applicable). After using that index to generate a set of candidate nodes, it would still need to check the other 3 properties for each candidate node. If it had picked an index that is not very selective (because there tends to be many nodes with the same property value), then a lot of work would need to be done per MERGE.
Assuming that the sym_dev property value is unique, the above query simplifies the MERGE so that it will quickly discover whether the wanted symaccess_symdev node existed, and without needing to check any other properties.

Related

Neo4J Match & Set Multiple Relationships/Nodes in One Query

I'm currently running the following query to update the properties on two nodes and relationships.
I'd like to be able to update 1,000 nodes and the corresponding relationships in one query.
MATCH (p1:Person)-[r1:OWNS_CAR]->(c1:Car) WHERE id(r1) = 3018
MATCH (p2:Person)-[r2:OWNS_CAR]->(c2:Car) WHERE id(r2) = 3019
SET c1.serial_number = 'SERIAL027436', c1.signature = 'SIGNATURE728934',
r1.serial_number = 'SERIAL78765', r1.signature = 'SIGNATURE749532',
c2.serial_number = 'SERIAL027436', c2.signature = 'SIGNATURE728934',
r2.serial_number = 'SERIAL78765', r2.signature = 'SIGNATURE749532'
This query has issues when you run it in larger quantities. Is there a better way?
Thank you.
You could work with a LOAD CSV. Your input would contain the keys (not the ids, using the ids is not recommended) for Person and Car and whatever properties you need to set. For example
personId, carId, serial_number, signature
00001, 00045, SERIAL78765, SIGNATURE728934
00002, 00046, SERIAL78665, SIGNATURE724934
Your query would then be something like :
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
MATCH (p:Person {personId: row.PersonId})-[r:OWNS_CAR]->(c:Car {carId: row.carId})
SET r.serial_number = row.serialnumber, c.signature = row.signature
Note that you should have unique constraints on Person and Car to make that work. You can do thousands (even millions) like that very quickly ...
Hope this helps,
Tom

Neo4j: Maintaining node counts

I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used, in addition, if this is the first time this customer-->card relationship has been seen, update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work, however I think it will re-evaluate the counts every time the relationship is seen, not just on the create. Is it possible to use the ON MATCH and ON CREATE in this statement to limit the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), I also want to return something back that will allow me to kick off a bespoke dijkstra based search of the neighborhood. Any ideas how I'd return some variable based on whether this was a new or existing relationship?
There is no need you to even have the card_ct or card_count properties.
Since neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).
How about something like this. Create a counter on a MATCH and if the counter is greater than zero then it is an existing relationship. Otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship
Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis if only 1 customer is associated with 1 card so I've excluded this as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship

Merging a set of nodes onto one (only in query)

I am currently investigating how to model a bitemporal graph in neo4j. Unfortunately noone seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store in a new node only those values that have changed and then express a query that would merge all those values ordered by a given timestamp:
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can get a history for a given point in time so far, e.g. like
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH {date: x.date, data:d} as data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say only in query, because I found a refactor function in apoc which does merge nodes, but that procedure actually merges and persists the node, while I would just want to query it).
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
{name: COALESCE(y.name, s.name),
wage: COALESCE(y.wage, s.wage)})
AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH(data IN datas|
SET t += data)
DELETE t
RETURN t
This approach does create a node, but if you DELETE it right before you RETURN it, it won't persist at all. += ensures that pre-existing properties aren't removed, only overwritten if the data node has existing values.

How to use Create Unique with Cypher

My target is to create node + set new property to it in case not exists
if it's exist I just want to update it's property
Tried this:
MATCH (user:C9 {userId:'44'})
CREATE UNIQUE (user{timestamp:'1111'})
RETURN user
*in case the node with the property userId=44 already existed I just want to set it's property into 1111 else just create it and set it.
error I am getting:
user already declared (line 2, column 16 (offset: 46))
"CREATE UNIQUE (user{timestamp:'1111'})"
should I switch to Merge or?
thanks.
Yes, you should use the MERGE statement.
MERGE (user:C9 {userId:'44'})
// you can set some initial properties when the node is created if required
//ON CREATE SET user.propertykey = 'propertyvalue'
ON MATCH SET user.timestamp = '1111'
RETURN user
You mention unique constraints - I assume you have one set up. You definitely should do to prevent duplicate nodes being created. It will also create a schema index to improve the performance of your node lookup. If you do not yet have a unique constraint then it can be created like so
CREATE CONSTRAINT ON (u:C9) ASSERT u.userId IS UNIQUE
See the Neo4j MERGE documentation.
Finally, to understand what is going on in your query let's have a quick look line by line.
MATCH (user:C9 { userId:'44' })
This matches the node with label :C9 that has a userId property with value 44 and assigns it the identifier user.
CREATE UNIQUE (user{timestamp:'1111'})
This line is simply trying to create a new node with no label and a property timestamp with value '1111'. The exception you are seeing is a result of you using the same user identifier that has already been used in the first line. However, this is not a supported way to use CREATE UNIQUE as it requires a match first and will then create bits of the pattern that does not exist. The upside of this is that it is stopping this unwanted node (user{timestamp:'1111'}) being created in the graph.
RETURN user
This line is pretty self explanatory and is not being reached.
EDIT
There seems to be some confusion surrounding CREATE UNIQUE and when it should be used. This query
CREATE UNIQUE (user:C9 {timestamp:'1111'})
will fail with the message
This pattern is not supported for CREATE UNIQUE
To use CREATE UNIQUE you would first match an existing node and then use that to create a unique pattern in the graph. So to create a relationship from user to a second node you would use
MATCH (user:C9 { userId: '44' }
CREATE UNIQUE (user)-[r:FOO]-(bar)
RETURN r
If there is no relationships of type FOO from user then a new node will be created to represent bar and relationship of type :FOO will be created between them. Conversely, if the MATCH statement does not make a match then no nodes or relationships will be created.

Neo4j: Sum relationship properties where node properties equal Value A and Value B (intersection)

Basically my question is: how do I sum relationship properties where there is a related nodes that have properties equal to Value A and Value B?
For example:
I have a simple DB has the following relationship:
(site)-[:HAS_MEMBER]->(user)-[:POSTED]->(status)-[:TAGGED_WITH]->(tag)
On [:TAGGED_WITH] I have a property called "TimeSpent". I can easily SUM up all the time spent for a particular day and user by using the following query:
MATCH (user)-[:POSTED]->(updates)-[r:TAGGED_WITH]->(tags)
WHERE user.name = "Josh Barker" AND updates.date = 20141120
RETURN tags.name, SUM(r.TimeSpent) as totalTimeSpent;
This returns to me a nice table with tags and associated time spent on each. (i.e. #Meeting 4.5). However, the question arises if I want to do some advanced searches and say "Show me all the meetings for ProjectA" (i.e. #Meeting #ProjectA). Basically, I am looking for a query that I can get all of the relationships where a single status has BOTH tags (and only if it has both). Then I can SUM that number up to get a count for how many meetings I spent in #ProjectA.
How do I do this?
MATCH (updates)-[r:TAGGED_WITH]->(tag1 {name: 'Meeting'}),
(updates)-[r:TAGGED_WITH]->(tag2 {name: 'ProjectA'})
RETURN SUM(r.TimeSpent) as totalTimeSpent, count(updates);
This should find all updates tagged with both of those things, and sum all time spent across all of those updates.
To create a generic solution where you may want one or more tags you could use something like this, passing in the array of tags as a parameter (and using the length of the array instead of the hard coded 2.
MATCH (user)-[:POSTED]->(update)-[r:TAGGED_WITH]->(tag)
WHERE user.name = "Josh Barker" AND updates.date = 20141120 AND tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent
As long as tag.name is indexed, this should be fast.
Edit - Remove User constraint
MATCH (update)-[r:TAGGED_WITH]->(tag)
WHERE tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTtimeSpent

Resources