Retrieve last node in list for further processing

I am trying to set up a scheme for web clicks, where each node is a (:Click) that links to the click preceding it by a [:PREV] edge and to the (:Session) that owns it by a [:GEN] edge. In the end this should happen procedurally, with a new transaction/insert whenever a new click is made. While I have no problem generating the objects involved, I cannot figure out how to dynamically select the last (:Click) and link it to the newly created one.
Generate a session with 2 clicks:
CREATE (s:Session {name:'S0'})
CREATE (c1:Click {name:'C1', click:1}), (c1)<-[:GEN]-(s)
CREATE (c2:Click {name:'C2', click:2}), (c2)<-[:GEN]-(s), (c1)<-[:PREV]-(c2);
Generate one more click in a separate transaction:
MERGE (s:Session {name:'S0'})
CREATE (c3:Click {name:'C3', click:3}),
(c3)<-[:GEN]-(s) //(c2)<-[:PREV]-(c3);
For the commented-out link, I cannot use the c2 variable, as it is local to the previous transaction.
So I tried something like this to dynamically find the last generated node in the same session and link it:
MERGE (s:Session {name:'S0'})
CREATE (c3:Click {name:'C3', click:3}), (c3)<-[:GEN]-(s)
MATCH (s)-[:GEN]->(c_prevs:Click)
WITH c_prevs
ORDER BY c_prevs.click DESC LIMIT 1
CREATE (head(c_prevs))<-[:PREV]-(c3)
Unfortunately, neither this nor any other Cypher construct I have come up with so far works.

If I understand correctly, you can get the last :Click node in the same session this way:
match (:Session {name:'S0'})-[:GEN]->(c:Click)
where not (:Click)-[:PREV]->(c)
return c
That is, get the node from the same session that has no incoming [:PREV] relationship. This returns c2:
╒═══════════════════════╕
│"c" │
╞═══════════════════════╡
│{"name":"C2","click":2}│
└───────────────────────┘
For your specific case a query like the following should work:
merge (s:Session {name:'S0'})
with s
match (s)-[:GEN]->(last:Click)
where not (:Click)-[:PREV]->(last)
create (c3:Click {name:'C3', click:3}),
(c3)<-[:GEN]-(s),
(last)<-[:PREV]-(c3)

I found the answer to my question to be the following:
MATCH (s:Session {name:'S0'})
CREATE (c3:Click {name:'C3', click:3})
WITH s, c3
MATCH (s)-[:GEN]->(c_prev:Click)
WITH c_prev, c3, s
ORDER BY c_prev.click DESC LIMIT 1
WITH c_prev, c3, s
CREATE (c_prev)<-[:PREV]-(c3), (c3)<-[:GEN]-(s)
This chains the nodes through the query as the variables s, c3 and c_prev using the WITH keyword. Unfortunately this involves a lot of repetition, since, as I learned, every WITH effectively separates the query into parts.
It also allows carrying over nodes that were already MERGEd/CREATEd, which might help to ensure that they exist.
EDIT:
This problem seems to be even more complicated if clicks are to be generated procedurally, that is, using one Cypher statement to insert and link any click.
My solution looks like the following:
MERGE (s:Session {name: $session_name})
WITH s
CREATE (c:Click {name: $click_name, click: $click_count})
WITH s, c
OPTIONAL MATCH (s)-[:GEN]->(c_prev:Click)
WITH c_prev, c, s
ORDER BY c_prev.click DESC LIMIT 1
WITH c_prev, c, s
FOREACH (o IN CASE WHEN c_prev IS NOT NULL THEN ['1'] ELSE [] END |
CREATE (c_prev)<-[:PREV]-(c)
)
WITH s, c
CREATE (c)<-[:GEN]-(s)
I execute this statement once for each of {$session_name, $click_name, $click_count} = [{'AAA', 'C1', 1}, {'AAA', 'C2', 2}, {'AAA', 'C3', 3}].
Notice that I had to work around the case where no previous click is found: the CASE expression catches a null c_prev and hands FOREACH an empty list, so the linking statement is skipped. Not only does this look very ugly; I sincerely think there should be a better way to express this behavior in Cypher in the near future.
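One possible way to avoid the FOREACH workaround, offered as a sketch rather than a definitive answer: look up the previous click before creating the new one, then filter with WITH ... WHERE so that the [:PREV] edge is only created when a previous click exists. It reuses the parameters $session_name, $click_name and $click_count from above and assumes that click is a strictly increasing counter within a session.

```cypher
MERGE (s:Session {name: $session_name})
WITH s
// Find the most recent click of this session; c_prev is null for the first click
OPTIONAL MATCH (s)-[:GEN]->(c_prev:Click)
WITH s, c_prev
ORDER BY c_prev.click DESC
LIMIT 1
// The new click and its [:GEN] edge are created in every case
CREATE (c:Click {name: $click_name, click: $click_count})<-[:GEN]-(s)
// Drop the row when there is no previous click, so the last CREATE is skipped
WITH c, c_prev
WHERE c_prev IS NOT NULL
CREATE (c_prev)<-[:PREV]-(c)
```

Because the lookup happens before the CREATE, the new click cannot match itself, and the first click of a session is still stored even though the final CREATE never runs for it.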

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple CSV data into Neo4j. The data looks as follows:
uniqueId     compound     value  category
ACT12_M_609  mesulfen     21     carbon
ACT12_M_609  MNAF         23     carbon
ACT12_M_609  nifluridide  20     suphate
ACT12_M_609  sulfur       23     carbon
I am loading the data from the URL using the following query:
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next I do the aggregation to count the total orders for a compound and store the result as a property on the node:
MATCH (c:Compound) <-[:CONTAINS]- (t:Transaction)
with c, count(distinct t.transactionId) as ord
set c.orders = ord
So far so good. I can accomplish what I want, but I have the following 2 questions:
1) How can I create the orders property for the compound node in the first step itself, i.e. when I am loading the data I would like to perform the aggregation straight away?
2) For a compound node I am also setting the category property. Theoretically, it could also be modelled as category -contains-> compound by creating a Category node. But what advantage would I gain from that? I can execute my queries and get the expected output without creating this additional node.
Thank you for your answer.

1) I don't think that's possible. LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those and then use them to create the real nodes, but that would be far more complicated (see: Virtual Nodes/Rels).
2) That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c: Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to create a separate node, with its own label, for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
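To make the second point concrete, here is a minimal sketch of the load with the category modelled as its own node. The Category label, its name property and the IN_CATEGORY relationship type are assumptions chosen for illustration; "url" is the same placeholder used in the question.

```cypher
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (t:Transaction {transactionId: row.uniqueId})
MERGE (c:Compound {name: row.compound})
// One shared node per distinct category value (hypothetical label and property)
MERGE (cat:Category {name: row.category})
MERGE (c)-[:IN_CATEGORY]->(cat)
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET r.price = row.value;

// A category-driven query can then start at the category node and follow relationships
MATCH (cat:Category {name: "carbon"})<-[:IN_CATEGORY]-(c:Compound)
RETURN c.name;
```

Whether this pays off depends on how often queries start from a category; for simply returning the category alongside a compound, the property remains sufficient.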

Neo4j Performance for large dataset

I am trying to load a large dataset into neo4j-3 and am looking at the options. I found neo4j-import, but the problem with it is that it is for the initial load only. I have to load around 2M records every week.
I tried loading through the shell but ran into performance issues. I tried the following:
1) Creating constraint upfront.
2) Creating Node and relationships in separate query.
3) Heap space 8G
4) dbms.memory.pagecache 4G
Many times the import just hangs and does nothing for hours.
Edit - CSV load being executed:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
OPTIONAL MATCH (per:Person {UID : "Person."+row.player_cardnum})
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.`Creation Date` = timestamp(), p.`Modification Date` = timestamp() ;

EDIT
On a second look, it seems like you're trying to implement some kind of conditional logic in your insert.
It looks like what you're trying to do is figure out if a :Person exists with a UID (derived from some concatenation with row.player_cardnum), and in the case where that :Person doesn't exist and the match fails, MERGE a :Person with the CardNumber given by row.player_cardnum.
If this is your goal, you're ALMOST there with your query. The problem is with your WHERE clause.
Understand that WHERE clauses are linked with a preceding MATCH, OPTIONAL MATCH, or WITH, and only affect the linked clause.
With that WHERE on that OPTIONAL MATCH, per will always be null, but more importantly, your row will still exist, and the following MERGE will ALWAYS take place for all rows in the CSV. This is probably the source of your slowdown, as it's creating new :Person nodes for all rows.
If you're trying to null out the row completely when the OPTIONAL MATCH hits on an existing :Person (so the MERGE won't happen in that case), you'll need to add a WITH clause, and make sure your WHERE clause is applied to it instead of the OPTIONAL MATCH.
Additionally, make sure that you have either unique constraints or indexes on Person.UID and Person.CardNumber. As for the UID match, I've heard that indexes are not used when there's some kind of string concatenation of the thing you're matching upon, so you may need to assemble it first and pass it in with a WITH.
Your final query would look like this:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
// first build the UID so we can take advantage of the index
WITH row, "Person." + row.player_cardnum AS UID
OPTIONAL MATCH (per:Person {UID : UID})
// the WHERE now applies to the WITH, which will filter out and null out the row when an OPTIONAL MATCH is found
WITH row, per
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.`Creation Date` = timestamp(), p.`Modification Date` = timestamp() ;
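The constraints mentioned above could be created before the load roughly as follows. This is a sketch in the Neo4j 3.x syntax that matches the version in the question, and it assumes that UID and CardNumber really are unique per person; if they are not, plain indexes would be the fallback.

```cypher
// Run once, before the LOAD CSV statement.
// A uniqueness constraint also creates a backing index, which MATCH and MERGE can use.
CREATE CONSTRAINT ON (p:Person) ASSERT p.UID IS UNIQUE;
CREATE CONSTRAINT ON (p:Person) ASSERT p.CardNumber IS UNIQUE;
```

Newer Neo4j releases spell the same thing as CREATE CONSTRAINT FOR (p:Person) REQUIRE p.UID IS UNIQUE.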

How to set all relationships of a certain type -- replacing the old ones if necessary -- in Neo4j?

I have a node id for an event, and a list of node ids for the users that are hosting the event. I want to update these (:USER)-[:HOSTS]->(:EVENT) relationships. I don't just want to add the new ones, I want to remove the old ones as well.
NOTE: this is coffeescript syntax where #{} is string interpolation, and str() will escape any characters for me.
Right now I'm querying all the hosts:
MATCH (u:USER)-[:HOSTS]->(:EVENT {id:#{str(eventId)}})
RETURN u.id
Then I'm determining which hosts are new and need to be added and which ones are old and need to be removed. For the old ones, I remove them:
MATCH (:USER {id:#{str(host.id)}})-[h:HOSTS]->(:EVENT {id:#{str(eventId)}})
DELETE h
And for the new ones, I add them:
MATCH (e:EVENT {id: #{str(eventId)}})
MERGE (u:USER {id:#{str(id)}})
SET u.name =#{str(name)}
MERGE (u)-[:HOSTS]->(e)
So my question is, can I do this more efficiently, all in one query? I want to set the new relationships and get rid of any previous relationships that aren't in the new set.

If I understand your question correctly, you can achieve your objective in a single query by introducing WITH and FOREACH. On a sample graph created by
CREATE (_1:User { name:"Peter" }),(_2:User { name:"Paul" }),(_3:User { name:"Mary" })
CREATE (_4:Event { name:"End of the world" })
CREATE (_1)-[:HOSTS]->(_4), (_2)-[:HOSTS]->(_4)
you can remove the no longer relevant hosts and add the new hosts like this:
WITH ["Peter", "Mary"] AS hosts, "End of the world" AS eventId
MATCH (event:Event { name:eventId })<-[r:HOSTS]-(u:User)
WHERE NOT u.name IN hosts
DELETE r
WITH COLLECT(u.name) AS oldHosts, hosts, event
WITH [h IN hosts WHERE NOT h IN oldHosts] AS newHosts, event, oldHosts
FOREACH (n IN newHosts |
MERGE (nh:User { name:n })
MERGE (nh)-[:HOSTS]->(event)
)
I have made some assumptions, at least including
The new host (:User) of the event may already exist, therefore MERGE (nh:User { name:n }) and not CREATE.
The old hosts (:User) should be disconnected from the event by deleting their [:HOSTS] relationships, but not removed from the database.
Your coffee script stuff can be translated into parameters, and you can translate my pseudo-parameters into parameters. In my sample query I simulate parameters with the first line, but you may need to adapt the syntax according to how you actually pass the parameters to the query (I can't turn Coffee into Cypher).
Change the contents of the hosts array to ["Peter", "Paul"], or to ["Peter", "Dragon"], or whatever value makes sense to you, and rerun the query to see how it works. I've used name rather than id to catch the nodes, and again, I've simulated parameters, but you might be able to translate the query to the context from which you want to execute it.
Edit:
Re comment: if you want the query to also match events that don't have any hosts, you need to make the -[:HOSTS]- part of the pattern optional. Do so by breaking the MATCH clause in two:
MATCH (event:Event { name:eventId })
OPTIONAL MATCH (event)<-[r:HOSTS]-(u:User)
The rest of the query is the same.
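For comparison, the same replace-the-hosts idea can be sketched with UNWIND instead of FOREACH, which avoids the oldHosts/newHosts bookkeeping. This is an alternative formulation, not the query from the answer; it keeps the simulated parameters on the first line, and eventName is just a renamed stand-in for the event identifier.

```cypher
WITH ["Peter", "Mary"] AS hosts, "End of the world" AS eventName
MATCH (event:Event {name: eventName})
// OPTIONAL MATCH keeps one row alive even when there is nothing to remove
OPTIONAL MATCH (event)<-[r:HOSTS]-(u:User)
WHERE NOT u.name IN hosts
// Deleting a null r is a no-op
DELETE r
// Collapse back to one row per event before adding hosts
WITH DISTINCT event, hosts
UNWIND hosts AS hostName
MERGE (nh:User {name: hostName})
// MERGE leaves relationships that already exist untouched
MERGE (nh)-[:HOSTS]->(event)
```

Since both MERGE clauses are idempotent, hosts that were already attached are left alone and only the missing ones are added.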

How to correctly use conditionals like IF or CASE in Cypher query language (Neo4J) to successfully create relationships?

I failed to create relationships in Neo4J and I would like to encourage anyone who has successfully done it to help me.
The desired result is a detailed visualisation of who is a brother to whom, who is whose mother and so on. I want to derive this from single parent-child relationships. That means setting a relationship like [:relatedTo {how:['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
It is a file created with Excel, where I put the information into a table. It was then converted to a .csv file that can be imported into Neo4J. I have successfully installed Neo4J and am now at the point of writing the Cypher script to load the data. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the CASE command? Neo4J does not allow me to use anything but the CREATE command here (it expects the letter R after C and not an A, which means CREATE).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.

Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END
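The underlying reason for the error is that CASE in Cypher is an expression that yields a value, not a clause that can wrap CREATE. A fuller sketch of the relationship step, under several assumptions, might look like the following. It assumes the :Person nodes were already loaded by a separate statement, that the gender is stored in the G property as in the question's SET clause, and that Parent1 and Parent2 hold personID values rather than names, which is what the sample rows suggest. toInteger is the newer name for the toInt function used in the question.

```cypher
// Second pass: run after all :Person nodes exist, because a parent can
// appear later in the file than its child (row 1 refers to persons 4 and 3)
MATCH (child:Person), (parent:Person)
WHERE parent.personID = toInteger(child.Parent1)
   OR parent.personID = toInteger(child.Parent2)
CREATE (parent)-[r:isRelatedTo]->(child)
// CASE picks the property value; the relationship itself is created unconditionally
SET r.how = CASE parent.G WHEN 'F' THEN 'mother' ELSE 'father' END
```

Rows with an empty parent field produce a null comparison and are simply not matched.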

Cypher query - Optional Create

I am trying to create a social network-like structure.
I would like to create a timeline of posts which looks like this
(user:Person)-[:POSTED]->(p1:POST)-[:PREV]->(p2:POST)...
My problem is the following.
Assuming a post for a user already exists, I can create a new post by executing the following cypher query
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Assuming, the user has not created a post yet, this query fails. So I tried to somehow include both cases (user has no posts / user has at least one post) in one update query (I would like to insert a new post in the "post timeline")
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN rel IS NOT NULL THEN [rel] ELSE [] END |
DELETE rel
)
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
CREATE (post)-[:PREV]->(o)
)
MERGE (user)-[:POSTED]->(post)
Is there any kind of if-statement (or some type of CREATE IF NOT NULL) to avoid using a FOREACH loop twice? The query looks a little bit complicated, and I know that each loop will run at most once.
However, this was the only solution, I could come up with after studying this SO post. I read in an older post that there is no such thing as an if-statement.
EDIT: The question is: Is it even good to include both cases in one query since I know that the "no-post case" will only occur once and that all other cases are "at least one post"?
Cheers

I've seen a solution to cases like this in some articles. To use a single query for all cases, you could create a special terminating node for the list of posts. A person with no posts would be like:
(:Person)-[:POSTED]->(:PostListEnd)
Now in all cases you can run the query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Note that no label is specified for prev_post, so it can match either (:POST) or (:PostListEnd).
After running the query, a person with 1 post will be like:
(:Person)-[:POSTED]->(:POST)-[:PREV]->(:PostListEnd)
Since the PostListEnd node has no info of its own, you can have the same one node for all your users.
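The one-time setup for this approach could look roughly like the sketch below. It assumes a single shared PostListEnd node, as suggested above, and attaches it to every person who has no posts yet; listEnd is just a variable name chosen for illustration.

```cypher
// Create the shared terminator once (MERGE reuses it if it already exists)
MERGE (listEnd:PostListEnd)
WITH listEnd
// Every person without any outgoing POSTED relationship gets an empty list
MATCH (user:Person)
WHERE NOT (user)-[:POSTED]->()
CREATE (user)-[:POSTED]->(listEnd)
```

New users would need the same [:POSTED] edge to the terminator when they are created, so that the insert query always finds a prev_post.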

I also do not see a better solution than using FOREACH.
However, I think I can make your query a bit more efficient. My solution essentially merges the 2 FOREACH tests into 1, since prev_post and rel must either be both NULL or both non-NULL. It also combines the CREATE and the MERGE (which should have been a CREATE, anyway).
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (user)-[:POSTED]->(post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
DELETE rel
CREATE (post)-[:PREV]->(o)
)

In the Neo4j v3.2 developer manual it specifies how you can create essentially a composite key made of multiple node properties at this link:
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
However, this is only available for the Enterprise Edition, not Community.

"CASE" is as close to an if-statement as you're going to get, I think.
The FOREACH probably isn't so bad given that you're likely limited in scope. But I see no particular downside to separating the query into two, especially to keep it readable and given the operations are fairly small.
Just my two cents.
