Neo4j: Sequence of Events as Nodes Not Working

I am new to Cypher query syntax and have tried different types of syntax/relationships to build a sequence graph. My data contains group_id, and within each group_id a code occurs in the order given by 'number': per group_id, the lowest number is the first in the sequence and the highest number is the last. I am able to load the data from CSV and create nodes with properties, but I cannot link the 'code' nodes into a numerical sequence. I am referencing this tutorial. Is there special Cypher syntax to achieve this result?
Sample Data:
group_id,code,date,number
123,abc,2/18/21,4
123,def,11/11/20,3
123,ghi,11/10/20,2
123,jkl,10/1/20,1
456,gtg,11/28/20,5
456,abc,10/30/20,4
456,def,10/5/20,3
456,jkl,10/1/20,2
456,uuu,10/1/20,1
My code to load the data:
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.group_id IS NOT NULL
MERGE (g:group_id {group_id: row.group_id});
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.code IS NOT NULL
MERGE (c:code {code: row.code})
ON CREATE SET c.number = row.number,
c.date = row.date;
Here is what I have tried:
// Building relationship
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
MATCH (g:group_id {group_id: row.group_id})
MATCH (c:code {code: row.code})
MERGE (g)-[:GROUPS]->(c) // Connects ALL codes to group id, but how to connect to 'code' and 'number' sequentially?
MERGE (c:{code: row.number})-[:NEXT]->(c) // Neo.ClientError.Statement.SyntaxError
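The result I get has every code attached directly to its group_id node.
What I am trying to get is the codes of each group chained in numerical order with NEXT relationships.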

This will be a two-step process: first, the initial loading of the data as you have outlined; then an enhancement step in which you create the NEXT relationships. We do this in healthcare analytics of patient journeys, or trajectories. By analogy, your yellow nodes might be patients and the blue ones encounters, so each patient has a sequence of encounters.
You can query and sort by the date or other ordering variable. For example, collect a sorted list of encounters:
match (e:encounter) with e order by e.enc_date with e.subjectId as sid,collect(distinct e.enc_date) as eo return sid,size(eo) as ct,eo
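Applied to the question's sample data, the same idea (order by the sequencing property, then collect per group) looks like this in a minimal Python sketch. It is plain Python for illustration, not a Neo4j call, and `sequences_by_group` is a hypothetical helper. Two details matter: `number` arrives from CSV as a string and should be sorted as an integer, and codes such as abc, def and jkl occur in both groups, so a code on its own does not identify a position in a sequence.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question
SAMPLE = """group_id,code,date,number
123,abc,2/18/21,4
123,def,11/11/20,3
123,ghi,11/10/20,2
123,jkl,10/1/20,1
456,gtg,11/28/20,5
456,abc,10/30/20,4
456,def,10/5/20,3
456,jkl,10/1/20,2
456,uuu,10/1/20,1
"""

def sequences_by_group(text):
    """Return {group_id: [code, ...]} ordered by the numeric 'number' column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # CSV values are strings; convert so that "10" sorts after "9"
        groups[row["group_id"]].append((int(row["number"]), row["code"]))
    return {g: [code for _, code in sorted(items)] for g, items in groups.items()}

print(sequences_by_group(SAMPLE))
# {'123': ['jkl', 'ghi', 'def', 'abc'], '456': ['uuu', 'jkl', 'def', 'abc', 'gtg']}
```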
I used this in some Python code that iterates through each collection to create the enc_seq edges, equivalent to your NEXT relationships:
dfeo = Neo4jLib.CypherToPandas("match (e:encounter) with e order by e.enc_date with e.subjectId as sid,collect(distinct e.enc_date) as eo return sid,size(eo) as ct,eo",'ppmi')
cts = 0
sw = open("c:\\temp\\error.txt", "a")
# one row per subject: sid, ct (number of distinct dates), eo (dates in order)
for i in range(len(dfeo)):
    sid = str(dfeo['sid'][i])
    eo = dfeo['eo'][i]
    # link each encounter to the next one of the same subject
    for j in range(int(dfeo['ct'][i]) - 1):
        q = ("match (e1:encounter{subjectId:" + sid + ",enc_date:date('" + str(eo[j]) + "')}) "
             "match (e2:encounter{subjectId:" + sid + ",enc_date:date('" + str(eo[j + 1]) + "')}) "
             "merge (e1)-[r:enc_seq{subjectId:" + sid + ", seqCt:" + str(j) + "}]->(e2)")
        try:
            Neo4jLib.CypherNoReturn(q, 'ppmi')
        except Exception:
            # log the failing query and keep going
            cts = cts + 1
            sw.write(str(i) + ':' + str(j) + "\n" + q + "\n")
print("exceptions: " + str(cts))
sw.flush()
sw.close()
You can probably do this within Cypher alone, using a WITH (one row per subject) followed by a CALL to something that does what my Python code does. For my purposes, it was more convenient to use Python.
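A sketch of the all-Cypher route for the original question, under one stated assumption: each :code node is merged on both group_id and code (for example MERGE (c:code {group_id: row.group_id, code: row.code})) and keeps that row's number, so the same code in two groups becomes two separate nodes. The query orders the nodes, collects them per group and uses UNWIND over an index range to link neighbours. It is held in a Python string, next to a pure-Python mirror of the pairing step for checking; `LINK_NEXT` and `next_pairs` are hypothetical names.

```python
# Hypothetical all-Cypher NEXT step; assumes :code nodes keyed by (group_id, code)
# and carrying the row's number. Send it with whichever Neo4j driver you use.
LINK_NEXT = """
MATCH (c:code)
WITH c ORDER BY c.group_id, toInteger(c.number)
WITH c.group_id AS gid, collect(c) AS codes
UNWIND range(0, size(codes) - 2) AS i
WITH codes[i] AS a, codes[i + 1] AS b
MERGE (a)-[:NEXT]->(b)
"""

def next_pairs(ordered_codes):
    """Pure-Python mirror of the UNWIND range(...) step: consecutive (a, b) pairs."""
    return [(ordered_codes[i], ordered_codes[i + 1]) for i in range(len(ordered_codes) - 1)]

print(next_pairs(["jkl", "ghi", "def", "abc"]))
# [('jkl', 'ghi'), ('ghi', 'def'), ('def', 'abc')]
```

A group with a single code yields no pairs, just as range(0, -1) is empty in Cypher.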

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple csv data into neo4j. The data is simple as follows :-
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next, I aggregate to count the total orders for each compound and store the count as a node property:
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want, but I have the following two questions:
How can I create the orders property for the compound node in the first step itself, i.e. perform the aggregation straight away while loading the data?
For a compound node I am also setting the category property. Theoretically, it could also be modelled as category -contains-> compound by creating a Category node. But what advantage would that give me? I can execute the queries and get the expected output without this additional node.
Thank you for your answer.
I don't think that's possible: LOAD CSV processes one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those and then use them to create the real nodes, but that would be much more complicated (see Virtual Nodes/Rels).
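To illustrate the row-at-a-time point: a count of distinct transactions per compound is only final after the last row has been seen, which is why it ends up as a separate pass. A minimal Python sketch over the question's sample rows, with a hypothetical `count_orders` helper doing the same count as the second Cypher query:

```python
from collections import defaultdict

# (uniqueId, compound, value, category) rows from the question
ROWS = [
    ("ACT12_M_609", "mesulfen", 21, "carbon"),
    ("ACT12_M_609", "MNAF", 23, "carbon"),
    ("ACT12_M_609", "nifluridide", 20, "suphate"),
    ("ACT12_M_609", "sulfur", 23, "carbon"),
]

def count_orders(rows):
    """Distinct transactions per compound, like count(DISTINCT t.transactionId)."""
    seen = defaultdict(set)
    for transaction_id, compound, _value, _category in rows:
        seen[compound].add(transaction_id)
    # Only after the loop has consumed every row are these totals final.
    return {compound: len(ids) for compound, ids in seen.items()}

print(count_orders(ROWS))
# {'mesulfen': 1, 'MNAF': 1, 'nifluridide': 1, 'sulfur': 1}
```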
That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run queries where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound)), it might be more performant to model it as a separate Category node.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.

neo4j cypher create relation between two nodes based on an attribute String value

I have a Node with Label Experiment with an attribute called ExperimentName.
This ExperimentName is based on the concatenation of 3 different variables
"Condition (ExperimentProfile1) Dose"
Example :
Control diet (MOA77) LD
Control gavage(MOA66) HD
I have another node called ExperimentMapper,
which has 3 attributes:
- Condition
- ExperimentProfile
- Dose
I would like to create a relationship between an Experiment node and an ExperimentMapper node when ExperimentName is the three attributes assembled together.
I have tried to use a regex, but the query was extremely slow and took forever.
Any help?
This is my Cypher query, but it takes forever even though I created indexes:
MATCH (mxpExperiment:MxpExperiment) OPTIONAL MATCH (otuExperimentMapper:OtuExperimentMapper)
WHERE mxpExperiment.name CONTAINS otuExperimentMapper.Condition
AND mxpExperiment.name CONTAINS otuExperimentMapper.Experiment
AND mxpExperiment.name CONTAINS otuExperimentMapper.dose
CREATE (mxpExperiment)-[:OTU_EXPERIMENT_MAPPER]->(otuExperimentMapper)
RETURN mxpExperiment, otuExperimentMapper
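I think you need to start from the ExperimentMapper side: build the expected name from its three attributes and then look the experiment up by exact name, instead of comparing every experiment against every mapper with CONTAINS.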
First you need to create an index:
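CREATE INDEX ON :MxpExperiment(name)
(This is the older syntax; from Neo4j 4.0 the form is CREATE INDEX FOR (e:MxpExperiment) ON (e.name), and the old form is no longer accepted in Neo4j 5.)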
Then the query can be as follows:
MATCH (otuExperimentMapper:OtuExperimentMapper)
WITH otuExperimentMapper,
otuExperimentMapper.Condition + ' (' +
otuExperimentMapper.Experiment + ') ' +
otuExperimentMapper.dose AS name
MATCH (mxpExperiment:MxpExperiment) WHERE mxpExperiment.name = name
MERGE (mxpExperiment)-[:OTU_EXPERIMENT_MAPPER]->(otuExperimentMapper)
RETURN mxpExperiment, otuExperimentMapper
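The same exact-match join, sketched in Python with a dict standing in for the index on :MxpExperiment(name). The helper names and the `normalize` clean-up are hypothetical. The clean-up is there because the question's second example, Control gavage(MOA66) HD, has no space before the parenthesis, so plain concatenation with ' (' would not match it; if stored names are inconsistent like that, they would need the same normalization in the database before an exact match can find them.

```python
import re

def mapper_name(condition, experiment, dose):
    """Same concatenation as the Cypher answer: 'Condition (Experiment) Dose'."""
    return condition + " (" + experiment + ") " + dose

def normalize(name):
    """Hypothetical clean-up: exactly one space before '(' and collapsed whitespace."""
    name = re.sub(r"\s*\(", " (", name)
    return re.sub(r"\s+", " ", name).strip()

def link(experiment_names, mappers):
    """Exact-match join; the dict plays the role of the name index."""
    by_name = {normalize(n): n for n in experiment_names}
    links = []
    for condition, experiment, dose in mappers:
        key = normalize(mapper_name(condition, experiment, dose))
        if key in by_name:  # one lookup per mapper instead of scanning every experiment
            links.append((by_name[key], (condition, experiment, dose)))
    return links

names = ["Control diet (MOA77) LD", "Control gavage(MOA66) HD"]
mappers = [("Control diet", "MOA77", "LD"), ("Control gavage", "MOA66", "HD")]
print(link(names, mappers))
```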

Two values as DISTINCT in Neo4j query

I am sending a query to a Neo4j database and want to return only the items whose two numbers, tsneX and tsneY, form a distinct 'point'. Two items may share the same tsneX, but then their tsneY must differ. Here is my query:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
RETURN ex.expr, c.tsneX, c.tsneY;
So, I want something like:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
WITH DISTINCT (c.tsneX, c.tsneY) AS point
RETURN ex.expr, point;
Example:
    ex.expr  c.tsneX  c.tsneY
1.  4        1.2      1.2
2.  5        2.1      3.3
3.  1        1.2      1.2
One of them, the 1st or the 3rd, needs to be dropped, since both their tsneX and their tsneY coordinates are equal. I want only the 1st and 2nd to be returned and the 3rd to be dropped, because ex.expr is higher in the 1st one.
Any suggestions would be greatly appreciated.
So you want the pair [c.tsneX, c.tsneY], and when several results share the same point you only want the highest ex.expr. This should do the trick:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
RETURN [c.tsneX, c.tsneY] AS point, max(ex.expr) as expr
If you want the point as an object rather than a list, you can instead do:
{x:c.tsneX, y:c.tsneY} AS point
In either case, point becomes the grouping key for the max(ex.expr) aggregation, so each distinct point is returned only once, together with its highest ex.expr.
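The grouping that max() performs can be sketched in Python: the (x, y) tuple plays the role of the point key, and only the largest expr per key survives. This is an illustration with a hypothetical helper, using the example rows from the question; like Cypher's grouping, it compares coordinates by exact equality.

```python
def max_expr_per_point(rows):
    """rows: (expr, tsneX, tsneY). Keep the highest expr per distinct (x, y),
    mirroring RETURN [c.tsneX, c.tsneY] AS point, max(ex.expr) AS expr."""
    best = {}
    for expr, x, y in rows:
        point = (x, y)  # the grouping key
        if point not in best or expr > best[point]:
            best[point] = expr
    return best

rows = [(4, 1.2, 1.2), (5, 2.1, 3.3), (1, 1.2, 1.2)]
print(max_expr_per_point(rows))
# {(1.2, 1.2): 4, (2.1, 3.3): 5}
```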

Cypher slowly create relations with some type nodes

I have one type of node and one type of relationship.
USING PERIODIC COMMIT 500
load csv from 'http://host.int:8787/rel_import.csv' as line FIELDTERMINATOR ';'
match(c1)
with c1,line, trim(line[0]) as abs1, trim(line[1]) as abs2
match(c2)
where (c1.abs = abs1 and c2.abs = abs2) or (c1.abs = abs2 and c2.abs = abs1)
create (c1)-[rel:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c2)
This was fast.
I then split that one node type into five types (the old type was deleted; the total number of entities did not change), and now creating the relationships is very slow. The structure of the nodes did not change, and indexes are created for all types.
How do I do this correctly?
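I think the problem is the complex WHERE clause in your join: because c1 and c2 are matched without any condition of their own, every pair of nodes has to be built and then filtered. I find complex WHERE clauses on joins make queries really slow. Could you use a simpler condition, such as WHERE c1.abs <> c2.abs?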
Are you able to do something like this:
USING PERIODIC COMMIT 500
load csv from 'http://host.int:8787/rel_import.csv' as line FIELDTERMINATOR ';'
with line, trim(line[0]) as abs1, trim(line[1]) as abs2
match (c1 {abs: abs1})
match (c2 {abs: abs2})
match (c3 {abs: abs2})
match (c4 {abs: abs1})
where c1.abs <> c2.abs and c3.abs <> c4.abs
create (c1)-[rel1:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c2)
create (c3)-[rel2:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c4)
If possible, I'd split the c1/c2 match and the c3/c4 match into two separate LOAD CSV runs; I find it's best to do as few steps as possible within a LOAD CSV.
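The gain from matching on the property can be illustrated in Python, with a dict standing in for a lookup by abs: each CSV line costs two lookups instead of a check against every pair of nodes, and both directions are created as in the two CREATE clauses. This is an analogy with hypothetical names, assuming each abs value identifies one node. One related point, hedged: Neo4j property indexes are defined per label, so a MATCH without a label, as in both queries above, is likely unable to use the indexes created for the five types; adding the label to each MATCH is worth trying.

```python
def build_relationships(lines, nodes_by_abs):
    """Return (start, end) pairs like the answer's two CREATE clauses.

    lines: CSV rows as lists of strings; nodes_by_abs: hypothetical {abs: node} map.
    """
    rels = []
    for line in lines:
        abs1, abs2 = line[0].strip(), line[1].strip()  # trim(line[0]), trim(line[1])
        start, end = nodes_by_abs.get(abs1), nodes_by_abs.get(abs2)
        if start is None or end is None:
            continue  # a MATCH that finds nothing drops the row
        if abs1 == abs2:
            continue  # WHERE c1.abs <> c2.abs filters self-pairs
        rels.append((start, end))  # (c1)-[:relations]->(c2)
        rels.append((end, start))  # (c3)-[:relations]->(c4)
    return rels

nodes = {"A": "node-A", "B": "node-B"}
print(build_relationships([[" A", "B ", "kind"], ["A", "A"], ["A", "Z"]], nodes))
# [('node-A', 'node-B'), ('node-B', 'node-A')]
```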

Too much time importing data and creating nodes

I have recently started with Neo4j and graph databases.
I am using this API (Neo4jClient) to persist my model. Everything is done and working, but my problem is efficiency.
First, the scenario: I have a couple of XML documents that translate into nodes and relationships between them. Since I read that this API does not yet support batch insertion, I am creating the nodes and relationships one at a time.
This is the code I am using to create a node:
var newEntry = new EntryNode { hash = incremento++.ToString() };
var result = client.Cypher
    .Merge("(entry:EntryNode {hash: {_hash} })")
    .OnCreate()
    .Set("entry = {newEntry}")
    .WithParams(new
    {
        _hash = newEntry.hash,
        newEntry
    })
    .Return(entry => new
    {
        EntryNode = entry.As<Node<EntryNode>>()
    });
I understand that creating all the nodes takes time, but I do not understand why the time to create a single node grows so fast. In my tests, the statement creating one EntryNode initially takes 0.2 seconds to resolve, but once 500 nodes exist it has grown to about 2 seconds.
I also created an index on EntryNode(hash) manually in the console before inserting any data, and tested both with and without the index.
Am I doing something wrong? Is this time normal?
EDITED:
@Tatham
Thanks for the answer, it really helped. Now I am using the FOREACH statement in Neo4jClient to create 1000 nodes in just 2 seconds.
On a related topic, now that I create the nodes this way, I also want to create relationships. This is the code I am trying right now, but I get some errors:
client.Cypher
.Match("(e:EntryNode)")
.Match("(p:EntryPointerNode)")
.ForEach("(n in {set} | " +
"FOREACH (e in (CASE WHEN e.hash = n.EntryHash THEN [e] END) " +
"FOREACH (p in pointers (CASE WHEN p.hash = n.PointerHash THEN [p] END) "+
"MERGE ((p)-[r:PointerToEntry]->(ee)) )))")
.WithParam("set", nodesSet)
.ExecuteWithoutResults();
What I want it to do is: given a list of pairs of strings, get the nodes (which are unique) whose "hash" property has that string value, and create a relationship between them. I have tried a couple of variants of this query, but I can't seem to find the solution.
Is this possible?
This approach is going to be very slow because you do a separate HTTP call to Neo4j for every node you are inserting. Each call is then a transaction. Finally, you are also returning the node back, which is probably a waste.
There are two options for doing this in batches instead.
From https://stackoverflow.com/a/21865110/211747, you can do something like this, where you pass in a set of objects and then FOREACH through them in Cypher. This means one, larger, HTTP call to Neo4j and then executing in a single transaction on the DB:
FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)
http://docs.neo4j.org/chunked/stable/query-foreach.html
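The batching idea, sketched in Python rather than C#: send one parameter list per request instead of one request per node. The helper name, the payloads and the batch size of 1000 are assumptions for illustration; the query string is the answer's FOREACH statement, in the older {set} parameter syntax it uses.

```python
def batches(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# The answer's statement, in the older {set} parameter syntax it uses.
QUERY = "FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)"

# Hypothetical payloads; each batch would be sent as the `set` parameter of one request.
entries = [{"Id": str(i)} for i in range(2500)]
chunks = batches(entries, 1000)
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 500]
```

Three requests (and three transactions) replace 2500 single-node calls.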
The other option, coming soon, is that you will be able to write something like this in Cypher:
LOAD CSV WITH HEADERS FROM 'file://c:/temp/input.csv' AS n
MERGE (c:Label { Id : n.Id })
SET c = n
https://github.com/davidegrohmann/neo4j/blob/2.1-fix-resource-failure-load-csv/community/cypher/cypher/src/test/scala/org/neo4j/cypher/LoadCsvAcceptanceTest.scala
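The answer does not cover the relationship part of the edited question. One option, on Neo4j versions with UNWIND and $-style parameters (3.x and later), is to UNWIND the list of pairs and MATCH each node by its indexed hash instead of nesting FOREACH/CASE. A Python sketch that holds that query and prepares the parameter list; the EntryHash/PointerHash keys and labels come from the question, while `LINK_QUERY` and `pair_params` are hypothetical names.

```python
# Hypothetical query: one row per pair, each node found by its unique hash.
LINK_QUERY = """
UNWIND $pairs AS pair
MATCH (e:EntryNode {hash: pair.EntryHash})
MATCH (p:EntryPointerNode {hash: pair.PointerHash})
MERGE (p)-[:PointerToEntry]->(e)
"""

def pair_params(pairs):
    """Turn (entry_hash, pointer_hash) tuples into the value bound to $pairs."""
    return {"pairs": [{"EntryHash": e, "PointerHash": p} for e, p in pairs]}

print(pair_params([("1", "10"), ("2", "20")]))
```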
