Two values as DISTINCT in Neo4j query - neo4j

I am sending a query to a Neo4j database and I want to return only the items whose two numbers, tsneX and tsneY, form a distinct 'point'. Two items may share the same tsneX, but in that case their tsneY values must differ. Here is my query:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
RETURN ex.expr, c.tsneX, c.tsneY;
So, I want something like:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
WITH DISTINCT (c.tsneX, c.tsneY) AS point
RETURN ex.expr, point;
Example:
    ex.expr  c.tsneX  c.tsneY
1.  4        1.2      1.2
2.  5        2.1      3.3
3.  1        1.2      1.2
One of them, the 1st or the 3rd, needs to be dropped since their tsneX and tsneY coordinates are equal to each other. I want only the 1st and 2nd to be returned, and the 3rd dropped, since ex.expr is higher in the 1st.
Any suggestions would be greatly appreciated.

So you want the pair [c.tsneX, c.tsneY], and when several results share the same point you want only the highest ex.expr. This should do the trick:
MATCH (c:Cell)-[ex:EXPRESSES]->(g:Gene { geneName: "' + geneName + '" })
RETURN [c.tsneX, c.tsneY] AS point, max(ex.expr) as expr
If you want the point as an object rather than a list, you can instead do:
{x:c.tsneX, y:c.tsneY} AS point
In either case, the max(ex.expr) aggregation function will ensure that the remaining non-aggregation value, point, is distinct.
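To see what the aggregation is doing, here is a minimal Python sketch of the same grouping done outside the database, using the three example rows from the question. The function name max_expr_per_point is hypothetical; it only mimics how Cypher groups rows by the non-aggregated value and is not how Neo4j implements it.

```python
def max_expr_per_point(rows):
    """Group (expr, x, y) rows by the (x, y) point, keeping the highest expr."""
    best = {}
    for expr, x, y in rows:
        point = (x, y)  # the grouping key, like [c.tsneX, c.tsneY] in the query
        if point not in best or expr > best[point]:
            best[point] = expr
    return best


# The three example rows from the question: (ex.expr, c.tsneX, c.tsneY)
rows = [(4, 1.2, 1.2), (5, 2.1, 3.3), (1, 1.2, 1.2)]
result = max_expr_per_point(rows)
print(result)
```

Rows 1 and 3 share the point (1.2, 1.2), so only the higher expr of the two survives, which is the behaviour asked for.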

Related

Neo4j: Sequence of Events as Nodes Not working

I am new to Cypher query syntax and have tried different syntax and relationship patterns to build a sequence graph. My data contains a group_id, and within each group_id the codes occur in the order given by 'number': the lowest number is first in the sequence and the highest number is last. I can load the data from CSV and create nodes with properties, but I cannot link the 'code' nodes into that numerical sequence. I am following a tutorial for reference. Is there special Cypher syntax to achieve this result?
Sample Data:
group_id,code,date,number
123,abc,2/18/21,4
123,def,11/11/20,3
123,ghi,11/10/20,2
123,jkl,10/1/20,1
456,gtg,11/28/20,5
456,abc,10/30/20,4
456,def,10/5/20,3
456,jkl,10/1/20,2
456,uuu,10/1/20,1
My Code to load data:
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.group_id IS NOT NULL
MERGE (g:group_id {group_id: row.group_id});
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.code IS NOT NULL
MERGE (c:code {code: row.code})
ON CREATE SET c.number = row.number,
c.date = row.date;
Here is what I have tried:
// Building relationship
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
MATCH (g:group_id {group_id: row.group_id})
MATCH (c:code {code: row.code})
MERGE (g)-[:GROUPS]->(c) // Connects ALL codes to group id, but how to connect to 'code' and 'number' sequentially?
MERGE (c:{code: row.number})-[:NEXT]->(c) // Neo.ClientError.Statement.SyntaxError
This is the result I got: (image)
This is what I am trying to get: (image)
This will be a two step process. First the initial loading of the data as you have outlined. Then an enhancement in which you create the NEXT relationships. We do this in healthcare analytics of patient journeys or trajectories. By analogy, your yellow nodes might be a patient and the blue one an encounter. So each patient has a sequence of encounters.
You can query and sort by the date or other ordering variable. For example, collect a sorted list of encounters:
match (e:encounter) with e order by e.enc_date with e.subjectId as sid,collect(distinct e.enc_date) as eo return sid,size(eo) as ct,eo
I used this in some python code to then iterate through the collection to create the enc_seq edge, equivalent to your NEXT:
# Neo4jLib is the helper module used in this answer (not shown here).
dfeo = Neo4jLib.CypherToPandas("match (e:encounter) with e order by e.enc_date with e.subjectId as sid,collect(distinct e.enc_date) as eo return sid,size(eo) as ct,eo",'ppmi')
cts = 0
with open("c:\\temp\\error.txt", "a") as sw:
    # One DataFrame row per subject; read ct and eo from the same row as sid.
    for i in range(len(dfeo)):
        sid = str(dfeo['sid'][i])
        dates = dfeo['eo'][i]
        # Link each encounter date to the next one for this subject.
        for j in range(int(dfeo['ct'][i]) - 1):
            q = "match (e1:encounter{subjectId:" + sid + ",enc_date:date('" + str(dates[j]) + "')}) match (e2:encounter{subjectId:" + sid + ",enc_date:date('" + str(dates[j+1]) + "')}) merge (e1)-[r:enc_seq{subjectId:" + sid + ", seqCt:" + str(j) + "}]-(e2)"
            try:
                Neo4jLib.CypherNoReturn(q,'ppmi')
            except Exception:
                # Count the failure and log the offending query.
                cts = cts + 1
                sw.write(str(i) + ':' + str(j) + "\n" + q + "\n")
print("exceptions: " + str(cts))
You can probably do this within a cypher query using a WITH (each row) followed by a CALL to a function similar to my python code. For my purposes it was more convenient to use python.
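Applied to the sample data in the question, the same idea can be sketched in plain Python: sort each group by 'number' and pair every code with the one that follows it. This is only a sketch under assumptions. The function name next_pairs and the NEXT_QUERY string are hypothetical, and putting group_id on the NEXT relationship is my own suggestion, since the question merges code nodes by code alone and the same code can appear in several groups.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question.
SAMPLE = """group_id,code,date,number
123,abc,2/18/21,4
123,def,11/11/20,3
123,ghi,11/10/20,2
123,jkl,10/1/20,1
456,gtg,11/28/20,5
456,abc,10/30/20,4
456,def,10/5/20,3
456,jkl,10/1/20,2
456,uuu,10/1/20,1
"""

# Hypothetical parameterized statement to run once per pair with a driver.
NEXT_QUERY = (
    "MATCH (a:code {code: $from_code}), (b:code {code: $to_code}) "
    "MERGE (a)-[:NEXT {group_id: $group_id}]->(b)"
)


def next_pairs(csv_text):
    """Return (group_id, from_code, to_code) triples in ascending 'number' order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["group_id"]].append((int(row["number"]), row["code"]))
    pairs = []
    for group_id, items in groups.items():
        items.sort()  # lowest number first
        for (_, from_code), (_, to_code) in zip(items, items[1:]):
            pairs.append((group_id, from_code, to_code))
    return pairs


pairs = next_pairs(SAMPLE)
for group_id, from_code, to_code in pairs:
    print(group_id, from_code, "->", to_code)
```

Each triple would then be passed as parameters to a statement like NEXT_QUERY, instead of concatenating values into the query string.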

neo4j cypher create relation between two nodes based on an attribute String value

I have a Node with Label Experiment with an attribute called ExperimentName.
This ExperimentName is based on the concatenation of 3 different variables
"Condition (ExperimentProfile1) Dose"
Example :
Control diet (MOA77) LD
Control gavage(MOA66) HD
I have another Node called ExperimentMapper
it has 3 attributes :
- Condition
- ExperimentProfile
- Dose
I would like to create a relationship between an Experiment node and an ExperimentMapper node when ExperimentName is the result of assembling the 3 attributes.
I have tried to use regex, but the query was extremely slow and took forever.
Any help?
This is my cypher but it is taking forever despite me creating indexes
MATCH (mxpExperiment:MxpExperiment) OPTIONAL MATCH (otuExperimentMapper:OtuExperimentMapper)
WHERE mxpExperiment.name CONTAINS otuExperimentMapper.Condition
AND mxpExperiment.name CONTAINS otuExperimentMapper.Experiment
AND mxpExperiment.name CONTAINS otuExperimentMapper.dose
CREATE (mxpExperiment)-[:OTU_EXPERIMENT_MAPPER]->(otuExperimentMapper)
RETURN mxpExperiment, otuExperimentMapper
I think that you need to go from the side of the Experiment Mapper.
First you need to create an index:
CREATE INDEX ON :MxpExperiment(name)
Then the query can be as follows:
MATCH (otuExperimentMapper:OtuExperimentMapper)
WITH otuExperimentMapper,
otuExperimentMapper.Condition + ' (' +
otuExperimentMapper.Experiment + ') ' +
otuExperimentMapper.dose AS name
MATCH (mxpExperiment:MxpExperiment) WHERE mxpExperiment.name = name
MERGE (mxpExperiment)-[:OTU_EXPERIMENT_MAPPER]->(otuExperimentMapper)
RETURN mxpExperiment, otuExperimentMapper
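The reason this is faster can be sketched in Python: building the name once per mapper and looking it up by equality is a single indexed lookup, whereas CONTAINS has to compare every experiment with every mapper. The function names below are hypothetical, and the mapper rows are made up from the two example names in the question.

```python
def build_name(mapper):
    """Assemble 'Condition (Experiment) dose', mirroring the WITH clause above."""
    return mapper["Condition"] + " (" + mapper["Experiment"] + ") " + mapper["dose"]


def match_by_name(experiment_names, mappers):
    """Return the built names that exactly equal an existing experiment name."""
    known = set(experiment_names)  # plays the role of the index on name
    return [build_name(m) for m in mappers if build_name(m) in known]


# Experiment names as written in the question.
experiments = ["Control diet (MOA77) LD", "Control gavage(MOA66) HD"]
# Hypothetical mapper rows derived from those names.
mappers = [
    {"Condition": "Control diet", "Experiment": "MOA77", "dose": "LD"},
    {"Condition": "Control gavage", "Experiment": "MOA66", "dose": "HD"},
]
matches = match_by_name(experiments, mappers)
print(matches)
```

Note that only the first name matches here: the second example in the question is written without a space before the parenthesis, so an exact equality test depends on the names being assembled consistently.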

c# NEO4J v3 can't create relationship

I have a C# project that worked with Neo4j 2.3.2. After updating to version 3, my system always fails when creating relationships. This is my code:
View userView = new View { parent = parent, timestamp = currentTime };
WebApiConfig.GraphClient.Cypher
.Match("(user123:BaseUser{guid: '" + isAuto + "'})", "(y:YoutubeItem{videoId: '" + itemid + "'})")
.CreateUnique("user123-[r:VIEW]->y")
.Set("r = {userView}")
.WithParam("userView", userView)
.ExecuteWithoutResults();
And this is the exception:
"SyntaxException: Parentheses are required to identify nodes in patterns, i.e. (user123) (line 2, column 15 (offset: 127))\n\"CREATE UNIQUE user123-[r:VIEW]->y\r\"\n ^"
When I go back to the old version everything works well. What should I do?
Cypher now enforces the requirement that nodes must be surrounded by parentheses.
So, in your query, the CreateUnique line needs to look like this:
.CreateUnique("(user123)-[r:VIEW]->(y)")
By the way, you should be using parameters for injecting the isAuto and itemId values. You are already doing that with userView.

Get Path in text format from Graph

In my graph I have data like the following.
Here a,b,c,d are nodes and r1,r2,r3,r4 are relations.
a-r1->b
b-r2->a
b-r2->c
c-r1->b
d-r3->a
a-r1->d like this.
I am using the following Cypher to get paths with a max depth of 3.
MATCH p=(n)-[r*1..3]-(m) WHERE n.id=1 and m.id=2 RETURN p
Here p is the returned path, and I want to display it in text format.
Example: suppose the path length is 3.
a-r1->b-r2->c, like this, in text format.
Is this possible?
Sort of. I'll give you most of the answer, but I myself can't complete the answer. Maybe another cypher wizard will come along and improve on the answer, but here's what I've got for you.
match p=(n)-[r*1..3]-(m)
WHERE id(n)=1 AND id(m)=2
WITH extract(node in nodes(p) | coalesce(node.label, "")) as nodeLabels,
extract(rel in relationships(p) | type(rel)) as relationshipLabels
WITH reduce(nodePath="", nodeLabel in nodeLabels | nodePath + nodeLabel + "-") as nodePath,
reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + "-") as relPath
RETURN nodePath, relPath
LIMIT 1;
EDIT - one small note, in your question you specify the WHERE criteria n.id=1 and m.id=2. Note that this is probably not what you want. Node IDs are usually checked with WHERE id(n)=1 AND id(m)=2. Id isn't technically a node property, so I changed that.
OK, so we're going to match the path. Then we're going to use the extract function to pull out the label property from nodes, and create a collection called nodeLabels. We'll do the same for the relationship types. What reduce does here is accumulate each of the individual strings in those collections down to a single string. So if your nodes are a, b, and c, you'd get a nodePath string that looks like a-b-c-. Similarly, your relationship string would look like r1-r2-r3-.
Now, I know you want those interleaved, and you'd prefer output like a-r1-b-r2-c. Here's the problem I see with that...
Normally, the way I'd approach that is to use FOREACH to iterate over the node label collection. Since you know there is one less relationship than nodes because of what paths are, ideally (in pseudo code) I'd want to do something like this:
buffer = ""
foreach (idx in range(0, length(nodeLabels)) |
    buffer = buffer + nodeLabels[idx] + "-" + relLabels[idx] + "->")
This would be a way of reducing to the string that you want. You can't use the reduce function, because it doesn't provide you a way of getting which index you're at in the collection. Meaning that you can iterate over one of the collections, but not at the same time over the other. This FOREACH pseudo code will not work, because the second part of FOREACH I believe has to be a mutating operation on the graph, and you can't just use it to accumulate a string like I did here, or like the extract function does.
So as far as I can tell, you might kinda be stuck here. Hopefully someone will prove me wrong on this - I am not 100% sure.
Finally another way to go after this would be, if there was a path function that extracted node/relationship pairs, rather than just nodes() or relationships() individually as I used them above, then you could use that function to iterate over one collection, rather than shuffling two collections, as my code above attempts and fails to do. Sadly, I don't think there's any such path function, so that's just more reason why I think you might be up a creek.
Now, practically speaking, you could always execute this query in java or some other language, return the path, and then use the full power of whatever programming language you want to build up this string. But pure cypher? I'm doubtful.
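For example, once the client has the node names and relationship types of a path as two lists, the interleaving is a few lines. This is a minimal Python sketch. The function name path_to_text is hypothetical, and it assumes every relationship points forward along the path, which the undirected pattern in the query above does not guarantee.

```python
def path_to_text(node_names, rel_types):
    """Interleave node names and relationship types into 'a-r1->b-r2->c'."""
    if len(node_names) != len(rel_types) + 1:
        raise ValueError("a path has exactly one more node than relationships")
    parts = [node_names[0]]
    # Pair each relationship with the node it leads to.
    for rel_type, node_name in zip(rel_types, node_names[1:]):
        parts.append("-" + rel_type + "->" + node_name)
    return "".join(parts)


text = path_to_text(["a", "b", "c"], ["r1", "r2"])
print(text)
```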
Here is what I ended up doing. I hope somebody else finds it useful in the future.
MATCH p=(n)-[r*1..3]->(m)
WHERE n.id=1 AND m.id=4
WITH extract(rel in relationships(p) | STARTNODE(rel).name + '->' + type(rel)) as relationshipLabels, m.name as endnodename
WITH reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel+ '->') as relPath , endnodename
RETURN distinct relPath + endnodename

How to count tag-to-tag relationships without having it explode?

I'm using neo4j, storing a simple "content has-many tags" data structure.
I'd like to find out "what tags co-exist with what other tags the most?"
I've got around 500K content-to-tag relationships, so unfortunately, that works out to 0.5M^2 possible coexist relationships, and then you need to count how many times each type of relationship happens! Or do you? Am I doing this the long way?
It never seems to return, and my CPU is pegged out for quite some time now.
final ExecutionResult result = engine.execute(
"START metag=node(*)\n"
+ "MATCH metag<-[:HAS_TAG]-content-[:HAS_TAG]->othertag\n"
+ "WHERE metag.name>othertag.name\n"
+ "RETURN metag.name, othertag.name, count(content)\n"
+ "ORDER BY count(content) DESC");
for (Map<String, Object> row : result) {
System.out.println(row.get("metag.name") + "\t" + row.get("othertag.name") + "\t" + row.get("count(content)"));
}
You should try to decrease your bound points to make the traversal faster. I assume your graph will always have more tags than content so you should make the content your bound points. Something like
start
content = node:node_auto_index(' type:"CONTENT" ')
match
metatag<-[:HAS_TAG]-content-[:HAS_TAG]->othertag
where
metatag<>othertag
return
metatag.name, othertag.name, count(content)
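The idea of starting from content can be sketched in Python: visit each content item once and count the tag pairs it carries, so the work depends on the number of tags per item rather than on the square of the total number of relationships. The function name cooccurrence and the three content items are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(content_tags):
    """Count how often each unordered pair of tags appears on the same content."""
    counts = Counter()
    for tags in content_tags.values():
        # Sorting gives each pair one canonical order, so it is counted once.
        for tag_a, tag_b in combinations(sorted(set(tags)), 2):
            counts[(tag_a, tag_b)] += 1
    return counts


# Hypothetical data: content id -> tags on that content.
content_tags = {
    "c1": ["neo4j", "cypher", "graph"],
    "c2": ["neo4j", "cypher"],
    "c3": ["graph"],
}
counts = cooccurrence(content_tags)
for pair, count in counts.most_common():
    print(pair, count)
```

Ordering each pair plays the same role as the metag.name > othertag.name test in the question. With metatag <> othertag, every pair is returned twice, once in each order.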
