I want to compute the indegree and outdegree of each node and return a graph that connects the top 5 indegree nodes with the top 5 outdegree nodes. I have written the following query:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get this error:
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What changes do I need to make to the query?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get the incoming and outgoing degrees of a node.
Also, UNION can only combine query results that have identical column names. In this case we don't need UNION at all: we can use WITH to pipe results from one part of the query to the next, and COLLECT() the nodes we need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two label scans of :Port1 nodes and takes the cross product of the top 5 of each, so there are 25 variable-length path matches. That can be expensive, because each match generates all possible paths from a NodeIn to a NodeOut.
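To make the "25 matches" concrete, here is a minimal plain-Python sketch of what the query does before the final MATCH. It is only a model of the Cypher semantics, not Neo4j code. The edge list and the helper top_n are made up for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical directed edges (source, target); not taken from the question's data.
edges = [(s, t) for s in range(12) for t in range(12) if s != t and (s + t) % 4 == 0]

def top_n(counter, n=5):
    """Return the n keys with the highest counts (like ORDER BY ... DESC LIMIT n)."""
    return [node for node, _ in counter.most_common(n)]

# Degree counts, the role played by size((a)<--()) and size((a)-->()) in the query.
indegree = Counter(t for _, t in edges)
outdegree = Counter(s for s, _ in edges)

nodes_in = top_n(indegree)    # plays the role of NodesIn
nodes_out = top_n(outdegree)  # plays the role of NodesOut

# Two consecutive UNWINDs behave like a nested loop over both lists.
pairs = list(product(nodes_in, nodes_out))
```

Each element of pairs corresponds to one (NodeIn, NodeOut) row that the variable-length MATCH then has to expand, which is where the cost comes from.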
If you only want the shortest connection between each pair, you might try replacing your variable-length match with a shortestPath() call, which returns only the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure the direction is what you intend. You're matching the nodes with the highest indegree and looking for outgoing paths from them to the nodes with the highest outdegree. That looks like it might be backwards to me, but you know your requirements best.
MATCH (d:domain)
WITH COLLECT(d) AS domains
UNWIND domains AS d1
UNWIND domains AS d2
WITH d1,d2
WHERE id(d1) < id(d2) and d1.name='google'
MATCH (d1)-[r:domain_join]-(d2)
//where r.weight is max // I want something like this (I am stuck at this line)
return d1.name,d2.name,r.weight;
The output I am getting is
The output I want is the single row having the maximum weight
You should be able to do:
return d1.name,d2.name,r.weight
ORDER BY r.weight DESC LIMIT 1;
I suspect you may be able to simplify your query (especially if you have an index on the name property of your domain nodes). By the way, the usual convention is that labels start with an uppercase letter (e.g. Domain) and relationship types are written in all uppercase (e.g. DOMAIN_JOIN).
MATCH (google:domain {name: 'google'})
MATCH (google)-[r:domain_join]-(d2:domain)
RETURN d2.name, r.weight
ORDER BY r.weight DESC LIMIT 1
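As a rough illustration of why ORDER BY ... DESC LIMIT 1 gives the maximum-weight row, here is a small plain-Python model. The rows are invented stand-ins for (d1.name, d2.name, r.weight), not real output.

```python
# Hypothetical rows standing in for (d1.name, d2.name, r.weight); the values are made up.
rows = [("google", "yahoo", 3), ("google", "bing", 7), ("google", "duckduckgo", 5)]

# ORDER BY r.weight DESC LIMIT 1: sort descending by weight, keep the first row.
top_by_sort = sorted(rows, key=lambda row: row[2], reverse=True)[:1]

# Same result in a single pass when only one row is needed.
top_by_max = max(rows, key=lambda row: row[2])
```

If several relationships share the maximum weight, LIMIT 1 still returns just one of them.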
I want to write a Cypher query that does the following:
1. Given a start node, get all related nodes within 2 hops.
2. Sort the resulting nodes by number of hops (ascending) and limit them to a given number.
3. Get all relationships between the nodes in the result of 1.
I tried many queries and came up with the query below for steps 1 and 2:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10]
But when I try to get the relationships with the query below, it returns all relationships in the path:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10], relationships(path)
I think I have to match again on the result of the first match instead of getting the relationships from the path directly, but all of my attempts have failed.
How can I get all relationships between the queried nodes?
Any help is appreciated, thanks a lot.
[EDITED]
Something like this may work for you:
MATCH (start {eid:12018})-[rels:REAL_CALL*..2]-(end)
RETURN start, end, COLLECT(rels) AS rels_collection
ORDER BY
REDUCE(s = 2, rs in rels_collection | CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)
LIMIT 10;
The COLLECT aggregation function generates a collection (of relationship collections) for each distinct start/end pair. The LIMIT clause limits the returned results to the first 10 start/end pairs, based on the ORDER BY clause. The ORDER BY clause uses REDUCE to calculate the minimum length (number of relationships) among the paths to a given end node.
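The grouping and the REDUCE step can be sketched in plain Python as follows. This only models the logic of the query above. The paths, the helper min_hops and the limit of 3 are made up for illustration.

```python
from functools import reduce

# Hypothetical paths from one start node: (end node, list of relationship ids).
paths = [("b", [1]), ("c", [1, 2]), ("c", [3]), ("d", [4, 5]), ("e", [6])]

# COLLECT(rels) per end node: one list of relationship lists for each end node.
rels_by_end = {}
for end, rels in paths:
    rels_by_end.setdefault(end, []).append(rels)

def min_hops(rels_collection, max_hops=2):
    """Mirror of REDUCE(s = 2, rs IN rels_collection | CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)."""
    return reduce(lambda s, rs: len(rs) if len(rs) < s else s, rels_collection, max_hops)

# ORDER BY the minimum hop count, then LIMIT (3 here instead of 10 to keep the data small).
limit = 3
ranked = sorted(rels_by_end, key=lambda end: min_hops(rels_by_end[end]))[:limit]
```

Here "c" is reachable in one hop and in two hops, so it is ranked by its shortest path and sorts ahead of "d", which is only reachable in two.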
Suppose that I have the default Movies database and I want to find the total number of people who have participated in each movie, no matter their role (i.e. including the actors, the producers, the directors, etc.).
I have already done that using the query:
MATCH (m:Movie)<-[r]-(n:Person)
WITH m, COUNT(n) as count_people
RETURN m, count_people
ORDER BY count_people DESC
LIMIT 3
OK, I have included some extra options, but they don't really matter for my actual question. From the above query, I will get 3 movies.
Q. How can I extend the above query so that I get a graph including all the relationships of these 3 movies (i.e. DIRECTED, ACTED_IN, PRODUCED, etc.)?
I know that I can expand all the relationships of each movie through the buttons on each movie node, but I would like to know whether I can do so through Cypher.
Use an additional OPTIONAL MATCH:
MATCH (m:Movie)<--(n:Person)
WITH m,
COUNT(n) as count_people
ORDER BY count_people DESC
LIMIT 3
OPTIONAL MATCH p = (m)-[r]-(RN) WHERE type(r) IN ['DIRECTED', 'ACTED_IN', 'PRODUCED']
RETURN m,
collect(p) as graphPaths,
count_people
ORDER BY count_people DESC
I am searching for the longest path in my graph, and I want to count the number of distinct nodes on this longest path.
I want to use count(distinct()).
I tried two queries.
The first is:
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But when I try this query:
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero?
The node primero has a field called id.
You should bind at least one of those nodes, add a direction, and also consider a path-length limit; otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;
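For the counting part, here is a minimal plain-Python sketch of the same steps (pick the longest path, unwind its nodes, count the distinct ones). The paths are made up. On the Cypher side, changing the last line to return count(distinct n) should give the number rather than the nodes. The original result of 2 is likely because primero only ever binds to the end nodes of the longest path, and an undirected pattern matches that path once from each end.

```python
# Hypothetical paths, each a list of node ids in traversal order.
paths = [["a", "b"], ["a", "b", "c", "d"], ["b", "c", "d"]]

# ORDER BY length(p) DESC LIMIT 1; length(p) counts relationships, i.e. nodes - 1.
longest = max(paths, key=lambda p: len(p) - 1)

# UNWIND nodes(p) AS n, then count the distinct nodes.
distinct_count = len(set(longest))
```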
I want a query that starts with 2 given nodes, takes up to 5 related nodes for each of them (by relation R1), and then looks for the shortestPath between those 10 nodes (5 from the first and 5 from the second original node).
I can't manage to "break" my query into 2 parts that each calculate their 5 nodes, and then MATCH the path across both results.
My query so far is:
MATCH (n1:MyNode {node_id:12345})-[:R1]-(r1:RelatedNode)
WITH r1 LIMIT 5
MATCH (n2:MyNode {node_id:98765})-[:R1]-(r2:RelatedNode)
WITH r1,r2 LIMIT 5
MATCH p=shortestPath( (r1)-[*1..10]-(r2) )
RETURN p
The problem is that the second subquery is not really separated from the first: it still carries r1 along, which makes the LIMIT wrong.
I want to run the first part, then run the second part (with only r2), and only then after having r1 and r2 calculated separately, match the shortest path. Can that be done?
Thanks!
I'm not sure I understood your requirement correctly. I assume you want to find the shortest path between any of the first five neighbors of n1 and any of the first five neighbors of n2.
I guess you have to pass the limited result through as a collection and UNWIND it later on:
MATCH (n1:MyNode {node_id:12345})-[:R1]-(r1:RelatedNode)
WITH r1 LIMIT 5
WITH collect(r1) as startNodes
MATCH (n2:MyNode {node_id:98765})-[:R1]-(r2:RelatedNode)
WITH r2, startNodes LIMIT 5
WITH collect(r2) as endNodes, startNodes
UNWIND startNodes as s UNWIND endNodes as e
MATCH p=shortestPath( (s)-[*1..10]-(e) )
RETURN p, length(p) ORDER BY length(p) ASC LIMIT 1
Be aware that the two UNWINDs basically create a cross product, so you're calculating 5*5 = 25 shortest paths. We then sort them by length and pick the first one.
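The difference between the two queries can be modelled in plain Python. This is only a sketch of the row semantics, not Neo4j code. The neighbor lists are invented, and the row order in the first case is an assumption, since Cypher does not guarantee it without ORDER BY.

```python
from itertools import islice, product

# Hypothetical neighbor lists for n1 and n2 (more than 5 each).
r1_candidates = [f"a{i}" for i in range(8)]
r2_candidates = [f"b{i}" for i in range(8)]

# Question's query: r1 is still in scope during the second MATCH, so
# "WITH r1, r2 LIMIT 5" keeps only 5 (r1, r2) rows in total.
rows_carried = list(islice(product(r1_candidates[:5], r2_candidates), 5))

# Answer's query: limit and collect each side separately, then UNWIND both lists.
start_nodes = r1_candidates[:5]
end_nodes = r2_candidates[:5]
rows_collected = list(product(start_nodes, end_nodes))
```

In this model the first approach ends up with 5 rows that all share the same r1, while collecting each side first yields all 25 start/end combinations.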