I'm having difficulty getting all nodes in a specific time range. I have two types of nodes attached to the timetree: Tweet nodes and News nodes.
I want all the Tweet nodes. I'm using this query (I stopped it after 10+ minutes):
CALL ga.timetree.events.range({start: 148029120000, end: 1480896000000, relationshipType: "LAST_UPDATE", resolution: 'DAY'})
YIELD node
MATCH (a:TwitterUser)-[:POSTS]->(:Tweet)-[r:RETWEETS]->(:Tweet)<-[:POSTS]-(m:TwitterUser)
RETURN id(a), id(m), count(r) AS NumRetweets
ORDER BY NumRetweets DESC
But this takes far longer than the simple query (8 seconds):
MATCH (a:TwitterUser)-[:POSTS]->(:Tweet)-[r:RETWEETS]->(:Tweet)<-[:POSTS]-(m:TwitterUser)
RETURN id(a), id(m), count(r) AS NumRetweets
ORDER BY NumRetweets DESC
Actually, with my data, the two queries should return the same nodes, so I don't understand the big time difference.
The problem with your first query is that you're not doing anything with the results of the timetree query. The MATCH that follows is not connected to the yielded node, so the full pattern match is repeated for every row the procedure returns. That wastes cycles and multiplies the result rows with data that's never used.
You need to take the :Tweet nodes returned from your timetree query and use them in the next part of your query.
CALL ga.timetree.events.range({start: 148029120000, end: 1480896000000, relationshipType: "LAST_UPDATE", resolution: 'DAY'})
YIELD node
WITH node as tweet
WHERE tweet:Tweet
MATCH (a:TwitterUser)-[:POSTS]->(:Tweet)-[r:RETWEETS]->(tweet)<-[:POSTS]-(m:TwitterUser)
RETURN id(a), id(m), count(r) AS NumRetweets
ORDER BY NumRetweets DESC
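To make the cost of the unconnected MATCH concrete, here is a minimal Python sketch of Cypher's row-by-row behaviour. The data and the names `event_nodes` and `retweet_matches` are hypothetical, and this only models the semantics; it is not how Neo4j executes the query internally.

```python
# Hypothetical stand-ins: nodes yielded by the timetree call, and the
# (poster, retweeted_tweet, original_poster) triples matched by the pattern.
event_nodes = ["t1", "t2", "t3"]
retweet_matches = [
    ("alice", "t1", "bob"),
    ("carol", "t1", "bob"),
    ("dave", "t9", "erin"),
]

# Original query: the MATCH ignores `node`, so every match is repeated
# once per yielded row (a cross product).
unconnected = [(node, m) for node in event_nodes for m in retweet_matches]

# Corrected query: the retweeted tweet must be the yielded node itself.
connected = [m for node in event_nodes for m in retweet_matches if m[1] == node]

print(len(unconnected))  # 3 event nodes x 3 matches
print(len(connected))    # only matches whose tweet is in the time range
```

Under the same model, the unconnected version would also inflate count(r), since each relationship is counted once per yielded node.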
Related
I am writing an api to return neo4j data. For my case I get all nodes matching.
The API takes in userId, limit and offset and returns a list of data matching that condition.
I found one solution, "Cypher to return total node count as well as a limited set", but it is pretty old. I'm not sure if this is still the best way to do it.
Performance is the same as firing 2 separate queries; at least then one of them would be cached by neo4j after a couple of runs.
Match(u:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p) return distinct(p) skip 0 limit 10
Match(u:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p) return count(distinct(p))
I want the result to be something like
{
items: [ {}, {}], # query 1
total: 100, # query 2
limit: 10, # can get from input
skip: 0 # can get from input
}
This will depend a bit on how much information you need from the nodes for which you want the count, and whether you need to get distinct results or not.
If distinct results are not needed, and you don't need to do any additional filtering on the relationship or node at the other end (no filtering of the label or properties of the node), then you can use the size() of the pattern which will use the degree information of the relationships present on the node, which is more efficient as you never have to actually expand out the relationships:
MATCH (u:WorkstationUser {id: "alw:44807"})
WITH u, size((u)-[:HAS_ACCESS_TO]->()) as total
MATCH (u)-[:HAS_ACCESS_TO]->(p)
RETURN p, total
SKIP 0 LIMIT 10
However, if distinct results are needed, or you need to filter the node by label or properties, then you will have to expand all the results to get the total. If there aren't too many results (that is, not millions or billions), then you can collect the distinct nodes, get the size of the collection, then UNWIND the results and page:
MATCH (:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p)
WITH collect(DISTINCT p) as pList
WITH pList, size(pList) as total
UNWIND pList as p
RETURN p, total
SKIP 0 LIMIT 10
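On the API side, the rows from either query can be folded into the response shape from the question. The following is a minimal Python sketch; `build_page` and the sample rows are hypothetical, and it assumes each row has already been fetched as a `(p, total)` pair by whatever driver you use.

```python
from typing import Any, Dict, List, Tuple


def build_page(rows: List[Tuple[Any, int]], skip: int, limit: int) -> Dict[str, Any]:
    """Fold (p, total) rows into {items, total, limit, skip}."""
    items = [p for p, _ in rows]
    # Every row carries the same total. If the page is empty (for example,
    # skip is past the end), no row exists to read it from, so 0 is used
    # here as a placeholder assumption.
    total = rows[0][1] if rows else 0
    return {"items": items, "total": total, "limit": limit, "skip": skip}


# Hypothetical rows as they might come back from the paged query.
rows = [({"id": "p1"}, 100), ({"id": "p2"}, 100)]
page = build_page(rows, skip=0, limit=10)
print(page)
```

Note the empty-page case: because the total travels on each row, a page beyond the last result returns no total at all, which may be a reason to keep a separate count query.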
I have a graph in Neo4j (first time using it) of about 10 different nodes that are connected in various ways. Not all nodes are connected to each other, as some have up to 6 or 7 neighbors, while some have only 1. What query would I write/use to check if a path exists from NodeA to NodeB? It doesn't have to be the shortest path, just if a path exists.
Along with this, is there a way to count who has the most or least neighbors? Thanks everyone for help in advance.
Return Foo nodes a and b if there is at least one path between them. (This variable-length path query with unbounded length could take a very long time or run out of memory if there are a lot of paths or very long paths).
MATCH (a:Foo {id: 'a'}), (b:Foo {id: 'b'})
WHERE (a)-[*]-(b)
RETURN a, b;
Return all paths between a and b. (This query could require even more time and memory than the previous query, since it will attempt to return all matching paths).
MATCH path=(a:Foo {id: 'a'})-[*]-(b:Foo {id: 'b'})
RETURN path;
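The difference between the two queries is the difference between asking whether any path exists and enumerating every path. As a rough illustration, a breadth-first search can answer the existence question by visiting each node at most once. This is a minimal Python sketch over a hypothetical in-memory edge list, not a description of how Neo4j evaluates the pattern.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Index each edge in both directions, like an undirected -[*]- pattern."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def path_exists(adjacency: Dict[Hashable, Set[Hashable]], start: Hashable, goal: Hashable) -> bool:
    """Return True as soon as goal is reached; never enumerates all paths."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour == goal:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical graph: a-b-c form one component, x-y another.
graph = build_undirected([("a", "b"), ("b", "c"), ("x", "y")])
print(path_exists(graph, "a", "c"))
print(path_exists(graph, "a", "x"))
```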
Return the 10 nodes with the most neighbors, in descending order:
MATCH (n)--()
WITH n, COUNT(*) AS c
RETURN n
ORDER BY c DESC
LIMIT 10;
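The same counting idea covers the "least neighbors" half of the question: in the Cypher above, changing DESC to ASC returns the least-connected nodes instead. Note that nodes with no relationships at all never match (n)--(), so they will not appear in either ranking. The Python sketch below shows the counting on a hypothetical edge list.

```python
from collections import Counter

# Hypothetical undirected relationships between nodes.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]

# Each relationship contributes one to the count of both endpoints,
# which is what COUNT(*) over MATCH (n)--() computes per node.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

most = max(degree, key=degree.get)
least = min(degree, key=degree.get)
print(most, degree[most])
print(least, degree[least])
```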
I want to write a Cypher query that does the following tasks:
1. Given a start node, get all related nodes within 2 hops.
2. Sort the queried nodes by hops ascending, and limit them to a given number.
3. Get all relationships between the results of 1.
I tried tons of queries, and I came up with the query below for steps 1 and 2:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10]
But when I try to get the relationships in the queried paths with the query below, it returns all relationships in the path:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10], relationships(path)
I think I have to match again against the result of the first match instead of getting the relationships from the path directly, but all of my attempts have failed.
How can I get all relationships between the queried nodes?
Any help is appreciated, thanks a lot.
[EDITED]
Something like this may work for you:
MATCH (start {eid:12018})-[rels:REAL_CALL*..2]-(end)
RETURN start, end, COLLECT(rels) AS rels_collection
ORDER BY
REDUCE(s = 2, rs in rels_collection | CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)
LIMIT 10;
The COLLECT aggregation function will generate a collection (of relationship collections) for each distinct start/end pair. The LIMIT clause limits the returned results to the first 10 start/end pairs, based on the ORDER BY clause. The ORDER BY clause uses REDUCE to calculate the length of the shortest path to a given end node.
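For readers less familiar with REDUCE, the ordering expression is a fold that keeps the smallest path length seen so far, starting from the maximum of 2. Here is a minimal Python sketch of the same computation; `shortest_path_length` and the sample data are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import List, Sequence


def shortest_path_length(rels_collection: Sequence[List[str]], cap: int = 2) -> int:
    """Mirror of REDUCE(s = 2, rs IN rels_collection |
    CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)."""
    return reduce(lambda s, rs: len(rs) if len(rs) < s else s, rels_collection, cap)


# Hypothetical collected relationship lists, keyed by end node.
paths_by_end = {
    "far": [["r1", "r2"]],             # reachable only via a 2-hop path
    "near": [["r3"], ["r4", "r5"]],    # reachable via a 1-hop and a 2-hop path
}

# Equivalent of ORDER BY <reduce expression>: closest end nodes first.
ordered = sorted(paths_by_end, key=lambda end: shortest_path_length(paths_by_end[end]))
print(ordered)
```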
I want to compute Indegree and Outdegree and return a graph that connects the top 5 Indegree nodes with the top 5 Outdegree nodes. I have written the following code:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get this error:
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What are the changes that I need to do to the code?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get incoming and outgoing degrees for relationships.
Also, UNION can only be used to combine query results with identical columns. In this case, we won't even need UNION, we can use WITH to pipe results from one part of a query to another, and COLLECT() the nodes you need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two label scans of :Port1 nodes and does a cross product of the top 5 of each, so there are 25 variable-length path matches. That can be expensive, as it generates all possible paths from each NodeIn to each NodeOut.
If you only want the shortest connection between each pair, then you might try replacing your variable-length match with a shortestPath() call, which only returns the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure your desired direction is correct. You're matching the nodes with the highest in-degree and following outgoing paths from them to the nodes with the highest out-degree. That seems like it might be backwards to me, but you know your requirements best.
I'm trying to model a large knowledge graph (using v3.1.1).
My current graph contains only two types of nodes (Topic, Properties) and a single type of relationship (HAS_PROPERTIES).
The count of nodes is about 85M (47M :Topic, the rest are :Properties).
I'm trying to get the most connected :Topic node. I'm using the following query:
MATCH (n:Topic)-[r]-()
RETURN n, count(DISTINCT r) AS num
ORDER BY num
This query, like almost any query I try (without filtering the results) that uses count(relationships) and orders by it, is always extremely slow: these queries run for more than 10 minutes without returning a response.
Am I missing indexes, or is there a better syntax?
Is there any chance I can execute this query in a reasonable time?
Use this:
MATCH (n:Topic)
RETURN n, size( (n)--() ) AS num
ORDER BY num DESC
LIMIT 100
This reads the degree directly from each node, without expanding its relationships.