I'm trying to replicate the behaviour of the following SQL query in Neo4j:
DELETE FROM history
WHERE history.name = $modelName AND id NOT IN (
SELECT history.id
FROM history
JOIN model ON model.id = history.model_id
ORDER BY created DESC
LIMIT 10
)
I tried a lot of different queries, but basically I'm always struggling to incorporate finding the TOP-k elements. That's the closest I got to a solution.
MATCH (h:HISTORY)-[:HISTORY]-(m:MODEL)
WHERE h.name = $modelName
WITH h
MATCH (t:HISTORY)-[:HISTORY]-(m:MODEL)
WITH t ORDER BY t.created DESC LIMIT 10
WHERE NOT h IN t
DELETE h
With that query I get the error expected List<T> but was Node for the line WITH t ORDER BY t.created DESC LIMIT 10.
I tried changing it to COLLECT(t) AS t, but then the error is expected Any, Map, Node or Relationship but was List<Node>.
So I'm pretty much stuck. Any idea how to write this query in Cypher?
Following that approach, you should reverse the order, matching to your top-k nodes, collecting them, and performing the match where the nodes matched aren't in the collection.
MATCH (t:HISTORY)-[:HISTORY]-(:MODEL)
WITH t ORDER BY t.created DESC LIMIT 10
WITH collect(t) as saved
MATCH (h:HISTORY)-[:HISTORY]-(:MODEL)
WHERE h.name = $modelName
AND NOT h in saved
DETACH DELETE h
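The order of operations above (pick the top-k, collect them, then delete whatever matches the filter but is not in the collection) can be sketched in plain Python. This is only an in-memory model of the logic, not Neo4j code; the History class, its fields, and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class History:
    # Hypothetical stand-in for a :HISTORY node.
    id: int
    name: str
    created: int  # assumed sortable timestamp


def rows_to_delete(history, model_name, keep=10):
    # Step 1: the `keep` most recent rows overall
    # (ORDER BY created DESC LIMIT keep, then collect).
    saved = set(sorted(history, key=lambda h: h.created, reverse=True)[:keep])
    # Step 2: rows for this model that are NOT in the saved collection.
    return [h for h in history if h.name == model_name and h not in saved]


# 15 rows, odd ids named "a", even ids named "b"; created equals id.
rows = [History(i, "a" if i % 2 else "b", created=i) for i in range(1, 16)]
doomed = rows_to_delete(rows, "a", keep=10)
print([h.id for h in doomed])  # → [1, 3, 5]
```

Note that, as in the SQL and Cypher versions, the top 10 is taken over all history rows, not per model name.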
I want to write a Cypher query that does the following:
1. Given a start node, get all related nodes within 2 hops.
2. Sort the matched nodes by hop count ascending, and limit them to a given number.
3. Get all relationships between the nodes in the result of step 1.
I tried many queries, and came up with the query below for steps 1 and 2:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10]
But when I try to get the relationships from the matched paths with the query below, it returns all relationships in the paths:
MATCH path=((start {eid:12018})-[r:REAL_CALL*1..2]-(end))
WITH start, end, path
ORDER BY length(path) ASC
RETURN start, collect(distinct end)[..10], relationships(path)
I think I have to match again against the result of the first match instead of getting the relationships from the path directly, but all of my attempts have failed.
How can I get all relationships between the matched nodes?
Any help is appreciated, thanks a lot.
[EDITED]
Something like this may work for you:
MATCH (start {eid:12018})-[rels:REAL_CALL*..2]-(end)
RETURN start, end, COLLECT(rels) AS rels_collection
ORDER BY
REDUCE(s = 2, rs in rels_collection | CASE WHEN SIZE(rs) < s THEN SIZE(rs) ELSE s END)
LIMIT 10;
The COLLECT aggregation function generates a collection (of relationship collections) for each distinct start/end pair. The LIMIT clause limits the returned results to the first 10 start/end pairs, based on the ORDER BY clause. The ORDER BY clause uses REDUCE to calculate the length of the shortest path to a given end node.
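As a rough illustration of what the REDUCE expression computes, here is the same fold written in Python. It is only a sketch of the logic: the function name, the sample data, and the relationship labels r1..r3 are made up for the example.

```python
from functools import reduce


def shortest_length(rels_collection, cap=2):
    # Same fold as REDUCE(s = 2, rs IN rels_collection | CASE ... END):
    # start at the maximum hop count and keep the smaller value each step.
    return reduce(lambda s, rs: len(rs) if len(rs) < s else s, rels_collection, cap)


# Hypothetical result of COLLECT(rels) per end node: one inner list per path.
paths_by_end = {
    "C": [["r2", "r3"]],           # reachable only in 2 hops
    "B": [["r1"], ["r2", "r3"]],   # reachable in 1 hop and in 2 hops
}

# ORDER BY the shortest path length, ascending.
ordered = sorted(paths_by_end, key=lambda end: shortest_length(paths_by_end[end]))
print(ordered)  # → ['B', 'C']
```

The starting value of 2 works because the pattern allows at most 2 hops, so no path can be longer than that.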
I want to compute Indegree and Outdegree and return a graph that has connections between the top 5 Indegree nodes and the top 5 Outdegree nodes. I have written the following query:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get an error as
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What changes do I need to make to the query?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get incoming and outgoing degrees for relationships.
Also, UNION can only be used to combine query results with identical columns. In this case, we won't even need UNION: we can use WITH to pipe results from one part of a query to another, and COLLECT() the nodes you need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two NodeLabelScans of :Port1 nodes, and does a cross product of the top 5 of each, so there are 25 variable-length path matches. This can be expensive, as it generates all possible paths from each NodeIn to each NodeOut.
If you only want the shortest connection between each, then you might try replacing your variable-length match with a shortestPath() call, which only returns the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure your desired direction is correct. You're matching nodes with the highest in-degree and getting an outgoing path to nodes with the highest out-degree. That seems like it might be backwards to me, but you know your requirements best.
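To make the "top 5 of each, then cross product" step concrete, here is a small Python sketch of the same pipeline on an in-memory edge list. The function name and the sample graph are hypothetical, and unlike the Cypher query it only ranks nodes that have at least one edge in the relevant direction.

```python
from collections import Counter
from itertools import product


def top_pairs(edges, n=5):
    # edges are (source, target) tuples, i.e. (source)-[:LinkTo]->(target).
    indegree = Counter(dst for _, dst in edges)
    outdegree = Counter(src for src, _ in edges)
    nodes_in = [node for node, _ in indegree.most_common(n)]
    nodes_out = [node for node, _ in outdegree.most_common(n)]
    # The two UNWINDs: every NodeIn paired with every NodeOut.
    return list(product(nodes_in, nodes_out))


# Hypothetical graph: node i links to every node j with i < j.
edges = [(i, j) for i in range(6) for j in range(6) if i < j]
pairs = top_pairs(edges)
print(len(pairs))  # → 25
```

Each of those 25 pairs then triggers its own path search, which is where the cost comes from.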
I have a collection of :Product nodes and I want to return the latest 100. Consider a global query like:
MATCH (p:Product) RETURN p LIMIT 100
From what I can see, it returns the oldest nodes first. Is there a way of getting the newest on top?
ORDER BY won't be an option, as the number of products can be in the millions.
UPDATE
I ended up creating a dense node (:ProductIndex). Each time I create a product, I add it to the index with (:Product)-[:INDEXED]->(:ProductIndex). With dense nodes, the relationship chain will be ordered latest first, so the query below returns the newest records on top:
MATCH (p:Product)-[:INDEXED]->(:ProductIndex)
RETURN p
LIMIT 1000
I can always keep the index at a fixed size, as I don't need to preserve the full history.
Is the data model such that the products are connected in a list, e.g. (:Product)-[:PREVIOUS]->(:Product)?
Can you keep track of the most recent node? Either with a time stamp that you can easily locate or another node connected to your most recent product node.
If so, you could always query out the most recent ones with a query similar to the following.
match (max:Date {name: 'Last Product Date'})-->(latest:Product)
with latest
match p=(latest)-[:PREVIOUS*..100]->(:Product)
return nodes(p)
order by length(p) desc
limit 1
Or something like this, where you select the latest product by matching its date against the stored maximum:
match (max:Date {name: 'Product Date'})
with max
match p=(latest:Product)-[:PREVIOUS*..100]->(:Product)
where latest.date = max.date
return nodes(p)
order by length(p) desc
limit 1
Another approach, still using a list, could be to keep an indexed create date property on each product. When looking for the most recent product, pick a control date that doesn't go back to the beginning of time, so you have a smaller pool of nodes (i.e. not millions). Then use a max function on that smaller pool to find the most recent node, and follow the list back from it by however many you want.
match (latest:Product)
where latest.date > {control_date}
with max(latest.date) as latest_date
match p=(product:Product)-[:PREVIOUS*..100]->(:Product)
where product.date = latest_date
return nodes(p)
order by length(p) desc
limit 1
Deleting a node in a linked list is pretty simple. If you need to perform this search a lot and you don't want to order the products, I think keeping the products in a list is a pretty good graph application. Here is an example of a delete that maintains the list.
match (previous:Product)<-[:PREVIOUS]-(product_to_delete:Product)<-[:PREVIOUS]-(next:Product)
where product_to_delete.name = 'name of node to delete'
create (previous)<-[:PREVIOUS]-(next)
detach delete product_to_delete
return previous, next
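The relinking done by that delete can be modelled in a few lines of Python. This is a sketch only: the chain is represented as a dict mapping each product to the older product its PREVIOUS relationship points at, and the function name and product names are made up.

```python
def delete_interior(previous_of, target):
    """Remove `target` from the chain and link its newer neighbour to its older one."""
    newer = next((n for n, p in previous_of.items() if p == target), None)
    older = previous_of.get(target)
    if newer is None or older is None:
        # Mirrors the Cypher MATCH producing no row: the pattern needs both
        # neighbours, so the head or tail of the list is left untouched.
        return False
    previous_of[newer] = older   # create (previous)<-[:PREVIOUS]-(next)
    del previous_of[target]      # detach delete product_to_delete
    return True


# p4 -> p3 -> p2 -> p1, newest first.
chain = {"p4": "p3", "p3": "p2", "p2": "p1"}
removed_middle = delete_interior(chain, "p3")
removed_head = delete_interior(chain, "p4")
print(chain)  # → {'p4': 'p2', 'p2': 'p1'}
```

As the second call shows, a query of this shape does not delete the first or last product in the list; those cases would need their own handling.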
I am a total beginner with Neo4j and need help. Is there a query for getting the first few nodes with highest degree?
I have nodes called P and nodes called A. There are only links between P and A nodes. I want to have the first 10 nodes P which have the most links to nodes A.
My idea was the following query, but it took so much time!
MATCH (P1:P)-[r]->(A1:A)
RETURN P1.name AS P_name, COUNT(A1) AS A_no
ORDER BY A_no DESC
LIMIT 10
Is there something wrong with my query?
Best,
Mowi
How many nodes do you have in your db?
I'd probably not use cypher for that, the Java API actually has a node.getDegree() method which is much much faster.
Your query could be sped up a bit by using:
MATCH (P1:P)-->()
RETURN id(P1),count(*) as degree
ORDER BY degree DESC LIMIT 10
You could also try:
MATCH (P1:P)
RETURN id(P1),size((P1)-->()) as degree
ORDER BY degree DESC LIMIT 10
To limit the nodes considered before counting:
MATCH (P1:P)
WHERE P1.foo = "bar"
WITH P1 limit 10000
MATCH (P1)-->()
RETURN id(P1),count(*) as degree
ORDER BY degree DESC LIMIT 10
I'm developing a kind of reddit service to learn Neo4j.
Everything works fine, I just want to get some feedback on the Cypher query to get the most recent news stories, the author and number of comments, likes and dislikes.
I'm using Neo4j 2.0.
MATCH comments = (n:news)-[:COMMENT]-(o)
MATCH likes = (n:news)-[:LIKES]-(p)
MATCH dislikes = (n:news)-[:DISLIKES]-(q)
MATCH (n:news)-[:POSTED_BY]-(r)
WITH n, r, count(comments) AS num_comments, count(likes) AS num_likes, count(dislikes) AS num_dislikes
ORDER BY n.post_date
LIMIT 20
RETURN *
o, p, q, r are all nodes with the label user. Should the label be added to the query to speed it up?
Is there anything else you see that I could optimize?
I think you're going to want to get rid of the multiple MATCH clauses. Cypher filters on each one in turn, each narrowing the results of the one before, rather than gathering all the information at once.
I would also avoid binding paths such as comments, and instead count the nodes you are matching. When you write MATCH xyz = (a)-[:COMMENT]-(b), xyz is a path, which contains the source node, the relationship, and the destination node.
MATCH (news:news)-[:COMMENT]-(comment),
      (news)-[:LIKES]-(like),
      (news)-[:DISLIKES]-(dislike),
      (news)-[:POSTED_BY]-(posted_by)
WHERE news.post_date > 0
WITH news, posted_by, count(comment) AS num_comments, count(like) AS num_likes, count(dislike) AS num_dislikes
ORDER BY news.post_date
LIMIT 20
RETURN *
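One thing to watch when several patterns feed a single aggregation: every combination of comment, like, and dislike for a story becomes its own row, so a plain count() over those rows counts combinations rather than individual items. The Python sketch below models that row multiplication for one story; the item names are invented for the example.

```python
from itertools import product

# Hypothetical neighbours of a single news node.
comments = ["c1", "c2"]
likes = ["l1", "l2", "l3"]
dislikes = ["d1"]

# Matching all three patterns together yields one row per combination.
rows = list(product(comments, likes, dislikes))

# count(comment): counts rows, not distinct comments.
num_comments_naive = len([c for c, _, _ in rows])
# count(DISTINCT comment): counts each comment once.
num_comments_distinct = len({c for c, _, _ in rows})

print(num_comments_naive, num_comments_distinct)  # → 6 2
```

In Cypher, count(DISTINCT comment) avoids the inflated number, and counting each relationship type separately avoids building the combined rows in the first place.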
I would do something like this.
MATCH (n:news)-[:POSTED_BY]->(r)
WHERE n.post_date > {recent_start_time}
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
To speed it up and avoid having Neo4j search over all your posts, I would probably index the post_date field (assuming it doesn't contain time information), and then send this query for today, yesterday, and so on until you have your 20 posts.
MATCH (n:news {post_date: {day}})-[:POSTED_BY]->(r)
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
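The day-by-day approach needs a small driver loop on the client side. Here is a minimal Python sketch of that loop, with the database call replaced by a dict lookup. The function name, the posts_by_day structure, and the max_days cut-off are all assumptions added for the example; in practice each iteration would run the per-day query with that day as the parameter.

```python
from datetime import date, timedelta


def latest_posts(posts_by_day, today, wanted=20, max_days=30):
    # posts_by_day stands in for running the per-day query against the database.
    collected = []
    day = today
    for _ in range(max_days):  # assumed bound so a quiet site cannot loop forever
        collected.extend(posts_by_day.get(day, []))
        if len(collected) >= wanted:
            break
        day -= timedelta(days=1)
    return collected[:wanted]


posts_by_day = {
    date(2024, 1, 3): ["a", "b"],
    date(2024, 1, 1): ["c", "d"],
}
result = latest_posts(posts_by_day, today=date(2024, 1, 3), wanted=3)
print(result)  # → ['a', 'b', 'c']
```

Because every post in a given day shares the same post_date value, the order within a single day is whatever the database returns.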