Neo4j: How to limit subqueries

I just imported the English Wikipedia into Neo4j and am playing around. I started by looking up the pages that link to the page "Berlin":
MATCH p=(p1:Page {title:"Berlin"})<-[*1..1]-(otherPage)
WITH nodes(p) as neighbors
LIMIT 500
RETURN DISTINCT neighbors
That works quite well. What I would like to achieve next is to show the 2nd degree of relationships. In order to be able to display them correctly, I would like to limit the number of first degree relationship nodes to 20 and then query the next level of relationship.
How does one achieve that?

I don't know the Wikipedia model, but I assume there are many different relationship types and that this is why you used -[*1..1]-. I think that is equivalent to -[]- or even --, so I doubt it has any serious impact.
You can collect the first-level matches and limit them to 20 using a WITH followed by a LIMIT. You can then perform a second MATCH using those (at most 20) other pages as the starting point.
MATCH (p1:Page {title:"Berlin"})<-[*1..1]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[*1..1]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH otherPage, secondDegree
LIMIT 500
RETURN otherPage, COLLECT(secondDegree)
There are many ways to return the data; this one just returns each first-degree match together with an array of its second-degree matches.
If the only type of relationship is :Link and you want to keep the start node then you can change the query to this:
MATCH (p1:Page {title:"Berlin"})<-[:Link]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[:Link]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH p1, otherPage, secondDegree
LIMIT 500
RETURN p1, otherPage, COLLECT(secondDegree)
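The limit-then-expand idea can also be sketched outside Cypher. Below is a minimal Python model, assuming a tiny hypothetical in-memory graph (the `incoming` dictionary and the `limited_two_hop` helper are illustrative names, not part of Neo4j). It caps the first-degree pages before expanding to the second degree, which is roughly what the WITH ... LIMIT step does in the queries above.

```python
from itertools import islice

# Hypothetical toy data: page title -> titles of pages that link to it.
incoming = {
    "Berlin": ["Germany", "Europe", "Spree"],
    "Germany": ["Europe", "Berlin", "Bavaria"],
    "Europe": ["Earth"],
    "Spree": ["Berlin", "Havel"],
}

def limited_two_hop(start, first_limit, second_limit):
    """Cap first-degree neighbours, then expand each one a further hop."""
    # In spirit: MATCH ... WITH p1, otherPage LIMIT first_limit
    first_degree = list(islice(incoming.get(start, []), first_limit))
    # Second MATCH: expand only from the capped set, skipping the start node.
    pairs = (
        (other, second)
        for other in first_degree
        for second in incoming.get(other, [])
        if second != start
    )
    # In spirit: WITH otherPage, secondDegree LIMIT second_limit
    result = {}
    for other, second in islice(pairs, second_limit):
        result.setdefault(other, []).append(second)
    return result

result = limited_two_hop("Berlin", first_limit=2, second_limit=500)
print(result)
```

As with the second MATCH in Cypher, a first-degree page that has no second-degree match simply does not appear in the result.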

Related

I want to rank the nodes by degree - why is this Neo4j Cypher request so slow?

I want to first get all the nodes of a certain type connected to a context and then simply rank them by their degree, counting only the :TO relationships to other nodes that belong to the same context. I tried several approaches, including the ones below, but they are too slow (tens of seconds). Is there any way to make it faster?
MATCH (ctx:Context{uid:'60156a60-d3e1-11ea-9477-f71401ca7fdb'})<-[:AT]-(c1:Concept)
WITH c1 MATCH (c1)-[r:TO]-(c2:Concept)
WHERE r.context = '60156a60-d3e1-11ea-9477-f71401ca7fdb'
RETURN c2, count(r) as degree ORDER BY degree DESC LIMIT 10;
MATCH (ctx:Context{uid:'60156a60-d3e1-11ea-9477-f71401ca7fdb'})<-[:AT]-(c1:Concept)-[:TO]-(c2:Concept)
RETURN c1, count(c2) as degree
ORDER BY degree DESC LIMIT 10;
One way to examine degree is to use the size function. Have you tried something like this?
size((c1)-[:TO]-(:Concept))
In my graph size() appears to be more efficient, but that might also be due to how I rearranged the Cypher.
Example (in my graph): this statement takes 81 db hits:
PROFILE MATCH (g:Gene {name:'ACE2'})-[r:EXPRESSED_IN]-(a)
return count(r)
And this one takes 4 db hits:
PROFILE MATCH (g:Gene {name:'ACE2'})
return size((g)-[:EXPRESSED_IN]-())
I'm not sure this next suggestion is faster or more efficient, but if you always calculate degree on a single relationship type or a subset of relationships, you might look into storing the degree values to see whether that is an option.
I do this on my entire graph right after a bulk load:
CALL apoc.periodic.iterate(
"MATCH (n) return n",
"set n.degree = size((n)--())",
{batchSize:50000, batchMode: "BATCH", parallel:true});
I do this for a different reason, though: I want to see the degree value in the Neo4j Browser, for example. Note: I rebuild my graphs daily from the ground up, and they are then static until the next rebuild.
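To make the "store the degree once, then rank" idea concrete, here is a minimal Python sketch, assuming a hypothetical toy edge list (`edges`, `compute_degrees`, and `top_by_degree` are made-up names for illustration). It counts relationships per node regardless of direction, optionally for one relationship type only, and then picks the highest-degree nodes.

```python
from collections import Counter
import heapq

# Hypothetical toy edge list: (source, relationship type, target).
edges = [
    ("a", "TO", "b"),
    ("a", "TO", "c"),
    ("b", "TO", "c"),
    ("c", "AT", "ctx"),
    ("d", "TO", "c"),
]

def compute_degrees(edge_list, rel_type=None):
    """Count relationships per node, ignoring direction (like (n)--())."""
    degree = Counter()
    for source, kind, target in edge_list:
        if rel_type is None or kind == rel_type:
            degree[source] += 1
            degree[target] += 1
    return degree

def top_by_degree(degree, k):
    """Return the k highest-degree nodes, like ORDER BY degree DESC LIMIT k."""
    return heapq.nlargest(k, degree.items(), key=lambda item: item[1])

to_degree = compute_degrees(edges, rel_type="TO")
top_two = top_by_degree(to_degree, 2)
print(top_two)
```

Precomputing like this only makes sense when the graph is static between rebuilds, as in the setup described above; otherwise the stored values go stale.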

Neo4j Variable Depth not working

I am trying to build a Cypher query for two scenarios:
Tests having depth more than 2
Specific test having depth more than 2
As you can see in the image, tests 1, 2, and 3 are related through a depth of more than 2. The Cypher I ran was:
MATCH p=()<-[r:TEST_FOR*..10]-() RETURN p LIMIT 50
When I change my Cypher to the queries below, I get no records, results, or nodes.
1) MATCH p=()<-[r:TEST_FOR*2..5]-() RETURN p LIMIT 50
2) MATCH p=()<-[r:TEST_FOR*2]-() RETURN p LIMIT 50
3) MATCH p=(d:Disease)<-[r:TEST_FOR*]-(t:Tests) WHERE t.testname = 'Alkaline Phosphatase (ALP)' RETURN p
4) MATCH p=()<-[r:TEST_FOR*..10]-(t:Tests {testname:'Alkaline Phosphatase (ALP)'}) RETURN p LIMIT 50
When I run queries 3 and 4 above, I get the same results, i.e., 1 test with 5 diseases, but the result does not extend out further for that specific test.
But as you can see in the image, the test is connected to two other tests! My structure is as follows:
Tests (testname)
Disease (diseasename, did)
Linknode (parentdiseaseid, testname)
I used the query below to create the TEST_FOR relationship:
match(d:Disease), (l:Linknode) where d.did = l.parentdiseaseid
with d, l.testname as name
match(t:Tests {testname:name}) create (d)<-[:TEST_FOR]-(t);
Direction is the problem here. Each of your 4 queries above uses a directed variable-length matching pattern, meaning that every relationship traversed must use the direction indicated (incoming). However, that won't work once you hit :Tests nodes, since they only have outgoing relationships to :Disease nodes.
The easy fix is to omit the direction, so the matching pattern will traverse :TEST_FOR relationships regardless of direction. For example:
MATCH p=()-[r:TEST_FOR*2..5]-() RETURN p LIMIT 50
Note that the reason your original query
MATCH p=()<-[r:TEST_FOR*..10]-() RETURN p LIMIT 50
was returning the full graph is because it was never traversing deeper than a single hop from each :Disease node (since it would never be able to traverse an incoming :TEST_FOR relationship from a :Tests node), but it was starting from every :Disease node, so naturally that hit every single :Tests node as well. You would have gotten the same graph with this:
MATCH p=()<-[r:TEST_FOR]-() RETURN p LIMIT 50
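The effect of direction on variable-length matching can be reproduced with a small model. This is a Python sketch under assumptions: the test and disease names are made up, `neighbours` and `paths` are hypothetical helpers, and the toy graph has at most one relationship between any two nodes. It enumerates paths of 2 to 5 hops, first following only incoming TEST_FOR relationships and then ignoring direction.

```python
# Hypothetical toy data: (test, disease) means (test)-[:TEST_FOR]->(disease).
test_for = [
    ("ALP", "Hepatitis"),
    ("ALT", "Hepatitis"),
    ("ALT", "Cirrhosis"),
]

def neighbours(node, directed):
    """Nodes one TEST_FOR hop away from node."""
    # Directed, as in ()<-[:TEST_FOR]-(): step from a disease to its tests only.
    found = [test for test, disease in test_for if disease == node]
    if not directed:
        # Undirected, as in ()-[:TEST_FOR]-(): also step from a test to its diseases.
        found += [disease for test, disease in test_for if test == node]
    return found

def paths(start, min_hops, max_hops, directed):
    """Enumerate paths from start with min_hops..max_hops relationships."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(path)
        if hops < max_hops:
            # A relationship is not reused within one path.
            used = {frozenset(pair) for pair in zip(path, path[1:])}
            for nxt in neighbours(path[-1], directed):
                if frozenset((path[-1], nxt)) not in used:
                    stack.append(path + [nxt])
    return results

directed_paths = paths("Hepatitis", 2, 5, directed=True)
undirected_paths = paths("Hepatitis", 2, 5, directed=False)
print(directed_paths)
print(undirected_paths)
```

In the directed case every walk stops after one hop, because a test node has no incoming TEST_FOR relationship to continue along; only the undirected variant can reach a second disease through a shared test.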

Neo4J order by count relationships extremely slow

I'm trying to model a large knowledge graph. (using v3.1.1).
My actual graph contains only two types of Nodes (Topic, Properties) and a single type of Relationships (HAS_PROPERTIES).
The count of nodes is about 85M (47M :Topic, the rest of nodes are :Properties).
I'm trying to get the most connected :Topic node, using the following query:
MATCH (n:Topic)-[r]-()
RETURN n, count(DISTINCT r) AS num
ORDER BY num
This query or almost any query I try to perform (without filtering the results) using the count(relationships) and order by count(relationships) is always extremely slow: these queries take more than 10 minutes and still no response.
Am I missing indexes, or is there a better syntax?
Is there any chance I can execute this query in a reasonable time?
Use this:
MATCH (n:Topic)
RETURN n, size( (n)--() ) AS num
ORDER BY num DESC
LIMIT 100
This reads the degree directly from each node.

Neo4j Cypher, need an optimized way of getting friend of friend of friend who are not in a 1 or 2 degree connection

I am new to Neo4j and trying to get friends of friends of friends (those who are 3 degrees away) who are not also in a 1 or 2 degree relation through a different path. I am using the Cypher below, which seems to take a lot of time:
MATCH p = (origin:User {ID:51})-[:LINKED*3..3]-(fof:User)
WHERE NOT (origin)-[:LINKED*..2]-(fof)
RETURN fof.Nm
ORDER BY fof.Nm LIMIT 1000
Profiling the query shows that the majority of time is taken by the "WHERE NOT" condition as it cross checks every resultant node against all the 1 and 2 degree nodes.
Am I doing something wrong here or is there a more optimized way of doing this?
Just to add, the property UsrID in label User is indexed.
There are probably a few ways you could do it. Here's one to try:
MATCH path = (origin:User {ID:51})-[:LINKED*3..3]-(fofof:User)
WHERE NOT(fofof IN (nodes(path)[0..-1]))
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
You could also be more explicit:
MATCH path = (origin:User {ID:51})-[:LINKED]-(f:User)-[:LINKED]-(fof:User)-[:LINKED]-(fofof:User)
WHERE fofof <> f AND fofof <> fof
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
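For reference, the requirement in the question (3 hops away and not reachable in 1 or 2 hops) amounts to "shortest distance is exactly 3". Note that a check against the nodes of the same path does not by itself rule out a shorter route through other nodes. The following is a minimal breadth-first sketch in Python, assuming a hypothetical toy friendship graph (`linked` and `users_at_distance` are illustrative names); it is a model of the idea, not a replacement for the Cypher above.

```python
from collections import deque

# Hypothetical toy friendship graph (undirected LINKED relationships).
linked = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5, 6],
    5: [4, 6],
    6: [4, 5, 7],
    7: [6],
}

def users_at_distance(origin, distance):
    """Users whose shortest path from origin has exactly `distance` hops."""
    depth = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        if depth[user] == distance:
            continue  # no need to expand beyond the target depth
        for friend in linked[user]:
            if friend not in depth:  # first visit is the shortest route
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return sorted(u for u, d in depth.items() if d == distance)

third_degree = users_at_distance(1, 3)
print(third_degree)
```

Because each user is visited once, at the depth of its shortest route, anyone already found at depth 1 or 2 can never show up again at depth 3.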

Neo4j / Cypher query syntax feedback

I'm developing a kind of reddit service to learn Neo4j.
Everything works fine, I just want to get some feedback on the Cypher query to get the most recent news stories, the author and number of comments, likes and dislikes.
I'm using Neo4j 2.0.
MATCH comments = (n:news)-[:COMMENT]-(o)
MATCH likes = (n:news)-[:LIKES]-(p)
MATCH dislikes = (n:news)-[:DISLIKES]-(q)
MATCH (n:news)-[:POSTED_BY]-(r)
WITH n, r, count(comments) AS num_comments, count(likes) AS num_likes, count(dislikes) AS num_dislikes
ORDER BY n.post_date
LIMIT 20
RETURN *
o, p, q, r are all nodes with the label user. Should the label be added to the query to speed it up?
Is there anything else you see that I could optimize?
I think you're going to want to get rid of the multiple MATCH clauses. Cypher applies each one as a filter on the results of the ones before it, rather than gathering all the information independently.
I would also avoid the paths like comments, and rather do the count on the nodes you are saving. When you do MATCH xyz = (a)-[:COMMENT]-(b) then xyz is a path, which contains the source, relationship and destination node.
MATCH (news:news)-[:COMMENT]-(comment),(news:news)-[:LIKES]-(like),(news:news)-[:DISLIKES]-(dislike),(news:news)-[:POSTED_BY]-(posted_by)
WHERE news.post_date > 0
WITH news, posted_by, count(comment) AS num_comments, count(like) AS num_likes, count(dislike) AS num_dislikes
ORDER BY news.post_date
LIMIT 20
RETURN *
I would do something like this.
MATCH (n:news)-[:POSTED_BY]->(r)
WHERE n.post_date > {recent_start_time}
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
To speed it up and avoid having Neo4j search over all your posts, I would probably index the post_date field (assuming it doesn't contain time information), and then send this query for today, yesterday, and so on until you have your 20 posts.
MATCH (n:news {post_date: {day}})-[:POSTED_BY]->(r)
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
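One reason to count each pattern separately, as the length(...) expressions above do, is that matching several patterns for the same story produces one row per combination of matches. The Python sketch below models that with made-up data for a single news story (the list contents are hypothetical); it compares a plain row count with a distinct count.

```python
from itertools import product

# Hypothetical toy data for a single news story.
comments = ["c1", "c2", "c3"]
likes = ["l1", "l2"]
dislikes = ["d1", "d2"]

# Matching all three patterns together yields one row per combination.
rows = list(product(comments, likes, dislikes))

# count(comment) counts rows, so it is inflated by the other patterns.
joined_count = sum(1 for comment, like, dislike in rows)

# count(DISTINCT comment), or a per-pattern count, gives the real number.
distinct_count = len({comment for comment, like, dislike in rows})

# If any pattern has no match at all, the combined match yields no rows.
rows_without_likes = list(product(comments, [], dislikes))

print(joined_count, distinct_count, len(rows_without_likes))
```

The last line of the model also shows why a story with no likes would disappear from a query that requires all patterns to match.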

Resources