I am a total beginner with Neo4j and need help. Is there a query for getting the first few nodes with highest degree?
I have nodes called P and nodes called A. There are only links between P and A nodes. I want to have the first 10 nodes P which have the most links to nodes A.
My idea was the following query, but it took so much time!
MATCH (P1:P)-[r]->(A1:A)
RETURN P1.name AS P_name, COUNT(A1) AS A_no
ORDER BY A_no DESC
LIMIT 10
Is there something wrong with my query?
Best,
Mowi
How many nodes do you have in your db?
I'd probably not use Cypher for that. The Java API has a node.getDegree() method, which is much faster.
Your query could be sped up a bit with:
MATCH (P1:P)-->()
RETURN id(P1),count(*) as degree
ORDER BY degree DESC LIMIT 10
You could also try:
MATCH (P1:P)
RETURN id(P1),size((P1)-->()) as degree
ORDER BY degree DESC LIMIT 10
To limit the nodes that are considered:
MATCH (P1:P)
WHERE P1.foo = "bar"
WITH P1 limit 10000
MATCH (P1)-->()
RETURN id(P1),count(*) as degree
ORDER BY degree DESC LIMIT 10
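The idea behind these rewrites (take each node's degree directly instead of producing one row per relationship and grouping afterwards) can be sketched outside Neo4j. The following is a minimal Python illustration on an in-memory adjacency map; the function name top_by_degree and the toy data are made up for this example and are not part of any Neo4j API.

```python
import heapq

def top_by_degree(out_edges, k=10):
    """Return the k (node, degree) pairs with the highest out-degree."""
    # The degree is simply the length of each node's adjacency list, so
    # there is no need to build one row per relationship and group them.
    degrees = ((node, len(targets)) for node, targets in out_edges.items())
    # nlargest keeps only k candidates around instead of sorting everything.
    return heapq.nlargest(k, degrees, key=lambda pair: pair[1])

# Hypothetical toy data: P nodes mapped to the A nodes they link to.
links = {
    "p1": ["a1", "a2", "a3"],
    "p2": ["a1"],
    "p3": ["a2", "a3"],
    "p4": [],
}
top2 = top_by_degree(links, k=2)
print(top2)  # [('p1', 3), ('p3', 2)]
```

This mirrors the ORDER BY degree DESC LIMIT 10 step: only the top k entries are ever kept.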
Related
Say there are 2 labels, P and M. M has nodes with names M1, M2, M3 ... M10. I need to associate 50 nodes of P with each node of M. Also, no node of label P should be associated with more than one node of M.
This is the Cypher query I could come up with, but it doesn't seem to work.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u limit 50
CREATE (r)-[:OWNS]->(u);
I would run this once for each of the 10 nodes of M. Any help in correcting the query is appreciated.
You can use the apoc.periodic.* procedures for batching. More information is available in the documentation.
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
If there will always be just one (r)-[:OWNS]->(u) relationship, I would change the first MATCH to also exclude P nodes that r already owns:
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u) and not (r)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
That way the procedure cannot fall into an endless loop.
This query should be fast and easy to understand. It is fast because it avoids Cartesian products:
MATCH (u:P)
WHERE not (:M)-[:OWNS]->(u)
WITH u LIMIT 50
MATCH (r:M {Name:'M1'})
CREATE (r)-[:OWNS]->(u);
It first matches 50 unowned P nodes. It then finds the M node that is supposed to be the "owner", and creates an OWNS relationship between it and each of the 50 P nodes.
To make this query even faster, you can first create an index on :M(Name) so that the owning M node can be found quickly (without scanning all M nodes):
CREATE INDEX ON :M(Name);
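The two steps described above (pick up to 50 unowned P nodes, then attach them to a single M node) can be modelled in plain Python to see what repeated runs produce. This is only a sketch on in-memory data; assign_unowned, owner_of and the node names are hypothetical and chosen for illustration.

```python
def assign_unowned(p_nodes, owner_of, m_name, limit=50):
    """Attach up to `limit` unowned P nodes to the M node called m_name."""
    # Step 1: pick unowned P nodes (mirrors MATCH ... WHERE NOT ... LIMIT).
    unowned = [p for p in p_nodes if p not in owner_of][:limit]
    # Step 2: create the OWNS link from the single owner to each picked node.
    for p in unowned:
        owner_of[p] = m_name
    return unowned

p_nodes = [f"P{i}" for i in range(1, 121)]  # 120 hypothetical P nodes
owner_of = {}                               # P name -> name of owning M node
for m in ("M1", "M2", "M3"):                # one run per M node
    assign_unowned(p_nodes, owner_of, m)

counts = {}
for m in owner_of.values():
    counts[m] = counts.get(m, 0) + 1
print(counts)  # {'M1': 50, 'M2': 50, 'M3': 20}
```

Because each run only considers P nodes that have no owner yet, no P node ends up with two owners, and the last run simply takes whatever is left.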
This worked for me.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u,r limit 50
CREATE (r)-[:OWNS]->(u);
Thanks to Thomas for mentioning the limit on u and r.
I think this is one way to connect all 10 :M nodes in one query:
MATCH (m:M)
WITH collect(m) as nodes
UNWIND nodes as node
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH node,p limit 50
CREATE (node)-[:OWNS]->(p)
I am not really sure we need the collect and unwind, though, so it could be simplified to:
MATCH (m:M)
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH m,p limit 50
CREATE (m)-[:OWNS]->(p)
I'm trying to model a large knowledge graph. (using v3.1.1).
My actual graph contains only two types of Nodes (Topic, Properties) and a single type of Relationships (HAS_PROPERTIES).
The count of nodes is about 85M (47M :Topic, the rest of nodes are :Properties).
I'm trying to find the most connected :Topic node, using the following query:
MATCH (n:Topic)-[r]-()
RETURN n, count(DISTINCT r) AS num
ORDER BY num
This query, like almost any unfiltered query I try that counts relationships and orders by that count, is extremely slow: it runs for more than 10 minutes without returning a result.
Am I missing indexes, or is there a better syntax?
Is there any chance I can execute this query in a reasonable time?
Use this:
MATCH (n:Topic)
RETURN n, size( (n)--() ) AS num
ORDER BY num DESC
LIMIT 100
This reads the degree directly from each node.
I'm trying to figure out how to optimize a Cypher query on a very large dataset. I'm trying to find 2nd or 3rd degree friends in the same city. My current Cypher query, which takes over 1 minute to run, is:
match (n:User {id: 123})-[:LIVES_IN]->()<-[:LIVES_IN]-(u:User), (n)-[:FRIENDS_WITH*2..3]-(u) WHERE u.age >= 20 AND u.age <= 36 return u limit 100
There are approximately 500K User nodes and 500M FRIENDS_WITH relationships. I already have indexes on the id and age properties. The query seems to be choking on the FRIENDS_WITH requirement. Is there any way to think about this in a different way or optimize the cypher to make it real-time (i.e., max time 1-2 seconds)?
Here's the profile of the query:
Thanks.
Create index on id property for label User:
CREATE INDEX ON :User(id)
See documentation for schema indexes for more information http://neo4j.com/docs/stable/query-schema-index.html
If that doesn't help, add the result of a PROFILE run of the query and we might be able to help you more:
PROFILE MATCH ... rest of your query
Also it might be worth trying rewriting the query the following way:
MATCH (n:User {id: 123})-[:LIVES_IN]->()<-[:LIVES_IN]-(u:User),
(n)-[:FRIENDS_WITH*2..3]-(u)
WHERE u.age >= 20 AND u.age <= 36
return u limit 100
I am new to Neo4j and am trying to get friends of friends of friends (those who are 3 degrees away) who are not also in a 1 or 2 degree relation through a different path. I am using the Cypher below, which seems to take a lot of time:
MATCH p = (origin:User {ID:51})-[:LINKED*3..3]-(fof:User)
WHERE NOT (origin)-[:LINKED*..2]-(fof)
RETURN fof.Nm
ORDER BY fof.Nm LIMIT 1000
Profiling the query shows that the majority of time is taken by the "WHERE NOT" condition as it cross checks every resultant node against all the 1 and 2 degree nodes.
Am I doing something wrong here or is there a more optimized way of doing this?
Just to add, the property UsrID in label User is indexed.
There are probably a few ways you could do it. Here's one to try:
MATCH path = (origin:User {ID:51})-[:LINKED*3..3]-(fofof:User)
WHERE NOT(fofof IN (nodes(path)[0..-1]))
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
You could also be more explicit:
MATCH path = (origin:User {ID:51})-[:LINKED]-(f:User)-[:LINKED]-(fof:User)-[:LINKED]-(fofof:User)
WHERE fofof <> f AND fofof <> fof
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
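The requirement in the question (3 hops away and not reachable in 1 or 2 hops by any other path) amounts to a shortest-path distance of exactly 3. As a rough illustration outside Cypher, a breadth-first search over an in-memory adjacency map yields exactly that set. The function nodes_at_distance and the toy graph below are hypothetical and only meant to make the target result concrete.

```python
from collections import deque

def nodes_at_distance(graph, origin, depth=3):
    """Return the nodes whose shortest-path distance from origin equals depth."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # no need to expand beyond the target depth
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:  # first visit gives the shortest distance
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return sorted(n for n, d in dist.items() if d == depth)

# Hypothetical undirected LINKED graph as an adjacency map.
graph = {
    "u51": ["a", "b"],
    "a": ["u51", "c"],
    "b": ["u51", "c", "d"],
    "c": ["a", "b", "e", "d"],
    "d": ["b", "c"],
    "e": ["c"],
}
print(nodes_at_distance(graph, "u51"))  # ['e']
```

Here d can be reached over the 3-hop path u51, a, c, d, but it is also only 2 hops away via b, so it is excluded. Only e remains.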
I just imported the English Wikipedia into Neo4j and am playing around. I started by looking up the pages that link into the Page "Berlin"
MATCH p=(p1:Page {title:"Berlin"})<-[*1..1]-(otherPage)
WITH nodes(p) as neighbors
LIMIT 500
RETURN DISTINCT neighbors
That works quite well. What I would like to achieve next is to show the 2nd degree of relationships. In order to be able to display them correctly, I would like to limit the number of first degree relationship nodes to 20 and then query the next level of relationship.
How does one achieve that?
I don't know the Wikipedia model, but I'm assuming there are many different relationship types and that this is why you used -[*1..1]-. I think that is equivalent to -[]- or even --, though I doubt it has any serious impact.
You can collect up the first level matches and limit them to 20 using a WITH with a LIMIT. You can then perform a second match using those (<20) other pages as the start point.
MATCH (p1:Page {title:"Berlin"})<-[*1..1]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[*1..1]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH otherPage, secondDegree
LIMIT 500
RETURN otherPage, COLLECT(secondDegree)
There are many ways to return the data; this one just returns each first-degree match with an array of its subsequent matches.
If the only type of relationship is :Link and you want to keep the start node then you can change the query to this:
MATCH (p1:Page {title:"Berlin"})<-[:Link]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[:Link]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH p1, otherPage, secondDegree
LIMIT 500
RETURN p1, otherPage, COLLECT(secondDegree)
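To make the effect of the two LIMIT clauses concrete, here is a small Python model of the same two-stage expansion: cap the first-degree pages, then collect their in-links up to an overall cap. It is a sketch under assumptions: two_level_neighbourhood, the incoming map and the page names are made up and do not reflect the real Wikipedia import.

```python
from itertools import islice

def two_level_neighbourhood(incoming, start, first_limit=20, second_limit=500):
    """Map up to first_limit pages linking to start to the pages linking to them."""
    # Stage 1: keep only the first first_limit pages that link to start.
    first = list(islice(incoming.get(start, ()), first_limit))
    result = {}
    remaining = second_limit  # overall cap on (page, second-degree) pairs
    for page in first:
        if remaining <= 0:
            break
        # Stage 2: expand one more level, skipping the start page itself.
        second = [p for p in incoming.get(page, ()) if p != start][:remaining]
        if second:  # pages without in-links drop out, as with a plain MATCH
            result[page] = second
            remaining -= len(second)
    return result

# Hypothetical toy data: page -> pages that link to it.
incoming = {
    "Berlin": ["PageA", "PageB", "PageC"],
    "PageA": ["PageX", "Berlin"],
    "PageB": ["PageA"],
    "PageC": [],
}
print(two_level_neighbourhood(incoming, "Berlin", first_limit=2))
# {'PageA': ['PageX'], 'PageB': ['PageA']}
```

With first_limit=2, PageC is never expanded, which is the same pruning the WITH ... LIMIT 20 step performs before the second MATCH.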