I would like to retrieve a specific number of random nodes. The graph consists of 3 000 000 nodes, of which some are sources, some are targets, and some are both.
The aim is to retrieve random sources. Since I don't know how to select at random in Cypher, the program currently generates k random numbers between 1 and 3 000 000, treats them as node IDs, and then discards every selected node that is not a source. This procedure is time-consuming, so I wonder whether it is possible to select random sources directly with a Cypher query.
To select all sources, the query would be the following:
START t=node(*) MATCH (a)-[:LEADS_TO]->(t) RETURN a
Does anyone know how to select a limited number of random nodes directly in Cypher or, if that is not possible, can anyone suggest a workaround?
You can use a construction like this:
MATCH (a)-[:LEADS_TO]->(t)
RETURN a, rand() as r
ORDER BY r
It should return the matching nodes in random order; add a LIMIT to keep only as many as you need.
Tested with Neo4j 2.1.3
You can limit your query with SKIP and LIMIT, so you could do:
START t=node(*)
MATCH (a)-[:LEADS_TO]->(t)
RETURN a
SKIP {randomoffset} LIMIT {randomcount}
Otherwise, you can also create a set of random node IDs and pass them as a parameter to the Cypher statement.
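As a rough sketch of that last suggestion: the parameter name ids below is hypothetical and stands for a list of node IDs generated at random on the client side. The query keeps only those IDs that belong to source nodes, so the filtering happens in the database rather than in the program. It uses the {param} syntax of the Neo4j 2.x era; newer versions write $ids instead.

```cypher
// {ids} is a hypothetical list parameter of pre-generated random node IDs
MATCH (a)
WHERE id(a) IN {ids}
  AND (a)-[:LEADS_TO]->()   // keep only nodes that are sources
RETURN a
```

Because some of the random IDs will not be sources, this may return fewer nodes than requested, so the list would need to be larger than the number of sources wanted.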
Another approach, in case you want random start nodes together with all their connections:
MATCH (a)-[:LEADS_TO]->()
WITH a,rand() AS rand
ORDER BY rand LIMIT {YourLimit}
MATCH (a)-[l:LEADS_TO]->(t)
RETURN a,l,t
MATCH (n:Label)
WITH n, rand() AS r
ORDER BY r
RETURN n LIMIT <no. of random nodes>
In my Neo4j database, there are many nodes with the same nodeID but different levels, and they are connected through a path. Each time, I try to find the node with the highest level that is still lower than a given level n. I use the following Cypher query, which starts searching from the current node with the nodeID id.
MATCH (:Node{NodeID:id,Current:'true'})-[:type*0..]->(m:Node{NodeID:id})
WHERE m.Level < n
RETURN m
ORDER BY m.Level DESC
LIMIT 1
The index I created for this database is as follows:
CREATE INDEX Nodes FOR(n:Node) ON (n.NodeID, n.Level)
However, it is rather slow, especially when the path is long and I need to repeat this process thousands of times. Is there a better way to implement this, and do I need to modify my index to improve performance? Thanks in advance for your help!
Assuming all Nodes with the same NodeID are in a type path rooted at the Current node with the same NodeID, then the following query should be logically equivalent but faster:
MATCH (m:Node)
WHERE m.NodeID = $id AND m.Level < $n
RETURN m
ORDER BY m.Level DESC
LIMIT 1
This query assumes id and n are query parameters.
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Prefix your queries with PROFILE to compare the number of DB hits needed by different Cypher queries.
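To illustrate how these suggestions might combine, here is a sketch only. The relationship types RUNS_ON and DEPLOYED_ON and both directions are hypothetical, since the question does not say how the nodes are actually connected, and dropping the master_node label assumes it is redundant with the more specific labels.

```cypher
// Hypothetical relationship types and directions; substitute the real ones.
PROFILE
MATCH (m:Application)-[:RUNS_ON]->(:Server)<-[:DEPLOYED_ON]-(n)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
```

Running the original and the rewritten query with PROFILE and comparing the DB hits shows whether the narrower pattern actually helps on your data.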
You can find examples of how to use count in the Neo4j docs.
In your case, the first example there, where count(*) is used to return a count for each returned item, should work.
With the following graph:
How can I write a query that would return the N latest relationships per unique target node?
For example, this query: MATCH (p)-[r:RATED_IN]->(s) WHERE id(p)={person} RETURN p,s,r ORDER BY r.measurementDate DESC LIMIT {N} with N = 1 would return the single latest relationship, whether it is RATED_IN Team Lead or Programming, but I would like to get the N latest for each skill node. With N = 2, I would like the 2 latest measurements per skill node.
With N = 1, I would like the latest relationship by a person for Team Lead and the latest one for Programming.
How can I write such a query?
-- EDIT --
MATCH (p:Person) WHERE id(p)=175
CALL apoc.cypher.run('
WITH {p} AS p
MATCH (p)-[r:RATED_IN]->(s)
RETURN DISTINCT s, r ORDER BY r.measurementDate DESC LIMIT 2',
{p:p}) YIELD value
RETURN p,value.r AS r, value.s AS s
Here's a Cypher knowledge base article on limiting MATCH results per row, with a few different suggestions on how to accomplish this given current limitations. Using APOC's apoc.cypher.run() to perform a subquery with a RETURN using a LIMIT will do the trick, as it gets executed per row (thus the LIMIT is per row).
Note that for the upcoming Neo4j 4.0 release at the end of the year we're going to be getting some nice Cypher goodies that will make this significantly easier. Stay tuned as we reveal more details as we approach its release!
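One alternative that does not need APOC is to sort first, collect the relationships per skill node, and slice each collected list. The sketch below assumes the parameters person and N from the question, written in the $param form, and relies on collect() keeping the order of the incoming rows, which is how it behaves in practice.

```cypher
// $person and $N correspond to {person} and {N} in the question
MATCH (p)-[r:RATED_IN]->(s)
WHERE id(p) = $person
WITH p, s, r
ORDER BY r.measurementDate DESC
// one list per (p, s) pair, newest first, cut down to N entries
WITH p, s, collect(r)[0..$N] AS latest
UNWIND latest AS r
RETURN p, s, r
```

With N = 1 this would return one row per skill node, each holding the most recent RATED_IN relationship for that skill.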
I'm trying to model a large knowledge graph. (using v3.1.1).
My actual graph contains only two types of Nodes (Topic, Properties) and a single type of Relationships (HAS_PROPERTIES).
The count of nodes is about 85M (47M :Topic, the rest of nodes are :Properties).
I'm trying to get the most connected :Topic node, using the following query:
MATCH (n:Topic)-[r]-()
RETURN n, count(DISTINCT r) AS num
ORDER BY num
This query, like almost any unfiltered query I try that counts relationships and orders by that count, is always extremely slow: these queries run for more than 10 minutes without returning a response.
Am I missing indexes, or is there a better syntax?
Is there any chance I can execute this query in a reasonable time?
Use this:
MATCH (n:Topic)
RETURN n, size( (n)--() ) AS num
ORDER BY num DESC
LIMIT 100
This reads the degree directly from each node instead of expanding and counting its relationships.
I am new to Neo4j and am trying to get friends of friends of friends (users who are 3 degrees away) who are not also in a 1 or 2 degree relation through a different path. I am using the Cypher query below, which seems to take a lot of time:
MATCH p = (origin:User {ID:51})-[:LINKED*3..3]-(fof:User)
WHERE NOT (origin)-[:LINKED*..2]-(fof)
RETURN fof.Nm
ORDER BY fof.Nm LIMIT 1000
Profiling the query shows that the majority of time is taken by the "WHERE NOT" condition as it cross checks every resultant node against all the 1 and 2 degree nodes.
Am I doing something wrong here or is there a more optimized way of doing this?
Just to add, the property UsrID in label User is indexed.
There are probably a few ways you could do it. Here's one to try:
MATCH path = (origin:User {ID:51})-[:LINKED*3..3]-(fofof:User)
WHERE NOT(fofof IN (nodes(path)[0..-1]))
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
You could also be more explicit:
MATCH path = (origin:User {ID:51})-[:LINKED]-(f:User)-[:LINKED]-(fof:User)-[:LINKED]-(fofof:User)
WHERE fofof <> f AND fofof <> fof
RETURN fofof.Nm
ORDER BY fofof.Nm LIMIT 1000
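If users reachable within 1 or 2 hops through any other path must also be excluded, one possible variant is to collect those nearer users once and filter against that list, rather than re-running the pattern check for every candidate. This is a sketch under that reading of the question; nearNodes is an illustrative variable name, not something from the original posts.

```cypher
// Collect everyone within 1 or 2 hops of the origin a single time
MATCH (origin:User {ID: 51})-[:LINKED*1..2]-(near:User)
WITH origin, collect(DISTINCT near) AS nearNodes
// Then keep only 3-hop users that are not in that list
MATCH (origin)-[:LINKED*3..3]-(fofof:User)
WHERE NOT fofof IN nearNodes AND fofof <> origin
RETURN DISTINCT fofof.Nm AS Nm
ORDER BY Nm LIMIT 1000
```

Whether this is faster depends on how many users sit within 2 hops, so it is worth comparing the variants with PROFILE.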