how to get a random set of records from an index with cypher query - neo4j

What's the syntax to get random records from a specific node_auto_index using Cypher?
I suppose there is this example:
START x=node:node_auto_index("uname:*") RETURN x SKIP somerandomNumber LIMIT 10;
Is there a better way that won't return a contiguous set?

There is no feature similar to SQL's Random() in Neo4j.
You must either choose the random number for SKIP before you run the Cypher query (if you are not querying directly from the console but from a host language on top of Neo4j)
- this gives a random starting point, but the nodes returned are still a contiguous run
or retrieve all the nodes and then do the random selection yourself in the host language - this gives you a truly random set of nodes.
Or, to make a pseudorandom function in Cypher, you can try something like this:
START x=node:node_auto_index("uname:*")
WITH x, length(x.uname) as len
WHERE (Id(x)+len) % 3 = 0
RETURN x LIMIT 10
Or build a more sophisticated WHERE clause in this query, based for example on the total number of uname nodes or on the ASCII value of the uname property.
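The "retrieve everything, then pick at random in the host language" option can be sketched in Python. This is only an illustration: the list of ids stands in for whatever your driver returns from the index query, and `pick_random_nodes` is a hypothetical helper name, not part of any Neo4j API.

```python
import random

def pick_random_nodes(node_ids, k, seed=None):
    """Return up to k distinct ids chosen uniformly at random (not contiguous)."""
    rng = random.Random(seed)  # the seed only makes the sketch reproducible
    ids = list(node_ids)
    return rng.sample(ids, min(k, len(ids)))

# Assumed input: ids already fetched with the index query, without SKIP/LIMIT.
all_ids = list(range(100))
print(pick_random_nodes(all_ids, 10))
```

The trade-off is that every matching node has to be transferred to the client first, so this only suits indexes of moderate size.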

Related

Co-occurence analysis in Neo4j database

Let's say I have a database with nodes of two types, Candyjars and Candies. Every Candyjar (Candyjar1, Candyjar2...) has a different number of candies of different types: CandyRed, CandyGreen, etc.
Now let's say the end goal is to find the probability that the various types of candies occur together, and the covariance among them. Then I want relationships between each pair of candy types that carry the probability of co-occurrence and the covariance. Let's call these relationships OCCURS_WITH and COVARIES, so that Candytype1 -[OCCURS_WITH]->Candytype2 and Candytype1 -[COVARIES]->Candytype2
I'd make a database with Candytypes and Candyjars as nodes, and a relationship (cj:Candyjar)-[r:CONTAINS]->(ct:Candytype) where r can have an attribute recording how many candies of that type are contained in the jar.
Now my problem is that I don't understand how, in Cypher, I can write a query that assigns the OCCURS_WITH relationship efficiently. Would I have to iterate over every pair of candies, dividing the number of Candyjars in which the pair co-occurs by the total number of Candyjars? Is there a way to do it for all possible pairs at once?
When I try to do:
MATCH (ct1:Candytype)<-[r1:CONTAINS]-(cj:Candyjar)-[r2:CONTAINS]->(ct2:Candytype)
WHERE ct1<>ct2 AND ct1.name="CandyRed" AND ct2.name="CandyBlue"
RETURN ct1,r1,count(r1),cj1,ct2,r2,count(r2)
LIMIT 5
I cannot get the count of the relationships of the co-occurring candies that I would need to express the probability of co-occurrence.
Would I have to use something like python to do the calculations rather than try to make a statement in Cypher?
To get the count of how many times CandyRed and CandyBlue co-occur, you can use the following Cypher statement:
MATCH (ct1:Candytype)<-[:CONTAINS]-(:Candyjar)-[:CONTAINS]->(ct2:Candytype)
WHERE ct1.name="CandyRed" AND ct2.name="CandyBlue"
RETURN ct1,ct2, count(*) AS coOccur
LIMIT 5
If you want a query that will compare all the candy types, you can use:
MATCH (ct1:Candytype)<-[:CONTAINS]-(:Candyjar)-[:CONTAINS]->(ct2:Candytype)
WHERE id(ct1) < id(ct2)
RETURN ct1,ct2, count(*) AS coOccur
LIMIT 5
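To turn co-occurrence counts like these into the probability the question asks for, divide each pair's count by the total number of jars. A minimal Python sketch of that arithmetic follows; the jar contents are made-up sample data and `co_occurrence_probabilities` is a hypothetical helper, so treat it as an illustration rather than a replacement for the Cypher queries.

```python
from collections import Counter
from itertools import combinations

def co_occurrence_probabilities(jars):
    """Map each unordered candy-type pair to (jars containing both) / (total jars)."""
    pair_counts = Counter()
    for types in jars.values():
        # sorted() gives each pair one canonical order, like id(ct1) < id(ct2)
        for pair in combinations(sorted(set(types)), 2):
            pair_counts[pair] += 1
    total = len(jars)
    return {pair: n / total for pair, n in pair_counts.items()}

# Made-up sample data, for illustration only.
jars = {
    "Candyjar1": ["CandyRed", "CandyBlue"],
    "Candyjar2": ["CandyRed", "CandyGreen", "CandyBlue"],
    "Candyjar3": ["CandyGreen"],
    "Candyjar4": ["CandyRed"],
}
probs = co_occurrence_probabilities(jars)
print(probs[("CandyBlue", "CandyRed")])
```

Here CandyBlue and CandyRed share two of the four jars, so their co-occurrence probability is 0.5.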

Auto increment id Neo4j to retrieve elements in insert order

Recently, I have been experimenting with Neo4j. I like the idea, but I am facing a problem that I never faced with relational databases.
I want to perform these inserts and then return the elements in exactly the order they were inserted.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
Then, to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned in insertion order when sorted with the id function.
In fact, the ids of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto-incremented id, but its mechanism seems a little strange. How do I return items in insertion order? I thought about creating a timestamp attribute for each node, but I don't think that's the best choice.
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequences as you need, just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
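The per-scope counter logic can be modelled outside the database to see why it preserves insertion order. The Python sketch below is an in-memory stand-in only: `SequenceStore` is a hypothetical class that mirrors the COALESCE(n.Value, 0) + 1 step, not Neo4j code, and it ignores concurrency.

```python
class SequenceStore:
    """In-memory stand-in for the :Sequence nodes (one counter per Scope)."""

    def __init__(self):
        self._values = {}

    def next_value(self, scope):
        # Same logic as COALESCE(n.Value, 0) + 1: a missing counter starts at 0.
        value = self._values.get(scope, 0) + 1
        self._values[scope] = value
        return value

store = SequenceStore()
names = ["Marc", "John", "Paul"]
people = [{"name": n, "UniqueId": store.next_value("Person")} for n in names]

# Sorting by UniqueId recovers insertion order, independent of internal ids.
ordered = [p["name"] for p in sorted(people, key=lambda p: p["UniqueId"])]
print(ordered)
```

Because each scope keeps its own counter, a "Product" sequence would start at 1 regardless of how many "Person" values were issued.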
Neo4j maintains ids internally, and your business code should not depend on them.
Ids are generally assigned incrementally, but after delete operations Neo4j may reuse the ids of deleted elements, according to the reuse policy of the Neo4j server.

Neo4j variable-length pattern matching tuning

Query:
PROFILE
MATCH(node:Symptom) WHERE node.symptom =~ '.*adult male.*|.*151.*'
WITH node
MATCH (node)-[*1..2]-(result:Disease)
RETURN result
Profile:
[image: query profile]
Problems:
There are over 40 thousand "Symptom" nodes in the database, and the query is very slow because of the "[*1..2]" part.
It takes only 4 seconds when the length is 1, i.e. "[*1]", but about 30 seconds when the length is up to 2, i.e. "[*1..2]".
Is there any way to tune this query?
First, your query uses the regex operator, which can't use indexes. You should use the CONTAINS operator instead:
MATCH (node:Symptom)
WHERE node.symptom CONTAINS 'adult male' OR node.symptom CONTAINS '151'
RETURN node
And you can create an index: CREATE INDEX ON :Symptom(symptom)
For the second part of your query, as written, there is not much to be done: the cost comes from the complexity of what you are asking for.
So to get better performance, you should think about:
specifying the relationship type in the pattern to reduce the number of paths returned: (node)-[:MY_REL_TYPE*1..2]-(result:Disease)
specifying the relationship direction in the pattern to reduce the number of paths returned: (node)-[:MY_REL_TYPE*1..2]->(result:Disease)
finding another way to reduce this complexity (filter on a property of the relationship, review your model, etc.)
For your information, you can write your query in one step (i.e. without the WITH, although in your case performance should be the same):
MATCH (node:Symptom)-[*1..2]-(result:Disease)
WHERE node.symptom CONTAINS 'adult male' OR node.symptom CONTAINS '151'
RETURN result
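To see why going from one hop to two hops costs so much, it helps to count how many paths an untyped, undirected expansion has to explore. The Python sketch below does this on a made-up toy graph; `build_adjacency` and `count_paths` are hypothetical helpers, and the count covers every path from the start node, not only those ending in a Disease node.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def count_paths(adjacency, start, max_hops):
    """Count paths of 1..max_hops hops from start, never reusing a relationship."""
    def walk(node, used_edges, hops_left):
        if hops_left == 0:
            return 0
        total = 0
        for neighbour in adjacency.get(node, ()):
            edge = frozenset((node, neighbour))
            if edge in used_edges:
                continue  # a relationship may appear only once per path
            total += 1 + walk(neighbour, used_edges | {edge}, hops_left - 1)
        return total
    return walk(start, frozenset(), max_hops)

# Made-up toy graph: one symptom "s" linked to two nodes, each with two more links.
edges = [("s", "d1"), ("s", "d2"), ("d1", "x1"), ("d1", "x2"), ("d2", "x3"), ("d2", "x4")]
graph = build_adjacency(edges)
print(count_paths(graph, "s", 1), count_paths(graph, "s", 2))
```

Even in this tiny graph the path count triples between one and two hops; with tens of thousands of start nodes and higher degrees the growth is much steeper, which is why restricting type and direction helps.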

Add Integer Number to existing Values - Neo4j

Using Neo4j.
I would like to add an integer to values that already exist in a property of several relationships, which I match this way:
MATCH x=(()-[y]->(s:SOL{PRB:"Taking time"})) SET y.points=+2
But it doesn't add anything; it just replaces the value I want to increment with 2.
To achieve this, use:
SET y.points = y.points + 2
From your original question, it looks like you were trying to use the addition assignment operator, which exists in many languages (e.g. Python, TypeScript/JavaScript, C#). However, in Cypher += works a little differently: it adds or updates properties on an entire node or relationship from a map.
If you had a parameter like the one below (run this in the Neo4j Browser to create the parameter):
:param someMapping: {a:1, b:2}
The query below would create a property b on the node with value 2, and set the value of property a on that node to 1.
MATCH (n:SomeLabel) WHERE n.a = 0
SET n+= $someMapping
RETURN n
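The effect of SET n += $someMapping on a node's properties can be mimicked with a plain dictionary merge. The Python sketch below is only an analogy: `apply_map_update` is a hypothetical helper, and the node is modelled as a dict. It also reflects the documented Cypher behaviour that a null value in the map removes the property.

```python
def apply_map_update(properties, mapping):
    """Return a copy of properties with mapping merged in, like SET n += $map."""
    updated = dict(properties)
    for key, value in mapping.items():
        if value is None:
            updated.pop(key, None)  # a null value removes the property
        else:
            updated[key] = value    # new keys are added, existing ones replaced
    return updated

node = {"a": 0, "name": "example"}  # hypothetical node properties
print(apply_map_update(node, {"a": 1, "b": 2}))
```

Properties that are not mentioned in the map, such as name here, are left untouched, which is the difference from SET n = $someMapping.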

How to get total number of db-hits from Cypher query within a Java code?

I am trying to get total number of db-hits from my Cypher query. For some reason I always get 0 when calling this:
String query = "PROFILE MATCH (a)-[r]-(b)-[p]-(c)-[q]-(a) RETURN a,b,c";
Result result = database.execute(query);
while (result.hasNext()) {
    result.next();
}
System.out.println(result.getExecutionPlanDescription().getProfilerStatistics().getDbHits());
The database seems to be ok. Is there something wrong about the way of reaching such value?
ExecutionPlanDescription is a tree-like structure. Most likely the top element, e.g. a projection, does not hit the database directly itself.
So you need to write a recursive function using ExecutionPlanDescription.getChildren() to drill down into the individual parts of the query plan. E.g. if one of the children (or their descendants) is a plan of type Expand, you can use plan.getProfilerStatistics().getDbHits().
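The recursion over the plan tree can be sketched independently of the Neo4j API. In the Python model below, `PlanNode` is a hypothetical stand-in for ExecutionPlanDescription, its `children` list plays the role of getChildren(), and the db-hit numbers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlanNode:
    """Hypothetical stand-in for one ExecutionPlanDescription node."""
    name: str
    db_hits: Optional[int] = None  # None models a node without profiler statistics
    children: List["PlanNode"] = field(default_factory=list)

def total_db_hits(plan):
    """Add this node's db hits to those of all its descendants."""
    own = plan.db_hits or 0
    return own + sum(total_db_hits(child) for child in plan.children)

# Invented plan: the top operators report 0 hits, the work happens further down.
plan = PlanNode("ProduceResults", 0, [
    PlanNode("Projection", 0, [
        PlanNode("Expand(All)", 120, [PlanNode("AllNodesScan", 41)]),
    ]),
])
print(total_db_hits(plan))
```

Reading only the root, as the question's code does, would report 0 here, while the recursive sum picks up the hits from the Expand and scan operators.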
