(Neo4j, Cypher) How to set an incremental number on relationships?

I'm using Neo4j. I'd like to create a root node for a search result and create relationships from that root node to the search result nodes. I'd also like to set an incrementing number as a property on each relationship.
If possible, I'd like to do this with a single query.

Sorry for not explaining enough.
This is what I'd like to do.
Is there a more concise way?
// create test data
WITH RANGE(0, 99) AS indexes,
['Paul', 'Bley', 'Bill', 'Evans', 'Robert', 'Glasper', 'Chihiro', 'Yamanaka', 'Fred', 'Hersch'] AS names
UNWIND indexes AS index
CREATE (p:Person { index: index, name: (names[index%10] + toString(index)) });
// create 'Results' node with relationships to search result 'Person' nodes.
// 'SEARCH_RESULT' relationships have 'order' and 'orderBy' properties.
CREATE(x:Results{ts: TIMESTAMP()})
WITH x
MATCH(p:Person)
WHERE p.name contains '1'
MERGE(x)-[r:SEARCH_RESULT]->(p)
WITH x, r, p
MATCH (x)-[r]->(p)
WITH x, r, p
ORDER BY p.name desc
WITH RANGE(0, COUNT(r)-1) AS indexes, COLLECT(r) AS rels
UNWIND indexes AS i
SET (rels[i]).order = i
SET (rels[i]).orderBy = 'name'
RETURN rels;
// validate
MATCH(x:Results)-[r:SEARCH_RESULT]->(p:Person)
RETURN r, p.name ORDER BY r.order;
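One possibly more concise variant, offered as a sketch rather than a definitive answer: sort the matched Person nodes first, collect them into a list, and create each relationship with its properties in a single CREATE. Because the Results node is brand new, CREATE is used here instead of MERGE; the labels and property names are the ones from the question.

```cypher
// Create the root node, then attach the matches in sorted order.
CREATE (x:Results {ts: timestamp()})
WITH x
MATCH (p:Person)
WHERE p.name CONTAINS '1'
WITH x, p
ORDER BY p.name DESC
// collect() keeps the order established by ORDER BY above.
WITH x, collect(p) AS people
UNWIND range(0, size(people) - 1) AS i
WITH x, i, people[i] AS p
// The list position becomes the 'order' property of the relationship.
CREATE (x)-[r:SEARCH_RESULT {order: i, orderBy: 'name'}]->(p)
RETURN r.order, p.name
ORDER BY r.order;
```

If nothing matches, range(0, -1) is empty and no relationships are created.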

Related

How to speed up lookup of multi-valued attributes in neo4j?

I have created the following nodes in neo4j (1 million of them):
CREATE (p:Person { name: 'user1', email: ['user1#gmail.com', 'user1#yahoo.com'] }) RETURN p
CREATE (p:Person { name: 'user2', email: ['user2#gmail.com', 'user2#yahoo.com'] }) RETURN p
...
CREATE (p:Person { name: 'user1000000', email: ['user1000000#gmail.com', 'user1000000#yahoo.com'] }) RETURN p
I have created the following indexes:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:Person) ON (n.email)
With the above data, the following query takes 2ms to complete and I can concurrently execute about 2800 such queries per second on my desktop.
MATCH (p:Person) WHERE p.name = 'user10' RETURN DISTINCT p.name
But the following query takes 710ms to complete and I can concurrently execute only about 5 such queries per second on my desktop.
MATCH (p:Person) WHERE 'user10#gmail.com' IN p.email RETURN DISTINCT p.name
Is there any way to speed up the second query and also increase the throughput?
Edit 1:
I tried to use separate nodes for email as suggested by @jose_bacoy in his answer.
I created the following nodes:
CREATE (m1:mail { email: 'user1#gmail.com' })
CREATE (m2:mail { email: 'user1#yahoo.com' })
CREATE (p:Person { name: 'user1' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
...
CREATE (m1:mail { email: 'user1000000#gmail.com' })
CREATE (m2:mail { email: 'user1000000#yahoo.com' })
CREATE (p:Person { name: 'user1000000' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
and indexed them as follows:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:mail) ON (n.email)
The speed is also good. Latency: 4ms, throughput 1850 queries per second.
The problem with this is that the following query performs very badly.
MATCH (p:Person) - [:attribute] -> (m1:mail)
MATCH (p) - [:attribute] -> (m2:mail)
WHERE m1.email = 'user10#gmail.com' OR m2.email = 'user10#yahoo.com'
RETURN DISTINCT p.name
On my desktop, the latency is about 5s and the throughput is less than 1 per second.
Edit 2:
I modified the query as suggested by Charchit Kapoor below. Following is the query I used.
MATCH (p:Person) - [:attribute] -> (m:mail)
WHERE m.email IN ['user10#gmail.com', 'user10#yahoo.com']
RETURN DISTINCT p.name
This query has a latency of about 4ms and a throughput of about 2600 queries per second.
Your data model is not aligned with your query: email is a list property on the Person node, and you are searching inside that list. Below is a script that changes the data model from Person.email to a relationship (Person)-[:HAS_EMAIL]->(Email). The APOC procedure apoc.periodic.iterate divides your Person nodes into batches and runs them in parallel for efficiency. Make sure you have APOC installed.
The script creates the (Person)-[:HAS_EMAIL]->(Email) relationships and removes the email property from each Person as it is processed. You can change the batch size (10k per batch) to suit your data. You will also want a unique index on Email.email; I will leave it up to you how to create it.
CALL apoc.periodic.iterate(
"MATCH (p:Person) RETURN p as person;",
"WITH person
UNWIND person.email as email
MERGE (e:Email {email: email})
MERGE (person)-[:HAS_EMAIL]->(e)
SET person.email = null;",
{batchSize:10000, parallel:true, retries:3});
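For the unique index on Email mentioned above, a minimal sketch, assuming Neo4j 4.4 syntax. The constraint name email_unique is made up for illustration. Creating it before running the batch job should let each MERGE find existing Email nodes through the backing index instead of scanning.

```cypher
// Hypothetical constraint name; a uniqueness constraint is backed by an index on Email.email.
CREATE CONSTRAINT email_unique IF NOT EXISTS
FOR (e:Email) REQUIRE e.email IS UNIQUE;

// Older 4.x releases use the ON ... ASSERT form instead:
// CREATE CONSTRAINT email_unique ON (e:Email) ASSERT e.email IS UNIQUE;
```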
After doing this and creating the index on Email.email, profiling shows that the BTREE index is being used:
PROFILE MATCH (p:Person) -[:HAS_EMAIL] -> (e:Email)
WHERE e.email = 'user10#gmail.com'
RETURN DISTINCT p.name
BTREE INDEX e:Email(email) WHERE
email = $autostring_0
Previously, the plan showed a NodeByLabelScan followed by a Filter on $autostring_0 IN p.email. Even if you create an index on a list property, it is not used for this kind of membership check.
Your second query can be structured differently. First find all the relevant emails, then find the related users:
MATCH (m1:mail)
WHERE m1.email IN ['user10#gmail.com', 'user10#yahoo.com']
MATCH (p)-[:attribute]->(m1)
RETURN DISTINCT p.name

Neo4j Match with multiple relationships

I need a MATCH where either relationship is true. I understand the (person1)-[:r1|:r2]-(person2) syntax. The problem I am having is that one of the patterns traverses through another node, i.e.:
(p1:person)-[:FRIEND]-(p2:person)-[:FRIEND]-(p3:person)
So I want this kind of logic: the enemy of my enemy is my friend, and my friend is my friend. The output should be a list of the names of everyone who is my friend. I also restrict the relationships to a particular property value.
Something like:
MATCH (p1:Person)-[:ENEMY{type:'human'}]-(myEnemy:Person)-[enemy2:ENEMY{type:'human'}]-(myFriend:Person)
OR (p1:Person)-[friend:FRIEND{type:'human'}]-(myFriend:Person)
RETURN p1.name, myFriend.name
I need one list that I can then do aggregation on.
This is my first posting....so if my question is a mess...hit me with your feedback and I will clarify :)
You can use the UNION clause to combine 2 queries and also remove duplicate results:
MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
UNION
MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
WHERE ID(p) < ID(f)
RETURN p.name AS pName, f.name AS fName
The ID(p) < ID(f) filtering is done to avoid having the same pair of Person names being returned twice (in reverse order).
[UPDATE]
To get a count of how many friends each Person has, you can take advantage of the new CALL subquery syntax (in neo4j 4.0) to do post-union processing:
CALL {
  // No ID(p) < ID(f) filter here: each pair must be seen from both sides so that every Person gets a full count.
  MATCH (p:Person)-[:ENEMY{type:'human'}]-(:Person)-[:ENEMY{type:'human'}]-(f:Person)
  WHERE p <> f
  RETURN p.name AS pName, f
  UNION
  MATCH (p:Person)-[:FRIEND{type:'human'}]-(f:Person)
  WHERE p <> f
  RETURN p.name AS pName, f
}
RETURN pName, COUNT(f) AS friendCount

Aggregating multiple matches before ordering results

I have an Activity node (a) that refers to (:Something) which can be matched by a direct :LIKE relationship to :User 'me' OR a :LIKE relationship by a :FRIEND.
The first relationship can be described as:
MATCH (a)-[:REF]->(:Something)<-[:LIKE]-(:User {user: 'me'})
While the second relationship can be described as:
MATCH (a)-[:REF]->(:Something)<-[:LIKE]-(:User)<-[:FRIEND]-(:User {user: 'me'})
How would I go about grouping all of the different activity nodes (a) so that I can sort the full list by timestamps? It would look something like:
MATCH
(a)-[:REF]->(:Something)<-[:LIKE]-(:User {user: 'me'})
OR
(a)-[:REF]->(:Something)<-[:LIKE]-(:User)<-[:FRIEND]-(:User {user: 'me'})
RETURN a
ORDER BY a.ts DESC
In your case, you can use variable-length pattern matching:
// u = node "me" or the node "friend"
MATCH
(:User {user: 'me'})-[:FRIEND*0..1]->(u:User)
MATCH
(a)-[:REF]->(:Something)<-[:LIKE]-(u)
RETURN DISTINCT a
ORDER BY a.ts DESC
Update: If the two queries are completely different, you can collect the results of the first query, then the results of the second, concatenate the two lists, and unwind:
MATCH
(a1)-[:REF]->(:Something)<-[:OWN]-(:User {user: 'me'})
WITH
collect(DISTINCT a1) AS ac1
MATCH
(a2)-[:REF]->(:Something)<-[:INCLUDES]-(:SomethingElse)<-[:LIKE]-(:User {user: 'me'})
WITH
ac1, collect(DISTINCT a2) AS ac2
UNWIND
ac1 + ac2 AS a
RETURN DISTINCT a
ORDER BY a.ts DESC
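On Neo4j 4.0 or later, the same combination can also be sketched with a CALL subquery and UNION, with the ordering applied after the union. This is an alternative to the collect-and-unwind approach, not part of the original answer. As far as I can tell, it also avoids the case where the second MATCH finds nothing and the grouped collect() leaves the whole result empty.

```cypher
CALL {
  MATCH (a)-[:REF]->(:Something)<-[:OWN]-(:User {user: 'me'})
  RETURN a
  UNION
  MATCH (a)-[:REF]->(:Something)<-[:INCLUDES]-(:SomethingElse)<-[:LIKE]-(:User {user: 'me'})
  RETURN a
}
// UNION (without ALL) already removes duplicate activities.
RETURN a
ORDER BY a.ts DESC
```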

neo4j pass parameter to variable length relationship

How do I use parameters with variable length relationships?
MATCH path=(:Person {id: {id}})-[:HAS_FRIEND*0..{num_friends}]->(:Person)
I'm trying to create a generic query so that I can pass a value 'num_friends' into the cypher query for various levels of relationships that I need.
I get an error so I'm wondering how something like this would be done?
Parameters cannot be used as the hop count in a variable-length pattern.
But you can use the path expander from APOC:
match (P:Person {id: {id}}) with P
call apoc.path.expand( P, 'HAS_FRIEND>', 'Person', 0, {num_friends}) yield path
return path
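Adapted in response to the comment, so that only paths ending at a Person with no further outgoing HAS_FRIEND relationship are returned: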
match (P:Person {id: {id}}) with P
call apoc.path.expand( P, 'HAS_FRIEND>', 'Person', 0, {num_friends}) yield path
with path, last(nodes(path)) as lst where not (lst)-[:HAS_FRIEND]->(:Person)
return path
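The {id} parameter form used above was removed in Neo4j 4.0 in favour of $id. A sketch of the same idea in the newer syntax, followed by a plain-Cypher alternative that hard-codes an upper bound and filters on the parameter afterwards. The bound of 10 is an arbitrary assumption and must be at least as large as any num_friends value you expect to pass.

```cypher
// $id and $num_friends are query parameters supplied by the caller.
MATCH (p:Person {id: $id})
CALL apoc.path.expand(p, 'HAS_FRIEND>', 'Person', 0, $num_friends) YIELD path
RETURN path;

// Without APOC: fixed upper bound (10 is an assumption), then filter by path length.
MATCH path = (:Person {id: $id})-[:HAS_FRIEND*0..10]->(:Person)
WHERE length(path) <= $num_friends
RETURN path;
```

The second form expands up to the fixed bound before filtering, so it may do more work than the APOC call when num_friends is small.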

match in clause in cypher

How can I do a match with an IN clause in Cypher?
E.g. I'd like to find movies with ids 1, 2, or 3.
match (m:movie {movie_id:("1","2","3")}) return m
If you were querying an auto index, the syntax was:
START n=node:node_auto_index('movie_id:("123", "456", "789")')
How is this different with a MATCH clause?
The idea is that you can do:
MATCH (m:movie)
WHERE m.movie_id IN ["1", "2", "3"]
RETURN m
However, this will not use the index as of 2.0.1. This is a missing feature in the new label indexes that I hope will be resolved soon. https://github.com/neo4j/neo4j/issues/861
I've found a (somewhat ugly) temporary workaround for this.
The following query doesn't make use of an index on Person(name):
match (p:Person)... where p.name in ['JOHN', 'BOB'] return ...;
So one option is to repeat the entire query n times:
match (p:Person)... where p.name = 'JOHN' return ...
union
match (p:Person)... where p.name = 'BOB' return ...
If this is undesirable then another option is to repeat just a small query for the id n times:
match (p:Person) where p.name ='JOHN' return id(p)
union
match (p:Person) where p.name ='BOB' return id(p);
and then perform a second query using the results of the first:
match (p:Person)... where id(p) in [8,16,75,7] return ...;
Is there a way to combine these into a single query? Can a union be nested inside another query?
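On much later versions (Neo4j 4.0 and up), a union can be nested inside another query with a CALL subquery. A minimal sketch, assuming the Person(name) index from above; the final RETURN stands in for whatever the rest of the query does with p:

```cypher
CALL {
  MATCH (p:Person) WHERE p.name = 'JOHN' RETURN p
  UNION
  MATCH (p:Person) WHERE p.name = 'BOB' RETURN p
}
// Continue the outer query with the combined set of Person nodes.
RETURN p.name
```

This does not help on 2.0.1, where the subquery syntax does not exist.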
