How to speed up lookup of multi-valued valued attributes in neo4j? - neo4j

I have created the following nodes in neo4j (1 million of them):
CREATE (p:Person { name: 'user1', email: ['user1#gmail.com', 'user1#yahoo.com'] }) RETURN p
CREATE (p:Person { name: 'user2', email: ['user2#gmail.com', 'user2#yahoo.com'] }) RETURN p
...
CREATE (p:Person { name: 'user1000000', email: ['user1000000#gmail.com', 'user1000000#yahoo.com'] }) RETURN p
I have created the following indexes:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:Person) ON (n.email)
With the above data, the following query takes 2ms to complete and I can concurrently execute about 2800 such queries per second on my desktop.
MATCH (p:Person) WHERE p.name = 'user10' RETURN DISTINCT p.name
But the following query takes 710ms to complete and I can concurrently execute only about 5 such queries per second on my desktop.
MATCH (p:Person) WHERE 'user10#gmail.com' IN p.email RETURN DISTINCT p.name
Is there any way to speed up the second query and also increase the throughput ?
Edit 1:
I tried to use separate nodes for email as suggested by #jose_bacoy in his answer.
I created the following nodes:
CREATE (m1:mail { email: 'user1#gmail.com' })
CREATE (m2:mail { email: 'user1#yahoo.com' })
CREATE (p:Person { name: 'user1' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
...
CREATE (m1:mail { email: 'user1000000#gmail.com' })
CREATE (m2:mail { email: 'user1000000#yahoo.com' })
CREATE (p:Person { name: 'user1000000' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
and indexed them as follows:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:mail) ON (n.email)
The speed is also good. Latency: 4ms, throughput 1850 queries per second.
The problem with this is that the following query performs very badly.
MATCH (p:Person) - [:attribute] -> (m1:mail)
MATCH (p) - [:attribute] -> (m2:mail)
WHERE m1.email = 'user10#gmail.com' OR m2.email = 'user10#yahoo.com'
RETURN DISTINCT p.name
On my desktop, the latency is about 5s and the throughput is less than 1 per second.
Edit 2:
I modified the query as suggested by Charchit Kapoor below. Following is the query I used.
MATCH (p:Person) - [:attribute] -> (m:mail)
WHERE m.email IN ['user10#gmail.com', 'user10#yahoo.com']
RETURN DISTINCT p.name
has a latency of about 4ms and throughput of about 2600 queries per second.

Your data model is not aligned to your query. Email is a list of emails in Person node and you are searching within a list. Below is a script to change your data model from Person.email into a relationship between Person -[:HAS_EMAIL]-> Email. The APOC function iterate will divide your Person nodes into batches and will run it in parallel for efficiency. Ensure that you have APOC installed.
Then it will create the (Person)->(Email) relationship and remove the property in Person after completion. You can change the batch size (10k per batch) according to your taste. You also want to create a unique index for Email. I will leave it up to you on how to do it.
CALL apoc.periodic.iterate(
"MATCH (p:Person) RETURN p as person;",
"WITH person
UNWIND person.email as email
MERGE (e:Email {email: email})
MERGE (person)-[:HAS_EMAIL]->(e)
SET person.email = null;",
{batchSize:10000, parallel:true, retries:3});
After doing this and creating the index on Email.email, profiling shows that the BTREE index is being used:
PROFILE MATCH (p:Person) -[:HAS_EMAIL] -> (e:Email)
WHERE e.email = 'user10#gmail.com'
RETURN DISTINCT p.name
BTREE INDEX e:Email(email) WHERE
email = $autostring_0
Previously, it shows NodeLabelByScan and Filter on $autostring_0 IN p.email. Even if you create an index on a list, it is not used.

Your second query can be structured differently, first find all the relevant emails and then find the related users:
MATCH (m1:mail)
WHERE m1.email IN ['user10#gmail.com', 'user10#yahoo.com']
MATCH (p)-[:attribute]->(m1)
RETURN DISTINCT p.name

Related

(Neo4j, Cypher) How to set incremental number to relationships?

i'm using neo4j. what i'd like to do is to create a root node for search result and to create relationships from root node to search result nodes. and I'd like to set incremental number to each relationship's property.
if possible, with one query.
Sorry for not explaining enough.
This is what I'd like to do.
Any more concise way?
// create test data
WITH RANGE(0, 99) AS indexes,
['Paul', 'Bley', 'Bill', 'Evans', 'Robert', 'Glasper', 'Chihiro', 'Yamanaka', 'Fred', 'Hersch'] AS names
UNWIND indexes AS index
CREATE (p:Person { index: index, name: (names[index%10] + toString(index)) });
// create 'Results' node with relationships to search result 'Person' nodes.
// 'SEARCH_RESULT' relationships have 'order' and 'orderBy' properties.
CREATE(x:Results{ts: TIMESTAMP()})
WITH x
MATCH(p:Person)
WHERE p.name contains '1'
MERGE(x)-[r:SEARCH_RESULT]->(p)
WITH x, r, p
MATCH (x)-[r]->(p)
WITH x, r, p
ORDER BY p.name desc
WITH RANGE(0, COUNT(r)-1) AS indexes, COLLECT(r) AS rels
UNWIND indexes AS i
SET (rels[i]).order = i
SET (rels[i]).orderBy = 'name'
RETURN rels;
// validate
MATCH(x:Results)-[r:SEARCH_RESULT]->(p:Person)
RETURN r, p.name ORDER BY r.order;

Aggregating multiple matches before ordering results

I have an Activity node (a) that refers to (:Something) which can be matched by a direct :LIKE relationship to :User 'me' OR a :LIKE relationship by a :FRIEND.
The first relationship can be described as:
MATCH (a)-[:REF]->(:Something)<-[:LIKE]-(:User {user: 'me'})
While the second relationship can be described as:
MATCH (a)-[:REF]->(:Something)<-[:LIKE]-(:User)<-[:FRIEND]-(:User {user: 'me'})
How would I go about grouping all of the different activity nodes (a) so that I can sort the full list by timestamps? It would look something like:
MATCH
(a)-[:REF]->(:Something)<-[:LIKE]-(:User {user: 'me'})
OR
(a)-[:REF]->(:Something)<-[:LIKE]-(:User)<-[:FRIEND]-(:User {user: 'me'})
RETURN a
ORDER BY a.ts DESC
In your case, you can use the variable-length pattern matching:
// u = node "me" or the node "friend"
MATCH
(:User {user: 'me'})-[:FRIEND*0..1]->(u:User)
MATCH
(a)-[:REF]->(:Something)<-[:LIKE]-(u)
RETURN DISTINCT a
ORDER BY a.ts DESC
Update: If the queries are completely different, then you can collect the result of the first query, then the result of the second query, sum up and unwind:
MATCH
(a1)-[:REF]->(:Something)<-[:OWN]-(:User {user: 'me'})
WITH
collect(DISTINCT a1) AS ac1
MATCH
(a2)-[:REF]->(:Something)<-[:INCLUDES]-(:SomethingElse)<-[:LIKE]-(:User {user: 'me'})
WITH
ac1, collect(DISTINCT a2) AS ac2
UNWIND
ac1 + ac2 AS a
RETURN DISTINCT a
ORDER BY a.ts DESC

Neo4j - Dynamic delete cypher query

I have following data structure of neo4j database:
USER
/ |
/ |
LIST |
\ |
\ |
CONTACT
means, USER have a relationship with LIST and LIST have relationship with CONTACT, but in some case, USER might have relationship with CONTACT (not all time). Now I want to delete CONTACT's data. I have write the following query:
MATCH (b:USER { id: {id} } )-[relationship01]->(pl:LIST {id: {listId} )
OPTIONAL MATCH (pl)-[cnpt:USER_LIST]->(cn:CONTACTS {id: {contactId} } )
DELETE cnpt, cn;
This query delete CONTACT with relationship with LIST. But in some case, I also have to delete relationship with USER. To solve this, I have write the following query:
MATCH (b:USER { id: {id} } )-[relationship01]->(pl:LIST {id: {listId} )
OPTIONAL MATCH (pl)-[cnpt:USER_LIST]->(cn:CONTACTS {id: {contactId} } )
OPTIONAL MATCH (b)-[bur]->(cnx:CONTACTS {id: {contactId} } )
DELETE cnpt, cn, bur, cnx;
This query delete CONTACT with relationship with LIST and USER, but problem is, if there is no relationship between CONTACT and USER, then it throw error.
How can I solve this problem?
Thanks in Advance.
You can't delete a node until all its relationships are deleted, which is why there is a shorthand for deleting all of a node's relationships, then the node itself: DETACH DELETE
So all you have to do is this:
MATCH (:USER { id: {id} } )-->(pl:LIST {id: {listId} )
OPTIONAL MATCH (pl)-[:USER_LIST]->(cn:CONTACTS {id: {contactId} } )
DETACH DELETE cn;

Chaining result with "WITH" doesn't work when the subsequent query doesn't have matched result in Neo4j cypher query

For example, I created two linked nodes:
create (a:ACTOR {id: "a1", name: "bruce wellis"})
create (m:MOVIE {id: "m1", title: "die hardest"})
create (a)-[:ACTED_IN]->(m)
1. From this cypher query:
match (a:ACTOR {id: "a1"})
with a
optional match (m:MOVIE {id: "m1"})
set m += {
title: "die easier"
}
return a;
I can have result:
+-----------------------------------------+
| a |
+-----------------------------------------+
| Node[1000]{name:"bruce wellis",id:"a1"} |
+-----------------------------------------+
1 row
Properties set: 1
The query successfully returned the actor node.
2. (UPDATED) But if you make the match MOVIE subquery failed:
match (a:ACTOR {id: "a1"})
with a
optional match (m:MOVIE {id: "mm"})
set m += {
title: "die easier"
}
return a;
I got error:
CypherTypeException: Expected m to be a node or a relationship, but it was :`null`.
How to make the second query returning matched actor result?
A MATCH that fails to match anything will always return no rows.
So, in #2, since the second MATCH failed, it returns no rows.
You could use OPTIONAL MATCH in place of the second MATCH, and you should see results.
[EDITED]
For the Updated question, this (somewhat ugly) workaround should work:
MATCH (a:ACTOR {id: "a1"})
WITH a
OPTIONAL MATCH (m:MOVIE {id: "mm"})
WITH a, COLLECT(m) AS cm
FOREACH(m IN cm | SET m += {title: "die easier"})
RETURN a;

match in clause in cypher

How can I do an match in clause in cypher
e.g. I'd like to find movies with ids 1, 2, or 3.
match (m:movie {movie_id:("1","2","3")}) return m
if you were going against an auto index the syntax was
START n=node:node_auto_index('movie_id:("123", "456", "789")')
how is this different against a match clause
The idea is that you can do:
MATCH (m:movie)
WHERE m.movie_id in ["1", "2", "3"]
However, this will not use the index as of 2.0.1. This is a missing feature in the new label indexes that I hope will be resolved soon. https://github.com/neo4j/neo4j/issues/861
I've found a (somewhat ugly) temporary workaround for this.
The following query doesn't make use of an index on Person(name):
match (p:Person)... where p.name in ['JOHN', 'BOB'] return ...;
So one option is to repeat the entire query n times:
match (p:Person)... where p.name = 'JOHN' return ...
union
match (p:Person)... where p.name = 'BOB' return ...
If this is undesirable then another option is to repeat just a small query for the id n times:
match (p:Person) where p.name ='JOHN' return id(p)
union
match (p:Person) where p.name ='BOB' return id(p);
and then perform a second query using the results of the first:
match (p:Person)... where id(p) in [8,16,75,7] return ...;
Is there a way to combine these into a single query? Can a union be nested inside another query?

Resources