I have a problem with a composed query that has three parts:
1. Get direct friends
2. Get friends of friends
3. Get others, just to fill up the remaining space up to the limit
It should always return at most the limit of users, ordered as direct friends, then friends of friends, then others. The first two parts are very fast, but the last part is slow and gets slower as the database grows. There are indexes on Person.number and Person.createdAt.
Does anyone have an idea how to improve or rewrite this query to make it more performant?
MATCH (me:Person { number: $number })-[r:KNOWS]-(contact:Person { registered: "true" }) WHERE contact.number <> $number AND (r.state = "contact" OR r.state = "declined")
MATCH (contact)-[:HAS_AVATAR]-(avatar:Avatar { primary: true })
WITH contact, avatar
RETURN contact AS friend, avatar, contact.createdAt AS rank
ORDER BY contact.createdAt DESC
UNION
MATCH (me:Person { number: $number })-[:KNOWS]-(friend)-[:KNOWS { state: "accepted" }]-(friend_of_friend:Person { registered: "true" }) WHERE NOT friend.username = 'default' AND NOT (me)-[:KNOWS]-(friend_of_friend)
MATCH (friend_of_friend)-[:HAS_AVATAR]-(avatar:Avatar { primary: true })
OPTIONAL MATCH (friend_of_friend)-[rel:KNOWS]-(friend)
RETURN friend_of_friend AS friend, avatar, COUNT(rel) AS rank
ORDER BY rank DESC
UNION
MATCH (me:Person { number: $number })
MATCH (others:Person { registered: "true" }) WHERE others.number <> $number AND NOT (me)-[:KNOWS]-(others) AND NOT (me)-[:KNOWS]-()-[:KNOWS { state: "accepted" }]-(others:Person { registered: "true" })
MATCH (others)-[:HAS_AVATAR]->(avatar:Avatar { primary: true })
OPTIONAL MATCH (others)-[rel:KNOWS { state: "accepted" }]-()
WITH others, rel, avatar
RETURN others AS friend, avatar, COUNT(rel) AS rank
ORDER BY others.createdAt DESC
SKIP $skip
LIMIT $limit
Here are some profiles:
https://i.stack.imgur.com/LfNww.png
https://i.stack.imgur.com/0EO0r.png
The final solution was to break the whole query into three and call them separately. In our case it won't reach the third query 99% of the time, and the first two are super fast. Even when it does reach the third stage it is still fast, so maybe UNION was what slowed the whole thing down the most.
const contacts = await this.neo4j.readQuery(`...
if (contacts.records.length < limit){
const friendOfFriend = await this.neo4j.readQuery(`...
if (contacts.records.length + friendOfFriend.records.length < limit){
const others = await this.neo4j.readQuery(`...
merge all results
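A minimal sketch of that staged approach, assuming each query is wrapped in an async function. The `Stage` type and `fillUpToLimit` are hypothetical names, not part of any Neo4j driver, and pagination with $skip is left out. Unlike the snippet above, it passes the remaining count to each stage and trims the merged result to the limit:

```typescript
// A stage receives how many rows are still needed and returns up to that many.
type Stage<T> = (remaining: number) => Promise<T[]>;

async function fillUpToLimit<T>(stages: Stage<T>[], limit: number): Promise<T[]> {
  const rows: T[] = [];
  for (const stage of stages) {
    const remaining = limit - rows.length;
    if (remaining <= 0) {
      break; // page already full, skip the slower stages
    }
    const next = await stage(remaining);
    // Trim in case a stage returns more than it was asked for.
    for (const row of next.slice(0, remaining)) {
      rows.push(row);
    }
  }
  return rows;
}
```

Each stage would wrap one of the three readQuery calls, so the third (slow) query only runs when the first two leave the page short.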
You're doing a lot of work in that third query before the limit. You may want to move the ordering and LIMIT up sooner.
It's also going to be more efficient to pre-match the friends (and friends of friends) in a single MATCH pattern; we can use *0..1 as an optional relationship to a potential next node.
And just a bit of style advice, I find it a good idea to reserve plurals for lists/collections and otherwise use singular, as you will only have a single one of those nodes per row.
Try this out for the third part:
MATCH (me:Person { number: $number })
OPTIONAL MATCH (me)-[:KNOWS]-()-[:KNOWS*0..1 { state: "accepted" }]-(other:Person {registered:"true"})
WITH collect(DISTINCT other) as excluded
MATCH (other:Person { registered: "true" }) WHERE other.createdAt < dateTime() AND other.number <> $number AND NOT other IN excluded
WITH other
ORDER BY other.createdAt DESC
SKIP $skip
LIMIT $limit
MATCH (other)-[:HAS_AVATAR]->(avatar:Avatar { primary: true })
WITH other, avatar, size((other)-[:KNOWS { state: "accepted" }]-()) AS rank
RETURN other AS friend, avatar, rank
If we know the type of createdAt then we can add a modification that may trigger index-backed ordering which could improve this.
Consider the following use of UNION cypher command:
MATCH (user:User)-[]-(org:Organization)
WHERE org.size > 100
RETURN collect({
user.name,
user.age
}) AS userList
UNION
MATCH (user:User)-[]-(family:Family)
WHERE family.mood = "Happy"
RETURN collect({
user.name,
user.age
}) AS userList
The UNION does not work, this query returns users only from the first MATCH. I suspect it's because of the collect statements, however the project's design requires the data to be collected. Is there a way to create a union of the collections, or perhaps collect after the union?
Your query will work just fine, except that you should 1) return a valid map (dictionary) format and 2) wrap the UNION in CALL { ... }, which is a subquery in Neo4j Cypher.
RETURN {
name: user.name,
age: user.age
} AS userList
See sample below:
CALL {MATCH (user:user{id:"some_id"})
RETURN {
id: user.id,
age: user.age
} AS userList
UNION
MATCH (user:user{id:"some_id2"})
RETURN {
id: user.id,
age: user.age
} AS userList
}
RETURN collect(userList) as userList
Result:
╒══════════════════════════════════════════════════════════╕
│"userList" │
╞══════════════════════════════════════════════════════════╡
│[{"id":"some_id","age":null},{"id":"some_id2","age":null}]│
└──────────────────────────────────────────────────────────┘
I am using neo4j version 4.4.3
You can use apoc.coll.union of the APOC library, to create a union of two lists, like this:
MATCH (user:User)-[]-(org:Organization)
WHERE org.size > 100
WITH collect({
name: user.name,
age: user.age
}) AS userList1
MATCH (user:User)-[]-(family:Family)
WHERE family.mood = "Happy"
WITH userList1, collect({
name: user.name,
age: user.age
}) AS userList2
RETURN apoc.coll.union(userList1, userList2) AS userList
The function apoc.coll.union does not include duplicates; if you want to keep duplicates, use apoc.coll.unionAll.
I'm new to Cypher and I'm struggling with this problem:
I have these two queries
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.promoterStartActivity) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.promoterStartActivity)<= datetime("2021-05-31T23:59:59Z")
AND c.promoterEnabled = true
AND u.firstName="Gianvito"
WITH distinct u as user, count(c) as num_promoter
WHERE num_promoter >= 150
RETURN user.firstName as name, user.email as email, num_promoter
which will return me a table like this
| name | email | num_promoter |
|---|---|---|
| Gianvito | gianvito@email.com | 1475 |
and
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.subscriptionDate) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.subscriptionDate)<= datetime("2021-05-31T23:59:59Z")
AND c.kycStatus = "OK"
AND u.firstName="Gianvito"
WITH distinct u as user, count(c) as num_swaggy
WHERE num_swaggy >= 1
RETURN user.firstName as name, user.email as email , num_swaggy
| name | email | num_swaggy |
|---|---|---|
| Gianvito | gianvito@email.com | 1820 |
I would like to merge these two results into a single table.
I tried a UNION, but that only gives me a single table with two separate rows, where the common information is duplicated and the missing value in each row is null.
How can I obtain a table like this one?
| name | email | num_promoter | num_swaggy |
|---|---|---|---|
| Gianvito | gianvito@email.com | 1475 | 1820 |
If you're using Neo4j 4.x or higher, you can UNION the results of the queries in a subquery, and outside of it perform a sum() to get the results into a single row per user:
CALL {
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.promoterStartActivity) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.promoterStartActivity)<= datetime("2021-05-31T23:59:59Z")
AND c.promoterEnabled = true
AND u.firstName="Gianvito"
WITH u as user, count(c) as num_promoter
WHERE num_promoter >= 150
RETURN user, num_promoter, 0 as num_swaggy
UNION
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.subscriptionDate) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.subscriptionDate)<= datetime("2021-05-31T23:59:59Z")
AND c.kycStatus = "OK"
AND u.firstName="Gianvito"
WITH u as user, count(c) as num_swaggy
WHERE num_swaggy >= 1
RETURN user, 0 as num_promoter, num_swaggy
}
WITH user, sum(num_promoter) as num_promoter, sum(num_swaggy) as num_swaggy
RETURN user.firstName as name, user.email as email , num_promoter, num_swaggy
Also you don't need to use DISTINCT when you're performing any aggregation, since the grouping key will become distinct automatically as a result of the aggregation.
I am working on the cypher below but I am not sure how to implement parameterised sorting. I want to sort with the parameters $field and $sort where $field could be: 'species.name', 'species.description', 'species.scientificName', 'monthCount', 'eats' or 'eatenBy' and $sort is only 'asc' or 'desc'. If either of those values is hardcoded then the cypher runs but when passed as a parameter it fails. Any help would be greatly appreciated! :)
MATCH (s:Species)
WHERE toLower(s.name) CONTAINS toLower($search)
WITH s
OPTIONAL MATCH (s)-[e:EATS]->(eatsSpecies:Species)
OPTIONAL MATCH (s)<-[:EATEN_BY]-(eatenBySpecies:Species)
OPTIONAL MATCH (s)<-[:IS_ABOUT]-(image:Image)
OPTIONAL MATCH (s)-[:FALLS_UNDER]->(primary:Primary)
OPTIONAL MATCH (s)-[:MEASURED_BY]->(month:Month)
WITH s, eatsSpecies, eatenBySpecies, image, primary, month
WITH s,
count(DISTINCT eatsSpecies.name) AS eats,
count(DISTINCT eatenBySpecies.name) AS eatenBy,
primary,
image,
count(distinct month) as monthCount
WITH {
name: s.name,
scientificName: s.scientificName,
description: s.description,
primary: case when exists(primary.GUID) then true else false end,
active: case when exists(s.active) then s.active else true end,
months: monthCount,
guid: s.GUID,
eats: eats,
eatenBy: eatenBy,
image: case when exists(image.url) then true else false end
} AS species order by $field $sort
SKIP $skip
LIMIT $limit
RETURN collect(species)
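Something like this could work: add a sort-key field to the object, sort on it, and reverse the collection before applying SKIP and LIMIT when descending order is requested. Note that s[$field] reads a property from the node, so $field has to be a bare property name such as 'name' rather than 'species.name'.
The suggestion below is pure Cypher. Using APOC, you could also build the query dynamically.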
MATCH (s:Species)
WHERE toLower(s.name) CONTAINS toLower($search)
WITH s
OPTIONAL MATCH (s)-[e:EATS]->(eatsSpecies:Species)
OPTIONAL MATCH (s)<-[:EATEN_BY]-(eatenBySpecies:Species)
OPTIONAL MATCH (s)<-[:IS_ABOUT]-(image:Image)
OPTIONAL MATCH (s)-[:FALLS_UNDER]->(primary:Primary)
OPTIONAL MATCH (s)-[:MEASURED_BY]->(month:Month)
WITH s, eatsSpecies, eatenBySpecies, image, primary, month
WITH s,
// add a 'sortField'
s[$field] AS field,
count(DISTINCT eatsSpecies.name) AS eats,
count(DISTINCT eatenBySpecies.name) AS eatenBy,
primary,
image,
count(distinct month) as monthCount
WITH {
name: s.name,
scientificName: s.scientificName,
description: s.description,
primary: case when exists(primary.GUID) then true else false end,
active: case when exists(s.active) then s.active else true end,
months: monthCount,
guid: s.GUID,
eats: eats,
eatenBy: eatenBy,
image: case when exists(image.url) then true else false end,
field: field
} AS species ORDER BY species.field
WITH COLLECT(species) AS sortedSpecies
// list slices are [start .. end), so the end index is $skip + $limit
RETURN CASE $sort
WHEN "asc" THEN sortedSpecies[$skip .. $skip + $limit]
ELSE REDUCE(array=[], i IN RANGE(1,size(sortedSpecies)) |
array
+sortedSpecies[size(sortedSpecies)-i]
)[$skip .. $skip + $limit]
END AS sortedSpecies
From the docs:
You can also chain multiple where() methods to create more specific queries (logical AND).
How can I perform an OR query?
Example:
1. Give me all documents where the field status is open OR upcoming
2. Give me all documents where the field status == open OR createdAt <= <somedatetime>
OR isn't supported, as it's hard for the server to scale it (it requires keeping state to dedup). The workaround is to issue two queries, one for each condition, and dedup on the client.
Edit (Nov 2019):
Cloud Firestore now supports IN queries which are a limited type of OR query.
For the example above you could do:
// Get all documents in 'foo' where status is open or upcoming
db.collection('foo').where('status','in',['open','upcoming']).get()
However it's still not possible to do a general OR condition involving multiple fields.
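For the multi-field case, the "two queries, dedup on the client" workaround could look roughly like this. It is only a sketch: `mergeById` is a hypothetical helper, and it assumes each result has already been mapped to a plain object carrying its document id in an `id` field.

```typescript
interface Doc {
  id: string;
}

// Merge the results of several queries, keeping the first copy of each id.
function mergeById<T extends Doc>(...resultSets: T[][]): T[] {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const seen: { [id: string]: boolean } = Object.create(null);
  const merged: T[] = [];
  for (const results of resultSets) {
    for (const doc of results) {
      if (!seen[doc.id]) {
        seen[doc.id] = true;
        merged.push(doc);
      }
    }
  }
  return merged;
}
```

You would run one query per condition (for example status == open, and createdAt <= some date) and pass both result arrays to mergeById.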
With the recent addition of IN queries, Firestore supports "up to 10 equality clauses on the same field with a logical OR"
A possible solution to (1) would be:
documents.where('status', 'in', ['open', 'upcoming']);
See Firebase Guides: Query Operators | in and array-contains-any
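If you have more values than one 'in' clause accepts, one option is to split them into batches and run one query per batch. A small sketch, assuming the limit of 10 quoted above (check the current docs, as the limit may have changed); `chunk` is a hypothetical helper, not a Firestore API:

```typescript
// Split a list of values into batches that each fit into one 'in' clause.
function chunk<T>(values: T[], size: number = 10): T[][] {
  if (size < 1) {
    throw new Error("size must be at least 1");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}
```

Each batch would then go into its own where('status', 'in', batch) query, with the results combined on the client.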
I suggest giving each status a numeric value as well.
For example:
{ name: "a", statusValue: 10, status: 'open' }
{ name: "b", statusValue: 20, status: 'upcoming' }
{ name: "c", statusValue: 30, status: 'close' }
You can then query with ref.where('statusValue', '<=', 20), and both 'a' and 'b' will be found.
This can save query cost and improve performance.
By the way, this does not cover every case.
I would have no "status" field, but status-related boolean fields, updating them to true or false as needed, like
{ name: "a", status_open: true, status_upcoming: false, status_closed: false}
Alternatively, look at Firebase Cloud Functions. You could have a function listening for status changes and updating the status-related properties, like
{ name: "a", status: "open", status_open: true, status_upcoming: false, status_closed: false}
Either way, your query could be just
...where('status_open','==',true)...
Hope it helps.
This doesn't solve all cases, but for "enum" fields, you can emulate an "OR" query by making a separate boolean field for each enum-value, then adding a where("enum_<value>", "==", false) for every value that isn't part of the "OR" clause you want.
For example, consider your first desired query:
Give me all documents where the field status is open OR upcoming
You can accomplish this by splitting the status: string field into multiple boolean fields, one for each enum-value:
status_open: bool
status_upcoming: bool
status_suspended: bool
status_closed: bool
To perform your "where status is open or upcoming" query, you then do this:
where("status_suspended", "==", false).where("status_closed", "==", false)
How does this work? Well, because it's an enum, you know one of the values must have true assigned. So if you can determine that all of the other values don't match for a given entry, then by deduction it must match one of the values you originally were looking for.
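Working out which fields to negate is mechanical, so it can be generated. A sketch, assuming the status_<value> naming used above; `fieldsToExclude` is a hypothetical helper name:

```typescript
// Given every enum value and the values wanted in the OR,
// list the boolean fields that must be false.
function fieldsToExclude(allValues: string[], wanted: string[]): string[] {
  const fields: string[] = [];
  for (const value of allValues) {
    if (wanted.indexOf(value) === -1) {
      fields.push("status_" + value);
    }
  }
  return fields;
}
```

For the four statuses above and wanted = open, upcoming, this yields status_suspended and status_closed, and you would chain one where(field, "==", false) per returned field.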
See also
in/not-in/array-contains-any: https://firebase.google.com/docs/firestore/query-data/queries#in_and_array-contains-any
!=: https://firebase.googleblog.com/2020/09/cloud-firestore-not-equal-queries.html
I don't like everyone saying it's not possible.
It is, if you create another "hacky" field in the model to build a composite.
For instance, create an array on each document that holds all the logical OR elements,
then query for .where("field", arrayContains: [...]
you can bind two Observables using the rxjs merge operator.
Here you have an example.
import { Observable } from 'rxjs/Observable';
import 'rxjs/add/observable/merge';
...
getCombinatedStatus(): Observable<any> {
return Observable.merge(this.db.collection('foo', ref => ref.where('status','==','open')).valueChanges(),
this.db.collection('foo', ref => ref.where('status','==','upcoming')).valueChanges());
}
Then you can subscribe to the new Observable updates using the above method:
this.getCombinatedStatus().subscribe(results => console.log(results));
I hope this can help you, greetings from Chile!!
We had the same problem just now. Luckily the only possible values for ours are A, B, C, D (4), so we have to query for things like A||B, A||C, A||B||C, D, etc.
A few months ago Firebase added a new array-contains query, so what we do is make an array and pre-process the OR values into it:
if (a) {
[array addObject:@"a"];
}
if (b) {
[array addObject:@"b"];
}
if (a||b) {
[array addObject:@"a||b"];
}
etc
And we do this for all 4! values or however many combos there are.
THEN we can simply check the query [document arrayContains:@"a||c"] or whatever type of condition we need.
So if something only qualified for conditional A of our 4 conditionals (A,B,C,D) then its array would contain the following literal strings: @["A", "A||B", "A||C", "A||D", "A||B||C", "A||B||D", "A||C||D", "A||B||C||D"]
Then for any of those OR combinations we can just search array-contains on whatever we may want (e.g. "A||C")
Note: This is only a reasonable approach if you have a small number of possible values to compare with OR.
More info on Array-contains here, since it's newish to firebase docs
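The pre-processing step can be generated rather than written by hand. A sketch under the assumptions above (a small, fixed list of possible values, combinations joined with "||" in a fixed order); `orCombinations` is a hypothetical helper name:

```typescript
// Build every "X||Y||..." combination of allValues that contains at least
// one of the values the document actually matches.
function orCombinations(allValues: string[], matching: string[]): string[] {
  const combos: string[] = [];
  const total = 1 << allValues.length; // number of subsets, including the empty one
  for (let mask = 1; mask < total; mask++) {
    const subset: string[] = [];
    let hit = false;
    for (let i = 0; i < allValues.length; i++) {
      if ((mask & (1 << i)) !== 0) {
        subset.push(allValues[i]);
        if (matching.indexOf(allValues[i]) !== -1) {
          hit = true;
        }
      }
    }
    if (hit) {
      combos.push(subset.join("||"));
    }
  }
  return combos;
}
```

For values A, B, C, D and a document matching only A, this produces the same eight strings listed above, though in a different order. The query side has to join its values in the same order as allValues, otherwise "C||A" would not match "A||C".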
If you have a limited number of fields, definitely create new fields with true and false like in the example above. However, if you don't know what the fields are until runtime, you have to just combine queries.
Here is a tags OR example...
// the ids of students in class
const students = [studentID1, studentID2,...];
// get all docs where student.studentID1 = true
const results = this.afs.collection('classes',
ref => ref.where(`students.${students[0]}`, '==', true)
).valueChanges({ idField: 'id' }).pipe(
switchMap((r: any) => {
// get all docs where student.studentID2...studentIDX = true
const docs = students.slice(1).map(
(student: any) => this.afs.collection('classes',
ref => ref.where(`students.${student}`, '==', true)
).valueChanges({ idField: 'id' })
);
return combineLatest(docs).pipe(
// combine results by reducing array
map((a: any[]) => {
const g: [] = a.reduce(
(acc: any[], cur: any) => acc.concat(cur)
).concat(r);
// filter out duplicates by 'id' field
return g.filter(
(b: any, n: number, a: any[]) => a.findIndex(
(v: any) => v.id === b.id) === n
);
}),
);
})
);
Unfortunately there is no other way to combine more than 10 items (use array-contains-any if < 10 items).
There is also no other way to avoid duplicate reads, as you don't know the ID fields that will be matched by the search. Luckily, Firebase has good caching.
For those of you that like promises...
const p = await results.pipe(take(1)).toPromise();
For more info on this, see this article I wrote.
OR isn't supported.
But if you need it, you can do it in your own code.
For example, to query products where (size equals XL OR XXL) AND gender is Male:
productsCollectionRef
//1* first get query where can firestore handle it
.whereEqualTo("gender", "Male")
.addSnapshotListener((queryDocumentSnapshots, e) -> {
if (queryDocumentSnapshots == null)
return;
List<Product> productList = new ArrayList<>();
for (DocumentSnapshot snapshot : queryDocumentSnapshots.getDocuments()) {
Product product = snapshot.toObject(Product.class);
//2* then check your query OR Condition because firestore just support AND Condition
if (product.getSize().equals("XL") || product.getSize().equals("XXL"))
productList.add(product);
}
liveData.setValue(productList);
});
For Flutter (Dart), use this:
db.collection("projects").where("status", whereIn: ["public", "unlisted", "secret"]);
Actually, I found that @Dan McGrath's answer works. Here is a rewrite of his answer:
private void query() {
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("STATUS")
.whereIn("status", Arrays.asList("open", "upcoming")) // you can add up to 10 different values like : Arrays.asList("open", "upcoming", "Pending", "In Progress", ...)
.addSnapshotListener(new EventListener<QuerySnapshot>() {
@Override
public void onEvent(@Nullable QuerySnapshot queryDocumentSnapshots, @Nullable FirebaseFirestoreException e) {
// Both arguments are nullable: bail out on an error or a missing snapshot
if (e != null || queryDocumentSnapshots == null) {
return;
}
for (DocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
// I assume you have a model class called MyStatus
MyStatus status = documentSnapshot.toObject(MyStatus.class);
if (status != null) {
// do something...
}
}
}
});
}
I have nodes named "options". "Users" choose these options. I need a Cypher query that works like this:
retrieve the users who have chosen all of the options given as a list.
MATCH (option:Option)<-[:CHOSE]-(user:User) WHERE option.Key IN ['1','2','3'] RETURN user
This query gives me the users who chose option(1), option(2) and option(3), but it also gives me a user who chose only option(2).
What I need is only the users who chose all of them: option(1), option(2) and option(3).
For an all-Cypher solution (I don't know if it's better than Chris' answer; you'll have to test and compare), you can collect the option.Key values for each user and filter out the users who don't have an option.Key for each value in your list:
MATCH (u:User)-[:CHOSE]->(opt:Option)
WITH u, collect(opt.Key) as optKeys
WHERE ALL (v IN $values WHERE v IN optKeys)
RETURN u
Or match all the options whose keys are in your list together with the users that chose them, collect those options per user, and compare the size of the option collection to the size of your list. If your list has no duplicates, a user whose option collection has the same size has chosen all the options:
MATCH (u:User)-[:CHOSE]->(opt:Option)
WHERE opt.Key IN $values
WITH u, collect(opt) as opts
WHERE size(opts) = size($values) // assuming $values doesn't have duplicates
RETURN u
Either should limit results to users connected with all the options whose key values are specified in $values, and you can vary the length of the list parameter without changing the query.
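If you want to sanity-check the "collect per user, then require every value" logic on sample rows outside the database, it can be sketched like this. The `Choice` shape and `usersWithAllKeys` are hypothetical, standing in for (user, option key) rows:

```typescript
interface Choice {
  user: string;
  key: string;
}

function usersWithAllKeys(choices: Choice[], required: string[]): string[] {
  // Collect the chosen keys per user, like collect(opt.Key) grouped by u.
  const keysByUser: { [user: string]: string[] } = Object.create(null);
  const order: string[] = [];
  for (const c of choices) {
    if (!keysByUser[c.user]) {
      keysByUser[c.user] = [];
      order.push(c.user);
    }
    keysByUser[c.user].push(c.key);
  }
  // Keep users whose keys contain every required value, like the ALL predicate.
  return order.filter((u) => required.every((k) => keysByUser[u].indexOf(k) !== -1));
}
```

A user who chose only some of the required options is dropped because the every check fails for the missing key.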
If the number of options is limited, you could do:
MATCH
(user:User)-[:CHOSE]->(option1:Option),
(user)-[:CHOSE]->(option2:Option),
(user)-[:CHOSE]->(option3:Option)
WHERE
option1.Key = '1'
AND option2.Key = '2'
AND option3.Key = '3'
RETURN
user.Id
Which will only return the user with all 3 options.
It's a bit rubbishy as obviously you end up with 3 lines where you have 1, but I don't know how to do what you want using the IN keyword.
If you're coding against it, it's pretty simple to generate the WHERE and MATCH clause, but still - not ideal. :(
EDIT - Example
Turns out there is some string manipulation going on here (!), but you can always cache bits. Importantly - it's using Params which would allow neo4j to cache the queries and supply faster responses with each call.
public static IEnumerable<User> GetUser(IGraphClient gc)
{
var query = GenerateCypher(gc, new[] {"1", "2", "3"});
return query.Return(user => user.As<User>()).Results;
}
public static ICypherFluentQuery GenerateCypher(IGraphClient gc, string[] options)
{
ICypherFluentQuery query = new CypherFluentQuery(gc);
for(int i = 0; i < options.Length; i++)
query = query.Match(string.Format("(user:User)-[:CHOSE]->(option{0}:Option)", i));
for (int i = 0; i < options.Length; i++)
{
string paramName = string.Format("option{0}param", i);
string whereString = string.Format("option{0}.Key = {{{1}}}", i, paramName);
query = i == 0 ? query.Where(whereString) : query.AndWhere(whereString);
query = query.WithParam(paramName, options[i]);
}
return query;
}
MATCH (user:User)-[:CHOSE]->(option:Option)
WHERE option.key IN ['1', '2', '3']
WITH user, COUNT(*) AS num_options_chosen
WHERE num_options_chosen = size(['1', '2', '3'])
RETURN user.name
This will only return users that have relationships with all the Options with the given keys in the array. It assumes there are no multiple [:CHOSE] relationships between a user and the same option. If a user can have multiple [:CHOSE] relationships with a single option, count distinct options instead, i.e. use COUNT(DISTINCT option) rather than COUNT(*).
I tested the above query with the below dataset:
CREATE (User1:User {name:'User 1'}),
(User2:User {name:'User 2'}),
(User3:User {name:'User 3'}),
(Option1:Option {key:'1'}),
(Option2:Option {key:'2'}),
(Option3:Option {key:'3'}),
(Option4:Option {key:'4'}),
(User1)-[:CHOSE]->(Option1),
(User1)-[:CHOSE]->(Option4),
(User2)-[:CHOSE]->(Option2),
(User2)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option1),
(User3)-[:CHOSE]->(Option2),
(User3)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option4)
And I get only 'User 3' as the output.
For shorter lists, you can use path predicates in your WHERE clause:
MATCH (user:User)
WHERE (user)-[:CHOSE]->(:Option { Key: '1' })
AND (user)-[:CHOSE]->(:Option { Key: '2' })
AND (user)-[:CHOSE]->(:Option { Key: '3' })
RETURN user
Advantages:
Clear to read
Easy to generate for dynamic length lists
Disadvantages:
For each different length, you will have a different query that has to be parsed and cached by Cypher. Too many dynamic queries will send your cache hit rate through the floor, query compilation work will go up, and query performance will go down.
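If you do generate this query, passing the keys as parameters should at least keep it to one query text per list length instead of one per combination of values. A rough sketch; `buildChoseAllQuery` and the key0, key1, ... parameter names are made up for illustration:

```typescript
interface CypherQuery {
  text: string;
  params: { [name: string]: string };
}

// Build one path predicate per key, with the key values passed as parameters.
function buildChoseAllQuery(keys: string[]): CypherQuery {
  if (keys.length === 0) {
    throw new Error("at least one key is required");
  }
  const params: { [name: string]: string } = {};
  const predicates: string[] = [];
  for (let i = 0; i < keys.length; i++) {
    const name = "key" + i;
    params[name] = keys[i];
    predicates.push("(user)-[:CHOSE]->(:Option { Key: $" + name + " })");
  }
  const text =
    "MATCH (user:User)\nWHERE " + predicates.join("\n  AND ") + "\nRETURN user";
  return { text: text, params: params };
}
```

Two calls with different keys but the same number of keys produce identical query text, so only the parameter map changes between them.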