I'm a bit confused about the proper way to query Firebase with GeoFire results.
According to the GeoFire documentation, I should keep locations and user data separate in the tree. So I have the following:
location
  userID
    g
    l
      0: lat
      1: long
user
  userID
    ...
Using a GeoFire query, I have an array of userIDs that are nearby. This is where I get confused: I don't see any methods for Firebase to query an Array of items. I would like to retrieve the user data for every user in the array.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
I don't see any methods for Firebase to query an Array of items. I would like to retrieve the user data for every user in the array.
There indeed isn't a call for that.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
While it may seem inefficient, it actually is not. The reason is that Firebase is quite efficient at loading multiple items, since it retrieves all of them over the same web socket connection and pipelines the requests.
For a more elaborate explanation, see Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
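If it helps, here is a minimal Swift sketch of that loop against the structure from the question (the /user node comes from the question; the function and variable names are just placeholders):

import Foundation
import FirebaseDatabase

// Reference to the /user node that holds the profile data.
let usersRef = Database.database().reference(withPath: "user")

// userIDs is the array of keys returned by the GeoFire query.
func loadUsers(userIDs: [String], completion: @escaping ([String: Any]) -> Void) {
    var profiles: [String: Any] = [:]
    let group = DispatchGroup()

    for userID in userIDs {
        group.enter()
        // Every read travels over the same connection, so the per-request
        // overhead is paid only once.
        usersRef.child(userID).observeSingleEvent(of: .value) { snapshot in
            profiles[userID] = snapshot.value
            group.leave()
        }
    }

    // Fires once every profile has been received.
    group.notify(queue: .main) {
        completion(profiles)
    }
}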
I'm looking to build an app that functions like a dating app:
User A fetches All Users.
User A removes Users B, C, and D.
User A fetches All Users again - excluding Users B, C, and D.
My goal is to perform a query that does not read the User B, C, and D documents in my fetch query.
I've read into array-contains-any, array-contains, not-in queries, but the 10 item limit prevents me from using these as options because the "removed users list" will continue to grow.
2 workaround options I've mulled over are...
Performing a paginated fetch on All User documents and then filtering out on the client side?
Store all User IDs (A, B, C, D) on 1 document in an array field, fetch the 1 document, and then filter client side?
Any guidance would be extremely appreciated either on suggestions around how I store my data or specific queries I can perform.
You can do it the other way around.
Instead of a removed or ignored array on your current user, keep an ignoredBy or removedBy array on each user document and add your current user to it.
When you fetch the users from the users collection, you then only have to check whether the requesting user is part of that ignoredBy array. So you don't have tons of entries to check in the array; it is always just one per document.
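As a rough Swift sketch of that check (the users collection and ignoredBy field names are assumptions, and since Firestore has no "does not contain" filter, the single membership check here happens client-side on each fetched document):

import FirebaseFirestore

let db = Firestore.firestore()

// Fetch a page of users and drop anyone whose ignoredBy array already
// contains the requesting user's ID.
func fetchVisibleUsers(for currentUid: String,
                       completion: @escaping ([QueryDocumentSnapshot]) -> Void) {
    db.collection("users")
        .limit(to: 50)
        .getDocuments { snapshot, error in
            guard error == nil, let documents = snapshot?.documents else {
                completion([])
                return
            }
            // One membership check per document: is the current user
            // listed in this profile's ignoredBy array?
            let visible = documents.filter { doc in
                let ignoredBy = doc.get("ignoredBy") as? [String] ?? []
                return !ignoredBy.contains(currentUid)
            }
            completion(visible)
        }
}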
Firestore may get a little pricey with the Tinder model but you can certainly implement a very extensible architecture, well enough to scale to millions of users, without breaking a sweat. So the user queries a pool of people, and each person is represented by their own document, this much is obvious. The user must then take an action on each person/document, and, presumably, when an action is taken that person should no longer reappear in the user's queries. We obviously can't edit the queried documents because there could be millions of users and that wouldn't scale. And we shouldn't use arrays because documents have byte limits and that also wouldn't scale. So we have to treat a collection like an array, using documents as items, because collections have no known limit to how many documents they can contain.
So when the user takes an action on someone, consider creating a new document in a subcollection in the user's own document (user A, the one performing the query) that contains the person's uid, and perhaps a boolean to determine if they liked or disliked that person (i.e. liked: true), and maybe a timestamp for UI purposes. This new document is the item in your limitless array.
When the user later performs another query, those same users are going to reappear in the results, which you need to filter out. You have no choice but to check if each person's uid is in this subcollection. If it is, omit the document and move to the next. But if your UI is configured like Tinder's, where there isn't a list of people to scroll through but instead cards stacked on top of each other, this is no big deal. The user will only be presented with one person at a time and they won't know how many you're filtering out behind the scenes. With a paginated list, the user may see odd behavior like uneven pages. The drawback is that you're now paying double for each query. Each query will cost you the original fetch and the subcollection-check fetch. But, hey, with this model you can scale to millions of users without ever breaking a sweat.
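A rough Swift sketch of that model, not a definitive implementation (the swipes subcollection and field names are assumptions):

import FirebaseFirestore

let db = Firestore.firestore()

// Record an action: one document per swiped person, keyed by their uid.
func recordSwipe(by uid: String, on otherUid: String, liked: Bool) {
    db.collection("users").document(uid)
        .collection("swipes").document(otherUid)
        .setData([
            "liked": liked,
            "createdAt": FieldValue.serverTimestamp()
        ])
}

// While presenting query results, skip anyone who already has a swipe doc.
func hasAlreadySwiped(by uid: String, on otherUid: String,
                      completion: @escaping (Bool) -> Void) {
    db.collection("users").document(uid)
        .collection("swipes").document(otherUid)
        .getDocument { snapshot, _ in
            completion(snapshot?.exists ?? false)
        }
}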
I have a rather long and complex paginated query that I'm trying to optimize. In the worst case I first have to execute the data query in one call to Neo4j, and then execute pretty much the same query again for the count. Of course, I do everything in one transaction. Still, I don't like the overall execution time, so I extracted the part common to both the data and the count query and execute it in a first call. This common query returns the IDs of nodes, which I then pass as parameters to the rest of the data and count queries. Now everything works much faster. One thing I don't like is that the common query can sometimes return quite a large set of IDs - it can be 20k to 50k Long IDs.
So my question is: since I'm doing this in one transaction, is there a way to keep such a set of IDs somewhere in Neo4j between the common query and the data/count query calls, and just refer to it in the subsequent data/count queries, without moving the IDs back and forth between the application JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But it's usually uncommon to provide both counts and data (even Google doesn't provide "real" counts).
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the IDs back (and don't collect them in an aggregation), you can start using the IDs you've already received to issue your new queries (even in parallel).
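To illustrate the pageSize+1 suggestion (sketched here in Swift as a language-agnostic example; the rows are whatever your driver call returns after asking for LIMIT pageSize + 1):

// Hand back one page and report whether there is more, instead of a real count.
func page<Row>(from rows: [Row], pageSize: Int) -> (page: [Row], hasMore: Bool) {
    let hasMore = rows.count > pageSize
    return (Array(rows.prefix(pageSize)), hasMore)
}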
I recently uploaded 37,000 strings of data to Firebase (the names of all cities in the USA), only to find that it takes way too long to go through each one with the observe .childAdded method and load them into a basic table view.
Is there an alternative? How can I get the data to my app faster? The data shouldn't change, so is there a better approach?
There is no way to load the same data faster. Firebase isn't artificially throttling your download speed, so the time it takes to read the 37,000 strings is the time it takes to read the 37,000 strings.
To make your application respond faster to the user, you will have to load less data. And since it's unlikely your user will read all 37,000 strings, a good first option is to only load the data that they will see.
Since you're describing an auto-complete scenario, I'd first look at using a query to only retrieve child nodes that match what they already typed. In Firebase that'd be something like this:
ref.queryOrdered(byChild: "name")
.queryStarting(atValue: "stack")
.queryEnding(atValue: "stack\u{f8ff}")
This code takes (on the server) all nodes under ref, and orders them by name. It then finds the first one starting with stack and returns all child nodes until it finds one not starting with stack anymore.
With this approach the filtering happens on the server, and the client only has to download the data that matches the query.
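For completeness, a minimal sketch of wiring such a query up in Swift (the cities path and the name child are assumptions about how the strings are stored):

import FirebaseDatabase

let citiesRef = Database.database().reference(withPath: "cities")

func searchCities(prefix: String, completion: @escaping ([String]) -> Void) {
    citiesRef.queryOrdered(byChild: "name")
        .queryStarting(atValue: prefix)
        .queryEnding(atValue: prefix + "\u{f8ff}")
        .queryLimited(toFirst: 25)                 // cap what the UI has to show
        .observeSingleEvent(of: .value) { snapshot in
            let names = (snapshot.children.allObjects as? [DataSnapshot] ?? [])
                .compactMap { $0.childSnapshot(forPath: "name").value as? String }
            completion(names)                      // e.g. reload the table view
        }
}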
This is something easily solvable using Algolia. Algolia can search large data sets without taking much time at all. This way, you query Algolia and never need to look at the Firebase database.
In your Firebase Functions, listen for any new nodes in the place you keep your names of cities, and when that function gets called, add that string to your Algolia index.
You can follow the Algolia docs here: Algolia Docs
Dear Firebase Enthusiasts,
I'm having a problem with compound (multiple-field) queries on Firestore.
I'm developing for iOS.
Here's the data structure I'd like to query on:
Collection X
  Document A
    date: 1532271987 (timestamp)
    users: {
      user_id_1: "true"
      user_id_2: "true"
      user_id_3: "true"
      ...
    }
  Document B
    date: ...
    users: {
      ...
    }
  Documents...
I'd like to query documents that:
contain a user_id_x in their users dictionary
(something like users/user_id_x == true)
AND
have a date greater than SOME_TIMESTAMP
This is basically a compound query in a single API request.
I would also like to order by date and limit the number of fetched items to enable pagination, but that is something I can already manage seamlessly, like this:
Given that I know the number of items I'd like to fetch, and a fromDate timestamp parameter:
collectionReference
    .order(by: "date", descending: false)
    .whereField("date", isGreaterThan: fromDate)
    .limit(to: size)
This works pretty fine.
What doesn't work is having this filtered by a user_id_x parameter on the server side. So I can't simply use the following:
collectionReference
    .order(by: "date", descending: false)
    .whereField("date", isGreaterThan: fromDate)
    .whereField("users/user_id_x", isEqualTo: true)
    .limit(to: size)
What I tried before:
1-
Accessing user_id_x in such a data structure via users/user_id_x (using the /) should work according to the docs, but I couldn't get it to work. (Using users.user_id_x, with the . to access a level deeper, simply crashes in Swift.)
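For reference, this is roughly what I tried with a FieldPath (which the Swift SDK also accepts) instead of the raw string; user_id_x here is just a placeholder:

import FirebaseFirestore

let db = Firestore.firestore()
let userId = "user_id_x"   // placeholder

// FieldPath builds the nested path without string parsing, avoiding the "/"
// issue and problems with IDs containing characters that break dot notation.
let query = db.collection("X")
    .whereField(FieldPath(["users", userId]), isEqualTo: true)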
So the structure of users doesn't have to be like this. It could be an array (which I know is not well suited to querying), or the object could be flattened directly, like this:
Document A
  date: 1532271987 (timestamp)
  user_id_1: "true"
  user_id_2: "true"
  user_id_3: "true"
  ...
However, in this case Firestore must create an individual index for each user_id_x present, and that still doesn't work for a compound query anyway.
2-
I tried creating composite keys (the idea behind libraries like Querybase), where for example:
Document A
  date: 1532271987 (timestamp)
  users: {
    user_id_1: "true"
    user_id_2: "true"
    user_id_3: "true"
    ...
  }
  date_user_id_1: "true_1532271987"
  date_user_id_2: "true_1532271987"
  ...
  date_user_id_x: "true_SOME_TIMESTAMP"
So this structure would allow me to query for something like:
....whereField("date_user_id_x", isEqualTo: true_SOME_TIMESTAMP)
But this would only fetch items matching an exact date. To query for something like
....whereField("date_user_id_x", isGreaterThan: true_SOME_TIMESTAMP)
which uses isGreaterThan, Firestore complains (in the console) that it must create an index called date_user_id_x for each such field. This would have to be done each and every time a user becomes related to a timestamp; I'm not sure how to automate that, and it would be far too much indexing work anyway. Most importantly,
I'm pretty lost on how to make this work the exact way I'd like it to.
3-
Client-side filtering
is something I'm trying to avoid.
The idea would be to pull, say, 10 items from the global collection and then filter them locally on the device, e.g. keep only items that have user_id_x: true. But this is simply impractical for a social media app like the one I'm working on, where hundreds and hundreds of users will be present.
I'm trying to avoid it because it would use the client's network aggressively, and might (most likely) result in fetches that contain no items relating to the specific user we want to filter on (so I'd have to build a mechanism that queries again, and again, until it pulls X items, which is practically too hard to manage). Moreover, I'll be fetching image data based on the results, so any mistake here can result in too much data consumption for the user.
4-
Creating another dictionary/object to split this job into multiple queries
I could create another dictionary to divide the query workload: first filter it by users, then by dates, and then pull the matching Document_X's using the results from that dictionary. But this is too hard to manage, since there are 30+ jobs and cases in my app that result in changes to the Document_X, user_id_X and some_date relationships. Maintaining these in a single dictionary is fine, but managing them across two dictionaries would require writing 30+ Google Cloud Functions that simply mirror changes from the first dictionary to the second, which is not practical at all. I've been down that road and gave up after my third compensating function, since I quickly filled my data usage quota on Firebase (luckily I was on the trial).
Additional notes:
I've checked some other SO posts, which basically address the few solutions I mention above. I also delved into the Google forums and learned that Google has been working on this since 2015, but with no results yet.
I think there should be a clever way to do this easily. I would very much appreciate your comments, thoughts, and guidance here.
I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with the coding, but there's one thing left I can't solve with CouchDB - the per-user timeline.
As with Twitter, the per-user timeline should show the tweets of all the people I'm following, in chronological order. With SQL it's quite a simple SELECT statement, but I don't know how to reproduce this with CouchDB's map/reduce.
Here's the SQL-Statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN (1,5,20,33,...) ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
  "_id": "xxxxxxx",
  "_rev": "yyyyyy",
  "type": "user",
  "user_id": 1,
  "username": "john",
  ...
}
tweet-schema:
{
  "_id": "xxxx",
  "_rev": "yyyy",
  "type": "tweet",
  "text": "Sample Text",
  "user_id": 1,
  ...
  "created_at": "2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1 ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with IDs 1, 2, 3, ..., ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save created_at as a timestamp and then create a view that maps all tweets to their user_id:
function(doc) {
  if (doc.type == 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the results in your application however you want (most languages have a sort method for arrays).
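A rough sketch of that request against CouchDB's HTTP view API, in Swift to match the rest of the thread (the database, design document, and view names are assumptions, and created_at is assumed to be stored in the sortable string format from the question):

import Foundation

// POST the followed user_ids as "keys" to the view above, then sort the
// tweets by created_at in the application.
func fetchTimeline(userIds: [Int], completion: @escaping ([[String: Any]]) -> Void) {
    let url = URL(string: "http://localhost:5984/twitter/_design/tweets/_view/by_user")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(withJSONObject: ["keys": userIds])

    URLSession.shared.dataTask(with: request) { data, _, _ in
        guard let data = data,
              let object = try? JSONSerialization.jsonObject(with: data),
              let json = object as? [String: Any],
              let rows = json["rows"] as? [[String: Any]] else {
            completion([])
            return
        }
        // Each row's "value" is the tweet document emitted by the map function.
        let tweets = rows
            .compactMap { $0["value"] as? [String: Any] }
            .sorted { ($0["created_at"] as? String ?? "") > ($1["created_at"] as? String ?? "") }
        completion(tweets)
    }.resume()
}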
Is that a CouchDB-only app, or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" for each tweet. It allows user-specific (partial) views, but also introduces the complexity of adding the list of readers for each new tweet, or even updating the list in case of new followers or unfollow operations.
It's important to think about the possible operations and their frequencies. If you're mostly generating lists of tweets, it's better to shift the complexity into how you integrate the reader information into your documents (i.e. integrating the readers into your tweet doc), so that you can then easily build efficient view indices.
If you have many changes to your data, it's better to design your database not to update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where a simple (one-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user IDs (given the fact that you also need partial ranges for both). This is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.