Multiple querying on Firebase / Firestore - iOS

Dear Firebase Enthusiasts,
I'm having a problem with compound (multiple-condition) queries on Firestore.
I'm developing for iOS.
Here's the data structure I'd like to query on:
Collection X
- Document A
  - date: 1532271987 (timestamp)
  - users: {
    - user_id_1: "true"
    - user_id_2: "true"
    - user_id_3: "true"
    - ...
  }
- Document B
  - date: ...
  - users: {
    - ...
  }
- Documents...
I'd like to query documents that:
contain a user_id_x in their users dictionary
(something like users/user_id_x == true)
AND
have a date greater than SOME_TIMESTAMP,
which is basically a compound query in the same API request.
I would also like to order by date and limit the number of fetched items to enable pagination, but that is something I can already manage seamlessly, like this:
Given that I know the number of items I'd like to fetch, and a fromDate timestamp parameter:
collectionReference
.order(by: "date", descending: false)
.whereField("date", isGreaterThan: fromDate)
.limit(to: size)
This works fine.
What doesn't work is getting this filtered by a user_id_x parameter on the server side. So I can't simply use the following:
collectionReference
.order(by: "date", descending: false)
.whereField("date", isGreaterThan: fromDate)
.whereField("users/user_id_x", isEqualTo: true)
.limit(to: size)
What I tried before:
1-
Accessing user_id_x on such a data structure via users/user_id_x (using the /) should work according to the docs, but I simply couldn't get it to work (using users.user_id_x, with the . to access a level deeper, simply crashes in Swift).
So the structure of users doesn't have to stay like this. It could be an array (which I know is not well suited for querying), or the fields could sit directly on the document, like this:
- Document A
  - date: 1532271987 (timestamp)
  - user_id_1: "true"
  - user_id_2: "true"
  - user_id_3: "true"
  - ...
However, in this case Firestore must create an individual index for each user_id_x present, and that doesn't work for compound queries either.
2-
I tried creating combined index fields (the idea behind libraries like Querybase), for example:
- Document A
  - date: 1532271987 (timestamp)
  - users: {
    - user_id_1: "true"
    - user_id_2: "true"
    - user_id_3: "true"
    - ...
  }
  - date_user_id_1: "true_1532271987"
  - date_user_id_2: "true_1532271987"
  - ...
  - date_user_id_x: "true_SOME_TIMESTAMP"
This structure allows me to query for something like:
....whereField("date_user_id_x", isEqualTo: "true_SOME_TIMESTAMP")
But that only fetches items matching an exact date. To query for something like
....whereField("date_user_id_x", isGreaterThan: "true_SOME_TIMESTAMP")
which uses isGreaterThan, Firestore complains (in the console) that it must create an index called date_user_id_x for each such field. That would have to happen every time a user becomes related to a timestamp; I'm not sure how to automate it, and it would be far too much indexing work anyway. Most importantly,
I'm pretty lost as to how to make this work exactly the way I'd like it to.
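For what it's worth, one reason range queries over string composites like "true_1532271987" misbehave is that string comparison is lexicographic, not numeric. A minimal sketch in plain JavaScript (not Firestore code; compositeKey is a hypothetical helper) of how zero-padding the timestamp makes the string order match the numeric order:

```javascript
// Hypothetical helper: build a composite key whose lexicographic order
// matches the numeric order of the timestamp, by zero-padding it to a
// fixed width (13 digits covers millisecond epochs).
function compositeKey(flag, timestamp) {
  return flag + "_" + String(timestamp).padStart(13, "0");
}

// Without padding, "true_999" sorts AFTER "true_1532271987" ("9" > "1");
// with padding, the string order matches the timestamp order.
const unpadded = "true_999" < "true_1532271987";                              // false
const padded = compositeKey("true", 999) < compositeKey("true", 1532271987);  // true
```

This only fixes the ordering of the composite string; it does not remove the need for the per-field indexes Firestore asks for.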
3-
Client Side filtering
is something I'm trying to avoid.
The idea would be: pull 10 items from the global collection, then filter them locally on the device, e.g. keep only items that have user_id_x: true. But this is simply impractical for a social media app like the one I'm working on, where hundreds and hundreds of users will be present.
I'm trying to avoid it since it would use the client's network aggressively, and might (most likely) result in fetches containing no items related to the specific user we want to filter for (so I'd have to build a mechanism that queries again, and again, until it pulls some X number of items, which is practically too hard to manage). Moreover, I'll be fetching image data based on the results, so any mistake here can result in too much data consumption for the user.
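For concreteness, the repeated-query mechanism described above might look like the following sketch (plain JavaScript, with fetchPage(fromDate, size) as a hypothetical stand-in for an ordered, paginated read); the open-ended loop is exactly the cost being avoided:

```javascript
// Client-side filtering sketch: keep pulling pages and filtering locally
// until `wanted` matching items are found. Every discarded page is wasted
// bandwidth, which is why this approach scales badly.
async function fetchFiltered(fetchPage, userId, fromDate, pageSize, wanted) {
  const matches = [];
  let cursor = fromDate;
  while (matches.length < wanted) {
    const page = await fetchPage(cursor, pageSize);
    if (page.length === 0) break;                 // ran out of data
    for (const doc of page) {
      if (doc.users && doc.users[userId] === true) matches.push(doc);
    }
    cursor = page[page.length - 1].date;          // advance the pagination cursor
  }
  return matches.slice(0, wanted);
}
```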
4-
Creating another dictionary/object to split this job into multiple queries
I could create another dictionary to divide the query workload: first filter it by users, then by dates, and then pull the matching Document_Xs using the results from that dictionary. But this is too hard to maintain, since there are 30+ jobs and cases in my app that change Document_X, user_id_X and some_date relationships. Maintaining these in a single dictionary is fine, but managing them across two dictionaries would require writing 30+ Google Cloud Functions that simply mirror changes from the first dictionary to the second, which is not practical at all. I've been down that road and gave up after my third compensating function, since I quickly filled my data usage quota on Firebase (luckily I was on the trial plan).
Additional notes:
I've checked some other SO posts, which basically cover the few approaches I mention above. I also delved into the Google forums and learned that Google has been working on this since 2015, but with no results yet.
I think there should be a clever way to do this easily. I would very much appreciate your comments, thoughts and guidance here.

Related

A faster way to get data from Firebase?

I recently uploaded 37,000 strings of data to Firebase (the names of all cities in the USA). I found that it takes way too long to go through each one using the observe .childAdded method to load it into a basic table view.
Is there an alternative? How can I get the data to my app faster? The data shouldn't change, so is there an alternative?
There is no way to load the same data faster. Firebase isn't artificially throttling your download speed, so the time it takes to read the 37,000 strings is the time it takes to read the 37,000 strings.
To make your application respond faster to the user, you will have to load less data. And since it's unlikely your user will read all 37,000 strings, a good first option is to only load the data that they will see.
Since you're describing an auto-complete scenario, I'd first look at using a query to only retrieve child nodes that match what they already typed. In Firebase that'd be something like this:
ref.queryOrdered(byChild: "name")
.queryStarting(atValue: "stack")
.queryEnding(atValue: "stack\u{f8ff}")
This code takes (on the server) all nodes under ref, and orders them by name. It then finds the first one starting with stack and returns all child nodes until it finds one not starting with stack anymore.
With this approach the filtering happens on the server, and the client only has to download the data that matches the query.
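The reason the "stack" … "stack\u{f8ff}" bounds select exactly the prefix matches is that \u{f8ff} is a very high code point, so every string beginning with "stack" sorts between the two bounds. A quick sketch of the same comparison in plain JavaScript (the names array is made up):

```javascript
// Build the lower/upper bounds for a prefix query: any string starting
// with `prefix` compares >= prefix and <= prefix + "\uf8ff", because
// \uf8ff sorts above practically every other character.
function prefixRange(prefix) {
  return { start: prefix, end: prefix + "\uf8ff" };
}

const { start, end } = prefixRange("stack");
const names = ["stack", "stackoverflow", "stacks", "standard", "star"];
const hits = names.filter(n => n >= start && n <= end);
// hits: ["stack", "stackoverflow", "stacks"]
```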
This is something easily solvable using Algolia. Algolia can search large data sets without taking much time at all. This way, you query Algolia and never need to look at the Firebase database.
In your Firebase Functions, listen for any new nodes in the place you keep your names of cities, and when that function gets called, add that string to your Algolia index.
You can follow the Algolia docs here: Algolia Docs

GeoFire and Firebase Query

I'm a bit confused about the proper way to query Firebase with GeoFire results.
Reading the directions for GeoFire, I should keep locations and user data separate in the tree. So I have the following:
- location
  - userID
    - g
    - l
      - 0: lat
      - 1: long
- user
  - userID
    - ...
Using a GeoFire query, I have an array of userIDs that are nearby. This is where I get confused: I don't see any methods for Firebase to query an Array of items. I would like to retrieve the user data for every user in the array.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
I don't see any methods for Firebase to query an Array of items. I would like to retrieve the user data for every user in the array.
There indeed isn't a call for that.
I could loop through the list and make multiple requests, but that seems to be an inefficient way to get the desired results.
While it may seem inefficient, it actually is not. The reason is that Firebase is quite efficient at loading multiple items: it retrieves all of them over the same web socket connection and pipelines the requests.
For a more elaborate explanation, see Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
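Under that reasoning, the per-ID reads can simply be issued in parallel; a sketch in plain JavaScript, with fetchUser as a hypothetical stand-in for a single /user/<userID> read:

```javascript
// Fire one read per nearby userID and wait for all of them. Because
// Firebase pipelines the requests over one socket, N parallel reads
// cost close to one round trip, not N.
function loadUsers(fetchUser, userIds) {
  return Promise.all(userIds.map(id => fetchUser(id)));
}
```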

Dynamic Queries using Couch_Potato

The documentation for creating a fairly straightforward view is easy enough to find:
view :completed, :key => :name, :conditions => 'doc.completed === true'
How, though, does one construct a view with a condition created on the fly? For example, if I want to use a query along the lines of
doc.owner_id == my_var
Where my_var is set programmatically.
Is this even possible? I'm very new to NoSQL so apologies if I'm making no sense.
Views in CouchDB are incrementally built / indexed as data is inserted / updated into that particular database. So in order to take full advantage of the power behind views you won't want to dynamically query them. You'll want to construct your views in such a way that you can efficiently access the data based on the expected usage patterns of the application. In my experience it's not uncommon to have multiple views each giving you a different way to access / query the same data. I find it helpful to think of CouchDB views as a way to systematically denormalize your documents.
On the other hand there are also ways to generalize your indexes in your views so you can use a single view for endless combinations of queries.
For example, you have an "articles" database, and each article document contains a list of tags. If you want to set up a query to dynamically retrieve all articles tagged with a handful of tags, you could emit multiple entries to the view on the same document:
// this article is tagged with "tag1","tag2","tag3"
emit("tag1",doc._id);
emit("tag2",doc._id);
emit("tag3",doc._id);
....
Now you have a way to query: Give me all articles tagged with these words: ["tag1","tag2",etc]
For more info on how to query multiple keys see "Parameter -> keys" in the table of Querying Options here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
One problem with the above example is it would produce duplicates if a single document was tagged with both or all of the tags you were querying for. You can easily de-dupe the results of the view by using a CouchDB "List Function". More info about list functions can be found here:
http://guide.couchdb.org/draft/transforming.html
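To make the duplicate issue concrete, here is a plain-JavaScript simulation of the tag view above (queryByTags is a made-up helper, not CouchDB API): each document emits one row per tag, so a document carrying several of the requested tags shows up several times unless the results are de-duplicated by _id, which is what a list function would do server-side.

```javascript
// Simulate the map function: one emitted row per (tag, doc) pair, then a
// keys=[...] query with de-duplication by document id.
function queryByTags(docs, tags) {
  const rows = [];
  for (const doc of docs) {
    for (const tag of doc.tags) rows.push({ key: tag, id: doc._id }); // emit(tag, id)
  }
  const wanted = new Set(tags);
  const seen = new Set();
  return rows.filter(row => {
    if (!wanted.has(row.key) || seen.has(row.id)) return false;
    seen.add(row.id);
    return true;
  });
}
```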
Another way to construct views for even more robust "dynamic" access to the data would be to compose your indexes out of complex data types such as JavaScript arrays. Also incorporating "range queries" can help. So for example if you have a 3-item array in your index, but only have the first 2 values, you can set up a range query to pull all documents that match the first 2 items of the array. Some useful info about that can be found here:
http://guide.couchdb.org/draft/views.html
Refer to the "startkey", and "endkey" options under "Querying Options" table here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
It's good to know how CouchDB indexes itself. It uses a "B+ tree" data structure:
http://guide.couchdb.org/draft/btree.html
Keep this in mind when thinking about how to compose your indexes. This has specific implications about how you need to construct your indexes. For example, you can't expect to get good performance on a view if you query with a range on the first item in the array. For example:
startkey = [a,1,2]
endkey = [z,1,2]
You'll get the performance you'd expect if your query is:
startkey = [1,2,a]
endkey = [1,2,z]
This, in more general terms, means that index order does matter when querying views. Not just on basis of performance, but on basis of what documents will be returned. If you index a document in a view with [1,2,3], you can't expect it to show up in query for index [3,2,1], [2,1,3], or any other combination.
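A simplified sketch of CouchDB's element-by-element array key comparison (plain JavaScript, covering only comparable scalar elements of the same type) makes this concrete: a range on the first element spans the whole keyspace between the bounds, so the trailing elements of startkey/endkey do not act as filters.

```javascript
// Compare two array keys element by element, like CouchDB's collation
// does (simplified: assumes comparable scalar elements of the same type).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length; // shorter key sorts first
}

function inRange(key, startkey, endkey) {
  return compareKeys(startkey, key) <= 0 && compareKeys(key, endkey) <= 0;
}

// Fixed prefix, range on the LAST element: selects exactly [1, 2, *].
const good = inRange([1, 2, "m"], [1, 2, "a"], [1, 2, "z"]);   // true
// Range on the FIRST element: ["m", 9, 9] falls inside the bounds even
// though its trailing elements match neither startkey nor endkey.
const leaky = inRange(["m", 9, 9], ["a", 1, 2], ["z", 1, 2]);  // true
```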
In my experience, most data-access problems can be solved elegantly and efficiently with CouchDB and the basic tools it provides. If / when your project needs true dynamic access to the data, I generally still use CouchDB for common data access needs, but I'll also integrate ElasticSearch using an ElasticSearch plugin which streams your data from CouchDB into ElasticSearch as it becomes available:
http://www.elasticsearch.org/
https://github.com/elasticsearch/elasticsearch-river-couchdb

Write native SQL in Core Data

I need to write a native SQL query while using Core Data in my project. I really need to do this, since I'm using NSPredicate right now and it's not efficient enough (in just one single case). I just need to write a couple of subqueries and joins to fetch a big number of rows and sort them by a special field. In particular, I need to sort by the sum of values of their child entities. Right now I'm fetching everything using NSPredicate and then sorting my result (an array) manually, but this just takes too long, since there are many thousands of results.
Please correct me if I'm wrong, but I'm pretty sure this can't be a huge challenge, since there is a way to use SQLite in iOS applications.
It would be awesome if someone could point me in the right direction.
Thanks in advance.
EDIT:
Let me explain what I'm doing.
Here's my Coredata model:
And here's how my result looks on the iPad:
I'm showing a table with one row per customer, where every customer has the amount of sales they made from January to June 2012 (Last) AND 2013 (Curr). Next to Curr there's the variance between those two values; the same for gross margin and coverage ratio.
Every customer is saved in the Kunde table, and every Kunde has a couple of PbsRows. A PbsRow actually holds the sum of sales amounts per month.
So, to show these results, I fetch all the PbsRows between January and June 2013 and then do this:
self.kunden = [NSMutableOrderedSet orderedSetWithArray:[pbsRows valueForKeyPath:@"kunde"]];
Now I have all customers (Kunde) which have records between January and June 2013.
Then I'm using a for loop to calculate the sum for each single customer.
The idea is to get the amounts of sales of the current year and compare them to the last year.
The bad thing is that there are a lot of customers and the for-loop just takes very long :-(
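Language aside (the app itself is Core Data / Objective-C), the aggregation that for-loop performs amounts to a single grouped sum, sketched here in plain JavaScript with made-up field names:

```javascript
// One pass over PbsRow-like records: group by customer and sum the sales
// amount, instead of re-scanning all rows once per customer.
function sumByCustomer(rows) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row.kunde, (totals.get(row.kunde) || 0) + row.amount);
  }
  return totals;
}
```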
This is a bit of a hack, but... The SQLite library is capable of opening more than one database file at a given time. It would be quite feasible to open the Core Data DB file (read/only usage) directly with SQLite and open a second file in conjunction with this (reporting/temporary tables). One could then execute direct SQL queries on the data in the Core Data DB and persist them into a second file (if persistence is needed).
I have done this sort of thing a few times. There are features available in the SQLite library (example: full-text search engine) that are not exposed through Core Data.
If you want to use Core Data there is no supported way to do a SQL query. You can fetch specific values and use [NSExpression expressionForFunction:arguments:] with a sum: function.
To see what SQL commands Core Data executes, add -com.apple.CoreData.SQLDebug 1 to "Arguments Passed on Launch". Note that this should not tempt you to use the SQL commands yourself; it's just for debugging purposes.
Short answer: you can't do this.
Long answer: Core Data is not a database per se - it's not guaranteed to have anything relational backing it, let alone a specific version of SQLite that you can query against. Furthermore, going mucking around in Core Data's persistent store files is a recipe for disaster, especially if Apple decides to change the format of that file in some way. You should instead try to find better ways to optimize your usage of NSPredicate or start caching the values you care about yourself.
Have you considered using the KVC collection operators? For example, if you have an entity Foo each with a bunch of children Bar, and those Bars have a Baz integer value, I think you can get the sum of those for each Foo by doing something like:
[foo valueForKeyPath:@"bars.@sum.baz"]
Not sure if these are applicable to predicates, but it's worth looking into.

Building a (simple) twitter-clone with CouchDB

I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with the coding, but there's one thing left I can't solve with CouchDB: the per-user timeline.
As on Twitter, the per-user timeline should show the tweets of all the people I'm following, in chronological order. With SQL it would be quite a simple SELECT statement, but I don't know how to reproduce this with CouchDB's map/reduce.
Here's the SQL-Statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN [1,5,20,33,...] ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
  "_id": "xxxxxxx",
  "_rev": "yyyyyy",
  "type": "user",
  "user_id": 1,
  "username": "john",
  ...
}
tweet-schema:
{
  "_id": "xxxx",
  "_rev": "yyyy",
  "type": "tweet",
  "text": "Sample Text",
  "user_id": 1,
  ...
  "created_at": "2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1, ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with IDs 1, 2, 3, ..., ordered chronologically"? Do I need another schema for my application?
The best way to do this would be to save created_at as a timestamp and then create a view that maps all tweets to their user_id:
function(doc) {
  if (doc.type == 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the rows however you want in your application (most languages have a sort method for arrays).
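Put together, the suggested flow might look like this sketch in plain JavaScript (the rows shape mimics CouchDB view rows; timeline is a made-up helper): query the view with the followed user IDs as keys, then sort the merged rows by created_at descending in the application.

```javascript
// rows: CouchDB-style view rows, key = user_id, value = tweet doc with a
// numeric created_at timestamp. followedIds: the users whose tweets make
// up the timeline.
function timeline(rows, followedIds) {
  const wanted = new Set(followedIds);
  return rows
    .filter(r => wanted.has(r.key))
    .sort((a, b) => b.value.created_at - a.value.created_at); // newest first
}
```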
Is this a CouchDB-only app? Or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" to each tweet. That allows user-specific (partial) views, but also introduces the complexity of adding the list of readers to each new tweet, or even updating the list on new follow or unfollow operations.
It's important to think about the possible operations and their frequencies. If you're mostly generating lists of tweets, it's better to shift the complexity into how the reader information is integrated into your documents (i.e. embed the readers in your tweet doc), so you can easily build efficient view indexes.
If you have many changes to your data, it's better to design your database so that it doesn't update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via more complex views.
But you have shown an edge case where a simple (one-dimensional) list-based index is not enough. You'd actually need secondary indexes to filter by time and user IDs (given that you also need partial ranges for both). That is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.