Querying Firebase Firestore Data - iOS

I'm looking to build an app that functions like a dating app:
User A fetches All Users.
User A removes Users B, C, and D.
User A fetches All Users again - excluding Users B, C, and D.
My goal is to perform a query that does not read the User B, C, and D documents in my fetch query.
I've read up on array-contains-any, array-contains, and not-in queries, but the 10-item limit prevents me from using these as options because the "removed users" list will continue to grow.
Two workaround options I've mulled over are...
Performing a paginated fetch of all User documents and then filtering out removed users on the client side?
Storing all User IDs (A, B, C, D) in an array field on one document, fetching that one document, and then filtering client side?
Any guidance would be extremely appreciated either on suggestions around how I store my data or specific queries I can perform.

You can do it the other way around.
Instead of a removed or ignored array on your current user, you keep an ignoredBy or removedBy array on each user document, to which you add your current user.
Then, when you fetch the users from the users collection, you just have to check whether the requesting user is part of the ignoredBy array. So you don't have tons of entries to check in the array; it is always just one.
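A minimal Swift sketch of that idea (the "users" collection and "ignoredBy" field names are assumptions; note that Firestore has no "array-does-not-contain" operator, so the membership check happens after the fetch, but it is a single contains() test per document rather than an ever-growing list):

import FirebaseFirestore

func fetchVisibleUsers(for currentUid: String,
                       completion: @escaping ([QueryDocumentSnapshot]) -> Void) {
    Firestore.firestore().collection("users").getDocuments { snapshot, error in
        guard let documents = snapshot?.documents, error == nil else {
            completion([])
            return
        }
        // One membership test per document; the array we check never grows
        // with the number of users the current user has removed.
        let visible = documents.filter { doc in
            let ignoredBy = doc.get("ignoredBy") as? [String] ?? []
            return !ignoredBy.contains(currentUid)
        }
        completion(visible)
    }
}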

Firestore may get a little pricey with the Tinder model, but you can certainly implement a very extensible architecture, well enough to scale to millions of users, without breaking a sweat.

So the user queries a pool of people, and each person is represented by their own document; this much is obvious. The user must then take an action on each person/document, and, presumably, when an action is taken, that person should no longer reappear in the user's queries. We obviously can't edit the queried documents because there could be millions of users, and that wouldn't scale. And we shouldn't use arrays because documents have byte limits, and that also wouldn't scale. So we have to treat a collection like an array, using documents as items, because collections have no known limit to how many documents they can contain.
So when the user takes an action on someone, consider creating a new document in a subcollection of the user's own document (user A, the one performing the query) that contains the person's uid, perhaps a boolean to determine whether they liked or disliked that person (e.g. liked: true), and maybe a timestamp for UI purposes. This new document is the item in your limitless array.
When the user later performs another query, those same users are going to reappear in the results, which you need to filter out. You have no choice but to check if each person's uid is in this subcollection. If it is, omit the document and move to the next. But if your UI is configured like Tinder's, where there isn't a list of people to scroll through but instead cards stacked on top of each other, this is no big deal. The user will only be presented with one person at a time and they won't know how many you're filtering out behind the scenes. With a paginated list, the user may see odd behavior like uneven pages. The drawback is that you're now paying double for each query. Each query will cost you the original fetch and the subcollection-check fetch. But, hey, with this model you can scale to millions of users without ever breaking a sweat.
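A rough Swift sketch of this layout (the "actions" subcollection name is an assumption; the liked flag and timestamp come from the description above). Using the other person's uid as the subcollection document ID makes the later existence check a single direct read:

import FirebaseFirestore

let db = Firestore.firestore()

// Record a swipe: one document per person acted on, stored under the
// acting user. The document ID is the target's uid.
func recordAction(by currentUid: String, on targetUid: String, liked: Bool) {
    db.collection("users").document(currentUid)
      .collection("actions").document(targetUid)
      .setData([
          "liked": liked,
          "timestamp": FieldValue.serverTimestamp()
      ])
}

// While paging through query results, skip anyone already acted on.
func hasActed(_ currentUid: String, on targetUid: String,
              completion: @escaping (Bool) -> Void) {
    db.collection("users").document(currentUid)
      .collection("actions").document(targetUid)
      .getDocument { snapshot, _ in
          completion(snapshot?.exists ?? false)
      }
}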

Removing specific objects in arrays from references in dictionary

My question:
I need to know if what I'm doing is the best way, and if it's not, what is?
The situation:
I have "Contacts" objects in an array. These contacts must be ordered alphabetically and can have multiple phone numbers. I'm splitting that array into 27 arrays of contacts, where each of them represents a letter of the alphabet. So I have all my "A" contacts, then "B", and so on.
During the "splitting", I also add a reference to each contact in a dictionary, where the object is the contact and the key is its phone number.
Because one contact can have X phone numbers, the same contact can appear in X different entries in the dictionary. I need that so I can find any contact by any of its numbers.
All of the above works like a charm.
Now I need to compare all those numbers against my online database (note: I'm using Parse) to see whether some of these contacts are already users. If they are, they need to be put in a specific section of my tableview (my tableview is just all the contacts, separated into letter sections, plus one "users" section). A contact cannot appear in both the users section and a letter section; if a contact is a user, they must be separated out.
What I'm asking vs. what I'm doing:
Right now, I'm just re-looping every array and comparing each element to all the users I've found online. This is a lot of looping and looks like a waste of time and resources.
What I would like to do: somehow clean my arrays of the users I've found, considering I have a reference to the contact object in my dictionary.
TL;DR:
My arrays:
users in the first section, then contacts alphabetically
[[user1, user2, user3, ...],[a1,a2,a3,...],[b1,b2,...],...]
My dictionary:
a1 - phone1
a1 - phone2
a1 - phone3
a2 - phone1
a3 - phone1
...
The ultimate question:
I can very easily find the contact object (since I have its number from my online DB). If I interact with the a1 from the dictionary, will it also change the a1 in the array of arrays?
More specifically, can I somehow REMOVE it from the arrays considering I don't know which one it is in?
I also add a reference to each contact in a dictionary, where the object is the contact and the key is its phone number.
You need to be very careful with this approach. It is likely to have collisions. While cell phone numbers are often unique, sometimes they're shared. Home and work numbers are often shared. Phone numbers get reassigned, so your database can wind up with duplicates that way, too. And sometimes people just enter mistaken data. You have to make sure your system behaves reasonably and consistently in those cases. Many behaviors are fine, but "pick one at random" is generally not.
You should think carefully here about what is your model and what is your presentation. One data structure should be the single source of truth. Usually that's going to be your big list of contacts. That has nothing at all to do with how it's displayed. We call this the model. Often it's kept track of as an NSSet since order doesn't matter.
Above that you should have a "view model" that deals with grouping and sorting issues. It is not "truth." You should be willing to throw it away anytime the underlying data changes extensively. Its data structures should point to the underlying model, and should be stored in a way that exactly matches what the table view wants. Keeping the model and the view model separate is one of the best ways to keep your complexity under control. Then you know that there is exactly one place that data can change (the model), and everything else just reacts to that.
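As a loose Swift illustration of that split (the Contact type and the grouping rule here are hypothetical, purely to show a model feeding a disposable, table-shaped view model):

struct Contact: Hashable {
    let name: String
    let phoneNumbers: [String]
}

// The model (a Set) is the single source of truth; this view model is
// derived, disposable presentation state shaped exactly like the table view.
struct ContactsViewModel {
    let sections: [(title: String, contacts: [Contact])]

    init(model: Set<Contact>) {
        let grouped = Dictionary(grouping: model) { contact in
            String(contact.name.prefix(1)).uppercased()
        }
        sections = grouped.keys.sorted().map { letter in
            (title: letter, contacts: grouped[letter]!.sorted { $0.name < $1.name })
        }
    }
}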
Regarding your partitioning problem: so you have a list of contacts and you want to separate them into two groups based on whether any of their phone numbers appear in another list. If your total list is only a few dozen entries long, frankly it doesn't matter how you do this. Even O(n^2) is fine for small enough n. Focus on making it simple and reliable first, then profile with Instruments to see where the real bottlenecks are.
That said, usually the fastest way to determine set intersection is to sort both sets and walk through them both at the same time. So you'd create a "contacts" array of "phone number + contact pointer" and a "users" array of just phone numbers. Sort them both by phone number. Walk over them, comparing the current element of each list, and then incrementing the index of the smaller one. If no match, put the contact in one list. If a match, put it in the other.
But I'd probably just stick all the phone numbers in a set and use member: to look them up. It's just usually easier.
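A small sketch of that set approach in Swift (reusing the hypothetical Contact type from the sketch above):

// Split contacts into users and non-users with O(1) average lookups,
// instead of re-looping the full user list for every contact.
func partition(contacts: [Contact], userNumbers: Set<String>)
    -> (users: [Contact], nonUsers: [Contact]) {
    var users: [Contact] = []
    var nonUsers: [Contact] = []
    for contact in contacts {
        if contact.phoneNumbers.contains(where: { userNumbers.contains($0) }) {
            users.append(contact)
        } else {
            nonUsers.append(contact)
        }
    }
    return (users, nonUsers)
}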

Any tips for displaying user dependent values in a list of items?

It is pretty common for a web application to display a list of items and for each item in the list to indicate to the current user whether they have already viewed the associated item.
An approach that I have taken in the past is to store HasViewed objects that contain the Id of a viewed item/object and the Id of the User who has viewed that item/object.
When it comes time to display a list of items, this requires querying the database for the items, separately querying the database for the HasViewed objects, and then combining the results of these queries into a set of objects constructed solely for the purpose of displaying them in the view.
Each li element then uses, e.g., the has_viewed property of the objects constructed above.
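Roughly, that combining step looks like this (a Swift sketch; the types and names are illustrative, not from my real code):

struct Item { let id: Int; let title: String }
struct ItemViewModel { let item: Item; let hasViewed: Bool }

// One query returned the items; a second returned the ids of the items the
// current user has viewed. Join them in memory for the view layer.
func buildViewModels(items: [Item], viewedItemIds: Set<Int>) -> [ItemViewModel] {
    items.map { ItemViewModel(item: $0, hasViewed: viewedItemIds.contains($0.id)) }
}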
I think it is time to find a better approach and would like to know what approaches you would recommend.
I want to know whether there is a better idea as well.
Right now my solution is putting the state in Redis and caching the view with a fragment cache.

RavenDB - How to Use in Production with Many Inserts?

After hearing about NoSQL for a couple years I finally started playing with RavenDB today in an .Net MVC app (simple blog). Getting the embedded database up and running was pretty quick and painless.
However, I've found that after inserting objects into the document store, they are not always there when the subsequent page refreshes. When I refresh the page they do show up. I read somewhere that this is due to stale indexes.
My question is: how are you supposed to use this in production on a site with inserts happening all the time (for example, e-commerce)? Isn't this always going to result in stale indexes and unreliable query results?
Think of what actually happens with a traditional database like SQL Server.
When an item is created, updated, or deleted from a table, any indexes associated with the table also have to be updated.
The more indexes you have on a table, the slower your write operations will be.
If you create a new index on an existing table, it isn't used at all until it is fully built. If no other index can answer a query, then a slow table scan occurs.
If others attempt to query an existing index while it is being modified, the reader will block until the modification is complete, because Consistency is given higher priority than Availability.
This can often lead to slow reads, timeouts, and deadlocks.
The NoSQL concept of "Eventual Consistency" is designed to alleviate these concerns. It optimizes reads by prioritizing Availability over Consistency. RavenDB is not unique in this regard, but it is somewhat special in that it still has the ability to be consistent. If you are retrieving a single document, such as reviewing an order or an end user viewing their profile, these operations are ACID-compliant and are not affected by the "eventual consistency" design.
To understand "eventual consistency", think about a typical user looking at a list of products on your web site. At the same time, the sales staff of your company is modifying the catalog, adding new products, changing prices, etc. One could argue that it's probably not super important that the list be fully consistent with these changes. After all, a user visiting the site a couple of seconds earlier would have received data without the changes anyway. The most important thing is to deliver product results quickly. Blocking the query because a write was in progress would mean a slower response time to the customer, and thus a poorer experience on your web site, and perhaps a lost sale.
So, in RavenDB:
Writes occur against the document store.
Single Load operations go directly to the document store.
Queries occur against the index store.
As documents are being written, data is being copied from the document store to the index store, for those indexes that are already defined.
Any time you query an index, you will get whatever is already in that index, regardless of the state of the copying that's going on in the background. This is why indexes are sometimes "stale".
If you query without specifying an index, and Raven needs a new index to answer your query, it will start building an index on the fly and return you some of those results right away. It only blocks long enough to give you one page of results. It then continues building the index in the background so next time you query you will have more data available.
So now let's give an example that shows the downside of this approach.
A sales person goes to a "products list" page that is sorted alphabetically.
On the first page, they see that "Apples" aren't currently being sold.
So they click "add product", and go to a new page where they enter "Apples".
They are then returned to the "products list" page and they still don't see any Apples because the index is stale. WTF - right?
Addressing this problem requires the understanding that not all viewers of data should be considered equal. That particular sales person might demand to see the newly added product, but a customer isn't going to know or care about it with the same level of urgency.
So on the "products list" page that the sales person is viewing, you might do something like:
var results = session.Query<Product>()
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
    .OrderBy(x => x.Name)
    .Skip((pageNumber - 1) * pageSize)
    .Take(pageSize);
While on the customer's view of the catalog, you would not want to add that customization line.
If you wanted to get super precise, you could use a slightly more optimized strategy:
When going back from the "add product" page to the "list products" page, pass along the ProductID that was just added.
Just before you query on that page, if the ProductID was passed in then change your query code to:
var product = session.Load<Product>(productId);
var etag = session.Advanced.GetEtagFor(product);
var results = session.Query<Product>()
    .Customize(x => x.WaitForNonStaleResultsAsOf(etag))
    .OrderBy(x => x.Name)
    .Skip((pageNumber - 1) * pageSize)
    .Take(pageSize);
This will ensure that you only wait as long as absolutely necessary to get just that one product's changes included in the results list along with the other results from the index.
You could optimize this slightly by passing the etag back instead of the ProductId, but that might be less reusable from other places in your application.
But do keep in mind that if the list is sorted alphabetically, and we added "Plums" instead of "Apples", then you might not have seen these results instantly anyway. By the time the user had skipped to the page that includes that product, it would likely have been there already.
You are running into stale queries.
That is a by-design part of RavenDB. You need to make a distinction between queries (BASE) and loading by id (ACID).

How do I get current courses for a user in Desire2Learn's Valence API? What can we do to fetch them when courses number in the thousands?

We need to find all the courses for a user whose startDate is less than today's date and endDate is greater than today's date. We are using the API
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3
In one particular case I have more than 18 thousand courses for one user. The service cannot return 18 thousand records in one go; I can only get 100 records at a time, so I need to use the bookmark field to fetch the data in sets of 100 records. The bookmark is the courseId of the 100th (last) record that we fetched, used to get the next set of 100 records.
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3&bookmark=12528
I need to repeat the loop 180 times, which results in a "Request time out" error.
I need to filter the records on the basis of startDate and endDate, but no sorting criteria are available to sort the data by startDate or endDate. Can anyone help me find a way to sort this data, or suggest any other API that can do this type of sorting?
Note: all 18 thousand records have the property "IsActive": true.
Rather than getting to the list of org units by user, you can try getting to the user by the list of org units. You could try using /d2l/api/lp/{ver}/orgstructure/{orgUnitId}/descendants/?ouTypeId={courseOfferingType} to retrieve the entire list of course offering IDs descended from the highest common ancestor known for the user's enrollments. You can then loop through /d2l/api/lp/{ver}/courses/{orgUnitId} to fetch back the course offering info for each one of those org units to pre-filter and cut out all the course offerings you don't care about based on dates. Then, for the ones left, you can check for the user's enrollment in each one of those to figure out which of your smaller set the user matches with.
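As a small sketch of that date pre-filter step (in Swift; the CourseOffering shape and field names are assumptions about what /d2l/api/lp/{ver}/courses/{orgUnitId} returns, not the documented schema):

import Foundation

// Assumed shape of a course offering record; the real field names may differ.
struct CourseOffering: Decodable {
    let orgUnitId: Int
    let startDate: Date?
    let endDate: Date?
}

// Keep only offerings that are currently running, before doing any
// per-course enrollment checks for the user.
func currentOfferings(_ all: [CourseOffering], now: Date = Date()) -> [CourseOffering] {
    all.filter { course in
        guard let start = course.startDate, let end = course.endDate else {
            return false
        }
        return start < now && now < end
    }
}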
This will certainly result in more calls to the service, not fewer, so it only has two advantages I can see:
You should be able to get the entire starting set of course offerings you need right off the hop rather than getting it back in pages (although it's entirely possible that this call will get turned into a paging call in the future, and the "fetch all the org units at once" nature it currently has will be deprecated).
If you need to do this entire use-case for more than one user, you can fetch the org structure data once, cache it, and then only do queries for users on the subset of the data.
In the meantime, I think it's totally reasonable to request an enhancement to the enrollments calls to provide better filtering (active/non-active, start dates, end dates, etc.): I suspect that such a request might see more traction than a request to give clients control over paging (i.e. the number of responses in each page frame).

Building a (simple) twitter-clone with CouchDB

I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with coding, but there's one thing left I can't solve with CouchDB - the per-user timeline.
As with Twitter, the per-user timeline should show the tweets of all the people I'm following, in chronological order. With SQL it's a quite simple SELECT statement, but I don't know how to reproduce this with CouchDB's Map/Reduce.
Here's the SQL-Statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN (1, 5, 20, 33, ...) ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
  "_id": "xxxxxxx",
  "_rev": "yyyyyy",
  "type": "user",
  "user_id": 1,
  "username": "john",
  ...
}
tweet-schema:
{
  "_id": "xxxx",
  "_rev": "yyyy",
  "type": "tweet",
  "text": "Sample Text",
  "user_id": 1,
  ...
  "created_at": "2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1 ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with IDs 1, 2, 3, ..., ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save created_at as a timestamp and then create a view that maps all tweets to the user_id:
function (doc) {
  if (doc.type == 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the results however you want in your application (most languages have a sort method for arrays).
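For example, if that map function were saved as a view named by_user in a design document called tweets (both names assumed), the query for users 1, 5, and 20 could look like:

GET /db/_design/tweets/_view/by_user?keys=[1,5,20]

(The keys parameter is a JSON array and must be URL-encoded in practice.)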
Is that a CouchDB-only app, or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" to each tweet. This allows user-specific (partial) views, but also introduces the complexity of adding the list of readers to each new tweet, or even updating the list in case of new followers or unfollow operations.
It's important to think about the possible operations and their frequencies. If you're mostly generating lists of tweets, it's better to shift the complexity into how you integrate the reader information into your documents (i.e. integrating the readers into your tweet doc), so you can then easily build efficient view indices.
If you have many changes to your data, it's better to design your database not to update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where a simple (1-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user IDs (given the fact that you also need partial ranges for both). But this is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.
