Firebase - proper way to structure the DB - ios

I have an iOS app that is like a social network for music.
In the app, users share "posts" about specific music "tracks".
I want to know the best way to structure the DB in Firebase, considering that each "post" object references a single "track" object.
Also, when a user submits a new post, I need to check whether the track already exists by querying on artist + song title. If the track does not exist, I add a new track; if it exists, I get its "track_id" to reference in the "post" object.

With this design, you will run into trouble when you implement track search or look up the users who follow a track.
In general, you will need to fully load at least one table in your client app.
I hope this helps with those problems later on. Take a look at the Salada framework on GitHub; you can use Relation.

The challenge here is performing an 'and' query in Firebase, which doesn't exist. So you have to mush two pieces of data together and then query on the combined value. Here's a structure:
artists
  artist_0: Pink Floyd
  artist_1: Billy Thorpe
  artist_2: Led Zeppelin
tracks
  track_id_0: Stairway To Heaven
  track_id_1: Children Of The Sun
  track_id_2: Comfortably Numb
artists_tracks
  artist_0_track_id_2: true
  artist_1_track_id_1: true
  artist_2_track_id_0: true
posts
  post_id_0
    artist_track: artist_1_track_id_1
    post: Billy was one of the most creative musicians of modern times.
  post_id_1
    artist_track: artist_0_track_id_2
    post: The Floyd is the best band evah.
With this structure, if you know the artist and the track (that is, their keys), you can concatenate the two keys and do a simple query in the artists_tracks node for .equalToValue(true) to see if the pair exists.
The posts in the posts node tie back to those specific artists and tracks.
In some cases you can glue your data together to perform 'and' searches without the extra nodes, like this:
stuff
  artist_track: Billy_Thorpe_Children_Of_The_Sun
However, because of the spaces in the names and the varying width of the text, that won't work here. Instead, include enough digits in the ids to handle however many songs and artists you expect, so the key length stays consistent.
artists_tracks
  artist_00000_track_id_00002: true
With five digits each, the keys stay the same length for up to 100,000 artists and 100,000 tracks.
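As a rough illustration of the fixed-width key idea, here is a minimal Python sketch. It does not talk to Firebase: the artists_tracks dict is only a stand-in for the node of the same name, and the helper names composite_key and pair_exists are made up for this example.

```python
def composite_key(artist_index: int, track_index: int, width: int = 5) -> str:
    """Build a fixed-width key such as 'artist_00000_track_id_00002'."""
    if artist_index < 0 or track_index < 0:
        raise ValueError("indexes must be non-negative")
    if max(artist_index, track_index) >= 10 ** width:
        raise ValueError("index does not fit in the chosen width")
    return f"artist_{artist_index:0{width}d}_track_id_{track_index:0{width}d}"


# In-memory stand-in for the artists_tracks node (assumption: not a real database).
artists_tracks = {
    composite_key(0, 2): True,  # Pink Floyd / Comfortably Numb
    composite_key(1, 1): True,  # Billy Thorpe / Children Of The Sun
    composite_key(2, 0): True,  # Led Zeppelin / Stairway To Heaven
}


def pair_exists(artist_index: int, track_index: int) -> bool:
    """Single-key lookup that plays the role of the 'and' query."""
    return artists_tracks.get(composite_key(artist_index, track_index), False)
```

Because both ids are padded to the same width, every key has the same length, and the 'and' condition reduces to one lookup on the combined key.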

Querying Firebase Firestore Data

I'm looking to build an app that functions like a dating app:
User A fetches All Users.
User A removes Users B, C, and D.
User A fetches All Users again - excluding Users B, C, and D.
My goal is to perform a query that does not read the User B, C, and D documents in my fetch query.
I've read into array-contains-any, array-contains, not-in queries, but the 10 item limit prevents me from using these as options because the "removed users list" will continue to grow.
2 workaround options I've mulled over are...
Performing a paginated fetch on All User documents and then filtering out on the client side?
Store all User IDs (A, B, C, D) on 1 document in an array field, fetch the 1 document, and then filter client side?
Any guidance would be extremely appreciated either on suggestions around how I store my data or specific queries I can perform.
You can do it the other way around.
Instead of keeping a removed or ignored array on your current user, give each user an ignoredBy or removedBy array and add your current user to it.
Then, when you fetch the users from the users collection, you only have to check whether the requesting user is in the ignoredBy array. That way you don't have tons of entries to check in the array; it is always just one.
Firestore may get a little pricey with the Tinder model but you can certainly implement a very extensible architecture, well enough to scale to millions of users, without breaking a sweat. So the user queries a pool of people, and each person is represented by their own document, this much is obvious. The user must then take an action on each person/document, and, presumably, when an action is taken that person should no longer reappear in the user's queries. We obviously can't edit the queried documents because there could be millions of users and that wouldn't scale. And we shouldn't use arrays because documents have byte limits and that also wouldn't scale. So we have to treat a collection like an array, using documents as items, because collections have no known limit to how many documents they can contain.
So when the user takes an action on someone, consider creating a new document in a subcollection in the user's own document (user A, the one performing the query) that contains the person's uid, and perhaps a boolean to determine if they liked or disliked that person (i.e. liked: true), and maybe a timestamp for UI purposes. This new document is the item in your limitless array.
When the user later performs another query, those same users are going to reappear in the results, which you need to filter out. You have no choice but to check if each person's uid is in this subcollection. If it is, omit the document and move to the next. But if your UI is configured like Tinder's, where there isn't a list of people to scroll through but instead cards stacked on top of each other, this is no big deal. The user will only be presented with one person at a time and they won't know how many you're filtering out behind the scenes. With a paginated list, the user may see odd behavior like uneven pages. The drawback is that you're now paying double for each query. Each query will cost you the original fetch and the subcollection-check fetch. But, hey, with this model you can scale to millions of users without ever breaking a sweat.
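To make the flow above a little more concrete, here is a minimal Python sketch of the "subcollection as a limitless array" idea. It uses plain dicts rather than Firestore, so every name in it (actions, record_action, has_acted_on, next_card) is an assumption made for illustration; in a real app, has_acted_on would be the extra document read described above.

```python
from typing import Dict, Iterable, Optional

# Hypothetical stand-in for a per-user subcollection: one small document per
# person the user has already acted on, keyed by that person's uid.
actions: Dict[str, Dict[str, dict]] = {
    "userA": {
        "userB": {"liked": False},
        "userC": {"liked": True},
        "userD": {"liked": False},
    },
}


def record_action(uid: str, target_uid: str, liked: bool) -> None:
    """Add one item to the user's 'limitless array' of acted-on people."""
    actions.setdefault(uid, {})[target_uid] = {"liked": liked}


def has_acted_on(uid: str, target_uid: str) -> bool:
    """The per-candidate existence check (the second read in the answer)."""
    return target_uid in actions.get(uid, {})


def next_card(uid: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate the user has not acted on yet, if any."""
    for candidate in candidates:
        if not has_acted_on(uid, candidate):
            return candidate
    return None
```

With a card-stack UI, only next_card needs to run per swipe, which is why the filtering stays invisible to the user.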

How does Firebase choose what to store in its cache with isPersistenceEnabled = true in iOS

I have an app that is using Firebase quite extensively to store data that contains relationships. I want to make sure I am using Firebase as safely as possible in offline mode. The safety concern I have can be demonstrated in the following example:
Assume I have a Zoo model where each individual zoo is stored in Firebase as a subnode of "/zoos".
I have an Animal model where each individual animal is stored in Firebase as a subnode of "/animals".
A Zoo can have Animals which are stored in an ordered list. Specifically, the Zoo model contains an Animal array e.g. [Animal]. This list of Animals is stored in Firebase as a set of position-reference pairs at "/zoos/myZoo/animals" which will contain nodes like:
{0: "animals/fidoTheDog"},
{1: "animals/jillTheCat"}
When I add a new Animal to a Zoo, I need to know how many animals are currently in that zoo so I can add the new animal in the right position like:
{2: "animals/jakeTheSnake"}
If I am offline and happen to read the location "zoos/myZoo/animals" to get the list of animals so I can add in the right position, I want to make sure I have accurate data. I know that if someone else wrote to that position while I am offline and added another animal in position 2, I will get stale data and when I add an animal in position 2, I will overwrite his entry at "zoos/myZoo/animals/2" when I again go online. So that is an issue.
But, if I know I will be the only one writing to that location, can I be relatively sure that Firebase will hold the crucial data at "zoos/myZoo/animals" for me since I am using isPersistenceEnabled = true? In other words, will Firebase just keep that data in cache as long as I have recently written to that location or recently read from that location?
Or do I explicitly need to specify "keepSynced(true)" on that location? This gets to the core general version of the question - How does Firebase choose what to store in its cache with isPersistenceEnabled = true? Especially if I have not specifically set keepSynced(true) on any particular locations. Will Firebase just prioritize recently read data and then when the 10mb limit is hit, discard the old stuff first? Does it matter if I wrote the data to that location a long time ago but consistently read from that location? Will it still maintain that location in the cache because it was recently read? Will it ever discard data before hitting the 10mb limit?
I'm a little bit of a newbie so thank you for your patience with me!
-------------- FOLLOW UP QUESTIONS --------------
A couple follow up questions.
I think the approach suggested in the blog (given by Frank in comments) of using childByAutoId sounds good. So if I am saving a zoo with many animals (in order) then it sounds like I would loop through the animals and use childByAutoId to create a new key for each animal whose value will be the reference to the location of the animal object. Can I be sure that the keys that I create in rapid succession (looping will probably be very fast) will ultimately sort correctly when ordered lexicographically? I’m looking at this blog post and assuming that is the case. https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html
Suppose I am doing something more complicated like inserting an animal at the beginning of the list in position zero. Then before doing the operation, I would sync down the list of animals in the zoo as suggested in the blog post you sent. https://firebase.googleblog.com/2014/04/best-practices-arrays-in-firebase.html. If the user is offline, I obviously can’t be sure that I will have the freshest copy. But suppose I am ok with that because users will only be working with their own data and only on their own device. In that case, does it help to use keepSynced(true) on the path to the zoo? Or since the amount of data the user is working with is well, well under 10mb (the whole database right now is 300k for 10ish active users), can I just assume the cache will store the data in the zoo path (whether keepSynced or not) because we never flirt with the 10mb limit in any case?
Thank you!

Databases for filtering pop culture quotes or celebrity-related data out of text (e.g., tweets)?

I am trying to mine social media data, such as tweets. However, social media data have a lot of noise, for example people discussing celebrities or quoting a movie, TV show, or song; that is, content that is generally not about themselves or somebody they actually know personally.
So, my question is: are there any dynamic (i.e., automatically updated) databases of the most popular current celebrities? Quotes from movies they are in or lyrics of songs they sing would also be relevant.
I don't think such a curated list exists. Smaller ones do exist, for example the top 100 movie quotes on Wikipedia. However, these are not updated.
One possibility is to filter out the aspects of your input that appear on another social media site that tracks trends, such as Delicious. Unless you are looking for trends, something that rises to the top of two trending sites likely ... is just a trend.
Delicious has a nice Python wrapper for its API.
In Pythonic pseudocode,
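# social_media_content and delicious_content_list are placeholders for the fetched data
data = social_media_content
trending = set(delicious_content_list)
data = [datum for datum in data if datum not in trending]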

Building a (simple) twitter-clone with CouchDB

I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with coding, but there's one thing left I can't solve with CouchDB - the per user timeline.
As with Twitter, the per user timeline should show the tweets of all people I'm following, in chronological order. With SQL it's quite a simple SELECT statement, but I don't know how to reproduce this with CouchDB's Map/Reduce.
Here's the SQL statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN (1,5,20,33,...) ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
  "_id":"xxxxxxx",
  "_rev":"yyyyyy",
  "type":"user",
  "user_id":1,
  "username":"john",
  ...
}
tweet-schema:
{
  "_id":"xxxx",
  "_rev":"yyyy",
  "type":"tweet",
  "text":"Sample Text",
  "user_id":1,
  ...
  "created_at":"2011-10-17 10:21:36 +0000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1 ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with the IDs 1,2,3,... ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save created_at as a timestamp and then create a view that maps all tweets to their user_id:
function(doc) {
  // Index every tweet document under the id of the user who wrote it.
  if (doc.type === 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the results in your application however you want (most languages have a sort method for arrays).
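The application-side step could look roughly like the following Python sketch. It assumes the view rows have already been fetched and have the usual id/key/value shape, with key being the user_id and value the tweet document emitted by the map function; created_at is assumed to be a numeric timestamp, as the answer suggests. The function name timeline is made up for this example.

```python
from typing import Iterable, List


def timeline(rows: Iterable[dict], following: Iterable[int]) -> List[dict]:
    """Keep tweets from followed users and sort them newest first."""
    followed = set(following)
    tweets = [row["value"] for row in rows if row["key"] in followed]
    return sorted(tweets, key=lambda tweet: tweet["created_at"], reverse=True)


# Sample rows standing in for a view response (made-up data).
rows = [
    {"id": "t1", "key": 1, "value": {"user_id": 1, "text": "first", "created_at": 1318846896}},
    {"id": "t2", "key": 5, "value": {"user_id": 5, "text": "second", "created_at": 1318846950}},
    {"id": "t3", "key": 7, "value": {"user_id": 7, "text": "other", "created_at": 1318847000}},
    {"id": "t4", "key": 1, "value": {"user_id": 1, "text": "third", "created_at": 1318847100}},
]

for tweet in timeline(rows, following=[1, 5]):
    print(tweet["created_at"], tweet["text"])
```

The filter on key is redundant when the view was queried with exactly the followed ids as keys; it is kept here so the sketch also works on an unfiltered row list.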
Edited one last time - Was trying to make it all in couchDB... see revisions :)
Is that a CouchDB-only app, or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" to each tweet. It allows user-specific (partial) views, but also introduces the complexity of adding the list of readers to each new tweet, or even updating the list on follow or unfollow operations.
It's important to think about the possible operations and their frequencies. If you are mostly generating lists of tweets, it's better to shift the complexity into how the reader information is integrated into your documents (i.e. storing the readers in your tweet doc) and then build efficient view indices on top of that.
If you have many changes to your data, it's better to design your database so that it does not update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where the simple (1-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user IDs (given the fact that you also need partial ranges for both). But this is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.

Best MongoDB schema for twitter clone?

I know similar questions have been asked, but looking for a very basic answer to a basic question. I am new to MongoDB and making a twitter style app (blogs, followers, etc) and I'm wondering the best schema to use.
Right now I have (on a very high level):
Member {
  login: string,
  pass: string,
  posts: [
    {
      title: string,
      blog: string,
      comments: [ { comment: string } ]
    }
  ]
}
There is more to it, but that gives you the idea. Now the problem is I'm looking to add the "follow" feature and I'm not sure the best route to go.
I could add a "following" embedded doc to the Member, but I'm just not sure what the smartest method would be in MongoDB. My main concern would obviously be the main "feed" page where you see the posts of all the people you are following.
This is not an ideal schema for a Twitter clone. The main problem is that "posts" is an ever-growing array, which means Mongo will have to move your massive document every few posts because it ran out of document padding. Additionally, there's a hard (16 MB) size limit on documents, which makes this schema restrictive at best.
The ideal schema depends on whether or not you expect Twitter's load. The "perfect" MongoDB schema in terms of maintainability and ease of use is not the same as the one I'd use for something with Twitter's throughput. For example, in the former case I'd use a posts collection with a document per post. In the high-throughput scenario I'd start making bucket documents for small groups of posts (say, one per "get more" page). Additionally, in the high-throughput scenario you'd have to keep each follower's timeline up to date in separate user timeline documents, while in low-throughput scenarios you can simply query for it.
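One possible reading of the "bucket documents" idea is sketched below in plain Python, with lists and dicts standing in for a collection. The field names (user_id, seq, posts) and the helper append_post are assumptions for illustration, not something the answer specifies.

```python
from typing import List


def append_post(buckets: List[dict], user_id: str, post: dict, page_size: int = 20) -> List[dict]:
    """Append a post to the newest bucket, starting a new bucket when it is full."""
    if not buckets or len(buckets[-1]["posts"]) >= page_size:
        # Each bucket document holds at most one "get more" page of posts.
        buckets.append({"user_id": user_id, "seq": len(buckets), "posts": []})
    buckets[-1]["posts"].append(post)
    return buckets


# Build buckets for one user with a deliberately small page size.
buckets: List[dict] = []
for n in range(5):
    append_post(buckets, "john", {"title": f"post {n}"}, page_size=2)

print([(b["seq"], len(b["posts"])) for b in buckets])
```

Each bucket stays small and bounded, so no single document grows without limit, and a feed page can be served by reading one bucket instead of many per-post documents.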
This question is the same as the widely used blog example: how to model blog posts and comments. You just have to apply the same concepts here. You have the following options:
embedded documents
dedicated collections and performing multiple queries
The pros and cons have been widely discussed. Embedded docs can only be 16 MB large, and it is not possible to return individual parts of a matched array in MongoDB... make your choice.
Not going any further because as said: the same question has been discussed in numerous questions about "schema design". Just google "Schema Design MongoDB" or look for the same on SO.
Adding a "following" array to the Member document should work well. It should contain the user IDs of the people that member is following. Your code will have to retrieve the list and construct a query that retrieves the tweets of those users. As Mongo is nonrelational, there's no way to construct a query that joins the Member and Tweet collections and does this in a single query, but you should be able to reduce network overhead by doing this on the database server, using server-side code execution: http://www.mongodb.org/display/DOCS/Server-side+Code+Execution.
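The two-step approach with a "following" array can be sketched in Python as follows. The filter uses MongoDB's $in operator, but the sketch never connects to a database: run_in_memory only mimics what such a query would return, and the field names author_id and created_at are assumptions for this example.

```python
from typing import List


def feed_query(member: dict) -> dict:
    """Step 1: turn the member's following list into a query filter."""
    return {"author_id": {"$in": list(member["following"])}}


def run_in_memory(posts: List[dict], query: dict) -> List[dict]:
    """Step 2 (simulated): match posts against the $in filter, newest first."""
    allowed = set(query["author_id"]["$in"])
    matched = [post for post in posts if post["author_id"] in allowed]
    return sorted(matched, key=lambda post: post["created_at"], reverse=True)


# Made-up sample data.
member = {"login": "john", "following": ["u2", "u3"]}
posts = [
    {"author_id": "u2", "title": "older", "created_at": 100},
    {"author_id": "u9", "title": "not followed", "created_at": 150},
    {"author_id": "u3", "title": "newer", "created_at": 200},
]

print(feed_query(member))
print([post["title"] for post in run_in_memory(posts, feed_query(member))])
```

Against a real posts collection, the same filter dict would be passed to the driver's find call together with a descending sort on the timestamp field.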
