Need advice on MongoDB Schema for Chat App. Embedded vs Related Documents

I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app and the stack includes: Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis, jQuery.
I'll be handling the live chat polling/message queueing separately. Not sure how yet, either Node.js, APE or custom EventMachine app. But in regards to Mongo, I'm thinking to use it for everything else in the app, specifically chat logs and historical transcripts.
My question is how best to design the schema, as all my previous experience has been with MySQL and relational DB schemas. And as a sub-question: when is it best to use embedded documents vs. related documents?
The app will have:
Multiple accounts which have multiple rooms
Multiple rooms
Multiple users per room
List of rooms a user is allowed to be in
Multiple user chats per room
Searchable chat logs on a per room and per user basis
Optional file attachment for a given chat
Given that Mongo (at least last time I checked) has a 4MB document size limit, I don't think having a collection for rooms and storing all room chats as embedded documents would work out so well.
From what I've thought about so far, I'm thinking of doing something like the following (roughly sketched in Mongoid after the list):
A collection for accounts
A collection for rooms
  Each room relates back to an account
  Related documents in a chats collection for all chat messages in the room
  Embedded document listing all users currently in the room
A collection for users
  Embedded document listing all the rooms the user is currently in
  Embedded document listing all the rooms the user is allowed to be in
A collection for chats
  Each chat relates back to a room in the rooms collection
  Each chat relates back to a user in the users collection
  Embedded document with info about an optional uploaded file attachment
My main concern is how far do I go until this ends up looking like a relational schema and I defeat the purpose? There is definitely more relating than embedding going on.
Another concern: I've heard that referencing related documents is much slower than accessing embedded ones.
I want to make generic queries such as:
Give me all rooms for an account
Give me all chats in a room (or filtered via date range)
Give me all chats from a specific user
Give me all uploaded files in a given room or for a given org
etc
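With that layout, those queries stay one-liners; roughly, in Mongoid criteria syntax (a sketch, reusing the placeholder models above):

    from, to = 1.week.ago, Time.now

    Room.where(account_id: account.id)                      # all rooms for an account
    Chat.where(room_id: room.id)                            # all chats in a room
    Chat.where(room_id: room.id,
               :created_at.gte => from,
               :created_at.lte => to)                       # ...filtered by date range
    Chat.where(user_id: user.id)                            # all chats from a specific user
    Chat.where(room_id: room.id, :attachment.ne => nil)     # chats with uploads in a room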
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.

I think you're pretty much on the right track. I'd use a capped collection for chat lines, with each line containing the user ID, room ID, timestamp, and what was said. This data would expire once the capped collection's "end" is reached, so if you needed a historical log you'd want to copy data out of the capped collection into a "log" collection periodically, but capped collections are specifically designed for logging-style applications where you aren't going to be deleting documents, and insertion order matters. In the case of chat, it's a perfect match.
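With the 1.x-era Ruby driver, setting that up might look something like this (a sketch; the database name, cap size, and field names are placeholder choices):

    require 'mongo'

    conn = Mongo::Connection.new('localhost', 27017)
    db   = conn.db('chat_app')

    # Capped collections are created explicitly with a fixed size; once full,
    # the oldest documents are overwritten in insertion order.
    db.create_collection('chat_lines', :capped => true, :size => 10 * 1024 * 1024)

    db['chat_lines'].insert(
      'user_id' => BSON::ObjectId.new,   # id of the speaking user
      'room_id' => BSON::ObjectId.new,   # id of the room
      'ts'      => Time.now.utc,
      'text'    => 'what was said'
    )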
The only other change I'd suggest would be to maintain uploads in a separate collection, as well.

I am a big fan of MongoDB as a document database as well. But are you sure you are using MongoDB for the right reason? What is MongoDB powerful at?
It's a subjective question, but for me in-place (atomic) updates over documents are what make MongoDB powerful, and I can't really see you using them that much. On top of that, you are hitting the document size limit problem as well (from experience I can tell you that embedding files in MongoDB is not a good idea). And you want to have a live chat application on top of the database, too.
Your document schemas seem logical. But I wouldn't go with MongoDB for this kind of project, where your application heavily depends on inserts. I would go for CouchDB.
With CouchDB you wouldn't have to worry about the attachments problem; you can embed them easily. And "_changes" would make your life much, much easier, whether you're building a live chat application, long polling, or feeding a search engine (if you want to implement one).
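To illustrate, long-polling _changes takes only a few lines of Ruby (a sketch; the database name and the 'text' field are placeholders):

    require 'net/http'
    require 'json'

    # Each iteration blocks until something changes in the 'chat' database,
    # then reports the new/updated documents and resumes from last_seq.
    since = 0
    loop do
      uri = URI("http://localhost:5984/chat/_changes" \
                "?feed=longpoll&include_docs=true&since=#{since}")
      changes = JSON.parse(Net::HTTP.get_response(uri).body)
      changes['results'].each { |row| puts row['doc']['text'] }
      since = changes['last_seq']
    end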
And I saw an open source showcase project on CouchOne that has some similarities with your goals: Anologue. You should check it out.
PS: Sorry, that was a little off topic, but I couldn't hold myself back.

Related

What is the best way to sync data among a small number of users in swift?

As the header indicates, I am looking for the simplest way to sync user-generated data (Integers, Booleans, NSDates, etc.) among a small number of individuals (at this point, I am just thinking of sharing data between two people). Within the app, users can populate an array with instances of a custom object, and this data is used to populate a UITableView. Assuming all users in the select group have synced their devices, they should all see the same data in the table view.
My original idea was to write to a JSON file in a shared Dropbox or Google Drive folder. After looking around online, however, I found that this method is likely to lead to data corruption. CloudKit only allows public or private (single account) syncing, nothing in between. I have seen some posts that recommend using Parse, but that service is now on its way out.
Does anyone know of a (preferably free) way to do this?
You have several options:
CloudKit databases - CloudKit's database system has the concept of a public database which does exactly what you want. It's fairly easy to use as well, and is "free" with an Apple developer account. The only downside is that it's for Apple devices only (AFAIK).
Firebase - Google's Firebase is basically identical to CloudKit in concept and features, but runs on multiple platforms. It is tied to the Google ecosystem, so your users all need to provide a Google account to use it, but that's a small issue these days.
Realm - from a pure usability perspective, Realm is BY FAR the easiest data storage solution I've seen on iOS. However, its sharing functionality is currently limited; CloudKit support is scheduled, but currently all there is is this. If you only need local storage for now, then definitely keep this on your list.
No matter which engine you choose, users would be limited to certain views of the data through your own code. I would suggest that you save every record with a username of the creator, and then have another table containing read/write permissions, so for instance, the entry for "maurymarkowitz" has "bobsmith,ronsmith,jonsmith". You can retrieve these entries on login and then use them as the inputs to the query-by-example both systems use for getting records.
Thanks for all of the helpful responses. I ended up using CloudKit/Core Data and it serves my purpose just fine. I simply used the public option and gave each set of users who are sharing data with each other a unique identifier, which is appended to any records they upload. When a user syncs their data with the cloud, the application queries for only those records that contain the user's identifier. This way, multiple users can sync data among themselves even though they do not share an iCloud account.

What's the most efficient way to create an alert queue for a model with hundreds of millions of entries?

I am currently working on an application in Rails (though language/framework shouldn't matter for this question since it is more of a theoretical one). I'm working on wrapping my head around this problem:
Say I am tracking millions of blogs online and am plugged into their RSS feeds. My app pings these feeds every few minutes to see if there has been any new activity across any of these millions of blogs. If there is, I want to alert the users of my application who have signed up to receive alerts for the specific blogs involved.
Does it make sense to have a user_blog_alerts table (where a user can specify custom keywords to be alerted about) and continuously check this table against every new entry that comes in from my feed? And when there is a match, to add them to a queue (using Redis)?
What is the best, most efficient way to build and model this alerting system? Am I even thinking about this in the right way? Are there any good examples or tutorials on this when working with such large amounts of data?
I'm not sure what the right way to do this is, but the thought of continuously scanning a table over and over sounds exhausting (i.e. unscalable).
Off the top of my head, what if you created a LIST for every blog in Redis. The values would be the user IDs of those who wanted an alert. The key name would contain the blog id (ex: "user_blog_alerts:12345").
Then when you got a new post for blog 12345 it's a simple lookup to see if that key exists. If it does, then fire off alerts for each user in the list.
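In Ruby with the redis gem, that might look like the following sketch (enqueue_alert and new_post are hypothetical stand-ins for your queueing code and feed poller):

    require 'redis'

    redis = Redis.new

    # Subscription: append the user's ID to the list for blog 12345.
    redis.rpush('user_blog_alerts:12345', 42)

    # A new post arrives for blog 12345: fan out alerts to any subscribers.
    # lrange returns an empty array if the key doesn't exist, so no separate
    # existence check is needed.
    redis.lrange('user_blog_alerts:12345', 0, -1).each do |user_id|
      enqueue_alert(user_id, new_post)   # hypothetical: push a job onto your queue
    end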

Deploying Neo4j database

so I developed a small Neo4j database with the aim of providing users with path-related information (shortest path from A to B and properties of individual sections of the path). My programming skills are very basic, but I want to make the database very user-friendly.
Basically, I would like to have a screen where users can choose start location and end location from dropdown lists, click a button, and the results (shortest path, distance of the path, properties of the path segments) will appear. For example, if this database had been made in MS Access, I would have made a form, where users could choose the locations, then click a control button which would have executed a query and produced results on a nice report.
Please note that all the nodes, relationships and queries are already in place. All I am looking for are some tips regarding the most user-friendly way of making the information accessible to the users.
Currently, all I can do is make the users install Neo4j, run it every time they need it, open the browser, edit the Cypher script (typing in strings as locations) and execute the query. This is rather impractical for users, and I am also worried that a user might corrupt the data.
I'd suggest making a web application using a web framework like Rails, especially if you're new to programming. You can use the neo4j gem for that to connect to your database and create models to access the data in a friendly way:
https://github.com/neo4jrb/neo4j
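As a rough sketch of what that could look like (Location and CONNECTS_TO are placeholders for your own labels and relationship types, and the session API differs between gem versions):

    class Location
      include Neo4j::ActiveNode
      property :name, type: String
      has_many :both, :neighbours, type: :CONNECTS_TO, model_class: 'Location'
    end

    # Shortest path between two user-chosen locations as a parameterized query,
    # so form input is never interpolated into the Cypher string itself.
    def shortest_path(from_name, to_name)
      Neo4j::ActiveBase.current_session.query(
        'MATCH (a:Location {name: $from}), (b:Location {name: $to}),
               p = shortestPath((a)-[:CONNECTS_TO*..15]-(b))
         RETURN p',
        from: from_name, to: to_name
      )
    end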
I'm one of the maintainers of that gem, so feel free to contact us if you have any questions:
neo4jrb@googlegroups.com
http://twitter.com/neo4jrb
Also, you might be interested in looking at my newest project, called meta_model:
https://github.com/neo4jrb/meta_model
It's a Rails app that lets you define your database model (or at least part of it) via the web app UI and then browse/edit the objects through the web app. It's still very much preliminary, but I'd like it to be able to do things like what you're talking about (letting users examine data and the relationships between it in a user-friendly way).
In general, you would write a tiny (web/desktop/forms) application that contains the form, takes the form values, and issues the Cypher requests with the form values as parameters.
The results can then be rendered as a table or chart or whatever.
You could even run this from Excel or Access with a Macro (using the Neo4j http endpoint).
Depending on your programming skills (which programming language can you write in) it can be anything. There is also a Neo4j .Net client (see http://neo4j.com/developer/dotnet).
And its author, Tatham Oddie, showed a while ago how to do that with Excel.
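To make the "tiny application" idea concrete, here is a sketch of a Sinatra app posting form values to the transactional HTTP endpoint (the Location label, the *..15 bound, and the endpoint path are assumptions that depend on your data and Neo4j version):

    require 'sinatra'
    require 'net/http'
    require 'json'

    # Takes 'from' and 'to' form fields and runs a parameterized
    # shortest-path query against Neo4j's HTTP API.
    post '/path' do
      payload = {
        statements: [{
          statement:  'MATCH (a:Location {name: $from}), (b:Location {name: $to}), ' \
                      'p = shortestPath((a)-[*..15]-(b)) RETURN p',
          parameters: { from: params[:from], to: params[:to] }
        }]
      }
      uri = URI('http://localhost:7474/db/data/transaction/commit')
      res = Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
      content_type :json
      res.body
    end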

How to design a simple keyword content delivery mechanism that functions like Adwords

I have an application that allows users to store food diary entries of approximately 140 characters in length. I am looking for a solution that will allow me to tie content modules (think tips for healthy eating) to the user's diary entries based on keywords in the entry, similar to what Google does with AdWords. Are there any out-of-the-box solutions that can do that in Rails?
Here are the specific requirements:
User logs food diary entry
In the user's food diary, if there's a specific tip that matches a keyword for the entry, then the tip is displayed next to the entry
Tips would be defined through an admin tool where the admin specifies the tip content and keywords that would make it appear in the diary
Trying to figure out a) if there's a pre-built solution I could use for something like this, or b) what the best approach would be for performance, since the user's food diary might have 20 entries per page and each entry would have to be evaluated to see if there are any corresponding tips that match its keywords.
For designing a home-grown solution, one idea I had was to make the tip associations when a new food entry is stored like this:
user adds a food entry
after_save a callback method breaks apart the entry into keywords and searches the tips model for matches
if there's a match, it's stored in an association table when the new entry is created, rather than when the user's food diary is rendered in the web page.
There's a performance hit on storing new entries, but it might allow the user's diary to load faster than doing all those look-ups when the diary is rendered.
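A minimal sketch of that callback approach (FoodEntry, Tip, and the keyword column are assumed names, and the habtm join table is assumed to exist):

    class FoodEntry < ActiveRecord::Base
      has_and_belongs_to_many :tips
      after_save :match_tips

      # Break the ~140-character entry into words and look up tips keyed on them;
      # matches are stored in the join table up front, not computed at render time.
      def match_tips
        words = body.to_s.downcase.scan(/[a-z']+/).uniq
        self.tips = Tip.where(keyword: words)
      end
    end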
Does that make sense, or is there a better way? Better yet, are there tools that can accomplish what I'm trying to do?
Thanks!
This is not an AdWords API question, but I'll take a shot:
I would move the association table building into an offline task / cronjob. That would take care of the performance overhead when creating new entries, and users would be generally okay with a message like "Tips are being generated, please be patient" if they happen to view the topic too soon.
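For example, as a rake task driven by cron (a sketch; the tips_matched flag and the match_tips method from the question's approach are assumed names):

    # lib/tasks/tips.rake
    namespace :tips do
      desc 'Build tip associations for entries not yet processed'
      task match: :environment do
        FoodEntry.where(tips_matched: false).find_each do |entry|
          entry.match_tips                           # same keyword matching as above
          entry.update_column(:tips_matched, true)   # mark as processed
        end
      end
    end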
I'm not aware of any existing solutions, but this sounds like a hashtag system to me. Basically you have two lists (food diary entries, tips); you want to assign hashtags to both lists and then pair entries with the same hashtags. Googling for a hashtag system / library might be a good starting point.
Cheers,
Anash

MongoDB and embedded documents, good use cases

I am using embedded documents in MongoDB for a Rails 3 app. I like that with embedded documents the values are all returned in one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents? Is this sort of operation feasible with MongoDB, or would I be better off using normal id-based relations? If id-based relations are the way to go, would that affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
Document that has many properties that all documents share:

Person
  name: string
  description: string

Document that wants to use these properties:

Post
  (references many people)
  body: string
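In Mongoid, the two options for that example might look like the following sketch (class and field names follow the example above; has_and_belongs_to_many stores an array of ids, and macro details vary by Mongoid version):

    # Option 1: id-based relation. Person lives in its own collection, so a
    # change to name/description is written once and seen by every post.
    class Person
      include Mongoid::Document
      field :name,        type: String
      field :description, type: String
    end

    class Post
      include Mongoid::Document
      field :body, type: String
      has_and_belongs_to_many :people   # array of person ids on the post
    end

    # Option 2: embedding. Each post carries its own copies -- one query loads
    # everything, but a shared edit must be fanned out to every embedding post.
    class EmbeddedPost
      include Mongoid::Document
      field :body, type: String
      embeds_many :people, class_name: 'EmbeddedPerson'
    end

    class EmbeddedPerson
      include Mongoid::Document
      embedded_in :embedded_post
      field :name,        type: String
      field :description, type: String
    end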
This all depends on what you are going to do with your Person model later. I know of at least one working example (a blog using MongoDB) where the developer keeps user data inside the comments they make and uses one collection for the entire blog. Well, ok, he uses a second one for his "tag cloud" :) He just doesn't need to keep a centralized list of all commenters; he doesn't care. His blog contains consolidated data from all his previous sites/blogs, almost 6000 posts total. Posts contain comments, comments contain users, users have emails; there is a "subscribe to comments" option for every user who comments on a post; authorization is handled by an external OpenID service aggregator (Loginza), and he keeps the user email from the Loginza response plus a "login token" in their cookies. So the functionality is pretty good.
So, the real question is: what are you going to do with your users later? If you really feel like you need a separate collection (you're going to give users centralized control panels, have site-based registration, build user-centric features and so on), make it separate. If not, keep it simple and have fun :)
It depends on what user info you want to share across documents. Let's say you have users, and users have emails. It doesn't make sense to move emails into a separate collection, since there will be no more than 10, 20, or 100 emails per user. But if a user has some big set of related information that keeps growing, like blog posts, then it makes sense to move that into a separate collection.
So the answer depends on your user document structure. If you show your user document structure and what you're planning to move into a separate collection, I'll help you make the decision.
