MongoDB and embedded documents, good use cases - ruby-on-rails

I am using embedded documents in MongoDB for a Rails 3 app. I like that I can use embedded documents and the values are all returned with one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents. Is this sort of operation feasible with MongoDB or would I be better off using normal id based relations? If ID based relations are the way to go would it affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
Document that has many properties that all documents share.
Person
name: string
description: string
Document that wants to use these properties:
Post
(references many people)
body: string

This all depends on what are you going to do with your Person model later. I know of at least one working example (blog using MongoDB) where its developer keeps user data inside comments they make and uses one collection for the entire blog. Well, ok, he uses second one for his "tag cloud" :) He just doesn't need to keep centralized list of all commenters, he doesn't care. His blog contains consolidated data from all his previous sites/blogs?, almost 6000 posts total. Posts contain comments, comments contain users, users have emails, he got "subscribe to comments" option for every user who comments some post, authorization is handled by the external OpenID service aggregator (Loginza), he keeps user email got from Loginza response and their "login token" in their cookies. So the functionality is pretty good.
So, the real question is - what are you going to do with your Users later? If really feel like you need a separate collection (you're going to let users have centralized control panels, have site-based registration, you're going to make user-centristic features and so on), make it separate. If not - keep it simple and have fun :)

It depends on what user info you want to share acrross documents. Lets say if you have user and user have emails. Does not make sence to move emails into separate collection since will be not more that 10, 20, 100 emails per user. But if user say have some big related information that always growing, like blog posts then make sence to move it into separate collection.
So answer depend on user document structure. If you show your user document structure and what you planning to move into separate collection i will help you make decision.

Related

MVC beginner - Advice on how to persist a user selection for use throughout the site

I have an application which is centered around data per 'team'. A user belongs to a team and if they log in they only see that team's data.
However, I now have super users who essentially belong to more than one team. These users should be able to log in to the system and then immediately choose which team they are interested in. From then on they will essentially view/create data against that selected team. They should also have the option to go and change what team they are viewing at any time.
I've established that the user would like to be able to have multiple tabs open and be viewing a different team in each tab.
I'm struggling to work out the best way to accomplish this with .NET MVC while keeping it as stateless and testable as possible.
I've been reading up on the different ways to persist data - session state and cookies seem to get a bad rep in MVC. TempData, ViewBag seem to focus on just persisting data for one request.
I wouldn't have thought that this is an uncommon requirement in an application - are there known patterns for dealing with this in MVC which I have missed?
So far I'm trying to create a partial view which I can show on each page to let the user see what team they are viewing the site as, and change it from there.
Any advice is appreciated!
If you want to let your superuser view multiple teams data then you'll want to pass the team information in on the request, on the query string or as something that looks like a restful url:
/blueteam/members
In fact it would be extra work to track this in a stateful manner as you'd have track user, team, and ui element when a superuser can view multiple team data at once.
I'd say passing the information in on every request is a pretty standard approach to your situation.
The tricky part of the stateless approach is decorating all your internal application links with the team information without too much extra work. Relative links can be your friends here. So a link to the bug page for a team might be to simply "bugs", picking up the team name higher in the uri path. If you are creating something that looks like a one page application it's easy enough to store the team info on the client.
If you don't want team members to see data for other team members, you can set up guard functions that check for team membership for certain classes of users before rendering a view.

What's the most efficient way to create an alert queue for a model with hundreds of millions of entries?

I am currently working on an application in Rails (though language/framework shouldn't matter for this question since it is more of a theoretical one). I'm working on wrapping my head around this problem:
Say I am tracking millions of blogs online and am plugged into their RSS feeds. My app pings these feeds every few few minutes to see if there has been any new activity across any of these millions of blogs. If there is any new activity, I want to alert users of my application who have signed up to receive alerts for specific blogs that there has been an alert.
Does it make sense to have a user_blog_alerts table (where a user can specify custom keywords to be alerted about) and continuously check this table against every new entry that comes in from my feed? And when there is a match, to add them to a queue (using Redis)?
What is the best, most efficient way to build and model this alerting system? Am I even thinking about this in the right way? Are there any good examples or tutorials on this when working with such large amounts of data?
I'm not sure what the right way to do this is, but the thought of continuously scanning a table over and over sounds exhausting (ie. unscalable).
Off the top of my head, what if you created a LIST for every blog in Redis. The values would be the user IDs of those who wanted an alert. The key name would contain the blog id (ex: "user_blog_alerts:12345").
Then when you got a new post for blog 12345 it's a simple lookup to see if that key exists. If it does, then fire off alerts for each user in the list.

ASP.Net MVC Facebook-like activity stream

I would like to implement facebook - like news feed for my website, with social functionalities, such as share, like, comment and post and I want to connect it to already created users (nice to have it connected with Azure Active Directory).
Is there any ready solution for this problem?
Thanks in advance!
I don't know of anything specific, and StackOverflow is not the place for generalized library recommendations. However, it should be trivial enough to implement yourself. Activity streams are composed of four main components:
Actor
Verb
Object
Target
For example: "John (actor) shared (verb) a photo (object) with Mary (target)."
Just create an entity that can track these four aspects, and then add a record describing the action each time something happens. By making certain parts foreign keys (actor/target could be foreign keys to your user table), you can then pull actions specifically related to a particular user or other object in your system.

Simperium multiple users accessing data

In the Simperium documentation/help section there is the following text:
All the data that is created seems like it must be tied to a user - is
that correct? Is it possible to have data that isn't tied to a user -
say a database of locations or beers?
Yes, though this isn't very clear yet. You can create a public user
(i.e., a public namespace) with an access token you share with other
users of your app so anyone can read/write to that namespace.
It's possible to limit this to read-only access as well if you need to
authoritatively publish data from a backend service.
Is there an actual example with this?
The scenario I have is as follows
My app will have a calendar
The primary user can add and remove data from the calendar
They will want to invite other users to add and remove data, my thought is that they can give them a token, the user can use their email address and this token to sign in
Am I on the right track?
You're definitely on the right track, but a little too far ahead on that track. The scenario you described is a great fit for Simperium, but sharing and collaboration features aren't yet released.
The help text you quoted is for authoritatively pushing content, for example from a custom backend to all users of your app. An example of this would be a news stream that updates on all clients as new content is added.
This is quite different than sharing calendar data among a group of users who have different access permissions, which is actually a better use of Simperium's strengths. We have a solution for this that we've tested internally, but we're using what we've learned to build a better version of it that will be more scalable for production use.
There's no timeline for this yet, but you'll see it announced on your dashboard at simperium.com.

Need advice on MongoDB Schema for Chat App. Embedded vs Related Documents

I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app and the stack includes: Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis, JQuery.
I'll be handling the live chat polling/message queueing separately. Not sure how yet, either Node.js, APE or custom EventMachine app. But in regards to Mongo, I'm thinking to use it for everything else in the app, specifically chat logs and historical transcripts.
My question is how best to design the schema as all my previous experience has been with MySQL and relational DB schema's. And as a sub-question, when is it best to us embedded documents vs related documents.
The app will have:
Multiple accounts which have multiple rooms
Multiple rooms
Multiple users per room
List of rooms a user is allowed to be in
Multiple user chats per room
Searchable chat logs on a per room and per user basis
Optional file attachment for a given chat
Given Mongo (at least last time I checked) has a document limit of 4MB, I don't think having a collection for rooms and storing all room chats as embedded documents would work out so well.
From what I've thought about so far, I'm thinking of doing something like:
A collection for accounts
A collection for rooms
Each room relates back to an account
Related documents in chats collections for all chat messages in the room
Embedded Document listing all users currently in the room
A collection for users
Embedded Document listing all the rooms the user is currently in
Embedded Document listing all the rooms the user is allowed to be in
A collection for chats
Each chat relates back to a room in the rooms collection
Each chat relates back to a user in the users collection
Embedded document with info about optional uploaded file attachment.
My main concern is how far do I go until this ends up looking like a relational schema and I defeat the purpose? There is definitely more relating than embedding going on.
Another concern is that referencing related documents is much slower than accessing embedded documents I've heard.
I want to make generic queries such as:
Give me all rooms for an account
Give me all chats in a room (or filtered via date range)
Give me all chats from a specific user
Give me all uploaded files in a given room or for a given org
etc
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.
I think you're pretty much on the right track. I'd use a capped collection for chat lines, with each line containing the user ID, room ID, timestamp, and what was said. This data would expire once the capped collection's "end" is reached, so if you needed a historical log you'd want to copy data out of the capped collection into a "log" collection periodically, but capped collections are specifically designed for logging-style applications where you aren't going to be deleting documents, and insertion order matters. In the case of chat, it's a perfect match.
The only other change I'd suggest would be to maintain uploads in a separate collection, as well.
I am a big fan of mongodb as a document database aswell. But are you sure you are using mongodb for the right reason? What is mongodb powerful at?
Its a subjective question but for me in-place (atomic) updates over documents is what makes mongodb powerful. And I can't really see you using it that much. And on top of that you are hitting the document size limit problem aswell.(With experience I can tell you that embedding files to mongodb is not a good idea). You want to have a live chat application on top of database too.
Your document schema's seems logical. But I wouldn't go with mongodb for this kind of project where your application heavily depends on inserts. I would go for CouchDB.
With CouchDB you wouldn't have to worry about attachments problem, you can embed them easily. "_changes" would make your life much much easier to eighter build a live chat application / long pooling / feeding search engine (if you want to implement one).
And I saw an open source showcase project in couchone. It has some similarities with your goals: Anologue. You should check it out.
PS : Sorry it was a little off topic but I couldn't hold myself.

Resources