What's the most efficient way to create an alert queue for a model with hundreds of millions of entries?

I am currently working on an application in Rails (though language/framework shouldn't matter for this question, since it is more of a theoretical one). I'm trying to wrap my head around this problem:
Say I am tracking millions of blogs online and am plugged into their RSS feeds. My app pings these feeds every few minutes to see if there has been any new activity across any of these millions of blogs. If there is new activity, I want to notify the users of my application who have signed up to receive alerts for those specific blogs.
Does it make sense to have a user_blog_alerts table (where a user can specify custom keywords to be alerted about) and continuously check every new entry that comes in from my feeds against this table? And when there is a match, add the alert to a queue (using Redis)?
What is the best, most efficient way to build and model this alerting system? Am I even thinking about this in the right way? Are there any good examples or tutorials on this when working with such large amounts of data?

I'm not sure what the right way to do this is, but the thought of continuously scanning a table over and over sounds exhausting (i.e. unscalable).
Off the top of my head: what if you created a LIST in Redis for every blog? The values would be the user IDs of those who want an alert, and the key name would contain the blog ID (ex: "user_blog_alerts:12345").
Then when you get a new post for blog 12345, it's a simple lookup to see if that key exists. If it does, fire off alerts for each user in the list.
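A minimal sketch of that lookup using the redis-rb gem; the subscribe helper and the notify_user delivery method are illustrative assumptions, not part of the original answer:

```ruby
require "redis"

redis = Redis.new

# Subscribe a user to a blog's alert list.
def subscribe(redis, blog_id, user_id)
  redis.rpush("user_blog_alerts:#{blog_id}", user_id)
end

# Called whenever a new post arrives for a blog.
def fire_alerts(redis, blog_id, post_title)
  key = "user_blog_alerts:#{blog_id}"
  return if redis.llen(key).zero? # no subscribers for this blog

  # LRANGE 0 -1 returns every element of the list.
  redis.lrange(key, 0, -1).each do |user_id|
    notify_user(user_id, post_title) # hypothetical delivery method
  end
end
```

A Redis SET (SADD/SMEMBERS) would arguably work even better here, since it deduplicates subscriptions for free.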

Related

How to invite users to join a multiplayer Gaming Session using Parse (Swift)

I'm trying to develop a trivia app, much like Quiz Up but with multiple players.
Here's what I thought of doing:
Creating a class called 'Game Session' on Parse that stores who created it (PFUser.current), the name of the gaming session (name), and the users invited (invited_users). Think of this gaming session as a closed group where only those users interact with each other.
So there's a createSessionViewController, and a joinSessionViewController.
If User A creates a gaming session (in createSessionViewController) and sends invites out to User B and User C, they get to accept or decline these invites in joinSessionViewController.
From what I have researched, I would have to query all the objects in the Game Session class (in viewDidLoad of the joinSessionViewController) and use query.whereKey to check whether, for example, User B's object ID is in the "invited_users" column. If so, I return that Gaming Session object. Is that right?
If that is the case, is it an efficient way of doing it? It seems like if the app gets popular and there are lots of objects in the class, it could take a long time to find the one object containing User B's ID.
I hope I made myself clear and you guys understand my question.
PS: I'm sort of new to parse and swift, so if you could give me detailed answers it would be much appreciated.
Your logic is correct but I would also strongly suggest you take a look at Parse-LiveQuery. This tool allows you to subscribe to a PFQuery you are interested in. Once subscribed, the server will notify clients whenever a PFObject that matches the PFQuery is created or updated, in real-time.
https://github.com/ParsePlatform/parse-server/wiki/Parse-LiveQuery
https://github.com/ParsePlatform/ParseLiveQuery-iOS-OSX
Your assumption is correct, and that is indeed one way you could go about it, although it has the drawbacks you mentioned. If you feel like putting more effort in, you can write Parse Cloud Code in JavaScript that executes after an item is saved (for example, after a game session is created) and sends out silent push notifications with the new object's ID to the users who were invited. You could then use that push notification data to know the exact IDs instead of having to query for them. This is much more advanced, though. For whatever your app is, the simple route of having a model query the data on load should be fine. If you find yourself in a situation where performance is hindered by this, well then, congratulations.

Best way to build a feed notification system like Facebook in Rails?

I'm a new junior developer joining this awesome community. I'm developing my first big personal project, and I'm stuck on this specific part.
I would like to build a feed notification system like Facebook with the following features:
Track different models and relationships, for example: new badges earned, new comments in subscribed models, new posts by followed users, new comments on my posts, new likes on my posts...
Group the activities; for example, instead of having 400 separate activities for likes on my post, have just one notification that says "User X and 399 more like your post".
Make it possible to mark notifications as read, so you don't see them again unless you explore past notifications.
Scalability, good performance, and possible future integration with an app built, for example, with the Ionic framework.
Push notifications are optional; it's OK if the user needs to refresh the page to see new notifications.
To that end, I have read a lot. I have watched some RailsCasts videos and followed tutorials, but I'm still not really sure how to begin.
I have considered the following methods:
Use the public_activity gem, adding a new "read" field to the migration, and work out how to manage grouped activities. But I have seen a lot of complaints about its performance. I'm expecting around 50,000 users on my website in the first month (I already have the users), with peaks of 500-1000 users online. So maybe this is not the best way to go, as I would have a lot of activities, a lot of "notifications", and a lot of users.
Use a service like https://getstream.io/, since they also have integrations available for RoR and Ruby. The main concern here is pricing: if I'm not wrong, with that number of users and around 10 notifications per user per day, I would probably be paying more than $200 a month, and it would keep growing as the user base grows.
Build my own system, maybe using Redis (a rough sketch of this route follows below). But maybe this would be too complex and require a lot of time to get code that is good, efficient, and working.
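For reference, here is a rough sketch of what the Redis route might look like for the grouped-likes case; the key scheme, field names, and counter-based grouping strategy are illustrative assumptions only:

```ruby
require "redis"

redis = Redis.new

# Record a "like", grouped per (recipient, post) pair: one Redis hash
# holds the latest actor and a running counter instead of 400 rows.
def record_like(redis, recipient_id, post_id, actor_name)
  key = "notifications:#{recipient_id}:post_like:#{post_id}"
  redis.hset(key, "latest_actor", actor_name)
  redis.hset(key, "read", "0") # new activity resets the read flag
  redis.hincrby(key, "count", 1)
end

# Render the grouped message, e.g. "User X and 399 more like your post".
def like_message(redis, recipient_id, post_id)
  key = "notifications:#{recipient_id}:post_like:#{post_id}"
  actor = redis.hget(key, "latest_actor")
  count = redis.hget(key, "count").to_i
  count > 1 ? "#{actor} and #{count - 1} more like your post" : "#{actor} likes your post"
end

# Marking as read is a single field write on the same hash.
def mark_read(redis, recipient_id, post_id)
  redis.hset("notifications:#{recipient_id}:post_like:#{post_id}", "read", "1")
end
```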
So, having considered these options, I still don't know which one is best for me, or whether there are other possibilities.
If anyone has faced these questions before, please let me know your thoughts and what you think is the correct way to go.
Thank you !! :)

Local storage on Rails

I've built a Rails app, basically a CRUD app for memos/notes.
A note's title must be unique. If a user enters a name that is already taken, a warning message is shown prompting them to choose another.
My question is how to make the latency of this feedback as close to zero as possible. Little UX speed bumps like this will quickly get annoying for users creating notes.
Of course the main bottleneck is the network. Inspired by Meteor (and mini-mongo) I was thinking some kind of local storage could be a solution?
I.e., when the app first loads, send ALL the note titles as JSON to the client. The app (the front end is AngularJS) could check LocalStorage (or App Cache, Web SQL?) instead of incurring a network round trip. The feedback would be instant.
I've used LocalStorage in the past to augment an app, but in this scenario the app would seriously depend on it, and I'm not sure how confident I'd be building on something the user might not have. Also, as the number of user notes/memos grows, I have doubts about how feasible it is to send a JSON object down the wire with ALL the note titles; that might get pretty big. On the other hand, MeteorJS seems to do this with no problems.
Has anyone done something similar or have any pointers? Thanks!
I don't know how Meteor works here, but you're right that storing all note titles in localStorage is not a good idea. Actually, you don't need localStorage here; you can just put them in a JS array, because you need this data only once (when checking a new note title).
I think there could be two possible solutions:
You can change your business requirements and allow non-unique titles. Is there really a necessity for titles to be unique?
You can verify the note title when the user submits the form. In this case you can provide suggestions, so users don't spend time guessing a vacant title.
Or, if titles must be unique only within a user (two users can have the same title for their notes), you can indeed load all of that user's note titles into a JS array and check uniqueness as the user types a title.
Or you can send an AJAX request checking title uniqueness as soon as the user has finished typing the title (a minimal Rails endpoint for this is sketched after this list). In this case you can save some seconds.
Or you can send an AJAX request as soon as the user has typed 3 characters. The request returns all titles that begin with those 3 characters, so you don't need to load all the titles.
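On the Rails side, the AJAX check can be a tiny endpoint; this is a minimal sketch, and the route, model, and parameter names are assumptions:

```ruby
# config/routes.rb (assumed route):
#   get "notes/check_title", to: "notes#check_title"

class NotesController < ApplicationController
  # GET /notes/check_title?title=Foo
  # Returns { "taken": true } or { "taken": false } so the client can
  # warn the user before the form is ever submitted.
  def check_title
    # Scope through current_user.notes if titles only need to be
    # unique per user rather than globally.
    taken = Note.exists?(title: params[:title])
    render json: { taken: taken }
  end
end
```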

MongoDB and embedded documents, good use cases

I am using embedded documents in MongoDB for a Rails 3 app. I like that with embedded documents the values are all returned with one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents? Is this sort of operation feasible with MongoDB, or would I be better off using normal ID-based relations? If ID-based relations are the way to go, would it affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
A document with many properties that all documents share:

Person
  name: string
  description: string

A document that wants to use these properties:

Post (references many people)
  body: string
This all depends on what you are going to do with your Person model later. I know of at least one working example (a blog using MongoDB) where the developer keeps user data inside the comments they make and uses one collection for the entire blog. Well, OK, he uses a second one for his "tag cloud" :) He just doesn't need a centralized list of all commenters; he doesn't care. His blog contains consolidated data from all his previous sites/blogs, almost 6000 posts total. Posts contain comments, comments contain users, users have emails; there is a "subscribe to comments" option for every user who comments on a post; authorization is handled by an external OpenID service aggregator (Loginza); and he keeps the user email from the Loginza response and a "login token" in cookies. So the functionality is pretty good.
So, the real question is: what are you going to do with your users later? If you really feel like you need a separate collection (you're going to give users centralized control panels, have site-based registration, build user-centric features, and so on), make it separate. If not, keep it simple and have fun :)
It depends on what user info you want to share across documents. Say a user has emails: it does not make sense to move emails into a separate collection, since there will be no more than 10, 20, or 100 emails per user. But if a user has some large, always-growing set of related data, like blog posts, then it does make sense to move it into a separate collection.
So the answer depends on your user document structure. If you show your user document structure and what you are planning to move into a separate collection, I will help you make a decision.
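To make the two shapes concrete, here is a minimal Mongoid sketch of both, based on the Person/Post example in the question (the association choices are illustrative, not prescriptive):

```ruby
# Referenced: people live in their own collection, are shared across
# posts, and can be updated in one place.
class Person
  include Mongoid::Document
  field :name,        type: String
  field :description, type: String
  has_and_belongs_to_many :posts
end

class Post
  include Mongoid::Document
  field :body, type: String
  has_and_belongs_to_many :people
end

# Embedded alternative: person data is copied into each post and comes
# back in a single query, but shared edits must be propagated by hand.
class EmbeddedPost
  include Mongoid::Document
  field :body, type: String
  embeds_many :embedded_people
end

class EmbeddedPerson
  include Mongoid::Document
  field :name,        type: String
  field :description, type: String
  embedded_in :embedded_post
end
```

With the referenced shape, a shared edit is one write on the Person document; with the embedded shape, the same change means touching every post that carries a copy.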

Need advice on MongoDB Schema for Chat App. Embedded vs Related Documents

I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app, and the stack includes: Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis, jQuery.
I'll be handling the live chat polling/message queueing separately. Not sure how yet; either Node.js, APE, or a custom EventMachine app. But in regards to Mongo, I'm thinking of using it for everything else in the app, specifically chat logs and historical transcripts.
My question is how best to design the schema, as all my previous experience has been with MySQL and relational DB schemas. And as a sub-question: when is it best to use embedded documents vs. related documents?
The app will have:
Multiple accounts which have multiple rooms
Multiple rooms
Multiple users per room
List of rooms a user is allowed to be in
Multiple user chats per room
Searchable chat logs on a per room and per user basis
Optional file attachment for a given chat
Given that Mongo (at least last time I checked) has a 4MB document size limit, I don't think having a collection for rooms and storing all room chats as embedded documents would work out so well.
From what I've thought about so far, I'm thinking of doing something like this (a Mongoid sketch of it appears after the list):
A collection for accounts
A collection for rooms
  Each room relates back to an account
  Related documents in a chats collection for all chat messages in the room
  An embedded document listing all users currently in the room
A collection for users
  An embedded document listing all the rooms the user is currently in
  An embedded document listing all the rooms the user is allowed to be in
A collection for chats
  Each chat relates back to a room in the rooms collection
  Each chat relates back to a user in the users collection
  An embedded document with info about an optional uploaded file attachment
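A minimal Mongoid sketch of the core of that layout; Array fields of IDs stand in for the "embedded document listing" items, and all model and field names are my own reading of the list above:

```ruby
class Account
  include Mongoid::Document
  has_many :rooms
end

class Room
  include Mongoid::Document
  field :name, type: String
  # Presence is transient, so a plain array of user IDs is enough here.
  field :current_user_ids, type: Array, default: []
  belongs_to :account
  has_many :chats
end

class User
  include Mongoid::Document
  field :current_room_ids, type: Array, default: [] # rooms the user is in now
  field :allowed_room_ids, type: Array, default: [] # rooms the user may join
  has_many :chats
end

class Chat
  include Mongoid::Document
  field :body,    type: String
  field :sent_at, type: Time
  belongs_to :room
  belongs_to :user
  embeds_one :attachment # metadata for the optional uploaded file
end

class Attachment
  include Mongoid::Document
  field :filename,     type: String
  field :content_type, type: String
  embedded_in :chat
end
```

The generic queries then fall out directly: account.rooms, room.chats (with a range condition on :sent_at for date filtering), user.chats, and Chat.where(:attachment.exists => true) for uploads.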
My main concern is: how far do I go before this ends up looking like a relational schema and I defeat the purpose? There is definitely more relating than embedding going on.
Another concern is that, from what I've heard, referencing related documents is much slower than accessing embedded documents.
I want to make generic queries such as:
Give me all rooms for an account
Give me all chats in a room (or filtered via date range)
Give me all chats from a specific user
Give me all uploaded files in a given room or for a given org
etc
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.
I think you're pretty much on the right track. I'd use a capped collection for chat lines, with each line containing the user ID, room ID, timestamp, and what was said. This data would expire once the capped collection's "end" is reached, so if you needed a historical log you'd want to copy data out of the capped collection into a "log" collection periodically. Capped collections are specifically designed for logging-style applications where you aren't going to be deleting documents and insertion order matters; in the case of chat, that's a perfect match.
The only other change I'd suggest would be to maintain uploads in a separate collection, as well.
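As a concrete sketch, creating and using such a capped collection with the Ruby mongo driver might look like this (the database name, collection name, size cap, and IDs are placeholders):

```ruby
require "mongo"

client = Mongo::Client.new(["127.0.0.1:27017"], database: "chat_app")

# Create the capped collection once, before any inserts; MongoDB then
# recycles the oldest documents when the size cap is reached while
# preserving insertion order.
client.database.command(create: "chat_lines", capped: true, size: 100 * 1024 * 1024)

lines   = client[:chat_lines]
room_id = BSON::ObjectId.new # placeholder IDs
user_id = BSON::ObjectId.new

lines.insert_one(room_id: room_id, user_id: user_id,
                 said_at: Time.now.utc, body: "hello, room")

# Reads come back in natural (insertion) order, which is exactly what
# a chat transcript wants.
lines.find(room_id: room_id).each { |doc| puts doc[:body] }
```

The periodic archival step can then be a plain find over the capped collection plus an insert_many into the permanent "log" collection.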
I am a big fan of MongoDB as a document database as well. But are you sure you are using MongoDB for the right reason? What is MongoDB powerful at?
It's a subjective question, but for me, in-place (atomic) updates on documents are what make MongoDB powerful, and I can't really see you using them that much. On top of that, you are hitting the document size limit problem as well (from experience, I can tell you that embedding files in MongoDB is not a good idea). And you want to have a live chat application on top of the database too.
Your document schemas seem logical, but I wouldn't go with MongoDB for this kind of project, where your application depends heavily on inserts. I would go for CouchDB.
With CouchDB you wouldn't have to worry about the attachments problem; you can embed them easily. And "_changes" would make your life much, much easier, whether you are building a live chat application, long polling, or feeding a search engine (if you want to implement one).
And I saw an open-source showcase project on CouchOne that has some similarities with your goals: Anologue. You should check it out.
PS: Sorry, this was a little off-topic, but I couldn't help myself.
