Building a Django Activity Feed using Redis - iOS

How can one build an activity feed using Django & Redis?
Example: In the 'Home' section of my iOS app, I would like to fill it with activities generated by users via JSON.
Bob liked Kyle's poem.
Bob started following Kyle.
Bob liked 6 poems → (all six poems aggregated together in the feed)
Bob commented on Kyle's poem: Beautiful piece!
How can I go about doing this? If the question is not clear, please let me know so that I can make it clearer for you and others who come across this post and may find it useful! Thank you

What you are actually doing requires:
aggregation logic (which you can write in Python, since your main framework is Django)
a task queue running in the background that executes this aggregation logic
denormalized and duplicated data in your Redis database, repeating data that are relational in your main database, such as your PostgreSQL database
You can break your activity feed down into components that are aggregated together in Redis but related to each other in your relational database.
Bob, Kyle, the poems, and "Beautiful piece!" are objects (two user objects, a poem object, and a comment object, respectively) that are stored in your relational database.
Your activity types are "following", "liked", and "commented".
You can then write Python logic to aggregate them into a single feed item stored in your Redis database; each feed item is composed of objects, an activity type, and a timestamp of when the activity happened.
That's the primary design consideration to get started.
Here's a good example - https://github.com/SupermanScott/redis-activity-example
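To make the aggregation step concrete, here is a minimal sketch in plain Ruby (the question's stack is Python/Django, but the logic ports directly; the `Activity` struct, the verbs, and the "collapse repeated likes" rule are all assumptions for illustration, not part of any library):

```ruby
# A raw activity as it would be recorded when a user acts.
Activity = Struct.new(:actor, :verb, :object_type, :target_id, :created_at)

# Collapse repeated "liked" activities by the same actor into one feed item
# ("Bob liked 6 poems"), keeping other activity types as individual items.
def aggregate(activities)
  feed = []
  activities.group_by { |a| [a.actor, a.verb] }.each do |(actor, verb), group|
    if verb == :liked && group.size > 1
      feed << { actor: actor, verb: verb,
                target_ids: group.map(&:target_id),
                summary: "#{actor} liked #{group.size} #{group.first.object_type}s",
                created_at: group.map(&:created_at).max }
    else
      group.each do |a|
        feed << { actor: a.actor, verb: a.verb, target_ids: [a.target_id],
                  summary: "#{a.actor} #{a.verb} #{a.object_type} #{a.target_id}",
                  created_at: a.created_at }
      end
    end
  end
  feed.sort_by { |item| item[:created_at] }.reverse
end
```

A background worker would run something like this periodically and push the resulting feed items into a per-user Redis list (e.g. `LPUSH feed:<user_id>`), so the iOS app can fetch the latest items as JSON.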

Stream-Framework is an open-source library made to build feeds and supports both Redis and Cassandra as storage backends.
You can check it out on GitHub.
Disclaimer: I am one of the authors of Stream-Framework

Related

How should I be storing expiring stats?

Let's say I have 2 models in my app: User and Survey
I'm trying to plot the number of paid surveys over time. A paid survey is one that has been created by a user that has an active subscription. For simplicity, let's assume the User model has subscription_start_date and subscription_end_date.
So a survey becomes "paid" the moment it is created (provided the user has an active subscription) and loses its "paid" status when the subscription_end_date has passed. Essentially, the "paid survey" is really a state with a defined start and end date.
I can generate the data fine. What I'm curious about is the recommended way of storing this kind of stats. Basically, what should that table look like?
Another thing I'm concerned about is whether there are any disadvantages of having a daily task that adds the data point for the past day.
For more context, this app is written in Rails and we're thinking of using this stat architecture for other models too.
If I am understanding you correctly, I do not think you need an additional model or daily task to generate data points. To generate your report you just need to come up with the right SQL/ActiveRecord query. When you aggregate the information, be careful not to introduce nested queries. For simplicity's sake we could pull all the information you need using:
surveys = Survey.all.includes(:user)
Based on your description, an instance of Survey has a start date that is just created_at.to_date. And since Survey belongs_to :user, its end date is user.subscription_end_date.
When plotting the information you may need to transform surveys into some data structure that groups the information by date. Alternatively you could probably achieve that with a more complex SQL statement.
You could of course introduce a new table that stores the data points by date to avoid a complex query or data aggregation via ruby. The downside of this is that you are storing redundant information and assume the burden of maintaining data integrity. That doesn't mean you shouldn't do it because there may be an upside in regards to performance and reporting convenience.
I would need more information about your project before saying exactly what I would do, but it sounds like you already have the information you need in your database and it's just a matter of querying it properly.
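If you do want to see the shape of that aggregation, here is a sketch in plain Ruby (Structs stand in for the ActiveRecord models; `paid_survey_count` and `paid_series` are hypothetical helpers, not Rails APIs):

```ruby
require 'date'

# Stand-ins for the User and Survey models described in the question.
User   = Struct.new(:subscription_start_date, :subscription_end_date)
Survey = Struct.new(:user, :created_at)

# A survey counts as "paid" on a given date if it already existed and
# its creator's subscription covered that date.
def paid_survey_count(surveys, date)
  surveys.count do |s|
    s.created_at <= date &&
      s.user.subscription_start_date <= date &&
      date <= s.user.subscription_end_date
  end
end

# Build the time series to plot over a date range.
def paid_series(surveys, range)
  range.map { |d| [d, paid_survey_count(surveys, d)] }.to_h
end
```

In the app itself, the inner check would instead be a single SQL query with a date-range condition on created_at and the subscription columns, so you never load all surveys into memory.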

Update Core Data with updated data from web service (including relationships)

I'm building an app that gets a lot of data from a web service. The app consists of different entries that have relationships to each other. Let's make an example and say I'm building a TV show tracking app; all the data comes from the web service, but I want to mark episodes as watched, which is a custom property on one entry so far. All of this gets saved in Core Data. I have these entries:
Show ⇒ has many seasons and episodes
Season ⇒ has many episodes and one show
Episode ⇒ has one show and one season
The main part I'm currently struggling with is how I can best update all of these entries when the web service has an updated version of the data (maybe the show got a new season or some wrong data got fixed). At this point, the only custom property on these entries which differs from the data the web service provides is the watched attribute I created on the Episode entry.
So far I have tried different ways, like removing the old data and just adding the new data (the custom watched attribute is a problem here), and I also looked into merge policies like NSMergeByPropertyObjectTrumpMergePolicy, but that doesn't play nicely with relationships and I hit a roadblock there.
Is there a better way or best practice for solving this?

Logging data changes for synchronization

I am looking for a solution for logging data changes for a public API.
There is a need to tell the client app which tables from the DB have changed and need to be synchronized since the app last synchronized; this also needs to be scoped to a specific brand and country.
Current Solution:
A Version table with the class_names of models, which is touched by every model on create, delete, touch, and save.
When we touch the Version for a specific model, we also look at its reflected associations and touch them too.
Version model is scoped to brand and country
REST API is responding to a request that includes last_sync_at:timestamp, brand and country
Rails looks at Version with the given attributes and returns the class_names of models that were changed since the last_sync_at timestamp.
This solution works, but the problem is performance, and it is also hard to maintain.
UPDATE 1:
Maybe the simpler question is:
What is the best practice for finding out, and telling front-end apps, when and what needs to be synchronized, in terms of the whole concept?
Conditions:
Front-end apps need to download only their own content changes, not the whole dataset.
Synchronization must not be triggered when an application from a different country or brand synchronizes.
Thank you.
I think that the best solution would be to use Redis (or some other key-value store) and save your information there. Writing to Redis is much faster than writing to any SQL DB. You can write a service class that would save the data like:
RegisterTableUpdate.set(table_name, country_id, brand_id, timestamp)
Such a call would save the given timestamp under a key that could look like, e.g., table-update-1-1-users, where the first number is the country id and the second is the brand id, followed by the table name (or you could use country and brand names if needed). To find out which tables have changed, you would just need to find Redis keys matching "table-update-1-1-*", iterate through them, and check which are newer than the timestamp sent through the API.
It is worth remembering that Redis is not as reliable as SQL databases; its reliability depends on configuration, so you might want to read the Redis guidelines and decide whether you would like to go with it.
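A sketch of such a service class (the key format follows the description above; a plain Hash stands in for the Redis connection, with the equivalent Redis commands noted in comments):

```ruby
class RegisterTableUpdate
  # In production this would be a Redis client; a Hash mimics SET/GET here.
  @store = {}

  class << self
    attr_reader :store

    def key(table_name, country_id, brand_id)
      "table-update-#{country_id}-#{brand_id}-#{table_name}"
    end

    # Redis equivalent: redis.set(key, timestamp.to_i)
    def set(table_name, country_id, brand_id, timestamp)
      @store[key(table_name, country_id, brand_id)] = timestamp
    end

    # Redis equivalent: scan keys matching "table-update-#{country}-#{brand}-*",
    # GET each one, and keep the tables whose timestamp is newer than last_sync_at.
    def changed_since(country_id, brand_id, last_sync_at)
      prefix = "table-update-#{country_id}-#{brand_id}-"
      @store.select { |k, ts| k.start_with?(prefix) && ts > last_sync_at }
            .keys.map { |k| k.delete_prefix(prefix) }
    end
  end
end
```

With a real Redis connection you would prefer SCAN over KEYS for the lookup, since KEYS blocks the server on large keyspaces.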
You can take advantage of the fact that ActiveRecord automatically records the time of every update to a table row (the updated_at column).
When checking what needs to be updated, select the objects you are interested in and compare their updated_at with the timestamp from the client app.
The advantage of this approach is that you don't need to keep an additional table that lists all the updates on models, which should speed things up for the API users and be easier to maintain.
The disadvantage is that you cannot see the changes in data over time; you only know that a change occurred and can access the latest version. If you need to track changes in data over time efficiently, then I'm afraid you'll have to rework things from the top.
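The "compare updated_at with the client's timestamp" step can be sketched like this (plain Ruby, with a Struct standing in for model rows; in ActiveRecord this would be a single `Model.where("updated_at > ?", last_sync_at)` per model rather than an in-memory filter):

```ruby
# Each row carries the table it lives in plus its updated_at timestamp.
Record = Struct.new(:table, :id, :updated_at)

# Which tables contain rows changed since the client's last sync?
def changed_tables(records, last_sync_at)
  records.select { |r| r.updated_at > last_sync_at }
         .map(&:table).uniq.sort
end
```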
(read last part - this is what you are interested in)
I would recommend that you use the decorator design pattern for changing the client queries. So the client sends a query of what he wants and the server decides what to give him based on the client's last update.
so:
the client sends a query that includes the time it last synched
the server sees the query and takes into account the client's nature (device-country)
the server decorates (changes accordingly) the query to request from the DB only the relevant data, and if that is not possible:
after the data are returned from the database manager, they are trimmed down to what is relevant to the requesting client
the server returns to the client all the new stuff that the client cares about.
I assume that you have a time entered field on your DB entries.
In that case the "decoration" of the query (abstractly) would be just to add something like a "WHERE" clause in your query and state you want data entered after the last update.
Finally, if you want that to be done for many devices/locales/whatever, implement a decorator for the query and the result of the query and serve them to your clients as they should be served. (Keep in mind that, in contrast with a subclassing approach, you will only have to implement one decorator for each device/locale/whatever, not one for every combination!)
Hope this helped!
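A minimal sketch of that decorator idea (all class names here are hypothetical; in a real app you would wrap your ORM's query object and use bound parameters rather than string interpolation, which is shown here only for readability):

```ruby
# The base query a client sends: "give me all articles".
class ArticleQuery
  def to_sql
    "SELECT * FROM articles"
  end
end

# Decorator: narrows any query to rows entered after the client's last sync.
class SinceLastSync
  def initialize(query, last_sync_at)
    @query = query
    @last_sync_at = last_sync_at
  end

  def to_sql
    "#{@query.to_sql} WHERE entered_at > '#{@last_sync_at}'"
  end
end

# Decorator: scopes any query to the client's country.
class ForCountry
  def initialize(query, country)
    @query = query
    @country = country
  end

  def to_sql
    sql = @query.to_sql
    clause = "country = '#{@country}'"
    sql.include?("WHERE") ? "#{sql} AND #{clause}" : "#{sql} WHERE #{clause}"
  end
end
```

Because each decorator wraps anything responding to `to_sql`, you can stack them in any order per device/country, which is exactly the one-decorator-per-dimension point made above.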

Advanced Feed Parsing in Rails

I am a newbie to Rails and I have been watching RailsCasts videos.
I am interested to know a little bit more about Feedzirra (RailsCasts episode 168) and especially feed parsing.
For example, I need to parse feeds from the Telegraph and the Guardian.
I want to put all the sports news from both newspapers in one table, just football news in another table, cricket news in another table, etc.
How can I achieve that using Feedzirra?
How do I display only football news in one view and only cricket news in another view?
Also, I want the user to know which website he is going to visit before he actually clicks the link and finds out.
Something like this
Ryder Cup 2010: Graeme McDowell the perfect hero for Europe
5 min ago | Telegraph.co.uk
How do I display "Telegraph.co.uk"?
Looking forward to your help and support.
Thanks
There are many questions there, but I'll take this one:
I just know how to put all feeds in one table. I don't know how to keep feeds in different tables.
Create different models to suit your data model, based on what information you need to show rather than what is provided in the feed. (Different tables for each model if required, or Single Table Inheritance if possible.)
Write a wrapper class that will use FeedZirra (or any other parser for that matter) to read the parsed feeds and process them. These are generally kept in the lib folder.
Create a rake task which can be called to run this script OR if you are familiar with delayed_job, then create a job.
Schedule your rake task through cron or your job through delayed_job, so that you can periodically update your data.
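A sketch of the wrapper's categorization and source-label logic (the `Entry` struct and the keyword lists are assumptions; with Feedzirra you would feed this the entries returned by the parser):

```ruby
require 'uri'

# Stand-in for a parsed feed entry.
Entry = Struct.new(:title, :url, :published)

# Which model/table an entry should land in, decided by title keywords.
CATEGORY_KEYWORDS = {
  "football" => ["football", "premier league"],
  "cricket"  => ["cricket", "test match"],
}.freeze

def category_for(entry)
  title = entry.title.downcase
  match = CATEGORY_KEYWORDS.find { |_cat, words| words.any? { |w| title.include?(w) } }
  match ? match.first : "sports"
end

# "Telegraph.co.uk"-style source label, derived from the entry's link,
# so the user sees where a link leads before clicking it.
def source_label(entry)
  URI.parse(entry.url).host.sub(/\Awww\./, "").capitalize
end
```

The wrapper would call `category_for` to pick the model to save each entry into, and store `source_label` alongside it so the views never need to re-derive it.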

Need advice on MongoDB Schema for Chat App. Embedded vs Related Documents

I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app and the stack includes: Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis, JQuery.
I'll be handling the live chat polling/message queueing separately. Not sure how yet, either Node.js, APE or custom EventMachine app. But in regards to Mongo, I'm thinking to use it for everything else in the app, specifically chat logs and historical transcripts.
My question is how best to design the schema, as all my previous experience has been with MySQL and relational DB schemas. And as a sub-question: when is it best to use embedded documents vs. related documents?
The app will have:
Multiple accounts which have multiple rooms
Multiple rooms
Multiple users per room
List of rooms a user is allowed to be in
Multiple user chats per room
Searchable chat logs on a per room and per user basis
Optional file attachment for a given chat
Given Mongo (at least last time I checked) has a document limit of 4MB, I don't think having a collection for rooms and storing all room chats as embedded documents would work out so well.
From what I've thought about so far, I'm thinking of doing something like:
A collection for accounts
A collection for rooms
Each room relates back to an account
Related documents in chats collections for all chat messages in the room
Embedded Document listing all users currently in the room
A collection for users
Embedded Document listing all the rooms the user is currently in
Embedded Document listing all the rooms the user is allowed to be in
A collection for chats
Each chat relates back to a room in the rooms collection
Each chat relates back to a user in the users collection
Embedded document with info about optional uploaded file attachment.
My main concern is how far do I go until this ends up looking like a relational schema and I defeat the purpose? There is definitely more relating than embedding going on.
Another concern is that I've heard referencing related documents is much slower than accessing embedded documents.
I want to make generic queries such as:
Give me all rooms for an account
Give me all chats in a room (or filtered via date range)
Give me all chats from a specific user
Give me all uploaded files in a given room or for a given org
etc
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.
I think you're pretty much on the right track. I'd use a capped collection for chat lines, with each line containing the user ID, room ID, timestamp, and what was said. This data would expire once the capped collection's "end" is reached, so if you needed a historical log you'd want to copy data out of the capped collection into a "log" collection periodically, but capped collections are specifically designed for logging-style applications where you aren't going to be deleting documents, and insertion order matters. In the case of chat, it's a perfect match.
The only other change I'd suggest would be to maintain uploads in a separate collection, as well.
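For illustration, here is roughly what a chat line document and the periodic archival step could look like (field names are assumptions; plain arrays stand in for the capped and log collections, and in the mongo shell the capped collection would be created with `db.createCollection("chat_lines", {capped: true, size: ...})`):

```ruby
# Build the document inserted for each chat line; kept flat because capped
# collections forbid updates that grow a document.
def chat_line_doc(user_id, room_id, text, at)
  { user_id: user_id, room_id: room_id, said_at: at, text: text }
end

# Periodic archival: copy lines newer than the last archived timestamp out of
# the capped collection into a permanent "log" collection before they roll off.
# Returns the new high-water mark to use on the next run.
def archive!(capped, log, last_archived_at)
  fresh = capped.select { |doc| doc[:said_at] > last_archived_at }
  log.concat(fresh)
  fresh.map { |d| d[:said_at] }.max || last_archived_at
end
```

Running `archive!` from a cron job or background worker keeps the full transcript searchable while the capped collection stays small and fast for the live room.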
I am a big fan of MongoDB as a document database as well. But are you sure you are using MongoDB for the right reason? What is MongoDB powerful at?
It's a subjective question, but for me in-place (atomic) updates over documents are what make MongoDB powerful, and I can't really see you using them that much here. On top of that, you are hitting the document size limit problem as well. (From experience I can tell you that embedding files in MongoDB is not a good idea.) You also want to have a live chat application on top of the database.
Your document schemas seem logical. But I wouldn't go with MongoDB for this kind of project, where your application depends heavily on inserts. I would go for CouchDB.
With CouchDB you wouldn't have to worry about the attachments problem; you can embed them easily. "_changes" would make your life much, much easier, whether you want to build a live chat application, do long polling, or feed a search engine (if you want to implement one).
And I saw an open-source showcase project on CouchOne that has some similarities with your goals: Anologue. You should check it out.
PS: Sorry, it was a little off-topic, but I couldn't help myself.