Cache layer to JSON API (on Rails)

I have a small social website on Rails (planning to port it to Phoenix) that uses React for the view; the backend is just a JSON API, with more or less 3,000 users online at any moment. It runs on Postgres/memcached.
When a user visits their feed page, for example, I:
Select activities from the database (20 per page)
Select the last 4 comments for each activity from the database (just one SELECT)
Select all users referenced by an activity or comment from the database (SELECT users.* FROM users WHERE id IN (1,3,4,5,...,100))
I have a cache layer (memcached): when I load users, I first try to load them from memcached; if they're not there, I read from the database and put them in the cache.
BUT I also have some "listeners" on the User model (and on other referenced models like Address and Profile) to invalidate the cache if any field changes.
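For concreteness, a minimal sketch of what that layer looks like (model and key names are mine, not from a real codebase; Rails.cache is assumed to be memcached-backed, e.g. via dalli):

# Read-through cache: try memcached first, fall back to the database.
class UserCache
  def self.fetch_many(ids)
    keys = ids.map { |id| "users/#{id}" }
    cached = Rails.cache.read_multi(*keys)
    missing_ids = ids.reject { |id| cached.key?("users/#{id}") }
    User.where(id: missing_ids).each do |user|
      Rails.cache.write("users/#{user.id}", user)
      cached["users/#{user.id}"] = user
    end
    ids.map { |id| cached["users/#{id}"] }
  end
end

# The "listener" that invalidates on change (the side effect described below).
class User < ActiveRecord::Base
  after_commit { Rails.cache.delete("users/#{id}") }
end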
The problem:
This cache demands a lot of code.
Sometimes the cache runs out of sync.
I hate having these listeners, and they are "side effects".
My question is: is anyone else doing something like this?
I searched all over Google for a cache layer for a JSON API, and it looks like everyone is just using the database directly.
I know Rails has its own solution (and I guess Phoenix doesn't have one), but it always ends up using the updated_at column, which means I have to go to the database anyway.
Alternatives:
Live with it; life is not pretty.
Buy a more powerful Postgres instance... is anyone using memcached like this?
Remove the listeners, set an expires_in (1 or 2 minutes... or more), and let the app show out-of-sync data for a couple of minutes (sketched below).
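The TTL variant is just Rails' built-in fetch with an expiry; a minimal sketch:

def cached_user(id)
  # No invalidation callbacks; data can be up to 2 minutes stale.
  Rails.cache.fetch("users/#{id}", expires_in: 2.minutes) do
    User.find(id)
  end
end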
Thanks for any help!

Related

Logging data changes for synchronization

I am looking for a solution for logging data changes for a public API.
There is a need to tell the client app which tables in the DB have changed and need to be synchronized since the app last synchronized, and this also needs to be scoped to a specific brand and country.
Current solution:
A Version table with the class_names of the models, touched by every model on create, delete, touch, and save.
When we touch the Version for a specific model, we also look at its reflected associations and touch those too.
The Version model is scoped to brand and country.
The REST API responds to a request that includes last_sync_at (a timestamp), brand, and country.
Rails looks at Version with the given attributes and returns the class_names of the models that changed since the last_sync_at timestamp.
This solution works, but the problem is performance, and it is also hard to maintain.
UPDATE 1:
Maybe the simpler question is:
What is the best practice for finding out, and telling frontend apps, when and what needs to be synchronized, in terms of the whole concept?
Conditions:
Frontend apps need to download only their own content changes, not the whole dataset.
Synchronizing an app from one country or brand must not trigger synchronization for a different country or brand.
Thank you.
I think the best solution would be to use Redis (or some other key-value store) and save your information there. Writing to Redis is much faster than to any SQL DB. You could write a service class that saves the data like:
RegisterTableUpdate.set(table_name, country_id, brand_id, timestamp)
Such a call would save the given timestamp under a key that could look like, e.g., table-update-1-1-users, where the first number is the country id and the second is the brand id, followed by the table name (or you could use country and brand names if needed). To find out which tables have changed, you would just need to find the Redis keys matching "table-update-1-1-*", iterate through them, and check which are newer than the timestamp sent through the API.
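A minimal sketch of that service with the redis gem (the key scheme matches the example above; everything else, including the scan-based lookup, is an assumption about how you might wire it up):

require "redis"

class RegisterTableUpdate
  REDIS = Redis.new

  def self.set(table_name, country_id, brand_id, timestamp)
    REDIS.set("table-update-#{country_id}-#{brand_id}-#{table_name}", timestamp.to_i)
  end

  # Names of the tables changed since last_sync for a country/brand pair.
  def self.changed_since(country_id, brand_id, last_sync)
    prefix = "table-update-#{country_id}-#{brand_id}-"
    REDIS.scan_each(match: "#{prefix}*").select { |key|
      REDIS.get(key).to_i > last_sync.to_i
    }.map { |key| key.sub(prefix, "") }
  end
end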
It is worth remembering that Redis is not as reliable as SQL databases; its reliability depends on configuration, so you might want to read the Redis guidelines and decide whether you want to go for it.
You can take advantage of the fact that ActiveRecord automatically records every time it updates a table row (the updated_at column).
When checking what needs to be updated, select the objects you are interested in and compare their updated_at with the timestamp from the client app.
The advantage of this approach is that you don't need to keep an additional table listing all the updates to models, which should speed things up for the API users and be easier to maintain.
The disadvantage is that you cannot see the changes in the data over time; you only know that a change occurred and can access the latest version. If you need to track changes in the data over time efficiently, then I'm afraid you'll have to rework things from the top.
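A minimal sketch of that comparison in a controller (the model names and the last_sync_at parameter are assumptions):

# Return only records touched since the client's last sync,
# relying on the updated_at column Rails maintains for you.
def changes
  last_sync = Time.zone.parse(params[:last_sync_at])
  render json: {
    users:    User.where("updated_at > ?", last_sync),
    products: Product.where("updated_at > ?", last_sync)
  }
end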
(read last part - this is what you are interested in)
I would recommend using the decorator design pattern for changing the client queries. The client sends a query for what it wants, and the server decides what to give it based on the client's last update.
So:
the client sends a query that includes the time it last synced
the server sees the query and takes the client's nature (device, country) into account
the server decorates (changes accordingly) the query so that it requests only the relevant data from the DB, and if that is not possible:
after the data is returned from the database manager, it is trimmed down to what is relevant to where it is going
the server returns to the client all the new stuff the client cares about
I assume that you have a time-entered field on your DB entries.
In that case, the "decoration" of the query would (abstractly) just be adding something like a WHERE clause to your query, stating that you want data entered after the last update.
Finally, if you want this done for many devices/locales/whatever, implement a decorator for the query and one for the result of the query, and serve them to your clients as they should be served. (Keep in mind that, in contrast with a subclassing approach, you will only have to implement one decorator for each device/locale/whatever, not one for every combination!)
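A minimal sketch of such a decorator over an ActiveRecord scope (the class name and columns are illustrative only):

# Wraps a base query and narrows it to what this client should see:
# its own country/brand, and only rows changed since its last sync.
class SyncQueryDecorator
  def initialize(scope)
    @scope = scope
  end

  def for_client(last_synced_at, country, brand)
    @scope
      .where(country: country, brand: brand)
      .where("updated_at > ?", last_synced_at)
  end
end

# Usage:
SyncQueryDecorator.new(Product.all).for_client(params[:last_sync_at], "US", "acme")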
Hope this helped!

Accessing huge volumes of data from Facebook

So I am working on a Rails application, and the person I am designing it for has what seem like extremely hefty data volume requirements. They want to gather ALL posts by a user who logs into the application, and all of the posts of each of that user's friends for the past year.
Before this particular level of detail was communicated to me, I built the thing using the fb_graph gem and paginated through posts. I am running into the fact that, first, this takes a very long time, even when I change the number of posts requested per page. Second, I frequently run into OAuth error #613, more than 600 requests per 600 seconds. After increasing each request to 200 posts I hit this limit less often, but it still takes an incredibly long time to get all of this data.
I am not particularly familiar with the FQL alternative, but it seems to me that we are going to have to prioritize either speed or volume of data. Is there a way I am missing that would allow me to retrieve this level of information quickly?
Edit: I do save all posts to the database as I retrieve them. What is required is to make one pass through and grab all of the posts for the past year, for the user and their friends. This process takes a long time, and I am basically wondering whether there is any way it can be sped up.
One thing that I'd like to point out here: you should implement some kind of local caching for the user's posts. I mean, instead of querying FB for the posts each time, you should save the posts in your local database and only check for new posts (whenever needed).
This is faster and saves you many API requests.
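A hedged sketch of that incremental check, using the Graph API's since parameter over plain HTTP (the Post model and its fields are assumptions; I'm showing the raw call rather than fb_graph for clarity):

require "net/http"
require "json"
require "time"

# Fetch only posts newer than the latest one already stored locally.
def fetch_new_posts(fb_user_id, access_token)
  newest = Post.where(fb_user_id: fb_user_id).maximum(:created_time)
  params = { access_token: access_token, limit: 200 }
  params[:since] = newest.to_i if newest  # unix timestamp of the last stored post
  uri = URI("https://graph.facebook.com/#{fb_user_id}/posts")
  uri.query = URI.encode_www_form(params)
  data = JSON.parse(Net::HTTP.get(uri))
  Array(data["data"]).each do |post|
    Post.create!(fb_user_id:   fb_user_id,
                 fb_post_id:   post["id"],
                 message:      post["message"],
                 created_time: Time.parse(post["created_time"]))
  end
end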

How do I see real-time activity of my users in Rails 3?

What I would like to do is have my admin user be able to see, in real time (via some AJAX/jQuery niceness), what my users are doing.
How do I go about doing that?
I assume it has something to do with session activity, and I have started saving the session to the DB rather than to the cookie.
But generally speaking, how do I take that info and parse it in real time?
I looked at my session table and, aside from the ids (id and session_id), I see a 'data' field. That data field stores a hash, which I can't make any sense of (it looks like an MD5 hash).
How would I use that to see that User A just clicked on link B, and right after that User B clicked on link A, etc.?
Is there a gem, aside from rackamole, that might be able to help me?
You might want to check out Mixpanel. It is easy to set up and has some of what you are asking for.
The session data only contains the values stored in the user's session[] hash. It doesn't store which action/controller was called, so you can't tell which "link was clicked".
Get the activity of your users:
Besides rackamole, you have two options, IMHO.
Use a before_filter in your ApplicationController to store the relevant info you are interested in (name of the controller, action or URI, additional parameters, and the id of the logged-in user, for example). This option is sketched below.
Use an AJAX call at the bottom of each page that posts the info you are interested in (URI, id of the logged-in user, etc.) back to your server. This allows faster response times from the server, as the info is stored after the page has already been delivered. Plus, you don't have to use a Rails request to store it; the AJAX request could even call a simple PHP script that writes the data to disk, which is much faster.
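A minimal sketch of the before_filter option (the Activity model is hypothetical):

class ApplicationController < ActionController::Base
  before_filter :track_activity  # Rails 3 callback name

  private

  def track_activity
    return unless current_user
    # One row per request: who did what, and where.
    Activity.create(
      user_id:    current_user.id,
      controller: controller_name,
      action:     action_name,
      path:       request.fullpath
    )
  end
end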
Storing this activity:
Store this data either in the database or in a logfile. The database will give you more flexibility, like showing all actions from one user, or all visitors for one page, etc. The logfile solution will give you better performance.
Realtime vs. Oldschool:
As for pulling out your collected data in real time, you have to build your own solution. To do this elegantly (without querying your server once a second to see whether new data has arrived) you'll need another server process. Search for AJAX push for more info.
Depending on your application, I'd ask myself whether realtime notifications for this are really necessary (because of all the hassle of setting this up).
To monitor the activity on your site, it should be enough to have a page listing the latest actions and to refresh it manually (or automatically every ten seconds).
Maybe you can try https://github.com/raid5/acts_as_scribe#readme
It works with Rails 3 too.

Need advice on MongoDB Schema for Chat App. Embedded vs Related Documents

I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app, and the stack includes: Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis, jQuery.
I'll be handling the live chat polling/message queueing separately; not sure how yet, either Node.js, APE, or a custom EventMachine app. But in regards to Mongo, I'm thinking of using it for everything else in the app, specifically the chat logs and historical transcripts.
My question is how best to design the schema, as all my previous experience has been with MySQL and relational DB schemas. And as a sub-question: when is it best to use embedded documents vs. related documents?
The app will have:
Multiple accounts which have multiple rooms
Multiple rooms
Multiple users per room
List of rooms a user is allowed to be in
Multiple user chats per room
Searchable chat logs on a per room and per user basis
Optional file attachment for a given chat
Given that Mongo (at least last time I checked) has a document size limit of 4MB, I don't think having a collection for rooms and storing all of a room's chats as embedded documents would work out so well.
From what I've thought about so far, I'm thinking of doing something like the following (sketched in Mongoid after the list):
A collection for accounts
A collection for rooms
Each room relates back to an account
Related documents in chats collections for all chat messages in the room
Embedded Document listing all users currently in the room
A collection for users
Embedded Document listing all the rooms the user is currently in
Embedded Document listing all the rooms the user is allowed to be in
A collection for chats
Each chat relates back to a room in the rooms collection
Each chat relates back to a user in the users collection
Embedded document with info about optional uploaded file attachment.
My main concern is how far I go before this ends up looking like a relational schema and I defeat the purpose. There is definitely more relating than embedding going on.
Another concern is that, from what I've heard, referencing related documents is much slower than accessing embedded documents.
I want to make generic queries such as:
Give me all rooms for an account
Give me all chats in a room (or filtered via date range)
Give me all chats from a specific user
Give me all uploaded files in a given room or for a given org
etc
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.
I think you're pretty much on the right track. I'd use a capped collection for chat lines, with each line containing the user ID, room ID, timestamp, and what was said. This data would expire once the capped collection's "end" is reached, so if you needed a historical log you'd want to copy data out of the capped collection into a "log" collection periodically, but capped collections are specifically designed for logging-style applications where you aren't going to be deleting documents, and insertion order matters. In the case of chat, it's a perfect match.
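Creating such a capped collection with the Ruby driver of that era might look like this (the database name and sizes are placeholders):

# A capped collection preserves insertion order and recycles the
# oldest documents once the size limit is reached.
db = Mongo::Connection.new["chat_app"]
db.create_collection("chat_lines",
  :capped => true,
  :size   => 100 * 1024 * 1024,  # bytes reserved for the collection
  :max    => 500_000)            # optional document-count cap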
The only other change I'd suggest would be to maintain uploads in a separate collection, as well.
I am a big fan of MongoDB as a document database as well. But are you sure you are using MongoDB for the right reason? What is MongoDB powerful at?
It's a subjective question, but for me in-place (atomic) updates over documents are what make MongoDB powerful, and I can't really see you using that much here. On top of that, you are hitting the document size limit problem as well (from experience I can tell you that embedding files in MongoDB is not a good idea). And you want to have a live chat application on top of the database, too.
Your document schemas seem logical, but I wouldn't go with MongoDB for this kind of project, where your application depends heavily on inserts. I would go for CouchDB.
With CouchDB you wouldn't have to worry about the attachments problem; you can embed them easily. And _changes would make your life much, much easier, whether you're building a live chat application, long polling, or feeding a search engine (if you want to implement one).
I also saw an open-source showcase project on CouchOne that has some similarities with your goals: Anologue. You should check it out.
PS: Sorry, this was a little off-topic, but I couldn't help myself.

Querying an external Oracle DB in a Rails application

I have a website that uses a MySQL database for its whole operation. But for a new requirement I need to query an external Oracle database (used by another component), compile a list of items, and display them on a page of the website. How is it possible to connect to an external database just for rendering a single page?
Also, is it possible to cache the query result for, say, one month before invalidating the cache and getting the updated list of items? I don't want to query the external Oracle DB on each request.
Why not a monthly job that just copies the data from the Oracle database into the MySQL database?
As stated by Myers, a simple solution is to accept a data feed. For example, a cron job could pull data from the Oracle database at defined intervals, say daily or weekly, and then insert the data into your web application's local MySQL database. The whole process can be essentially transparent to your web application, and the caching interval, i.e. how long you go between feeds, is up to you.
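A hedged sketch of such a feed using a second ActiveRecord connection (the external_oracle entry in database.yml, the table, and the model names are all assumptions):

# Read-only model pointed at the external Oracle database,
# e.g. via the oracle_enhanced adapter.
class ExternalItem < ActiveRecord::Base
  establish_connection :external_oracle
  self.table_name = "items"
end

# Run monthly from cron: rake sync:items
namespace :sync do
  desc "Copy items from the external Oracle DB into local MySQL"
  task :items => :environment do
    ExternalItem.find_each do |src|
      item = Item.where(:external_id => src.id).first_or_initialize
      item.update_attributes!(:name => src.name, :price => src.price)
    end
  end
end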
I'll also point out that this could be an opportunity for an API that would more readily support sharing data between applications. This would, of course, be more work than a simple data feed, but it has the possibility of being useful to more people.