I have a few data values that I need to store in my Rails app, and I wanted to know if there are any alternatives to creating a database table just for this simple task.
Background: I'm writing some analytics and dashboard tools for my Ruby on Rails app, and I'm hoping to speed up the dashboard by caching results that will never change. Right now I pull all users for the last 30 days and re-arrange them so I can see the number of new users per day. It works great but takes quite a long time; in reality I should only need to calculate the most recent day and store the rest of the array somewhere else.
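Roughly what I'm doing now (simplified):

users = User.where("created_at > ?", 30.days.ago)
# Bucket users by signup date, then count each bucket
new_users_per_day = users.group_by { |u| u.created_at.to_date }
                         .map { |date, group| [date, group.size] }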
What is the best place to store this array?
Creating a database table seems a bit overkill, and I'm not sure that global variables are the correct answer. Is there a best practice for persisting data like this?
If anyone has done anything like this before let me know what you did and how it turned out.
Ruby has a built-in Hash-based key-value store named PStore. It provides simple file-based, transactional persistence.
PStore documentation
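A minimal sketch of how PStore could hold the cached counts (the file name and key are illustrative):

require 'pstore'

store = PStore.new("new_users_per_day.pstore")

# Writes happen inside a transaction and are atomic
store.transaction do
  store[:daily_counts] = new_users_per_day
end

# Read back with a read-only transaction
cached = nil
store.transaction(true) { cached = store[:daily_counts] }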
If you've got a database already, it's really not a big deal to create a separate table for tracking this sort of thing. When doing reporting, it's often to your advantage to create derivative summary tables exactly like what you're describing. You can update these as required using a simple SQL statement and there's no worry that your temporary store will somehow go away.
That being said, the type of report you're trying to generate is actually something that can be done in real-time except on extravagantly large data sets. The key is to have indexes that describe the exact grouping operation you're trying to do. For instance, if you're grouping by calendar date, you can create a "date" field and sync it to the "created_at" time as required. An index on this date field will make doing a GROUP BY created_date very quick:
SELECT created_date AS on_date, COUNT(id) AS new_users FROM users GROUP BY created_date
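In Rails terms, a sketch of that setup (the column name and callback are assumptions):

# Migration: add an indexed created_date column to users
class AddCreatedDateToUsers < ActiveRecord::Migration
  def change
    add_column :users, :created_date, :date
    add_index  :users, :created_date
  end
end

# Keep it in sync when a user is created
class User < ActiveRecord::Base
  before_create { self.created_date = Time.current.to_date }
end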
Using a lightweight database like SQLite shouldn't feel like overkill. Alternatively, you can use key-value stores like Tokyo Cabinet, or even store the array in a flat file manually, but I really don't see any overkill in using SQLite.
Related
Let's say I have 2 models in my app: User and Survey
I'm trying to plot the number of paid surveys over time. A paid survey is one that has been created by a user that has an active subscription. For simplicity, let's assume the User model has subscription_start_date and subscription_end_date.
So a survey becomes "paid" the moment it is created (provided the user has an active subscription) and loses its "paid" status when the subscription_end_date has passed. Essentially, the "paid survey" is really a state with a defined start and end date.
I can generate the data fine. What I'm curious about is the recommended way of storing this kind of stat. Basically, what should that table look like?
Another thing I'm concerned about is whether there are any disadvantages of having a daily task that adds the data point for the past day.
For more context, this app is written in Rails and we're thinking of using this stat architecture for other models too.
If I am understanding you correctly, I do not think you need an additional model or daily task to generate data points. To generate your report you just need to come up with the right SQL/ActiveRecord query. When you aggregate the information, be careful not to introduce nested queries. For simplicity's sake we could pull all the information you need using:
surveys = Survey.includes(:user)
Based on your description, an instance of Survey has a start date that is just created_at.to_date. And since Survey belongs_to :user, its end date is user.subscription_end_date.
When plotting the information you may need to transform surveys into some data structure that groups the information by date. Alternatively you could probably achieve that with a more complex SQL statement.
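For instance, a rough sketch of the Ruby-side grouping (column names taken from the question):

surveys = Survey.includes(:user)

# Bucket surveys by the date they were created, then count each bucket
per_day = surveys.group_by { |s| s.created_at.to_date }
                 .map { |date, group| [date, group.size] }

# Or push the aggregation into SQL instead:
Survey.group("DATE(created_at)").count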
You could of course introduce a new table that stores the data points by date to avoid a complex query or data aggregation via ruby. The downside of this is that you are storing redundant information and assume the burden of maintaining data integrity. That doesn't mean you shouldn't do it because there may be an upside in regards to performance and reporting convenience.
I would need more information about your project before saying exactly what I would do, but it sounds like you already have the information you need in your database and it's just a matter of querying it properly.
I am looking for a solution for logging data changes for a public API.
The client app needs to be told which tables from the DB have changed and need to be synchronised since the app last synchronised, and the results also need to be scoped to a specific brand and country.
Current Solution:
A Version table with the class_names of models, which is touched by every model on create, delete, touch, and save.
When we touch the version for a specific model, we also look at its reflected associations and touch those too.
The Version model is scoped to brand and country.
The REST API responds to a request that includes a last_sync_at timestamp, a brand, and a country.
Rails looks at Version with the given attributes and returns the class_names of models that have changed since the last_sync_at timestamp.
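In code, that roughly corresponds to something like this (all names here are assumptions based on the description, not the actual implementation):

class Version < ActiveRecord::Base
  # columns: class_name, brand_id, country_id, updated_at
end

module TracksVersion
  extend ActiveSupport::Concern

  included do
    after_save    :touch_version
    after_destroy :touch_version
    after_touch   :touch_version
  end

  def touch_version
    Version.find_or_create_by(class_name: self.class.name,
                              brand_id: brand_id,
                              country_id: country_id).touch
    # Also touch versions for reflected associations
    self.class.reflect_on_all_associations.each do |assoc|
      Version.find_or_create_by(class_name: assoc.class_name,
                                brand_id: brand_id,
                                country_id: country_id).touch
    end
  end
end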
This solution works, but the problems are performance and maintainability.
UPDATE 1:
Maybe the simpler question is:
What is the best practice for finding out, and telling frontend apps, when and what needs to be synchronised, in terms of the whole concept?
Conditions:
Frontend apps need to download only their own content changes, not the whole dataset.
Synchronisation must not be triggered for an app when only content for a different country or brand has changed.
Thank you.
I think the best solution would be to use Redis (or some other key-value store) and save your information there. Writing to Redis is much faster than writing to any SQL DB. You can write a service class that would save the data like:
RegisterTableUpdate.set(table_name, country_id, brand_id, timestamp)
Such a call would save the given timestamp under a key that could look like, e.g., table-update-1-1-users, where the first number is the country id and the second is the brand id, followed by the table name (or you could use country and brand names if needed). If you wanted to find out which tables have changed, you would just need to find the Redis keys matching "table-update-1-1-*", iterate through them, and check which are newer than the timestamp sent through the API.
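A minimal sketch of such a service class, assuming the redis-rb gem and the key format above:

require 'redis'

class RegisterTableUpdate
  REDIS = Redis.new

  def self.set(table_name, country_id, brand_id, timestamp = Time.now)
    REDIS.set("table-update-#{country_id}-#{brand_id}-#{table_name}", timestamp.to_i)
  end

  # Returns the keys for tables updated after the given time.
  # Note: KEYS scans the whole keyspace, which is fine for a small key set.
  def self.changed_since(country_id, brand_id, since)
    REDIS.keys("table-update-#{country_id}-#{brand_id}-*").select do |key|
      REDIS.get(key).to_i > since.to_i
    end
  end
end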
It is worth remembering that Redis is not as reliable as SQL databases. Its reliability depends on configuration, so you might want to read the Redis guidelines and decide whether you'd like to go with it.
You can take advantage of the fact that ActiveRecord automatically timestamps every update to a table row (the updated_at column).
When checking what needs to be updated, select the objects you are interested in and compare their updated_at with the timestamp from the client app.
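A minimal sketch (the model name is a placeholder):

last_sync = Time.zone.parse(params[:last_sync_at])
changed   = Product.where("updated_at > ?", last_sync)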
The advantage of this approach is that you don't need to keep an additional table that lists all the updates on models, which should speed things up for the API users and be easier to maintain.
The disadvantage is that you cannot see the changes in data over time; you only know that a change occurred and you can access the latest version. If you need to track changes in data over time efficiently, then I'm afraid you'll have to rework things from the top.
(read last part - this is what you are interested in)
I would recommend that you use the decorator design pattern for changing the client queries. The client sends a query for what it wants, and the server decides what to give it based on the client's last update.
so:
the client sends a query that includes the time it last synced
the server sees the query and takes the client's nature (device, country) into account
the server decorates (changes accordingly) the query to request only the relevant data from the DB, and if that is not possible:
after the data are returned from the database manager, they are trimmed to be relevant to where they are going
the server returns to the client all the new stuff that the client cares about.
I assume that you have a time-entered field on your DB entries.
In that case, the "decoration" of the query would (abstractly) just add something like a WHERE clause to your query, stating that you want data entered after the last update.
Finally, if you want that to be done for many devices/locales/whatever, implement a decorator for the query and for the result of the query, and serve them to your clients as they should be served. (Keep in mind that, in contrast with a subclassing approach, you will only have to implement one decorator for each device/locale/whatever, not for all combinations!)
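A loose sketch of that idea in Ruby, built on ActiveRecord relations (Item and all class names are illustrative):

# Base query: everything the endpoint can serve
class ItemsQuery
  def relation
    Item.all
  end
end

# Decorator: narrow any query to records changed since the last sync
class SyncedSinceDecorator
  def initialize(query, last_sync_at)
    @query, @last_sync_at = query, last_sync_at
  end

  def relation
    @query.relation.where("updated_at > ?", @last_sync_at)
  end
end

# Decorator: narrow any query to the client's brand and country
class MarketDecorator
  def initialize(query, brand_id, country_id)
    @query, @brand_id, @country_id = query, brand_id, country_id
  end

  def relation
    @query.relation.where(brand_id: @brand_id, country_id: @country_id)
  end
end

# One decorator per concern; they compose, so no combinatorial explosion:
query = MarketDecorator.new(
  SyncedSinceDecorator.new(ItemsQuery.new, params[:last_sync_at]),
  params[:brand_id], params[:country_id])
query.relation  # => the trimmed ActiveRecord relation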
Hope this helped!
I am trying to implement a "time machine" feature in my Grails application. The feature would allow a user to select a date in the past and would display the interface of the application as it was on the selected date. How do I implement this feature? I was thinking of adding a 'dateCreated' field to all domains, so that in the time machine feature I could query all results with a created date before the selected date. I think this would work, but as the data grows, the database would grow too, and at that point the application would become heavy. Is there any other way to do this?
Thanks
You could maybe draw some inspiration from this related question:
How to manage object revisions in Grails?
You should look at the http://grails.org/plugin/audit-logging plugin, since it will allow you to keep all versions of domain class instances. But implementing this feature will be pretty tricky, since objects don't exist in isolation: you'll need to display not only the data as of the previous date, but also its related data (e.g. the author's books collection) as of that date. That will make the queries quite complicated.
I'm running a Rails app on Postgres through Heroku.
I'd like to implement something similar to Facebook "likes" on my site for various items, such as user comments. What's the smartest way to store these in my database so it will be efficient and fast?
The obvious one is just to have a like join table between users and items, something like this:
user_id int
item_id int
item_type string
created_at datetime
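In migration form, something like this (a sketch; the unique index assumes one like per user per item):

class CreateLikes < ActiveRecord::Migration
  def change
    create_table :likes do |t|
      t.integer  :user_id,   null: false
      t.integer  :item_id,   null: false
      t.string   :item_type, null: false
      t.datetime :created_at
    end
    # One like per user per item, plus fast lookups from the item side
    add_index :likes, [:user_id, :item_type, :item_id], unique: true
    add_index :likes, [:item_type, :item_id]
  end
end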
However, when being displayed, this would mean every time I pull an item, I would have to pull a join across the entire like table, which could get very big.
The obvious response for this would be to store a counter in the items, for their ongoing like count. However, this won't work because who liked an item matters, both for display next to the item, and also to hide the like button for things a user has already liked.
My plan is to add to all likable items a text field in which I would store a serialized array (rough sketch below). That way, every pull of an item would come with the complete list of who liked it. Is there a better way to do this, or is this the recommended approach?
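Something like:

class Comment < ActiveRecord::Base
  # text column holding the serialized array of ids of users who liked this
  serialize :liker_ids, Array
end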
Do you have reason to believe that your dataset is going to be so large that the join is going to be too expensive? Postgres, while not as fast as the fastest RDBMSs out there, is pretty fast these days. I used to run a website that got millions of pageviews a day, and required some pretty complicated queries to generate each page. By doing a bit of simple caching we were able to run it on very modest hardware.
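For example, a bit of low-level Rails caching around the expensive lookup (the cache key and expiry are illustrative, and a likes association is assumed):

liker_ids = Rails.cache.fetch(["likes", item.cache_key], expires_in: 10.minutes) do
  item.likes.pluck(:user_id)
end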
You give up a lot of the benefits of using an RDBMS when you denormalize. I would only do so if I knew I had to. And if that were the case I would consider using something else, like a simple key/value database, for that data. But I think that that's only likely to be the case for you if you have an awful lot of data.
I'm looking for some ideas about saving a snapshot of some (different) records at the time of an event, for example a user getting a document from my application, so that this document can be regenerated later. What strategies do you recommend? Should I use the same table as the table with current values, or use a historical table? Do you know of any plugins that could help me with the task? Please share your thoughts and solutions.
There are several plugins for this.
acts_as_audited
acts_as_audited creates a single table for all of the auditable objects and requires no changes to your existing tables. You get one entry per change, with the changes stored as a hash in a memo field, plus the type of change (CRUD). It is very easy to set up, with just a single statement in the application controller specifying which models you want audited.
Rolling back is up to you, but the information is there. Because the information stored is just the changes, rebuilding the whole object may be difficult due to subsequent changes.
acts_as_versioned
A bit more complicated to set up: you need a separate table for each object you want to version, and you have to add a version id to your existing table. Rollback is very easy. There are forks on GitHub that provide a hash of changes since the last version so you can easily highlight the differences (it's what I use). My guess is that this is the most popular solution.
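For reference, basic acts_as_versioned usage looks roughly like this (hedged; details vary by fork):

class Page < ActiveRecord::Base
  acts_as_versioned   # expects a page_versions table and a version column
end

page = Page.create(:title => "Draft")
page.update_attributes(:title => "Final")
page.version        # => 2
page.revert_to(1)   # roll back to the first version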
One I have no experience with: acts_as_revisable. I'll probably give it a go next time I need versioning, as it looks much more sophisticated.
I did this a while back. We created a new table with a structure very similar to the table we wanted to log, and whenever we needed to log something, we did something like this:
attrs = object_to_log.attributes
# Strip columns we don't need to log, e.g. timestamps
attrs = attrs.except("created_at", "updated_at")
log = MyLogger.new(attrs)
log.save
There's a very good chance there are plugins/gems to do stuff like this, though.
I have used acts_as_versioned for stuff like this.
The OP is a year old but thought I'd add vestal_versions to the mix. It uses a single table to track serialized hashes of each version. By traversing the record of changes, the models can be reverted to any point in time.
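Basic usage is along these lines (from the project README; hedged, the API may have changed):

class User < ActiveRecord::Base
  versioned
end

user = User.create(:name => "Steve")
user.update_attributes(:name => "Stephen")
user.version        # => 2
user.revert_to(1)   # or revert_to(2.days.ago)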
Seems to be the community favorite as of this post...