Techniques for storing/retrieving historical data in Rails - ruby-on-rails

I'm looking for some ideas about saving a snapshot of some (different) records at the time of an event, for example user getting a document from my application, so that this document can be regenerated later. What strategies do you reccomend? Should I use the same table as the table with current values or use a historical table? Do you know of any plugins that could help me with the task? Please share your thoughts and solutions.

There are several plugins for this.
Acts_audited
acts as audited creates a single table for all of the auditable objects and requires no changes to your existing tables. You get on entry per change with the changes stored as a hash in a memo field, the type of change (CRUD). It is very easy to setup with just a single statement in the application controller specifying which models you want audited.
Rolling back is up to you but the information is there. Because the information stored is just the changes building the whole object may be difficult due to subsequent changes.
Acts_as_versioned
A bit more complicated to setup- you need a separate table for each object you want to version and you have to add a version id to your existing table. Rollback is a very easy. There are forks on github that provide a hash of changes since last version so you can easily highlight the differences (it's what I use). My guess is that this is the most popular solution.
Ones I have no experience with: acts_as_revisable. I'll probably give this one a go next time I need versioning as it looks much more sophisticated.

I did this once awhile back. We created a new table that had a very similar structure to the table we wanted to log and whenever we needed to log something, we did something similar to this:
attr = object_to_log.attributes
# Remove things like created_at, updated_at, other unneeded columns
log = MyLogger.new(attrs)
log.save
There's a very good chance there are plugins/gems to do stuff like this, though.

I have used acts_as_versioned for stuff like this.

The OP is a year old but thought I'd add vestal_versions to the mix. It uses a single table to track serialized hashes of each version. By traversing the record of changes, the models can be reverted to any point in time.
Seems to be the community favorite as of this post...

Related

Logging data changes for synchronization

I am looking for solution of logging data changes for public API.
There is a need to tell client app which tables form db has changed and need to be synchronised since the app synchronised last time and also need to be for specific brand and country.
Current Solution:
Version table with class_names of models which is touched from every model on create, delete, touch and save action.
When we are touching version for specific model we also look at the reflected associations and touch them too.
Version model is scoped to brand and country
REST API is responding to a request that includes last_sync_at:timestamp, brand and country
Rails look at Version with given attributes and return class_names of models which were changed since lans_sync_at timestamp.
This solution works but the problem is performance and is also hard to maintenance.
UPDATE 1:
Maybe the simple question is.
What is the best practice how to find out and tell frontend apps when and what needs to be synchronized. In terms of whole concept.
Conditions:
Front end apps needs to download only their own content changes not whole dataset.
Does not invoked synchronization when application from different country or brand needs to be synchronized.
Thank you.
I think that the best solution would be to use redis (or some other key-value store) and save your information there. Writing to redis is much faster than any sql db. You can write some service class that would save the data like:
RegisterTableUpdate.set(table_name, country_id, brand_id, timestamp)
Such call would save given timestamp under key that could look like i.e. table-update-1-1-users, where first number is country id, second number is brand id, followed by table name (or you could use country and brand names if needed). If you would like to find out which tables have changed you would just need to find redis keys with query "table-update-1-1-*", iterate through them and check which are newer than timestamp sent through api.
It is worth to rmember that redis is not as reliable as sql databases. Its reliability depends on configuration so you might want to read redis guidelines and decide if you would like to go for it.
You can take advantage of the fact that ActiveModel automatically logs every time it updates a table row (the 'Updated at' column)
When checking what needs to be updated, select the objects you are interested in and compare their 'Updated at' with the timestamp from the client app
The advantage of this approach is that you don't need to keep an additional table that lists all the updates on models, which should speed things up for the API users and be easier to maintain.
The disadvantage is that you cannot see the changes in data over time, you only know that a change occurred and you can access the latest version. If you need to track changes in data over time efficiently, than I'm afraid you'll have to rework things from the top.
(read last part - this is what you are interested in)
I would recommend that you use the decorator design pattern for changing the client queries. So the client sends a query of what he wants and the server decides what to give him based on the client's last update.
so:
the client sends a query that includes the time it last synched
the server sees the query and takes into account the client's nature (device-country)
the server decorates (changes accordingly) the query to request from the DB only the relevant data, and if that is not possible:
after the data are returned from the database manager they are trimmed to be relevant to where they are going
returns to the client all the new stuff that the client cares about.
I assume that you have a time entered field on your DB entries.
In that case the "decoration" of the query (abstractly) would be just to add something like a "WHERE" clause in your query and state you want data entered after the last update.
Finally, if you want that to be done for many devices/locales/whatever implement a decorator for the query and the result of the query and serve them to your clients as they should be served. (Keep in mind that in contrast with a subclassing approach you will only have to implement one decorator for each device/locale/whatever - not for all combinations!
Hope this helped!

Papertrail versioning & storing current versions?

I've added the Papertrail gem to my app after approx 1000 records have already been created.
Does anyone know how to populate the versions table with the current versions of my data?
Without doing this it seems only the second change for each will allow me to properly make comparisions / flag what has changed via diffing etc.
Many thanks!
A quick and dirty method would be to just create a loop that updates a single field, like adding a space to string field or something harmless.

Implementation of Time Machine Feature in grails application

I am trying to implement a 'time machine' feature in my grails application. The feature would allow user to select a date in past and would display the interface of the application that was on the selected date. How do I implement this feature? I was thinking of adding a 'dateCreated' field for all domains, so that in the time machine feature, I could query all the results with created date before the selected date. I think this would work but as the data would grow, the size of database would grow and at that time the application would be heavy. Is there any other way to do this ?
Thanks
You could maybe draw some inspiration from this related question:
How to manage object revisions in Grails?
You should look at the http://grails.org/plugin/audit-logging plugin since it will allow you to keep all versions of domain class instances. But implementing this feature will be pretty tricky since object don't exist in isolation - you'll need to not only display the data as of the previous date, but its related data (e.g. the author's books collection) as of that date too. It will make the queries quite complicated.

Store data in Ruby on Rails without Database

I have a few data values that I need to store on my rails app and wanted to know if there are any alternatives to creating a database table just to do this simple task.
Background: I'm writing some analytics and dashboard tools for my ruby on rails app and i'm hoping to speed up the dashboard by caching results that will never change. Right now I pull all users for the last 30 days, and re-arrange them so I can see the number of new users per day. It works great but takes quite a long time, in reality I should only need to calculate the most recent day and just store the rest of the array somewhere else.
Where is the best way to store this array?
Creating a database table seems a bit overkill, and I'm not sure that global variables are the correct answer. Is there a best practice for persisting data like this?
If anyone has done anything like this before let me know what you did and how it turned out.
Ruby has a built-in Hash-based key value store named PStore. This provides simple file based, transactional persistance.
PStore documentation
If you've got a database already, it's really not a big deal to create a separate table for tracking this sort of thing. When doing reporting, it's often to your advantage to create derivative summary tables exactly like what you're describing. You can update these as required using a simple SQL statement and there's no worry that your temporary store will somehow go away.
That being said, the type of report you're trying to generate is actually something that can be done in real-time except on extravagantly large data sets. The key is to have indexes that describe the exact grouping operation you're trying to do. For instance, if you're grouping by calendar date, you can create a "date" field and sync it to the "created_at" time as required. An index on this date field will make doing a GROUP BY created_date very quick:
SELECT created_date AS on_date, COUNT(id) AS new_users FROM users GROUP BY created_date
Using a lightweight database like sqlite shouldn't feel like an overkill. Alternatively, you can use key-store solutions like tokyo cabinet or even store the array in a flat file manually but I really don't see any overkill in using sqlite.

How do I update only the properties of my models that have changed in MVC?

I'm developing a webapp that allows the editing of records. There is a possibility that two users could be working on the same screen at a time and I want to minimise the damage done, if they both click save.
If User1 requests the page and then makes changes to the Address, Telephone and Contact Details, but before he clicks Save, User2 requests the same page.
User1 then clicks save and the whole model is updated using TryUpdateModel(), if User2 simply appends some detail to the Notes field, when he saves, the TryUpdateModel() method will overwrite the new details User1 saved, with the old details.
I've considered storing the original values for all the model's properties in a hidden form field, and then writing a custom TryUpdateModel to only update the properties that have changed, but this feels a little too like the Viewstate we've all been more than happy to leave behind by moving to MVC.
Is there a pattern for dealing with this problem that I'm not aware of?
How would you handle it?
Update: In answer to the comments below, I'm using Entity Framework.
Anthony
Unless you have any particular requirements for what happens in this case (e.g. lock the record, which of course requires some functionality to undo the lock in the event that the user decides not to make a change) I'd suggest the normal approach is an optimistic lock:
Each update you perform should check that the record hasn't changed in the meantime.
So:
Put an integer "version" property or a guid / rowversion on the record.
Ensure this is contained in a hidden field in the html and is therefore returned with any submit;
When you perform the update, ensure that the (database) record's version/guid/rowversion still matches the value that was in the hidden field [and add 1 to the "version" integer when you do the update if you've decided to go with that manual approach.]
A similar approach is obviously to use a date/time stamp on the record, but don't do that because, to within the accuracy of your system clock, it's flawed.
[I suggest you'll find fuller explanations of the whole approach elsewhere. Certainly if you were to google for information on NHibernate's Version functionality...]
Locking modification of a page while one user is working on it is an option. This is done in some wiki software like dokuwiki. In that case it will usually use some javascript to free the lock after 5-10 minutes of inactivity so others can update it.
Another option might be storing all revisions in a database so when two users submit, both copies are saved and still exist. From there on, all you'd need to do is merge the two.
You usually don't handle this. If two users happen to edit a document at the same time and commit their updates, one of them wins and the other looses.
Resources lockout can be done with stateful desktop applications, but with web applications any lockout scheme you try to implement may only minimize the damage but not prevent it.
Don't try to write an absolutely perfect and secure application. It's already good as it is. Just use it, probably the situation won't come up at all.
If you use LINQ to SQL as your ORM it can handle the issues around changed values using the conflicts collection. However, essentially I'd agree with Mastermind's comment.

Resources