Rails design for database record (employment) history - ruby-on-rails

So, I have a Rails application that tracks employees. We want to track the employment history over time. We'd like to be able to do the following sorts of queries:
1. Search employees by their current status (salary, manager, title, etc.)
2. Search for employees by status as of a particular date
3. Search for employees who ever matched a condition
We don't mind the historical queries being significantly slower, even maybe 2 orders of magnitude slower.
This history is different from an audit trail - an audit trail is the history of the data, so an audit trail will tell you "what salary was stored for employee X on date Y," but it won't let you correct that older data if it is wrong.
[We are also using audit trails for auditing purposes, but I think of the logic for audit trails as providing an almost orthogonal design requirement.]
Is there a known database design pattern for best implementing this kind of history? Or a site where I can find discussions of the various "obvious" designs and the trade-offs? Any Rails plugins?
Rails 2.3.5 with Ruby 1.8.7, if that matters.

I haven't used it, but paper_trail claims to support this type of version management. It would seem to handle your requirements (1) and (2) easily -- not sure about (3), you might need to do some heavy lifting to get a query that searched across models and versions at the same time.
"This history is different from an audit trail"
Meaning that you need the ability to make changes to old versions -- I doubt that is supported, and it seems problematic to implement if you also need this to act as an audit trail.
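To make the three query types from the question concrete, here is a minimal plain-Ruby sketch over in-memory rows. The `EmploymentRecord` struct, its `valid_from`/`valid_to` fields and the sample data are assumptions for illustration; they are not part of paper_trail, and in a real application each method would become a SQL condition on a history table.

```ruby
require "date"

# Hypothetical history row: one row per period during which the values held.
# valid_to is nil for the row that is currently in effect.
EmploymentRecord = Struct.new(:employee, :title, :salary, :valid_from, :valid_to) do
  def effective_on?(date)
    valid_from <= date && (valid_to.nil? || date < valid_to)
  end
end

HISTORY = [
  EmploymentRecord.new("alice", "Engineer", 70_000, Date.new(2008, 1, 1), Date.new(2009, 6, 1)),
  EmploymentRecord.new("alice", "Manager",  90_000, Date.new(2009, 6, 1), nil),
  EmploymentRecord.new("bob",   "Engineer", 65_000, Date.new(2009, 1, 1), nil)
].freeze

# (1) Current status: only rows with no end date.
def current_matching(rows)
  rows.select { |r| r.valid_to.nil? && yield(r) }.map(&:employee).uniq
end

# (2) Status as of a date: rows whose validity period covers that date.
def matching_as_of(rows, date)
  rows.select { |r| r.effective_on?(date) && yield(r) }.map(&:employee).uniq
end

# (3) Ever matched: any row, regardless of period.
def ever_matching(rows)
  rows.select { |r| yield(r) }.map(&:employee).uniq
end
```

Because these rows are ordinary data rather than an audit log, a wrong historical salary can be corrected by editing the old row, while a separate audit trail still records that the correction happened.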

Related

Data and business logic history in a Rails app

I'm working on an existing Rails app, on Postgresql, that calculates commissions and various data for contractors.
Employees have many Contractors. Contractors and Employees both have fields that are used in business logic to calculate commissions.
My client wants to have a yearly snapshot of all of their data, so that they can be free to change business logic, add and remove employees, etc without losing their past (calculated) data.
My initial thought in implementing this would be Postgres schemas. I would have a cron task every year that takes the database as-is and copies every table and record to a schema for that year. That would be equivalent to simply having the older version of the DB in the future. I am worried, however, that application logic would break once columns are added in the future.
For example, a schema is created one year and a column gets added to a contractors table later that is used in a commissions calculation. How would I also save the old version of this commissions formula that doesn't depend on the new column?
The only solution I can think of is to simply keep the old formula and conditionally use them based on schema. I feel like this is very dirty and can lead to a lot of garbage as business logic changes.
How do you recommend I approach this problem? Thanks in advance for your help!
I think you should store the calculated commission in your db so it never has to be recalculated. An accepted calculated value is a fact, so just persist that value.
Should you need to audit the calculated fields sometime later, I'm not sure the old calculation logic should be made very convenient to retrieve at the application layer. You might need to trace back through your source control history (e.g. svn) for this, or the data warehouse should hold the calculation logic. The application can simply provide the parameters that went into the calculation and let the auditor handle it.
If the use case is to easily roll back to a specific set of historical business rules out of the blue, then I wouldn't recommend accommodating such a requirement.
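A minimal sketch of "persist the calculated value as a fact", in plain Ruby. Everything named here (`CommissionSnapshot`, `current_commission_cents`, the 5% rate, the `code_revision` label) is hypothetical and only illustrates the idea: the stored amount is read back as-is, and the old formula is never re-run.

```ruby
# Hypothetical snapshot row: the calculated amount plus the inputs that
# produced it and a label for the code revision that was in force.
CommissionSnapshot = Struct.new(:contractor_id, :year, :amount_cents, :inputs, :code_revision)

# Only the current formula lives in the code base (the 5% rate is made up).
def current_commission_cents(inputs)
  inputs.fetch(:sales_cents) * 5 / 100
end

# Calculate once, then freeze the result so later formula changes cannot
# alter what was recorded for that year.
def take_snapshot(contractor_id, year, inputs, code_revision)
  CommissionSnapshot.new(
    contractor_id,
    year,
    current_commission_cents(inputs),
    inputs.dup.freeze,
    code_revision
  ).freeze
end

snapshot = take_snapshot(42, 2015, { sales_cents: 100_000 }, "r1234")
```

An auditor who needs to know how the 2015 figure was derived can take the stored inputs and the revision label to source control, without the application carrying every old formula around.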

Does anyone know of a way to globally scope ActiveRecord? I'm looking to achieve a snapshot in time of my entire database

At work, I often need to investigate data in a previous state. For example, a user was seeing time-sensitive information in their app 3 days ago (their last 50 order records). That information has since been overridden by newer orders. In order to replicate that issue for debugging, I need to limit all pertinent data (orders, user interactions, etc.) before a certain timestamp.
My question boils down to this: Instead of scoping each table that I'm querying (and risk missing something), is there a clean way to scope all timestamps on records and tables that might be queried during this kind of investigation?
I'm on Rails 5.1.6
interesting question.
If I were in your shoes, I'd investigate whether I could achieve what you're looking to do using database views (assuming you're using Postgres), especially since it sounds like you want to do lookups on multiple tables that may or may not be related.
Check out this gem: https://github.com/scenic-views/scenic to get an idea of how you could implement views, though you don't necessarily need a gem.
That said, the reason I think this would work well for you is that a view lets you define the filter for the data you want once, query through it, and then write a few tests to validate its accuracy.
Hope this helps!
This sounds like a job for ARel. ARel makes it a lot easier to build time/system clock based Scopes. You can define as many of these as you want.
ARel:
scope :created_at_before, ->(timestamp) { where(arel_table[:created_at].lt(timestamp)) }
scope :created_at_over_three_months_ago, -> { created_at_before(3.months.ago) }
# and so on...
But I have to ask this....
You want a "snapshot in time of your database".
Take a backup? A backup of your database is a snapshot in time of your database. Restore this backup on a separate machine. You can even automate taking these backups with cron, Heroku Scheduler, or your platform's equivalent.

Credit system: history based or balance based?

I am going to write a simple credit system that user can "add", "deduct" credits in the system. Currently I am thinking of two approaches.
Simple one: Store the user's credit as a balance field in the database; all actions ("add", "deduct") are logged but not used to compute the latest balance.
History based: Don't store the balance in database. The balance is computed by looking at the history of transactions, e.g. ("add", "deduct")
Both cases would work, I think, but I would like to know of any caveats when designing such a system, particularly since I am favoring the history-based approach.
Or, is there any reference implementation or open source module I can use?
Update: Or are there any Ruby/Rails-based modules like AuthLogic that I can plug into my existing code without reinventing the wheel (e.g. transactions, rollback, security, etc.)?
Absolutely use both.
The balance-based way gives you fast access to the current amount.
The history-based way gives you auditing. The history table should store the transaction (as you describe), a timestamp, the balance before the transaction happened, and ideally a way to track the funds' source/destination.
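A minimal in-memory sketch of the "use both" idea: a stored balance for fast reads plus a history row per action that records the balance before it happened. `CreditAccount` and its method names are hypothetical; in a Rails app the balance update and the history insert would sit inside one database transaction.

```ruby
# Hypothetical stand-in for a balance column plus a history table.
class CreditAccount
  Entry = Struct.new(:kind, :amount, :balance_before, :at)

  attr_reader :balance, :history

  def initialize
    @balance = 0
    @history = []
  end

  def add(amount)
    record(:add, amount)
  end

  def deduct(amount)
    raise ArgumentError, "insufficient credits" if amount > @balance
    record(:deduct, amount)
  end

  # Recompute the balance from the history alone; it should always equal
  # the stored balance, which makes this a cheap consistency check.
  def balance_from_history
    @history.sum { |e| e.kind == :add ? e.amount : -e.amount }
  end

  private

  def record(kind, amount)
    raise ArgumentError, "amount must be positive" unless amount.positive?
    @history << Entry.new(kind, amount, @balance, Time.now)
    @balance += (kind == :add ? amount : -amount)
    self
  end
end
```

Usage: `CreditAccount.new.add(100).deduct(30)` leaves a stored balance of 70 and two history entries, the second with `balance_before` equal to 100.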
See the Ruby Toolbox for bookkeeping and Plutus double-entry bookkeeping gem.
In addition, if your credit system may affect users, then I recommend also using logging, and ideally read about secure log verification and provable timestamp chaining.
For logging details see: techniques for ensuring verifiability of event log files.
For open source code that does credit, you may want to look into: http://www.gnucash.org/
Adding and deducting credits implies that you might also need to be aware of where these credits came from and where they went. Any time you get into a situation like this, whether it is with currency or some other numerical quantity that needs to be tracked and accounted for, you should consider using a double entry accounting pattern.
This pattern has worked for centuries and gives you all of the functionality you need to be able to see what your balances are and how they got to be that way:
Audit log of all transactions (including sources and sinks of "funds")
Running balance of all accounts over time (if you choose to record it)
Easy validation of the correctness of records
Ability to "write-once" - no updates means no tampering
If you aren't familiar with the details, start here: Double Entry Bookkeeping or ask anyone who has taken an introductory course in bookkeeping.
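The properties listed above can be sketched in a few lines of plain Ruby. This is only an illustration of the double entry pattern; the `Journal` class, its methods and the account names are assumptions and are not the API of any gem.

```ruby
# Plain-Ruby illustration of double entry bookkeeping (hypothetical names).
class Journal
  Line = Struct.new(:account, :debit, :credit)
  Entry = Struct.new(:description, :lines)

  attr_reader :entries

  def initialize
    @entries = []
  end

  # debits and credits are hashes of account name => amount.
  # An entry is rejected unless both sides add up to the same total.
  def post(description, debits:, credits:)
    unless debits.values.sum == credits.values.sum
      raise ArgumentError, "debits and credits must balance"
    end

    lines = debits.map { |account, amount| Line.new(account, amount, 0) } +
            credits.map { |account, amount| Line.new(account, 0, amount) }
    # Entries are frozen and only ever appended: write-once.
    @entries << Entry.new(description, lines.freeze).freeze
    self
  end

  # Net balance of one account, expressed as debits minus credits.
  def balance(account)
    @entries.flat_map(&:lines)
            .select { |line| line.account == account }
            .sum { |line| line.debit - line.credit }
  end
end
```

For example, posting "user buys 100 credits" with `debits: { "cash" => 100 }` and `credits: { "user:1:credits" => 100 }` records both where the credits came from and where they went, and the balances of all accounts always sum to zero.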
You asked for a Ruby on Rails open source solution that you could plug and play into your application. You can use Plutus. Here is an excerpt from the description of this project on Github:
The plutus plugin provides a complete double entry accounting system
for use in any Ruby on Rails application. The plugin follows general
Double Entry Bookkeeping practices. ... Plutus consists of tables that
maintain your accounts, entries and debits and credits. Each entry can
have many debits and credits. The entry table, which records your
business transactions is, essentially, your accounting Journal.
Yes, use both.
On top of that, you will sometimes need to reverse one or more transactions. When doing that, create a new, reversed transaction to record the money transfer.
Sometimes you will also need to group several transactions together. I suggest creating a third table called 'tokens' that acts as the payments manager, and grouping those transactions under a token.
For example: token.transactions = (select * from transactions t where t.token = '123')
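A small sketch of both suggestions, assuming hypothetical `Txn` and `TxnLog` shapes: a reversal is appended as a new transaction with the opposite amount, and a shared token ties related transactions together.

```ruby
# Hypothetical transaction row; reverses_id points at the transaction
# being reversed, or is nil for an ordinary transaction.
Txn = Struct.new(:id, :amount, :token, :reverses_id)

class TxnLog
  attr_reader :txns

  def initialize
    @txns = []
  end

  def record(amount, token:, reverses_id: nil)
    txn = Txn.new(@txns.size + 1, amount, token, reverses_id).freeze
    @txns << txn
    txn
  end

  # Reverse by appending an opposite transaction; the original row is untouched.
  def reverse(txn)
    record(-txn.amount, token: txn.token, reverses_id: txn.id)
  end

  # All transactions grouped under one token.
  def for_token(token)
    @txns.select { |t| t.token == token }
  end
end
```

Summing `for_token("123").sum(&:amount)` then gives the net effect of the whole group, reversals included.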

Multi-schema Postgres on Heroku

I'm extending an existing Rails app, and I have to add multi-tenant support to it. I've done some reading, and seeing how this app is going to be hosted on Heroku, I thought I could take advantage of Postgres' multi-schema functionality.
I've read that there seem to be some performance issues with backups when multiple schemas are in use, but that information looked a bit outdated. Does anyone know if this is still the case?
Also, are there any other performance issues, or caveats I should take into consideration?
I've already thought about adding a field to every table so I can use a single schema, and have that field reference to the tenants table, but given the time windows multiple schemas seem the best solution.
I use postgres schemas for a multi-tenancy site based on some work by Ryan Bigg and the Apartment gem.
https://leanpub.com/multi-tenancy-rails
https://github.com/influitive/apartment
I find that having separate schemas for each client is an elegant solution that provides a higher degree of data segregation. Personally, I find that performance improves because Postgres can simply return all results from a table without having to filter on an 'owner_id'.
I also think it makes for simpler migrations and allows you to adjust individual customer data without making global changes. For example you can add columns to specific customers schemas and use feature flags to enable custom features.
My main argument relating to performance would be that backup is a periodic process, whereas customer table scoping would be on every access. On that basis, I would take any performance hit on backup over slowing down the customer experience.

How to achieve versioned ActiveRecord associations?

I want to work with versioned ActiveRecord associations. E.g., I want to find the object that another object belongs_to as of a certain past date, or the one that it belonged to before that. Does there already exist a library subclassing Rails' ActiveRecord to provide versioned relations? Or some other Ruby library which provides persistable versioned relations?
Try the ActsAsVersioned plugin
Provided you're not dealing with huge amounts of data, and the extra temporal dimension won't push your db over the edge, there are no major downsides to historically versioned data. Extra query complexity can be a slight pain, but it's nothing major.
In my case I wrote a Rails plugin that handles versioning. It adds 5 columns to each versioned table (and helps with querying, manipulation, etc.):
valid_from - datetime - the datetime that this version was created at
valid_to - datetime - the datetime that this version stopped being valid
root_id - integer - the id of the original row (that this is a subsequent version of)
created_by - integer - The user id of the user that performed the creation of this version
retired_by - integer - The user id of the user that retired this version
For currently active rows, valid_to is null. Adding an index on valid_to aids in keeping performance snappy.
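The plugin itself is not shown, so the following is only a plain-Ruby sketch of how those five columns could behave. `VersionedTable` and its methods are hypothetical, and it assumes that the original row's `root_id` equals its own `id`.

```ruby
# Hypothetical in-memory table; column names follow the list above.
Version = Struct.new(:id, :root_id, :attrs, :valid_from, :valid_to, :created_by, :retired_by)

class VersionedTable
  def initialize
    @rows = []
  end

  def create(attrs, user_id:, at: Time.now)
    id = @rows.size + 1
    # Assumption: the first version is its own root.
    row = Version.new(id, id, attrs, at, nil, user_id, nil)
    @rows << row
    row
  end

  # Retire the active version and append a new one with the merged attributes.
  def update(root_id, changes, user_id:, at: Time.now)
    current = current_version(root_id)
    raise KeyError, "no active row for root #{root_id}" unless current

    current.valid_to = at
    current.retired_by = user_id
    row = Version.new(@rows.size + 1, root_id, current.attrs.merge(changes), at, nil, user_id, nil)
    @rows << row
    row
  end

  # The active row is the one whose valid_to is still nil.
  def current_version(root_id)
    @rows.find { |r| r.root_id == root_id && r.valid_to.nil? }
  end

  # The version whose validity period covers the given time, if any.
  def version_as_of(root_id, time)
    @rows.find do |r|
      r.root_id == root_id && r.valid_from <= time && (r.valid_to.nil? || time < r.valid_to)
    end
  end
end
```

A versioned `belongs_to` lookup then amounts to calling `version_as_of` with the foreign key as `root_id` and the date of interest.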
Supporting historical state in a transactional application is a good way to massively expand complexity, slow DB performance and make life difficult for yourself. If you only need to display or report on historical state and do not need it up to the minute consider building a star schema with Type-II slowly changing dimensions and a periodic process that updates it.
This will be substantially less complex than building an application with systemic ad-hoc history tracking running through the code base. If this approach will do what you require of the application you will probably be better off doing it. It also means that the application database will play nicely with the vanilla database access mechanisms that come with the system.
If you need reasonably frequent refresh you can implement a changed-data capture system on the database, which is relatively simple if the application only has to be concerned with current state. With a CDC mechanism the load process only has to update based on changes and will run relatively quickly.
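As a rough illustration of the periodic process, here is a plain-Ruby sketch of a Type-II slowly changing dimension load. `DimRow`, `load_dimension` and the sample shape of `source` (employee id => title) are assumptions; a real load would run as SQL against the warehouse and would also need to handle rows that disappear from the source.

```ruby
require "date"

# Hypothetical dimension row: one row per version of an employee's title.
DimRow = Struct.new(:employee_id, :title, :effective_from, :effective_to, :current)

# Apply one periodic load: compare the source's current state with the
# dimension's current rows and add a new version only where something changed.
def load_dimension(dim_rows, source, load_date)
  source.each do |employee_id, title|
    current = dim_rows.find { |r| r.employee_id == employee_id && r.current }
    next if current && current.title == title # unchanged, nothing to do

    if current
      # Close the old version instead of overwriting it.
      current.effective_to = load_date
      current.current = false
    end
    dim_rows << DimRow.new(employee_id, title, load_date, nil, true)
  end
  dim_rows
end
```

The transactional application only ever stores current state; the history accumulates in the dimension table each time the load runs.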

Resources