Is it a good idea to serialize immutable data from an association?

Let's say we have a collection of products, each with their own specifics e.g. price.
We want to issue invoices that contain said products. A direct association from Invoice to Product via has_many is a no-go: products may change while invoices must be immutable, so any later edit to a product would alter the invoice's price, concept, etc.
I first thought of having an intermediate model like InvoiceProduct that would be associated to the Invoice and created from a Product. Each InvoiceProduct would be unique to its parent invoice and immutable. This option would increase the db size significantly as more invoices get issued though, so I think it is not a good option.
I'm now considering adding a serialized field to the invoice model that holds all the information about its associated products: a hash of the collection of items the invoice contains. This way the data stays immutable even if a product gets modified in the future.
I'm not sure of possible mid or long term downsides to this approach, though. Would like to hear your thoughts about it.
Also, if there's some more obvious approach that I might have overlooked I'd love to hear about it too.
Cheers

In my experience, the main downside of a serialized field approach vs the InvoiceProducts approach described above is decreased flexibility in terms of how you can use your invoice data going forward.
In our case, we have Orders and OrderItems tables in our database and use this data to generate sales analytics reports as well as customer Invoices.
Querying the OrderItem data to generate the sales reports we need is much faster and easier with this approach than it would be if the same data was stored as serialized data in the db.

No.
Serialized columns have no place in a modern application. They are an overused dirty hack from the days before native JSON/JSONB columns were widespread and have only downsides. The only exception to this rule is when you're using application side encryption.
JSON/JSONB columns can be used for a limited number of tasks where the data defies being defined by a fixed schema or if you're just storing raw JSON responses - but it should not be how you're defining your schema out of convenience because you're just shooting yourself in the foot. It's a special tool for special jobs.
The better alternative is to actually use good relational database design and store the price at the time of sale and everything else in a separate table:
class Order < ApplicationRecord
  has_many :line_items
end

# rails g model line_item order:belongs_to product:belongs_to units:decimal unit_price:decimal subtotal:decimal
# The line item model is responsible for each item of an order
# and records the price at the time of order and any discounts applied to that line
class LineItem < ApplicationRecord
  belongs_to :order
  belongs_to :product
end

class Product < ApplicationRecord
  has_many :line_items
end
A serialized column is not immutable in any way - it's actually more prone to denormalization and corruption as there are no database side constraints to ensure its correctness.
Tables can actually be made immutable in many databases by using triggers.
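To make the "price at the time of sale" idea concrete, here is a minimal plain-Ruby sketch, not the ActiveRecord models themselves. The Product and LineItem structs and the from_product helper are hypothetical stand-ins: the line item copies the product's values when it is created, so later changes to the product do not reach it.

```ruby
require "bigdecimal"

# Hypothetical plain-Ruby stand-ins for the Product and LineItem models.
Product = Struct.new(:name, :price)

LineItem = Struct.new(:product_name, :units, :unit_price) do
  # Copy the product's current values so later product edits cannot leak in.
  def self.from_product(product, units:)
    new(product.name.dup, units, product.price).freeze
  end

  def subtotal
    unit_price * units
  end
end

widget = Product.new("Widget", BigDecimal("9.99"))
item = LineItem.from_product(widget, units: 3)

widget.price = BigDecimal("12.50") # the catalogue price changes later

puts item.unit_price.to_s("F") # 9.99, still the price at the time of sale
puts item.subtotal.to_s("F")   # 29.97
```

Freezing the struct only guards the in-memory object; in a real application the guarantee would have to come from the database or the model layer.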
Advantages:
No violation of 1NF.
A normalized fixed data schema to work with - constraints ensure the validity of the data on the database level.
Joins are an extremely powerful tool and not as expensive as you might think.
You can actually access and make sense of the data outside of the application if needed.
DECIMAL data types. JSON/JSONB has only a single number type, which JSON parsers commonly decode as IEEE 754 floating point.
You have an actual model and associations instead of having to deal with raw hashes.
You can query the data in sane queries.
You can generate aggregates on the database level and use tools like materialized views.
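The DECIMAL point in the list above can be illustrated with a short sketch. It assumes Ruby's default JSON parser, and the unit_price and discount keys are made up for the example. Prices read back from a JSON document arrive as Float, while exact decimals behave the way you'd expect money to.

```ruby
require "json"
require "bigdecimal"

# Numbers in a JSON document come back as Float with Ruby's default parser.
parsed = JSON.parse('{"unit_price": 0.1, "discount": 0.2}')
float_sum = parsed["unit_price"] + parsed["discount"]

# The same values held as exact decimals, the way a DECIMAL column stores them.
decimal_sum = BigDecimal("0.1") + BigDecimal("0.2")

puts float_sum == 0.3                 # false: binary floating point rounding
puts decimal_sum == BigDecimal("0.3") # true
```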

Related

where do I store static information in rails application?

This is a question about structure of a model and where to put static information. I've got a model Membership and the membership model has 7 entries describing unique features among various Memberships. The information is populated by seeds.rb. However, some attributes like click_value are the same across all memberships.
Would I be better off seeding this information and storing it and duplicating it across 7 entries in the database? Or is it better to write a method within the model like the following?
class Membership < ActiveRecord::Base
  def click_value
    return 0.001
  end
end
Is it a matter of personal preference? Is one way better than the other? Just looking for some guidance on structure.

This is mostly a personal preference in my opinion, but I see a benefit in storing it in the database together with other data for the sake of consistency. What if the click_value will eventually change across membership types? In general, an extra float column in the database wouldn't cause significant overhead, would make sure that your data model is consistent, and be future-proof. Writing a method that returns a constant is not a crime, but I wouldn't prefer it over storing the value in the database.

Rails: polymorphic association, different options depending on type?

I'm building a diet analysis app in Rails 4.1. I have a model, FoodEntry, which at a simple level has a quantity value and references a Food and a Measure:
class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure
end
However I actually have two different types of measures, standard generic measures (cups, teaspoons, grams, etc.) and measures which are specific to a food (heads of broccoli, medium-sized bananas, large cans, etc.). Sounds like a case for a polymorphic association right? e.g.
class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure, polymorphic: true # Uses measure_id and measure_type columns
end

class StandardMeasure < ActiveRecord::Base
  has_many :food_entries, as: :measure
end

class FoodMeasure < ActiveRecord::Base
  has_many :food_entries, as: :measure
end
The thing is, the food-specific measures come from a legacy database dump. These records are uniquely identified by a combination of their food_id and description - they aren't supplied with a single-column primary key (description is not unique on its own because there are multiple foods with the same measure description but different numeric data). Because I'm importing to my Rails Postgres db, I'm able to add a surrogate primary key - the auto-incrementing integer id column that Rails expects. But I don't want to utilize this id as a reference in my FoodEntry model because it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated and I have to reimport. Basically, those ids are completely subject to change, so I'd much rather reference the food_id and description directly.
Luckily it's not very difficult to do this in Rails by using a scope on the association:
class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure, ->(food_entry) { where(food_id: food_entry.food_id) }, primary_key: 'description', class_name: 'FoodMeasure'
  # Or even: ->(food_entry) { food_entry.food.measures }, etc.
end
Which produces a perfectly acceptable query like this:
> FoodEntry.first.measure
FoodMeasure Load (15.6ms) SELECT "food_measures".* FROM "food_measures" WHERE "food_measures"."description" = $1 AND "food_measures"."food_id" = '123' LIMIT 1 [["description", "Broccoli head"]]
Note that this assumes that measure_id is a string column in this case (because description is a string).
In contrast the StandardMeasure data is under my control and doesn't reference Foods, and so it makes perfect sense to simply reference the id column in that case.
So the crux of my issue is this: I need a way for a FoodEntry to reference only one type of measure, as it would in the polymorphic association example I made above. However I don't know how I'd implement a polymorphic association with respect to my measure models because as it stands:
an associated FoodMeasure needs to be referenced through a scope, while a StandardMeasure doesn't.
an associated FoodMeasure needs to be referenced by a string, while a StandardMeasure is referenced by an integer (and the columns being referenced have different names).
How do I reconcile these issues?
Edit: I think I should explain why I don't want to use the autonumber id on FoodMeasures as my foreign key in FoodEntries. When the data set is updated, my plan was to:
Rename the current food_measures table to retired_food_measures (or whatever).
Import the new set of data into a new food_measures table (with a new set of autonumber ids).
Run a join between these two tables, then delete any common records in retired_food_measures, so it just has the retired records.
If I'm referencing those measures by food_id and description, that way I get the benefit that food entries automatically refer to the new records, and therefore any updated numeric data for a given measure. And I can instruct my application to go searching in the retired_food_measures table if a referenced measure can't be found in the new one.
This is why I think using the id column would make things more complicated: to receive the same benefits I'd have to ensure that every updated record received the same id as the old one, every new record received a new not-used-before id, and that any retired id is never used again.
There's also one other reason I don't want to do this: ordering. The records in the dump are ordered first by food_id, however the measures for any given food_id are in a non-alphabetical but nevertheless logical order I'd like to retain. The id column can serve this purpose elegantly (because ids are assigned in row order on import), but I lose this benefit the moment the ids start getting messed around with.
So yeah I'm sure I could implement solutions to these problems, but I'm not sure it would be worth the benefit?

"it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated"
This is an illusion. You have total control over the surrogates. You can do exactly the same processing of external updates whether the surrogates are there or not.
This is just one of those times when you want your own new names for things, in this case Measures, of which FoodMeasures and StandardMeasures are subtypes. Have a measure_id in all three models/tables. You can find many idioms for simplifying subtype constraints, eg using type tags.
If you process external updates in such a way that it is convenient for such objects to also have such surrogates then you need to clearly separate such PutativeFoodMeasures and FoodMeasures as subtypes of some supertype PutativeOrProvenFoodMeasure and/or of PutativeOrProvenMeasure.
EDIT:
Your update helps. You have described what I did. It is not difficult to map old to new ids; join on old & new (food_id,description) and select old id (not a food_id!). You control ids; how can it matter to reuse ids compared to them not even existing otherwise? Ditto for sorting FoodMeasures; do as you would have. It is only when you mix them with StandardMeasures giving some result that you need order a mixture differently; but you would do that anyway whether or not a shared id existed. (Though "polymorphic:" may not be the best id sharing design.)
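The id mapping described above could look roughly like this in plain Ruby. The rows, the grams column and the variable names are invented for illustration, and a real import would do the same join in SQL. Old ids are reused where the (food_id, description) pair matches, new measures get unused ids, and anything missing from the new dump is treated as retired.

```ruby
# Hypothetical rows: the old table carries surrogate ids, the fresh dump does not.
old_rows = [
  { id: 1, food_id: 123, description: "Broccoli head", grams: 600 },
  { id: 2, food_id: 123, description: "Cup, chopped",  grams: 90 },
  { id: 3, food_id: 456, description: "Large can",     grams: 800 }
]
new_dump = [
  { food_id: 123, description: "Broccoli head", grams: 608 }, # updated data
  { food_id: 123, description: "Stalk",         grams: 150 }, # brand new
  { food_id: 456, description: "Large can",     grams: 800 }
]

natural_key = ->(row) { [row[:food_id], row[:description]] }
old_ids = old_rows.to_h { |row| [natural_key.call(row), row[:id]] }
next_id = old_rows.map { |row| row[:id] }.max + 1

# Reuse the old id when the natural key matches, otherwise hand out a fresh one.
imported = new_dump.map do |row|
  id = old_ids[natural_key.call(row)]
  if id.nil?
    id = next_id
    next_id += 1
  end
  row.merge(id: id)
end

# Rows whose natural key vanished from the dump are the retired measures.
new_keys = new_dump.map(&natural_key)
retired = old_rows.reject { |row| new_keys.include?(natural_key.call(row)) }

p imported.map { |row| row[:id] }
p retired.map { |row| row[:description] }
```

Because matched rows keep their ids, existing food entries that reference them pick up the updated numeric data without any change.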
The Measures model offers measures; and when you know you have a FoodMeasure or StandardMeasure you can get at its subtype-particular parts.

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
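For reference, turning such a hash into the search URL can be sketched with Ruby's standard library alone. This is an illustration, not the Rails url helper mentioned above, and the params hash is a stand-in for what the Query method returns.

```ruby
require "uri"

# Hypothetical stand-in for the hash returned by the Query method.
params = { foo: "foovalue", bars: ["barvalue1", "barvalue2"] }

# Rename :bars to the "bar[]" key used in the search URL; array values
# are emitted as one key=value pair per element.
query_string = URI.encode_www_form({ "foo" => params[:foo], "bar[]" => params[:bars] })
url = "http://www.example.com/search?#{query_string}"

puts url
p URI.decode_www_form(query_string)
```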
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?

Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other associations; instead, just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use a has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
class Query < ActiveRecord::Base
  # duck_ids is stored as a serialized array of integers
  serialize :duck_ids

  def ducks
    transaction do
      Duck.where(id: duck_ids)
    end
  end
end
If you have native array support in your database, you can do something similar from within your DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Best way to structure database objects with multiple features and attributes in Rails

I have a product model. Each product has a different feature set, and has many features.
Instead of creating a product model that lists all of its features (since this would involve including a lot of features I do not need, and adding new features later would be difficult), my thought was to create one column that stores a "features array". So, if this product is a laptop and I wanted to know the screen size, I could call:
@laptop.features[:screen]
=> "15.6 inch"
The problem with this is that I am not sure there is a simple and practical way to build a form that could accept various features, then map them to the array.
I found a railscast (#196) that explains that accepts_nested_attributes_for is built into Rails, which would basically mean using both a Product model and a Feature model and just associating the two records.
Which way would be better? Is there a common approach for this sort of problem? And is there a way to have a form in your view that would accept features? (even if they are not directly a part of the Product model's database structure)

I would definitely go with a more flexible solution of having a has_many relationship with features. Then you can easily call @product.features to get the product's features, and the flexibility really shines when you want to do something like assign multiple attributes to screen. If you are throwing hashes into your database you wouldn't be able to add two attributes (easily anyways) to screen.
Say you wanted @product.features[:screen] to show IPS or TFT in the future as well as size, then you would have to have nested hashes or something else that would be really ugly to process.

Perhaps a Features table that contains the features that you want to mention for the products, probably with a type attribute. Maybe "Type: Display, Key: Size, Value: 1920x1080", or "Type: HDD, Key: Capacity, Value: 2GB". You can use the type to create 'families' of keys. You can make your keys anything you want to track, and the value is just a string.
With that Feature list built, you create a joining table/model (assignments?) that tracks which product has which features.
Product (id, non-feature attributes)
  has_many assignments
  has_many features, through assignments
Feature (id, type, key, value)
  has_many assignments
  has_many products, through assignments
Assignments (id, product_id, feature_id, timestamps?)
  belongs_to product
  belongs_to feature
Given that you're linking a product id to a feature id, you can fiddle with your feature value text without breaking anything. Decide that "1920x1080" should be "1920px x 1080px" -- just change the feature record.
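A rough in-memory sketch of that layout, using plain Ruby structs rather than ActiveRecord models, shows why editing one feature record is enough. The ids, the sample rows and the features_for helper are hypothetical.

```ruby
# Hypothetical in-memory stand-ins for the features and assignments tables.
Feature = Struct.new(:id, :type, :key, :value)
Assignment = Struct.new(:product_id, :feature_id)

features = [
  Feature.new(1, "Display", "Size", "1920x1080"),
  Feature.new(2, "HDD", "Capacity", "2GB")
]
assignments = [
  Assignment.new(10, 1), Assignment.new(10, 2), # product 10 has both
  Assignment.new(11, 1)                         # product 11 shares the display
]

# Resolve a product's features through the join rows, keyed by [type, key].
features_for = lambda do |product_id|
  ids = assignments.select { |a| a.product_id == product_id }.map(&:feature_id)
  features.select { |f| ids.include?(f.id) }
          .to_h { |f| [[f.type, f.key], f.value] }
end

# Editing the single feature record changes what every linked product shows.
features.find { |f| f.id == 1 }.value = "1920px x 1080px"

p features_for.call(10)
p features_for.call(11)
```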

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.

This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < AR::Base
  has_many :reports
  # Rails' default inflections may not map "criteria" back to this class name, so state it explicitly
  has_many :criteria, :class_name => "Criteria"
end

class Report < AR::Base
  belongs_to :company
  has_many :criteria_values
  has_many :criteria, :through => :criteria_values
end

class Criteria < AR::Base # should be Criterion but whatever
  belongs_to :company
  has_many :criteria_values
  # one attribute 'name' (or 'type' and you can mess with STI)
end

class CriteriaValue < AR::Base # singular, so has_many :criteria_values can find it
  belongs_to :report
  belongs_to :criteria
  # one attribute 'value'
end
This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < AR::Base
  has_many :reports
  serialize :criteria
end

class Report < AR::Base
  belongs_to :company
  serialize :criteria_values

  def criteria
    self.company.criteria
  end
  # custom code here to validate that criteria_values correspond to criteria etc.
end
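The "custom code" placeholder in the denormalized version could be filled in along these lines. This is only a sketch: it assumes criteria is an array of names and criteria_values a hash keyed by those names, and criteria_value_errors is a hypothetical helper, not part of Rails.

```ruby
# Hypothetical helper: returns a list of problems, empty when the values
# line up with the company's criteria.
def criteria_value_errors(criteria, criteria_values)
  missing = criteria - criteria_values.keys
  unknown = criteria_values.keys - criteria
  missing.map { |name| "missing value for #{name}" } +
    unknown.map { |name| "unknown criterion #{name}" }
end

company_criteria = ["performance", "speed"]
report_values = { "performance" => "good", "cost" => "high" }

p criteria_value_errors(company_criteria, report_values)
# ["missing value for speed", "unknown criterion cost"]
```

In a model, a check like this would typically be called from a custom validation so that a mismatching report cannot be saved.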
Related to that is a rather clever way of serializing at least the criteria (and maybe the values, if they were all boolean): using bit fields. This basically gives you more or less easy migrations (hard to delete and modify, but easy to add) and search-ability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
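As a bare-bones illustration of the bit field idea, without the plugin and with made-up criterion names, each criterion gets one bit of an integer that can live in an ordinary integer column:

```ruby
# Hypothetical criterion names; each one owns a bit by position.
# Appending a name is safe, reordering or deleting changes stored meanings.
CRITERIA = %i[performance speed cost_effectiveness].freeze

# Turn a list of criterion names into a single integer mask.
def encode_criteria(names)
  names.sum { |name| 1 << CRITERIA.index(name) }
end

# Recover the criterion names whose bits are set in the mask.
def decode_criteria(mask)
  CRITERIA.select.with_index { |_name, bit| (mask & (1 << bit)) != 0 }
end

mask = encode_criteria(%i[performance cost_effectiveness])
puts mask               # 5 (bits 0 and 2 set)
p decode_criteria(mask) # [:performance, :cost_effectiveness]
```

This also shows why adding is easy while deleting or modifying is hard: the meaning of a stored integer depends on the position of each name in the list.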
Variable columns (eg. crit1, crit2, etc.).
I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this - you'll have to figure out what crit1 means by yourself - and if you want to add validations on these fields, that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.

Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.

Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).
