Rails - EAV model with multiple value types?

I currently have a model Feature that is used by various other models; in this example, it will be used by Customer.
To keep things flexible, Feature is used to store things such as First Name, Last Name, Name, Date of Birth, Company Registration Number, etc.
You will have noticed a problem with this - while most of these are strings, features such as Date of Birth would ideally be stored in a column of type Date (and would be a datepicker rather than a text input in the view).
How would this best be handled? At the present time I simply have a string column "value"; I have considered using multiple value columns (e.g. string_value, date_value) but this doesn't seem particularly efficient as there will always be a null column in every record.
Would appreciate any advice on how to handle this - thanks!

There are a couple of ways I could see you going with this, depending on your needs. I'm not completely satisfied with any of these, but perhaps they can point you in the right direction:
Serialize Everything
Ruby can serialize almost any object to a byte stream, and in Ruby everything is an object. So in theory you could store serialized representations of any object, including Strings, DateTimes, or even your own models, in a database column. The Marshal module handles this for you most of the time, and lets you write your own serialization methods if your objects have special needs.
Pros: Really store anything in a single database column.
Cons: Your ability to work with the data inside the database is minimal - the column becomes opaque storage. You (probably) won't be able to sort or filter on it, since the serialized format isn't anything the database will recognize.
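In Rails the usual entry point for this is serialize - a minimal sketch, assuming value is a text column on Feature:

```ruby
class Feature < ActiveRecord::Base
  # Round-trips arbitrary Ruby objects through the text `value` column
  # (YAML by default; a custom coder can be supplied).
  serialize :value
end

feature = Feature.new(name: "Date of Birth")
feature.value = Date.new(1980, 1, 31)  # any object, not just a String
feature.save!
feature.reload.value                   # => #<Date: 1980-01-31>
```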
Columns for every datatype
This is basically the solution you suggested in the question - figure out exactly which datatypes you might need to store; you mention strings and dates. If there aren't too many of them, it's feasible to simply have a column of each type and only store data in one of them per record. You can override the attribute accessors to use the proper column, so that from the outside, Feature acts as though .value is whatever you need it to be.
Pros: Only need one table.
Cons: At least one null value in every record.
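Overriding the accessors might look something like this (the string_value and date_value columns come from your question):

```ruby
class Feature < ActiveRecord::Base
  # Expose a single `value`, whichever typed column actually holds it.
  def value
    date_value || string_value
  end

  def value=(val)
    if val.is_a?(Date) || val.is_a?(Time)
      self.date_value = val
    else
      self.string_value = val.to_s
    end
  end
end
```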
Multiple Models/Tables
You could make a model for each of the sorts of Feature you might need - TextFeature, DateFeature, etc. This guide on Multiple Table Inheritance conveys the idea and methodology.
Pros: No null values - every record contains only the columns it needs.
Cons: Complexity. In addition to needing multiple models, you may find yourself doing complex joins and unions if you need to work directly with features of different kinds in the database.
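A bare-bones sketch of the multiple-model layout (association names are assumptions):

```ruby
# Each feature type gets its own table with a properly typed value column.
class TextFeature < ActiveRecord::Base
  belongs_to :customer  # text_features: name:string, value:string
end

class DateFeature < ActiveRecord::Base
  belongs_to :customer  # date_features: name:string, value:date
end

class Customer < ActiveRecord::Base
  has_many :text_features
  has_many :date_features
end
```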

Related

Single vs. multiple ID columns in data warehouse/lake

I have set up a time-series / events database using the AWS Firehose -> S3/Glue -> Athena stack. It is being used to track various user actions - session started, action performed, etc. - across a number of our products. My question is about how best to store different types of IDs in this system.
The existing schema is one big 'fact table' with a bunch of different columns. Two of the most important columns are event_type_id and object_id. To use StackOverflow as an example, two events might be:
question_asked - in this case I would be storing the question id in the object_id column.
tag_created - in this case I would be storing the tag id in the object_id column.
My question is - is storing multiple different types of IDs in the same column bad practice? It's working OK for us at the moment, but it does require the person/system performing queries to know what type of object the object_id column refers to, based on the event they are querying.
If bad practice, what other approaches might be better? Multiple columns where they are NULL if not relevant for the event in that row? Or is this where dimension tables would be a better fit?
This isn't necessarily bad practice, depending on how you use it.
It sounds like you're aware of the potential pitfalls of such an approach (i.e. users of the data have to be aware of the context - in this case, the event type - to use the values correctly). Since you're using Athena, you could mitigate that by creating views over the source table for different event types, inserting a WHERE clause filter on event type and possibly renaming object_id to something more context-specific, e.g. question_id.
This makes it easier for users to work with the data and to understand exactly what values they're working with.
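For example, a per-event-type view might look something like this (the table, column names and ID values here are illustrative, not from your schema):

```sql
-- Expose question events under a context-specific ID name
CREATE VIEW question_asked_events AS
SELECT
    event_timestamp,
    object_id AS question_id  -- generic ID renamed for this context
FROM events
WHERE event_type_id = 1;      -- the 'question_asked' event type
```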
In a big data environment I wouldn't recommend creating dimension tables if it can be avoided, as JOINs between tables start to get expensive. Having multiple columns for different IDs is possible, but then you create new problems for users, such as having to account for NULL values in an ID column; it also potentially makes it harder to add new event types and IDs, since you have to change the schema to accommodate them.

Rails database setup Polymorphism

We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All the requests share maybe only a few columns, and each has up to 20 columns of its own.
We started to wonder whether having separate tables for each request type would be practical in terms of speed once we have to do very complicated joins or queries - for example, fetching ALL request types into a single list and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to the others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This railscast episode is a great example of the difference between the two.
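Roughly, the two options look like this (a sketch; table and association names are assumptions):

```ruby
# Option 1 - STI: one accounting_requests table with a `type` column;
# every subclass shares all of its columns.
class AccountingRequest < ActiveRecord::Base
end
class BillingRequest < AccountingRequest
end

# Option 2 - polymorphic association: shared columns on one table,
# type-specific columns on their own tables.
class Request < ActiveRecord::Base
  belongs_to :detail, polymorphic: true
end
class BillingDetail < ActiveRecord::Base
  has_one :request, as: :detail
end
```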
You can use STI in that situation, but STI requires all the columns to live in one single table, which isn't great here: the table will grow very wide in the number of fields.
I think you should divide this into two tables, as below:
Request: the request table is the polymorphic table which stores the information about the type of each request.
RequestItem: the request item table stores the 20-odd per-type fields as records and has a foreign key to the request table. Each request item record has two fields, called key and value.
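A sketch of that key/value layout (column names are assumptions):

```ruby
class Request < ActiveRecord::Base
  has_many :request_items  # requests: request_type:string, ...
end

class RequestItem < ActiveRecord::Base
  belongs_to :request      # request_items: request_id:integer, key:string, value:string
end

request = Request.create!(request_type: "billing")
request.request_items.create!(key: "invoice_number", value: "INV-42")
```

Note that this is essentially the EAV pattern from the first question above, with the same trade-offs around typing and querying.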
It sounds do-able.
When I looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
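A sketch of that combination - composed_of plus a Null Object for "not applicable" (class and column names are illustrative):

```ruby
class Measurement
  attr_reader :millimetres

  def initialize(millimetres)
    @millimetres = millimetres
  end

  def to_s
    "#{millimetres} mm"
  end
end

class NullMeasurement
  def millimetres
    nil
  end

  def to_s
    "Not applicable"
  end
end

class Product < ActiveRecord::Base
  # Maps the width_mm column onto a value object, substituting the
  # Null Object when the column is NULL.
  composed_of :width,
              class_name: "Measurement",
              mapping: %w(width_mm millimetres),
              constructor: ->(mm) { mm ? Measurement.new(mm) : NullMeasurement.new }
end
```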
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type lets you store a multilevel Ruby hash, and it provides rich functionality: DB-level constraints, indexes and query operators.
So common attributes are stored in the standard way and child-specific ones in jsonb. Then you can use whatever you need on top of this: STI, the Value Objects pattern, serialization, or just scopes for each child. I prefer the last one - my models are thin, most constraints are at the DB level, and all business logic lives in service classes.
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Preventing storing and selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
A bit schemaless
Vendor lock-in
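A sketch of that hybrid layout with per-child scopes (column and scope names are assumptions):

```ruby
class Request < ActiveRecord::Base
  # requests: request_type:string, data:jsonb
  scope :billing,      -> { where(request_type: "billing") }
  scope :with_invoice, ->(number) { where("data ->> 'invoice_number' = ?", number) }
end

Request.create!(request_type: "billing", data: { invoice_number: "INV-42" })
Request.billing.with_invoice("INV-42")  # jsonb is queryable and indexable
```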

Store as JSON or separate attributes?

I'm using Rails 3.2 and MariaDB. I have this group of data:
description, services, facilities
Not indexed and purely for output in the show page. Should I store these as one JSON object in one more_info attribute or store as separate attributes?
I personally would make columns for them; it generally makes the fields easier to work with, especially if there will be a need to update the values. I usually reserve JSON-serialized fields for when I do not know how many attributes there will be.
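For reference, the JSON-serialized version is a one-liner on Rails 3.2 (the model name is hypothetical; more_info would be a text column):

```ruby
class Listing < ActiveRecord::Base
  serialize :more_info, JSON  # stores the hash as JSON text in more_info
end

Listing.new(more_info: { "services" => ["wifi", "parking"] })
```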
If you are showing the data to your users I would recommend saving them in different columns. I find that as soon as users see something they want to filter by it or work with it in ways you have not foreseen.
If you are not, then the choice is less clear-cut, but the very fact that you have 3 distinct groups suggests that they are different things which could be treated differently as your application matures.
I would always go with the Normalised form unless you have documented reasons not to.

grails when to use enum vs database lookup table

I'm getting towards the end of my application and have 25 lookup tables. Some of my domain classes have 15 table references as properties (one-to-one). Now I have just remembered/learned that I can use enums instead. I'm sure I can refactor a bunch of those lookup tables into enum classes. However, what are the best practices for choosing an enum vs. a lookup table? A few things about my application scenario that may assist in the answer:
There is only one developer (me)
This is a web based application
It won't be a heavily used application
When the view for the domain class with currently 15 lookups is rendered, all of those lookups will need to be loaded to display the data in the view. (15 joins)
The lookups will be cached in memory.
The lookup relationships to the domain class are all one-to-one
The data in the lookups will rarely change
Some of the values in the lookup tables are wordy, example: "Taking care of animals"
Some of the lookups have many records. Example: one lookup is U.S. states. Another is a person's occupation.
I think the most obvious answer is to use enums for things that don't change - ever - like days of the week or planets in the Solar System. OK, so sometimes things you thought wouldn't change actually change, but that's OK: a quick update and you are good to go for a few more years.
But when you are working with data that will certainly change over time and often enough, then by all means store that in the database. A change to a table will not require a code change and redeployment. Additionally it would be nice to add an admin interface to these - and scaffolded screens should be quick and easy enough.
This is just my opinion. I don't think there is a "right" answer here.
If it gets stored in a database, I use a lookup table. It is more flexible and, based on a few things, should be more performant. While you do have to do join queries, you can lessen that impact by enabling the 2nd-level cache with cache: true in your mappings.
One thing to remember when using an enum instead of a lookup table is that there is no longer a foreign key to a separate table, so you can't guarantee at the database level that the values are in the list you desire. In addition, if you need to query based on the enum, it is typically a string comparison instead of a number comparison.

Custom fields in Rails that act as a template for future entries

I'm looking for some feedback on my current plan of implementing custom fields in rails. I'm new to rails and app development in general and would appreciate any comments from more experienced individuals.
Background
The app: Keep track of food and beverage tastings.
What I'm trying to model:
User creates a new sample type.
They call it: "Wine"
They decide that, for their company, they'd like to keep track of the following attributes: Origin, Grape Type, Company, Elevation, Temperature Kept, and more.
The only assumption my database makes about a sample type is that it has a Name (e.g. coffee, wine, etc.); the rest are all custom fields specified by the user.
Now that a sample type has been created:
The user begins to create samples of sample type wine.
They choose to create a sample, of type Wine.
The fields they must fill in are the ones they specified earlier.
In Origin they put France; in Grape Type they put Chardonnay; etc.
--
My plan of approach is as follows:
When a user creates the sample type, store the custom fields as an array or in some string format and keep it under a column called data.
SampleType
  name: wine
  data: [origin, grape_type, company, ...]
When a user wants to create a sample of type Wine:
I look up the sample type wine and, for each key in the data column, create a form field.
When the user submits the form, I create a hash of all the custom field names and their corresponding data. I serialize it and store it in a data column, like so:
Sample
  type: wine
  data: { origin: "France", grape_type: "Pinot Grigio", ... }
My plan at the moment is to use PostgreSQL's hstore to implement the hashing in the data column.
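For context, the hstore version of that plan might look roughly like this (a sketch; depending on your Rails version this needs the activerecord-postgres-hstore gem from the schneems post below):

```ruby
class Sample < ActiveRecord::Base
  belongs_to :sample_type
  # `data` is an hstore column: it reads and writes like a hash of strings.
end

sample = Sample.create!(
  sample_type: wine_type,  # a previously created SampleType (hypothetical variable)
  data: { "origin" => "France", "grape_type" => "Pinot Grigio" }
)
Sample.where("data -> 'origin' = ?", "France")  # hstore is queryable in SQL
```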
My questions are:
Is this a valid solution for what I'm trying to do?
Will I run into trouble when users change what custom fields they want?
Any other concerns I should take into account?
Are MongoDB and other such DBs a better choice for this type of model?
I've been using the following links as a reference:
http://schneems.com/post/19298469372/you-got-nosql-in-my-postgres-using-hstore-in-rails
http://blog.artlogic.com/2012/09/13/custom-fields-in-rails/
As well as many other stack overflow posts, however none seem to be using it in the way I mention above.
Any comments are appreciated.
jtgi, having done something like this more times than I want to remember, my first response was, "run away!" In my experience, the whole user-defined field thing is an ugly, hacky, nightmare. Soon, someone will ask, "can I search on grape?" or "I want to be able to input multiple values for grape." And on and on, and you will hate yourself for ever stepping down this path. :-)
That said, I think your approach is pretty decent. To answer your questions directly:
Yes, this is a valid approach.
Yes, you will run into trouble when users change the custom fields they want. (see above)
See some notes below.
Might be. I went there even before I read your 4th question. With your field => value hash, you're kind of implementing a noSQL solution anyhow, but it'll be non-trivial to implement lookups, searches, etc.
Some thoughts:
I think I would marshal the data into a db column, rather than using a db function. That way, it's pure Ruby and not dependent on the db type. See http://www.ruby-doc.org/core-1.9.3/Marshal.html. I'm doing this to cache some data in an app right now, and it's pretty slick. You may need to marshal(l) the data anyhow, if you want to wind up storing Ruby objects more complex than strings.
You'll probably get there soon anyhow, so I would plan on storing some "metadata" about the attributes while you're at it. E.g., "grape" is a String, max length 20, "rating" is an integer between 0 and 100. That way you can make your form a little prettier and do some rudimentary validation.
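That metadata could start as simply as this (the structure and checks are illustrative, not a library API):

```ruby
FIELD_DEFINITIONS = {
  "grape"  => { type: :string,  max_length: 20 },
  "rating" => { type: :integer, range: 0..100 }
}

# Validate a submitted value against its field definition.
def valid_field?(name, value)
  definition = FIELD_DEFINITIONS.fetch(name) { return false }
  case definition[:type]
  when :string  then value.is_a?(String)  && value.length <= definition[:max_length]
  when :integer then value.is_a?(Integer) && definition[:range].cover?(value)
  else false
  end
end

valid_field?("rating", 85)       # => true
valid_field?("grape", "x" * 30)  # => false
```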
When you come to hate this feature, you can remember me. :-)
