Postgres' Composite Type on Rails 3 - ruby-on-rails

I've discovered something new about Postgres: composite types. I really like this approach and it will be very useful for me.
The problem is that Rails' ActiveRecord doesn't have native support for this.
Have you ever used Postgres' composite types with Rails? Was it a good experience, or do you prefer the common approach of creating new models for this nested data?
http://www.postgresql.org/docs/8.4/static/rowtypes.html
Thanks! :-)

This is an interesting feature of PostgreSQL; however, I have not had a chance to work with it.
A few things come to mind on the Rails side:
ActiveRecord would have a tough time formatting the SQL necessary to query these objects. You would probably have to write custom SQL to take the special syntax into account.
ActiveRecord will not be able to do implicit casting of the custom types. This could be problematic when trying to access the attributes through ActiveRecord. You can extend the PostgreSQL adapter for ActiveRecord to cast these special data types into custom classes, however this is a non-traditional approach.
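As a rough illustration of the casting idea above, the sketch below parses the text form PostgreSQL returns for a composite value into a plain Ruby object. The InventoryItem struct and the parse_composite and cast_inventory_item helpers are hypothetical names made up for this example (they are not part of ActiveRecord or its PostgreSQL adapter), the sample literal assumes a composite of (name text, supplier_id integer, price numeric), and the parser only covers the common quoting rules:

```ruby
# Hypothetical value class for the composite type; not part of ActiveRecord.
InventoryItem = Struct.new(:name, :supplier_id, :price)

# Splits a composite literal such as ("fuzzy dice",42,1.99) into raw fields.
# Unquoted empty fields are treated as NULL and returned as nil.
def parse_composite(text)
  body = text.strip
  unless body.start_with?('(') && body.end_with?(')')
    raise ArgumentError, "not a composite literal: #{text}"
  end
  body = body[1..-2]

  fields = []
  current = +''
  quoted = false     # did the current field use double quotes at all?
  in_quotes = false
  i = 0
  while i < body.length
    ch = body[i]
    if ch == '\\'
      # A backslash escapes the next character.
      current << body[i + 1].to_s
      i += 1
    elsif in_quotes
      if ch == '"' && body[i + 1] == '"'
        current << '"'   # a doubled quote inside quotes is a literal quote
        i += 1
      elsif ch == '"'
        in_quotes = false
      else
        current << ch
      end
    elsif ch == '"'
      in_quotes = true
      quoted = true
    elsif ch == ','
      fields << (current.empty? && !quoted ? nil : current)
      current = +''
      quoted = false
    else
      current << ch
    end
    i += 1
  end
  fields << (current.empty? && !quoted ? nil : current)
  fields
end

# Casts the raw strings to Ruby types (Rational keeps the price exact).
def cast_inventory_item(text)
  name, supplier_id, price = parse_composite(text)
  InventoryItem.new(name,
                    supplier_id && Integer(supplier_id),
                    price && Rational(price))
end
```

Something like cast_inventory_item('("fuzzy dice",42,1.99)') would then give an object whose name, supplier_id and price can be read like ordinary attributes. Hooking this into the adapter so it happens automatically is the non-traditional part.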
A few things come to mind on the database side:
Because of the way these types collapse multiple attributes into a single entity, this approach could be difficult to query. This includes specifying conditions where you need to check an individual attribute for a particular value. Additionally, if any of these composite types contain key references, it could be difficult to perform CASCADE options.
This schema approach could be difficult to index if performance becomes an issue.
This schema approach doesn't seem to be normalized in a way a database should be. In the examples provided, this composite data should exist as a separate table definition, with a foreign key reference in the parent table.
Unless the specific application you have in mind has compelling benefits, I would suggest a more normalized approach. Instead of:
CREATE TYPE inventory_item AS (
name text,
supplier_id integer,
price numeric
);
CREATE TABLE on_hand (
item inventory_item,
count integer
);
INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
You could achieve a similar result by doing the following, while keeping full support for ActiveRecord without having to extend the Postgres adapter, or create custom classes:
CREATE TABLE inventory_item (
id serial PRIMARY KEY,
name text,
supplier_id integer,
price numeric
);
CREATE TABLE on_hand (
inventory_item_id integer REFERENCES inventory_item (id),
count integer
);
INSERT INTO inventory_item (name, supplier_id, price) VALUES ('fuzzy dice', 42, 1.99) RETURNING id;
INSERT INTO on_hand VALUES (<inventory_item_id>, 1000);

Related

F# recursive types to SQL tables

I'm modeling an application in F# and I encountered a difficulty when trying to construct the database tables for the following recursive type :
type Base =
| Concrete1 of Concrete1
| Concrete2 of Concrete2
and Concrete1 = {
Id : string
Name : string }
and Concrete2 = {
Id : string
Name : string
BaseReference : Base }
The solution I've got for the moment (I've found inspiration here http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) is :
I have two concerns with this solution :
There will be rows on the Base table even though that doesn't make sense in my model. But I can live with that.
It seems that queries to find all the information about BaseReference of Concrete2 will be complex since I will have to take into account the recursivity of the type and the different concrete tables. Moreover, adding a new concrete type to the model must modify these queries. Unless of course there is an equivalent to the match F# keyword in SQL.
Am I worrying too much about these concerns? or maybe, is there a better way to model this recursive F# type in SQL tables?
Part 1: Encoding Algebraic Data Types in Relational Tables
I've struggled with this very thing many times. I finally discovered the key to modeling algebraic data types in relational tables: Check constraints.
With a check constraint, you can use a common table for all members of your polymorphic type yet still enforce the invariant of each member.
Consider the following SQL schema:
CREATE TABLE ConcreteType (
Id TINYINT NOT NULL PRIMARY KEY,
Type VARCHAR(10) NOT NULL
)
INSERT ConcreteType
VALUES
(1,'Concrete1'),
(2,'Concrete2')
CREATE TABLE Base (
Id INT NOT NULL PRIMARY KEY,
Name VARCHAR(100) NOT NULL,
ConcreteTypeId TINYINT NOT NULL,
BaseReferenceId INT NULL)
GO
ALTER TABLE Base
ADD CONSTRAINT FK_Base_ConcreteType
FOREIGN KEY(ConcreteTypeId)
REFERENCES ConcreteType(Id)
ALTER TABLE Base
ADD CONSTRAINT FK_Base_BaseReference
FOREIGN KEY(BaseReferenceId)
REFERENCES Base(Id)
Simple, right?
We've addressed concern #1 of having meaningless data in the table representing the abstract base class by eliminating that table. We've also combined the tables that were used to model each concrete type independently, opting instead to store all Base instances--regardless of their concrete type--in the same table.
As-is, this schema does not constrain the polymorphism of your Base type. It is possible to insert rows of type Concrete1 with a non-null BaseReferenceId, or rows of type Concrete2 with a null BaseReferenceId.
There is nothing keeping you from inserting invalid data, so you'd need to be very diligent about your inserts and edits.
This is where the check constraint really shines.
ALTER TABLE Base
ADD CONSTRAINT Base_Enforce_SumType_Properties
CHECK
(
(ConcreteTypeId = 1 AND BaseReferenceId IS NULL)
OR
(ConcreteTypeId = 2 AND BaseReferenceId IS NOT NULL)
)
The check constraint Base_Enforce_SumType_Properties defines the invariants for each concrete type, protecting your data on insert and update. Go ahead and run all the DDL to create the ConcreteType and Base tables in your own database. Then try to insert rows into Base that break the rules described in the check constraint. You can't! Finally, your data model holds together.
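If you also want to catch bad rows before they reach the database, the same invariant can be mirrored in application code. This is only a sketch: the CONCRETE1 and CONCRETE2 constants and the valid_base_row? helper are hypothetical names, and the ids 1 and 2 assume the ConcreteType rows inserted above. The check constraint remains the real safeguard.

```ruby
# Ids assumed to match the rows inserted into ConcreteType above.
CONCRETE1 = 1
CONCRETE2 = 2

# Mirrors Base_Enforce_SumType_Properties:
# Concrete1 rows must not carry a reference, Concrete2 rows must carry one.
def valid_base_row?(concrete_type_id, base_reference_id)
  (concrete_type_id == CONCRETE1 && base_reference_id.nil?) ||
    (concrete_type_id == CONCRETE2 && !base_reference_id.nil?)
end
```

Adding a new concrete type would mean adding one more branch here, just as it means adding one more branch to the constraint.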
To address concern #2: Now that all members of your type are in a single table (with invariants enforced), your queries will be simpler. You don't even need "equivalent to the match F# keyword in SQL". Adding a new concrete type is as simple as inserting a new row into the ConcreteType table, adding any new properties as columns in the Base table, and modifying the constraint to reflect any new invariants.
Part 2: Encoding hierarchical (read: recursive) relationships in SQL Server
Part of concern #2 is, I think, about the complexity of querying across the 'parent-child' relationship that exists between Concrete2 and Base. There are many ways to approach this kind of query, and to pick one we'd need a particular use case in mind.
Example use case: We wish to query every single Base instance and assemble an object graph incorporating every row. This is easy; we don't even need a join. We just need a mutable Dictionary<int,Base> with Id used as the key.
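The dictionary approach can be sketched in a few lines. The example below uses Ruby with a Hash in place of Dictionary<int,Base>; the BaseNode struct, the build_graph helper and the row keys (:id, :name, :concrete_type_id, :base_reference_id) are hypothetical stand-ins for whatever your data access layer returns from a plain SELECT on Base:

```ruby
# Hypothetical in-memory node; fields follow the columns of the Base table.
BaseNode = Struct.new(:id, :name, :concrete_type_id, :base_reference)

# rows: an array of hashes, one per row of Base, in any order.
def build_graph(rows)
  nodes = {}

  # First pass: create one node per row, keyed by Id.
  rows.each do |row|
    nodes[row[:id]] = BaseNode.new(row[:id], row[:name], row[:concrete_type_id], nil)
  end

  # Second pass: resolve BaseReferenceId into a direct object reference.
  rows.each do |row|
    ref_id = row[:base_reference_id]
    nodes[row[:id]].base_reference = nodes.fetch(ref_id) if ref_id
  end

  nodes
end
```

Two passes are used so that the order of the rows does not matter: a row may reference another row that appears later in the result set.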
It would be a lot to go into here, but it's something to consider: there is a MSSQL data type named HierarchyID that implements the 'materialized path' pattern, allowing easier modeling of hierarchies like yours. You could try using HierarchyID instead of INT on your Base.Id/Base.BaseReferenceId columns.
I hope this helps.

Why Rails hides the existence of id column?

I don't quite understand the need to hide the existence of id column in Rails.
It is neither reflected in migration file nor the schema.rb file.
There is no way for a newbie to know for the fact that a column named id has been created by default as a primary key.
Unless they go and check the actual schema of the table in database (rails dbconsole).
I can see the timestamps macro included by default in the migration file as well as in schema.rb as two fields created_at and updated_at. Here, a developer at least gets a clue. Rails could have done the same for id column too. But it doesn't.
Why the secrecy around id column? Is it a part of the famous convention over configuration? Or is it a norm across all MVC frameworks?
In database design it is generally accepted that numeric id's are preferred, because
they are easier to index, and thus easier to "follow" or check when creating links (foreign keys).
when editing/updating records, you have a unique (and efficient) identifier
Therefore it is advised to give all tables a unique numeric key, always.
Now this numeric key has no meaning whatsoever to your application; it is an "implementation detail" of your database layer. Rails also makes sure every table has an id, unless you explicitly ask it not to.
I think this would indeed fall under the "convention over configuration" banner: why explicitly specify an id for each table if each table should have one?
The timestamps are different: they are interesting for some tables, but for other tables they are not important at all. It also depends on your application.
Note that this is not at all related to MVC. The M in MVC is a container for data, but in MVC it is actually not really important how the Model gets filled. In other words: the ORM part is not part of MVC. You will see that in most MVC implementations there is no ORM, or definitely not as tightly integrated as with Rails.
So in short: imho omitting the 'id' from the migration is not a secret; it just makes life easier, saves you some typing, and makes sure you follow a good convention unless you explicitly do not want to.
This is probably due to the fact that relational databases tend to use integer primary keys, and doing otherwise introduces complexities. I guess the reason it's hidden in Rails is so that creating tables with integer primary keys does not require any special configuration, and having to write it into Rails migrations would invite inexperienced developers to play around with it (which is probably not a good idea).
Additionally, I think Rails tries to abstract away things like numeric ids. If you want to create associations in a migration you do not need to specify foreign keys; you can simply write the name of the object you want to relate the table to.
I never thought about the id field because almost every table has an id...
Check the documentation about migrations where they say:
A primary key column called id will also be added implicitly, as it's
the default primary key for all Active Record models. The timestamps
macro adds two columns, created_at and updated_at. These special
columns are automatically managed by Active Record if they exist.
If you want to check your table columns, just go to rails console and type Model.column_names
I think it's clear that if you don't add a primary key, then rails will add one generic key for you so that it can index your record and have a control over it, so basically it isn't stating that there WILL be an ID field, since I don't believe this has to be imperative, but rather optional in the event you do not provide a primary key.
It's a Rails convention to hide the id attribute to discourage, and remove the temptation of, playing with it.
The id attribute is automatically generated with auto_increment to add normalization to your data (to make each record unique and accessible). Injecting your own values would eventually corrupt and break the magic of ActiveRecord.

Rails - EAV model with multiple value types?

I currently have a model Feature that is used by various other models; in this example, it will be used by customer.
To keep things flexible, Feature is used to store things such as First Name, Last Name, Name, Date of Birth, Company Registration Number, etc.
You will have noticed a problem with this - while most of these are strings, features such as Date of Birth would ideally be stored in a column of type Date (and would be a datepicker rather than a text input in the view).
How would this best be handled? At the present time I simply have a string column "value"; I have considered using multiple value columns (e.g. string_value, date_value) but this doesn't seem particularly efficient as there will always be a null column in every record.
Would appreciate any advice on how to handle this - thanks!
There are a couple of ways I could see you going with this, depending on your needs. I'm not completely satisfied with any of these, but perhaps they can point you in the right direction:
Serialize Everything
Rails can store any object as a byte stream, and in Ruby everything is an object. So in theory you could store string representations of any object, including Strings, DateTimes, or even your own models in a database column. The Marshal module handles this for you most of the time, and allows you to write your own serialization methods if your objects have special needs.
Pros: Really store anything in a single database column.
Cons: Ability to work with data in the database is minimal - It's basically impossible to use this column as anything other than storage - you (probably) wouldn't be able to sort or filter your data based on it, since the format won't be anything the database will recognize.
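To make the trade-off concrete, here is a minimal round trip using Marshal directly. The serialize_value and deserialize_value helpers are hypothetical names for this sketch, and the Base64 step is only an assumption so that the result fits a text column:

```ruby
require 'date'

# Turns any marshalable Ruby object into Base64 text for a string column.
def serialize_value(value)
  [Marshal.dump(value)].pack('m0') # 'm0' = Base64 without line breaks
end

# Reverses serialize_value. Only use Marshal.load on data you wrote yourself;
# loading untrusted input is unsafe.
def deserialize_value(text)
  Marshal.load(text.unpack1('m0'))
end
```

A Date comes back as a Date and a String as a String, but the stored text is opaque to the database, which is exactly why sorting and filtering on it is not practical.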
Columns for every datatype
This is basically the solution you suggested in the question - figure out exactly which datatypes you might need to store - you mention strings and datestamps. If there aren't too many of those, it's feasible to simply have a column of each type and only store data in one of them. You can override the attribute accessor functions to use the proper column, and from the outside, Feature will act as though .value is whatever you need it to be.
Pros: Only need one table.
Cons: At least one null value in every record.
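A rough sketch of the accessor override, written as a plain Ruby class so it runs on its own. In a Rails model, string_value and date_value would be real columns (the names come from the question) and the same two methods would sit in the model; everything else here is an assumption for illustration:

```ruby
require 'date'

# Plain-Ruby stand-in for the Feature model.
class Feature
  attr_reader :name, :string_value, :date_value

  def initialize(name, value = nil)
    @name = name
    self.value = value
  end

  # Routes the assignment to the column matching the value's type and
  # clears the other one, so at most one column is populated.
  def value=(new_value)
    @string_value = nil
    @date_value = nil
    case new_value
    when nil  then nil
    when Date then @date_value = new_value
    else           @string_value = new_value.to_s
    end
  end

  # Callers just read .value and get back whatever type was stored.
  def value
    date_value || string_value
  end
end
```

Supporting another type would mean adding one more column and one more branch in value=.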
Multiple Models/Tables
You could make a model for each of the sorts of Feature you might need - TextFeature, DateFeature, etc. This guide on Multiple Table Inheritance conveys the idea and methodology.
Pros: No null values - every record contains only the columns it needs.
Cons: Complexity. In addition to needing multiple models, you may find yourself doing complex joins and unions if you need to work directly with features of different kinds in the database.

Does ActiveRecord assign a key to every table using the naming convention "ID", and if so, why?

My understanding is that Active Record is based on an object-relational mapping (ORM) pattern described by Martin Fowler in his book Patterns of Enterprise Application Architecture (Addison-Wesley, 2002), which states that a one-to-one mapping exists between a database record and the object that represents it in an object-oriented program (OOP). When Rails creator David Heinemeier Hansson sought to implement an ORM for his Rails framework, he based it on Fowler's pattern.
Here's the problem: does ActiveRecord assign a surrogate primary key to every table using the naming convention "ID", and if so, why? The reason I ask is that it appears it would make more sense to assign a surrogate primary key using the naming convention "tablename_ID", as in fact ActiveRecord appears to do when creating foreign keys. Further, is it possible to override the default config and assign a surrogate primary key using the naming convention "tablename_ID"? Especially in the case of primary keys, it appears to be a good idea not to use a shared name, since telling the difference between two ID columns is not possible when simply looking at the column names: ID, ID, ID. As an example use case where this appears to be a problem: if I export data to a single table from two tables, there will be two columns with the ID name; when I import that document with updates, it would appear that there's no way to map the ID columns by default.
It's already obvious to see what the ID is, because it is in a certain table. You don't have table_name.table_name_id but table_name.id. I like the distinction between foreign keys and primary keys this way, but admittedly, it's a matter of opinion.
Unless you have a really good reason to use anything other, you should stick to the convention. That's what conventions are for. ActiveRecord gives you the option to change it, like so:
ActiveRecord::Base.primary_key_prefix_type = :table_name_with_underscore
This sets the way primary keys are generated globally. products.id becomes products.product_id.
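To show what the setting changes, here is a small standalone sketch of the naming rule. It does not call ActiveRecord; underscore and primary_key_name are hypothetical helpers that only imitate the documented outcome for the default and for the two prefix types (:table_name and :table_name_with_underscore):

```ruby
# Simplified CamelCase to snake_case conversion, enough for this sketch.
def underscore(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Imitates how the primary key column name depends on the prefix type.
def primary_key_name(model_name, prefix_type = nil)
  case prefix_type
  when :table_name                 then "#{underscore(model_name)}id"
  when :table_name_with_underscore then "#{underscore(model_name)}_id"
  else                                  'id'
  end
end
```

Note that the prefix is derived from the model name, which is why the Product model ends up with product_id rather than products_id.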

ER-to-Relational Mapping: multi-valued primary key

When mapping an ER diagram to a relational schema, my textbook says that in step.. whatever.. a new relation S should be created for multivalued attributes. But if the multivalued attribute is the primary key of R... that leaves the R with no primary key and S with no primary key?
This is an excellent question and something that always bothered me about the textbook explanations of how to eliminate "complex" types.
The question you need to ask is: What is being identified by the sets of values? What are you trying to model? Most database architects working with SQL would probably say that you ought to invent a new attribute to identify the sets of things that would have made up your multivalued attribute.
Another solution is to embrace "complex" types as first class attributes in their own right - not "multivalued" attributes but sets or arrays that can be assigned to a variable as a single value just like any other value. The Tutorial D language permits relation-valued attributes within relations, e.g.:
VAR r BASE RELATION {foo RELATION {bar INTEGER} } KEY {foo};
where foo is a relation-valued attribute nested within the relvar r.
SQL however doesn't support anything like this. Nested tables are supported in SQL but are not usually allowed to be part of keys so in SQL you always have to create a new identifying attribute. In a true RDBMS you arguably shouldn't have to create another attribute because any supported type ought to support being part of a key - if it didn't then you wouldn't even be able to project on that attribute because the result wouldn't contain a key.

Resources