When mapping an ER diagram to a relational schema, my textbook says that in one of the steps a new relation S should be created for each multivalued attribute. But if the multivalued attribute is the primary key of R, doesn't that leave R with no primary key and S with no primary key either?
This is an excellent question and something that always bothered me about the textbook explanations of how to eliminate "complex" types.
The question you need to ask is: What is being identified by the sets of values? What are you trying to model? Most database architects working with SQL would probably say that you ought to invent a new attribute to identify the sets of things that would have made up your multivalued attribute.
Another solution is to embrace "complex" types as first-class attributes in their own right - not "multivalued" attributes but sets or arrays that can be assigned to a variable as a single value just like any other value. The Tutorial D language permits relation-valued attributes within relations, e.g.:
VAR r BASE RELATION {foo RELATION {bar INTEGER} } KEY {foo};
where foo is a relation-valued attribute nested within r.
SQL, however, doesn't support anything quite like this. Nested tables are supported in some SQL DBMSs, but they usually aren't allowed to be part of keys, so in SQL you always have to create a new identifying attribute. In a true RDBMS you arguably shouldn't have to create another attribute, because any supported type ought to support being part of a key - if it didn't, you wouldn't even be able to project on that attribute, because the result wouldn't contain a key.
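For example, a minimal sketch of the usual SQL workaround (the table and column names here are invented for illustration): give R a surrogate key and move the multivalued attribute into a dependent table S keyed by that surrogate plus the value.

CREATE TABLE r (
    r_id integer PRIMARY KEY    -- invented identifying attribute
    -- ... other single-valued attributes of R
);

CREATE TABLE s (
    r_id  integer NOT NULL REFERENCES r,
    phone text    NOT NULL,     -- one value of the former multivalued attribute
    PRIMARY KEY (r_id, phone)
);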
Related
I'm modeling an application in F# and I encountered a difficulty when trying to construct the database tables for the following recursive type:
type Base =
    | Concrete1 of Concrete1
    | Concrete2 of Concrete2

and Concrete1 = {
    Id : string
    Name : string }

and Concrete2 = {
    Id : string
    Name : string
    BaseReference : Base }
The solution I've got for the moment (I found inspiration here: http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) is a Base table plus a separate table for each concrete type.
I have two concerns with this solution :
There will be rows in the Base table even though that doesn't make sense in my model. But I can live with that.
It seems that queries to find all the information about the BaseReference of a Concrete2 will be complex, since I will have to take into account the recursive nature of the type and the different concrete tables. Moreover, adding a new concrete type to the model will require modifying these queries. Unless of course there is an equivalent of the F# match keyword in SQL.
Am I worrying too much about these concerns? Or is there a better way to model this recursive F# type in SQL tables?
Part 1: Encoding Algebraic Data Types in Relational Tables
I've struggled with this very thing many times. I finally discovered the key to modeling algebraic data types in relational tables: Check constraints.
With a check constraint, you can use a common table for all members of your polymorphic type yet still enforce the invariant of each member.
Consider the following SQL schema:
CREATE TABLE ConcreteType (
    Id   TINYINT NOT NULL PRIMARY KEY,
    Type VARCHAR(10) NOT NULL
)

INSERT ConcreteType
VALUES
    (1, 'Concrete1'),
    (2, 'Concrete2')

CREATE TABLE Base (
    Id              INT NOT NULL PRIMARY KEY,
    Name            VARCHAR(100) NOT NULL,
    ConcreteTypeId  TINYINT NOT NULL,
    BaseReferenceId INT NULL)
GO

ALTER TABLE Base
ADD CONSTRAINT FK_Base_ConcreteType
    FOREIGN KEY (ConcreteTypeId)
    REFERENCES ConcreteType (Id)

ALTER TABLE Base
ADD CONSTRAINT FK_Base_BaseReference
    FOREIGN KEY (BaseReferenceId)
    REFERENCES Base (Id)
Simple, right?
We've addressed concern #1 of having meaningless data in the table representing the abstract base class by eliminating that table. We've also combined the tables that were used to model each concrete type independently, opting instead to store all Base instances--regardless of their concrete type--in the same table.
As-is, this schema does not constrain the polymorphism of your Base type. It is possible to insert rows of type Concrete1 with a non-null BaseReferenceId, or rows of type Concrete2 with a null BaseReferenceId.
There is nothing keeping you from inserting invalid data, so you'd need to be very diligent about your inserts and edits.
This is where the check constraint really shines.
ALTER TABLE Base
ADD CONSTRAINT Base_Enforce_SumType_Properties
CHECK
(
(ConcreteTypeId = 1 AND BaseReferenceId IS NULL)
OR
(ConcreteTypeId = 2 AND BaseReferenceId IS NOT NULL)
)
The check constraint Base_Enforce_SumType_Properties defines the invariants for each concrete type, protecting your data on insert and update. Go ahead and run all the DDL to create the ConcreteType and Base tables in your own database. Then try to insert rows into Base that break the rules described in the check constraint. You can't! Finally, your data model holds together.
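For instance, a quick sanity check (the Ids and names below are made up): the first two inserts satisfy the constraint, while the third violates it and is rejected.

INSERT INTO Base (Id, Name, ConcreteTypeId, BaseReferenceId)
VALUES (1, 'First', 1, NULL)    -- Concrete1: no reference, allowed
INSERT INTO Base (Id, Name, ConcreteTypeId, BaseReferenceId)
VALUES (2, 'Second', 2, 1)      -- Concrete2: references row 1, allowed
INSERT INTO Base (Id, Name, ConcreteTypeId, BaseReferenceId)
VALUES (3, 'Broken', 1, 2)      -- Concrete1 with a reference: fails the CHECK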
To address concern #2: Now that all members of your type are in a single table (with invariants enforced), your queries will be simpler. You don't even need "equivalent to the match F# keyword in SQL". Adding a new concrete type is as simple as inserting a new row into the ConcreteType table, adding any new properties as columns in the Base table, and modifying the constraint to reflect any new invariants.
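To make that concrete, here is a rough sketch of what adding a hypothetical third concrete type could look like (Concrete3 and its ExtraInfo column are invented for illustration):

INSERT ConcreteType VALUES (3, 'Concrete3')

ALTER TABLE Base ADD ExtraInfo VARCHAR(100) NULL    -- new property used only by Concrete3
GO

ALTER TABLE Base DROP CONSTRAINT Base_Enforce_SumType_Properties
ALTER TABLE Base
ADD CONSTRAINT Base_Enforce_SumType_Properties
CHECK
(
    (ConcreteTypeId = 1 AND BaseReferenceId IS NULL     AND ExtraInfo IS NULL)
    OR
    (ConcreteTypeId = 2 AND BaseReferenceId IS NOT NULL AND ExtraInfo IS NULL)
    OR
    (ConcreteTypeId = 3 AND BaseReferenceId IS NULL     AND ExtraInfo IS NOT NULL)
)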
Part 2: Encoding hierarchical (read: recursive) relationships in SQL Server
Part of concern #2, I think, is the complexity of querying across the 'parent-child' relationship that exists between Concrete2 and Base. There are many ways to approach this kind of query, and to pick one we'd need a particular use case in mind.
Example use case: We wish to query every single Base instance and assemble an object graph incorporating every row. This is easy; we don't even need a join. We just need a mutable Dictionary<int,Base> with Id used as the key.
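If instead you only need to follow a single BaseReference chain rather than load everything, a recursive CTE is another option. A sketch, assuming the Base table above (the starting Id of 2 is just an example, and the reference chain is assumed to be acyclic):

WITH BaseChain AS (
    SELECT Id, Name, ConcreteTypeId, BaseReferenceId, 0 AS Depth
    FROM Base
    WHERE Id = 2    -- hypothetical starting row
    UNION ALL
    SELECT b.Id, b.Name, b.ConcreteTypeId, b.BaseReferenceId, c.Depth + 1
    FROM Base b
    JOIN BaseChain c ON b.Id = c.BaseReferenceId
)
SELECT * FROM BaseChain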
It would be a lot to go into here, but it's something to consider: there is an MSSQL data type named hierarchyid that implements the 'materialized path' pattern, allowing easier modeling of hierarchies like yours. You could try using hierarchyid instead of INT for your Base.Id/Base.BaseReferenceId columns.
I hope this helps.
I'll try to simplify as much as possible; however, if you need more information, please let me know.
I'm using Rails 4 and PostgreSQL
edit:
using PostgreSQL 9.3
the dataset won't change often, and this particular table will probably only have 15 columns
I have a design where there are "core" components that have default attribute values like:
material = wood
color = blue
price = $1.52
dimensions = 3x2x5
These "core" components and their default attribute values are managed by an admin who can make adjustments through an admin interface as needed.
A user can create a new component_group and it will pre-populate with available components. The components in the new group all use the default attribute values of their "core" component.
A user can then modify the attribute values of any of the components that the group contains.
What I currently do is: duplicate each "core" component to create a new unique record with the identical attribute values of the "core".
My concern is that this app will potentially create a HUGE number of records, many of which may never have their default attribute values changed. While I don't know definitively, this seems like it will eventually become a performance problem (especially when you consider that in the real-world scenario, components will have their own relations which may need to be duplicated as well).
My initial thought was to implement some kind of system where a new component record is only created if its attribute values are changed; otherwise the component_group refers to the "core" component.
So my questions are:
Is my current approach even remotely correct?
Are my performance concerns valid, or will it be insignificant to the DB?
Would this type of functionality be better suited to a NoSQL DB like CouchDB?
Is there a specific name for this type of functionality? I've looked at Class-Table Inheritance / Multi-Table Inheritance but I don't think that's what I'm looking for.
You can use (mostly) identical table definitions and NULL values in the child table to default to the respective column value of the parent row. Code example:
CREATE TABLE comp_template ( -- parent table
  comp_template_id serial PRIMARY KEY
, material_id int REFERENCES material
, color enum
, ... -- attributes may or may not be defined NOT NULL
);

CREATE TABLE comp_group ( -- container
  comp_group_id serial PRIMARY KEY
, comp_group text NOT NULL
);

CREATE TABLE comp ( -- child table
  comp_id serial PRIMARY KEY
, comp_group_id int NOT NULL REFERENCES comp_group ON UPDATE CASCADE
                                                   ON DELETE CASCADE
, comp_template_id int NOT NULL REFERENCES comp_template ON UPDATE CASCADE
, material_id int REFERENCES material
, color enum
, ... -- like comp_template, but all attributes can be NULL
);
A view returning effective values:
CREATE VIEW comp_effective AS
SELECT c.comp_id, c.comp_template_id
, COALESCE(c.material_id, t.material_id) AS material_id
, COALESCE(c.color, t.color) AS color
, ...
FROM comp c
JOIN comp_template t USING (comp_template_id);
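For example, to read the effective values for every component in one group (the group id of 1 is just an example):

SELECT e.*
FROM   comp c
JOIN   comp_effective e USING (comp_id)
WHERE  c.comp_group_id = 1;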
NULL storage is very cheap:
Do nullable columns occupy additional space in PostgreSQL?
This is assuming that you have a small, mostly static set of possible attributes. The solution is efficient up to a couple of hundred distinct attributes (columns), as long as you don't add another attribute every day.
Otherwise, look to unstructured data types like hstore or jsonb.
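A rough sketch of that alternative (the table name and overrides column are invented here; jsonb needs Postgres 9.4+, on 9.3 hstore fills the same role): store only the attributes a user actually overrides, and fall back to the template for everything else.

CREATE TABLE comp_jsonb (
  comp_id          serial PRIMARY KEY
, comp_group_id    int NOT NULL REFERENCES comp_group
, comp_template_id int NOT NULL REFERENCES comp_template
, overrides        jsonb NOT NULL DEFAULT '{}'  -- only the changed attributes
);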
You could use inheritance between comp_template and comp, which would make sense. But consider the limitations of the Postgres implementation first.
Related answer with more details:
Use case for hstore against multiple columns
I've discovered a new thing about Postgres: Composite types. I really like this approach and it will be very useful for me.
The problem is that Rails' ActiveRecord doesn't have native support for this.
Have you ever used Postgres' composite types with Rails? Was it a good experience, or do you prefer the common approach of creating new models for this nested data?
http://www.postgresql.org/docs/8.4/static/rowtypes.html
Thanks! :-)
This is an interesting feature of PostgreSQL, however I have not had a chance to work with it.
A few things come to mind on the Rails side:
ActiveRecord would have a tough time formatting the SQL necessary to query these objects. You would probably have to write custom SQL to take the special syntax into account.
ActiveRecord will not be able to do implicit casting of the custom types. This could be problematic when trying to access the attributes through ActiveRecord. You can extend the PostgreSQL adapter for ActiveRecord to cast these special data types into custom classes, however this is a non-traditional approach.
A few things come to mind on the database side:
Because of the way these types collapse multiple attributes into a single entity, this approach could be difficult to query. This includes specifying conditions where you need to check an individual attribute for a particular value. Additionally, if any of these composite types contain key references, it could be difficult to perform CASCADE operations.
This schema approach could be difficult to index if performance becomes an issue.
This schema approach doesn't seem to be normalized the way a database should be. In the examples provided, the composite data should exist as a separate table definition, with a foreign key reference in the parent table.
Unless the specific application you have in mind has compelling benefits, I would suggest a more normalized approach. Instead of:
CREATE TYPE inventory_item AS (
name text,
supplier_id integer,
price numeric
);
CREATE TABLE on_hand (
item inventory_item,
count integer
);
INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
You could achieve a similar result by doing the following, while keeping full support for ActiveRecord without having to extend the Postgres adapter, or create custom classes:
CREATE TABLE inventory_item (
    id          serial PRIMARY KEY,
    name        text,
    supplier_id integer,
    price       numeric
);

CREATE TABLE on_hand (
    inventory_item_id integer REFERENCES inventory_item,
    count             integer
);

INSERT INTO inventory_item (name, supplier_id, price)
VALUES ('fuzzy dice', 42, 1.99)
RETURNING id;

INSERT INTO on_hand VALUES (<inventory_item_id>, 1000);
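For completeness, reading a value back is then just a plain join against the normalized tables (nothing special to this schema):

SELECT i.name, i.supplier_id, i.price, o.count
FROM   on_hand o
JOIN   inventory_item i ON i.id = o.inventory_item_id;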
I am trying to add an association that I didn't have originally. I realized that two tables were technically related, and that some navigation properties might simplify what I would have otherwise had to do manually. The tables and their keys look like this:
Import
  Primary Key:
    Number : Int32
    Date : DateTime

Hour
  Primary Key:
    DepartmentID : Int32
    UserNumber : Int32
    Date : DateTime
The association is named ImportHour. Import.Number maps to Hour.UserNumber, and Import.Date maps to Hour.Date. I am trying to add an association that is 0..1 on Import, and * on Hour with navigation properties and no additional foreign keys. When I do this, the designer tells me that the association is not mapped. If I then generate the DDL, it creates new fields Hours.Import_Date and Hours.Import_Number (Hours is the actual database table name for the Hour entity). If I manually map the fields, I end up with the following error:
Error 3021: Problem in mapping fragments starting at line 332:
Each of the following columns in table Hours is mapped to multiple conceptual side properties:
Hours.Date is mapped to <ImportHour.Hour.Date, ImportHour.Import.Date>
Hours.UserNumber is mapped to <ImportHour.Hour.UserNumber, ImportHour.Import.Number>*
I am not really sure what is happening, and I don't think I understand the 'mapping' process well enough to figure this out. It almost seems as if it wants a quintuple key, instead of realizing that the one key maps to the other. I look at my other one-to-many associations, and they do not even have table mappings; I think they have referential constraints instead, but you obviously can't have a referential constraint with a 0..1 to many association.
There are two ways to define a relation, but in your case you must use a foreign key association. That means that once you draw the association in the entity model, you must select it and define its referential constraint.
You cannot have 0..1 on Import, because in that case UserNumber and Date in Hour would have to be nullable. That is what the relation means: if no principal entity (Import) exists, the FK properties in the dependent entity (Hour) will be null.
Btw., using DateTime in a primary key is not recommended.
As far as I can tell from other databases I have since used, the issue here seems to be that the EF model requires a foreign key to already exist in the database. While I cannot seem to get EF to generate one, it will accept one if it already exists. (Contrary to what I said in the question, you can have a referential constraint on a 0..1 to many (nullable) foreign key).
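For reference, a sketch of the kind of foreign key EF will pick up if it already exists in the database (using the table and column names from the question; the constraint name and the physical name of the Import table are assumptions). Note that because UserNumber and Date are part of Hour's primary key and therefore not nullable, such a constraint enforces a required reference, as the earlier answer points out.

ALTER TABLE Hours
ADD CONSTRAINT FK_Hours_Import
FOREIGN KEY (UserNumber, [Date])
REFERENCES Import (Number, [Date])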
@Sahuagin, this may be long after your question, but did you try deleting the scalar properties in the designer after adding the association - for example, after creating the ImportHour association, deleting Hour.UserNumber and Hour.Date from your Hour entity?
That way the independent association is the only relationship between your entities - that's the meaning of an independent association.
My understanding is that Active Record is based on an object-relational mapping (ORM) pattern described by Martin Fowler in his book Patterns of Enterprise Application Architecture (Addison-Wesley, 2002), which states that a one-to-one mapping relationship exists between a database record and the object that represents it in an object-oriented program (OOP). When Rails creator David Heinemeier Hansson sought to implement an ORM for his Rails framework, he based it on Fowler's pattern.
Here's the problem: does ActiveRecord assign a surrogate primary key to every table using the naming convention "ID", and if so, why? I ask because it appears it would make more sense to assign the surrogate primary key using the naming convention "tablename_ID", which in fact appears to be what ActiveRecord does when creating foreign keys. Further, is it possible to override the default configuration and assign a surrogate primary key using the naming convention "tablename_ID"? Especially in the case of primary keys, it seems like a good idea not to use a shared name, since it is impossible to tell two ID columns apart just by looking at the column names: ID, ID, ID. As a use case example of where this appears to be a problem: if I export data from two tables into a single document, there will be two columns named ID; when I import that document with updates, there appears to be no way to map the ID columns by default.
It's already obvious what the ID refers to, because of the table it's in. You don't have table_name.table_name_id but table_name.id. I like the distinction between foreign keys and primary keys this way, but admittedly, it's a matter of opinion.
Unless you have a really good reason to use anything else, you should stick to the convention. That's what conventions are for. ActiveRecord gives you the option to change it, like so:
ActiveRecord::Base.primary_key_prefix_type = :table_name_with_underscore
This sets the way primary keys are generated globally. products.id becomes products.product_id.