How to model Entity1---(1,1)---Relation---(1,3)---Entity2 to a table? - entity-relationship

I want to convert given ER diagram with (min, max) notation to tables and im unsure of what the primary key of the "trainieren"-relation is.
If the Relation R is between A and B and:
one to one -> the primary key is the primary key of either A or B
one to many -> the primary key is the primary key of the entity that takes part multiple times in the relation
many to many -> the primary key is the primary key of A and B
I'd interpret (0,1) and (1,1) as one and (1,3) and (1,*) as many, therefore
my solution would be (primary keys in strong text) trainieren: {[Trainer.AkkrNr, Teams.Land]}

Generally, we try to use only one or many cardinality indicators, since those map easily to table structure. Most data modelers would do the same as you did in discarding the upper limit to simplify the model.
If you want to enforce that limit, there are a few ways to implement it:
Use your structure and a trigger on insert/update to count how many trainers the given team has, and throw an error if it exceeds 3.
You could add a position column to the primary key of trainieren and a constraint to limit it to values 1, 2 and 3. However, that imposes an ordering that wasn't part of the conceptual model.
Change trainieren to (Teams.Land PK, Trainer1.AkkrNr, Trainer2.AkkrNr, Trainer3.AkkrNr). Trainer2 and Trainer3 would need to be nullable, and this design loses the constraint that each trainer belongs to only one team. You could fix that with a trigger. Yuck.
Since there's no ideal way to implement an upper bound on the relationship cardinality, most data modelers would follow the same approach as you did, and leave it to the database client (usually the application logic) to enforce that limit.

Related

How to populate fact table with Surrogate keys from dimensions

Could you please help understand how to populate fact table with Surrogate keys from dimensions.
I have the following fact table and dimensions:
ClaimFacts
ContractDim_SK
ClaimDim_SK
AccountingDim_SK
ClaimNbr
ClaimAmount
ContractDim
ContractDim_SK (PK)
ContractNbr(BK)
ReportingPeriod(BK)
Code
Name
AccountingDim
TransactionNbr(BK)
ReportingPeriod(PK)
TransactionCode
CurrencyCode
(Should I add ContractNbr here ?? original table in OLTP has it)
ClaimDim
CalimsDim_Sk(PK)
CalimNbr (BK)
ReportingPeriod(BK)
ClaimDesc
ClaimName
(Should I add ContractNbr here ?? original table in OLTP has it)
My logic to load data into fact table is the following :
First I load data into dimensions (with Surrogate keys are created as identity columns)
From transactional model (OLTP) the fact table will be filled with the measures (ClaimNbr And ClaimAmount)
I don’t know how to populate fact table with SKs of Dimensions, how to know where to put the key I am pulling from dimensions to which row in fact table (which key belongs to this claimNBR ?)
Should I add contract Nbr in all dimensions and join them together when loading keys to fact?
What’s the right approach to do this?
Please help,
Thank you
The way it usually works:
In your dimensions, you will have "Natural Keys" (aka "Business Keys") - keys that come from external systems. For example, Contract Number. Then you create synthetic (surrogat) keys for the table.
In your fact table, all keys initially must also be "Natural Keys". For example, Contract Number. Such keys must exist for each dimension that you want to connect to the fact table. Sometimes, a dimension might need several natural keys (collectively, they represent dimension table "Granularity" level). For example, Location might need State and City keys if modeled on State-City level.
Join your dim table to the fact table on natural keys, and from the result omit natural key from fact and select surrogat key from dim. I usually do a left join (fact left join dim), to control records that don't match. I also join dims one by one (to better control what's happening).
Basic example (using T-SQL). Let's say you have the following 2 tables:
Table Source.Sales
( Contract_BK,
Amount,
Quantity)
Table Dim.Contract
( Contract_SK,
Contract_BK,
Contract Type)
To Swap keys:
SELECT
c.Contract_SK
,s.Amount
,s.Quantity
INTO
Fact.Sales
FROM
Source.Sales s LEFT JOIN Dim.Contract c ON s.Contract_BK = c.Contract_BK
-- Test for missing keys
SELECT
*
FROM
Fact.Sale
WHERE
Contract_SK IS NULL

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

Im new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price interval is represented by textual names, such as "$", "$$",
and so on,
and the same price range may have (numeric) values for different companies,
And the same price range (type) can be (or not) assigned to different
companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Company and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If using a surrogate key, you will permit/allow a company to have the same kind of price range more than once, which I believe is not the case.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and an company may have or not one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, so that is not allowed a company to have a price range more than once, set the foreign keys as composite primary key in Price Ranges from Companies.
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships will always become tables, no exception. Differently, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with these special names (Join/Link/Intermediate Tables) serves to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
At where? If you say at the Price Ranges entity, it is rather necessary. Must We not identify records in a table in some way? Here I set what is called a Surrogate Key. If on the other hand, you have a column with unique values for each record in the table, you can also use this column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you refer to Price Ranges from Companies entity, so the question is another ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. The two are foreign keys. You need it so you can relate rows from the two tables... I think you were not referring to that, is not it? If so, no problem, you can comment and ask more questions. I do not care to answer. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows / records. Each line has its own data. Even with this, each line / record needs to be differentiated and identified somehow. That is why we attach to each line an identifier, known as the primary key (this, and this). In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. By obligation, every record MUST have a primary key. Although you can choose which field is a primary key, you are allowed to choose one or more fields to serve as the primary key. When this happens, that is, when more than one field participates/serves as the primary key, we have a table with something called Composite Primary Key. Similarly, it has the ability to identify records. Note that, because of that, primary key values must be unique, otherwise you may have 2 identical records.
This is the basic concept so that we can relate tables to each other, in case, records/rows of tables together. If we have a Company identified by the ID 28 (a line/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store somewhere that relationship (28 <--> 40). This is where the role of intermediate/link/join tables comes in (but only to relationships N-N! For 1-N or N-1 relationships it works similarly, but not identical).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records which relates to other records (for their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have or not a X Price Range.
All price ranges that are given or not to companies.
...
Anyway, you get all this because of the established relationship.
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?

Star schema: how to handle dimension table with constantly changing set of columns?

First project using star schema, still in planning stage. We would appreciate any thoughts and advice on the following problem.
We have a dimension table for "product features used", and the set of features grows and changes over time. Because of the dynamic set of features, we think the features cannot be columns but instead must be rows.
We have a fact table for "user events", and we need to know which product features were used within each event.
So it seems we need to have a primary key on the fact table, which is used as a foreign key within the dimension table (exactly the opposite direction from a conventional star schema). We have several different dimension tables with similar dynamics and therefore a similar need for a foreign key into the fact table.
On the other hand, most of our dimension tables are more conventional and the fact table can just store a foreign key into these conventional dimension tables. We don't like that this means that some joins (many-to-one) will use the dimension table's primary key, but other joins (one-to-many) will use the fact table's primary key. We have considered using the fact table key as a foreign key in all the dimension tables, just for consistency, although the storage requirements increase.
Is there a better way to implement the keys for the "dynamic" dimension tables?
Here's an example that's not exactly what we're doing but similar:
Suppose our app searches for restaurants.
Optional features that a user may specify include price range, minimum star rating, or cuisine. The set of optional features changes over time (for example we may get rid of the option to specify cuisine, and add an option for most popular). For each search that is recorded in the database, the set of features used is fixed.
Each search will be a row in the fact table.
We are currently thinking that we should have a primary key in the fact table, and it should be used as a foreign key in the "features" dimension table. So we'd have:
fact_table(search_id, user_id, metric1, metric2)
feature_dimension_table(feature_id, search_id, feature_attribute1, feature_attribute2)
user_dimension_table(user_id, user_attribute1, user_attribute2)
Alternatively, for consistent joins and ignoring storage requirements for the sake of argument, we could use the fact table's primary key as a foreign key in all the dimension tables:
fact_table(search_id, metric1, metric2) /* no more user_id */
feature_dimension_table(feature_id, search_id, feature_attribute1, feature_attribute2)
user_dimension_table(user_id, search_id, user_attribute1, user_attribute2)
What are the pitfalls with these key schemas? What would be better ways to do it?
You need a Bridge table, it is the recommended solution for many-to-many relationships between fact and dimension.
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/multivalued-dimension-bridge-table/
Edit after example added to question:
OK, maybe it is not a bridge, the example changes my view.
A fundamental requirement of dimensional modelling is to correctly identify the grain of your fact table. A common example is invoice and line-item, where the grain is usually line-item.
Hypothetical examples are often difficult because you can never be sure that the example mirrors the real use case, but I think that your scenario might be search-and-criteria, and that your grain should be at the criteria level.
For example, your fact table might look like this:
fact_search (date_id,time_id,search_id,criteria_id,criteria_value)
Thinking about the types of query I might want to do against search data, this design is my best choice. The only issue I see is with the data type of criteria_value, it would have to be a choice/text value, and would definitely be non-additive.

Database Normalization Validation

How do I know if I normalized correctly to 2NF or 3NF? I am still struggling how to validate, that I followed the algorithm correctly.
Is this a normlization that would correspond to 3NF? I an a little bit lost.
According to your data schema you have these rules:
At an Incident there can be MANY Responders.
A Responder can have ONE Device.
A Responder can have ONE res_latitude and ONE res_longitude.
A Device can have ONE Dev_installation.
If the above are what you want then i think it's ok (but see again the primary keys).
Also, i forgot to mention that the reason of keeping the responder_id and device_id in a separate table is to keep historical data in case device_id change responder_id. You could also merge ResponceIncidentDevice in one table with keys incident_id, responder_id, device_id so you will be able to know in what incidents a reponder went carrying what devices.
EDIT:
According to your comment you need to make the following changes. Also note that it is better to use lower case for all your tables and columns to avoid case sensitivity problems due to various engine implemantations.
Responders
responder_id res_latitude res_longitude
Responders_Devices (pk: responder_id, device_id)
responder_id device_id
1 1
1 2
2 3
2 4
3 5
Hey there are a lot many tutorials available on the subject but they are a little complex, I can understand your problem.
First of all your project isn't even legal for First Normalization form because your second table, RespondersIncidents, is a table which has two foreign keys but you have no primary keys.
Now let me simplify the rules for you.
1NF - You must have a primary key (One sentence layman definition)
2NF - No Partial Dependency, Try not to have two entries in one column and make sure that your primary key uniquely identifies the whole row.
3NF - No Functional Dependency, Make sure that in one row only your specific primary key has the power to identify the whole row. For e.g. if in one row there is primary key (auto generated) and student id as well which is unique then we have functional dependency here that means we don't need a separate primary key, we can use student id as primary key.
I hope this was informative for you. I kept it short and simple.

How do you normalize one-to-one-or-the-other relationships?

I'm storing data on baseball statistics and would like to do so with three tables: players, battingStats, and pitchingStats. For the purpose of the question, each player will have batting stats or pitching stats, but not both.
How would I normalize such a relationship in 3NF?
PlayerId would be a foreign key in both BattingStats and PitchingStats tables
[and remember to put some time dimension (season, year, et al) in the stats tables]
and by the way, this is a bad assumption: as far as I know, pitchers are allowed to bat, too!
Are you really required not to use more than 3 tables. Normalization normally implies breaking down one non-normalized model into many normalized relations.
If you can have more than 3 tables, you may want to consider the following (in 3NF):
Players: ([player_id], name, date_of_birth, ...)
Batters: ([batter_id], player_id)
Pitchers: ([pitcher_id], player_id)
Batting_Stats: ([batter_id, time_dimension], stat_1, stat_2, ...)
Pitching_Stats: ([pitcher_id, time_dimension], stat_1, stat_2, ...)
Attributes in [] define the primary key, but a surrogate key may be used if preferred. The player_id attribute in Batters and Pitches should have a unique constraint, and it should also be a foreign key to the Players relation. Batting_Stats and Pitching_Stats should also have a foreign key to Batters and Pitching respectively.
Note however that the above does not enforce that a player can be only a batter or only a pitcher.
UPDATE:
One method I am aware of to enforce that a player is only a batter or only a pitcher, is through this model:
Players: ([player_id], name, date_of_birth, ...)
Roles: ([role_id, role_type], player_id)
Batting_Stats: ([role_id, role_type, time_dimension], stat_1, stat_2, ...)
Pitching_Stats: ([role_id, role_type, time_dimension], stat_1, stat_2, ...)
The role_type should define a pitcher or a batter. Batting_Stats and Pitching_Stats should have a composite foreign key to Roles using (role_id, role_type). A unique constraint on player_id in Roles would ensure that a player can only have one, and only one, role. Finally add check constraints so that Batting_Stats.role_type = 'Batter' and Pitching_Stats.role_type = 'Pitcher'. These check constraint guarantee that Batting_Stats is always describing a batter, and note a pitcher. The same applies for Pitching_Stats.
I know how I would implement this from a practical perspective (I'd create a UNIONed view over the disjoint tables and put a unique index on the player ID - therefore, they can only appear in one table).
Or in the players table, record what type of statistics they have, and then include that in the FK relationship from the stats tables.
But either of these is probably closer to the metal than you want.

Resources