When to refer another dimension table in a dimensional table

When to refer another dimension table in a dimensional table - data-warehouse

I want to model contracts that clients have at an organization. Typically one client has one contract. One contract can have multiple machines with different starting and end dates.
The fact table will look something like this:
client_id (FK)
machine_id (FK)
contract_id (DD)
start_date
end_date
rate_per_month
I am wondering how the machine dimension table should look like. Should it also include the client_id? Seems somehow natural, but that seems to violate the 'no snowflake' principle.
machine_id
client_id (FK)(?)
type
model
brand
FK = foreign key, DD = degenerate dimension.
Kimballs book refers to outrigger dimensions which are okay, but I am not sure this counts as one.

I dimension should define the attributes of the modeled entity and not attributes of some relationships of it (this is the role of fact tables).
So from this point of view I do not see any natural arguments to include client_id in the dimension table.
Note that you'll have to update this information in case of a re-assignment of the machine or after a churn of the client.
The relationship between the machine and the client is stored in your fact table.

Related

Normalizing issue with data history

I have two entities: Location and Employee. Each employee works in a single location at a time. For any given moment in time, the model is as follows:
There is, however, a requirement to also store historical information for all locations and employees for every end-of-month. I can achieve this by adding a Month PK attribute in both entities, but: how do I handle the relationship in that case?
A foreign key has to reference a composite PK in its entirety. Several alternatives come to mind:
Option 1: repeat the Month attribute in the Employee entity to get the full PK as FK attributes. This feels a bit redundant? If an employee has existed in a given month, surely she has to work in a location in the same month - i.e. the two Month attributes have to always have the exact same value:
Option 2: re-use the Month attribute in the PK of the Employee entity as a foreign key referencing Location. I don't even know if this is allowed (note: I'm going to be using SQL Server eventually, if it matters here)?
Option 3: create a separate bridge entity that holds the history of Location-Employee relationships. This feels kind of neat, but then again I have some doubts as to whether or not I can use one Month attribute here or if I need two of them. Also, it would allow many-to-many relationships (an employee in several locations on a given month), which is not supposed to happen in this case and I'd like to be able to enforce this in the data model.
Am I missing something obvious here? What is the "correct" and properly normalized solution? Or should I just leave the FK constraints out?

ER Model representing entities not stored in DB and user choice

I'm trying to create a ER diagram of a simple retail chain type database model. You have your customer, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional... stuff represented in ER diagrams, e.g. in a car dealership, a customer may choose one or more optional extra when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (I.e. order table shows the car ordered, extras chosen and the added cost of extras) in the order relationship?

First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depends on. Some extras might be make or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but supports better integrity via the use of composite FK constraints (i.e. (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_optional_extras to prevent invalid extras from being linked to an order).
Here's an ER diagram for the tables above:

First, as per your thought you can definitely have two kinds of customers. Discount card holders whose details are present with the company and new customers whose details aren't available with the company.
There are three possible ways to achieve what you are trying,
1) Have two different order table in the system(which I personally wouldn't suggest)
2) Have a single Order table in the system and getting the details of those who are a discount card holder.
3) Insert a row in the discount card holder table for new/unregistered customers having only one order table in the system.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
Secondly, to solve your concern, you need to follow normalization. It will reduce the current problem faced and will also make the system redundant free and will make the entities light weighted which will directly impact on the performance when you grow large.
The extra chosen items can be listed in the order against the customer by adding it at the time of generating a bill using foreign key. Dealing with keys will result in fast and robust results instead of storing redundant/repeating details at various places.
By following normalization, the problem can be handled by applying foreign keys wherever you want to refer data to avoid problems or errors.
Preferably NF 4 would be better. Have a look at the following link for getting started with normalization.
http://www.w3schools.in/dbms/database-normalization/

Representation of Lookup table

How can I represent my lookup tables in technical reports?
In other words, the ER model is used to represent a database,
but what about lookup tables?

To recover a conceptual model (entity sets, attributes and relationships) from a physical model (tables and columns), we first have to understand the logical model. This means understanding the domains and functional dependencies which are represented by the lookup table.
Lookup table is a common term which can mean different things. I generally understand it as a table which represents a domain with a surrogate key, and associates it with a name and/or a few other attributes. In the ER model, these would be simple entity relations, and leaves / terminal nodes in the graph of entity sets.
If a lookup table records facts about only one type of thing (represented by the key of the lookup table), then you can represent that type as an entity set (rectangle) with an attribute (oval) for each dependent column, and draw relationships (diamonds) to connect it to other entity sets as required. Look for foreign key columns / constraints in other tables to find these relationships.
For example, consider the following physical model:
CarMake and CarModel are examples of lookup tables. This isn't a very good model, since in the real world CarModelId determines CarMakeId, while the model treats them as independent elements in CarSales. However, since the point of the example is to focus on lookup tables, I'll use it as is.
In this case, CarMake and CarModel describe a single entity set each. Their functional dependencies are CarMakeId -> CarMakeName and CarModelId -> CarModelName. In CarSales, we've got CarSaleId -> RegNumber, Price, SoldOn (attributes) and CarSaleId -> CarMakeId, CarModelId (relationships).
In this case our ER model is similar to the physical model:
However, in some cases, you may find multiple types of things combined into one lookup table due to the similar physical structure. This doesn't affect the logical or conceptual models, but makes it more complicated to recover since we have to understand how the table is used to unpack it.

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

Im new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)

Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price interval is represented by textual names, such as "$", "$$",
and so on,
and the same price range may have (numeric) values for different companies,
And the same price range (type) can be (or not) assigned to different
companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Company and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If using a surrogate key, you will permit/allow a company to have the same kind of price range more than once, which I believe is not the case.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and an company may have or not one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, so that is not allowed a company to have a price range more than once, set the foreign keys as composite primary key in Price Ranges from Companies.
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships will always become tables, no exception. Differently, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with these special names (Join/Link/Intermediate Tables) serves to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
At where? If you say at the Price Ranges entity, it is rather necessary. Must We not identify records in a table in some way? Here I set what is called a Surrogate Key. If on the other hand, you have a column with unique values for each record in the table, you can also use this column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you refer to Price Ranges from Companies entity, so the question is another ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. The two are foreign keys. You need it so you can relate rows from the two tables... I think you were not referring to that, is not it? If so, no problem, you can comment and ask more questions. I do not care to answer. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows / records. Each line has its own data. Even with this, each line / record needs to be differentiated and identified somehow. That is why we attach to each line an identifier, known as the primary key (this, and this). In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. By obligation, every record MUST have a primary key. Although you can choose which field is a primary key, you are allowed to choose one or more fields to serve as the primary key. When this happens, that is, when more than one field participates/serves as the primary key, we have a table with something called Composite Primary Key. Similarly, it has the ability to identify records. Note that, because of that, primary key values must be unique, otherwise you may have 2 identical records.
This is the basic concept so that we can relate tables to each other, in case, records/rows of tables together. If we have a Company identified by the ID 28 (a line/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store somewhere that relationship (28 <--> 40). This is where the role of intermediate/link/join tables comes in (but only to relationships N-N! For 1-N or N-1 relationships it works similarly, but not identical).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records which relates to other records (for their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have or not a X Price Range.
All price ranges that are given or not to companies.
...
Anyway, you get all this because of the established relationship.
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?

Star schema: how to handle dimension table with constantly changing set of columns?

First project using star schema, still in planning stage. We would appreciate any thoughts and advice on the following problem.
We have a dimension table for "product features used", and the set of features grows and changes over time. Because of the dynamic set of features, we think the features cannot be columns but instead must be rows.
We have a fact table for "user events", and we need to know which product features were used within each event.
So it seems we need to have a primary key on the fact table, which is used as a foreign key within the dimension table (exactly the opposite direction from a conventional star schema). We have several different dimension tables with similar dynamics and therefore a similar need for a foreign key into the fact table.
On the other hand, most of our dimension tables are more conventional and the fact table can just store a foreign key into these conventional dimension tables. We don't like that this means that some joins (many-to-one) will use the dimension table's primary key, but other joins (one-to-many) will use the fact table's primary key. We have considered using the fact table key as a foreign key in all the dimension tables, just for consistency, although the storage requirements increase.
Is there a better way to implement the keys for the "dynamic" dimension tables?
Here's an example that's not exactly what we're doing but similar:
Suppose our app searches for restaurants.
Optional features that a user may specify include price range, minimum star rating, or cuisine. The set of optional features changes over time (for example we may get rid of the option to specify cuisine, and add an option for most popular). For each search that is recorded in the database, the set of features used is fixed.
Each search will be a row in the fact table.
We are currently thinking that we should have a primary key in the fact table, and it should be used as a foreign key in the "features" dimension table. So we'd have:
fact_table(search_id, user_id, metric1, metric2)
feature_dimension_table(feature_id, search_id, feature_attribute1, feature_attribute2)
user_dimension_table(user_id, user_attribute1, user_attribute2)
Alternatively, for consistent joins and ignoring storage requirements for the sake of argument, we could use the fact table's primary key as a foreign key in all the dimension tables:
fact_table(search_id, metric1, metric2) /* no more user_id */
feature_dimension_table(feature_id, search_id, feature_attribute1, feature_attribute2)
user_dimension_table(user_id, search_id, user_attribute1, user_attribute2)
What are the pitfalls with these key schemas? What would be better ways to do it?

You need a Bridge table, it is the recommended solution for many-to-many relationships between fact and dimension.
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/multivalued-dimension-bridge-table/
Edit after example added to question:
OK, maybe it is not a bridge, the example changes my view.
A fundamental requirement of dimensional modelling is to correctly identify the grain of your fact table. A common example is invoice and line-item, where the grain is usually line-item.
Hypothetical examples are often difficult because you can never be sure that the example mirrors the real use case, but I think that your scenario might be search-and-criteria, and that your grain should be at the criteria level.
For example, your fact table might look like this:
fact_search (date_id,time_id,search_id,criteria_id,criteria_value)
Thinking about the types of query I might want to do against search data, this design is my best choice. The only issue I see is with the data type of criteria_value, it would have to be a choice/text value, and would definitely be non-additive.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart