How to capture updates happening on a dimension table - data-warehouse

I have a fact table joined to 5 dimension tables. Both the fact and dimension tables have the metadata fields DWcreateddate, DWupdatedate, DWdeleteddate and DWdeletedflag. I am building a table that flattens out the fact table by joining all the dimensions on their surrogate keys.
I am doing an incremental load into the flattened table. A stored procedure tracks the upserts happening on the fact table through the metadata fields and loads only those rows. The problem: if a record is updated to a new name in a dimension table, the fact row's DWupdatedate does not change, so my flattened table ends up holding the old name. Can someone help me overcome this?

You should never update your dimension. Once created, a dimension record should be left alone, with a few exceptions such as slowly changing dimensions. You should create a new dimension record instead.
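If the dimension rows are nevertheless overwritten in place, as the question describes, one possible approach (an assumption on my part, not something the answer above proposes) is to let the incremental load pick up a flattened row when either the fact row or any joined dimension row has changed since the last load. The sketch below uses Python's sqlite3 with a single hypothetical dimension; the table names DimCustomer and FactSales, the sample data, and the last_load watermark are all made up for illustration.

```python
import sqlite3

# In-memory database with one fact table and one dimension (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, DWupdatedate TEXT);
CREATE TABLE FactSales (SalesKey INTEGER PRIMARY KEY, CustomerKey INTEGER,
                        Amount REAL, DWupdatedate TEXT);
-- Customer 1 was renamed on 2024-01-05; the fact rows were not touched.
INSERT INTO DimCustomer VALUES (1, 'Acme Ltd', '2024-01-05'), (2, 'Globex', '2024-01-01');
INSERT INTO FactSales VALUES (10, 1, 100.0, '2024-01-01'), (11, 2, 250.0, '2024-01-01');
""")

# Watermark of the previous incremental load (assumed to be stored somewhere).
last_load = "2024-01-02"

# A flattened row needs refreshing if the fact row OR the dimension row changed.
# ISO-8601 date strings compare correctly as text.
changed = conn.execute("""
    SELECT f.SalesKey, d.Name, f.Amount
    FROM FactSales f
    JOIN DimCustomer d ON d.CustomerKey = f.CustomerKey
    WHERE f.DWupdatedate > ? OR d.DWupdatedate > ?
    ORDER BY f.SalesKey
""", (last_load, last_load)).fetchall()

print(changed)
```

With five dimensions the WHERE clause would need one such condition per dimension. On a large fact table this can touch many rows whenever a widely used dimension record changes.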

Related

Can you extract a dimension table from a fact table?

Here's the situation, in the source database we have more than 600K active rows for a dimension but in reality the business only uses 100 of them.
Unfortunately the list of values that they might use is not known and we can't manually filter on those values to populate the dimension table.
I was thinking: what if I include the dimension columns for that table in the fact table and then, when we send that to the staging area, just separate them from the fact and send them to their own table?
This way, I will only capture the values that are actually used.
P.S. They have a search function in the application that helps users navigate through the 600K values; it is not a drop-down field.
Do you have a better recommendation?
Yes - you could build the Dimension from the fact staging table. A couple of things to consider:
If the only attribute for the Dimension is the field in the fact staging table then you can keep this as a degenerate dimension in the fact table; no need to build a dimension table for it - unless you have other requirements that require a standalone dimension table, such as your BI tool needs it.
If there are other attributes you need to include in the dimension then you are still going to need to bring in the source dimension table - but you can filter it using the values in the fact staging table and only load the used values into your dimension.
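As a rough illustration of that second option, here is a minimal Python sketch that keeps only the source dimension rows whose key actually appears in the fact staging data. The column names (code, description, category) and the sample rows are hypothetical.

```python
# Source dimension: in the question this has 600K+ rows; three stand in here.
source_dimension = [
    {"code": "A1", "description": "Alpha", "category": "X"},
    {"code": "B2", "description": "Bravo", "category": "Y"},
    {"code": "C3", "description": "Charlie", "category": "X"},
]

# Fact staging rows carry the dimension's natural key.
fact_staging = [
    {"order_id": 1, "code": "A1", "amount": 10.0},
    {"order_id": 2, "code": "C3", "amount": 25.0},
    {"order_id": 3, "code": "A1", "amount": 5.0},
]

# Collect the distinct keys that are really used by the facts.
used_codes = {row["code"] for row in fact_staging}

# Load only the used values, with all their other attributes, into the dimension.
dimension = [row for row in source_dimension if row["code"] in used_codes]

print([row["code"] for row in dimension])
```

In a database the same filter would typically be a join or an IN/EXISTS subquery against the fact staging table.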

Is A Table Linking a Dimension to a Fact table, a Dimension of a Fact?

In my incomplete view of BI tables, a Fact table represents action and a dimension an entity.
I have a FactOrder table that contains order information (including OrderId and CustomerId). There is a separate dimension for people who are actually connected to the order but are not customers. So they are saved in a separate table called DimServiceUser. A linking table connects an Order to a ServiceUser. Should this intermediate OrderServeruser table be defined as a Dimension, a Fact or another type?
It really is more of a bridge table. Here's what is going on.
Your FactOrder table is a fact table, but it also contains a degenerate dimension. A degenerate dimension acts as a dimension key in the fact table but does not join to a corresponding dimension table, because all its interesting attributes have already been placed in other analytic dimensions. So you have an implied DimOrder in there that didn't require a separate table.
A bridge table can connect a set of values to a single fact table row, or it can connect two dimensions (such as customers and bank accounts). It is a way of handling legitimate many-to-many relationships. A bridge table is like a factless fact table. But in dimensional modeling we do not join fact tables together, whereas it is acceptable to join bridge tables and fact tables together. If you must force your bridge table into being a fact or dimension, it is closer to a fact table. But doing so may make it easier to implement bad modeling habits in the future. If you can call it a bridge instead, I would just go with that. (Make sure you read that third link on "like a factless fact table". It was written by the author of Star Schema: The Complete Reference. That is a pretty well accepted source.)
As OrderService does not contain any facts/measurements, you cannot call it a fact table.
Dimension table:
A dimension table contains dimensions of a fact.
They are joined to fact table via a foreign key.
Dimension tables are de-normalized tables.
The Dimension Attributes are the various columns in a dimension table
Dimensions offer descriptive characteristics of the facts with the help of their attributes
There is no set limit on the number of dimensions
The dimension can also contain one or more hierarchical relationships
Based on the above definition of a dimension table, I believe your table should be referred to as a dimension table.
The OrderServeruser table should be prefixed with "bridge".
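To make the bridge idea from the answers above concrete, here is a minimal sketch using Python's sqlite3. FactOrder and DimServiceUser come from the question; the bridge table name BridgeOrderServiceUser, the column names, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactOrder (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER, OrderAmount REAL);
CREATE TABLE DimServiceUser (ServiceUserKey INTEGER PRIMARY KEY, Name TEXT);
-- The bridge holds one row per (order, service user) pair and no measures.
CREATE TABLE BridgeOrderServiceUser (
    OrderId INTEGER,
    ServiceUserKey INTEGER,
    PRIMARY KEY (OrderId, ServiceUserKey)
);
INSERT INTO FactOrder VALUES (1, 100, 50.0), (2, 101, 75.0);
INSERT INTO DimServiceUser VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO BridgeOrderServiceUser VALUES (1, 1), (1, 2), (2, 2);
""")

# Resolve the many-to-many relationship: which service users belong to which order?
rows = conn.execute("""
    SELECT f.OrderId, s.Name
    FROM FactOrder f
    JOIN BridgeOrderServiceUser b ON b.OrderId = f.OrderId
    JOIN DimServiceUser s ON s.ServiceUserKey = b.ServiceUserKey
    ORDER BY f.OrderId, s.Name
""").fetchall()

print(rows)
```

Note that order 1 appears twice in the result, so summing OrderAmount across this join would double count it. That is the usual caveat when joining a fact table through a bridge.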

Load fact table using Informatica

How can we load a fact table in a star schema using Informatica PowerCenter? Can you please provide an example of the mappings/transformations for this?
To load a fact table in a star schema, where the dimension tables are independent of each other, add a lookup on every dimension that the fact references. Override each lookup query so that it returns only active records, and match on the natural key (the business primary key held in the dimension). On that basis, return the surrogate key (the artificial key we generated when loading the dimension table), along with any other fields you want to load into the fact table.
Take the Staging tables as source tables and take the dimensions as lookups then load the data into fact table.
eg. http://www.folkstalk.com/2012/11/how-to-load-rows-into-fact-table-in.html
I was not able to find one when I was learning, hence adding this screenshot as a reference for new learners.
The mapping looks up each of the dimension tables and loads the dimension keys into the fact as foreign keys; the rest of the active records come from the SQ (Source Qualifier). I used a SQL override to perform all the joins and conditions required for loading the fact records.
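The lookup logic described in these answers can be sketched outside of PowerCenter. The following is plain Python, not Informatica code, and every table, column, and value in it is hypothetical. It only illustrates the idea of resolving natural keys from staging into surrogate keys taken from the active dimension records.

```python
# Dimension rows: surrogate key, natural key, and an active flag.
dim_customer = [
    {"customer_key": 1, "customer_id": "C001", "is_active": False},  # old version
    {"customer_key": 2, "customer_id": "C001", "is_active": True},   # current version
    {"customer_key": 3, "customer_id": "C002", "is_active": True},
]
dim_product = [
    {"product_key": 10, "product_id": "P1", "is_active": True},
    {"product_key": 11, "product_id": "P2", "is_active": True},
]

# Staging rows are the source; they only carry natural keys.
staging = [
    {"customer_id": "C001", "product_id": "P1", "quantity": 3},
    {"customer_id": "C002", "product_id": "P2", "quantity": 1},
]


def build_lookup(dim_rows, natural_key, surrogate_key):
    """Map natural key -> surrogate key, using active records only."""
    return {r[natural_key]: r[surrogate_key] for r in dim_rows if r["is_active"]}


customer_lookup = build_lookup(dim_customer, "customer_id", "customer_key")
product_lookup = build_lookup(dim_product, "product_id", "product_key")

# Each fact row stores the surrogate keys as foreign keys, plus the measures.
fact_rows = [
    {
        "customer_key": customer_lookup[s["customer_id"]],
        "product_key": product_lookup[s["product_id"]],
        "quantity": s["quantity"],
    }
    for s in staging
]

print(fact_rows)
```

A real load would also need a policy for staging rows whose natural key has no active dimension record; this sketch would raise a KeyError in that case.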

How are dimensions and fact tables related in a star diagram?

If you have a relational database and you want to start making reports, you might do the following (please let me know if this is incorrect).
Go through your relational database and make a list of all the columns that you want to include in your report.
Group related columns together and then split those (normalise) into additional tables. These are the dimensions.
The dimensions then have a primary key (possibly a combination of two columns), and the fact table has a foreign key to reference each dimension, plus fields that you don't separate out in the first place such as sales value.
The question:
I was originally seeing dimensions as data marts that referenced data from external sources, and a fact table that in turn referenced data in the dimensions.. that's incorrect, isn't it? It's the other way around...
Or in general, if you were to normalise a database you would always replace the columns you take out a table with a foreign key, and add a primary key to the new table?
A fact table represents a process or event that you want to analyze.
Step 1: What is the process or event that you want to analyze?
The columns in the fact table represent all of the variables that are pertinent to your analysis.
Step 2: What variables are pertinent to the analysis?
Whether you "split-out" columns into dimension tables is irrelevant to your understanding. It's an optimization to minimize the space taken up by fact tables.
If you want to discriminate between measures and dimensions, ask
Step 3: What are the (true) numeric values in my fact table? These are your measures.
An example of a true numeric value is a dollar amount, like Sales Order Line Item Extended Price. You can sum it up or take an average of it.
An example of a not true numeric value is Customer ID 12345. It's a number, but represents something that isn't a number (a customer). The sum of customer ids makes no sense, nor does the average. Dig?
Regarding your questions:
Fact tables do not need foreign keys to dimension tables. (hint: see Hot-Swappable Dimensions)
"dimensions as data marts that referenced data from external sources". Hm...maybe, but don't worry about data marts for now. A dimension is just a column in your fact table (that isn't a measure). A dimension table is just a collection of dimensions that are related.
Just start with Excel. Figure out the columns you need in your analysis. Put them in Excel. That's your fact table. If you expect your fact table to get large (100s of MB), then do ONE level of normalization:
Figure out your measures. Leave them in the fact table.
Figure out your dimensions. Group them together (Customer info into one group, Store info into another).
Put them in their own tables. Give them meaningless surrogate keys. Put those keys in the fact table.
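The "one level of normalization" steps above can be sketched in a few lines of Python. The flat rows, column names, and the choice of a simple incrementing integer as the surrogate key are all assumptions made for illustration.

```python
# The "Excel sheet": one flat table with dimensions and a measure side by side.
flat_rows = [
    {"customer_name": "Ann", "customer_city": "Leeds", "sales_amount": 20.0},
    {"customer_name": "Bob", "customer_city": "York", "sales_amount": 35.0},
    {"customer_name": "Ann", "customer_city": "Leeds", "sales_amount": 15.0},
]

customer_keys = {}   # (name, city) -> surrogate key
dim_customer = []    # the dimension table
fact_sales = []      # the fact table: surrogate keys plus measures

for row in flat_rows:
    attrs = (row["customer_name"], row["customer_city"])
    if attrs not in customer_keys:
        # Assign a meaningless surrogate key to each new attribute combination.
        customer_keys[attrs] = len(customer_keys) + 1
        dim_customer.append({
            "customer_key": customer_keys[attrs],
            "customer_name": row["customer_name"],
            "customer_city": row["customer_city"],
        })
    # The measure stays in the fact table; the dimension columns become a key.
    fact_sales.append({
        "customer_key": customer_keys[attrs],
        "sales_amount": row["sales_amount"],
    })

# Measures are the values it makes sense to sum; the keys are not.
total_sales = sum(r["sales_amount"] for r in fact_sales)
print(dim_customer, fact_sales, total_sales)
```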

Adding a new dimension to an existing Data warehouse

What is the standard practice for adding a new dimension (a totally new table, not a new row in an existing dimension table)? Wouldn't you have to redo the entire fact table, to add a new field, and then populate it based on the surrogate key in your new dimension table? Any simpler ways to do this?
As long as the new dimension does not alter the grain of any of your fact tables, it should not be a problem to add a new dimension. If any of the existing fact table records are non-applicable to the new dimension, simply populate the new foreign key column with the dummy key in those cases.
Remember to check the overhead related to your fact tables that require the new dimensional key, and scale it according to the number of existing records in each fact table.
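A minimal sketch of the dummy-key approach, using Python's sqlite3. The table names FactSales and DimPromotion, the dummy key value -1, and the sample rows are assumptions for illustration; other databases have their own ALTER TABLE syntax and costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Existing fact table with some history.
CREATE TABLE FactSales (SalesKey INTEGER PRIMARY KEY, Amount REAL);
INSERT INTO FactSales VALUES (1, 10.0), (2, 20.0), (3, 30.0);

-- New dimension, including a dummy row for "not applicable".
CREATE TABLE DimPromotion (PromotionKey INTEGER PRIMARY KEY, PromotionName TEXT);
INSERT INTO DimPromotion VALUES (-1, 'Not applicable'), (1, 'Spring sale');

-- Add the new key column; existing rows default to the dummy key.
ALTER TABLE FactSales ADD COLUMN PromotionKey INTEGER NOT NULL DEFAULT -1;

-- Backfill only the rows where the new dimension is known to apply.
UPDATE FactSales SET PromotionKey = 1 WHERE SalesKey = 2;
""")

rows = conn.execute("""
    SELECT f.SalesKey, p.PromotionName
    FROM FactSales f
    JOIN DimPromotion p ON p.PromotionKey = f.PromotionKey
    ORDER BY f.SalesKey
""").fetchall()

print(rows)
```

Because every existing row gets a valid key, the inner join to the new dimension does not drop any history, and the grain of the fact table is unchanged.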

Resources