Kimball's Data Warehouse Toolkit - When to rename Date foreign keys - data-warehouse

In "The Data Warehouse Toolkit" the authors often change the name of their date foreign key:
Sometimes they use "Date Key (FK)".
Other times they name it after the fact, e.g. "Invoice Date Key (FK)".
Why and when should this occur? When there are multiple dates within the same fact? When there are multiple facts within the same process?
When this happens, should a role-playing date dimension be used? If so, wouldn't that prevent the BI tool from aggregating across multiple facts (since they use different date dimension roles)?

Related

Storing Date Components Instead of a Date

My app lets people log the movies they see (for example). Each logged movie usually (but not always) has a date and sometimes has a time. It's not unusual to have one but not the other. Occasionally the date is only a year ("I watched Dumbo sometime in 1984"), but it could realistically be any combination of day/month/year/time.
I am used to modeling dates as date objects in my app and my backend. But is it a viable approach to store each component separately? When I need to reference an actual date from the components (e.g. for sorting the log) this will be built client-side, or perhaps be stored as a derived property sortDate and updated whenever any of the components change.
My reservation is that the information the user is saving is truly a 'moment in time' and I will have to take care of some things myself - for example what time zone are my components stored relative to? This would be captured automatically as part of a real Date object.
The alternative seems to be assuming some sort of 'default' for missing components (e.g. year 0000 if no year, time 00:00 if no time). But those defaults have meaning and I won't be able to distinguish them from 'not provided'.
What are the limitations and/or pitfalls of this approach? Does anyone have experience modeling their dates this way?
If it's of any consequence, my app is for iOS written in Swift and uses a Parse Server backend.
I've successfully used question marks to represent ambiguous and unknown timestamp parts in legal systems. Try to keep in mind that you're really not modeling dates here ('1984' isn't a date); you're modeling facts about dates.
So, if one of your users saw a movie some time in 1984, you might record the value '1984-??-?? ??:??:??' in a text column in a database. Values like this sort sensibly.
See also this answer on dba. Comments on that answer are also good to read.
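As a minimal sketch of the question-mark approach described above: the helper name partial_timestamp and the sample values are assumptions made for illustration, not part of the original answer. It pads every unknown component with '?' so that all values share one fixed-width layout and can be compared as plain text.

```python
def partial_timestamp(year=None, month=None, day=None,
                      hour=None, minute=None, second=None):
    """Render possibly-missing date parts as fixed-width text, '?' for unknowns."""
    def part(value, width):
        # Unknown components become a run of '?' of the same width.
        return "?" * width if value is None else f"{value:0{width}d}"

    return (f"{part(year, 4)}-{part(month, 2)}-{part(day, 2)} "
            f"{part(hour, 2)}:{part(minute, 2)}:{part(second, 2)}")


# Hypothetical log entries with different levels of precision.
log = [
    partial_timestamp(1984),
    partial_timestamp(1984, 3, 12),
    partial_timestamp(1983, 12, 25, 20, 30),
]

# Plain string sorting; '?' compares greater than any digit in ASCII,
# so an unknown part sorts after every known value with the same prefix.
sorted_log = sorted(log)
```

With this layout, an entry known only to the year lands after the more precise entries from that same year, which is one reasonable reading of "sorts sensibly"; a different placeholder character would change that ordering.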

How to get the daily data of "Microsoft.VSTS.Scheduling.CompletedWork"?

We need to get the daily data from the "Microsoft.VSTS.Scheduling.CompletedWork" field (which is described in the Workload, scheduling and time tracking field references). However, when I query the Analysis database I find that it only keeps the latest value, so I can't get the historical data.
For example, task ID 3356 had a "CompletedWork" of 3 hours on 2016/8/4, and I got exactly that 3-hour value from the Analysis database the next day, 2016/8/5, as the pictures in this post show.
Then on 2016/8/5 I updated "CompletedWork" from 3 hours to 4 hours, and I got exactly that 4-hour value from the Analysis database the next day, 2016/8/6. However, the 3-hour value for 2016/8/4 was lost. How can I get the historical data of "Microsoft.VSTS.Scheduling.CompletedWork"?
First of all, it's important to understand that CompletedWork is a cumulative data field. So when one user enters 3 and another enters 4, the total number of hours recorded in the field is 4, not 7.
The warehouse has a granularity of a day and keeps that data in the cube, though the relational warehouse tables store all the changes to the reportable fields on a per-revision basis. You can't easily query this data using the cube or Excel Power Pivot, and the per-revision details are lost in the Dim* and fact* tables, but you can write a SQL query against tfs_warehouse and iterate through the tables containing the work item data (tbl_workitems[are|were|latest]). Unfortunately, this is much slower and much harder to build.
Your other alternative is to use the TFS Client Object Model and query the WorkItemStore object directly. You'll be able to query all work items of interest and iterate through them and their revisions. The API for workitems is relatively easy to use and is well documented.
If you're on TFS 2015, you can also use the new REST API to query work item data and revisions.
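Whichever source the revisions come from, turning a cumulative field into daily figures is the same small calculation. The sketch below assumes the revisions have already been fetched as (date, cumulative hours) pairs in revision order; the function name daily_completed_work and that input shape are hypothetical and do not represent any TFS API.

```python
from datetime import date


def daily_completed_work(revisions):
    """Return (day, cumulative_hours, hours_added_that_day) per day with a revision.

    `revisions` is an iterable of (date, cumulative_hours) in revision order.
    """
    last_per_day = {}
    for day, hours in revisions:
        # A later revision on the same day overwrites an earlier one.
        last_per_day[day] = hours

    result = []
    previous = 0.0
    for day in sorted(last_per_day):
        total = last_per_day[day]
        result.append((day, total, total - previous))
        previous = total
    return result


# The two revisions of task 3356 described in the question.
revisions = [(date(2016, 8, 4), 3.0), (date(2016, 8, 5), 4.0)]
daily = daily_completed_work(revisions)
```

Because the field is cumulative, the third element of each tuple (the difference from the previous day) is the work actually logged on that day.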

Fact tables with different level Date Dimension Data as Date Dimension Key

I am a beginner in data warehousing. I have two fact tables, named sales and budget.
I can put days (the Date dimension key) in my sales fact, but the table I have for budget only goes down to month detail, so I don't know what I should do. Would you please tell me what the best practices are in this case?
regards
Mana
In this scenario, I generally find it easiest to always store the month-level data on either the first or the last day of the month. This way, you can still aggregate up to month from date and compare sales and budget, and you will only store the budget value once a month as intended. This would also help if, down the road, you're asked to store the budget data at the day level.
If you don't want to use this approach, then you would want to snowflake out your date dimension into a separate month dimension, and your budget fact table can then have a foreign key to this new dimension.
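A small sketch of the first suggestion, assuming budget rows are keyed on the first day of each month. All names and figures here are invented for illustration. Daily sales are rolled up to the same month-start date so the two facts can be compared on one key.

```python
from collections import defaultdict
from datetime import date


def month_start(d):
    """Map any date to the first day of its month (the assumed budget key)."""
    return d.replace(day=1)


# Hypothetical daily sales fact rows: (date, amount).
sales = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 150.0),
    (date(2024, 2, 3), 80.0),
]

# Hypothetical monthly budget fact rows, stored on the first day of the month.
budget = {date(2024, 1, 1): 200.0, date(2024, 2, 1): 120.0}

# Aggregate sales up to month level using the same key as the budget.
sales_by_month = defaultdict(float)
for day, amount in sales:
    sales_by_month[month_start(day)] += amount

# (actual sales, budget) per month.
comparison = {
    month: (sales_by_month.get(month, 0.0), planned)
    for month, planned in budget.items()
}
```

In a warehouse the same effect comes from both fact tables pointing at the one date dimension and grouping by its month attribute; the dictionary key here simply stands in for that join.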

Joining Two Data-sets in PowerPivot by Month

I've got 2 different data sets, revenue and contracts sold, that I need to join based off of year and month in PowerPivot so when I use my slicers, they'll filter accordingly. I know part of this will involve coming up with some temp tables for year and month but I can't get those to work. In the contracts sold table, there is an actual date column which I'm then using to format the year/month in "MM-MMM" format:
However, the revenue comes in only as a YYYYMM format:
So the solution would have to take this aspect into account as well. It's been a while since I've dealt with PowerPivot, and I recall PowerPivotPro or Kasper de Jonge's site having something about linking tables on a common month, but I can't find those pages anymore. If anyone could point me in the right direction or give me some insight, it'd be greatly appreciated.
I'm using Excel 2010 with PowerPivot version 11.0.3000.0.
Thanks,
Joshua
Joshua, I think the solution can be quite simple:
In the contracts sold table, create a new calculated column (a new column within the PowerPivot window) that gives you the same date format as the revenue table (YYYYMM).
Use the Create Time Dimension app in Excel 2013 -- this app creates a date table with unique dates, which makes everything much easier. As with the other table, create a new calculated column with the same format (YYYYMM).
Create relationships between those tables -- the date table will be linked to revenue as well as to contracts.
Create the required measures (such as sum of revenue, number of contracts, etc.).
Insert a new pivot table -- the rows will probably be date-based (YYYYMM), and with measures coming from both tables it should be easy to create the report you need.
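The core of the steps above is deriving one shared YYYYMM key on both sides. Outside PowerPivot, that idea can be sketched as follows; the function name yyyymm and the sample data are assumptions for illustration only.

```python
from collections import Counter
from datetime import date


def yyyymm(d):
    """Format a date as a YYYYMM key, matching the revenue table's format."""
    return f"{d.year:04d}{d.month:02d}"


# Hypothetical contract sale dates and revenue already keyed by YYYYMM.
contracts = [date(2013, 1, 15), date(2013, 1, 28), date(2013, 2, 2)]
revenue = {"201301": 5000.0, "201302": 3200.0}

# Count contracts per month using the derived key.
contracts_per_month = Counter(yyyymm(d) for d in contracts)

# Combine both data sets on the shared key: (revenue, contracts sold).
report = {
    key: (revenue.get(key, 0.0), contracts_per_month.get(key, 0))
    for key in sorted(set(revenue) | set(contracts_per_month))
}
```

The shared month key plays the role of the date table in the answer: both data sets relate to it, so a filter on one month applies to revenue and contracts alike.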

Data warehouse reporting questions

I've just begun diving into data warehousing and I have one question that I just can't seem to figure out.
I have a business which has ten stores, each with a certain number of employees. In my data warehouse I have a dimension representing the store. The employee dimension is an SCD, with columns for start/end dates and for the store at which the employee is working.
My fact table is based on suggestions the employees give (anonymously) to the store managers. This table contains the suggestion type (cleanliness, salary issue, etc), the date it was submitted (foreign keyed to a Time dimension table), and the store at which it was submitted.
What I want to do is create a report showing the ratio of the number of suggestions to the number of employees in a given year. Because the number of employees changes periodically, I can't just do a simple query for the total number of employees.
Unfortunately I've searched the web quite a bit trying to find a solution but the majority of the examples are retail based sales, which is different from what I'm trying to do.
Any help would be appreciated. I do have the AdventureWorksDW installed on my machine so I can use that as a point of reference if anyone offers a suggestion using that.
Thanks in advance!
The slowly changing dimension should have a natural key that identifies the source of the row (otherwise, how would it know what to compare to detect changes?). This key should be constant across all versions of the dimension row. You can get a count of employees by computing a distinct count of the natural key.
Edit: If your transaction table (suggestion) has a date on it, a distinct count of employees grouped by a computed function of the suggestion date (e.g. datepart(yy, s.SuggestionDate)) and the business unit should do it. You don't need to worry about the date on the employee dimension, as the applicable row should join directly to the transaction table.
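To illustrate the distinct count on the natural key, here is a minimal sketch over a hypothetical employee dimension. The row layout, the sample keys, and the rule of counting an employee when any of their rows overlaps the year are assumptions and do not come from the answer itself.

```python
from datetime import date

# Hypothetical SCD rows:
# (surrogate key, natural key, store, valid_from, valid_to); valid_to None = current row.
employee_dim = [
    (1, "E001", 1, date(2022, 1, 1), date(2023, 6, 30)),
    (2, "E001", 2, date(2023, 7, 1), None),
    (3, "E002", 1, date(2023, 3, 1), None),
    (4, "E003", 1, date(2021, 5, 1), date(2022, 12, 31)),
]


def distinct_employees_in_year(rows, year):
    """Count distinct natural keys with at least one row valid during `year`."""
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    keys = set()
    for _surrogate, natural_key, _store, valid_from, valid_to in rows:
        # A row overlaps the year if it starts before year end and has not ended before year start.
        if valid_from <= year_end and (valid_to is None or valid_to >= year_start):
            keys.add(natural_key)
    return len(keys)
```

Employee E001 has two rows in 2023 because of a store change, yet is counted once, which is the point of counting the natural key instead of the surrogate key.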
Add another fact table for the number of employees in each store for each month -- you could use the maximum number for the month. Then average the months over the year and use this as the "number of employees in a year".
Load your new fact table at the end of each month. The new table would look like:
fact table: EmployeeCount
    KeyEmployeeCount  int  -- surrogate key
    KeyDate           int  -- FK to date dimension, point to last day of a month
    KeyStore          int  -- FK to store dimension
    NumberOfEmployes  int  -- (max) number of employees for the month in a given store
If you need a finer resolution, use "per week" or even "per day". The main idea is to average the NumberOfEmployes measure for a given store over the year.
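The averaging idea can be sketched as below, assuming the EmployeeCount rows and the suggestion fact have been read into plain tuples. The sample figures and the function names are invented for illustration; only the column names come from the table above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical EmployeeCount rows: (KeyDate as a month-end date, KeyStore, NumberOfEmployes).
employee_count = [
    (date(2023, 1, 31), 1, 10),
    (date(2023, 2, 28), 1, 12),
    (date(2023, 3, 31), 1, 14),
]

# Hypothetical suggestion fact rows: (submission date, store key).
suggestions = [
    (date(2023, 1, 10), 1), (date(2023, 1, 22), 1), (date(2023, 2, 5), 1),
    (date(2023, 2, 17), 1), (date(2023, 3, 9), 1), (date(2023, 3, 30), 1),
]


def average_headcount(rows):
    """Average the monthly NumberOfEmployes per (year, store)."""
    totals = defaultdict(lambda: [0, 0])  # (year, store) -> [sum, number of months]
    for month_end, store, count in rows:
        entry = totals[(month_end.year, store)]
        entry[0] += count
        entry[1] += 1
    return {key: total / months for key, (total, months) in totals.items()}


def suggestions_per_employee(suggestion_rows, headcount_rows):
    """Ratio of suggestion count to average headcount per (year, store)."""
    counts = defaultdict(int)
    for submitted, store in suggestion_rows:
        counts[(submitted.year, store)] += 1
    averages = average_headcount(headcount_rows)
    return {key: counts.get(key, 0) / avg for key, avg in averages.items()}


ratios = suggestions_per_employee(suggestions, employee_count)
```

Here store 1 averages 12 employees across the three loaded months and received 6 suggestions, giving a ratio of 0.5 for 2023.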

Resources