SASS : Bridge table won't work when browsing the cube - data-warehouse

As described in the data source view schema in the picture above, I have a transaction fact table, the GL_Hierarchy dimension,the AccountList dimension and a bridge table that links between them. Here is an example of each dimension
The GL_Hierarchy dimension is a Parent/Child dimension :
The AccountList dimensions contains all the accounts of the general ledger
The bridge dimension links between an account and one/many categories of the GL hierarchy (Many to Many relationship)
When browsing my cube I want to the see the amount of the transactions and drill down through the hierarchy
When filtering only by account everything works fine
But as soon as I want to filter by using GL_Hierarchy attribute "Category Description" the results are wrong, the accounts are recurrent for every category as if there was no bridge dimension
I'm helpless and can't manage to make it work.
Thank you.

Related

Investment Valuation Data Model Design Issue

I was wondering if somebody could shed some light on the below.
I am currently working on building a kimball data warehouse within the financial sector specifically within the pensions industry.
Currently we are working to integrate the business process of valuations on a scheme.
The requirement is to store all valuations regardless of the product in a single FACT table for reporting. There are many different type of products which a pension can hold (Portfolio, Securities, Property etc) , so we have decided to go down the route of creating supertype and subtype dimensions. There will be one supertype for product which will contain common fields and then a subtype dimension for each product which will contain further detail.
The issue we are currently having is that a security can be held within a portfolio , but on the flip side the portfolio might not hold any investments but still contain a value (may be down to how we store the underlying data).
We do not want to create a single valuation line within the fact table for the portfolio if it has underlying investments we would just expect to show the underlying investments but somehow tie this back to the porfolio. If the porfolio has no underlying investments that we know of we would expect to store a line within the FACT table with just the value of the portfolio which would key directly to the product table.
Does anyone have any suggestions on this?
Here is a structure of how the data is held in the source system.Tables With Sample Data
Here is my proposed design with all of the investment dimensions inter-changeable and the product dimension being core , however this falls over because there is no link between the underlying investment holding and the portfolio.ValutionModel
Updated model with Portfolio Key within Fact UpdatedFact

How deep to go when denormalising

I denormalising a OLTP database for use in a DWH.
At the moment I am denormalising studygroups.
Each studygroup has a key pointing towards 1 project.
Each project has a key pointing towards 1 department.
Each department has a key pointing towards 1 university.
Each universityhas a key pointing to 1 city.
Now I know that you are supposed to denormalize the sh*t out your OLTP but in this dwh department will be a dimension on its own. This goes for university also. Would it suffise to add a key from studygroup pointing at department or is it wiser to denormalize as far as you can and add all attributes from the department and all attributes from its M:1 related tables to the dimension studygroup? Even when department and university will be dimensions by themselves?
In other words: how far/deep do you go when denormalizing?
The key concept behind a dimensional model is:
Keep your fact tables in 3NF (third normal form);
De-normalize your dimensions into 2NF (second normal form)
So ideally, the only joins you should have in your model are the joins between fact tables and relevant dimensions.
As part of this philosophy:
Avoid "snow flake" designs, where dimensions contain keys to other dimensions. It's always possible to come up with a data model that allows the same functionality as the snow flakes, without violating 3NF/2NF rule;
Never have any direct joins between 2 separate dimensions (i.e, department and study group) directly. All relations among dimensions must be resolved via fact tables;
Never have any direct joins between 2 separate fact tables. Any relations among fact tables must be resolved via shared dimensions.
Finally, consider that dimensional design, besides optimization of the data for querying, serves a second important purpose: it's a semantic model of the business (or whatever else it represents). So, when making decisions about combining data elements into dimensions and facts, consider their "logical affinity" - they should make intuitive sense to the end users. If you have hard times explaining to a BI analyst the meaning of your dimension or fact table, most likely you've made a modeling mistake.
For example, in your case you should consider logical relations between universities, departments, study groups, etc. It's very likely that University/Department form a natural hierarchy. If so, they should belong to the same dimension. Study group, on the other hand, might not - let's assume, it's possible to form study groups across multiple universities and/or multiple departments. Such Many:Many relations are clear indication that they should be resolved via fact tables. In addition, relations between universities and departments are stable (rarely change), while study groups are formed and dissolved very often, and thus should be modeled separately.
In general, if you see 1:1 or 1:M relations between dimensional elements, it's often an indication that they should be de-normalized into the same table (again, only if their combination makes logical sense). If the relations are M:M, most likely they belong to different tables (you can force them into the same table, but often such tables look like Frankenstein creatures).
You can get much better help by making your question more specific - draw your dimensional model, post it, and ask for specific issues/challenges you have. For general concepts, books from Kimball and Inmon are your best friends.

Is A Table Linking a Dimension to a Fact table, a Dimension of a Fact?

In my incomplete view of BI tables, a Fact table represents action and a dimension an entity.
I have a FactOrder table that contains order information (including OrderId and CustomerId). There is a separate dimension for people who are actually connected to the order but are not customers. So they are saved in a separate table called DimServiceUser. A linking table connects an Order to a ServiceUser. Should this intermediate OrderServeruser table be defined as a Dimension, a Fact or another type?
It really is more of a bridge table. Here's what is going on.
Your FactOrder table is a fact table, but it also contains a degenerate dimension. A degenerate dimension acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions. So you have an implied DimOrder in there that didn't require a separate table.
A bridge table can connect a set of values to a single fact table row, or it can connect two dimensions (such as customers and bank accounts). It is a way of handling legitimate many-to-many relationships. A bridge table is like a factless fact table. But in dimensional modeling we do not join fact tables together, whereas it is acceptable to join bridge tables and fact tables together. If you must force your bridge table into being a fact or dimension, it is closer to a fact table. But doing so may make it easier to implement bad modeling habits in the future. If you can call it a bridge instead, I would just go with that. (Make sure you read that third link on "like a factless fact table". It was written by the author of Star Schema: The Complete Reference. That is a pretty well accepted source.)
As OrderService does not contain any facts/measurements therefore you cannot call it a fact table.
Dimension table:
A dimension table contains dimensions of a fact.
They are joined to fact table via a foreign key.
Dimension tables are de-normalized tables.
The Dimension Attributes are the various columns in a dimension table
Dimensions offers descriptive characteristics of the facts with the help of their attributes
No set limit set for given for number of dimensions
The dimension can also contain one or more hierarchical relationships
Based on the above definition of dimension table, I believe you table should be referred as dimension table.
OrderServeruser table should be prefix with "bridge".

Material and Product hierarchies

I have a master data with both the material and product details in a single table. I am creating a star schema and my question is do i need to make two dimension table with separate material attributes and product attributes or can i have both in a single dimension table? The current master data looks has the following fields -
Material id, name, type, product hier 1,2,3,4...product hierarchy, product category, sub category. In my case both material and product are same, so a single id.
I am thinking of making it in a single table, but is that the best practice? Any future potential issues?
Many thanks in advance,
Arun
The important (and obvious) thing is, that the fact table has two separate foreign keys: PRODUCT_ID and MATERIAL_ID, both referencing your single dimension table.
This setup is not always best practice for OLTP systems, because in this case the database can't enforce the referential integrity. (You may store a product ID in the MATERIAL_ID column).
But for data-warehouse the database constraints are typically not enabled and are enforced in the loading job, so this setup is fine.
The decision to split is more dependent on the origin of the two dimensions. If both of them are maintained together, I see no reason to split them. If the two dimension are independent, with different lifecycles and separate sources, there is no reason to combine them.
And BTW Kimball IMO mentions the split of hierarch levels (not separate dimensions). So he sees as an mistake to split the product attributes and the hiearchy and category attributes (which is not your problem).
It depends on your business requirement.
If you ever need to produce a report that shows (say) units produced of product category by material, then you need to keep them in separate dimensions.

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a little bigger discussion that we can fin in these comment boxes. But, anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than 1 way of doing it with pro's and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).

Resources