Many to Many relationship Dimensional Modeling - data-warehouse

Image of different fees that relate to a transaction
I have a transactional fact table showing transactions done by a particular client. I want to relate this fact table to a dimension containing the different transaction fees that might occur on a transaction. Each transaction will be linked to between one and five transaction fees, each listed in its own row of the Transaction Fees dimension. What is the best way to implement this? Should I implement something like a role-playing dimension and have multiple keys, one for each type of transaction fee?
Regards,
K

I would create a fact table with a grain of transaction + transaction fee, i.e. one row per fee charged on a transaction.
If you know the fees at the time the transaction is created, you can perform the join between the transaction and its fees during the data load process.
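A minimal sketch of that grain, using Python's built-in sqlite3 module so it can be run as-is. The table and column names (dim_fee, fact_transaction_fee, fee_amount, client_key) and the sample rows are assumptions made for illustration; they do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fee dimension: one row per kind of fee.
CREATE TABLE dim_fee (fee_key INTEGER PRIMARY KEY, fee_name TEXT);

-- Hypothetical fact table at the grain transaction + fee:
-- one row for every fee charged on a transaction.
CREATE TABLE fact_transaction_fee (
    transaction_id INTEGER,
    client_key     INTEGER,
    fee_key        INTEGER REFERENCES dim_fee(fee_key),
    fee_amount     REAL,
    PRIMARY KEY (transaction_id, fee_key)
);

INSERT INTO dim_fee VALUES (1, 'Service fee'), (2, 'FX fee'), (3, 'Late fee');
INSERT INTO fact_transaction_fee VALUES
    (100, 7, 1, 2.50),
    (100, 7, 2, 1.00),
    (101, 7, 1, 2.50),
    (101, 7, 3, 5.00);
""")

# Total charged per kind of fee.
fees_by_type = conn.execute("""
    SELECT d.fee_name, SUM(f.fee_amount)
    FROM fact_transaction_fee AS f
    JOIN dim_fee AS d ON d.fee_key = f.fee_key
    GROUP BY d.fee_name
    ORDER BY d.fee_name
""").fetchall()

# Total fees per transaction.
fees_by_transaction = conn.execute("""
    SELECT transaction_id, SUM(fee_amount)
    FROM fact_transaction_fee
    GROUP BY transaction_id
    ORDER BY transaction_id
""").fetchall()

print(fees_by_type)
print(fees_by_transaction)
```

With this layout a transaction with five fees is simply five rows, so no role-playing keys are needed. One caveat of the sketch: a transaction-level measure such as the transaction amount would repeat on every fee row, so it is probably safer to keep it in a separate transaction-grain fact table.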

Related

ER modeling account transaction with two participants

Hello, I need help with this model.
A transaction can have one or two participants, and one or more statements can include that transaction.
But I need some way to know who owns the transaction and who receives it.
An example:
The participant who owns the transaction sends 100 dollars. I need to show that he is losing 100 dollars and that the other participant in the transaction is receiving 100 dollars.
But I can't work out how to identify the origin, the destination and the value in the transaction.
In my opinion, transaction_participants.from_id is the "owner" (payer) and transaction_participants.to_id is the receiver. Multiple records in transaction_participants for a single transaction could mean some kind of multi-payment to several receivers.
As for how to get payment amounts, for example the total sent from a certain account: you start with the rows from transaction_participants where from_id is the account ID, and via transactions_has_transaction_participants you get to transactions.amount, where the total amount of the transaction is stored.
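The lookup path described above can be sketched as follows, again with sqlite3. The three table names and from_id, to_id and amount come from the answer; the join-table column names (transaction_id, transaction_participant_id) and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE transaction_participants (
    id      INTEGER PRIMARY KEY,
    from_id INTEGER,  -- account that sends (the "owner")
    to_id   INTEGER   -- account that receives
);
-- Join table; its column names are assumed for this sketch.
CREATE TABLE transactions_has_transaction_participants (
    transaction_id             INTEGER REFERENCES transactions(id),
    transaction_participant_id INTEGER REFERENCES transaction_participants(id)
);

INSERT INTO transactions VALUES (10, 100), (11, 40), (12, 25);
INSERT INTO transaction_participants VALUES (1, 1, 2), (2, 2, 1), (3, 1, 3);
INSERT INTO transactions_has_transaction_participants VALUES (10, 1), (11, 2), (12, 3);
""")

SENT_SQL = """
    SELECT COALESCE(SUM(t.amount), 0)
    FROM transaction_participants AS p
    JOIN transactions_has_transaction_participants AS tp
      ON tp.transaction_participant_id = p.id
    JOIN transactions AS t ON t.id = tp.transaction_id
    WHERE p.from_id = ?
"""
# Same path, but filtering on the receiving side.
RECEIVED_SQL = SENT_SQL.replace("p.from_id", "p.to_id")

account_id = 1
sent = conn.execute(SENT_SQL, (account_id,)).fetchone()[0]
received = conn.execute(RECEIVED_SQL, (account_id,)).fetchone()[0]
net = received - sent  # negative means the account paid out more than it received

print(sent, received, net)
```

Note that this only works cleanly while each transaction has a single participant row. With multi-payments, transactions.amount would be counted once per receiver, so a per-receiver amount column on the participant row would probably be needed.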

Granularity in Star Schema leads to multiple values in Fact Table?

I'm trying to understand star schema at the moment & struggling a lot with granularity.
Say I have a fact table that has session_id, user_id, order_id, product_id and I want to roll-up to sessions by user by week (keeping in mind that not every session would lead to an order or a product & the DW needs to track the sessions for non-purchasing users as well as those who purchase).
I can see no reason to track order_ids or session_ids in the fact table so it would become something like:
week_date, user_id, total_orders, total_sessions ...
But how would I then track product_ids if a user makes more than one purchase in a week? I assume I can't keep multiple product ids in an array (eg: "20/02/2012","5","3","PR01,PR32,PR22")?
I'm thinking it may have to be kept at 'every session' level but that could lead to a very large amount of data. How would you implement granularity for an example such as above?
Dimensional modelling requires dimensions as well as facts.
You need a Date/Calendar dimension, which includes columns like this:
calendar (id,cal_date,cal_year,cal_month,...)
The "grain" of your fact table is the key to data storage. If you have transactions, then the transaction should be the grain, and you store one row per transaction. Use proper (integer) surrogate keys to your dimensions, and your table won't be as large as you fear.
Now you can write a query like this, to sum sales of product by year:
select product_name, cal_year, sum(purchase_amount)
from fact_whatever
inner join calendar on calendar.id = fact_whatever.calendar_id
inner join product on product.id = fact_whatever.product_id
group by product_name, cal_year
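To make the point about grain concrete, here is a small runnable sketch using sqlite3. It keeps one row per purchase in the fact table and rolls up at query time instead of storing a list of product IDs per week. The names calendar, product, fact_whatever, cal_year and purchase_amount follow the answer; user_id and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (id INTEGER PRIMARY KEY, cal_date TEXT, cal_year INTEGER, cal_month INTEGER);
CREATE TABLE product  (id INTEGER PRIMARY KEY, product_name TEXT);

-- One row per purchase: integer surrogate keys plus the measure.
CREATE TABLE fact_whatever (
    calendar_id     INTEGER REFERENCES calendar(id),
    product_id      INTEGER REFERENCES product(id),
    user_id         INTEGER,
    purchase_amount REAL
);

INSERT INTO calendar VALUES
    (1, '2012-02-20', 2012, 2),
    (2, '2012-02-21', 2012, 2),
    (3, '2013-01-05', 2013, 1);
INSERT INTO product VALUES (1, 'PR01'), (2, 'PR32');
INSERT INTO fact_whatever VALUES
    (1, 1, 5, 10.0),
    (2, 1, 5, 15.0),
    (2, 2, 5, 7.0),
    (3, 1, 5, 20.0);
""")

# Roll up from the purchase grain to product by year.
sales_by_product_year = conn.execute("""
    SELECT product_name, cal_year, SUM(purchase_amount)
    FROM fact_whatever
    INNER JOIN calendar ON calendar.id = fact_whatever.calendar_id
    INNER JOIN product  ON product.id  = fact_whatever.product_id
    GROUP BY product_name, cal_year
    ORDER BY product_name, cal_year
""").fetchall()

print(sales_by_product_year)
```

The same fact rows can be grouped by week and user instead, provided the calendar dimension carries a week column, so nothing is lost by storing the finer grain.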

Joins table parent to another joins table

I'm planning the following table:
While I know this is technically feasible (I just tried it), I wanted to see if it seemed unnecessarily complicated... Basically I'm keeping track of revenues vs. costs.
Tab1 contains revenue data for both Tab2 & Tab3. Tab2 contains its own cost data, so that's easy. But the complication is that Tab3 costs are further allocated across Tab2 units. That's why there's a secondary joinsB table there.
I realize this isn't a concrete question, but I know there are so many more experienced folks who, based on that experience, will have a gut "wow this is too complicated" sense or not about what I'm doing. That's what I'd like input/ feedback on as a gut check before I build this.
EDIT for more clarity
Tab1 = Charges
Tab2 = Reservations
Tab3 = Logistics
joinsA = TypeCharges
joinsB = TypeLogistics
A user pays for a reservation of something, but may also pay to have that something shipped logistically. The two payments are wrapped up in one charge. The complication is that one logistics shipment may contain more than 1 reservation under separate users (i.e., separate charges).
This data structure is designed to help me do two things:
easily track exactly which charge is associated with a given reservation or logistics so that for example I can issue a refund, but for a specific amount (for example, a user may keep the reservation but want to cancel the logistics shipment and pay for it him/herself using another vendor)
easily understand how the cost of a logistics breaks down into allocatable costs by reservation
The latter is why Tab3 and Tab2 are joined, and the through table contains more information on the nature of the type of logistics. The through table also contains the charge amount that the user paid to have a reservation shipped logistically. Tab3 contains the cost of the logistics which is then allocated based on how many reservations there are. Then you can compare that allocated cost against the charge for the specific reservation in the joins table.
It seems that you need to track your charges at the reservation + logistics level. If I were to model this, I would do it as shown in the diagram below, where Reservations is an entity/table and so is Logistics. These have a many-to-many relationship with each other, and this relationship is materialized by the Charges entity/table. So if you need the charges for a reservation, you add up all the charges for that reservation; if you need the charges for a logistics shipment, you add up all the charges for that shipment. The base data in the Charges table holds the charges at the lowest level of granularity and can be rolled up by reservation and/or by shipment.
Hope this helps
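One possible reading of this design, sketched with sqlite3. The column names, the nullable logistics_id for charges that cover only the reservation itself, and the sample amounts are all assumptions for illustration, not the answerer's actual diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservations (id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE logistics    (id INTEGER PRIMARY KEY, cost REAL);

-- Charges sits between the two entities, one row per charged item.
-- logistics_id is NULL when the charge is for the reservation itself.
CREATE TABLE charges (
    id             INTEGER PRIMARY KEY,
    reservation_id INTEGER NOT NULL REFERENCES reservations(id),
    logistics_id   INTEGER REFERENCES logistics(id),
    amount         REAL
);

INSERT INTO reservations VALUES (1, 'alice'), (2, 'bob');
INSERT INTO logistics VALUES (1, 30.0);
INSERT INTO charges VALUES
    (1, 1, NULL, 100.0),  -- alice pays for her reservation
    (2, 1, 1,     20.0),  -- alice pays for shipping on shipment 1
    (3, 2, NULL,  80.0),  -- bob pays for his reservation
    (4, 2, 1,     25.0);  -- bob pays for shipping on the same shipment
""")

# Roll up by reservation.
charges_by_reservation = conn.execute("""
    SELECT reservation_id, SUM(amount)
    FROM charges
    GROUP BY reservation_id
    ORDER BY reservation_id
""").fetchall()

# Roll up by shipment and compare against the shipment's cost.
charges_vs_cost = conn.execute("""
    SELECT l.id, SUM(c.amount), l.cost
    FROM logistics AS l
    JOIN charges AS c ON c.logistics_id = l.id
    GROUP BY l.id, l.cost
""").fetchall()

print(charges_by_reservation)
print(charges_vs_cost)
```

Because each charge row names both the reservation and, where relevant, the shipment, a refund for only the logistics part of a reservation can target a single row.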

Track multiple status in Transaction Fact Table

I have to track the status of my business process for analysis purposes. I have seen a post saying that we can keep the status in a transaction fact table against time/transaction type/service center, and use an accumulating fact table to study process lag. If a few transactions have multiple statuses in a single day, should I store all of those statuses in the transaction fact table? I am assuming that my ETL runs at the end of the business day.
Second, should I keep all my key dimension keys in the transaction fact table? The keys in this case are Transaction Type, Department id, Service_type, Service_id and Submission Channel. Or should I divide them among multiple fact tables?
Third, if I need to report which departments are meeting their SLA, what would be the best approach: calculate and store Within SLA / Not Within SLA in the transaction fact table, or compute this value at run time?
Thanks in advance for your help and assistance.
For status tracking you should have:
A transaction table where only events show up (but which does not provide event tracing)
An accumulating snapshot table where each process's statuses are tracked/updated as they happen.
As for the keys, you should keep as much detail as possible. No need to delete keys if they may hold valuable information in the future.
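The two tables can be sketched like this with sqlite3. Every name here (fact_status_event, fact_process_snapshot, the status values) and the 5-day SLA threshold are assumptions. For brevity the snapshot is derived from the events in one pass, whereas a real ETL job would more likely update the snapshot rows in place as statuses arrive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_status_event (
        transaction_id INTEGER,
        department_id  INTEGER,
        status         TEXT,
        event_date     TEXT
    )
""")

# Transaction-grain table: every status change is kept,
# including two statuses on the same day for transaction 1.
events = [
    (1, 10, "submitted", "2024-01-01"),
    (1, 10, "in_review", "2024-01-01"),
    (1, 10, "completed", "2024-01-03"),
    (2, 20, "submitted", "2024-01-01"),
    (2, 20, "completed", "2024-01-08"),
]
conn.executemany("INSERT INTO fact_status_event VALUES (?, ?, ?, ?)", events)

# Accumulating snapshot: one row per process, one date column per milestone.
conn.execute("""
    CREATE TABLE fact_process_snapshot AS
    SELECT transaction_id,
           department_id,
           MIN(CASE WHEN status = 'submitted' THEN event_date END) AS submitted_date,
           MAX(CASE WHEN status = 'completed' THEN event_date END) AS completed_date
    FROM fact_status_event
    GROUP BY transaction_id, department_id
""")

SLA_DAYS = 5  # hypothetical threshold

# Lag and SLA flag computed at query time from the milestone dates.
sla_report = conn.execute("""
    SELECT department_id,
           CAST(julianday(completed_date) - julianday(submitted_date) AS INTEGER) AS lag_days,
           (julianday(completed_date) - julianday(submitted_date)) <= ? AS within_sla
    FROM fact_process_snapshot
    ORDER BY department_id
""", (SLA_DAYS,)).fetchall()

print(sla_report)
```

Computing the flag at query time keeps the sketch short; if the SLA rule is stable, storing lag_days or within_sla as a column on the snapshot row is an equally reasonable choice.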

Transaction lifecycle tracking in data warehouse

How do you store facts within which data is related? And how do you configure the measure? For example, I have a data warehouse that tracks the lifecycle of an order, which changes state from ordered, to shipped, to refunded. A state like 'refunded' is not always present. In my model I am using the transaction store model, so every time the order changes state, another row is added to the fact table. For an order that was placed in April and refunded in May, there will be two rows: one with a state of 'ordered' and another with a state of 'refunded'. If the user wanted to see all the orders placed in April, and wanted to see how many of those orders got refunded, how would he see that? Is this an MDX query that will be run at runtime? Is this a calculated measure I can store in the cube? How would I do that? My thinking is that it should be a fact that the user can use in a PivotTable, but I'm not sure.
One way to model this would be to create a factless fact table to model events. Your ORDERS fact table models the transaction amount, customer information etc, while the factless fact table (perhaps called ORDER_STATUS) models any events that occur in relation to a specific order.
With this model, it's easy to count or add all transactions based on their order status by checking for existence of records in the factless fact table.
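A minimal sketch of the existence check, using sqlite3 rather than a cube. The table names orders and order_status echo the answer's ORDERS and ORDER_STATUS; the columns and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fact table carrying the measures.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);

-- Factless fact table: it only records that an event happened.
CREATE TABLE order_status (order_id INTEGER, status TEXT, status_date TEXT);

INSERT INTO orders VALUES
    (1, '2024-04-05', 50.0),
    (2, '2024-04-20', 30.0),
    (3, '2024-05-02', 70.0);
INSERT INTO order_status VALUES
    (1, 'ordered',  '2024-04-05'),
    (1, 'refunded', '2024-05-10'),
    (2, 'ordered',  '2024-04-20'),
    (2, 'shipped',  '2024-04-22'),
    (3, 'ordered',  '2024-05-02'),
    (3, 'refunded', '2024-05-15');
""")

# Orders placed in April, and how many of those were refunded at any later time.
placed, refunded = conn.execute("""
    SELECT COUNT(*),
           SUM(EXISTS (SELECT 1
                       FROM order_status AS s
                       WHERE s.order_id = o.order_id
                         AND s.status = 'refunded'))
    FROM orders AS o
    WHERE o.order_date >= '2024-04-01'
      AND o.order_date <  '2024-05-01'
""").fetchone()

print(placed, refunded)
```

The filter on order date and the refund check are independent, which is what lets a refund recorded in May count against an order placed in April.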
