I am building a data warehouse for the finance team, and we have these measures:
revenue,
expense,
revenue - expense = gross margin
They are connected to the following dimensions:
Project,
Org,
Client,
Date
However, some of the project IDs that are present in revenue are not present in expense, and vice versa.
Should I keep them in separate fact tables to get all the data?
If I keep them separate how will I get gross margin?
What I understand from your question is that
you are calculating revenue and expense based on project_id.
Assumption: some projects will have no revenue, and a few might have no expense either.
In that case, those project_ids will have 0 as their revenue or expense, and that is how you get gross margin for every project. If you use separate tables, you lose the point of the gross margin calculation.
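A minimal sketch of the "missing side counts as 0" idea, using Python's built-in sqlite3 so it runs as-is. The revenue and expense tables and their project_id/amount columns are hypothetical stand-ins; org, client and date keys would be carried the same way in both SELECTs and in the GROUP BY. Stacking both sources at one grain with UNION ALL keeps projects that appear on only one side, which an inner join would drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revenue (project_id TEXT, amount NUMERIC);  -- hypothetical source
CREATE TABLE expense (project_id TEXT, amount NUMERIC);  -- hypothetical source
INSERT INTO revenue VALUES ('P1', 100), ('P2', 50);      -- P2 has no expense
INSERT INTO expense VALUES ('P1', 30), ('P3', 20);       -- P3 has no revenue
""")

# Stack both sources at the same grain; the side a project lacks contributes 0.
rows = conn.execute("""
SELECT project_id,
       SUM(revenue)                AS revenue,
       SUM(expense)                AS expense,
       SUM(revenue) - SUM(expense) AS gross_margin
FROM (
    SELECT project_id, amount AS revenue, 0 AS expense FROM revenue
    UNION ALL
    SELECT project_id, 0, amount FROM expense
) AS f
GROUP BY project_id
ORDER BY project_id
""").fetchall()

for row in rows:
    print(row)
# ('P1', 100, 30, 70)
# ('P2', 50, 0, 50)
# ('P3', 0, 20, -20)
```

The same SELECT could be used to load a single fact table at project grain, so gross margin is just a subtraction on one row.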
I have the following scenario where OLTP sales data is stored in two separate physical tables:
Sales
Refunds/Cancellations
A refund always refers to an existing sale (thus 'negating' it), though the dimensions of these tables are nearly the same (date, sales clerk, store, etc.). The data schema looks something like the following:
CREATE TABLE sale
(
sale_id uuid,
transaction_at timestamp with time zone,
store_id uuid,
clerk_id uuid,
clerk_number bigint,
currency character varying(3),
pos_id uuid,
total numeric,
net_total numeric
);
CREATE TABLE refund
(
refund_id uuid,
sale_id uuid, -- referenced sale to void
refunded_at timestamp with time zone,
pos_id uuid,
clerk_id uuid,
clerk_number bigint
);
I am trying to figure out how to model this data for ingestion into a DW. Since I am relatively new to dimensional modeling, I have begun reading The Data Warehouse Toolkit, but I am currently unsure of the best approach to handle this case.
To my mind, these are two separate fact tables describing two different business processes (i.e. making a sale and getting a refund), though due to normalization concerns the refund table (besides containing most of the same dimensions) is basically a pointer to a row in the sale table (which is fine for OLTP).
Analytical reports down the line would obviously want to look at these in a few ways:
All net sales per dimension (gross sales minus refunds)
All refund amounts per dimension
Other potential business use cases
As is, the first two cases would require joining the tables, either to subtract the refunded sales amounts (case 1) or to get the refund amounts at all (case 2), since the refund table stores no amounts of its own.
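To make the two cases concrete, here is a sketch against cut-down versions of the sale and refund tables (Python's sqlite3, hypothetical sample rows). It assumes at most one refund per sale, as the question describes; with several refunds per sale the LEFT JOIN would repeat the sale row and inflate the gross figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down versions of the question's tables (only the columns used here).
CREATE TABLE sale   (sale_id TEXT PRIMARY KEY, store_id TEXT, clerk_id TEXT, total NUMERIC);
CREATE TABLE refund (refund_id TEXT PRIMARY KEY, sale_id TEXT REFERENCES sale (sale_id), clerk_id TEXT);
INSERT INTO sale   VALUES ('s1', 'storeA', 'c1', 100), ('s2', 'storeA', 'c1', 40), ('s3', 'storeB', 'c2', 70);
INSERT INTO refund VALUES ('r1', 's2', 'c3');   -- voids sale s2
""")

# Case 1: net sales per store. The refund has no amount (or store), so it must be
# joined back to the sale it voids.
net_sales = conn.execute("""
SELECT s.store_id,
       SUM(s.total)
       - COALESCE(SUM(CASE WHEN r.refund_id IS NOT NULL THEN s.total END), 0) AS net_sales
FROM sale AS s
LEFT JOIN refund AS r ON r.sale_id = s.sale_id
GROUP BY s.store_id
ORDER BY s.store_id
""").fetchall()

# Case 2: refund amounts per refunding clerk, again only reachable via the join.
refunds_by_clerk = conn.execute("""
SELECT r.clerk_id, SUM(s.total) AS refunded
FROM refund AS r
JOIN sale AS s ON s.sale_id = r.sale_id
GROUP BY r.clerk_id
""").fetchall()

print(net_sales)         # [('storeA', 100), ('storeB', 70)]
print(refunds_by_clerk)  # [('c3', 40)]
```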
The approach that seems to make the most sense for me is something like the following (via some ETL/ELT process):
Load the (gross) sales data into a table in the DW
Load and denormalize refund data into a table in the DW, joining actual sale data so that amounts etc. are located in the fact table
Join either table with common conformed dimensions for further roll-ups and querying
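The three ETL steps above might look roughly like this, again with sqlite3 and hypothetical sample data. fact_sale, fact_refund and dim_store are illustrative names, not a prescribed design; the point is that fact_refund carries its own amount and store_id copied from the referenced sale, so each fact table rolls up through the shared dimension without a fact-to-fact join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source (OLTP) tables, cut down to the columns used here.
CREATE TABLE sale   (sale_id TEXT PRIMARY KEY, store_id TEXT, total NUMERIC);
CREATE TABLE refund (refund_id TEXT PRIMARY KEY, sale_id TEXT, clerk_id TEXT);
INSERT INTO sale   VALUES ('s1', 'storeA', 100), ('s2', 'storeA', 40), ('s3', 'storeB', 70);
INSERT INTO refund VALUES ('r1', 's2', 'c3'), ('r2', 's3', 'c3');

-- Conformed dimension shared by both fact tables (hypothetical).
CREATE TABLE dim_store (store_id TEXT PRIMARY KEY, store_name TEXT);
INSERT INTO dim_store VALUES ('storeA', 'Downtown'), ('storeB', 'Airport');

-- Step 1: load gross sales.
CREATE TABLE fact_sale AS
SELECT sale_id, store_id, total AS amount FROM sale;

-- Step 2: denormalize refunds, copying amount and store from the voided sale.
CREATE TABLE fact_refund AS
SELECT r.refund_id, r.sale_id, r.clerk_id, s.store_id, s.total AS amount
FROM refund AS r
JOIN sale AS s ON s.sale_id = r.sale_id;
""")

# Step 3: each fact table rolls up through the conformed dimension on its own.
def by_store(fact_table):
    # fact_table is a trusted constant here, never user input.
    return conn.execute(f"""
        SELECT d.store_name, SUM(f.amount)
        FROM {fact_table} AS f
        JOIN dim_store AS d ON d.store_id = f.store_id
        GROUP BY d.store_name
        ORDER BY d.store_name
    """).fetchall()

sales_by_store = by_store("fact_sale")      # [('Airport', 70), ('Downtown', 140)]
refunds_by_store = by_store("fact_refund")  # [('Airport', 70), ('Downtown', 40)]
```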
This makes sense to me because:
Both fact tables have all required information from the physical event, and
There is no explicit dependency between the fact tables, and
Common dimensions can be reused
However, in this case, I still would not be able to get the net sales without joining these two tables. This makes me think that there should be a separate net_sale fact table, but this is problematic:
From a business point of view, sales without refunds are the vast majority of events that occur. A net_sale table would copy basically 99% of all sale data.
From a business process point of view, this table would describe an event that does not exist as such (there is no "net sale", only an aggregated view of sales amount per dimension minus refund costs).
Skimming the third chapter of The Data Warehouse Toolkit, I do not see this case mentioned explicitly (though there might be some parallels with factless fact tables and derived facts). What kind of approach would work in a case like this?
I'm working on a datamart for our sales and marketing departments, and I've come across a modeling challenge. Our ERP stores pricing data in a few different ways:
List pricing for each item
A discount percentage from list pricing for a product line, either for groups of customers or for a specific account
A custom price for an item, either for groups of customers or for a specific account
The Pricing department primarily uses this data operationally, not analytically. For example, they generate reports for customers ("What special pricing / discount %s do I have?") and identify which items / item groups need to be changed when they engage in a new pricing strategy.
Pricing changes happen somewhat regularly on a small scale, usually on a customer-by-customer or item-by-item basis. Infrequently, there are large-scale adjustments to list pricing and group pricing (discounts and individual items) in addition to the customer-level discounts.
My thinking has been to create one or more fact tables to represent this process. Unfortunately, there's no pre-existing business key for pricing. There's also no specific "transaction date," since the ERP doesn't (accurately) maintain records of when pricing is changed. Essentially, a "pricing event" is going to be a combination of:
Effective date
End date
Item OR product line
(Not required for list price) customer or customer group
A price amount OR discount percentage
A single fact table seems problematic in that I'm going to have to deal with a lot of invalid combinations of dimensions and facts. First, a record will never have both a non-NULL price amount and a non-NULL discount percentage; pricing events are either-or. Second, only certain combinations of dimensions are valid for each fact. For example, a discount percentage will only ever have a product line, not an individual item.
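For reference, the validity rules listed above can be written down as CHECK constraints on a single hypothetical pricing_event table (sqlite3 sketch, illustrative column names). This only shows that the rules are mechanically enforceable at load time; it does not settle whether one table or two is the better model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pricing_event (
    effective_date    TEXT NOT NULL,
    end_date          TEXT,
    item_id           TEXT,
    product_line_id   TEXT,
    customer_id       TEXT,
    customer_group_id TEXT,
    price_amount      NUMERIC,
    discount_pct      NUMERIC,
    -- Exactly one fact: a price amount OR a discount percentage.
    CHECK ((price_amount IS NULL) <> (discount_pct IS NULL)),
    -- Exactly one of item / product line.
    CHECK ((item_id IS NULL) <> (product_line_id IS NULL)),
    -- A discount percentage only ever applies to a product line.
    CHECK (discount_pct IS NULL OR product_line_id IS NOT NULL),
    -- Customer-specific or group-specific, never both (neither = list price).
    CHECK (customer_id IS NULL OR customer_group_id IS NULL)
)
""")

def try_insert(row):
    """Return True if the row satisfies every CHECK, False if SQLite rejects it."""
    try:
        conn.execute("INSERT INTO pricing_event VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

list_price = try_insert(("2024-01-01", None, "ITEM1", None, None, None, 9.99, None))
group_disc = try_insert(("2024-01-01", None, None, "LINE1", None, "GRP1", None, 15))
both_facts = try_insert(("2024-01-01", None, "ITEM1", None, None, None, 9.99, 10))
item_disc = try_insert(("2024-01-01", None, "ITEM1", None, "CUST1", None, None, 10))
print(list_price, group_disc, both_facts, item_disc)  # True True False False
```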
Does it make sense to model pricing as a fact table in the first place? If so, how many tables should I be considering? My intuition is to use at least two, one for the percentages and one for the price amounts, but this still leaves a problem where each record will either have a valid customer group OR a valid customer (or neither, for list prices), since we need to maintain customer-specific pricing separate from any group pricing that customer might have.
You may need to keep them both as attributes and as facts.
The price a certain item was sold for is a fact. When you multiply it by the quantity sold, it's actually an additive measure, so keep it in the fact table. Total discount applied is also additive; I'd keep it. You can later query "how much was discounted in 2019 per customer", which would be much harder to achieve without those facts.
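A sketch of the additive measures described above, with a hypothetical fact_sales table storing the unit price charged and the unit discount per line. Multiplying by quantity gives amounts that can be summed over any dimension, such as discount per customer for 2019.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    sale_date     TEXT,     -- ISO yyyy-mm-dd
    customer_id   TEXT,
    item_id       TEXT,
    quantity      INTEGER,
    unit_price    NUMERIC,  -- price actually charged per unit
    unit_discount NUMERIC   -- list price minus unit_price
);
INSERT INTO fact_sales VALUES
    ('2019-03-01', 'C1', 'I1', 10,  9, 1),
    ('2019-07-15', 'C1', 'I2',  5, 18, 2),
    ('2019-08-02', 'C2', 'I1',  4, 10, 0),
    ('2020-01-10', 'C1', 'I1',  3,  9, 1);
""")

# Extended amounts (unit value * quantity) are additive across any dimension.
discount_2019 = conn.execute("""
SELECT customer_id,
       SUM(unit_price * quantity)    AS sales_amount,
       SUM(unit_discount * quantity) AS discount_amount
FROM fact_sales
WHERE sale_date BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
print(discount_2019)  # [('C1', 180, 20), ('C2', 40, 0)]
```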
But if you also need to query things like "what's the discount customer X is on", then you should also keep that as an attribute of the customer dimension and treat it as a type 2 slowly changing dimension, so as to keep discount history. If you know when a certain discount was applied, great; if not, take the first sale as the start date and you won't be too far off.
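A minimal type 2 sketch for the customer discount, assuming a hypothetical dim_customer with valid_from/valid_to columns and '9999-12-31' as the end date of the current row. A change closes the current version and inserts a new one, so the discount in force on any date stays queryable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical type 2 customer dimension: one row per version of a customer.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, new for every version
    customer_id  TEXT NOT NULL,        -- business key from the ERP
    discount_pct NUMERIC,
    valid_from   TEXT NOT NULL,        -- ISO dates; first sale date if unknown
    valid_to     TEXT NOT NULL,        -- '9999-12-31' marks the open version
    is_current   INTEGER NOT NULL
);
INSERT INTO dim_customer (customer_id, discount_pct, valid_from, valid_to, is_current)
VALUES ('C1', 5, '2018-01-01', '9999-12-31', 1);
""")

def change_discount(customer_id, new_pct, effective):
    """Close the current version the day before `effective`, then add a new one."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = date(?, '-1 day'), is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (effective, customer_id))
    conn.execute(
        "INSERT INTO dim_customer (customer_id, discount_pct, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, '9999-12-31', 1)",
        (customer_id, new_pct, effective))

def discount_as_of(customer_id, day):
    """Discount that was in force for the customer on the given ISO date."""
    row = conn.execute(
        "SELECT discount_pct FROM dim_customer "
        "WHERE customer_id = ? AND ? BETWEEN valid_from AND valid_to",
        (customer_id, day)).fetchone()
    return row[0] if row else None

change_discount("C1", 10, "2019-07-01")
print(discount_as_of("C1", "2019-03-01"))  # 5
print(discount_as_of("C1", "2019-07-01"))  # 10
```

Fact rows would normally store customer_key, so each sale points at the version that was current when it happened.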
Maybe the list price can also be kept as an attribute of the product or product line dimension, but only if list prices don't change too often; and if most customers get discounts anyway, that would be of limited use.
I have this simple Data Warehouse schema:
Flight (ID, pilot, aircraft, airport)
Pilot (ID, name, surname, flight hours)
Aircraft (ID, model)
Airport (ID, name, city)
Flight.pilot references Pilot.ID
Flight.aircraft references Aircraft.ID
Flight.airport references Airport.ID
Flight is going to be my fact table.
Then I'm going to have three dimensions:
pilotage (involving Pilot table)
vehicle (involving Aircraft table)
departures (involving Airport table)
One measure can be the number of flights, obtained by count(ID) on the Flight table.
In the following picture, you can see the star schema I've just described.
My question is: does it make sense to choose flight hours (a column of the Pilot table, which takes part in the pilotage dimension) as a measure?
And, more generally, is it possible/conceptually correct to choose as a measure a column that is not in the fact table?
So, in short, does a measure for a data warehouse cube HAVE TO come from the fact table? Or can columns from the dimension tables also be chosen?
Many thanks if you can help me!
Does it make sense to choose flight hours (a column of the Pilot table, which takes part in the pilotage dimension)?
What else can flight hours take part in? You're only measuring flight hours for the pilots. You could (should?) measure flight hours for the aircraft, but you have no aircraft flight hours input for your warehouse.
Do the users of your warehouse want to know flight hours for pilots? If so, then your Pilotage table becomes a de facto fact table for flight hours.
It would be more logical for a real warehouse to sum flight hours from the flights themselves, rather than keep a lump sum for the pilots and the aircraft. Otherwise, you're going to have to update the pilot flight hours every time you load the warehouse.
Is it possible/conceptually correct to choose as a measure a column that is not in the fact table?
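If each flight carried its own duration, the pilot and aircraft totals would fall straight out of the fact table. The original Flight table has no such column, so duration_hours below is an assumption, as is the sample data (Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pilot    (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE aircraft (id INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE airport  (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE flight (
    id             INTEGER PRIMARY KEY,
    pilot          INTEGER REFERENCES pilot (id),
    aircraft       INTEGER REFERENCES aircraft (id),
    airport        INTEGER REFERENCES airport (id),
    duration_hours REAL    -- assumed measure; not in the original schema
);
INSERT INTO pilot    VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO aircraft VALUES (10, 'A320'), (20, 'B737');
INSERT INTO airport  VALUES (100, 'Main', 'Rome');
INSERT INTO flight   VALUES (1, 1, 10, 100, 2.5), (2, 1, 20, 100, 1.5), (3, 2, 10, 100, 3.0);
""")

# Both totals come from the fact table, so nothing has to be maintained by hand.
hours_by_pilot = conn.execute(
    "SELECT pilot, COUNT(id), SUM(duration_hours) FROM flight GROUP BY pilot ORDER BY pilot"
).fetchall()
hours_by_aircraft = conn.execute(
    "SELECT aircraft, COUNT(id), SUM(duration_hours) FROM flight GROUP BY aircraft ORDER BY aircraft"
).fetchall()
print(hours_by_pilot)     # [(1, 2, 4.0), (2, 1, 3.0)]
print(hours_by_aircraft)  # [(10, 2, 5.5), (20, 1, 1.5)]
```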
Yes. The rule is, if your users are going to query on the column, include it in the data warehouse.
It depends.
Gilbert is right about the meat of the answer: your flight hours are on a per-pilot basis, so don't try to measure them by vehicle, or you'll double (or triple, or quadruple...) your numbers, because you only have a many-to-many relationship to those dimensions via your Flight fact table.
However, if your fact is non-summable (e.g. "average flight hours of the pilots who fly each vehicle") suddenly it does make sense again.
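A small sketch of both points, with a hypothetical lump-sum flight_hours column on pilot and made-up flights: summing it through Flight by aircraft counts each pilot once per flight, while averaging over the distinct pilots per aircraft gives a sensible, non-summable figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- flight_hours is the per-pilot lump sum from the Pilot table.
CREATE TABLE pilot  (id INTEGER PRIMARY KEY, name TEXT, flight_hours INTEGER);
CREATE TABLE flight (id INTEGER PRIMARY KEY, pilot INTEGER, aircraft INTEGER);
INSERT INTO pilot  VALUES (1, 'Ann', 1000), (2, 'Bob', 500);
INSERT INTO flight VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10);
""")

# Wrong: summing a pilot attribute through the many-to-many path repeats each
# pilot once per flight. Ann is counted twice on aircraft 10 and again on 20.
naive_sum = conn.execute("""
SELECT f.aircraft, SUM(p.flight_hours)
FROM flight AS f JOIN pilot AS p ON p.id = f.pilot
GROUP BY f.aircraft ORDER BY f.aircraft
""").fetchall()

# Non-summable alternative: average over the distinct pilots who flew each aircraft.
# (AVG(DISTINCT p.flight_hours) would be wrong if two pilots had equal hours.)
avg_hours = conn.execute("""
SELECT aircraft, AVG(flight_hours)
FROM (SELECT DISTINCT f.aircraft AS aircraft, p.id AS pilot_id, p.flight_hours AS flight_hours
      FROM flight AS f JOIN pilot AS p ON p.id = f.pilot)
GROUP BY aircraft ORDER BY aircraft
""").fetchall()

print(naive_sum)  # [(10, 2500), (20, 1000)], although the pilots have 1500 hours in total
print(avg_hours)  # [(10, 750.0), (20, 1000.0)]
```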
Now, I'm most experienced with SQL Server Analysis Services (SSAS) models. In those, I would typically create this as a calculated measure living in the Flight measure group; when you're making measures like these, you have to be very specific about which relationships you're using and how the aggregation is performed. In this case, the measure would cease to be "on the dimension" and would instead be "on the fact" (even though it is calculated by referencing the dimension). Happiness and best practice are restored.
If you're not able to do that, however, it's really not the end of the world: if it works and makes sense in your context, then it works and makes sense in your context, and there's not a lot else that comes into it. Most DW best practices are just about warning you to make sure that it does work and make sense in your situation.
So go figure out how you want to use it, and see if you can do that with your existing model.
I want to model the relationship between the Product entity and the Warehouse (location) entity, as you can see in the picture below.
The problem is the quantity: it differs for each product in each warehouse. I am not sure this is the correct way, since in most class diagrams (e.g. Doctrine 2.5) there is no mapping class; a simple annotation does the job.
I know I can add an extra column to the Product entity, but what if there are many warehouses? I have not seen any practical example with many warehouses; usually there are large warehouses (space).
What is the best way to represent N products in M warehouses, with the quantities included?
My ER Diagram
In table Product_Location, I assume that the primary key is a combination of ProductId and LocationId, and not only one of that Id.
If that is the case, I don't see why you cannot have different quantity for a particular product in different locations.
For example:
Product A is stored in warehouse X and warehouse Y. The quantity of product A in warehouse X is 10. The quantity of product A in warehouse Y is 20. Thus, the content of table Product_Location will be:
A - X - 10 and A - Y - 20.
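A sketch of that associative table in sqlite3, using the example values above (table and column names are illustrative). The composite primary key allows exactly one quantity per product per location, so N products in M warehouses is at most N*M rows, and a second row for the same pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id  TEXT PRIMARY KEY, name TEXT);
CREATE TABLE location (location_id TEXT PRIMARY KEY, name TEXT);
-- Associative table: one row per (product, location) pair, carrying the quantity.
CREATE TABLE product_location (
    product_id  TEXT NOT NULL REFERENCES product (product_id),
    location_id TEXT NOT NULL REFERENCES location (location_id),
    quantity    INTEGER NOT NULL CHECK (quantity >= 0),
    PRIMARY KEY (product_id, location_id)
);
INSERT INTO product  VALUES ('A', 'Product A');
INSERT INTO location VALUES ('X', 'Warehouse X'), ('Y', 'Warehouse Y');
INSERT INTO product_location VALUES ('A', 'X', 10), ('A', 'Y', 20);
""")

rows = conn.execute(
    "SELECT product_id, location_id, quantity FROM product_location ORDER BY location_id"
).fetchall()
total_a = conn.execute(
    "SELECT SUM(quantity) FROM product_location WHERE product_id = 'A'"
).fetchone()[0]

# The composite key rejects a second quantity for the same (product, location).
try:
    conn.execute("INSERT INTO product_location VALUES ('A', 'X', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, total_a, duplicate_rejected)  # [('A', 'X', 10), ('A', 'Y', 20)] 30 True
```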
Hope this helps.
I’m trying to create a basic Point of Sale and Inventory management system.
Some things to take into account:
The products are always the same (same ID) throughout the whole system, but inventory (available units for sale per product) is unique per location. Locations Y and Z may both have units of product X for sale, but if, for example, two units are sold from location Y, location Z’s inventory should not be affected: its stocked units are still intact.
Selling one (1) unit of product X from location Y means location Y’s inventory should decrease by one unit.
From that, I thought of these tables:
locations
id
name
products
id
name
transactions
id
description
inventories_header
id
location_id
product_id
inventories_detail
inventories_id
transaction_id
unit_cost
unit_price
quantity
orders_header
id
date
total (calculated from orders_detail quantity * price; just for future data validation)
orders_detail
order_id
transaction_id
product_id
quantity
price
Okay, so, are there any questions? Of course.
How do I keep track of changes in unit costs? If some day I start paying more for a certain product, I would need to keep track of the marginal utility ((price*quantity) - (cost*quantity) = marginal utility) some way. I thought of inventories_detail mostly for this; I wouldn’t have cared otherwise.
Are the relationships well established? I still have a hard time deciding whether locations have inventories or inventories have several locations. It’s maddening.
How would you keep/know your current stock levels? Since I had to separate the inventory table to keep up with cost updates, I guess I would just have to add up all the quantities stated in inventories_detail.
Any suggestions you want to share?
I’m sure I still have some questions, but these are mostly the ones I need addressing. Also, since I’m using Ruby on Rails for the first time, actually, as a learning experience, it’s a shame to be stopped at design, not letting me punch through implementation quicker, but I guess that’s the way it should be.
Thanks in advance.
The tricky part here is that you're really doing more than a POS solution. You're also doing an inventory management & basic cost accounting system.
The first scenario you need to address is what accounting method you'll use to determine the cost of any item sold. The most common options would be FIFO, LIFO, or Specific Identification (all terms that can be Googled).
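To see why the choice matters, here is a small Python sketch computing the cost of goods sold for the same purchases under FIFO and LIFO (quantities and costs are made up). Specific identification is not shown, since it needs the exact lot of each unit recorded at sale time.

```python
from collections import deque

def cost_of_goods_sold(purchases, units_sold, method="FIFO"):
    """Cost of units_sold, consuming purchase lots oldest-first (FIFO) or newest-first (LIFO).

    purchases: list of (quantity, unit_cost) tuples in the order they were received.
    """
    if method not in ("FIFO", "LIFO"):
        raise ValueError("method must be 'FIFO' or 'LIFO'")
    lots = deque([qty, cost] for qty, cost in purchases)  # mutable copies of each lot
    remaining, total = units_sold, 0
    while remaining > 0:
        if not lots:
            raise ValueError("not enough stock to cover the sale")
        lot = lots[0] if method == "FIFO" else lots[-1]
        take = min(remaining, lot[0])
        total += take * lot[1]
        lot[0] -= take
        remaining -= take
        if lot[0] == 0:  # lot used up: drop it from the end we consume from
            if method == "FIFO":
                lots.popleft()
            else:
                lots.pop()
    return total

purchases = [(10, 5), (10, 7)]  # made-up lots: 10 @ 5, then 10 @ 7
print(cost_of_goods_sold(purchases, 15, "FIFO"))  # 85 = 10*5 + 5*7
print(cost_of_goods_sold(purchases, 15, "LIFO"))  # 95 = 10*7 + 5*5
```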
In all 3 scenarios, you should record your purchases of your goods in a data structure (typically called PurchaseOrder, but in this case I'll call it SourcingOrder to differentiate from your orders tables in the original question).
The structure below assumes that each sourcing order line will be for one location (otherwise things get even more complex). In other words, if I buy 2 widgets for store A and 2 for store B, I'd add 2 lines to the order with quantity 2 for each, not one line with quantity 4.
SourcingOrder
- order_number
- order_date
SourcingOrderLine
- product_id
- unit_cost
- quantity
- location_id
Inventory can be one level...
InventoryTransaction
- product_id
- quantity
- sourcing_order_line_id
- order_line_id
- location_id
- source_inventory_transaction_id
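One possible sqlite3 rendering of the structures above. The id columns, the sourcing_order_id link from line to order, and the order_line stand-in table are assumptions the answer leaves implicit. The sample rows follow the answer's example of 2 widgets for store A and 2 for store B as two lines, and receive each line as a positive inventory transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical stand-in for the question's orders_detail.
CREATE TABLE order_line (id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER, price NUMERIC);

CREATE TABLE sourcing_order (
    id           INTEGER PRIMARY KEY,
    order_number TEXT NOT NULL,
    order_date   TEXT NOT NULL
);
CREATE TABLE sourcing_order_line (
    id                INTEGER PRIMARY KEY,
    sourcing_order_id INTEGER NOT NULL REFERENCES sourcing_order (id),
    product_id        INTEGER NOT NULL REFERENCES product (id),
    unit_cost         NUMERIC NOT NULL,
    quantity          INTEGER NOT NULL,
    location_id       INTEGER NOT NULL REFERENCES location (id)  -- one location per line
);
CREATE TABLE inventory_transaction (
    id          INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product (id),
    location_id INTEGER NOT NULL REFERENCES location (id),
    quantity    INTEGER NOT NULL,  -- positive = received, negative = sold
    sourcing_order_line_id INTEGER REFERENCES sourcing_order_line (id),
    order_line_id          INTEGER REFERENCES order_line (id),
    source_inventory_transaction_id INTEGER REFERENCES inventory_transaction (id)
);

INSERT INTO product  VALUES (1, 'Widget');
INSERT INTO location VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO sourcing_order VALUES (1, 'SO-001', '2024-01-05');
-- 2 widgets for store A and 2 for store B: two lines, not one line of 4.
INSERT INTO sourcing_order_line VALUES (1, 1, 1, 5.00, 2, 1), (2, 1, 1, 5.00, 2, 2);
-- Receiving each line creates a positive inventory transaction.
INSERT INTO inventory_transaction (product_id, location_id, quantity, sourcing_order_line_id)
SELECT product_id, location_id, quantity, id FROM sourcing_order_line;
""")

stock = conn.execute(
    "SELECT location_id, SUM(quantity) FROM inventory_transaction "
    "GROUP BY location_id ORDER BY location_id"
).fetchall()
print(stock)  # [(1, 2), (2, 2)]
```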
Each time a SourcingOrderLine is received at a store, you'll create an InventoryTransaction with a positive quantity and FK references to the sourcing_order_line_id, product_id and location_id.
Each time a sale is made, you'll create an InventoryTransaction with a negative quantity and FK references to the order_line_id, product_id, location_id and source_inventory_transaction_id.
The source_inventory_transaction_id would be a link from the negative quantity InventoryTransaction back to the positive quantity InventoryTransaction calculated using whichever accounting method you choose.
Current inventory for a location would be SELECT sum(quantity) FROM inventory_transactions WHERE product_id = ? AND location_id = ? GROUP BY product_id, location_id.
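A runnable version of that query (sqlite3, hypothetical data and a cut-down inventory_transactions table). Without the GROUP BY, the aggregate always returns exactly one row, so COALESCE can turn "no transactions yet" into 0; the grouped form is handy for listing all products and locations at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory_transactions (
    id INTEGER PRIMARY KEY, product_id INTEGER, location_id INTEGER, quantity INTEGER
);
-- Product 1: receive 10 at location 1, sell 3 there; receive 5 at location 2.
INSERT INTO inventory_transactions (product_id, location_id, quantity) VALUES
    (1, 1, 10), (1, 1, -3), (1, 2, 5);
""")

def current_stock(product_id, location_id):
    """On-hand quantity for one product at one location (0 if nothing recorded)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM inventory_transactions "
        "WHERE product_id = ? AND location_id = ?",
        (product_id, location_id)).fetchone()
    return row[0]

stock_by_location = conn.execute(
    "SELECT product_id, location_id, SUM(quantity) FROM inventory_transactions "
    "GROUP BY product_id, location_id ORDER BY product_id, location_id"
).fetchall()

print(current_stock(1, 1), current_stock(1, 3))  # 7 0
print(stock_by_location)                         # [(1, 1, 7), (1, 2, 5)]
```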
Marginal cost would be calculated by tracing back from the sale, through the 2 related inventory transactions to the SourcingOrder line.
NOTE: You have to handle the case where you allocate one order line across 2 inventory transactions because the ordered quantity was larger than what was left in the next inventory transaction to be allocated. This data structure will handle this, but you'll need to work out the logic and query yourself.
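A Python sketch of FIFO allocation, including the split case from the note, plus the trace from a sale back to the sourcing line's unit cost. It assumes a lower transaction id means an earlier receipt, and looks costs up in a hypothetical line_cost map keyed by sourcing_order_line_id; LIFO would only change the order in which receipts are visited.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class InventoryTransaction:
    id: int
    product_id: int
    location_id: int
    quantity: int                                          # positive = received, negative = sold
    sourcing_order_line_id: Optional[int] = None           # set on receipts
    order_line_id: Optional[int] = None                    # set on sales
    source_inventory_transaction_id: Optional[int] = None  # sale -> receipt it consumed

def allocate_sale_fifo(txns, ids, product_id, location_id, qty, order_line_id):
    """Record a sale as one or more negative transactions, consuming receipts FIFO."""
    consumed = {}  # receipt id -> units already taken by earlier sales
    for t in txns:
        if t.source_inventory_transaction_id is not None:
            key = t.source_inventory_transaction_id
            consumed[key] = consumed.get(key, 0) - t.quantity
    receipts = sorted((t for t in txns if t.quantity > 0 and t.product_id == product_id
                       and t.location_id == location_id), key=lambda t: t.id)
    new, remaining = [], qty
    for r in receipts:
        available = r.quantity - consumed.get(r.id, 0)
        if available <= 0:
            continue
        take = min(available, remaining)  # may split the sale across receipts
        new.append(InventoryTransaction(next(ids), product_id, location_id, -take,
                                        order_line_id=order_line_id,
                                        source_inventory_transaction_id=r.id))
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough stock to allocate this sale")
    txns.extend(new)  # only record the sale once it is fully allocated
    return new

def sale_cost(txns, line_cost, order_line_id):
    """Trace a sale through its receipts back to the sourcing order line unit costs."""
    by_id = {t.id: t for t in txns}
    return sum(-t.quantity * line_cost[by_id[t.source_inventory_transaction_id].sourcing_order_line_id]
               for t in txns if t.order_line_id == order_line_id)

line_cost = {101: 5, 102: 6}  # hypothetical sourcing_order_line_id -> unit_cost
txns = [InventoryTransaction(1, 1, 1, 2, sourcing_order_line_id=101),
        InventoryTransaction(2, 1, 1, 10, sourcing_order_line_id=102)]
ids = count(3)
split = allocate_sale_fifo(txns, ids, 1, 1, 3, order_line_id=501)   # -2 from receipt 1, -1 from 2
cost_501 = sale_cost(txns, line_cost, 501)                          # 2*5 + 1*6 = 16
second = allocate_sale_fifo(txns, ids, 1, 1, 4, order_line_id=502)  # -4 from receipt 2
cost_502 = sale_cost(txns, line_cost, 502)                          # 4*6 = 24
```

The margin for a sale is then its revenue (price * quantity from the order line) minus sale_cost.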
Brian is correct. Just to add some information: if you are building a complete system for your business or a client, I would suggest that you start at the organizational level and work down to the POS and accounting processes. That would make your database experience more extensive... :P In my experience in system development, inventory modules always start with stock taking + (purchases - purchase returns) = SKUs available for sale. The POS is not directly attached to the inventory module but is instead reconciled daily by the sales supervisor. Total daily sales quantities are then deducted from the SKUs available for sale. You will also need to work out the costing and pricing modules. Correct normalization of the database is always a must.
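The reconciliation arithmetic above, as a tiny Python sketch with made-up SKU figures: available = stock count + (purchases - purchase returns), then the day's total POS quantities are deducted per SKU. The function names are illustrative only.

```python
def available_for_sale(stock_count, purchases, purchase_returns):
    """Stock taking + (purchases - purchase returns) = quantity available for sale."""
    return stock_count + (purchases - purchase_returns)

def reconcile_day(available, daily_sales):
    """Deduct the day's total POS quantities from what was available, per SKU.

    SKUs sold but not in `available` show up negative instead of being ignored,
    so the supervisor can spot them during reconciliation.
    """
    skus = available.keys() | daily_sales.keys()
    return {sku: available.get(sku, 0) - daily_sales.get(sku, 0) for sku in sorted(skus)}

available = {
    "SKU1": available_for_sale(50, 20, 5),  # 65
    "SKU2": available_for_sale(10, 0, 0),   # 10
}
closing = reconcile_day(available, {"SKU1": 12})
print(closing)  # {'SKU1': 53, 'SKU2': 10}
```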