I'm planning the following table:
While I know this is technically feasible (I just tried it), I wanted to see if it seemed unnecessarily complicated... Basically I'm keeping track of revenues vs. costs.
Tab1 contains revenue data for both Tab2 & Tab3. Tab2 contains its own cost data, so that's easy. But the complication is that Tab3 costs are further allocated across Tab2 units. That's why there's a secondary joinsB table there.
I realize this isn't a concrete question, but I know there are so many more experienced folks who, based on that experience, will have a gut "wow this is too complicated" sense or not about what I'm doing. That's what I'd like input/ feedback on as a gut check before I build this.
EDIT for more clarity
Tab1 = Charges
Tab2 = Reservations
Tab3 = Logistics
joinsA = TypeCharges
joinsB = TypeLogistics
A user pays for a reservation of something, but may also pay to have that something shipped logistically. The two payments are wrapped up in one charge. The complication is that one logistics shipment may contain more than 1 reservation under separate users (i.e., separate charges).
This data structure is designed to help me do two things:
easily track exactly which charge is associated with a given reservation or logistics so that for example I can issue a refund, but for a specific amount (for example, a user may keep the reservation but want to cancel the logistics shipment and pay for it him/herself using another vendor)
easily understand how the cost of a logistics breaks down into allocatable costs by reservation
The latter is why Tab3 and Tab2 are joined, and the through table contains more information on the nature of the type of logistics. The through table also contains the charge amount that the user paid to have a reservation shipped logistically. Tab3 contains the cost of the logistics which is then allocated based on how many reservations there are. Then you can compare that allocated cost against the charge for the specific reservation in the joins table.
It seems that you need to track your charges at reservation + logistics level. If I were to model this I would do it as shown in the below diagram where the Reservations is an entity/table and so is Logistics. These have a many-to-many relation with each other, this is many-to-many relation is materialized using the Charges entity/table. So if you need charges for a reservation, you add up all the charges for a reservation, if you need charges for a logistic you add up all the charges for a logistic/shipment. The base data in the Charges table will contain the lowest level of granularity of the charges and can be rolled up by reservation and/or logistic/shipment.
Hope this helps
Related
I'm working on a datamart for our sales and marketing departments, and I've come across a modeling challenge. Our ERP stores pricing data in a few different ways:
List pricing for each item
A discount percentage from list pricing for a product line, either for groups of customers or for a specific account
A custom price for an item, either for groups of customers or for a specific account
The Pricing department primarily uses this data operationally, not analytically. For example, they generate reports for customers ("What special pricing / discount %s do I have?") and identify which items / item groups need to be changed when they engage in a new pricing strategy.
Pricing changes happen somewhat regularly on a small scale, usually on a customer-by-customer or item-by-item basis. Infrequently, there are large-scale adjustments to list pricing and group pricing (discounts and individual items) in addition to the customer-level discounts.
My head has been in creating one or more fact tables to represent this process. Unfortunately, there's no pre-existing business key for pricing. There's also no specific "transaction date," since the ERP doesn't (accurately) maintain records of when pricing is changed. Essentially, a "pricing event" is going to be a combination of:
Effective date
End date
Item OR product line
(Not required for list price) customer or customer group
A price amount OR discount percentage
A single fact table seems problematic in that I'm going to have to deal with a lot of invalid combinations of dimensions and facts. First, a record will never have both a non-NULL price amount and a non-NULL discount percentage; pricing events are either-or. Second, only certain combinations of dimensions are valid for each fact. For example, a discount percentage will only ever have a product line, not an individual item.
Does it make sense to model pricing as a fact table in the first place? If so, how many tables should I be considering? My intuition is to use at least two, one for the percentages and one for the price amounts, but this still leaves a problem where each record will either have a valid customer group OR a valid customer (or neither, for list prices), since we need to maintain customer-specific pricing separate from any group pricing that customer might have.
You may need to keep them both as attributes and as facts.
The price a certain item was sold for is a fact. When you multiply it by the quantity sold it's actually an additive measure. So, keep it in the fact table. Total discount applied is also additive, I'd keep it. You can later query "how much was discounted in 2019 per customer", which would be much harder to achieve without those facts.
But if you also need to query things like "what's the discount customer X is on", then you should also keep that as an attribute of the customer dimension, and treat it as a type II dimension, so as to keep discount history. If you know when a certain discount was applied, great, if not take the 1st sale as the start date and you won't be too far off.
Maybe the list price can also be kept as an attribute of product or product line in a dimension, but only if they don't change too often; but if most customers get discounts anyway that would be of limited use.
I am developing a BI system for our company, from scratch, and currently, I am designing a data warehouse. I am completely new to this so there are many things that I don't really understand, so I need to hear some more insights into this.
My problems are:
1) In our source system, there are tables called "Booking" and "BookingAccess". Booking table holds the data of a booking, such as check-in time and check-out time, booking date, booking number, gross amount of that booking.
Whereas in BookingAccess, it holds foreign keys related to the booking, such as bookerID, customerID, processID, hotelID, paymentproviderID and a current status of that booking. Booking and BookingAccess has a 1:1 relation ship.
Our source system is about checking the validity of those bookings, these bookings are not ours. We receive these booking information from other sources, outsource the above process for them. The gross amount is just an information of that booking that we need to validate, their are not parts of our business. The current status of a booking which is hold in the BookingAccess table is the current status of that booking in our system, which can be "Processing" or "Finshed".
From what I read from Ralph Kimball, in this situation, the "Booking" is the Dimension table, and the BookingAccess should be the fact. I feel that the BookingAccess is some what a [Accumulating Snapshot table], in which I should track the time when a booking is "Processing", and when a booking is "Finshed".
Do I get it right?
2) In "Booking" table, there is also a foreign key called "ImportID". This key links to a table called "Import". This "Import" table hold history records of files (these file contain bookings which will be written to the "Booking" table) which were imported to our system, including attributes such as file name, imported date, total booking imported...
From my point of view, this is clearly a fact table.
But the problem is that, the "Import" table and the "Booking" table has a relationship of one to many (1 ImportID in "Import" table can have 1, 2 or more records which have a same ImportID in "Booking" table). This is against the idea of fact tables which insists that the relationship between Fact and Dimension must be many-to-one, which fact is always in the many side.
So what approach should I use to solve this case? I'm thinking of using bridge tables to solve this problem. But I don't know if this is a good practice, as there are a lot of record in the "Import" table, so I will have to create a big bridge table just to covers all of this.
3) Should I separate a table (from source system) which contains a mix of relationships and information to a fact table containing only relationships, and dimension table containing only information? (For example, a table called "Customer" in source system. This table contains some things like customer name, customer address and customertype id, customer parentID....)
I am asking this because I feel that if I use BI tools to analyze things (for example, analyzing the number of customers which has customertypeid = 1), I feel it's some what weird if there are no fact tables involved in.
Or should I treat it as a mere dimension table and use snowflake-schema? But this will lead to a mix of Star-Schema and snowflake-schema in our Data Warehouse. Is this normal? I have read some official sources (most likely Oracle) stating that one should try to avoid using and mixing snowflake-schema as much as possible. But some sources like Microsoft say that this is very normal. Even the Advanture Work Data Warehouse sample database uses this kind of approach.
Or should I de-normalize every relation in that "Customer" table? But I don't think this is a good approach as it will make the Customer contain a lot of columns, and it will be very hard to track the history of every row in the "DIM_Customer" table. For example, if any change occur in any relation of "Customer" table, the whole "DIM_Customer" table will need to be updated.
I still have a lot of question regarding to Data Warehouse. I am working with it nearly alone, without any help or consultant. So pardon me if I made any kind of inconveniences or mistakes.
I am in the process of designing this E-R diagram for a shop of which I have shown part of below (the rest is not relevant). See the link please:
E-R diagram
The issue that I have is that the shop only sells two items, Socks and Shoes.
Have I correctly detailed this in my diagram? I'm not sure if my cardinalities and/or my design is correct. A customer has to buy at least one of these items for the order to exist (but has the liberty to buy any number).
The Shoe and Sock entities would have their respective ID attribute, and I am planning to translate to a relational schema like this:
(I forgot to add to my diagram the ORDER_CONTAINS relationship to have an attribute called "Quantity". )
Table: Order_Contains
ORDER_ID | SHOEID | SOCKID | QTY
primary key | FK, could be null |FK, could be null | INT
This clearly won't work since the Qty would be meaningless. Is there a way I can reduce the products to just two products and make all this work?
Having two one-to-many relationships combined into one with nullable fields is a poor design. How would you record an order containing both shoes and socks - a row per shoe with SOCKID set to NULL and vice-versa for socks, or would you combine rows? In the former case the meaning of QTY is clear though it depends on the contents of SHOEID/SOCKID fields, but what would the QTY mean in the latter case? How would you deal with rows where both SHOEID and SOCKID are NULL and the QTY is positive? Keep in mind Murphy's law of databases - if it can be recorded it will be. Worse, your primary key (ORDER_ID) will prevent you from recording more than one row, so a customer couldn't buy more than one (pair of) socks or shoes.
A better design would be to have two separate relations:
Order_Socks (ORDER_ID PK/FK, SOCKID PK/FK, QTY)
Order_Shoes (ORDER_ID PK/FK, SHOEID PK/FK, QTY)
With this, there's only one way to record the contents of an order and it's unambiguous.
You have not explained very well the context here. I'll try to explain from what I understand, and give you some hints.
Do your shop only and always (forever) sell 2 products? Do the details of these products (color, model, weight, width, etc...) need to be persisted in the database? If yes, then we have two entities in the model, SOCKS and SHOES. Each entity has its own properties. A purchase or a order is usually seen as an event on the ERD. If your customers always buys (or order) socks with shoes, then there will always be a link between three entities:
CLIENTS --- SHOES --- SOCKS
This connection / association / relationship is an event, and this would be the purchase (or order).
If a customer can buy separate shoes and socks, then socks and shoes are subtypes of a super entity, called PRODUCTS, and a purchase is an event between CUSTOMERS and PRODUCTS. Here in this case we have a partitioning relationship.
If however, your customers buy separate products, and your store will not sell forever only 2 products, and details of the products are not always the same and will not be saved as columns in a table, then the case is another.
Shoes and socks are considered products, as well as other items that can be considered in future. Thus, we have records/rows in a PRODUCTS table.
When a customer places an order (or a purchase), he (she) is acquiring products. There is a strong link between customers and products here, again usually an event, which would be the purchase (or a order).
I do not know if you do it, but before thinking of start a diagram, type the problem context in a paper or a document. Show all details present in the situation.
The entities are seen when they have properties. If you need to save the name of a customer, the customer's eye color, the customer's e-mail, and so on, then you will have certainly a CUSTOMER entity.
If you see entities relate in some way, then you have a relationship, and you should ask yourself what kind of relationship these entities form. In your case of products and customers, we have a purchasing relationship there between. The established relationship is a purchase (or an order, you call it). One customer can buy various products, and one product (not on the same shelf, is the type, model) can be purchased for several customers, thus, we have a Many-To-Many relationship.
The relationship created changes according to the context. Whatever, we'll invent something crazy here as examples. Say we have customers and products. Say you want to persist a situation where customers lick Products (something really crazy, just for you to see how the context says the relationship).
There would be an intimate connection between customers and products entities (really close... I think...). In this case, the relationship represents a history of customers licking products. This would generate an EVENT. In this event you could put properties such as the date, the amount of times a customer licked a proper product, the weather, the time, the traffic light color on the street, etc., only what you need to persist according to your context, your needs.
Remember that for N-N relationships created, we need to see if new entities (out of relationship) will emerge. This usually happens when you are decomposing the conceptual model to the logical model. Probably, product orders will generate not one but two entities: The ORDER and the products of orders. It is within the products of orders that you place the list of products ordered from each customer, and the quantity.
I would like to present various materials to study ERD, but unfortunately they are all in Portuguese. I hope I have helped you in some way. If you want to be more specific about your problem, I think I can really help you best. Anything, please ask.
I’m trying to create a basic Point of Sale and Inventory management system.
Some things to take into account:
The products are always the same (same ID) through the whole system, but inventory (available units for sale per product) is unique per location. Location Y and Z may both have for sale units of product X, but if, for example, two units are sold from location Y, location Z’s inventory should not be affected. Its stocked units are still intact.
Selling one (1) unit of product X from location Y, means inventory of location Y should subtract one unit from its inventory.
From that, I thought of these tables:
locations
id
name
products
id
name
transactions
id
description
inventories_header
id
location_id
product_id
inventories_detail
inventories_id
transaction_id
unit_cost
unit_price
quantity
orders_header
id
date
total (calculated from orders_detail quantity * price; just for future data validation)
orders_detail
order_id
transaction_id
product_id
quantity
price
Okay, so, are there any questions? Of course.
How do I keep track of changes in units cost? If some day I start paying more for a certain product, I would need to keep track of the marginal utility ((cost*quantity) - (price*quantity) = marginal utility) some way. I thought of inventories_detail mostly for this. I wouldn’t have cared otherwise.
Are relationships well stablished? I still have a hard time thinking if the locations have inventories, or if inventories have several locations. It’s maddening.
How would you keep/know your current stock levels? Since I had to separate the inventory table to keep up with cost updates, I guess I would just have to add up all the quantities stated in inventories_detail.
Any suggestions do you want to share?
I’m sure I still have some questions, but these are mostly the ones I need addressing. Also, since I’m using Ruby on Rails for the first time, actually, as a learning experience, it’s a shame to be stopped at design, not letting me punch through implementation quicker, but I guess that’s the way it should be.
Thanks in advance.
The tricky part here is that you're really doing more than a POS solution. You're also doing an inventory management & basic cost accounting system.
The first scenario you need to address is what accounting method you'll use to determine the cost of any item sold. The most common options would be FIFO, LIFO, or Specific Identification (all terms that can be Googled).
In all 3 scenarios, you should record your purchases of your goods in a data structure (typically called PurchaseOrder, but in this case I'll call it SourcingOrder to differentiate from your orders tables in the original question).
The structure below assumes that each sourcing order line will be for one location (otherwise things get even more complex). In other words, if I buy 2 widgets for store A and 2 for store B, I'd add 2 lines to the order with quantity 2 for each, not one line with quantity 4.
SourcingOrder
- order_number
- order_date
SourcingOrderLine
- product_id
- unit_cost
- quantity
- location_id
Inventory can be one level...
InventoryTransaction
- product_id
- quantity
- sourcing_order_line_id
- order_line_id
- location_id
- source_inventory_transaction_id
Each time a SourcingOrderLine is received at a store, you'll create an InventoryTransaction with a positive quantity and FK references to the sourcing_order_line_id, product_id and location_id.
Each time a sale is made, you'll create an InventoryTransaction with a negative quantity and FK references to the order_line_id, product_id and location_id, source_inventory_transaction_id.
The source_inventory_transaction_id would be a link from the negative quantity InventoryTransaction back to the postiive quantity InventoryTransaction calculated using whichever accounting method you choose.
Current inventory for a location would be SELECT sum(quantity) FROM inventory_transactions WHERE product_id = ? and location_id = ?
GROUP BY product_id, location_id.
Marginal cost would be calculated by tracing back from the sale, through the 2 related inventory transactions to the SourcingOrder line.
NOTE: You have to handle the case where you allocate one order line across 2 inventory transactions because the ordered quantity was larger that what was left in the next inventory transaction to be allocated. This data structure will handle this, but you'll need to work the logic and query yourself.
Brian is correct. Just to add additional info. If you are working into a complete system for your business or client. I would suggest that you start working on the organizational level down to process of POS and accounting. That would make your database experience more extensive... :P In my experience in system development, Inventory modules always start with the stock taking+(purchases-purchase returns)=SKU available for sales. POS is not directly attached to Inventory module but rather will be reconciled daily by the sales supervisor. Total Daily Sales quantities will then be deducted to SKU available for sales. you will work out also the costing and pricing modules. Correct normalization of database is always a must.
I'm designing a Ruby on Rails reservation system for our small tour agency. It needs to accommodate a number of things, and the table structure is becoming quite complex.
Has anyone encountered a similar problem before? What sort of issues might I come up against? And are performance/ validation likely to become issues?
In simple terms, I have a customer table, and a reservations table. When a customer contacts us with an enquiry, a reservation is set up, and related information added (e.g., paid/ invoiced, transport required, hotel required, etc).
So far so good, but this is where is gets complex. Under each reservation, a customer can book different packages (e.g. day trip, long tour, training course). These are sufficiently different, require specific information, and are limited in number, such that I feel they should each have a different model.
Also, a customer may have several people in his party. This would result in links between the customer table and the reservation table, as well as between the customer table and the package tables.
So, if customer A were to make a booking for a long trip for customers A,B and C, and a training course for customer B, it would look something like this.
CUSTOMERS TABLE
CustomerA
CustomerB
CustomerC
CustomerD
CustomerE
etc
RESERVATIONS TABLE
1. CustomerA
LONG TRIP BOOKINGS
CustomerA - Reservation_ID 1
CustomerB - Reservation_ID 1
CustomerC - Reservation_ID 1
TRAINING COURSE BOOKINGS
CustomerB - Reservation_ID 1
This is a very simplified example, and omits some detail. For example, there would be a model containing details of training courses, a model containing details of long trips, a model containing long trip schedules, etc. But this detail shouldn't affect my question.
What I'd like to know is:
1) are there any issues I should be aware of in linking the customer table to the reservations model, as well as to bookings models nested under reservations.
2) is this the best approach if I need to handle information about the reservation itself (including invoicing), as well as about the specific package bookings.
On the one hand this approach seems to be complex, but on the other, simplifying everything into a single package model does not appear to provide enough flexibility.
Please let me know if I haven't explained this issue very clearly, I'm happy to provide more information. Grateful for any ideas, suggestions or comments that would help me think through this rather complex database design.
Many thanks!
I have built a large reservation system for travel operators and wholesalers, and I can tell you that it isn't easy. There seems to be similarity yet still large differences in the kinds of product booked. Also, date-sensitivity is a large difference from other systems.
1) In respect to 'customers' I have typically used different models for representing different concepts. You really have:
a. Person / Company paying for the booking
b. Contact person for emergencies
c. People travelling
a & b seem like the same, but if you have an agent booking, then you might want to separate them.
I typically use a => 'customer' table, then some simple contact-fields for b, and finally for c use a 'passengers' table. These could be setup as different associations to the same model, but I think they are different enough, and I tend to separate them - perhaps use a common address/contact model.
2) I think this is fine, but depends on your needs. If you are building up itineraries for a traveller, then it makes sense to setup 'passengers' on the 'reservation', then for individual itinerary items, with links to which passenger is travelling on/using that item.
This is more complicated, and you must be careful to track dependencies, but the alternative is to not track passenger names, and simply assign quantities to each item (1xAdult, 2xChildren). This later method is great for small bookings, so it seems to depend on if your bookings are simple, or typically built up of longer itineraries.
other) In addition, in respect to different models for different product types, this can work well. However, there tends to be a lot of cross over, so some kind of common 'resource' model might be better -- or some other means of capturing common behaviour.
If I haven't answered your questions, please do ask more specific database design questions, or I can add more detail about specific examples of what I've found works well.
Good luck with the design!