ER Model representing entities not stored in DB and user choice - entity-relationship

I'm trying to create a ER diagram of a simple retail chain type database model. You have your customer, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional... stuff represented in ER diagrams, e.g. in a car dealership, a customer may choose one or more optional extra when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (I.e. order table shows the car ordered, extras chosen and the added cost of extras) in the order relationship?

First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depends on. Some extras might be make or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but supports better integrity via the use of composite FK constraints (i.e. (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_optional_extras to prevent invalid extras from being linked to an order).
Here's an ER diagram for the tables above:

First, as per your thought you can definitely have two kinds of customers. Discount card holders whose details are present with the company and new customers whose details aren't available with the company.
There are three possible ways to achieve what you are trying,
1) Have two different order table in the system(which I personally wouldn't suggest)
2) Have a single Order table in the system and getting the details of those who are a discount card holder.
3) Insert a row in the discount card holder table for new/unregistered customers having only one order table in the system.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
Secondly, to solve your concern, you need to follow normalization. It will reduce the current problem faced and will also make the system redundant free and will make the entities light weighted which will directly impact on the performance when you grow large.
The extra chosen items can be listed in the order against the customer by adding it at the time of generating a bill using foreign key. Dealing with keys will result in fast and robust results instead of storing redundant/repeating details at various places.
By following normalization, the problem can be handled by applying foreign keys wherever you want to refer data to avoid problems or errors.
Preferably NF 4 would be better. Have a look at the following link for getting started with normalization.
http://www.w3schools.in/dbms/database-normalization/

Related

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

Im new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price interval is represented by textual names, such as "$", "$$",
and so on,
and the same price range may have (numeric) values for different companies,
And the same price range (type) can be (or not) assigned to different
companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Company and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If using a surrogate key, you will permit/allow a company to have the same kind of price range more than once, which I believe is not the case.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and an company may have or not one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, so that is not allowed a company to have a price range more than once, set the foreign keys as composite primary key in Price Ranges from Companies.
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships will always become tables, no exception. Differently, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with these special names (Join/Link/Intermediate Tables) serves to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
At where? If you say at the Price Ranges entity, it is rather necessary. Must We not identify records in a table in some way? Here I set what is called a Surrogate Key. If on the other hand, you have a column with unique values for each record in the table, you can also use this column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you refer to Price Ranges from Companies entity, so the question is another ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. The two are foreign keys. You need it so you can relate rows from the two tables... I think you were not referring to that, is not it? If so, no problem, you can comment and ask more questions. I do not care to answer. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows / records. Each line has its own data. Even with this, each line / record needs to be differentiated and identified somehow. That is why we attach to each line an identifier, known as the primary key (this, and this). In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. By obligation, every record MUST have a primary key. Although you can choose which field is a primary key, you are allowed to choose one or more fields to serve as the primary key. When this happens, that is, when more than one field participates/serves as the primary key, we have a table with something called Composite Primary Key. Similarly, it has the ability to identify records. Note that, because of that, primary key values must be unique, otherwise you may have 2 identical records.
This is the basic concept so that we can relate tables to each other, in case, records/rows of tables together. If we have a Company identified by the ID 28 (a line/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store somewhere that relationship (28 <--> 40). This is where the role of intermediate/link/join tables comes in (but only to relationships N-N! For 1-N or N-1 relationships it works similarly, but not identical).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records which relates to other records (for their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have or not a X Price Range.
All price ranges that are given or not to companies.
...
Anyway, you get all this because of the established relationship.
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?

E-R diagram confusion

I am in the process of designing this E-R diagram for a shop of which I have shown part of below (the rest is not relevant). See the link please:
E-R diagram
The issue that I have is that the shop only sells two items, Socks and Shoes.
Have I correctly detailed this in my diagram? I'm not sure if my cardinalities and/or my design is correct. A customer has to buy at least one of these items for the order to exist (but has the liberty to buy any number).
The Shoe and Sock entities would have their respective ID attribute, and I am planning to translate to a relational schema like this:
(I forgot to add to my diagram the ORDER_CONTAINS relationship to have an attribute called "Quantity". )
Table: Order_Contains
ORDER_ID | SHOEID | SOCKID | QTY
primary key | FK, could be null |FK, could be null | INT
This clearly won't work since the Qty would be meaningless. Is there a way I can reduce the products to just two products and make all this work?
Having two one-to-many relationships combined into one with nullable fields is a poor design. How would you record an order containing both shoes and socks - a row per shoe with SOCKID set to NULL and vice-versa for socks, or would you combine rows? In the former case the meaning of QTY is clear though it depends on the contents of SHOEID/SOCKID fields, but what would the QTY mean in the latter case? How would you deal with rows where both SHOEID and SOCKID are NULL and the QTY is positive? Keep in mind Murphy's law of databases - if it can be recorded it will be. Worse, your primary key (ORDER_ID) will prevent you from recording more than one row, so a customer couldn't buy more than one (pair of) socks or shoes.
A better design would be to have two separate relations:
Order_Socks (ORDER_ID PK/FK, SOCKID PK/FK, QTY)
Order_Shoes (ORDER_ID PK/FK, SHOEID PK/FK, QTY)
With this, there's only one way to record the contents of an order and it's unambiguous.
You have not explained very well the context here. I'll try to explain from what I understand, and give you some hints.
Do your shop only and always (forever) sell 2 products? Do the details of these products (color, model, weight, width, etc...) need to be persisted in the database? If yes, then we have two entities in the model, SOCKS and SHOES. Each entity has its own properties. A purchase or a order is usually seen as an event on the ERD. If your customers always buys (or order) socks with shoes, then there will always be a link between three entities:
CLIENTS --- SHOES --- SOCKS
This connection / association / relationship is an event, and this would be the purchase (or order).
If a customer can buy separate shoes and socks, then socks and shoes are subtypes of a super entity, called PRODUCTS, and a purchase is an event between CUSTOMERS and PRODUCTS. Here in this case we have a partitioning relationship.
If however, your customers buy separate products, and your store will not sell forever only 2 products, and details of the products are not always the same and will not be saved as columns in a table, then the case is another.
Shoes and socks are considered products, as well as other items that can be considered in future. Thus, we have records/rows in a PRODUCTS table.
When a customer places an order (or a purchase), he (she) is acquiring products. There is a strong link between customers and products here, again usually an event, which would be the purchase (or a order).
I do not know if you do it, but before thinking of start a diagram, type the problem context in a paper or a document. Show all details present in the situation.
The entities are seen when they have properties. If you need to save the name of a customer, the customer's eye color, the customer's e-mail, and so on, then you will have certainly a CUSTOMER entity.
If you see entities relate in some way, then you have a relationship, and you should ask yourself what kind of relationship these entities form. In your case of products and customers, we have a purchasing relationship there between. The established relationship is a purchase (or an order, you call it). One customer can buy various products, and one product (not on the same shelf, is the type, model) can be purchased for several customers, thus, we have a Many-To-Many relationship.
The relationship created changes according to the context. Whatever, we'll invent something crazy here as examples. Say we have customers and products. Say you want to persist a situation where customers lick Products (something really crazy, just for you to see how the context says the relationship).
There would be an intimate connection between customers and products entities (really close... I think...). In this case, the relationship represents a history of customers licking products. This would generate an EVENT. In this event you could put properties such as the date, the amount of times a customer licked a proper product, the weather, the time, the traffic light color on the street, etc., only what you need to persist according to your context, your needs.
Remember that for N-N relationships created, we need to see if new entities (out of relationship) will emerge. This usually happens when you are decomposing the conceptual model to the logical model. Probably, product orders will generate not one but two entities: The ORDER and the products of orders. It is within the products of orders that you place the list of products ordered from each customer, and the quantity.
I would like to present various materials to study ERD, but unfortunately they are all in Portuguese. I hope I have helped you in some way. If you want to be more specific about your problem, I think I can really help you best. Anything, please ask.

How to create an Order Model with a type field that dictates other fields

I'm building a Ruby on Rails App for a business and will be utilizing an ActiveRecord database. My question really has to do with Database Architecture and really the best way I should organize all the different tables and models within my app. So the App I'm building is going to have a database of orders for an ECommerce Business that sells products through 2 different channels, a subscription service where they pick the products and sell it for a fixed monthly fee and a traditional ECommerce channel, where customers pay for their products directly. So essentially while all of these would be classified as the Order model, there are two types of Orders: Subscription Order and Regular Order.
So initially I thought I would classify all this activity in my Orders Table and include a field 'Type' that would indicate whether it is a subscription order or a regular order. My issue is that there are a bunch of fields that I would need that would be specific to each type. For instance, transaction_id, batch_id and sub_id are all fields that would only be present if that order type was a subscription, and conversely would be absent if the order type was regular.
My question is, would it be in my best interest to just create two separate tables, one for subscription orders and one for regular orders? Or is there a way that fields could only appear conditional on what the Type field is? I would hate to see so many Nil values, for instance, if the order type was a regular order.
Sorry this question isn't as technical as it is just pertaining to best practice and organization.
Thanks,
Sunny
What you've described is a pattern called Single Table Inheritance — aka, having one table store data for different types of objects with different behavior.
Generally, people will tell you not to do it, since it leads to a lot of empty fields in your database which will hurt performance long term. It also just looks gross.
You should probably instead store the data in separate tables. If you want to get fancy, you can try to implement Class Table Inheritance, in which there are actually separate but connected table for each of the child classes. This isn't supported natively by ActiveRecord. This gem and this gem might be able to help you, but I've never used either, so I can't give you a firm recommendation.
I would keep all of my orders in one table. You could create a second table for "subscription order information" that would only contain the columns transaction_id, batch_id and sub_id as well as a primary key to link it back to the main orders table. You would still want to include an order type column in the main database though to make it a little easier when debugging.
Assuming you're using Postgres, I might lean towards an Hstore for that.
Some reading:
http://www.devmynd.com/blog/2013-3-single-table-inheritance-hstore-lovely-combination
https://github.com/devmynd/hstore_accessor
Make an integer column called order_type.
In the model do:
SUBSCRIPTION = 0
ONLINE = 1
...
It'll query better than strings and whenever you want to call one you do Order:SUBSCRIPTION.
Make two+ other tables with a foreign key equal to whatever the ID of the corresponding row in orders.
Now you can keep all shared data in the orders table, for easy querying, and all unique data in the other tables so you don't have bloated models.

How to Manage Dynamic Relationships?

I need help creating an appropriate database structure that will allow me to dynamically create "fields" and "values". I plan on using the following 5 tables.
TraitCategories
Groups
TraitGroupings
People
TraitValues
TraitCategories table holds only categories (i.e. "fields") of traits -- i.e. hair color, height, etc. -- and the categories can be added/removed as desired.
Groups table holds ad hoc/dynamic group labels -- i.e. Asian, South American, etc.
TraitGroupings is the join table for TraitCategories and Groups
The People table will be linked to the Groups table via a foreign key and thus will be assigned various categories (fields) of traits by leveraging the relationship between the Groups and TraitCategories tables.
But the question is, how do I assign per person values to the trait categories/fields?
I was thinking of having each row in the TraitValues table contain person_id and trait_category_id so that there will be a relationship between the TraitValues table and both the People and TraitCategories tables. Does this approach make sense? Will this approach allow me to get trait categories and values via the People table?
You are describing a form of EAV.
I'm not sure how practical this is going to be for representing in Ruby, but in you case, the database model would look similar to this:
(Most non-key fields omitted, for brevity.)
Note how we abundantly use the identifying relationships. This is what lets us propagate GroupId down both sides of the "diamond-shaped" dependency, and merge it into a single field at the bottom, in TraitValue.
This is what ensures a person cannot have a trait, unless it is also listed for that person's group. For example, a person can have a "hair color" only if the person's group has the "hair color" as well.
BTW...
The People table will be linked to the TraitGroupings via a foreign key -- and thus will be assigned various categories (fields) of traits.
If People has a FK that directly references TraitGroupings, then a person can have at most one trait grouping and therefore at most one trait category. From the wording of your question, that desn't appear to be what you want.

To normalize or not to normalize user_ids

In my Rails application, I have a variety of database tables that contain user data. Some of these tables have a lot of rows (as many as 500,000 rows per user in some cases) and are queried frequently. Whenever I query any table for anything, the user_id of the current user is somewhere in the query - either directly, if the table has a direct relation with the user, or through a join, if they are related through some other tables.
Should I denormalize the user_id and include it in every table, for faster performance?
Here's one example:
Address belongs to user, and has a user_id
Envelope belongs to user, and has a user_id
AddressesEnvelopes joins an Address and an Envelope, so it has envelope_id and address_id -- it doesn't have user_id, but could get to it through either the envelope or the address (which must belong to the same user).
One common expensive query is to select all the AddressesEnvelopes for a particular user, which I could accomplish by joining with either Address or Envelope, even though I don't need anything from those tables. Or I could just duplicate the user id in this table.
Here's a different scenario:
Letter belongs to user, and has a user_id
Recepient belongs to Letter, and has a letter_id
RecepientOption belongs to Recepient, and has a recepient_id
Would it make sense to duplicate the user_id in both Recepient and RecepientOption, even though I could always get to it by going up through the associations, through Letter?
Some notes:
There are never any objects that are
shared between users. An entire
hierarchy of related objects always
belongs to the same user.
The user owner of objects never changes.
Database performance is important because it's a data intensive application. There are many queries and many tables.
So should I include user_id in every table so I can use it when creating indexes? Or would that be bad design?
I'd like to point out that it isn't necessary to denormalize, if you are willing to work with composite primary keys. Sample for AddressEnvelop case:
user(
#user_id
)
address(
#user_id
, #addres_num
)
envelope(
#user_id
, #envelope_num
)
address_envelope(
#user_id
, #addres_num
, #envelope_num
)
(the # indicates a primary key column)
I am not a fan of this design if I can avoid it, but considering the fact that you say that all these objects are tied to a user, this type of design would make it relatively simply to partition your data (either logically, put ranges of users in separate tables or physically, using multiple databases or even machines)
Another thing that would make sense with this type of design is using clustered indexes (in MySQL, the primary key of InnoDB tables are built from a clustered index). If you ensure the user_id is always the first column in your index, it will ensure that for each table, all data for one user is stored close together on disk. This is great when you always query by user_id, but it can hurt perfomance if you query by another object (in which case duplication like you sugessted may be a better solution)
At any rate, before you change the design, first make sure your schema is already optimized, and you have proper indexes on your foreign key columns. If performance really is paramount, you should simply try several solutions and do benchmarks.
As long as you
a) get a measurable performance improvement
and
b) know which parts of your database are real normalized data and which are redundant improvements
there is no reason not to do it!
Do you actually have a measured performance problem? 500 000 rows isn't very large table. Your selects should be reasonable fast if they are not very complex and you have proper indexes on your columns.
I would first see if there are slow queries and try to optimize them with indexes. If that is not enough, only then I would look into denormalization.
Denormalizations that you suggest seem reasonable if you can't achieve the required performance with other means. Just make sure that you keep denormalized fields up-to-date.

Resources