I need help creating an appropriate database structure that will allow me to dynamically create "fields" and "values". I plan on using the following 5 tables.
TraitCategories
Groups
TraitGroupings
People
TraitValues
TraitCategories table holds only categories (i.e. "fields") of traits -- e.g. hair color, height, etc. -- and the categories can be added/removed as desired.
Groups table holds ad hoc/dynamic group labels -- e.g. Asian, South American, etc.
TraitGroupings is the join table for TraitCategories and Groups
The People table will be linked to the Groups table via a foreign key and thus will be assigned various categories (fields) of traits by leveraging the relationship between the Groups and TraitCategories tables.
But the question is, how do I assign per person values to the trait categories/fields?
I was thinking of having each row in the TraitValues table contain person_id and trait_category_id so that there will be a relationship between the TraitValues table and both the People and TraitCategories tables. Does this approach make sense? Will this approach allow me to get trait categories and values via the People table?
You are describing a form of EAV.
I'm not sure how practical this is going to be to represent in Ruby, but in your case, the database model would look similar to this:
(Most non-key fields omitted, for brevity.)
Note how we abundantly use the identifying relationships. This is what lets us propagate GroupId down both sides of the "diamond-shaped" dependency, and merge it into a single field at the bottom, in TraitValue.
This is what ensures a person cannot have a trait, unless it is also listed for that person's group. For example, a person can have a "hair color" only if the person's group has the "hair color" as well.
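As a rough illustration of that "diamond", here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (person_groups, person_no, trait_value, and so on) are my own assumptions, not taken from the diagram, and SQLite is used only because it runs anywhere. The point is that trait_values carries a single group_id that takes part in both composite foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE person_groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE trait_categories (
    trait_category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Join table: which trait categories are available to which group.
CREATE TABLE trait_groupings (
    group_id INTEGER NOT NULL REFERENCES person_groups (group_id),
    trait_category_id INTEGER NOT NULL
        REFERENCES trait_categories (trait_category_id),
    PRIMARY KEY (group_id, trait_category_id));
-- Identifying relationship: group_id is part of the person's key.
CREATE TABLE people (
    group_id INTEGER NOT NULL REFERENCES person_groups (group_id),
    person_no INTEGER NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (group_id, person_no));
-- Bottom of the diamond: one group_id column shared by both FKs.
CREATE TABLE trait_values (
    group_id INTEGER NOT NULL,
    person_no INTEGER NOT NULL,
    trait_category_id INTEGER NOT NULL,
    trait_value TEXT,
    PRIMARY KEY (group_id, person_no, trait_category_id),
    FOREIGN KEY (group_id, person_no)
        REFERENCES people (group_id, person_no),
    FOREIGN KEY (group_id, trait_category_id)
        REFERENCES trait_groupings (group_id, trait_category_id));
INSERT INTO person_groups VALUES (1, 'Group A');
INSERT INTO trait_categories VALUES (1, 'hair color'), (2, 'height');
INSERT INTO trait_groupings VALUES (1, 1);  -- Group A has hair color only
INSERT INTO people VALUES (1, 1, 'Alice');
INSERT INTO trait_values VALUES (1, 1, 1, 'brown');
""")

# "height" is not listed for Alice's group, so this insert should be rejected.
try:
    conn.execute("INSERT INTO trait_values VALUES (1, 1, 2, '170 cm')")
    height_rejected = False
except sqlite3.IntegrityError:
    height_rejected = True

# Trait categories and values reached by starting from people.
traits = conn.execute("""
    SELECT p.name, c.name, v.trait_value
    FROM people p
    JOIN trait_values v
      ON v.group_id = p.group_id AND v.person_no = p.person_no
    JOIN trait_categories c
      ON c.trait_category_id = v.trait_category_id
""").fetchall()
```

The last query is one way to answer the original question of getting trait categories and values via the People table.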
BTW...
The People table will be linked to the TraitGroupings via a foreign key -- and thus will be assigned various categories (fields) of traits.
If People has a FK that directly references TraitGroupings, then a person can have at most one trait grouping and therefore at most one trait category. From the wording of your question, that doesn't appear to be what you want.
Consider two tables Foo and Bar and consider models based on them. Now consider a one-to-one relationship between them.
Foo contains has_one :bar in its declaration so that we're able to access Bar from Foo's objects. But what I don't understand is why Bar needs a foreign key referencing Foo.
Shouldn't it be easier if they just compare both the ids to get the result?
I'm assuming that there will be problems with comparing both ids and I want to know what the problems are.
The problem with ids is that they store auto-incremented values. Let's consider 2 tables students and projects.
Let's assume a student can have at most 1 project. Which means he can either have a project or not.
Now consider 2 students A & B.
students table
id name
1 A
2 B
now projects table
id name
1 P1
2 NULL
In this case A has a project named P1 but B doesn't, and we're creating a NULL entry just so that the ids of the records in projects match those in students. This is not feasible in the long term: if a school has 1000 students, we may end up with 500 empty rows for the 500 students who are not working on a project.
That's why adding a column to the projects table is a better solution: it reduces the size of the table while still maintaining the relationship. Also, ids are auto-incremented, so if you delete a record and create it again, the new id won't be the same as the previous one.
now projects table
id name student_id
1 P1 1
This is more feasible and more flexible. You can also make it has_many, because a student can work on multiple projects.
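To make the student_id version concrete, here is a small sketch with Python's sqlite3 module (the database engine and the UNIQUE constraint are my assumptions; the table contents are the ones from the example). The UNIQUE constraint on student_id is what makes it one project per student at most; dropping it would give the has_many case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- UNIQUE keeps the relationship one-to-one (has_one).
    student_id INTEGER NOT NULL UNIQUE REFERENCES students (id));
INSERT INTO students (id, name) VALUES (1, 'A'), (2, 'B');
INSERT INTO projects (id, name, student_id) VALUES (1, 'P1', 1);
""")

# LEFT JOIN keeps student B even though no project row exists for B,
# so no placeholder NULL row is needed in projects.
rows = conn.execute("""
    SELECT s.name, p.name
    FROM students s
    LEFT JOIN projects p ON p.student_id = s.id
    ORDER BY s.id
""").fetchall()

project_count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
```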
I hope this helps you.
You can't assume that the DB engine will assign the same IDs to rows in different tables. You could (I would not recommend it) build an app with such behavior and implement it with triggers and constraints, but this would be a very creative (in a negative sense) approach to relational databases.
I'm trying to create a ER diagram of a simple retail chain type database model. You have your customer, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional things represented in ER diagrams? E.g. in a car dealership, a customer may choose one or more optional extras when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (i.e. the order table shows the car ordered, the extras chosen and the added cost of the extras) in the order relationship?
First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depend on. Some extras might be make or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but supports better integrity via the use of composite FK constraints (i.e. (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_extras to prevent invalid extras from being linked to an order).
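Here is a runnable sketch of those composite FK constraints using Python's sqlite3 module. It is only an approximation of the tables above: I renamed order to orders (ORDER is a reserved word), left out the customer and payment columns, added the UNIQUE constraints that the composite FKs need as targets, and made up the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT, price INTEGER);
CREATE TABLE car_extras (
    extra_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car (car_id),
    option_name TEXT,
    price INTEGER,
    UNIQUE (car_id, extra_id));   -- target of a composite FK
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car (car_id),
    UNIQUE (order_id, car_id));   -- target of a composite FK
CREATE TABLE order_extras (
    order_id INTEGER NOT NULL,
    car_id INTEGER NOT NULL,
    extra_id INTEGER NOT NULL,
    PRIMARY KEY (order_id, extra_id),
    FOREIGN KEY (order_id, car_id) REFERENCES orders (order_id, car_id),
    FOREIGN KEY (car_id, extra_id) REFERENCES car_extras (car_id, extra_id));
INSERT INTO car VALUES (1, 'hatchback', 20000), (2, 'pickup', 30000);
INSERT INTO car_extras VALUES (10, 1, 'sunroof', 900), (20, 2, 'tow bar', 400);
INSERT INTO orders VALUES (100, 1);
INSERT INTO order_extras VALUES (100, 1, 10);
""")

def try_insert(row):
    """Return True if the order_extras row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO order_extras VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

wrong_extra = try_insert((100, 1, 20))  # extra 20 belongs to car 2
wrong_car = try_insert((100, 2, 20))    # order 100 is for car 1

# Price totals are derived with an aggregate query instead of being stored.
total = conn.execute("""
    SELECT c.price + COALESCE(SUM(e.price), 0)
    FROM orders o
    JOIN car c ON c.car_id = o.car_id
    LEFT JOIN order_extras oe ON oe.order_id = o.order_id
    LEFT JOIN car_extras e ON e.extra_id = oe.extra_id
    WHERE o.order_id = 100
    GROUP BY o.order_id, c.price
""").fetchone()[0]
```

Without car_id in order_extras, neither of the two rejected inserts could be caught by a plain FK constraint.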
Here's an ER diagram for the tables above:
First, as per your thought you can definitely have two kinds of customers. Discount card holders whose details are present with the company and new customers whose details aren't available with the company.
There are three possible ways to achieve what you are trying to do:
1) Have two different order tables in the system (which I personally wouldn't suggest).
2) Have a single Order table in the system and link an order to customer details only when the customer is a discount card holder.
3) Have a single order table in the system and insert a row in the discount card holder table for new/unregistered customers.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
Secondly, to address your concern, you should normalize the schema. Normalization reduces the problem you are facing, removes redundancy, and keeps the entities lightweight, which helps performance as the data grows.
The chosen extras can be listed against the customer's order by adding them, via foreign keys, when the bill is generated. Working with keys gives fast and robust results, instead of storing redundant or repeated details in various places.
With a normalized schema, you use foreign keys wherever one table needs to refer to data in another, which avoids inconsistencies and errors.
Preferably, aim for 4NF. Have a look at the following link for getting started with normalization.
http://www.w3schools.in/dbms/database-normalization/
I'm new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price range is represented by a textual name, such as "$", "$$", and so on,
and the same price range may have different (numeric) values for different companies,
and the same price range (type) may or may not be assigned to different companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Companies and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If you use a surrogate key without a unique constraint on the two foreign keys, a company could have the same kind of price range more than once, which I believe is not what you want.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and a company may or may not have one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, to prevent a company from having the same price range more than once, set the foreign keys as a composite primary key in Price Ranges from Companies.
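For what it's worth, the three tables could be sketched like this with Python's sqlite3 module. The snake_case table names, the label column and the sample company are assumptions of mine; in Rails you would normally get something similar from migrations instead of writing SQL by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price_ranges (
    price_range_id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
-- "Price Ranges from Companies": both FKs together form the primary key.
CREATE TABLE company_price_ranges (
    company_id INTEGER NOT NULL REFERENCES companies (company_id),
    price_range_id INTEGER NOT NULL REFERENCES price_ranges (price_range_id),
    PRIMARY KEY (company_id, price_range_id));
INSERT INTO companies VALUES (28, 'Example Diner');
INSERT INTO price_ranges VALUES
    (1, '$'), (2, '$$'), (3, '$$$'), (4, '$$$$'), (5, '$$$$$');
INSERT INTO company_price_ranges VALUES (28, 1), (28, 4);
""")

# The composite primary key rejects assigning the same range twice.
try:
    conn.execute("INSERT INTO company_price_ranges VALUES (28, 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

labels = [row[0] for row in conn.execute("""
    SELECT r.label
    FROM company_price_ranges cr
    JOIN price_ranges r ON r.price_range_id = cr.price_range_id
    WHERE cr.company_id = 28
    ORDER BY r.price_range_id
""")]
```

Compared with five boolean columns, adding a sixth price range here is just one more row in price_ranges.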
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships always become tables, no exception. In contrast, 1-1 and 1-N (or N-1) relationships do not become tables. Tables with these special names (Join/Link/Intermediate Tables) serve to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
Where? If you mean in the Price Ranges entity, then yes, it is necessary. Don't we have to identify the records in a table in some way? Here I used what is called a Surrogate Key. If, on the other hand, you already have a column with unique values for each record in the table, you can use that column instead. I highly recommend that you consider using surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If, on the other hand, you are referring to the Price Ranges from Companies entity, then the answer is different. Here we have a table with two columns, as I told you, and both are foreign keys. You need them so you can relate rows from the two tables. I don't think that is what you were referring to, is it? If so, no problem: you can comment and ask more questions. I don't mind answering. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows (records), and each row has its own data. Even so, each row needs to be differentiated and identified somehow. That is why we attach an identifier to each row, known as the primary key. In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. Every record MUST have a primary key value. You are not limited to a single field: you may choose one or more fields to serve as the primary key. When more than one field serves as the primary key, the table has what is called a Composite Primary Key, which identifies records in the same way. Note that primary key values must therefore be unique, otherwise two records would be indistinguishable.
This is the basic concept that lets us relate tables to each other, or more precisely, relate rows of different tables. If we have a Company identified by the ID 28 (one row), and we want to relate it to a Price Range identified by the ID 40, then we need to store that relationship (28 <--> 40) somewhere. This is where intermediate/link/join tables come in (but only for N-N relationships! 1-N or N-1 relationships work similarly, but not identically).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
Because this table stores records that relate other records to each other (through their primary keys), we can perform a SQL join operation. Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have, or do not have, a given Price Range X.
All price ranges that are, or are not, assigned to companies.
...
Anyway, you get all this because of the established relationship.
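A few of the joins listed above, sketched with Python's sqlite3 module. The IDs 28 and 40 are the ones from the example; company 29, price range 41, the names and the table layout are assumptions added so that every query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE price_ranges (price_range_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE company_price_ranges (
    company_id INTEGER NOT NULL REFERENCES companies (company_id),
    price_range_id INTEGER NOT NULL REFERENCES price_ranges (price_range_id),
    PRIMARY KEY (company_id, price_range_id));
INSERT INTO companies VALUES (28, 'Company A'), (29, 'Company B');
INSERT INTO price_ranges VALUES (40, '$'), (41, '$$');
INSERT INTO company_price_ranges VALUES (28, 40);  -- the 28 <--> 40 link
""")

def first_column(sql, *params):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]

# All companies that have price ranges (inner join).
with_ranges = first_column("""
    SELECT DISTINCT c.company_id FROM companies c
    JOIN company_price_ranges cr ON cr.company_id = c.company_id
    ORDER BY c.company_id""")

# All companies that do not have price ranges (left join, keep the misses).
without_ranges = first_column("""
    SELECT c.company_id FROM companies c
    LEFT JOIN company_price_ranges cr ON cr.company_id = c.company_id
    WHERE cr.company_id IS NULL
    ORDER BY c.company_id""")

# All the price ranges of a given company.
ranges_of_28 = first_column("""
    SELECT r.label FROM company_price_ranges cr
    JOIN price_ranges r ON r.price_range_id = cr.price_range_id
    WHERE cr.company_id = ?""", 28)

# All price ranges that are not assigned to any company.
unassigned = first_column("""
    SELECT r.label FROM price_ranges r
    LEFT JOIN company_price_ranges cr ON cr.price_range_id = r.price_range_id
    WHERE cr.price_range_id IS NULL""")
```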
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?
I am building an app that has the following requirements:
-> A User can be a player of different teams.
-> A Team can be of a sport type.
My question is:
-> Since for each sport type I want to store different information of a Player, what would be the best way to model that?
I have thought about having several models (and tables), one for each kind of Sport, for example:
Basketball_Players, Football_Players and so on, but I am not sure if that would be a good approach. How do you usually do this on RoR?
I'd say you have two options, and I don't know that it's really possible to say which is the "most correct" way to do it without knowing the details of the requirements of your application.
What's a given is that you'll have a sport table and a player table. I can say that for sure. The question is how you connect the two.
Option 1: a single join table
You could have a table called player_sport (or whatever) with a player_id column, a sport_id column, and a serialized_player_data column or something like that, where you'd keep serialized player data (JSON, perhaps) depending on the sport. Pros: simple schema. Cons: not properly normalized, and therefore subject to inconsistencies.
Option 2: a separate join table for each sport
This is what you alluded to in your question, where you have a basketball_player, football_player, etc. Here you'd also have a player_id column but probably not a sport_id column because that would be redundant now that you're specifying the sport right in the table name. The need to have a serialized_player_data column would go away, since you'd now be free to store the needed attributes directly in columns, e.g. wrestling_player.weight_class_id or whatever. Pros: proper normalization. Cons: more complex schema, and therefore more work in your application code.
There's actually a third option as well:
Option 3: a combination of 1 and 2
Here you might do everything you would do in Option 2, except you'd move the common player attributes to the player_sport table and save basketball_player, etc. for the sport-specific attributes. So weight_class_id would stay in wrestling_player but player_sport would have height, weight, and other columns that are relevant to all sports.
If you're looking for a recommendation, I would probably do Option 2, or, if it looks like there's enough overlap for it to make sense, Option 3.
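To give an idea of what Option 3 might look like as tables, here is a rough sketch using Python's sqlite3 module. Apart from player_sport, wrestling_player, basketball_player and weight_class_id, the column names (height_cm, weight_kg, court_position) and the sample rows are invented for illustration, and in a Rails app these would be migrations and models, not raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sport (sport_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Attributes relevant to every sport live on the join table.
CREATE TABLE player_sport (
    player_id INTEGER NOT NULL REFERENCES player (player_id),
    sport_id INTEGER NOT NULL REFERENCES sport (sport_id),
    height_cm INTEGER,
    weight_kg INTEGER,
    PRIMARY KEY (player_id, sport_id));
-- Sport-specific attributes get one table per sport.
CREATE TABLE wrestling_player (
    player_id INTEGER PRIMARY KEY REFERENCES player (player_id),
    weight_class_id INTEGER NOT NULL);
CREATE TABLE basketball_player (
    player_id INTEGER PRIMARY KEY REFERENCES player (player_id),
    court_position TEXT NOT NULL);
INSERT INTO player VALUES (1, 'Sam');
INSERT INTO sport VALUES (1, 'wrestling'), (2, 'basketball');
INSERT INTO player_sport VALUES (1, 1, 180, 82), (1, 2, 180, 80);
INSERT INTO wrestling_player VALUES (1, 3);
INSERT INTO basketball_player VALUES (1, 'guard');
""")

# Common attributes come from player_sport, the rest from the sport's table.
wrestling_profile = conn.execute("""
    SELECT p.name, ps.height_cm, ps.weight_kg, w.weight_class_id
    FROM player p
    JOIN player_sport ps ON ps.player_id = p.player_id
    JOIN sport s ON s.sport_id = ps.sport_id AND s.name = 'wrestling'
    JOIN wrestling_player w ON w.player_id = p.player_id
""").fetchone()
```

The cost mentioned under Option 2 shows up here too: every new sport means a new table and a new join in the application code.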
I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done, which worked pretty well, was to have a table that is only used for search and gets a copy of whatever fields need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
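For reference, the second approach (a search-only table kept up to date by triggers) could be sketched roughly as below with Python's sqlite3 module. This is not the setup I actually used: the listing_search table, the trigger names and the choice of which fields to copy are all assumptions, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE cars (
    car_id INTEGER PRIMARY KEY, listing_id INTEGER, make TEXT, model TEXT);
CREATE TABLE computers (
    computer_id INTEGER PRIMARY KEY, listing_id INTEGER, cpu TEXT, ram TEXT);
-- Search-only table holding a copy of the searchable fields.
CREATE TABLE listing_search (
    listing_id INTEGER PRIMARY KEY, search_text TEXT);

CREATE TRIGGER cars_ai AFTER INSERT ON cars BEGIN
    INSERT INTO listing_search
    VALUES (NEW.listing_id, NEW.make || ' ' || NEW.model);
END;
CREATE TRIGGER cars_au AFTER UPDATE ON cars BEGIN
    UPDATE listing_search
    SET search_text = NEW.make || ' ' || NEW.model
    WHERE listing_id = NEW.listing_id;
END;
CREATE TRIGGER computers_ai AFTER INSERT ON computers BEGIN
    INSERT INTO listing_search
    VALUES (NEW.listing_id, NEW.cpu || ' ' || NEW.ram);
END;

INSERT INTO listings VALUES (1, 'family car'), (2, 'gaming PC');
INSERT INTO cars VALUES (1, 1, 'Honda', 'Civic');
INSERT INTO computers VALUES (1, 2, 'Ryzen 7', '16GB');
UPDATE cars SET model = 'Accord' WHERE car_id = 1;
""")

def search(term):
    """Global search: one query against one table, whatever the category."""
    sql = ("SELECT listing_id FROM listing_search "
           "WHERE search_text LIKE ? ORDER BY listing_id")
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]
```

Here search("accord") only touches listing_search, so the number of categories doesn't affect the global search query, at the cost of one set of triggers per child table.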
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
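What I mean by the same "domain" is that the child table can use listing_id as its own primary key instead of a separate car_id. A minimal sketch with Python's sqlite3 module (the sample values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE listings (
    listing_id INTEGER PRIMARY KEY, user_id INTEGER, description TEXT);
-- The child's primary key is also its foreign key to the parent.
CREATE TABLE cars (
    listing_id INTEGER PRIMARY KEY REFERENCES listings (listing_id),
    make TEXT, model TEXT, num_doors INTEGER, horsepower INTEGER);
INSERT INTO listings VALUES (7, 1, 'Low mileage');
INSERT INTO cars VALUES (7, 'Honda', 'Civic', 4, 158);
""")

# A car row cannot exist without its parent listing row.
try:
    conn.execute("INSERT INTO cars VALUES (8, 'Ford', 'Focus', 4, 160)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

car = conn.execute("""
    SELECT l.description, c.make, c.model
    FROM listings l JOIN cars c USING (listing_id)
    WHERE l.listing_id = 7
""").fetchone()
```

This also guarantees at most one cars row per listing, which a separate car_id plus a plain listing_id column would not.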
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).