Rails 3: what exactly does "a model with a uniquely indexed column" mean?

"a model with a uniquely indexed column"
Does this just mean a model with a uniqueness validation on one of its columns? Or does it mean the column needs an add_index call in the migration?
Could you also explain what exactly add_index does? For example, if you have an Authors model with a name column, what does adding an index to 'name' accomplish?
Thanks.

I am taking it to mean that the model has a column that is guaranteed to be unique and that there is an index on it. I take it you are reading about models in general in Rails.
A unique column means that no two records (such as User1 and User2) can have the same value for that column. For example, users would have unique logins: no two users should exist with the same login (or username or email). Rails also automatically gives every model an ID column that is always unique. Unless you change it, the first record will have ID 1, then 2, then 3, etc.
An index on a column means that it is faster to find rows by that column's value. Think of an encyclopedia. There is so much information in there, but the index at the back helps you quickly find what you are looking for: it lists key terms and tells you where to find each one. That's what an index on a column does.
So "a model with a uniquely indexed column" in Rails is, by default, any model through its ID column: it is unique and automatically gets an index so records can be found more quickly. For any other column, a uniqueness validation by itself does not create an index; the index comes from add_index in a migration, with the unique option set to true.
Extra: when you make a model with a foreign key (example: model User may have a gender_id, and you may have a table called Gender that defines Male and Female and the gender_id corresponds to a Gender object), then you should add an index to that foreign key to make searches on it faster.
More information: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index
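To make the encyclopedia analogy concrete, here is a minimal plain-Ruby sketch (not Rails code). A Hash stands in for a unique index on a hypothetical name column: it lets a lookup jump straight to the row, and it rejects a second row with the same value. The class, method names, and sample data are all made up for illustration.

```ruby
# Plain-Ruby analogy, not Rails: a Hash plays the role of a unique index
# on the hypothetical "name" column of an authors table.
Author = Struct.new(:id, :name)

class AuthorsTable
  class DuplicateName < StandardError; end

  def initialize
    @rows = []        # the table itself, in insertion order
    @name_index = {}  # the "index": name => row
  end

  def insert(name)
    # A unique index rejects a second row with the same value.
    raise DuplicateName, "name already taken: #{name}" if @name_index.key?(name)

    row = Author.new(@rows.size + 1, name)
    @rows << row
    @name_index[name] = row
    row
  end

  # Without an index: scan rows one by one until one matches.
  def find_by_name_scan(name)
    @rows.find { |row| row.name == name }
  end

  # With an index: go directly to the row.
  def find_by_name_indexed(name)
    @name_index[name]
  end
end

authors = AuthorsTable.new
authors.insert("Alice")
authors.insert("Bob")
```

In a real Rails app the database does this work for you once the migration calls add_index on the column with unique: true; the sketch only shows why the lookup gets cheaper and why duplicates fail.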

Related

Split fact table because of one missing foreign key?

Imagine that we have two different messages:
CarDataLog
CarStatusLog
CarDataLog contains data about a car and is directly related to both the car and the corresponding Person.
CarStatusLog contains data about the same car (the one whose CarDataLog included a customer), but this time the data is a status, for example a field like "CleaningState": "NotCleaned" or "Cleaned".
Both log messages contain a Car_ID. Should we create one fact table with foreign keys to Car and Person, at the risk that person_id is sometimes null because it is not given? Or would it be better to create two fact tables, at the risk of spreading the grain across them?
The use case would be: get data for a specific car, including the states it had and the Person first name.
I am new to data warehousing and I hope someone can assist me with this issue.
A standard practice in data warehousing is to add a dummy row to each dimension table that is used to match "UNKNOWN" data. This prevents NULLs in the foreign keys of the fact table.
Depending on your use case, you may have multiple types of "UNKNOWN" data. For example, you could use a key of -1 for "UNKNOWN" and -2 for "NOT APPLICABLE" dimensional data.
See also: https://www.kimballgroup.com/2010/10/design-tip-128-selecting-default-values-for-nulls/
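As a rough illustration of the dummy-row idea, the following plain-Ruby sketch resolves the person foreign key for a fact row so that it is never nil. The keys -1 and -2 come from the answer above; the constant names, the PERSON_DIM contents, and the rule that status logs never carry a person are assumptions made up for the example.

```ruby
# Reserved surrogate keys for the special dimension rows.
UNKNOWN_KEY = -1
NOT_APPLICABLE_KEY = -2

# Hypothetical person dimension: source-system id => surrogate key.
PERSON_DIM = { "P-100" => 1, "P-200" => 2 }.freeze

# Resolve the person foreign key for one fact row so it is never nil.
def person_key(log_type, source_person_id)
  # Assumption: status logs never carry a person, so the dimension does not apply.
  return NOT_APPLICABLE_KEY if log_type == :status

  # Missing or unrecognized ids fall back to the UNKNOWN row.
  PERSON_DIM.fetch(source_person_id, UNKNOWN_KEY)
end
```

With this in place, a single fact table can hold rows from both logs without NULL foreign keys, and queries can still tell "we do not know the person" apart from "a person does not apply here".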
You need dimensions such as Car_dim, Person_dim, Status_dim (holding CleaningState values such as "NotCleaned" or "Cleaned"), and Date_dim. Person_dim can have a row with an "Unknown" person name for when you get a null person name.
Dim and Fact tables have a parent/child relationship, which means you have to load data into the Dim tables first (Dim is the parent) and then load the Fact (child) table.
Load the dim IDs from the Dims above into your Fact table based on the data you get. Make sure both logs have date fields, so you can join them on Car_id where the dates in both logs match for that Car_id.
If a Car_id exists in CarDataLog but not in CarStatusLog, you need a row of "Unknown Status" in Status_dim so you can use it in the Fact table. Good luck!
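The join described above could look roughly like this in plain Ruby. The log field names, the sample rows, and the surrogate key values are invented for illustration; a real load would run in your ETL tool or SQL rather than in Ruby arrays.

```ruby
# Hypothetical log records; field names are assumptions.
car_data_logs = [
  { car_id: 7, date: "2024-01-01", person_id: 1 },
  { car_id: 7, date: "2024-01-02", person_id: nil },
  { car_id: 9, date: "2024-01-01", person_id: 2 }
]
car_status_logs = [
  { car_id: 7, date: "2024-01-01", cleaning_state: "Cleaned" },
  { car_id: 7, date: "2024-01-02", cleaning_state: "NotCleaned" }
]

# Dimension lookups exist before the fact rows are built (parents first).
STATUS_DIM = { "Unknown Status" => -1, "NotCleaned" => 1, "Cleaned" => 2 }.freeze
UNKNOWN_PERSON_KEY = -1

# Index the status log by [car_id, date] so each data row can find its match.
status_by_car_and_date = car_status_logs.to_h do |log|
  [[log[:car_id], log[:date]], log[:cleaning_state]]
end

fact_rows = car_data_logs.map do |log|
  # No matching status log for this car and date: use the "Unknown Status" row.
  state = status_by_car_and_date.fetch([log[:car_id], log[:date]], "Unknown Status")
  {
    car_id: log[:car_id],
    date: log[:date],
    person_key: log[:person_id] || UNKNOWN_PERSON_KEY,
    status_key: STATUS_DIM.fetch(state)
  }
end
```

Car 9 has no status log entry in this sample, so its fact row points at the "Unknown Status" dimension row instead of holding a NULL.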

Add references to a chosen column different from id

Is it possible to add references to a column different from the id column?
Usually when a relationship between two models (Model1 and Model2) is created, the use of model1:references and model2:references for the creation of the Relationship model automatically adds a model1_id and model2_id column (along with an index and a foreign key reference) for use in the model1/model2 association:
rails generate model Relationship model1:references model2:references
Say for instance Model1 = Teacher and Model2 = Pupil.
Suppose that Model2's records (pupils' records) are updated every now and then with a rake task: the values of its attributes (for instance name and school_credits) would change, preserving id and ranking (1 to 100).
Associating a teacher with a pupil_id would not make much sense.
Each teacher should instead be associated with his/her pupils' names, using the attribute pupil.name rather than pupil.id as the foreign key reference.
Is that possible?
What options can I add to the rails generate model Relationship command, or what reference am I supposed to add, to get this result?
Yes, you can. Check the sections on foreign_key and primary_key at the following link. I don't use the generator, so I cannot comment on which options to pass to it, but you just need to ensure that the column to be used as the foreign key exists in your table and that you set the appropriate foreign_key (and primary_key) options on the associations in the model files.
But why do you need it? I don't understand what kind of use case you might have that would require you to keep id and ranking identical.

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

I'm new to Rails and I'm in the middle of sketching up an ERD for my new app. It is a Yelp-like app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
a price range is represented by a textual name, such as "$", "$$", and so on,
the same price range may have different (numeric) values for different companies,
and the same price range (type) may or may not be assigned to different companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Companies and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them together as a composite primary key, or use a surrogate key. With a surrogate key alone (and no unique constraint on the pair of foreign keys), a company could be given the same price range more than once, which I believe is not what you want.
Let's look at another situation, where things are simpler:
If there is no need to store prices,
and a company may or may not have one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This allows a company to have zero, one, two, (etc.) or all price ranges defined. Again, to prevent a company from having the same price range more than once, set the foreign keys as a composite primary key in Price Ranges from Companies.
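A small plain-Ruby sketch may help show what the composite primary key buys you. Each row of the link table is just a pair of ids, and a Set refuses to store the same pair twice, which is the behaviour the composite key enforces in the database. The class and method names here are hypothetical.

```ruby
require "set"

# Each row of the link table is a [company_id, price_range_id] pair,
# and the pair itself acts as the composite primary key.
class CompanyPriceRanges
  def initialize
    @rows = Set.new
  end

  # Returns true if the row was stored, false if that pair already exists.
  def assign(company_id, price_range_id)
    !@rows.add?([company_id, price_range_id]).nil?
  end

  # All price range ids linked to one company, in ascending order.
  def price_ranges_for(company_id)
    @rows.select { |cid, _| cid == company_id }.map { |_, rid| rid }.sort
  end
end

link = CompanyPriceRanges.new
link.assign(28, 1)  # company 28 gets price range 1
link.assign(28, 4)  # company 28 also gets price range 4
link.assign(28, 4)  # same pair again: not stored a second time
```

A company with no rows in the link table simply has no price ranges, which is how the optional side of the N-N relationship shows up.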
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, this relationship becomes a table. When decomposed, N-N relationships always become tables, no exception. In contrast, 1-1 and 1-N (or N-1) relationships do not become tables. Tables with these special names (Join/Link/Intermediate Tables) serve to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
Where? If you mean in the Price Ranges entity, it is indeed necessary. Must we not identify records in a table in some way? Here I used what is called a Surrogate Key. If, on the other hand, you have a column with unique values for each record in the table, you can also use that column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you are referring to the Price Ranges from Companies entity, then the question is a different one ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. Both are foreign keys. You need them so you can relate rows from the two tables... I think you were not referring to that, were you? If so, no problem, you can comment and ask more questions. I do not mind answering. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows/records. Each row has its own data. Even so, each row needs to be differentiated and identified somehow. That is why we attach to each row an identifier, known as the primary key. In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. Every record MUST have a primary key. You can choose which field is the primary key, and you are allowed to choose more than one field to serve as it. When more than one field serves as the primary key, we have a table with something called a Composite Primary Key. It identifies records in the same way. Note that, because of that, primary key values must be unique, otherwise you could have two records that cannot be told apart.
This is the basic concept that lets us relate tables to each other, or more precisely, relate records/rows of different tables. If we have a Company identified by the ID 28 (a row/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store that relationship (28 <--> 40) somewhere. This is where intermediate/link/join tables come in (but only for N-N relationships! 1-N or N-1 relationships work similarly, but not identically).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records that relate to other records (by their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that do or do not have a given price range X.
All price ranges that are or are not assigned to companies.
...
Anyway, you get all this because of the established relationship.
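To give a feel for those results, here is a plain-Ruby sketch that mimics a few of the joins over in-memory data. The company names, ids, and link rows are made up; in a real app the database would do this work through SQL joins on the link table.

```ruby
# Hypothetical rows; ids and names are invented for illustration.
companies    = { 28 => "Cafe A", 29 => "Diner B", 30 => "Bistro C" }
price_ranges = { 1 => "$", 2 => "$$", 3 => "$$$", 4 => "$$$$", 5 => "$$$$$" }
# Link table rows: [company_id, price_range_id]
links = [[28, 1], [28, 4], [29, 2]]

linked_company_ids = links.map(&:first).uniq

# All companies that have price ranges (like an inner join).
with_ranges = companies.select { |id, _| linked_company_ids.include?(id) }.values

# All companies that do not have price ranges (like a left join keeping the unmatched).
without_ranges = companies.reject { |id, _| linked_company_ids.include?(id) }.values

# All the price ranges of a given company.
ranges_of_28 = links.select { |cid, _| cid == 28 }.map { |_, rid| price_ranges[rid] }

# All price ranges that are not assigned to any company.
unused_ranges = price_ranges.reject { |id, _| links.any? { |_, rid| rid == id } }.values
```

Every one of these answers comes from the link rows alone, which is why the company id and the price range id have to be stored together somewhere.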
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
I did not understand this sentence. What should be taken out? Could you please explain it better?

Constructing a 1-many relationship with custom string foreign keys in PGSQL ActiveRecord

I have the following tables (Showing only the relevant fields):
lots
  history_id
histories
  initial_date
  updated_date
  r_doc_date
  l_doc_date
  datasheet_finalized_date
users
  username
So I am rebuilding an existing application that deals with a rather large amount of bureaucracy and needs to keep track of five separate dates (as shown in the histories table). The problem is that I don't know how best to model this in ActiveRecord. Historically it has been done by having the histories table represented like so:
histories
  initial_date
  updated_date
  r_doc_date
  l_doc_date
  datasheet_finalized_date
  username
Where only one of the five date fields could ever be filled at one time...which in my opinion is a terrible way to go about modeling this...
So basically I want to build a unique queryable connection between every date in the histories table and its specific relevant user. Is it possible to use every timestamp in the histories table as a foreign key to query the specific user?
I think that there's a simpler approach to what you're trying to accomplish. It sounds like you want to be able to query each lot and find the 'relevant user' (I am guessing that this refers to the user who did whatever action is necessary to update the specific column on the histories table). To do this I would first create a join table between users and histories, called user_histories:
user_histories
user_id
history_id
I would create a row on this table any time a lot's history is updated and one of the relevant dates changes. But that now brings up the issue of being able to differentiate which specific date-type the user actually changed (since there are five). Instead of using each one as a foreign key (since they wouldn't necessarily be unique) I would recommend creating a 'history_code' on the user_histories table to represent each one of the history date-types (much like how a polymorphic_type is used). Resulting in the user_histories table looking like this:
user_histories
user_id
history_id
history_code
And an example record looking like this:
UserHistory.sample = {
  user_id: 1,
  history_id: 1,
  history_code: "Initial"
}
This allows you to query for the specific user who changed a record in the histories table with the following:
history.user_histories.select { |uhist| uhist.history_code == "Initial" }
I would recommend building these longer queries out into model methods, allowing for shorter, cleaner calls down the line, for example:
# app/models/history.rb
# Returns the user_histories rows recorded for the "Initial" date.
def initial_user
  user_histories.select { |uhist| uhist.history_code == "Initial" }
end
This should give you the results you want, and it gets around the whole issue of the dates not being suitable as foreign keys, since you can't guarantee their uniqueness.
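For anyone who wants to try the history_code idea outside of Rails, here is a minimal plain-Ruby sketch. The Struct stands in for the user_histories rows; the helper name and the codes other than "Initial" are assumptions for illustration.

```ruby
# Plain-Ruby stand-ins for the user_histories rows (not ActiveRecord models).
UserHistory = Struct.new(:user_id, :history_id, :history_code)

user_histories = [
  UserHistory.new(1, 1, "Initial"),
  UserHistory.new(2, 1, "Updated"),
  UserHistory.new(3, 1, "DatasheetFinalized")
]

# Find the user responsible for one date type of one history.
# Returns nil when no row was recorded for that combination.
def user_id_for(rows, history_id, history_code)
  row = rows.find { |r| r.history_id == history_id && r.history_code == history_code }
  row&.user_id
end
```

The lookup key is the pair of history_id and history_code, so the timestamps themselves never need to be unique.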

Postgresql depend on record id

I am working on an application design, using Ruby on Rails and Postgresql. I have a table with the following fields
Table: account_type
Fields: id(primary key), name(String)
AccountType name is a unique string (so I am thinking about putting a unique constraint on it). Depending on the name (type), I'm going to make some checks in my models. Something like this:
def urban?
  self.name == 'Some long type'
end
The question is: do I leave it like that? Or, as the other option, I can depend on some ID. So, assuming that my 'Some long type' is always created with ID=1, I can check for
def urban?
  self.id == 1
end
Is it a good practice if I do depend on the ID? What about readability? Are there other solutions to that problem?
The second example is a textbook case of how NOT to use surrogate keys.
Your real (natural) key is account_type.name, and it should have a unique constraint. There is endless debate about the 'goodness' of using auto-increment id columns as primary keys. Querying by id depends on the order in which the rows were inserted; querying by account_type.name does not.
Readability? The id field gives no information about what the record really means.
Other solutions? I don't really see what the problem is, but you could also use an enum type (though it is much less flexible than a lookup table).
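One way to keep the name-based check readable is to give the string a name of its own. This is only a sketch in plain Ruby, with a simple class standing in for the real model and a hypothetical constant URBAN; the string value is the placeholder from the question.

```ruby
# Plain-Ruby stand-in for the AccountType model; URBAN is a hypothetical constant.
class AccountType
  URBAN = "Some long type"

  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Depends on the unique name, not on whichever id the row happened to get.
  def urban?
    name == URBAN
  end
end
```

The check now gives the same answer whatever id the row receives, and the long string lives in exactly one place.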
