Rails: Foreign Keys and when to index - ruby-on-rails

Three questions:
When you create a model, is a foreign_key automatically created as well?
I'm thinking I should add_index when a column is unique to the table or unique in general, or when the column will relate to other databases, Amirite?
What will an index look like? Will it just use the contents of the cell?

1) Do you mean when using a generator? Generally speaking, you should generate migrations, rather than use a generator for the whole model/scaffolding. And then, no, a foreign key is not automatically created, only if you specify it.
2) add_index is going to come in handy for columns on big tables that need to be accessed quickly by your database. Let's say you've got a users table with an email column that must be unique, but isn't indexed. And your service grows, now you have millions of users, and you need to go User.find_by_email "someone#example.com". Without an index, that's going to take you a while. With an index, it'll be quick. That's when an index comes in handy.
3) Really depends on your database engine afaik. Not something that will affect your day-to-day imho (though if you have a specific database engine in mind, you can certainly find out). Here's the info on MySQL, straight from the source: https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html

Related

Why Rails hides the existence of id column?

I don't quite understand the need to hide the existence of id column in Rails.
It is neither reflected in migration file nor the schema.rb file.
There is no way for a newbie to know for the fact that a column named id has been created by default as a primary key.
Unless they go and check the actual schema of the table in database (rails dbconsole).
I can see the timestamps macro included by default in the migration file as well as in schema.rb as two fields created_at and updated_at. Here, a developer at least gets a clue. Rails could have done the same for id column too. But it doesn't.
Why the secrecy around id column? Is it a part of the famous convention over configuration? Or is it a norm across all MVC frameworks?
In database design it is generally accepted that numeric id's are preferred, because
they are easier to index, and thus easier to "follow" or check when creating links (foreign keys).
when editing/updating records, you have a unique (and efficient) identifier
So therefore it is advised to give all tables a unique numeric key, always.
Now this numeric key has no meaning whatsoever to your application, it is a "implementation detail" of your database layer. Also to make sure every table has an id, unless you explicitly ask not to.
I think this would indeed fall under the "convention over configuration" nomer: why explicitly specify an id for each table if you each table should have one.
The timestamps is different: this is interesting for some tables, but for same tables it is not important at all. It also depends on your application.
Note that this is not at all related to MVC. The M in MVC is a container for data, but in MVC it is actually not really important how the Model gets filled. In other words: the ORM part is not part of MVC. You will see that in most MVC implementations there is no ORM, or definitely not as tightly integrated as with Rails.
So in short: imho ommitting the 'id' from the migration is not a secret, it is just to make life easier, saves you some more typing, and it makes sure you follow a good convention unless you explicitly do not want to.
This is probably due to the fact that relational databases tend to use integer primary keys, and doing otherwise introduces complexities. I guess the reason it's hidden in rails is so that creating tables with integer primary keys does not require any special configuration, and having to write it into rails migrations invites inexperienced developers to play around with it (which is probable not a good idea).
Additionally, I think rails tries to abstract away things like numeric ids, if you want to create associations in a migration you do not need to specify foreign keys, you can simply write the name of the object you want to relate the table to.
I never thinked about the id field because almost every table have an id....
Check the documentation about migrations where they say:
A primary key column called id will also be added implicitly, as it's
the default primary key for all Active Record models. The timestamps
macro adds two columns, created_at and updated_at. These special
columns are automatically managed by Active Record if they exist.
If you want to check your table columns, just go to rails console and type Model.column_names
I think it's clear that if you don't add a primary key, then rails will add one generic key for you so that it can index your record and have a control over it, so basically it isn't stating that there WILL be an ID field, since I don't believe this has to be imperative, but rather optional in the event you do not provide a primary key.
It's a Rails convention to hide the I'd attribute to discourage and remove temptation of playing with it.
id attr is automatically generated with auto_increment to ad normalization to your data(to make each record unique an accessible). Injecting your own values would eventually corrupt and break the magic of ActiveRecord.

Improve performance of Rails application by adding indexes?

In my Rails application I have an index view that lists all of my projects.
This list can be sorted by clicking on any of the table column headers, e.g. Date, Name, updated_at etc. This happens by appending a &sort= GET parameter to the URL.
My question is: From a performance point-of-view, would it be advisable to add indexes to these columns in my database?
This is what a migration might look like:
class AddMoreIndexes < ActiveRecord::Migration
def change
add_index :projects, :date
add_index :projects, :name
add_index :projects, :update_at
end
end
Will I get any performance gains from this?
Indexes can be used to speed an order-by, but if you were identifying a subset of rows to display then an index that is helpful for that is likely to be chosen in preference. You'd need composite indexes in such a situation.
There're a couple of other problems.
Firstly, ordering on an indexed string value may require a linguistically sorted index, not the regular ASCII/Binary sort, so multilingual applications may not be helped at all.
Secondly, it can discourage normalisation of the database because you really need the display values to be in the table you're selecting.
You might like to look at using another method for the sort. I've been very happy with using Google visualisation tables, which come with JQuery sorting built in.
Depending on how you query your database, then yes, it will give you performance gains. For example, whenever I add a foreign key to a table, I immediately index by it. Why? I know queries will be running through it in my application. If not, I wouldn't have put a foreign key. In this way, especially when you accumulate a large amount of data in your database, it will definitely give performance gains (sometimes, by an incredible amount). If you plan to query your database by date, name, or updated at, then yes, it could potentially be a performance gain depending on your query. Otherwise, there really is no point.
Note, you wouldn't want to add an index for every column. Having necessary indices will help you, but if you have an index for every column, then you run the risk of confusing the SQL Query Optimizer and actually hindering your performance.
My suggestion: Add an index for every foreign key you have in your table, but if you're also running some heavy queries with other columns, then add an index there too.

One polymorphic association vs many through/HABTM associations

I am working on a project that currently has tons of HABTM associations. Essentially, everything is related to everything else. I am considering setting up a single intermediate table/model that has two polymorphic fields. This way, if I add another model I can easily connect it to the remaining models. Is this a good idea? If not, why not? If it is, why don't all rails projects have this kind of intermediate table?
I see two other options. I could keep adding intermediate tables or I could add a table that contains one of each type. The former option is kind of a hassle and the latter option does not allow for self joins.
While a polymorphic join table sounds like it would make things easier, I think you will end up creating more headache for yourself than it's worth. Here are a few potential challenges/problems off the top of my head:
You will not be able to use ActiveRecord's has_and_belongs_to_many association or related helpers without a ton of hacking/monkeypatching which will immediately eclipse the time it would take to setup individual pairwise link tables.
Your join table will have two id columns, let's call them a_id and b_id. For any given pair of models you will have to ensure that the ids always end up in the same column.
Example: If you have two models called User and Role, you would have to ensure for that pair that the user_id is always stored in col a_id and the role_id is always stored in col b_id, otherwise you will not be able to index the table in any kind of meaningful way (and will run the risk of defining the same relationship twice).
If you ever want to use database enforcement of FOREIGN KEY constraints it is unlikely that this polymorphic link table scheme will be supported.
The universal link table will get n times larger than n separate link tables. It shouldn't matter much with good indexing but as your application and data grow this could become a headache and limit some of your options in regards to scaling. Give your DB a break.
Most or least importantly (I can't decide) you will be bucking the norm which means a lot fewer (if any) resources out there to help you when you run into trouble. Basically the Adam Sandler "they're all gonna laugh at you" rationale.
Last thought: Can you eliminate any of the link tables by using has_many :xxx, :through => :xxx relationships?
Thinking it all through, you could actually do this, but I wouldn't. Join tables grow fast enough as it is and i like to keep model relationships simple and easy to alter.
I'm used to working on very large systems / data sets though, so if you're going going to have much in each join then ok. I'd still do it separately for joins however and i really like my polymorphics.
I think it would be cleaner and more flexible if you were to use multiple join tables as opposed to one giant multipurpose join table.

Dynamically creating new Active Record models and database tables

I am not sure exactly what I should name this question. I just started server-side programming and I need some help.
All the tutorials I have read so far on RoR deal with creating a pre-defined table and with pre-defined fields (id, name, email, etc etc). They use ActiveRecord as base class and saving to db is handled automatically by superclass.
What I am trying to program is something that allows user-defined table with fields. So think of this way. The web UI will have an empty table, the user will name the table, and add columns (field), and after that, add rows, and then later save it. How would I implement this? I am not asking for details, just an overview of it. As I said, all the tutorials I have read so far deal with pre-defined tables with fields where the ActiveRecord subclass is predefined.
So in a nutshell, I am asking, how to create tables in db on runtime, and add fields to the tables.
Hope I was clear, if not, please let me know and i will try to elaborate a bit more.
Thanks.
Unless you're building a DB administration tool (and even maybe then), allowing the user direct access to the database layer in the way you're suggesting is probably a bad idea. Apart from issues of stability and security, it'll get really slow if your users are creating lots of tables.
For instance, if you wanted to search for a certain value across 100 of your users' tables, you'd have to run 100 separate queries. The site would get exponentially slower the more user tables that were created.
A saner way to do it might be to have a Table model like this
class Table < ActiveRecord::Base
has_many :fields
has_many :rows
end
Every table would have fields attached to it, and rows to store the corresponding data (which would be encoded somehow).
However, as #Aditya rightly points out, this is not really beginner stuff!
I agree with previous answers generally speaking. It's not clear from your question why you want to create a table at runtime. It's not really obvious what the advantage of doing this would be. If you are just trying to store data that seems to fit into a table with rows and columns, why not just store it as an array in a field of your user table. If your user is allowed to create many tables, then you could have something like
class User < ActiveRecord::Base
has_many :tables
end
and then each table might have a field to store a serialized array. Or you could go with Alex's suggestion - the best choice really depends on what you are going to do with the data, how often it changes, whether you need to search it and so on ...
You can create a database as shown in tutorials which stores name of tables and their columns name those your user want. Then you can have worker (which can be build using Redis and Resque, here is simple Tut on Resque and Redis) and have those worker run migration (write migration with variables and use params to replace them) for you for new table in DB as soon as new entry is made in database. Tell me if you have questions on this.

Thinking Sphinx - Foreign key with different type - Association problem

I have two tables on mysql: users, and management. The users table has a numeric id, and the management table has a varchar foreign key which is the primary key of the other table. The types are not the same, and this seems to be the main problem when I build an index from the User model, and try to include one column from the management table. The join that thinkinx sphinx generates takes too much damn time to execute, and thus the index never gets done.
I know the best solution is to change the management table and use a numeric id, but right now that seems to be too expensive. Is there a way to just tell thinking sphinx that the varchar field is in fact a numeric id, so the index could be generated without altering the tables?
If this is not clear, please ask me to clarify whatever seems too obscure.
Thanks!
I'd make sure you have a database index on your foreign key.
Also, if you want to edit the generated configuration, you can do so, and then process the index using one of the two options, which doesn't automatically regenerate the file:
rake ts:index INDEX_ONLY=true
rake ts:reindex # this was only added the other day

Resources