Rails model columns and performance

I am using postgresql.
I started to realise that I have created too many columns for the User model, and most of them are boolean fields.
Correct me if I am wrong: if I just update one boolean value, the whole table is being updated even though the PATCH verb is being used.
So I decided to create a specific model for some boolean columns, however, this would also trigger two queries, one for the User load and the other for the newly created model load.
My question is: would it be better if I split some of the columns off into a new model? Or does a model with many columns just not affect the performance of a Rails app?
My main concern is the data connection speed; please advise.

Avoiding a join (by keeping a single table) will be better than splitting into two tables.
You can limit which columns are returned by using select in ActiveRecord. If you have large text fields, but don't need them at a particular time, this can be helpful in improving performance. The impact is probably negligible with boolean columns.
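For example, a minimal sketch of limiting the selected columns (the admin flag and the large preferences text column are assumed names, not from the question):
# Load only the columns needed for this request; anything not selected
# is simply never fetched from the database.
users = User.select(:id, :email, :admin).where(admin: true)
users.first.email         # available, because the column was selected
# users.first.preferences # not loaded; would raise ActiveModel::MissingAttributeError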

Related

Join an ActiveRecord model to a table in another schema with no model

I need to join an ActiveRecord model in my Ruby on Rails app to another table in a different schema that has no model. I've searched for the answer, and found parts of it, but not a whole solution in one place, hence this question.
I have Vehicle model, with many millions of rows.
I have a table (reload_cars) in another schema (temp_cars) in the same database, with a few million records. This is an ad hoc table, to be used for one ad hoc data update, and will never be used again. There is no model associated with that table.
I initially was lazy and selected all the reload_cars records into an array (reload_vins) in one query, and then in a second query did something like:
`Vehicle.where(vin_status: :invalid).where('vin in (?)', reload_vins)`.
That's simplified a bit from the actual query, but demonstrates the join I need. In other queries, I need full sets of inner and outer joins between these tables. I also need to put various selection criteria on the model table and/or the non-model table in various steps.
That blunt approach worked fine in development, but did not scale up to the production database. I thought it would take a few minutes, which is plenty fast enough for a one-time operation. But, it timed out, particularly when looping through sets of records. Small tweaks did not help.
So, I need to do a legit join.
In retrospect, the answer seems pretty obvious. This query ran pretty much instantly, and gave the exact expected result, with various criteria on each table.
Here is one such query:
Vehicle.where(vin_status: :invalid)
       .joins("JOIN temp_cars.reload_cars tcar ON tcar.vin = vehicles.vin")
       .where("tcar.registration_id IS NOT NULL")

Creating a YAML based list vs. a model in Rails

I have an app that consists mainly of restaurant model instances. One of the essential attributes for these restaurants is labeling the cuisine each one falls under. I'm currently at odds with myself about how to design this. On one hand, I thought of creating a Cuisine model and setting up either an HMT or HABTM association between Restaurants and Cuisines.
More recently I came across this post, which shows how to create a pre-defined set of attributes. To take the answer one step further, I'm assuming (in my case) I'd add a string-based cuisine column to my Restaurant model and set up a select box in my restaurant form that would save the selected value.
What I was wondering is: what would be the most efficient way of doing this? The goal is to eventually be able to query restaurants based on what cuisine(s) they fall under. I wasn't sure if a model would be the best choice, since it would essentially serve as a join table plus a name attribute, and I wasn't sure if having this extra table for something so minor would be optimal.
On the other hand, I didn't know if using YAML for this would be a good fit, since the values are essentially dummy strings with no tangible records on file like I'd have with a model instance. Can someone help me sort out this confusion?
There are many benefits of normalizing many-to-many relationships in the db. Here are some:
Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs change.
Also, by normalizing you get cleaner syntax and other infrastructure benefits from ActiveRecord, e.g.
cuisine.restaurants.where(city: 'Toledo')
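If you go the model route, a minimal has_many :through sketch might look like the following (the join model name, the cuisine name column, and the city column are assumptions for illustration):
class Cuisine < ActiveRecord::Base
  has_many :restaurant_cuisines
  has_many :restaurants, through: :restaurant_cuisines
end

class Restaurant < ActiveRecord::Base
  has_many :restaurant_cuisines
  has_many :cuisines, through: :restaurant_cuisines
end

# Join model backed by a restaurant_cuisines table (restaurant_id, cuisine_id).
class RestaurantCuisine < ActiveRecord::Base
  belongs_to :restaurant
  belongs_to :cuisine
end

# e.g. all Thai restaurants in Toledo:
Cuisine.find_by(name: 'Thai').restaurants.where(city: 'Toledo')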

Is it possible to alter a record in rails, without first reading it

I want to alter a record, but I don't want to read it out of the database first, because frankly I don't see the point. It's huge and across a network that is slow and expensive.
Can it be done (easily)?
Let's say I have a record with 100 fields (for argument's sake) and I want to alter one field in the table. I have a really bad connection to the database (this is true) because it's housed on a different box, and there's nothing I can do to change this.
Right now I pull down the record, and Rails validates its contents (because I have serialized bits). I then alter one field (one of the hundred, depending on some condition) and save the record again, which I suppose writes the whole record to the database again with no knowledge of the fact that I only changed one small bit (this last part is an assumption).
Now, to change one record it's sending a huge amount of data across the network, even though I might only be changing one small thing.
Also, it's doing two queries on the database: first the SELECT *, then the UPDATE.
So my question.. are there smarter base classes that do this right, to write without read?
Off the top of my head, I would think of a setter method for each field with a boolean "changed" flag.
When saving, walk the flags and only write where true... Does this happen now? If so, how do I make use of it?
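For what it's worth, ActiveRecord already does this kind of change tracking (ActiveModel::Dirty), and with the default partial-writes behaviour the generated UPDATE only includes changed columns; it just still needs the initial SELECT. A rough sketch, assuming a Person model with a user_name column:
person = Person.find(person_id)
person.user_name = 'New Name'
person.changed?   # => true
person.changes    # => { "user_name" => ["Old Name", "New Name"] }
person.save       # generates an UPDATE that sets only the changed column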
Rails models have the update method, which does a SQL select then an SQL update.
Its use is simple (suppose you have a model named Person)
Person.update(person_id, :user_name => 'New Name')
The drawback is that you have to know the person id beforehand.
But then, you will always have to do a query to find that out. You could write your own SQL to change the column value, with a WHERE clause that searches for the param you have.
Rails models have the update_all method, which does only a SQL update without loading the data.
But I wouldn't recommend using it because it doesn't trigger any callbacks or validations that your model may have.
You could use update_all like this (note this is the older syntax where conditions are passed as a second argument; in Rails 4 and later you would scope with where first, as shown in the next answer):
Book.update_all "author = 'David'", "title LIKE '%Rails%'"
ps: examples were all taken from the Rails official documentation. If you search for update, or update_all, you'll find both methods in no time.
Take a look there, it's a good place to find out what rails can do.
Rails API link
It's not obvious from the documentation, but update_all is the answer to this question.
Book.where(id: 123).update_all(title: "War and Peace")
results in exactly one query:
UPDATE `books` SET `books`.`title` = 'War and Peace' WHERE `books`.`id` = 123
It's been a while since this question was asked, but this still holds for Rails 4, 5, and 6.

Ruby dynamically tied to table

I've got a huge monster of a database (Okay that's not quite true, but there are over 8 million records in one product table)..
This table is fed by 13 suppliers.
Even with the best indexing I could come up with, searching for the top 10,000 records that are ready for supplier 8 is crazy slow.
What I'd like to do is create a product table for each supplier and parse the table into smaller tables.
Now, in C++ or what have you, I'd just switch the table that I'm working with inside the class.
In ruby, it seems I'll have to create a new class for each table, and do a migration.
Also, as I plan to have some in-session (temporary) tables, I'd be interested in getting Ruby to work with them too.
Oh, and it's 8 million rows now, set to grow to 20 million in the next 6 months.
A question posed was: what's my db engine? Right now it's SQL, but I'm open to pushing my db to another engine if that means I can use temp tables and "partitioned" tables.
One additional point on indexing: indexing on fields that change frequently, like price and quantity, isn't practical; I'd have to re-index the changed items each time I made a change.
By Ruby, I am assuming you mean inheriting from the ActiveRecord::Base class in a Ruby on Rails application. By convention, you are correct in that each class is meant to represent a separate table.
You can easily execute arbitrary SQL using the ActiveRecord::Base.connection.execute method, passing a string that is your SQL query. This bypasses having to create separate Ruby classes to represent transient tables. It is not the "Rails approach", but it does address your question about switching tables inside a class.
More information on ActiveRecord database statements can be found here: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html
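As a rough sketch of that approach against one supplier-specific table (the table and column names here are only illustrative):
# Query an arbitrary table directly, bypassing ActiveRecord models entirely.
rows = ActiveRecord::Base.connection.execute(
  "SELECT id, sku, price FROM products_supplier_8 WHERE ready = 1 LIMIT 10000"
)
rows.each { |row| puts row.inspect }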
However, as other people have pointed out, you should be able to optimize your query so that splitting across multiple tables is not necessary. You may want to analyze your SQL query's execution plan using various tools to optimize the execution. If you are using MySQL, check out its query execution planning functionality: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
By introducing indexes, changing join methods between tables, etc., you should be able to reduce your query execution time.
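You can also view the plan from the Rails side by calling explain on a relation (a quick sketch; the Product model and its columns are assumptions):
puts Product.where(supplier_id: 8, ready: true).limit(10_000).explain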

To normalize or not to normalize user_ids

In my Rails application, I have a variety of database tables that contain user data. Some of these tables have a lot of rows (as many as 500,000 rows per user in some cases) and are queried frequently. Whenever I query any table for anything, the user_id of the current user is somewhere in the query - either directly, if the table has a direct relation with the user, or through a join, if they are related through some other tables.
Should I denormalize the user_id and include it in every table, for faster performance?
Here's one example:
Address belongs to user, and has a user_id
Envelope belongs to user, and has a user_id
AddressesEnvelopes joins an Address and an Envelope, so it has envelope_id and address_id -- it doesn't have user_id, but could get to it through either the envelope or the address (which must belong to the same user).
One common expensive query is to select all the AddressesEnvelopes for a particular user, which I could accomplish by joining with either Address or Envelope, even though I don't need anything from those tables. Or I could just duplicate the user id in this table.
Here's a different scenario:
Letter belongs to user, and has a user_id
Recepient belongs to Letter, and has a letter_id
RecepientOption belongs to Recepient, and has a recepient_id
Would it make sense to duplicate the user_id in both Recepient and RecepientOption, even though I could always get to it by going up through the associations, through Letter?
Some notes:
There are never any objects that are shared between users. An entire hierarchy of related objects always belongs to the same user.
The user owner of objects never changes.
Database performance is important because it's a data intensive application. There are many queries and many tables.
So should I include user_id in every table so I can use it when creating indexes? Or would that be bad design?
I'd like to point out that it isn't necessary to denormalize if you are willing to work with composite primary keys. A sample for the AddressesEnvelopes case:
user(
#user_id
)
address(
#user_id
, #address_num
)
envelope(
#user_id
, #envelope_num
)
address_envelope(
#user_id
, #address_num
, #envelope_num
)
(the # indicates a primary key column)
I am not a fan of this design if I can avoid it, but considering the fact that you say all these objects are tied to a user, this type of design would make it relatively simple to partition your data (either logically, putting ranges of users in separate tables, or physically, using multiple databases or even machines).
Another thing that would make sense with this type of design is using clustered indexes (in MySQL, InnoDB tables are clustered on the primary key). If you ensure the user_id is always the first column in your index, all the data for one user will be stored close together on disk for each table. This is great when you always query by user_id, but it can hurt performance if you query by another object (in which case duplication like you suggested may be a better solution).
At any rate, before you change the design, first make sure your schema is already optimized, and you have proper indexes on your foreign key columns. If performance really is paramount, you should simply try several solutions and do benchmarks.
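As a concrete illustration of putting user_id first in an index, here is a migration sketch (it assumes you do add user_id to the addresses_envelopes join table; the table, column, and index names are illustrative, and the migration version should match your Rails version):
class AddUserIdToAddressesEnvelopes < ActiveRecord::Migration[5.2]
  def change
    add_column :addresses_envelopes, :user_id, :integer
    # user_id leads the index so that all of a user's rows sort together.
    add_index :addresses_envelopes, [:user_id, :address_id, :envelope_id],
              name: "index_addr_env_on_user_first"
  end
end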
As long as you
a) get a measurable performance improvement
and
b) know which parts of your database are real normalized data and which are redundant improvements
there is no reason not to do it!
Do you actually have a measured performance problem? 500,000 rows isn't a very large table. Your selects should be reasonably fast if they are not very complex and you have proper indexes on your columns.
I would first see if there are slow queries and try to optimize them with indexes. Only if that is not enough would I look into denormalization.
Denormalizations that you suggest seem reasonable if you can't achieve the required performance with other means. Just make sure that you keep denormalized fields up-to-date.
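One lightweight way to keep such a duplicated user_id in sync is a callback on the join model. A sketch, assuming a model named AddressesEnvelope backing the AddressesEnvelopes table from the question:
class AddressesEnvelope < ActiveRecord::Base
  belongs_to :address
  belongs_to :envelope

  # Copy the owner from the associated address on every save, so the
  # denormalized user_id can never drift out of sync.
  before_validation { self.user_id = address.user_id if address }
end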
