In one of my views, I need to present a lot of statistics about the data in the model and the nested models. For the nested models, there are a lot of sums and counts with various conditions.
Is it better to write individual methods for each statistic I need, which generates lots of SQL calls but makes nice short pieces of code? Or should I write a method that just loops over each nested model once, computes all the counts/sums manually, and returns the data in a hash?
It depends. If breaking the big query into independent queries forces you to, for example, join the same pair of tables twice or read the same table twice, then it is not a good strategy.
If that's not the case, you may want to break the query into independent queries for better legibility and reusability.
Use EXPLAIN to analyze your queries, try different ones, and test performance with a considerable amount of test data. Consider creating indexes for big tables with many records.
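For comparison, the single-pass option from the question could look roughly like the plain-Ruby sketch below. The LineItem struct, its fields, and the summarize method are hypothetical names used only for illustration; in a Rails app the items would be an already-loaded association.

```ruby
# Hypothetical nested record; in Rails this would be an ActiveRecord model.
LineItem = Struct.new(:amount, :paid, :category)

# Walk the collection once and accumulate every statistic in one hash.
def summarize(items)
  stats = { count: 0, total: 0, paid_count: 0, paid_total: 0, by_category: Hash.new(0) }
  items.each do |item|
    stats[:count] += 1
    stats[:total] += item.amount
    if item.paid
      stats[:paid_count] += 1
      stats[:paid_total] += item.amount
    end
    stats[:by_category][item.category] += item.amount
  end
  stats
end

items = [
  LineItem.new(10, true, :food),
  LineItem.new(25, false, :travel),
  LineItem.new(5, true, :food)
]
stats = summarize(items)
```

This trades several small SQL aggregates for one load plus an in-memory loop, so it only pays off when the records have to be loaded anyway.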
Related
We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All requests share maybe only a few columns and each has up to 20 columns individually.
We started to wonder if having separate tables for each request type would be practical in terms of speed once we have to do very complicated joins or queries, for example, fetching ALL request types into a single result set and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This railscast episode is a great example of the difference between the 2
You can use STI in that situation, but STI puts all the columns into one single table, and that's not a good thing: the table will grow very wide.
I think you should split it into two tables, as below:
Request: a polymorphic table that stores the type of each request.
RequestItem: stores the (up to 20) type-specific fields as records, with a foreign key to the request table. It has two columns, key and value.
It sounds do-able.
When I've looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
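A minimal plain-Ruby sketch of the Null Object idea described above. Weight, NotApplicable, and Product are hypothetical names chosen for illustration, not taken from the answer; in Rails the value object could be wired up through composed_of instead of the constructor shown here.

```ruby
# Hypothetical value object for a product measurement.
Weight = Struct.new(:grams) do
  def applicable?
    true
  end

  def to_s
    "#{grams} g"
  end
end

# Null Object: same interface, used where a measurement does not apply.
class NotApplicable
  def applicable?
    false
  end

  def to_s
    "Not applicable"
  end
end

# Hypothetical product; weight falls back to the Null Object instead of nil.
class Product
  attr_reader :name, :weight

  def initialize(name, weight_grams = nil)
    @name = name
    @weight = weight_grams ? Weight.new(weight_grams) : NotApplicable.new
  end
end

book = Product.new("Paperback", 300)
ebook = Product.new("E-book")
labels = [book, ebook].map { |p| "#{p.name}: #{p.weight}" }
```

Callers can render or test every product the same way without nil checks, which is what keeps the non-applicable attributes under control.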
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type lets you store a nested Ruby hash. It also provides rich functionality: DB-level constraints, indexes and query operators.
So common attributes are stored in the standard way, as regular columns, and child-specific ones in JSONB. Then you can use whatever you need on top of this: STI, the Value Object pattern, serialization, or just scopes for each child type. I prefer the last one: my models are thin, most constraints are at the DB level, and all business logic is in service classes.
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Preventing storing and selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
Somewhat schemaless
Vendor lock-in
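The layout described above (shared attributes in regular columns, subtype-specific ones in a JSON document) can be sketched in plain Ruby. This is only an in-memory stand-in, not PostgreSQL JSONB itself; the amount_cents and details columns and the keys inside details are assumptions made for the example.

```ruby
require "json"

# In-memory stand-in for table rows: shared attributes as regular columns,
# subtype-specific attributes serialized into one JSON "details" column.
rows = [
  { id: 1, type: "BillingRequest", amount_cents: 5000,
    details: JSON.generate("invoice_number" => "INV-7", "due_in_days" => 30) },
  { id: 2, type: "CheckRequest", amount_cents: 1200,
    details: JSON.generate("payee" => "Acme Ltd") }
]

# A scope-like filter on the shared column, then a read of the JSON part.
billing = rows.select { |r| r[:type] == "BillingRequest" }
invoice_numbers = billing.map { |r| JSON.parse(r[:details])["invoice_number"] }
```

Adding another request type here means writing rows with a different set of keys in details, with no new columns on the table.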
I have an app that consists mainly of restaurant model instances. One of the essential attributes for these restaurants is labeling the cuisine it falls under. I'm currently at odds with myself in regards to designing this. On one hand I thought of creating a Cuisine model and creating either a HMT or HABTM association between Restaurants and Cuisines.
More recently I came across this post which shows how to create a pre-defined set of attributes. To take the answer one step further, I'm assuming (in my case) I'd add a string-based cuisine column to my restaurant model and set up a select box in my restaurant form that would save the selected value.
What I was wondering was what would be the most efficient way of doing this? The goal is to eventually be able to query restaurants based on what cuisine(s) they fall under. I wasn't sure if a model would be the best choice, since it would only serve as a join table of sorts with a name attribute. I wasn't sure if having this extra table for something so minute would be optimal.
On the other hand I didn't know if using YAML for this would be conducive since the values are essentially dummy strings with no tangible records on file like I'd have with a model instance. Can someone help me sort out this confusion?
There are many benefits of normalizing many-to-many relationships in the db. Here are some:
Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs change.
Also, by normalizing you get the cleaner syntax and other infrastructure benefits from ActiveRecord, e.g.
cuisine.restaurants.where(city: 'Toledo')
I am using postgresql.
I started to realise that I have created too many columns for the User model, and most of them are boolean fields.
Correct me if I am wrong: if I update just one boolean value, the whole table is being updated even though the "PATCH" verb is being used.
So I decided to create a specific model for some boolean columns, however, this would also trigger two queries, one for the User load and the other for the newly created model load.
My question is: would it be better if I split off some of the columns to form a new model? Or does a model with many columns just not affect the performance of a Rails app?
My main concern is the data connection speed, please advise.
Keeping the columns in one table and avoiding a join will be better than splitting them across two tables.
You can limit which columns are returned by using select in ActiveRecord. If you have large text fields, but don't need them at a particular time, this can be helpful in improving performance. The impact is probably negligible with boolean columns.
I am relatively new to Rails. I understand that Rails lets you work with your database values with great ease, but I am a little in the dark about which approaches are efficient on the database and which are not.
Here is a case in point. I have a model Appointment which belongs_to user. In my code I can sometimes say process_user @appointment.user. When I write that, does it run a separate SELECT query on the database to retrieve that user? Is it more efficient to write process_user @appointment.user_id, where user_id is an attribute of the appointment, and then use the user_id value to perform my evaluation-related tasks, as long as I don't need the whole user object @appointment.user?
Frankly, from a peace-of-mind point of view, I just love being able to use process_user @appointment.user because it reads better, looks nicer and works better when preparing logic. Is it a performance-efficient way?
You are perfectly fine using code like process_user @appointment.user, as ActiveRecord tries its best to minimize the number of database queries. Of course it does not handle all situations perfectly, but your example is a very basic one. There would probably be no immediate database query: the association is loaded lazily, only when it is first accessed, and is then cached on the appointment object.
If you notice performance problems in a running large-scale application and you can track the problems down to ActiveRecord using profiling, it is probably time to optimize. Trying to pre-optimize from the very beginning would be against Rails' philosophy and will only result in ugly (and possibly even slower) code. Remember that the real performance bottlenecks are often in places where you would never expect them.
EDIT: As Winfield pointed out, optimizing the number of queries usually does not mean managing foreign keys or similar internals yourself. There are quite a number of flags and options for DB access methods that allow you to control how your database is queried.
You can eagerly load your associated users with your Appointment models:
Appointment.includes(:user) # Rails 3 and later; older versions used Appointment.all(:include => :user)
...which will join in the users or do a separate lookup for all the associated users in a single query.
This will then load the user association in advance (eagerly) so the user attribute is already populated with the object when you reference it, instead of having to stop and execute a separate query to look it up one by one (N+1 queries).
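To make the difference in query counts concrete, here is a small plain-Ruby simulation. It does not use ActiveRecord; FakeUserTable and its methods are hypothetical stand-ins that only count how many lookups each strategy would issue.

```ruby
# In-memory stand-in for a users table that counts simulated queries.
class FakeUserTable
  attr_reader :query_count

  def initialize(users)
    @users = users # { id => name }
    @query_count = 0
  end

  # Simulates: SELECT * FROM users WHERE id = ?
  def find(id)
    @query_count += 1
    @users.fetch(id)
  end

  # Simulates: SELECT * FROM users WHERE id IN (...)
  def find_all(ids)
    @query_count += 1
    @users.slice(*ids)
  end
end

appointment_user_ids = [1, 2, 1, 3]
data = { 1 => "Ann", 2 => "Bob", 3 => "Cy" }

# Lazy: one lookup per appointment (the "N" in N+1).
lazy = FakeUserTable.new(data)
lazy_names = appointment_user_ids.map { |id| lazy.find(id) }

# Eager: one batched lookup, then in-memory access.
eager = FakeUserTable.new(data)
users = eager.find_all(appointment_user_ids.uniq)
eager_names = appointment_user_ids.map { |id| users.fetch(id) }
```

Both strategies produce the same names, but the lazy one issues a lookup per appointment while the eager one issues a single batched lookup.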
I'm getting ready to start a small project that provides an opportunity to use single table inheritance. As I read through prior posts on STI on Stack Overflow, there seem to be some strong opinions on both sides of the argument.
My application is related to my horse racing hobby. A horse's connections are defined as its current jockey, trainer and owner. The jockey, trainer and owner could be modeled using three separate tables (models/classes) or as one class with several subclasses through single table inheritance.
When faced with a decision like this, is there a checklist of questions that one can go through to determine which approach is preferable? I'm assuming that using STI would reduce the number of potential joins. What are the other practical considerations?
There are a few things you should think about:
Are the objects, conceptually, children of a single parent?
Don't use single table inheritance just because your classes share some attributes; make sure there is actually an OO inheritance relationship between each of them and an understandable parent class.
Do you need to do database queries on all objects together?
If you want to list the objects together or run aggregate queries on all of the data, you’ll probably want everything in the same database table for speed and simplicity.
Do the objects have similar data but different behavior?
If you have a larger number of model-specific columns, you should consider polymorphic associations instead.
The article linked goes in depth a bit more.