The Stage
Let's talk about the most common type of association we encounter.
I have a User which has_many Posts:
class User < ActiveRecord::Base
  has_many :posts
end
class Post < ActiveRecord::Base
  belongs_to :user
end
Problem Statement
I want to do some (very light and quick) processing on all the posts of a user. I am looking for the best way to structure my code to achieve it. Below are a couple of ways and why they work or don't work.
Method 1
Do it in the User class itself.
class User < ActiveRecord::Base
  has_many :posts
  def process_posts
    posts.each do |post|
      # code of whatever 'process' does to posts of this user
    end
  end
end
The Post class remains the same:
class Post < ActiveRecord::Base
  belongs_to :user
end
The method is called as:
User.find(1).process_posts
Why this doesn't look like the best way to do it
The logic of doing something with a user's posts should really belong to the Post class. In a real-world scenario, a user might also have has_many relations with many other classes, e.g. orders, comments, children, etc.
If we start adding similar process_orders, process_comments, process_children (yikes) methods to the User class, it'll result in one giant file with lots of code, much of which could (and should) be distributed to where it belongs, i.e. the target associations.
Method 2
Proxy Associations and Scopes
Both of these constructs require adding methods/code to the User class, which again bloats it. I'd rather shift all of the implementation to the target classes.
Method 3
Class Method on target Class
Create class methods in the target class and call those methods on the User object.
class User < ActiveRecord::Base
  has_many :posts
  # all target specific code in target classes
end
class Post < ActiveRecord::Base
  belongs_to :user
  # Class method
  def self.process
    Post.all.each do |post| # see Note 2 below
      # code of whatever 'process' does to posts of this user
    end
  end
end
The method is called as:
User.find(1).posts.process # See Note 1 below
Now, this looks and feels better than Methods 1 and 2 because:
The User model remains clutter-free.
The method is simply called process instead of process_posts. Now we can have a process method on other classes as well and invoke them as User.find(1).orders.process etc., instead of User.find(1).process_orders (Method 1).
Note 1:
Yes, you can call a class method like this on an association. The TL;DR is that User.find(1).posts returns a CollectionProxy object, which has access to the class methods of the target (Post) class. It also conveniently passes scope_attributes, which store the user_id of the user on which posts.process was called. This comes in handy; see Note 2 below.
Note 2:
For people not sure what's going on when we do a Post.all.each in the class method: it returns all the posts of the user this method was called on, as opposed to all the posts in the database.
So when called as User.find(99).posts.process, Post.all executes:
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 99]]
which returns all the posts for the user with ID 99.
Per @Jesuspc's comment below, Post.all.each can be written more succinctly as all.each. It's more idiomatic and doesn't make it look like we are querying all posts in the database.
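To make Note 2 concrete, here is a plain-Ruby sketch of the idea behind it: while a class method runs through an association, an implicit "current scope" is active, so all returns only the owner's rows. Everything here (PostRow, ROWS, scoping, the thread-local key) is a hypothetical stand-in; Rails' real scoping machinery lives inside ActiveRecord and is considerably more involved.

```ruby
# Hypothetical in-memory stand-in for the posts table.
PostRow = Struct.new(:id, :user_id)

class Post
  ROWS = [PostRow.new(1, 99), PostRow.new(2, 7), PostRow.new(3, 99)].freeze

  # Returns every row, or only the rows matching the active user_id scope.
  def self.all
    user_id = Thread.current[:post_scope_user_id]
    user_id ? ROWS.select { |row| row.user_id == user_id } : ROWS
  end

  # Activates a user_id scope for the duration of the block, then restores the previous one.
  def self.scoping(user_id)
    previous = Thread.current[:post_scope_user_id]
    Thread.current[:post_scope_user_id] = user_id
    yield
  ensure
    Thread.current[:post_scope_user_id] = previous
  end

  # The class method from Method 3: it only ever calls `all`.
  def self.process
    all.map(&:id)
  end
end

# Roughly what User.find(99).posts.process amounts to.
scoped_ids = Post.scoping(99) { Post.process }
```

Outside the scoping block, Post.process sees every row again, which is exactly why the same class method behaves differently when called through an association.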
The Answer I am looking for
An answer that explains the best way to handle such associations: how do people normally do it, and are there any obvious design flaws in Method 3?
There's a fourth option. Move this logic out of the model entirely:
class PostProcessor
  def initialize(posts)
    @posts = posts
  end
  def process
    @posts.each do |post|
      # ...
    end
  end
end
PostProcessor.new(User.find(1).posts).process
This is sometimes called the Service Object pattern. A very nice bonus of this approach is that it makes writing tests for this logic really simple. Here's a great blog post on this and other ways to refactor "fat" models: http://blog.codeclimate.com/blog/2012/10/17/7-ways-to-decompose-fat-activerecord-models/
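A minimal plain-Ruby sketch of such a service object, assuming a Struct stands in for the Post model and that "processing" means flagging each post as processed (both are illustrative assumptions). Because the processor only needs something that responds to each, it can be tested without a database.

```ruby
# Hypothetical stand-in for a Post record.
Post = Struct.new(:title, :processed)

class PostProcessor
  # Accepts any enumerable of posts (e.g. user.posts in Rails).
  def initialize(posts)
    @posts = posts
  end

  # Illustrative processing: flag every post and return how many were handled.
  def process
    @posts.each { |post| post.processed = true }
    @posts.count
  end
end

posts = [Post.new("First", false), Post.new("Second", false)]
handled = PostProcessor.new(posts).process
```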
Personally, I think that Method 1 is the cleanest one. It is clean and understandable to write something like this:
class User < ActiveRecord::Base
  has_many :posts
  def process_posts
    posts.each do |post|
      post.process
    end
  end
end
And put all the logic of the process method in the Post model (as an instance method):
class Post < ActiveRecord::Base
  belongs_to :user
  def process
    # Logic of your Post process
  end
end
That way, the actual logic of processing a Post belongs to the Post class. Even if your User model has many "process" methods, they will be very basic and small. That seems very clean to me as a developer.
Method 3 has many technical implications that are pretty complex and unintuitive (you yourself had to clarify your question).
NOTE: If you want better performance, maybe you should use eager loading to reduce ActiveRecord calls, but that is out of the scope of this question.
First of all, excuse me for the opinionated answer.
ActiveRecord models are a controversial matter. Their essence goes against the Single Responsibility Principle, since they handle both database interaction (via class methods) and domain objects that usually implement their own behaviour (via instances). At the same time, they also break the Liskov Substitution Principle, because the models are not sub-cases of ActiveRecord::Base and implement their own set of methods. And finally, the ActiveRecord paradigm often leads to code that breaks the Law of Demeter, as in your proposal for the third method:
User.find(1).posts.process
Thus, there is a trend that, in order to reduce coupling, recommends using ActiveRecord objects only to interact with the database, and therefore adding no behaviour to them (in your case, the process method). In my view that is the lesser evil, even though it is still not a perfect solution.
So if I were to implement what you describe, I would have a ProcessablePostsCollection object (where the name Processable can be customised to better describe what the processing is about, or even dropped completely, so you would simply have a PostsCollection class). It would probably be a wrapper over a list of posts using SimpleDelegator, with a process method.
class ProcessablePostsCollection < SimpleDelegator
  def self.from_collection(collection)
    new collection
  end
  def initialize(source)
    super source
  end
  def process
    # code of whatever 'process' does to posts
  end
end
And the usage would be something like:
ProcessablePostsCollection.from_collection(User.find(1).posts).process
although the from_collection call and the call to process should happen in different classes.
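A runnable sketch of the SimpleDelegator wrapper, assuming a plain Array of Struct posts stands in for the ActiveRecord relation and that "processing" means upcasing titles (both assumptions for illustration). Any method the wrapper doesn't define, such as size or each, is forwarded to the wrapped collection.

```ruby
require "delegate"

# Hypothetical stand-in for a Post record.
Post = Struct.new(:title)

class ProcessablePostsCollection < SimpleDelegator
  def self.from_collection(collection)
    new(collection)
  end

  # Illustrative processing: `map` is not defined here, so SimpleDelegator
  # forwards it to the wrapped collection.
  def process
    map { |post| post.title.upcase }
  end
end

collection = ProcessablePostsCollection.from_collection([Post.new("a"), Post.new("b")])
```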
Also, if you have a big posts table, it would probably be wise to process records in batches. For that, your process method could call find_in_batches on your posts ActiveRecord::Relation.
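Plain Ruby has no find_in_batches, but Enumerable#each_slice shows the same shape of the idea: work on fixed-size chunks instead of the whole set at once. In Rails you would call find_in_batches(batch_size: ...) on the relation instead; the helper name, the batch size of 2, and the per-batch "work" below are arbitrary illustrations.

```ruby
# Hypothetical helper: processes items in fixed-size batches and
# returns the size of each batch it handled.
def process_in_batches(items, batch_size:)
  items.each_slice(batch_size).map do |batch|
    # Per-batch work goes here (e.g. one bulk update per batch).
    batch.size
  end
end

batch_sizes = process_in_batches((1..5).to_a, batch_size: 2)
```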
But as always, it depends on your needs. If you are simply building a prototype, it is perfectly fine to let your models grow fat, and if you are building an enormous application, Rails itself is probably not going to be the best choice, since it discourages some OOP best practices with things such as ActiveRecord models.
You shouldn't be putting this in the User model. Put it in Post (unless, of course, the scope of process involves the User model directly):
# app/models/post.rb
class Post < ActiveRecord::Base
  def process
    return false if published?
    # do something
  end
end
Then you can use an ActiveRecord Association Extension to add the functionality to the User model:
# app/models/user.rb
class User < ActiveRecord::Base
  has_many :posts do
    def process
      proxy_association.target.each do |post|
        post.process
      end
    end
  end
end
This will allow you to call...
@user = User.find 1
@user.posts.process
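An association extension is, roughly, a module whose methods are mixed into the association proxy. A loose plain-Ruby analogue, assuming an Array stands in for the association and a Struct for Post, uses Object#extend; it approximates the idea and is not how Rails wires its proxies internally. It uses map instead of each so the result is easy to inspect.

```ruby
# Hypothetical stand-in for Post with an instance-level process method.
Post = Struct.new(:title, :published) do
  def process
    return false if published
    "processed #{title}"
  end
end

# Plays the role of the block passed to has_many :posts.
module PostsExtension
  def process
    map(&:process)
  end
end

posts = [Post.new("draft", false), Post.new("live", true)].extend(PostsExtension)
results = posts.process
```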
Motivation
The motivation is that I want to embed the serialization of any model that has been included in a Relation chain. What I've done works at the relation level, but if I fetch a single record, its serialization can't take advantage of it.
What I've achieved so far
Basically, I'm using the includes_values method of ActiveRecord::Relation, which tells me which associations have been included so far, e.g.:
> Appointment.includes(:patient).includes(:slot).includes_values
=> [:patient, :slot]
To take advantage of this, I'm overriding the as_json method at the ActiveRecord::Relation level with this initializer:
# config/initializers/active_record_patches.rb
module ActiveRecord
  class Relation
    def as_json(**options)
      super(options.merge(include: includes_values)) # I could precondition this behaviour with a config
    end
  end
end
This adds the include option to the relation's as_json call for me.
So, the old chain:
Appointment.includes(:patient).includes(:slot).as_json(include: [:patient, :slot])
can now be written without the explicit include option:
Appointment.includes(:patient).includes(:slot).as_json
obtaining the same results (the Patient and Slot models are embedded in the generated hash).
THE PROBLEM
The problem is that, because includes_values is a method of ActiveRecord::Relation, I can't use it at the record level to know whether includes has been called.
So currently, when I get a record from such a query and call as_json on it, I don't get the embedded models.
And the actual question is: how can I know which models were included in the query chain that retrieved the current record, if that happened at all?
If I could answer this question, then I could override the as_json method in my own models with:
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  extend Associations
  def as_json(**options)
    super(options.merge(include: included_models_in_the_query_that_retrieved_me_as_a_record))
  end
end
One Idea
One idea I have is to override includes somewhere (either in my initializer, directly overriding the ActiveRecord::Relation class, or in my ApplicationRecord class). But once I'm there, I don't see an easy way to "stamp" arbitrary information onto the records produced by the relation.
This solution feels quite clumsy and there might be better options out there.
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  def as_json(**options)
    # Names of the associations already loaded on this record (e.g. via includes).
    loaded_associations = _reflections.each_value
      .select { |reflection| association(reflection.name).loaded? }
      .map(&:name)
    super(options.merge(include: loaded_associations))
  end
end
Note that this only detects first-level associations. With Appointment.includes(patient: :person), only :patient will be returned, since :person is nested. If you plan on making this recursive, beware of circularly loaded associations.
It's also worth pointing out that you currently merge include: ... over the provided options, which gives callers no way to pass their own include option. I recommend using reverse_merge instead, or swapping the order around: {include: ...}.merge(options).
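The precedence issue is easy to see with plain Hash#merge, where the argument's keys win. reverse_merge comes from ActiveSupport, so this sketch uses the equivalent defaults.merge(options) form; the option values and association names are illustrative.

```ruby
# Options a caller passes explicitly to as_json.
caller_options = { include: [:slot], except: [:notes] }
loaded = [:patient, :slot]

# Current code: the loaded associations overwrite the caller's :include.
overriding = caller_options.merge(include: loaded)

# Suggested: loaded associations are only a default; the caller's :include wins.
defaulting = { include: loaded }.merge(caller_options)
```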
I was recently working on a project where I faced the dilemma of choosing between two ways of getting the same results. Here is the class structure:
class Book < ApplicationRecord
  belongs_to :author
end
class Author < ApplicationRecord
  has_many :books
end
An author has a first name and a last name. I want to get the full name of the author of a given book as an instance method.
In simple ActiveRecord terms, since a book is associated with an author, we can get the author name for a book in the Book class as follows:
class Book < ApplicationRecord
  belongs_to :author
  def author_name
    "#{author.first_name} #{author.last_name}"
  end
end
And we get the result!
But with the goals of minimizing dependencies (as in the POODR book), easing future change, and achieving better object-oriented design, the book should not know an author's properties. It should interact with an author object through its interface.
So Book should not be the one responsible for getting the Author name. The author class should.
class Book < ApplicationRecord
  belongs_to :author
  def author_name
    get_author_name(self.author_id)
  end
  private
  # minimizing class dependencies by providing private methods as external interfaces
  def get_author_name(author_id)
    Author.get_author_name_from_id(author_id)
  end
end
class Author < ApplicationRecord
  has_many :books
  # class method which provides a gateway for other classes to communicate through interfaces, thus reducing coupling
  def self.get_author_name_from_id(id)
    author = self.find_by_id(id)
    author.nil? ? "Author Record Not Found" : "#{author.first_name.titleize} #{author.last_name.titleize}"
  end
end
Now the book just interacts with the public interface provided by Author, and Author handles the responsibility of building the full name from its properties, which is a better design for sure.
I tried running the queries as two separate methods in my console:
class Book < ApplicationRecord
  def author_name
    get_author_name(self.author_id)
  end
  def author_name2
    "#{author.last_name} + #{author.first_name}"
  end
end
Looking at the results in the console, both appear to run the same queries.
My questions are:
1) Does Rails convert author.last_name, called inside the Book class, to the same SQL query as Author.find_by_id(author_id).last_name called inside the Author class (through message passing from the Book class), for bigger data sizes?
2) Which one is more performant for bigger data sizes?
3) Doesn't calling author.last_name from the Book class violate design principles?
It's actually much more common and simpler to use delegation.
class Book < ApplicationRecord
  belongs_to :author
  delegate :name, to: :author, prefix: true, allow_nil: true
end
class Author < ApplicationRecord
  has_many :books
  def name
    "#{first_name.titleize} #{last_name.titleize}"
  end
end
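A plain-Ruby sketch of what that delegate line buys you, assuming Structs stand in for the models and String#capitalize replaces ActiveSupport's titleize. Book#author_name is written by hand here to mirror prefix: true and allow_nil: true; it approximates, rather than reproduces, the code Rails generates.

```ruby
# Hypothetical stand-ins for the models.
Author = Struct.new(:first_name, :last_name) do
  # Author owns the rule for building its display name.
  def name
    "#{first_name.capitalize} #{last_name.capitalize}"
  end
end

Book = Struct.new(:title, :author) do
  # Roughly what `delegate :name, to: :author, prefix: true, allow_nil: true`
  # gives you: Book#author_name, returning nil when there is no author.
  def author_name
    author&.name
  end
end

book = Book.new("Emma", Author.new("jane", "austen"))
orphan = Book.new("Anonymous", nil)
```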
As to performance, if you eager-load the authors at the time of the book query, you end up doing a single query. (Note that joins alone only adds the JOIN for filtering; it does not load the author records.)
@books = Book.eager_load(:author)
Now when you iterate through @books and call book.author_name on each one, no further SQL query needs to be made to the authors table.
1) Obviously not: it performs a JOIN of the books and authors tables. What you've made requires 2 queries: instead of 1 join, you'll have Book.find(id) and Author.find(book.author_id).
2) A JOIN should be faster.
3) Since last_name is part of Author's public interface, it absolutely doesn't violate design principles. It would violate them if you accessed the author's last name from outside, like this: Book.find(1).author.last_name; that's a bad thing. The correct way is Book.find(1).authors_last_name, with the author's name accessed inside the model class.
The example you provided seems overcomplicated to me.
From the example you shared, you only want to get the full name of the book's author. So the idea of splitting responsibility is correct, but the Author class should simply have an instance method full_name, like:
class Author < ApplicationRecord
  has_many :books
  def full_name
    "#{first_name.titleize} #{last_name.titleize}"
  end
end
class Book < ActiveRecord::Base
  belongs_to :author
  def author_name
    author.full_name
  end
end
Note that there are no direct queries in this code. Whenever you need the author's name somewhere (in a view, in an API response, etc.), Rails will query for the author through the association. Depending on your use case this may be inefficient, for example if you iterate over books and call author in a loop (the N+1 query problem).
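A plain-Ruby sketch of that N+1 pattern, assuming a hypothetical FakeAuthorTable that counts the "queries" it receives: looking up the author per book costs one query per book, while fetching all needed authors up front (the idea behind includes/preload in Rails) costs one.

```ruby
# Hypothetical table that counts lookups, standing in for SQL queries.
class FakeAuthorTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows # { id => name }
    @queries = 0
  end

  # One "query" per call, like Author.find(id).
  def find(id)
    @queries += 1
    @rows.fetch(id)
  end

  # One "query" for many ids, like a single preload query.
  def where_ids(ids)
    @queries += 1
    @rows.slice(*ids)
  end
end

BookRow = Struct.new(:title, :author_id)
books = [BookRow.new("A", 1), BookRow.new("B", 2), BookRow.new("C", 1)]

# Lazy: one lookup per book (N+1 pattern).
lazy = FakeAuthorTable.new({ 1 => "Ann", 2 => "Bob" })
lazy_names = books.map { |b| lazy.find(b.author_id) }

# Preloaded: one lookup for all distinct author ids.
eager = FakeAuthorTable.new({ 1 => "Ann", 2 => "Bob" })
preloaded = eager.where_ids(books.map(&:author_id).uniq)
eager_names = books.map { |b| preloaded.fetch(b.author_id) }
```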
I prefer the second approach, because full_name is a property of the author, not of the book. If the book wants to access that information, it can use book.author&.full_name (the safe navigation operator &. handles books with no author).
But I would suggest the following refactoring:
class Book < ApplicationRecord
  belongs_to :author
end
class Author < ApplicationRecord
  has_many :books
  def full_name
    "#{first_name} #{last_name}"
  end
end
Does rails convert author.last_name called inside the Book class to the same SQL query as Author.find_by_id(author_id).last_name called inside Author class (through message passing from Book class) in case of bigger data size?
It depends on how they are called. In your example both generate the same query, but if you have an includes/joins clause when loading the Book/Author, they will generate different queries.
By Rails convention, Author.find_by_id(author_id).last_name is not recommended, as it always fires a database query whenever the method is called. You should use Rails' association interface to call the method on the related object; it is smart enough to reuse the object from memory, or fetch it from the database if it isn't loaded yet.
Which one is more performant in case of bigger data size?
author.last_name is better because it benefits from eager loading (includes) and the association cache when they are used, which lets you avoid the N+1 query problem.
Doesn't calling author.last_name from Book class violates design principles?
No. You can even use delegate, as @Steve suggested.
In my experience, it's a balancing act between minimizing code complexity and minimizing scalability issues.
However, in this case, I think the simplest solution that would separate class concerns and minimize code would be to simply use @book.author.full_name
And define full_name in Author.rb:
def full_name
  "#{self.first_name} #{self.last_name}"
end
This will simplify your code a lot. For example, if in the future you had another model called Magazine that has an Author, you wouldn't have to define author_name in the Magazine model as well. You would simply use @magazine.author.full_name. This will DRY up your code nicely.
I have multiple models that in practice are created and deleted together.
Basically, I have an Article model and an Authorship model. Authorships link Users and Articles in a many-to-many relation. When an Article is created, the corresponding Authorships are also created. Right now, this is achieved by POSTing multiple times.
However, say only part of my request works. For instance, I'm on bad wifi and only the create-article request makes it through. Then my data is left in a malformed, half-created state.
To solve this, I want to send all the data at once, then have Rails split up the data into the corresponding controllers. I've thought of a couple ways to do this. The first way is having controllers handle each request in turn, sort of chaining them together. This would require the controllers to call the next one in the chain. However, this seems sorta rigid because if I decide to compose the controllers in a different way, I'll have to actually modify the controller code itself.
The second way splits up the data first, then calls the controller actions with each bit of data. This way seems more clean to me, but it requires some logic either in the routing or in a layer independent of the controllers. I'm not really clear where this logic should go (another controller? Router? Middleware?)
Has anybody had experience with either method? Is there an even better way?
Thanks,
Nicholas
Typically you want to do stuff like this -- creating associated records on object creation -- all in the same transaction. I would definitely not consider breaking up the creation of an Authorship and Article if creating an Authorship is automatic on Article creation. You want a single request that takes in all needed parameters to create an Article and its associated Authorship, then you create both in the same transaction. One way would be to do something like this in the controller:
class Authorship < ApplicationRecord
  belongs_to :user
  belongs_to :article
end
class Article < ApplicationRecord
  has_many :authorships
  has_many :users, through: :authorships
end
class ArticlesController < ApplicationController
  def create
    @article = Article.new(title: params[:title], stuff: params[:stuff]) # ...other attributes
    @article.authorships.build(article: @article, user_id: params[:user_id])
    if @article.save
      # then do stuff...
    end
  end
end
This way, when you hit @article.save, the processing of both the Article and the Authorship is part of the same transaction. So if something fails anywhere, the whole thing fails, and you don't end up with stray/disparate/inconsistent data.
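To illustrate the all-or-nothing behaviour, here is a plain-Ruby sketch with a hypothetical in-memory Store whose transaction discards every write if anything in the block fails. It mimics the idea behind ActiveRecord's transactions, not their implementation, and the validation rule (an authorship needs a user_id) is an assumption.

```ruby
# Hypothetical in-memory store with a minimal all-or-nothing transaction.
class Store
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Buffers writes made inside the block and keeps them only if no error is raised.
  def transaction
    pending = []
    yield pending
    @rows.concat(pending)
    true
  rescue ArgumentError
    false
  end
end

# Saves an article and its authorships together; one invalid authorship aborts both.
def create_article(store, title, user_ids)
  store.transaction do |pending|
    pending << { type: :article, title: title }
    user_ids.each do |id|
      raise ArgumentError, "authorship needs a user_id" if id.nil?
      pending << { type: :authorship, user_id: id }
    end
  end
end

good = Store.new
ok = create_article(good, "Hello", [1, 2])

bad = Store.new
failed = create_article(bad, "Hello", [1, nil])
```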
If you want to assign multiple authorships on the endpoint (i.e. you take in multiple user id params) then the last bit could become something like:
class ArticlesController < ApplicationController
  def create
    @article = Article.new(title: params[:title], stuff: params[:stuff]) # ...other attributes
    params[:user_ids].each do |id|
      @article.authorships.build(article: @article, user_id: id)
    end
    if @article.save
      # then do stuff...
    end
  end
end
You can also offload this kind of associated object creation into the model via a virtual attribute and a before_save or before_create callback, which would also be transactional. But the above idiom seems more typical.
I would handle this in the model with one request. If you have a has_many relationship between Article and Author, you may be able to use accepts_nested_attributes_for on your Article model. Then you can pass Authorship attributes along with your Article attributes in one request.
I have not seen your code, but you can do something like this:
app/models/article.rb
class Article < ActiveRecord::Base
  has_many :authorships
  has_many :authors, through: :authorships # you may also need a source: option (e.g. source: :user)
  accepts_nested_attributes_for :authors
end
You can then pass Author attributes to the Article model and Rails will create/update the Authors as required.
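Roughly speaking, accepts_nested_attributes_for :authors gives the model an authors_attributes= writer that builds child records from a list of attribute hashes. A plain-Ruby sketch of that idea, with hypothetical Article/Author classes and none of the real feature's options (allow_destroy, reject_if, updating existing records by id):

```ruby
# Hypothetical child record.
Author = Struct.new(:name)

class Article
  attr_reader :title, :authors

  def initialize(title:, authors_attributes: [])
    @title = title
    @authors = []
    self.authors_attributes = authors_attributes
  end

  # Builds one Author per attributes hash, similar in spirit to the
  # writer that nested attributes adds.
  def authors_attributes=(list)
    list.each { |attrs| @authors << Author.new(attrs.fetch(:name)) }
  end
end

article = Article.new(title: "Hello", authors_attributes: [{ name: "Ann" }, { name: "Bob" }])
```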
Here is a good blog post on accepts_nested_attributes_for. You can read about it in the official Rails documentation.
I would recommend taking advantage of nested attributes and the association methods Rails gives you to handle all of this with one web request inside one controller action.
I'm looking for some best-practice advice for the following situation.
I have the following skeleton ActiveRecord models:
# user.rb
class User < ActiveRecord::Base
  has_many :country_entries, dependent: :destroy
end
# country_entry.rb
class CountryEntry < ActiveRecord::Base
  belongs_to :user
  validates :code, presence: true
end
Now suppose I need to get a comma-separated list of CountryEntry codes for a particular user. The question is, where do I put this method? There are two options:
# user.rb
#...
def country_codes
  self.country_entries.map(&:code)
end
#...
-or-
# country_entry.rb
#...
def self.codes_for_user(user)
  where(user_id: user.id).map(&:code)
end
#...
And so the APIs would be @current_user.country_codes or CountryEntry.codes_for_user(@current_user).
Seems like placing the code in country_entry.rb decouples everything a little more, but it makes the API a little uglier. Any general or personal-experience best practices on this issue?
Instance method vs. class method: if the method is about a single instance, it is better as an instance method.
In the User model vs. in the CountryEntry model: the User model wins. The Law of Demeter suggests using only one dot. If you have the chance to follow it, it's better to do so.
Conclusion: Your first method wins.
# user.rb
def country_codes
  self.country_entries.map(&:code)
end
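The question asks for a comma-separated list, while the method above returns an array. A plain-Ruby sketch of the same instance method finishing the job with Array#join; Structs stand in for the models (in Rails, country_entries.pluck(:code) would avoid instantiating the records).

```ruby
# Hypothetical stand-ins for the models.
CountryEntry = Struct.new(:code)

User = Struct.new(:country_entries) do
  # One-dot API for callers: user.country_codes
  def country_codes
    country_entries.map(&:code).join(", ")
  end
end

user = User.new([CountryEntry.new("FR"), CountryEntry.new("DE")])
```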
References for the Law of Demeter:
http://en.wikipedia.org/wiki/Law_of_Demeter
http://rails-bestpractices.com/posts/15-the-law-of-demeter
http://devblog.avdi.org/2011/07/05/demeter-its-not-just-a-good-idea-its-the-law/
Now this is really an interesting question. And it has so many answers ;-)
From your initial question, I would suggest you put the code in the association itself:
class User < ActiveRecord::Base
  has_many :country_entries do
    def codes
      proxy_association.owner.country_entries.map(&:code)
    end
  end
end
So you could do something like this:
list_of_codes = a_user.country_entries.codes
Now obviously this is a violation of the Law of Demeter.
So you would be best advised to offer a method on the User object like this:
class User < ActiveRecord::Base
  has_many :country_entries do
    def codes
      proxy_association.owner.country_entries.map(&:code)
    end
  end
  def country_codes
    self.country_entries.codes
  end
end
Obviously nobody in the Rails world cares about the Law of Demeter, so take this with a grain of salt.
As for putting the code into the CountryEntry class, I am not sure why you would do this. If you can only look up country codes via the user, I don't see the need for a class method. You can only look that list up if you have a User at hand anyway.
If, however, many different objects can have a country_entries association, then it makes sense to put it in CountryEntry as a class method.
My favorite would be a combination of the LoD and a class method for reuse purposes.
class User < ActiveRecord::Base
  has_many :country_entries
  def country_codes
    CountryEntry.codes_for_user(self)
  end
end
class CountryEntry < ActiveRecord::Base
  belongs_to :user
  validates :code, presence: true
  def self.codes_for_user(user)
    where(user_id: user.id).map(&:code)
  end
end
In terms of the API developers get from the two proposals, adding the method to the User model seems pretty straightforward. Given the problem:
Now suppose I need to get a comma-separated list of CountryEntry codes for a particular user.
The context is made of a user, for which we want to get the code list. The natural "entry point" seems to be a user object.
Another way to see the problem is in terms of responsibilities (thus linking to @robkuz's entry on Demeter). A CountryEntry instance is responsible for providing its code (and maybe a few other things). The CountryEntry class is basically responsible for providing attributes and methods common to all its instances, and not much more. Getting the comma-separated list of codes is a specialized use of CountryEntry instances that apparently only User objects care about. In this case, the responsibility belongs to the current user object. Value is in the eye of the beholder...
This is in line with most answers on the thread, although with the solutions so far you do not get a comma-separated list of codes, but an array of codes.
In terms of performance, note there is probably a difference too, because of lazy evaluation. Just a note; someone more deeply familiar with ActiveRecord could comment on that!
I think @current_user.country_codes is a better choice in this case because it will be easier to use in your code.
I am working on a very large Rails application. We initially did not use much inheritance, but we have had some eye opening experiences from a consultant and are looking to refactor some of our models.
We have the following pattern a lot in our application:
class Project < ActiveRecord::Base
  has_many :graph_settings
end
class GraphType < ActiveRecord::Base
  has_many :graph_settings
  # graph type specific settings (units, labels, etc) stored in DB and very infrequently updated.
end
class GraphSetting < ActiveRecord::Base
  belongs_to :graph_type
  belongs_to :project
  # Project implementation of graph type specific settings (y_min, y_max) also stored in db.
end
This also results in a ton of conditionals in views, helpers and in the GraphSetting model itself. None of this is good.
A simple refactor where we get rid of GraphType in favor of using a structure more like this:
class Graph < ActiveRecord::Base
  belongs_to :project
  # Generic methods and settings
end
class SpecificGraph < Graph
  # Default methods and settings hard coded
  # Project implementation specific details stored in db.
end
Now this makes perfect sense to me, eases testing, removes conditionals, and makes later internationalization easier. However we only have 15 to 30 graphs.
We have a very similar model (too complicated to use as an example) with probably close to 100 different 'types', and that number could potentially double. They would all have relationships and methods they inherit; some would need to override more methods than others. It seems like the perfect use case, but that many just seems like a lot.
Are 200 STI classes too many? Is there another pattern we should look at?
Thanks for any wisdom and I will answer any questions.
If the differences are just in the behavior of the class, then I assume it shouldn't be a problem, and this is a good candidate for STI. (Mind you, I've never tried this with so many subclasses.)
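As a rough illustration of what STI does when it loads a row, here is a plain-Ruby sketch: the type column selects the subclass, and each subclass hard-codes its defaults while inheriting shared behaviour. The LineGraph/BarGraph names, the GRAPH_TYPES registry, and the y_max defaults are hypothetical; ActiveRecord resolves the class from the type column's class name on its own.

```ruby
class Graph
  attr_reader :project_id

  def initialize(project_id)
    @project_id = project_id
  end

  # Generic behaviour shared by every graph type.
  def label
    "#{self.class.name} for project #{project_id}"
  end

  def y_max
    100
  end
end

# Subclasses hard-code their defaults instead of reading a GraphType row.
class LineGraph < Graph; end

class BarGraph < Graph
  def y_max
    50
  end
end

GRAPH_TYPES = { "LineGraph" => LineGraph, "BarGraph" => BarGraph }.freeze

# Instantiates the right subclass from a row's type column.
def instantiate(row)
  GRAPH_TYPES.fetch(row[:type]).new(row[:project_id])
end

rows = [{ type: "LineGraph", project_id: 1 }, { type: "BarGraph", project_id: 1 }]
graphs = rows.map { |r| instantiate(r) }
```

Behaviour-only differences like y_max cost nothing in the table; it is per-type columns that make a single wide table expensive, as the next paragraph explains.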
But if your 200 STI classes each have some unique attributes, you would need a lot of extra database columns in the master table, which would be NULL 99.5% of the time. This could be very inefficient.
To create something like "multiple table inheritance", what I've done before with success was to use a little metaprogramming to associate other tables for the details unique to each class:
class SpecificGraph < Graph
  include SpecificGraphDetail::MTI
end
class SpecificGraphDetail < ActiveRecord::Base
  module MTI
    def self.included(base)
      base.class_eval do
        has_one :specific_graph_detail, :foreign_key => 'graph_id', :dependent => :destroy
        delegate :extra_column, :extra_column=, :to => :specific_graph_detail
      end
    end
  end
end
The delegation means you can access the associated detail fields as if they were directly on the model instead of going through the specific_graph_detail association, and for all intents and purposes it "looks" like these are just extra columns.
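The same delegation trick can be sketched in plain Ruby with the standard library's Forwardable, assuming a Struct stands in for the detail record. def_delegators plays the role of the delegate line above, forwarding both the reader and the writer, so the detail column looks like an attribute of the graph itself.

```ruby
require "forwardable"

# Hypothetical detail record holding a column unique to one graph type.
SpecificGraphDetail = Struct.new(:extra_column)

class SpecificGraph
  extend Forwardable

  attr_reader :specific_graph_detail

  def initialize(extra_column)
    @specific_graph_detail = SpecificGraphDetail.new(extra_column)
  end

  # Like `delegate :extra_column, :extra_column=, to: :specific_graph_detail`.
  def_delegators :specific_graph_detail, :extra_column, :extra_column=
end

graph = SpecificGraph.new("blue")
graph.extra_column = "red"
```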
You have to trade off the situations where you need to join these extra detail tables against just having the extra columns in the master table. That will decide whether to use STI or a solution using associated tables, such as my solution above.