I've looked over the Arel sources, and some of the activerecord sources for Rails 3.0, but I can't seem to glean a good answer for myself as to whether Arel will be changing our ability to use includes(), when constructing queries, for the better.
There are instances when one might want to modify the conditions on an activerecord :include query in 2.3.5 and before, for the association records which would be returned. But as far as I know, this is not programmatically tenable for all :include queries:
(I know some AR-find-includes make t#{n}.c#{m} renames for all the attributes, and one could conceivably add conditions to these queries to limit the joined sets' results; but others do n_joins + 1 number of queries over the id sets iteratively, and I'm not sure how one might hack AR to edit these iterated queries.)
Will Arel allow us to construct ActiveRecord queries which specify the resulting associated model objects when using includes()?
Ex:
User has_many :posts (and Post has_many :comments)
User.all(:include => :posts) #say I wanted the post objects to have their
#comment counts loaded without adding a comment_count column to `posts`.
#At the post level, one could do so by:
posts_with_counts = Post.all(:select => 'posts.*, count(comments.id) as comment_count',
:joins => 'left outer join comments on comments.post_id = posts.id',
:group => 'posts.id')
#But it seems impossible to do so while linking these post objects to each
#user as well, without running User.all() and then zippering the objects into
#some other collection (ugly)
#OR running posts.group_by(&:user) (even uglier, with the n user queries)
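For reference, the "zippering" step mentioned in the comments above can be sketched in plain Ruby without the n user queries, by grouping on the foreign key instead of on the user object. The Struct definitions and sample rows below are hypothetical stand-ins for ActiveRecord results, not anything Rails or Arel provides:

```ruby
# Plain-Ruby stand-ins for ActiveRecord rows; the Struct names and fields are
# hypothetical and only mirror the users/posts/comments schema in the question.
User = Struct.new(:id, :name)
Post = Struct.new(:id, :user_id, :title, :comment_count)

users = [User.new(1, "ann"), User.new(2, "bob")]

# Pretend this came back from the single posts-with-counts query.
posts_with_counts = [
  Post.new(10, 1, "first", 3),
  Post.new(11, 1, "second", 0),
  Post.new(12, 2, "third", 5)
]

# Group once by foreign key (no per-post user lookup), then pair with users.
posts_by_user_id = posts_with_counts.group_by(&:user_id)
users_with_posts = users.map do |user|
  [user, posts_by_user_id.fetch(user.id, [])]
end

users_with_posts.each do |user, posts|
  summary = posts.map { |post| "#{post.title} (#{post.comment_count})" }.join(", ")
  puts "#{user.name}: #{summary}"
end
```

This still costs two queries and a manual pairing step, which is exactly the ugliness the question is trying to avoid.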
Why don't you use Arel at its core? Once you get down to the actual table scope you can use Arel::Relation, which is completely different from the ActiveRecord implementation itself. I truly believe that ActiveRecord::Relation is a completely different (and busted) implementation of a wrapper around an Arel::Relation and Arel::Table. I choose to use Arel at its core by either calling Thing.scoped.table (an Arel::Table), which is the ActiveRecord style, or Arel::Table.new(:table_name), which gives me a fresh Arel::Table (my preferred method). From this you can do the following.
posts = Arel::Table.new(:posts, :as => 'p') # derived relation
comments = Arel::Table.new(:comments, :as => 'c') # derived relation
posts_and_comments = posts.join(comments).on( posts[:id].eq(comments[:post_id]) )
# now you can iterate through the derived relation by doing the following
posts_and_comments.each {...} # this will actually return Arel::Rows which is another story.
#
An Arel::Row returns a TRUE definition of a tuple from the set which will consist of an Arel::Header (set of Arel::Attributes) and a tuple.
To be slightly more verbose: the reason I use Arel at its core is that it truly exposes the relational model to me, which is the power behind ActiveRelation. I have noticed that ActiveRecord exposes something like 20% of what Arel has to offer, and I am afraid that developers will not realize this, nor will they understand the true core of Relational Algebra. Using the conditions hash is to me "old school": ActiveRecord-style programming in a Relational Algebra world. We need to learn to break away from the Martin Fowler model-based approach and adopt the E.F. Codd Relational Model-based approach, which is what RDBMSs have been trying to do for decades but have gotten very wrong.
I've taken the liberty to start a seven part learning series on Arel and Relational Algebra for the ruby community. These will consist of short videos ranging from absolute beginner to advanced techniques like self referencing relations and closure under composition. The first video is at http://Innovative-Studios.com/#pilot Please let me know if you need more information or this was not descriptive enough for you.
The future looks bright with Arel.
ActiveRecord::Relation is a fairly weak wrapper around Base#find_by_sql, so :include queries are not extended in any way by its inclusion.
Isn't
Post.includes([:author, :comments]).where(['comments.approved = ?', true]).all
what you're looking for? (taken from the official docs)
Related
Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other associations; instead, just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use a has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
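class Query < ActiveRecord::Base
  # duck_ids lives in a text column and is stored as a serialized array
  serialize :duck_ids, Array

  # Returns the ducks whose ids are stored on this query
  def ducks
    Duck.where(id: duck_ids)
  end
end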
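To make the serialized-array idea concrete without a database, here is a plain-Ruby sketch of the round trip. It assumes YAML as the coder (historically the default for serialize); the Duck struct and the ALL_DUCKS list are hypothetical stand-ins for the ducks table:

```ruby
require "yaml"

# Hypothetical stand-in for rows in the ducks table.
Duck = Struct.new(:id, :name)
ALL_DUCKS = [Duck.new(1, "mallard"), Duck.new(2, "teal"), Duck.new(3, "wigeon")]

# What ends up in the text column: a YAML dump of the id array.
stored_column = YAML.dump([1, 3])

# Reading it back yields a plain Ruby array of integers again.
duck_ids = YAML.load(stored_column)

# The database cannot filter on the serialized text, so the lookup happens
# after deserializing (here against an in-memory list instead of a table).
ducks = ALL_DUCKS.select { |duck| duck_ids.include?(duck.id) }
puts ducks.map(&:name).inspect
```

The stored value is opaque text as far as the database is concerned, which is why this option gives up querying on duck ids directly in SQL.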
If you have native array support in your database, you can do something similar from within your DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.
I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
(SELECT COUNT(*)
FROM votes
WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Any way to write this in DSL?
This returns only the attribute vote_count and not "*" + vote_count. I can append .select("*") but I will be repeating this every time. Is there a better/DRY way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote-counting part could be extracted into this query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
def votes_count
@@votes_count_cache ||= Vote.group(:id).count
@@votes_count_cache[id] || 0
end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
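For readers unfamiliar with what a grouped count hands back, its shape is a hash of group key to count, with no entry for keys that have no rows. The sketch below reproduces that shape in plain Ruby; the Vote struct and its key field are hypothetical stand-ins for the real table:

```ruby
# Hypothetical stand-in for rows of the votes table, keyed the same way the
# grouped query above is keyed.
Vote = Struct.new(:key)
votes = [Vote.new(1), Vote.new(1), Vote.new(2), Vote.new(1)]

# Same shape as a grouped count result: { key => number_of_rows }.
counts = votes.group_by(&:key).map { |key, rows| [key, rows.size] }.to_h

# Keys without any rows are simply absent, hence the fallback to 0.
lookup = ->(key) { counts.fetch(key, 0) }

puts lookup.call(1)
puts lookup.call(99)
```

That missing-key behaviour is why the cached accessor needs an explicit fallback to 0.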
But I strongly encourage you to consider yet another approach.
I believe that writing complicated queries like yours with ActiveRecord methods (even if it were possible) and splitting the query into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on the Code Climate Blog.
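A minimal sketch of what such a query object could look like follows. The class name PostsWithVoteCounts and the FakeConnection are hypothetical; the only assumption about the collaborator is that it responds to select_all(sql) and returns rows as hashes, which is how an ActiveRecord connection behaves. The SQL is the query from the question, unchanged:

```ruby
# A query object: the raw SQL lives in one place behind a small interface.
# PostsWithVoteCounts is a hypothetical name chosen for this sketch.
class PostsWithVoteCounts
  SQL = <<-SQL
    SELECT posts.*,
           (SELECT COUNT(*) FROM votes WHERE votes.id = posts.id) AS vote_count
    FROM posts
  SQL

  # connection: anything responding to select_all(sql) and returning hashes
  def initialize(connection)
    @connection = connection
  end

  # Runs the query and normalizes vote_count to an Integer.
  def call
    @connection.select_all(SQL).map do |row|
      row.merge("vote_count" => row["vote_count"].to_i)
    end
  end
end

# A fake connection so the sketch runs without a database.
class FakeConnection
  def select_all(_sql)
    [{ "id" => 1, "title" => "hello", "vote_count" => "2" }]
  end
end

rows = PostsWithVoteCounts.new(FakeConnection.new).call
puts rows.first["vote_count"]
```

In an application you would pass ActiveRecord::Base.connection instead of the fake, and callers would never see the SQL.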
How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and map it to a new ActiveRecord model. A view behaves much like a table; you just need to point the model at it with set_table_name "your_view_name". Computing the counts at the DB level may be faster, and the view recalculates them automatically.
Just stumbled upon the postgres_ext gem, which adds support for Common Table Expressions in Arel and ActiveRecord, which is exactly what you asked for. The gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.
Sphinx & ThinkingSphinx are working great for me, however when a search returns back an array of results (models), I then notice in my logs that there are a large number of subsidiary SQL lookups to retrieve any associated models, these associations are defined within my model classes.
If I was just using ActiveRecord I could use the "include" feature to retrieve these associated records as part of the original search query, for example:
Booking.find_all_by_date(Date.today, :include => [:event,
:organizer, :sessions])
But I'm not sure how to implement this performance optimization in ThinkingSphinx, has anyone solved this?
You do it exactly the same way - use :include, it'll get passed through to the underlying ActiveRecord query when Thinking Sphinx translates Sphinx results to ActiveRecord objects.
Edit: Since TS v3, the :include option is now contained within the :sql option:
Booking.search(:sql => {:include => [:event, :organizer, :sessions]})
I have an ActiveRecord model Language, with columns id and short_code (there are other columns, but they are not relevant to this question). I want to create a method that will be given a list of short codes, and return a list of IDs. I do not care about associations, I just need to end up with an array that looks like [1, 2, 3, ...].
My first thought was to do something like
def get_ids_from_short_codes(*short_codes)
Language.find_all_by_short_code(short_codes.flatten, :select => 'id').map(&:id)
end
but I'm not sure if that's needlessly wasting time/memory/processing.
My question is twofold:
Is there a way to run an ActiveRecord find that will just return an array of a certain table column rather than instantiating objects?
If so, would it actually be worthwhile to collect an array of length n rather than instantiating n ActiveRecord objects?
Note that for my specific purpose, n would be approximately 200.
In Rails 3.2 and later, you can use the pluck method, which returns the values from the requested field without instantiating objects to hold them.
This would give you an array of IDs:
Language.where(short_code: short_codes.flatten).pluck(:id)
I should mention that in Rails 3.x you can pluck only one column at a time but in Rails 4 you can pass multiple columns to pluck.
By the way, here's a similar answer to a similar question
Honestly, for 200 records, I wouldn't worry about it. When you get to 2000, 20,000, or 200,000 records - then you can worry about optimization.
Make sure you have short_code indexed in your table.
If you are still concerned about performance, take a look at the development.log and see what the database numbers are for that particular call. You can adjust the query and see how it affects performance in the log. This should give you a rough estimate of performance.
Agree with the previous answer, but if you absolutely must, you can try this
sql = Language.send(:construct_finder_sql, :select => 'id', :conditions => ["short_code in (?)", short_codes])
Language.connection.select_values(sql)
A bit ugly as it is, but it doesn't create in-memory objects.
If you're using associations, you can get raw ids directly from ActiveRecord, e.g.:
class User < ActiveRecord::Base
has_many :users
end
irb:>> User.find(:first).user_ids
irb:=> [1,2,3,4,5]
Phil is right about this, but if you do find that this is an issue, you can send a raw SQL query to the database and work at a level below ActiveRecord. This can be useful for situations like this.
ActiveRecord::Base.connection.execute("SQL CODE!")
Benchmark your code first before you resort to this.
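A rough way to do that with Ruby's standard Benchmark module is sketched below. It only compares building 200 lightweight objects against collecting the ids directly; the Language struct is a hypothetical stand-in, and real ActiveRecord objects plus the database round trip will change the absolute numbers:

```ruby
require "benchmark"

# Hypothetical stand-in for a model instance; real ActiveRecord objects are heavier.
Language = Struct.new(:id, :short_code)

# Pretend these are the raw rows returned for roughly 200 short codes.
rows = (1..200).map { |i| [i, "code#{i}"] }

ids_via_objects = nil
ids_direct = nil

# Build an object per row, then map to ids (what find + map(&:id) does).
objects_time = Benchmark.realtime do
  1_000.times do
    ids_via_objects = rows.map { |id, code| Language.new(id, code) }.map(&:id)
  end
end

# Collect the ids straight from the raw rows (what select_values does).
direct_time = Benchmark.realtime do
  1_000.times { ids_direct = rows.map(&:first) }
end

puts format("objects: %.4fs, direct: %.4fs", objects_time, direct_time)
```

Timings vary from machine to machine, so treat the output as a relative comparison only.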
This really is a matter of choice.
Overkill or not, ActiveRecord is supposed to give you objects since it's an ORM. And like Ben said, if you do not want objects, use raw SQL.
I have a little example Rails app called tickets, which views and edits fictional tickets sold to various customers. In tickets_controller.rb, inside def index, I have this standard line, generated by scaffolding:
@tickets = Ticket.find(:all)
To sort the tickets by name, I have found two possible approaches. You can do it this way:
@tickets = Ticket.find(:all, :order => 'name')
... or this way:
@tickets = Ticket.find(:all).sort!{|t1,t2|t1.name <=> t2.name}
(Tip: Ruby documentation explains that sort! will modify the array that it is sorting, as opposed to sort alone, which returns the sorted array but leaves the original unchanged).
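The difference described in the tip can be seen on a plain array of strings (the sample names are made up for illustration):

```ruby
names = ["carol", "alice", "bob"]

# sort returns a new, sorted array and leaves the receiver alone.
sorted_copy = names.sort
snapshot_after_sort = names.dup

# sort! reorders the receiver itself.
names.sort!

puts sorted_copy.inspect
puts snapshot_after_sort.inspect
puts names.inspect
```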
What strategy do you normally use? When might you use .sort! versus the :order => 'criteria' syntax?
Use :order => 'criteria' for anything simple that can be done by the database (i.e. basic alphabetical or chronological order). Chances are it's a lot faster than letting your Ruby code do it, assuming you have the right indexes in place.
The only time I could think you should use the sort method is if you have a complex attribute that's calculated at run-time and not stored in the database, like a 'trustworthiness value' based off number of good/bad responses or something. In that case it's better to use the sort method, but be aware that this will screw things up if you have pagination in place (each page will have ITS results in order, but the set of pages as a whole will be out of order).
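The pagination caveat is easy to reproduce in plain Ruby. In this sketch the Seller struct, its trustworthiness formula (good minus bad responses) and the page size are all hypothetical; each_slice stands in for the database handing back one page at a time:

```ruby
# Hypothetical records with a score computed at run time rather than stored.
Seller = Struct.new(:name, :good, :bad) do
  def trustworthiness
    good - bad
  end
end

sellers = [
  Seller.new("a", 5, 1), Seller.new("b", 9, 0),
  Seller.new("c", 2, 2), Seller.new("d", 7, 1)
]
per_page = 2

# Pitfall: paginate first (as the database would), then sort each page in Ruby.
pages_sorted_individually = sellers.each_slice(per_page).map do |page|
  page.sort_by { |seller| -seller.trustworthiness }.map(&:name)
end

# Globally correct only if the whole set is sorted before slicing into pages.
sorted_then_paginated = sellers
  .sort_by { |seller| -seller.trustworthiness }
  .each_slice(per_page)
  .map { |page| page.map(&:name) }

puts pages_sorted_individually.inspect
puts sorted_then_paginated.inspect
```

In the first result, seller "a" lands on page one even though "d" has a higher score, which is the out-of-order effect described above. Sorting the whole set first fixes the order but means loading every record.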
I specify an order in the ActiveRecord finder or in the model association because sorting using SQL is faster. You should take advantage of the features offered by the RDBMS when you're able to do so.