How do you normally sort items in Rails? - ruby-on-rails

I have a little example Rails app called tickets, which views and edits fictional tickets sold to various customers. In tickets_controller.rb, inside def index, I have this standard line, generated by scaffolding:
@tickets = Ticket.find(:all)
To sort the tickets by name, I have found two possible approaches. You can do it this way:
@tickets = Ticket.find(:all, :order => 'name')
... or this way:
@tickets = Ticket.find(:all).sort! { |t1, t2| t1.name <=> t2.name }
(Tip: Ruby documentation explains that sort! will modify the array that it is sorting, as opposed to sort alone, which returns the sorted array but leaves the original unchanged).
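To make the difference between sort and sort! concrete, here is a small plain-Ruby sketch. The Ticket struct and the sample names are made up for illustration and only stand in for ActiveRecord objects.

```ruby
# Hypothetical stand-in for ActiveRecord ticket objects.
Ticket = Struct.new(:name)

tickets = [Ticket.new("Zoe"), Ticket.new("adam"), Ticket.new("Bob")]

# sort returns a new array and leaves the receiver untouched.
sorted_copy = tickets.sort { |t1, t2| t1.name <=> t2.name }

# sort_by is usually easier to read, and the key can be normalised,
# here to get a case-insensitive ordering.
by_name = tickets.sort_by { |t| t.name.downcase }

# Remember the order before the in-place sort for comparison.
original_order = tickets.map(&:name)

# sort! reorders the receiver in place.
tickets.sort! { |t1, t2| t1.name <=> t2.name }
```

Note that a plain <=> on strings is case-sensitive, so "adam" sorts after "Zoe" unless the key is normalised as in the sort_by line.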
What strategy do you normally use? When might you use .sort! versus the :order => 'criteria' syntax?

Use :order => 'criteria' for anything simple that can be done by the database (i.e. basic alphabetical or chronological order). Chances are it's a lot faster than letting your Ruby code do it, assuming you have the right indexes in place.
The only case where I'd reach for the sort method is a complex attribute that's calculated at run time and not stored in the database, like a 'trustworthiness value' based on the number of good/bad responses. In that case sorting in Ruby is the way to go, but be aware that it breaks down if you have pagination in place: each page will have its own results in order, but the set of pages as a whole will be out of order.
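The pagination problem can be seen with plain arrays. This is only an illustrative sketch: the scores and the page size are invented, and each_slice stands in for the LIMIT/OFFSET paging the database would do.

```ruby
# Hypothetical computed scores, in the order the database returns the rows.
scores = [5, 1, 9, 3, 8, 2]
per_page = 3

# Paginate first (as LIMIT/OFFSET would), then sort each page in Ruby.
pages_sorted_individually = scores.each_slice(per_page).map(&:sort)
# => [[1, 5, 9], [2, 3, 8]]

# Sort the whole set first, then paginate.
pages_sorted_globally = scores.sort.each_slice(per_page).to_a
# => [[1, 2, 3], [5, 8, 9]]
```

Only the second variant gives a consistent order across pages, and it requires loading every record before slicing.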

I specify an order in the ActiveRecord finder or in the model association because sorting using SQL is faster. You should take advantage of the features offered by the RDBMS when you're able to do so.

Related

Index virtual attribute Thinking Sphinx

I'm wondering how one might index a virtual attribute on a model with Thinking Sphinx. Suppose a Project model has an instance method that returns a boolean derived from information on another model, say User. The value is computed and is not a column on the projects table in the database.
For example, suppose we have a method is_user_eligible such that we can query Project.first.is_user_eligible, and get a true or false response. This works in the ORM already.
How can I index this virtual attribute with Thinking Sphinx? I'm able to index virtual attributes in my Django project, which uses Haystack backed by Elasticsearch. I did this by putting a @property decorator on the model method. I figured I should be able to do this with Rails/Thinking Sphinx too, yet I get all sorts of bizarre SQL errors when trying to index. I've tried various constructions in setting up my index (e.g. has vs. indexes) and all result in some sort of SQL error while indexing.
Is this possible with Thinking Sphinx? If so, how can I index a virtual attribute?
You've made it clear that the value is not available as a column on the projects table, but is it on an associated model instead? If so, then you could refer to it via the association:
has user.is_eligible, :as => :is_user_eligible
However, if it's not a column, but can be determined within the context of the SQL query, then you can use a SQL snippet as the attribute definition (I know my example is rather contrived, but should give you some idea):
has "(users.foo = 'bar' OR users.baz = 'qux')",
  :as => :is_user_eligible,
  :type => :boolean
If you're referring to associations that aren't used elsewhere in the index definition, you can force the references, or provide a SQL join statement:
join users
# or through more than one association:
join users.addresses
# or via your own custom join:
join "INNER JOIN users ON users.project_id = projects.id"
But if you cannot determine this value via SQL at all, then the only way to do this with Thinking Sphinx is to use real-time indices instead of SQL-backed indices. This means that instead of referring to associations and columns in your index definitions, you refer to methods. So, your attribute would become:
has is_user_eligible, :type => :boolean
The type must be specified - SQL indices can guess attribute types due to column types, but real-time indices don't have that reference point.
I realise the link to the real-time indices feature is a blog post I wrote over two years ago. However, the feature certainly works - I and others have been using it in production for quite some time (including with Flying Sphinx).
On the topic of has vs indexes: if you want to use the value as a filter or for sorting, then it must be an attribute, and thus you should use the has method. However, if it's textual data that you expect search queries to match on, then it should be a field, and thus use the indexes method.
Certainly I'd recommend switching to real-time indices anyway: it removes the need for deltas and you get up-to-date Sphinx records without needing to run 'ts:index' regularly (or at all - use ts:generate should your data end up in an out-of-date state). But make sure you switch all index definitions to real-time, instead of having some real-time and others SQL-backed.

Rails subquery reduce amount of raw SQL

I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
(SELECT COUNT(*)
FROM votes
WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Anyway to write this in DSL?
This returns only the attribute vote_count and not "*" + vote_count. I can append .select("*"), but I will be repeating this every time. Is there a better/DRY way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote counting part could be extracted into this query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
  def votes_count
    @@votes_count_cache ||= Vote.group(:id).count
    @@votes_count_cache[id] || 0
  end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
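A grouped count returns a hash that maps each grouped value to its row count, which is why the method above can look a post up by id. The plain-Ruby sketch below only emulates that shape. The vote rows are invented, and grouping on a post_id foreign key is an assumption (the SQL in the question compares votes.id with posts.id).

```ruby
# Hypothetical vote rows; post_id is an assumed foreign key column.
votes = [{ post_id: 1 }, { post_id: 1 }, { post_id: 3 }]

# Emulates the shape of the hash a grouped count returns:
# grouped value => number of rows.
counts = votes.each_with_object(Hash.new(0)) do |vote, acc|
  acc[vote[:post_id]] += 1
end

# A post without votes has no key in the hash, so fall back to 0,
# which plays the same role as `|| 0` in the method above.
votes_for = ->(post_id) { counts.fetch(post_id, 0) }
```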
But I strongly encourage you to consider yet another approach.
I believe that writing complicated queries like yours with ActiveRecord methods (even if it were possible) and splitting queries into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on the Code Climate Blog.
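A query object could look roughly like the sketch below. The class name PostsWithVoteCount and the FakeConnection stand-in are hypothetical and exist only so the example runs without a database. In a Rails app the injected connection would typically be ActiveRecord::Base.connection, whose select_all method runs a SQL string. The SQL is copied from the question as-is.

```ruby
# A query object keeps the raw SQL in one place behind a small interface.
# The class name is an assumption made for this sketch.
class PostsWithVoteCount
  SQL = <<~SQL
    SELECT posts.*,
           (SELECT COUNT(*) FROM votes WHERE votes.id = posts.id) AS vote_count
    FROM posts
  SQL

  def initialize(connection)
    @connection = connection
  end

  # Returns whatever the connection's select_all returns for the query.
  def call
    @connection.select_all(SQL)
  end
end

# Hypothetical stand-in connection so the sketch runs without a database.
class FakeConnection
  attr_reader :last_sql

  def select_all(sql)
    @last_sql = sql
    [{ "id" => 1, "vote_count" => 2 }]
  end
end

connection = FakeConnection.new
rows = PostsWithVoteCount.new(connection).call
```

Callers only see PostsWithVoteCount.new(connection).call, so the SQL can later be tuned or replaced without touching them.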
How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and map it to a new AR model. A view behaves much like a table; you just need to point the model at it with set_table_name "your_view_name" (self.table_name = "your_view_name" in newer Rails versions). It may be faster at the DB level, and the counts are recalculated automatically.
Just stumbled upon postgres_ext gem which adds support for Common Table Expressions in Arel and ActiveRecord which is exactly what you asked. Gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Duplicating logic in methods and scopes (and sql)

Named scopes really made this problem easier but it is far from being solved. The common situation is to have logic redefined in both named scopes and model methods.
I'll try to demonstrate the edge case with a somewhat complex example. Let's say that we have a Message model that has many Recipients. Each recipient can mark the message as read for themselves.
If you want to get the list of unread messages for given user, you would say something like this:
Message.unread_for(user)
That would use the named scope unread_for that would generate the sql which will return the unread messages for given user. This sql is probably going to join two tables together and filter messages by those recipients that haven't already read them.
On the other hand, when we are using the Message model in our code, we are using the following:
message.unread_by?(user)
This method is defined in the Message class, and even though it does basically the same thing, it has a different implementation.
For simpler projects, this is really not a big thing. Implementing the same simple logic in both sql and ruby in this case is not a problem.
But when application starts to get really complex, it starts to be a problem. If we have permission system implemented that checks who is able to access what message based on dozens of criteria defined in dozens of tables, this starts to get very complex. Soon it comes to the point where you need to join 5 tables and write really complex sql by hand in order to define the scope.
The only "clean" solution to the problem is to make the scopes use the actual ruby code. They would fetch ALL messages, and then filter them with ruby. However, this causes two major problems:
Performance
Pagination
Performance: we are issuing a lot more queries to the database. I am not sure about the internals of a DBMS, but how much harder is it for the database to execute 5 queries, each on a single table, than 1 query that joins 5 tables at once?
Pagination: we want to keep fetching records until the specified number of records has been retrieved. We fetch them one by one and check whether each is accepted by the Ruby logic. Once 10 of them are accepted, the process stops.
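The "fetch until enough records are accepted" idea can be sketched with a lazy enumerator in plain Ruby. Everything here is a stand-in: the readable lambda plays the role of the Ruby permission logic, the in-memory array plays the role of the table, and each_slice mimics fetching in batches rather than loading all rows.

```ruby
# Hypothetical permission check standing in for the Ruby-side logic.
readable = ->(record) { record[:id].odd? }

# Stand-in for the table; each_slice mimics fetching rows in batches.
records = (1..1_000).map { |id| { id: id } }
batches_fetched = 0

# Nothing is evaluated yet: the chain is lazy.
accepted = records.each_slice(50).lazy
                  .flat_map { |batch| batches_fetched += 1; batch }
                  .select { |record| readable.call(record) }

# Take one page of accepted records; evaluation stops as soon as
# enough records have passed the check.
page = 2
per_page = 10
page_records = accepted.drop((page - 1) * per_page).first(per_page)
```

The trade-off is that later pages have to re-check all earlier records, so the cost grows with the page number, and the total number of pages is unknown without scanning everything.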
Curious to hear your thoughts on this. I have no experience with nosql dbms, can they tackle the issue in different way?
UPDATE:
I was only speaking hypothetically, but here is one real-life example. Let's say that we want to display all transactions on one page (both payments and expenses).
I have created a SQL UNION query to get them both, then go through each record, check whether it can be :read by the current user, and finally paginate the result as an array.
def form_transaction_log
  sql1 = @project.payments
    .select("'Payment' AS record_type, id, created_at")
    .where('expense_id IS NULL')
    .to_sql
  sql2 = @project.expenses
    .select("'Expense' AS record_type, id, created_at")
    .to_sql
  result = ActiveRecord::Base.connection.execute %{
    (#{sql1} UNION #{sql2})
    ORDER BY created_at DESC
  }
  result = result.map do |record|
    klass = Object.const_get record["record_type"]
    klass.find record["id"]
  end.select do |record|
    can? :read, record
  end
  @transactions = Kaminari.paginate_array(result).page(params[:page]).per(7)
end
Both payments and expenses need to be displayed within same table, ordered by creation date and paginated.
Payments and expenses have completely different :read permissions (defined in the ability class, CanCan gem). These permissions are quite complex and they require querying several other tables.
The "ideal" thing would be to write one HUGE SQL query that would return what I need. It would make pagination and everything else a lot easier. But that is going to duplicate my logic defined in the ability.rb class.
I'm aware that CanCan provides a way of defining the sql query for the ability, but the abilities are so complex, that they couldn't be defined in that way.
What I did is working, but I'm loading ALL transactions, and then checking which ones I could read. I consider it a big performance issue. Pagination here seems pointless because I'm already loading all records (it only saves bandwidth). An alternative is to write really complex SQL that is going to be hard to maintain.
Sounds like you should remove some duplication and perhaps use DB logic more. There's no reason that you can't share code between named scopes and other methods.
Can you post some problematic code for review?

How will ActiveRelation affect rails' includes() 's capabilities?

I've looked over the Arel sources, and some of the activerecord sources for Rails 3.0, but I can't seem to glean a good answer for myself as to whether Arel will be changing our ability to use includes(), when constructing queries, for the better.
There are instances when one might want to modify the conditions on an activerecord :include query in 2.3.5 and before, for the association records which would be returned. But as far as I know, this is not programmatically tenable for all :include queries:
(I know some AR-find-includes make t#{n}.c#{m} renames for all the attributes, and one could conceivably add conditions to these queries to limit the joined sets' results; but others do n_joins + 1 number of queries over the id sets iteratively, and I'm not sure how one might hack AR to edit these iterated queries.)
Will Arel allow us to construct ActiveRecord queries which specify the resulting associated model objects when using includes()?
Ex:
User :has_many posts( has_many :comments)
User.all(:include => :posts) #say I wanted the post objects to have their
#comment counts loaded without adding a comment_count column to `posts`.
#At the post level, one could do so by:
posts_with_counts = Post.all(:select => 'posts.*, count(comments.id) as comment_count',
  :joins => 'left outer join comments on comments.post_id = posts.id',
  :group => 'posts.id') #i believe
#But it seems impossible to do so while linking these post objects to each
#user as well, without running User.all() and then zippering the objects into
#some other collection (ugly)
#OR running posts.group_by(&:user) (even uglier, with the n user queries)
Why don't you actually use AREL at its core. Once you get down to the actual table scope you can use Arel::Relation which is COMPLETELY different from ActiveRecord implementation itself. I truly believe that the ActiveRecord::Relation is a COMPLETELY different (and busted) implementation of a wrapper around an Arel::Relation & Arel::Table. I choose to use Arel at its core by either doing Thing.scoped.table (Arel::Table) which is the active record style OR Arel::Table.new(:table_name) which gives me a fresh Arel::Table (my preferred method). From this you can do the following.
posts = Arel::Table.new(:thing, :as => 'p') #derived relation
comments = Arel::Table.new(:comments, :as => 'c') # derived relation
posts_and_comments = posts.join(comments).on( posts[:id].eq(comments[:id]) )
# now you can iterate through the derived relation by doing the following
posts_and_comments.each {...} # this will actually return Arel::Rows which is another story.
#
An Arel::Row returns a TRUE definition of a tuple from the set which will consist of an Arel::Header (set of Arel::Attributes) and a tuple.
To be slightly more verbose, the reason why I use Arel at its core is that it truly exposes the relational model to me, which is the power behind ActiveRelation. I have noticed that ActiveRecord is exposing something like 20% of what Arel has to offer, and I am afraid that developers will not realize this, nor will they understand the true core of Relational Algebra. Using the conditions hash is to me "old school", an ActiveRecord style of programming in a Relational Algebra world. We need to learn to break away from the Martin Fowler model-based approach and adopt the E.F. Codd Relational Model based approach, which is what RDBMSs have been trying to do for decades but have gotten very wrong.
I've taken the liberty to start a seven part learning series on Arel and Relational Algebra for the ruby community. These will consist of short videos ranging from absolute beginner to advanced techniques like self referencing relations and closure under composition. The first video is at http://Innovative-Studios.com/#pilot Please let me know if you need more information or this was not descriptive enough for you.
The future looks bright with Arel.
ActiveRecord::Relation is a fairly weak wrapper around Base#find_by_sql, so :include queries are not extended in any way by its inclusion.
Isn't
Post.includes([:author, :comments]).where(['comments.approved = ?', true]).all
what you're looking for? (taken from the official docs)

Best practices for getting a list of IDs from an ActiveRecord model

I have an ActiveRecord model Language, with columns id and short_code (there are other columns, but they are not relevant to this question). I want to create a method that will be given a list of short codes, and return a list of IDs. I do not care about associations, I just need to end up with an array that looks like [1, 2, 3, ...].
My first thought was to do something like
def get_ids_from_short_codes(*short_codes)
  Language.find_all_by_short_code(short_codes.flatten, :select => 'id').map(&:id)
end
but I'm not sure if that's needlessly wasting time/memory/processing.
My question is twofold:
Is there a way to run an ActiveRecord find that will just return an array of a certain table column rather than instantiating objects?
If so, would it actually be worthwhile to collect an array of length n rather than instantiating n ActiveRecord objects?
Note that for my specific purpose, n would be approximately 200.
In Rails 3.x, you can use the pluck method which returns the values from the requested field without instantiating objects to hold them.
This would give you an array of IDs:
Language.where(short_code: short_codes.flatten).pluck(:id)
I should mention that in Rails 3.x you can pluck only one column at a time but in Rails 4 you can pass multiple columns to pluck.
By the way, here's a similar answer to a similar question
Honestly, for 200 records, I wouldn't worry about it. When you get to 2000, 20,000, or 200,000 records - then you can worry about optimization.
Make sure you have short_code indexed in your table.
If you are still concerned about performance, take a look at the development.log and see what the database numbers are for that particular call. You can adjust the query and see how it affects performance in the log. This should give you a rough estimate of performance.
Agree with the previous answer, but if you absolutely must, you can try this
sql = Language.send(:construct_finder_sql, :select => 'id', :conditions => ["short_code in (?)", short_codes])
Language.connection.select_values(sql)
A bit ugly as it is, but it doesn't create in-memory objects.
If you're using associations, you can get raw ids directly from ActiveRecord, e.g.:
class User < ActiveRecord::Base
has_many :users
end
irb:=> User.find(:first).user_ids
irb:>> [1,2,3,4,5]
Phil is right about this, but if you do find that this is an issue, you can send a raw SQL query to the database and work at a level below ActiveRecord. This can be useful for situations like this.
ActiveRecord::Base.connection.execute("SQL CODE!")
Benchmark your code first before you resort to this.
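One way to get a rough number is Ruby's standard Benchmark module. The sketch below is only an approximation of the question's situation: the Language struct and the fake rows are stand-ins, and real ActiveRecord objects are considerably heavier than a Struct, so treat the timings as illustrative only.

```ruby
require "benchmark"

# Hypothetical stand-in for a model instance.
Language = Struct.new(:id, :short_code)

# Fake result rows, as a database driver might return them.
rows = (1..200).map { |i| [i, "lang#{i}"] }

ids_via_objects = nil
ids_via_rows = nil

Benchmark.bm(12) do |bm|
  # Build one object per row, then collect the ids.
  bm.report("objects:") do
    1_000.times do
      ids_via_objects = rows.map { |id, code| Language.new(id, code) }.map(&:id)
    end
  end

  # Read the id column straight from the rows.
  bm.report("plain rows:") do
    1_000.times { ids_via_rows = rows.map(&:first) }
  end
end
```

Both variants produce the same array of ids, so any difference in the report is purely the cost of building the intermediate objects.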
This really is a matter of choice.
Overkill or not, ActiveRecord is supposed to give you objects since it's an ORM. And like Ben said, if you do not want objects, use raw SQL.
