Efficiency of querying a database with STI - ruby-on-rails

I was curious how Rails queries a table that uses STI. For example, if my parent class is Book and I have two subclasses, ComicBook and Novel, and I do something like
Novel.all.each
since there's only one table, does the server sift through all of the comic books as well? Is appropriate indexing automatically added to prevent this? Thanks

Well, you can't do Novel.each, each isn't defined on model classes. You can however do Novel.all.each ... where ... is some block.
As for how the query works, just call to_sql on any ARel expression. In Rails 3, Novel.all returns the collection of models itself, so you need to go a step further and call scoped to get a valid ARel expression (from Rails 4 on, Novel.all already returns a relation that responds to to_sql).
[1] pry(main)> Novel.scoped.to_sql
=> "SELECT \"books\".* FROM \"books\" WHERE \"books\".\"type\" IN ('Novel')"
Indexing columns that are queried against frequently is a good thing to consider; Rails does not add an index on type automatically. Without the index, your RDBMS will have to look at all records in the table to evaluate the condition above.
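To make the cost difference concrete, here is a minimal plain-Ruby sketch contrasting a full scan on type with a lookup through a prebuilt index. It is only an analogy for what the database does: BOOKS, TYPE_INDEX and both helper methods are hypothetical stand-ins for the books table and an index on books.type.

```ruby
# Hypothetical in-memory stand-in for the single STI table "books".
BOOKS = [
  { id: 1, type: "Novel",     title: "Dune" },
  { id: 2, type: "ComicBook", title: "Maus" },
  { id: 3, type: "Novel",     title: "Emma" },
  { id: 4, type: "ComicBook", title: "Saga" }
].freeze

# Without an index: every row is inspected, comic books included.
# Returns the matching rows and the number of rows looked at.
def find_by_scan(rows, type)
  inspected = 0
  matches = rows.select do |row|
    inspected += 1
    row[:type] == type
  end
  [matches, inspected]
end

# With an index on type: built once, then only matching rows are fetched.
TYPE_INDEX = BOOKS.group_by { |row| row[:type] }.freeze

def find_by_index(index, type)
  index.fetch(type, [])
end

matches, inspected = find_by_scan(BOOKS, "Novel")
puts "scan inspected #{inspected} rows for #{matches.size} novels"
puts "index fetched #{find_by_index(TYPE_INDEX, 'Novel').size} rows directly"
```

In a Rails migration, the real index is usually created with add_index :books, :type.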

Related

Rails scope complexity

I have a model to which I need to create a default scope. I am unsure of the best way to write this scope but I will explain how it needs to work.
Basically I need to get all items of the model and if two items have the same "order" value then it should look to the "version" field (which will contain, 1, 2, 3 etc) and pick the one with the highest value.
Is there a way of achieving this with just a scope?
Try this code:
scope :group_by_order, -> { order('order ASC').group('order') }
default_scope, { (group_by_order.map{ |key,values| values.order('version DESC') }.map{|key, values| values - values[1..-1]}).values.flatten }
Explanation Code:
order by "order" field.
group by "order" field.
map on the result hash, and order each values by "version" field
map again on values, and remove from index "1" to the end.
get all values, and flatten them
A word of caution about using default scopes with order: when you perform updates on the collection, such as update_all, it will use the default scope to fetch the records, and what you think would be a quick operation will bring your database to its knees as it copies the rows to a temporary table before updating.
I would recommend just using a normal scope instead of a default scope.
Have a look at Select the 3 most recent records where the values of one column are distinct on how to construct the SQL query you want, and then put that into a find_by_sql statement as mentioned in How to chain or combine scopes with subqueries or find_by_sql
The ActiveRecord order method simply uses the SQL ORDER BY clause, which can take several arguments. Let's say you have some model with the attributes order and version; then the way to order the records as you describe is order(:order, :version). If you want this as the default scope, you would end up with:
default_scope { order(:order, :version) }
First, default_scopes are dangerous. They get used whenever you use the model, unless you specifically force 'unscoped'. IME, it is rare to need a scope for every usage of a model. Not impossible, but rare. And rarer yet when you have such a big computation.
Instead of making a complex query, can you simplify the problem? Here's one approach:
In order to make the version field work, you probably have some code that is already comparing the order fields (otherwise you would not have unique rows with the two order fields the same, but the version field differing). So you can create a new field, that is higher in value than the last field that indicated the right entity to return. That is, in order to create a new unique version, you know that you last had a most-important-row. Take the most-important-rows' sort order, and increment by one. That's your new most-important-rows' sort order.
Now you can query for qualifying data with the highest sort order (order(sort_order: :desc).first).
Rather than focus on the query, focus on whether you are storing the right data, data that can make the query you want to achieve easier. In this case, it appears that you're already doing an operation that would help identify a winning case. So use that code and the existing database operation to reduce future database operations.
In sql you can easily order on two things, which will first order on the first and then order on the second if the first thing is equal. So in your case that would be something like
select * from posts order by order_field_1, version desc
You should not name a column order, since it is a SQL reserved word and has to be quoted wherever it appears in raw SQL. Since you did not give the real column name, I just named it order_field_1.
This is easily translated to rails:
Post.order(:order_field_1, version: :desc)
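The same two-level ordering can be mimicked in plain Ruby, which may help to see what the ORDER BY does. This is only an in-memory sketch; the OrderedRow struct, its sample data and the order_like_sql helper are made up for illustration.

```ruby
# Hypothetical rows; order_field_1 and version mirror the columns above.
OrderedRow = Struct.new(:order_field_1, :version)

ORDERED_ROWS = [
  OrderedRow.new(2, 1),
  OrderedRow.new(1, 1),
  OrderedRow.new(1, 3),
  OrderedRow.new(2, 2),
  OrderedRow.new(1, 2)
].freeze

# Sort ascending on order_field_1, then descending on version
# (negating the numeric version flips its direction).
def order_like_sql(rows)
  rows.sort_by { |row| [row.order_field_1, -row.version] }
end

order_like_sql(ORDERED_ROWS).each do |row|
  puts "order_field_1=#{row.order_field_1} version=#{row.version}"
end
```

Note that ordering alone keeps every row; it only guarantees that the highest version comes first within each group.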
I would generally advise against using default_scope since once set it is really hard to avoid (it is always prepended), but if you really need it and know the risks, it is really easy to apply as well:
class Post < ActiveRecord::Base
default_scope { order(:order_field_1, version: :desc) }
end
This is all actually documented very well in the rails guides.
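The rule the question describes (one item per order value, namely the one with the highest version) can be written down in plain Ruby as a reference for what any SQL or scope solution should return. A minimal sketch with a hypothetical VersionedItem struct and sample data; it runs in memory, so it is probably not a substitute for doing the selection in the database on large tables.

```ruby
# Hypothetical items; order and version mirror the fields in the question.
VersionedItem = Struct.new(:order, :version, :name)

VERSIONED_ITEMS = [
  VersionedItem.new(1, 1, "a v1"),
  VersionedItem.new(1, 2, "a v2"),
  VersionedItem.new(2, 1, "b v1"),
  VersionedItem.new(3, 1, "c v1"),
  VersionedItem.new(3, 3, "c v3")
].freeze

# For each distinct order value keep only the item with the highest version,
# then return the survivors sorted by order.
def latest_per_order(items)
  items.group_by(&:order)
       .map { |_order, group| group.max_by(&:version) }
       .sort_by(&:order)
end

latest_per_order(VERSIONED_ITEMS).each { |item| puts item.name }
```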

What's the best way to create a model associated to a query instead of a table

I'm trying to create a reporting app with Rails 4.
As a reporting system, it has a lot of SQL queries where the result is not like any table schema. I mean a select query with some joins, unions, etc., where the result is something like a row whose columns are the results of subqueries, sums, etc.
Would it be possible to have a Model with no table associated, but on which I can use "find_by_sql" to instantiate an array of that model with the results of my query?
Something like:
Use "select table1.field1, sum(if(...,table2.field,...) as field2, as field3 from...." as query, and return an array of a model "Result", where I can call
array_of_result.first.field3?
Sorry if I'm not writing clearly enough.
EDIT: until now, sparky's answer (http://railscasts.com/episodes/193-tableless-model) was the closest one, because I want to use some of the ActiveRecord features, like specifying a connection in the class (or even in a superclass).
For pure reporting, especially when the result column names span multiple models, one alternative is to just pass the query directly back and deal with the result set:
ActiveRecord::Base.connection.execute([raw SQL query])
You'll get back a result set, which is typically an enumerable set of row results, but check the documentation for your DB adapter to find out for sure what it's returning.
For example, if you're using PostgreSQL as your database with the pg gem, you'll get back an instance of PG::Result which you can then operate on in the following way:
> results = ActiveRecord::Base.connection.execute("SELECT COUNT(*) FROM customers")
=> #<PG::Result ...>
> results.count
=> 1 # the number of rows in the result set, not the number of customers
> results.first
=> {"count"=>"63"} # 63 customers in this contrived example
> results[0]
=> {"count"=>"63"}
> results[0]["count"]
=> "63"
You'll need to cast your return values to something other than strings. ActiveRecord will typically do this for you in your models since it knows the column types, but by doing a raw query you'll probably just get back strings that you'll have to cast yourself. If you're just doing a query to display it on a page somewhere maybe the strings will be sufficient.
I'm sure you'll be doing more sophisticated reports, but you'll notice in my simple example that the key count wound up being created as the accessor to the result of the SELECT COUNT... query. If you specify column names, or alias them, the keys in the resulting hash set will match the column names or the aliases you've set.
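The casting step mentioned above could look roughly like this. It is a sketch under assumptions: RAW_REPORT_ROWS imitates the string-only rows a raw query returns, and the column names, the REPORT_CASTS table and the cast_row helper are invented for the example.

```ruby
require "bigdecimal"

# Hypothetical rows shaped like a raw result set: string keys, string values.
RAW_REPORT_ROWS = [
  { "customer_id" => "7",  "order_count" => "3", "total" => "149.90" },
  { "customer_id" => "12", "order_count" => "1", "total" => "20.00" }
].freeze

# Column name => conversion; columns not listed here stay as strings.
REPORT_CASTS = {
  "customer_id" => ->(value) { Integer(value) },
  "order_count" => ->(value) { Integer(value) },
  "total"       => ->(value) { BigDecimal(value) }
}.freeze

# Returns a new hash with each known column converted to its Ruby type.
def cast_row(row, casts = REPORT_CASTS)
  row.each_with_object({}) do |(column, value), typed|
    cast = casts[column]
    # SQL NULL arrives as nil, so nil is passed through untouched.
    typed[column] = value.nil? || cast.nil? ? value : cast.call(value)
  end
end

typed_rows = RAW_REPORT_ROWS.map { |row| cast_row(row) }
puts typed_rows.sum { |row| row["order_count"] }
```

Integer() raises on malformed input instead of silently returning 0 as to_i would, which tends to surface bad report data early.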
You can certainly create a Reporting model.
You would want to start off by creating a tableless model. Essentially, this can be as simple as a file in your models directory with
class Reporting
end
in it, and a controller with some appropriate actions and views. However, have a look at
http://railscasts.com/episodes/193-tableless-model
http://railscasts.com/episodes/219-active-model
which cover tableless models and what you can do with active model with respect to validations etc.
In your case, you say that you have some complex joins etc. Sometimes it's easier in the short term to SQLize these, but if you can use ActiveRecord you should. Apart from anything else, this will allow you to define custom methods in your model which you can chain, and it will make your Reporting controller much cleaner.
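If all that is needed is something like array_of_result.first.field3, a plain Ruby value object may already be enough. The sketch below assumes the rows arrive as hashes with string keys; ReportResult, REPORT_ROWS and build_results are hypothetical names, and field1 to field3 are taken from the question. It gives none of the ActiveRecord features such as per-class connections.

```ruby
# Plain value object for one report row; no table and no ActiveRecord involved.
ReportResult = Struct.new(:field1, :field2, :field3, keyword_init: true)

# Hypothetical rows, shaped like what a raw query returns (string keys).
REPORT_ROWS = [
  { "field1" => "north", "field2" => "120", "field3" => "15" },
  { "field1" => "south", "field2" => "80",  "field3" => "9" }
].freeze

# Turn each row hash into a ReportResult, so that columns become methods.
def build_results(rows)
  rows.map { |row| ReportResult.new(**row.transform_keys(&:to_sym)) }
end

array_of_result = build_results(REPORT_ROWS)
puts array_of_result.first.field3
```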

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other association, instead just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
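class Query < ActiveRecord::Base
  # Stores the array of duck ids in a single serialized column.
  serialize :duck_ids

  # Looks up the ducks referenced by this query with one SELECT.
  def ducks
    Duck.where(id: duck_ids)
  end
end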
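What the serialized-array approach amounts to can be sketched without Rails. In this hypothetical version the ids are stored as a JSON string (Rails' serialize has historically defaulted to YAML, so JSON here is an assumption), and DUCKS_BY_ID and SavedQuery are invented stand-ins for the ducks table and the Query model; the ducks method plays the role of Duck.where(id: duck_ids).

```ruby
require "json"

# Hypothetical stand-in for the ducks table, keyed by id.
DUCKS_BY_ID = { 1 => "Mallard", 2 => "Teal", 3 => "Wigeon" }.freeze

# A saved query keeps its duck ids as one serialized text value,
# the way a serialized column would store them.
class SavedQuery
  attr_reader :raw_duck_ids

  def initialize(duck_ids)
    @raw_duck_ids = JSON.generate(duck_ids)
  end

  # Deserialize on read.
  def duck_ids
    JSON.parse(raw_duck_ids)
  end

  # One lookup by id list; ids without a matching duck are dropped.
  def ducks
    DUCKS_BY_ID.values_at(*duck_ids).compact
  end
end

query = SavedQuery.new([1, 3])
puts query.raw_duck_ids
puts query.ducks.inspect
```

Because the ids live inside an opaque string, the database cannot join or filter on them, which is the trade-off listed for serialized arrays further down.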
If you have native array support in your database, you can do something similar from within your DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Rails subquery reduce amount of raw SQL

I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
(SELECT COUNT(*)
FROM votes
WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Any way to write this in DSL?
This returns only the attribute vote_count and not "*" + vote_count. I can append .select("*") but I will be repeating this every time. Is there a better/DRY way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote counting part could be extracted into this query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
  def votes_count
    @@votes_count_cache ||= Vote.group(:id).count
    @@votes_count_cache[id] || 0
  end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
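To show the caching idea and its invalidation problem outside of Rails, here is a small sketch. VOTE_ROWS, CountedPost and reset_votes_count_cache! are hypothetical, and grouping by a post_id key is an assumption about the schema. The sketch uses a class-level instance variable rather than a class variable, which is a design choice, not something the answer prescribes.

```ruby
# Hypothetical vote rows; each hash stands for one row of the votes table.
VOTE_ROWS = [{ post_id: 1 }, { post_id: 1 }, { post_id: 3 }].freeze

class CountedPost
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Computed once on first use (like one GROUP BY ... COUNT(*) query),
  # then shared by every post until the cache is reset.
  def self.votes_count_cache
    @votes_count_cache ||= VOTE_ROWS.each_with_object(Hash.new(0)) do |row, counts|
      counts[row[:post_id]] += 1
    end
  end

  # Must be called whenever votes change, otherwise the counts go stale.
  def self.reset_votes_count_cache!
    @votes_count_cache = nil
  end

  # Posts without votes fall back to the hash default of 0.
  def votes_count
    self.class.votes_count_cache[id]
  end
end

puts CountedPost.new(1).votes_count
puts CountedPost.new(2).votes_count
```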
But I strongly encourage you to consider yet another approach.
I believe writing complicated queries like yours with ActiveRecord methods (even if it were possible) or splitting queries into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on Code Climate Blog.
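A query object in this sense might look like the following sketch. PostsWithVoteCounts and StubConnection are hypothetical names; the SQL is the query from the question, unchanged. The only assumption about the connection is that it responds to select_all(sql) and returns hash-like rows, which ActiveRecord::Base.connection does in Rails.

```ruby
# A query object: the raw SQL lives in one place behind a small interface.
class PostsWithVoteCounts
  SQL = <<~SQL.freeze
    SELECT *,
           (SELECT COUNT(*) FROM votes WHERE votes.id = posts.id) AS vote_count
    FROM posts
  SQL

  # connection: anything responding to #select_all(sql) with hash-like rows.
  def initialize(connection)
    @connection = connection
  end

  # Runs the query and returns plain hashes with vote_count cast to Integer.
  def call
    @connection.select_all(SQL).map do |row|
      row.to_h.merge("vote_count" => Integer(row["vote_count"]))
    end
  end
end

# Hypothetical stub so that the sketch runs without a database.
class StubConnection
  def select_all(_sql)
    [{ "id" => 1, "title" => "Hello", "vote_count" => "2" }]
  end
end

puts PostsWithVoteCounts.new(StubConnection.new).call.inspect
```

Callers only see PostsWithVoteCounts.new(connection).call, so the SQL can later be tuned or replaced without touching them.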
How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and just map it to a new AR model. It works much like a table; you just need to specify it with set_table_name "your_view_name". On the DB level it may work faster, and it will be recalculated automatically.
Just stumbled upon postgres_ext gem which adds support for Common Table Expressions in Arel and ActiveRecord which is exactly what you asked. Gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Best practices for getting a list of IDs from an ActiveRecord model

I have an ActiveRecord model Language, with columns id and short_code (there are other columns, but they are not relevant to this question). I want to create a method that will be given a list of short codes, and return a list of IDs. I do not care about associations, I just need to end up with an array that looks like [1, 2, 3, ...].
My first thought was to do something like
def get_ids_from_short_codes(*short_codes)
Language.find_all_by_short_code(short_codes.flatten, :select => 'id').map(&:id)
end
but I'm not sure if that's needlessly wasting time/memory/processing.
My question is twofold:
Is there a way to run an ActiveRecord find that will just return an array of a certain table column rather than instantiating objects?
If so, would it actually be worthwhile to collect an array of length n rather than instantiating n ActiveRecord objects?
Note that for my specific purpose, n would be approximately 200.
In Rails 3.2 and later, you can use the pluck method, which returns the values from the requested field without instantiating objects to hold them.
This would give you an array of IDs:
Language.where(short_code: short_codes.flatten).pluck(:id)
I should mention that in Rails 3.x you can pluck only one column at a time but in Rails 4 you can pass multiple columns to pluck.
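Applied to the helper from the question, the pluck version could be sketched as below. To keep the snippet runnable without Rails, the model is passed in as an argument and FakeLanguages is a hypothetical in-memory stand-in that implements only the small slice of where and pluck used here; in an application you would pass Language itself.

```ruby
# One query, no model instances: just the ids for the given short codes.
# model is expected to respond to where(...).pluck(...), as ActiveRecord models do.
def ids_from_short_codes(model, *short_codes)
  model.where(short_code: short_codes.flatten).pluck(:id)
end

# Hypothetical stand-in for the Language model, backed by an array of hashes.
class FakeLanguages
  def initialize(rows)
    @rows = rows
  end

  # Keeps rows whose column value is among the wanted values.
  def where(conditions)
    matching = @rows.select do |row|
      conditions.all? { |column, wanted| Array(wanted).include?(row[column]) }
    end
    FakeLanguages.new(matching)
  end

  # Returns the bare column values instead of whole rows.
  def pluck(column)
    @rows.map { |row| row[column] }
  end
end

FAKE_LANGUAGES = FakeLanguages.new([
  { id: 1, short_code: "en" },
  { id: 2, short_code: "fr" },
  { id: 3, short_code: "de" }
])

puts ids_from_short_codes(FAKE_LANGUAGES, "en", ["de"]).inspect
```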
By the way, here's a similar answer to a similar question
Honestly, for 200 records, I wouldn't worry about it. When you get to 2000, 20,000, or 200,000 records - then you can worry about optimization.
Make sure you have short_code indexed in your table.
If you are still concerned about performance, take a look at the development.log and see what the database numbers are for that particular call. You can adjust the query and see how it affects performance in the log. This should give you a rough estimate of performance.
Agree with the previous answer, but if you absolutely must, you can try this
sql = Language.send(:construct_finder_sql, :select => 'id', :conditions => ["short_code in (?)", short_codes])
Language.connection.select_values(sql)
A bit ugly as it is, but it doesn't create in-memory objects.
If you're using associations, you can get raw ids directly from ActiveRecord, e.g.:
class User < ActiveRecord::Base
has_many :users
end
irb> User.find(:first).user_ids
=> [1,2,3,4,5]
Phil is right about this, but if you do find that this is an issue, you can send a raw SQL query to the database and work at a level below ActiveRecord. This can be useful for situations like this.
ActiveRecord::Base.connection.execute("SQL CODE!")
Benchmark your code first before you resort to this.
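A quick way to get such numbers is Ruby's standard Benchmark module. The sketch below only compares building 200 small objects against collecting 200 plain integers, so it measures Ruby-side allocation and says nothing about the database round trip; LanguageRecord, RECORD_COUNT and both helpers are hypothetical.

```ruby
require "benchmark"

# Hypothetical stand-in for a model instance with a few attributes.
LanguageRecord = Struct.new(:id, :short_code, :name)

RECORD_COUNT = 200

# Variant A: build full objects, then extract the ids.
def ids_via_objects(count)
  (1..count).map { |i| LanguageRecord.new(i, "c#{i}", "Language #{i}") }.map(&:id)
end

# Variant B: collect the ids directly.
def ids_directly(count)
  (1..count).to_a
end

# Repeat each variant many times so that the timings are measurable.
Benchmark.bm(10) do |bm|
  bm.report("objects:")  { 1_000.times { ids_via_objects(RECORD_COUNT) } }
  bm.report("ids only:") { 1_000.times { ids_directly(RECORD_COUNT) } }
end
```

Inside a Rails app, the same Benchmark.bm block can wrap the real queries, for example the map(&:id) version against the pluck version.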
This really is a matter of choice.
Overkill or not, ActiveRecord is supposed to give you objects since it's an ORM. And like Ben said, if you do not want objects, use raw SQL.

Resources