Rails scope complexity - ruby-on-rails

I have a model on which I need to define a default scope. I am unsure of the best way to write this scope, but I will explain how it needs to work.
Basically I need to get all items of the model, and if two items have the same "order" value then it should look at the "version" field (which will contain 1, 2, 3 etc.) and pick the one with the highest value.
Is there a way of achieving this with just a scope?

Try this code:
scope :group_by_order, -> { order('order ASC').group_by(&:order) }
default_scope { group_by_order.map { |key, values| values.sort_by(&:version).reverse }.map { |values| values - values[1..-1] }.flatten }
Explanation of the code:
order the records by the "order" field.
group them by the "order" field (Ruby's group_by, so this happens in memory and returns a hash).
map over the resulting hash and sort each group's values by the "version" field, highest first.
map again over the groups and remove everything from index 1 to the end, keeping only the highest version.
flatten the remaining single-element groups into one array.

A word of caution about using default scopes with order: when you perform updates on the collection, such as update_all, Rails will use the default scope to fetch the records, and what you think would be a quick operation can bring your database to its knees as it copies the rows to a temporary table before updating.
I would recommend just using a normal scope instead of a default scope.
Have a look at Select the 3 most recent records where the values of one column are distinct for how to construct the SQL query you want, and then put that into a find_by_sql statement as mentioned in How to chain or combine scopes with subqueries or find_by_sql.
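If you want the "highest version per order value" rule directly in SQL, a correlated NOT EXISTS subquery is one way to express it. A rough sketch, assuming the table is called items; the "order" and version column names come from the question, and the scope name is made up:
scope :latest_versions, -> {
  where('NOT EXISTS (SELECT 1 FROM items newer
                     WHERE newer."order" = items."order"
                       AND newer.version > items.version)')
}
This keeps, for each "order" value, only the row that no other row beats on version, and it stays a relation, so it can be chained like any other scope (or, with the usual caveats, used as the default scope).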

The ActiveRecord order method simply uses the SQL ORDER BY clause, which can take several arguments. Let's say you have some model with the attributes order and version; then the correct way to order the records as you describe is order(:order, :version). If you want this as the default scope, you would end up with:
default_scope { order(:order, :version) }

First, default_scopes are dangerous. They get used whenever you use the model, unless you specifically force 'unscoped'. IME, it is rare to need a scope on every usage of a model. Not impossible, but rare. And rarer yet when you have such a big computation.
Instead of making a complex query, can you simplify the problem? Here's one approach:
In order to make the version field work, you probably already have some code that compares the order fields (otherwise you would not have rows where the order fields are the same but the version field differs). So you can store a new field whose value is higher than that of the row that was previously the winner. That is, whenever you create a new version, you already know which row was the most important one: take that row's sort order and increment it by one, and that becomes the new row's sort order.
Now you can query for qualifying data with the highest sort order (order(sort_order: :desc).first).
Rather than focus on the query, focus on whether you are storing the right data to make the query you want easier to achieve. In this case, it appears you are already doing an operation that identifies the winning row, so use that code and the existing database operation to reduce future database operations.
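A minimal sketch of that idea, assuming a hypothetical sort_order column on an Item model (both names are invented for illustration):
# when a new row becomes the winner, give it the highest sort_order seen so far
new_item.update(sort_order: Item.maximum(:sort_order).to_i + 1)
# fetching the current winner is then a trivial, index-friendly query
Item.order(sort_order: :desc).first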

In SQL you can easily order on two things, which will first order on the first and then order on the second when the first is equal. So in your case that would be something like
select * from posts order by order_field_1, version desc
You cannot name a column order since it is a SQL reserved word, and since you did not give the real column name, I just named it order_field_1.
This is easily translated to rails:
Post.order(:order_field_1, version: :desc)
I would generally advise against using default_scope, since once set it is really hard to avoid (it is always prepended), but if you really need it and know the risks, it is really easy to apply as well:
class Post < ActiveRecord::Base
  default_scope { order(:order_field_1, version: :desc) }
end
This is all documented very well in the Rails guides.

Related

Index virtual attribute Thinking Sphinx

I'm wondering how one might index a virtual attribute on a model with Thinking Sphinx. Given a Project model and an instance method which returns a boolean derived from information on another model, say Users; the attribute is derived and is not a column on the projects table in the database.
For example, suppose we have a method is_user_eligible such that we can query Project.first.is_user_eligible, and get a true or false response. This works in the ORM already.
How can I index this virtual attribute with Thinking Sphinx? I'm able to index virtual attributes in my Django project, which is on Haystack backed by Elasticsearch; I facilitated this by putting a @property decorator on the model method. I figured I should be able to do this with Rails/Thinking Sphinx too, yet I get all sorts of bizarre SQL errors when trying to index. I've tried various constructions when setting up my index (e.g. has -vs- indexes) and all result in some sort of SQL error while indexing.
Is this possible with Thinking Sphinx? If so, how can I index a virtual attribute?
You've made it clear that the value is not available as a column on the projects table, but is it on an associated model instead? If so, then you could refer to it via the association:
has user.is_eligible, :as => :is_user_eligible
However, if it's not a column, but can be determined within the context of the SQL query, then you can use a SQL snippet as the attribute definition (I know my example is rather contrived, but should give you some idea):
has "(users.foo = 'bar' || users.baz = 'qux')",
:as => :is_user_eligible,
:type => :boolean
If you're referring to associations that aren't used elsewhere in the index definition, you can force the references, or provide a SQL join statement:
join users
# or through more than one association:
join users.addresses
# or via your own custom join:
join "INNER JOIN users ON users.project_id = projects.id"
But if you cannot determine this value via SQL at all, then the only way to do this with Thinking Sphinx is to use real-time indices instead of SQL-backed indices. What this means is that instead of referring to associations and columns in your index definitions, you refer to methods instead. So your attribute would become:
has is_user_eligible, :type => :boolean
The type must be specified: SQL-backed indices can guess attribute types from column types, but real-time indices don't have that reference point.
I realise the link to the real-time indices feature is a blog post I wrote over two years ago. However, the feature certainly works - I and others have been using it in production for quite some time (including with Flying Sphinx).
On the topic of has vs indexes: if you want to use the value as a filter or for sorting, then it must be an attribute, and thus you should use the has method. However, if it's textual data that you expect search queries to match on, then it should be a field, and thus use the indexes method.
Certainly I'd recommend switching to real-time indices anyway: it removes the need for deltas and you get up-to-date Sphinx records without needing to run 'ts:index' regularly (or at all - use ts:generate should your data end up in an out-of-date state). But make sure you switch all index definitions to real-time, instead of having some real-time and others SQL-backed.
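For reference, a minimal real-time index definition (Thinking Sphinx v3 syntax) might look something like the following; the Project model, its name column and the is_user_eligible method are taken from the question, and the file path is just the usual convention:
# app/indices/project_index.rb
ThinkingSphinx::Index.define :project, :with => :real_time do
  indexes name                              # field: matched by search queries
  has is_user_eligible, :type => :boolean   # attribute: usable for filtering and sorting
end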

What's the best way to create a model associated to a query instead of a table

I'm trying to create a reporting app with Rails 4.
As a reporting system, it has a lot of SQL queries where the result does not match any table schema. I mean, a select query with some joins, unions, etc., where the result is something like a row whose columns are the results of subqueries, sums, and so on.
Would it be possible to have a model with no table associated, but on which I can use find_by_sql to instantiate an array of that model with the results of my query?
Something like:
Use "select table1.field1, sum(if(...,table2.field,...) as field2, as field3 from...." as query, and return a array of a model "Result", where I can call a
array_of_result.first.field3?
Sorry if I'm not writing clearly enough.
EDIT: until now, sparky's answer (http://railscasts.com/episodes/193-tableless-model) was the closest one, because I want to use some of the ActiveRecord features, like specifying a connection in the class (or even in a super class).
For pure reporting, especially when the result column names span multiple models, one alternative is to just pass the query directly back and deal with the result set:
ActiveRecord::Base.connection.execute([raw SQL query])
You'll get back a result set, which is typically an enumerable set of row results, but check the documentation for your DB adapter to find out for sure what it's returning.
For example, if you're using PostgreSQL as your database with the pg gem, you'll get back an instance of PG::Result which you can then operate on in the following way:
> results = ActiveRecord::Base.connection.execute("SELECT COUNT(*) FROM customers")
=> #<PG::Result>
> results.count
=> 63 # the number of customers I have in this contrived example
> results.first
=> { "count": "63" }
> results[0]
=> { "count": "63" }
> results[0]["count"]
=> "63"
You'll need to cast your return values to something other than strings. ActiveRecord will typically do this for you in your models since it knows the column types, but by doing a raw query you'll probably just get back strings that you'll have to cast yourself. If you're just doing a query to display it on a page somewhere maybe the strings will be sufficient.
I'm sure you'll be doing more sophisticated reports, but you'll notice in my simple example that the key count wound up being created as the accessor to the result of the SELECT COUNT... query. If you specify column names, or alias them, the keys in the resulting hash set will match the column names or the aliases you've set.
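For example (a small sketch building on the contrived customers query above), an alias controls the key you read back, and the cast is still up to you:
row = ActiveRecord::Base.connection.execute(
  "SELECT COUNT(*) AS total_customers FROM customers"
).first
row["total_customers"]       # => "63" (a string)
row["total_customers"].to_i  # => 63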
You can certainly create a Reporting model.
You would want to start off by creating a tableless model. Essentially, this can be as simple as a file in your models directory with
class Reporting
end
in it, and a controller with some appropriate actions and views. However, have a look at
http://railscasts.com/episodes/193-tableless-model
http://railscasts.com/episodes/219-active-model
which cover tableless models and what you can do with active model with respect to validations etc.
In your case, you say that you have some complex joins etc. Sometimes it's easier in the short term to write these as raw SQL, but if you can use ActiveRecord you should. Apart from anything else, this will allow you to define custom methods in your model which you can chain, and it will make your Reporting controller much cleaner.
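As a rough sketch of the find_by_sql route the question asks about (every class, table and column name below is an assumption, not something from the question): an ActiveRecord-backed "container" model gets its attributes from whatever columns the query returns, and it can point at its own connection if your reporting data lives elsewhere:
class Result < ActiveRecord::Base
  self.table_name = 'projects'       # any existing table; only needed to keep ActiveRecord happy
  # establish_connection :reporting  # optionally use a separate reporting database
end

rows = Result.find_by_sql(
  "SELECT p.name AS field1, COUNT(u.id) AS field3
   FROM projects p
   LEFT JOIN users u ON u.project_id = p.id
   GROUP BY p.name"
)
rows.first.field3  # readers are generated from the query's column aliases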

Rails select in scope

In my model User, I have scope set up:
scope :count_likes, lambda {
  select("(SELECT count(*) from another_model) AS count")
}
If I want to get all attributes of my User + count_likes, I have to do:
Model.count_likes.select("users.*")
because calling select() replaces the default "*".
I use the count_likes scope in a lot of my application, and my issue is that I have to append select("users.*") everywhere.
I know about the default scope; however, I don't think doing select("users.*") in a default scope is a good idea.
Is there a DRY / better way of doing this?
Thanks
This isn't really another answer. I wanted to leave a comment about the joins, but comments cannot run long and I wanted to provide code examples.
What you need is to sometimes get all the fields plus the counts of a related table, and other times get the counts without the users.* fields (and maybe sometimes just the users.* fields without the counts). So you are going to have to tell the code which one you want. I think what you are looking for is an except type of thing, where by default you get the users.* fields and the counts, but when you only want the counts you specify turning off the select('users.*'). I don't think there is such a solution, except maybe using the default scope. I suggest having one scope for just the counts, and one scope for the users fields and the counts.
Here is what I would do:
class User < ActiveRecord::Base
  has_many :likes

  def self.with_count_likes
    joins(:likes)
      .select('users.*, count(likes.id) as count')
      .group('users.id')
  end

  def self.count_likes
    joins(:likes)
      .select('users.id, users.username, count(likes.id) as count')
      .group('users.id')
  end
...
Call with_count_likes (or chain it into a query) when you want all the users fields and the likes counts. Call count_likes when you want just the counts and a few identifying fields.
I'm assuming here that whenever you want the counts, you want some users fields to identify what/(who) the counts are for.
Note that some databases (like Oracle) may require grouping by all selected columns; that is the SQL standard, but some databases like MySQL let you group by just the primary key.
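For example (the condition here is hypothetical, it just shows the chaining):
User.with_count_likes.where('users.created_at > ?', 1.week.ago).order('count DESC')
User.count_likes  # just ids, usernames and the counts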
You may simply add users.* to the scope.
scope :count_likes, lambda {
select("(SELECT count(*) from another_model) AS count, users.*")
}
HTH
EDIT: I am not sure of exactly what you are trying to achieve, but you should consider using joins and get the data by joining tables appropriately.
EDIT: Usually I am not a big fan of making such changes, but as the situation suggests, sometimes we need to get our hands dirty. In this case, I would try to reduce the number of changes needed. Consider:
scope :count_likes, Proc.new { |all|
  s = select("(SELECT count(*) from another_model) AS count")
  s = s.select("users.*") unless all == false
  s
}
Now you will get users.* everywhere. For specific places where you just need the count, you may replace it like User.count_likes(false) and it will give you just the counts. Thus minimal changes.
There may be another possibility of appending multiple scopes together, one for counts, one for users.* and use them to achieve the above effect.
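A sketch of that last idea (the scope names are made up, the subquery is the one from the question):
scope :with_like_count, -> { select("(SELECT count(*) FROM another_model) AS count") }
scope :with_user_fields, -> { select("users.*") }

User.with_user_fields.with_like_count  # users.* plus the count
User.with_like_count                   # just the count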

Efficiency of querying a database with STI

I was curious how Rails queries a table that uses STI. For example, if my parent class is Book and I have two subclasses, ComicBook and Novel, and I do something like
Novel.all.each
since there's only one table, does the server sift through all of the comic books as well? Is appropriate indexing automatically added to prevent this? Thanks
Well, you can't do Novel.each, each isn't defined on model classes. You can however do Novel.all.each ... where ... is some block.
As for how the query works, just call to_sql on any ARel expression. Novel.all will return the collection of models itself, so you need to go a step further to ensure a valid ARel expression is returned, by calling scoped.
[1] pry(main)> Novel.scoped.to_sql
=> "SELECT \"books\".* FROM \"books\" WHERE \"books\".\"type\" IN ('Novel')"
Indexing the columns that are queried against frequently is a good thing to consider. Yes, without an index your RDBMS will have to look at all records in the table as part of the above condition check.
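For example, adding an index on the STI type column is a one-line migration (the migration class name is just illustrative):
class AddTypeIndexToBooks < ActiveRecord::Migration
  def change
    add_index :books, :type
  end
end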

Rails Hide Specific Record From All Select * Type Queries

I have a record in a table that serves as a placeholder of sorts and doesn't represent actual data. It's bad design, I know, but I have some very awkward requirements to deal with and I saw no other solution, so it's a bit of a hotfix.
Now let's say I have a series of SELECT *s throughout my application and I don't want to have to explicitly exclude that single record in each of them. Is there anything I can drop into my model to exclude it from all queries except the ones where it's explicitly requested? Or perhaps some logic I can put directly into my PG database?
It's the very first record in the table with an ID of 0.
Add a default scope
default_scope { where('id != 0') }
to your model...
Whenever you want to avoid that default scope in some query, you can use Model.unscoped... there...
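For instance (the model name is a placeholder), fetching the placeholder row explicitly despite the default scope:
YourModel.unscoped.find(0)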
One solution would be to define a default_scope that would exclude those records, see the doc
So when doing YourModel.all, if the default_scope on YourModel excludes the correct records, you'll get what you want.
But as you said, it's bad design!
Create a view excluding it:
create view v as
select *
from t
where id != 0
Now select from the view:
select *
from v
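If you go that route, the Rails side can read from the view instead of the table; a sketch, reusing the v view from above (the model name is a placeholder):
class Thing < ActiveRecord::Base
  self.table_name = 'v'  # all reads go through the view, so the id 0 row never shows up
end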
