ActiveRecord query array intersection? - ruby-on-rails

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select{|x| (x.tags & Article::EXPERT_TAGS).any?}.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!

Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread: "How to check if an array field is a part of another array in MongoDB?"
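A minimal sketch of what such a database-side filter could look like, assuming tags is stored as a plain array of strings. MongoDB's `$in` operator, applied to an array field, matches a document when at least one element of the field appears in the given list, which is exactly the "non-empty intersection" test. The `matches?` helper is hypothetical and only imitates that rule in plain Ruby so the selector can be tried without a database; the sample documents are invented.

```ruby
# Stand-in for Article::EXPERT_TAGS (invented values).
expert_tags = ["Guest Writer", "Press Release"]

# Selector in MongoDB query syntax: status equals 'Finished' and
# tags shares at least one element with expert_tags.
selector = { "status" => "Finished", "tags" => { "$in" => expert_tags } }

# Hypothetical helper: imitates how the selector is evaluated for one document.
def matches?(doc, selector)
  selector.all? do |field, condition|
    if condition.is_a?(Hash) && condition.key?("$in")
      # Array(...) lets the same rule cover scalar and array fields.
      (Array(doc[field]) & condition["$in"]).any?
    else
      doc[field] == condition
    end
  end
end

articles = [
  { "status" => "Finished", "tags" => ["Guest Writer", "News Article"] },
  { "status" => "Finished", "tags" => ["News Article"] },
  { "status" => "Draft",    "tags" => ["Press Release"] }
]

expert_count = articles.count { |doc| matches?(doc, selector) }
puts expert_count
```

With a real wrapper the selector would be handed to the database instead, so only the count travels back to Ruby. How exactly it is passed (for example a `where` call followed by `count` in Mongoid) depends on the wrapper and its version, so treat that part as an assumption and check its documentation.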

Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope for improvement, since all the data has to be loaded into Ruby for processing.
However, there is one problem with your database query:
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns, whereas you only need the tags column for your processing. So you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
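With only the tag lists in memory, the count can then be taken in Ruby. A minimal sketch, with made-up data standing in for the plucked result and for Article::EXPERT_TAGS. One thing to watch: `tags & expert_tags` returns an array, and even an empty array is truthy in Ruby, so the intersection has to be tested explicitly rather than used directly as the block's result.

```ruby
require "set"

# Stand-ins for Article::EXPERT_TAGS and the result of pluck(:tags).
expert_tags = Set["Guest Writer", "Press Release"]
tag_lists = [
  ["Guest Writer", "News Article", "Press Release"],
  ["News Article"],
  []
]

# Count the tag lists that share at least one tag with expert_tags.
expert_count = tag_lists.count { |tags| tags.any? { |tag| expert_tags.include?(tag) } }

# Pitfall: an empty intersection is still truthy, so this counts every list.
naive_count = tag_lists.count { |tags| tags & expert_tags.to_a }

puts expert_count
puts naive_count
```

Using a Set for the expert tags keeps each membership test cheap, which matters more as the number of plucked rows grows.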

I answered a question about general intersection-like queries in ActiveRecord; the answer is extracted below:
The following is a general approach I use for constructing intersection-like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person
  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people
  def self.with_cities(cities)
    where(city_id: cities)
  end
  # intersection like query
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.
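The effect of AND'ing the `id IN (...)` conditions can be pictured with plain arrays. This is only an in-memory illustration with invented ids, not what ActiveRecord executes: each array stands for the ids one subquery would return, and reducing with `&` keeps only the ids present in all of them.

```ruby
# Invented data: ids of people returned by each per-type subquery.
ids_by_service_type = {
  1 => [10, 11, 12, 15],
  2 => [11, 12, 13],
  3 => [12, 11, 20]
}

# The unrestricted set, playing the role of the initial scope.
all_people_ids = [10, 11, 12, 13, 15, 20]

# Narrow the set once per subquery, like chaining where(id: subquery).
matching_ids = ids_by_service_type.values.reduce(all_people_ids) do |ids, subquery_ids|
  ids & subquery_ids
end

p matching_ids
```

Only ids 11 and 12 appear in every list, so only those survive the reduction.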

Related

Datamapper: Sorting results through association

I'm working on a Rails 3.2 app that uses Datamapper as its ORM. I'm looking for a way to sort a result set by an attribute of the associated model. Specifically I have the following models:
class Vehicle
  include DataMapper::Resource
  belongs_to :user
end
class User
  include DataMapper::Resource
  has n, :vehicles
end
Now I want to be able to query the vehicles and sort them by the name of the driver. I tried the following but neither seems to work with Datamapper:
> Vehicle.all( :order => 'users.name' )
ArgumentError: +options[:order]+ entry "users.name" does not map to a property in Vehicle
> Vehicle.all( :order => { :users => 'name' } )
ArgumentError: +options[:order]+ entry [:users, "name"] of an unsupported object Array
Right now I'm using Ruby to sort the result set post-query, but that obviously doesn't help performance, and it also stops me from chaining further scopes.
I spent some more time digging around and finally turned up an old blog which has a solution to this problem. It involves manually building the ordering query in DataMapper.
From: http://rhnh.net/2010/12/01/ordering-by-a-field-in-a-join-model-with-datamapper
def self.ordered_by_vehicle_name direction = :asc
  order = DataMapper::Query::Direction.new(vehicle.name, direction)
  query = all.query
  query.instance_variable_set("@order", [order])
  query.instance_variable_set("@links", [relationships['vehicle'].inverse])
  all(query)
end
This will let you order by association and still chain on other scopes, e.g.:
User.ordered_by_vehicle_name(:desc).all( :name => 'foo' )
It's a bit hacky but it does what I wanted it to do at least ;)
Note: I'm not familiar with DataMapper and my answer might not be within the standards and recommendations of using DataMapper, but it should hopefully give you the result you're looking for.
I've been looking through various Google searches and the DataMapper documentation and I haven't found a way to "order by association attribute". The only solution I have thought of is "raw" SQL.
The query would look like this.
SELECT vehicles.* FROM vehicles
LEFT JOIN users ON vehicles.user_id = users.id
ORDER BY users.name
Unfortunately, from my understanding, when you directly query the database you won't get the Vehicle object, but the data from the database.
From the documentation: http://datamapper.org/docs/find.html. It's near the bottom titled "Talking directly to your data-store"
Note that this will not return Zoo objects, rather the raw data straight from the database
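For comparison, the in-Ruby fallback mentioned in the question can be sketched as follows. The Struct classes are hypothetical stand-ins for the DataMapper models, and the data is invented. Vehicles without a user are placed last here, whereas the position of NULLs under the SQL ORDER BY depends on the database.

```ruby
# Hypothetical stand-ins for the User and Vehicle models.
user_class = Struct.new(:id, :name)
vehicle_class = Struct.new(:id, :user)

alice = user_class.new(1, "Alice")
bob = user_class.new(2, "Bob")

vehicles = [
  vehicle_class.new(1, bob),
  vehicle_class.new(2, nil),
  vehicle_class.new(3, alice)
]

# Sort by the associated user's name; the leading 0/1 keeps
# vehicles without a user at the end of the list.
sorted = vehicles.sort_by { |v| v.user ? [0, v.user.name] : [1, ""] }

p sorted.map(&:id)
```

This keeps real objects, but it sorts after everything has been loaded, so it cannot be combined with further database-side scopes.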
Vehicle.joins(:user).order('users.name').all
or in Rails 2.3,
Vehicle.all(:joins => "inner join users on vehicles.user_id = users.id", :order => 'users.name')

Fetch COUNT(column) as an integer in a query with group by in Rails 3

I have 2 models Category and Article related like this:
class Category < ActiveRecord::Base
  has_many :articles
end
class Article < ActiveRecord::Base
  belongs_to :category
  def self.count_articles_per_category
    select('category_id, COUNT(*) AS total').group(:category_id)
  end
end
I'm accessing count_articles_per_category like this
Article.count_articles_per_category
which will return articles that have 2 columns: category_id and total.
My problem is that the total column is a string. So the question is: is there a way to fetch that column as an integer?
PS: I tried to do a cast in the database for COUNT(*) and that doesn't help.
I'd like to avoid doing something like this:
articles = Article.count_articles_per_category
articles.map do |article|
  article.total = article.total.to_i
  article
end
No, there is no support in ActiveRecord for automatically casting the data types of extra columns (values are always transferred from the database as strings).
The way ActiveRecord works when retrieving items is:
for each attribute in the ActiveRecord model, it checks the column type and casts the data to that type;
for extra columns, it does not know what data type to cast to.
Extra columns include columns from other tables, and expressions.
You can use a different query, like:
Article.group(:category_id).count
Article.count(:group => :category_id)
These return a hash of :category_id => count. So you might get something like {6=>2, 4=>2, 5=>1, 2=>1, 9=>1, 1=>1, 3=>1}.
Using the count method works because it implicitly lets ActiveRecord know that it is an integer type.
Article.group(:category_id).count might give you something you can use. This will return a hash where each key represents the category_id and each value represents the corresponding count as an integer.
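To make the shape of that result concrete, here is an in-memory equivalent with invented data. It is only an illustration of the grouping and counting, not what ActiveRecord runs: the array holds one category id per article, and the hash built from it has the same form as the one described above, with integer values.

```ruby
# Invented category ids, one entry per article.
article_category_ids = [6, 4, 6, 5, 4]

# Group and count in one pass; Hash.new(0) supplies the starting count.
counts = article_category_ids.each_with_object(Hash.new(0)) do |category_id, result|
  result[category_id] += 1
end

p counts
p counts[6] + 1 # values are Integers, so arithmetic works directly
```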

Join and select multiple column

I'm trying to select multiple columns after doing a join. I couldn't find a way to do so using ActiveRecord without writing SQL between quotation marks in the query (something I'd like to avoid).
Exploring Arel, I've found I could select multiple columns using "project"; however, I'm not quite sure whether I should use Arel directly or whether there is a way to achieve the same with AR.
This is the code in Arel:
l = Location.arel_table
r = Region.arel_table
postcode_name_with_region_code = l.where(l[:id].eq(location_id)).join(r).on(l[:region_id].eq(r[:id])).project(l[:postcode_name], r[:code])
After running this query I'd like to return something along the lines of:
(Pseudo-code)
"#{:postcode_name}, #{:code}"
Is there a way to achieve the same query using AR?
If I stick to Arel, how can I get the values out of the SelectManager object the above query returns?
Thanks in advance,
Using AR, without writing any SQL and assuming your models and associations are:
models/region.rb
class Region < ActiveRecord::Base
  has_many :locations
end
models/location.rb
class Location < ActiveRecord::Base
  belongs_to :region
end
You can certainly do:
postcode_name_with_region_code = Location.where(:id=>location_id).includes(:region).collect{|l| "#{l.postcode_name}, #{l.region.code}"}
This will do the query and then use Ruby to format your result (note that it will return an array since I'm assuming there could be multiple records returned). If you only want one item of the array, you can use the array.first method to reference it.
You could also eager load the association and build your string from the result:
my_loc = Location.find(location_id, :include=>:region)
postcode_name_with_region_code = "#{my_loc.postcode_name}, #{my_loc.region.code}"
predicate = Location.where(:id=>location_id).includes(:region)
predicate.ast.cores.first.projections.clear
predicate.project(Location.arel_table[:postcode_name], Region.arel_table[:code])

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
  has_many :posts
  has_many :topics, :through => :posts
end
class Post
  belongs_to :user
  belongs_to :topic
end
class Topic
  has_many :posts
end
I can read all the Topic ids through user.topic_ids, but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of an ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
  # only returns the ids of the topics for which this user has a post
  topic_ids = user.topic_ids
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
  # only returns the topics where user has a post within the subset of interest
  topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
  # needlessly create Post objects only to unwrap them later
  topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
  topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
Is there a better way?
Is it possible to have something like select_values on a association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
if user.topic_ids.where(:id=>topics.map(&:id)) was possible it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
this is exactly the same query that is generated doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
while user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
Which of the two is more efficient depends on the data in the actual tables, the indices defined, and which database is used.
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
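Whichever query fetches the ids, the marking step itself can use a Set, so repeated ids collapse and each membership test stays cheap. A sketch with hashes standing in for the Topic records and an invented id list in place of the query result; this version of `mark_topics_with_post` takes the ids directly, which is a simplification of the method in the question.

```ruby
require "set"

# Invented result of the posts query: topic ids, possibly repeated.
posted_topic_ids = [3, 7, 3, 9, 7]

# Hashes stand in for the Topic records of the search result.
topics = [{ id: 3 }, { id: 4 }, { id: 9 }]

def mark_topics_with_post(posted_topic_ids, topics)
  lookup = posted_topic_ids.to_set # duplicates collapse here
  topics.each { |topic| topic[:has_post] = lookup.include?(topic[:id]) }
end

marked = mark_topics_with_post(posted_topic_ids, topics)
p marked
```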
Assuming your Topic model has a column named id, you can do something like this:
Topic.select("topics.id").joins(:posts).where("posts.user_id = ?", user_id)
This runs only one query against your database and gives you the ids of all topics that have posts for the given user_id.

Rails active record query

How would I do a query like this?
I have
@model = Model.near([latitude, longitude], 6.8)
Now I want to filter another model, which is associated with the one above.
(Please help me find the right way to do this.)
model2 = Model2.where("model_id == :one_of_the_models_filtered_above", {:one_of_the_models_filtered_above => only_from_the_models_filtered_above})
the model.rb would be like this
has_many :model2s
the model2.rb
belongs_to :model
Right now it is like this (after @model = Model.near([latitude, longitude], 6.8)):
model2s = []
@model.each do |model|
  model.model2s.each do |model2|
    model2s.push(model2)
  end
end
I want to accomplish the same thing, but with an ActiveRecord query instead.
I think I found something. Why does this fail?
Model2.where("model.distance_from([:latitude,:longitude]) < :dist", {:latitude => latitude, :longitude => longitude, :dist => 6.8})
This query throws this error:
SQLite3::SQLException: near "(": syntax error: SELECT "tags".* FROM "tags" WHERE (model.distance_from([43.45101666666667,-80.49773333333333]) < 6.8)
Why?
Use includes. It will eager-load the associated models (only two SQL queries instead of N+1).
@models = Model.near( [latitude, longitude], 6.8 ).includes( :model2s )
So when you call @models.first.model2s, the associated model2s will already be loaded (see the RoR guides for more info).
If you want an array of all model2s belonging to your collection of models, you can do:
@models.collect( &:model2s )
# add .flatten at the end of the chain if you want a one level deep array
# add .uniq at the end of the chain if you don't want duplicates
collect (also called map) gathers in an array the result of the block called on each of the caller's elements (once flattened, this gives the same result as your code; see Enumerable's doc for more info). The & before the symbol converts it into a Proc that calls that method on each element of the collection, so this is the same as writing
@models.collect {|model| model.model2s }
One more thing: @mu is right, it seems SQLite does not know about your distance_from stored procedure. As I suspect this is a GIS-related question, you may want to ask about this particular issue on gis.stackexchange.com.
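The collect / flatten / uniq chain can be tried without a database. In this sketch the Struct is a hypothetical stand-in for the model and plain strings stand in for the associated records; `flat_map` is a standard Enumerable shortcut for collect followed by a one-level flatten.

```ruby
# Hypothetical stand-in for a model with a model2s association.
model_class = Struct.new(:name, :model2s)

models = [
  model_class.new("a", ["x", "y"]),
  model_class.new("b", ["y", "z"])
]

nested = models.collect(&:model2s)               # one inner array per model
block_form = models.collect { |m| m.model2s }    # same result, explicit block
flat = models.collect(&:model2s).flatten.uniq    # one level deep, no duplicates
short = models.flat_map(&:model2s).uniq          # same as the line above

p nested
p flat
```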
