Calculate formula through unique rails associations - ruby-on-rails

Say that I have these tables/associations:
Product has_many :keywords, :through => :product_keywords
Keyword has_many :products, :through => :product_keywords
And the ProductKeyword table is:
=> ProductKeyword(id: integer, keyword_position: integer, product_id: integer, keyword_id: integer)
Let's picture the prk variable as a list of products. Each product can have many keywords, and Keyword is defined as:
=> Keyword(id: integer, phrase: text)
Multiple keywords could contain the same phrases in terms of the text of the Keyword, but they are different database entries.
What I want to do is build a table with all the keyword phrases of all the products and, for each phrase, count how many of those keywords had a keyword_position < 10.
So if 5 products have the same keyword phrase (say "beach"), and thus 5 different Keyword entries, and their respective ProductKeyword keyword_positions are [1, 6, 11, 13, 3], I want that phrase to return a single entry with the number of positions below 10, which in this case would be 3 (from 1, 6 and 3).
I have tried a few different things, but I end up confusing myself. What is the proper way to do this?

I believe the below code should produce the data you're looking for:
Keyword
.joins(:product_keywords)
.where(ProductKeyword.arel_table[:keyword_position].lt(10))
.group(Keyword.primary_key)
.select(
Keyword.arel_table[Arel.star],
Arel.star.count.as('product_count')
)
This will run the following SQL query (syntax may vary depending on your database):
SELECT "keywords".*, COUNT(*) AS product_count
FROM "keywords"
INNER JOIN "product_keywords" ON "product_keywords"."keyword_id" = "keywords"."id"
WHERE "product_keywords"."keyword_position" < 10
GROUP BY "keywords"."id"
That will return a list of Keyword records; you can call .product_count on each of these records to determine how many associated ProductKeyword records it has with a keyword_position value of less than 10.
You could then create a table to hold the data produced by the above code.
If you wanted to determine the count for a specific Keyword record without running that whole query, the following code should produce that count:
my_keyword.product_keywords.where(ProductKeyword.arel_table[:keyword_position].lt(10)).count
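One thing to watch: the query above groups by Keyword.primary_key, so five "beach" keywords stay five separate rows with a count of 1 each. If the goal is one entry per phrase, grouping by the phrase column instead (for example `Keyword.joins(:product_keywords).where(ProductKeyword.arel_table[:keyword_position].lt(10)).group(:phrase).count`) should return a Hash of phrase => count. The plain-Ruby sketch below, with hypothetical rows built from the question's example, shows the difference between the two groupings without a database.

```ruby
# Each row stands for one ProductKeyword joined to its Keyword (hypothetical data).
KeywordRow = Struct.new(:keyword_id, :phrase, :keyword_position)

rows = [
  KeywordRow.new(1, "beach", 1),
  KeywordRow.new(2, "beach", 6),
  KeywordRow.new(3, "beach", 11),
  KeywordRow.new(4, "beach", 13),
  KeywordRow.new(5, "beach", 3),
  KeywordRow.new(6, "sand", 12)
]

# Like WHERE keyword_position < 10 followed by GROUP BY keywords.id:
# every "beach" keyword is counted on its own.
per_keyword = rows.select { |r| r.keyword_position < 10 }
                  .group_by(&:keyword_id)
                  .transform_values(&:size)

# Grouping by phrase instead: the "beach" keywords collapse into one entry.
per_phrase = rows.select { |r| r.keyword_position < 10 }
                 .group_by(&:phrase)
                 .transform_values(&:size)

p per_keyword
p per_phrase
```

As with the SQL WHERE clause, a phrase whose positions are all 10 or above ("sand" here) is filtered out before grouping and does not appear at all.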

Related

Rails include returns filtered relations instead of all relations

I'm using includes instead of joins because it runs faster, but the statement is returning an association that doesn't include all of the data I'm looking for. It returns all of the left data, but only the right data that matches the query. Hopefully the examples below help clarify the problem and what I'm trying to achieve.
The join does seem to do what I'm after from a data and rails association perspective but executes a ton of queries and is much slower.
Setup and examples
class Species < ActiveRecord::Base
has_many :species_types, foreign_key: 'species_id', primary_key: "id"
end
class SpeciesTypes < ActiveRecord::Base
belongs_to :species, :foreign_key => "id", :primary_key => "species_id"
end
create_table "species", force: :cascade do |t|
t.bigint "id"
t.string "identifier"
end
create_table "species_types", force: :cascade do |t|
t.bigint "species_id"
t.bigint "type_id"
t.string "name"
end
Table data to help visualize the queries below
Species
id  identifier
1   furry
2   sleek
3   hairy
4   shiny
5   reflective
6   rough
7   rubbery

SpeciesTypes
species_id  type_id  identifier
1           1        hairy
1           2        metalic
2           3        skin
3           1        hairy
4           2        metalic
4           3        skin
5           3        skin
5           3        skin
6           2        metalic
7           2        metalic
I know the SpeciesTypes.type_id, and I'm looking to get all Species that have that type, including all of their SpeciesTypes.
Using includes
`species = Species.includes(:species_types).where(:species_types => {:type_id => 1})`
This does return all Species with a matching SpeciesType. However, instead of returning each Species with all of its SpeciesTypes, it returns each Species with only the SpeciesTypes that match the :type_id parameter. So, in this case you cannot reference all SpeciesTypes from the Species object (species[0].species_types). It does not return what I expected, although it makes sense why it limits the types to the matched type_id.
Response from above query for Species
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>]
I'm looking for this:
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>,
#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 2,
identifier: metalic>]
Using joins
This returns what I'm after (using joins instead of includes); however, the query is much, much slower. I think I'm missing something obvious (or not obvious but fundamental).
species = Species.joins(:species_types).where(:species_types => {:type_id => 3})
The above returns the values that I expect but is a much slower query.
Can the includes query be updated to return all Species that have the known :type_id, together with all of their types?
While it's pretty natural to think that Species.includes(:species_types).where(:species_types => {:type_id => 3}) would load all the species and just eager load the species_types that match the where clause, that's not how ActiveRecord and SQL actually work.
This generates a query along the lines of:
SELECT species.id AS t0_r0, ..., species_types.id AS t1_r0, ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id
WHERE species_types.type_id = ?
When you use includes and reference the other table it delegates to .eager_load which loads both tables in a single database query.
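The where clause here applies to the entire query and not just the joined association. Remember that this query returns one row per joined species_types row (with the species columns duplicated on each), and the WHERE clause discards every row whose type_id doesn't match, so those types never reach the association.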
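To make the row filtering concrete, here is a minimal in-memory sketch in plain Ruby (no database) using the species_types sample rows from the question. It only imitates the WHERE step on the joined rows and, as a simplification, ignores species that have no types at all.

```ruby
# species_types rows from the question: [species_id, type_id, identifier].
SPECIES_TYPES = [
  [1, 1, "hairy"], [1, 2, "metalic"], [2, 3, "skin"], [3, 1, "hairy"],
  [4, 2, "metalic"], [4, 3, "skin"], [5, 3, "skin"], [5, 3, "skin"],
  [6, 2, "metalic"], [7, 2, "metalic"]
].freeze

# WHERE species_types.type_id = 1 is applied to the joined rows themselves...
surviving_rows = SPECIES_TYPES.select { |_species_id, type_id, _identifier| type_id == 1 }

# ...and each Species' association is then built only from the rows that survived.
eager_loaded = surviving_rows.group_by(&:first)
                             .transform_values { |group| group.map(&:last) }

p eager_loaded
```

Species 1 ends up with only "hairy", even though it also has a "metalic" row, which matches the behaviour described in the question.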
If you wanted to load just the records that match the condition you would need to put the restriction into the JOIN clause:
SELECT species.id AS t0_r0, ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id AND species_types.type_id = ?
Unfortunately ActiveRecord associations do not provide a way to do that.
The easiest solution to the problem is most likely to just query from the other end:
Type.find(1)
  .species_types.where(species: species)
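Another option (not from the answer above) is to filter the species first and then preload their types without any condition, for example `Species.where(id: SpeciesTypes.where(type_id: 1).select(:species_id)).includes(:species_types)`. Because the condition is then an IN subquery on the species table, includes should fall back to a separate preload query that loads every type. The plain-Ruby sketch below only mimics those two steps on the question's sample rows; it is not ActiveRecord code.

```ruby
# species_types rows from the question: [species_id, type_id, identifier].
SPECIES_TYPES = [
  [1, 1, "hairy"], [1, 2, "metalic"], [2, 3, "skin"], [3, 1, "hairy"],
  [4, 2, "metalic"], [4, 3, "skin"], [5, 3, "skin"], [5, 3, "skin"],
  [6, 2, "metalic"], [7, 2, "metalic"]
].freeze

wanted_type_id = 1

# Step 1: find the species that have the wanted type (the subquery).
species_ids = SPECIES_TYPES.select { |_species_id, type_id, _identifier| type_id == wanted_type_id }
                           .map(&:first)
                           .uniq

# Step 2: load every type for those species, with no filter on type_id (the preload).
all_types = SPECIES_TYPES.select { |species_id, _type_id, _identifier| species_ids.include?(species_id) }
                         .group_by(&:first)
                         .transform_values { |group| group.map(&:last) }

p all_types
```

Here species 1 keeps both "hairy" and "metalic", which is the shape the question asks for.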
.joins is not the answer
You can't just replace includes with joins, as they do very different things.
joins just adds an INNER JOIN to the query but doesn't actually select any columns from the joined table. It's used to filter on the joined table or to select aggregates, not to prevent N+1 queries.
In this case it's most likely not the first query itself that's slower; rather, you're creating an N+1 query when you iterate through species_types, as the association is not eager loaded or preloaded.
includes loads the associated records either in one query (a LEFT OUTER JOIN, via eager_load) or in two queries (via preload), depending on how it's used.

Update models based on association record value

I have thousands of price comparators, each of which has many products. The comparator has an attribute :minimum_price, which is the minimum price of its products. What would be the fastest way to update :minimum_price on all comparators?
Comparator.rb
has_many :products
Product.rb
belongs_to :comparator
Let's imagine the following:
comparator_1 has 3 products with prices 3, 5 and 7
comparator_2 has 2 products with prices 2 and 4
How could I update :minimum_price on all comparators in one query?
Updating all rows in one query requires SQL that ActiveRecord's query interface doesn't build for you: an UPDATE joined to an aggregated derived table, or a CTE, which ActiveRecord does not support by default. There are libraries that add CTE support to Rails, or you can run the query directly like this:
ActiveRecord::Base.connection.execute("
update comparators set minimum_price = min_vals.min_price
from (
select comparators.id as comp_id, min(products.price) as min_price
from comparators inner join products on comparators.id = products.comparator_id
group by comparators.id
) as min_vals
where comparators.id = min_vals.comp_id
")
NOTE: This is a PostgreSQL query (UPDATE ... FROM), so the syntax may vary if you use a different database.
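The derived table in that query is just "minimum price per comparator". As a plain-Ruby sketch of the same aggregation, using the example numbers from the question as hypothetical rows: in ActiveRecord, `Product.group(:comparator_id).minimum(:price)` should return the same kind of Hash in one SELECT, although writing the values back that way would still take one UPDATE per comparator rather than the single statement above.

```ruby
# Hypothetical product rows matching the example in the question.
ProductRow = Struct.new(:comparator_id, :price)

products = [
  ProductRow.new(1, 3), ProductRow.new(1, 5), ProductRow.new(1, 7),
  ProductRow.new(2, 2), ProductRow.new(2, 4)
]

# Same shape as the subquery: GROUP BY comparator, MIN(price).
min_prices = products.group_by(&:comparator_id)
                     .transform_values { |group| group.map(&:price).min }

p min_prices
```

Note that, because the SQL uses an inner join, a comparator with no products gets no row in the derived table and its minimum_price is left unchanged.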
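Try this, but I don't know how you store your minimum values.
Comparator.update_all([
  # correlated subquery: each comparator gets the MIN(price) of its own products
  'minimum_price = (SELECT MIN(products.price) FROM products WHERE products.comparator_id = comparators.id), updated_at = ?',
  Time.now
])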

Activerecord: Getting a type error when using alias for a table

I have a table with items, and a table with item characteristics.
Corresponding models:
class Characteristic < ActiveRecord::Base
belongs_to :item
end
class Item < ActiveRecord::Base
has_many :characteristics
end
Each characteristic has its name (e.g. 'price'), a value, and a reference to an item.
I need to select items by multiple characteristics, say, price = 100 and weight = 50. For this, I need to join the table twice, like this:
Item.joins('INNER JOIN characteristics c1 ON c1.item_id = items.id').
joins('INNER JOIN characteristics c2 ON c2.item_id = items.id').
where('c1' => {name: 'price', value: '100'},
'c2' => {name: 'weight', value: '50'})
and this is where the problem is. The characteristic's value is stored in the database as a string, and when I try to compare it to an integer or a range, I get a type conversion error. But when I don't use an alias for the table, there is no error.
So, the code below works:
Item.joins('INNER JOIN "characteristics" ON "characteristics"."item_id" = "items"."id"').
where(characteristics: {characteristic_type_id: 223, value: 380})
But this one does not:
Item.joins('INNER JOIN "characteristics" c1 ON c1."item_id" = "items"."id"').
where(c1: {characteristic_type_id: 223, value: 380})
So how can I select items with, say, price in 50..100 and color 'brown'?
UPD:
Neither of the above snippets actually works. The first one does not produce an SQL error, but it does the wrong thing: it just quotes the value so it becomes a string, i.e.
where(c1: {value: 10..15})
becomes
WHERE ("c1"."value" BETWEEN '10' AND '15')
which is, obviously, not what I really want
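Why a quoted BETWEEN is wrong: text comparison works character by character, so digit strings of different lengths sort unexpectedly. The sketch below is plain Ruby and only approximates how a text column compares digit strings (database collations can differ in details); the helper names are hypothetical.

```ruby
# Text comparison, roughly what BETWEEN does on a string column.
def between_as_strings?(value, low, high)
  value >= low && value <= high
end

# Numeric comparison, what a decimal column such as value_f gives you.
def between_as_numbers?(value, low, high)
  value.to_f.between?(low.to_f, high.to_f)
end

p between_as_strings?("75", "50", "100") # "75" sorts after "100" because "7" > "1"
p between_as_numbers?("75", "50", "100")
p between_as_strings?("12", "10", "15")  # same digit count, so text order happens to agree
```

This is why the 10..15 example can look correct while a range like 50..100 silently excludes values.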
So I decided to add one more field to characteristics, value_f:decimal{8,2}, to hold the numeric value of the characteristic. I also added
after_validation do
self.value_f = value.to_f
end
to the Characteristic model. So, when I want to compare a value to a number, I just use value_f instead.
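With value_f in place, the double join can compare numbers, e.g. with string conditions such as `where('c1.name = ? AND c1.value_f BETWEEN ? AND ?', 'price', 50, 100)` on the c1 alias and `where('c2.name = ? AND c2.value = ?', 'color', 'brown')` on c2. The plain-Ruby sketch below, with hypothetical rows, shows the logic those two joins express: an item qualifies only if one of its characteristics satisfies each criterion.

```ruby
# Hypothetical characteristic rows; value is stored as a string, as in the question.
Characteristic = Struct.new(:item_id, :name, :value) do
  # Mirrors the after_validation hook: a numeric copy of the string value.
  def value_f
    value.to_f
  end
end

characteristics = [
  Characteristic.new(1, "price", "75"), Characteristic.new(1, "color", "brown"),
  Characteristic.new(2, "price", "120"), Characteristic.new(2, "color", "brown"),
  Characteristic.new(3, "price", "60"), Characteristic.new(3, "color", "red")
]

price_ok = ->(c) { c.name == "price" && (50..100).cover?(c.value_f) }
color_ok = ->(c) { c.name == "color" && c.value == "brown" }

# Like joining characteristics twice (c1, c2): each criterion needs its own matching row.
matching_item_ids = characteristics.group_by(&:item_id)
                                   .select { |_item_id, rows| rows.any?(&price_ok) && rows.any?(&color_ok) }
                                   .keys

p matching_item_ids
```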
Try this:
Item.joins(:characteristics).where(characteristics: [{name: 'price', value: '100'},{name: 'weight', value: '50'}] )
Found the solution:
PostgreSQL: CREATE CAST from character varying to integer
CREATE CAST (varchar AS integer) WITH INOUT AS IMPLICIT;
Still have to figure out how to execute this on Heroku, but that's a different question

Rails query a has_many :through conditionally with multiple ids

I'm trying to build a filtering system for a website that has locations and features through a LocationFeature model. Basically what it should do is give me all the locations based on a combination of feature ids.
So for example if I call the method:
Location.find_by_features(1,3,4)
It should only return the locations that have all of the selected features. So if a location has the feature_ids [1, 3, 5] it should not get returned, but if it had [1, 3, 4, 5] it should. However, currently it is giving me Locations that have either of them. So in this example it returns both, because some of the feature_ids are present in each of them.
Here are my models:
class Location < ActiveRecord::Base
has_many :location_features, dependent: :destroy
has_many :features, through: :location_features
def self.find_by_features(*ids)
includes(:features).where(features: {id: ids})
end
end
class LocationFeature < ActiveRecord::Base
belongs_to :location
belongs_to :feature
end
class Feature < ActiveRecord::Base
has_many :location_features, dependent: :destroy
has_many :locations, through: :location_features
end
Obviously this code isn't working the way I want it to and I just can't get my head around it. I've also tried things such as:
Location.includes(:features).where('features.id = 5 AND features.id = 9').references(:features)
but it just returns nothing. Using OR instead of AND gives me locations with either feature again. I also tried:
Location.includes(:features).where(features: {id: 9}, features: {id: 1})
but this just gives me all the locations with the feature_id of 1.
What would be the best way to query for a location matching all the requested features?
When you do an include with a condition on features, the database builds a joined result set with a row for every combination of location and feature, joined on the foreign keys. (Here there's also a join table, location_features, in between, which complicates things.)
There won't be any rows in this table which satisfy the condition features.id = 9 AND features.id = 1. Each row will only have a single features.id value.
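A small plain-Ruby sketch of that point, using hypothetical location_features rows built from the question's example: a row-level AND on feature ids can never match, whereas a per-location check over all of a location's rows can. In SQL, that per-location check is commonly written as GROUP BY location_id HAVING COUNT(DISTINCT feature_id) equal to the number of requested ids, a technique not used in the answers here.

```ruby
# Hypothetical location_features rows: [location_id, feature_id].
# Location 1 has features [1, 3, 5]; location 2 has [1, 3, 4, 5].
LOCATION_FEATURES = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3], [2, 4], [2, 5]].freeze

wanted_feature_ids = [1, 3, 4]

# Row by row, "feature_id = 1 AND feature_id = 3" is never true: each row holds one id.
row_level = LOCATION_FEATURES.select { |_location_id, feature_id| feature_id == 1 && feature_id == 3 }

# Per location, check that every wanted feature id appears somewhere in its rows.
per_location = LOCATION_FEATURES.group_by(&:first)
                                .select { |_location_id, rows| (wanted_feature_ids - rows.map(&:last)).empty? }
                                .keys

p row_level
p per_location
```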
What I would do is forget about the features table: you only need to look in the join table, location_features, to test for the presence of specific feature_id values. We need a query that compares feature_id and location_id in this table.
One way is to get the features, then get a collection of arrays of associated location_ids (which just queries the join table), then see which location ids are in all of the arrays (I've renamed your method to be more descriptive):
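# in Location
def self.having_all_feature_ids(*ids)
  # one array of location ids per feature, intersected down to the ids shared by all
  location_ids = Feature.where(id: ids).map(&:location_ids).inject { |a, b| a & b }
  # inject returns nil when no features were found, so fall back to an empty list
  where(id: location_ids || [])
end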
Note1: the asterisk in *ids in the params means that it will convert a list of arguments (including a single argument, which is like a "list of one") into a single array.
Note2: inject is a handy device. It says "do this code between the first and second elements in the array, then between the result of this and the third element, then the result of this and the fourth element, etc., till you get to the end". In this case the code I'm running between the two elements in each pair (a and b) is "&", which, when dealing with arrays, is the set intersection operator: it returns only elements which are in both arrays. By the time you've gone through the list of arrays doing this, only elements which are in ALL arrays will have survived. These are the ids of locations which are associated with ALL of the given features.
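A quick illustration of the inject/& step in plain Ruby, using the question's example (location 1 has features [1, 3, 5], location 2 has [1, 3, 4, 5]) and asking for features 1, 3 and 4. The arrays are written out by hand here rather than loaded from Feature records.

```ruby
# location_ids for each requested feature (hypothetical, from the question's example).
location_ids_per_feature = [
  [1, 2], # feature 1
  [1, 2], # feature 3
  [2]     # feature 4
]

common = location_ids_per_feature.inject { |a, b| a & b }
p common

# With no arrays at all, inject returns nil rather than an empty array.
p [].inject { |a, b| a & b }
```

The nil case is why it is worth guarding the result before passing it on to a query.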
EDIT: I'm sure there's a way to do this with a single SQL query (possibly using group_concat), which someone else will probably post shortly :)
I would do this as a set of subqueries. You can actually also do it as a scope if you wish.
scope :has_all_features, ->(*feature_ids) {
where( ( ["locations.id in (select location_id from location_features where feature_id=?)"] * feature_ids.count).join(' and '), *feature_ids)
}
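To see what condition string that scope hands to where, here is the same array-multiplication trick in plain Ruby. It builds one copy of the subquery per feature id and joins them with "and"; where then binds each "?" to the matching element of feature_ids.

```ruby
feature_ids = [1, 3, 4]

# One copy of the subquery per feature id, joined with AND.
clause = "locations.id in (select location_id from location_features where feature_id=?)"
condition = ([clause] * feature_ids.count).join(" and ")

puts condition
```

Each subquery narrows the result to locations having one particular feature, so a location survives only if it has all of them.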

Querying based on two associated records

I have a product that has many variants, those variants have two attributes: Size and Color.
I want to query for the Variant based on the two attributes I pass in - I got it to work with following:
variants = Spree::Variant.joins(:option_values).where(:spree_option_values => {:id => size.id}, :product_id => prod.id).joins(:option_values)
variant = variants.select{|v| v.option_values.include?(size)}
From my understanding, the select method more or less iterates through the array, which is kinda slow. I would rather have a query that finds the variant directly based on those two attributes.
I tried the following:
Spree::Variant.joins(:option_values).where(:spree_option_values => {:id => size.id}, :product_id => prod.id).joins(:option_values).where(:spree_option_values => {:id => color.id})
but this only returned an empty array.
How would I go about this?
Edit: Here are the product, variant and option_values models:
Product:
https://github.com/spree/spree/blob/master/core/app/models/spree/product.rb
Variant:
https://github.com/spree/spree/blob/master/core/app/models/spree/variant.rb
OptionValue: https://github.com/spree/spree/blob/master/core/app/models/spree/option_value.rb
OptionType: https://github.com/spree/spree/blob/master/core/app/models/spree/option_type.rb
Update 2: you're right, this is not what you're looking for.
So you can:
1) Build an SQL subquery (if the joined table has the size and the color at the same time, return TRUE). How fast that will be is an open question.
2) Create a model "ValuesVariants" for the table "spree_option_values_variants" and replace the habtm with 2 has_manys + 2 has_many :throughs. Now you can search ValuesVariants with (option_value_id IN (size_id, color_id) AND variant_id IN (array of the product's variant ids)) and keep the variants that matched both. That can be fast enough.
3) You can use :includes so the associated objects are loaded into memory, and then do the second search with array methods. In this case the concern is memory usage.
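Option 3 can be sketched in plain Ruby as follows. The Variant struct and the option value ids are hypothetical stand-ins; in the real app the variants would presumably come from something like `prod.variants.includes(:option_values)`, with each variant's ids taken from `v.option_values.map(&:id)`.

```ruby
# Hypothetical in-memory variants, as if loaded together with their option values.
Variant = Struct.new(:id, :option_value_ids)

variants = [
  Variant.new(1, [10, 20]), # size 10, color 20
  Variant.new(2, [10, 21]), # size 10, color 21
  Variant.new(3, [11, 20])  # size 11, color 20
]

size_id = 10
color_id = 20

# A variant matches when both wanted option value ids are among its own.
match = variants.find { |v| ([size_id, color_id] - v.option_value_ids).empty? }

p match&.id
```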
