I am working on a query that is driving me nuts. I am trying to search a table by the values in a join table. Currently, the query returns all results that have either of the specified ids. I would like it to return only the records that have both ids.
Here is my table setup
create_table :champions do |t|
  t.integer :id
end
create_table :champions_champion_combinations do |t|
  t.integer :champion_id
  t.integer :champion_combination_id
end
create_table :champion_combinations do |t|
  t.integer :id
end
Here is the query as I have it
SELECT
champion_combinations.*
FROM champion_combinations
INNER JOIN champions_champion_combinations
ON champions_champion_combinations.champion_combination_id = champion_combinations.id
INNER JOIN champions
ON champions.id = champions_champion_combinations.champion_id
WHERE champions.id IN (1, 2)
Generated from the RoR ActiveRecord query
ChampionCombination.joins(:champions).where(champions: {id:[1,2]})
So this returns all champion_combinations that have either champion ids 1 or 2 joined to it. What type of query do I need to write that only returns the combination with both ids 1 and 2 joined to it?
Thanks in advance.
If you're interested in a pure SQL solution, you can use GROUP BY and HAVING clauses to achieve your goal. Here is the SQL query:
SELECT cc.*
FROM champion_combinations AS cc
INNER JOIN champions_champion_combinations AS ccc ON ccc.champion_combination_id = cc.id
WHERE ccc.champion_id IN (1, 2)
GROUP BY cc.id
HAVING array_agg(ccc.champion_id) @> ARRAY[1,2];
PS Thanks to @IgorRomanchenko for great suggestions.
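To make the logic of the GROUP BY / HAVING query easier to follow, here is a minimal plain-Ruby sketch of the same three steps applied to in-memory rows. The JOIN_ROWS data and the combinations_with_all helper are hypothetical and exist only for illustration; this mirrors the idea of the containment check in the HAVING clause, not what the database does internally.

```ruby
# Hypothetical join-table rows: [champion_combination_id, champion_id]
JOIN_ROWS = [
  [10, 1], [10, 2],         # combination 10 has champions 1 and 2
  [11, 1],                  # combination 11 has only champion 1
  [12, 2], [12, 3],         # combination 12 has champion 2 but not 1
  [13, 1], [13, 2], [13, 3] # combination 13 has 1, 2 and an extra champion
].freeze

# Returns the ids of combinations that are linked to every id in `wanted`.
def combinations_with_all(rows, wanted)
  rows
    .select { |_, champion_id| wanted.include?(champion_id) }  # WHERE champion_id IN (...)
    .group_by { |combination_id, _| combination_id }           # GROUP BY cc.id
    .select { |_, group| (wanted - group.map(&:last)).empty? } # HAVING: group contains all wanted ids
    .keys
end

p combinations_with_all(JOIN_ROWS, [1, 2])
```

Note that a combination with additional champions (13 in this sample data) still matches, because the check only requires the wanted ids to be present.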
This is not an inner join problem; the inner join is working as expected. The SQL query above inner-joins both tables with champion_combinations, and nothing in it requires both ids to be present. What you should do is:
ChampionCombination.joins(:champions).where(champions: {id:[1,2]}).where("champion_id is not null and champion_combination_id is not null")
I'm using an includes instead of a join because it runs faster but the statement is returning an association that doesn't include all of the data I'm looking for. It returns all of the left data, but only the right data that matches the query. Hopefully the examples below help clarify the problem and what I'm trying to achieve.
The join does seem to do what I'm after from a data and rails association perspective but executes a ton of queries and is much slower.
Setup and examples
class Species < ActiveRecord::Base
has_many :species_types, foreign_key: 'species_id', primary_key: "id"
end
class SpeciesTypes < ActiveRecord::Base
belongs_to :species, :foreign_key => "id", :primary_key => "species_id"
end
create_table "species", force: :cascade do |t|
t.bigint "id"
t.string "identifier"
end
create_table "species_types", force: :cascade do |t|
t.bigint "species_id"
t.bigint "type_id"
t.string "name"
end
Table data to help visualize the queries below
Species
id  identifier
1   furry
2   sleek
3   hairy
4   shiny
5   reflective
6   rough
7   rubbery
SpeciesTypes
species_id  type_id  identifier
1           1        hairy
1           2        metalic
2           3        skin
3           1        hairy
4           2        metalic
4           3        skin
5           3        skin
5           3        skin
6           2        metalic
7           2        metalic
I know the SpeciesTypes.type_id, and I'm looking to get all Species that have that type, including all of their SpeciesTypes.
Using includes
`species = Species.includes(:species_types).where(:species_types => {:type_id => 1})`
This does return all Species with a matching SpeciesType. However, instead of returning each Species with all of its SpeciesTypes, it returns each Species with only the SpeciesTypes that match the :type_id parameter. So in this case you cannot reference all SpeciesTypes from the Species object (species[0].species_types). This is not what I expected, although it makes sense why the result is limited to the matched type_id.
Response from above query for Species
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>]
I'm looking for this:
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>,
#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 2,
identifier: metalic>,
]
Using joins
This is returning what I'm after (using join instead of includes) however the query is much much slower. I think I'm missing something obvious (or not obvious but fundamental)
species = Species.joins(:species_types).where(:species_types => {:type_id => 3})
The above returns the values that I expect but is a much slower query.
Can the includes query be updated to return all Species with all types that match the known :type_id?
While it's pretty natural to think that Species.includes(:species_types).where(:species_types => {:type_id => 3}) would load all the species and just eager load the species_types that match the where clause, that's not how ActiveRecord and SQL actually work.
This generates a query something like:
SELECT species.name AS t0_c1, species_types.id AS t1_c1 ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id
WHERE species_types.type_id = ?
When you use includes and reference the other table it delegates to .eager_load which loads both tables in a single database query.
The where clause here applies to the entire query and not just the joined association. Remember that this query returns a row for every species_types row (with duplicate data for the species table).
If you wanted to load just the records that match the condition you would need to put the restriction into the JOIN clause:
SELECT species.name AS t0_c1, ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id AND species_types.type_id = ?
Unfortunately ActiveRecord associations do not provide a way to do that.
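The difference between putting the condition in WHERE and putting it in the JOIN's ON clause can be sketched in plain Ruby. This is only an in-memory illustration under stated assumptions: the SPECIES and SPECIES_TYPES sample data and the left_outer_join helper are hypothetical and are not part of ActiveRecord.

```ruby
SPECIES = [
  { id: 1, identifier: "furry" },
  { id: 2, identifier: "sleek" }
].freeze

SPECIES_TYPES = [
  { species_id: 1, type_id: 1 },
  { species_id: 1, type_id: 2 },
  { species_id: 2, type_id: 3 }
].freeze

# Simulates a LEFT OUTER JOIN. The optional block is an extra ON condition.
# A species with no matching type still yields one row, paired with nil.
def left_outer_join(species, types, &on)
  species.flat_map do |s|
    matches = types.select { |t| t[:species_id] == s[:id] && (on.nil? || on.call(t)) }
    matches = [nil] if matches.empty?
    matches.map { |t| [s, t] }
  end
end

# Condition in WHERE: applied after the join, so whole rows disappear.
WHERE_ROWS = left_outer_join(SPECIES, SPECIES_TYPES)
             .select { |_, t| t && t[:type_id] == 1 }

# Condition in ON: every species survives, only the joined side is restricted.
ON_ROWS = left_outer_join(SPECIES, SPECIES_TYPES) { |t| t[:type_id] == 1 }

p WHERE_ROWS.map { |s, t| [s[:id], t && t[:type_id]] }
p ON_ROWS.map { |s, t| [s[:id], t && t[:type_id]] }
```

With the WHERE variant, species 2 drops out entirely and species 1 keeps only its matching type, which is the behaviour described in the question.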
The easiest solution to the problem is most likely to just query from the other end:
Type.find(1)
.specie_types.where(specie: specie)
.joins is not the answer
You can't just replace includes with joins, as they do very different things.
joins just adds an INNER JOIN to the query but doesn't actually select any columns from the joined table. It's used to filter the association based on the joined table or to select aggregates, not to prevent N+1 queries.
In this case it's most likely not the first query itself that's slower. Rather, you're creating an N+1 query when you iterate through species_types, as the association is not eager loaded / preloaded.
includes loads the associated records in either one query (using a LEFT OUTER JOIN) or two separate queries, depending on how it's used.
Let's say I have a User and User has_many :tags, and I would like to remove all of @user's tags that have a duplicated name. For example,
@user.tags #=> [<Tag name: 'A'>, <Tag name: 'A'>, <Tag name: 'B'>]
I would like to keep only the tags with unique names and delete the rest from the database.
I know I could pull out a list of unique tag names from the user's tags, remove all of the user's tags, and re-create them with only unique names, but wouldn't that be inefficient?
On the other hand, select won't work as it returns only the selected column. uniq also won't work:
@user.tags.uniq #=> returns all tags
Is there a more efficient way?
UPDATE:
I would like to do this in a migration.
This method will give you an ActiveRecord::Relation with the duplicate tags:
class Tag < ApplicationRecord
belongs_to :user
def self.duplicate_tags
unique = self.select('DISTINCT ON(tags.name, tags.user_id) tags.id')
.order(:name, :user_id, :id)
self.where.not(id: unique)
end
end
It's actually run as a single query:
SELECT "tags".* FROM "tags"
WHERE "tags"."id" NOT IN
(SELECT DISTINCT ON(tags.name, tags.user_id) tags.id
FROM "tags"
ORDER BY "tags"."name" ASC, "tags"."user_id" ASC, "tags"."id" ASC)
You can remove the duplicates in a single query with #delete_all.
# Warning! This can't be undone!
Tag.duplicate_tags.delete_all
If you need to destroy dependent associations or call your before_* or after_destroy callbacks, use the #destroy_all method instead. But you should use this together with #in_batches to avoid running out of memory.
# Warning! This can't be undone!
Tag.duplicate_tags.in_batches do |batch|
# destroys a batch of 1000 records
batch.destroy_all
end
You can write a model-independent SQL query in the migration.
Here is PostgreSQL-specific migration code:
execute <<-SQL
DELETE FROM tags
WHERE id NOT IN (
SELECT DISTINCT ON(user_id, name) id FROM tags
ORDER BY user_id, name, id ASC
)
SQL
And here is a more generic SQL version:
execute <<-SQL
DELETE FROM tags
WHERE id IN (
SELECT DISTINCT t2.id FROM tags t1
INNER JOIN tags t2
ON (
t1.user_id = t2.user_id AND
t1.name = t2.name AND
t1.id < t2.id
)
)
SQL
This SQL fiddle shows different queries you can use as the sub-select in the DELETE query depending on your goal: deleting the first, last, or all duplicates.
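As a quick way to check which rows such a DELETE would target, here is a minimal plain-Ruby sketch of the "keep the lowest id per (user_id, name)" rule that the queries above implement. The TAGS sample data and the duplicate_tag_ids helper are hypothetical and only illustrate the selection logic.

```ruby
# Hypothetical rows from the tags table.
TAGS = [
  { id: 1, user_id: 1, name: "A" },
  { id: 2, user_id: 1, name: "A" }, # duplicate of id 1
  { id: 3, user_id: 1, name: "B" },
  { id: 4, user_id: 2, name: "A" }  # same name, different user: not a duplicate
].freeze

# Returns the ids that would be deleted: every tag except the one with
# the lowest id within each (user_id, name) group.
def duplicate_tag_ids(tags)
  keep = tags.group_by { |t| [t[:user_id], t[:name]] }
             .map { |_, group| group.map { |t| t[:id] }.min }
  tags.map { |t| t[:id] } - keep
end

p duplicate_tag_ids(TAGS)
```

Running a check like this against a copy of the data before the migration is one way to confirm that only the intended rows are affected.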
I have a model MobileApp and I use find_by_sql to do a complex SQL query with several INNER JOIN. The SQL query is the following :
SELECT DISTINCT mobile_apps.*, satisfaction_scores.id, satisfaction_scores.score FROM mobile_apps
INNER JOIN bucket_applications ON mobile_apps.id = bucket_applications.mobile_app_id
INNER JOIN satisfaction_scores ON mobile_apps.id = satisfaction_scores.mobile_app_id
WHERE NOT EXISTS
(SELECT * FROM feeds
WHERE feeds.mobile_app_id = mobile_apps.id
AND feeds.target_id = 41
AND feeds.left IS TRUE) ORDER BY satisfaction_scores.score DESC;
And I do a MobileApp.find_by_sql(<the_query>).
My problem is that I need to sort my MobileApp by score of SatisfactionScore. Consequently, I need to add in the SELECT the field score to have the ORDER BY working with score. I think the SQL query is right, but ActiveRecord doesn't like to have in the SELECT columns not from table mobile_apps and therefore returns wrong ids of MobileApp.
MobileApp.find_by_sql(<the_query>) =>
[#<MobileApp id: 41>,
#<MobileApp id: 42>,
#<MobileApp id: 43>,
#<MobileApp id: 44>]
But
MobileApp.ids => [153, 156, 159, 162, 165]
Is there a better way to do it ?
Thanks.
Removing satisfaction_scores.id from the SELECT makes find_by_sql return the correct MobileApp ids. Besides, the satisfaction_scores.id column was not essential for the query.
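A likely explanation is that the result set contained two columns named id, and the later one (from satisfaction_scores) replaced the earlier one when the attributes were built by column name. The plain-Ruby sketch below illustrates that effect under stated assumptions: the column list, the values, and the build_attributes helper are hypothetical, and the real ActiveRecord internals are more involved.

```ruby
# Builds an attribute hash from parallel column names and values.
# When a column name appears twice, the later value wins.
def build_attributes(columns, values)
  columns.zip(values).to_h
end

# Hypothetical row: mobile_apps.id, mobile_apps.name,
# satisfaction_scores.id, satisfaction_scores.score
columns = ["id", "name", "id", "score"]
values  = [153, "Maps", 41, 9]

p build_attributes(columns, values)

# Aliasing the second id keeps both values available.
aliased = ["id", "name", "satisfaction_score_id", "score"]
p build_attributes(aliased, values)
```

If the score id is ever needed, aliasing it in the SELECT (for example satisfaction_scores.id AS satisfaction_score_id) avoids the collision.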
Using rails 3.2 with active_admin and seeing PG::Error: ERROR: column reference "status" is ambiguous when using a custom filter on active_admin in Rents.rb:
filter :travel_car_brand, as: :string
filter :travel_car_model, as: :string
The error points to:
: SELECT COUNT(DISTINCT "rents"."id") FROM "rents" LEFT OUTER JOIN "travels" ON "travels"."id" = "rents"."travel_id" LEFT OUTER JOIN "cars" ON "cars"."travel_id" = "travels"."id" WHERE ("cars"."brand" ILIKE '%mazda%') AND ("startDate" > '2014-08-04 10:15:14 +0200' and status = 'paid'):
It's interesting that the above has status = 'paid', since I'm not sure why it's using that as a filter.
models
Rent.rb
belongs_to :travel
Travel.rb
has_one :car
and both rents table and travels table have a status attribute.
I've seen Lucas' answer but if this is a rails app, the SQL should be generated by the application, not hardcoded. Therefore changing the SQL directly is not the solution.
Instead, I would suggest you find the code that is adding the "paid" filter and modify it to declare the relevant model name.
Somewhere you probably have a scope:
scope :paid, where(status: 'paid')
change that to (for example):
scope :paid, where("model.status = 'paid'")
You need to choose which table's status column the condition should apply to, or use both,
e.g.
SELECT
COUNT(DISTINCT "rents"."id")
FROM "rents"
LEFT OUTER JOIN "travels" ON "travels"."id" = "rents"."travel_id"
LEFT OUTER JOIN "cars" ON "cars"."travel_id" = "travels"."id"
WHERE ("cars"."brand" ILIKE '%mazda%')
AND ("startDate" > '2014-08-04 10:15:14 +0200')
AND rents.status = 'paid'
or if you require both:
SELECT
COUNT(DISTINCT "rents"."id")
FROM "rents"
LEFT OUTER JOIN "travels" ON "travels"."id" = "rents"."travel_id"
LEFT OUTER JOIN "cars" ON "cars"."travel_id" = "travels"."id"
WHERE ("cars"."brand" ILIKE '%mazda%')
AND ("startDate" > '2014-08-04 10:15:14 +0200')
AND rents.status = 'paid'
AND travels.status = 'paid'
Your "status" column is ambiguous: both rents and travels have a status column, so SQL cannot tell which table's column you mean.
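The rule behind the error can be sketched in plain Ruby: an unqualified column name is fine only if exactly one of the joined tables has it. The TABLE_COLUMNS lists and the resolve_column helper below are hypothetical (the real schemas are not shown in the question), and this is an illustration of the rule, not of how PostgreSQL resolves names internally.

```ruby
# Hypothetical column lists for the three joined tables.
TABLE_COLUMNS = {
  "rents"   => %w[id travel_id status startDate],
  "travels" => %w[id status],
  "cars"    => %w[id travel_id brand model]
}.freeze

# Returns the table-qualified name for an unqualified column,
# or raises if the name matches no table or more than one table.
def resolve_column(name, tables = TABLE_COLUMNS)
  owners = tables.select { |_, cols| cols.include?(name) }.keys
  raise ArgumentError, %(column reference "#{name}" is ambiguous) if owners.size > 1
  raise ArgumentError, %(column "#{name}" does not exist) if owners.empty?
  "#{owners.first}.#{name}"
end

p resolve_column("brand")

begin
  resolve_column("status")
rescue ArgumentError => e
  puts e.message
end
```

This is why "startDate" works unqualified in the failing query while status does not, assuming startDate exists only on rents.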
On Mysql it works correctly.
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must
appear in select list LINE 1: ) ORDER BY
programs.rating DESC, program_sc... ^ :
Query:
SELECT DISTINCT "programs".* FROM "programs" INNER JOIN
"program_schedules" ON "program_schedules"."program_id" =
"programs"."id" WHERE (programs.rating >= 5 AND
program_schedules.start >= '2012-11-03 23:14:43.457659') AND (ptype =
'movie') ORDER BY programs.rating DESC,
program_schedules.start DESC
Rails code:
@data =
Program.joins(:program_schedules).where('programs.rating >= ?
AND program_schedules.start >= ?',5,
Time.now).order('programs.rating DESC,
program_schedules.start DESC').uniq
I have tried with
Program.select("programs.*, program_schedules.*").joins(:program_schedules).where(...
but, in this way, when I'm going to read
@data.program_schedules
I get a nil value (When I know there are no nil values)
PostgreSQL 9.2 (Heroku), Ruby 1.9.2
Some info about my DB:
class Program < ActiveRecord::Base
has_many :program_schedules
end
class ProgramSchedule < ActiveRecord::Base
belongs_to :program
end
Schema.db
create_table "program_schedules", :force => true do |t|
t.integer "program_id"
t.datetime "start"
end
create_table "programs", :force => true do |t|
t.string "title"
t.string "ptype"
end
EDIT:
I don't need to order "programs_schedules" because I need all programs_schedules in my array related to that program.
Your query is ambiguous in two ways:
ptype is not table-qualified and you did not disclose the table definitions. So the query is ambiguous.
More importantly, you want to:
ORDER BY programs.rating DESC, program_schedules.start DESC
At the same time, however, you instruct PostgreSQL to give you DISTINCT rows from programs. If there are multiple matching rows in program_schedules, how would Postgres know which one to pick for the ORDER BY clause? The first? Last? Earliest, latest, greenest? It's just undefined.
Generally, the ORDER BY clause cannot disagree with the DISTINCT clause; that's what the error message tells you.
Based on a few assumptions, filling in for missing information, your query could look like this:
SELECT p.*
FROM programs p
JOIN program_schedules ps ON ps.program_id = p.id
WHERE p.rating >= 5
AND ps.start >= '2012-11-03 23:14:43.457659'
AND p.ptype = 'movie' -- assuming ptype is from programs (?)
GROUP BY p.id -- assuming it's the primary key
ORDER BY p.rating DESC, min(ps.start) DESC; -- assuming smallest start
Also assuming you have PostgreSQL 9.1 or later, which is required for this to work. (Since 9.1, grouping by the primary key covers all columns of that table in GROUP BY.)
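To see why the choice of aggregate matters, here is a minimal plain-Ruby sketch of the GROUP BY plus ORDER BY step on in-memory rows. The ROWS sample data and the ordered_program_ids helper are hypothetical; the pick argument stands in for min(ps.start) or max(ps.start) in the SQL above.

```ruby
# Hypothetical joined rows: one row per (program, schedule) pair.
ROWS = [
  { program_id: 1, rating: 5, start: "2012-11-05" },
  { program_id: 1, rating: 5, start: "2012-11-09" },
  { program_id: 2, rating: 5, start: "2012-11-07" },
  { program_id: 3, rating: 8, start: "2012-11-04" }
].freeze

# Collapses the rows to one entry per program, then orders by
# rating DESC and by the chosen aggregate of start DESC.
# `pick` is :min or :max and decides which schedule represents a program.
def ordered_program_ids(rows, pick)
  rows.group_by { |r| r[:program_id] }                      # GROUP BY p.id
      .map { |id, group| [id, group.first[:rating], group.map { |r| r[:start] }.public_send(pick)] }
      .sort_by { |_, rating, start| [rating, start] }       # ascending on both keys
      .reverse                                              # flip to DESC on both keys
      .map(&:first)
end

p ordered_program_ids(ROWS, :min)
p ordered_program_ids(ROWS, :max)
```

Programs 1 and 2 swap places depending on whether the earliest or the latest schedule is used, which is exactly the ambiguity that makes PostgreSQL reject the original DISTINCT query.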
As for:
On Mysql it works correctly.
No it doesn't. It "works", but in mysterious ways rather than "correctly". MySQL allows for all sorts of weird mistakes and goes out of its way (and the SQL standard) to avoid having to throw exceptions - which is a very unfortunate way to deal with errors. It regularly comes back to haunt you later. Demo on Youtube.
Question update
I need all programs_schedules in my array related to that program.
You might want to add:
SELECT p.*, array_agg(ps.start ORDER BY ps.start)