PostgreSQL Error for Select Distinct in Rails - ruby-on-rails

On MySQL it works correctly, but on PostgreSQL I get:
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ) ORDER BY programs.rating DESC, program_sc...
Query:
SELECT DISTINCT "programs".* FROM "programs"
INNER JOIN "program_schedules" ON "program_schedules"."program_id" = "programs"."id"
WHERE (programs.rating >= 5 AND program_schedules.start >= '2012-11-03 23:14:43.457659') AND (ptype = 'movie')
ORDER BY programs.rating DESC, program_schedules.start DESC
Rails code:
@data = Program.joins(:program_schedules).
where('programs.rating >= ? AND program_schedules.start >= ?', 5, Time.now).
order('programs.rating DESC, program_schedules.start DESC').
uniq
I have tried with
Program.select("programs.*, program_schedules.*").joins(:program_schedules).where(...
but this way, when I read
@data.program_schedules
I get a nil value (even though I know there are no nil values).
PostgreSQL 9.2 (Heroku), Ruby 1.9.2
Some info about my DB:
class Program < ActiveRecord::Base
has_many :program_schedules
end
class ProgramSchedule < ActiveRecord::Base
belongs_to :program
end
schema.rb
create_table "program_schedules", :force => true do |t|
t.integer "program_id"
t.datetime "start"
end
create_table "programs", :force => true do |t|
t.string "title"
t.string "ptype"
end
EDIT:
I don't need to order by program_schedules, because I need all of the program_schedules related to that program in my array.

Your query is ambiguous in two ways:
ptype is not table-qualified, and you did not disclose the table definitions, so it is unclear which table it comes from.
More importantly, you want to:
ORDER BY programs.rating DESC, program_schedules.start DESC
At the same time, however, you instruct PostgreSQL to give you DISTINCT rows from programs. If there are multiple matching rows in program_schedules, how would Postgres know which one to pick for the ORDER BY clause? The first? Last? Earliest, latest, greenest? It's just undefined.
Generally, the ORDER BY clause cannot disagree with the DISTINCT clause; that's what the error message tells you.
Based on a few assumptions, filling in for missing information, your query could look like this:
SELECT p.*
FROM programs p
JOIN program_schedules ps ON ps.program_id = p.id
WHERE p.rating >= 5
AND ps.start >= '2012-11-03 23:14:43.457659'
AND p.ptype = 'movie' -- assuming ptype is from programs (?)
GROUP BY p.id -- assuming it's the primary key
ORDER BY p.rating DESC, min(ps.start) DESC; -- assuming smallest start
Also assuming you have PostgreSQL 9.1 or later, which is required for this to work (the primary key covers the whole table in GROUP BY).
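As a rough illustration of what the GROUP BY/min() query computes, the plain-Ruby sketch below emulates it on in-memory rows. The Struct names and sample data are hypothetical and not part of the question; the point is only that collapsing each program's schedules to one aggregate value (here the earliest qualifying start) gives ORDER BY a single, defined sort key per program.

```ruby
require "time"

# Hypothetical stand-ins for the programs and program_schedules tables
Program  = Struct.new(:id, :title, :rating, :ptype)
Schedule = Struct.new(:program_id, :start)

PROGRAMS = [
  Program.new(1, "A", 7, "movie"),
  Program.new(2, "B", 9, "movie"),
  Program.new(3, "C", 8, "series"),
  Program.new(4, "D", 9, "movie")
]

SCHEDULES = [
  Schedule.new(1, Time.parse("2012-11-04 20:00")),
  Schedule.new(1, Time.parse("2012-11-05 21:00")),
  Schedule.new(2, Time.parse("2012-11-06 18:00")),
  Schedule.new(3, Time.parse("2012-11-04 19:00")),
  Schedule.new(4, Time.parse("2012-11-07 10:00")),
  Schedule.new(4, Time.parse("2012-11-02 10:00")) # removed by the WHERE filter
]

# Emulates:
#   ... JOIN program_schedules ps ON ps.program_id = p.id
#   WHERE p.rating >= ? AND ps.start >= ? AND p.ptype = ?
#   GROUP BY p.id ORDER BY p.rating DESC, min(ps.start) DESC
def programs_by_rating_then_earliest_start(min_rating, since, ptype)
  joined = PROGRAMS.flat_map do |prog|
    SCHEDULES.select { |s| s.program_id == prog.id }.map { |s| [prog, s] }
  end
  filtered = joined.select do |prog, s|
    prog.rating >= min_rating && s.start >= since && prog.ptype == ptype
  end
  # GROUP BY p.id: each group holds one program with one or more schedule rows
  rows = filtered.group_by { |prog, _s| prog.id }.map do |_id, pairs|
    [pairs.first[0], pairs.map { |_prog, s| s.start }.min]
  end
  # min() leaves exactly one start per program, so the sort key is well defined
  rows.sort_by { |prog, earliest| [-prog.rating, -earliest.to_f] }.map { |prog, _| prog.title }
end
```

With this data and since = 2012-11-03 23:14:43, the result is ["D", "B", "A"]: D and B tie on rating 9, and D's earliest qualifying start is later, so it comes first under DESC. D's 2012-11-02 row is filtered out before aggregation, just as WHERE runs before GROUP BY.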
As for:
On Mysql it works correctly.
No, it doesn't. It "works", but in mysterious ways rather than "correctly". MySQL allows for all sorts of weird mistakes and goes out of its way (and away from the SQL standard) to avoid having to throw exceptions, which is a very unfortunate way to deal with errors. It regularly comes back to haunt you later. Demo on YouTube.
Question update
I need all programs_schedules in my array related to that program.
You might want to add a sorted array of all schedule start times per program to the SELECT list:
SELECT p.*, array_agg(ps.start ORDER BY ps.start)
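A small Ruby sketch of what array_agg(ps.start ORDER BY ps.start) together with GROUP BY p.id produces: one row per program, with all of its schedule starts collected into a sorted array. The row layout and sample values are hypothetical; ISO-formatted timestamp strings are used because they sort chronologically as plain strings.

```ruby
# Hypothetical joined rows: [program_id, title, schedule start]
JOINED_ROWS = [
  [1, "A", "2012-11-05 21:00"],
  [2, "B", "2012-11-06 18:00"],
  [1, "A", "2012-11-04 20:00"]
]

# Emulates: SELECT p.id, array_agg(ps.start ORDER BY ps.start) ... GROUP BY p.id
def starts_per_program(rows)
  rows.group_by { |id, _title, _start| id }.transform_values do |group|
    group.map { |_id, _title, start| start }.sort
  end
end
```

Unlike the DISTINCT version, no schedule data is lost: every start time for a program ends up in that program's array.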

Related

Rails include returns filtered relations instead of all relations

I'm using includes instead of joins because it runs faster, but the statement is returning an association that doesn't include all of the data I'm looking for. It returns all of the left data, but only the right data that matches the query. Hopefully the examples below help clarify the problem and what I'm trying to achieve.
The join does seem to do what I'm after from a data and rails association perspective but executes a ton of queries and is much slower.
Setup and examples
class Species < ActiveRecord::Base
has_many :species_types, foreign_key: 'species_id', primary_key: "id"
end
class SpeciesTypes < ActiveRecord::Base
belongs_to :species, foreign_key: "species_id", primary_key: "id"
end
create_table "species", force: :cascade do |t|
t.bigint "id"
t.string "identifier"
end
create_table "species_types", force: :cascade do |t|
t.bigint "species_id"
t.bigint "type_id"
t.string "name"
end
Table data to help visualize the queries below

Species
id | identifier
---|---
1 | furry
2 | sleek
3 | hairy
4 | shiny
5 | reflective
6 | rough
7 | rubbery

SpeciesTypes
species_id | type_id | identifier
---|---|---
1 | 1 | hairy
1 | 2 | metalic
2 | 3 | skin
3 | 1 | hairy
4 | 2 | metalic
4 | 3 | skin
5 | 3 | skin
5 | 3 | skin
6 | 2 | metalic
7 | 2 | metalic
I know the SpeciesTypes.type_id, and I'm looking to get all Species that have that type, including all of their SpeciesTypes.
Using includes
`species = Species.includes(:species_types).where(:species_types => {:type_id => 1})`
This does return all Species with a matching SpeciesType. However, instead of returning each Species with all of its SpeciesTypes, it returns each Species with only the SpeciesTypes that match the :type_id parameter. So in this case you cannot reference all SpeciesTypes from the Species object (species[0].species_types). It does not return what I expected, although it makes sense why it limits the result to the matched type_id.
Response from above query for Species
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>]
I'm looking for this:
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types
[#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 1,
identifier: hairy>,
#<SpeciesTypes:0x0000ffff9ad73490
species_id: 1,
type_id: 2,
identifier: metalic>]
Using joins
This returns what I'm after (using joins instead of includes); however, the query is much, much slower. I think I'm missing something obvious (or not obvious, but fundamental).
species = Species.joins(:species_types).where(:species_types => {:type_id => 3})
The above returns the values that I expect but is a much slower query.
Can the includes query be updated to return all Species with all types that match the known :type_id?
While it's pretty natural to think that Species.includes(:species_types).where(:species_types => {:type_id => 3}) would load all the species and just eager load the species_types that match the where clause, that's not how ActiveRecord and SQL actually work.
This generates a query roughly like:
SELECT species.id AS t0_r0, species.identifier AS t0_r1, species_types.id AS t1_r0, ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id
WHERE species_types.type_id = ?
When you use includes and reference the other table it delegates to .eager_load which loads both tables in a single database query.
The where clause here applies to the entire query and not just the joined association. Remember that this query returns a row for every species_types row (with duplicate data for the species table).
If you wanted to load just the records that match the condition, you would need to put the restriction into the JOIN clause:
SELECT species.id AS t0_r0, species.identifier AS t0_r1, ...
FROM species
LEFT OUTER JOIN species_types
ON species_types.species_id = species.id AND species_types.type_id = ?
Unfortunately ActiveRecord associations do not provide a way to do that.
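To make the difference concrete, the Ruby sketch below emulates both variants on the table data from the question: filtering in WHERE after the LEFT OUTER JOIN, versus putting the condition into the ON clause. It is an in-memory illustration of the SQL semantics, not ActiveRecord code, and the helper names are hypothetical.

```ruby
# Rows copied from the question's tables
SPECIES = { 1 => "furry", 2 => "sleek", 3 => "hairy", 4 => "shiny",
            5 => "reflective", 6 => "rough", 7 => "rubbery" }
# [species_id, type_id, identifier]
SPECIES_TYPES = [
  [1, 1, "hairy"], [1, 2, "metalic"], [2, 3, "skin"], [3, 1, "hairy"],
  [4, 2, "metalic"], [4, 3, "skin"], [5, 3, "skin"], [5, 3, "skin"],
  [6, 2, "metalic"], [7, 2, "metalic"]
]

# LEFT OUTER JOIN species_types ON species_id = species.id [AND extra ON condition];
# a species with no matching type row gets a single row padded with nil (NULL)
def left_join(on_extra = ->(_t) { true })
  SPECIES.keys.flat_map do |id|
    matches = SPECIES_TYPES.select { |t| t[0] == id && on_extra.call(t) }
    matches.empty? ? [[id, nil]] : matches.map { |t| [id, t] }
  end
end

# Condition in WHERE: filters the joined rows, so species 1 loses its metalic row
def where_filter(type_id)
  left_join.select { |_id, t| t && t[1] == type_id }
end

# Condition in ON: every species survives, only the joined side is restricted
def on_filter(type_id)
  left_join(->(t) { t[1] == type_id })
end
```

where_filter(1) keeps species 1 but with only its type-1 row, which is the truncated association from the question; on_filter(1) keeps all seven species but attaches only type-1 rows. Neither shape yields species 1 together with both of its types.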
The easiest solution to the problem is most likely to just query from the other end:
Type.find(1)
.species_types.where(species: species)
.joins is not the answer
You can't just replace includes with joins, as they do very different things.
joins just adds an INNER JOIN to the query but doesn't actually select any columns from the joined table. It's used to filter the association based on the joined table or to select aggregates, not to prevent N+1 queries.
In this case it's most likely not the first query itself that's slower; rather, you're creating an N+1 query when you iterate through species_types, as the association is not eager loaded / preloaded.
includes loads the associated records either in one query (a LEFT OUTER JOIN, as with eager_load) or in two queries (as with preload), depending on how it's used.

What does ActiveModel::MissingAttributeError in Rails mean in my case?

I have two Postgres tables that look like above.
amount_availables belongs to facilities as shown below:
class AmountAvailable < ApplicationRecord
belongs_to :sequence_number
belongs_to :facility
validates :facility, :presence => true
end
When I run a complex query that joins these 2 tables, I get the below error (and this error is not consistent):
ActiveModel::MissingAttributeError (missing attribute: facility_id):
Generated SQL:
This is the SQL that's generated:
SELECT "as_of_date", "entity", "facility", "financial_institution", "amount_availables"."amount_available", "amount_availables"."comments", "amount_availables"."last_updated_on", "amount_availables"."last_updated_by" FROM "amount_availables" INNER JOIN "sequence_numbers" ON "sequence_numbers"."sequence_number" = "amount_availables"."sequence_number_id" INNER JOIN "facilities" ON "facilities"."facility_id" = "amount_availables"."facility_id" WHERE (sequence_numbers.as_of_date >= '10/01/2019' and sequence_numbers.as_of_date <= '12/24/2019' AND facilities.entity in ('3C7','HOLD CO','PCM','PC-M','PFSI','PLS','PMIT','POP','QRS','TAG','TRS')) ORDER BY last_updated_on desc
NOTE:
Until last week, I was getting this error inconsistently (it occurred a LOT of times, but not ALL the time), but this week I seem to get it pretty much all the time. The same SQL runs just fine in a Postgres client against the same database that my Rails app is using.
Does this error mean there should be a column named facility_id in facilities table?
What should I do on my Rails code to fix this?
I even tried renaming the id column in facilities table to facility_id using the Rails migration code below:
Approach 1 - Rails Migration Fix
class ModifyFacilitiesPkColumnName < ActiveRecord::Migration
def change
rename_column :facilities, :id, :facility_id
end
end
But I am still getting the same error even after the above approach (I DID run rake db:migrate, and I can see in my Postgres client that the id column in facilities HAS changed to facility_id).
What does this error mean precisely, and how do I fix this?
Your Inner Join seems to be incorrect:
SELECT * FROM "amount_availables"
INNER JOIN "sequence_numbers" ON "sequence_numbers"."sequence_number" = "amount_availables"."sequence_number_id"
INNER JOIN "facilities" ON "facilities"."facility_id" = "amount_availables"."facility_id"
WHERE (sequence_numbers.as_of_date >= '10/01/2019' and sequence_numbers.as_of_date <= '12/24/2019' AND facilities.entity in ('3C7','HOLD CO','PCM','PC-M','PFSI','PLS','PMIT','POP','QRS','TAG','TRS')) ORDER BY last_updated_on desc
Instead it should be:
INNER JOIN "facilities" ON "facilities"."id" = "amount_availables"."facility_id"
It should work.

PostgreSQL INNER JOIN with multiple conditions

I am working on a query that is driving me nuts. I am trying to search a table by the values in a join table. Currently, the query returns all results that have either of the specified ids. I would like it to return only records that have both of the ids.
Here is my table setup
create_table :champions do |t|
t.integer :id
end
create_table :champions_champion_combinations do |t|
t.integer :champion_id
t.integer :champion_combination_id
end
create_table :champion_combinations do |t|
t.integer :id
end
Here is the query as I have it
SELECT
champion_combinations.*
FROM champion_combinations
INNER JOIN champions_champion_combinations
ON champions_champion_combinations.champion_combination_id = champion_combinations.id
INNER JOIN champions
ON champions.id = champions_champion_combinations.champion_id
WHERE champions.id IN (1, 2)
Generated from the RoR ActiveRecord query
ChampionCombination.joins(:champions).where(champions: {id:[1,2]})
So this returns all champion_combinations that have either champion ids 1 or 2 joined to it. What type of query do I need to write that only returns the combination with both ids 1 and 2 joined to it?
Thanks in advance.
If you're interested in a pure SQL solution, you can use the GROUP BY and HAVING clauses to achieve your goal. Here is the SQL query:
SELECT cc.*
FROM champion_combinations AS cc
INNER JOIN champions_champion_combinations AS ccc ON ccc.champion_combination_id = cc.id
WHERE ccc.champion_id IN (1, 2)
GROUP BY cc.id
HAVING array_agg(ccc.champion_id) @> ARRAY[1,2];
PS: Thanks to @IgorRomanchenko for great suggestions.
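The HAVING clause keeps only combinations whose aggregated champion ids contain every requested id. A plain-Ruby sketch of that logic, using hypothetical join-table rows, is below; array containment (@>) is modeled as a set subset test, which matches because containment ignores order and duplicates.

```ruby
require "set"

# Hypothetical join-table rows: [champion_combination_id, champion_id]
LINKS = [[10, 1], [10, 2], [11, 1], [12, 2], [12, 3], [13, 1], [13, 2], [13, 3]]

# Emulates: WHERE ccc.champion_id IN (...) GROUP BY cc.id
#           HAVING array_agg(ccc.champion_id) @> ARRAY[...]
def combinations_containing(links, wanted)
  wanted_set = wanted.to_set
  links.select { |_combo, champ| wanted_set.include?(champ) }
       .group_by(&:first)
       .select { |_combo, rows| wanted_set.subset?(rows.map(&:last).to_set) }
       .keys
end
```

combinations_containing(LINKS, [1, 2]) returns [10, 13]. Combination 13 also contains champion 3, so this matches "contains both", not "contains exactly these two".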
It is not an inner join problem; the inner join is working as expected. The SQL query above inner joins both tables with champion_combinations, and there is no restriction which says both ids have to be present. What you should do is:
ChampionCombination.joins(:champions).where(champions: {id:[1,2]}).where("champion_id is not null and champion_combination_id is not null")

Rails: How to get objects with at least one child?

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:
SELECT *
FROM parents
WHERE EXISTS (
SELECT 1
FROM children
WHERE parent_id = parents.id)
The closest I've come is
Parent.all.reject { |parent| parent.children.empty? }
(based on another answer), but it's really inefficient because it runs a separate query for each Parent.
Parent.joins(:children).uniq.all
uniq was deprecated in Rails 5.0 and removed in Rails 5.1; distinct should be used instead.
Parent.joins(:children).distinct
This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.
The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT, but it can be a slow query. For better performance, you should write the SQL using EXISTS:
Parent.where <<-SQL
EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL
EXISTS can be much faster than DISTINCT. For example, here is a Post model which has comments and likes:
class Post < ApplicationRecord
has_many :comments
has_many :likes
end
class Comment < ApplicationRecord
belongs_to :post
end
class Like < ApplicationRecord
belongs_to :post
end
In the database there are 100 posts, and each post has 50 comments and 50 likes. Only one post has no comments or likes:
# Create posts with comments and likes
100.times do |i|
post = Post.create!(title: "Post #{i}")
50.times do |j|
post.comments.create!(content: "Comment #{j} for #{post.title}")
post.likes.create!(user_name: "User #{j} for #{post.title}")
end
end
# Create a post without comment and like
Post.create!(title: 'Hidden post')
If you want to get posts which have at least one comment and one like, you might write this:
# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct
The query above generates SQL like this:
SELECT DISTINCT "posts".*
FROM "posts"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"
But this SQL generates 250,000 rows (100 posts * 50 comments * 50 likes) and then filters out the duplicated rows, so it can be slow.
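The row count is simple multiplication: inner joining two independent has_many tables yields comments * likes rows per post, and DISTINCT then collapses them back to one row per post. A tiny Ruby check of that arithmetic, with hypothetical per-post counts matching the example:

```ruby
# Hypothetical per-post counts: 100 posts with 50 comments and 50 likes,
# plus the one "Hidden post" with neither
PostCounts = Struct.new(:comments, :likes)
POST_COUNTS = Array.new(100) { PostCounts.new(50, 50) } + [PostCounts.new(0, 0)]

# INNER JOIN comments and likes: each post yields comments * likes rows
def joined_row_count(counts)
  counts.sum { |c| c.comments * c.likes }
end

# After DISTINCT: one row per post that has at least one of each
def distinct_post_count(counts)
  counts.count { |c| c.comments > 0 && c.likes > 0 }
end
```

joined_row_count(POST_COUNTS) is 250,000, while distinct_post_count(POST_COUNTS) is 100: the database builds 2,500 rows per post only to throw 2,499 of them away.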
In this case you should write this instead:
Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL
This query generates SQL like this:
SELECT "posts".*
FROM "posts"
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)
This query does not generate useless duplicated rows, so it could be faster.
Here is benchmark:
user system total real
Uniq: 0.010000 0.000000 0.010000 ( 0.074396)
Exists: 0.000000 0.000000 0.000000 ( 0.003711)
In this run, EXISTS was about 20 times faster than DISTINCT (0.003711 s vs. 0.074396 s real time).
I pushed the sample application in GitHub, so you can confirm the difference by yourself:
https://github.com/JunichiIto/exists-query-sandbox
I have just modified this solution for your needs:
Parent.joins("LEFT JOIN children ON children.parent_id = parents.id").where("children.parent_id IS NOT NULL")
You just want an inner join with a distinct qualifier:
SELECT DISTINCT parents.*
FROM parents
JOIN children
ON children.parent_id = parents.id
This can be done in standard active record as
Parent.joins(:children).uniq
However, if you want the more complex result of finding all parents with no children, you need an outer join:
Parent.joins("LEFT OUTER JOIN children ON children.parent_id = parents.id").
where(:children => { :id => nil })
which is an unsatisfying solution for many reasons. I recommend Ernie Miller's squeel library, which will allow you to do
Parent.joins{children.outer}.where{children.id == nil}
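Both outer-join variants in this thread (IS NOT NULL to find parents with children, IS NULL to find parents without) can be sketched in plain Ruby. The data and helper names below are hypothetical; the sketch only mirrors how a LEFT OUTER JOIN pads unmatched parents with a NULL child.

```ruby
# Hypothetical data: parent ids and [child_id, parent_id] rows
PARENT_IDS = [1, 2, 3, 4]
CHILDREN = [[100, 1], [101, 1], [102, 3]]

# LEFT OUTER JOIN children ON children.parent_id = parents.id
def left_outer_join(parent_ids, children)
  parent_ids.flat_map do |pid|
    kids = children.select { |_cid, cpid| cpid == pid }
    kids.empty? ? [[pid, nil]] : kids.map { |cid, _cpid| [pid, cid] }
  end
end

# WHERE children.id IS NULL: parents that matched no child
def childless_parents(parent_ids, children)
  left_outer_join(parent_ids, children).select { |_pid, cid| cid.nil? }.map(&:first)
end

# WHERE children.id IS NOT NULL, then DISTINCT: parents with at least one child
def parents_with_children(parent_ids, children)
  left_outer_join(parent_ids, children).reject { |_pid, cid| cid.nil? }.map(&:first).uniq
end
```

Here childless_parents returns [2, 4] and parents_with_children returns [1, 3]; without the final uniq, parent 1 would appear twice, once per child, which is why the IS NOT NULL variant needs DISTINCT.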
Try including the children with includes:
Parent.includes(:children).all.reject { |parent| parent.children.empty? }
This will make 2 queries:
SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);
[UPDATE]
The above solution is useful when you need to have the Child objects loaded.
But children.empty? can also use a counter cache to determine the number of children.
For this to work you need to add a new column to the parents table:
# a new migration
def up
change_table :parents do |t|
t.integer :children_count, :default => 0
end
Parent.reset_column_information
Parent.all.each do |p|
Parent.update_counters p.id, :children_count => p.children.length
end
end
def down
change_table :parents do |t|
t.remove :children_count
end
end
Now change your Child model:
class Child < ActiveRecord::Base
belongs_to :parent, :counter_cache => true
end
At this point you can use size and empty? without touching the children table:
Parent.all.reject { |parent| parent.children.empty? }
Note that length doesn't use the counter cache whereas size and empty? do.

Rails 3.1 with PostgreSQL: GROUP BY must be used in an aggregate function

I am trying to load the latest 10 Arts grouped by user_id and ordered by created_at. This works fine with SQLite and MySQL, but gives an error on my new PostgreSQL database.
Art.all(:order => "created_at desc", :limit => 10, :group => "user_id")
ActiveRecord error:
Art Load (18.4ms) SELECT "arts".* FROM "arts" GROUP BY user_id ORDER BY created_at desc LIMIT 10
ActiveRecord::StatementInvalid: PGError: ERROR: column "arts.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "arts".* FROM "arts" GROUP BY user_id ORDER BY crea...
Any ideas?
The SQL generated by the expression is not a valid query: you are grouping by user_id and selecting a lot of other fields, but not telling the DB how it should aggregate those other fields. For example, if your data looks like this:
a | b
---|---
1 | 1
1 | 2
2 | 3
Now when you ask db to group by a and also return b, it doesn't know how to aggregate values 1,2. You need to tell if it needs to select min, max, average, sum or something else. Just as I was writing the answer there have been two answers which might explain all this better.
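The same point in code: a Ruby sketch over the a | b table above, where each aggregate is one explicit answer to "which b?". The helper name and the symbol-based choice of aggregate are hypothetical.

```ruby
# The two-column table from the example above: [a, b]
ROWS = [[1, 1], [1, 2], [2, 3]]

# GROUP BY a forces a choice for b; each aggregate below is one explicit choice
def aggregate_b(rows, fn)
  rows.group_by(&:first).transform_values do |group|
    bs = group.map(&:last)
    case fn
    when :min then bs.min
    when :max then bs.max
    when :sum then bs.sum
    when :avg then bs.sum.fdiv(bs.size)
    when :array_agg then bs
    else raise ArgumentError, "unknown aggregate #{fn}"
    end
  end
end
```

For a = 1 the answers differ (min 1, max 2, sum 3, avg 1.5, array_agg [1, 2]), which is exactly why the database refuses to pick one silently.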
In your use case though, I think you don't want a group by on db level. As there are only 10 arts, you can group them in your application. Don't use this method with thousands of arts though:
arts = Art.all(:order => "created_at desc", :limit => 10)
grouped_arts = arts.group_by {|art| art.user_id}
# now you have a hash with following structure in grouped_arts
# {
# user_id1 => [art1, art4],
# user_id2 => [art3],
# user_id3 => [art5],
# ....
# }
EDIT: Select the latest arts, but only one art per user
Just to give you the idea of the SQL (I have not tested it, as I don't have an RDBMS installed on my system):
SELECT arts.* FROM arts
WHERE (arts.user_id, arts.created_at) IN
(SELECT user_id, MAX(created_at) FROM arts
GROUP BY user_id
ORDER BY MAX(created_at) DESC
LIMIT 10)
ORDER BY created_at DESC
LIMIT 10
This solution is based on the practical assumption that no two arts for the same user have the same highest created_at, but that may well be wrong if you are importing or programmatically creating arts in bulk. If the assumption doesn't hold, the SQL might get more contrived.
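A plain-Ruby emulation of the outer part of that query (keep each user's row whose created_at equals the user's maximum, then order by created_at descending and take the first n) also shows the tie caveat. The rows and helper name are hypothetical, and integers stand in for timestamps.

```ruby
# Hypothetical Art rows: [id, user_id, created_at as an integer timestamp]
ARTS = [
  [1, :alice, 100], [2, :bob, 105], [3, :alice, 110],
  [4, :carol, 90],  [5, :bob, 120], [6, :carol, 95]
]

# Emulates: WHERE (user_id, created_at) IN (SELECT user_id, MAX(created_at) ... GROUP BY user_id)
#           ORDER BY created_at DESC LIMIT n
def latest_art_per_user(arts, limit)
  latest = arts.group_by { |_id, user, _t| user }
               .transform_values { |rows| rows.map(&:last).max }
  arts.select { |_id, user, t| latest[user] == t }
      .sort_by { |_id, _user, t| -t }
      .first(limit)
      .map(&:first)
end
```

latest_art_per_user(ARTS, 10) gives [5, 3, 6]. If another alice row with created_at 110 is added, both alice rows pass the filter, which is the tie case described above.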
EDIT: Attempt to change the query to Arel:
Art.where("(arts.user_id, arts.created_at) IN
(SELECT user_id, MAX(created_at) FROM arts
GROUP BY user_id
ORDER BY MAX(created_at) DESC
LIMIT 10)").
order("created_at DESC").
page(params[:page]).
per(params[:per])
You need to select the specific columns you need
Art.select(:user_id).group(:user_id).limit(10)
It will raise error when you try to select title in the query, for example
Art.select(:user_id, :title).group(:user_id).limit(10)
column "arts.title" must appear in the GROUP BY clause or be used in an aggregate function
That is because when you try to group by user_id, the query has no idea how to handle the title in the group, because the group contains several titles.
So, as the exception message says, the column must either appear in the GROUP BY clause:
Art.select(:user_id, :title).group(:user_id, :title).limit(10)
or be used in an aggregate function
Art.select("user_id, array_agg(title) as titles").group(:user_id).limit(10)
Take a look at this post SQLite to Postgres (Heroku) GROUP BY
Postgres is actually following the SQL standard here, whilst SQLite and MySQL break from the standard.
Have a look at this question: Converting MySQL select to PostgreSQL. Postgres won't allow a column to be listed in the select statement that isn't in the GROUP BY clause.

Resources