Using multiple column names in where with RoR and ActiveRecord - ruby-on-rails

I want to produce the following SQL using ActiveRecord.
WHERE (column_name1, column_name2) IN (SELECT ....)
I don't know how to do this in ActiveRecord.
I've tried these so far
where('column_name1, column_name2' => {})
where([:column_name1, :column_name2] => {})
This is the full query I'd like to create
SELECT a, Count(1)
FROM table
WHERE ( a, b ) IN (SELECT a,
Max(b)
FROM table
GROUP BY a)
GROUP BY a
HAVING Count(1) > 1
I've already written a scope for the subquery
Thanks in advance.

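WHERE (column_name1, column_name2) IN (SELECT ....) is a row-value comparison. Some databases (PostgreSQL and MySQL, for example) accept it, but others (such as SQL Server) do not, and ActiveRecord's hash conditions cannot express it.
A looser form that works everywhere is:
WHERE column_name1 IN (select ....) OR column_name2 IN (select ...)
Note that this tests each column independently, so it can match rows whose pair of values never appears together in the subquery.
The same condition can be passed to ActiveRecord as a string:
where("column_name1 IN (select ...) OR column_name2 IN (select...)")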
To avoid repeating the subquery:
selected_values = select ...
where("column_name1 IN (?) OR column_name2 IN (?)", selected_values, selected_values)
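A pair-wise test and two independent IN tests joined with OR do not select the same rows. The following is a minimal plain-Ruby sketch of that difference; the rows and the selected pairs are made-up sample data, not anything from the original question.

```ruby
require "set"

# Hypothetical rows of (column_name1, column_name2).
rows = [[1, 10], [1, 20], [2, 10], [2, 20]]

# Hypothetical subquery result: the pairs we actually want.
selected_pairs = Set[[1, 20], [2, 10]]

# Pair-wise test, like WHERE (column_name1, column_name2) IN (SELECT ...).
pairwise = rows.select { |pair| selected_pairs.include?(pair) }

# Two independent tests joined with OR: each column is checked on its own.
firsts  = selected_pairs.map(&:first).to_set
seconds = selected_pairs.map(&:last).to_set
independent = rows.select { |c1, c2| firsts.include?(c1) || seconds.include?(c2) }

puts pairwise.inspect     # [[1, 20], [2, 10]]
puts independent.inspect  # all four rows
```

With this sample data the OR form returns every row, because each value appears somewhere in the subquery result even when the pair does not.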

So I decided to use an inner join to gain the same functionality. Here is my solution.
select(:column1, 'Count(1)').
joins("INNER JOIN (#{subquery.to_sql}) AS table2 ON
table1.column1=table2.column1
AND table1.column2=table2.column2")
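To check what the join on both columns computes, here is a small in-memory sketch of the full query from the question (subquery, join on both columns, then GROUP BY with HAVING). The table contents are invented for illustration.

```ruby
# Hypothetical table rows with columns a and b.
table = [
  { a: 1, b: 5 }, { a: 1, b: 9 }, { a: 1, b: 9 },
  { a: 2, b: 3 }, { a: 2, b: 7 }
]

# Subquery: SELECT a, MAX(b) FROM table GROUP BY a
max_b_per_a = table.group_by { |r| r[:a] }
                   .map { |a, rows| [a, rows.map { |r| r[:b] }.max] }

# INNER JOIN on both columns: keep rows whose (a, b) pair is in the subquery.
joined = table.select { |r| max_b_per_a.include?([r[:a], r[:b]]) }

# GROUP BY a HAVING COUNT(1) > 1
result = joined.group_by { |r| r[:a] }
               .transform_values(&:size)
               .select { |_a, count| count > 1 }

puts result.inspect  # {1=>2}
```

Only a = 1 survives, because it is the only group whose maximum b occurs more than once.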

Related

Find n most referenced records by foreign_key in related table

I have a table skills and a table programs_skills, which references skill_id as a foreign key. I want to retrieve the 10 most frequent skills in programs_skills (I need to count the occurrences of each skill_id in programs_skills and order them by descending count).
I wrote this in my skill model:
def self.most_used(limit)
Skill.find(
ActiveRecord::Base.connection.execute(
'SELECT programs_skills.skill_id, count(*) FROM programs_skills GROUP BY skill_id ORDER BY count DESC'
).to_a.first(limit).map { |record| record['skill_id'] }
)
end
This is working but I would like to find a way to perform this query in a more elegant, performant, "activerecord like" way.
Could you help me rewrite this query ?
Just replace your query with:
WITH
T AS
(
SELECT skill_id, COUNT(*) AS NB, RANK() OVER(ORDER BY COUNT(*) DESC) AS RNK
FROM programs_skills
GROUP BY skill_id
)
SELECT skill_id, NB
FROM T
WHERE RNK <= 10
This uses a CTE and a window function.
The top skill IDs can also be fetched through ActiveRecord:
ProgramsSkills.select("skill_id, COUNT(*) AS nb_skills")
.group(:skill_id).order("nb_skills DESC").limit(limit)
.map(&:skill_id)
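The two answers above differ in how they treat ties: a RANK() filter keeps every row that shares a qualifying rank, while a LIMIT cuts after a fixed number of rows. A plain-Ruby sketch of both behaviours follows; the sample skill_id values and the helper names top_by_limit and top_by_rank are hypothetical.

```ruby
# Hypothetical skill_id values from programs_skills, one per row.
skill_ids = [1, 1, 1, 2, 2, 3, 3, 4]

# COUNT(*) ... GROUP BY skill_id
counts = skill_ids.each_with_object(Hash.new(0)) { |id, h| h[id] += 1 }

# LIMIT-style: sort by count descending and cut after n rows.
# The secondary sort on id only makes this sketch deterministic;
# SQL leaves the order of tied rows unspecified.
def top_by_limit(counts, n)
  counts.sort_by { |id, c| [-c, id] }.first(n).map(&:first)
end

# RANK()-style: a row's rank is 1 + the number of rows with a strictly
# higher count, so tied rows share a rank and are all kept.
def top_by_rank(counts, n)
  counts.select { |_id, c| counts.values.count { |other| other > c } + 1 <= n }.keys
end

puts top_by_limit(counts, 2).inspect  # [1, 2]
puts top_by_rank(counts, 2).inspect   # [1, 2, 3]
```

If exactly n results are required, the LIMIT form is the one to use; if tied skills should all be reported, the rank filter is closer to that.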

How to write sub query in active record?

I have two tables, users and posts, with a has_many association between them. I want to fetch details of both users and posts in a single query. I can write the SQL, but I don't want to run a raw query in the code (using the execute method), as I think this is simple enough to be written using ActiveRecord.
Here is the sql query
SELECT a.id, a.name, a.timestamp, b.id, b.user_id, b.title
FROM users a
INNER JOIN (SELECT id, user_id, title, from, to FROM posts) b on b.user_id = a.id
where id IN ( 1, 2, 3);
I think includes does not help here because I'm dealing with a large amount of data.
Can anyone help me?
If you just want those specific columns and nothing else then this will work
User.joins(:posts)
.where(id: [1,2,3])
.select("users.id, users.name, users.timestamp,
posts.id as post_id, posts.user_id as post_user_id,
posts.title as post_title")
This will return an ActiveRecord::Relation of User objects with virtual attributes for post_id, post_user_id (Not sure why you need this one since you already selected users.id), and post_title.
The query produced will be
SELECT users.id,
users.name,
users.timestamp,
posts.id as post_id,
posts.user_id as post_user_id,
posts.title as post_title
FROM users
INNER JOIN posts on posts.user_id = users.id
where users.id IN ( 1, 2, 3);
Please note you may have multiple User objects, one for each Post, just as the SQL query does.
You can execute your exact query using the string version of joins e.g.
User.joins("INNER JOIN (SELECT id, user_id, title, from, to FROM posts) b on b.user_id = users.id")
.where(id: [1,2,3])
.select("users.id, users.name, users.timestamp,
b.id as post_id, b.user_id as post_user_id,
b.title as post_title")
Additionally to avoid some of the overhead you can use arel instead e.g.
users_table = User.arel_table
posts_table = Post.arel_table
query = users_table.project(Arel.star)
.join(posts_table)
.on(posts_table[:user_id].eq(users_table[:id]))
.where(users_table[:id].in([1,2,3]))
ActiveRecord::Base.connection.exec_query(query.to_sql)
This will return an ActiveRecord::Result with two useful methods: columns (the columns selected) and rows. You can convert this to an array of hashes (#to_hash), but note that any columns with duplicate names (id, for instance) will overwrite one another.
You could fix this by specifying the columns you want selected in the project portion, e.g. your current query would be:
query = users_table.project(
users_table[:id],
users_table[:name],
users_table[:timestamp],
posts_table[:id].as('post_id'),
posts_table[:user_id].as('post_user_id'),
posts_table[:title].as('post_title')
).join(posts_table)
.on(posts_table[:user_id].eq(users_table[:id]))
.where(users_table[:id].in([1,2,3]))
ActiveRecord::Base.connection.exec_query(query.to_sql).to_hash
Since none of the names collide now, each row can be structured into a Hash where the keys are the column names and the values are that record's values.
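The overwriting of duplicate column names can be reproduced without a database. This sketch only mimics the effect of turning a row into a hash keyed by column name; it is not ActiveRecord::Result's actual implementation, and the column and row values are invented.

```ruby
# Hypothetical columns and one row from SELECT * over users joined to posts.
columns = %w[id name id user_id title]
rows    = [[1, "alice", 42, 1, "Hello"]]

# Keyed by column name, the second "id" (the post's) replaces the first.
collided = rows.map { |row| columns.zip(row).to_h }
puts collided.first.inspect
# {"id"=>42, "name"=>"alice", "user_id"=>1, "title"=>"Hello"}

# With aliases, every column has its own key and nothing is lost.
aliased_columns = %w[id name post_id post_user_id post_title]
aliased = rows.map { |row| aliased_columns.zip(row).to_h }
puts aliased.first.inspect
```

The user's id (1) is lost in the first hash and kept in the second, which is why the explicit projection with aliases matters.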
users = User.joins(:posts).includes(:posts).where(id: [1, 2, 3])
This will give you all the users with their posts.
You can then do whatever you want with them. For example, to access the posts of the first retrieved user:
first_user_posts = users.first.posts # makes no additional DB queries, because includes already loaded the data
We use joins to get an INNER JOIN statement in the SQL.
We use includes to load all the posts into memory.
I have two tables users and posts and they have association of
has_many. I want to fetch details of both users and posts in a single
query.
This can be done with includes:
users = User.includes(:posts).where({posts: {user_id: [1,2,3]}})
The alternatives are eager_load and preload, which you can use as your requirements dictate. For more, see https://blog.arkency.com/2013/12/rails4-preloading/

Re-write a query to avoid PG::GroupingError: ERROR: in the GROUP BY clause or be used in an aggregate function

I tried many alternatives before posting this question.
I have a query on a table A with columns: id, num, user_id.
id is PK, user_id can be duplicate.
I need one row per user_id: the row with the highest num value. For this, I came up with the SQL below, which will work in an Oracle database. I am on the Ruby on Rails platform with a Postgres database.
select stats.* from stats as A
where A.num > (
select B.num
from stats as B
where A.user_id == B.user_id
group by B.user_id
having B.num> min(B.num) )
I tried writing this query via active record method but still ran into
PG::GroupingError: ERROR: column "b.num" must appear in the GROUP BY
clause or be used in an aggregate function
Stat.where("stats.num > ( select B.nums from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
Can someone suggest an alternative way of writing this query?
The SELECT clause of your subquery in Rails doesn't match that of your example. Note that since you're performing an aggregate function min(B.num) in your HAVING clause, you'll have to also include it in your SELECT clause:
Stat.where("stats.num > ( select B.num from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
You may also need a condition to handle the case where select B.num from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) returns more than one row.
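For reference, here is what the desired result looks like when computed in memory, assuming the goal is one row per user_id with the highest num. The Stat struct and its sample rows are hypothetical stand-ins for the stats table, not the real model.

```ruby
# Hypothetical stand-in for rows of the stats table (id, num, user_id).
Stat = Struct.new(:id, :num, :user_id)

stats = [
  Stat.new(1, 10, 100),
  Stat.new(2, 30, 100),
  Stat.new(3, 5, 200),
  Stat.new(4, 8, 200)
]

# Group rows by user_id, then keep the row with the largest num in each group.
best_per_user = stats.group_by(&:user_id)
                     .map { |_user_id, rows| rows.max_by(&:num) }

best_per_user.each { |s| puts "user #{s.user_id}: id=#{s.id} num=#{s.num}" }
```

Any SQL rewrite can be checked against this shape: one row per user, carrying that user's maximum num.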

How do I get Rails ActiveRecord to generate optimized SQL?

Let's say that I have 4 models which are related in the following ways:
Schedule has foreign key to Project
Schedule has foreign key to User
Project has foreign key to Client
In my Schedule#index view I want the most optimized SQL so that I can display links to the Schedule's associated Project, Client, and User. So, I should not pull all of the columns for the Project, Client, and User; only their IDs and Name.
If I were to manually write the SQL it might look like this:
select
s.id,
s.schedule_name,
s.schedule_type,
s.project_id,
p.name project_name,
p.client_id client_id,
c.name client_name,
s.user_id,
u.login user_login,
s.created_at,
s.updated_at,
s.data_count
from
Users u inner join
Clients c inner join
Schedules s inner join
Projects p
on p.id = s.project_id
on c.id = p.client_id
on u.id = s.user_id
order by
s.created_at desc
My question is: What would the ActiveRecord code look like to get Rails 3 to generate that SQL? For example, something like:
@schedules = Schedule. # ?
I already have the associations setup in the models (i.e. has_many / belongs_to).
I think this will build (or at least help) you get what you're looking for:
Schedule.select("schedules.id, schedules.schedule_name, projects.name as project_name").joins(:user, :project=>:client).order("schedules.created_at DESC")
should yield:
SELECT schedules.id, schedules.schedule_name, projects.name as project_name FROM `schedules` INNER JOIN `users` ON `users`.`id` = `schedules`.`user_id` INNER JOIN `projects` ON `projects`.`id` = `schedules`.`project_id` INNER JOIN `clients` ON `clients`.`id` = `projects`.`client_id` ORDER BY schedules.created_at DESC
The main problem I see in your approach is that you're looking for schedule objects but basing your initial "FROM" clause on "User" and your associations given are also on Schedule, so I built this solution based on the plain assumption that you want schedules!
I also didn't include all of your selects to save some typing, but you get the idea. You will simply have to add each one qualified with its full table name.
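Qualifying every column by hand gets tedious, so the select string can be generated. The helper below, qualified_select, is a hypothetical convenience written for this illustration (it is not part of Rails); it assumes the table and column names come from trusted code, since it does no quoting or escaping.

```ruby
# Hypothetical helper: builds a SELECT list in which every column is
# qualified with its table name. A non-nil value is used as the alias.
def qualified_select(columns_by_table)
  columns_by_table.flat_map do |table, columns|
    columns.map do |column, column_alias|
      qualified = "#{table}.#{column}"
      column_alias ? "#{qualified} AS #{column_alias}" : qualified
    end
  end.join(", ")
end

select_list = qualified_select(
  schedules: { id: nil, schedule_name: nil, project_id: nil, user_id: nil },
  projects:  { name: :project_name, client_id: :client_id },
  clients:   { name: :client_name },
  users:     { login: :user_login }
)

puts select_list
```

The resulting string could then be passed to Schedule.select(select_list) in place of the hand-written list.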

Nested query in squeel

Short version: How do I write this query in squeel?
SELECT OneTable.*, my_count
FROM OneTable JOIN (
SELECT DISTINCT one_id, count(*) AS my_count
FROM AnotherTable
GROUP BY one_id
) counts
ON OneTable.id=counts.one_id
Long version: rocket_tag is a gem that adds simple tagging to models. It adds a method tagged_with. Supposing my model is User, with an id and name, I could invoke User.tagged_with ['admin','sales']. Internally it uses this squeel code:
select{count(~id).as(tags_count)}
.select("#{self.table_name}.*").
joins{tags}.
where{tags.name.in(my{tags_list})}.
group{~id}
Which generates this query:
SELECT count(users.id) AS tags_count, users.*
FROM users INNER JOIN taggings
ON taggings.taggable_id = users.id
AND taggings.taggable_type = 'User'
INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY users.id
Some RDBMSs are happy with this, but postgres complains:
ERROR: column "users.name" must appear in the GROUP BY
clause or be used in an aggregate function
I believe a more agreeable way to write the query would be:
SELECT users.*, tags_count FROM users INNER JOIN (
SELECT DISTINCT taggable_id, count(*) AS tags_count
FROM taggings INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY taggable_id
) tag_counts
ON users.id = tag_counts.taggable_id
Is there any way to express this using squeel?
I wouldn't know about Squeel, but the error you see could be fixed by upgrading PostgreSQL.
Some RDBMSs are happy with this, but postgres complains:
ERROR: column "users.name" must appear in the GROUP BY clause or be
used in an aggregate function
Starting with PostgreSQL 9.1, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
BTW, your alternative query can be simplified: the additional DISTINCT is redundant.
SELECT o.*, c.my_count
FROM onetable o
JOIN (
SELECT one_id, count(*) AS my_count
FROM anothertable
GROUP BY one_id
) c ON o.id = c.one_id
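The reason DISTINCT adds nothing here is that GROUP BY already produces one row per one_id. A short in-memory sketch of the simplified query, using invented rows for both tables:

```ruby
# Hypothetical rows for onetable and anothertable.
one_table     = [{ id: 1, name: "a" }, { id: 2, name: "b" }, { id: 3, name: "c" }]
another_table = [{ one_id: 1 }, { one_id: 1 }, { one_id: 2 }]

# SELECT one_id, count(*) AS my_count FROM anothertable GROUP BY one_id
counts = another_table.group_by { |r| r[:one_id] }
                      .map { |one_id, rows| { one_id: one_id, my_count: rows.size } }

# GROUP BY already yields one row per one_id, so removing duplicates
# (what DISTINCT would do) leaves the result unchanged.
puts counts.uniq == counts  # true

# JOIN ... ON o.id = c.one_id (inner join: id 3 has no counts row and drops out)
joined = one_table.flat_map do |o|
  counts.select { |c| c[:one_id] == o[:id] }
        .map { |c| o.merge(my_count: c[:my_count]) }
end

puts joined.inspect
```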
