Executing a SQL query with an `IN` clause from Rails code - ruby-on-rails

I know next to nothing about Rails, so please excuse my naivete about this question.
I'm trying to modify a piece of code that I got from somewhere so that it runs for a randomly selected bunch of users. Here it goes:
users = RedshiftRecord.connection.execute(<<~SQL
select distinct user_id
from tablename
order by random()
limit 1000
SQL
).to_a
sql = 'select user_id, count(*) from tablename where user_id in (?) group by user_id'
<Library>.on_replica(:something) do
Something::SomethingElse.
connection.
exec_query(sql, users.join(',')).to_h
end
This gives me the following error:
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: syntax error at or near ")"
LINE 1: ...ount(*) from tablename where user_id in (?) group by...
^
users is an array; I know this because I executed the following and it returned true:
p users.instance_of? Array
Would someone please help me execute this code? I want to execute a simple SQL query that would look like this:
select user_id, count(*) from tablename where user_id in (user1,user2,...,user1000) group by user_id

The problem here is that IN takes a list of parameters. Using a single bind, IN (?), with a comma-separated string will not magically turn it into a list of arguments. That's just not how SQL works.
What you want is:
where user_id in (?, ?, ?, ...)
Where the number of binds matches the length of the array you want to pass.
The simple but hacky way to do this is to interpolate one question mark per array element into the SQL string:
binds = Array.new(users.length, '?').join(',')
sql = <<~SQL
select user_id, count(*)
from tablename
where user_id in (#{binds})
group by user_id
SQL
<Library>.on_replica(:something) do
Something::SomethingElse.
connection.
exec_query(sql, users).to_h
end
But you would typically do this in a Rails app by creating a model and using the ActiveRecord query interface, or by using Arel to programmatically create the SQL query.
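As a rough sketch of the placeholder approach above, the helper below builds the IN list for an array of ids. It is plain Ruby and never touches a database; the helper name in_placeholders and the sample rows are made up for illustration, and the sketch assumes the rows come back as hashes keyed by column name. PostgreSQL's native placeholder syntax is $1, $2, ... rather than ?, so the helper can produce either style; which one your adapter expects is something to verify against your setup.

```ruby
# Build a comma-separated placeholder list with one placeholder per value.
# style: :question yields "?, ?, ?"; style: :numbered yields "$1, $2, $3".
def in_placeholders(count, style: :question)
  raise ArgumentError, 'IN () with no values is invalid SQL' if count.zero?

  case style
  when :question then Array.new(count, '?').join(', ')
  when :numbered then (1..count).map { |i| "$#{i}" }.join(', ')
  else raise ArgumentError, "unknown style: #{style.inspect}"
  end
end

# Assumption: each row is a hash keyed by column name (sample data, not real output).
rows = [{ 'user_id' => 11 }, { 'user_id' => 42 }, { 'user_id' => 7 }]
user_ids = rows.map { |row| row['user_id'] }

sql = <<~SQL
  select user_id, count(*)
  from tablename
  where user_id in (#{in_placeholders(user_ids.length, style: :numbered)})
  group by user_id
SQL

puts sql
```

Also worth checking: in recent Rails versions exec_query takes a log name as its second positional argument and the binds as its third, so the values may need to be passed accordingly.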

Related

How to pass rails array mysql Where IN condition

I have on array like below,
skills = ['ruby','Ruby on Rails'];
I am trying to pass array in mysql where condition like below
questions = MysqlConnection.connection.select_all("
SELECT questions.*,quest_answers.* FROM `questions`
INNER JOIN `quest_answers` ON `quest_answers`.`question_id` =
`questions`.`id` where questions.category IN (#{skills.join(', ')})")
But it did not work. How can I pass an array to a WHERE ... IN condition?
Error I am getting
Mysql2::Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'on rails, Ruby)' at line 1: SELECT questions.*,quest_answers.* FROM `questions` INNER JOIN `quest_answers` ON `quest_answers`.`question_id` = `questions`.`id` where questions.category IN (Ruby on rails, Ruby)
You are interpolating the array values into the query without quotes, so MySQL sees bare words instead of string literals. Each value needs to be wrapped in quotes before joining:
skills.map { |s| "'#{s}'" }.join(', ')
This produces 'ruby', 'Ruby on Rails', which is a valid argument for the IN clause. Note that it does not escape quotes inside the values, so only use it with trusted input.
A better approach however is to not write the raw SQL at all, but rely on ActiveRecord to generate it. This is the more maintainable and readable approach.
Question.joins(:quest_answers).where(category: skills)
Passing an array to where converts it automatically into a subset condition.
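To make the failure concrete, here is a small plain-Ruby sketch comparing the strings involved. The naive_quote helper is hypothetical and only illustrates the standard SQL rule of doubling an embedded single quote; in a real app the adapter's own quoting or bind parameters should do this job, since they also cover database-specific rules.

```ruby
skills = ['ruby', 'Ruby on Rails']

# What the question interpolated: bare words, which MySQL cannot parse as strings.
unquoted = skills.join(', ')

# What the answer produces: each value wrapped in single quotes.
quoted = skills.map { |s| "'#{s}'" }.join(', ')

# Hypothetical helper: doubles embedded single quotes so the literal stays closed.
# Illustration only; it ignores database-specific rules such as backslash escapes.
def naive_quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

tricky = ["O'Reilly", 'ruby']
safe_list = tricky.map { |s| naive_quote(s) }.join(', ')

puts unquoted   # ruby, Ruby on Rails
puts quoted     # 'ruby', 'Ruby on Rails'
puts safe_list  # 'O''Reilly', 'ruby'
```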

Properly format an ActiveRecord query with a subquery in Postgres

I have a working SQL query for Postgres v10.
SELECT *
FROM
(
SELECT DISTINCT ON (title) products.title, products.*
FROM "products"
) subquery
WHERE subquery.active = TRUE AND subquery.product_type_id = 1
ORDER BY created_at DESC
The goal of the query is to select distinct rows based on the title column, then filter and order them. (I used the subquery in the first place because there seemed to be no way to combine DISTINCT ON with ORDER BY without one.)
I am trying to express said query in ActiveRecord.
I have been doing
Product.select("*")
.from(Product.select("DISTINCT ON (product.title) product.title, meals.*"))
.where("subquery.active IS true")
.where("subquery.meal_type_id = ?", 1)
.order("created_at DESC")
and, that works! But, it's fairly messy with the string where clauses in there. Is there a better way to express this query with ActiveRecord/Arel, or am I just running into the limits of what ActiveRecord can express?
I think the resulting ActiveRecord call can be improved.
But I would start by improving the original SQL query first.
Subquery
SELECT DISTINCT ON (title) products.title, products.* FROM products
(I assume that meals should be products here) selects products.title twice, which is unnecessary. Worse, it lacks an ORDER BY clause. As the PostgreSQL documentation says:
Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
I would rewrite sub-query as:
SELECT DISTINCT ON (title) * FROM products ORDER BY title ASC
which gives us a call:
Product.select('DISTINCT ON (title) *').order(title: :asc)
In the main query, the where calls rely on the alias that Rails generates for the subquery. I would not rely on Rails' internal convention for aliasing subqueries, as it may change at any time. If that does not concern you, you can merge these conditions into a single where call using hash-style arguments.
The final result:
Product.select('*')
.from(Product.select('DISTINCT ON (title) *').order(title: :asc))
.where(subquery: { active: true, meal_type_id: 1 })
.order('created_at DESC')
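For intuition, the two-step shape of this query (pick one row per title, then filter and sort) can be mimicked on plain Ruby hashes. This is only a sketch with made-up data. The extra tie-breaker on id is an assumption added here so that the chosen row per title is fully deterministic; ordering by title alone would still leave the pick within a title open.

```ruby
# Sample rows; the column names mirror the original SQL, the data is made up.
products = [
  { id: 1, title: 'Soup',  active: true,  product_type_id: 1, created_at: 3 },
  { id: 2, title: 'Soup',  active: false, product_type_id: 1, created_at: 5 },
  { id: 3, title: 'Salad', active: true,  product_type_id: 1, created_at: 4 },
  { id: 4, title: 'Bread', active: true,  product_type_id: 2, created_at: 1 }
]

# Inner query: DISTINCT ON (title) ... ORDER BY title, id.
# Sorting on [title, id] fixes which row comes first for each title;
# uniq with a block keeps the first element seen for each key.
distinct_on_title = products
  .sort_by { |row| [row[:title], row[:id]] }
  .uniq { |row| row[:title] }

# Outer query: WHERE active AND product_type_id = 1 ORDER BY created_at DESC.
result = distinct_on_title
  .select { |row| row[:active] && row[:product_type_id] == 1 }
  .sort_by { |row| -row[:created_at] }

p result.map { |row| row[:id] } # => [3, 1]
```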

Rails: Why does where(id: objects) work?

I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? I will not use the Rails code above and will use one of the following instead:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.
The AREL querying library allows you to pass in ActiveRecord objects as a short-cut. It'll then pass their primary key attributes into the SQL it uses to contact the database.
When looking for multiple objects, the AREL library will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions, until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).all
# ^^^ get all these users from the database
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a call over a network port between the application and the database, with the data then passed back across that same port.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an AREL query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, it's designed to improve your application's performance without you having to know about how it works.
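The deferred-execution idea can be sketched with a toy class. This is not how ActiveRecord or Arel is implemented; ToyQuery, where_in and the execution counter are invented names that only show a query being composed as data and sent once.

```ruby
# Toy stand-in for a lazy query object; not ActiveRecord's implementation.
class ToyQuery
  attr_reader :sql

  # Counts how often any ToyQuery was "sent to the database".
  @executions = 0
  class << self
    attr_accessor :executions
  end

  def initialize(sql)
    @sql = sql
  end

  # Nest another ToyQuery as an IN (subquery) condition without running it.
  def where_in(column, subquery)
    ToyQuery.new("#{sql} WHERE #{column} IN (#{subquery.sql})")
  end

  # Only at this point would a real library talk to the database.
  def to_a
    self.class.executions += 1
    []
  end
end

users = ToyQuery.new('SELECT id FROM users WHERE age = 30')
memberships = ToyQuery.new('SELECT * FROM memberships').where_in('user_id', users)

memberships.to_a
puts memberships.sql
puts ToyQuery.executions # 1: the users query was never run on its own
```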
There's also some cool optimisations, like if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or if you set an order, and call last, it will reverse the order then only grab the last one in the list (instead of grabbing all the records and only giving you the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC
Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
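A toy version of that dispatch might look like the following. It is a sketch of the idea only, not the query builder's real code: ToyRelation and toy_condition are hypothetical names, and values are assumed to be integers so that quoting can be left out.

```ruby
# Hypothetical stand-in for a query object that knows its own SQL.
ToyRelation = Struct.new(:sql)

# Turn one column/value pair into a SQL condition, choosing the operator by value type.
# Assumption: scalar and array values are integers, so no quoting is needed.
def toy_condition(column, value)
  case value
  when ToyRelation then "#{column} IN (#{value.sql})"
  when Array       then "#{column} IN (#{value.join(', ')})"
  when nil         then "#{column} IS NULL"
  else                  "#{column} = #{value}"
  end
end

puts toy_condition('orders_count', [1, 3, 5])
# orders_count IN (1, 3, 5)
puts toy_condition('city_id', ToyRelation.new('SELECT cities.id FROM cities'))
# city_id IN (SELECT cities.id FROM cities)
puts toy_condition('orders_count', 1)
# orders_count = 1
```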

Solving a PG::GroupingError: ERROR

The following code gets all the residences that have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: {amenity_id: id_list}).
references(:listed_amenities).
group(:residence_id).
having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: { amenity_id: id_list }).
group('residences.id').
having('COUNT(*) = ?', id_list.size)
The query that the Ruby code expands to selects all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual, When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column.
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
);
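The WHERE / GROUP BY / HAVING logic of the inner query can be checked on plain Ruby data. This is a sketch with made-up rows, and it assumes each residence lists a given amenity at most once; duplicates would inflate the count just as they would in SQL.

```ruby
# Made-up rows: one hash per listed_amenities record.
listed_amenities = [
  { residence_id: 1, amenity_id: 48 },
  { residence_id: 1, amenity_id: 49 },
  { residence_id: 2, amenity_id: 48 },
  { residence_id: 3, amenity_id: 49 },
  { residence_id: 3, amenity_id: 50 }
]
id_list = [48, 49]

# WHERE amenity_id IN (...), GROUP BY residence_id, HAVING count(*) = id_list.size
matching_ids = listed_amenities
  .select { |row| id_list.include?(row[:amenity_id]) }
  .group_by { |row| row[:residence_id] }
  .select { |_residence_id, rows| rows.size == id_list.size }
  .keys

p matching_ids # => [1]
```

Only residence 1 has both amenities; residences 2 and 3 each match one of the two and are dropped by the count check.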

Rails 3 LIKE query raises exception when using a double colon and a dot

In rails 3.0.0, the following query works fine:
Author.where("name LIKE :input",{:input => "#{params[:q]}%"}).includes(:books).order('created_at')
However, when I enter the following search string (containing a colon followed by a dot):
aa:.bb
I get the following exception:
ActiveRecord::StatementInvalid: SQLite3::SQLException: ambiguous column name: created_at
These are the SQL queries in the logs:
with aa as input:
Author Load (0.4ms) SELECT "authors".* FROM "authors" WHERE (name LIKE 'aa%') ORDER BY created_at
Book Load (2.5ms) SELECT "books".* FROM "books" WHERE ("books".author_id IN (1,2,3)) ORDER BY id
with aa:.bb as input:
SELECT DISTINCT "authors".id FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" WHERE (name LIKE 'aa:.bb%') ORDER BY created_at DESC LIMIT 12 OFFSET 0
SQLite3::SQLException: ambiguous column name: created_at
It seems that with the aa:.bb input, an extra query is made to fetch the distinct author id_s.
I thought Rails would escape all the characters. Is this expected behaviour or a bug?
Best Regards,
Pieter
The "ambiguous column" error usually happens when you use includes or joins and don't specify which table you're referring to:
"name LIKE :input"
Should be:
"authors.name LIKE :input"
Just "name" is ambiguous if your books table has a name column too.
Also: have a look at your development.log to see what the generated query looks like. This will show you if it's being escaped properly.
Replace
.includes(:books)
with
.preload(:books)
This should force activerecord to use 2 queries instead of the join.
Rails has two implementations of includes: one that builds a single big query with joins (the second of your two queries, and the one more likely to produce ambiguous column references), and one that avoids the joins in favour of a separate query per association.
Rails decides which strategy to use based on whether it thinks your conditions, order and so on refer to the included tables (since in that case the joins version is required). When a condition is a string fragment, that heuristic isn't very sophisticated: I seem to recall that it just scans the conditions for anything that looks like a column from another table (i.e. foo.bar), so a literal of that form can fool it.
You can either qualify your column names so that it doesn't matter which includes strategy is used or you can use preload/eager_load instead of includes. These behave similarly to includes but force a specific include strategy rather than trying to guess which is most appropriate.
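To see how a search string could trip a scan of this kind, here is a toy version. The regular expression is an assumption made for illustration (an identifier, optionally one arbitrary character, then a dot), not a quote from the Rails source, and referenced_tables is a hypothetical name.

```ruby
# Assumed pattern: an identifier, optionally one arbitrary character, then a dot.
# It is a guess at the kind of scan described above, not code taken from Rails.
TABLE_REFERENCE = /([a-zA-Z_][.\w]+).?\./

# Return the names that "look like" table references in a SQL fragment.
def referenced_tables(fragment)
  fragment.scan(TABLE_REFERENCE).flatten.map(&:downcase).uniq
end

p referenced_tables("name LIKE 'aa%'")         # => []
p referenced_tables("name LIKE 'aa:.bb%'")     # => ["aa"]
p referenced_tables("authors.name LIKE 'aa%'") # => ["authors"]
```

Under this assumed pattern, the literal aa:.bb is mistaken for a reference to a table named aa, which would be enough to push a heuristic like this towards the joins strategy.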
