Rails + Postgres: How to select count of how many records are updated or inserted? - ruby-on-rails

So I have an update statement:
UPDATE billing_infos set card_number = ''
FROM orders
WHERE billing_infos.order_id = orders.id ...);
How would I find the count of how many records are updated by this statement?
I'm doing this in my console through ActiveRecord::Base.connection.execute() so it's just returning a <PG::Result:0x007f9c99ef0370> object.
Anyone know how I could do this using SQL or a Rails method?

p = ActiveRecord::Base.connection.execute(<query>)
p.cmd_status
This gives the command status. Something like
UPDATE 16
For more methods on PG::Result, refer to the pg gem documentation.
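If only the status string is available, the count can be parsed out of it. This is a minimal sketch, assuming the usual PostgreSQL command tag shapes ("UPDATE 16", "DELETE 3", "INSERT 0 1"). The name affected_rows is a hypothetical helper, not part of the pg gem.

```ruby
# Hypothetical helper (not part of the pg gem): extracts the row count
# from a PostgreSQL command status string such as "UPDATE 16".
def affected_rows(cmd_status)
  # For INSERT/UPDATE/DELETE the count is the last whitespace-separated token.
  # Tags without a numeric tail (e.g. "CREATE TABLE") yield nil.
  last = cmd_status.to_s.split.last
  last && last.match?(/\A\d+\z/) ? last.to_i : nil
end

puts affected_rows("UPDATE 16")
puts affected_rows("INSERT 0 1")
```

In practice the cmd_tuples method on the result avoids the parsing altogether; the helper only matters when the status string is all you have, for example in a log line.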

While the solution shown by Vimsha will definitely work, there is another solution (assuming a recent enough PostgreSQL; data-modifying CTEs arrived in 9.1), which could be a bit nicer:
with u as (
update ... returning 1
)
select count(*) from u;
That's one query, and it's technically a select, so you run it as any other select.

As mentioned in a comment of another answer, the easiest way is to use the cmd_tuples attribute of the result
result = ActiveRecord::Base.connection.execute("insert into tbl select 'test' col")
puts result.cmd_tuples
result
1

Related

How do duplicate/clone ActiveRecord::Relation in optimized way

I want to clone a list of ActiveRecord objects in an optimized way. Maybe the way that I am using is optimized already but I need to speed up this process. So this is my code
ActiveRecord::Base.transaction do
data = MyModel.where(column_1: 'x')
data.each do |item|
new_item = item.dup
new_item.column_2 = 'y'
new_item.save!
end
end
Maybe there is a better way of duplicating a list of records at once and then updating all of them with one query. I tried to Google it but no luck so far.
If I understood right, you want to duplicate objects that meet condition
(column_1: 'x')
You can try this approach, looks like it will do the same
MyModel.where(column_1: 'x').find_each { |u| u.dup.update(column_2: 'y') }
But it is a little slower (benchmarked with n = 1000)
<Benchmark::Tms:0x00007f96e6a1b970 @label="dup.update", @real=6.75463099999979, @cstime=0.0, @cutime=0.0, @stime=0.331546, @utime=2.2468139999999996, @total=2.5783599999999995>,
<Benchmark::Tms:0x00007f96e8cb23f8 @label="dup.save!", @real=6.470054999999775, @cstime=0.0, @cutime=0.0, @stime=0.32828900000000005, @utime=1.972385, @total=2.300674>
The short answer is to use a single INSERT statement instead of multiple inserts, even if they all run in one DB transaction.
The bulk_insert gem will help you do it. (Thanks to arieljuod.)
If you want to do this as quickly as possible then SQL is the way to go.
Something like this would do it:
ActiveRecord::Base.connection.execute <<-SQL
INSERT INTO my_models (column_1, column_2, column_3, ..)
SELECT column_1, 'y', column_3, ..
FROM my_models WHERE column_1 = 'x';
SQL
This should work in PostgreSQL and MySQL.
If you have any rails magic columns (created_at, updated_at etc) they will have to be added manually in the SQL.
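To illustrate the point about the timestamp columns, here is a rough sketch of a string builder for such a statement. copy_rows_sql is a hypothetical helper, and the use of NOW() for created_at and updated_at is an assumption. It interpolates identifiers and values as-is, so it is only suitable for trusted, hard-coded input.

```ruby
# Hypothetical helper: builds an INSERT ... SELECT that copies rows and
# overrides some columns. Nothing is escaped here, so use it only with
# trusted, hard-coded identifiers and SQL fragments.
def copy_rows_sql(table:, columns:, overrides:, where:)
  # For each target column, select the override expression if one is given,
  # otherwise copy the column from the source row.
  select_list = columns.map { |col| overrides.fetch(col, col) }
  <<~SQL
    INSERT INTO #{table} (#{columns.join(', ')})
    SELECT #{select_list.join(', ')}
    FROM #{table} WHERE #{where};
  SQL
end

sql = copy_rows_sql(
  table: "my_models",
  columns: %w[column_1 column_2 created_at updated_at],
  overrides: { "column_2" => "'y'", "created_at" => "NOW()", "updated_at" => "NOW()" },
  where: "column_1 = 'x'"
)
puts sql
```

The resulting string can be passed to ActiveRecord::Base.connection.execute like the heredoc above.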

Is there any way to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id
As someone else posted, definitely look into adding indexes if needed. How to refactor depends partly on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest fetching only the attributes you actually use; .pluck could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, I'm not sure what the queries are being used for, but it is often useful to write raw SQL first and run it directly against the db. Then you can apply your findings to rewriting the ActiveRecord queries.
What is the .completed tacked on the end? Is it supposed to be there? The only thing close that I found in the Rails API is .completed? If it is a custom method, definitely look into it. You potentially also have a use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
left outer join quizzes q on q.enterprise_member_id = em.id
  and q.section_id in (?)
join users u on u.id = em.user_id
where q.enterprise_member_id is null
Based on my understanding this will allow postgres to combine the enterprise_members and quizzes tables with a hash join (or a merge join after sorting both). This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
Thanks everyone for your ideas. I basically did what everyone said: I added indexes and reordered how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
@quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
@invited = list.members.where('user_id is null').count # 12.5ms
@not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
@in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
@completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
@question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms
end
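The array arithmetic above can be tried on its own. This is a small sketch with made-up ids standing in for the pluck results; the local variable names mirror the instance variables, and started and completed_ids are names introduced only for this illustration.

```ruby
# Made-up ids standing in for the pluck results.
alt_members     = [1, 2, 3, 4, 5] # all member ids on the list
quiz_member_ids = [2, 3, 3, 5]    # one entry per quiz, so duplicates occur
completed_ids   = [3, 5]          # member ids of completed quizzes

started     = alt_members & quiz_member_ids # intersection, deduplicated
not_started = alt_members - started         # members with no quiz at all
completes   = alt_members & completed_ids

puts started.count
puts not_started.count
puts completes.count
```

Array#& removes duplicates, which is why the repeated 3 in quiz_member_ids does not inflate the counts.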

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
@user = User.find(params[:id])
@commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
@missions_commits = @commits.group("mission_id").count.length
end
UPDATE:
So I dug further into this PostgreSQL-specific annoyance, and I am surprised that this exception is not mentioned in the Ruby on Rails Guide.
I am using psql (PostgreSQL) 9.1.11
So from what I understand, I need to specify which column should be used whenever I use the GROUP BY clause. I thought using SELECT would help, which can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyway, when I look at the error, the cursor always points to updated_at. In the SQL query, Rails will always ORDER BY updated_at. So I have tried this horrible query:
@commits.group("mission_id, date(updated_at)")
.select("date(updated_at), count(mission_id)")
.having("count(mission_id) > 0")
.order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what it will ORDER BY updated_at, even if I wanted to order by something else.
Also I don't want to group the records by updated_at just by mission_id.
This PostgreSQL error is just misleading and has little explanation to solving it. I have tried many formulas from the stackoverflow sidebar, nothing works and always the same error.
UPDATE 2:
So I got it to work, but it needs to group the updated_at because of the automatic ORDER BY updated_at. How do I count only by mission_id?
@missions_commits = @commits.group("mission_id, updated_at").count("mission_id").size
I guess you want to show the total number of distinct Missions related to the Commits; either way, it won't be the number on the page.
Try this:
@commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
@missions_commits = @user.commits.distinct.count(:mission_id)
However, if you want the number of distinct Missions on the page, I suppose it should be:
@missions_commits = @commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist, so the distinct count should be requested this way:
@missions_commits = @user.commits.count(:mission_id, distinct: true)
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike SQLite (and MySQL), Postgres requires that any column selected or ordered on either appears in the GROUP BY clause or is used in an aggregate function.
If you think it through, you'll see that this actually makes sense. SQLite/MySQL cheat under the hood and silently pick an arbitrary value from each group for those fields.
Or thinking about it another way if you are grouping by a field, what's the point of ordering it? How would that even make sense unless you also had an aggregate function on the ordered field?
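The same reasoning can be shown in plain Ruby, without a database. This is only an illustration with made-up data; Commit here is a throwaway Struct, not the ActiveRecord model from the question.

```ruby
# Throwaway stand-in for the commits table (not the real model).
Commit = Struct.new(:mission_id, :updated_at)

commits = [
  Commit.new(1, "2014-01-01"),
  Commit.new(1, "2014-03-01"),
  Commit.new(2, "2014-04-01")
]

groups = commits.group_by(&:mission_id)

# A group can hold several updated_at values, so "ORDER BY updated_at"
# has no single value to sort that group by...
p groups[1].map(&:updated_at)

# ...until an aggregate such as MAX collapses them to one value per group.
latest  = groups.map { |mission_id, rows| [mission_id, rows.map(&:updated_at).max] }
ordered = latest.sort_by { |_, max_updated| max_updated }.reverse
p ordered
```

In SQL terms the last step corresponds to ordering by MAX(updated_at) instead of the bare updated_at column.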

Why doesn't Rails where clause return results with boolean condition?

I have the following squeel query:
i = Invoice.where{ paid == true }
that's the same as:
i = Invoice.where ['paid = ?', true]
and executes:
SELECT "invoices".* FROM "invoices" WHERE "invoices"."paid" = 't'
However, this query doesn't return any invoices at all. It doesn't work if I execute the query from my sqlite program either, so it seems the query itself is wrong. I'm absolutely sure there are invoices in the sqlite db with both 't' and 'f' as value. How do I get this right?
SQLite does not have a separate Boolean storage class. Instead, Boolean values are stored as integers 0 (false) and 1 (true). Try this:
i = Invoice.where{ paid == 1 }
Also see: http://www.sqlite.org/datatype3.html
UPDATE:
I found a great explanation for your dilemma right here at SO. See Rails 3 SQLite3 Boolean false.
Good luck!
I believe the correct syntax should be:
i = Invoice.where(paid: true) # corrected from using braces to parenthesis

Complex order statement with rails AREL: SQL Case statement

I've got this bit of code that basically tries to use a SQL case statement in the active relation order method:
relation = Foo.order("CASE WHEN foos.thing IS NOT NULL THEN 0 ELSE 1 END ASC")
and in the generated (and executed) SQL it comes up as:
(ORDER BY CASE ASC)
I've tried digging down into the source and lose the thread down in the visitor.access call. Is this a known issue? Is it user error? Is there some magical thing I have to do to make it happen? I was under the impression that it just inserted the raw SQL. There are other things we're doing with the relation, such as select, limit, offset, group, having and joins.
help! :)
I've had this problem too:
You can put the CASE into the SELECT and name it so you can use it in the ORDER BY.
relation = Foo.select("*, CASE WHEN foos.thing IS NOT NULL THEN 0 ELSE 1 END AS foo_order").order("foo_order ASC")
