Rails query conditions for hour of creation - ruby-on-rails

In an attempt to summarise traffic data based on a time span, one cannot search by invoking a component of a datetime object like this:
txat0 = Transaction.where(['shop_id = ? AND created_at.hour = ?', shop, 0]).count
One could go via the SQL route (e.g. PostgreSQL)
select shop_id, extract(hour from created_at) from transactions
and filter from there.
But what is a succinct way of achieving this with Ruby or Rails (performance is not a concern for this query)?

I believe you could mix the two and run the SQL part inside an ActiveRecord query.
What about:
Transaction.where("DATE_PART('hour', created_at) = ?", 0)
PS: I've ignored the shop_id clause in the above example, but you can simply chain it on afterwards.
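For instance, a minimal sketch with the shop_id condition chained back on (assuming shop holds the id you're filtering by, as in the question):
txat0 = Transaction.where(shop_id: shop).where("DATE_PART('hour', created_at) = ?", 0).count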

Related

Rails update_all from associated_object

I have a Glass object and a Prescription object, but I forgot to add timestamps to the Glass object, so I created a migration to do that. However, not surprisingly, all the objects have today's date and time.
glass belongs_to :prescription
prescription has_one :glass
However, I can get the correct timestamp from the Prescription object. I just don't know how to do that. So I want to do something like
Glass.update_all(:created_at => self.prescription.created_at)
Any ideas?
The easiest thing to do is simply run multiple SQL queries; it's a one-off migration, so no biggie I think. ActiveRecord's update_all is meant to update all matching records with the same value, so that won't work here.
Glass.find_each do |glass|
  glass.update!(created_at: glass.prescription.created_at)
end
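A side note: update! runs validations and callbacks and will also bump updated_at; for a one-off backfill like this you may prefer update_columns, which writes the given columns directly and skips both:
glass.update_columns(created_at: glass.prescription.created_at)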
If you want one query (an update based on a join, called "update from" in SQL terms), it seems it isn't straightforward in ActiveRecord (it should work on MySQL but not on Postgres, see https://github.com/rails/rails/issues/13496), so it will be easier to write raw SQL. This can help you get started: https://www.dofactory.com/sql/update-join
You can use the touch method:
Prescription.find_each do |prescription|
  prescription.glass.touch(:created_at, time: prescription.created_at)
end
Believe me when I say that I'm on team "idiomatic Rails", and it's true that iterating through each record and updating it is probably more idiomatic, but UPDATE FROM ... is so much more performant and efficient (resources-wise) that unless the migration is iterating through < 1000 records, I prefer to do the in-SQL UPDATE FROM.
The particular syntax for doing an update from a join will vary depending on which SQL implementation you're running (Postgres, MySQL, etc.), but in general just execute it from a Rails DB connection.
InboundMessage.connection.execute <<-SQL
  UPDATE
    inbound_messages
  INNER JOIN notifications
    ON inbound_messages.message_detail_type = "Notification"
    AND inbound_messages.message_detail_id = notifications.id
  SET
    inbound_messages.message_detail_type = notifications.notifiable_type,
    inbound_messages.message_detail_id = notifications.notifiable_id
  WHERE
    notifications.type = "foo_bar"
SQL
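For reference, here is a rough sketch of the same statement in Postgres syntax, where an update from a join is written as UPDATE ... FROM (same table and column names assumed as above; not tested against the original schema):
InboundMessage.connection.execute <<-SQL
  UPDATE inbound_messages
  SET
    message_detail_type = notifications.notifiable_type,
    message_detail_id = notifications.notifiable_id
  FROM notifications
  WHERE inbound_messages.message_detail_type = 'Notification'
    AND inbound_messages.message_detail_id = notifications.id
    AND notifications.type = 'foo_bar'
SQL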

Properly format an ActiveRecord query with a subquery in Postgres

I have a working SQL query for Postgres v10.
SELECT *
FROM (
  SELECT DISTINCT ON (title) products.title, products.*
  FROM "products"
) subquery
WHERE subquery.active = TRUE AND subquery.product_type_id = 1
ORDER BY created_at DESC
The goal of the query is to do a distinct based on the title column, then filter and order the results. (I used the subquery in the first place, as it seemed there was no way to combine DISTINCT ON with ORDER BY without a subquery.)
I am trying to express said query in ActiveRecord.
I have been doing
Product.select("*")
.from(Product.select("DISTINCT ON (product.title) product.title, meals.*"))
.where("subquery.active IS true")
.where("subquery.meal_type_id = ?", 1)
.order("created_at DESC")
and that works! But it's fairly messy with the string where clauses in there. Is there a better way to express this query with ActiveRecord/Arel, or am I just running into the limits of what ActiveRecord can express?
I think the resulting ActiveRecord call can be improved.
But I would start improving with original SQL query first.
The subquery
SELECT DISTINCT ON (title) products.title, products.* FROM products
(I think that instead of meals there should be products?) has a duplicate products.title, which is not necessary there. Worse, it is missing an ORDER BY clause. As the PostgreSQL documentation says:
Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
I would rewrite sub-query as:
SELECT DISTINCT ON (title) * FROM products ORDER BY title ASC
which gives us a call:
Product.select('DISTINCT ON (title) *').order(title: :asc)
In the main query, the where calls use the Rails-generated alias for the subquery. I would not rely on Rails' internal convention for aliasing subqueries, as it may change at any time. If that doesn't worry you, you can merge these conditions into one where call with hash-style argument syntax.
The final result:
Product.select('*')
.from(Product.select('DISTINCT ON (title) *').order(title: :asc))
.where(subquery: { active: true, meal_type_id: 1 })
.order('created_at DESC')
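If you'd rather not depend on the auto-generated alias at all, from also accepts an explicit alias as a second argument; a sketch along the same lines (column names as above):
Product.select('*')
       .from(Product.select('DISTINCT ON (title) *').order(title: :asc), :subquery)
       .where(subquery: { active: true, meal_type_id: 1 })
       .order('created_at DESC')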

How to filter rails model by month

I want to get only those records of my model which belong to a particular month. I am using:
Order.where(created_at.strftime("%B"): "April")
where created_at is a DateTime. created_at.strftime("%B") gives me the month, but it does not work. Any alternative?
You'll probably have to do this in plain SQL (not the ActiveRecord DSL):
Order.where("strftime('%m', created_at) = ?", 'April')
This uses the SQLite function to extract the month name
(I haven't done Rails in a while, let me know if this doesn't work)
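A database-agnostic alternative is to filter on a created_at range covering the month you want; a minimal sketch (the year here is an assumption, adjust as needed):
month = Time.zone.local(2018, 4, 1)
Order.where(created_at: month.all_month)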

Is there any way to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,000 records... which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id))
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503, meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe with better join syntax? I'm curious how sites with larger datasets accomplish what seem like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema (see the migration sketch after this list):
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id
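As a starting point, a hedged migration sketch for the suggested indexes (table and column pairings are inferred from the queries above, so double-check them against your schema):
class AddAnalyticsIndexes < ActiveRecord::Migration[5.2] # adjust the superclass/version to your Rails version
  def change
    add_index :enterprise_members, :user_id
    add_index :quizzes, :enterprise_member_id
    add_index :quizzes, :section_id
  end
end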
As someone else posted, definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest fetching only the attributes you actually use for something; .pluck usage could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, I'm not sure what the queries are being used for, but I'll toss in that it is often useful to write raw SQL first and query the db directly. Then you can apply your findings to rewriting the ActiveRecord queries.
What is the .completed tagged on the end? Is it supposed to be there? The only thing I found close in the Rails API is .completed? If it is a custom method, definitely look into it. You potentially also have a use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id))
Better version (in SQL):
select blah
from enterprise_members em
left outer join quizzes q
  on q.enterprise_member_id = em.id
  and q.section_id in (?)
join users u on u.id = em.user_id
where q.enterprise_member_id is null
Based on my understanding, this will allow Postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
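On the Rails side, a hedged ActiveRecord translation of that left-join version might look like the following (association and column names are assumed from the question, so treat it as a sketch rather than a drop-in; it also assumes section_ids is non-empty and that user_id always references an existing user):
section_ids = @sections.map(&:id)
@not_started = @members
  .where.not(user_id: nil) # approximates "user_id in (select id from users)"
  .joins("LEFT OUTER JOIN quizzes ON quizzes.enterprise_member_id = enterprise_members.id " \
         "AND quizzes.section_id IN (#{section_ids.join(',')})") # ids are integers, so direct interpolation is acceptable here
  .where(quizzes: { enterprise_member_id: nil })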
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
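If you prefer to manage it from Rails, add_index supports partial indexes on Postgres via the where: option; a hedged one-liner for a migration, using the index name from the example above:
add_index :enterprise_members, :user_id, where: 'user_id IS NULL', name: 'user_id_null_ix'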
Thanks everyone for your ideas. I basically did what everyone said. I added indexes and reorganized how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck(:id) # 23ms
if list.course.sections.tests.present? && (@sections = list.course.sections.tests)
  @quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
  @invited = list.members.count('user_id is null') # 12.5ms
  @not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
  @in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
  @completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
  @question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map { |quiz| quiz.answers.count }.max # 3.5ms
end

Rails + ActiveRecord + optimization: Is there a better way to update on 300,000 records?

So I have a rake task that does this:
wine_club_memberships = WineClubMembership.pluck(:billing_info_id)
total_updated = BillingInfo.joins(:order).where(["orders.ordered_date < (CURRENT_DATE - 90) AND billing_infos.card_number IS NOT NULL AND billing_infos.card_number != '' AND billing_infos.id NOT IN (?)", wine_club_memberships]).update_all("card_number = ''")
log.error("Total records updated #{total_updated}")
The thing is that BillingInfo has 300,000+ records, and I'm wondering if all these joins, wheres, and update_alls end up being just the same as using pure SQL. Currently it's not too efficient, since I have a huge array of WineClubMembership ids that I stuff into the statement.
Is there a more efficient way of doing this? Even though this is a long, ugly statement, I was thinking it would be fairly efficient because it does everything in one or two hits to the database. However, people around me think there must be other "Rails methods" that could do this in a better way that won't affect the performance of the production website.
I did see that you can do searches in "batches", but I'm not sure if that will help.
UPDATE
I'm using Postgres 9.1+. In the old (just a little simpler) version of my ActiveRecord search, this is what came out:
Ruby code:
wine_club_memberships = WineClubMembership.pluck(:billing_info_id)
total_updated = BillingInfo.joins(:order).where(["orders.ordered_date < (CURRENT_DATE - 90) AND billing_infos.id NOT IN (?)", wine_club_memberships]).update_all("card_number = ''")
SQL generated:
SQL (127848.6ms) UPDATE "billing_infos" SET card_number = '' WHERE "billing_infos"."id" IN (SELECT "billing_infos"."id" FROM "billing_infos" INNER JOIN "orders" ON "orders"."id" = "billing_infos"."order_id" WHERE (orders.ordered_date < (CURRENT_DATE - 90) AND billing_infos.id NOT IN (423908,390663,387323,402393,383446,416114,391009,456371,384305,386681,384382,384418, ...)))
It's possible that if you have your db manage the source of the final NOT IN comparison, there will be optimizations in the db for dealing with it, i.e. let SQL manage the list of ids instead of passing in a 300,000-item-long array. If your db allows it, try something like
... NOT IN (SELECT billing_info_id FROM wine_club_memberships)").update_all("card_number = ''")
As far as a Rails-specific method for speeding this up goes, you're usually not going to do better (performance-wise, if not maintainability-wise) than passing a pure SQL string to the db.
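Putting that together, a hedged sketch of the full call with the subselect in place (table and column names are taken from the original query and not tested against the real schema):
total_updated = BillingInfo.joins(:order)
  .where("orders.ordered_date < (CURRENT_DATE - 90) " \
         "AND billing_infos.card_number IS NOT NULL AND billing_infos.card_number != '' " \
         "AND billing_infos.id NOT IN (SELECT billing_info_id FROM wine_club_memberships)")
  .update_all("card_number = ''")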
