Query that checks overlap with composed timeframe (postgres, ActiveRecord) - ruby-on-rails

I have a Schema like this:
User(:id)
Event(:id, :start_date, :end_date,:duration, :recurrence_pattern)
EventAssignment(:user_id, :event_id, :date)
recurrence_pattern is a rrule string (see https://icalendar.org/rrule-tool.html, https://github.com/square/ruby-rrule)
date is a date formatted like 'YYYY-MM-DD'
start_date and end_date are timestamps like 'YYYY-MM-DDTHH:MM:SS'
I want to find all users that have at least one assignment overlapping the period between two timestamps, say from and to.
So I wrote a bit of postgres and arel
assignment_subquery = EventAssignment.joins(:event).where(
'"user_id" = users.id AND
(?) <= "event_assignments".date + (to_char("events".start_date, \'HH24:MI\'))::time + make_interval(mins => "events".duration) AND
(event_assignments.date) + (to_char(events.start_date, \'HH24:MI\'))::time <= (?)',
from, to).arel.exists
User.where(assignment_subquery)
Edit: some more Postgres (@max):
assignment_subquery = EventAssignment.joins(:event).where(
'"user_id" = users.id AND
("event_assignments".date + (to_char("events".start_date, \'HH24:MI\'))::time + make_interval(mins => "events".duration),
(event_assignments.date) + (to_char(events.start_date, \'HH24:MI\'))::time)
OVERLAPS ((?), (?))'
, from, to).arel.exists
Which works fine
My question is:
Is there a better, more Rails-like way to do this?
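For reference, the condition both queries express can be sketched in plain Ruby. This is only an illustration under assumptions: the overlaps? helper and the sample times are hypothetical, and the occurrence is built the same way as in the SQL (assignment date plus the event's time of day, plus the duration in minutes).

```ruby
# Two closed intervals overlap when each one starts before the other ends.
# This mirrors the two <= comparisons in the first query.
def overlaps?(start_a, end_a, start_b, end_b)
  start_a <= end_b && start_b <= end_a
end

# Hypothetical occurrence: starts at 09:00 and lasts 90 minutes.
occurrence_start = Time.utc(2021, 3, 1, 9, 0)
occurrence_end   = occurrence_start + 90 * 60

from = Time.utc(2021, 3, 1, 10, 0)
to   = Time.utc(2021, 3, 1, 12, 0)

puts overlaps?(occurrence_start, occurrence_end, from, to)
```

Note that Postgres documents OVERLAPS as treating each period as half-open (start <= time < end), so the two queries can differ when one period ends exactly where the other begins.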

Related

Postgres 9.5.3 JSON fields comparing dates with operators weird behavior on Mac OSX 10

I'm having a problem on both of my Mac machines, but not on my Linux machines. Using date comparison operators like this does not work on my Mac:
# Ruby code
dt_start = DateTime.current - 10.days
dt_end = DateTime.current
id = 1
# last_seen field looks like this in db when we store it:
# {"1":"2016-11-21T22:17:47.269Z"}
User.where("(last_seen->'?')::text <> 'null'", id
).where( "(last_seen->'?')::text > ?", id, dt_start
).where( "(last_seen->'?')::text <= ?", id, dt_end)
SELECT "public"."users".* FROM "public"."users" WHERE ((last_seen->'1')::text <> 'null') AND ((last_seen->'1')::text > '2016-11-12 18:13:03.432534') AND ((last_seen->'1')::text <= '2016-11-22 18:13:03.432534')
Returns no records on my Mac, but works on Linux
Upon breaking apart that query, when I use > operator, I get no records no matter what date range I put.
User.where( "(last_seen->'?')::text > ?", id, 10.years.ago).count
SELECT COUNT(*) FROM "public"."users" WHERE ((last_seen->'1')::text > '2006-11-22 23:46:59.199255')
=> 0
When I use only the < operator, I get all records that have non-empty last_seen fields no matter what date I put.
User.where( "(last_seen->'?')::text < ?", id, 10.years.ago).count
SELECT COUNT(*) FROM "public"."users" WHERE ((last_seen->'1')::text < '2006-11-22 23:46:59.199255')
=> 42
I've even tested by switching my time on my Mac to match my linux box timezone which is UTC. Any ideas?
UPDATE:
So DateTime and ActiveSupport::TimeWithZone formatted to ISO 8601 return different formats:
DateTime.current.iso8601 # => "2016-11-23T19:18:36+00:00"
Time.zone.now.iso8601 # => "2016-11-23T19:18:44Z"
Since the last_seen JSON field stored dates using ActiveSupport::TimeWithZone, I tried changing the SQL queries to match that format, but same problem:
last_seen: {"1"=>"2016-10-20T14:30:00Z"}
SELECT COUNT(*) FROM "public"."users" WHERE ((last_seen->'1')::text <> 'null') AND ((last_seen->'1')::text > '2016-01-23T19:03:11Z') AND ((last_seen->'1')::text <= '2016-11-23T19:01:10Z')
=> 0
Then I changed last_seen JSON to have the second format with DateTime, and queried with DateTime instead with the same problem.
You say that your JSON column contains things like:
{"1":"2016-11-21T22:17:47.269Z"}
The value in that object is a real ISO-8601 timestamp. The queries that ActiveRecord is producing:
SELECT "public"."users".*
FROM "public"."users"
WHERE ... ((last_seen->'1')::text > '2016-11-12 18:13:03.432534') ...
are using not-quite-ISO-8601 timestamps; note the missing T between the date and time components in '2016-11-12 18:13:03.432534'. The results of your text comparisons will depend on how 'T' and ' ' compare, and that's not guaranteed to be what you want it to be or even to be consistent across platforms.
If you're going to do this sort of thing you'll need to make sure the formats are consistent. I'd go with strict ISO-8601 since that is the One True Timestamp Format and it will behave consistently everywhere. The #iso8601 method will take care of the formatting for you:
User.where("(last_seen->'?')::text <> 'null'", id)
.where( "(last_seen->'?')::text > ?", id, dt_start.iso8601)
.where( "(last_seen->'?')::text <= ?", id, dt_end.iso8601)
Calling #iso8601 yourself will give ActiveRecord a string, so you'll bypass whatever timestamp-to-string formatting AR wants to use. There's also a precision argument to iso8601 if one-second precision isn't good enough.
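The effect of the separator can be seen with plain Ruby strings. This is only a sketch: Ruby compares strings bytewise, whereas the database compares text using its collation, which is where platform differences can come from. The sample values are made up.

```ruby
require 'time'

stored = '2016-11-21T22:17:47Z' # value as stored in the JSON column

# A later instant in the space-separated format, like the one in the generated SQL.
later_with_space = '2016-11-21 23:00:00'
# The same later instant formatted as strict ISO 8601.
later_iso = Time.utc(2016, 11, 21, 23, 0, 0).iso8601

puts stored < later_with_space # false: 'T' sorts after ' ' in a bytewise comparison
puts stored < later_iso        # true: same format, so text order matches time order
```

The same reasoning applies to fractional seconds: both sides should use the same precision, otherwise '.' ends up being compared against 'Z'.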
As an aside, are you sure that JSON is the right approach to this? A separate table might be a better fit.

Ruby on Rails with sqlite, trying to query and return results from the last 7 days?

Noob here, I'm trying to query my SQLite database for entries that have been made in the last 7 days and then return them.
This is the current attempt
user.rb
def featuredfeed
@x = []
@s = []
Recipe.all.each do |y|
@x << "SELECT id FROM recipes WHERE id = #{y.id} AND created_at > datetime('now','-7 days')"
end
Recipe.all.each do |d|
@t = "SELECT id FROM recipes where id = #{d.id}"
@x.each do |p|
if @t = p
@s << d
end
end
end
@s
end
This code returns each recipe 6(total number of objects in the DB) times regardless of how old it is.
@x should only be 3 ids:
@x = [13,15,16]
if i run
SELECT id FROM recipes WHERE id = 13 AND created_at > datetime('now','-7 days')
1 row is returned, with id 13
but if I look for an id that is more than 7 days old, such as 12
SELECT id FROM recipes WHERE id = 12 AND created_at > datetime('now','-7 days')
0 Rows returned
I'm probably over complicating this but I've spent way too long on it at this point.
the return type has to be Recipe.
To return objects created within last 7 days just use where clause:
Recipe.where('created_at >= ?', 1.week.ago)
Check out docs for more info on querying db.
Edit according to comments:
Since you are using acts_as_votable gem, add the votes caching, so that filtering by votes score is straightforward:
Recipe.where('cached_votes_total >= ?', 10)
Ruby is expressive. I would take the opportunity to use a scope. With Active Record Scopes, this query can be represented in a meaningful way within your code, using syntactic sugar.
scope :from_recent_week, -> { where('created_at >= ?', Time.zone.now - 1.week) }
This allows you to chain your scoped query and enhance readability:
Recipe.from_recent_week.each do
something_more_meaningful_than_a_SQL_query
end
It looks to me that your problem is database abstraction, something Rails does for you. If you are looking for a function that returns the three ids you indicate, I think you would want to do this:
@x = Recipe.from_recent_week.map(&:id)
No need for any of the other fluff, no declarations necessary. I also would encourage you to use a different variable name instead of @x. Please use something more like:
@ids_from_recent_week = Recipe.from_recent_week.map(&:id)

Rails query: Compare calculation of two attribute values

I have a model, Product, which has both a :created_at timestamp and an :expiration_in_days attribute. Products are considered expired a certain number of days after their creation. How do I write a query that only returns products that have not expired?
Rails 4, Ruby 2.1, PG 0.17
I have been trying queries like this, with no success:
#product.rb
def self.not_expired
where('created_at + expiration_in_days * 86400 >= ?', Time.now)
end
I have a pure Ruby version that works fine, it's just slower:
#product.rb
def self.not_expired
select{ |p| (p.created_at.to_i + p.expiration_in_days * 86400) >= Time.now.to_i}
end
Note: the 86400 calculation converts the :expiration_in_days integer into seconds.
Any pointers on more advanced Rails queries than the documentation (http://guides.rubyonrails.org/active_record_querying.html) would be very welcome as well.
Thanks in advance.
Try this:
def self.not_expired
where("created_at + (expiration_in_days * 86400)::text::interval >= ?", Time.now)
end
UPDATE: Add references
You can learn more about date and time function here.
Your case has a special requirement: the value of expiration_in_days is a column in the table, so we cannot write created_at + interval expiration_in_days day. Instead, we need to cast its value to interval. An integer cannot be cast straight to interval, which is why we cast it to text first.
A + B > C is true if A > C - B
So, instead of trying to add the expiration time to created_at, subtract the expiration time from Time.now.
def self.expiration_threshold
Time.now - amount_of_time # I'm not sure what type to use
end
def self.not_expired
where( "created_at > ? ", expiration_threshold )
end
I've always been a little stumped about what type Rails/Ruby will want when dealing with various dates/times, you may have to play around.
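The rearrangement can be checked in plain Ruby. This is a minimal sketch with a hypothetical Product struct and made-up dates, not the ActiveRecord model. It also shows that, because expiration_in_days differs per product, the threshold has to be computed per row rather than once for the whole query.

```ruby
Product = Struct.new(:name, :created_at, :expiration_in_days)

SECONDS_PER_DAY = 86_400

# created_at + expiration >= now
def not_expired_by_adding?(product, now)
  product.created_at + product.expiration_in_days * SECONDS_PER_DAY >= now
end

# created_at >= now - expiration (the same inequality, rearranged)
def not_expired_by_subtracting?(product, now)
  product.created_at >= now - product.expiration_in_days * SECONDS_PER_DAY
end

now = Time.utc(2015, 6, 10, 12, 0, 0)
products = [
  Product.new('fresh', Time.utc(2015, 6, 9), 7),
  Product.new('stale', Time.utc(2015, 5, 1), 30)
]

fresh_names = products.select { |p| not_expired_by_subtracting?(p, now) }.map(&:name)
puts fresh_names.inspect
```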
I'm not sure if this would work but try something like this
def self.not_expired
where("created_at + INTERVAL '1 day' * expiration_in_days >= NOW()")
end

What makes this query so inefficient in Rails 3

I have been struggling for a while with problems along the same lines - performing efficient queries in rails. I am currently trying to perform a query on a model with 500,000 records and then pull out some descriptive statistics regarding the results returned.
As an overview:
I want to pull out a number of products which match a set of criteria. I would then like to...
Count the number of records (if there aren't any I want to suppress certain actions)
Identify the max and min prices of the matching records and calculate the number of items falling between certain ranges
As it stands this set of commands takes a lot longer than I was hoping for (26000ms running locally on my desktop computer) and involves either 8 or 9 active record actions each of which take around 3000ms
Is there something I am doing wrongly to make this so slow to process? Any suggestions would be fantastic
The code in my controller is:
filteredmatchingproducts = Allproduct.select("id, product_name, price")
.where('product_name LIKE ?
OR (product_name LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?)
OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR (product_name LIKE ? AND product_name NOT LIKE ?) OR product_name LIKE ?',
'%Bike Box', '%Bike Bag%', '%Pannier%', '%Shopper%', '%Shoulder%', '%Shopping%', '%Backpack%' , '%Wheel Bag%', '%Bike sack%', '%Wheel cover%', '%Wheel case%', '%Bike case%', '%Wahoo%', '%Bicycle Travel Case%')
.order('price ASC')
@selected_products = filteredmatchingproducts.paginate(:page => params[:page])
@productsfound = filteredmatchingproducts.count
@min_price = filteredmatchingproducts.first
@max_price = filteredmatchingproducts.last
@price_range = @max_price.price - @min_price.price
@max_pricerange1 = @min_price.price + @price_range/4
@max_pricerange2 = @min_price.price + @price_range/2
@max_pricerange3 = @min_price.price + 3*@price_range/4
@max_pricerange4 = @max_price.price
if @min_price == nil
#don't do anything - just avoid error
else
@restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
@restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
@restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange2 + 0.01 , @max_pricerange3).count
@restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange3 + 0.01 , @max_pricerange4).count
end
EDIT
For clarity, the fundamental question I have is - why does each of these queries need to be performed on the large Allproduct database, is there not a way to perform the latter queries on the result of the former ones (I.e. use filteredmatchingproducts itself not recalculate it for each query)? In other programming languages I am used to being able to remember variables and perform operations of those remembered values, rather than having to work them out again before performing the operations - is this not the mindset in Rails?
Quite a few things are wrong with the code snippet that you have shared. Most importantly, perhaps, this is not a Rails-specific optimisation problem but a database structure and optimisation issue.
You are using LIKE queries with a wildcard (%) at the start of the pattern, and usually on both sides, which results in linear search time in SQLite because no index can be applied. Ideally, you would not search with LIKE at all, but would define a product_categories table, referenced from the AllProducts table as product_category_id, with an index defined on that column.
For initializing the @productsfound, @min_price, and @max_price variables, you can do the following:
filteredmatchingproductlist = filteredmatchingproducts.to_a
@productsfound = filteredmatchingproductlist.count
@min_price = filteredmatchingproductlist.first
@max_price = filteredmatchingproductlist.last
This will avoid having the separate queries triggered for them as you're performing these operations on an Array instead of ActiveRecord::Relation.
Since the results are sorted, you can apply good old binary search on filteredmatchingproductlist array, and calculate the counts to achieve the same result as the last four lines of your code:
@restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
@restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
@restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange2 + 0.01 , @max_pricerange3).count
@restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange3 + 0.01 , @max_pricerange4).count
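The binary search idea might look roughly like the sketch below. It works on a plain array of prices that is already sorted in ascending order; the count_in_range helper and the sample prices are hypothetical, and Array#bsearch_index needs Ruby 2.3 or later.

```ruby
# Counts the values in [low, high] within an array sorted in ascending order.
def count_in_range(sorted_prices, low, high)
  # Index of the first price >= low (or the array size if there is none).
  first_at_or_above = sorted_prices.bsearch_index { |price| price >= low } || sorted_prices.size
  # Index of the first price > high (or the array size if there is none).
  first_above = sorted_prices.bsearch_index { |price| price > high } || sorted_prices.size
  first_above - first_at_or_above
end

prices = [5.0, 10.0, 12.5, 20.0, 20.0, 40.0] # made-up sample data

puts count_in_range(prices, 0, 12.5)      # 3
puts count_in_range(prices, 12.51, 20.0)  # 2
puts count_in_range(prices, 20.01, 40.0)  # 1
```

Each count then costs two binary searches on the in-memory array instead of another query against the table.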
Finally, it would be best to integrate a search engine such as Sphinx or Solr if you really need counts and full text searching. Check out http://pat.github.io/thinking-sphinx/searching.html as a reference for how to implement that.
What is the product_name field? It seems like you could use the acts-as-taggable-on gem (https://github.com/mbleigh/acts-as-taggable-on). A LIKE condition with a leading wildcard makes the database check every single record for matches, which is expensive. With 500k records, it is bound to take a while.
If all you're dealing with are prices, you should go ahead and do so on an array of prices, rather than an ActiveRecord::Relation. So try something like:
filteredmatchingproducts = (...).map(&:price)
And then do all operations on that array. Also, try to load large requests in batches wherever possible, and then maintain your own counts, etc. if you can. This will avoid the application chewing up all the memory at once and slowing things down:
http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches
The reason it's executing so many queries is because you're asking it to execute a lot of queries. (Also all of the LIKEs tend to make things slow.) Here's your code with a comment added before each query that will be made (8 total).
filteredmatchingproducts = Allproduct.select("id, product_name, price")
.where('product_name LIKE ?
OR (product_name LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ? AND product_name NOT LIKE ?)
OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR product_name LIKE ? OR (product_name LIKE ? AND product_name NOT LIKE ?) OR product_name LIKE ?',
'%Bike Box', '%Bike Bag%', '%Pannier%', '%Shopper%', '%Shoulder%', '%Shopping%', '%Backpack%' , '%Wheel Bag%', '%Bike sack%', '%Wheel cover%', '%Wheel case%', '%Bike case%', '%Wahoo%', '%Bicycle Travel Case%')
.order('price ASC')
#!!!! this is a query "select ... offset x, limit y"
@selected_products = filteredmatchingproducts.paginate(:page => params[:page])
#!!!! this is a query "select count ..."
@productsfound = filteredmatchingproducts.count
#!!!! this is a query "select ... order by price asc limit 1"
@min_price = filteredmatchingproducts.first
#!!!! this is a query "select ... order by price desc limit 1"
@max_price = filteredmatchingproducts.last
@price_range = @max_price.price - @min_price.price
@max_pricerange1 = @min_price.price + @price_range/4
@max_pricerange2 = @min_price.price + @price_range/2
@max_pricerange3 = @min_price.price + 3*@price_range/4
@max_pricerange4 = @max_price.price
if @min_price == nil
#don't do anything - just avoid error
else
#!!!! this is a query "select ... where price BETWEEN X and Y"
@restricted_products_pricerange1 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', 0 , @max_pricerange1).count
#!!!! this is a query "select ... where price BETWEEN X and Y"
@restricted_products_pricerange2 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange1 + 0.01 , @max_pricerange2).count
#!!!! this is a query "select ... where price BETWEEN X and Y"
@restricted_products_pricerange3 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange2 + 0.01 , @max_pricerange3).count
#!!!! this is a query "select ... where price BETWEEN X and Y"
@restricted_products_pricerange4 = filteredmatchingproducts.select("price").where('price BETWEEN ? and ?', @max_pricerange3 + 0.01 , @max_pricerange4).count
end

Ruby on Rails 3 Check In/Check Out ranges by hour

I'm using Ruby on Rails 3 and I have a "visit" model which stores a check_in and check_out datetime and I need to search through visits in a general date range and count the number of "visitors present" grouped by all hours of the day.
...i.e. I need something like:
8:00am - 8:59am : 12 visitors
9:00am - 9:59am : 5 visitors
10:00am - 10:59am : 4 visitors
...given a table of visits with a check in and check out time stored.
The idea is to take check-in and check-out times for "visits" and then determine how many visitors (assuming each visit logs one visitor, which it does by policy) were visiting during any given hour of the day in order to find out peak visiting times.
I've tried setting up queries like:
eight_am_visits = Visit.where("EXTRACT(HOUR_MINUTE FROM check_in) <= 859").where("EXTRACT(HOUR_MINUTE FROM check_out) >= 800")
...and haven't quite hit on it because Rails stores dates in such an odd fashion (in UTC, which it will convert on database query) and it doesn't seem to be doing that conversion when I use something like EXTRACT in SQL...
...any idea how I can do this?
Looks like you're not actually interested in the Visit objects at all. If you just want a simple summary then push AR out of the way and let the database do the work:
# In visit.rb
def self.check_in_summary(date)
connection.select_rows(%Q{
select extract(hour from check_in), count(*)
from visits
where cast(check_in as date) = '#{date.iso8601}'
group by extract(hour from check_in)
}).inject([ ]) do |a, r|
a << { :hour => r[0].to_i, :n => r[1].to_i }
end
end
Then a = Visit.check_in_summary(Date.today - 1) will give you the summary for yesterday without doing any extra work. That demo implementation will, of course, have holes in the array for hours without any checkins but that is easy to resolve (if desired):
def self.check_in_summary(date)
connection.select_rows(%Q{
select extract(hour from check_in), count(*)
from visits
where cast(check_in as date) = '#{date.iso8601}'
group by extract(hour from check_in)
}).each_with_object([0]*24) do |r, a| # Don't forget the arg order change!
a[r[0].to_i] = r[1].to_i
end
end
That version returns an array with 24 elements (one for each zero-based hour) whose values are the number of checkins within that hour.
Don't be afraid to drop down to SQL when it is convenient, AREL is just one tool and you should have more than one tool in your toolbox. Also, don't be afraid to add extra data mangling and summarizing methods to your models, your models should have an interface that allows you to clearly express your intent in the rest of your code.
Maybe something like that?!
t = Time.now
eight_am_visits = Visit.all(:conditions => ['check_in > ? and check_in < ?', Time.utc(t.year, t.month, t.day, 8), Time.utc(t.year, t.month, t.day, 8, 59)])
EDIT:
Or you can grab all visits by day and filter it in Rails:
t = Time.now
visits = Visit.all(:conditions => ['created_at > ? and created_at < ?', Time.utc(t.year, t.month, t.day - 1), Time.utc(t.year, t.month, t.day + 1)])
visits_by_hour = []
(0..23).each do |h|
visits_by_hour << visits.count {|e| e.created_at > Time.utc(t.year, t.month, t.day, h) && e.created_at < Time.utc(t.year, t.month, t.day, h, 59)}
end
And in view:
<% visits_by_hour.each_with_index do |v, h| %>
<%= "#{h}:00 - #{h}:59: #{v} visitors" %>
<% end %>
Thanks for your help Olexandr and mu, I managed to figure something out with the insight you gave me here.
I came up with this, and it seems to work:
#grab the data here, this is nice because
#I can get other stats out of it (which I don't show here)
@visits = Visit.where(:check_in => @start_date..@end_date, :check_out => @start_date..@end_date).where("check_out IS NOT NULL");
#Here we go
@visitors_present_by_hour = {}
(0..23).each do |h|
# o.o Ooooooh.... o_o Hee-hee! ^_^
@visitors_present_by_hour[h] = @visits.collect{|v| v.id if v.check_in.hour <= h and v.check_out.hour >= h}.compact.count
end
Then I can just dump out that hash in my view.
It seems the solution was a bit simpler than I thought, and doing it this way actually makes rails do the time conversions from UTC.
So, I could just collect all the visits which have hours in the hour range, then compact out the nils and count what's left. I was surprised once I hit on it. I didn't need any custom SQL at all as I thought I would (unless this is completely wrong, but it seems to be working with some test data).
Thanks guys!
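The same counting idea can be sketched in plain Ruby, using count with a block instead of collect plus compact. This is only an illustration: the Visit struct and sample times are made up, and like the code above it compares hours only, so it assumes a visit does not run past midnight.

```ruby
# Hypothetical stand-in for the ActiveRecord model.
Visit = Struct.new(:check_in, :check_out)

# Returns a hash mapping each hour (0..23) to the number of visits
# whose check-in/check-out hours span that hour.
def visitors_present_by_hour(visits)
  (0..23).each_with_object({}) do |hour, counts|
    counts[hour] = visits.count { |v| v.check_in.hour <= hour && v.check_out.hour >= hour }
  end
end

visits = [
  Visit.new(Time.utc(2011, 5, 2, 8, 15), Time.utc(2011, 5, 2, 9, 40)),
  Visit.new(Time.utc(2011, 5, 2, 9, 5),  Time.utc(2011, 5, 2, 9, 50))
]

by_hour = visitors_present_by_hour(visits)
puts "8:00 - 8:59: #{by_hour[8]} visitors"
puts "9:00 - 9:59: #{by_hour[9]} visitors"
```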
