Retrieving only unique records with multiple requests - ruby-on-rails

I have this "heavy_rotation" filter I'm working on. Basically, it grabs tracks from our database based on certain parameters (a mixture of listens_count, staff_pick, and purchase_count, to name a few).
An XHR request is made to the filter_tracks controller action. In there I have a flag to check whether it's "heavy_rotation". I will likely move this to the model (because this controller is getting fat)... Anyway, how can I ensure, in an efficient way, that it doesn't pull the same records? I've considered an offset, but then I have to keep track of the offset for every query. Or maybe I should store track IDs to compare against for each query? Any ideas? I'm having trouble thinking of an elegant way to do this.
It may be worth noting that a limit of 14 is set via JavaScript, and when a user hits "view more" to paginate, it sends another request to filter_tracks.
Any help appreciated! Thanks!
def filter_tracks
  params[:limit] ||= 50
  params[:offset] ||= 0
  params[:order] ||= 'heavy_rotation'
  # heavy rotation filter flag
  heavy_rotation ||= (params[:order] == 'heavy_rotation')
  @result_offset = params[:offset]
  @tracks = Track.ready.with_artist
  params[:order] = "tracks.#{params[:order]}" unless heavy_rotation
  if params[:order]
    order = params[:order]
    order.match(/artist.*/){|m|
      params[:order] = params[:order].sub /tracks\./, ''
    }
    order.match(/title.*/){|m|
      params[:order] = params[:order].sub /tracks.(title)(.*)/i, 'LOWER(\1)\2'
    }
  end
  searched = params[:q] && params[:q][:search].present?
  @tracks = parse_params(params[:q], @tracks)
  @tracks = @tracks.offset(params[:offset])
  @result_count = @tracks.count
  @tracks = @tracks.order(params[:order], 'tracks.updated_at DESC').limit(params[:limit]) unless heavy_rotation
  # structure heavy rotation results
  if heavy_rotation
    puts "*" * 300
    week_ago = Time.now - 7.days
    two_weeks_ago = Time.now - 14.days
    three_months_ago = Time.now - 3.months
    # mix in top licensed tracks within last 3 months
    t = Track.top_licensed
    tracks_top_licensed = t.where(
      "tracks.updated_at >= :top",
      top: three_months_ago).limit(5)
    # mix top listened to tracks within last two weeks
    tracks_top_listens = @tracks.order('tracks.listens_count DESC').where(
      "tracks.updated_at >= :top",
      top: two_weeks_ago)
      .limit(3)
    # mix top downloaded tracks within last two weeks
    tracks_top_downloaded = @tracks.order("tracks.downloads_count DESC").where(
      "tracks.updated_at >= :top",
      top: two_weeks_ago)
      .limit(2)
    # mix in 25% of staff picks added within 3 months
    tracks_staff_picks = Track.ready.staff_picks.
      includes(:artist).order("tracks.created_at DESC").where(
      "tracks.updated_at >= :top",
      top: three_months_ago)
      .limit(4)
    @tracks = tracks_top_licensed + tracks_top_listens + tracks_top_downloaded + tracks_staff_picks
  end
  render partial: "shared/results"
end

I think seeking an "elegant" solution is going to yield many diverse opinions, so I'll offer one approach and my reasoning. In this case, I feel it's optimal and elegant to enforce uniqueness across the overlapping queries by filtering the returned record objects, instead of trying to restrict each query to yield only unique results. For getting contiguous results for pagination, on the other hand, I would store the offset from each query and use it as the starting point for the next query, using instance variables or sessions depending on how the data needs to be persisted.
Here's a gist to my refactored version of your code with a solution implemented and comments explaining why I chose to use certain logic or data structures: https://gist.github.com/femmestem/2b539abe92e9813c02da
#filter_tracks holds a hash map, @tracks_offset, which the other methods can access and update; each of the query methods is responsible for adding its own offset key to @tracks_offset.
#filter_tracks also holds a collection, @result_track_ids, of the IDs of tracks that already appear in the results.
If you need persistence, make @tracks_offset and @result_track_ids sessions/cookies instead of instance variables. The logic should be the same. If you use sessions to store the offsets and IDs from results, remember to clear them when your user is done interacting with this feature.
See below. Note that I refactored your #filter_tracks method, separating its responsibilities into 9 different methods: #filter_tracks, #heavy_rotation, #order_by_params, #heavy_rotation?, #validate_and_return_top_results, and #tracks_top_licensed... #tracks_top_<whatever>. This should make my notes easier to follow and your code more maintainable.
def filter_tracks
  # Does this need to be so high when JavaScript limits display to 14?
  @limit ||= 50
  @tracks_offset ||= {}
  @tracks_offset[:default] ||= 0
  @result_track_ids ||= []
  @order ||= params[:order] || 'heavy_rotation'
  tracks = Track.ready.with_artist
  tracks = parse_params(params[:q], tracks)
  @result_count = tracks.count
  # Checks for heavy_rotation filter flag
  if heavy_rotation? @order
    @tracks = heavy_rotation
  else
    @tracks = order_by_params
  end
  render partial: "shared/results"
end
All #heavy_rotation does is call the various query methods. This makes it easy to add, modify, or delete any one of the query methods as the criteria change, without affecting any other method.
def heavy_rotation
  week_ago = Time.now - 7.days
  two_weeks_ago = Time.now - 14.days
  three_months_ago = Time.now - 3.months
  tracks_top_licensed(date_range: three_months_ago, max_results: 5) +
    tracks_top_listens(date_range: two_weeks_ago, max_results: 3) +
    tracks_top_downloaded(date_range: two_weeks_ago, max_results: 2) +
    tracks_staff_picks(date_range: three_months_ago, max_results: 4)
end
Here's what one of the query methods looks like. They're all basically the same, but with custom SQL/ORM queries. You'll notice that I'm not setting the :limit parameter to the number of results I want the query method to return. That would create a problem if one of the returned records duplicated one from another query method, for example if the same track were returned by both staff_picks and top_downloaded: I would then have to make an additional query to get a replacement record. That's not a wrong approach, just not the one I chose.
def tracks_top_licensed(args = {})
  args = @default.merge args
  max = args[:max_results]
  date_range = args[:date_range]
  # Adds own offset key to #filter_tracks hash map => @tracks_offset
  @tracks_offset[:top_licensed] ||= 0
  unfiltered_results = Track.top_licensed
    .where("tracks.updated_at >= :date_range", date_range: date_range)
    .limit(@limit)
    .offset(@tracks_offset[:top_licensed])
  top_tracks = validate_and_return_top_results(unfiltered_results, max)
  # Add offset of your most recent query to the cumulative offset
  # so triggering 'view more'/pagination returns contiguous results
  @tracks_offset[:top_licensed] += top_tracks[:offset]
  top_tracks[:top_results]
end
In each query method, I'm cleaning the record objects through a custom method, #validate_and_return_top_results. The validator checks the record objects for duplicates against the @result_track_ids collection set up in #filter_tracks, and then returns the number of records specified by its caller.
def validate_and_return_top_results(collection, max = 1)
  records = collection.to_a # load the relation once
  top_results = []
  i = 0 # offset incrementer
  # Stop once we have enough results or have run out of records to check
  until top_results.count >= max || i >= records.length do
    # Checks if track has already appeared in the results
    unless @result_track_ids.include? records[i].id
      # this will be returned to the caller
      top_results << records[i]
      # this is the point of reference to validate your query method results
      @result_track_ids << records[i].id
    end
    i += 1
  end
  { top_results: top_results, offset: i }
end
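The offset-plus-seen-IDs bookkeeping described above can be tried out without a database. The following is a minimal pure-Ruby sketch, assuming each query's results can be represented as an array in ranked order; HeavyRotationMixer, its next_page method and the Struct-based Track are hypothetical stand-ins, not part of the gist. In the real controller each source would be a scope such as Track.top_licensed with .offset(...), and the offsets and seen IDs would live in instance variables or the session.

```ruby
require "set"

# Hypothetical stand-in for a Track record; only the id matters here.
Track = Struct.new(:id)

# Mixes several ranked sources, skipping tracks already shown and
# remembering how far into each source it has read (its offset).
class HeavyRotationMixer
  def initialize(sources)
    @sources  = sources    # { name => [Track, ...] } in ranked order
    @offsets  = Hash.new(0) # per-source cumulative offset
    @seen_ids = Set.new     # ids already returned to the client
  end

  # quotas: { name => how many tracks to take from that source }
  def next_page(quotas)
    quotas.flat_map { |name, max| take_unique(name, max) }
  end

  private

  def take_unique(name, max)
    source = @sources.fetch(name)
    picked = []
    i = @offsets[name]
    while picked.size < max && i < source.size
      track = source[i]
      # Set#add? returns nil when the id was already present.
      picked << track if @seen_ids.add?(track.id)
      i += 1
    end
    @offsets[name] = i # the next call resumes after the last record inspected
    picked
  end
end

sources = {
  top_licensed: [1, 2, 3, 4].map { |id| Track.new(id) },
  staff_picks:  [2, 5, 3, 6].map { |id| Track.new(id) }
}
mixer  = HeavyRotationMixer.new(sources)
first  = mixer.next_page(top_licensed: 2, staff_picks: 2).map(&:id)
second = mixer.next_page(top_licensed: 2, staff_picks: 2).map(&:id)
p first  # => [1, 2, 5, 3]
p second # => [4, 6]
```

As in validate_and_return_top_results, the stored offset moves past skipped duplicates as well, so the next "view more" request resumes exactly where the previous scan stopped.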

Related

Rails: ActiveRecord where vs merge

First, I am getting the review statuses between particular dates.
date_range = Date.parse(@from_date).beginning_of_day..Date.parse(@to_date).end_of_day
@review_statuses = ReviewStatus.where(updated_at: date_range)
Next, I need to apply an 'AND' condition.
@review_cycle = params[:review_cycle]
if @review_cycle.present?
  @review_statuses = @review_statuses.merge(
    ReviewStatus.where(evidence_cycle: @review_cycle)
      .or(ReviewStatus.where(roc_cycle: @review_cycle)))
end
Now, for the condition below, should I apply where or merge?
@status = params[:status]
@review_statuses.where(evidence_status: :pass, roc_status: :pass) if @status == 'pass'
Can someone explain when we should use merge instead of where?
You generally want to use where, except in special circumstances: most commonly, to apply conditions on a secondary (joined) table in the query. This is because:
where is shorter, clearer, and more idiomatic; and
merge has tricky edge cases: it mostly combines the two queries, but there are situations where one side's value will simply override the other's.
Given that, even your existing condition doesn't need merge:
# Unchanged
date_range = Date.parse(@from_date).beginning_of_day..Date.parse(@to_date).end_of_day
@review_statuses = ReviewStatus.where(updated_at: date_range)
# direct #where+#or over #merge
@review_cycle = params[:review_cycle]
if @review_cycle.present?
  @review_statuses = @review_statuses.where(evidence_cycle: @review_cycle).or(
    @review_statuses.where(roc_cycle: @review_cycle))
end
# more #where
@status = params[:status]
@review_statuses = @review_statuses.where(evidence_status: :pass, roc_status: :pass) if @status == 'pass'

How to speed up a very frequently made query using raw SQL and without ORM?

I have an API endpoint that accounts for a little less than half of the average response time (on average it takes about 514 ms, yikes). The endpoint simply returns some statistics about stored data scoped to particular time periods, such as this week, last week, this month, and so on.
There are a number of ways we could reduce its impact, like getting the clients to hit it less often and with more specific queries, such as only querying for "this week" when only that data is used. Here we focus first on what can be done at the database level. In our current implementation we generate this data for all "time scopes" on the fly; the number of queries is enormous, and they are made multiple times per second. No caching is used, but maybe there is a way to use Rails's cache_key, or the low-level Rails.cache?
The current implementation looks something like this:
class FooSummaries
  include SummaryStructs
  def self.generate_for(user)
    @user = user
    summaries = Struct::Summaries.new
    TimeScope::TIME_SCOPES.each do |scope|
      foos = user.foos.by_scope(scope.to_sym)
      summary = Struct::Summary.new
      # e.g: summaries.last_week = build_summary(foos)
      summaries.send("#{scope}=", build_summary(summary, foos))
    end
    summaries
  end
  private_class_method
  def self.build_summary(summary, foos)
    summary.all_quuz = @user.foos_count
    summary.all_quux = all_quux(foos)
    summary.quuw = quuw(foos).to_f
    %w[foo bar baz qux].product(
      %w[quux quuz corge]
    ).each do |a, b|
      # e.g: summary.foo_quux = quux(foos, "foo")
      summary.send("#{a.downcase}_#{b}=", send(b, foos, a) || 0)
    end
    summary
  end
  def self.all_quuz(foos)
    foos.count
  end
  def self.all_quux(foos)
    foos.sum(:quux)
  end
  def self.quuw(foos)
    foos.quuwable.total_quuw
  end
  def self.corge(foos, foo_type)
    return if foos.count.zero?
    count = self.quuz(foos, foo_type) || 0
    count.to_f / foos.count
  end
  def self.quux(foos, foo_type)
    case foo_type
    when "foo"
      foos.where(foo: true).sum(:quux)
    when "bar"
      foos.bar.where(foo: false).sum(:quux)
    when "baz"
      foos.baz.where(foo: false).sum(:quux)
    when "qux"
      foos.qux.sum(:quux)
    end
  end
  def self.quuz(foos, foo_type)
    case foo_type
    when "foo"
      foos.where(foo: true).count
    when "bar"
      foos.bar.where(foo: false).count
    when "baz"
      foos.baz.where(foo: false).count
    when "qux"
      foos.qux.count
    end
  end
end
To avoid making changes to the model or creating migrations for a table to store this data (both of which may be valid, and better, solutions), I decided it might be easier to construct one large SQL query executed all at once, in the hope that building the query string and executing it would be faster than the overhead of ActiveRecord setting up and tearing down many SQL queries.
The new approach looks something like this. It horrifies me, and I know there must be a more elegant way:
class FooSummaries
  include SummaryStructs
  def self.generate_for(user)
    results = ActiveRecord::Base.connection.execute(build_query_for(user))
    results.each do |result|
      # build up summary struct from query results
    end
  end
  def self.build_query_for(user)
    TimeScope::TIME_SCOPES.map do |scope|
      time_scope = TimeScope.new(scope)
      %w[foo bar baz qux].map do |foo_type|
        %[
          select
            '#{scope}_#{foo_type}',
            sum(quux) as quux,
            count(*) as quuz,
            round(100.0 * (count(*) / #{user.foos_count.to_f}), 3) as corge
          from
            "foos"
          where
            "foos"."user_id" = #{user.id}
            and "foos"."foo_type" = '#{foo_type.humanize}'
            and "foos"."end_time" between '#{time_scope.from}' AND '#{time_scope.to}'
            and "foos"."foo" = '#{foo_type == 'foo' ? 't' : 'f'}'
          union
        ]
      end
    end.join.reverse.sub("union".reverse, "").reverse
  end
end
The funny way of replacing the last occurrence of union also horrifies me, but it seems to work. There must be a better way, as there are probably many things wrong with the above implementation(s). It may be helpful to note that I use PostgreSQL and have no problem writing queries that are not portable to other DBs. Any advice is truly appreciated!
Thanks for reading!
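A tidier way to get rid of the trailing union than reversing the string is to build each SELECT as a separate array element and let Array#join insert the separator only between elements. This is a minimal sketch: SCOPES, FOO_TYPES and select_for are hypothetical stand-ins for TimeScope::TIME_SCOPES and the per-type SELECT above, and the query body is abbreviated. It still interpolates values into the SQL, so it is only appropriate for trusted values; bind parameters, as in the prepared-statement update, avoid that.

```ruby
# Hypothetical stand-ins for TimeScope::TIME_SCOPES and the foo types.
SCOPES    = %w[this_week last_week].freeze
FOO_TYPES = %w[foo bar].freeze

# Builds one SELECT per (scope, foo_type) pair; the body is abbreviated.
def select_for(scope, foo_type)
  "select '#{scope}_#{foo_type}' as label, sum(quux) as quux from foos"
end

def build_query
  selects = SCOPES.product(FOO_TYPES).map { |scope, type| select_for(scope, type) }
  # join only puts the separator between elements, so there is no
  # trailing "union" left over to strip.
  selects.join("\nunion all\n")
end

puts build_query
```

union all is used here because each row already carries a distinct label, so the duplicate elimination that plain union performs would be wasted work.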
Update: I found a solution that works for me and sped up the endpoint that uses this service object by 500%! Essentially, instead of building a query string and then executing it for each set of parameters, we create a prepared statement using prepare, followed by exec_prepared calls that pass the parameters to the query. Since this query is made many times over, this is a useful optimization because, as per the PostgreSQL documentation:
A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
We prepare the query like so:
def prepare_query!
  ActiveRecord::Base.transaction do
    connection.prepare("foos_summary",
      %[with scoped_foos as (
          select
            *
          from
            "foos"
          where
            "foos"."user_id" = $3
            and ("foos"."end_time" between $4 and $5)
        )
        select
          $1::text as scope,
          $2::text as foo_type,
          sum(quux)::float as quux,
          sum(eggs + bacon + ham)::float as food,
          count(*) as count,
          round((sum(quux) / nullif(
            (select
               sum(quux)
             from
               scoped_foos), 0))::numeric,
            5)::float as quuz
        from
          scoped_foos
        where
          (case $6
             when 'Baz'
             then (baz = 't')
             else
             (baz = 'f' and foo_type = $6)
           end
          )
      ])
  end
end
You can see that this query uses a common table expression for readability and to avoid writing the same select query twice.
Then we execute the query, passing in the parameters we need:
def connection
  @connection ||= ActiveRecord::Base.connection.raw_connection
end
def query_results
  prepare_query! unless query_already_prepared?
  @results ||= TimeScope::TIME_SCOPES.map do |scope|
    time_scope = TimeScope.new(scope)
    %w[bacon eggs ham spam].map do |foo_type|
      connection.exec_prepared("foos_summary",
        [scope,
         foo_type,
         @user.id,
         time_scope.from,
         time_scope.to,
         foo_type.humanize])
    end
  end
end
Here, query_already_prepared? is a simple lookup in pg_prepared_statements, the system view in which PostgreSQL lists the prepared statements available in the current session:
def query_already_prepared?
  connection.exec(%(select
                      name
                    from
                      pg_prepared_statements
                    where name = 'foos_summary')).count.positive?
end
A nice solution, I thought! Hopefully the technique illustrated here will help others with similar problems.

Getting all the pages from an API

This is something I struggle with; whenever I do it, it seems messy.
I'm going to ask the question in a very generic way as it's not a single problem I'm really trying to solve.
I have an API that I want to consume some data from, e.g. via:
def get_api_results(page)
  results = HTTParty.get("api.api.com?page=#{page}")
end
When I call it I can retrieve a total.
results["total"] # => 237
The API limits the number of records I can retrieve in one call, say 20, so I need to call it a few more times.
I want to do something like the following, ideally breaking it into pieces so I can use things like delayed_job, etc.:
def get_all_api_pages
  first_page = get_api_results(1)
  total = first_page["total"]
  results = [first_page]
  page = 1
  until page * 20 >= total
    page += 1
    results << get_api_results(page)
  end
  results
end
I always feel like I'm writing rubbish whenever I try and solve this (and I've tried to solve it in a number of ways).
The above, for example, leaves me at the mercy of an error with the API, which knocks out all my collected results if I hit an error at any point.
Wondering if there is just a generally good, clean way of dealing with this situation.
I don't think you can get it much cleaner, because you only receive the total once you have called the API.
Have you tried building your own Enumerator for this? It encapsulates the ugly part. Here is a bit of sample code with a "mocked" API:
class AllRecords
  PER_PAGE = 50
  def each
    return enum_for(:each) unless block_given?
    current_page = 0
    total = nil
    while total.nil? || current_page * PER_PAGE < total
      current_page += 1
      page = load_page(current_page)
      total = page[:total]
      page[:items].each do |item|
        yield(item)
      end
    end
  end
  private
  def load_page(page)
    if page == 5
      {items: Array.new(37) { rand(100) }, total: 237}
    else
      {items: Array.new(50) { rand(100) }, total: 237}
    end
  end
end
AllRecords.new.each.each_with_index do |item, index|
  p index
end
You can surely clean that up a bit, but I think this is nice because it does not collect all the items first.
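Building on this answer, the sketch below also addresses the error concern from the question: each page fetch is retried a bounded number of times, and items are yielded as they arrive, so anything already processed is kept even if a page ultimately fails. PagedFetcher, the fetch_page callable, the retry count and the { items:, total: } page shape are all assumptions for illustration; in practice fetch_page would wrap the HTTParty call. The page_numbers helper shows how the work could be split up (for example, one background job per page) once the total is known from the first request.

```ruby
# Walks every page of a paginated API lazily, retrying each page fetch.
class PagedFetcher
  include Enumerable

  # fetch_page: callable taking a page number, returning { items:, total: }
  def initialize(fetch_page, per_page:, retries: 2)
    @fetch_page = fetch_page
    @per_page = per_page
    @retries = retries
  end

  def each
    return enum_for(:each) unless block_given?
    page = 1
    total = nil
    while total.nil? || (page - 1) * @per_page < total
      result = fetch_with_retry(page)
      total = result[:total]
      result[:items].each { |item| yield item }
      page += 1
    end
  end

  # Page numbers needed to cover `total` records (ceiling division).
  def self.page_numbers(total, per_page)
    (1..(total + per_page - 1) / per_page).to_a
  end

  private

  def fetch_with_retry(page)
    attempts = 0
    begin
      @fetch_page.call(page)
    rescue StandardError
      attempts += 1
      retry if attempts <= @retries
      raise # give up on this page; earlier items were already yielded
    end
  end
end

# Fake API: 237 records, 20 per page; page 3 fails once before succeeding.
calls = Hash.new(0)
fake_api = lambda do |page|
  calls[page] += 1
  raise IOError, "flaky" if page == 3 && calls[page] == 1
  first = (page - 1) * 20
  count = [20, 237 - first].min
  { items: (first...first + count).to_a, total: 237 }
end

records = PagedFetcher.new(fake_api, per_page: 20).to_a
p records.size                            # => 237
p PagedFetcher.page_numbers(237, 20).last # => 12
```

Because PagedFetcher includes Enumerable, callers can also use each_slice, lazy, first(n) and friends without loading every page up front.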

Rails Remove Model from ActiveRecord::Relation Query

What's the best way to dynamically remove a model from a query? Basically I want to find all campaigns where a user hasn't already provided a response.
The delete_at call in the method below actually deletes the model, which isn't what I want. I only want to remove it from the local 'campaigns' ActiveRecord::Relation that I got back.
def self.appuser_campaigns appuser_id, language
  appuser = Appuser.find(appuser_id)
  campaigns = Campaign.check_language language
  i = -1
  campaigns.each do |campaign|
    i = i + 1
    responses = Response.where(appuser_id: appuser_id, campaign_id: campaign.id)
    if responses.length > 0
      campaigns.delete_at(i)
    end
  end
  puts campaigns.class.name #"ActiveRecord::Relation"
  campaigns
end
def self.check_language language
  campaigns = Campaign.where(language: language, status: "In Progress")
end
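You can do the following:
already_answered_campaign_ids = Appuser.find(appuser_id).responses.pluck(:campaign_id)
Campaign.where('id NOT IN (?)', already_answered_campaign_ids.presence || -1)
The presence || -1 fallback matters: with an empty array, Rails substitutes NULL and generates id NOT IN (NULL), which matches no rows at all, so the -1 placeholder (an id no campaign should have) keeps every campaign when the user hasn't answered anything yet.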

removing objects from an array during a loop

I am trying to filter the results of a user search in my app to only show users who are NOT friends. My friends table has 3 columns: f1 (user ID of the person who sent the request), f2 (user ID of the friend who received the request), and confirmed (a true/false boolean). As you can see, @usersfiltered is the result of the search. Then the current user's friends are defined. Then I try to remove the friends from the search results. This does not seem to be working, but it should be pretty straightforward. I've tried delete (not good) and destroy.
def index
  #THIS IS THE SEARCH RESULT
  @usersfiltered = User.where("first_name LIKE ?", "%#{params[:first_name]}%" )
  #THIS IS DEFINING ROWS ON THE FRIEND TABLE THAT BELONG TO CURRENT USER
  @confirmedfriends = Friend.where(:confirmed => true)
  friendsapproved = @confirmedfriends.where(:f2 => current_user.id)
  friendsrequestedapproved = @confirmedfriends.where(:f1 => current_user.id)
  #GOING THROUGH SEARCH RESULTS
  @usersfiltered.each do |usersfiltered|
    if friendsapproved.present?
      friendsapproved.each do |fa|
        if usersfiltered.id == fa.f1
          #NEED TO REMOVE THIS FROM RESULTS HERE SOMEHOW
          usersfiltered.remove
        end
      end
    end
    #SAME LOGIC
    if friendsrequestedapproved.present?
      friendsrequestedapproved.each do |fra|
        if usersfiltered.id == fra.f2
          usersfiltered.remove
        end
      end
    end
  end
end
I would flip it around the other way. Take the logic that is loop-invariant out of the loop, which gives a good first-order simplification:
approved_ids = []
approved_ids = friendsapproved.map { |fa| fa.f1 } if friendsapproved.present?
approved_ids += friendsrequestedapproved.map { |fra| fra.f2 } if friendsrequestedapproved.present?
approved_ids.uniq! # (May not be needed)
@usersfiltered.delete_if { |user| approved_ids.include? user.id }
This could probably be simplified further if friendsapproved and friendsrequestedapproved have been created separately strictly for the purpose of the deletions. You could generate a single friendsapproval list consisting of both and avoid unioning id sets above.
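The loop-invariant idea can be sketched in plain Ruby, with Structs standing in for the User and Friend models (hypothetical stand-ins, not the real ActiveRecord classes). The confirmed friend IDs from both directions of the friendship are collected once into a Set, and the search results are then filtered with a single reject, which returns a new array instead of mutating the one being iterated.

```ruby
require "set"

# Hypothetical stand-ins for the User and Friend models.
User   = Struct.new(:id, :first_name)
Friend = Struct.new(:f1, :f2, :confirmed)

# Ids of users who are confirmed friends of current_user_id,
# whichever side sent the request.
def confirmed_friend_ids(friends, current_user_id)
  friends.each_with_object(Set.new) do |f, ids|
    next unless f.confirmed
    ids << f.f1 if f.f2 == current_user_id # they sent the request
    ids << f.f2 if f.f1 == current_user_id # the current user sent it
  end
end

users = [User.new(1, "Ann"), User.new(2, "Andy"), User.new(3, "Anita")]
friends = [
  Friend.new(2, 9, true),  # user 2 -> current user 9, confirmed
  Friend.new(9, 3, false), # still pending, so user 3 is not a friend yet
  Friend.new(9, 1, true)   # current user 9 -> user 1, confirmed
]

friend_ids  = confirmed_friend_ids(friends, 9)
non_friends = users.reject { |u| friend_ids.include?(u.id) }
p non_friends.map(&:first_name) # => ["Anita"]
```

Inside Rails, the same ID set could instead be pushed into the query itself, e.g. with where.not(id: friend_ids.to_a), which keeps the filtering in SQL.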
While I agree that there may be better ways to implement what you're doing, I think the specific problem you're facing is that in Rails 4, the where method returns an ActiveRecord::Relation, not an Array. While you can use each on a Relation, you cannot, in general, perform array operations on it.
However, you can convert a Relation to an Array with the to_a method as in:
@usersfiltered = User.where("first_name LIKE ?", "%#{params[:first_name]}%" ).to_a
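This would then allow you to remove the matching user from the array within your loop, for example:
@usersfiltered.delete(usersfiltered)
Be aware, though, that deleting from an array while each is iterating over it shifts the remaining elements left, so the element right after each deletion is skipped; calling delete_if or reject! on @usersfiltered outside the loop avoids this.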
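The following pure-Ruby snippet, with plain integers standing in for users, shows what happens when elements are deleted from an array while each is walking it, compared with reject!, which is meant for in-place filtering.

```ruby
# Remove even numbers while iterating with each: after a deletion, the next
# element slides into the index that each has already visited.
skipped = [2, 4, 5]
skipped.each { |n| skipped.delete(n) if n.even? }
p skipped # => [4, 5]  (4 was never looked at)

# reject! (or delete_if) does the in-place filtering correctly.
filtered = [2, 4, 5]
filtered.reject!(&:even?)
p filtered # => [5]
```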

Resources