This is something I struggle with, or whenever I do it it seems to be messy.
I'm going to ask the question in a very generic way as it's not a single problem I'm really trying to solve.
I have an API that I want to consume some data from, e.g. via:
def get_api_results(page)
results = HTTParty.get("api.api.com?page=#{page}")
end
When I call it I can retrieve a total.
results["total"] = 237
The API limits the number of records I can retrieve in one call, say 20. So I need to call it a few more times.
I want to do something like the following, ideally breaking it into pieces so I can use things like delayed_job..etc
def get_all_api_pages
results = get_api_results(1)
total = get_api_results(1)["total"]
until page*20 > total do |p|
results += get_api_results(p)
end
end
I always feel like I'm writing rubbish whenever I try and solve this (and I've tried to solve it in a number of ways).
The above, for example, leaves me at the mercy of an error with the API, which knocks out all my collected results if I hit an error at any point.
Wondering if there is just a generally good, clean way of dealing with this situation.
I don't think you can have that much cleaner...because you only receive the total once you called the API.
Have you tried to build your own enum for this. It encapsulates the ugly part. Here is a bit of sample code with a "mocked" API:
class AllRecords
PER_PAGE = 50
def each
return enum_for(:each) unless block_given?
current_page = 0
total = nil
while total.nil? || current_page * PER_PAGE < total
current_page += 1
page = load_page(current_page)
total = page[:total]
page[:items].each do |item|
yield(item)
end
end
end
private
def load_page(page)
if page == 5
{items: Array.new(37) { rand(100) }, total: 237}
else
{items: Array.new(50) { rand(100) }, total: 237}
end
end
end
AllRecords.new.each.each_with_index do |item, index|
p index
end
You can surely clean that out a bit but i think that this is nice because it does not collect all the items first.
Related
I've got a background job that I run about 5,000 of them every 10 minutes. Each job makes a request to an external API and then either adds new or updates existing records in my database. Each API request returns around 100 items, so every 10 minutes I am making 50,000 CREATE or UPDATE sql queries.
The way I handle this now is, each API item returned has a unique ID. I search my database for a post that has this id, and if it exists, it updates the model. If it doesn't exist, it creates a new one.
Imagine the api response looks like this:
[
{
external_id: '123',
text: 'blah blah',
count: 450
},
{
external_id: 'abc',
text: 'something else',
count: 393
}
]
which is set to the variable collection
Then I run this code in my parent model:
class ParentModel < ApplicationRecord
def update
collection.each do |attrs|
child = ChildModel.find_or_initialize_by(external_id: attrs[:external_id], parent_model_id: self.id)
child.assign_attributes attrs
child.save if child.changed?
end
end
end
Each of these individual calls is extremely quick, but when I am doing 50,000 in a short period of time it really adds up and can slow things down.
I'm wondering if there's a more efficient way I can handle this, I was thinking of doing something instead like:
class ParentModel < ApplicationRecord
def update
eager_loaded_children = ChildModel.where(parent_model_id: self.id).limit(100)
collection.each do |attrs|
cached_child = eager_loaded_children.select {|child| child.external_id == attrs[:external_id] }.first
if cached_child
cached_child.update_attributes attrs
else
ChildModel.create attrs
end
end
end
end
Essentially I would be saving the lookups and instead doing a bigger query up front (this is also quite fast) but making a tradeoff in memory. But this doesn't seem like it would be that much time, maybe slightly speeding up the lookup part, but I'd still have to do 100 updates and creates.
Is there some kind of way I can do batch updates that I'm not thinking of? Anything else obvious that could make this go faster, or reduce the amount of queries I am doing?
You can do something like this:
collection2 = collection.map { |c| [c[:external_id], c.except(:external_id)]}.to_h
def update
ChildModel.where(external_id: collection2.keys).each |cm| do
ext_id = cm.external_id
cm.assign_attributes collection2[ext_id]
cm.save if cm.changed?
collection2.delete(ext_id)
end
if collection2.present?
new_ids = collection2.keys
new = collection.select { |c| new_ids.include? c[:external_id] }
ChildModel.create(new)
end
end
Better because
fetches all required records all at once
creates all new records at once
You can use update_columns if you don't need callbacks/validations
Only drawback, more ruby code manipulation which I think is a good tradeoff for db queries..
I'm using rails 3.2.11 and ruby 1.9.3.
I have a slow page and I know I have many ways to optimize it. Currently I am focused on the method update_attributes.
Here is my code:
def create
#user = current_user
#demo = #user.demos.new
race_ethnicity_response = []
params[:race_ethnicity_response].each do |response, value|
race_ethnicity_response << response if value != '0'
end
params[:demo][:race_ethnicity_response] = race_ethnicity_response.join(', ')[0, 254]
#demo.update_attributes(params[:demo])
end
Or should I use something like build and save or create?
#demo = #user.demos.build
...
#demo.save!
Or
#users.demos.create!(params[demo])
I am curious which is faster. I know if it save 2ms then I should use the one which is more code correct/readable.
On a small operation like this you aren't going to see much of a performance difference. Go for readability + maintainability. The code above does seem a little scattered, particularly the middle block. Here is a straightforward approach although I may be missing something related to the params[:race_ethnicity_response] loop.
#demo = Demo.new(:params)
#demo.race_ethnicity_response = race_ethnicity_response.reject{|i| i == 0 }.join(', ')[0, 254]
current_user.demos << #demo
I have this "heavy_rotation" filter I'm working on. Basically it grabs tracks from our database based on certain parameters (a mixture of listens_count, staff_pick, purchase_count, to name a few)
An xhr request is made to the filter_tracks controller action. In there I have a flag to check if it's "heavy_rotation". I will likely move this to the model (cos this controller is getting fat)... Anyway, how can I ensure (in a efficient way) to not have it pull the same records? I've considered an offset, but than I have to keep track of the offset for every query. Or maybe store track.id's to compare against for each query? Any ideas? I'm having trouble thinking of an elegant way to do this.
Maybe it should be noted that a limit of 14 is set via Javascript, and when a user hits "view more" to paginate, it sends another request to filter_tracks.
Any help appreciated! Thanks!
def filter_tracks
params[:limit] ||= 50
params[:offset] ||= 0
params[:order] ||= 'heavy_rotation'
# heavy rotation filter flag
heavy_rotation ||= (params[:order] == 'heavy_rotation')
#result_offset = params[:offset]
#tracks = Track.ready.with_artist
params[:order] = "tracks.#{params[:order]}" unless heavy_rotation
if params[:order]
order = params[:order]
order.match(/artist.*/){|m|
params[:order] = params[:order].sub /tracks\./, ''
}
order.match(/title.*/){|m|
params[:order] = params[:order].sub /tracks.(title)(.*)/i, 'LOWER(\1)\2'
}
end
searched = params[:q] && params[:q][:search].present?
#tracks = parse_params(params[:q], #tracks)
#tracks = #tracks.offset(params[:offset])
#result_count = #tracks.count
#tracks = #tracks.order(params[:order], 'tracks.updated_at DESC').limit(params[:limit]) unless heavy_rotation
# structure heavy rotation results
if heavy_rotation
puts "*" * 300
week_ago = Time.now - 7.days
two_weeks_ago = Time.now - 14.days
three_months_ago = Time.now - 3.months
# mix in top licensed tracks within last 3 months
t = Track.top_licensed
tracks_top_licensed = t.where(
"tracks.updated_at >= :top",
top: three_months_ago).limit(5)
# mix top listened to tracks within last two weeks
tracks_top_listens = #tracks.order('tracks.listens_count DESC').where(
"tracks.updated_at >= :top",
top: two_weeks_ago)
.limit(3)
# mix top downloaded tracks within last two weeks
tracks_top_downloaded = #tracks.order("tracks.downloads_count DESC").where(
"tracks.updated_at >= :top",
top: two_weeks_ago)
.limit(2)
# mix in 25% of staff picks added within 3 months
tracks_staff_picks = Track.ready.staff_picks.
includes(:artist).order("tracks.created_at DESC").where(
"tracks.updated_at >= :top",
top: three_months_ago)
.limit(4)
#tracks = tracks_top_licensed + tracks_top_listens + tracks_top_downloaded + tracks_staff_picks
end
render partial: "shared/results"
end
I think seeking an "elegant" solution is going to yield many diverse opinions, so I'll offer one approach and my reasoning. In my design decision, I feel that in this case it's optimal and elegant to enforce uniqueness on query intersections by filtering the returned record objects instead of trying to restrict the query to only yield unique results. As for getting contiguous results for pagination, on the other hand, I would store offsets from each query and use it as the starting point for the next query using instance variables or sessions, depending on how the data needs to be persisted.
Here's a gist to my refactored version of your code with a solution implemented and comments explaining why I chose to use certain logic or data structures: https://gist.github.com/femmestem/2b539abe92e9813c02da
#filter_tracks holds a hash map #tracks_offset which the other methods can access and update; each of the query methods holds the responsibility of adding its own offset key to #tracks_offset.
#filter_tracks also holds a collection of track id's for tracks that already appear in the results.
If you need persistence, make #tracks_offset and #track_ids sessions/cookies instead of instance variables. The logic should be the same. If you use sessions to store the offsets and id's from results, remember to clear them when your user is done interacting with this feature.
See below. Note, I refactored your #filter_tracks method to separate the responsibilities into 9 different methods: #filter_tracks, #heavy_rotation, #order_by_params, #heavy_rotation?, #validate_and_return_top_results, and #tracks_top_licensed... #tracks_top_<whatever>. This will make my notes easier to follow and your code more maintainable.
def filter_tracks
# Does this need to be so high when JavaScript limits display to 14?
#limit ||= 50
#tracks_offset ||= {}
#tracks_offset[:default] ||= 0
#result_track_ids ||= []
#order ||= params[:order] || 'heavy_rotation'
tracks = Track.ready.with_artist
tracks = parse_params(params[:q], tracks)
#result_count = tracks.count
# Checks for heavy_rotation filter flag
if heavy_rotation? #order
#tracks = heavy_rotation
else
#tracks = order_by_params
end
render partial: "shared/results"
end
All #heavy_rotation does is call the various query methods. This makes it easy to add, modify, or delete any one of the query methods as criteria changes without affecting any other method.
def heavy_rotation
week_ago = Time.now - 7.days
two_weeks_ago = Time.now - 14.days
three_months_ago = Time.now - 3.months
tracks_top_licensed(date_range: three_months_ago, max_results: 5) +
tracks_top_listens(date_range: two_weeks_ago, max_results: 3) +
tracks_top_downloaded(date_range: two_weeks_ago, max_results: 2) +
tracks_staff_picks(date_range: three_months_ago, max_results: 4)
end
Here's what one of the query methods looks like. They're all basically the same, but with custom SQL/ORM queries. You'll notice that I'm not setting the :limit parameter to the number of results that I want the query method to return. This would create a problem if one of the records returned is duplicated by another query method, like if the same track was returned by staff_picks and top_downloaded. Then I would have to make an additional query to get another record. That's not a wrong decision, just one I didn't decide to do.
def tracks_top_licensed(args = {})
args = #default.merge args
max = args[:max_results]
date_range = args[:date_range]
# Adds own offset key to #filter_tracks hash map => #tracks_offset
#tracks_offset[:top_licensed] ||= 0
unfiltered_results = Track.top_licensed
.where("tracks.updated_at >= :date_range", date_range: date_range)
.limit(#limit)
.offset(#tracks_offset[:top_licensed])
top_tracks = validate_and_return_top_results(unfiltered_results, max)
# Add offset of your most recent query to the cumulative offset
# so triggering 'view more'/pagination returns contiguous results
#tracks_offset[:top_licensed] += top_tracks[:offset]
top_tracks[:top_results]
end
In each query method, I'm cleaning the record objects through a custom method #validate_and_return_top_results. My validator checks through the record objects for duplicates against the #track_ids collection in its ancestor method #filter_tracks. It then returns the number of records specified by its caller.
def validate_and_return_top_results(collection, max = 1)
top_results = []
i = 0 # offset incrementer
until top_results.count >= max do
# Checks if track has already appeared in the results
unless #result_track_ids.include? collection[i].id
# this will be returned to the caller
top_results << collection[i]
# this is the point of reference to validate your query method results
#result_track_ids << collection[i].id
end
i += 1
end
{ top_results: top_results, offset: i }
end
I am working on a survey app, and an Organisation has 0 or more SurveyGroups which have 0 or more Members who take the survey
For a SurveyGroup I need to know how many surveys still need to be completed so I have this method:
SurveyGroup#surveys_outstanding
def surveys_outstanding
respondents_count - surveys_completed
end
But I also need to know how many surveys are outstanding at an organisational level, so I have a method like below, but is there a more idiomatic way to do this with Array#inject or Array#reduce or similar?
Organisation#surveys_pending
def surveys_pending
result = 0
survey_groups.each do |survey_group|
result += survey_group.surveys_outstanding
end
result
end
Try this:
def surveys_pending
#surveys_pending ||= survey_groups.map(&:surveys_outstanding).sum
end
I'm using memoization in case it is slow to calculate
def surveys_pending
survey_groups.inject(0) do |result, survey_group|
result + survey_group.surveys_outstanding
end
end
I was able to get an access token working with oauth, but the end point limits the maximum records per request to 100.
I need to get more than that, and am wondering if there is a simple/common way to do this?
I'd like to keep making requests until I get all the records. For example:
#products = JSON.parse(#access_token.get("/api/rest/products?page=#{#n}&limit=100").body)
I may need 10,000+ of the products. Is this possible?
Assuming #products is an array of products you should be able to do this
#products = []
(1..100).each do |page|
#products << JSON.parse(#access_token.get("/api/rest/products?page=#{page}&limit=100").body)
end
If you don't know exactly how many pages or products there are, unless the api provide you that, you can do something like this to stop fetching when there are no more products.
#products = []
# Assuming you never want more then 10.000
(1..100).each do |page|
new_products = JSON.parse(#access_token.get("/api/rest/products?page=#{page}&limit=100").body)
#products << new_products
break if new_products.size < 100
end