Postgres - Randomize results once per day - ruby-on-rails

I'm building a store and would like to randomize a product page, but only change it once per day.
I know that a randomizer with a seed number can return consistent results, so perhaps using the current day as a seed would work.
Caching would also work, or storing the results in a table.
What would be a good way to do this?

Create a materialized view. In current PostgreSQL that's just another table, updated with the results of a query. I might install a cron job that triggers the refill. You can have any amount of caching on top of that.
The upcoming Postgres 9.3 will ship materialized views as a native feature.
More on materialized views in the Postgres wiki.
For a fast method to pull random rows you may be interested in this related question:
Best way to select random rows PostgreSQL
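As a rough sketch of that approach, a migration could create the view and a daily cron job could refresh it (this assumes Postgres 9.3+ native materialized views; the daily_random_products name is made up for illustration):
class CreateDailyRandomProducts < ActiveRecord::Migration
  def up
    execute <<-SQL
      -- snapshot of products in a random order
      CREATE MATERIALIZED VIEW daily_random_products AS
      SELECT * FROM products ORDER BY random();
    SQL
  end

  def down
    execute "DROP MATERIALIZED VIEW daily_random_products"
  end
end

# Then, from the cron-driven task, once a day:
# ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW daily_random_products")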

You definitely want to cache the results. Sorting things randomly is slow (especially on large datasets). You could have a cron job that runs every night to clear out the old cache and pick new random products. A page cache is best if you can pull that off, but a fragment cache would work fine too.
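For instance, a controller-side sketch that caches the day's random picks under a date-based key (the names here are illustrative, not from the question):
def todays_random_product_ids
  # the key includes the date, so the cache rolls over automatically at midnight
  Rails.cache.fetch(["random-product-ids", Date.current], :expires_in => 1.day) do
    Product.order("RANDOM()").limit(20).pluck(:id)
  end
end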

I found a different way to accomplish this that also lets me use the will_paginate gem and show fresh info when the products are updated.
I added a sort_order integer column to the table. Once a day, I run a query that reassigns that column in shuffled order, and the page sorts by it.
Conceptual Rails code:
# Pulling in the products in the specified random order
def show
  @category = Category.where(slug: params[:id].to_s).first
  if @category
    @random_products = @category.products.order(sort_order: :desc) # desc so new products are at the end
  end
end

# Elsewhere...
def update_product_order
  products = Product.order("RANDOM()").all
  order_index = 0
  products.each do |p|
    p.sort_order = order_index
    p.save! # this can be done much more efficiently, obviously
    order_index += 1
  end
end
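As the inline comment hints, the row-by-row save! can be collapsed into a single statement. One possible Postgres version (a sketch, not part of the original answer) uses row_number() over a random ordering:
def update_product_order
  # assign each product a shuffled rank in one UPDATE round trip
  Product.connection.execute(<<-SQL)
    UPDATE products
    SET sort_order = shuffled.new_order
    FROM (
      SELECT id, row_number() OVER (ORDER BY random()) AS new_order
      FROM products
    ) shuffled
    WHERE products.id = shuffled.id
  SQL
end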

Related

Improving an Active Record / PostgreSQL Query Further

Following up on my question here, I'm trying to improve a search further. We first search a replays table (searching 2k records), then get the unique players associated with those records (10 per replay, so 20k records), and render a JSON. This is done in the controller; the search reads as:
def index
  @replays = Replay.includes(:players).where(map_id: params['map_id'].to_i).order(id: :desc).limit(2000)
  render json: @replays[0..2000].to_json(include: [:players])
end
The performance:
Completed 200 OK in 254032ms (Views: 34.1ms | ActiveRecord: 20682.4ms)
The actual Active Record search reads as:
Replay Load (80.4ms) SELECT "replays".* FROM "replays" WHERE "replays"."map_id" = $1 ORDER BY "replays"."id" DESC LIMIT $2 [["map_id", 1], ["LIMIT", 2000]]
Player Load (20602.0ms) SELECT "players".* FROM "players" WHERE "players"."replay_id" IN (117217...
This mostly works, but still takes an exceptional amount of time. Is there a way to improve performance?
You're getting bitten by this issue: https://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#Predicate_IN_optimalization
The note there describes an optimization opportunity for the IN predicate when the list of values is longer than about eighty items; for longer lists it is better to create constant subqueries using VALUES:
-- slower case
SELECT *
FROM tab
WHERE x IN (1,2,3,...,n); -- n > 70

-- faster case
SELECT *
FROM tab
WHERE x IN (VALUES (10), (20));
Using VALUES is faster for a larger number of items, so don't bother with it for small sets of values.
Basically, SELECT ... WHERE x IN (1, 2, ...) with a long list of values is very slow, and it's dramatically faster if you can convert it to the VALUES form: SELECT ... WHERE x IN (VALUES (1), (2), ...).
Unfortunately, since this is happening inside Active Record, it's tricky to exercise control over the query. You can avoid the includes call and manually construct the SQL to load all your child records, then build up the associations yourself.
Alternatively, you can monkey patch Active Record. Here's what I've done on Rails 4.2, in an initializer:
module PreloaderPerformance
  private

  def query_scope(ids)
    if ids.count > 100
      type = klass.columns_hash[association_key_name.to_s].sql_type
      values_list = ids.map do |id|
        if id.kind_of?(Integer)
          " (#{id})"
        elsif type == "uuid"
          " ('#{id.to_s}'::uuid)"
        else
          " ('#{id.to_s}')"
        end
      end.join(",")
      scope.where("#{association_key_name} in (VALUES #{values_list})")
    else
      super
    end
  end
end

module ActiveRecord
  module Associations
    class Preloader
      class Association #:nodoc:
        prepend PreloaderPerformance
      end
    end
  end
end
Doing this I've seen a 50x speed-up on some of my queries, with no issues so far. Note it's not fully battle-tested, and I bet it will have issues if your association uses an unusual data type for the foreign key relationship. In my database we only use uuids or integers for our associations. The usual caveats about monkey patching core Rails behavior apply.
I know find_each can be used to batch queries, which might lighten the memory load here. Could you try the following and see how it impacts the time?
Replay.where(map_id: params['map_id'].to_i).includes(:players).find_each(batch_size: 100).map do |replay|
  replay.to_json(include: :players)
end
I'm not sure this will work. It might be that the mapping negates the benefits of batching - there are certainly more queries, but it'll use less memory as it doesn't need to hold more than 20k records at a time.
Have a play and see how it looks - fiddle with the batch size too, see how that affects things.
There's a caveat in that you can't apply a limit, so bear that in mind.
I'm sure someone else'll come up with a far slicker solution, but hope this might help in the meantime. If it's awful when you check the speed, let me know and I'll delete this answer :)

Which of the following queries has the lowest cost?

A.
def recent_followers
  self.followers.recent.includes(:user).collect { |f| f.user.name }.to_sentence
end
B.
Select followers where user_id = 1
Select users where user_id in (2,3,4,5)
Database querying is almost always faster than Ruby processing.
Your first option uses collect, which has the disadvantage of loading the whole collection into memory before processing it.
You could rewrite your first try as:
followers.recent.joins(:user).pluck('users.name') # no need for self, btw
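And if you still want the sentence form the original returned, to_sentence chains straight onto the plucked array:
followers.recent.joins(:user).pluck('users.name').to_sentence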

Handling lots of report / financial data in rails 3, without slowing down?

I'm trying to figure out how to ask this, so I'll update the question as I go to clear things up if needed.
I have a virtual stock exchange game site I've been building for fun. People make tons of trades, and each trade is its own record in a table.
When showing the portfolio page, I have to calculate everything on the fly from that table of data: how many shares the user has, total gains, losses, etc.
Things have really started slowing down when I try to segment it by trades, by company, by day.
I don't really have any code to show to demonstrate this - but it just feels like I'm not approaching this correctly.
UPDATE: This code in particular is very slow:
# Returning an array of values for a total portfolio over time
def portfolio_value_over_time
  portfolio_value_array = []
  days = self.from_the_first_funding_date
  companies = self.list_of_companies
  days.each_with_index do |day, index|
    # Starting value
    days_value = 0
    companies.each do |company|
      holdings = self.holdings_by_day_and_company(day, company)
      price = Company.find_by_id(company).day_price(day)
      days_value = days_value + (holdings * price).to_i
    end
    # Adding all companies together for that day
    portfolio_value_array[index] = days_value
  end
  portfolio_value_array
end
The page load time can be up to 20+ seconds - totally insane. And I've cached a lot of the requests in Memcache.
Should I not be generating reports / charts on the live data like this? Should I be running a cron task and storing the results somewhere? What's the best practice for handling this volume of data in Rails?
The Problem
Of course it's slow. You're presumably looking up large volumes of data from each table, and performing multiple lookups on multiple tables on every iteration through your loop.
One Solution (Among Many)
You need to normalize your data, create a few new models to store expensive calculated values, and push more of the calculations onto the database or into tables.
The fact that you're doing a nested loop over high-volume data is a red flag. You're making many calls to the database, when ideally you should be making as few sequential requests as possible.
I have no idea how you need to normalize your data or optimize your queries, but you can start by looking at the output of EXPLAIN. In general, though, you will probably want to eliminate any full table scans and return data in larger chunks rather than a record at a time.
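For instance, from the Rails console you can feed a suspect query to EXPLAIN ANALYZE and look for sequential scans (a sketch; the trades table and user_id column are placeholders for your actual schema):
result = ActiveRecord::Base.connection.execute(
  "EXPLAIN ANALYZE SELECT * FROM trades WHERE user_id = 1"
)
result.each { |row| puts row["QUERY PLAN"] }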
This really seems more like a modeling or query problem than a Rails problem, per se. Hope this gets you pointed in the right direction!
You should precompute and store all this data on another table. An example table might look like this:
Table: PortfolioValues
Column: user_id
Column: day
Column: company_id
Column: value
Index: user_id
Then you can easily load all the user's portfolio data with a single query, for example:
current_user.portfolio_values
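As a Rails migration, that table might look like this (a sketch; the column types are assumptions):
class CreatePortfolioValues < ActiveRecord::Migration
  def change
    create_table :portfolio_values do |t|
      t.integer :user_id
      t.date    :day
      t.integer :company_id
      t.decimal :value, :precision => 12, :scale => 2
    end
    add_index :portfolio_values, :user_id
  end
end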
Since you're using memcached anyway, use it to cache some of those queries. For example:
Company.find_by_id(company).day_price(day)
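Wrapped in Rails.cache.fetch, that lookup might read (the key naming is illustrative):
price = Rails.cache.fetch(["day-price", company, day]) do
  Company.find_by_id(company).day_price(day)
end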

Updating several records at once in rails

In a Rails 2 app I'm building, I need to update a collection of records with specific attributes. I have a named scope to find the collection, but I have to iterate over each record to update the attributes. Instead of making one query to update several thousand records, I'd have to make several thousand queries.
What I've found so far is something like Model.find_by_sql("UPDATE products ...")
This feels really junior, but I've googled and looked around SO and haven't found my answer.
For clarity, what I have is:
ps = Product.last_day_of_freshness
ps.each { |p| p.update_attributes(:stale => true) }
What I want is:
Product.last_day_of_freshness.update_attributes(:stale => true)
It sounds like you are looking for ActiveRecord::Base.update_all - from the documentation:
Updates all records with details given if they match a set of conditions supplied, limits and order can also be supplied. This method constructs a single SQL UPDATE statement and sends it straight to the database. It does not instantiate the involved models and it does not trigger Active Record callbacks or validations.
Product.last_day_of_freshness.update_all(:stale => true)
Actually, since this is Rails 2.x (you didn't specify), the named_scope chaining may not work; you might need to pass the conditions for your named scope as the second parameter to update_all instead of chaining it onto the end of the Product scope.
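In the Rails 2.x form that would look something like the following (a sketch; the condition is hypothetical, standing in for whatever last_day_of_freshness actually scopes on):
# update_all(updates, conditions) - the two-argument Rails 2 signature
Product.update_all({:stale => true}, ["freshness_expires_at <= ?", Time.now])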
Have you tried using update_all ?
http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-update_all
For those who need to update a big number of records, one million or even more, there is a good way to update records in batches:
product_ids = Product.last_day_of_freshness.pluck(:id)
iterations_size = product_ids.count / 5000
puts "Products to update: #{product_ids.count}"

product_ids.each_slice(5000).with_index do |batch_ids, i|
  puts "step #{i} of #{iterations_size}"
  Product.where(id: batch_ids).update_all(stale: true)
end
If your table has a lot of indexes, that also increases the time for such operations, because the indexes have to be updated as rows change. When I called update_all for all records in a table with about two million records and twelve indexes, the operation hadn't finished after more than an hour. With this batched approach it took about 20 minutes in the development environment and about 4 minutes in production; of course it depends on application settings and server hardware. You can put it in a rake task or some background worker.
Looks like update_all is the best option... though I'll keep my hacky version here in case you're curious:
You can use plain-ole SQL to do what you want, thus:
ps = Product.last_day_of_freshness
ps_ids = ps.map(&:id).join(',') # local var just for readability
Product.connection.execute("UPDATE `products` SET `stale` = TRUE WHERE id IN (#{ps_ids})")
Note that this is db-dependent - you may need to adjust quoting style to suit.

Rails Random Active Record with Pagination

I need to find all records for a particular resource and display them in a random order, but with consistent pagination (you won't see the same record twice if you start paging). The display order should be randomized each time a user visits a page. I am using will_paginate. Any advice?
Store a random number in the user's session cookie, then use that as the seed for your database's random function. It will stay the same until the user closes their browser, so they will see random but consistent records:
Get a large, random number:
cookies[:seed] = SecureRandom.random_number.to_s[2..20].to_i
Use this seed with e.g. MySQL:
SomeModel.order("RAND(#{cookies[:seed].to_i})") # MySQL's RAND(N) takes a seed and is repeatable
This is not standard to my knowledge. I can see a use for this, for instance for online tests.
I would suggest using a list per session/user. When a user first visits the page, determine a list of IDs in a random order, and on all consecutive views use that list to show the correct order for that user/session.
I hope the number of rows is limited, in which case this would make sense, for instance for tests. Also, when a user leaves a test before finishing it completely, she could continue where she left off. But maybe that is not relevant for you.
Hope this helps.
If you're using a database such as MySQL that has a randomize function such as RAND(), you can just add that to your pagination query like so:
Resource.paginate( ... :order => "RAND()" ... )
Check out some of the comments here regarding performance concerns: https://rails.lighthouseapp.com/projects/8994/tickets/1274-patch-add-support-for-order-random-in-queries
Not sure if you still need help with this. One solution I've done in the past is to do the query with RAND but without pagination at first. Then store those record ID's and use that stored list to lookup and paginate from there. The initial RAND query could be set to only run when the page is 1 or nil. Just a thought.
I ended up with this solution, which worked for me on Postgres:
session[:seed] ||= rand() # setseed expects a value in [-1, 1]; rand gives [0, 1)
seed = session[:seed]
Product.select("setseed(#{seed})").first # runs setseed on the current connection
Product.order('random()').limit(10).offset(params[:offset])
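Combined with will_paginate from the original question, the same idea might read as follows (a sketch; setseed and the ordered query must run on the same database connection for the seed to take effect):
session[:seed] ||= rand
# seed Postgres' random() for this connection, then paginate the stable shuffle
Product.connection.execute("SELECT setseed(#{session[:seed].to_f})")
@products = Product.order('random()').paginate(:page => params[:page], :per_page => 10)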
