Using limit and offset in rails together with updated_at and find_each - will that cause a problem? - ruby-on-rails

I have a Ruby on Rails project in which there are millions of products with different urls. I have a function "test_response" that checks the url and returns either a true or false for the Product attribute marked_as_broken, either way the Product is saved and has its "updated_at"-attribute updated to the current Timestamp.
Since this is a very tedious process I have created a task which in turn starts off 15 tasks, each with a N/15 number of products to check. The first one should check from, for example, the first to the 10.000th, the second one from the 10.000nd to the 20.000nd and so on, using limit and offset.
This script works fine, it starts off 15 process but rather quickly completes one script after another far too early. It does not terminate, it finishes with a "Process exited with status 0".
My guess here is that using find_each together with a search for updated_at as well as in fact updating the "updated_at" while running the script changes everything and does not make the script go through the 10.000 items as supposed but I can't verify this.
Is there something inherently wrong by doing what I do here. For example, does "find_each" run a new sql query once in a while providing completely different results each time, than anticipated? I do expect it to provide the same 10.000 -> 20.000 but just split it up in pieces.
task :big_response_launcher => :environment do
nbr_of_fps = Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").size.to_i
nbr_of_processes = 15
batch_size = ((nbr_of_fps / nbr_of_processes))-2
heroku = PlatformAPI.connect_oauth(auth_code_provided_elsewhere)
(0..nbr_of_processes-1).each do |i|
puts "Launching #{i.to_s}"
current_offset = batch_size * i
puts "rake big_response_tester[#{current_offset},#{batch_size}]"
heroku.dyno.create('kopa', {
:command => "rake big_response_tester[#{current_offset},#{batch_size}]",
:attach => false
})
end
end
task :big_response_tester, [:current_offset, :batch_size] => :environment do |task,args|
current_limit = args[:batch_size].to_i
current_offset = args[:current_offset].to_i
puts "Launching with offset #{current_offset.to_s} and limit #{current_limit.to_s}"
Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").limit(current_limit).offset(current_offset).find_each do |fp|
fp.test_response
end
end

As many have noted in the comments, it seems like using find_each will ignore the order and limit. I found this answer (ActiveRecord find_each combined with limit and order) that seems to be working for me. It's not working 100% but it is a definite improvement. The rest seems to be a memory issue, i.e. I cannot have too many processes running at the same time on Heroku.

Related

How can I keep the Tempfile contents from being empty in a separate (Ruby) thread?

In a Rails 6.x app, I have a controller method which backgrounds queries that take longer than 2 minutes (to avoid a browser timeout), advises the user, stores the results, and sends a link that can retrieve them to generate a live page (with Highcharts charts). This works fine.
Now, I'm trying to implement the same logic with a method that backgrounds the creation of a report, via a Tempfile, and attaches the contents to an email, if the query runs too long. This code works just fine if the 2-minute timeout is NOT reached, but the Tempfile is empty at the commented line if the timeout IS reached.
I've tried wrapping the second part in another thread, and wrapping the internals of each thread with a mutex, but this is all getting above my head. I haven't done a lot of multithreading, and every time I do, I feel like I stumble around till I get it. This time, I can't even seem to stumble into it.
I don't know if the problem is with my thread(s), or a race condition with the Tempfile object. I've had trouble using Tempfiles before, because they seem to disappear quicker than I can close them. Is this one getting cleaned up before it can be sent? The file handle actually still exists on the file system at the commented point, even though it's empty, so I'm not clear on what's happening.
def report
queue = Queue.new
file = Tempfile.new('report')
thr = Thread.new do
query = %Q(blah blah blah)
#calibrations = ActiveRecord::Base.connection.exec_query query
query = %Q(blah blah blah)
#tunings = ActiveRecord::Base.connection.exec_query query
if queue.empty?
unless #tunings.empty?
CSV.open(file.path, 'wb') do |csv|
csv << ["headers...", #parameters].flatten
#calibrations.each do |c|
line = [c["h1"], c["h2"], c["h3"], c["h4"], c["h5"], c["h6"], c["h7"], c["h8"]]
t = #tunings.select { |t| t["code"] == c["code"] }.first
#parameters.each do |parameter|
line << t[parameter.downcase]
end
csv << line
end
end
send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
end
else
# When "timed out", `file` is empty here
NotificationMailer.report_ready(current_user, file.read).deliver_later
end
end
give_up_at = Time.now + 120.seconds
while Time.now < give_up_at do
if !thr.alive?
break
end
sleep 1
end
if thr.alive?
queue << "Timeout"
render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
end
end
The reason the file is empty is because you are giving the query 120 seconds to complete. If after 120 seconds that has not happened you add "Timeout" to the queue. The query is still running inside the thread and has not reached the point where you check if the queue is empty or not. When the query does complete, since the queue is now not empty, you skip the part where you write the csv file and go to the Notification.report line. At that point the file is still empty because you never wrote anything into it.
In the end I think you need to rethink the overall logic of what you are trying to accomplish and there needs to be more communication between the threads and the top level.
Each thread needs to tell the top level if it has already sent the result, and the top level needs to let the thread know that its past time to directly send the result, and instead should email the result.
Here is some code that I think / hope will give some insight into how to approach this problem.
timeout_limit = 10
query_times = [5, 15, 1, 15]
timeout = []
sent_response = []
send_via_email = []
puts "time out is set to #{timeout_limit} seconds"
query_times.each_with_index do |query_time, query_id|
puts "starting query #{query_id} that will take #{query_time} seconds"
timeout[query_id] = false
sent_response[query_id] = false
send_via_email[query_id] = false
Thread.new do
## do query
sleep query_time
unless timeout[query_id]
puts "query #{query_id} has completed, displaying results now"
sent_response[query_id] = true
else
puts "query #{query_id} has completed, emailing result now"
send_via_email[query_id] = true
end
end
give_up_at = Time.now + timeout_limit
while Time.now < give_up_at
break if sent_response[query_id]
sleep 1
end
unless sent_response[query_id]
puts "query #{query_id} timed out, we will email the result of your query when it is completed"
timeout[query_id] = true
end
end
# simulate server environment
loop { }
=>
time out is set to 10 seconds
starting query 0 that will take 5 seconds
query 0 has completed, displaying results now
starting query 1 that will take 15 seconds
query 1 timed out, we will email the result of your query when it is completed
starting query 2 that will take 1 seconds
query 2 has completed, displaying results now
starting query 3 that will take 15 seconds
query 1 has completed, emailing result now
query 3 timed out, we will email the result of your query when it is completed
query 3 has completed, emailing result now

Stripe API auto_paging get all Stripe::BalanceTransaction except some charge

I'm trying to get all Stripe::BalanceTransaction except those they are already in my JsonStripeEvent
What I did =>
def perform(*args)
last_recorded_txt = REDIS.get('last_recorded_stripe_txn_last')
txns = Stripe::BalanceTransaction.all(limit: 100, expand: ['data.source', 'data.source.application_fee'], ending_before: last_recorded_txt)
REDIS.set('last_recorded_stripe_txn_last', txns.data[0].id) unless txns.data.empty?
txns.auto_paging_each do |txn|
if txn.type.eql?('charge') || txn.type.eql?('payment')
begin
JsonStripeEvent.create(data: txn.to_json)
rescue StandardError => e
Rails.logger.error "Error while saving data from stripe #{e}"
REDIS.set('last_recorded_stripe_txn_last', txn.id)
break
end
end
end
end
But It doesnt get the new one from the API.
Can anyone could help me for this ? :)
Thanks
I think it's because the way auto_paging_each works is almost opposite to what you expect :)
As you can see from its source, auto_paging_each calls Stripe::ListObject#next_page, which is implemented as follows:
def next_page(params={}, opts={})
return self.class.empty_list(opts) if !has_more
last_id = data.last.id
params = filters.merge({
:starting_after => last_id,
}).merge(params)
list(params, opts)
end
It simply takes the last (already fetched) item and adds its id as the starting_after filter.
So what happens:
You fetch 100 "latest" (let's say) records, ordered by descending date (default order for BalanceTransaction API according to Stripe docs)
When you call auto_paging_each on this dataset then, it takes the last record, adds its id as the
starting_after filter and repeats the query.
The repeated query returns nothing because there are noting newer (starting later) than the set you initially fetched.
As far as there are no more newer items available, the iteration stops after the first step
What you could do here:
First of all, ensure that my hypothesis is correct :) - put the breakpoint(s) inside Stripe::ListObject and check. Then 1) rewrite your code to use starting_after traversing logic instead of ending_before - it should work fine with auto_paging_each then - or 2) rewrite your code to control the fetching order manually.
Personally, I'd vote for (2): for me slightly more verbose (probably), but straightforward and "visible" control flow is better than poorly documented magic.

Mongoid caching or multiple requests accessing the Database

I have an application running on Rails 4 using MongoDB and mongoid to connect the Rails app with Mongodb.
Let me please paste in a few lines of my trouble maker code.
def generate_transaction_uuid
current_time = Time.now
julian_date = current_time.strftime("%j")
year = current_time.strftime("%Y")
date = current_time.strftime("%H")
channel_identifier = '1'
last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
if last_transaction_random_sequence.to_i == 99999
running_sequence_numbers = '00000'
else
current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
running_sequence_numbers = '0' * (5 - (current_sequence_number).length) + current_sequence_number.to_s
end
uuid = year[-1] + julian_date[-3..-1] + date + channel_identifier + running_sequence_numbers
TransactionUuid.create(transaction_uuid: uuid)
uuid
end
The code above generates a transaction uuid. The logic ( the core part ) is generating the last 5 digits of the transaction id, the running sequence number. Currently the logic is to generate an id and store it in a table 'TransactionUuid'. When the next request arrives, the last stored uuid is queried from the table, the last 5 digits extracted out of that and the next sequence number is created. Its again inserted into the DB and the process goes on.
last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
if last_transaction_random_sequence.to_i == 99999
running_sequence_numbers = '00000'
else
current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
running_sequence_numbers = '0' * (5 - (current_sequence_number).length) + current_sequence_number.to_s
end
This works fine in most cases and gives unique transaction ids for each Rails request. But on certain cases, this generates duplicate transaction Ids. Why exactly does that happen is quite unclear to me because from the code its quite evident that the transaction Ids generated are to be unique. Also, when i run the code manually it doesnt seem to be having a problem.
I inspected into the Rails logs to check if the problem is having two simultaneous requests accessing the database at the same time, which is possible. But from the Rails logs, it seems that even requests that have a difference of about 20seconds in between them seem to generate the same transaction id on certain instances. This is whats really confusing me. We use nginx-passenger to serve our requests.
What exactly might be the issue here, some kind of database caching that happens internally or some totally unrelated weird problem or may be even a bug in my code? Any kind of help on this would be much appreciated.

Log long-running transactions in Rails

I'm getting "Mysql2::Error: Lock wait timeout exceeded" errors in my production Rails app, and I'm looking for help debugging which transaction is locking the tables for an excessively long time. MySQL has a "slow query" log, but not a log of slow transactions, as far as I can tell.
Is there a way to log information about how long transactions take directly from ActiveRecord?
My team logs slow transactions like this:
In lib/slow_transaction_logger.rb:
module SlowTransactionLogger
LOG_IF_SLOWER_THAN_SECONDS = ENV.fetch("LOG_TRANSACTIONS_SLOWER_THAN_SECONDS", "5").to_f
def transaction(*args)
start = Time.now
value = super
finish = Time.now
duration = finish - start
return value if duration <= LOG_IF_SLOWER_THAN_SECONDS
backtrace = caller.
select { |path| path.start_with?(Rails.root.to_s) }.
reject { |row|
row.include?("/gems/") ||
row.include?("/app/middleware/") ||
row.include?("/config/initializers/")
}
puts "slow_transaction_logger: duration=#{duration.round(2)} start=#{start.to_s.inspect} finish=#{finish.to_s.inspect} class=#{name} caller=#{backtrace.to_json}"
value
end
end
In config/initializers/active_record.rb:
class ActiveRecord::Base
class << self
prepend SlowTransactionLogger
end
end
So by default, it logs transactions slower than 5 seconds, in every environment.
If you set a LOG_TRANSACTIONS_SLOWER_THAN_SECONDS environment variable (e.g. on Heroku), you can tweak that limit.
We log with puts intentionally because it ends up in the logs in Heroku, and we also we see it more easily in dev and tests. But feel free to use Rails.logger if you prefer.

Activerecord transaction concurrency race condition issues

I'm currently doing live testing of a game I'm making for Android. The services are written in rails 3.1 and I'm using Postgresql. Some of my more technically savvy testers have been able to manipulate the game by recording their requests to the server and replaying them with high concurrency. I'll try to briefly describe the scenario below without getting caught up in the code.
A user can purchase multiple items, each item has its own record in the database.
The request goes to a controller action, which creates a purchase model to record information about the transaction.
The trade model has a method that sets up the purchase of the items. It essentially does a few logical steps to see if they can purchase the item. The most important is that they have a limit of 100 items per user at any given time. If all the conditions pass, a simple loop is used to create the number of items they requested.
So, what they are doing is, recording 1 valid request purchase via a proxy. Then replaying it with high concurrency, which essentially is allowing a few extra to slip through each time. So if they set it to purchase 100 quantity, they can get it up to 300-400 or if they do 15 quantity, they can get it up to like 120.
The above purchase method is wrapped in a transaction. However, even though its wrapped it won't stop it in certain circumstances where the requests are executing nearly at the same time. I'm guessing this may require some DB locking. Another twist in this that needs to be known is that at any given time rake task are being ran in cron jobs against the user table to update the players health and energy attributes. So, that cannot be blocked either.
Any assistance would be really awesome. This is my little hobby side project and I want to make sure the game is fair and fun for everyone.
Thanks so much!
Controller action:
def hire
worker_asset_type_id = (params[:worker_asset_type_id])
quantity = (params[:quantity])
trade = Trade.new()
trade_response = trade.buy_worker_asset(current_user, worker_asset_type_id, quantity)
user = User.find(current_user.id, select: 'money')
respond_to do |format|
format.json {
render json: {
trade: trade,
user: user,
messages: {
messages: [trade_response.to_s]
}
}
}
end
end
Trade Model Method:
def buy_worker_asset(user, worker_asset_type_id, quantity)
ActiveRecord::Base.transaction do
if worker_asset_type_id.nil?
raise ArgumentError.new("You did not specify the type of worker asset.")
end
if quantity.nil?
raise ArgumentError.new("You did not specify the amount of worker assets you want to buy.")
end
if quantity <= 0
raise ArgumentError.new("Please enter a quantity above 0.")
end
quantity = quantity.to_i
worker_asset_type = WorkerAssetType.where(id: worker_asset_type_id).first
if worker_asset_type.nil?
raise ArgumentError.new("There is no worker asset of that type.")
end
trade_cost = worker_asset_type.min_cost * quantity
if (user.money < trade_cost)
raise ArgumentError.new("You don't have enough money to make that purchase.")
end
# Get the users first geo asset, this will eventually have to be dynamic
potential_total = WorkerAsset.where(user_id: user.id).length + quantity
# Catch all for most people
if potential_total > 100
raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
end
quantity.times do
new_worker_asset = WorkerAsset.new()
new_worker_asset.worker_asset_type_id = worker_asset_type_id
new_worker_asset.geo_asset_id = user.geo_assets.first.id
new_worker_asset.user_id = user.id
new_worker_asset.clocked_in = DateTime.now
new_worker_asset.save!
end
self.buyer_id = user.id
self.money = trade_cost
self.worker_asset_type_id = worker_asset_type_id
self.trade_type_id = TradeType.where(name: "market").first.id
self.quantity = quantity
# save trade
self.save!
# is this safe?
user.money = user.money - trade_cost
user.save!
end
end
Sounds like you need idempotent requests so that request replay is ineffective. Where possible implement operations so that repeating them has no effect. Where not possible, give each request a unique request identifier and record whether requests have been satisfied or not. You can keep the request ID information in an UNLOGGED table in PostgreSQL or in redis/memcached since you don't need it to be persistent. This will prevent a whole class of exploits.
To deal with just this one problem create an AFTER INSERT OR DELETE ... FOR EACH ROW EXECUTE PROCEDURE trigger on the user items table. Have this trigger:
BEGIN
-- Lock the user so only one tx can be inserting/deleting items for this user
-- at the same time
SELECT 1 FROM user WHERE user_id = <the-user-id> FOR UPDATE;
IF TG_OP = 'INSERT' THEN
IF (SELECT count(user_item_id) FROM user_item WHERE user_item.user_id = <the-user-id>) > 100 THEN
RAISE EXCEPTION 'Too many items already owned, adding this item would exceed the limit of 100 items';
END IF;
ELIF TG_OP = 'DELETE' THEN
-- No action required, all we needed to do is take the lock
-- so a concurrent INSERT won't run until this tx finishes
ELSE
RAISE EXCEPTION 'Unhandled trigger case %',TG_OP;
END IF;
RETURN NULL;
END;
Alternately, you can implement the same thing in the Rails application by taking row-level lock on the customer ID before adding or deleting any item ownership records. I prefer to do this sort of thing in triggers where you can't forget to apply it somewhere, but I realise you might prefer to do it at the app level. See Pessimistic locking.
Optimistic locking is not a great fit for this application. You can use it by incrementing the lock counter on the user before adding/removing items, but it'll cause row churn on the users table and is really unnecessary when your transactions will be so short anyway.
We can't help much unless you show us your relevant schema and queries. I suppose that you do something like:
$ start transaction;
$ select amount from itemtable where userid=? and itemid=?;
15
$ update itemtable set amount=14 where userid=? and itemid=?;
commit;
An you should do something like:
$ start transaction;
$ update itemtable set amount=amount-1 returning amount where userid=? and itemid=?;
14
$ commit;

Resources