How to limit request rates in .map loop?

I'm calling the Amazon Product Advertising API with code like this:
products = asins.map do |asin|
  item = Amazon::Ecs.item_lookup(asin, response_group: :Large)
  {
    asin: item.get_element('Item').get('ASIN'),
    manufacturer: item.get_element('ItemAttributes').get('Manufacturer'),
    model: item.get_element('ItemAttributes').get('Model')
  }
end
and I get a 503 error: "You are submitting requests too quickly. Please retry your requests at a slower rate."
I found out that the API allows 1 request per second.
What's the best way to do that in my case?

Perhaps just slow down by waiting a second between two iterations:
products = asins.map do |asin|
  sleep 1 # wait one second before doing the next API call
  item = Amazon::Ecs.item_lookup(asin, response_group: :Large)
  {
    asin: item.get_element('Item').get('ASIN'),
    manufacturer: item.get_element('ItemAttributes').get('Manufacturer'),
    model: item.get_element('ItemAttributes').get('Model')
  }
end
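One refinement to the sleep approach: `sleep 1` adds a full second on top of each request's own latency. A minimal sketch, assuming a single-process, sequential loop; the `Throttle` class is a hypothetical helper, not part of the amazon-ecs gem. It waits only for whatever remains of the interval since the previous call started.

```ruby
# Hypothetical helper: enforces a minimum interval between the *starts*
# of consecutive calls, sleeping only for the time that is left.
# Not thread-safe; meant for a single sequential loop such as `map`.
class Throttle
  def initialize(interval)
    @interval = interval # seconds between call starts
    @last_start = nil
  end

  # Waits if needed, then runs the block and returns its result.
  def call
    if @last_start
      elapsed = now - @last_start
      sleep(@interval - elapsed) if elapsed < @interval
    end
    @last_start = now
    yield
  end

  private

  # Monotonic clock, so wall-clock adjustments don't affect the spacing.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Usage in the original loop (Amazon::Ecs call as in the question):
#   throttle = Throttle.new(1.0)
#   products = asins.map do |asin|
#     item = throttle.call { Amazon::Ecs.item_lookup(asin, response_group: :Large) }
#     { asin: item.get_element('Item').get('ASIN') }
#   end
```

If a request itself takes 400 ms, the throttle only sleeps about 600 ms before the next one, so the loop stays close to exactly one request per second.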

Using sleep is certainly the first solution that comes to mind. In my opinion it's not an elegant one, because it's not manageable at all. I would think of some queueing system to do the work, maybe Sidekiq with a self-triggering worker?
Some simplified code:
# some kind of queueing logic to fetch ASINs
asin = AsinQueue.fetch
# trigger the first worker
LookupWorker.perform_async(asin)
# and the worker itself:
class LookupWorker
  include Sidekiq::Worker
  def perform(asin)
    item = Amazon::Ecs.item_lookup(asin, response_group: :Large)
    # all the domain logic
    # queue the next lookup one second from now, if there is one left
    next_asin = AsinQueue.fetch
    LookupWorker.perform_in(1.second, next_asin) if next_asin
  end
end

ItemLookup supports batch requests: you can look up as many as 10 items at once.
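Batching cuts the number of calls by up to a factor of ten: 100 ASINs become 10 requests, so at 1 request per second the loop takes roughly 10 s instead of roughly 100 s. A minimal sketch, assuming ItemLookup accepts a comma-separated ItemId list and that `Amazon::Ecs.item_lookup` passes such a string through unchanged; `asin_batches` is a hypothetical helper, and reading the individual items out of the response is left out.

```ruby
# Hypothetical helper: groups ASINs into comma-separated strings of at most
# `size` IDs each (10 is the ItemLookup batch limit mentioned above).
def asin_batches(asins, size = 10)
  asins.each_slice(size).map { |slice| slice.join(",") }
end

# Usage sketch (assumption: item_lookup forwards the string as ItemId):
#   asin_batches(asins).each do |ids|
#     response = Amazon::Ecs.item_lookup(ids, response_group: :Large)
#     # ... iterate over the items in the response ...
#     sleep 1
#   end
```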

Related

Rails application taking more than 30 seconds to respond

I'm making a small Rails application that fetches data about several programming languages from the GitHub API.
The problem is that when I click the button that fetches the information, it takes a long time to redirect to the correct page. The network tab shows a TTFB of 30 s (!) and a response with status 302.
The controller action that does the work:
Language.delete_all
search_urls = Introduction.all.map { |introduction| "https://api.github.com/search/repositories?q=#{introduction.name}&per_page=1" }
search_urls.each do |search_url|
json_file = JSON.parse(URI.open(search_url).read) # URI.open needs require 'open-uri'
pl = Language.new
pl.hash_response = json_file['items'].first
pl.name = pl.hash_response['language']
pl.save
end
main_languages = %w[ruby javascript python elixir java]
deletable_languages = Introduction.all.reject do |introduction|
main_languages.include?(introduction.name)
end
deletable_languages.each do |language|
language.delete
end
redirect_to languages_path
end

I believe the bottleneck is the HTTP requests, which you are making one by one. You could filter the languages you want before generating the URLs and fetching them.
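Filtering first could look like this. A minimal sketch in which the question's ActiveRecord data is replaced by a plain array of names; the language list comes from the question, while `language_search_urls` is a hypothetical helper.

```ruby
MAIN_LANGUAGES = %w[ruby javascript python elixir java].freeze

# Hypothetical helper: build search URLs only for the languages you keep,
# so no HTTP request is made for a language that would be deleted afterwards.
def language_search_urls(names)
  names.select { |name| MAIN_LANGUAGES.include?(name) }
       .map { |name| "https://api.github.com/search/repositories?q=#{name}&per_page=1" }
end

# In the controller this could be fed with Introduction.pluck(:name).
```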
However, if the number of URLs is still large after filtering, say 20-50, and each request takes about 200 ms, that is at least 4 to 10 seconds just for the HTTP requests. That's already too long for the user to wait. In that case you should move the work into a background job.
If you insist on doing this synchronously, you could fire those HTTP requests from multiple threads and join the results once all threads have completed. You will get some concurrency here, because the GIL does not block other threads while one of them waits on I/O. But this is quite error-prone, as you need to manage the threads yourself.
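A minimal sketch of the thread approach, assuming the per-URL work is I/O-bound. The `fake_fetch` lambda is a stand-in that just sleeps, so the example runs offline; in the controller the fetch would be something like `JSON.parse(URI.open(url).read)` (with `require 'open-uri'` and `require 'json'`). `fetch_all` is a hypothetical helper.

```ruby
# Hypothetical helper: runs `fetch` for every URL in its own thread and
# returns the results in the same order as `urls`. Thread#value joins the
# thread and re-raises any exception raised inside it.
def fetch_all(urls, fetch)
  threads = urls.map { |url| Thread.new { fetch.call(url) } }
  threads.map(&:value)
end

# Stand-in for an HTTP call: sleeps like a slow request would.
fake_fetch = lambda do |url|
  sleep 0.2
  "response for #{url}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = fetch_all(%w[a b c d e], fake_fetch)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
# Five 0.2 s "requests" overlap, so elapsed is close to 0.2 s rather than 1 s.
```

One thread per URL is fine for a handful of requests; for many more you would cap concurrency, for example by processing `urls.each_slice(10)` groups one after another.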

How to split a long-lived Sidekiq job into many short-lived jobs in a Ruby on Rails app

So I'm building a website that calls a third-party API that can take from 20 seconds to 30 minutes to return a result. I can't know this duration in advance, so I need to poll it frequently to check whether the work is done (returns "COMPLETE" and the result) or not (returns "IN_PROGRESS"). Also, this API might be called many times by many users at the same time.
So I created a Sidekiq worker that checks the API every 5 seconds until it receives "COMPLETE", and only then does it end. But I've read that Sidekiq should only run short-lived jobs, and I'm struggling to get my head around how I should do this. I've also been trying to search for an answer, but I suspect I don't know the right terms to find what I'm looking for.
I'm sure there is a way to tell my workers to call the API once and, if the result is "IN_PROGRESS", end, but make sure another worker makes another API call to check, and so on until the result is "COMPLETE".
I guess this would also help distribute the load when many users need the API, because fewer workers can handle more of these short-lived jobs.
This is my worker, which I hope clarifies what I'm doing right now:
class ThingProgressWorker
  include Sidekiq::Worker
  def perform(id)
    @thing = Thing.find(id)
    @thing_api_call = ThingAPICall.new # This uses the Ruby library of the API
    completed = false
    while completed == false
      result = @thing_api_call.get_result({ thing_job_name: @thing.job_name })
      if !result.include? "COMPLETED"
        completed = false
        sleep 5
      else
        completed = true
        @thing.status = "completed"
        @thing.save
        break
      end
    end
  end
end
So if the API takes ten minutes to go from "IN_PROGRESS" to "COMPLETED", this worker will be busy for that long, which I reckon is not advisable at all.
I've been thinking about this for some hours now and can't figure out how to make each API call its own job without keeping a worker busy until the API is done.
The only solution I've thought of so far is having a master worker that calls another worker for each API call, but then I'd still have a worker busy for as long as the API takes to send the result.
I'd appreciate any help or directions!
Thanks in advance

Try enqueuing the worker again with a delay. For example:
class ThingProgressWorker
  include Sidekiq::Worker
  def perform(id)
    @thing = Thing.find(id)
    @thing_api_call = ThingAPICall.new # This uses the Ruby library of the API
    result = @thing_api_call.get_result({ thing_job_name: @thing.job_name })
    if !result.include? "COMPLETED"
      # not done yet: check again in a minute instead of sleeping inside this job
      ThingProgressWorker.perform_in(1.minute, id)
    else
      @thing.status = "completed"
      @thing.save
    end
  end
end
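perform_in adds the job to the queue, but instead of running it immediately, Sidekiq runs it after the delay you specify. Each job makes one API call and then finishes, so no worker stays busy while the third-party API is still working.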
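Since the API can take anywhere from 20 seconds to 30 minutes, a fixed one-minute delay is a compromise. A minimal sketch of a growing delay with a cap and a give-up limit; `next_poll_delay` and its parameters are hypothetical, and the attempt number would be passed along as an extra job argument.

```ruby
# Hypothetical helper: delay in seconds before poll number `attempt`
# (0-based). Doubles from `base` up to `max`; returns nil once
# `max_attempts` polls have been scheduled, meaning "give up".
def next_poll_delay(attempt, base: 5, max: 300, max_attempts: 50)
  return nil if attempt >= max_attempts
  [base * (2**attempt), max].min
end

# Inside a worker with the signature perform(id, attempt = 0):
#   delay = next_poll_delay(attempt)
#   ThingProgressWorker.perform_in(delay, id, attempt + 1) if delay
```

With these defaults the first checks come quickly (5 s, 10 s, 20 s, ...) for short jobs, and long-running jobs are polled every 5 minutes instead of every few seconds.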

Getting a Primary Key error in Rails using Sidekiq and Sidekiq-Cron

I have a Rails project that uses Sidekiq for worker tasks, and Sidekiq-Cron to handle scheduling. I am running into a problem, though. I built a controller (below) that handled all of my API querying, validation of data, and then inserting data into the database. All of the logic functioned properly.
I then tore out the section of code that actually inserts API data into the database, and moved it into a Job class. This way the Controller method could simply pass all of the heavy lifting off to a job. When I tested it, all of the logic functioned properly.
Finally, I created a Job that calls the Controller method every minute, does the validation checks, and then kicks off the other Job to save the API data (if necessary). When I do this, the first part of the logic seems to work (it inserts new event data), but the part that checks whether this is the first time we've seen an event for a specific object seems to fail. The result is a primary key violation in PG.
Code below:
Controller
require 'date'
class MonnitOpenClosedSensorsController < ApplicationController
def holderTester()
#MonnitschedulerJob.perform_later(nil)
end
# Create Sidekiq queue to process new sensor readings
def queueNewSensorEvents(auth_token, network_id)
m = Monnit.new("iMonnit", 1)
# Construct the query to select the most recent communication date for each sensor in the network
lastEventForEachSensor = MonnitOpenClosedSensor.select('"SensorID", MAX("LastCommunicationDate") as "lastCommDate"')
lastEventForEachSensor = lastEventForEachSensor.group("SensorID")
lastEventForEachSensor = lastEventForEachSensor.where('"CSNetID" = ?', network_id)
todaysDate = Date.today
sevenDaysAgo = (todaysDate - 7)
lastEventForEachSensor.each do |event|
# puts event["lastCommDate"]
recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"')
recentEvent = recentEvent.where('"CSNetID" = ? AND "SensorID" = ? AND "LastCommunicationDate" = ?', network_id, event["SensorID"], event["lastCommDate"])
recentEvent.each do |recent|
message = m.get_extended_sensor(auth_token, recent["SensorID"])
if message["LastDataMessageMessageGUID"] != recent["id"]
MonnitopenclosedsensorJob.perform_later(auth_token, network_id, message["SensorID"])
# puts "hi inner"
# puts message["LastDataMessageMessageGUID"]
# puts recent['id']
# puts recent["SensorID"]
# puts message["SensorID"]
# raise message
end
end
end
# Queue up any Sensor Events for new sensors
# This would be sensors we've never seen before, from a Postgres standpoint
sensors = m.get_sensor_ids(auth_token)
sensors.each do |sensor|
sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
# sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
sensorCheck = sensorCheck.group(:SensorID)
sensorCheck = sensorCheck.where('"CSNetID" = ? AND "SensorID" = ?', network_id, sensor)
# sensorCheck = sensorCheck.where('id = "?"', sensor["LastDataMessageMessageGUID"])
if sensorCheck.any? == false
MonnitopenclosedsensorJob.perform_later(auth_token, network_id, sensor)
end
end
end
end
The code above breaks for sensor events from new sensors. First, it doesn't recognize that a sensor already exists; second, it doesn't recognize that the event it is trying to create has already been persisted to the database (it uses a GUID for comparison).
Job to persist data
class MonnitopenclosedsensorJob < ApplicationJob
queue_as :default
def perform(auth_token, network_id, sensor)
m = Monnit.new("iMonnit", 1)
newSensor = m.get_extended_sensor(auth_token, sensor)
sensorRecord = MonnitOpenClosedSensor.new
sensorRecord.SensorID = newSensor['SensorID']
sensorRecord.MonnitApplicationID = newSensor['MonnitApplicationID']
sensorRecord.CSNetID = newSensor['CSNetID']
lastCommunicationDatePretty = newSensor['LastCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0
nextCommunicationDatePretty = newSensor['NextCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0
sensorRecord.LastCommunicationDate = Time.at(lastCommunicationDatePretty)
sensorRecord.NextCommunicationDate = Time.at(nextCommunicationDatePretty)
sensorRecord.id = newSensor['LastDataMessageMessageGUID']
sensorRecord.PowerSourceID = newSensor['PowerSourceID']
sensorRecord.Status = newSensor['Status']
sensorRecord.CanUpdate = newSensor['CanUpdate'] == "true" ? 1 : 0
sensorRecord.ReportInterval = newSensor['ReportInterval']
sensorRecord.MinimumThreshold = newSensor['MinimumThreshold']
sensorRecord.MaximumThreshold = newSensor['MaximumThreshold']
sensorRecord.Hysteresis = newSensor['Hysteresis']
sensorRecord.Tag = newSensor['Tag']
sensorRecord.ActiveStateInterval = newSensor['ActiveStateInterval']
sensorRecord.CurrentReading = newSensor['CurrentReading']
sensorRecord.BatteryLevel = newSensor['BatteryLevel']
sensorRecord.SignalStrength = newSensor['SignalStrength']
sensorRecord.AlertsActive = newSensor['AlertsActive']
sensorRecord.AccountID = newSensor['AccountID']
sensorRecord.CreatedOn = Time.now.getutc
sensorRecord.CreatedBy = "Monnit Open Closed Sensor Job"
sensorRecord.LastModifiedOn = Time.now.getutc
sensorRecord.LastModifiedBy = "Monnit Open Closed Sensor Job"
sensorRecord.save
sensorRecord = nil
end
end
Job to call controller every minute
class MonnitschedulerJob < ApplicationJob
queue_as :default
def perform(*args)
m = Monnit.new("iMonnit", 1)
getImonnitUsers = ImonnitCredential.select('"auth_token", "username", "password"')
getImonnitUsers.each do |user|
# puts user["auth_token"]
# puts user["username"]
# puts user["password"]
if user["auth_token"] != nil
m.logon(user["auth_token"])
else
auth_token = m.get_auth_token(user["username"], user["password"])
auth_token = auth_token["Result"]
end
network_list = m.get_network_list(auth_token)
network_list.each do |network|
# puts network["NetworkID"]
MonnitOpenClosedSensorsController.new.queueNewSensorEvents(auth_token, network["NetworkID"])
end
end
end
end
Sorry about the length of the post. I tried to include as much information as I could about the code involved.
EDIT
Here is the code for the extended sensor, along with the JSON response:
def get_extended_sensor(auth_token, sensor_id)
response = self.class.get("/json/SensorGetExtended/#{auth_token}?SensorID=#{sensor_id}")
if response['Result'] != "Invalid Authorization Token"
response['Result']
else
response['Result']
end
end
{
"Method": "SensorGetExtended",
"Result": {
"ReportInterval": 180,
"ActiveStateInterval": 180,
"InactivityAlert": 365,
"MeasurementsPerTransmission": 1,
"MinimumThreshold": 4294967295,
"MaximumThreshold": 4294967295,
"Hysteresis": 0,
"Tag": "",
"SensorID": 189092,
"MonnitApplicationID": 9,
"CSNetID": 24391,
"SensorName": "Open / Closed - 189092",
"LastCommunicationDate": "/Date(1500999632000)/",
"NextCommunicationDate": "/Date(1501010432000)/",
"LastDataMessageMessageGUID": "d474b3db-d843-40ba-8e0e-8c4726b61ec2",
"PowerSourceID": 1,
"Status": 0,
"CanUpdate": true,
"CurrentReading": "Open",
"BatteryLevel": 100,
"SignalStrength": 84,
"AlertsActive": true,
"CheckDigit": "QOLP",
"AccountID": 14728
}
}
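The job converts the `/Date(...)/` strings with `scan(/[0-9]+/)[0].to_i / 1000.0`, which silently yields the Unix epoch if the format ever differs. A slightly stricter sketch of the same conversion, assuming the values are always milliseconds since the Unix epoch in this `/Date(ms)/` form (any trailing offset suffix is ignored); `parse_monnit_date` is a hypothetical name.

```ruby
# Hypothetical helper: "/Date(1500999632000)/" -> Time (UTC).
# Raises instead of silently returning the epoch when the format is unexpected.
def parse_monnit_date(value)
  match = %r{\A/Date\((-?\d+)}.match(value)
  raise ArgumentError, "unexpected date format: #{value.inspect}" unless match
  ms = Integer(match[1], 10)
  Time.at(Rational(ms, 1000)).utc # exact: no float rounding of milliseconds
end
```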

Some thoughts:
recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"')
This query is not doing any ordering; you are presuming that the records you retrieve here are the latest records.
m = Monnit.new("iMonnit", 1)
newSensor = m.get_extended_sensor(auth_token, sensor)
Without the implementation details of get_extended_sensor, it's impossible to tell you how
sensorRecord.id = newSensor['LastDataMessageMessageGUID']
resolves.
It's highly likely that you are getting duplicate messages. It's almost never a good idea to use input data as a primary key; instead, autogenerate a GUID in your job, use that as the primary key, and use LastDataMessageMessageGUID as a correlation ID.

So the issue that I was running into, as it turns out, is as follows:
A sensor event was pulled from the API and queued up as a worker job in Sidekiq.
If the queue is running a bit slow (slow API responses, or simply a lot of jobs to process), the 1-minute poll might run again, pull the same sensor event down and queue it up a second time.
As the queue processes, the sensor event gets inserted into the database with its GUID as the primary key.
As the queue continues to catch up, it reaches the duplicate job for the same event, and that job fails.
My solution was to move my "does this SensorID and GUID already exist in the database?" check into the job itself, so the first thing the job does when it runs is check AGAIN whether the record already exists. This means I am checking twice, but this quick check has low overhead.
There is still a risk that the check runs and passes while another job is inserting the same record but has not committed it yet; the insert would then fail. The retry would catch that case, though: on the second run the check finds the record, and the job completes successfully. Also, the check happens AFTER the API data has been pulled. Since persisting a single record should be much faster than the API call, this makes a retry very unlikely; I'd say you have a better chance of winning the lottery than of seeing the second check fail and trigger a retry.
If anyone else has a better, or more clean solution, please feel free to include it as a secondary answer!
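A common alternative to check-then-insert is to make the insert itself the check: put a unique index on the GUID and let the database reject duplicates (in Rails, rescue `ActiveRecord::RecordNotUnique`; in PostgreSQL 9.5+, use `INSERT ... ON CONFLICT DO NOTHING`). The in-memory sketch below only illustrates the idea; `EventStore` and `insert_once` are hypothetical stand-ins for the table and its unique index, not an ActiveRecord API.

```ruby
# Hypothetical in-memory stand-in for a table with a unique index on the GUID.
# The existence check and the insert happen under one lock, so together they
# are atomic, which is what a unique index provides inside the database.
class EventStore
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  # Returns true if this call inserted the row, false if the GUID already existed.
  def insert_once(guid, attributes)
    @lock.synchronize do
      return false if @rows.key?(guid)
      @rows[guid] = attributes
      true
    end
  end

  def size
    @lock.synchronize { @rows.size }
  end
end

store = EventStore.new
guid = "d474b3db-d843-40ba-8e0e-8c4726b61ec2"
# Simulate two jobs queued for the same sensor event running at the same time.
results = 2.times.map { Thread.new { store.insert_once(guid, { "SensorID" => 189092 }) } }.map(&:value)
```

The duplicate job simply sees `false` and exits; there is no window between the check and the insert, so no retry is needed.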

Ice cube: how to set a rule for every day at a certain time with Sidetiq/Fist of Fury

Per the docs, I thought it would be (for every day at 3pm):
daily.hour_of_day(15)
What I'm getting is a random mess. First, it executes whenever I push to Heroku, regardless of the time, and beyond that it runs seemingly at random. For example, the latest push to Heroku was at 1:30pm, and it executed twice at 1:30pm, once at 2pm, once at 4pm and once at 5pm.
Thoughts on what's wrong?
Full code (note this is for the Fist of Fury gem, but FoF is heavily influenced by Sidetiq so help from Sidetiq users would be great as well).
class Outstanding
include SuckerPunch::Job
include FistOfFury::Recurrent
recurs { daily.hour_of_day(15) }
def perform
ActiveRecord::Base.connection_pool.with_connection do
# Auto email lenders every other day if they have outstanding requests
lender_array = Array.new
Inventory.where(id: (Borrow.where(status1:1).all.pluck("inventory_id"))).each { |i| lender_array << i.signup.id }
lender_array.uniq!
lender_array.each { |l| InventoryMailer.outstanding_request(l).deliver }
end
end
end

Maybe you should use:
recurrence { daily.hour_of_day(15) }
instead of recurs?
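When debugging a schedule like this, it can help to compute by hand when "every day at 15:00" should fire next and compare that with the observed run times. A minimal pure-Ruby sketch; `next_daily_run` is a hypothetical helper, unrelated to the ice_cube API, and it works in the fixed UTC offset of the `now` you pass in (DST shifts are ignored).

```ruby
# Hypothetical helper: the next time at hour:00:00 strictly after `now`,
# using the same UTC offset as `now`.
def next_daily_run(now, hour)
  candidate = Time.new(now.year, now.month, now.day, hour, 0, 0, now.utc_offset)
  candidate <= now ? candidate + 24 * 60 * 60 : candidate
end
```

For a push at 13:30 this gives 15:00 the same day and nothing in between, so runs at 13:30, 14:00, 16:00 and 17:00 point to something other than the daily rule itself.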

Activerecord transaction concurrency race condition issues

I'm currently live-testing a game I'm making for Android. The services are written in Rails 3.1 and I'm using PostgreSQL. Some of my more technically savvy testers have been able to manipulate the game by recording their requests to the server and replaying them with high concurrency. I'll try to briefly describe the scenario below without getting caught up in the code.
A user can purchase multiple items, each item has its own record in the database.
The request goes to a controller action, which creates a Trade model to record information about the transaction.
The trade model has a method that sets up the purchase of the items. It essentially does a few logical steps to see if they can purchase the item. The most important is that they have a limit of 100 items per user at any given time. If all the conditions pass, a simple loop is used to create the number of items they requested.
What they are doing is recording one valid purchase request via a proxy, then replaying it with high concurrency, which lets a few extra purchases slip through each time. So if they purchase a quantity of 100, they can get up to 300-400, and if they purchase 15, they can get up to about 120.
The above purchase method is wrapped in a transaction. However, even though it's wrapped, that doesn't stop it when the requests execute at nearly the same time. I'm guessing this may require some DB locking. Another twist is that rake tasks run from cron jobs against the users table at all times to update the players' health and energy attributes, so that table cannot be blocked either.
Any assistance would be really awesome. This is my little hobby side project and I want to make sure the game is fair and fun for everyone.
Thanks so much!
Controller action:
def hire
worker_asset_type_id = (params[:worker_asset_type_id])
quantity = (params[:quantity])
trade = Trade.new()
trade_response = trade.buy_worker_asset(current_user, worker_asset_type_id, quantity)
user = User.find(current_user.id, select: 'money')
respond_to do |format|
format.json {
render json: {
trade: trade,
user: user,
messages: {
messages: [trade_response.to_s]
}
}
}
end
end
Trade Model Method:
def buy_worker_asset(user, worker_asset_type_id, quantity)
ActiveRecord::Base.transaction do
if worker_asset_type_id.nil?
raise ArgumentError.new("You did not specify the type of worker asset.")
end
if quantity.nil?
raise ArgumentError.new("You did not specify the amount of worker assets you want to buy.")
end
# convert before comparing: params arrive as strings, and "15" <= 0 raises
quantity = quantity.to_i
if quantity <= 0
raise ArgumentError.new("Please enter a quantity above 0.")
end
worker_asset_type = WorkerAssetType.where(id: worker_asset_type_id).first
if worker_asset_type.nil?
raise ArgumentError.new("There is no worker asset of that type.")
end
trade_cost = worker_asset_type.min_cost * quantity
if (user.money < trade_cost)
raise ArgumentError.new("You don't have enough money to make that purchase.")
end
# Get the users first geo asset, this will eventually have to be dynamic
potential_total = WorkerAsset.where(user_id: user.id).count + quantity
# Catch all for most people
if potential_total > 100
raise ArgumentError.new("You cannot have more than 100 dealers at the current time.")
end
quantity.times do
new_worker_asset = WorkerAsset.new()
new_worker_asset.worker_asset_type_id = worker_asset_type_id
new_worker_asset.geo_asset_id = user.geo_assets.first.id
new_worker_asset.user_id = user.id
new_worker_asset.clocked_in = DateTime.now
new_worker_asset.save!
end
self.buyer_id = user.id
self.money = trade_cost
self.worker_asset_type_id = worker_asset_type_id
self.trade_type_id = TradeType.where(name: "market").first.id
self.quantity = quantity
# save trade
self.save!
# is this safe?
user.money = user.money - trade_cost
user.save!
end
end

Sounds like you need idempotent requests so that request replay is ineffective. Where possible implement operations so that repeating them has no effect. Where not possible, give each request a unique request identifier and record whether requests have been satisfied or not. You can keep the request ID information in an UNLOGGED table in PostgreSQL or in redis/memcached since you don't need it to be persistent. This will prevent a whole class of exploits.
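A minimal sketch of the request-ID idea, assuming the client sends a unique ID with each purchase request. The Hash plus Mutex stands in for the redis/UNLOGGED-table store mentioned above; `IdempotencyStore` and `run_once` are hypothetical names. A replayed request gets the stored result back instead of executing the purchase again.

```ruby
# Hypothetical in-memory idempotency store: request ID -> stored result.
class IdempotencyStore
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Runs the block only the first time `request_id` is seen; later calls
  # (replays) return the stored result without running the block again.
  # The check and the run share one lock, so simultaneous replays cannot
  # both execute. A shared store would need an atomic "set if absent",
  # e.g. Redis SET with the NX option.
  def run_once(request_id)
    @lock.synchronize do
      return @results[request_id] if @results.key?(request_id)
      @results[request_id] = yield
    end
  end
end

store = IdempotencyStore.new
purchases = 0
replays = 10.times.map do
  Thread.new { store.run_once("req-123") { purchases += 1; :purchased } }
end
replay_results = replays.map(&:value)
```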
To deal with just this one problem create an AFTER INSERT OR DELETE ... FOR EACH ROW EXECUTE PROCEDURE trigger on the user items table. Have this trigger:
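-- function and trigger names are illustrative
CREATE OR REPLACE FUNCTION user_item_limit_check() RETURNS trigger AS $$
DECLARE
  uid integer;
BEGIN
  IF TG_OP = 'INSERT' THEN
    uid := NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    uid := OLD.user_id;
  ELSE
    RAISE EXCEPTION 'Unhandled trigger case %', TG_OP;
  END IF;
  -- Lock the user so only one tx can be inserting/deleting items for this user
  -- at the same time ("user" is a reserved word, hence the quotes)
  PERFORM 1 FROM "user" WHERE user_id = uid FOR UPDATE;
  IF TG_OP = 'INSERT' THEN
    IF (SELECT count(user_item_id) FROM user_item WHERE user_item.user_id = uid) > 100 THEN
      RAISE EXCEPTION 'Too many items already owned, adding this item would exceed the limit of 100 items';
    END IF;
  END IF;
  -- DELETE: no action required, all we needed to do is take the lock
  -- so a concurrent INSERT won't run until this tx finishes
  RETURN NULL; -- ignored for AFTER ... FOR EACH ROW triggers
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER user_item_limit
AFTER INSERT OR DELETE ON user_item
FOR EACH ROW EXECUTE PROCEDURE user_item_limit_check();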
Alternatively, you can implement the same thing in the Rails application by taking a row-level lock on the user's row before adding or deleting any item ownership records. I prefer to do this sort of thing in triggers, where you can't forget to apply it somewhere, but I realise you might prefer to do it at the app level. See Pessimistic locking.
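In Rails the row lock is taken with `user.lock!` (or `User.lock.find(id)`) inside the transaction, before counting and inserting. The runnable sketch below imitates that with an in-process per-user `Mutex`, purely to show why the count-then-insert sequence must be serialized; `ItemLedger` is hypothetical and does not replace a database lock across processes.

```ruby
# Hypothetical stand-in: one Mutex per user plays the role of
# SELECT ... FOR UPDATE on that user's row.
class ItemLedger
  LIMIT = 100

  def initialize
    @counts = Hash.new(0)
    @locks = Hash.new { |h, user_id| h[user_id] = Mutex.new }
    @locks_guard = Mutex.new
  end

  # Count-then-insert under the user's lock, so concurrent replays of the same
  # request cannot all pass the limit check before any of them has inserted.
  def buy(user_id, quantity)
    lock_for(user_id).synchronize do
      raise ArgumentError, "limit of #{LIMIT} exceeded" if @counts[user_id] + quantity > LIMIT
      sleep 0.001 # widen the gap between check and insert, as a slow request would
      @counts[user_id] += quantity
    end
  end

  def count(user_id)
    lock_for(user_id).synchronize { @counts[user_id] }
  end

  private

  def lock_for(user_id)
    @locks_guard.synchronize { @locks[user_id] }
  end
end

ledger = ItemLedger.new
# 20 concurrent replays of a "buy 15" request for user 1.
threads = 20.times.map do
  Thread.new do
    begin
      ledger.buy(1, 15)
      :ok
    rescue ArgumentError
      :rejected
    end
  end
end
outcomes = threads.map(&:value)
```

Six purchases of 15 fit under the limit (90 items); every later replay is rejected instead of slipping through.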
Optimistic locking is not a great fit for this application. You can use it by incrementing the lock counter on the user before adding/removing items, but it'll cause row churn on the users table and is really unnecessary when your transactions will be so short anyway.

We can't help much unless you show us your relevant schema and queries. I suppose that you do something like:
$ start transaction;
$ select amount from itemtable where userid=? and itemid=?;
15
$ update itemtable set amount=14 where userid=? and itemid=?;
$ commit;
And you should do something like:
$ start transaction;
$ update itemtable set amount=amount-1 where userid=? and itemid=? returning amount;
14
$ commit;
