Queuing API calls to fit rate limit - ruby-on-rails

Using Full Contact API, but they have a rate limit of 300calls/minute. I currently have it to set that it does an API call when uploading the CSV file of emails. I want to queue it such that once it hits the rate limit or does 300 calls, it waits for 1 minute and proceeds. Then I will put delayed_job on it. How can I do that? A quick fix is to use
sleep 60
but how do I find it such that it made 300 calls already, make it sleep or queue it for next set?
def self.import(file)
CSV.foreach(file.path, headers: true) do |row|
hashy = row.to_hash
email = hashy["email"]
begin
Contact.create!(email: email, contact_hash: FullContact.person(email: email).to_json)
rescue FullContact::NotFound
Contact.create!(email: email, contact_hash: "Not Found")
end
end
end

There are several issues to think about here - is there going to be a single process using your API key at any one time, or is it possible that multiple processes would be running at once? If you have multiple delayed_job workers, I think the latter is likely. I haven't used delayed_jobs enough to give you a good solution to that, but my feeling is you would be restricted to a single worker.
I am currently working on a similar problem with an API with a restriction of 1 request every 0.5 seconds, with a maximum of 1000 per day. I haven't worked out how I want to track the per-day usage yet, but I've handled the per-second restriction using threads. If you can frame the restriction as "1 request every 0.2 seconds", that might free you up from having to track it on a minute-by-minute basis (though you still have the issue of how to keep track multiple workers).
The basic idea is that I have an request method that splits a single request into a queue of request parameters (based on the maximum number of objects allowed per request by the api), and then another method iterates over that queue and calls a block which sends the actual request to the remote server. Something like this:
def make_multiple_requests(queue, &block)
result = []
queue.each do |request|
timer = Thread.new { sleep REQUEST_INTERVAL }
execution = Thread.new { result << yield(request) }
[timer, execution].each(&:join)
end
result
end
To use it:
make_multiple_requests(queue) do |request|
your_request_method_goes_here(request)
end
The main benefit here is that if a request takes longer than the allowed interval, you don't have to wait around for the sleep to finish, and you can start your next request right away. It just guarantees that the next request won't start until at least the interval has passed. I've noticed that even though the interval is set correctly, I occasionally get an 'over-quota' response from the API. In those cases, the request is retried after the appropriate interval has passed.

Related

Rails - multiple theads to avoid the slack 3 second API response rule

I am working with the slack API. My script does a bunch of external processing and in some cases it can take around 3-6 seconds. What is happening is the Slack API expects a 200 response within 3 seconds and because my function is not finished within 3 seconds, it retries again and then it ends up posting the same automated responses 2-3 times.
I confirmed this by commenting out all the functions and I had no issue, it posted the responses to slack fine. I then added sleep 10 and it done the same responses 3 times so the ohly thing different was it took longer.
From what I read, I need to have threaded responses. I then need to first respond to the slack API in thread 1 and then go about processing my functions.
Here is what I tried:
def events
Thread.new do
json = {
"text": "Here is your 200 response immediately slack",
}
render(json: json)
end
puts "--------------------------------Json response started----------------------"
sleep 30
puts "--------------------------------Json response completed----------------------"
puts "this is a successful response"
end
When I tested it the same issue happened so I tried using an online API tester and it hits the page, waits 30 seconds and then returns the 200 response but I need it to respond immediately with the 200, THEN process the rest otherwise I will get duplicates.
Am I using threads properly or is there another way to get around this Slack API 3 second response limit? I am new to both rails and slack API so a bit lost here.
Appreciate the eyes :)
I would recommend using ActionJob to run the code in the background if you don't need to use the result of the code in the response. First, create an ActiveJob job by running:
bin/rails generate job do_stuff
And then open up the file created in app/jobs/do_stuff_job.rb and edit the #perform function to include your code (so the puts statements and sleep 30 in your example). Finally, from the controller action you can call DoStuff.perform_later and your job will run in the background! Your final controller action will look something like this:
def events
DoStuff.perform_later # this schedules DoStuff to be done later, in
# the background, so it will return immediately
# and continue to the next line.
json = {
"text": "Here is your 200 response immediately slack",
}
render(json: json)
end
As an aside, I'd highly recommend never using Thread.new in rails. It can create some really confusing behavior especially in test scripts for a number of reasons, but usually because of how it interacts with open connections and specifically ActiveRecord.

Sending Thousands of Request at the Same Time with Ruby on Rails?

I need to develop an endpoint in rails that will send (possibly) hundreds/thousands of request, process it then return/render the json to user/client.
I've tried using thread pool with the size of 5, but it took forever, but when I tried increasing the size to the number of request, it threw ThreadError: can't create Thread: Resource temporarily unavailable exception.
I don't think I can use background job/worker for this because I should return the result.
So what should I do?
I was thinking that I should wrap the process in 20sec timeout so it doesn't reach rails 30sec limit, and if it's still not finished in 20sec, it will return the unfinished result. It goes like this
result = Queue.new
begin
Timeout::timeout(20) do
elements.each do |element|
pool.process {
response = send_request(element)
result << response
}
end
pool.shutdown
end
rescue Timeout::Error
pool.shutdown
end
result = (Array.new(elements.size {result.pop})).flatten
render json: {
data: result
}
But it's still not working, the process still keep going even after it timeout.

Preventing rapid-fire login attempts with Rack::Attack

We have been reading the Definitive guide to form based website authentication with the intention of preventing rapid-fire login attempts.
One example of this could be:
1 failed attempt = no delay
2 failed attempts = 2 sec delay
3 failed attempts = 4 sec delay
etc
Other methods appear in the guide, but they all require a storage capable of recording previous failed attempts.
Blocklisting is discussed in one of the posts in this issue (appears under the old name blacklisting that was changed in the documentation to blocklisting) as a possible solution.
As per Rack::Attack specifically, one naive example of implementation could be:
Where the login fails:
StorageMechanism.increment("bad-login/#{req.ip")
In the rack-attack.rb:
Rack::Attack.blacklist('bad-logins') { |req|
StorageMechanism.get("bad-login/#{req.ip}")
}
There are two parts here, returning the response if it is blocklisted and check if a previous failed attempt happened (StorageMechanism).
The first part, returning the response, can be handled automatically by the gem. However, I don't see so clear the second part, at least with the de-facto choice for cache backend for the gem and Rails world, Redis.
As far as I know, the expired keys in Redis are automatically removed. That would make impossible to access the information (even if expired), set a new value for the counter and increment accordingly the timeout for the refractory period.
Is there any way to achieve this with Redis and Rack::Attack?
I was thinking that maybe the 'StorageMechanism' has to remain absolutely agnostic in this case and know nothing about Rack::Attack and its storage choices.
Sorry for the delay in getting back to you; it took me a while to dig out my old code relating to this.
As discussed in the comments above, here is a solution using a blacklist, with a findtime
# config/initilizers/rack-attack.rb
class Rack::Attack
(1..6).each do |level|
blocklist("allow2ban login scrapers - level #{level}") do |req|
Allow2Ban.filter(
req.ip,
maxretry: (20 * level),
findtime: (8**level).seconds,
bantime: (8**level).seconds
) do
req.path == '/users/sign_in' && req.post?
end
end
end
end
You may wish to tweak those numbers as desired for your particular application; the figures above are only what I decided as 'sensible' for my particular application - they do not come from any official standard.
One issue with using the above that when developing/testing (e.g. your rspec test suite) the application, you can easily hit the above limits and inadvertently throttle yourself. This can be avoided by adding the following config to the initializer:
safelist('allow from localhost') do |req|
'127.0.0.1' == req.ip || '::1' == req.ip
end
The most common brute-force login attack is a brute-force password attack where an attacker simply tries a large number of emails and passwords to see if any credentials match.
You should mitigate this in the application by use of an account LOCK after a few failed login attempts. (For example, if using devise then there is a built-in Lockable module that you can make use of.)
However, this account-locking approach opens a new attack vector: An attacker can spam the system with login attempts, using valid emails and incorrect passwords, to continuously re-lock all accounts!
This configuration helps mitigate that attack vector, by exponentially limiting the number of sign-in attempts from a given IP.
I also added the following "catch-all" request throttle:
throttle('req/ip', limit: 300, period: 5.minutes, &:ip)
This is primarily to throttle malicious/poorly configured scrapers; to prevent them from hogging all of the app server's CPU.
Note: If you're serving assets through rack, those requests may be counted by rack-attack and this throttle may be activated too quickly. If so, enable the condition to exclude them from tracking.
I also wrote an integration test to ensure that my Rack::Attack configuration was doing its job. There were a few challenges in making this test work, so I'll let the code+comments speak for itself:
class Rack::AttackTest < ActionDispatch::IntegrationTest
setup do
# Prevent subtle timing issues (==> intermittant test failures)
# when the HTTP requests span across multiple seconds
# by FREEZING TIME(!!) for the duration of the test
travel_to(Time.now)
#removed_safelist = Rack::Attack.safelists.delete('allow from localhost')
# Clear the Rack::Attack cache, to prevent test failure when
# running multiple times in quick succession.
#
# First, un-ban localhost, in case it is already banned after a previous test:
(1..6).each do |level|
Rack::Attack::Allow2Ban.reset('127.0.0.1', findtime: (8**level).seconds)
end
# Then, clear the 300-request rate limiter cache:
Rack::Attack.cache.delete("#{Time.now.to_i / 5.minutes}:req/ip:127.0.0.1")
end
teardown do
travel_back # Un-freeze time
Rack::Attack.safelists['allow from localhost'] = #removed_safelist
end
test 'should block access on 20th successive /users/sign_in attempt' do
19.times do |i|
post user_session_url
assert_response :success, "was not even allowed to TRY to login on attempt number #{i + 1}"
end
# For DOS protection: Don't even let the user TRY to login; they're going way too fast.
# Rack::Attack returns 403 for blocklists by default, but this can be reconfigured:
# https://github.com/kickstarter/rack-attack/blob/master/README.md#responses
post user_session_url
assert_response :forbidden, 'login access should be blocked upon 20 successive attempts'
end
end

Optimal way to structure polling external service (RoR)

I have a Rails application that has a Document with the flag available. The document is uploaded to an external server where it is not immediately available (takes time to propogate). What I'd like to do is poll the availability and update the model when available.
I'm looking for the most performant solution for this process (service does not offer callbacks):
Document is uploaded to app
app uploads to external server
app polls url (http://external.server.com/document.pdf) until available
app updates model Document.available = true
I'm stuck on 3. I'm already using sidekiq in my project. Is that an option, or should I use a completely different approach (cron job).
Documents will be uploaded all the time and so it seems relevant to first poll the database/redis to check for Documents which are not available.
See this answer: Making HTTP HEAD request with timeout in Ruby
Basically you set up a HEAD request for the known url and then asynchronously loop until you get a 200 back (with a 5 second delay between iterations, or whatever).
Do this from your controller after the document is uploaded:
Document.delay.poll_for_finished(#document.id)
And then in your document model:
def self.poll_for_finished(document_id)
document = Document.find(document_id)
# make sure the document exists and should be polled for
return unless document.continue_polling?
if document.remote_document_exists?
document.available = true
else
document.poll_attempts += 1 # assumes you care how many times you've checked, could be ignored.
Document.delay_for(5.seconds).poll_for_finished(document.id)
end
document.save
end
def continue_polling?
# this can be more or less sophisticated
return !document.available || document.poll_attempts < 5
end
def remote_document_exists?
Net::HTTP.start('http://external.server.com') do |http|
http.open_timeout = 2
http.read_timeout = 2
return "200" == http.head(document.path).code
end
end
This is still a blocking operation. Opening the Net::HTTP connection will block if the server you're trying to contact is slow or unresponsive. If you're worried about it use Typhoeus. See this answer for details: What is the preferred way of performing non blocking I/O in Ruby?

How to test for asynchronous HTTP requests in ruby using EventMachine

I'm getting messages of a RabbitMQ queue and each message is a URL that I want to make a request to. Now I'm using the AMQP gem to subscribe to the queue and that uses EventMachine, so I'm using the the em-http-request library to make the http requests. According to the documentation here: https://github.com/igrigorik/em-http-request/wiki/Parallel-Requests
The following will issue asynchronous http-requests:
EventMachine.run {
http1 = EventMachine::HttpRequest.new('http://google.com/').get
http2 = EventMachine::HttpRequest.new('http://yahoo.com/').get
http1.callback { }
http2.callback { }
end
So when I subscribe to the RabbitMQ queue I have the following code:
x = 0
EventMachine.run do
connection = AMQP.connect(:host => '127.0.0.1')
channel = AMQP::Channel.new(connection)
channel.prefetch(50)
queue = channel.queue("http.requests")
exchange = channel.direct("")
queue.subscribe do |metadata, payload|
url = payload.inspect
eval "
#http#{x} = EventMachine::HttpRequest.new(url).get
#http#{x}.callback do
puts \"got a response\"
puts #http#{x}.response
end
x = x+1
"
end
end
This dynamically creates new variables and creates new http requests, similar to the way described in the em-http-request documentation. But is there a way to test whether the requests are actually being made asynchronously? Is it possible to write to the console every time a get request is fired off so I can see they are fired off one after the other without waiting for a response?
You can try running tcpdump and analysing the output. If you see the TCP three-way handshakes for the two connections being interleaved then the connections are happening in parallel.
This can't really be part of an automated test though, if that's what you're trying to aim for. I would be happy just to verify that the library does what it says it does once and not make it part of a test suite.
A very simple example, demonstrating exactly what you want:
require 'em-http-request'
EM.run do
# http://catnap.herokuapp.com/3 delays the HTTP response by 3 seconds.
http1 = EventMachine::HttpRequest.new('http://catnap.herokuapp.com/3').get
http1.callback { puts 'callback 1' }
http1
puts 'fired 1'
http2 = EventMachine::HttpRequest.new('https://www.google.com/').get
http2.callback { puts 'callback 2' }
puts 'fired 2'
end
Output (for me):
fired 1
fired 2
callback 2
callback 1
Depending on your internet connection, Heroku and Google, the response to the second HTTP request will likely come in first and you can be sure, the requests are indeed done in parallel.

Resources