Sending Thousands of Requests at the Same Time with Ruby on Rails? - ruby-on-rails

I need to develop an endpoint in Rails that will send (possibly) hundreds or thousands of requests, process them, and then return/render the JSON to the user/client.
I've tried using a thread pool with a size of 5, but it took forever; when I increased the size to the number of requests, it threw a ThreadError: can't create Thread: Resource temporarily unavailable exception.
I don't think I can use a background job/worker for this because I need to return the result.
So what should I do?
I was thinking I should wrap the process in a 20-second timeout so it doesn't hit Rails' 30-second limit, and if it's still not finished after 20 seconds, return the partial result. It goes like this:
result = Queue.new
begin
  Timeout::timeout(20) do
    elements.each do |element|
      pool.process {
        response = send_request(element)
        result << response
      }
    end
    pool.shutdown
  end
rescue Timeout::Error
  pool.shutdown
end
result = Array.new(elements.size) { result.pop }.flatten
render json: {
  data: result
}
But it's still not working: the requests keep going even after the timeout fires.
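That behavior is expected: Timeout::timeout only interrupts the block it wraps, not the pool's worker threads, so in-flight requests keep running. Below is a minimal sketch of the "return whatever finished within 20 seconds" idea using plain threads and a Queue (the pool library in the question isn't named, so this is not its API; elements and send_request are taken from the question):

require 'timeout'

WORKERS = 5                                # hypothetical pool size
jobs    = Queue.new
results = Queue.new

elements.each { |element| jobs << element }
WORKERS.times { jobs << :stop }            # poison pills so workers exit cleanly

workers = Array.new(WORKERS) do
  Thread.new do
    while (element = jobs.pop) != :stop
      results << send_request(element)     # send_request as in the question
    end
  end
end

deadline = Time.now + 20
partial  = []
elements.size.times do
  remaining = deadline - Time.now
  break if remaining <= 0
  begin
    partial << Timeout.timeout(remaining) { results.pop }
  rescue Timeout::Error
    break                                  # keep whatever has arrived so far
  end
end
workers.each(&:kill)                       # stop any requests still in flight

render json: { data: partial.flatten }

The key difference from the original snippet is that the deadline is enforced on the collection side (waiting on the results queue), and the workers are explicitly killed once the deadline passes.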

Related

Rails - multiple threads to avoid the Slack 3 second API response rule

I am working with the Slack API. My script does a bunch of external processing and in some cases it can take around 3-6 seconds. What is happening is that the Slack API expects a 200 response within 3 seconds, and because my function is not finished within 3 seconds, it retries, and I end up posting the same automated responses 2-3 times.
I confirmed this by commenting out all the functions and I had no issue; it posted the responses to Slack fine. I then added sleep 10 and it posted the same responses 3 times, so the only thing different was that it took longer.
From what I read, I need to have threaded responses: first respond to the Slack API in one thread, then go about processing my functions.
Here is what I tried:
def events
  Thread.new do
    json = {
      "text": "Here is your 200 response immediately slack",
    }
    render(json: json)
  end
  puts "--------------------------------Json response started----------------------"
  sleep 30
  puts "--------------------------------Json response completed----------------------"
  puts "this is a successful response"
end
When I tested it, the same issue happened, so I tried an online API tester: it hits the page, waits 30 seconds, and then returns the 200 response. But I need it to respond immediately with the 200, THEN process the rest, otherwise I will get duplicates.
Am I using threads properly, or is there another way to get around this Slack API 3-second response limit? I am new to both Rails and the Slack API so I'm a bit lost here.
Appreciate the eyes :)
I would recommend using ActiveJob to run the code in the background if you don't need to use the result of the code in the response. First, create an ActiveJob job by running:
bin/rails generate job do_stuff
And then open up the file created in app/jobs/do_stuff_job.rb and edit the #perform method to include your code (so the puts statements and sleep 30 in your example). Finally, from the controller action you can call DoStuffJob.perform_later and your job will run in the background! Your final controller action will look something like this:
def events
  DoStuffJob.perform_later # this schedules the job to be done later, in
                           # the background, so it will return immediately
                           # and continue to the next line.
  json = {
    "text": "Here is your 200 response immediately slack",
  }
  render(json: json)
end
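For reference, the generated job class could end up looking something like this once the slow code from the question is moved into #perform (a sketch, assuming the default ApplicationJob base class from the generator):

# app/jobs/do_stuff_job.rb
class DoStuffJob < ApplicationJob
  queue_as :default

  def perform
    # The slow work from the question, moved out of the request cycle.
    puts "--------------------------------Json response started----------------------"
    sleep 30
    puts "--------------------------------Json response completed----------------------"
    puts "this is a successful response"
  end
end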
As an aside, I'd highly recommend never using Thread.new in Rails. It can create some really confusing behavior, especially in tests, for a number of reasons, but usually because of how it interacts with open connections and specifically ActiveRecord.

How to create an async action in Ruby on Rails

I have a page that needs parameters received via a request to a third-party service. Unfortunately, the request takes a long time and the server fails with a 504 error.
def show
  start_time = Time.now
  @project = Project.find(params[:id])
  file = File.new(@project.rvt_schema, 'rb')
  rvt_params = ForgeHandler.instance.get_urn_token_params(file, "#{@project.id.to_s}.rvt")
  @urn = rvt_params[:urn]
  @token = rvt_params[:token]
  end_time = Time.now
end
Most of the time inside the method is spent in this request:
# Translate previously uploaded file to SVF format
def translate_to_svf(object_id, access_token)
  base_64_urn = Base64.strict_encode64(object_id)
  response = RestClient.post("#{API_URL}/modelderivative/v2/designdata/job",
    {
      input: {
        urn: base_64_urn
      },
      output: {
        formats: [
          {
            type: "svf",
            views: [
              "3d"
            ]
          }
        ]
      }
    }.to_json,
    { Authorization: "Bearer #{access_token}", content_type: 'application/json' })
  return response
end
Its status is then polled in a loop by another method:
def verify_job_complete(base_64_urn, access_token)
  is_complete = false
  while !is_complete
    response = RestClient.get("#{API_URL}/modelderivative/v2/designdata/#{base_64_urn}/manifest",
      { Authorization: "Bearer #{access_token}" })
    json = JSON.parse(response.body)
    if json["progress"] == "complete"
      is_complete = true
      puts("***** Finished translating your file to SVF - status: #{json['status']}, progress: #{json['progress']} ")
    else
      puts("***** Haven't finished translating your file to SVF - status: #{json['status']}, progress: #{json['progress']} ")
      sleep 5
    end
  end
end
I would like to implement asynchronous parameter loading: kick off the remote request from the controller, but let the data finish loading after the controller has returned. Tell me how best to implement this.
Or another way that would get rid of the "Gateway Timeout" error.
While this might be more of a question for the ruby-on-rails community, let me answer from the Autodesk Forge standpoint:
First of all, you should never wait for the Model Derivative job to complete when handling a request to your server. If the design file is complex enough, the translation could take up to hours, so this should definitely be handled asynchronously.
One option is to poll the status of the translation by requesting "derivative manifest" using the GET :urn/manifest endpoint.
Another option is to setup a Forge Webhook to get notified when the extraction.finished event is triggered.
It's probably easier to offload the asynchronous work to a worker and save a reference for the user that needs to know about it. If you couple it with something like StimulusReflex you can render the result once it's finished. Another option might be the render_async gem.
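As a rough sketch of that approach: the controller only enqueues a job that performs the slow Forge call and stores the result on the project. The job name and the urn/token columns on Project below are assumptions for illustration, not part of the question's code:

# app/jobs/translate_project_job.rb  (hypothetical job and column names)
class TranslateProjectJob < ApplicationJob
  queue_as :default

  def perform(project_id)
    project = Project.find(project_id)
    file = File.new(project.rvt_schema, 'rb')
    rvt_params = ForgeHandler.instance.get_urn_token_params(file, "#{project.id}.rvt")
    # Persist the URN/token so a later request, a webhook handler, or a
    # StimulusReflex update can render the viewer once translation is done.
    project.update!(urn: rvt_params[:urn], token: rvt_params[:token])
  end
end

# In the controller, #show would only enqueue the work and render immediately:
#   TranslateProjectJob.perform_later(@project.id)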

Why does my Net::HTTP.post_form time out?

In my Rails app controller I am posting to the API of the app on the same machine. I have built this out to handle posting the data to the URL:
url = "http://172.16.155.165:3000/api/jobs"
params = {
  :input => "original/video.h264",
  :output => "new/video.mp4",
  :preset => 'h264'
}
jobResults = Net::HTTP.post_form(URI.parse(url), params)
This works great when I run this code through rails console but when I use it in my controller it gives me this error after loading for a minute or so:
Timeout::Error in SeminarsController#create
Timeout::Error
Once the timeout happens the data is actually posted and the API does what it should. It's like it hangs until it times out and then posts the data. The controller never gets beyond this step, though. It should write the response body to a file with jobResults.body, which would work fine if it didn't time out. If I run this in rails console it outputs the response immediately; the API never takes a whole minute to respond.
Am I doing something to cause this to happen? How can I make it work right?
edit:
This is the code for create in app/controllers/api/jobs_controller.rb:
def create
  job = Job.from_api(params, :callback_url => lambda { |job| api_job_url(job) })
  if job.valid?
    response.headers["X-State-Changes-Location"] = api_state_changes_url(job)
    response.headers["X-Notifications-Location"] = api_notifications_url(job)
    respond_with job, :location => api_job_url(job) do |format|
      format.html { redirect_to jobs_path }
    end
  else
    respond_with job do |format|
      format.html { @job = job; render "/jobs/new" }
    end
  end
end
Yes. Ideally you should move the long-running process (and yes, this is a long-running process) into a background job. Remember that when many users start updating videos, this process will slow down for many reasons (bandwidth, API acceptance rate, etc.). Rack::Timeout always fires when a request passes the threshold; it is designed to abort requests that are taking too long to respond. And it is not raised in the console.
How can I make it work right?
Move it to a background job. Or you can explicitly increase the Rack::Timeout interval by doing something like this:
# config/initializers/timeout.rb
Rack::Timeout.timeout = 30 # seconds
But I suggest not doing this. rack-timeout helps with debugging; it is mainly used on Heroku along with New Relic.
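For the "move it to a background job" suggestion, a minimal sketch with ActiveJob might look like the following; the job name, the Seminar lookup, and the encoding_response column are assumptions for illustration only:

# app/jobs/submit_encoding_job.rb  (hypothetical names)
require 'net/http'

class SubmitEncodingJob < ApplicationJob
  queue_as :default

  def perform(seminar_id)
    url = "http://172.16.155.165:3000/api/jobs"
    params = { :input  => "original/video.h264",
               :output => "new/video.mp4",
               :preset => 'h264' }
    job_results = Net::HTTP.post_form(URI.parse(url), params)
    # Store the API response instead of blocking the web request on it.
    Seminar.find(seminar_id).update!(encoding_response: job_results.body)
  end
end

The controller then calls SubmitEncodingJob.perform_later(seminar.id) and returns immediately, so the request never sits inside Rack::Timeout's window.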

Multiple curl calls lead to "Too many open files" error on Windows

I have an external REST API which handles storing data in "Data Store".
On a file upload, there is a Ruby library which calls this API and passes it the data array, which then gets stored in the database by the external API.
I try to pass small chunks of the array to the API so as to limit the POST body content length in any one curl call.
The library call looks like this:
def add_data(table_name, table_data)
  url = "#{ExternalAPI::URL}/addData"
  m_curl = Curl::Multi.new
  begin
    chunks = table_data.each_slice(ExternalAPI::BATCH_SIZE).to_a
    chunks.each do |data_chunk|
      data = {
        "tableName" => table_name,
        "data" => data_chunk
      }.to_json
      curl = Curl::Easy.new(url)
      curl.headers = {}
      curl.headers['Content-type'] = 'text/plain'
      curl.timeout = 300
      curl.post_body = data
      m_curl.add(curl)
    end
    m_curl.perform
    true
  rescue Exception => e
    puts "Curl Failed #{e.message}"
    puts "#{e.backtrace}"
    Rails.logger.error "Curl Failed #{e.message}"
    return false
  end
end
This causes a "too many open connections" error in WEBrick in development mode.
I assumed Curl::Multi recycles the connections, but I'm not sure whether that happens internally.
I also tried creating a new curl connection inside the loop and closing it at the end of the loop (I know it's inefficient), but it still led to the same error.
Can anyone please shed some light on this?
I think Curl::Multi is going to try to execute all of your connections simultaneously. You probably need to batch them into smaller groups.
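A rough sketch of that batching idea, reusing the names from the question; MAX_OPEN is a hypothetical cap on how many connections are open at once, not something from the original code:

require 'curb'
require 'json'

MAX_OPEN = 10  # hypothetical limit on simultaneously open connections

def add_data_in_batches(table_name, table_data)
  chunks = table_data.each_slice(ExternalAPI::BATCH_SIZE).to_a
  chunks.each_slice(MAX_OPEN) do |group|
    m_curl = Curl::Multi.new
    group.each do |data_chunk|
      curl = Curl::Easy.new("#{ExternalAPI::URL}/addData")
      curl.headers['Content-type'] = 'text/plain'
      curl.timeout = 300
      curl.post_body = { "tableName" => table_name, "data" => data_chunk }.to_json
      m_curl.add(curl)
    end
    m_curl.perform  # blocks until every request in this group has finished
  end
  true
end

Each group gets its own Curl::Multi, so at most MAX_OPEN handles exist at a time instead of one per chunk.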

Queuing API calls to fit a rate limit

I'm using the Full Contact API, but it has a rate limit of 300 calls/minute. Currently it is set up so that it makes an API call for each email when uploading a CSV file of emails. I want to queue it so that once it hits the rate limit or has made 300 calls, it waits for 1 minute and then proceeds. Then I will put delayed_job on it. How can I do that? A quick fix is to use
sleep 60
but how do I detect that it has already made 300 calls, and then make it sleep or queue the next set?
def self.import(file)
  CSV.foreach(file.path, headers: true) do |row|
    hashy = row.to_hash
    email = hashy["email"]
    begin
      Contact.create!(email: email, contact_hash: FullContact.person(email: email).to_json)
    rescue FullContact::NotFound
      Contact.create!(email: email, contact_hash: "Not Found")
    end
  end
end
There are several issues to think about here - is there going to be a single process using your API key at any one time, or is it possible that multiple processes would be running at once? If you have multiple delayed_job workers, I think the latter is likely. I haven't used delayed_job enough to give you a good solution to that, but my feeling is you would be restricted to a single worker.
I am currently working on a similar problem with an API that has a restriction of 1 request every 0.5 seconds, with a maximum of 1000 per day. I haven't worked out how I want to track the per-day usage yet, but I've handled the per-second restriction using threads. If you can frame your restriction as "1 request every 0.2 seconds", that might free you up from having to track it on a minute-by-minute basis (though you still have the issue of how to keep track of multiple workers).
The basic idea is that I have a request method that splits a single request into a queue of request parameters (based on the maximum number of objects allowed per request by the API), and then another method iterates over that queue and calls a block which sends the actual request to the remote server. Something like this:
def make_multiple_requests(queue, &block)
  result = []
  queue.each do |request|
    timer = Thread.new { sleep REQUEST_INTERVAL }
    execution = Thread.new { result << yield(request) }
    [timer, execution].each(&:join)
  end
  result
end
To use it:
make_multiple_requests(queue) do |request|
  your_request_method_goes_here(request)
end
The main benefit here is that if a request takes longer than the allowed interval, you don't have to wait around for the sleep to finish, and you can start your next request right away. It just guarantees that the next request won't start until at least the interval has passed. I've noticed that even though the interval is set correctly, I occasionally get an 'over-quota' response from the API. In those cases, the request is retried after the appropriate interval has passed.
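Applied to the CSV import from the question, and assuming make_multiple_requests is available as a class method alongside import (an assumption for this sketch), it might look like this; REQUEST_INTERVAL = 0.2 corresponds to 300 calls/minute:

require 'csv'

REQUEST_INTERVAL = 0.2  # 60 seconds / 300 calls

def self.import(file)
  # Build the queue of work (one email per request), then rate-limit the calls.
  emails = CSV.foreach(file.path, headers: true).map { |row| row.to_hash["email"] }
  make_multiple_requests(emails) do |email|
    begin
      Contact.create!(email: email, contact_hash: FullContact.person(email: email).to_json)
    rescue FullContact::NotFound
      Contact.create!(email: email, contact_hash: "Not Found")
    end
  end
end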
