Rails and slow third-party APIs

I'm building a Rails app that makes heavy use of third-party APIs. These aren't typical web APIs; they wrap system-level Linux tools, so requests to them take a rather long time (1-5 s).
Example:
I have a Document model and a controller like this:
def index
  @documents = current_user.documents # just simple DB request
end

def create
  @document = Document.new(document_params)
  @document.sid = call_my_slow_api(@document.title)
  @document.save
end
Let's say Alice starts a create request and is waiting for the reply, and at the same time Bob starts an index request. If I have only one worker, that's going to be a problem (Bob will see the index only after Alice gets her reply).
What is the best way to separate out the API-call logic (call_my_slow_api) in Rails?
Thanks.

Background jobs might be the way to go. If you are on a recent version of Rails 4 (4.2+), ActiveJob is there as a common DSL to tie in with any worker service. If not, Resque, Delayed::Job, etc. are the ones you may want to explore.
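For instance, a minimal sketch with ActiveJob (Rails 4.2+): save the record first, then let a job fill in sid so the web worker is freed immediately. DocumentSidJob is an illustrative name, and call_my_slow_api stands for whatever wraps the slow system tool.

# app/jobs/document_sid_job.rb
class DocumentSidJob < ActiveJob::Base
  queue_as :default

  def perform(document_id)
    document = Document.find(document_id)
    # the slow 1-5 s call now happens in the worker process
    document.update(sid: call_my_slow_api(document.title))
  end
end

# documents_controller.rb
def create
  @document = Document.new(document_params)
  if @document.save
    DocumentSidJob.perform_later(@document.id)
    redirect_to @document
  else
    render :new
  end
end

With this in place, Bob's index request is served right away while Alice's slow call runs in the job backend.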

Related

How to organize delayed results processing in microservices communication?

I have the following system: my Rails server issues commands to a Flask server, and the latter responds immediately with status 200. After that, the Flask server runs a background task with some time-consuming function. After a little while it comes up with some results, and it is designed to send that data back to the Rails server via HTTP (see diagram).
Each Flask data portion can affect several Rails models (User, Post, etc.). This leaves me with two questions:
How should I structure my controllers/actions on the Rails side in this case? Currently I'm thinking of one controller, with each of its actions corresponding to one 'delayed' data portion from Python.
Is this a normal way of doing microservices communication, or can I organize it in a different, simpler way?
This sounds like pretty much your standard webhook process. Rails pokes Flask with a GET or POST request and Flask pokes back after a while.
For example, let's say we have reports, and after creating a report we need Flask to verify it:
class ReportsController < ApplicationController
  # POST /reports
  def create
    @report = Report.new(report_params)
    if @report.save
      FlaskClient.new.verify(@report) # this could be delegated to a background job
      redirect_to @report
    else
      render :new
    end
  end

  # PATCH /reports/:id/verify
  def verify
    # process the callback request from Flask
  end
end
class FlaskClient
  include HTTParty
  base_uri 'example.com/api'
  format :json

  def verify(report)
    self.class.post('/somepath', data: { id: report.id, callback_url: "/reports/#{report.id}/verify", ... })
  end
end
Of course the Rails app does not actually know when Flask will respond, or that Flask and the background service are different things. It just sends and responds to HTTP requests. And you definitely don't want Rails to wait around, so save what you have and let the hook update the data later.
If you have to update the UI on the Rails side without the user having to refresh manually, you can use polling or WebSockets in the form of ActionCable.
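For instance, a rough sketch of the verify action pushing the result out over ActionCable once Flask calls back (the verified column and the stream name are assumptions, not from the answer):

# PATCH /reports/:id/verify -- hit by Flask when verification finishes
def verify
  report = Report.find(params[:id])
  report.update(verified: params[:verified])
  # notify any browser subscribed to this report's stream
  ActionCable.server.broadcast("reports_#{report.id}", verified: report.verified)
  head :ok
end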

How to get IDs of models subscribed to via a specific channel

How do I get a list of all ActiveRecord models currently subscribed to via a specific ActionCable Channel?
I'm sorry to give you an answer you do not want, but...:
You don't get a list of all subscribed clients. You shouldn't be able to. If you need this information, you might be experiencing a design flaw.
Why?
The pub/sub paradigm is designed to abstract these details away, allowing for horizontal scaling in a way that has different nodes managing their own subscription lists.
Sure, when you're running a single application on a single process, you might be able to extract this information - but the moment you scale up, using more processes / machines, this information is distributed and isn't available any more.
Example?
For example, when using iodine's pub/sub engine (see the Ruby iodine WebSocket / HTTP server for details):
Each process manages its own client list.
Each process is a "client" in the master / root process.
Each master / root process is a client in a Redis server (assuming Redis is used).
Let's say you run two iodine "dynos" on Heroku, each with 16 workers, then:
Redis sees a maximum of two clients per channel.
Each of the two master processes sees a maximum of 16 clients per channel.
Each process sees only the clients that are connected to that specific process.
As you can see, the information you are asking for isn't available anywhere. The pub/sub implementation is distributed across different machines. Each process / machine only manages the small part of the pub/sub client list.
EDIT (1) - answering updated question
There are three possible approaches to solve this question:
a client-side solution;
a server-side solution; and
a lazy (invalidation) approach.
As a client-side solution, the client could register to a global "server-notifications-channel". When a "re-authenticate" message appears, the client should re-authenticate, initiating the unique token generation on its unique connection.
A server side solution requires the server-side connection to listen to a global "server-notifications-channel". Then the connection object will re-calculate the authentication token and transmit a unique message to the client.
The lazy-invalidation approach is to simply invalidate all tokens. Connected clients will stay connected (until they close the browser, close their machine or exit their app). Clients will have to re-authenticate when establishing a new connection.
Note (added as discussed in the comments):
The only solution that solves the "thundering herd" scenario is the lazy/invalidation solution.
Any other solution will cause a spike in network traffic and CPU consumption since all connected clients will be processing an event at a similar time.
Implementing:
With ActionCable, a client-side solution might be easier to implement. Its design and documentation are very "push" oriented; they often assume a client-side processing approach (see the sketch at the end of this answer).
On iodine, server-side subscriptions simply require a block to be passed along to the client.subscribe method. This creates a client-specific subscription with an event that runs on the server (instead of a message sent to the client).
The lazy-invalidation approach might hurt user experience, depending on the design, since they might have to re-enter credentials.
On the other hand, lazy-invalidation might be the safest, add to the appearance of safety and ease the burden on the servers at the same time.
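A rough sketch of that client-side approach with ActionCable (channel and stream names are illustrative, and this assumes ActionCable rather than iodine):

# app/channels/server_notifications_channel.rb
class ServerNotificationsChannel < ApplicationCable::Channel
  def subscribed
    # every client listens on one global stream
    stream_from "server_notifications"
  end
end

# somewhere on the server, when all tokens must be rotated:
ActionCable.server.broadcast("server_notifications", event: "re-authenticate")

Each connected client receives the message and re-authenticates over its own connection, generating its new unique token there.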
WARNING: Please see @Myst's answer and the associated comments. The approach below isn't recommended when scaling beyond a single server instance.
PATCH REQUIRED FOR DEVELOPMENT AND TEST ENVIRONMENT
module ActionCable
  module SubscriptionAdapter
    class SubscriberMap
      def get_subscribers
        @subscribers
      end
    end
  end
end
CODE TO GET IDs OF MODELS SUBSCRIBED TO
pubsub = ActionCable.server.pubsub
if Rails.env.production?
  channel_with_prefix = pubsub.send(:channel_with_prefix, ApplicationMetaChannel.channel_name)
  channels = pubsub.send(:redis_connection).pubsub('channels', "#{channel_with_prefix}:*")
  subscriptions = channels.map do |channel|
    Base64.decode64(channel.match(/^#{Regexp.escape(channel_with_prefix)}:(.*)$/)[1])
  end
else # development or test environment: requires the ActionCable::SubscriptionAdapter::SubscriberMap patch above
  subscribers = pubsub.send(:subscriber_map).get_subscribers.keys
  subscriptions = []
  subscribers.each do |sid|
    next unless sid.split(':').size === 2
    channel_name, encoded_gid = sid.split(':')
    if channel_name === ApplicationMetaChannel.channel_name
      subscriptions << Base64.decode64(encoded_gid)
    end
  end
end
# the GID URI looks like this: gid://<app-name>/<ActiveRecordName>/<id>
gid_uri_pattern = /^gid:\/\/.*\/#{Regexp.escape(SomeModel.name)}\/(\d+)$/
some_model_ids = subscriptions.map do |subscription|
  subscription.match(gid_uri_pattern)
  # compacting because 'subscriptions' includes all subscriptions made from ApplicationMetaChannel,
  # not just subscriptions to SomeModel records
end.compact.map { |match| match[1] }

Production Rails 3 application can't handle over 200 requests per minute

I have a Rails production application that is down several times per day. This application, in addition to serving its users, is the endpoint for a 3rd party website that sends it updates.
Occasionally, these updates will come flooding in so fast that the requests back up and the application becomes unavailable for long periods of time. It is a legitimate usage which ends up causing a Denial of Service.
The request from the 3rd party is pretty simple:
class NotificationsController < ApplicationController
  def notify
    begin
      notification_xml = request.body.read
      notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
      user = User.find(notification_hash['UserID'])
      user.delay.set_notification(notification_hash)
    rescue Exception => bang
      logger.error bang.backtrace
      unless user.blank?
        alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
        File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
      end
    end
    render nothing: true, status: 200
  end
end
I have two app servers against a very large database. However, when this 3rd-party website really hits us with notification requests (over 200 per minute, up to close to 1,000 per minute), both web servers get completely tied up.
You can also see above that I'm using the .delay call since I'm using Sidekiq. I thought that would help, and it did for a while, but the application can't handle that many requests.
Other than handling the requests in a separate application, which I'm not sure is really possible in my EngineYard installation, is there something I can do to speed up the handling of this request?
If it takes too much to process all those requests, try a different approach.
Create a new model (I will call it Request) with only one field (I'll name it message): the XML sent to you by that 3rd-party app.
Rewrite your notify action to be very simple and fast:
def notify
  Request.create(message: request.body.read)
  render nothing: true, status: 200
end
Create a new action, let's say process_requests like this:
def process_requests
  Request.order('id ASC').find_in_batches(batch_size: 100) do |group|
    group.each do |request|
      process_request(request.message)
      request.destroy
    end
  end
end
def process_request(notification_xml)
  begin
    notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  rescue Exception => bang
    logger.error bang.backtrace
    unless user.blank?
      alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
      File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
    end
  end
end
Create a cron job that calls process_requests at a defined interval (every few minutes).
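One possible way to wire that up (the whenever gem and the RequestProcessor name are assumptions; any cron mechanism and any home for process_requests will do):

# lib/tasks/requests.rake
namespace :requests do
  desc "Process queued 3rd-party notification requests"
  task process: :environment do
    RequestProcessor.new.process_requests
  end
end

# config/schedule.rb (whenever gem)
every 5.minutes do
  rake "requests:process"
end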
I never used Sidekiq, so I preferred to use find_in_batches (a batch of 100 records, just for the sake of the example).
The notify action shouldn't run for more than a few milliseconds (inserts are pretty fast), so this should be able to handle the incoming traffic in your critical moments.
If you try something similar and it helps your servers reduce the load in critical moments, let me know :D
If it turns out to be useful and you add background processing here too, please post that for others to see.
If you're monitoring this app with New Relic/AppNet/something else, checking your reports might give you an idea of some low-hanging fruit. We've only got a small picture of the application here; it's possible that enhancements elsewhere in the app might help as well.
With that said, here are a few ideas which can be applied separately or together:
Do Less Work on Intake
Right now you're doing a bunch of XML processing—which is expensive—before you pass the job off to Sidekiq. That's a choke point, and by running in the app process it's tying up your application.
If your Redis instance has enough memory, consider refactoring notify so the whole XML payload gets passed off to Sidekiq. You're already always returning a 200 response to the API consumer, so there's no impact on your existing external API.
Your worker instances can then process the XML payloads at their own pace without impacting the application.
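A hedged sketch of that refactoring with a plain Sidekiq worker (NotificationWorker is an illustrative name):

# app/workers/notification_worker.rb
class NotificationWorker
  include Sidekiq::Worker

  def perform(notification_xml)
    notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  end
end

# the controller now does almost nothing before answering
def notify
  NotificationWorker.perform_async(request.body.read)
  render nothing: true, status: 200
end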
Implement API Throttling
The third-party site is hammering you at a tremendous rate not normally permitted even by huge sites. That's a problem.
If you can't get them to address it on their end, play like the big dogs: Implement request throttling on your end. You likely have some ability to do this at the Rack level on EngineYard (though a quick search of their docs didn't immediately yield anything), but even doing it at the application level is likely to improve things.
There's a previous Stack Overflow discussion that may offer a couple options.
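One concrete way to do the Rack-level throttling mentioned above is the rack-attack gem (my suggestion, not something from the original answer); the path and limits are assumptions you would tune to your traffic:

# config/initializers/rack_attack.rb
# (older Rails/rack-attack versions also need: config.middleware.use Rack::Attack)
class Rack::Attack
  # allow at most 120 notification requests per minute per client IP;
  # anything above that gets a 429 instead of tying up an app worker
  throttle("notifications/ip", limit: 120, period: 1.minute) do |req|
    req.ip if req.post? && req.path == "/notifications/notify"
  end
end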
Proxy the API
A few services exist that will proxy your API for you, allowing you to easily implement features like rate limiting, throttling, and quotas that might otherwise be difficult to add.
The one I'm familiar with off the top of my head is Azure's API Management service. If this isn't a revenue-generating project, the cost might be prohibitive. ($49/month postpaid, though it would be cheaper prepaid, or could even be free if you qualify for BizSpark.)
Farm the API Out
The more advanced cousin of API proxies, "API as a Service" actually lets you run your API on its own VM instance—as well as offering the features a proxy does. If your database isn't a choke point, this can be a way to spread the load out and help prevent machine clients from affecting the experience of human clients.
The ten thousand pound gorilla is Apigee, though there are a variety of other established and startup options.
There is a catch: Most of these services are built around Node.js. If your Rails app is already leaning toward service-oriented architecture, and if you know and like JavaScript, this may not be an issue for you. Otherwise, the need to build an interface between services and maintain a service in a second language may be a bridge too far.

Sending data from an analytics engine to a Rails server

I have an analytics engine which periodically packages a bunch of stats in JSON format. I want to send these packages to a Rails server. Upon a package arriving, the Rails server should examine it, generate a model instance out of it (for historical purposes), and then display the contents to the user. I've thought of two approaches.
1) Have a little app residing on the same host as the Rails server to be listening for these packages (using ZeroMQ). Upon receiving a package, the app would invoke a Rails action through CURL, passing on the package as a parameter. My concern with this approach is that my Rails server checks that only signed-in users can access actions which affect models. By creating an action accessible to this listening app (and therefore other entities), am I exposing myself to a major security flaw?
2) The second approach is to simply have the listening app dump the package into a special database table. The Rails server will then periodically check this table for new packages. Upon detecting one or more, it will process them and remove them from the table.
This is the first time I'm doing something like this, so if you have techniques or experiences you can share for better solutions, I'd love to learn.
Thank you.
You can restrict access to a certain call by constraining the IP address that is allowed to make the request in routes.rb:
post "/analytics" => "analytics#create", :constraints => {:ip => /127.0.0.1/}
If you want the users to see updates, you can use polling to refresh the page every minute or so.
1) Yes, you are exposing a major security hole unless:
Your ZeroMQ app provides the data needed to do authentication and authorization on the Rails side
Your rails app is configured to listen only on the 127.0.0.1 interface and is thus not accessible from the outside
Like Benjamin suggests, you restrict specific routes to certain IP
2) This approach looks a lot like what delayed_job does. You might want to take a look here: https://github.com/collectiveidea/delayed_job and use a rake task to add a new job.
In short, your listening app will call a rake task that will add a custom delayed_job when receiving a packet. Then let delayed_job handle the load. You benefit from delayed_job goodness (different queues, scaling, ...). The hard part is getting the result.
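For the rake task / custom job side of this, a rough sketch (ProcessAnalyticsPackage is an illustrative name; delayed_job only needs an object that responds to perform):

# lib/tasks/analytics.rake -- the ZeroMQ listener shells out to this with the raw JSON
namespace :analytics do
  desc "Enqueue a delayed job for an incoming analytics package"
  task :enqueue, [:payload] => :environment do |_t, args|
    Delayed::Job.enqueue ProcessAnalyticsPackage.new(args[:payload])
  end
end

# app/jobs/process_analytics_package.rb
ProcessAnalyticsPackage = Struct.new(:payload) do
  def perform
    stats = JSON.parse(payload)
    # build the historical model instance and any derived records here
  end
end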
One idea would be to associate a unique ID with each job, and have the delayed_job task output the result in a data store which associates the job ID with the result. This data store can be a simple relational table
+----+--------+
| ID | Result |
+----+--------+
or a memcache/Redis/whatever instance. You just need to poll that data store looking for the result associated with the job ID. And delete everything when you are done displaying it to the user.
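A rough sketch of the polling side (model, column, and route names are illustrative):

# app/controllers/job_results_controller.rb
class JobResultsController < ApplicationController
  def show
    result = JobResult.where(job_id: params[:id]).first
    if result
      render json: { status: "done", result: result.result }
      result.destroy # clean up once the result has been delivered
    else
      render json: { status: "pending" }
    end
  end
end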
3) Why don't you directly POST the data to the Rails server?
Following Benjamin's lead, I implemented a filter for this particular action.
def verify_ip
  @ips = ['127.0.0.1']
  unless @ips.include?(request.remote_ip)
    redirect_to root_url
  end
end
The listening app on the localhost now invokes the action, passing the JSON package received from the analytics engine as a param. Thank you.

Should tweets be done in the background?

On a high-traffic Twitter app site, where the app sends tweets via the user's OAuth credentials, should the tweets be sent in the background via a background worker (Resque, Delayed Job, etc.), or should the web process handle it?
It really depends on your use case. Twitter itself, I think, sends an AJAX request to the API. You could do the same if it makes sense in your interface, but it does mean that you're using a web process to do this. One of the benefits is that you can verify that the request was successful before returning a response to the user. This is much easier than a scenario where you queue something in the background, it fails, and you want to alert the user (e.g. through a "real-time" AJAX/socket-based message system or a flash notice on another request).
If you aren't worried about showing the Tweets (e.g. your application is sending as part of a larger action), then doing it in the background is definitely the way to go.
Resque is great and jobs are really lightweight, so you could put together a quick integration to process these in the background.
# app/jobs/send_tweet.rb
class SendTweet
  @queue = :tweets

  def self.perform(user_id, content)
    user = User.find(user_id)
    # send Tweet
  end
end

# app/controllers/tweet_controller.rb
def create
  # assuming some things here, like validation and a `current_user` method
  Resque.enqueue(SendTweet, current_user.id, params[:tweet][:message])
  redirect_to :index
end
