bandwidth management with rails?

I was wondering if anyone knew of a way to manage bandwidth within a Rails application that isn't dependent on the web server. For example, each account has a bandwidth limit, and inbound and outbound traffic subtracts from the monthly allowance.

One option would be to add an after_filter in application.rb (so that it applies to all actions) and do the following:
def store_bandwidth_usage
  response_size = response.body.size
  # Assuming the User model has a bandwidth_usage attribute
  @current_user.increment!(:bandwidth_usage, response_size)
end
Of course then you would need a before_filter which checked that a user had not gone over their allocated bandwidth otherwise they should be denied access.
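A minimal sketch of that check, assuming the same @current_user and bandwidth_usage attribute plus a hypothetical bandwidth_limit attribute on the account:
before_filter :check_bandwidth_allowance

def check_bandwidth_allowance
  # bandwidth_limit is a hypothetical per-account column
  if @current_user && @current_user.bandwidth_usage >= @current_user.bandwidth_limit
    render text: 'Bandwidth allowance exceeded', status: 403
  end
end
Rendering from a before_filter halts the chain, so the action itself never runs for users who are over their limit.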
Keep in mind that this only counts requests that hit the Rails server; any requests served by a front-end server (e.g. images) will not be included.


How to get IDs of models subscribed to via a specific channel

How do I get a list of all ActiveRecord models currently subscribed to via a specific ActionCable Channel?
I'm sorry to give you an answer you do not want, but...:
You don't get a list of all subscribed clients. You shouldn't be able to. If you need this information, you might be experiencing a design flaw.
Why?
The pub/sub paradigm is designed to abstract these details away, allowing for horizontal scaling in a way that has different nodes managing their own subscription lists.
Sure, when you're running a single application on a single process, you might be able to extract this information - but the moment you scale up, using more processes / machines, this information is distributed and isn't available any more.
Example?
For example, when using iodine's pub/sub engine (see the Ruby iodine WebSocket / HTTP server for details):
Each process manages its own client list.
Each process is a "client" in the master / root process.
Each master / root process is a client in a Redis server (assuming Redis is used).
Let's say you run two iodine "dynos" on Heroku, each with 16 workers, then:
Redis sees a maximum of two clients per channel.
Each of the two master processes sees a maximum of 16 clients per channel.
Each process sees only the clients that are connected to that specific process.
As you can see, the information you are asking for isn't available in any single place. The pub/sub implementation is distributed across different machines, and each process / machine only manages its small part of the pub/sub client list.
EDIT (1) - answering updated question
There are three possible approaches to solve this question:
a client-side solution;
a server-side solution; and
a lazy (invalidation) approach.
As a client-side solution, the client could register to a global "server-notifications-channel". When a "re-authenticate" message appears, the client should re-authenticate, initiating the unique token generation on its unique connection.
A server-side solution requires the server-side connection to listen to a global "server-notifications-channel". Then the connection object will re-calculate the authentication token and transmit a unique message to the client.
The lazy-invalidation approach is to simply invalidate all tokens. Connected clients will stay connected (until they close the browser, close their machine or exit their app). Clients will have to re-authenticate when establishing a new connection.
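With ActionCable, the broadcast that triggers either of the first two approaches could be as small as this (a sketch; the channel name and payload are assumptions, and clients are presumed to stream_from that channel):
# Ask every subscriber on the global channel to re-authenticate
ActionCable.server.broadcast('server-notifications-channel', { event: 're-authenticate' })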
Note (added as discussed in the comments):
The only solution that solves the "thundering herd" scenario is the lazy/invalidation solution.
Any other solution will cause a spike in network traffic and CPU consumption since all connected clients will be processing an event at a similar time.
Implementing:
With ActionCable, a client-side solution might be easier to implement. Its design and documentation are very "push" oriented, and they often assume a client-side processing approach.
On iodine, server-side subscriptions simply require a block to be passed along to the client.subscribe method. This creates a client-specific subscription with an event that runs on the server (instead of a message sent to the client).
The lazy-invalidation approach might hurt the user experience, depending on the design, since users might have to re-enter their credentials.
On the other hand, lazy-invalidation might be the safest, add to the appearance of safety and ease the burden on the servers at the same time.
WARNING: Please see @Myst's answer and the associated comments. The answer below isn't recommended when scaling beyond a single server instance.
PATCH REQUIRED FOR DEVELOPMENT AND TEST ENVIRONMENT
module ActionCable
  module SubscriptionAdapter
    class SubscriberMap
      def get_subscribers
        @subscribers
      end
    end
  end
end
CODE TO GET IDs OF MODELS SUBSCRIBED TO
pubsub = ActionCable.server.pubsub
if Rails.env.production?
  channel_with_prefix = pubsub.send(:channel_with_prefix, ApplicationMetaChannel.channel_name)
  channels = pubsub.send(:redis_connection).pubsub('channels', "#{channel_with_prefix}:*")
  subscriptions = channels.map do |channel|
    Base64.decode64(channel.match(/^#{Regexp.escape(channel_with_prefix)}:(.*)$/)[1])
  end
else # DEVELOPMENT or TEST environment: requires patching ActionCable::SubscriptionAdapter::SubscriberMap
  subscribers = pubsub.send(:subscriber_map).get_subscribers.keys
  subscriptions = []
  subscribers.each do |sid|
    next unless sid.split(':').size == 2
    channel_name, encoded_gid = sid.split(':')
    if channel_name == ApplicationMetaChannel.channel_name
      subscriptions << Base64.decode64(encoded_gid)
    end
  end
end
# The GID URI looks like this: gid://<app-name>/<ActiveRecordName>/<id>
gid_uri_pattern = /^gid:\/\/.*\/#{Regexp.escape(SomeModel.name)}\/(\d+)$/
some_model_ids = subscriptions.map do |subscription|
  # Compact afterwards because 'subscriptions' includes all subscriptions made from
  # ApplicationMetaChannel, not just subscriptions to SomeModel records
  subscription.match(gid_uri_pattern)
end.compact.map { |match| match[1] }

Why is Rails sharing code between user sessions?

When a user tries to sign into our Rails app, I contact a 3rd-party ICAM server that returns some information about the user if he exists in the ICAM server. I get a hash back with the user name, email, etc. (Our environment is configured in a way that the ICAM server can detect the identity of the person who is attempting to sign in based on their workstation credentials.)
We do all of this work in a custom gem. During the login process, I try to cache the info the ICAM server returns so I don't have to talk to the ICAM server again. Naively, I had some code that basically did:
module Foo
  def self.store_icam_data(data)
    @icam_data = data
  end

  def self.icam_data
    @icam_data || {}
  end
end
I just discovered a problem when two users log into the system. When User A logs in, @icam_data is set with his info. When User B logs in, @icam_data is set with his info. The next time User A makes a request, @icam_data has User B's info inside it instead of User A's!
I wasn't expecting the variable inside this module to be shared between threads/sessions like it is. It effectively makes all current users of the system become the last user who signs in... a pretty gnarly bug.
Can someone explain why this @icam_data variable is getting shared across sessions? I was expecting the data/code to be more isolated than it apparently is.
There are only two ways you can share data between requests: your database (RDBMS, Redis, etc.) and the session object (inside controllers). Any other data that changes and survives the end of a request is a side effect that should be avoided.
Your module-level instance variables are saved into a memory (RAM) region that belongs to a particular app server process (e.g. a Unicorn worker process). A single process naturally serves many requests, because it's inefficient to kill and restart Rails on each request.
So it's not "Rails sharing code"; it's the web application server sharing its memory region amongst all the requests it serves.
If you want to bind small amount of data to current user, use session:
# save
session[:icam_data] = MyICAMModule.get_icam_data
# restore
MyICAMModule.set_icam_data(session[:icam_data])
More info on session is available in Action Controller Overview.
If you have a large amount of data, use the database (or a per-user cache, as sketched below).
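For the caching goal from the question, a per-user cache key sidesteps the shared-state bug entirely. A sketch, where the method names, key format, and expiry are assumptions:
# Cache the ICAM hash keyed by user, not in shared module state
def store_icam_data(user_id, data)
  Rails.cache.write("icam_data/#{user_id}", data, expires_in: 1.hour)
end

# Works from any server process, since the cache is shared
def icam_data(user_id)
  Rails.cache.read("icam_data/#{user_id}") || {}
end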

Production Rails 3 application can't handle over 200 requests per minute

I have a Rails production application that is down several times per day. This application, in addition to serving its users, is the endpoint for a 3rd party website that sends it updates.
Occasionally, these updates will come flooding in so fast that the requests back up and the application becomes unavailable for long periods of time. It is a legitimate usage which ends up causing a Denial of Service.
The request from the 3rd party is pretty simple:
class NotificationsController < ApplicationController
  def notify
    begin
      notification_xml = request.body.read
      notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
      user = User.find(notification_hash['UserID'])
      user.delay.set_notification(notification_hash)
    rescue Exception => bang
      logger.error bang.backtrace
      unless user.blank?
        alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
        File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
      end
    end
    render nothing: true, status: 200
  end
end
I have two app servers against a very large database. However, when this 3rd-party website really hits us with notification requests, from over 200 per minute up to close to 1,000 per minute, both web servers get completely tied up.
You can also see above that I'm using the .delay call since I'm using Sidekiq. I thought that would help, and it did for a while, but the application can't handle that many requests.
Other than handling the requests in a separate application, which I'm not sure is really possible in my EngineYard installation, is there something I can do to speed up the handling of this request?
If it takes too long to process all those requests, try a different approach.
Create a new model (I will call it Request) with only one field (I'll name it message) - the XML sent to you by that 3rd-party app.
Rewrite your notify action to be very simple and fast:
def notify
  Request.create(message: request.body.read)
  render nothing: true, status: 200
end
Create a new action, let's say process_requests like this:
def process_requests
  Request.order('id ASC').find_in_batches(batch_size: 100) do |group|
    group.each do |req|
      process_request(req)
      req.destroy
    end
  end
end
def process_request(req)
  notification_xml = req.message
  begin
    notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  rescue Exception => bang
    logger.error bang.backtrace
    unless user.blank?
      alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
      File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
    end
  end
end
Create a cron job that calls process_requests at a defined interval (every few minutes); one way to wire that up is sketched below.
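A sketch using the whenever gem, where RequestsProcessor is a hypothetical plain class you would extract process_requests into so the runner can call it:
# config/schedule.rb (whenever gem)
every 5.minutes do
  runner "RequestsProcessor.new.process_requests"
end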
I never used Sidekiq, so I preferred to use find_in_batches (I used a batch of 100 records just for the sake of example).
The notify action shouldn't run for more than a few milliseconds (inserts are pretty fast), so this should be able to handle the incoming traffic in your critical moments.
If you try something similar and it helps your servers reduce the load in critical moments, let me know :D
If this proves useful and you add background processing here too, please post it for others to see.
If you're monitoring this app with New Relic/AppNet/something else, checking your reports might give you an idea of some low-hanging fruit. We've only got a small picture of the application here; it's possible that enhancements elsewhere in the app might help as well.
With that said, here are a few ideas which can be applied separately or together:
Do Less Work on Intake
Right now you're doing a bunch of XML processing—which is expensive—before you pass the job off to Sidekiq. That's a choke point, and by running in the app process it's tying up your application.
If your Redis instance has enough memory, consider refactoring notify so the whole XML payload gets passed off to Sidekiq. You're already always returning a 200 response to the API consumer, so there's no impact on your existing external API.
Your worker instances can then process the XML payloads at their own pace without impacting the application.
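A sketch of what that might look like (NotificationWorker is a name assumed for illustration):
class NotificationWorker
  include Sidekiq::Worker

  # All XML parsing now happens in the worker process, not the web process
  def perform(raw_xml)
    notification_hash = Hash.from_xml(raw_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  end
end

# notify then shrinks to a fast enqueue:
def notify
  NotificationWorker.perform_async(request.body.read)
  render nothing: true, status: 200
end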
Implement API Throttling
The third-party site is hammering you at a tremendous rate not normally permitted even by huge sites. That's a problem.
If you can't get them to address it on their end, play like the big dogs: Implement request throttling on your end. You likely have some ability to do this at the Rack level on EngineYard (though a quick search of their docs didn't immediately yield anything), but even doing it at the application level is likely to improve things.
There's a previous Stack Overflow discussion that may offer a couple options.
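For instance, the rack-attack gem can throttle just the notification endpoint at the Rack level (a sketch; the path and limits are assumptions):
# config/initializers/rack_attack.rb (rack-attack gem)
# Allow at most 300 notification requests per minute per source IP
Rack::Attack.throttle('notifications/ip', limit: 300, period: 1.minute) do |req|
  req.ip if req.path == '/notifications/notify'
end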
Proxy the API
A few services exist that will proxy your API for you, allowing you to easily implement features like rate limiting, throttling, and quotas that might otherwise be difficult to add.
The one I'm familiar with off the top of my head is Azure's API Management service. If this isn't a revenue-generating project, the cost might be prohibitive. ($49/month postpaid, though it would be cheaper prepaid, or could even be free if you qualify for BizSpark.)
Farm the API Out
The more advanced cousin of API proxies, "API as a Service" actually lets you run your API on its own VM instance—as well as offering the features a proxy does. If your database isn't a choke point, this can be a way to spread the load out and help prevent machine clients from affecting the experience of human clients.
The ten thousand pound gorilla is Apigee, though there are a variety of other established and startup options.
There is a catch: Most of these services are built around Node.js. If your Rails app is already leaning toward service-oriented architecture, and if you know and like JavaScript, this may not be an issue for you. Otherwise, the need to build an interface between services and maintain a service in a second language may be a bridge too far.

Updating 'notifications' for a user with memcached

I have an application with user 'Notifications', think SO or Facebook or Twitter. However, as notifications won't necessarily change on every page view, I decided to save them in memcached.
def get_notification
  if current_user
    mc = Dalli::Client.new('localhost:11211')
    require_dependency 'notification.rb'
    @new_notification = mc.get(current_user.id.to_s + 'new_notification')
    if @new_notification == nil
      @new_notification = Notification.getNew(current_user.id)
      mc.set(current_user.id.to_s + 'new_notification', @new_notification)
    end
  end
end
I overlooked the obvious flaw in this implementation. Once the notifications are loaded, they would never be refreshed until the user logs out or the cache entry expires. One way to fix this is to invalidate the user's cache entry when an event for a new notification occurs. This would force a new request to the db. Is there any other way to implement this?
Currently you are manually connecting to Memcached, checking if a key exists, storing content, and expiring it. As you may have noticed, this gets tedious and repetitive very quickly.
However, Rails provides a few patterns you can use to accomplish the same thing more easily.
First, using the cache store configuration option you can instruct Rails to use Memcached:
config.cache_store = :mem_cache_store, "example.com"
This cache store uses memcached server to provide a
centralized cache for your application. Rails uses the bundled dalli
gem by default. This is currently the most popular cache store for
production websites. It can be used to provide a single, shared cache
cluster with very high performance and redundancy.
When initializing the cache, you need to specify the addresses for all
memcached servers in your cluster. If none are specified, it will
assume memcached is running on the local host on the default port, but
this is not an ideal setup for larger sites.
The write and fetch methods on this cache accept two additional
options that take advantage of features specific to memcached. You can
specify :raw to send a value directly to the server with no
serialization. The value must be a string or number. You can use
direct memcached operations like increment and decrement only on raw
values. You can also specify :unless_exist if you don't want memcached
to overwrite an existing entry.
Using the Rails cache store instead of using Dalli directly lets you use the following nicer API:
Rails.cache.read('key')
Rails.cache.write('key', value)
Rails.cache.fetch('key') { value }
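With that in place, the notification lookup from the question collapses to a single fetch (a sketch; the cache key and expiry are assumptions):
@new_notification = Rails.cache.fetch("user/#{current_user.id}/new_notification", expires_in: 10.minutes) do
  Notification.getNew(current_user.id)
end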
Now, for the actual caching, you can use Declarative ETags or Fragment Caching to cache the notifications. Here is an example using Declarative ETags:
def get_notification
  if current_user
    @new_notification = Notification.getNew(current_user.id)
  end
  fresh_when @new_notification
end
The way declarative ETags work is that the template is not rendered when the request sends a matching ETag; the server responds with 304 Not Modified and the browser uses its cached copy. However, when @new_notification changes, the ETag value will change too, thus expiring the cache. Caching is a vast topic with various techniques, so I probably won't give you a full answer here, but I would point you to the following resources so you can learn more:
Caching with Rails
Rails 4: Zombie Outlaws Course
Rails Cache for dummies
Caching Strategies for Rails
Happy Caching ;-)

Does a before_filter in the application controller slow down the app?

I have a few before_filters in my application controller to check: 1) if the current_user is banned, 2) if the current_user has received a new message, and 3) if the current_user has any pending friend requests.
This means that before every request the app will check for these things. Will this cause server issues in the future, possibly a server overload?
I wouldn't say it would create a server overload on its own; for a server overload you need many concurrent requests, and Rails has a connection pool to the database out of the box. But it will slow down the process, since you have 3 queries before each request even reaches the controller to do what it was intended to do.
Facebook solved this in 2009 using what they called BigPipe. It is not a new technology; rather, it leverages the browser's ability to send a few requests for fragmented parts of the page and only then compose them using some JavaScript.
You can have a read here: http://www.facebook.com/note.php?note_id=389414033919.
As for your check whether the user is banned, yes, that is something you'd have to check either way. Perhaps you can keep this in a cache using Memcached or Redis so it won't hit your database directly every time, as in the sketch below.
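A sketch of that cached check (the banned? method, cache key, and expiry are assumptions):
before_filter :check_banned

def check_banned
  return unless current_user
  # Cache the flag so the database isn't hit on every request;
  # delete the key whenever a user's banned status changes.
  banned = Rails.cache.fetch("user/#{current_user.id}/banned", expires_in: 10.minutes) do
    current_user.banned?
  end
  render text: 'Your account is banned', status: 403 if banned
end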
