How to get IDs of models subscribed to via a specific channel - ruby-on-rails

How do I get a list of all ActiveRecord models currently subscribed to via a specific ActionCable Channel?

I'm sorry to give you an answer you do not want, but...:
You don't get a list of all subscribed clients. You shouldn't be able to. If you need this information, you might be experiencing a design flaw.
Why?
The pub/sub paradigm is designed to abstract these details away, allowing for horizontal scaling in a way that has different nodes managing their own subscription lists.
Sure, when you're running a single application on a single process, you might be able to extract this information - but the moment you scale up, using more processes / machines, this information is distributed and isn't available any more.
Example?
For example, when using iodine's pub/sub engine (see the Ruby iodine WebSocket / HTTP server for details):
Each process manages its own client list.
Each process is a "client" in the master / root process.
Each master / root process is a client in a Redis server (assuming Redis is used).
Let's say you run two iodine "dynos" on Heroku, each with 16 workers, then:
Redis sees a maximum of two clients per channel.
Each of the two master processes sees a maximum of 16 clients per channel.
Each process sees only the clients that are connected to that specific process.
As you can see, the information you are asking for isn't available in any single place. The pub/sub implementation is distributed across different machines; each process / machine manages only a small part of the pub/sub client list.
EDIT (1) - answering updated question
There are three possible approaches to solve this question:
a client-side solution;
a server-side solution; and
a lazy (invalidation) approach.
As a client-side solution, the client could subscribe to a global "server-notifications-channel". When a "re-authenticate" message appears, the client should re-authenticate, triggering the generation of a new unique token on its own connection.
A server side solution requires the server-side connection to listen to a global "server-notifications-channel". Then the connection object will re-calculate the authentication token and transmit a unique message to the client.
The lazy-invalidation approach is to simply invalidate all tokens. Connected clients will stay connected (until they close the browser, close their machine or exit their app). Clients will have to re-authenticate when establishing a new connection.
Note (added as discussed in the comments):
The only solution that solves the "thundering herd" scenario is the lazy/invalidation solution.
Any other solution will cause a spike in network traffic and CPU consumption since all connected clients will be processing an event at a similar time.
Implementing:
With ActionCable, a client-side solution might be easier to implement. Its design and documentation are very "push" oriented, and they often assume a client-side processing approach.
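For illustration, a minimal Action Cable sketch of that client-side flow; the channel and message names here are assumptions, not part of the question:

class ServerNotificationsChannel < ApplicationCable::Channel
  def subscribed
    stream_from "server-notifications-channel"
  end
end

# Somewhere on the server, when every client should rotate its token:
ActionCable.server.broadcast("server-notifications-channel", action: "reauthenticate")

Each client, upon receiving the { action: "reauthenticate" } message, re-runs its authentication flow and obtains a fresh token for its own connection.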
On iodine, server-side subscriptions simply require a block to be passed along to the client.subscribe method. This creates a client-specific subscription with an event that runs on the server (instead of a message sent to the client).
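By way of illustration, a rough sketch of that iodine server-side variant (the subscribe block signature is assumed from iodine's documentation, and regenerate_token_for is a hypothetical helper):

client.subscribe("server-notifications-channel") do |channel, message|
  # runs on the server for this specific connection, instead of
  # forwarding the raw message to the browser
  client.write({ token: regenerate_token_for(client) }.to_json)
end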
The lazy-invalidation approach might hurt the user experience, depending on the design, since users might have to re-enter their credentials.
On the other hand, lazy-invalidation might be the safest, add to the appearance of safety and ease the burden on the servers at the same time.

WARNING: Please see Myst's answer and the associated comments. The answer below isn't recommended when scaling beyond a single server instance.
PATCH REQUIRED FOR DEVELOPMENT AND TEST ENVIRONMENT
module ActionCable
  module SubscriptionAdapter
    class SubscriberMap
      def get_subscribers
        @subscribers
      end
    end
  end
end
CODE TO GET IDs OF MODELS SUBSCRIBED TO
pubsub = ActionCable.server.pubsub
if Rails.env.production?
  channel_with_prefix = pubsub.send(:channel_with_prefix, ApplicationMetaChannel.channel_name)
  channels = pubsub.send(:redis_connection).pubsub('channels', "#{channel_with_prefix}:*")
  subscriptions = channels.map do |channel|
    Base64.decode64(channel.match(/^#{Regexp.escape(channel_with_prefix)}:(.*)$/)[1])
  end
else # DEVELOPMENT or TEST environment: requires patching ActionCable::SubscriptionAdapter::SubscriberMap
  subscribers = pubsub.send(:subscriber_map).get_subscribers.keys
  subscriptions = []
  subscribers.each do |sid|
    next unless sid.split(':').size == 2
    channel_name, encoded_gid = sid.split(':')
    if channel_name == ApplicationMetaChannel.channel_name
      subscriptions << Base64.decode64(encoded_gid)
    end
  end
end
# the GID URI looks like this: gid://<app-name>/<ActiveRecordName>/<id>
gid_uri_pattern = /^gid:\/\/.*\/#{Regexp.escape(SomeModel.name)}\/(\d+)$/
some_model_ids = subscriptions.map do |subscription|
  subscription.match(gid_uri_pattern)
  # compacting because 'subscriptions' includes all subscriptions made from ApplicationMetaChannel,
  # not just subscriptions to SomeModel records
end.compact.map { |match| match[1] }
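With the IDs extracted, loading the subscribed records is the usual ActiveRecord one-liner:

subscribed_models = SomeModel.where(id: some_model_ids)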

Related

Accounting for users that have left website without using onunload

I have a webservice with very limited resources (I will be able to handle about 3 simultaneous users).
When users interact with my website they start a complex process server-side. (This process is the limiting factor, as my server machine will not be able to handle many in parallel, and clients cannot run this on their side.)
My question is how to make sure to end the process for users that leave, for example by closing the window.
I have considered onunload and onbeforeunload, but they are also triggered by links within the website (which I need for users to be able to interact with the process) so that does not seem like an option.
This approach seems problematic according to other questions (see this, for example). It could work if there were a way to check whether the user is still an active user when performing the action triggered by onunload (even if on a different page of the website), but I don't know how to do this.
I have also considered periodically checking the list of active users and cancelling the process for users that have left, but I don't know if this is even possible.
I have zero experience with cookies, but could this be a place to use them? Can the server access the (still living) cookies of disconnected users?
Which sounds like a reasonable approach for this problem?
Cases such as these are generally handled by heartbeats. Have your client send periodic heartbeats (essentially pings) to the server to signal that it is still alive and interested in the process's results. The server then automatically kills any process for which it hasn't received a client heartbeat within a configured amount of time.
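A hedged Rails sketch of this heartbeat scheme; UserProcess, last_heartbeat_at, and cancel! are assumptions invented for illustration:

# Client pings this endpoint every few seconds while it is alive.
class HeartbeatsController < ApplicationController
  def create
    UserProcess.find(params[:process_id]).touch(:last_heartbeat_at)
    head :no_content
  end
end

# Run periodically (cron or a background job) to kill stale processes.
class StaleProcessReaper
  TIMEOUT = 30.seconds

  def self.run
    UserProcess.where("last_heartbeat_at < ?", TIMEOUT.ago).find_each do |process|
      process.cancel! # hypothetical: terminates the server-side process
    end
  end
end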
I have considered onunload and onbeforeunload
You are right- you can't rely on them.
I have zero experience with cookies, but could this be a place to use them?
No. Cookies maintain client-side state that is sent to a server on HTTP calls. So, servers don't manage cookies. Instead, they only look at them to identify state.

Share data between ActiveJob and Controller

Every n seconds the application requests a remote JSON file that provides live prices for securities in the trading system. The JSON has a block with the data I need (marketdata) and a block with the current data version (version and seqnum).
Right now I use ActionController::Live (with EventSource on the client side) to push updated data to the browser. All actions are done within one method:
opening SSE connection;
forming dynamic URL;
pulling new data from remote server;
comparing/reassigning seqnum value;
updating database if needed.
So my goal now is to separate pulling & updating the database (ActiveJob) with pushing updated values to the browser (ActionController::Live). To accomplish this I need:
either to store seqnum & version somewhere on the server side, shared between the controller and the background job;
or to monitor the database for the latest changes in the updated_at fields.
So basically I have two questions:
Which of the two options above is more efficient? Are there any other good approaches?
(in case the first one has a right to exist) How to implement this approach?
Considering that you might have, for example, multiple Rails processes running, I believe it becomes quite hard to let ActiveJob talk directly to a Rails controller in some way.
Definitely store seqnum and version. I wouldn't rely on updated_at in any case: it's too easy for it to get updated incidentally, so you'd end up sending stuff to the client without any real reason. Besides, seqnum and version seem like very solid fields for determining whether the file has been updated.
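A minimal sketch of storing seqnum and version where both the job and the controller can see them, assuming a cache store shared across processes (Redis or Memcached, not the per-process memory store); fetch_remote_json and update_database! are hypothetical helpers:

class MarketDataPollJob < ApplicationJob
  def perform
    data = fetch_remote_json
    cached = Rails.cache.read("marketdata:version") || { "seqnum" => -1 }
    return unless data["seqnum"] > cached["seqnum"]

    update_database!(data)
    Rails.cache.write("marketdata:version",
                      { "seqnum" => data["seqnum"], "version" => data["version"] })
  end
end

The controller can then read "marketdata:version" from the same cache to decide whether to push.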
With polling
That being said, you want to "signal" ActionController::Live in some way, and I'm afraid polling here is your only option, unless on your client side there is a specific moment when it needs to know whether the file has been updated, in which case you might want to use WebSockets or something similar.
So, something like
cached_request = YourCachedRequest.latest # Assuming it returns a single record
updated = true
loop do
  if updated
    updated = false
    response.stream.write cached_request.serialize_in_some_way
  end
  current_version = cached_request.version # use seqnum too if you need
  cached_request = cached_request.reload
  updated = true if cached_request.version > current_version
  sleep 20.0
end
Without polling
If you want an option that doesn't involve polling, you can only go for websockets I believe. However you have a more efficient option:
Create a mini application (EventMachine/Sinatra/something light) that the clients will poll (you can pass through your main application to distribute this across different nodes of the mini application); the point of this app is only to reroute messages from your main application to polling clients.
Now, you can create an internal API endpoint for your main application that is used only by Delayed Job. Delayed Job will hit this endpoint only when it notices that the fetched JSON has actually been updated relative to the one currently stored. If that's the case, it will hit your main app's API endpoint, which in turn will send a message (again, probably through an HTTP API endpoint, this time on your mini app) to all your mini app instances, which in turn will send it to your clients.
In this way, you don't overload your main server, only these mini-nodes, which can have localized outages (a big advantage compared to a system-wide outage).
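A rough sketch of that internal hop; the endpoint, MINI_APP_NODES, and the token header are all assumptions made for illustration:

require 'net/http'

class Internal::NotificationsController < ApplicationController
  def create
    return head :forbidden unless request.headers["X-Internal-Token"] == ENV["INTERNAL_TOKEN"]

    # Fan out to every mini-app node; each node then notifies its polling clients.
    MINI_APP_NODES.each do |node|
      Net::HTTP.post(URI("#{node}/notify"), request.body.read,
                     "Content-Type" => "application/json")
    end
    head :ok
  end
end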

Production Rails 3 application can't handle over 200 requests per minute

I have a Rails production application that is down several times per day. This application, in addition to serving its users, is the endpoint for a 3rd party website that sends it updates.
Occasionally, these updates will come flooding in so fast that the requests back up and the application becomes unavailable for long periods of time. It is legitimate usage which ends up causing a denial of service.
The request from the 3rd party is pretty simple:
class NotificationsController < ApplicationController
  def notify
    begin
      notification_xml = request.body.read
      notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
      user = User.find(notification_hash['UserID'])
      user.delay.set_notification(notification_hash)
    rescue Exception => bang
      logger.error bang.backtrace
      unless user.blank?
        alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
        File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
      end
    end
    render nothing: true, status: 200
  end
end
I have two app servers against a very large database. However, when this 3rd party website really hits us with notification requests (over 200 per minute, up to close to 1,000 per minute), both web servers get completely tied up.
You can also see above that I'm using the .delay call since I'm using Sidekiq. I thought that would help, and it did for a while, but the application can't handle that many requests.
Other than handling the requests in a separate application, which I'm not sure is really possible in my EngineYard installation, is there something I can do to speed up the handling of this request?
If it takes too much to process all those requests, try a different approach.
Create a new model (I will call it Request) with only one field (I'll name it message) - the xml sent to you by that 3rd party app.
Rewrite your notify action to be very simple and fast:
def notify
  # request.body is an IO object, so read it before persisting
  Request.create(message: request.body.read)
  render nothing: true, status: 200
end
Create a new action, let's say process_requests like this:
def process_requests
  Request.order('id ASC').find_in_batches(batch_size: 100) do |group|
    group.each do |request|
      process_request(request)
      request.destroy
    end
  end
end
def process_request(request)
  notification_xml = request.message
  begin
    notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  rescue Exception => bang
    logger.error bang.backtrace
    unless user.blank?
      alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
      File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
    end
  end
end
Create a cron job that calls process_requests at a defined interval (every few minutes).
I have never used Sidekiq, so I preferred to use find_in_batches (with a batch of 100 records just for the sake of example).
The notify action shouldn't run for more than a few milliseconds (inserts are pretty fast), so it should be able to handle the incoming traffic in your critical moments.
If you try something similar and it helps your servers reduce the load in critical moments, let me know :D
If this proves useful and you add background processing here too, please post that for others to see.
If you're monitoring this app with New Relic/AppNet/something else, checking your reports might point out some low-hanging fruit. We've only got a small picture of the application here; it's possible that enhancements elsewhere in the app might help as well.
With that said, here are a few ideas which can be applied separately or together:
Do Less Work on Intake
Right now you're doing a bunch of XML processing—which is expensive—before you pass the job off to Sidekiq. That's a choke point, and by running in the app process it's tying up your application.
If your Redis instance has enough memory, consider refactoring notify so the whole XML payload gets passed off to Sidekiq. You're already always returning a 200 response to the API consumer, so there's no impact on your existing external API.
Your worker instances can then process the XML payloads at their own pace without impacting the application.
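A hedged sketch of that refactoring; NotificationWorker is a name invented for illustration:

class NotificationsController < ApplicationController
  def notify
    # hand the raw payload straight to Sidekiq; no XML parsing in the web process
    NotificationWorker.perform_async(request.body.read)
    render nothing: true, status: 200
  end
end

class NotificationWorker
  include Sidekiq::Worker

  def perform(notification_xml)
    notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  end
end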
Implement API Throttling
The third-party site is hammering you at a tremendous rate not normally permitted even by huge sites. That's a problem.
If you can't get them to address it on their end, play like the big dogs: Implement request throttling on your end. You likely have some ability to do this at the Rack level on EngineYard (though a quick search of their docs didn't immediately yield anything), but even doing it at the application level is likely to improve things.
There's a previous Stack Overflow discussion that may offer a couple options.
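One possibility, sketched under the assumption that the rack-attack gem fits your stack (the path and limits here are made up):

# config/initializers/rack_attack.rb
# (older Rails versions also need: config.middleware.use Rack::Attack)
Rack::Attack.throttle("notifications", limit: 300, period: 60) do |req|
  req.ip if req.post? && req.path == "/notifications/notify"
end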
Proxy the API
A few services exist that will proxy your API for you, allowing you to easily implement features like rate limiting, throttling, and quotas that might otherwise be difficult to add.
The one I'm familiar with off the top of my head is Azure's API Management service. If this isn't a revenue-generating project, the cost might be prohibitive. ($49/month postpaid, though it would be cheaper prepaid, or could even be free if you qualify for BizSpark.)
Farm the API Out
The more advanced cousin of API proxies, "API as a Service" actually lets you run your API on its own VM instance—as well as offering the features a proxy does. If your database isn't a choke point, this can be a way to spread the load out and help prevent machine clients from affecting the experience of human clients.
The ten thousand pound gorilla is Apigee, though there are a variety of other established and startup options.
There is a catch: Most of these services are built around Node.js. If your Rails app is already leaning toward service-oriented architecture, and if you know and like JavaScript, this may not be an issue for you. Otherwise, the need to build an interface between services and maintain a service in a second language may be a bridge too far.

Client Server API pattern in REST (unreliable network use case)

Let's assume we have a client/server interaction happening over unreliable network (packet drop). A client is calling server's RESTful api (over http over tcp):
issuing a POST to http://server.com/products
server is creating an object of "product" resource (persists it to a database, etc)
server is returning 201 Created with a Location header of "http://server.com/products/12345"
! the TCP packet containing the HTTP response gets dropped, and eventually this leads to a TCP connection reset
I see the following problem: the client will never get an ID of a newly created resource yet the server will have a resource created.
Questions: Is this application level behavior or should framework take care of that? How should a web framework (and Rails in particular) handle a situation like that? Are there any articles/whitepapers on REST for this topic?
The client will receive an error when the server does not respond to the POST. The client would then normally re-issue the request as they assume that it has failed. Off the top of my head I can think of two approaches to this problem.
One is that the client can generate some kind of request identifier, such as a GUID, which it includes in the request. If the server receives a POST request with a duplicate GUID, it can refuse it.
The other approach is to PUT instead of POST to create. If you cannot get the client to generate the URI then you can ask the server to provide a new URI with a GET and then do a PUT to that URI.
If you search for something like "make POST idempotent" you will probably find a bunch of other suggestions on how to do this.
If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).
A sketch for generating such an identifier, here in Ruby, in this case a hash of a canonical representation of the request (products stands in for an id-to-resource lookup, and http201/http303 for rendering the corresponding responses):

require 'digest'

def product_POST(request)
  # the canonical representation need not contain every field in
  # the request, just those which contribute to its "identity"
  tags = request.tags.sort.join(',')
  canonical = [request.name, request.maker, tags, request.desc].join('|')
  id = Digest::SHA256.hexdigest(canonical)
  if products.key?(id)
    http303 products[id]  # duplicate request: point at the existing resource
  else
    products[id] = create_product_from(request)
    http201 products[id]  # newly created
  end
end
This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.
In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
The problem as you describe it boils down to avoiding what are called double-adds. As mentioned by others, you need to make your posts idempotent.
This can be implemented fairly easily at the framework level. The framework can keep a cache of completed responses. Each request has to carry a unique request identifier so that any retries are treated as retries, and not as new requests.
If the successful response gets lost on its way to the client, the client will retry with the same request identifier, and the server will respond with its cached response.
You are left with questions of cache durability: how long to keep responses, and so on. One approach is to remove responses from the server cache after a given period of time; this will depend on your app domain and traffic, and can be left as a configurable step in the framework piece. Another approach is to force the client to send acknowledgements. The acks can be sent either as separate requests (note that these could be lost too), or as extra data piggybacked on real requests.
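To make the idea concrete, a minimal sketch of such a response cache as Rack middleware; the header name is an assumption, and the in-memory store stands in for a shared store like Redis with a TTL:

class IdempotencyCache
  def initialize(app, store: {})
    @app = app
    @store = store
  end

  def call(env)
    key = env["HTTP_X_REQUEST_UNIQUE"]
    return @app.call(env) unless key && env["REQUEST_METHOD"] == "POST"

    @store.fetch(key) do
      status, headers, body = @app.call(env)
      chunks = []
      body.each { |c| chunks << c } # buffer so the response can be replayed
      body.close if body.respond_to?(:close)
      @store[key] = [status, headers, chunks]
    end
  end
end

In a Rails app this would be installed with config.middleware.use IdempotencyCache.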
Although what I suggest is similar to what others suggest, I strongly encourage you to keep this layer of network resiliency doing only that: dealing with dropped requests/responses. Preventing duplicate resources across separate requests is an application-level task; merging both pieces will mush all the functionality together and will not leave you with a clear separation of responsibilities.
Not an easy problem, but if you keep it clean you can make your app much more resilient to bad networks without introducing too much complexity.
And for some related experiences by others go here.
Good luck.
As the other responders have pointed out, the basic problem here is that the standard HTTP POST method is not idempotent like the other methods. There is an effort underway to establish a standard for an idempotent POST method known as Post-Once-Exactly, or POE.
Now I'm not saying that this is a perfect solution for everybody in the situation you describe, but if it is the case that you are writing both the server and the client, you may be able to leverage some of the ideas from POE. The draft is here: https://datatracker.ietf.org/doc/html/draft-nottingham-http-poe-00
It isn't a perfect solution, which is probably why it hasn't really taken off in the six years since the draft was submitted. Some of the problems, and some clever alternate options are discussed here:
http://tech.groups.yahoo.com/group/rest-discuss/message/7646
HTTP is a stateless protocol, meaning the server can't open an HTTP connection to the client. All connections are initiated by the client, so you can't solve such an error on the server side.
The only solution I can think of: if you know which client created the product, you can supply it with the products it created whenever it pulls that information. If the client never contacts you again, you won't be able to transmit information about the new product.

Ideas for web application with external input and realtime notification

I am to build a web application which will accept different events from external sources and present them quickly to the user for further actions. I want to use Ruby on Rails for the web application. This is an internal development project. I would prefer simple and easy-to-use solutions for rapid development over highly reliable but complex systems.
What it should do
The user has the web application open in his browser. Now a phone call comes in. The phone call is registered by a PBX monitoring daemon, in this case via the Asterisk Manager Interface. The daemon sends the available information (remote extension, local extension, call direction, channel status, start time, end time) somehow to the web application. Next, the user receives a notification about the phone call event. The user can now work with this, for example by entering a summary or by matching the call to a customer profile.
The duration from the first event on the PBX (e.g. the creation of a new channel) to the popup notification in the browser should be short. Given a fast network I would like to be within two seconds. The single pieces of information about an event are created asynchronously. The local extension may be supplied separate from the remote extension. The user can enter a summary before the call has ended. The end time, new status etc. will show up on the interface as soon as one party has hung up.
The PBX monitor is just one data source. There will be more monitors, like email or a request via a web form. The monitoring daemons will not necessarily run on the same host as the database or web server. I do not imagine the application will serve thousands of logged-in users or concurrent requests soon, but by design, 200 users with maybe about the same number of events per minute should not be a scalability issue.
How should I do?
I am interested to know how you would design such an application. What technologies would you suggest? How do the daemons communicate their information? When and by whom is the data about an event stored into the main database? How does the user get notified? Should the browser receive a complete dataset on behalf of a daemon or just a short note that new data is available? Which JS library to use and how to create the necessary code on the server side?
On my research I came across a lot of possibilities: Message brokers, queue services, some rails background task solutions, HTTP Push services, XMPP and so on. Some products I am going to look into: ActiveMQ, Starling and Workling, Juggernaut and Bosh.
Maybe I am aiming too high? If there is a simpler or easier way, like just using the XML or JSON interface of Rails, I would like to read about that even more.
I hope the text is not too long :)
Thanks.
If you want to skip Java and Flash, perhaps it makes sense to use a technology in the Comet family to do the push from the server to the browser?
http://en.wikipedia.org/wiki/Comet_%28programming%29
For the sake of simplicity, for notifications from daemons to the Web browser, I'd leave Rails in the middle, create a RESTful interface to that Rails application, and have all of the daemons report to it. Then in your daemons you can do something as simple as use curl or libcurl to post the notifications. The Rails app would then be responsible for collecting the incoming notifications from the various sources and reporting them to the browser, either via JavaScript using a Comet solution or via some kind of fatter client implemented using Flash or Java.
You could approach this a number of ways, but my only comment would be: push, don't pull. For low latency it's not only quicker, it's more efficient, as your server doesn't have to handle n clients polling the db/queue once a second. ActiveMQ is OK, but Starling will probably serve you better if you're not looking for insane levels of persistence.
You'll almost certainly end up using Flash on the client side (Juggernaut uses it last time I checked) or Java. This may be an issue for your clients (if they don't have Flash/Java installed) but for most people it's not an issue; still, a fallback mechanism onto a pull notification system might be prudent to implement.
Perhaps http://goldfishserver.com might be of some use to you. It provides a simple API to allow push notifications to your web pages. In short, when your data updates, send it (some payload data) to the Goldfish servers and your client browsers will be notified, with the same data.
Disclaimer: I am a developer working on goldfish.
The problem
There is an event - either external (or perhaps internally within your app).
Users should be notified.
One solution
I am myself facing this problem. I haven't solved it yet, but this is how I intend to do it. It may help you too:
(A) The app must learn about the event (via an exposed endpoint)
Expose an endpoint by which your app can be notified about external events.
When the endpoint is hit (and after authentication), the users need to be notified.
(B) Notification
You can notify the user directly by changing the DOM on the current web page they are on.
You can notify users by using the Push API (but you need to make sure your target browsers support it).
All of these notification features should be able to be handled via Action Cable: (i) either by updating the DOM to notify you when a phone call comes in, or (ii) via a push notification that pops up in your browser.
Summary: use Action Cable.
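A bare-bones sketch of the Action Cable wiring; the channel name and payload fields are assumptions for illustration:

class EventsChannel < ApplicationCable::Channel
  def subscribed
    stream_from "events"
  end
end

# In the controller action the PBX monitoring daemon POSTs to:
ActionCable.server.broadcast("events",
  kind: "phone_call",
  remote_extension: params[:remote_extension],
  local_extension: params[:local_extension],
  channel_status: params[:channel_status])

Every browser subscribed to EventsChannel receives the payload and can pop up a notification or update the DOM.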
(Also: why use an external service like Pusher, when you have ActionCable at your disposal? Some people say scalability, and infrastructure management. But I do not know enough to comment on these issues. )
