I realize that Action Cable doesn't guarantee that messages are delivered in the order they were sent. But does it at least guarantee that all clients will see messages in the same order? I'm assuming it does, since it goes through Redis pub/sub and I understand Redis is single-threaded, but I wanted to make sure.
I am a Ruby on Rails developer. I am using RabbitMQ in my project to process data as soon as it arrives in the queue, via the Bunny gem, a RabbitMQ client that provides an interface for interacting with RabbitMQ.
My issue is that whenever an exception occurs or the server stops unexpectedly while processing data from the queue, the message is lost.
I want to know how people deal with lost messages from a RabbitMQ queue. Is there any way to get those messages back for processing?
There is no way to get the messages back when they're lost. Maybe you could try and track down some entries in RMQ's database cache - but that's just a wild guess/long shot and I don't think that it will help.
What you do need to do for the future is:
In case you are using a single server: make the queues and messages durable, and explicitly acknowledge messages on the consumer side (so switch off the auto-ACK flag) only once they're processed; a sketch follows this list.
In case you are using a cluster of RMQ nodes (which is of course recommended exactly to avoid these situations): set up queue mirroring.
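For the single-server case, a minimal sketch of the durable-queue-plus-manual-ack setup with the Bunny gem might look like the following; the queue name and process_payload are placeholders for your own code.

    require "bunny"

    connection = Bunny.new   # connects to localhost:5672 by default
    connection.start

    channel = connection.create_channel
    # durable: true makes the queue definition survive a broker restart;
    # messages must also be published with persistent: true to survive one.
    queue = channel.queue("incoming_data", durable: true)

    # manual_ack: true switches off auto-ACK, so a message is only removed
    # from the queue after we explicitly acknowledge it below.
    queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, payload|
      begin
        process_payload(payload)   # placeholder for your processing code
        channel.ack(delivery_info.delivery_tag)
      rescue StandardError
        # reject and re-queue so the message is not lost if processing fails
        channel.nack(delivery_info.delivery_tag, false, true)
      end
    end

On the publishing side, pass persistent: true when publishing so the messages themselves are written to disk, not just the queue definition.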
Take a look at RMQ persistence and high availability.
I am using Twilio to send/receive texts in a Rails 4.2 app. I am sending in bulk, around 1000 at a time, and receiving sporadically.
Currently when I receive a text I save it to the DB (to, from, body) and then pass that record to an ActiveJob worker to process later. For sending messages I persist the Twilio params to another DB and pass that record to a different ActiveJob worker. Since I am often sending in batches I have two workers: the first outgoing-message worker sends a single message; the second queries the DB, finds all the users who should receive the message, creates a DB record for each message that should be sent, and passes each record to the first worker. So the second one basically just creates a bunch of jobs for the first one to process.
Right now I have the workers destroy the records once they finish processing (both incoming and outgoing). I am worried about not persisting things in case the server, Redis, or Resque goes down, but I do not know if this is actually a good design pattern. It was suggested to me to just use a vanilla Ruby object and pass its id to the worker, but I am not sure how that affects data reliability. So is it overkill to be creating all these DB records, and should I just be creating vanilla Ruby objects and passing those objects' ids to the workers?
Any and all insight is appreciated,
Drew
It seems to me that the approach of sending a minimal amount of data to your jobs is the best approach. Check out the 'Best Practices' section on the sidekiq wiki: https://github.com/mperham/sidekiq/wiki/Best-Practices
What if your queue backs up and that quote object changes in the meantime? Don't save state to Sidekiq, save simple identifiers. Look up the objects once you actually need them in your perform method.
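As a hedged sketch of that pattern with ActiveJob on Rails 4.2; the OutgoingMessage model, the job name, and the Twilio call are illustrative stand-ins, not Drew's actual code.

    # Pass only the record's id to the job; look the record up in perform.
    class OutgoingMessageJob < ActiveJob::Base
      queue_as :default

      def perform(message_id)
        message = OutgoingMessage.find_by(id: message_id)
        return if message.nil?   # the record may be gone by the time the job runs

        # Illustrative Twilio call; client setup is assumed to live elsewhere.
        TWILIO_CLIENT.messages.create(to: message.to, from: message.from, body: message.body)
        message.destroy
      end
    end

    # Enqueue with just the identifier:
    # OutgoingMessageJob.perform_later(message.id)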
Also in terms of reliability - you should be worried about your job queue going down. It happens. You either design your system to be fault tolerant of a failure or you find a job queue system that has higher reliability guarantees (but even then no queue system can guarantee 100% message deliverability). Sidekiq pro has better reliability guarantees than sidekiq (non-pro), but if you design your jobs with a little bit of forethought, you can create jobs that can scan your database after a crash and re-queue any jobs that may have been lost.
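For example, one hedged sketch of that scan-and-re-queue idea, assuming the outgoing records carry a sent_at column (an assumption for illustration, not something from the question):

    # lib/tasks/messages.rake
    namespace :messages do
      desc "Re-enqueue persisted outgoing messages that were never sent"
      task requeue_unsent: :environment do
        # sent_at is an assumed column marking completed work
        OutgoingMessage.where(sent_at: nil).find_each do |message|
          OutgoingMessageJob.perform_later(message.id)
        end
      end
    end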
How much work you spend designing fault-tolerant solutions really just depends on how critical it is that your information makes it from point A to point B :)
I have a Rails front-end server, which receives multiple requests from users and then sends these requests to a backend server.
The backend server processes the requests asynchronously and notifies the front-end server when it finishes each one.
I use Redis pub/sub to communicate between these two servers. In particular, for each request coming from users, I create a new Redis connection that subscribes to a single channel (say, scoring_channel).
However, if I have 100 users making requests at the same time, each of the Redis subscribers will hold one thread.
Does this affect my server performance? If I have a constraint on maximum number of threads (e.g., Heroku allows max 256 threads), how should I avoid this issue?
This would not affect server performance, since Redis itself is never blocked by pub/sub.
You should use a non-blocking API on the client side instead of the blocking version, to reduce the number of threads.
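One way to do that is to keep a single shared subscriber instead of one blocking subscriber per request. This is only a sketch under that assumption: RequestRegistry and the JSON message format are made up for illustration, and scoring_channel is the channel name from the question.

    require "redis"
    require "json"

    # One background thread holds the only blocking SUBSCRIBE connection.
    Thread.new do
      Redis.new.subscribe("scoring_channel") do |on|
        on.message do |_channel, raw|
          payload = JSON.parse(raw)
          # Hand the result to whatever is waiting on this request id
          # (a condition variable, a queue, a websocket push, ...).
          RequestRegistry.fulfill(payload["request_id"], payload)
        end
      end
    end

Each incoming request then registers itself with the registry and waits for its result instead of holding its own Redis subscription.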
I'm experimenting with Rails 4 ActionController::Live and Server Sent Events. I'm using MRI 2.0.0 and Puma.
From what I can see, each connected client keeps an active connection to the server. I was wondering if it is possible to leverage SSEs without keeping all response streams running.
Puma manages multiple connections using threads, and I imagine there is a limit to the number of concurrent connections.
What if I want to support a real-world scenario with thousands of clients registering to my Rails app for SSE events?
Is there any example?
Also, I usually run Rails app servers behind an nginx reverse proxy. Would it require any particular setup?
The way that SSEs are built is by the client opening a connection to the server, which is then left open until the server has some data to send. This is part of the SSE spec, and not a thing specific to ActionController::Live. It's effectively the same as long-polling, but with the connection not being closed after the first bit of data is returned, and with the mechanism built into the browser.
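For reference, a minimal ActionController::Live streaming action looks roughly like the sketch below; the loop body is illustrative (in practice you would block on Redis pub/sub or some other event source rather than sleeping). Each subscribed client keeps one such connection open for the life of the action.

    class EventsController < ApplicationController
      include ActionController::Live

      def index
        response.headers["Content-Type"] = "text/event-stream"
        sse = ActionController::Live::SSE.new(response.stream, event: "update")
        # Illustrative loop: send an event every second for ten seconds.
        10.times do
          sse.write(time: Time.now)
          sleep 1
        end
      ensure
        sse.close if sse
      end
    end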
As such, the only way it can be implemented is by having multiple open client connections to the webserver which sit there indefinitely. As to what resources are required to deal with them, I'm not sure, as I've not yet tried to benchmark this, but it'll need enough servers for Puma to keep open thousands of connections if you have that many users with a page open.
The default limit for Puma is 16 concurrent connections. Several blog posts about setting up SSEs for Rails mention upping this to a larger value, but none that I've found suggest what this higher value should be. They do mention that the number of DB connections will need to be the same, as each Rails thread keeps one open. Sort of sounds like an expensive way to run things.
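If you do raise the limit, the change lives in Puma's config plus a matching pool size in config/database.yml. The numbers below are purely illustrative, not a recommendation; as the next line says, the only real answer is to benchmark.

    # config/puma.rb (illustrative values only)
    threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 64))
    threads threads_count, threads_count
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
    preload_app!

    # config/database.yml should then set pool: to the same thread count,
    # since each streaming Rails thread can hold a DB connection.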
"Run a benchmark" is the only answer really.
I can't comment as to reverse proxying as I've not tried it, but as SSEs are done over standard HTTP, I shouldn't think it'll need any special setup.
I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process we can only process a single user at a time, so the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running the pollers.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kick off large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
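As a sketch of that idea using EventMachine with the em-http-request gem; the endpoint URL, the user id list, and the handler are placeholders, not our actual code.

    require "eventmachine"
    require "em-http-request"

    user_ids = load_user_ids   # placeholder

    EM.run do
      pending = user_ids.size
      user_ids.each do |user_id|
        # All requests are started up front and run concurrently on the reactor.
        http = EventMachine::HttpRequest.new("https://salesforce.example/accounts?user=#{user_id}").get
        http.callback do
          handle_accounts(user_id, http.response)   # placeholder handler
          EM.stop if (pending -= 1).zero?
        end
        http.errback do
          EM.stop if (pending -= 1).zero?
        end
      end
    end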
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine processes periodically to lessen the impact of memory growth. For example, with ActiveMessaging we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistence/reliable-delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though perhaps not at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further on the ActiveMessaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. STOMP subscriptions, if you use client ack (in order to prevent losing messages on interrupt), will only be sent a new message on a given connection once the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole Rails env per poller, and one poller is one connection. But there is nothing magical about the current poller; I could imagine writing a poller as an EventMachine client that creates new connections to the broker and gets many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master process that forks many poller workers so as to get the benefit of the reduced memory footprint (much like Passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which, although it seems like an awesome package, I suspect may not be appropriate for this use case.
Here's how you could do it, using AMQP (a code sketch follows the steps):
Have the Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then spins off the message into the thousands of small requests to SalesForce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already been performed, I imagine this should be fine.
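A hedged sketch of steps 2 through 4 using the Bunny client; the queue names, the message format, and fetch_salesforce_accounts are placeholders. The point is that the original message is acked only after all of its work has succeeded and the summary has been published.

    require "bunny"
    require "json"

    conn = Bunny.new
    conn.start
    channel   = conn.create_channel
    requests  = channel.queue("user_info_requests", durable: true)
    summaries = channel.queue("user_info_summaries", durable: true)

    requests.subscribe(manual_ack: true, block: true) do |delivery_info, _props, payload|
      user_id  = JSON.parse(payload)["user_id"]
      accounts = fetch_salesforce_accounts(user_id)   # the many small queries

      # Publish the summary first, then ack. If the worker dies before the ack,
      # the broker re-delivers the original message to another consumer.
      channel.default_exchange.publish(
        { "user_id" => user_id, "accounts" => accounts }.to_json,
        routing_key: summaries.name,
        persistent: true
      )
      channel.ack(delivery_info.delivery_tag)
    end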
If you're using (C)Ruby, try to never combine synchronous and asynchronous stuff in a single process. A process should either do everything via EventMachine, with no blocking code, or only talk to an EventMachine process via a message queue.
Also, writing asynchronous code is incredibly useful, but it is difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
Also check out "cramp" and "beanstalk".
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.