Strategies for multithreaded singleton object in Rails

I have a compelling use case where notifications happen in real time at the server level. I would like to push these events out over a websocket using Rails' ActionCable. How can I reliably maintain a long-lived singleton object to react to and push server-level events?
I prototyped a Rails app using an object instantiated from a file in /app/lib that mixes in the Singleton module. Even with class caching, this was instantiated multiple times and occasionally garbage collected despite open sockets.
Marking the event producer's initialize method private and writing a class-level instance method that checks Thread.main[:event_provider] for an existing instance works 95% of the time in development, but I worry about what I don't know that I don't know about production. Very occasionally I get exceptions like "Expected x_y.rb to define constant XY", which makes me think there's a problem with this approach.
The production server will ultimately serve a very small number of clients in an environment that demands 100% uptime. I can choose a server stack that makes sense.
I'm hoping someone with knowledge of Rack and/or ActionCable can comment on reliable ways to serve events to a Rails application from within the server.

As of now, the strategy I am undertaking is to instantiate a singleton object early in the boot process and then use it to maintain threads. Threadsafe practices are obviously needed for this.
The file application.rb defines MyApp::Application. At this point I declare an accessor my_thing_manager, require my_thing_manager and set self.my_thing_manager = MyThingManager.instance.
class MyThingManager
  # Called as MyThingManager.instance, so this must be a class method.
  def self.instance
    return Thread.main[:thing_manager] unless Thread.main[:thing_manager].nil?
    Thread.main[:thing_manager] = new
  end

  # Prevent direct instantiation from outside.
  private_class_method :new
end
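For reference, the application.rb wiring described above might look roughly like this (a sketch only; the exact require mechanics and assignment timing depend on your Rails version and load paths):

# config/application.rb -- only the lines relevant here are shown
require 'my_thing_manager' # app/lib is assumed to be on the load path

module MyApp
  class Application < Rails::Application
    # Class-level accessor to hold the long-lived manager
    class << self
      attr_accessor :my_thing_manager
    end
  end
end

MyApp::Application.my_thing_manager = MyThingManager.instance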
This approach works in a single multithreaded process but does not work in a clustered production environment. For my requirements that is completely acceptable. For a multi-process app, one could use hooks such as Puma's after_worker_fork or Unicorn's after_fork to manage a per-worker subscription to something like Redis pub/sub, as sketched below. This will be a requirement for an upcoming project, so I expect to develop this strategy further.
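A sketch of the Puma variant (assumes the redis gem; EventFanout is a made-up stand-in for whatever pushes into ActionCable):

# config/puma.rb
workers 2

on_worker_boot do
  # subscribe blocks, so give it its own thread inside each worker
  Thread.new do
    Redis.new.subscribe('server_events') do |on|
      on.message do |_channel, payload|
        EventFanout.handle(payload) # hypothetical handler
      end
    end
  end
end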

Related

What available message solutions are there for inter-process communication in ruby?

I have a rails app using delayed_job. I need my jobs to communicate with each other for things like "task 5 is done" or "this is the list of things that need to be processed for task 5".
Right now I have a special table just for this, and I always access the table inside a transaction. It's working fine. I want to build out a cleaner API/DSL for it, but first wanted to check whether there were existing solutions for this already. Weirdly, I haven't found a single thing; I'm either googling completely wrong, or the task is so simple (set and get values inside a transaction) that no one has abstracted it out yet.
Am I missing something?
Clarification: I'm not looking for a new queueing system; I'm looking for a way for background tasks to communicate with one another. Basically just safely shared variables. Do the frameworks below offer this facility? It's a shame that delayed_job does not.
use case: "do these 5 tasks in parallel, and then when they are all done, do this 1 final task." So, each of the 5 tasks checks to see if it's the last one, and if it is, it fires off the final task.
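For example, the shape I have in mind is an atomic "remaining tasks" counter, something like this sketch (key and class names made up; assumes something like Redis):

redis = Redis.new
redis.set('batch:remaining', 5) # before enqueueing the 5 tasks

# at the end of each of the 5 tasks:
FinalTask.enqueue if redis.decr('batch:remaining').zero? # hypothetical final job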
I use Resque. There are also lots of plugins, which should make inter-process comms easier.
Using redis has another advantage: you can use the pub-sub channels for communication between workers/services.
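A minimal sketch with the redis gem (channel name invented):

require 'redis'

# in a listening worker -- subscribe blocks the current thread
Redis.new.subscribe('jobs') do |on|
  on.message do |_channel, message|
    puts "received: #{message}" # e.g. "task 5 is done"
  end
end

# in any other worker or process
Redis.new.publish('jobs', 'task 5 is done')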
Another approach (untested by me): http://www.zeromq.org/, which also has Ruby bindings. If you like testing new stuff, give ZeroMQ a try.
Update
To clarify/explain/extend my comments above:
The reason I would switch from DelayedJob to Resque is the advantage mentioned above: the queue and the messages live in one system, because Redis offers both.
Further sources:
https://github.com/blog/542-introducing-resque
https://github.com/defunkt/resque#readme
If I had to stay on DJ, I would extend the worker classes with Redis or ZeroMQ/0MQ (just examples) to get messaging into my existing background jobs.
I would not try messaging with ActiveRecord/MySQL (nor queueing, actually!), because a relational database isn't the best-performing system for this use case, especially if the application has many background workers, huge queues, and countless message exchanges in a short time.
If it is a small app with few workers, you could also implement simple messaging via the DB, but even there I would prefer memcache instead; messages are short-lived chunks of data that can be handled entirely in memory.
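A minimal sketch of that with the dalli gem (key name invented; remember memcached may evict entries at any time):

require 'dalli'

cache = Dalli::Client.new('localhost:11211')

# producer side
cache.set('task5:status', 'done', 300) # third argument is a TTL in seconds

# consumer side
cache.get('task5:status') # => "done", or nil if expired/evicted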
Shared variables will never be a good solution. Think of multiple machines that your application and your workers can live on. How would you ensure safe variable transfer between them?
Okay, someone could mention DRb (distributed Ruby), but it seems hardly used anymore (I have never seen a real-world example so far).
If you want to play around with DRb anyway, read this short introduction.
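For the curious, a minimal DRb sketch using only the standard library (URI and class invented):

require 'drb/drb'

# server process: expose one shared object
# (a real implementation should synchronise access; DRb serves clients in threads)
class Registry
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

DRb.start_service('druby://localhost:8787', Registry.new)
DRb.thread.join

# client process:
#   require 'drb/drb'
#   registry = DRbObject.new_with_uri('druby://localhost:8787')
#   registry.set('task5', 'done')
#   registry.get('task5') # => "done"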
My personal preference order: Messaging (real) > Database driven messaging > Variable sharing
memcached
rabbitmq
You can use pipes:
require 'timeout'

reader, writer = IO.pipe

fork do
  loop do
    payload = { name: 'Kris' }
    # NB: relies on the marshalled payload containing no newline bytes
    writer.puts Marshal.dump(payload)
    sleep(0.5)
  end
end

loop do
  begin
    Timeout.timeout(1) do
      puts Marshal.load(reader.gets) # => { name: 'Kris' }
    end
  rescue Timeout::Error
    # no-op, no messages to receive
  end
end
Communication over a pipe is one-way, and you read it as a byte stream. Pipes are expressed as a pair, a reader and a writer; to get two-way communication you need two pairs of pipes, as in the sketch below.
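A sketch of the two-pair version:

# one pair per direction
child_reader, parent_writer = IO.pipe
parent_reader, child_writer = IO.pipe

fork do
  # child: close the ends the parent uses
  parent_writer.close
  parent_reader.close
  request = child_reader.gets
  child_writer.puts "pong (in reply to #{request.chomp})"
end

# parent: close the ends the child uses
child_reader.close
child_writer.close
parent_writer.puts 'ping'
puts parent_reader.gets # => pong (in reply to ping)
Process.wait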

Rails best practice: background process/thread?

I'm coming from a PHP environment (at least in terms of web dev) and into the beautiful world of Ruby, so I may have some dumb questions. I imagine there are some fundamentally different options available when not using PHP.
In PHP, we use memcache to store alerts we want to display in a bar along the top of the page. When something happens that generates an alert (such as a new blog post being made), a cron script that runs once every 5 minutes or so puts that information into memcache.
Now when a user visits the site, we look in memcache to find any alerts that they haven't already dismissed and we display them.
What I'm guessing I can do differently in Rails is to bypass the need for a cron script, and also the need to look in memcache on every request, by using a singleton and a polling process running in a separate thread to copy from memcache into this singleton. This would, in theory, be more optimized than checking memcache once per request, and it would also encapsulate the polling logic in one place, rather than splitting it between a cron task and the lookup logic.
My question is: are there any caveats to having some sort of runloop in the background while a Rails app is running? I understand the implications of multithreading, from Objective-C/Java, but I'm asking specifically about the Rails (3) environment.
Basically something like:
require 'singleton'

class SiteAlertsMap < Hash
  include Singleton

  def initialize
    super
    begin_polling
  end

  # ... SNIP, any specific methods etc ...

  private

  def begin_polling
    # Create some other Thread here, which polls at set intervals
  end
end
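I imagine begin_polling looking something like this (assuming the dalli gem; the cache key and interval are made up):

def begin_polling
  Thread.new do
    cache = Dalli::Client.new('localhost:11211')
    loop do
      alerts = cache.get('site_alerts') # expected to be a Hash
      replace(alerts) if alerts         # Hash#replace swaps in the new contents
      sleep 30
    end
  end
end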
This leads me into a similar question. We push (encrypted) tasks onto an SQS queue, for things related to e-commerce and for long-running background tasks. We don't use cron for this, but rather we have a worker daemon written in PHP, which runs in the background. Right now when we deploy, we have to shut down this worker and start it again from the new code-base. In Rails, could I somehow have this process start and stop with the rails server (unicorn) itself? I don't think that's something I'd want running on the main process in a separate thread, since we often want to control it as a process in its own right, but it would be nice if it conveniently ran whenever the web application was running.
Threading for background processes in Ruby would be a terrible mistake, especially since you're using a multi-process server. Using Unicorn with, say, 4 worker processes would mean you'd be polling from each of them, which is not what you want. Ruby doesn't really have real threads: it has green threads in 1.8 and a global interpreter lock in 1.9, IIRC. Many gems and libraries are also obnoxiously thread-unsafe.
Using memcache is still your best option and, if you have it set up correctly, you should only see it adding a millisecond or two to the request time. Another option which would give you the benefit of persisting these alerts while incurring minimal additional overhead would be to store these alerts in redis. This would better protect you against things like memcache crashing or server reboots.
For the background jobs you should use a similar approach to what you have now, but there are several off the shelf handlers for this like resque, delayed_job, and a few others. If you absolutely have to use SQS as the backend queue, you might be able to find some code to help you, but otherwise you could write it yourself. This still requires the other daemon to be rebooted whenever there is a code change. In practice this isn't a huge concern as best practices dictate using a deployment system like capistrano where a rule can easily be added to bounce the daemon on deploy. I use monit to watch the daemon process, so restarting it is as easy as telling monit to restart it.
In general, Ruby is not like Java/Objective-C when it comes to threads. It follows the more Unix-like model of process-based isolation, but the community has come up with best practices and ways to make this less painful than in other languages. Ruby does require a bit more attention to setting up its stack, as it is not as simple as enabling mod_php and copying some files around, but once the choices and architecture are understood, it is easier to reason about how your application works. The process model, in my opinion, is much better for web apps as it isolates code and state from the effects of other running operations. The isolation also makes the app easier to work with in a distributed system.

Why aren't global (dollar-sign $) variables used?

I'm hacking around Rails for a year and half now, and I quite enjoy it! :)
In Rails, we make a lot of use of local variables, instance variables (like @user_name) and constants defined in initializers (like FILES_UPLOAD_PATH). But why doesn't anyone use global "dollarized" variables ($) like $dynamic_cluster_name?
Is it because of a design flaw? Is it performance related? A security weakness?
Is it because of a design flaw?
Design... flaw? That's a design blessing, a design boon, a design merit, anything but a flaw! Global variables are bad, and they are especially bad in web applications.
The point of using global variables is keeping, and changing, global state. That works in simple single-threaded scripts (no, not well; it works awfully, but it still works), but in web apps it simply does not. Most web applications run concurrent backends: several server instances that respond to requests through a common proxy and load balancer. If you change a global variable, it gets modified in only one of the server instances. Essentially, a dollar-sign variable is not global anymore once you're writing a web app with Rails.
Global constants, however, still work, because they are constants: they do not change, so having several copies of them in different server instances is fine, because they will always be equal everywhere.
To store mutable global state, you have to employ more sophisticated tools, such as databases (SQL and NoSQL; ActiveRecord is a very nice way to access the DB, use it!), cache backends (memcached), or even plain files (useful in rare cases). But global variables simply don't work.
Global variables are often a sign of bad design, and can be a source of bugs due to concurrency issues. Global constants don't really have these issues.
Instead of using a global variable, consider using a singleton or a class variable. That way, you can limit access to the shared state to a small part of your code, making it easier to avoid these problems.
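For instance, a minimal synchronised singleton might look like this (a sketch; the class name and keys are invented):

require 'singleton'

class AppState
  include Singleton

  def initialize
    @mutex = Mutex.new
    @data = {}
  end

  def [](key)
    @mutex.synchronize { @data[key] }
  end

  def []=(key, value)
    @mutex.synchronize { @data[key] = value }
  end
end

AppState.instance[:dynamic_cluster_name] = 'cluster-a'
AppState.instance[:dynamic_cluster_name] # => "cluster-a"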
I've once used them to keep FTP connections alive across AJAX calls for a web-based FTP client. This allowed the user to repeatedly interact with their FTP site without having to reconnect each time for every action performed.
So one nice benefit of globals in Ruby is that you can safely store resource type objects in them.
The apparent lack of global usage is an indicator of the flaw of the global-variable concept, not of Ruby's implementation of it. In fact, I didn't even know Ruby had a $global syntax. They aren't needed, so I have never looked for them. Good Ruby code never needs them.

Logging inside threads in a Rails application

I've got a Rails application in which a small number of actions require significant computation time. Rather than going through the complexity of managing these actions as background tasks, I've found that I can split the processing into multiple threads, and by using JRuby with a multicore server, I can ensure that all threads complete in a reasonable time. (The customer has already expressed a strong interest in keeping this approach vs. running tasks in the background.)
The problem is that writing to the Rails logger doesn't work within these threads. Nothing shows up in the log file. I found a few references to this problem but no solutions. I wouldn't mind inserting puts in my code to help with debugging but stdout seems to be eaten up by the glassfish gem app server.
Has anyone successfully done logging inside a Rails ruby thread without creating a new log each time?
I was scratching my head with the same problem. For me the answer was as follows:
Thread.new do
  begin
    ...
  ensure
    Rails.logger.flush
  end
end
I understand your concerns about the background tasks, but remember that spinning off threads in Rails can be a scary thing. The framework makes next to no provisions for multithreading, which means you have to treat all Rails objects as not being thread-safe. Even the database connection gets tricky.
As for the logger: The standard Ruby logger class should be thread safe. But even if Rails uses that, you have no control over what the Rails app is doing to it. For example the benchmarking mechanism will "silence" the logger by switching levels.
I would avoid using the Rails logger. If you want to use threads, create a new logger inside each thread that logs the messages for that operation, as sketched below. If you don't want to create a new log for each thread, you can also try creating one thread-safe logging object in your runtime that each of the threads can access.
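A sketch of the per-thread variant, using the standard library Logger (file name invented):

require 'logger'

Thread.new do
  logger = Logger.new(Rails.root.join('log', 'worker.log').to_s)
  logger.info('starting long computation')
  # ... the heavy work ...
  logger.info('done')
end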
In your place I'd probably have another look at the background-job solutions. While DRb looks like a nightmare, "bj" seems nice and easy, although it required some work to get running with JRuby. There's also the alternative of using a Java scheduler from JRuby; see http://www.jkraemer.net/2008/1/12/job-scheduling-with-jruby-and-rails

Is Rails shared-nothing or can separate requests access the same runtime variables?

PHP runs in a shared-nothing environment, which in this context means that every web request is run in a clean environment. You can not access another request's data except through a separate persistence layer (filesystem, database, etc.).
What about Ruby on Rails? I just read a blog post stating that separate requests might access the same class variable.
It has occurred to me that this probably depends on the web server. Mongrel's FAQ states that Mongrel uses one thread per request, suggesting a shared-nothing environment. The FAQ goes on to say that RoR is not thread safe, which further suggests that RoR would not run in a shared environment unless a new request reuses the in-memory objects created by a previous request.
Obviously this has huge security ramifications. So I have two questions:
Is the RoR environment shared-nothing?
If RoR runs in (or might run in some circumstances) a shared environment, what variables and other data storage should I be paranoid about?
Update: I'll clarify further. In a Java servlet container you can have objects which persist across multiple requests. This is typically done for caching data which multiple users would have access to, database connections, etc. In PHP this cannot be done at the application layer; it must be done in a separate persistence layer like Memcached. So the twofold question is: which scenario is RoR like (PHP or Java), and if like Java, which data types persist across multiple requests?
In short:
No, Rails never runs in a shared-nothing environment.
Be paranoid about class variables and class instance variables.
The longer version:
Rails processes start their life cycle by loading the framework and application. They will typically run only a single thread, which will process many requests during its lifetime. The requests will therefore be dispatched strictly sequentially.
Nevertheless, all classes persist across requests. This means any object referenced from your classes and metaclasses (such as class variables and class instance variables) will be shared across requests. This may bite you if, for example, you try to memoize methods (@var ||= expensive_calculation) in your class methods, expecting the result to persist only for the current request. In reality, the calculation will only be performed on the first request.
On the surface, it may seem nice to implement caching, or other behaviour that depends on persistence across requests. Typically, it isn't. This is because most deployment strategies will use several Rails processes to counter their own single-threaded nature. It is simply not cool to block all requests while waiting for a slow database query, so the easy way out is to spawn more processes. Naturally, these processes do not share anything (except some memory perhaps, which you won't notice). This may bite you if you save stuff in your class variables or class instance variables during requests. Then, somehow, sometimes the stuff appears to be present, and sometimes it appears to be gone. (In reality, of course, the data may or may not be present in some process, and absent in others).
Some deployment configurations (most notably JRuby + Glassfish) are in fact multithreaded.
Rails is thread safe, so it can deal with it. But your application may not be thread safe. All controller instances are thrown away after each request, but as we know, the classes are shared. This may bite you if you pass information around in class variables or in class instance variables. If you do not properly use synchronisation methods, you may very well end up in race condition hell.
As a side note: Rails is typically run in single-threaded processes because Ruby's thread implementation is imperfect. Luckily, things are a little better in Ruby 1.9. And a lot better in JRuby.
With both these Ruby implementations gaining in popularity, it seems likely that multithreaded Rails deployment strategies will also gain in popularity and number. It is a good idea to write your application with multithreaded request dispatching in mind already.
Here is a relatively simple example that illustrates what can happen if you are not careful about modifying shared objects.
Create a new Rails project: rails test
Create a new file lib/misc.rb and put in it this:
class Misc
  @@xxx = 'Hello'

  def Misc.contents
    return @@xxx
  end
end
Create a new controller: ruby script/generate controller Posts index
Change app/views/posts/index.html.erb to contain this code:
<%
require 'misc'; y = Misc.contents() ; y << ' (goodbye) '
%>
<pre><%= y %></pre>
(This is where we modify the implicitly shared object.)
Add RESTful routes to config/routes.rb.
Start the server with ruby script/server and load the page /posts several times. You will see the number of (goodbye) strings increase by one on each successive page reload.
In your average deployment using Passenger, you probably have multiple app processes that share nothing between them but classes within each process that maintain their (static) state from request to request. Each request, though, makes a new instance of your controllers.
You might call this a cluster of distinct shared-state environments.
To use your Java analogy, you can do the caching and have it work from request to request, you just can't assume that it will be available on every request.
Shared-nothing is sometimes a good idea. But not when you have to load a large application framework and a large domain model and a large amount of configuration on every request.
For efficiency, Rails keeps some data available in memory to be shared among all requests for the lifetime of an application. Most of this data is read-only, so you shouldn't be worried.
When you write your app, stay away from writing to shared objects (excluding the database, for example, which comes out-of-the-box with good concurrency control) and you should be fine.
