Why aren't global (dollar-sign $) variables used? - ruby-on-rails

I've been hacking around with Rails for a year and a half now, and I quite enjoy it! :)
In Rails, we make a lot of use of local variables, instance variables (like @user_name) and constants defined in initializers (like FILES_UPLOAD_PATH). But why doesn't anyone use global "dollarized" variables ($) like $dynamic_cluster_name?
Is it because of a design flaw? Is it performance related? A security weakness?

Is it because of a design flaw?
Design... flaw? That's a design blessing, design boon, design merit, everything but a flaw! Global variables are bad, and they are especially bad in web applications.
The point of using global variables is to keep, and change, the "global state". That works in simple single-threaded scripts (not well, in fact awfully, but it works), but in web apps it just does not. Most web applications run concurrent backends, i.e. several server instances that respond to requests through a common proxy and load balancer. If you change a global variable, it is modified in only one of the server instances. Essentially, a dollar-sign variable is not global anymore when you're writing a web app with Rails.
Global constants, however, still work: because they do not change, having several copies of them in different server instances is OK, since they will always be equal there.
To store mutable global state, you have to employ more sophisticated tools, such as databases (SQL and NoSQL; ActiveRecord is a very nice way to access the DB, use it!), cache backends (memcached), or even plain files (useful in rare cases)! But global variables simply don't work.
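The point about a change landing in only one server instance can be seen without Rails at all. Below is a minimal sketch, assuming a Unix-like system where fork is available; the helper in_child_process is a made-up name for illustration, and the forked child stands in for a second server process.

```ruby
$dynamic_cluster_name = "alpha"

# Hypothetical helper: run a block in a forked child process and
# return whatever string the block produced there.
def in_child_process
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield)
    writer.close
    exit!(0) # leave immediately, skipping at_exit hooks inherited from the parent
  end
  writer.close
  result = reader.read
  Process.wait(pid)
  result
end

child_view = in_child_process do
  $dynamic_cluster_name = "beta" # changes only the child's copy
  $dynamic_cluster_name
end

parent_view = $dynamic_cluster_name
puts "child saw #{child_view}, parent still sees #{parent_view}"
```

The child process modifies its own copy of the variable; the parent never sees the new value, which is the situation several backend processes behind a load balancer are in.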

Global variables are often a sign of bad design, and can be a source of bugs due to concurrency issues. Global constants don't really have these issues.
Instead of using a global variable, consider using a singleton or a class variable. That way, you can limit access to the shared state to a small part of your code, making it easier to avoid these problems.
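As a rough sketch of that suggestion (RequestStats and its methods are invented names, not anything from Rails), the shared state can live in class-level instance variables behind a small API, with a Mutex guarding every access. Note this still only shares state within one process.

```ruby
# Shared state is reachable only through these two methods,
# and every access goes through the same lock.
class RequestStats
  @lock = Mutex.new
  @hits = 0

  class << self
    def record_hit
      @lock.synchronize { @hits += 1 }
    end

    def hits
      @lock.synchronize { @hits }
    end
  end
end

# Eight threads each record 100 hits concurrently.
threads = 8.times.map do
  Thread.new { 100.times { RequestStats.record_hit } }
end
threads.each(&:join)

total = RequestStats.hits
puts total
```

Because nothing outside the class can touch @hits directly, there is exactly one place to reason about locking.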

I once used them to keep FTP connections alive across AJAX calls for a web-based FTP client. This allowed the user to repeatedly interact with their FTP site without having to reconnect for every action performed.
So one nice benefit of globals in Ruby is that you can safely store resource type objects in them.

The apparent lack of global usage is an indicator of a flaw in the concept of global variables, not of Ruby's implementation of them. In fact, I didn't even know Ruby had a $global syntax. They aren't needed, and so I have never looked for them. Good Ruby code never needs them.

Related

Strategies for multithreaded singleton object in Rails

I have a compelling use case where notifications happen in realtime at the server level. I would like to push these events out over a websocket using Rails' ActionCable. How can I reliably maintain a long-lived singleton object to react to and push server-level events?
I prototyped a Rails app using an object instantiated from a file in /app/lib that mixes in the Singleton module. Even with class caching, this was instantiated multiple times and occasionally garbage collected despite open sockets.
Marking the event producer's initialize method private and writing a class-level instance method that checks Thread.main[:event_provider] for an instance works 95% of the time in development, but I worry about what I don't know that I don't know about production. Very occasionally I get exceptions like "Expected x_y.rb to define constant XY", which make me think there's a problem with this approach.
The production server will ultimately serve a very small number of clients in an environment that demands 100% uptime. I can choose a server stack that makes sense.
I'm hoping someone with knowledge of Rack and/or ActionCable can comment on reliable ways to serve events to a Rails application from within the server.
As of now, the strategy I am undertaking is to instantiate a singleton object early in the boot process and then use it to maintain threads. Threadsafe practices are obviously needed for this.
The file application.rb defines MyApp::Application. At this point I declare an accessor my_thing_manager, require my_thing_manager and set self.my_thing_manager = MyThingManager.instance.
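class MyThingManager
  # Class-level accessor: returns the instance stored on the main thread,
  # creating it on first use.
  def self.instance
    return Thread.main[:thing_manager] unless Thread.main[:thing_manager].nil?
    Thread.main[:thing_manager] = new
  end

  # Prevent callers from building extra instances with MyThingManager.new.
  private_class_method :new

  def initialize
  end
end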
This approach works in a single multithreaded process but does not work in a clustered production environment. For my requirements that is completely acceptable. For a multi-process app, one could utilize hooks in e.g. Puma after_worker_fork or Unicorn's after_fork to manage a subscription to something like Redis pubsub. This will be a requirement for an upcoming project so I expect to develop this strategy further.
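For comparison, here is a minimal sketch of the same single-process idea using the standard library's Singleton module, in plain Ruby and outside any Rails code reloading. EventProvider, subscribe and publish are hypothetical names; nothing here touches ActionCable or Redis.

```ruby
require "singleton"

# Hypothetical event hub: one instance per process, shared by all threads.
class EventProvider
  include Singleton

  def initialize
    @lock = Mutex.new
    @subscribers = []
  end

  # Register a block to be called for every published event.
  def subscribe(&handler)
    @lock.synchronize { @subscribers << handler }
  end

  # Copy the subscriber list under the lock, then call handlers outside it.
  def publish(event)
    handlers = @lock.synchronize { @subscribers.dup }
    handlers.each { |handler| handler.call(event) }
  end
end

received = []
EventProvider.instance.subscribe { |event| received << event }

# Ask for the instance from several threads at once.
instances = 4.times.map { Thread.new { EventProvider.instance } }.map(&:value)

EventProvider.instance.publish(:server_restarted)
```

Every thread gets the same object back, but a second worker process would still build its own copy, which is why the clustered case needs something external such as the Redis pubsub mentioned above.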

Rails turn feature on/off on the fly

I am a newbie to Rails. I used feature flags when I was in the Java world. I found that there are a few gems in Rails (rollout and others) for doing it. But how do I turn a feature on/off on the fly in Rails?
In Java we can use an MBean to turn features on and off on the fly. Any ideas or pointers on how to do this? I don't want to restart the servers on my machines once the code is deployed.
Unless you have a way of communicating to all your processes at once, which is non-standard, then you'd need some kind of centralized configuration system. Redis is a really fast key-value store which works well for this, but a database can also do the job if a few milliseconds per page load to figure out which features to enable isn't a big deal.
If you're only deploying on a single server, you could also use a static YAML or JSON configuration file that's read before each request is processed. The overhead of this is almost immeasurable.
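A minimal sketch of the single-server variant, assuming a YAML file of boolean flags; the FeatureFlags class, the flag name new_checkout and the temp file are all made up for illustration. The file is re-read on every check, so editing it takes effect without a restart.

```ruby
require "yaml"
require "tempfile"

# Hypothetical flag reader: re-reads the file on every lookup.
class FeatureFlags
  def initialize(path)
    @path = path
  end

  def enabled?(name)
    flags = YAML.safe_load(File.read(@path)) || {}
    flags[name.to_s] == true
  end
end

# Stand-in for a config file on the server.
file = Tempfile.new(["features", ".yml"])
file.write({ "new_checkout" => false }.to_yaml)
file.flush

flags = FeatureFlags.new(file.path)
was_enabled = flags.enabled?(:new_checkout)

# "Flip the switch" by rewriting the file while the process keeps running.
File.write(file.path, { "new_checkout" => true }.to_yaml)
now_enabled = flags.enabled?(:new_checkout)
```

With several servers, the same enabled? interface could be backed by Redis or a database instead of a local file.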

Debugging Amazon SQS consumers

I'm working with a PHP frontend which connects to a distributed back end, using Amazon SQS and a variety of message types and message consumers. I'm trying to come up with a way to safely debug those consumers, as we don't want message handlers with new, untested code consuming end-user messages, risking the messages being lost or incorrectly processed.
The actual message queue names are hardcoded as PHP constants in a class, so my first tactic was to create two different sets of queues, one for production and another for debugging, and to externalise the queue name constants into two different files. Depending on whether our debug condition is true or not, I wanted to include one or the other of those constant definitions and assign the constants in the included file to the class constants which currently have the names hardcoded.
This doesn't seem to work though, because constants seem to act like class variables in PHP whereas I am trying to assign the values like instance variables. The next tactic was to see if there was anything on Amazon's side that would allow us to debug our message consumers transparently without adding lots of hacks to our code, but I couldn't see anything there that facilitated this. I'd love to know if anyone else has experienced (and ideally solved) this problem.
SQS doesn't provide a way to inspect the contents of messages in the queue, or for the sender to see if any consumers are failing to process messages.
A common approach to this problem would be to set up two sets of queues as you suggest and have the producer post the same message onto both queues. That way you can debug your code against a stream of production messages without affecting the actual production queue.
I'd recommend moving the decision of which queue to use out of your code and into config, and then deploying different config files to your development boxes vs your production boxes. The risk is always that a development box ends up talking to production systems, so having a single consistent approach to configuring those end-points across all your code is much less risky than doing it on an ad-hoc basis each time you call out to a service.
I'd also recommend putting your production and development queues in different AWS accounts with different access credentials. That way you can give your production account permission to publish to the development account's queue, but you can guarantee that your development systems can't read from the production queue.
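The config-file idea is language-agnostic; here is a small sketch in Ruby rather than PHP, to match the rest of this page. The file layout, the load_queue_names helper and the queue name orders-debug are assumptions for illustration only, and no AWS call is made.

```ruby
require "yaml"
require "tempfile"

# Hypothetical loader: the code never hardcodes a queue name,
# it only knows where the per-box config file lives.
def load_queue_names(path)
  YAML.safe_load(File.read(path)).fetch("queues")
end

# Stand-in for the file that would be deployed to a development box.
dev_config = Tempfile.new(["queues", ".yml"])
dev_config.write({ "queues" => { "orders" => "orders-debug" } }.to_yaml)
dev_config.flush

queues = load_queue_names(dev_config.path)
orders_queue = queues.fetch("orders") # raises KeyError if the box's file lacks the entry
```

Using fetch rather than a silent default means a misconfigured box fails loudly instead of quietly falling back to some other queue.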

Multiple redmine instances best practices

I'm studying the best way to have multiple Redmine instances on the same server (basically I need a database for each Redmine group).
So far I have two options:
Deploy a Redmine instance for each group
Deploy one Redmine instance with multiple databases
I really don't know what the best practice is in this situation; I've seen people do it both ways.
I've tested the deployment of multiple Redmines (3 instances) with nginx and Passenger. It worked well, but I think it may not be feasible with a lot of instances. Each app needs around 100 MB of RAM, and as requests increase it tends to allocate more processes to the app. This scenario seems bad if we have a lot of instances.
Option 2 seems reasonable, and I think I can implement it with Rails environments. But I think there are some security problems related to sessions (I think a user of site A is allowed to perform actions on site B after authenticating on A).
Is there a good practice for this situation?
Another related requirement: we must be able to create or shut down a Redmine instance without interrupting the others (e.g. we should avoid server restarts).
Thanks for any advice and sorry for my English!
Edit:
My solution:
I used a redmine instance for each group. I used nginx+unicorn to manage each instance independently (because passenger didn't allow me to manage each instance independently).
The two options are not so different after all. The only difference is that in option 2, you only have one copy of the code on your disk.
In any case, you still need to run different worker processes for each instance, as Redmine (and generally most Rails apps) doesn't support database switching for each request and some data regarding a certain environment are cached in process.
Given that, there is not really much incentive to share even the codebase as it would require certain monkey patches and symlink-magic to allow the proper initialization for the intentional configuration differences (database and email configuration, paths to uploaded files, ...). The Debian package does that but it's (in my eyes) rather brittle and leads to a rather non-standard system.
But to stress again: even if you share the same code on the disk between instances, you can't share the running worker processes.
Running multiple instances from the same codebase is not officially supported by Redmine. However, the Debian/Ubuntu packages seem to support such an approach. See:
Multiple instances of redmine on Debian squeeze
So, generally:
If you use Debian/Ubuntu go with option #2
Otherwise go with #1
Rolling forward a couple of years, you might now want to consider a third option: using Docker containers for each of your Redmine instances.
I've been using https://github.com/sameersbn/docker-redmine.git , and have been quite happy with it except that it doesn't yet support handling of incoming mail for creating and commenting on tickets.

Is Rails shared-nothing or can separate requests access the same runtime variables?

PHP runs in a shared-nothing environment, which in this context means that every web request is run in a clean environment. You can not access another request's data except through a separate persistence layer (filesystem, database, etc.).
What about Ruby on Rails? I just read a blog post stating that separate requests might access the same class variable.
It has occurred to me that this probably depends on the web server. Mongrel's FAQ states that Mongrel uses one thread per request - suggesting a shared-nothing environment. The FAQ goes on to say that RoR is not thread safe, which further suggests that RoR would not exist in a shared environment unless a new request reuses the in-memory objects created from the previous request.
Obviously this has huge security ramifications. So I have two questions:
Is the RoR environment shared-nothing?
If RoR runs in (or might run in some circumstances) a shared environment, what variables and other data storage should I be paranoid about?
Update: I'll clarify further. In a Java servlet container you can have objects which persist across multiple requests. This is typically done for caching data which multiple users would have access to, database connections, etc. In PHP this cannot be done at the application layer; it must be done in a separate persistence layer like Memcached. So the twofold question is: which scenario is RoR like (PHP or Java), and if like Java, which data types persist across multiple requests?
In short:
No, Rails never runs in a shared-nothing environment.
Be paranoid about class variables and class instance variables.
The longer version:
Rails processes start their life cycle by loading the framework and application. They will typically run only a single thread, which will process many requests during its lifetime. The requests will therefore be dispatched strictly sequentially.
Nevertheless, all classes persist across requests. This means any object referenced from your classes and metaclasses (such as class variables and class instance variables) will be shared across requests. This may bite you, for example, if you try to memoize methods (@var ||= expensive_calculation) in your class methods, expecting it will only persist during the current request. In reality, the calculation will only be performed on the first request.
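The memoization pitfall described above can be sketched in plain Ruby. PriceList and its methods are invented for illustration, and the three calls stand in for three requests handled by the same process.

```ruby
class PriceList
  @calculations = 0

  class << self
    attr_reader :calculations

    # Class-level memoization: the cached value lives as long as the class,
    # i.e. as long as the process, not just for one request.
    def current
      @current ||= expensive_calculation
    end

    private

    def expensive_calculation
      @calculations += 1
      { "widget" => 10 }
    end
  end
end

# Three simulated requests served by the same process.
results = 3.times.map { PriceList.current }
puts PriceList.calculations
```

All three "requests" get the very same hash back, so anything one request writes into it is visible to the next.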
On the surface, it may seem nice to implement caching, or other behaviour that depends on persistence across requests. Typically, it isn't. This is because most deployment strategies will use several Rails processes to counter their own single-threaded nature. It is simply not cool to block all requests while waiting for a slow database query, so the easy way out is to spawn more processes. Naturally, these processes do not share anything (except some memory perhaps, which you won't notice). This may bite you if you save stuff in your class variables or class instance variables during requests. Then, somehow, sometimes the stuff appears to be present, and sometimes it appears to be gone. (In reality, of course, the data may or may not be present in some process, and absent in others).
Some deployment configurations (most notably JRuby + Glassfish) are in fact multithreaded.
Rails is thread safe, so it can deal with it. But your application may not be thread safe. All controller instances are thrown away after each request, but as we know, the classes are shared. This may bite you if you pass information around in class variables or in class instance variables. If you do not properly use synchronisation methods, you may very well end up in race condition hell.
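To make the "passing information around in class instance variables" problem concrete, here is a deterministic sketch. SharedContext, ThreadContext and simulate are hypothetical names; the two threads stand in for two overlapping requests, and the queues only force the interleaving so the outcome is repeatable. The thread-local variant is one possible mitigation, not the only one.

```ruby
# Per-request data kept on the class: shared by every thread.
class SharedContext
  class << self
    attr_accessor :user
  end
end

# Same interface, but the data is stored per thread.
class ThreadContext
  def self.user=(value)
    Thread.current[:context_user] = value
  end

  def self.user
    Thread.current[:context_user]
  end
end

# Request A sets its user, request B sets its own, then A reads back.
def simulate(context)
  a_has_set = Queue.new
  b_has_set = Queue.new

  request_a = Thread.new do
    context.user = "alice"
    a_has_set << true
    b_has_set.pop # wait until request B has written
    context.user
  end

  request_b = Thread.new do
    a_has_set.pop # wait until request A has written
    context.user = "bob"
    b_has_set << true
  end

  request_b.join
  request_a.value
end

leaked = simulate(SharedContext)   # request A reads request B's user
isolated = simulate(ThreadContext) # request A reads its own user
```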
As a side note: Rails is typically run in single-threaded processes because Ruby's thread implementation is imperfect. Luckily, things are a little better in Ruby 1.9. And a lot better in JRuby.
With both these Ruby implementations gaining in popularity, it seems likely that multithreaded Rails deployment strategies will also gain in popularity and number. It is a good idea to write your application with multithreaded request dispatching in mind already.
Here is a relatively simple example that illustrates what can happen if you are not careful about modifying shared objects.
Create a new Rails project: rails test
Create a new file lib/misc.rb and put in it this:
class Misc
  @xxx = 'Hello'
  def Misc.contents()
    return @xxx
  end
end
Create a new controller: ruby script/generate controller Posts index
Change app/views/posts/index.html.erb to contain this code:
<%
require 'misc'; y = Misc.contents() ; y << ' (goodbye) '
%>
<pre><%= y %></pre>
(This is where we modify the implicitly shared object.)
Add RESTful routes to config/routes.rb.
Start the server with ruby script/server and load the page /posts several times. You will see the number of (goodbye) strings increase by one on each successive page reload.
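The same effect can be reproduced without a Rails server. This is a sketch, not the original project: render_page is a made-up stand-in for the view, contents_copy is an added method showing one possible fix, and String.new is used so the string is mutable regardless of frozen-string settings.

```ruby
class Misc
  @xxx = String.new("Hello")

  def self.contents
    @xxx     # hands out the shared object itself
  end

  def self.contents_copy
    @xxx.dup # hands out a fresh copy on every call
  end
end

# Stand-in for the view: appends to whatever string it is given.
def render_page(text)
  text << " (goodbye) "
end

# Three "page loads" working on copies: the shared string is untouched.
copy_counts = 3.times.map do
  render_page(Misc.contents_copy).scan("(goodbye)").size
end

# Three "page loads" working on the shared object: the text keeps growing.
shared_counts = 3.times.map do
  render_page(Misc.contents).scan("(goodbye)").size
end
```

Returning a copy (or freezing the shared string so that accidental mutation raises an error) keeps one request from changing what the next one sees.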
In your average deployment using Passenger, you probably have multiple app processes that share nothing between them, but within each process the classes maintain their (static) state from request to request. Each request, though, makes a new instance of your controllers.
You might call this a cluster of distinct shared-state environments.
To use your Java analogy, you can do the caching and have it work from request to request, you just can't assume that it will be available on every request.
Shared-nothing is sometimes a good idea. But not when you have to load a large application framework and a large domain model and a large amount of configuration on every request.
For efficiency, Rails keeps some data available in memory to be shared among all requests for the lifetime of an application. Most of this data is read-only, so you shouldn't be worried.
When you write your app, stay away from writing to shared objects (excluding the database, for example, which comes out-of-the-box with good concurrency control) and you should be fine.
