What can threads do that processes can't? - ruby-on-rails

I would like some input on this, since it would help guide what I should focus on in my studies (and whether I should consider threads at all).
Are there examples of Rails applications where threads are absolutely necessary and the multiple-process model can't provide an adequate solution? One exception would be an application that has memory restrictions and would need to use threads instead of spawning multiple processes. But assuming that memory is not an issue, what are some additional cases where threads are the better bet?

Threads are easier to write and debug. I'll start with simple non-threaded code, debug it, then wrap a chunk with Thread.new and join at the end and I'm done.
And, yes, study them. You'll learn useful techniques and gain knowledge that's going to be good to have in your "programming toolchest".
As for what threads can do that processes can't: threads can very easily share data and work from the same queue or queues. Doing that with separate processes requires a database, IPC, or a messaging queue, all of which add a lot of complexity (though they can also increase capacity).
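As a minimal sketch of the "same queue" point, assuming a hypothetical job list of numbers to square, a few threads can pull work from one in-memory Queue and push results to another with no database or IPC involved:

```ruby
require 'thread' # Queue lived here on older Rubies; harmless on newer ones

# Hypothetical workload: square the numbers 1..10 using three threads.
jobs    = Queue.new
results = Queue.new
(1..10).each { |n| jobs << n }

workers = 3.times.map do
  Thread.new do
    loop do
      n = begin
        jobs.pop(true)     # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break              # no work left, leave the loop
      end
      results << n * n
    end
  end
end

workers.each(&:join)       # wait for every worker to finish

squares = []
squares << results.pop until results.empty?
squares.sort!              # completion order is not guaranteed
puts squares.inspect
```

Queue is thread-safe, which is why no explicit locking appears here; sharing a plain Array or Hash between threads would need a Mutex.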

Generally, threads are cheaper to create and tear down than processes.
Sidekiq is more efficient than Resque largely because Sidekiq workers are threads, whereas Resque uses forked workers (processes).
But the problem is that Ruby threads on MRI are limited by the Global Interpreter Lock (GIL): MRI 1.8 had only green threads, and MRI 1.9+ uses native threads but lets only one of them run Ruby code at a time. See this Igvita article for more information: http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/
On implementations without a GIL, such as JRuby, you can have a multi-threaded Rails app (running in a servlet container) and it will likely out-perform the same app running under MRI. It's also possible that JRuby on the HotSpot JVM can do just-in-time performance optimizations as well.

Related

Is it possible to run Sidekiq in the same process with a puma rails server?

Is there anything in its architecture that makes it hard to do?
I want to run an existing rails+sidekiq application in a VM with very little memory, and loading the entire rails stack in two different process is using a lot of RAM.
Puma is built to spin up homogeneous web worker threads and divide incoming requests among them. If you wanted to modify it to spawn off separate Sidekiq threads, it should technically be possible with a crazy puma.rb file, but there's no precedent I can find for doing so (edit: Mike’s answer below points out that the sucker_punch gem can essentially do this, for the same purpose of memory efficiency). Practically speaking, if your VM cannot support running two Rails processes at a time, it probably won't be able to handle the increased memory load as your application does the work of both Sidekiq and Puma… but that depends on your workload.
If this is just for development purposes, you might be able to accomplish what you're looking for by turning on Sidekiq's inline mode (normally meant just for testing):
require 'sidekiq/testing'
Sidekiq::Testing.inline!
This will cause all perform_async calls to actually execute inline, instead of going into Redis and being picked up by the Sidekiq process.
Nothing official.
This is what sucker_punch is designed for.

If Node.js is single threaded, what is Rails?

If we say Node.js is single threaded and therefore there is just one thread that handles all the requests, what is Rails?
As I understand, Node.js is both the application and the server, but I am lost on what Rails would be? How does Rails handle requests in terms of threads/processes?
Rails can be single-threaded, it can be multi-threaded, it can be multi-process (where each process is single-threaded), or it can be multi-process where each process is multi-threaded.
It really all depends upon the app server you're using, and to some extent upon which Ruby implementation you're using. MRI Ruby supports native threads as of 1.9, but it still maintains what's known as a global interpreter lock. The GIL prevents the interpreter from running Ruby code in more than one thread at a time. In most cases that's not really a big deal though, because what threads help with the most is waiting for I/O. If you're using either JRuby or Rubinius, they can actually run Ruby code in multiple threads at a time.
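To illustrate why the GIL matters less for I/O-bound work, here is a small sketch in which sleep stands in for a blocking I/O call (the fake_io helper is made up for this example). The waits overlap, so four threads finish in roughly the time of one wait rather than four:

```ruby
# sleep stands in for a blocking I/O call (a DB query, an HTTP request).
# A waiting thread does not hold the interpreter lock, so others can run.
def fake_io(seconds)
  sleep(seconds)
  :done
end

started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads  = 4.times.map { Thread.new { fake_io(0.2) } }
statuses = threads.map(&:value)  # value joins the thread and returns its result
elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "4 waits of 0.2s took #{elapsed.round(2)}s in total"
```

Replacing the sleep with a pure-Ruby CPU-bound loop would not show the same speed-up on MRI, since only one thread can execute Ruby code at a time.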
Check out the different app servers and what they offer in terms of concurrency features. Unicorn is a common one for deploying multi-process/single-threaded applications. Puma is a newer app server that's capable of running multi-threaded applications, and I believe they're either adding (or maybe have added by now, I'm not sure) the ability to run multi-process as well. Passenger seems to be able to work in every model I've listed above.
I hope this helps a little. It should at least give you some things to Google for to find more information.

What is the difference between forking and threading in a background process?

Reading the documentation for the spawn gem it states:
By default, spawn will use the fork to spawn child processes. You can
configure it to do threading either by telling the spawn method when
you call it or by configuring your environment. For example, this is
how you can tell spawn to use threading on the call,
What would be the difference between using a fork or a thread, what are the repercussions of either decision, and how do I know which to use?
Threading means you run the code in another thread in the same process whereas forking means you fork a separate process.
Threading in general means that you'll use less memory, since you won't have a separate application instance (this advantage is lessened if you have a copy-on-write friendly Ruby such as REE). Communication between threads is also a little easier.
Depending on your Ruby interpreter, Ruby may not use extra cores efficiently (JRuby is good at this, MRI much worse), so spawning a bunch of extra threads will impact the performance of your web app and won't make full use of your resources: MRI only runs one thread at a time.
Forking creates separate Ruby instances, so you'll make better use of multiple cores. You're also less likely to adversely affect your main application. You need to be a tiny bit careful when forking, because the child shares the parent's open file descriptors, so you usually want to reopen database connections, memcache connections, etc.
With MRI I'd use forking; with JRuby there's more of a case to be made for threading.
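The memory difference can be seen in a short sketch (assuming a platform where fork is available, so not Windows or JRuby). The counter variable and the pipe used to report the child's value are just for illustration:

```ruby
counter = 0

# A thread runs inside the same process, so it mutates the same variable.
Thread.new { counter += 1 }.join
after_thread = counter

# A forked child works on its own copy of memory; its change never reaches
# the parent. A pipe is used here to send the child's value back.
reader, writer = IO.pipe
pid = fork do
  reader.close
  counter += 100
  writer.puts counter
  writer.close
  exit!(0)               # leave without running the parent's at_exit handlers
end
writer.close
child_value = reader.read.to_i
reader.close
Process.wait(pid)
after_fork = counter

puts "after thread: #{after_thread}"
puts "child saw:    #{child_value}"
puts "after fork:   #{after_fork}"
```

The pipe is the kind of extra plumbing processes need to exchange even a single number, whereas the thread simply wrote to the shared variable.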
Fork creates another process, and processes are generally designed to run independently of whatever else is going on in your application. Processes do not share memory with each other.
Threads, however, are designed for a different purpose. You would want to use a thread if you wish to parallelize a certain task.
"A fork() induces a parent-child relationship between two processes. Thread creation induces a peer relationship between all the threads of a process."
Read a more extensive explanation on this link.

Mongrel does not use the full CPU power on Windows 2003 server

I have a deployment using:
rails 2.3.2
ruby 1.8.7
mysql db
and
3 mongrel instances (windows services) with apache as load balancer
[I know it is due for upgrade...]
OS: Windows 2003
We have many CPU-intensive tasks, and when these run on the 4-core machine, the mongrel process can use at most 25% of the CPU, on the core where the task was scheduled.
After running many tests we noticed that it is only able to use the power of a single core, and therefore there is a time lag in finishing tasks.
There is a suggestion to virtualize... which is difficult to do on the client server.
Has anyone got any suggestions on how the situation can be improved? Memory does reach 250MB to 1GB for this process, but this is not such a big issue.
Thanks in advance
Linus
The typically used Ruby versions (MRI or YARV, i.e. 1.8 or 1.9) are not able to use more than one core at a time. MRI is single-threaded and just provides green threads internally. YARV uses real OS threads but has a GIL (global interpreter lock) that ensures that only one thread is running at a time.
Thus, your mongrels are unable to use more than one core each (even if you had coded your Rails app to be thread-safe). There are alternative Ruby implementations, like JRuby or Rubinius, that provide native threads without a global interpreter lock and thus allow your app to use more than one core, but you'd probably need to adapt your app a bit.
That said, even then a single request would run in a single thread and thus only use a single core. Spreading one request over several cores is hard to achieve without handling your own threads (or at least fibers in 1.9), which is most probably not worth the hassle.
So generally, the recommendation is to start multiple app server processes (mongrels in your case). I personally use about 1.5 - 3 per core (depending on the app). That way, you are able to answer that many parallel requests and fully use your available CPU power shared between them.
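The same idea, several processes so that CPU-bound work lands on several cores, can be sketched outside an app server. This assumes a platform with fork (so not the Windows setup in the question), and the sum_of_squares task and the cap of 4 workers are invented for the example:

```ruby
require 'etc'

# Hypothetical CPU-bound task: sum of squares over a list of numbers.
def sum_of_squares(values)
  values.reduce(0) { |acc, n| acc + n * n }
end

worker_count = [Etc.nprocessors, 4].min
numbers      = (1..100_000)
slice_size   = (numbers.size / worker_count.to_f).ceil
chunks       = numbers.each_slice(slice_size).to_a

# One forked worker per chunk; each reports its partial sum over a pipe.
workers = chunks.map do |chunk|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts sum_of_squares(chunk)
    writer.close
    exit!(0)             # leave without running the parent's at_exit handlers
  end
  writer.close
  [pid, reader]
end

total = workers.sum do |pid, reader|
  partial = reader.read.to_i
  reader.close
  Process.wait(pid)
  partial
end

puts "total from #{workers.size} worker processes: #{total}"
```

Each child is a full interpreter with its own GIL, so the operating system is free to schedule them on different cores.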

Why use threads in Ruby Event Machine?

Since EventMachine is said to be an event-based async I/O library (like node.js) that is single-threaded and uses an event loop to handle concurrent requests, is it really necessary to care about and use threading in the Ruby application-layer code (i.e. a Rails controller when handling requests)?
I'm more used to the node.js model where you just wrap your code inside the callback, and then everything is taken care of for you (the select() system call, kqueue, epoll, etc., and any threads that get spawned, are handled in the lower-level C++ implementation). Also, ECMAScript by its nature doesn't have threads anyway.
Recently I saw this piece of ruby code when trying to learn about Event Machine:
require 'eventmachine'

# remember the calling thread so the reactor can wake it
thread = Thread.current
Thread.new {
  EM.run { thread.wakeup }
}
# pause until reactor starts
Thread.stop
I'm just curious when threads are to be used in the event-based programming paradigm in ruby environment and what specific situation would require us to use them.
I know that Ruby has threads built into the language (MRI green threads, JRuby JVM threads), so it may be tempting to use threads. However, from my point of view, it kind of defeats the whole purpose if you're not supposed to worry about them in the higher-level application code, since the event-based model was pretty much introduced to solve this problem.
Thanks. I appreciate any answers/clarifications.
While using EventMachine, you cannot run a CPU-intensive task on the reactor, because the time you spend on your task is "taken away" from the reactor. I use threads when I know a task is going to:
be blocking (you should never block the eventmachine thread)
use more cpu than my average tasks
In these cases spawning the task in a separate thread allows it to do its job without preventing the reactor from doing its own work.
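The shape of that technique can be sketched without EventMachine itself. Here a plain loop stands in for the reactor and a sleep stands in for the blocking call; both are assumptions made for illustration. (EventMachine provides EM.defer for handing work to a thread pool, which serves the same purpose.)

```ruby
# Stand-in for the reactor: a loop that must keep ticking.
# This is not EventMachine; it only mimics the shape of the problem.
results = Queue.new
ticks   = 0

# Hypothetical slow job, moved off the loop's thread.
worker = Thread.new do
  sleep 0.2              # pretend this is a blocking call
  results << :job_done
end

outcome = nil
while outcome.nil?
  ticks += 1             # the "reactor" keeps servicing events
  sleep 0.01             # pretend to handle one short event
  begin
    outcome = results.pop(true)   # poll without blocking the loop
  rescue ThreadError
    # queue still empty, keep looping
  end
end
worker.join

puts "loop ticked #{ticks} times while the job ran; outcome=#{outcome}"
```

Had the sleep 0.2 run directly inside the loop, no ticks would have happened until it returned, which is exactly the stall the answer warns about.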
Another choice is to use fibers which is yet another different beast.
The biggest difference between a thread and a state machine, as far as I'm aware, is that threads can take advantage of a multi-core processor to do true parallel processing (on a Ruby implementation without a GIL), while a state machine processes everything serially. With the state machine, on the other hand, it is easier to maintain data integrity, since you don't have to worry so much about race conditions.
