I have a server running several RoR applications. Some of them require delayed_job to handle picture resizing and other tasks. At the moment I'm running a separate delayed_job process for each application, which results in higher memory consumption.
Is it possible to run a single shared delayed_job instance on the server that all the applications use?
That would be problematic. delayed_job is supposed to run with the same codebase as the application enqueuing the jobs - it relies on this for job serialization/deserialization, which would be extremely difficult otherwise. In your case the jobs might be similar, but at some point one of the applications will schedule a job that the shared delayed_job process simply won't understand.
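To make the serialization point concrete, here is a small illustration; PictureResizeJob is a made-up stand-in for an app-specific job class:
require 'yaml'

# delayed_job stores each enqueued job as a YAML dump in the handler column.
PictureResizeJob = Struct.new(:photo_id)
handler = YAML.dump(PictureResizeJob.new(42))
# => "--- !ruby/struct:PictureResizeJob\nphoto_id: 42\n"

# A worker running a different codebase has no PictureResizeJob constant,
# so deserializing this handler raises an "undefined class/module" error
# instead of performing the job.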
Is there anything in Sidekiq's or Puma's architecture that makes this hard to do?
I want to run an existing Rails + Sidekiq application in a VM with very little memory, and loading the entire Rails stack in two different processes uses a lot of RAM.
Puma is built to spin up homogeneous web worker threads and divide incoming requests among them. If you wanted to modify it to spawn off separate Sidekiq threads, it should technically be possible with a crazy puma.rb file, but there's no precedent I can find for doing so (edit: Mike's answer below points out that the sucker_punch gem can essentially do this, for the same purpose of memory efficiency). Practically speaking, if your VM cannot support running two Rails processes at a time, it probably won't be able to handle the increased memory load as your application does the work of both Sidekiq and Puma… but that depends on your workload.
If this is just for development purposes, you might be able to accomplish what you're looking for by turning on Sidekiq's inline mode (normally meant just for testing):
require 'sidekiq/testing'
Sidekiq::Testing.inline!
This will cause all perform_async calls to actually execute inline, instead of going into Redis and being picked up by the Sidekiq process.
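If you go this route, one way to wire it up is an initializer guarded by the environment, so inline mode never reaches production; this is just a sketch, and the initializer file name is arbitrary:
# config/initializers/sidekiq_inline.rb
if Rails.env.development?
  require 'sidekiq/testing'
  # perform_async now runs jobs synchronously in the web process,
  # with no Redis and no separate Sidekiq process involved.
  Sidekiq::Testing.inline!
end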
Nothing official.
This is what sucker_punch is designed for.
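For reference, a sucker_punch job runs on a thread pool inside the web process rather than in a separate worker process, and looks roughly like this (a sketch; ImageResizeJob and the model calls are made up):
# app/jobs/image_resize_job.rb
class ImageResizeJob
  include SuckerPunch::Job

  def perform(picture_id)
    picture = Picture.find(picture_id)   # hypothetical model
    picture.resize!                       # hypothetical method
  end
end

# Enqueue from a controller or model:
ImageResizeJob.perform_async(picture.id)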
Can I run delayed_job or similar scheduling frameworks inside of the web server, e.g. thin or unicorn?
If yes, how do I start it? (A code example would be very cool!)
The reason is that I want to save money while my application is still in its build-up phase, and it is hosted on Heroku.
Officially
No, there is no supported way to run delayed_job asynchronously within the web framework. From the documentation on running jobs, it looks like the only supported ways to run jobs are the rake task and the delayed_job script. It also seems conceptually wrong to bend a Rack server, which was designed to handle incoming client requests, into pulling tasks off of some queue somewhere.
The Kludge
That said, I understand that saving money sometimes trumps being conceptually perfect. Take a look at these rake tasks. My kludge is to create a special endpoint in your Rails server that you hit periodically from some remote location. Inside this endpoint, instantiate a Delayed::Worker and call .start on it with the exit_on_complete option. This way, you won't need a new dyno or command.
Be warned: it's a kludgy solution, and it will tie up one of your Rails processes until all delayed jobs are complete. That means that unless you have other Rails processes, all incoming requests will block until this queue-draining request finishes. Unicorn provides facilities to spawn worker processes. Whether this solution works will also depend on your jobs, how long they take to run, and your application's delay tolerances.
Edit
With the spawn gem, you can wrap your instantiation of the Delayed::Worker with a spawn block, which will cause your jobs to be run in a separate process. This means your rails process will be available to serve web requests immediately instead of blocking while delayed jobs are run. However, the spawn gem has some dependencies on ActiveRecord and I do not know what DB/ORM you are using.
Here is some example code, because it's becoming a bit hazy:
class JobsController < ApplicationController
  def run
    spawn do
      @options = {} # you'll have to get these from that rake file
      Delayed::Worker.new(@options.merge(exit_on_complete: true)).start
    end
  end
end
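To complete the picture, the endpoint needs a route and something hitting it on a schedule from outside; the path and host below are made up:
# config/routes.rb
post '/jobs/run', to: 'jobs#run'

# Then, from cron on any machine you control (illustrative):
#   curl -X POST https://your-app.herokuapp.com/jobs/run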
Here's a link to a similar question:
Is it feasible to run multiple processes on a Heroku dyno?
Bear in mind, as the post says, if you're only using one web dyno, it will be shut down if there's no traffic going to it.
In a similar vein, you might look into:
http://blog.codeship.io/2012/05/06/Unicorn-on-Heroku.html
To save on the need for multiple web dynos whilst you're building your app (although it's still subject to the above shutdown issue).
I would suggest looking at running on a VPS directly rather than Heroku (check out the Railscast):
http://railscasts.com/episodes/337-capistrano-recipes
Once set up, it's pretty easy to deploy to. Heroku cuts out the devops part for you.
You can run it inside a separate worker of Unicorn, so it shares memory with the master process and gets restarted together with the app.
See https://gist.github.com/brauliobo/11298486
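The gist above has the full details; the rough shape of the idea is a Unicorn config that starts a single delayed_job process alongside the web workers (a sketch only, not the gist verbatim, and it omits shutdown handling):
# config/unicorn.rb
worker_processes 2
preload_app true

before_fork do |server, worker|
  # Fork one delayed_job worker next to the web workers (only once,
  # thanks to the memoized pid).
  @delayed_job_pid ||= spawn('bundle exec rake jobs:work')
end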
I'm currently using cron and "rails runner" to execute background jobs. For the most part these jobs are simple polls "Find the records that are due to receive a reminder email. Send that email."
I've been watching my Amazon EC2 Small instance, and noticed that each time one of these cron job kicks in, the CPU spikes to ~99%. The teeny tiny little query inside my current job is definitely not responsible. I'm presuming that the spike is simply due to the effort of loading the full rails environment via "rails runner".
Is there a more CPU efficient way to handle regularly scheduled batch jobs?
P.S. I know that in the particular example of sending a reminder email at time X in the future, I could use delayed_job and simply schedule the job in the future. Not every possible task fits into the delayed_job framework very well though, so I'm looking for a more traditional "cron job" type solution. Like "rails runner", but without the crazy CPU consequences.
You can use workers which don't load the Rails environment, or which load it only once (like Resque).
I don't think there is a complete solution for this, since you do need to load a Rails environment to handle whatever it is you are handling. So on the "cron" model you will be starting up a handler each time, which in turn creates some load on your instance. I don't know how well cloud services lend themselves to this, but I think the optimal model in your case would be a long-running daemon for job handling that forks for each job execution, coupled with REE (which helps prevent memory leaks by letting as much as possible happen in a child process that dies at the end of its execution loop).
The daemon could be configured to accept signals (also via a job queue) that would spin off jobs doing specific things.
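A minimal version of such a daemon might look like the following; it loads Rails once and then loops, and ReminderMailer here is a hypothetical stand-in for whatever your jobs actually do:
# script/job_daemon.rb — run with: ruby script/job_daemon.rb
require File.expand_path('../config/environment', __dir__)

shutdown = false
trap('TERM') { shutdown = true }   # lets a deploy script stop it cleanly

until shutdown
  # The Rails environment is already loaded, so each pass is cheap.
  ReminderMailer.send_due_reminders   # hypothetical
  sleep 60
end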
I am currently using Resque for background jobs in my application. I have 5 different queues at present (this will grow very fast). They do work such as updating Solr indexes, real-time notifications, scheduled newsletters, and delayed emails & SMS. Currently I am using Resque as a Rails gem and running the Resque workers from the Rails environment.
Now I am planning to move the Solr index updates and the scheduled newsletters to a different server, since these two perform heavy operations. One approach is to just copy the Rails directory to the new server and run the Resque jobs from the Rails environment there, but I am not comfortable doing this.
Another is creating a separate Rake app for the Resque tasks. But the problem is that both of these tasks are heavily tied to Rails models and Rails templating, and I am totally unsure how to proceed.
Has anyone faced a similar problem, and how did you architect the application?
We use rubber to provision our servers and deploy. It's a plugin for Capistrano that does role-based deployment. One of the roles is "resque_worker" and any machine with that role will start up resque-pool to start processing work.
But you can do this much more simply. Just deploy your application to two different machines. Resque is designed to allow workers on different machines. As long as your second machine can access your redis server, everything will work just fine.
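Concretely, that means pointing both deployments at the same Redis instance and starting only the heavy queues on the second machine; the host name and queue names below are placeholders:
# config/initializers/resque.rb (same code on both machines)
require 'resque'
Resque.redis = ENV.fetch('REDIS_URL', 'redis://shared-redis.internal:6379/0')

# On the second machine, start workers for just the heavy queues, e.g.:
#   QUEUE=solr_index,newsletters bundle exec rake environment resque:work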
I'm coming from a PHP environment (at least in terms of web dev) and into the beautiful world of Ruby, so I may have some dumb questions. I imagine there are some fundamentally different options available when not using PHP.
In PHP, we use memcache to store alerts we want to display in a bar along the top of the page. When something happens that generates an alert (such as a new blog post being made), a cron script that runs once every 5 minutes or so puts that information into memcache.
Now when a user visits the site, we look in memcache to find any alerts that they haven't already dismissed and we display them.
What I'm guessing I can do differently in Rails is to bypass the need for a cron script, and also the need to look in memcache on every request, by using a singleton plus a polling process running in a separate thread that copies from memcache into that singleton. In theory this would be more efficient than checking memcache once per request, and it would also encapsulate the polling logic in one place, rather than splitting it between a cron task and the lookup logic.
My question is: are there any caveats to having some sort of runloop in the background while a Rails app is running? I understand the implications of multithreading, from Objective-C/Java, but I'm asking specifically about the Rails (3) environment.
Basically something like:
require 'singleton'

class SiteAlertsMap < Hash
  include Singleton

  def initialize
    super
    begin_polling
  end

  # ... SNIP, any specific methods etc ...

  private

  def begin_polling
    # Create some other Thread here, which polls at set intervals
  end
end
This leads me to a similar question. We push (encrypted) tasks onto an SQS queue, for things related to e-commerce and for long-running background tasks. We don't use cron for this; rather, we have a worker daemon written in PHP which runs in the background. Right now when we deploy, we have to shut down this worker and start it again from the new codebase. In Rails, could I somehow have this process start and stop with the Rails server (unicorn) itself? I don't think it's something I'd run on the main process in a separate thread, since we often want to control it as a process by itself, but it would be nice if it just conveniently ran whenever the web application was running.
Threading for background processes in Ruby would be a terrible mistake, especially since you're using a multi-process server. Using Unicorn with, say, 4 worker processes would mean that you'd be polling from each of them, which is not what you want. Ruby doesn't really have real threads: it has green threads in 1.8 and a global interpreter lock in 1.9, IIRC. Many gems and libraries are also obnoxiously thread-unsafe.
Using memcache is still your best option and, if you have it set up correctly, you should only see it adding a millisecond or two to the request time. Another option which would give you the benefit of persisting these alerts while incurring minimal additional overhead would be to store these alerts in redis. This would better protect you against things like memcache crashing or server reboots.
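To illustrate the redis variant, the per-request check stays tiny; the key names and the $redis global are assumptions about your setup:
# app/helpers/alerts_helper.rb — assumes a $redis connection set up in an initializer
def visible_alerts_for(user)
  alerts    = $redis.lrange('site:alerts', 0, -1).map { |raw| JSON.parse(raw) }
  dismissed = $redis.smembers("user:#{user.id}:dismissed_alerts")
  alerts.reject { |a| dismissed.include?(a['id'].to_s) }
end

# The scheduled task that generates alerts would push them with something like:
#   $redis.lpush('site:alerts', { 'id' => alert_id, 'message' => msg }.to_json)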
For the background jobs you should use a similar approach to what you have now, but there are several off-the-shelf handlers for this like Resque, delayed_job, and a few others. If you absolutely have to use SQS as the backend queue, you might be able to find some code to help you; otherwise you could write it yourself. This still requires the other daemon to be rebooted whenever there is a code change. In practice this isn't a huge concern, as best practice is to use a deployment system like Capistrano, where a rule can easily be added to bounce the daemon on deploy. I use monit to watch the daemon process, so restarting it is as easy as telling monit to restart it.
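That "bounce the daemon on deploy" rule can be as small as a Capistrano hook; this sketch assumes Capistrano 2-style tasks and a monit service named job_worker, both of which are made up for the example:
# config/deploy.rb
namespace :workers do
  task :restart, roles: :app do
    run 'sudo monit restart job_worker'
  end
end

after 'deploy:restart', 'workers:restart'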
In general, Ruby is not like Java/Objective-C when it comes to threads. It follows the more Unix-like model of process-based isolation, but the community has come up with best practices and ways to make this less painful than in other languages. Ruby does require a bit more attention when setting up its stack, as it is not as simple as enabling mod_php and copying some files around, but once the choices and architecture are understood, it is easier to reason about how your application works. The process model, in my opinion, is much better for web apps, as it isolates code and state from the effects of other running operations. The isolation also makes the app easier to work with in a distributed system.