I've got a Rails application in which a small number of actions require significant computation time. Rather than going through the complexity of managing these actions as background tasks, I've found that I can split the processing into multiple threads and by using JRuby with a multicore sever, I can ensure that all threads complete in a reasonable time. (The customer has already expressed a strong interest in keeping this approach vs. running tasks in the background.)
The problem is that writing to the Rails logger doesn't work within these threads. Nothing shows up in the log file. I found a few references to this problem but no solutions. I wouldn't mind inserting puts in my code to help with debugging but stdout seems to be eaten up by the glassfish gem app server.
Has anyone successfully done logging inside a Rails ruby thread without creating a new log each time?
I was scratching my head with the same problem. For me the answer was as follows:
Thread.new do
begin
...
ensure
Rails.logger.flush
end
end
I understand your concerns about the background tasks, but remember that spinning off threads in Rails can be a scary thing. The framework makes next to no provisions for multithreading, which means you have to treat all Rails objects as not being thread-safe. Even the database connection gets tricky.
As for the logger: The standard Ruby logger class should be thread safe. But even if Rails uses that, you have no control over what the Rails app is doing to it. For example the benchmarking mechanism will "silence" the logger by switching levels.
I would avoid using the rails logger. If you want to use the threads, create a new logger inside the thread that logs the messages for that operation. If you don't want to create a new log for each thread, you can also try to create one thread-safe logging object in your runtime that each of the threads can access.
In your place I'd probably have another look at the background job solutions. While DRb looks like a nightmare, "bj" seems nice and easy; although it required some work to get it running with JRuby. There's also the alternative to use a Java scheduler from JRuby, see http://www.jkraemer.net/2008/1/12/job-scheduling-with-jruby-and-rails
Related
In my RoR app, I'm writing an API in which I need to call multiple upstream APIs, so I'm planning to call them in parallel to save time. I want to follow best practices when implementing multi-threaded logic in ruby-on-rails applications.
The RoR guide states clearly that we need to wrap our code but it didn't explain why it is important.
From ruby-on-rails guidelines:
Each thread should be wrapped before it runs application code, so if
your application manually delegates work to other threads, such as via
Thread.new or Concurrent Ruby features that use thread pools, you
should immediately wrap the block
My App runs Rails version 4.
Number of upstream API calls in a single request ranges from 3 to 30
I checked out this similar SO post, but it doesn't mention anything about wrapping threaded code.
Wrapping the thread on the executor makes sure that you won't have any problem with unloaded constants... You won't see errors like
Unable to autoload constant User, expected ../user.rb to define it (LoadError).
I want to build a service that notifies me when a url returns status 200. I'm currently using a sidekiq worker, if the status == 200, it updates my database (row.available = true), if not, it raises an exception and retries the worker in n seconds, n amount of times.
Though this works, it doesn't feel efficient or scalable (1000's checks would result in 1000's of exceptions, and on certain platforms that's bad news -- JRuby), and I'm sure there is a way I can build an internal service to manage this url monitoring that doesn't rely on sidekiq (perhaps in Go, or another, more suited Ruby gem). However, I have no idea where to begin, and so I'd appreciate some general direction.
Writing and running a simple link checker is easy. Doing that for 1000s of links quickly, without redundancy, and handling dead and slow-responding links without bogging down your entire system gets harder.
I'd use three threads, plus two queues:
A dispatcher thread that only reads from the database. It is responsible for finding and queuing URLs to be checked in to a "to be checked" queue.
A worker thread that consumes from the first queue and pushes results into the "updated URL results" queue.
An updater/consumer thread that takes the result of a thread in #2 and updates the database.
Ruby has some built-in classes to help:
Thread
Queue
I'd highly recommend Typhoeus and Hydra for use in the middle thread. The documentation for these two classes cover a lot of what you need to do as far as handling multiple threads running in parallel.
I wouldn't write this code as part of a Rails application. There is no value added by Rails to this, nor is it necessary. I would either require Active Record and piggy-back on the existing database.yaml settings and models, or use Rails' "runner" to run the code as an adjunct to the Rails code.
Or, I'd write a small, application-specific, piece of code to run on a different server to avoid bogging down the Rails server. Using something like MySQL or PostgreSQL drivers would let you talk to the same database that Rails uses. In this case I'd use the Sequel gem to act as the ORM, but that's because I prefer it over Active Record.
There are a lot of things to consider as you write this code, including retries of failed URLs, sensing redirections and updating the source URLs to reflect them to avoid wasting time, and not beating up the hosting servers causing you to be banned.
I've written several apps for this purpose over the years and doing it right takes a lot of forethought, so think out your design up front otherwise you could end up with some major rewrites later on.
I'm coming from a PHP environment (at least in terms of web dev) and into the beautiful world of Ruby, so I may have some dumb questions. I imagine there are some fundamentally different options available when not using PHP.
In PHP, we use memcache to store alerts we want to display in a bar along the top of the page. When something happens that generates an alert (such as a new blog post being made), a cron script that runs once every 5 minutes or so puts that information into memcache.
Now when a user visits the site, we look in memcache to find any alerts that they haven't already dismissed and we display them.
What I'm guessing I can do differently in Rails, is to by-pass the need for a cron script, and also the need to look in memcache on every request, by using a Singleton and a polling process running in a separate thread to copy from memcache to this singleton. This would, in theory, be more optimized than checking memcache once-per-request and also encapsulate the polling logic into one place, rather than being split between a cron task and the lookup logic.
My question is: are there any caveats to having some sort of runloop in the background while a Rails app is running? I understand the implications of multithreading, from Objective-C/Java, but I'm asking specifically about the Rails (3) environment.
Basically something like:
class SiteAlertsMap < Hash
include Singleton
def initialize
super
begin_polling
end
# ... SNIP, any specific methods etc ...
private
def begin_polling
# Create some other Thread here, which polls at set intervals
end
end
This leads me into a similar question. We push (encrypted) tasks onto an SQS queue, for things related to e-commerce and for long-running background tasks. We don't use cron for this, but rather we have a worker daemon written in PHP, which runs in the background. Right now when we deploy, we have to shut down this worker and start it again from the new code-base. In Rails, could I somehow have this process start and stop with the rails server (unicorn) itself? I don't think that's something I'd running on the main process in a separate thread, since we often want to control it as a process by itself, but it would be nice if it just conveniently ran when the web application was running.
Threading for background processes in ruby would be a terrible mistake, especially since you're using a multi-process server. Using unicorn with say 4 worker processes would mean that you'd be polling from each of them, which is not what you want. Ruby doesn't really have real threads, it has green threads in 1.8 and a global interpreter lock in 1.9 IIRC. Many gems and libraries are also obnoxiously unthreadsafe.
Using memcache is still your best option and, if you have it set up correctly, you should only see it adding a millisecond or two to the request time. Another option which would give you the benefit of persisting these alerts while incurring minimal additional overhead would be to store these alerts in redis. This would better protect you against things like memcache crashing or server reboots.
For the background jobs you should use a similar approach to what you have now, but there are several off the shelf handlers for this like resque, delayed_job, and a few others. If you absolutely have to use SQS as the backend queue, you might be able to find some code to help you, but otherwise you could write it yourself. This still requires the other daemon to be rebooted whenever there is a code change. In practice this isn't a huge concern as best practices dictate using a deployment system like capistrano where a rule can easily be added to bounce the daemon on deploy. I use monit to watch the daemon process, so restarting it is as easy as telling monit to restart it.
In general, Ruby is not like Java/Objective-C when it comes to threads. It follows the more Unix-like model of process based isolation, but the community has come up with best practices and ways to make this less painful than in other languages. Ruby does require a bit more attention to setting up its stack as it is not as simple as enabling mod_php and copying some files around, but once the choices and architecture is understood, it is easier to reason about how your application works. The process model, in my opinion, is much better for web apps as it isolates code and state from the effects of other running operations. The isolation also makes the app easier to work with in a distributed system.
I have a Rails application that unfortunately after a request to a controller, has to do some crunching that takes awhile. What are the best practices in Rails for providing feedback or progress on a long running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the http request are for PDFs. I then have Rails in conjunction with Ruport generate these PDFs. The problem is, these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes about one minute to two minutes, will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Passenger Phusion "modrails", if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allow you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, this would allow you to display some nifty progress bars.
Another cool feature with Workling is that the asynchronous backend can be switched: you can used DelayedJobs, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
desc 'Generates PDFs etc.'
task :run => :environment do
# Code goes here...
end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
This looks quite an old thread. However, what I have down in my app, which required to run multiple Countdown Timers for different pages, was to use Ruby Thread. The timer must continue running even if the page was closed by users. Ruby makes it easy to write multi-threaded programs with the Thread class. Ruby threads are a lightweight and efficient way to achieve parallelism in your code. I hope this will help other wanderers who is looking to achieve background: parallelism/concurrent services in their app. Likewise Ajax makes it a lot easier to call a specific Rails [custom] action every second.
This really does sound like something that you should have a background process running rather than an application instance(passenger/mongrel whichever you use) as that way your application can stay doing what it's supposed to be doing, serving requests, while a background task of some kind, Workling is good, handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
I'm using Windows Vista 64 bit for my
development machine; however, my
production machine is Ubuntu 8.04 LTS.
Should I consider switching to Linux
for my development machine? Will the
solutions presented work on both?
Have you considered running Linux in a VM on top of Vista?
I recommend using Resque gem with it's resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs,
placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides
simple trackable jobs.
Once you run a job on a Resque worker using resque-status extension, you will be able to get info about your ongoing progresses and ability to kill a specific process very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Also resque and resque-status has a cool web interface to interact with your jobs which is so cool.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same development platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use active_messaging plugin with a activemq server (stomp or rest protocol). This has been extremely stable for us, processing millions of messages a month.
I've been sending emails on my application (ruby 1.8.7, rails 2.3.2) like this
Thread.new{UserMailer.deliver_signup_notification(user)}
Since ruby use green threads, there's any performance advantage doing this, or I can just use
UserMailer.deliver_signup_notification(user)
?
Thanks
Global VM lock will still almost certainly apply while sending that email, meaning no difference.
You should not start threads in a request/response cycle. You should not start threads at all unless you can watch them from create to join, and even then, it is rarely worth the trouble it creates.
Rails is not thread-safe, and is not meant to be from within your controller actions. Only since Rails 2.3 has just dispatching been thread-safe, and only if you turn it on in environment.rb with config.threadsafe!.
This article explains in more detail. If you want to send your message asynchronously use BackgroundRb or its analog.
In general, using green threads to run background tasks asynchronously will mean that your application can respond to the user before the mail is sent. You're not concerned about exploiting multiple CPUs; you're only concerned on off-loading the work onto a background process and returning a web page as soon as possible.
And from examining the Rails documentation, it looks like deliver_signup_notification will block long enough to get the mail queued (although I may be wrong). So using a thread here might make your application seem more responsive, depending on how your mailer is configured.
Unfortunately, it's not clear to me that deliver_signup_notification is necessarily thread-safe. I'd want to read the documentation carefully before relying on that.
Note also that you're making assumptions about the lifetime of a Rails process once a request has been served. Many Rails applications using DRb (or a similar tool) to offload these background tasks onto an entirely separate worker process. The easiest way to do this changes fairly often--see Google for a number of popular libraries.
I have used your exact strategy and our applications are currently running in production (but rails 2.2.2). I've kept a close eye on it and our load has been relatively low (Less than 20 emails sent per day average, with peaks of around 150/day).
So far we have noticed no problems, and this appears to have resolved several performance issues we were having when using Google's mailserver.
If you need something in a hurry then give it a shot, it has been working for us.
They'll be the same as far as I know.