JRuby-friendly method for parallel-testing Rails app - ruby-on-rails

I am looking for a system to parallelise a large suite of tests in a Ruby on Rails app (using RSpec and Cucumber) that works under JRuby. Cucumber is actually not too bad, but the full RSpec suite currently takes nearly 20 minutes to run.
The systems I can find (hydra, parallel-test) look like they use forking, which isn't the ideal solution for the JRuby environment.

We don't have a good answer for this kind of application right now. Just recently I worked on a fork of spork that allows you to keep a process running and re-run specs or features in it, provided you're using an app framework that supports code reloading (like Rails). Take a look at the jrubyhub application for an example of how I use Spork.
You might be able to spawn a spork instance for your application and then send multiple, threaded requests to it to run different specs. But then you're relying on RSpec internals to be thread-safe, and unfortunately I'm pretty sure they're not.
Maybe you could take my code as a starting point and build a cluster of spork instances, and then have a client that can distribute your test suite across them. It's not going to save memory and will still take a long time to start up, but if you start them all once and just re-use them for repeated runs, you might make some gains in efficiency.
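As a rough sketch of the "client that distributes the suite" idea above: the code below splits spec files round-robin into one bucket per running instance and hands each bucket to its own thread, so nothing forks. The names partition_specs and run_distributed, and the runner callable that actually talks to an instance, are hypothetical and not part of spork.

```ruby
# Split spec files into one bucket per worker, round-robin.
def partition_specs(files, worker_count)
  buckets = Array.new(worker_count) { [] }
  files.each_with_index { |file, i| buckets[i % worker_count] << file }
  buckets
end

# Hand each bucket to its own thread (no forking, so this suits JRuby).
# `runner` is a hypothetical callable that sends one bucket of files to the
# instance listening on `port` and returns true when that bucket passed.
def run_distributed(files, ports, runner)
  buckets = partition_specs(files, ports.size)
  threads = ports.zip(buckets).map do |port, bucket|
    Thread.new { bucket.empty? ? true : runner.call(port, bucket) }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).all?
end
```

With spork instances started on different ports, the runner could shell out to something like rspec --drb --drb-port PORT followed by the files; that is an assumption about your RSpec version, so check rspec --help first. This only parallelises across instances; each instance still runs its bucket serially, which sidesteps the RSpec thread-safety concern.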
Feel free to stop by user@jruby.codehaus.org or #jruby on freenode for more ideas.

Related

Browse a Rails App with the DB in Sandbox Mode?

I'm writing a lot of request specs right now, and I'm spending a lot of time building up factories. It's pretty cumbersome to make a change to a factory, run the specs, and see if I forgot about any major dependencies in my data. Over and over and over...
It makes me want to set up some sort of sandboxed environment, where I could browse the site and refresh the database from my factories at will. Has anyone ever done this?
EDIT:
I'm running spork and rspec-guard to make this easier, but I still lose a lot of time.
A large part of that time is spent waiting for Capybara/Firefox to spin up. These are request specs, and quite often there are some JavaScript components that need to be exercised as well.
You might look at a couple of solutions first:
You can run specific test files rather than the whole suite with something like rspec spec/request/foo_spec.rb. You can run a specific test with the -e option, or by appending :lineno to the filename, where lineno is the line number the test starts on.
You can use something like guard-rspec, watchr, or autotest to automatically run tests when their associated files change.
Tools like spork and zeus can be used to preload the app environment so that test runs take less time. However, I don't think they will reload factories, so they may not apply here.
See this answer for ways that you can improve Rails' boot time. This makes running the suite substantially less painful.

Improve slow Rails startup time (rails console, rails server)

I work with several Rails apps, some on Rails 3.2/Ruby 2.0, and some on Rails 2.3/Ruby 1.8.7.
What they have in common is that, as they've grown and added more dependencies/gems, they take longer and longer to start. Development, Test, Production, console, it doesn't matter; some take 60+ seconds.
What is the preferred way to, first, profile what is causing load times to be so slow and, second, improve the load times?
There are a few things that can cause this.
Too many GC passes and general VM shortcomings - See this answer for a comprehensive explanation. Ruby <2.0 has some really slow bits that can dramatically increase load times; compiling Ruby with the Falcon or railsexpress patches can massively help with that. All versions of MRI Ruby use GC settings by default that are inappropriate for Rails apps.
Many legacy gems that have to be iterated over in order to load files. If you're using bundler, try bundle clean. If you're using RVM, you could try creating a fresh gemset.
As far as profiling goes, you can use ruby-prof to profile what happens when you boot your app. You can wrap config/environment.rb in a ruby-prof block, then use that to generate profile reports of a boot cycle with something like rails r ''. This can help you track down where you're spending the bulk of your time in boot. You can profile individual sections, too, like the bundler setup in boot.rb, or the #initialize! call in environment.rb.
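If you only want coarse numbers before reaching for ruby-prof, a minimal sketch using Benchmark from the standard library (swapped in here instead of ruby-prof) can time individual boot sections. BootTimer and the section labels are hypothetical names for illustration.

```ruby
require "benchmark"

# Records how long each labelled section of the boot takes.
class BootTimer
  attr_reader :timings

  def initialize
    @timings = {}
  end

  # Runs the block, stores its wall-clock time under `label`,
  # and returns whatever the block returned.
  def section(label)
    result = nil
    @timings[label] = Benchmark.realtime { result = yield }
    result
  end

  # Prints the sections, slowest first.
  def report(io = $stdout)
    @timings.sort_by { |_, seconds| -seconds }.each do |label, seconds|
      io.puts format("%-24s %.3fs", label, seconds)
    end
  end
end
```

You would wrap, say, the bundler setup in boot.rb and the initialize! call in environment.rb in timer.section blocks and call timer.report at the end. Once you know which section is slow, ruby-prof gives you the per-method detail.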
Something you might not be considering is DNS timeouts. If your app is performing DNS lookups on boot, which it is unable to resolve, these can block the process for $timeout seconds (which might be as high as 30 in some cases!). You might audit the app for those, as well.
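One way to audit for this, sketched under the assumption that you can list the hostnames your app touches at boot: resolve each one with a short timeout and note which fail or stall. audit_dns and its parameters are hypothetical names; Resolv and Timeout are from the standard library.

```ruby
require "resolv"
require "timeout"
require "benchmark"

# Resolves each host with a time limit and returns
# [host, status, seconds] triples, where status is
# :ok, :unresolved or :timed_out.
# `resolver` defaults to Resolv.getaddress and can be replaced for testing.
def audit_dns(hosts, limit_seconds = 2, resolver = Resolv.method(:getaddress))
  hosts.map do |host|
    status = :ok
    elapsed = Benchmark.realtime do
      begin
        Timeout.timeout(limit_seconds) { resolver.call(host) }
      rescue Timeout::Error
        status = :timed_out
      rescue Resolv::ResolvError
        status = :unresolved
      end
    end
    [host, status, elapsed]
  end
end
```

Anything that comes back :timed_out or takes a noticeable fraction of a second is a candidate for the boot delay.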
Ryan has a good tutorial about speeding up tests, console, rake tasks: http://railscasts.com/episodes/412-fast-rails-commands?view=asciicast
I have tried every method there and found "spring" the best. Just run your tasks like:
$ spring rspec
The first run with spring will take as long as before, but the second and later runs will be much faster.
Also, in my experience, you will occasionally need to stop and restart the spring server when a weird error appears, but this is rare.
For ruby 2 apps, try zeus - https://github.com/burke/zeus
1.8 apps seem to boot much faster than 1.9, spork might help? http://railscasts.com/episodes/285-spork

optimize capybara times

I have test suite for acceptance tests in my rails app that uses pure capybara (no cucumber).
It has 220 examples and it takes 21 minutes to finish. My non-js driver is rack_test and my js_driver is capybara-webkit instead of selenium.
I would like to improve test times, but I have no idea if there is a common bottleneck in this kind of testing.
Some ideas I have/had:
Change the Capybara server. It was using mongrel as a fallback. The default is thin. I installed thin but didn't get any speed improvement. It seems thin's advantage is concurrency, and the tests don't have any.
Since I am cleaning the database between tests, I need to log in before each example that covers a private part of my app (MOST of the examples are like this). That means logging in to the app 200 times. Is there a way to maintain the session between examples to avoid logging in again and again?
there are two things that come to my mind:
parallel_tests can improve your test-speed if you run multicore https://github.com/grosser/parallel_tests
providing a backdoor-login-route for your test-login can improve test-speed by bypassing the login-step
in general acceptance-tests are slow. that's why i use them only for testing critical user workflows. i try to keep my whole test-suite within a 5 minute range. i really think that it's critical for your application test suite to be fast. that's why i try to put a lot of logic outside of rails tests so that a test-run completes within a second or less.

What do I need to know about JRuby on Rails after developing RoR Apps?

I have done a few projects using Ruby on Rails. I am going to use JRuby on Rails and host it on GAE. What do I need to know while developing JRuby apps? I read that:
JRuby has the same syntax
I can access Java libraries
JRuby does not have access to some gems/plugins
JRuby app would take some time to load for the first time, so I have to keep it alive by sending a request every 5 mins or so
I cannot use ActiveRecord and instead I must use DataMapper
Please correct me if I am wrong about any of the statements I have made. Is there anything else that I must know? Do I need to start reading about JRuby from scratch, or can I go about developing Ruby apps as usual?
I use JRuby every day.
True:
JRuby has the same syntax
JRuby does not have access to some gems/plugins
I can access Java libraries
Some gems/plugins have jruby-specific versions, some don't work at all. In general, I have found few problems and as the libraries and platforms have matured a lot of the problems have gone away (JRuby has become a lot better).
You can access Java, but in general why would you want to?
False:
JRuby app would take some time to load for the first time, so I have to keep it alive by sending a request every 5 mins or so
I cannot use ActiveRecord and instead I must use DataMapper
Although I guess it is possible to imagine a server setup where the initial startup/warmup cost of the JVM means you need to ping the server, there is nothing inherent in JRuby that makes this true. If you need to keep the server alive, you should look at your deployment environment. Something similar happens in shared-hosting with passenger where an app can go out of memory after a period of inactivity.
Also, we use ActiveRecord with no problems at all.
afaik, rails 3 is 100% compatible with jruby, so there should be no problem on that path.
like every new platform, you should make yourself comfortable with it by playing around with jruby. i recommend using RVM to do that.
as far as your questions go:
JRuby is just another runtime like MRI or Rubinius
since JRuby is within the JVM using Java is very easy, but you can also use RJB from MRI
some gems are not compatible, when they use native c libraries, that do not run on JRuby
the JVM and your application container need startup time and some time to load your app, but that is all, there is no need for keep alive, that is wrong
you can use whatever you want, most gems are updated to be compatible with JRuby
@TobyHede mostly covered the issues you thought you might have, so I'll leave it at that.
As for other things to keep in mind, it's simply a different interpreter and funny discrepancies will crop up that will take some adaptation.
some methods are implemented differently: for example, sleep 10.seconds will throw an exception (you have to use sleep 10.seconds.to_i), and I remember getting a NoMethodError on the Symbol class when switching from MRI to JRuby (I don't remember which method wasn't implemented). Just keep in mind that slight variations will be there
you will experience hangs and exceptions in gems that otherwise worked for you (pry, for example, when listing more than one page)
some gems may work differently: pry (again) will exit if you press ctrl+c, for example, which is pretty annoying
slightly slower load times of everything and no zeus
you'll get occasional java exception stack traces with no indication on which line of ruby code it happened
Timeout.timeout often will not work as expected when it's wrapped around net code and the stars align badly (this has mostly been fixed in jruby core, but it seems to still be an issue with gems that do their own netcode in pure java)
hidden problems with thread-safety in third party code (see "How do you choose gems for a high throughput multithreaded Rails app?"); stay away from EventMachine, for example
threads will be awesome (due to nativeness and no gil) and fibers will suck (due to no coroutine support in JVM they're ordinary threads), this is why you often won't get a performance boost with celluloid when compared to MRI
you used to run your rails with MRI Ruby as processes in an OS, and you knew how to track their PIDs, bloat and run times, how to kill them, monitor them etc. This part is not evident when you switch to JRuby because everything has turned into threads in a single process. The Java world has very good tools to handle these issues, but it's something you'll have to learn
killall -9 ruby doesn't do the trick with jruby when your console hangs (which it does more often than before); you have to ps -ef and then track down the proper processes without killing your netbeans etc (minor, but annoying)
due to my last point, knowing Java and the JVM will help you get out of tight spots in certain situations (depending on what you intend to do this may be something you actually really need), choice of deployment server will increase or decrease this need (torquebox for example is a bit notorious for this, other deployment options might be simpler, see http://thenerdings.blogspot.com/2012/09/pulling-plug-on-torquebox-and-jruby-for.html)
...
Also, see what jruby team says about differences, https://github.com/jruby/jruby/wiki/DifferencesBetweenMriAndJruby
But yeah, otherwise it's "just the same as MRI Ruby" :)

Best practice for Rails App to run a long task in the background?

I have a Rails application that, unfortunately, has to do some crunching that takes a while after a request to a controller. What are the best practices in Rails for providing feedback or progress on a long-running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the HTTP requests are for PDFs. I then have Rails, in conjunction with Ruport, generate these PDFs. The problem is that these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes one to two minutes; will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Phusion Passenger ("modrails"), if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allows you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, which would allow you to display some nifty progress bars.
Another cool feature of Workling is that the asynchronous backend can be switched: you can use DelayedJobs, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
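The "unless jobs are still running from the last invocation" part can be done with a lock file. A minimal sketch, assuming the cron-invoked script can write to some lock path (with_exclusive_lock and the path are hypothetical names): take a non-blocking exclusive flock and skip the run if another process already holds it.

```ruby
# Runs the block only if no other process holds the lock file.
# Returns true if the block ran, false if the run was skipped.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false instead of waiting for the holder.
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end
  # The lock is released when the file is closed at the end of the block.
end
```

The rake task or runner script would wrap its "run all outstanding jobs" call in with_exclusive_lock, so a cron tick that fires while the previous run is still busy simply exits.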
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
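The state transitions described above can be sketched in plain Ruby. This uses a Struct as a stand-in for the ActiveRecord model, and PdfJob and process_submitted are hypothetical names; in the real task you would query and save records instead.

```ruby
# Stand-in for the model row; state is :submitted, :processing or :complete.
PdfJob = Struct.new(:id, :state)

# Picks up every submitted job, marks it as processing, yields it to the
# block that does the actual PDF generation, then marks it complete.
def process_submitted(jobs)
  jobs.select { |job| job.state == :submitted }.each do |job|
    job.state = :processing
    yield job
    job.state = :complete
  end
end
```

The AJAX status action then only has to read the state column. Note that this sketch does nothing about failures: a job whose generation raises stays in processing, so you may want an extra state for that.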
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
  desc 'Generates PDFs etc.'
  task :run => :environment do
    # Code goes here...
  end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
This looks like quite an old thread. However, what I have done in my app, which needed to run multiple countdown timers for different pages, was to use Ruby threads. The timers must continue running even if the page is closed by the user. Ruby makes it easy to write multi-threaded programs with the Thread class. Ruby threads are a lightweight way to run work concurrently (on MRI the global interpreter lock keeps Ruby code from running truly in parallel, while JRuby threads are native). I hope this helps other wanderers who are looking to run background or concurrent services in their app. Likewise, Ajax makes it a lot easier to call a specific Rails [custom] action every second.
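A minimal sketch of the thread approach, assuming the work can be split into a known number of steps (ProgressTask is a hypothetical name): the work runs in a Thread, and a Mutex guards the counter that a polling action would read.

```ruby
# Runs a block once per step in a background thread and tracks progress.
# Assumes total_steps is greater than zero.
class ProgressTask
  def initialize(total_steps)
    @total = total_steps
    @done = 0
    @lock = Mutex.new
  end

  # Starts the background thread; `work` is called with each step index.
  def start(&work)
    @thread = Thread.new do
      @total.times do |step|
        work.call(step)
        @lock.synchronize { @done += 1 }
      end
    end
    self
  end

  # Integer percentage of completed steps, safe to call from another thread.
  def percent_complete
    @lock.synchronize { (@done * 100) / @total }
  end

  # Blocks until the background thread has finished.
  def wait
    @thread.join
    self
  end
end
```

Keep in mind that a thread lives inside one server process: if the app server restarts or the request lands on another process, the task object is gone, which is why several other answers keep the state in a table or queue instead.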
This really does sound like something that should run in a background process rather than in an application instance (Passenger/Mongrel, whichever you use). That way your application can keep doing what it's supposed to be doing, serving requests, while a background task of some kind (Workling is good) handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
"I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?"
Have you considered running Linux in a VM on top of Vista?
I recommend using the Resque gem with its resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides simple trackable jobs.
Once you run a job on a Resque worker using the resque-status extension, you can get information about its progress and kill a specific job very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Resque and resque-status also have a nice web interface for interacting with your jobs.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use the active_messaging plugin with an ActiveMQ server (STOMP or REST protocol). This has been extremely stable for us, processing millions of messages a month.
