Is changing `require` to concatenate Ruby files brilliant or crazy? - ruby-on-rails

When booting a Rails application with lots of dependencies, a lot of time is spent (I think) in requireing files.
Suppose you were to create a deploy process that converted all require statements to file concatenations, using the same rules (don't get the same file twice, etc). Essentially, it would treat Ruby the way the asset pipeline treats javascript.
Would this make a real speed difference? Would it create any issues - for instance, with variable scope - other than making it harder to trace errors to their original source files?
In short, is this brilliant or crazy?
Update
As pst points out, this would be pointless in production, where the server likely loads everything once, then forks to handle new requests.
But consider the test environment, where you boot your Rails app every time you run your tests. Pre-concatenating all your gems could have an effect similar to the Spork gem.
I suppose my real question is how much time is spent in require vs parsing the contents of the files.

You'll be happy to see what made it into Ruby 2.0:
http://bugs.ruby-lang.org/issues/7158

tldr; it will make no difference amortized across requests1 - any trivial start up cost is inconsequential2.
1 A much better way to "increase performance" is just to reuse processes - e.g. only load a process once for N requests (which implies only "running" the require statements once) - as is already done.
2 For those who are really interested in if "it will parse faster", please run a benchmark. Then realize that it doesn't matter - even saving a second on start up is of no importance to a web-server infrastructure. (Of course, it would be only milliseconds faster - from a few additional disk seeks - if at all.)

Related

Browse a Rails App with the DB in Sandbox Mode?

I'm writing a lot of request specs right now, and I'm spending a lot of time building up factories. It's pretty cumbersome to make a change to a factory, run the specs, and see if I forgot about any major dependencies in my data. Over and over and over...
It makes me want to set up some sort of sandboxed environment, where I could browse the site and refresh the database from my factories at will. Has anyone ever done this?
EDIT:
I'm running spork and rspec-guard to make this easier, but I still lose a lot of time.
A large part of that time is spent waiting for Capybara/FireFox to spin up. These are request specs, and quite often there are some JavaScript components that need to be exercised as well.
You might look at a couple of solutions first:
You can run specific test files rather than the whole suite with something like rspec spec/request/foo_spec.rb. You can run a specific test with the -e option, or by appending :lineno to the filename, where lineno is the line number the test starts on.
You can use something like guard-rspec, watchr, or autotest to automatically run tests when their associated files change.
Tools like spork and zeus can be used to preload the app environment so that test suite runs take less time to run. However, I don't think this will reload factories so they may not apply here.
See this answer for ways that you can improve Rails' boot time. This makes running the suite substantially less painful.

Do Rails/Haml/Sass execute slower than Node/Express/Jade/Stylus?

I'm not sure if it is my imagination but this is what I've experienced:
When I use SASS (--watch), save the .sass file and switch to the browser very fast (1~2 seconds) sometimes the changes are not being reflected. With jade/stylus the changes don't delay at all.
I noticed that it takes more time to install a gem than a node module.
Starting an Node.js/Express.js server takes me like 1 second. Starting a Rails server takes me like 3~4 seconds.
Also noticed that node frameworks (e.g. Express.js) generate files faster.
Now I'm not sure if it is because Node.js/Express.js are younger projects and have fewer features or it is because Node.js is actually faster?
(I'm using Ubuntu 11.10 with an AMD CPU).
The short answer is: Ruby is slow. It's a huge trouble for rubyists and is a purpose for many hot discussions like "does something faster than Ruby?".
Javascript runs fast even on virtual machine. It is even faster than native Ruby interpreter.
But i prefer Ruby and Ruby on Rails because of its much more stable library set, huge community and because of Ruby's syntax. It's really sweet =)

How to architect Rails site that can be edited while running?

I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavylifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as thats been edited and stored somewhere, to use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris
I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like you want. It's been a while so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially the process it uses was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.
IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
that should give you history of the change and you shouldn't have to re-deploy.
A bit safer, but not sure if it's possible for you is on a test/developement server: make change, commit locally, test it, then on production server, git pull then touch tmp/restart.txt
I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But, how often do you do migrations, and what is effected when you do? You do them seldom once the application is stable, and, at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; Multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for activerecord without rails for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rail's script runner, which gives a script access to the ActiveRecord::Base and ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rail's startup cost is high. But, if you have a long-running task, say one that runs in parallel with your rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering-engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.

JRuby-friendly method for parallel-testing Rails app

I am looking for a system to parallelise a large suite of tests in a Ruby on Rails app (using rspec, cucumber) that works using JRuby. Cucumber is actually not too bad, but the full rSpec suite currently takes nearly 20 minutes to run.
The systems I can find (hydra, parallel-test) look like they use forking, which isn't the ideal solution for the JRuby environment.
We don't have a good answer for this kind of application right now. Just recently I worked on a fork of spork that allows you to keep a process running and re-run specs or features in it, provided you're using an app framework that supports code reloading (like Rails). Take a look at the jrubyhub application for an example of how I use Spork.
You might be able to spawn a spork instance for your application and then send multiple, threaded requests to it to run different specs. But then you're relying on RSpec internals to be thread-safe, and unfortunately I'm pretty sure they're not.
Maybe you could take my code as a starting point and build a cluster of spork instances, and then have a client that can distribute your test suite across them. It's not going to save memory and will still take a long time to start up, but if you start them all once and just re-use them for repeated runs, you might make some gains in efficiency.
Feel free to stop by user#jruby.codehaus.org or #jruby on freenode for more ideas.

Load Ruby on Rails models without loading the entire framework

I'm looking to create a custom daemon that will run various database tasks such as delaying mailings and user notifications (each notice is a separate row in the notifications table). I don't want to use script/runner or rake to do these tasks because it is possible that some of the tasks only require the create of one or two database rows or thousands of rows depending on the task. I don't want the overhead of launching a ruby process or loading the entire rails framework for each operation. I plan to keep this daemon in memory full time.
To create this daemon I would like to use my models from my ruby on rails application. I have a number of rails plugins such as acts_as_tree and AASM that I will need loaded if I where to use the models. Some of the plugins I need to load are custom hacks on ActiveRecord::Base that I've created. (I am willing to accept removing or recoding some of the plugins if they need components from other parts of rails.)
My questions are
Is this a good idea?
And - Is this possible to do in a way that doesn't have me manually including each file in my models and plugins?
If not a good idea
What is a good alternative?
(I am not apposed to doing writing my own SQL queries but I would have to add database constraints and a separate user for the daemon to prevent any stupid accidents. Given my lack of familiarity with configuring a database, I would like to use active record as a crutch.)
It sounds like your concern is that you don't want to pay the time- or memory- cost to spin up the rails stack every time your task needs to be run? If you plan on keeping the daemon running full-time, as you say, you can just daemonize a process that has loaded your rails stack and will only have to pay that memory- or time-related penalty for loading the stack one time, when the daemon starts up.
Async_worker is a good example of this sort of pattern: It uses beanstalk to pass messages to one or more worker processes that are each just daemons that have loaded the full rails stack.
One thing you have to pay attention to when doing this is that you'll need to restart your daemonized processes upon a deploy so they can reload your updated rails stack. I'm using this for a url-shortener app (the single async worker process I have running sits around waiting to save referral data after the visitor gets redirected), and it works well, I just have an after:deploy capistrano task that restarts any async worker(s).
You can load up one aspect of Rails such as ActiveRecord but when you get right down to it the cost of loading the entire environment is not much more than just loading ActiveRecord itself. You could certainly just not include aspects like ActionMailer or some of the side bits but I'm going to guess that you're not going to see much win out of it.
What I would suggest instead is either running through runner/console like you said you didn't want to but rather than bootstrapping each time, try to batch things so that you're doing 1000 at a time instead of 1. There are a lot of projects that use this style, some of the bulk mailers spring to mind if you want examples. DJ (delayed_job) does similar by storing a bit in the database saying that this code needs to be run at some point in the future using the environment stack but it tries to batch together as much as it can so you may get win from that.
The other option is to have a persistent mini-rails app with as much stripped out as possible so that the memory usage is lower which can listen for requests and do your bidding when you want it to. This would be more memory but the latency of bootstrapping would be essentially nullified.
Lastly, as an afterthought, this would be a great use for Postgres.

Resources