I'm using eval to run some code (that is in a database, there are no ruby files) but this code requires some gems. How would I go about running the code? Maybe there's a better way than eval?
To give a bit more context, I have a toggle button in the view that switches a boolean to true or false in the model. This is possible for every "piece of code".
When it gets switched to true, the code starts running in a thread that never stops, and when it's switched to false this kills the thread.
I'm just trying to make the thread run the code right now.
I'm pretty new to rails so maybe there's a much better way than doing it by hand like what I'm doing, but I've tried googling some typical threading stuff and it's used for sending mail or other such things. Not for something that never stops unless told to (ie toggle the button that switches the boolean to false).
Thanks in advance.
It sounds like you need the code to be prefixed with gem imports -- which probably requires a bundler environment to load those gems in the first place.
Since the code is stored in the database directly, you could prefix them with another column / constant that does all the gem loading for you.
If the gems are already installed via bundler on the server, try prefixing your code with bundle exec. To do this, you'll need to write the code to a temporary file location first.
To install the proper gems, that's part of your build/deploy processes.
To run a script in the rails environment, which sounds like what you need, you can use rails runner. E.g. something like:
rails runner lib/scripts/my_script_to_run.rb
This script would then get the required code from the database, and already have the correct (rails) context to be able to run the code.
Or maybe even more appropriate:
rails runner <your-ruby-code-here>
See documentation
An alternative is to look at background-jobs, which seem imho a better fit to this problem. Check out the guides for ActiveJob : http://edgeguides.rubyonrails.org/active_job_basics.html
The advantage of using background-jobs:
it is a proven way of working
some adapters include extra managerial tools
you do not have to manage threads yourself
there is a clean way to describe jobs in code and queue jobs
using existing concepts/solutions is also easier to explain to other developers
There is a variety of adapters available and some of the easiest just store the jobs in the current database. For a full list see http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html, I would recommend starting with Que or DelayedJob. Your flow would change a little in that case: I would queue a job, and instead of looping, I would requeue as long as the toggle is not switched. But of course: I do not know your precise use-case so your approach could be equally valid or better.
Related
I have need to periodically run a background task in my rails app. I've done some research and it seems that there is not a standard way to do this.
I don't want to generate a new app context so a runner seems to be out of the question. I'd prefer not to run my own thread handler if I don't have to. (It looks like this may be the most promising option at this point other than the below).
Using the whenever gem I am doing the following in schedule.rb:
every 15.minutes do
command "curl http://127.0.0.1:3000/periodic"
end
This feels a little like a hack, but I can't find anything that is more straightforward than this -- Everything else I can find seems like a lot of effort for something seemingly basic.
I should note that this is an internal project and will be behind a firewall, and headless automated middleware, so there is low expectation that a user will ever be accessing this project directly.
Am I missing something or is background processing or timed events really just not supported out of the box with rails?
After some research, Amadan's suggestion of a railtie may get me what I want here. As an ideal solution it isn't perfect, though depending on the context (i.e. a publicly accessible rails project) this is the way to go. For the project I am working on it's a little heavy handed, as I'm doing a rails app that is a bridge between two systems, running headless, with no user input behind a firewall.
I'm am going to set up some functionality for my app that is Rails 3.2.3 and on Heroku. The idea is to have a task, or job (or whatever you want to call it) run every day, to make sure user information from the external API is up to date with the user information in my db. I'm curious what is the the best way to set this up? Should it be a cron job that runs a rake task?
Seems like there are quite a few ways to do this and I'm interested in the ways others are doing this. The only way I can think to do it is to run a rake task in a cron job, but would love to figure out what best practices are, or the most simple way to do it. Seems like there are a lot of ways to skin this cat... lots of different tools out there too.
If there was a pure rails way to do this, I think that would be better so I don't have to screw around with every system I place my app onto.
For a simple sync job that runs once a day, I believe having a cronjob would be sufficient and likely more stable in the long run.
Honestly, solutions such as Resque and Sidekiq is a bit overkill in my opinion (for your needs). You're still required to use a scheduler to send messages to these systems.
Check out the gem 'whenever' if you're looking at making the deployment and writing of crontabs easier: https://github.com/javan/whenever/
Railscasts regarding 'whenever': http://railscasts.com/episodes/164-cron-in-ruby
There are two options. They're better than options you mentioned in your question
Resque.
Sidekiq.
Try the later one. It is faster, lightweight and based on multithreading so there isn't interference with system. You'll need to look into scheduler of both the gem for processing everyday.
Hope this helps!
Use the Heroku scheduler add on to the handle scheduling itself. You can have it run a rake task, resque, or whatever.
Here is a few to choose from :
resque (with resque-scheduler. But you have to use redis with it)
rufus-scheduler ( if you want something simple, resque uses rufus-scheduler itself)
You may try delayed_job with a few tricks like this one. Not that great for scheduling but can use your application database.
I'm a bit overwhelmed by mere amount a possible solutions the Rails community has created for my problem. So perhaps anyone can help me to figure out how to solve it best.
What I want to do is to write a Rails app that behaves kind of "dropbox". On the one hand it should be a web interface where I can upload and download files to my web server. This interacts with my database and all that stuff. On the other hand I have SSH access to that server and can put files there manually. Now I want this file system actions to trigger my Rails app to do the things it would do if I'd created the file via the web interface.
So I somehow write a daemon, right? There are a lot of solutions, like
daemons.rubyforge.org/
github.com/mirasrael/daemons-rails
github.com/costan/daemonz
github.com/kennethkalmer/daemon-kit
Another feature that I would like to have, is that my Rails app automatically spawns and stops my daemon as start or quit my Rails app resp. So "daemonz" seems the best solution. But as I googled further I found
github.com/FooBarWidget/daemon_controller/
which seems a lot more "high tech" and already used as I deploy with passenger. But I don't understand if it kills my daemons as I quit Rails. I suppose that is not the case and so I wonder how to implement this in my app.
The way to implement a "thing" to react to file system changes seems straight forward for me. I'd use
github.com/guard/listen/
(an alternative would be: github.com/ttilley/fssm )
But what I don't understand as this the first time I'm really faced with this protocol things is, if this spawns a server I'm able to communicate with or what kind of object I have to deal with.
The last thing, I would like to implement is a kind of worker queue so that the listening for file system changes is seperated from the the actions of my rails app. But there are so many solutions that I'm totally overwhelmed to pick one:
github.com/tobi/delayed_job/
github.com/defunkt/resque
http://backgroundrb.rubyforge.org/
And what is
http://godrb.com/
all about? How could that help me?
Has anyone hints how to solve this? Thanks a lot!
Jan
P.S. I'd like to post links to all the github projects but unfortunately I don't have enough 'reputation'
I'd definitely look into creating a process (daemon) that monitors the relevant directory. Then your Rails app can just put files into it without having to know anything about the back end, and it'll work with SSH too.
Your daemon can load the Rails environment & communicate with your database. I'd leave all the communication between them at that level.
As for making it start/stop with your rails app...are you sure? I use god (the ruby gem) to start/monitor processes. It will "daemonize" your Ruby app for you, too. If you want to, you can actually tell god to stop your directory-monitor process & then exit when Rails stops. And you can fire off god from a Rails initializer.
However, if you might find yourself using SSH or some other means to put files into that directory when rails is not running, you might look into putting a script into /etc/init.d to automatically start god when the server boots up.
HTH
I think you want something like Guard for monitoring the changes on the filesystem and performing actions when changes occur.
As for god, you should definitely look into it. It will make starting/stopping processes you depend on considerably easier. We used Bluepill for a while, but there are so many bugs, we ditched it and moved to God, which IMHO is a lot more pleasant to work with, for the mostpart.
Have you tried creating a script file eg:
startDaemon.rb
And then placing it:
config/initializers/
?
I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavylifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as thats been edited and stored somewhere, to use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris
I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like you want. It's been a while so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially the process it uses was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.
IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
that should give you history of the change and you shouldn't have to re-deploy.
A bit safer, but not sure if it's possible for you is on a test/developement server: make change, commit locally, test it, then on production server, git pull then touch tmp/restart.txt
I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But, how often do you do migrations, and what is effected when you do? You do them seldom once the application is stable, and, at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; Multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for activerecord without rails for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rail's script runner, which gives a script access to the ActiveRecord::Base and ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rail's startup cost is high. But, if you have a long-running task, say one that runs in parallel with your rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering-engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.
I have a Rails application that unfortunately after a request to a controller, has to do some crunching that takes awhile. What are the best practices in Rails for providing feedback or progress on a long running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the http request are for PDFs. I then have Rails in conjunction with Ruport generate these PDFs. The problem is, these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes about one minute to two minutes, will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Passenger Phusion "modrails", if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allow you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, this would allow you to display some nifty progress bars.
Another cool feature with Workling is that the asynchronous backend can be switched: you can used DelayedJobs, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
desc 'Generates PDFs etc.'
task :run => :environment do
# Code goes here...
end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
This looks quite an old thread. However, what I have down in my app, which required to run multiple Countdown Timers for different pages, was to use Ruby Thread. The timer must continue running even if the page was closed by users. Ruby makes it easy to write multi-threaded programs with the Thread class. Ruby threads are a lightweight and efficient way to achieve parallelism in your code. I hope this will help other wanderers who is looking to achieve background: parallelism/concurrent services in their app. Likewise Ajax makes it a lot easier to call a specific Rails [custom] action every second.
This really does sound like something that you should have a background process running rather than an application instance(passenger/mongrel whichever you use) as that way your application can stay doing what it's supposed to be doing, serving requests, while a background task of some kind, Workling is good, handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
I'm using Windows Vista 64 bit for my
development machine; however, my
production machine is Ubuntu 8.04 LTS.
Should I consider switching to Linux
for my development machine? Will the
solutions presented work on both?
Have you considered running Linux in a VM on top of Vista?
I recommend using Resque gem with it's resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs,
placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides
simple trackable jobs.
Once you run a job on a Resque worker using resque-status extension, you will be able to get info about your ongoing progresses and ability to kill a specific process very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Also resque and resque-status has a cool web interface to interact with your jobs which is so cool.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same development platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use active_messaging plugin with a activemq server (stomp or rest protocol). This has been extremely stable for us, processing millions of messages a month.