How can I make sure the Sphinx daemon runs? - ruby-on-rails

I'm working on setting up a production server using CentOS 5.3, Apache, and Phusion Passenger (mod_rails). I have an app that uses the Sphinx search engine and the Thinking Sphinx gem.
According to the Thinking Sphinx docs...
If you actually want to search against the indexed data, then you’ll need Sphinx’s searchd daemon to be running. This can be controlled using the following tasks:
rake thinking_sphinx:start
rake ts:start
rake thinking_sphinx:stop
rake ts:stop
What would be the best way to ensure that this takes place in production? I can deploy my app, then manually run rake thinking_sphinx:start, but I like to set things up so that if I have to bounce the server, everything will come back up.
Should I put a call to that Rake task in an initializer? Or something in rc.local?

rc.local is a good start, but it's not enough. I would pair it with a monit rule to ensure it is running, and more importantly...
Sphinx requires a full reindex to make the latest data searchable. There is some documentation on the Thinking Sphinx site about delta indexing, but if your index is small, an hourly reindex will take care of things and you do not need delta indexing.
I run this hourly to take care of this:
0 * * * * cd /var/rails/my_site/current/ && RAILS_ENV=production /usr/bin/rake ts:rebuild
Note: for deployment, I use the built-in Thinking Sphinx Capistrano tasks.
In your Capfile add
require 'thinking_sphinx/deploy/capistrano'
I used to chain the re-indexing in the cap task but stopped because it is really slow. When I make schema changes, I either remember to run it or wait for the hourly cron job to fix things up.

I haven't done this before with Sphinx, so I hope someone can give you a better answer, but you should take a look at monit. Monit is designed for keeping daemons running, which is exactly what you need to do.
A quick Google for "sphinx monit" turned up this link: Capistrano recipes: sphinx:monit. That would be a good place to start.
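The check a watchdog such as monit performs can be sketched in plain Ruby. This is a minimal illustration, not monit's implementation: it assumes searchd writes its process ID to a pid file, and `process_running?` is a hypothetical helper name.

```ruby
# Hypothetical helper: reports whether the process recorded in a pid file is
# still alive. Where the pid file lives is an assumption; it depends on your
# Sphinx configuration.
def process_running?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 only checks that the process exists
  true
rescue Errno::EPERM
  true  # the process exists but belongs to another user
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no pid file, no such process, or the file holds no valid pid
end
```

A cron job or boot script could call this and run rake ts:start only when it returns false.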

For what it's worth, I'm running
thinking_sphinx:index
... in my cron job, instead of the "rebuild" task. This does not require the searchd process to be offline, but the indices are still rotated when it's done, so new changes are picked up. I think the "rebuild" task is only necessary when you actually change your index structure in your models, which happens very rarely for me.

Related

Setting up a rake task with Resque Scheduler - Rails 4

I am on Rails 4 using the Resque Scheduler gem.
I am also using the sitemap generator gem in order to dynamically generate my sitemap.
I am having trouble figuring out the best way to schedule a rake task with resque scheduler. The sitemap generator recommends whenever, but I am assuming resque scheduler can accomplish the same thing (don't want to install another gem if I don't have to).
Does anyone know how to set this up?
I would like to run rake sitemap:refresh:no_ping every 5 hours.
I was thinking I would just schedule a background job and run it from there:
# resque_schedule.yml
update_sitemap:
  every: 5h
  class: "SitemapUpdater"
  description: "This job refreshes the sitemap"
# sitemap_updater.rb
class SitemapUpdater
  # Resque reads the queue name from this class-level instance variable
  @queue = :sitemap_queue

  def self.perform
    # run rake task here
  end
end
... however, I'm not sure if this is a good practice. Any advice would be much appreciated.
I don't see a problem with your approach. Just be aware that the scheduler is reset on every deployment, so if you deploy frequently, your scheduled jobs might run later than expected or not run at all, as documented:
IMPORTANT: Rufus every syntax will calculate jobs scheduling time starting from the moment of deploy, resulting in resetting schedule time on every deploy, so it's probably a good idea to use it only for frequent jobs (like every 10-30 minutes), otherwise - when you use something like every 20h and deploy once-twice per day - it will schedule the job for 20 hours from deploy, resulting in a job to never be run.
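The arithmetic behind that warning can be sketched with a simplified model. This is an assumption made for illustration, not resque-scheduler's actual code: every deploy restarts the timer, so a job only fires if a full interval elapses before the next deploy. The name `fire_times` and the hour values are hypothetical.

```ruby
# Simplified model: each deploy restarts the `every` timer, so a run only
# happens if a full interval passes before the next deploy.
# All times are in hours since the first deploy.
def fire_times(deploy_hours, interval, horizon)
  deploy_hours.each_with_index.flat_map do |deployed_at, i|
    next_restart = deploy_hours[i + 1] || horizon
    runs = []
    t = deployed_at + interval
    while t < next_restart
      runs << t
      t += interval
    end
    runs
  end
end

deploys = [0, 12, 24, 36]     # a deploy every 12 hours
p fire_times(deploys, 20, 48) # => []
p fire_times(deploys, 5, 48)  # => [5, 10, 17, 22, 29, 34, 41, 46]
```

Under this model a 20h job never runs when deploys happen every 12 hours, while the 5h job from the question still runs but shifts after each deploy.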
You might also run the rake from system cron itself, which is an even more lightweight solution as it requires no scheduler gems at all, just the rake task, and will be scheduled reliably in time.
See e.g. this answer for setting up the "every 5 hours" frequency in crontab. If you use RVM for your Ruby project, you will also need to call rake through the RVM wrappers, e.g. /home/deploy/.rvm/wrappers/ruby-2.3.0@mygemset/rake instead of just rake.
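If the background-job approach from the question is kept, the "run rake task here" placeholder could shell out to rake. The sketch below is one possible way, not the only one: `run_command!` is a hypothetical helper, and it assumes the worker runs from the application root with Bundler available.

```ruby
# Hypothetical helper: runs a command without going through a shell and raises
# if it fails, so the job is reported as failed instead of passing silently.
def run_command!(*argv)
  ok = system(*argv) # true on exit status 0, false or nil otherwise
  raise "command failed: #{argv.join(' ')}" unless ok
  true
end

class SitemapUpdater
  @queue = :sitemap_queue

  def self.perform
    # Assumes the worker's working directory is the Rails root.
    run_command!("bundle", "exec", "rake", "sitemap:refresh:no_ping")
  end
end
```

Raising on failure matters here because Resque only records a job as failed when perform raises.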

Run simultaneous or asynchronous tasks with Capistrano

I have a few long-running restarts of processes in my deploy.rb like:
rake assets:precompile
script/delayed_job restart
rake sunspot:solr:stop, rake sunspot:solr:start
All of these processes have to occur, but not necessarily one after another.
I was wondering if I can run the assets:precompile and the delayed_job restart simultaneously, as they don't need to happen one after another, and I could speed up my deploy time by doing them asynchronously.
I've run some Google searches but I can't find anything about it.
This is not a feature that capistrano supports.
I have been looking around for a solution and found something on the Capistrano google groups. The suggestion was to use Capistrano to run a ruby script that runs the jobs in parallel using Ruby's own threading support.
If you read the post, one of the authors asks why these tasks need to run in parallel at all, because doing so can introduce race conditions and other non-deterministic behaviour, which makes the deployment process more brittle.
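The thread-based suggestion could look roughly like the sketch below. It is an illustration, not the script from the Google Groups post, and `run_in_parallel` is a hypothetical helper name.

```ruby
# Hypothetical helper: runs each command in its own thread and returns
# [command, success] pairs once all of them have finished.
def run_in_parallel(commands)
  threads = commands.map do |argv|
    # system blocks only the calling thread while the child process runs
    Thread.new { [argv, system(*argv)] }
  end
  threads.map(&:value) # value waits for the thread and returns its result
end

# Example call, using two of the commands from the question:
#   results = run_in_parallel([
#     %w[bundle exec rake assets:precompile],
#     %w[script/delayed_job restart]
#   ])
#   failed = results.reject { |_argv, ok| ok }
```

Collecting the results at the end lets the deploy abort if any command failed, which goes some way toward the brittleness concern raised in that thread.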

A pulse or cron job for Rails app running in Heroku

I need some code executed once per day. It can run more than once a day, and missing a day isn't the end of the world. That code will make sure users get some bonus points based on certain criteria. I'll keep track of whether they've already received the bonus points so they don't double up.
Some simple cron job calling a particular controller once in a while is perfect:
curl http://localhost/tasks/pulse
Of course a real crontab entry works great. Or is there an internal mechanism for this kind of thing in Rails? I'm using the latest stable Rails (currently 3.2.9).
The only wrinkle is this needs to work in Heroku too.
I just noticed Heroku's Scheduler. Looks great for Heroku. I can just run those tasks in my dev/test environment manually. Is this the best way to handle pulses/cron jobs in Rails? With rake tasks? Easy to incorporate running rake tasks in tests?
The Heroku Scheduler works great and is easy to set up!
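Whichever scheduler triggers it, the bonus logic described in the question is easiest to run safely if it is idempotent. The sketch below shows that idea only: `User` is a hypothetical stand-in for the real model, the eligibility criteria are omitted, and the once-per-day rule is an assumption.

```ruby
require "date"

# Hypothetical stand-in for the real user model.
User = Struct.new(:name, :points, :last_bonus_on)

# Awards the bonus at most once per user per day, so running the task several
# times a day (or skipping a day) does no harm.
def award_daily_bonus(users, today: Date.today, bonus: 10)
  users.each do |user|
    next if user.last_bonus_on == today # already awarded today
    user.points += bonus
    user.last_bonus_on = today
  end
end
```

Keeping the logic in a plain method like this also makes it easy to call from a rake task and to cover in tests without involving the scheduler.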
You could check out whenever, a Ruby gem that provides a clear syntax for writing and deploying cron jobs. It's well maintained. I have not used it on Heroku myself, but it is worth a look.
You can do loads of stuff like
every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end

Use linux script to make a continuous rake task running (start, stop etc)

I have a rake task which parses a streaming API and enters the data into the database. The streaming API is a live feed, so the rake task has to run continuously for the live data to reach the database. Once called, the task keeps running and parsing the data. I have started the rake task and it is running. The problem is that if I close the terminal or reboot the server, the rake task will be stopped. So I want a Linux script (something like the one used to start or stop the Apache server) which does the following:
1. start the rake task by calling rake command (rake parse:stream) from the RAILS-ROOT (application directory of Rails app)
2. stop the rake task by killing the process.
3. start the rake task automatically when the server reboots.
I am not familiar with Linux scripts and I don't know where to start. I am using Ubuntu Server. Can anyone help me?
Here's an article that might help you. It discusses various options for managing Ruby applications and their related processes:
http://michaelvanrooijen.com/articles/2011/06/08-managing-and-monitoring-your-ruby-application-with-foreman-and-upstart/
You need to run your script as a daemon. When I create this kind of startup scripts I usually make 2 files, one that stays in /etc/init.d and handles the start/stop/status/restart commands and another one that actually does the job and gets called by the first script.
Here is one solution, and although the daemon script is written in perl, you want to run some command lines only, so daemonizing a perl script could do your job easily.
If you want, there are also ruby gems for daemonizing scripts, so you can write a script in ruby that does the rake tasks.
And if you want to go hardcore, there are solutions for writing bash scripts that can daemonize, but I'm not sure I would recommend a solution like that; at least I find them pretty difficult to use.
Take a look at how Github's Resque project does it.
Essentially they create tasks for starting/restarting/stopping a particular task, in this case resque:work. Note that the restart_workers task simply invokes the other tasks, stop and start. It should be really easy to change this for what you want.
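The start/stop pattern those tasks follow can be sketched with a pid file in plain Ruby. This is an illustration under assumptions, not Resque's code: `start_task` and `stop_task` are hypothetical helpers, and a production setup would still need an init script or similar to call them at boot.

```ruby
# Hypothetical helper: launches the command in the background and records its
# pid so that a later, separate invocation can stop it.
def start_task(pid_file, *command)
  pid = Process.spawn(*command, pgroup: true) # own process group
  Process.detach(pid)                         # do not leave a zombie behind
  File.write(pid_file, pid.to_s)
  pid
end

# Hypothetical helper: sends TERM to the recorded pid. Returns true if a
# signal was sent, false if there was nothing to stop.
def stop_task(pid_file)
  pid = Integer(File.read(pid_file))
  Process.kill("TERM", pid)
  true
rescue Errno::ENOENT, Errno::ESRCH
  false # no pid file, or the process is already gone
ensure
  File.delete(pid_file) if File.exist?(pid_file)
end
```

A restart task is then just stop followed by start, e.g. `start_task("tmp/pids/stream.pid", "bundle", "exec", "rake", "parse:stream")`, where the pid file path is an assumption.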

Start up required additional services (resque, redis) with `rails server` command

I would like my development environment for Rails to automatically start redis and resque (and potentially in other projects, mongod, mysql-server etc.) for me, in the following cases:
When starting up the development server rails server.
Additionally, it would be nice if the following cases detect already running services, and, if not running start them up too:
Rake rspec, rspec /spec, when running tests.
When starting up a rails console.
When shutting down the rails server, the started child-services should be shut down too.
What is the correct place for such additional startup scripts?
And how to avoid them being started in production too (where I run everything through /etc/init.d services)?
A lot of these built-in tasks are available as rake tasks already.
You can create a master rake task that does it all.
For example, with resque, you get "rake resque:start" "rake resque:scheduler:start", etc.
You can create a generic "start" task that depends on the rest. Similarly, a "stop" task would shut everything down.
So you would do:
rake start # starts all associated processes
rake stop # stops them all
This is also very easy to use from Capistrano when you end up deploying your code somewhere else. Rake tasks are very easy to call from Capistrano.
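A master task of that kind can be expressed through rake prerequisites. The sketch below is a minimal illustration: the `redis` and `resque` task bodies are placeholders that only record what ran, and the task names are assumptions rather than tasks shipped by those projects.

```ruby
require "rake"
extend Rake::DSL # makes task/namespace available outside a Rakefile

# Records which placeholder services are "running", for illustration only.
STARTED = []

namespace :redis do
  task(:start) { STARTED << "redis" }      # placeholder for the real command
  task(:stop)  { STARTED.delete("redis") }
end

namespace :resque do
  task(:start) { STARTED << "resque" }
  task(:stop)  { STARTED.delete("resque") }
end

desc "Start all associated processes"
task start: ["redis:start", "resque:start"]

desc "Stop them all, in reverse order"
task stop: ["resque:stop", "redis:stop"]
```

Rake runs prerequisites in the order listed, so dependencies such as redis can be brought up before the workers that need them.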
I think it's better to do that in an external script. Doing it inside your rails server command can be really annoying for anyone who tries your code.
For example, say a new developer joins your project a year from now. They may be disoriented if your rails server command launches a bunch of other applications in the background.
Along the same lines, if you do that, you need to maintain that code within your Rails environment, which can be a little tricky. Maintaining an independent script can be more useful.
You can add your script to the script directory, which is good practice. It is not good practice to make a command do something its manual does not describe.

Resources