Rails app deploy using delayed job serving seemingly deleted files - ruby-on-rails

I have a rails 4.2.3 app running on an ubuntu server with nginx, postgres, and puma and I'm using Capistrano for deployments.
Users of my app can send bulk emails out using several different email templates I provide and making use of the Delayed_Job gem. Recently, I updated two of the email templates and I've found that occasionally (maybe 20 times out of 500) an old version of the email template is sent out rather than the current version. I've combed through my application code and I'm satisfied that the old version of the template no longer exists anywhere in the application.
What's more, users are able to edit the template by providing their own body text, and, when an old version of an email template is sent, it is sent with the correct body text that the user specifies. It's as if there's an old version of my app running on the server that occasionally commandeers the sending of an email.
Is it possible that somehow as I update my deployment using Capistrano, the old application process remains running and sometimes starts working off the delayed_job queue? Capistrano only keeps the 5 most recent versions of my application though, and none of them have the old email template that is being used. So if this was the case the old application process would need to be entirely kept in memory--so this doesn't seem possible.
Anyone have any ideas I can pursue? I'm stumped as to what could be causing this (or how this problem is even possible). Thanks so much for any help!
(PS: emails make use of the premailer gem, though I don't see how that could be involved)

The answer to this question was yes, "Is it possible that somehow as I update my deployment using Capistrano, the old application process remains running and sometimes starts working off the delayed_job queue?"
I was very surprised to find that, when you start a delayed job process, it serializes / saves into memory everything it needs to complete its tasks. A.K.A it saves / creates a new instance of the whole application, for practical purposes. Old delayed job processes weren't being properly killed when I pushed changes to my app, so I had something like 20 delayed job processes going at once. For the most part, the most recent delayed job process would take over the queue. If it was busy, however, old processes would butt in and start working off jobs using an old version of my application.
To fix the problem I had to kill off the old delayed job processes and then update my Capistrano code to make sure it wouldn't happen again.
Big thanks to #Dharam for leading me down the correct path (it's taken me a while to post this answer)

Related

Why is my Rails app hanging?

I am using Rails 4.2 and Ruby 2.2.0. Recently, I have noticed that my app will hang for long periods of time when a certain worker is working. This is an app that has been under development for several years and I have never seen this sort of hanging behavior before (nor am I unfortunately completely aware of what changes I've made since it was last not hanging).
What I mean by "hang" is: No pages will load (no controllers can be reached, and even the default homepage which is basically a static page won't load). None of my logs will update. Standard output is not being written to. The application also has many background worker processes (managed by Resque), which also appear to be hung.
I have looked at top and netstat and various other Unix utilities and I don't see anything alarming there, either.
Finally, I'm not sure if it's worth mentioning but this application is running via Foreman on an EC2 instance.
EDIT: I don't believe that it is query related, because nothing abnormal shows up on top. Also, I'm a little inexperienced with rails, but each job has its own entry in the process table, if that helps.

Ruby on Rails with Unicorn - how to do automated reporting in a threadpool in just one instance

We use Unicorn to run 16 instances of a RoR app. We are implementing automated reporting with the report results emailed and/or ftp'd. The reports can take up to a few minutes to generate and so we use a threadpool.
Since we have 16 instances we don't want to have potentially 16 x #_threads connections into our database. Ideally we would have just once of the instances running the scheduled reports.
I can think of a couple of ways to do it:
1) Have one of the 16 instances somehow distinguishable from the others and this is the only instance that can run the reports. I think that this would require some coding with the unicorn api, or possibly we could use a lockfile or have a database column that has the instance number allowed to run the reports.
The disadvantage of this approach is that the instance will be included in the unicorn load balancing and so users will be on the instance while the reports are being generated. However, if the thread is working properly it shouldn't be an issue.
2) Have a separate unicorn deployment for 1 instance that runs the reports and isn't included in the apache/unicorn connection. No one will interact with this instance via the ui - it just runs the reports.
The disadvantage of this approach is that I have to remember to update this instance when deploying and it's another instance to monitor for problems.
I'd prefer #1 for support simplicity, but I'm fine with #2 too.
Does anyone have experience in this?
I originally went with the approach of using a dedicated reporting instance that ran the reports in a thread pool of size=1 (and later just a plain single thread). It appeared like it would work but when I put it under load testing I quickly found situations where activity in the main thread and the reporting thread would block or cause problems on fetches (like returning nil instead of an array)
I did some research and rails/activerecord 2 (which we're currently on) isn't threadsafe.
So now I'm going to try using the whenever gem to run the reports in a rake process. I had this working for awhile but decided against it because I didn't want to maintain an external cron (even though its configured in the app, which is nice for git).

background tasks executing immediately and parallelly in rails

our rails web app has to download/unpack archives with html pages from ftp on request for user's viewing through the browser.
the archive can be quite big, so user has to wait until it downloads/unpacks on the server.
i implemented progress bar the way that i call fork/Process.detach in user's request, so that his request is done but downloading/unpacking process continues running in the background. and javascript rendered in his browser pings our server for status until all is ready and then it redirects him to unpacked html pages.
as long as user requests one archive, everything goes smoothly, but if he tries to run 2 or more requests at the same time(so that more forks are started), it seems that only one of them completes, and the rest expires/times outs/gets killed by passenger(?). i suppose its the issue with Passenger/forking.
i am not sure if its possible to fix it somehow so i guess i need to switch to another solution. the solution needs to permit immediate and parallel processing of downloads. so that if user requests multiple archives, he has to see download/decompression progress in all of them at the same time.
i was thinking about running background rake job immediately but it seems very slow to startup(also there's a lot of cron rake tasks happening every minute on our server). reason i liked fork was that it was very fast to start. i know there is delayed job, we also use it heavily for other tasks. but can it start multiple processes at the same time immediately without queues?
solved by keeping the fork and using single dj worker. this way i can have as many processes starting at the same time as needed without trouble with passenger/modifying our product's gemset (which we are trying to avoid since it resulted in bugs in the past)
not sure if forking inside dj worker can cause any troubles, so asked at
running fork in delayed job
if id be free to modify gemset, id probably use resque as wrdevos suggested, or sidekiq, or girl_friday(but thats less probable because it depends on the server running).
Use Resque: https://github.com/defunkt/resque
More on bg jobs and Resque here.
https://github.com/blog/542-introducing-resque

Zero downtime on Heroku

Is it possible to do something like the Github zero downtime deploy on Heroku using Unicorn on the Cedar stack?
I'm not entirely sure how the restart works on Heroku and what control we have over restarting processes, but I like the possibility of zero downtime deploys and up until now, from what I've read, it's not possible
There are a few things that would be required for this to work.
First off, we'd need backwards compatible migrations. I leave that up to our team to figure out.
Secondly, we'd want to migrate the db right after a push, but before the restart (assuming our migrations are fully backwards compatible, this should not affect anything)
Thirdly, we'd want to instruct Unicorn to launch a new master process and fork some workers, then swap the PIDs and gracefully shut down the old process/workers
I've scoured the docs but I can't find anything that would indicate this is possible on Heroku. Any thoughts?
I can't address migrations, but the part about restarting processes and avoiding wait time:
There is an beta feature for heroku called preboot. After a deploy, it boots your new dynos first and waits a while before switching traffic and killing the old ones:
https://devcenter.heroku.com/articles/labs-preboot/
I also wrote a blog post that has some measurements on my app's performance improvements using this feature:
http://ylan.segal-family.com/blog/2012/08/27/deploy-to-heroku-with-near-zero-downtime/
You might be interested in their feature called preboot.
Taken from their documentation:
This feature provides seamless deploys by booting web dynos with new code before killing existing web dynos.
Some apps take a long time to boot up, and this can cause unacceptable delays in serving HTTP requests during deployment.
There are a few caveats:
You must have at least two web dynos to use this feature. If you have your web process type scaled to 1 or 0, preboot will be disabled.
Whoever is doing the deployment will have to wait a few minutes before the new code starts serving user requests; this happens later than it would without preboot (but in the meanwhile, user requests are still served promptly by old dynos).
There will be a short period (a minute or two) where heroku ps shows the status of the new code, but user requests are still being served by old code.
There is much more information about it, so refer to their documentation.
It is possible, but requires a fair amount of forward planning. As of Rails 3.1 there's three tasks that need carrying out
Upload the new code
Run any database migrations
Sync the assets
Uploading code and restarting is fairly straightforward, the main problem lies with the other two, but the way round them is the pretty much the same.
Essentially you need to:
Make the code compatible with the migration you need to run
Run the migration, and remove any code written specifically for it
For instance, if you want to remove a column, you’ll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.
In short, you need to consider your database and the code compatability an work around them so that the two can overlap in terms of versioning.
An alternative to this method might be to have two versions of the application running on Heroku at the same time. When you deploy, switch the domain to the other version, do the deploy, and switch it back again. This will help in most instances, but again, database compat is an issue.
Personally, I would say that if your deployments are significant to require this sort of consideration, taking parts of the application offline are probably the safest answer. By breaking up an application into several smaller applications can help mitigate this and is a mechanism that I use regularly.
No - this is currently not possible using Unicorn on Heroku cedar. I've been bugging Heroku about this for weeks.
Here was Heroku Support's reply to my email on March 8, 2012:
Hi, you could enable maintenance mode when doing a deploy, at least your users would see a maintenance page instead of an error, and also request queue wouldn't build up.
We're definitely aware this is a pain and we're working to offer rolling / zero-downtime deploys in the future. We have no ETA to announce, though.

Best practice for Rails App to run a long task in the background?

I have a Rails application that unfortunately after a request to a controller, has to do some crunching that takes awhile. What are the best practices in Rails for providing feedback or progress on a long running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the http request are for PDFs. I then have Rails in conjunction with Ruport generate these PDFs. The problem is, these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes about one minute to two minutes, will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Passenger Phusion "modrails", if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allow you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, this would allow you to display some nifty progress bars.
Another cool feature with Workling is that the asynchronous backend can be switched: you can used DelayedJobs, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
desc 'Generates PDFs etc.'
task :run => :environment do
# Code goes here...
end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
This looks quite an old thread. However, what I have down in my app, which required to run multiple Countdown Timers for different pages, was to use Ruby Thread. The timer must continue running even if the page was closed by users. Ruby makes it easy to write multi-threaded programs with the Thread class. Ruby threads are a lightweight and efficient way to achieve parallelism in your code. I hope this will help other wanderers who is looking to achieve background: parallelism/concurrent services in their app. Likewise Ajax makes it a lot easier to call a specific Rails [custom] action every second.
This really does sound like something that you should have a background process running rather than an application instance(passenger/mongrel whichever you use) as that way your application can stay doing what it's supposed to be doing, serving requests, while a background task of some kind, Workling is good, handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
I'm using Windows Vista 64 bit for my
development machine; however, my
production machine is Ubuntu 8.04 LTS.
Should I consider switching to Linux
for my development machine? Will the
solutions presented work on both?
Have you considered running Linux in a VM on top of Vista?
I recommend using Resque gem with it's resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs,
placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides
simple trackable jobs.
Once you run a job on a Resque worker using resque-status extension, you will be able to get info about your ongoing progresses and ability to kill a specific process very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Also resque and resque-status has a cool web interface to interact with your jobs which is so cool.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same development platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use active_messaging plugin with a activemq server (stomp or rest protocol). This has been extremely stable for us, processing millions of messages a month.

Resources