Efficient rails deployment to a small EC2 instance - ruby-on-rails

I've got a few Rails apps running under different vhosts on a single small EC2 instance. My automated deployment process for each involves running some rake tasks (migration, asset compilation, etc.), staging everything into a versioned directory and symlinking the web root to it. I'm serving the apps with Apache + Passenger. During this process (and the restart of Passenger), Ruby processes eat up 100% of the CPU. I understand why this is happening, but I need a way to throttle these processes so that the other apps on the instance aren't impacted as significantly as they currently are.

Don't know if you've already come across this. But it's there to make EC2 deployment more convenient. https://github.com/wr0ngway/rubber
There is also a Railscast on it at: http://railscasts.com/episodes/347-rubber-and-amazon-ec2
Hopefully, these two resources will help you somewhere.

Related

Architecture guidance for AWS run Rails app (Web/Worker setup)

We are hosting our application on AWS and use EB (Elastic Beanstalk) for deployments. The application is Rails and we use Sidekiq for background processing. We have a decoupled RDS instance and ElastiCache (for Sidekiq communication), and in general we have a stateless architecture.
At the moment our web process and Sidekiq process run on the same EC2 instances, which means we need larger instances to support both. We want to move to a separate web and worker architecture. The idea is to move the web processes to small EC2 instances and have one large EC2 instance dedicated to Sidekiq only. The reason is that we have CPU usage issues: bigger worker jobs hog all the resources and take the instance down, which then cascades to new instances and is generally not an optimal use of our resources.
This seems like a no-brainer to us, but we are having trouble finding web resources where this has been implemented. Setting up a Web EB app and a Worker EB app separately is also confusing to us. How would deployment work? Would we deploy two separate EB applications at the same time? That does not seem safe.
We are looking for guidance on how best to achieve this goal. Are there examples or setups you could share where we could see this done in the real world?
Also, is there a better way to do this?
The web/worker setup you described for a Rails application is absolutely appropriate. Within the same application, you can create an environment for your web server and an environment for your worker. Your code base can be deployed to both environments either separately (if your changes only affect either the worker or the web server), or at the same time (if your changes affect both environments). You can set environment variables specific to your environment that you can use to determine whether code should run on the worker or the web server. Here is a brief outline of the steps you could use:
Create a new application.
Create a web server environment within the application (for example "production").
Create a worker environment within the application (for example "production-worker").
Set an environment variable, for example APP_ENVIRONMENT (the name can be anything you choose), to "production" on the production environment and to "production-worker" on the production-worker environment.
Create configuration files in .ebextensions to start/stop Sidekiq (and any other programs needed for the worker) depending on whether the value of APP_ENVIRONMENT contains "worker".
Set up a cron.yaml file for background jobs (see AWS docs).
For the background jobs, I create a separate cron controller for the endpoints listed in the cron.yaml file.
Deploy your code base to both the web server and worker environments. Subsequent changes can be deployed as needed to the appropriate environment.
For Sidekiq, your web application and your worker both need to be given access to the Redis server, so your application can create jobs and then the worker can pick them up to process.
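To make the APP_ENVIRONMENT check from the steps above concrete, here is a minimal Ruby sketch. The helper name worker_environment? is hypothetical, and the substring test on "worker" is just one way to read the condition described above; the real start/stop logic would live in your .ebextensions scripts.

```ruby
# Hypothetical helper: decides whether this instance should run Sidekiq.
# APP_ENVIRONMENT is the example variable name from the steps above.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def worker_environment?(env = ENV)
  env.fetch("APP_ENVIRONMENT", "").include?("worker")
end

puts worker_environment?({ "APP_ENVIRONMENT" => "production" })        # false
puts worker_environment?({ "APP_ENVIRONMENT" => "production-worker" }) # true
```

The same check could also guard worker-only initializers inside the Rails app, so one code base can be deployed to both environments.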

Will 2 servers running Rails + Sidekiq, using the same Redis server, cause unexpected behaviour?

I'm planning to migrate my 2 sidekiq instances to use 1 Redis database. I'm concerned there may be issues with race conditions. Is it safe to do this or not?
I currently have 2 rails servers in production behind a load balancer. Each server is cloned, running a rails app, sidekiq, and a redis database.
The staging environment has the same setup. However, I have connected both sidekiq instances to a single Redis database.
So far I have had no problems, but the staging environment does not get enough traffic for any effects to be noticeable.
You should at least use different Redis databases for staging and production so that tasks from one environment do not end up being run in the other.
In your current setup, tasks from one server are executed solely by that same server, but this is not necessary. You can share a pool of Sidekiq instances between servers (Sidekiq is designed to run fine this way) as long as they run the same or compatible code versions (there may be problems during a rollout if a task enqueued by the new version is picked up by an older one).
This setup is actually better: if one Sidekiq instance has all its threads busy, tasks from the corresponding server can still run on the other.
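As a rough illustration of keeping staging and production apart on one Redis server, the sketch below builds a per-environment Redis URL that selects a different database number. The host name, the database numbers and the helper name redis_url_for are all assumptions for the example, not something from the setup described above.

```ruby
# Hypothetical mapping of environment name to Redis database number.
REDIS_DB_BY_ENV = { "production" => 0, "staging" => 1 }.freeze

# Builds a redis:// URL whose path selects the database for the environment.
# The default host is a placeholder; raises KeyError for unknown environments.
def redis_url_for(environment, host: "redis.example.internal", port: 6379)
  db = REDIS_DB_BY_ENV.fetch(environment)
  "redis://#{host}:#{port}/#{db}"
end

puts redis_url_for("production") # redis://redis.example.internal:6379/0
puts redis_url_for("staging")    # redis://redis.example.internal:6379/1
```

Both Sidekiq instances within one environment would then be pointed at the same URL, so they share a single set of queues.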

Where should I run scheduled background jobs?

Here in my company we have our regular application in AWS EBS with some background jobs. The problem is that these jobs are getting heavier, and we were thinking of separating them from the application. The question is: where should we do it?
We were thinking of doing it in AWS Lambda, but then we would have to port our Rails code to Python, Node or Java, which seems like a lot of work. What other options are there? Should we just create another EC2 environment for the jobs? Thanks in advance.
Edit: I'm using the shoryuken gem (http://github.com/phstc/shoryuken) integrated with SQS. It currently has a memory leak and my application sometimes goes down, though I don't know if the memory leak is the cause. We have already split the application into an API part in EBS and a front-end part in S3.
Normally, just another EC2 instance with a copy of your Rails app, where instead of rails s to start the web server, you run rake resque:work or whatever your job runner start command is. Both would share the same Redis instance and database so that your web server writes the jobs to the queue and the worker picks them up and runs them.
If you need more workers, just add more EC2 instances pointing to the same Redis instance. I would advise separating your jobs by queue name, so that one worker can just process fast stuff e.g. email sending, and others can do long running or slow jobs.
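Here is a small sketch of the "separate by queue name" idea. The job types and queue names are made up for illustration; the only thing taken from Resque itself is that a worker started with rake resque:work reads the queues it should process from the QUEUE environment variable.

```ruby
# Hypothetical routing table: which queue each kind of job goes to.
QUEUE_BY_JOB = {
  "send_email"      => "fast",
  "generate_report" => "slow"
}.freeze

# Returns the queue for a job type, falling back to a "default" queue.
def queue_for(job_type)
  QUEUE_BY_JOB.fetch(job_type, "default")
end

# Builds the shell command that starts a Resque worker for one queue.
def worker_command(queue)
  "QUEUE=#{queue} rake resque:work"
end

puts queue_for("send_email")  # fast
puts worker_command("slow")   # QUEUE=slow rake resque:work
```

Each worker instance would then run the command for the queue it is dedicated to, so slow jobs cannot starve the fast ones.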
We had a similar requirement. Our Sidekiq background jobs started to get very heavy, so we split them into a separate OpsWorks stack with a simple recipe to build the machine dependencies (Ruby, MySQL, etc.). Since we don't have to worry about load balancers and requests timing out, it's OK for all machines to deploy at the same time.
Another thing you could use in OpsWorks is scheduled machines (if the jobs are needed at certain times of day): have the machine provisioned a few minutes before the task is due, then shut it down automatically once the task is done, which would reduce your cost.
EB also has a different type of environment, the worker environment, which you could check out, but honestly I haven't looked into it so I can't tell you its pros and cons.
We recently went down that route. I dockerized our Rails app and wrote a custom entrypoint for the Docker container. In summary, the entrypoint parses whatever arguments follow docker run IMAGE_NAME.
For example, if you run docker run IMAGE_NAME sb rake do-something-magical, the entrypoint understands that it should run the rake task with the sandbox environment config. If you only run docker run IMAGE_NAME, it will run rails s -b 0.0.0.0.
PS: I wrote a custom entrypoint because we have 3 different environments; the entrypoint downloads the environment-specific config from S3.
I also set up an ECS cluster and wrote a task-runner job on Lambda. This Lambda function schedules a task on the ECS cluster, and we trigger it from CloudWatch Events. You can send custom payloads to Lambda when using CloudWatch Events.
It sounds complicated, but the implementation is really simple.
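The argument parsing in such an entrypoint could look roughly like the Ruby sketch below. This is not the actual entrypoint described above, only a guess at its shape: the "sb" to "sandbox" alias comes from the example, while the function name, the error handling and the nil environment for the default case are assumptions. Downloading the config from S3 is left out.

```ruby
# Only the "sb" alias appears in the example above; others would be added here.
ENV_ALIASES = { "sb" => "sandbox" }.freeze

# Command used when the container is started without arguments.
DEFAULT_COMMAND = ["rails", "s", "-b", "0.0.0.0"].freeze

# Splits the container arguments into an environment name and a command.
# A real entrypoint would fetch the config for that environment and then
# hand over to the command with Kernel#exec.
def resolve_command(args)
  return { environment: nil, command: DEFAULT_COMMAND } if args.empty?

  env_alias, *command = args
  environment = ENV_ALIASES.fetch(env_alias) do
    raise ArgumentError, "unknown environment: #{env_alias}"
  end
  { environment: environment, command: command }
end

p resolve_command(["sb", "rake", "do-something-magical"])
p resolve_command([])
```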
You may consider submitting your tasks to AWS SQS, then using an Elastic Beanstalk worker environment to process your background tasks.
Elastic Beanstalk supports Rails applications:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html
Depending on what kind of work these background jobs perform, you might want to think about extracting those functions into microservices if you are running your jobs on a different instance anyway.
Here is a good codeship blog post on how to approach this.
For simple mailer type stuff, this definitely feels a little heavy handed, but if the functionality is more complex, e.g. general notification of different clients, it might well be worth the overhead.

Sustainable Solution To Configuring Rails, Sidekiq, Redis All On AWS Elastic Beanstalk

I have an AWS Elastic Beanstalk Rails app that needs a Sidekiq worker process running alongside Puma/Passenger. Getting the Sidekiq process to run has taken hours of failed attempts. Also, getting the Rails app and Sidekiq to talk to my AWS ElastiCache cluster apparently needs some security rule changes.
Background
We started out with an extremely simple Rails app that was easily deployed to AWS Elastic Beanstalk. Since those early times we've evolved the app to now use the worker framework Sidekiq. Sidekiq in turn likes to use Redis to pull its jobs. Anyway, getting all these puzzle pieces assembled in the AWS world is a little challenging.
Solutions From The Web...with some sustainability problems
The AWS ecosystem goes through updates and upgrades, many aren't documented with clarity. For example environment settings change regularly; a script you have written may break in subsequent versions.
I used the following smattering of solutions to try to solve this:
http://blog.noizeramp.com/2013/04/21/using-sidekiq-with-elastic-beanstalk/ (please note that the comments on this blog post contain a number of helpful gists). Many thanks to the contributor and commenters on this post.
http://qiita.com/sawanoboly/items/d28a05d3445901cf1b25 (starting Sidekiq with upstart/initctl seems like the simplest and most sustainable approach). This page is in Japanese, but the Sidekiq startup code makes complete sense. Thanks!
Use AWS's ElastiCache for Redis. Make sure to configure your security groups accordingly: this AWS document was helpful...

Architectural overview for Resque on Heroku?

tldr; What pieces do you need to make a web app with a resque+resque_web dashboard?
I've seen the Heroku tutorial, and plenty of configuration examples, but it seems like there's a lot of complexity being glossed over:
Dynos don't have stable IP addresses, so how does the communication work between the web process, a resque process, and redis?
The Heroku docs imply that an additional communication service is necessary to coordinate between dynos; am I reading this right?
How many dynos and services are required to make a "basic" web app which:
hands off some long-running jobs to resque which
saves its results in the web app's database, and
is accessible by resque_web (mounted w/in the web app or standalone)?
Honestly, if someone could sketch a diagram, that'd be great.
Disclaimer: I haven't actually deployed a Heroku app with Resque, so this is information gleaned from https://devcenter.heroku.com/articles/queuing-ruby-resque and from looking into the example app.
The web dyno and the worker dyno do not communicate directly with each other. They communicate via Redis, and Redis is provisioned at a specific DNS address (which you can find on your app's resources page on Heroku after adding a Redis add-on). These settings can be transferred into a .env file (via the Heroku toolbelt config plugin). Foreman can use this .env file to set up the ENV variables, and you use these ENV variables in your application to configure Redis.
Not sure, but the example-app does not imply any such necessary service
2: 1 web-dyno, 1 worker-dyno
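To show how both dynos can find Redis through an ENV variable, here is a minimal sketch that parses a Redis URL with Ruby's standard URI library. The variable name REDIS_URL and the helper name redis_settings are assumptions (the actual name depends on the Redis add-on you pick), and the URL in the example is a placeholder.

```ruby
require "uri"

# Reads the Redis location from the environment and splits it into parts.
# REDIS_URL is an assumed variable name; the add-on may use a different one.
# Accepts any Hash-like object so it can be tried without real ENV values.
def redis_settings(env = ENV)
  uri = URI.parse(env.fetch("REDIS_URL"))
  { host: uri.host, port: uri.port || 6379, password: uri.password }
end

p redis_settings({ "REDIS_URL" => "redis://:secret@redis.example.com:6380" })
```

Because the web dyno and the worker dyno read the same variable, neither needs to know the other's address; they only meet in Redis.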

Resources