Architectural overview for Resque on Heroku?

tl;dr: What pieces do you need to make a web app with Resque and a resque_web dashboard?
I've seen the Heroku tutorial, and plenty of configuration examples, but it seems like there's a lot of complexity being glossed over:
Dynos don't have stable IP addresses, so how does the communication work between the web process, a resque process, and redis?
The Heroku docs imply that an additional communication service is necessary to coordinate between dynos; am I reading this right?
How many dynos and services are required to make a "basic" web app which:
hands off some long-running jobs to Resque, which
saves its results in the web app's database, and
is accessible by resque_web (mounted w/in the web app or standalone)?
Honestly, if someone could sketch a diagram, that'd be great.

Disclaimer: I haven't actually deployed a Heroku app with Resque, so this is information gleaned from https://devcenter.heroku.com/articles/queuing-ruby-resque and from digging into the example app.
The web dyno and the worker dyno will not communicate directly with each other; they communicate via Redis, which is provisioned at a specific DNS name (you can find it on your app's resources page on Heroku after adding a Redis add-on). These settings can be transferred into a .env file (via the Heroku Toolbelt's config plugin), which foreman uses to set up the ENV variables, and those ENV variables are what your application uses to configure its Redis connection.
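For example, a minimal initializer along these lines would wire that up (a sketch, assuming the add-on exposes the connection string as REDIS_URL; check `heroku config` for the exact variable your add-on sets, e.g. REDISTOGO_URL):

    # config/initializers/resque.rb
    # Point Resque at the provisioned Redis; both the web and worker dynos
    # load this initializer, so they end up talking to the same instance.
    require 'resque'

    Resque.redis = Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379'))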
Not sure, but the example app does not suggest that any such service is necessary.
Two: one web dyno and one worker dyno (plus the provisioned Redis add-on).
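In Procfile terms, that setup would look something like this (a sketch; the exact web command and queue list depend on your app):

    web: bundle exec rails server -p $PORT
    worker: bundle exec rake resque:work QUEUE=*

Both dynos run the same code base and read the same ENV; the only difference is the command each one runs.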

Related

Architecture guidance for AWS run Rails app (Web/Worker setup)

We are hosting the application on AWS and are using EB (Elastic Beanstalk) for deployments. The application is Rails and we are using Sidekiq for background processes. We have a decoupled RDS instance and ElastiCache (for Sidekiq communication), and generally we have a stateless architecture.
At the moment our web process and Sidekiq process are running on the same EC2 instances, which means we need to use larger instances to support both. We want to move to a separate web and worker architecture: the idea is to run the web processes on small EC2 instances and dedicate one large EC2 instance to Sidekiq only. The reason is that we have CPU usage issues where bigger worker jobs hog all the resources and take the instance down, which then dominoes into new instances; in general it is not an optimal use of our resources.
This seems like a no-brainer to us, but we are having trouble finding web resources where this has been implemented. It is also confusing to us how to set up the web EB app and the worker EB app separately. How would deployment work; would we deploy two separate EB applications at the same time? That does not seem safe.
We are looking for guidance on how best to achieve the above goal. Are there examples or setups you could share where we could see a real-world example of this?
Also, is there a better way to do this?
The web/worker setup you described for a Rails application is absolutely appropriate. Within the same application, you can create an environment for your web server and an environment for your worker. Your code base can be deployed to both environments either separately (if your changes only affect either the worker or the web server), or at the same time (if your changes affect both environments). You can set environment variables specific to your environment that you can use to determine whether code should run on the worker or the web server. Here is a brief outline of the steps you could use:
1. Create a new application.
2. Create a web server environment within the application (for example "production").
3. Create a worker environment within the application (for example "production-worker").
4. Set an environment variable, for example APP_ENVIRONMENT (this name could be anything you choose), with the value "production" on the production environment and the value "production-worker" on the production-worker environment.
5. Create configuration files in .ebextensions to start/stop Sidekiq (and any other programs needed for the worker) depending on whether the APP_ENVIRONMENT variable matches "worker".
6. Set up a cron.yaml file for background jobs (see the AWS docs).
7. For the background jobs, I create a separate cron controller for the endpoints listed in the cron.yaml file (see the sketch after this list).
8. Deploy your code base to both the web server and worker environments. Subsequent changes can be deployed as needed to the appropriate environment.
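As a hypothetical sketch of that cron controller (the controller name, route, and job class here are invented for illustration; the EB worker daemon POSTs to whatever paths you list in cron.yaml):

    # app/controllers/cron_controller.rb
    class CronController < ApplicationController
      # Requests come from the Elastic Beanstalk worker daemon, not a browser,
      # so session-based CSRF protection does not apply here.
      skip_before_action :verify_authenticity_token

      # POST /cron/nightly_report -- one action per endpoint listed in cron.yaml
      def nightly_report
        NightlyReportJob.perform_later # hypothetical job class
        head :ok
      end
    end

Since only the worker environment receives these POSTs from its local daemon, the APP_ENVIRONMENT variable from step 4 can also be used to disable the route entirely on the web servers.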
For Sidekiq, your web application and your worker both need access to the Redis server, so the application can create jobs and the worker can pick them up for processing.
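A minimal sketch of that shared configuration, assuming the ElastiCache endpoint is exposed to both environments as a REDIS_URL variable (the variable name is an assumption):

    # config/initializers/sidekiq.rb -- deployed unchanged to both environments
    redis_settings = { url: ENV.fetch('REDIS_URL') }

    # The web environment acts as a client: it only enqueues jobs...
    Sidekiq.configure_client do |config|
      config.redis = redis_settings
    end

    # ...while the worker environment also runs the server side that processes them.
    Sidekiq.configure_server do |config|
      config.redis = redis_settings
    end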

Rails 4.2 load balancing with nginx redis and sidekiq

Hi, I just launched a Rails 4 application which uses nginx as a load balancer, with Thin serving Rails on two ports. Additionally, I use Redis as a cache, and it is also used by Sidekiq.
I was wondering how I can scale up using another machine in order to run two more Rails applications there. My idea is just to run two more Rails applications on another machine, but the headache comes with Redis, since Sidekiq makes heavy use of it. My first idea was to have a read-only Redis slave on the second machine, but this might be error-prone, since I have a lot of writes into Redis in order to check a worker queue.
The following scenario confuses me. The web app makes a request that triggers Sidekiq, which performs a long-running action and continuously updates its status in Redis. The web client polls the app every second to get the status. Now it is possible that a request gets routed to the second machine, whose Redis slave is not yet updated. So I was wondering what the best setup would be: just using one Redis instance and accepting the latency, or running a Redis slave?
You have two machines:
MachineA running thin and sidekiq.
MachineB running thin and sidekiq.
Now you install Redis on MachineA and point Sidekiq at MachineA for Redis. Both Sidekiqs will talk to the Redis on MachineA. See Using Redis for more detail.
Side note: A redis slave is useful for read-only debugging but isn't useful for scaling Sidekiq.
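With a single shared Redis, the polling scenario from the question also has no stale-read problem: the worker writes its progress and every web machine reads the same key, so there is no replication lag. A hypothetical sketch (the worker class, key name, and step count are invented for illustration):

    class LongRunningWorker
      include Sidekiq::Worker

      def perform(task_id)
        100.times do |step|
          # ... one slice of the long-running work ...
          Sidekiq.redis { |conn| conn.set("task:#{task_id}:progress", step + 1) }
        end
      end
    end

    # In the controller action the client polls every second:
    #   progress = Sidekiq.redis { |conn| conn.get("task:#{task_id}:progress") }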

Sustainable Solution To Configuring Rails, Sidekiq, Redis All On AWS Elastic Beanstalk

An AWS Elastic Beanstalk Rails app that needs a Sidekiq worker process running alongside Puma/Passenger. Getting the Sidekiq process to run has resulted in hours of failed attempts. Also, getting the Rails app and Sidekiq to talk to my AWS ElastiCache cluster apparently needs some security rule changes.
Background
We started out with an extremely simple Rails app that was easily deployed to AWS Elastic Beanstalk. Since those early times we've evolved the app to now use the worker framework Sidekiq. Sidekiq in turn likes to use Redis to pull its jobs. Anyway, getting all these puzzle pieces assembled in the AWS world is a little challenging.
Solutions From The Web...with some sustainability problems
The AWS ecosystem goes through updates and upgrades, many of which aren't documented with clarity. For example, environment settings change regularly; a script you have written may break in subsequent versions.
I used the following smattering of solutions to try to solve this:
http://blog.noizeramp.com/2013/04/21/using-sidekiq-with-elastic-beanstalk/ (please note that the comments on this blog post contain a number of helpful gists). Many thanks to the contributor and commenters on this post.
http://qiita.com/sawanoboly/items/d28a05d3445901cf1b25 (starting Sidekiq with upstart/initctl seems like the simplest and most sustainable approach). This page is in Japanese, but the Sidekiq startup code makes complete sense. Thanks!
Use AWS's ElastiCache for Redis. Make sure to configure your security groups accordingly: this AWS document was helpful...

how to configure Elastic Beanstalk to deploy code to an instance but not add it to the load balancer

I am moving a Rails app to AWS and am using EB. I need to run a daemon on a separate instance (I do not want this instance to be serving HTTP requests).
The daemon is part of the app's codebase and will communicate with the same RDS instance as the web server instances. I would like to know, if possible, how I can configure EB to deploy the Rails app to an additional instance without adding that instance to the load balancer, and to (re)start the daemon on that instance after a new revision is deployed.
I realize I could achieve the same result by managing this additional instance myself, outside of EB, but I have a feeling there's a better way. I have done some research myself, without finding what I'm after.
I could also just run the daemon on one of the web server instances, and live with the fact that it's also serving HTTP requests. Since this is acceptable for right now, that's what I'm doing today ... but I want a dedicated instance for that daemon, and it would be great if I didn't have to drop the convenience of EB deployments just for that.
This is the first time I've used Elastic Beanstalk; I have some experience with AWS. I hope my question makes sense. If it doesn't, an answer that points out why it doesn't make sense will be accepted.
Thanks!
For the most part, though it is not straightforward, you can provide a .config file in .ebextensions to run your script files.
This example of speeding up a deploy shows running some scripts and moving data back and forth. Better yet, the author describes the sequence and deployment process.
I'm just embarking on this type of container customization. I have read of others dropping files in the /opt/elasticbeanstalk/hooks/appdeploy/pre and /opt/elasticbeanstalk/hooks/appdeploy/post directories, much of which can be derived by reading the post linked above.
Also note that you can include the content of a script in the YAML .config file, such as this one, which I found yesterday:
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_restart_delayed_job.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production script/delayed_job --pid-dir=$EB_CONFIG_APP_SUPPORT/pids restart" $EB_CONFIG_APP_USER
With Elastic Beanstalk, this is typically achieved by using a worker tier environment within the same EB application (same code base, same .eb* files, just different environments).
Here's an example of a rails application that is deployed to one web server, and two specialized workers:
[yacin#mac my_rails_app (master)]$ eb list -v
Region: us-west-1
Application: my_rails_app
    Environments: 3
        email-workers-production : ['i-xxxxxxx']
      * web-servers-production : ['i-xxxxxx']
        job1-workers-production : ['i-xxxxxxx', 'i-xxxxxx']
The workers don't have a public HTTP interface and pull jobs from a queue shared with the front-end.
The workers can be configured to access the same database, and load balancing and autoscaling are available for them as well.
It's a very flexible and scalable approach, but it will require some work to set up. Here are a couple of resources on the subject: Amazon Worker Tier Video Tutorial, Elastic Beanstalk.

Efficient rails deployment to a small EC2 instance

I've got a few rails apps running under different vhosts on a single small EC2 instance. My automated deployment process for each involves running some rake tasks (migration, asset compilation, etc.), staging everything into a versioned directory and symlinking web root to it. I'm serving the apps with Apache + Passenger. Through this process (and the rebooting of passenger), I have ruby processes eating up 100% of CPU. I understand why this is happening, but I need a way to throttle these processes down so that all of the other apps on the instance aren't as significantly impacted as they currently are.
I don't know if you've already come across this, but it's there to make EC2 deployment more convenient: https://github.com/wr0ngway/rubber
There is also a Railscast on it at: http://railscasts.com/episodes/347-rubber-and-amazon-ec2
Hopefully these two resources will help you somewhere along the way.
