Architecture guidance for an AWS-run Rails app (Web/Worker setup) - ruby-on-rails

We are hosting our application on AWS and use EB (Elastic Beanstalk) for deployments. The application is Rails and we use Sidekiq for background processing. We have a decoupled RDS instance and ElastiCache (for Sidekiq communication), and our architecture is generally stateless.
At the moment our web process and Sidekiq process run on the same EC2 instances, which means we need larger instances to support both. We want to move to a separate web and worker architecture: the idea is to run the web processes on small EC2 instances and dedicate one large EC2 instance to Sidekiq only. The reason is that we have CPU usage issues where bigger worker jobs hog all the resources and take the instance down, which then dominoes into new instances and is generally not an optimal use of our resources.
This seems like a no-brainer to us, but we are having trouble finding web resources where this has been implemented. It is also confusing to us how to set up a Web EB app and a Worker EB app separately. How would deployment work; would we deploy two separate EB applications at the same time? That does not seem safe.
We are looking for guidance on how best to achieve the above goal. Are there examples or setups you could share where we could see a real-world implementation of this?
Also, is there a better way to do this?

The web/worker setup you described for a Rails application is absolutely appropriate. Within the same application, you can create an environment for your web server and an environment for your worker. Your code base can be deployed to both environments either separately (if your changes only affect either the worker or the web server), or at the same time (if your changes affect both environments). You can set environment variables specific to your environment that you can use to determine whether code should run on the worker or the web server. Here is a brief outline of the steps you could use:
Create a new application.
Create a web server environment within the application (for example "production").
Create a worker environment within the application (for example "production-worker").
Set an environment variable, for example APP_ENVIRONMENT (this name could be anything you choose) on production with the value "production", and with the value "production-worker" on the production-worker environment.
Create configuration files in .ebextensions to start/stop sidekiq (and any other programs needed for the worker) depending on whether the APP_ENVIRONMENT value matches "worker" (a sketch follows this list).
Set up a cron.yaml file for background jobs (see AWS docs).
For the background jobs, I create a separate cron controller for the endpoints listed in the cron.yaml file.
Deploy your code base to both the web server and worker environments. Subsequent changes can be deployed as needed to the appropriate environment.
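Here is a minimal sketch of the .ebextensions hook mentioned in the list above: it starts Sidekiq after a deploy only when APP_ENVIRONMENT contains "worker". The file name, hook path and the daemonized sidekiq flags (which apply to older Sidekiq versions) are assumptions to adapt to your platform, not an official recipe.

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/50_start_sidekiq.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      # Load the EB environment variables (including APP_ENVIRONMENT)
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      if [[ "$APP_ENVIRONMENT" == *worker* ]]; then
        # Worker environment: (re)start Sidekiq after each deploy
        su -c "RAILS_ENV=production bundle exec sidekiq -d -L log/sidekiq.log -P tmp/pids/sidekiq.pid" $EB_CONFIG_APP_USER
      fi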
For Sidekiq, your web application and your worker both need to be given access to the Redis server, so your application can create jobs and then the worker can pick them up to process.
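A minimal sketch of a Sidekiq initializer that both environments could share is below; REDIS_URL is an assumed environment variable that you would point at the ElastiCache endpoint in each EB environment.

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") }
end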

Related

Automated deployment of a dockerized application on a single machine

I have a web application consisting of a few services - web, DB and a job queue/worker. I host everything on a single Google VM and my deployment process is very simple and naive:
I manually install all services like the database on the VM
a bash script scheduled by crontab polls a remote git repository for changes every N minutes
if there were changes, it would simply restart all services using supervisord (job queue, web, etc)
Now, I am starting a new web project where I enjoy using docker-compose for local development. However, I seem to be stuck in analysis paralysis deciding between the available options for production deployment - I have looked at Kubernetes, Swarm, docker-compose, container registries, etc.
I am looking for a recipe that will keep me productive with a single machine deployment. Ideally, I should be able to scale it to multiple machines when the time comes, but simplicity and staying frugal (one machine) is more important for now. I want to consider 2 options - when the VM already exists and when a new bare VM can be allocated specifically for this application.
I wonder if docker-compose is a reasonable choice for a simple web application. Do people use it in production and, if so, what does the entire process look like from bare VM to rolling out an updated application? Do people use Kubernetes or Swarm for a simple single-machine deployment or is it overkill?
I wonder if docker-compose is a reasonable choice for a simple web application.
It can be, sure, if the development time is best spent focused on the web application and less on the non-web stuff such as the job queue and database. The other asterisk is whether the development environment works OK with hot-reloads or port-forwarding and that kind of jazz. I say it's a reasonable choice because 99% of the work of creating an application suitable for use in a clustered environment is the work of containerizing the application. So if the app already works under docker-compose, then there is a high likelihood that you can take the docker image that docker-compose builds and roll it out to the cluster.
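To make that concrete, here is a minimal docker-compose.yml sketch of the web/worker/db split described in the question; the commands, images and port are placeholders for whatever your stack actually uses.

version: "3"
services:
  web:
    build: .
    command: bundle exec rails s -b 0.0.0.0   # placeholder web command
    ports:
      - "3000:3000"
    depends_on:
      - db
      - redis
  worker:
    build: .                                  # same image as the web service
    command: bundle exec sidekiq              # placeholder job-queue worker command
    depends_on:
      - db
      - redis
  db:
    image: postgres:13
    volumes:
      - db_data:/var/lib/postgresql/data
  redis:
    image: redis:6
volumes:
  db_data: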
Do people use it in production
I hope not; I am sure there are people who use docker-compose to run in production, just like there are people that use Windows batch files to deploy, but don't be that person.
Do people use Kubernetes or Swarm for a simple single-machine deployment or is it an overkill?
Similarly, don't be a person that deploys the entire application on a single virtual machine or be mentally prepared for one failure to wipe out everything that you value. That's part of what clustering technologies are designed to protect against: one mistake taking down the entirety of the application, web, queuing, and persistence all in one fell swoop.
Now whether deploying kubernetes for your situation is "overkill" or not depends on whether you get benefit from the other things that kubernetes brings aside from mere scaling. We get benefit from developer empowerment, log aggregation, CPU and resource limits, the ability to take down one Node without introducing any drama, secrets management, configuration management, using a small number of Nodes for a large number of hosted applications (unlike creating a single virtual machine per deployed application because the deployments have no discipline over the placement of config file or ports or whatever). I can keep going, because kubernetes is truly magical; but, as many people will point out, it is not zero human cost to successfully run a cluster.
Many companies I have worked with are shifting their entire production environment towards Kubernetes. That makes sense because all cloud providers are currently pushing Kubernetes, and we can be quite positive about Kubernetes being the future of cloud-based deployment. If your application is meant to run in any private or public cloud, I would personally choose Kubernetes as the operating platform for it. If you plan to add additional services, you will easily be able to connect them and scale your infrastructure with a growing number of requests to your application. However, if you already know that you do not expect to scale your application, a Kubernetes cluster may be overpowered for running it, although Google Cloud etc. make it fairly easy to set up such a cluster with a few clicks.
Regarding an automated development workflow for Kubernetes, you can take a look at my answer to this question: How to best utilize Kubernetes/minikube DNS for local development

Where should I run scheduled background jobs?

Here in my company we have our regular application in AWS EB with some background jobs. The problem is that these jobs are starting to get heavier, and we were thinking of separating them from the application. The question is: where should we do it?
We were thinking of doing it in AWS Lambda, but then we would have to port our Rails code to Python, Node or Java, which seems like a lot of work. What are the other options? Should we just create another EC2 environment for the jobs? Thanks in advance.
Edit: I'm using the shoryuken gem (http://github.com/phstc/shoryuken) integrated with SQS. But it currently has a memory leak and my application goes down sometimes; I don't know if the memory leak is the cause, though. We have already separated the application into an API part in EB and a front-end part in S3.
Normally, just another EC2 instance with a copy of your Rails app, where instead of rails s to start the web server, you run rake resque:work or whatever your job runner start command is. Both would share the same Redis instance and database so that your web server writes the jobs to the queue and the worker picks them up and runs them.
If you need more workers, just add more EC2 instances pointing to the same Redis instance. I would advise separating your jobs by queue name, so that one worker can just process fast stuff e.g. email sending, and others can do long running or slow jobs.
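As a sketch of that split (queue names are made up), every instance runs the same code base, just with a different start command:

# web instances: only the app server
bundle exec rails s

# a "fast" worker instance: quick jobs such as email sending
QUEUE=mailers bundle exec rake resque:work

# a "slow" worker instance: long-running or heavy queues
QUEUE=reports,imports bundle exec rake resque:work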
We had a similar requirement; for us it was the Sidekiq background jobs. They started to get very heavy, so we split them into a separate OpsWorks stack, with a simple recipe to build the machine dependencies (Ruby, MySQL, etc.), and since we don't have to worry about load balancers and requests timing out, it's OK for all machines to deploy at the same time.
Another thing you could use in OpsWorks is scheduled (time-based) instances, if the jobs are needed at certain times during the day: the machine gets provisioned a few minutes before the task is due, and after the task is done it can shut down automatically, which reduces your cost.
EB also has a different application type, the worker environment. You could check that out as well, but honestly I haven't looked into it, so I can't tell you its pros and cons.
We recently went down that route. I dockerized our Rails app and wrote a custom entrypoint for that docker container. In summary, the entrypoint parses the commands you pass after docker run IMAGE_NAME.
For example, if you run docker run IMAGE_NAME sb rake do-something-magical, the entrypoint understands that it should run the rake job with the sandbox environment config. If you only run docker run IMAGE_NAME, it will do rails s -b 0.0.0.0.
PS: I wrote a custom entrypoint because we have 3 different environments; the entrypoint downloads environment-specific config from S3.
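A rough sketch of such an entrypoint is below; the sb prefix handling, the S3 bucket and the config file name are illustrative, and the download step would depend on how your config is actually stored.

#!/usr/bin/env bash
# entrypoint.sh (sketch)
set -e

if [ "$1" = "sb" ]; then
  # "sb" prefix: fetch the sandbox config from S3, then run the remaining command
  shift
  aws s3 cp s3://my-config-bucket/sandbox/application.yml config/application.yml
  exec "$@"
elif [ "$#" -gt 0 ]; then
  # any other arguments: run them as given
  exec "$@"
else
  # no arguments: default to the Rails server
  exec bundle exec rails s -b 0.0.0.0
fi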
And I set up an ECS cluster and wrote a task-runner job on Lambda: the Lambda function schedules a task on the ECS cluster, and we trigger that Lambda from CloudWatch Events. You can send custom payloads to the Lambda when using CloudWatch Events.
It sounds complicated but implementation is really simple.
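A minimal sketch of that task-runner Lambda, written here with the Ruby runtime and the aws-sdk-ecs gem, could look like the following; the cluster, task definition and container names are made up, and the command comes from the CloudWatch Events payload mentioned above.

# lambda_function.rb (sketch)
require "aws-sdk-ecs"

def handler(event:, context:)
  ecs = Aws::ECS::Client.new
  ecs.run_task(
    cluster: "my-rails-cluster",          # assumed cluster name
    task_definition: "my-rails-task",     # assumed task definition
    overrides: {
      container_overrides: [
        {
          name: "app",                    # assumed container name
          command: event["command"]       # e.g. ["sb", "rake", "do-something-magical"]
        }
      ]
    }
  )
end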
You may consider submitting your tasks to AWS SQS, and then using an Elastic Beanstalk worker environment to process your background tasks.
Elastic Beanstalk supports Rails applications:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html
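In an Elastic Beanstalk worker environment, the local daemon pulls messages from SQS and POSTs them to an HTTP endpoint in your application, so on the Rails side the handler is just a controller action. A minimal sketch (route, controller and job class names are illustrative):

# config/routes.rb
post "/worker/perform", to: "worker#perform"

# app/controllers/worker_controller.rb
class WorkerController < ApplicationController
  skip_before_action :verify_authenticity_token

  def perform
    payload = JSON.parse(request.body.read)  # the SQS message body
    HeavyJob.perform_now(payload)            # placeholder job class
    head :ok                                 # 200 tells the daemon the message is processed
  end
end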
Depending on what kind of work these background jobs perform, you might want to think about extracting those functions into microservices if you are running your jobs on a different instance anyway.
Here is a good codeship blog post on how to approach this.
For simple mailer type stuff, this definitely feels a little heavy handed, but if the functionality is more complex, e.g. general notification of different clients, it might well be worth the overhead.

How could Travis CI prepare the test environment for Ruby on Rails and its backend

My infrastructure is based on AWS: 3 EC2 instances for the Rails app server, 1 RDS instance (MongoDB), and 1 EC2 instance as a Redis server.
Will Travis CI launch similar services (e.g. MongoDB, Redis) to run the RSpec tests?
If not, what is the logic behind Travis CI?
Would it be more practical to run the tests on my real infrastructure rather than in Travis CI?
Yes! Travis CI fully supports Ruby on Rails and can launch the same services you need for the tests, so I expect you'd be all set. When you go to create your .travis.yml file, you'll be able to set the configuration for your build environment, including setting up services such as MongoDB and/or Redis. Here's a sample of how that looks:
services:
  - mongodb
  - redis
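For a fuller picture, a .travis.yml for this kind of app might look roughly like the sketch below; the Ruby version and the test command are assumptions to match to your project.

language: ruby
rvm:
  - 2.6
services:
  - mongodb
  - redis
script:
  - bundle exec rspec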
From a practical standpoint, using a separate environment makes it easier to ensure test integrity, though you do have to do the additional software setup. The main benefit though is that you get a clean slate at each build for all your tests and it's well away from your production code in case there's a problem.

Architectural overview for Resque on Heroku?

tldr; What pieces do you need to make a web app with a resque+resque_web dashboard?
I've seen the Heroku tutorial, and plenty of configuration examples, but it seems like there's a lot of complexity being glossed over:
Dynos don't have stable IP addresses, so how does the communication work between the web process, a resque process, and redis?
The Heroku docs imply that an additional communication service is necessary to coordinate between dynos; am I reading this right?
How many dynos and services are required to make a "basic" web app which:
hands off some long-running jobs to resque, which
saves its results in the web app's database, and
is accessible by resque_web (mounted w/in the web app or standalone)?
Honestly, if someone could sketch a diagram, that'd be great.
Disclaimer: I have not actually deployed a Heroku app with resque, so this is information gleaned from https://devcenter.heroku.com/articles/queuing-ruby-resque and from checking the example app.
The web dyno and the worker dyno will not communicate directly with each other. They communicate via Redis, and Redis is provisioned on a specific DNS name (which you can find on your app's resources page on Heroku, after adding a Redis add-on). These settings can be transferred into an .env file (via the heroku toolbelt's config plugin). That .env file can be used by foreman to set up the ENV variables, and you use those ENV variables in your application to configure Redis.
Not sure, but the example app does not imply that any such additional service is necessary.
For the dyno count: 1 web dyno and 1 worker dyno.
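Pulling that together, a minimal sketch of the Heroku side could be a two-process Procfile plus a Resque initializer; REDIS_URL is whatever variable your Redis add-on sets (older tutorials use REDISTOGO_URL), the queue selection is illustrative, and the dashboard mount shown is the classic bundled Resque::Server rather than the separate resque_web gem.

# Procfile
web: bundle exec rails s -p $PORT
worker: QUEUE=* bundle exec rake resque:work

# config/initializers/resque.rb
require "resque"
Resque.redis = Redis.new(url: ENV["REDIS_URL"])

# config/routes.rb (dashboard mounted inside the web app)
require "resque/server"
Rails.application.routes.draw do
  mount Resque::Server.new, at: "/resque"
end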

how to configure Elastic Beanstalk to deploy code to an instance but not add it to the load balancer

I am moving a Rails app to AWS and am using EB. I need to run a daemon on a separate instance (I do not want this instance to be serving HTTP requests).
The daemon is part of the app's codebase and will communicate with the same RDS instance as the web server instances. I would like to know, if possible, how I can configure EB to deploy the Rails app to an additional instance, but avoid adding that instance to the load balancer, and (re)start the daemon on that instance after a new revision is deployed.
I realize I could achieve the same result by managing this additional instance myself, outside of EB, but I have a feeling there's a better way. I have done some research myself, without finding what I'm after.
I could also just run the daemon on one of the web server instances, and live with the fact that it's also serving HTTP requests. Since this is acceptable for right now, that's what I'm doing today ... but I want a dedicated instance for that daemon, and it would be great if I didn't have to drop the convenience of EB deployments just for that.
This is the first time I've used Elastic Beanstalk; I have some experience with AWS. I hope my question makes sense. If it doesn't, an answer that points out why it doesn't make sense will be accepted.
Thanks!
For the most part, though it is not straightforward, you can provide a .config file in .ebextensions to run your script files.
This example of speeding up a deploy shows running some scripts and moving data back and forth. Better yet, the author describes the sequence and deployment process.
I'm just embarking on this type of container customization. I have read of others dropping files in the /opt/elasticbeanstalk/hooks/appdeploy/pre and /opt/elasticbeanstalk/hooks/appdeploy/post directories, much of which can be derived by reading the post linked above.
Also note that you can include the content of a script in the YAML .config file, such as this one which I found yesterday:
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_restart_delayed_job.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production script/delayed_job --pid-dir=$EB_CONFIG_APP_SUPPORT/pids restart" $EB_CONFIG_APP_USER
With Elastic Beanstalk, this is typically achieved by using a worker tier environment within the same EB application (same code base, same .eb* files, just different environments).
Here's an example of a rails application that is deployed to one web server, and two specialized workers:
[yacin#mac my_rails_app (master)]$ eb list -v
Region: us-west-1
Application: my_rails_app
Environments: 3
email-workers-production : ['i-xxxxxxx']
* web-servers-production : ['i-xxxxxx']
job1-workers-production : ['i-xxxxxxx', 'i-xxxxxx']
The workers don't have a public HTTP interface and pull jobs from a queue shared with the front-end.
The workers can be configured to have access to the same database, and to use load balancing and autoscaling.
It's a very flexible and scalable approach, but it will require some work to set up. Here are a couple of resources on the subject: the Amazon Worker Tier video tutorial and the Elastic Beanstalk documentation.
