How to use Elasticsearch on Heroku - ruby-on-rails

I've just finished watching both Railscasts episodes on Elasticsearch. I've also gone ahead and implemented it in my Rails application (3.1) and everything is working great. Now I want to deploy my app to Heroku, but I'm unsure how to get Elasticsearch working on Heroku (specifically on the Cedar stack).
Any help would be greatly appreciated!

You can very easily [and freely ;-)] roll your own ElasticSearch server on Amazon EC2, and just connect to it with your app. This is what we're doing, and it's working nicely...
http://www.elasticsearch.org/tutorials/elasticsearch-on-ec2/

Heroku now supports ElasticSearch through the Bonsai add-on: https://devcenter.heroku.com/articles/bonsai
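A minimal sketch of how a Rails app might pick up the add-on's endpoint. It assumes the add-on exposes the cluster URL in a `BONSAI_URL` config var (check `heroku config` for the name your app actually gets). The helper name `elasticsearch_url`, the `ELASTICSEARCH_URL` fallback, and the initializer path are illustrative, not part of any library.

```ruby
# config/initializers/elasticsearch.rb (hypothetical location)

# Returns the Elasticsearch endpoint to use: the add-on's config var if
# present, then a generic override, then a local node for development.
# BONSAI_URL is an assumption about the add-on's variable name.
def elasticsearch_url(env = ENV)
  url = env['BONSAI_URL'] || env['ELASTICSEARCH_URL'] || 'http://localhost:9200'
  url.chomp('/') # drop a trailing slash so paths can be appended safely
end

# With the Tire gem used in the Railscasts episodes, the result could then
# be handed to its configuration, for example:
#   Tire.configure { url elasticsearch_url }
```

Keeping the lookup in one place means the same code runs locally (no config var set) and on Heroku (config var set by the add-on).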

I created a Play framework module that will run Elastic Search on Heroku using S3 to persist the state. No need for an EC2 instance - you only pay for the cost of S3 data which is much less - mainly IO transactions. It uses the ElasticSearch S3 gateway (persistence mechanism).
You can use it either by extending the Play application to create specific endpoints for your search functions, or if you like, you can access ElasticSearch REST API directly (by default it exposes it on the route http://yourapp.com/es). There is a very basic authentication system to secure it.
The only downside to this setup is that the dyno can take some time to spin up. So, it won't work well if you let the dyno spin down from inactivity - and you may get nailed for S3 data transfer charges if that happens a lot and your index is huge. The upside is you control your own data and it is cheap cheap cheap. Another word of warning - you will need to be careful to keep inside the memory limits of a Heroku dyno. That said, we had full text search autocomplete functions working on several indexes with no problems.
You might be able to build a similar module in Rails using JRuby to talk to the ElasticSearch Java API. My main contribution here was figuring out how to run it inside another web framework - since Play also uses Netty it was pretty easy to embed it. Performance tests compared to an EC2 cluster + Tire (Rails gem for ElasticSearch) showed that the Heroku/Play approach performed faster searches.
The project is here: https://github.com/carchrae/elastic-play - I'd be happy to help people set it up - it should be pretty painless.

That was exactly my first thought when I watched the RailsCast, but unfortunately Elasticsearch runs as a Java daemon, which isn't possible on Heroku.

Anyway you can't run it on a normal Heroku dyno since it would have to save data to disk which is not persisted on Heroku. You need to wait for an Add-on or host it somewhere else.

Related

Heroku to AWS Migration Advice

From what I've gathered, there are many solutions to my problem, but I'd appreciate some suggestions on where to start. Here's the stack we're running on Heroku currently:
Rails on puma
mongoDB
elasticsearch
redis
mini_magick
What goes into the decision of using Elastic Beanstalk vs OpsWorks vs CloudFormation vs just setting up everything manually myself? Also, I'd really prefer, for financial reasons, not to use some third party service like Docker if possible. The plethora of options leaves me a little confused as to where to begin or how to even choose. Background: right now I really like Heroku because I don't have to think too much about sysadmin (on my team I'm the only developer), but we were recently given a lot of annual AWS credits, so it seems to make financial sense for us to shift over to AWS.
I want to expand on Mark’s great answer.
Available alternatives
Since you’re the sole developer, CloudFormation and OpsWorks aren’t good options for you.
With OpsWorks you’ll need to write, or at least be aware of, the Chef automation code that configures your instances. CloudFormation, on the other hand, isn’t enough by itself. It will help you create AWS cloud resources, but you will still need to figure out how to orchestrate your application deployments, just for starters.
Neither of these options can give you everything you need to run and deploy your code like Heroku does right out of the box. You’ll need to implement parts of it by yourself.
Since rolling your own automation on top of EC2 takes even more effort than the options above, I think you have two alternatives within AWS that will fit your needs:
1) Elastic Beanstalk
It’s the closest you can get to Heroku within AWS. You might have to spend some time getting to know the platform at first since it’s not as intuitive as Heroku, but eventually Elastic Beanstalk will provide you with all the tools you need to continue running your applications without spending time on sysadmin tasks.
2) ECS + Empire
Although you mentioned that using Docker is out of question for you, I still would like to highlight the option of using ECS, Amazon’s Docker orchestration service, as an alternative to Heroku.
By itself, ECS doesn’t provide enough automation to do everything you would expect from a PaaS. The service was intended to be used as a building block which you should extend to fit your needs.
Luckily, the guys from Remind have already done this for you. They have released an open source project called Empire, which according to its own description is “a control layer on top of ECS that provides a Heroku-like workflow”.
Empire is compatible with Heroku’s API, and its command line implements the most important features of Heroku.
Empire is an open source project, so if you choose to use it, you should be prepared to dig into its code from time to time. The documentation isn’t perfect, and although there is some traction around the project, the community isn’t very big.
Overall, it’s a good alternative to Heroku if you’re willing to run your applications using Docker -- and why shouldn’t you?
Addons
The main benefit I see to switching from Redis Labs to Amazon’s Redis service (ElastiCache), other than the fact that you have free AWS credit, is that it’s going to be easier (and cheaper) to secure access to your Redis instances when you also run your applications on AWS.
Overall, it’s relatively easy to replicate the addons you’re using with Heroku when you migrate to AWS. For the third-party addons like Elasticsearch you just continue pointing your application to the relevant endpoint. It’s a bit more complicated to replicate Heroku’s native addons like deploy hooks since you can’t continue to use them when you migrate to AWS. In these cases it’s usually possible to find alternative ways of replicating their functionality within AWS.
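Since "pointing your application to the relevant endpoint" usually means changing a handful of environment variables, a boot-time check can catch an endpoint that was missed during the move. This is only a sketch: the variable names in `REQUIRED_ENDPOINTS` are illustrative guesses for the stack in the question, and `missing_endpoints` and `check_endpoints!` are hypothetical helpers.

```ruby
# Names of the config vars the app reads. These are illustrative; use
# whatever names your app already relies on.
REQUIRED_ENDPOINTS = %w[MONGODB_URI REDIS_URL ELASTICSEARCH_URL].freeze

# Returns the names of required endpoints that are unset or blank.
def missing_endpoints(env = ENV, required = REQUIRED_ENDPOINTS)
  required.select { |name| env[name].to_s.strip.empty? }
end

# Fails fast at boot instead of surfacing a missing endpoint at request time.
def check_endpoints!(env = ENV)
  missing = missing_endpoints(env)
  raise "Missing config vars: #{missing.join(', ')}" unless missing.empty?
  true
end
```

Calling `check_endpoints!` from an initializer would stop a misconfigured deploy before it serves traffic.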
If you want to learn how to migrate the most common addons, I’ve written an article that details how to do that. You can find it here: how to replicate Heroku’s addons on AWS.
Hope this helps.
For your Rails app, Elastic Beanstalk is going to be very similar to Heroku. I would suggest using Elastic Beanstalk if you are already familiar with a PaaS like Heroku. It's probably going to be a bit more difficult to configure at first (there are just a lot more options you can configure), but then it will be a very similar deployment process to what you are used to.
Of course Heroku and most (probably all) of those other services you are using run on top of AWS already, so you would really just be switching from one set of services built on AWS to Amazon's own version of those services. You could possibly continue using some of the same services you are using on Heroku. For example I believe MongoLab is the recommended service for MongoDB on Heroku, and it is my preferred MongoDB-as-a-Service on AWS as well. If you want to use those AWS credits for MongoDB you will have to set up the EC2 servers and install and manage MongoDB yourself.
For Redis you could use Amazon's ElastiCache service or RedisLabs. I've found the features and price to be better with RedisLabs than ElastiCache, but you can use your AWS credits with ElastiCache.
For Elasticsearch you would probably want to use Amazon's new managed Elasticsearch service.

Rails Deployment 101

I am about to deploy my 1st Rails app. I am stuck as I don't know what exactly I need to do. I know about Heroku, AWS, Capistrano and the like, but don't know exactly what they do, and what their benefits are.
I kinda know some things, but all are blurry and ambiguous since I have no formal training and learn as I go. So basically I need someone to explain the general anatomy of Rails deployment.
Something like: 'To make any app working on the web you need the following components... Ways to make this component work with Rails are following. Alternatives are these. These are pros and cons.' Not into too much detail, but general and comprehensive 101 guide.
The reason you may be confused is that there are a number of ways to do it. :D
Heroku provides one of the easiest solutions for basic deployments. You don't need capistrano, just git. (they provide a toolset to assist). Just git push heroku master. Also nice is that a simple deployment on heroku is free; you can pay for more power when you actually need it.
But if you need a little extra functionality that heroku can't provide, you have to host elsewhere, such as a private virtual host.
Capistrano is a set of recipes that help build a deployment environment, sort of like rake tasks. It does so in a very organized manner and allows for easy rollbacks. You define the hosts and their roles, and then the recipes use ssh and scp to set up the environment. (The server also has to be ready to accept Rails applications, through something like Passenger.)
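As a rough illustration of "hosts, roles, and recipes", here is what a small `config/deploy.rb` might look like. It assumes Capistrano 2 syntax and Passenger on the server; every hostname, path, and repository URL below is a placeholder. It is a configuration fragment that Capistrano loads, not a standalone script.

```ruby
# config/deploy.rb -- a sketch assuming Capistrano 2 and Passenger.
# All names, hosts, and paths are placeholders.

set :application, "myapp"
set :scm,         :git
set :repository,  "git@example.com:me/myapp.git"
set :deploy_to,   "/var/www/myapp"
set :user,        "deploy"      # ssh user on the servers
set :use_sudo,    false

# Hosts and the roles they play. One machine can hold several roles.
role :web, "server.example.com"                   # HTTP front end
role :app, "server.example.com"                   # runs the Rails code
role :db,  "server.example.com", :primary => true # where migrations run

# Passenger restarts the app when tmp/restart.txt is touched.
namespace :deploy do
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{File.join(current_path, 'tmp', 'restart.txt')}"
  end
end
```

With a file like this in place, `cap deploy` pushes a new release and `cap deploy:rollback` switches back to the previous one.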
The Rails and Ruby World(s) are pretty noisy, so I understand your confusion.
At the end of the day, you need your rails app on a server.
Now, even the term server can be a little confusing, because it is generally associated with
A remote machine
A program handling HTTP requests.
(for example webrick or thin which you start when you are developing on your computer and type rails s)
In your case you actually want a remote computer (hooked up to the net) which is running a program called a server to process HTTP requests and forward these to your app, which in turn produces a response...
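The hand-off between the server program and your app can be sketched in a few lines. Rails (since version 3) is built on Rack, where an application is any object that responds to `call`, receives the request as a Hash, and returns a status, headers, and body. The constant `HELLO_APP` and the hand-built request Hash below are illustrative stand-ins, not what a real server passes in full.

```ruby
# A Rack-style application: takes the request environment as a Hash and
# returns [status, headers, body].
HELLO_APP = lambda do |env|
  body = "Hello from #{env['PATH_INFO']}\n"
  [200, { 'Content-Type' => 'text/plain' }, [body]]
end

# What the HTTP server does for each incoming request, reduced to one call.
# A real server builds this Hash from the bytes it reads off the socket.
request = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/posts' }
status, headers, body = HELLO_APP.call(request)

puts status
puts headers['Content-Type']
puts body.join
```

WEBrick, Thin, Unicorn, and Passenger all play the "server" role in this picture; your Rails app plays the role of `HELLO_APP`.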
Heroku will help you out with that. (However Heroku adds several layers of abstraction to the mix. So it is not like you have one computer sitting somewhere in the heroku office, serving your application.) Heroku is dead simple to setup with git and rails.
And in the end all that is needed to get your app to the "remote server" is a simple git push.
Read the beginner articles on https://devcenter.heroku.com/
I would also suggest for now: Forget about Capistrano.
Oh and you can think of AWS (or probably S3) as some sort of external hard drive, which your app can use to store larger pieces of data (like images, videos etc.)
I have a deployment guide with a good shell script which supports Nginx + RVM + Unicorn: deploy_rails

Deploying my first application across AWS

I'm a web developer just now getting interested in sysadmin stuff. I've set up a server before on Linode.com (Ubuntu 10.04 LTS, nginx, Ruby on Rails, PostgreSQL), but there were some issues. Everything was on one machine, so whenever something went wrong with Linode or I got a lot of traffic, my site would go down.
Now I'm interested in setting up a personal blog, and deploying it across Amazon AWS. This is a good opportunity for me to learn how to use multiple servers with load balancing, auto-scaling, failover, etc. The only problem is I'm not quite sure where to start.
I've read a litany of documentation from Amazon and blog posts elsewhere, but as a sysadmin newbie I have a few questions:
I get that EC2 instances are too volatile to store data on. So where should I store it? Amazon Elastic Block Store? Will the entire filesystem go there, as well as the database?
Do I need serious knowledge of load balancing and scaling? Or will the Amazon Elastic Load Balancer make things simple for me? How does their load balancer interact with nginx?
How much of this do you recommend doing through the AWS interface as opposed to through the command line?
Any non-obvious snags that might catch me?
Are there any tutorials for deploying a blog or simple Rails app on EC2? I don't need a production-quality setup here; my main goal is to learn.
Thanks for any answers you can provide!
I've set up my fair share of AWS deployments; here are the basics:
Data store
If you have frequently accessed data, as you likely know, it is best to use a database. This is one of the hairier parts of AWS hosting. Your options are, roughly in increasing order of complexity/cost:
SimpleDB - Amazon's own database offering. They give you an HTTP api, which you use to read and write your data. There are some rails libraries for it, but on the whole, it isn't a graceful drop-in for rails.
Amazon RDS - Amazon will preconfigure a MySQL-like database server for you. This requires you to boot up a DB server instance, so the pricing isn't favorable for tiny sites. On the plus side, it allows you to scale your DB server more easily.
Roll your own - Plan around Amazon EC2 instances vanishing at any point; therefore, the local storage you get with EC2 instances can best be considered a big temp directory. Elastic Block Store is Amazon's solution to this; it is effectively a disk image your instances mount. EBS images live independently of EC2 instances, so if your server goes down, you can mount the EBS image on a new EC2 instance. You can essentially roll your own database cluster by booting a bunch of instances and configuring them to replicate off each other. This works, but is not graceful, and should really only be attempted if you cannot solve your problem with less exotic methods.
Amazon pretty much enumerates these options, plus a few more which are not applicable to you at http://aws.amazon.com/running_databases/
Infrequently changed data should be stored in S3; there are plenty of Ruby gems for accessing this easily. If your website is entirely static on the server side, you can even run your entire site off S3.
Load Balancing
Amazon "Elastic Load Balancing" is quite effective at the typical web load balancing requirements. It is usually a no-brainer choice, unless you have exotic requirements. It will not scale your cluster for you, however. For auto-booting and shutting down of instances, you should look to Amazon's own auto-scaling solution.
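To make the division of labour concrete: a load balancer only decides which running instance gets each request, and it skips instances that are marked unhealthy. Starting and stopping instances is a separate job. The sketch below is a toy model of the first part using round-robin rotation. It is not a description of how Elastic Load Balancing is implemented, and the class and method names are made up for illustration.

```ruby
# Toy round-robin balancer: hands out healthy backends in rotation.
class RoundRobinBalancer
  def initialize(backends)
    @backends = backends.dup # every backend we know about
    @healthy  = backends.dup # the subset currently receiving traffic
    @index    = 0
  end

  # Called when a health check fails: stop sending traffic to this backend.
  def mark_down(backend)
    @healthy.delete(backend)
  end

  # Called when a health check passes again.
  def mark_up(backend)
    @healthy << backend if @backends.include?(backend) && !@healthy.include?(backend)
  end

  # Picks the backend for the next request.
  def next_backend
    raise 'no healthy backends' if @healthy.empty?
    backend = @healthy[@index % @healthy.size]
    @index += 1
    backend
  end
end

lb = RoundRobinBalancer.new(%w[web-1 web-2 web-3])
4.times { puts lb.next_backend }
```

Note that nothing here adds or removes entries from the backend list based on load; that is the part auto-scaling handles.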
Caveats
Be sure to note which "Availability Zone" (aka datacenter) you're in. In some cases, you cannot share AWS resources across availability zones.
Tutorials
There are plenty of tutorials, but in my brief search, none that I found to be really great or up to date. However, check out https://github.com/wr0ngway/rubber , which is a ruby tool for deploying apps to EC2. It will get you most of the way there.

Is there a Ruby on Rails host that supports FTP?

I'd like to create a development integration server that's on the open internet, running a Ruby on Rails app. But I need FTP or SFTP access to this server, so I can upload files to the codebase via FTP.
Is there a good Rails host that allows FTP connections? The cloud providers like Heroku and Dotcloud just support pushing from source code or build files, it appears.
Thanks!
If you have experience setting up a Linux box I'd suggest using a VPS service, like Linode (www.linode.com) for instance, that way you can pretty much have any service running that you want. And if you don't have experience, that's a great way to learn ;)
If it's just for development, Dreamhost's shared hosting works well enough and is affordable. $8.95/month for unlimited domains and storage and bandwidth and it supports Rails via passenger/modruby. You get ssh and sftp access and you can schedule cron jobs too. Especially great for development since you can easily create and destroy apps and subdomains. Main downside is that you'll face a tough time if you need any custom gems or if you need a different version of ruby from what your host's passenger is using. Customer service is good though, and they can install custom gems or move you around between shared hosts if need be. I probably wouldn't dare to deploy a live Rails site on their (or anybody else's) shared plan though.
All that said, lately I've moved to Heroku for dev/staging instances. Not worrying about custom gems is a big plus, and since we deploy live on Heroku it's nice to have almost the exact same environment in staging as well as live. Heroku is free for single-dyno apps as long as you don't spend too much time in the heroku console. Pushing code from different branches to different instances becomes a piece of cake when you use heroku-san.

amazon simpledb with aws-sdb-proxy suitable for high traffic production app?

I am using Amazon SimpleDB with the aws_sdb gem and aws-sdb-proxy, as outlined in Amazon's documentation for Ruby on Rails: a local AWS proxy that runs on WEBrick and provides a bridge to ActiveResource.
see http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1242
I am wondering whether aws-sdb-proxy (WEBrick!) is suitable for high traffic load, since WEBrick is supposed to be a development server. Does anyone have comments or experience?
I've tried Rails with simple_record and I can tell you it's much slower compared to MySQL. You will also have to do quite some work to change your code to adapt to this.
Therefore if you have any high traffic tables that update frequently, I'd say just pass on it. Use MySQL or a different solution. SimpleDB is good only to store metadata for whatever doesn't update very often, and if you get a lot of traffic to that you definitely should get some memcached servers in front of it.
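The "memcached in front of it" idea is a read-through cache: answer from the cache when possible and only go to the slow store on a miss. Here is a minimal sketch in which a plain Hash stands in for memcached and the block stands in for the SimpleDB lookup. `ReadThroughCache` and its parameters are hypothetical names, not part of any gem.

```ruby
# Read-through cache: look in the cache first, fall back to the slow store
# on a miss, and remember the result for a limited time.
class ReadThroughCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(ttl_seconds, clock = -> { Time.now.to_f })
    @ttl     = ttl_seconds
    @clock   = clock
    @entries = {} # key => [value, expires_at]; stands in for memcached
  end

  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at

    value = yield(key) # the slow lookup, e.g. a SimpleDB query
    @entries[key] = [value, @clock.call + @ttl]
    value
  end
end

cache   = ReadThroughCache.new(60)
lookups = 0
2.times do
  cache.fetch('item-1') { |key| lookups += 1; "metadata for #{key}" }
end
puts "slow lookups performed: #{lookups}"
```

This only pays off for data that is read far more often than it changes, which matches the advice above to keep frequently updated tables out of SimpleDB.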
Check this out for some numbers (disregard the Dynamo part of it, I'm now on SDB and moving either back to RDS or Dynamo tonight) Moving MySQL table to AWS DynamoDB - how to set it up?
