I would like to do some cloud processing on a very small cluster of machines (<5).
This processing should be based on 'jobs', where jobs are parameterized scripts that run in a certain docker environment.
As an example for what a job could be:
Run in docker image "my_machine_learning_docker"
Download some machine learning dataset from an internal server
Train some neural network on the dataset
Produce a result and upload it to a server again.
My use cases are not limited to machine learning however.
A job could also be:
Run in docker image "my_image_processing_docker"
Download a certain number of images from some folder on a machine.
Run some image optimization algorithm on each of the images.
Upload the processed images to another server.
Now what I am looking for is some framework/tool that keeps track of the compute servers, receives my jobs, and dispatches them to an available server. Advanced prioritization, load management, or the like is not really required.
It should be possible to query the status of jobs and of the servers via an API (I want to do this from NodeJS).
Potentially, I could imagine this framework/tool dynamically spinning up these compute servers in AWS, Azure, or something similar. That would not be a hard requirement though.
I would also like to host this solution myself. So I am not looking for a commercial solution for this.
Now I have done some research, and what I am trying to do has similarities with many, many existing projects, but I have not "quite" found what I am looking for.
Similar things I have found were (selection):
CI/CD solutions such as Jenkins/GitLab CI. Very similar, but they seem to be tailored very much towards the CI/CD case, and I am not sure whether it is such a good idea to abuse a CI/CD solution for what I am trying to do.
Kubernetes: Appears to be able to do this somehow, but is said to be very complex. It also looks like overkill for what I am trying to do.
Nomad: Appears to be the best fit so far, but it has some proprietary vibes that I am not very much a fan of. Also it still feels a bit complex...
In general, there are many many different projects and frameworks, and it is difficult to find out what the simplest solution is for what I am trying to do.
Can anyone suggest anything or point me in a direction?
Thank you
I would use Jenkins for this use case even if it appears to you as a “simple” one. You can start with the simplest pipeline, which can also deal with the increasing complexity of your jobs. Jenkins has an API, lots of plugins, and it can be run as a container for spinning up in a cloud environment.
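If you go that route, Jenkins' JSON REST API covers the status-querying requirement. A rough sketch (in Python for brevity; the same HTTP calls work from NodeJS with any HTTP client, and the host, job name, and parameter are placeholders):

    import requests

    JENKINS = "http://jenkins.example.com"  # placeholder host
    AUTH = ("user", "api-token")

    # Trigger a parameterized job (job and parameter names are hypothetical)
    requests.post(f"{JENKINS}/job/train-model/buildWithParameters",
                  params={"DATASET_URL": "http://internal/dataset.tar"},
                  auth=AUTH)

    # Query the status of the most recent build via the JSON API
    r = requests.get(f"{JENKINS}/job/train-model/lastBuild/api/json", auth=AUTH)
    print(r.json()["result"])  # "SUCCESS", "FAILURE", or None while still running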
It's possible you're looking for something like AWS Batch: https://aws.amazon.com/batch/ or Google Dataflow: https://cloud.google.com/dataflow. Out of the box they do scaling, distribution, monitoring, etc.
But if you want to roll your own ....
Option 1: Queues
For your job distribution you are really just looking for a simple message queue that all of the workers listen on. In most messaging platforms, a queue supports deliver-once semantics. For example:
ActiveMQ: https://activemq.apache.org/how-does-a-queue-compare-to-a-topic
NATS: https://docs.nats.io/using-nats/developer/receiving/queues
Using queues for load distribution is a common pattern.
A queue-based solution works with both manual and automated load balancing: the more workers you spin up, the more instances you have consuming off the queue. The same messaging solution can be used to gather the results if you need to, using message reply semantics or a dedicated reply channel. You could use the result channel to post progress reports back, so your main application would know the status of each worker. Alternatively, they could drop status in a database. It probably depends on your preference for collecting results and how large the result sets would be. If large enough, you might even just drop results in an S3 bucket or some kind of filesystem.
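To make the queue pattern concrete, here is a minimal worker sketch assuming NATS and its nats-py client; the subject name, queue group, and message format are all invented for illustration:

    import asyncio
    import json
    import nats  # pip install nats-py

    async def main():
        nc = await nats.connect("nats://localhost:4222")

        async def handle_job(msg):
            job = json.loads(msg.data)  # e.g. {"image": "...", "script": "..."}
            # ... pull the docker image and run the parameterized script here ...
            await msg.respond(b'{"status": "success"}')  # reply doubles as a status report

        # Every worker subscribes with the same queue group name, so each
        # job is delivered to exactly one worker in the group.
        await nc.subscribe("jobs", queue="workers", cb=handle_job)
        await asyncio.Event().wait()  # keep the worker alive

    asyncio.run(main())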
You could use something quite simple to manage the workers. Jenkins was already suggested and is definitely a solution I have seen used for running multiple instances across many servers, as you just need to install the Jenkins agent on each of the workers. This can work quite easily if you own or manage the physical servers it's running on. You could use TeamCity as well.
If you want something cloud hosted, it may depend on the technology you use. Kubernetes is probably overkill here, but it certainly could be used to spin up N nodes and increase/decrease the number of workers. To auto-scale you could publish a single metric - the queue depth - and trigger an increase in the number of workers based on how deep the queue is, using a threshold you work out from the cost of spinning up new nodes vs. the rate at which jobs are processed.
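As a toy illustration of that scaling heuristic (the numbers are made up; tune them against your own cost vs. throughput trade-off):

    import math

    def desired_workers(queue_depth: int, jobs_per_worker: int = 4,
                        min_workers: int = 1, max_workers: int = 5) -> int:
        # Scale the worker count with queue depth, clamped to the cluster size.
        wanted = math.ceil(queue_depth / jobs_per_worker)
        return max(min_workers, min(max_workers, wanted))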
You could also look at some of the lightweight managed container solutions like fly.io or Heroku, which are both much easier to set up than K8s and would let you scale up easily.
Option 2: Web workers
Can you design your solution so that it can be run as a cloud function/web worker?
If so, you could set them up so that scaling is fully automated. You would hit the cloud function endpoint to request each job. The hosting engine would take care of the distribution and scaling of the workers. The results would be passed back in the body of the HTTP response ... a JSON blob.
Your workload may be too large for these solutions, but if it's actually fairly lightweight and quick, it could be a simple option.
I don't think these solutions would let you query the status of tasks easily.
If this option seems appealing there are quite a few choices:
https://workers.cloudflare.com/
https://cloud.google.com/functions
https://aws.amazon.com/lambda/
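For a rough idea of what such a worker could look like on Google Cloud Functions with the Python functions-framework (the function name and payload shape are invented; the real handler would do your download/process/upload steps):

    # pip install functions-framework
    import functions_framework

    @functions_framework.http
    def process_job(request):
        params = request.get_json(silent=True) or {}
        # ... download the inputs, run the optimization, upload the results ...
        return {"status": "done", "job": params.get("job_id")}  # JSON blob back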
Option 3: Google Cloud Tasks
This is a bit of a hybrid option. Essentially, GCP has a queue distribution workflow where the endpoint is a cloud function or some other supported worker, including Cloud Run, which uses Docker images. I've not actually used it myself, but maybe it fits the bill.
https://cloud.google.com/tasks
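With the same caveat that I haven't used it: enqueueing a job with the Python client would look roughly like this, where the project, location, queue name, and worker URL are all placeholders.

    from google.cloud import tasks_v2  # pip install google-cloud-tasks

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "jobs")  # placeholders

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://my-worker.example.com/run",  # e.g. a Cloud Run endpoint
            "headers": {"Content-Type": "application/json"},
            "body": b'{"job_id": "123"}',
        }
    }
    client.create_task(request={"parent": parent, "task": task})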
When I look at a problem like this, I think through the entirety of the data paths: the mapping between source and target images, and any metadata or status information that needs to be collected. Additionally, failure conditions need to be handled, especially if a production service is going to be built.
I prefer running Python: PySpark with pandas UDFs to perform the orchestration and image processing.
S3FS lets me access S3. If using Azure or Google, Databricks' DBFS lets me seamlessly read and write to cloud storage without two extra file-copy steps.
PySpark's binaryFile data source lets me list all of the input files to be processed. Spark lets me run this in a batch or an incremental/streaming configuration. This design optimizes for end-to-end data flow and data reliability.
For a cluster manager I use Databricks, which lets me easily provision an auto-scaling cluster. The Databricks cluster manager lets users deploy docker containers or use cluster libraries or notebook scoped libraries.
The example below assumes the image is > 32MB and processes it out of band. If the image is in the KB range, dropping the content is not necessary and in-line processing can be faster (and simpler).
Pseudo code:
import pandas as pd
from typing import Iterator
from pyspark.sql.functions import col, pandas_udf

df = (spark.read
    .format("binaryFile")
    .option("pathGlobFilter", "*.png")
    .load("/path/to/data")
    .drop("content")  # keep only the file metadata; the image is processed out of band
)

def do_image_xform(path: str) -> str:
    # Do image transformation: read from the dbfs path, write to a dbfs path
    ...
    # return xform status
    return "success"

@pandas_udf("string")
def do_image_xform_udf(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for series in iterator:
        yield series.map(do_image_xform)

df_status = df.withColumn("status", do_image_xform_udf(col("path")))
df_status.write.saveAsTable("status_table")  # triggers execution, saves status
I am not a developer, but I am reading about CI/CD at the moment. Now I am wondering about good practices for automated code deployment. So far I have read a lot about deploying code to a pre-existing environment.
My question now is whether it is also good practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created - for example, deploying an environment to test a newly created build, then deleting the environment again after testing.
I know that there are various plugins to interact with AWS, Azure etc. that could be used to develop a job for deployment of a virtual machine.
There are also plugins to trigger Puppet to deploy infra (as code) and there are plugins to invoke an infrastructure orchestration.
So everything is available to deploy the infrastructure and middleware before deploying code (with some extra effort, of course).
Is this something that is used in real life? How is it done?
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines.
My question now is whether it is also good practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created
Yes it is good practice to deploy an environment from scratch. Like you say, Jenkins and Jenkins pipelines can certainly help with kicking off and orchestrating that process depending on your specific requirements. Deploying a full environment from scratch is one of the hardest things to automate, and if that is automated, it implies that a lot of other things are also automated, such as infrastructure, application deployments, application configuration, and so on.
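As a bare-bones illustration of the spin-up/test/tear-down pattern such a pipeline would orchestrate (a sketch using boto3 with placeholder AMI and instance values, not a production recipe):

    import boto3

    ec2 = boto3.client("ec2")

    # Spin up a throwaway test environment (AMI ID and instance type are placeholders)
    resp = ec2.run_instances(ImageId="ami-12345678", InstanceType="t3.micro",
                             MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # ... deploy the build, run the test suite against the instance ...

    # Tear the environment down again so nothing sits idle
    ec2.terminate_instances(InstanceIds=[instance_id])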
Is this something that is used in real life?
Yes, definitely. A lot of shops do this. The simpler your environments, the easier it is, and therefore, a startup with one backend app would have relatively little trouble achieving this valhalla state. But even the creation of the most complex environments--with hundreds of interdependent applications--can be fully automated; it just takes more time and effort.
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines.
Yes, definitely. The "spin up and destroy" strategy benefits all hosting models (since, after full automation, no one ever has to wait for someone to manually provision an environment), but those using public clouds see even larger benefits in terms of cost (vs always leaving AWS environments running, for example).
I appreciate your thoughts.
Not a problem. I will advise that this question doesn't fit Stack Overflow's question-and-answer sweet spot super well, since it is quite general. In the future, I would recommend chatting with your developers, finding folks who are excited about this sort of thing, and formulating more specific questions when you all get stuck in the weeds on something. Welcome to Stack Overflow!
All of these are used in various combinations; the objective is to deliver continuous value to the end user. My two cents:
Build & Release
It depends on what you are using. I personally recommend using what is available with the tool. For example, VSTS (Visual Studio Team Services) offers a complete CI/CD pipeline. But if you have a unique need which can only be served by Jenkins, you can use that too; VSTS integrates with Jenkins out of the box.
IAC (Infrastructure as code)
In addition to Puppet etc., you can take advantage of Azure ARM (Azure Resource Manager) templates to build and destroy an environment. Again, see what is available out of the box with the toolset you have.
Pay-per-use
What I have personally used is Azure Dev/Test Labs, with code deployed to it via a CI/CD pipeline. You can then set up a shutdown policy on the VMs so they auto-start and auto-shutdown at the times you provide. This is a great feature for saving cost on the resources being used and for replicating environments.
For example, a UAT environment might not be needed until QA has signed off. But using IaC you can quickly spin up the environment automatically and then have a one-click deployment set up to deploy code to UAT.
From what I've gathered, there are many solutions to my problem, but I'd appreciate some suggestions on where to start. Here's the stack we're running on Heroku currently:
Rails on Puma
MongoDB
Elasticsearch
Redis
mini_magick
What goes into the decision of using Elastic Beanstalk vs OpsWorks vs CloudFormation vs just setting up everything manually myself? Also, for financial reasons, I'd really prefer not to use some third-party service like Docker if possible. The plethora of options leaves me a little confused as to where to begin or how to even choose. Background: right now I really like Heroku because I don't have to think too much about sysadmin (on my team I'm the only developer), but we were recently given a lot of annual AWS credits, so it seems to make financial sense for us to shift over to AWS.
I want to expand on Mark’s great answer.
Available alternatives
Since you’re the sole developer, Cloud Formation and OpsWorks aren’t good options for you.
With OpsWorks you’ll need to write, or at least be aware of, the Chef automation code that configures your instances. On the other hand, Cloud Formation by itself isn’t enough. It will help you with AWS cloud resources creations, but you will still need to figure out how to orchestrate your applications deployment, just for starters.
Neither of these options can give you everything you need to run and deploy your code like Heroku does right out of the box. You’ll need to implement parts of it by yourself.
Since rolling your own automation on top of EC2 takes even more effort than the options above, I think you have two alternatives within AWS that will fit your needs:
1) Elastic Beanstalk
It’s the closest you can get to Heroku within AWS. You might have to spend some time getting to know the platform at first since it’s not as intuitive as Heroku, but eventually Elastic Beanstalk will provide you with all the tools you need to continue running your applications without spending time on sysadmin tasks.
2) ECS + Empire
Although you mentioned that using Docker is out of question for you, I still would like to highlight the option of using ECS, Amazon’s Docker orchestration service, as an alternative to Heroku.
By itself, ECS doesn’t provide enough automation to do everything you would expect from a PaaS. The service was intended to be used as a building block which you should extend to fit your needs.
Luckily, the guys from Remind have already done this for you. They have released an open source project called Empire, which according to its own description is “a control layer on top of ECS that provides a Heroku-like workflow”.
Empire is compatible with Heroku’s API, and its command line implements the most important features of Heroku.
Empire is an open source project, so if you choose to use it, you should be prepared to dig into its code from time to time. The documentation isn’t perfect, and although there is some traction around the project, the community isn’t very big.
Overall, it’s a good alternative to Heroku if you’re willing to run your applications using Docker -- and why shouldn’t you?
Addons
The main benefit I see to switching from Redis Labs to Amazon’s Redis service (ElastiCache), other than the fact that you have free AWS credit, is that it’s going to be easier (and cheaper) to secure access to your Redis instances when you also run your applications on AWS.
Overall, it’s relatively easy to replicate the addons you’re using with Heroku when you migrate to AWS. For the third-party addons like Elasticsearch you just continue pointing your application to the relevant endpoint. It’s a bit more complicated to replicate Heroku’s native addons like deploy hooks since you can’t continue to use them when you migrate to AWS. In these cases it’s usually possible to find alternative ways of replicating their functionality within AWS.
If you want to learn about how to migrate the most common addons, I’ve written an article that details how to do that, you can find it here: how to replicate Heroku’s addons on AWS.
Hope this helps.
For your Rails app, Elastic Beanstalk is going to be very similar to Heroku. I would suggest using Elastic Beanstalk if you are already familiar with a PaaS like Heroku. It's probably going to be a bit more difficult to configure at first (there are just a lot more options you can configure), but then it will be a very similar deployment process to what you are used to.
Of course Heroku and most (probably all) of those other services you are using run on top of AWS already, so you would really just be switching from one set of services built on AWS to Amazon's own version of those services. You could possibly continue using some of the same services you are using on Heroku. For example I believe MongoLab is the recommended service for MongoDB on Heroku, and it is my preferred MongoDB-as-a-Service on AWS as well. If you want to use those AWS credits for MongoDB you will have to setup the EC2 servers and install and manage MongoDB yourself.
For Redis you could use Amazon's ElastiCache service or RedisLabs. I've found the features and price to be better with RedisLabs than ElastiCache, but you can use your AWS credits with ElastiCache.
For Elasticsearch you would probably want to use Amazon's new managed Elasticsearch service.
I'm a beginner RoR programmer who's planning to deploy my app using Heroku. Word from my advisor friends says that Heroku is really easy and good to use. The only problem is that I still have no idea what Heroku does...
I've looked at their website and in a nutshell, what Heroku does is help with scaling but... why does that even matter? How does Heroku help with:
Speed - My research implied that deploying AWS on the US East Coast would be the fastest if I am targeting a US/Asia-based audience.
Security - How secure are they?
Scaling - How does it actually work?
Cost efficiency - There's something like a dyno that makes it easy to scale.
How do they fare against their competitors? For example, Engine Yard and Blue Box?
Please use layman English terms to explain... I'm a beginner programmer.
First things first, AWS and Heroku are different things. AWS offer Infrastructure as a Service (IaaS) whereas Heroku offer a Platform as a Service (PaaS).
What's the difference? Very approximately, IaaS gives you components you need in order to build things on top of it; PaaS gives you an environment where you just push code and some basic configuration and get a running application. IaaS can give you more power and flexibility, at the cost of having to build and maintain more yourself.
To get your code running on AWS and looking a bit like a Heroku deployment, you'll want some EC2 instances - you'll want a load balancer / caching layer installed on them (e.g. Varnish), you'll want instances running something like Passenger and nginx to serve your code, you'll want to deploy and configure a clustered database instance of something like PostgreSQL. You'll want a deployment system with something like Capistrano, and something doing log aggregation.
That's not an insignificant amount of work to set up and maintain. With Heroku, the effort required to get to that sort of stage is maybe a few lines of application code and a git push.
So you're this far, and you want to scale up. Great. You're using Puppet for your EC2 deployment, right? So now you configure your Capistrano files to spin up/down instances as needed; you re-jig your Puppet config so Varnish is aware of web-worker instances and will automatically pool between them. Or you heroku scale web:+5.
Hopefully that gives you an idea of the comparison between the two. Now to address your specific points:
Speed
Currently Heroku only runs on AWS instances in us-east and eu-west. For you, this sounds like what you want anyway. For others, it's potentially more of a consideration.
Security
I've seen a lot of internally-maintained production servers that are way behind on security updates, or just generally poorly put together. With Heroku, you have someone else managing that sort of thing, which is either a blessing or a curse depending on how you look at it!
When you deploy, you're effectively handing your code straight over to Heroku. This may be an issue for you. Their article on Dyno Isolation details their isolation technologies (it seems as though multiple dynos are run on individual EC2 instances). Several colleagues have expressed issues with these technologies and the strength of their isolation; I am alas not in a position of enough knowledge / experience to really comment, but my current Heroku deployments consider that "good enough". It may be an issue for you, I don't know.
Scaling
I touched on how one might implement this in my IaaS vs PaaS comparison above. Approximately, your application has a Procfile, which has lines of the form dyno_type: command_to_run, so for example (cribbed from Heroku Architecture - The Process Model):
web: bundle exec rails server
worker: bundle exec rake jobs:work
This, with a:
heroku scale web:2 worker:10
will result in you having 2 web dynos and 10 worker dynos running. Nice, simple, easy. Note that web is a special dyno type, which has access to the outside world, and is behind their nice web traffic multiplexer (probably some sort of Varnish / nginx combination) that will route traffic accordingly. Your workers probably interact with a message queue for similar routing, from which they'll get the location via a URL in the environment.
Cost Efficiency
Lots of people have lots of different opinions about this. Currently it's $0.05/hr for a dyno hour, compared to $0.025/hr for an AWS micro instance or $0.09/hr for an AWS small instance.
Heroku's dyno documentation says you have about 512MB of RAM, so it's probably not too unreasonable to consider a dyno as a bit like an EC2 micro instance. Is it worth double the price? How much do you value your time? The amount of time and effort required to build on top of an IaaS offering to get it to this standard is definitely not cheap. I can't really answer this question for you, but don't underestimate the 'hidden costs' of setup and maintenance.
(A bit of an aside, but if I connect to a dyno from here (heroku run bash), a cursory look shows 4 cores in /proc/cpuinfo and 36GB of RAM - this leads me to believe that I'm on a "High-Memory Double Extra Large Instance". The Heroku dyno documentation says each dyno receives 512MB of RAM, so I'm potentially sharing with up to 71 other dynos. (I don't have enough data about the homogeny of Heroku's AWS instances, so your mileage may vary))
How do they fare against their competitors?
This, I'm afraid I can't really help you with. The only competitor I've ever really looked at was Google App Engine - at the time I was looking to deploy Java applications, and the amount of restrictions on usable frameworks and technologies was incredibly off-putting. This is more than "just a Java thing" - the amount of general restrictions and necessary considerations (the FAQ hints at several) seemed less than convenient. In contrast, deploying to Heroku has been a dream.
Conclusion
Please comment if there are gaps / other areas you'd like addressed. I feel I should offer my personal position. I love Heroku for "quick deployments". When I'm starting an application, and I want some cheap hosting (the Heroku free tier is awesome - essentially if you only need one web dyno and 5MB of PostgreSQL, it's free to host an application), Heroku is my go-to position. For "Serious Production Deployment" with several paying customers, with a service-level-agreement, with dedicated time to spend on ops, et cetera, I can't quite bring myself to offload that much control to Heroku, and then either AWS or our own servers have been the hosting platform of choice.
Ultimately, it's about what works best for you. You say you're "a beginner programmer" - it might just be that using Heroku will let you focus on writing Ruby, and not have to spend time getting all the other infrastructure around your code built up. I'd definitely give it a try.
Note, AWS does actually have a PaaS offering, Elastic Beanstalk, that supports Ruby, Node.js, PHP, Python, .NET and Java. I think generally most people, when they see "AWS", jump to things like EC2 and S3 and EBS, which are definitely IaaS offerings.
AWS / Heroku are both free for small hobby projects (to start with).
If you want to start an app right away, without much customization of the architecture, then choose Heroku.
If you want to focus on the architecture and to be able to use different web servers, then choose AWS. AWS is more time-consuming based on what service/product you choose, but can be worth it. AWS also comes with many plugin services and products.
Heroku
Platform as a Service (PaaS)
Good documentation
Has built-in tools and architecture.
Limited control over architecture while designing the app.
Deployment is taken care of (automatic via GitHub or manual via git commands or CLI).
Not time consuming.
AWS
Infrastructure as a Service (IaaS)
Versatile - has many products such as EC2, Lambda, EMR, etc.
Can use a Dedicated instance for more control over the architecture, such as choosing the OS, software version, etc. There is more than one backend layer.
Elastic Beanstalk is a feature similar to Heroku's PaaS.
Can use the automated deployment, or roll your own.
As Kristian Glass said, there is no comparison between IaaS (AWS) and PaaS (Heroku, Engine Yard).
PaaS basically helps developers speed up the development of apps, thereby saving money and, most importantly, letting them innovate on their applications and business instead of setting up configurations and managing things like servers and databases. Other features you buy into with PaaS are around the application deployment process: agility, high availability, monitoring, scale/descale, a limited need for expertise, easy deployment, and reduced cost and development time.
But there is still a dark side to PaaS which creates barriers to PaaS adoption:
Less control over servers and databases
Costs will be very high if not governed properly
Premature and dubious in the current day and age
Apart from the above, you need a broad enough skill set to manage your IaaS:
Hardware acquisition
Operating System
Server Software
Server Side Scripting Environment
Web server
Database Management System (MySQL, Redis, etc.)
Configure production server
Tool for testing and deployment
Monitoring App
High Availability
Load Balancing / HTTP Routing
Service Backup Policies
Team Collaboration
Rebuild Production
If you have a small-scale business, PaaS will be the best option for you:
Pay as you go
Low start-up cost
Leave the plumbing to the experts
PaaS handles auto scaling/descaling, load balancing, and disaster recovery
PaaS manages all security requirements
PaaS manages reliability and high availability
PaaS manages many third-party add-ons for you
It will be a totally individual choice based on your requirements. You can find more details in my presentation, Hosting Rails Apps.
There are a lot of different ways to look at this decision from development, IT, and business objectives, so don't feel bad if it seems overwhelming. But also - don't overthink scalability.
Think about your requirements.
I've engineered websites which have serviced over 8M uniques a day and delivered terabytes of video a week, built on infrastructures starting at $250k in capital hardware, run by a huge multi-million-dollar IT labor staff.
But I've also had smaller websites which were designed to generate $10-$20k per year, didn't have very high traffic, db or processing requirements, and I ran those off a $10/mo generic hosting account without compromise.
In the future, deployment will look more like Heroku than AWS, just because of progress. There is zero value in the IT knob-turning of scaling internet infrastructures which isn't increasingly automatable, and none of it has anything to do with the value of the product or service you are offering.
Also, keep in mind with a commercial website - scalability is what we often call a 'good problem to have' - although scalability issues with sites like Facebook and Twitter were very high-profile, they had zero negative effect on their success - the news might have even contributed to more signups (all press is good press).
If you have a service which is generating 100k+ uniques a day and having scaling issues, I'd be glad to take it off your hands no matter what language, db, platform, or infrastructure you are running on!
Scalability is a fixable implementation problem - not having customers is an existential issue.
Actually, you can use both - you can develop an app with Amazon EC2 servers, then push it (with git) to Heroku for free for a while (use the Heroku free tier to serve it to the public) and test it like so. It is very cost effective in comparison to renting a server, but you will have to talk to a more restrictive Heroku API, which is something you should think about. Source: this method was adopted in one of my online classes, "Startup Engineering" from Coursera/Stanford by Balaji S. Srinivasan and Vijay S. Pande.
Well, people usually ask this question: Heroku or AWS when starting to deploy something.
Having experimented with both Heroku and AWS, here is my quick review and comparison:
Heroku
One command to deploy whatever your project type: Ruby on Rails, Node.js, etc.
Many one-click plugin and third-party integrations: it is super easy to start with something.
No auto-scaling; that means you need to scale up/down manually
Cost is high, especially when the system needs more resources
Free instance available
The free instance goes to sleep if it is inactive.
Data center: US & EU only
CAN dive into/access the machine level by using heroku run bash (thanks, MJafar Mash, for the advice), but it is kind of limited! You don't have full access!
Don't need to know too much about DevOps
AWS - EC2
This is just like a machine with a pre-configured OS (or not), so you need to install the software and libraries needed to make your website/service go online.
Plugins and libraries need to be integrated manually, or via automation scripts (public scripts, or ones written by you)
Auto scaling and load balancing are supported services; just learn how to configure and integrate them into your system
Cost is quite cheap, depending on which services you use and for how many hours
There are several free hours for T2.micro instances, but usually you will pay a few dollars every month (if still using T2.micro)
Your free instance won't go to sleep, available 24/7 (because you may pay for it :) )
Data center: around the world. Pick the region which is the best fit for you.
You can dive into the machine level and enjoy it
Some knowledge about DevOps is needed, but that's okay; Stack Overflow is helpful there!
AWS Elastic Beanstalk, an alternative to Heroku, but cheaper
Elastic Beanstalk was announced as a public beta in 2011; it makes it easier to work with deployment. For details, please go here
Beanstalk is free; the cost you pay is for the services you use and the number of hours of usage.
I have used Elastic Beanstalk for a long time, and I think it can be a replacement for Heroku, and cheaper!
Summary
Heroku: Easy at the beginning, free instance available, but expensive later
AWS: Not easy, free hours available, somewhat cheaper; Beanstalk is worth considering
So in my current system, I use Heroku for staging and Beanstalk for production!
The existing answers are broadly accurate:
Heroku is very easy to use and deploy to, can be easily configured for auto-deployment from a repository (e.g. GitHub), has lots of third-party add-ons, and charges more per instance.
AWS has a wider range of competitively priced first party services including DNS, load balancing, cheap file storage and has enterprise features like being able to define security policies.
For the tl;dr skip to the end of this post.
AWS Elastic Beanstalk is an attempt to provide a Heroku-like autoscaling and easy deployment platform. As it uses EC2 instances (which it creates automatically), EB servers can do everything any other EC2 instance can do, and it's cheap to run.
Deployment with EB is very slow; deploying an update can take 10-15 minutes per server and deploying to a larger cluster can take the best part of an hour - compared to just seconds to deploy an update on Heroku. Deployments on EB are not handled particularly seamlessly either, which may impose constraints on application design.
You can use all the services Elastic Beanstalk uses behind the scenes to build your own bespoke system (with CodeDeploy, Elastic Load Balancer, Auto Scaling Groups - and CodeCommit, CodeBuild and CodePipeline if you want to go all in), but you can definitely spend a good couple of weeks setting it up the first time, as it's fairly convoluted and slightly trickier than just configuring things in EC2.
AWS Lightsail offers a competitively priced hosting option, but doesn't help with deployment or scaling - it's really just a wrapper for their EC2 offering (but costs much more). It lets you automatically run a bash script on initial setup, which is a nice touch, but it's pricey compared to the cost of just setting up an EC2 instance (which you can also do programmatically).
Some thoughts on comparing (to try and answer the questions, albeit in a roundabout way):
Don't underestimate how much work system administration is, including keeping everything you have installed up to date with security patches (and occasional OS updates).
Don't underestimate how much of a benefit automatic deployment, auto-scaling, and SSL provisioning and configuration are.
Automatic deployment when you update your Git repository is effortless with Heroku. It is near instant, graceful so there are no outages for end users and can be set to update only if the tests / Continuous Integration passes so you don't break your site if you deploy broken code.
You can also use Elastic Beanstalk for automatic deployment, but be prepared to spend a week setting it up the first time - you may have to change how you deploy and build assets (like CSS and JS) to work with how Elastic Beanstalk handles deployments, or build logic into your app to handle deployments.
Be aware in estimating costs that for seamless deployment with no outage on EB you need to run multiple instances - EB rolls out updates to each server individually so that your service is not degraded - whereas Heroku spins up a new dyno for you and just deprecates the old service until all the requests to it are done being handled (then it deletes it).
Interestingly, the hosting cost of running multiple servers with EB can be cheaper than a single Heroku instance, especially once you include the cost of add-ons.
Some other issues not specifically asked about, but raised by other answers:
Using a different provider for production and development is a bad idea.
I am cringing that people are suggesting this. While ideally code should run just fine on any reasonable platform so it's as portable as possible, versions of software on each host will vary greatly, and just because code runs in staging doesn't mean it will run in production (e.g. major Node.js/Ruby/Python/PHP/Perl versions can differ in ways that make code incompatible, often in silent ways that might not be caught even if you have decent test coverage).
What is a good idea is to leverage something like Heroku for prototyping, smaller projects and microsites - so you can build and deploy things quickly without investing a lot of time in configuration and maintenance.
Be sure to factor in the cost of running both production and pre-production instances when making that decision, not forgetting the cost of replicating the entire environment (including third party services such as data stores / add ons, installing and configuring SSL, etc).
If using AWS, be wary of AWS pre-configured instances from vendors like Bitnami - they are a security nightmare. They can expose lots of notoriously vulnerable applications by default without mentioning it in the description.
Consider instead just using a well supported mainstream distribution, such as Ubuntu or Debian (or CentOS if you need RPM support).
Note: Amazon offers its own distribution called Amazon Linux, which uses RPM, but it's EC2-specific and less well supported by third-party/open source software.
You could also set up an EC2 instance on AWS (or Lightsail) and configure it with something like Flynn or Dokku, on which you could then deploy multiple sites easily; this can be worth it if you maintain a lot of services or want to be able to spin up new things easily. However, getting it set up is not as automagic as just using Heroku, and you can end up spending a lot of time configuring and maintaining it (to the point that I've found deploying using Amazon clustering and Docker Swarm to be easier than setting them up; YMMV).
I have used AWS EC2 instances (alone and in clusters), Elastic Beanstalk, Lightsail, and Heroku at the same time, depending on the needs of the project I'm working on.
I hate spending time configuring services, but my Heroku bill would be thousands per year if I used it for everything, and AWS works out at a fraction of the cost.
tl;dr
If money was never an issue I'd use Heroku for almost everything as it's a huge timesaver - but I'd still want to use AWS for more complicated projects where I need the flexibility and more advanced services that Heroku doesn't offer.
The ideal scenario for me would be if Elastic Beanstalk just worked more like Heroku - i.e. with easier configuration and a quicker, better deployment mechanism.
An example of a service that is almost this is now.sh, which actually uses AWS behind the scenes, but makes deployments and clustering as easy as it is on Heroku (with automatic SSL, DNS, graceful deployments, super-easy cluster setup and management).
I've used it quite a lot for both Node.js app and Docker image deployments; the major caveat is that the instances are shared (something reflected in their lower cost) and there is currently no option to buy dedicated instances. However, their open source deployment tool 'now' can also be used to deploy to dedicated instances on AWS as well as Google Cloud and Azure.
Migrating people from Heroku to AWS has been a significant percentage of our business. There are advantages to both, but it gets messy on Heroku after a while... once you need a certain level of complexity, it is no longer easy to maintain within Heroku's limitations.
That said, there are increasingly options to have the ease of Heroku and the flexibility of AWS by being on AWS with great frameworks/tools.
Funny thing is, Heroku actually uses AWS on the backend. It takes away all the overhead and does architecture management on EC2 for you. (I got that knowledge from a senior engineer at a big company during an interview.)
Sometimes, I wonder why people compare AWS to Heroku. AWS is IaaS (infrastructure as a service), which speaks to how robust and capable the system is. Heroku, on the other hand, is just a PaaS, basically one fraction of what AWS offers. So why struggle with setting up AWS when you can ship your first product using Heroku?
Heroku is free to start, simple, and makes it easy to deploy almost all types of stacks to the web. Heroku is specifically built to bypass all the hassles of shipping your application to a live server in almost no time.
Nevertheless, you may want to deploy your application using the tutorials from both parties and compare:
AWS DOCS and Heroku Docs
Well, Heroku uses AWS in the background; it all depends on the type of solution you need. If you are a core Linux and DevOps person who is not worried about creating a VM from scratch - selecting an AMI, choosing placement options, etc. - you can go with AWS. If you want to do things at the surface level without those intricacies, you can go with Heroku.
Even though both AWS and Heroku are cloud platforms, they are different as AWS is IaaS and Heroku is PaaS
Well! I observe that Heroku is popular with budding and newly minted developers, while AWS has a more advanced developer persona. DigitalOcean is also a major player in this space. Cloudways has made it much easier to create a LAMP stack in one click on DigitalOcean and AWS. Having all service and package updates available in a click is far better than doing everything manually.
You can check it out here: How to Host PHP on Amazon AWS EC2
Amazon Web Services (AWS) offers lots of services, from IaaS to PaaS, with high assured durability and availability of data and infrastructure (S3, for example, is designed for eleven nines of data durability). AWS offers infrastructure automation along with several tools for developers to pipeline their application deployment process.
On the other hand, Heroku is just a PaaS which offers services to manage your platform on their cloud. It is nowhere near AWS, whether in infrastructure or security.
Heroku is like a subset of AWS. It is just platform as a service, while AWS can be implemented as anything and at any level.
The implementation depends on the business requirements. If it fits either, use it accordingly.
I'm about to start development on a project with very uncertain load/traffic specifics. When it is released there will certainly be very low load that can easily be handled by a single desktop quad-core machine.
The problem is that there will be (after some invite-only period) a strong publicity for the product so I expect considerable traffic/load peaks.
I haven't read enough about cloud providers, and I'm mostly leaning toward Amazon or Azure for the credibility these two companies have, without checking out others as I should (e.g. Rackspace, which I suppose is also a cloud service provider).
What I want
I would like to create a normal ASP.NET MVC web application that can run on a single low-cost in-house server. It would run the web server along with the database (relational and maybe also document) and full-text search (not SQL FTS but rather a high-speed separate product like Lucene or Sphinx). But after the initial invite-only period, I'd like to move this app to the cloud to make it more friendly to traffic/load demands.
As far as I know, Amazon offers a sort of virtual machine hosting which you set up as a normal server, but with flexible resources in terms of load capacity. I'm not sure whether that can be accomplished on Azure as well.
Questions
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I think about when designing/developing the solution to make the transition as painless as possible?
Based on your experience, is it better to move to the cloud (financially), or is it better to buy your own servers and load balance the application yourself, maybe saving money in the long run?
"Cloud" is such a vague term. Still, I think this is a very good question.
Basically, IaaS cloud hosting does not magically make your application scale. It's really a virtual private server with very short contract / cancellation periods.
For scalability, the magic lies not so much in the hosting, but in the horizontal scalability of the application code itself. This is related to all the usual distributed computing challenges. For example, adding more application servers is not always easy: you must be sure that you don't persist any user state in the server application (but rather in a database; statics can be evil), caching can be problematic because local caches can make the situation worse if you're using a round-robin strategy, etc.
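As a minimal sketch of the "no user state in the app server" point (assuming a shared Redis instance; the host name and key format are invented):

    import json
    import redis  # any app server can reach this shared instance

    r = redis.Redis(host="redis.internal", port=6379)

    def save_session(session_id: str, data: dict) -> None:
        # Session lives in Redis, not in process memory, so any server can serve the user
        r.setex(f"session:{session_id}", 3600, json.dumps(data))  # expire after 1h

    def load_session(session_id: str) -> dict:
        raw = r.get(f"session:{session_id}")
        return json.loads(raw) if raw else {}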
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I should think about when designing/developing the solution to make the transition as painless as possible.
You don't really have to do anything different just to host on EC2 or Azure -- basically. But of course, it's not that easy when things grow.
For instance, EC2 instance storage is rather limited. Additional storage on EBS, however, does not provide comparable performance characteristics and can be a bit more laggy than a disk. The point here is that EBS does magically scale, and it's probably more PaaS than IaaS; but it's not a simple hard disk and, consequently, it does not behave like one. I don't know about Azure block storage. In general, expect additional abstraction layers to introduce problems of their own, no matter what they do.
Based on your experience is it better to move to the cloud (financial wise) or is it better to buy your own servers and load balance application yourself and maybe save money on the long run?
Typical cloud providers are more expensive than the usual round-the-corner VPS providers, but in my experience they are also much more reliable and professional. EC2 has a free tier (but it's quite small); Azure gives you a small instance for free for 3 months.
Doing the calculation right is rather tricky; for example, if you have to shut down your service for whatever reason, it's nice to be able to cancel now rather than pay another year - you might want to put this risk into your calculation. On the other hand, both EC2 and Azure will be considerably cheaper if you sign up for 6 or 12 months, rather than paying by the hour.
You might want to check out the free Azure plan, because it's nice to start fiddling around without any cost. A big advantage of cloud providers is that you can scale vertically very easily: buying a 16 core, 64GB RAM server machine is really expensive, but if there's so much traffic on your site, upgrading your plan won't be such a big issue.
As no one has mentioned it yet...
AppHarbor has been amazing. You can push stuff in a matter of minutes. Deployment is a breeze, and setting up your project for it is easy as well. It doesn't even require any major changes in your solution to fit in.
For the full-text search, you might consider something like Websolr.
A lot of this depends on what your app is doing (e.g., are there separable components that might benefit from running on different instances, vs. a simple CRUD application with a front end). One thing to consider is that in a cloud application you normally don't have a traditional relational database. As such, you have to choose either cloud or traditional hosting, or plan on coding your access layer twice. Azure does have relational databases (SQL Azure), although they're not identical to SQL Server 2008R2. You're going to have to research the pros/cons of a cloud setup for your specific situation.
As far as financial concerns, it's usually a lot cheaper to just get an account with a hosting company instead of a cloud service, since you pay by the month, instead of the hour (last time I checked an account with Azure running 24/7 for a month would cost about $40-$50, while you can get hosting for $15 a month). The savings with the cloud come in when you have to run several servers, and the cost of maintaining them surpasses the cost of the instance on the cloud platform.
So, sorry, there's no silver bullet answer for you. Read up on the different services available. Consider what your application needs, what prices will be, and go from there.
I have just migrated an MVC-based application from a dedicated server to Azure. When migrating the MSSQL database, I first tried importing .bacpac files, but some of the tables failed because of their size. I then used the SQL Database migration wizard, which worked fine for small tables but failed for tables with BLOB fields. For those tables I had to use temporary intermediate tables. After all the data was transferred, setting up the web app was a breeze and we went into production. At first everything seemed to work just fine, but after a couple of hours, when the load got heavier, all kinds of errors occurred. I went into the Azure portal and it was really easy to see the