Right now we are using PostgreSQL 8.3 (on Linux) as a database backend to our Ruby on Rails web application.
Considering that we actively use row-level locking and PL/pgSQL in the database, what can we employ to secure our data -- I mean tools, packages, scripts, strategies -- to successfully replicate the database and build a multi-master setup?
I would appreciate master-slave suggestions as well.
For example, if I add several application servers running Apache/Ruby to achieve higher performance and also deploy several database servers, is there any way to build multi-master replication in PostgreSQL?
Right now we use PostgreSQL WAL mechanism to backup data to file system.
Thanks a lot.
There are a few tools for master-slave (and master-multislave) scenarios, usually trigger-based. Slony-I has already been mentioned (it is stable and solid, but a bit difficult to operate). People who had problems with Slony-I wrote Londiste (by the Skype team) and PyReplica. Bah, and I just spotted that Mammoth has been open-sourced.
For multi-master there is Bucardo (note: it is not that polished), or commercial offerings - for example by Continuent or CyberTec.
If you haven't already, I'd suggest a look at the High Availability, Load Balancing, and Replication chapter of the PostgreSQL manual. It gives a clear overview of the available techniques and their features.
Hm, Bucardo is really good and stable in comparison to the others here. It is as polished as a Perl-based replication system can be, and it supports master-slave as well as multi-master replication, with interesting conflict-resolution concepts.
If you need simple master-slave rep I'd recommend Londiste, but for the multi-master needs, Bucardo is the only acceptable solution IMHO.
I thought Postgres-R looked promising; however, it's still in development.
It was supposedly stabilised and purported to be a candidate for integration into the standard distribution, but it's yet to come to fruition.
Late answer, but there is a new open-source tool for asynchronous master-master replication of PostgreSQL (it also works for MySQL): rubyrep.
The focus is on easy setup.
Disclosure: I wrote it.
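For a sense of what that setup looks like, here is a minimal configuration sketch, loosely based on my memory of the rubyrep tutorial; the host names, database names and table pattern are placeholders, so check the project documentation for the exact options:

    # myrubyrep.conf -- hypothetical two-node setup; adjust adapters/credentials to taste
    RR::Initializer::run do |config|
      config.left = {
        :adapter  => 'postgresql',
        :database => 'app_production',
        :username => 'app',
        :password => 'secret',
        :host     => 'db-master-1'
      }

      config.right = {
        :adapter  => 'postgresql',
        :database => 'app_production',
        :username => 'app',
        :password => 'secret',
        :host     => 'db-master-2'
      }

      # Replicate every table whose name starts with "app_" (example pattern).
      config.include_tables(/^app_/)
    end

If I remember correctly, the scan, sync and replicate commands are then pointed at that configuration file.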
You can have a look at slony.
PGCluster looks promising - we use it in limited situations without many problems.
http://pgfoundry.org/projects/pgcluster/
We have a very simple function (we look something up from a third-party database and return an answer; it's literally five lines of code). We would like to offload this task from our main server because we expect a high volume of traffic for this one function and would like to optimize it.
We are thinking about testing the promise of many cloud/PaaS providers, where they handle scaling and performance responsibilities.
We're most interested in Rails environments, but are curious to hear experiences from others about any company in the space.
Here are the PaaS companies we found that support Rails:
1) Heroku
2) DotCloud
3) Duostack
Questions:
1) Do you know of other Rails-specific companies? Also feel free to list non-Rails companies since we're interested in following other companies in case they eventually provide Rails support.
2) How has your experience been with these companies?
Foreword and disclaimer: I work for DotCloud; so the following might be biased. You've been warned.
DotCloud could be interesting for you if you like the following features:
run something other than Ruby (what about some Django or Pylons code talking to your SQL DB? Or even a PHP blog like WordPress or Drupal, using the same user authentication database?)
experiment with databases like Redis or MongoDB, or background ruby workers, without paying for add-ons
SSH access, crontab access (without requiring an add-on)
cheaper workers (I didn't come up with this one; some of our users coming from the Heroku world told us that workers were insanely expensive there)
Duostack is indeed very nice if you want to mix specifically Rails and Node.js. I've been told that they had awesome auto-configuration facilities.
Finally, if you only plan to do Rails and nothing else, ever, you might as well stick with Heroku since they've been in that business for a while, and are probably more mature than the first two of the batch.
Shameless plug: DotCloud is offering a beta test drive; so if you want to see what it looks like, just subscribe to the beta and you will quickly be able to see for yourself. Heroku has a free tier as well.
You could add Engine Yard to the mix - but I'd be inclined to use Heroku as my first choice, DotCloud second (it's a newish product, and is very good but still in development).
If you want more control over your app/servers or want to run it on any cloud or your own infrastructure without having to download/deploy anything, you can try Cloud 66 (www.cloud66.com)
Disclaimer: I work for Cloud 66
A lot has changed on the scene since this question was asked. We recently looked into these services and settled on Heroku, but even more recently decided to continue managing our own deployments directly on EC2. Here are some points not mentioned in the other answers.
Heroku
Now supports much more than just ruby
Has really great-looking support for PostgreSQL
Uses LXC for process containers, like DotCloud
DotCloud
Is now Docker, and is putting a lot of manpower into developing docker.io
Doesn't have a free tier any more
I'm not sure if DotCloud is using Docker internally or not, since the docs say explicitly it isn't production-ready yet.
Our decision to stick with plain EC2 was motivated by the fact that it's cheaper and affords a lot more flexibility. For example, we use local-only http servers behind our public server to do some of our request processing, which doesn't really fit into the PaaS models out there. We would have had to reimplement all our back-end components as redis workers, and pay for them as additional dynos. The fact that Amazon RDS now supports PostgreSQL was also a compelling factor. Incidentally, Amazon has a full-stack PaaS offering as well, Elastic Beanstalk.
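For anyone unfamiliar with the "redis workers" pattern mentioned above, here is a rough sketch of what such a reimplementation would have looked like using Resque; the job class, queue name, Record model and ThirdPartyClient are all invented for illustration:

    # Hypothetical Resque job standing in for an internal back-end component.
    require 'resque'

    class ThirdPartyLookupJob
      @queue = :lookups

      # Resque calls this in a worker process; the result is written somewhere
      # the web tier can read it (database, cache, etc.).
      def self.perform(record_id)
        record = Record.find(record_id)   # Record is an example model
        record.update_attribute(:answer, ThirdPartyClient.lookup(record.query))
      end
    end

    # Enqueued from the web app instead of calling the local back-end server:
    Resque.enqueue(ThirdPartyLookupJob, 42)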
Just stumbled upon the question. There are similar ones around here. The problem is also that the PaaS scene is changing very quickly. New vendors are popping up every week or so.
Nowadays OpenShift from Red Hat might also be mentioned here as a Ruby PaaS.
OFFTOPIC + shameless plug: I have compiled a list of PHP PaaS here: http://blog.fortrabbit.com/comparing-cloud-hosting-platforms/
I have four servers behind a load balancer, plus a staging server, a DB server, and a utility server, for a web application that hosts a number of web sites.
Should I make the jump to Chef to manage these servers or should I just maintain them manually? The servers were built using Sprinkle, but at that time there were only two. Now that there are four, maintenance is becoming more of an issue.
I'd like to hear experiences and the pros and cons of Chef and other Chef-like tools.
Thanks!
We moved to Chef, and we now have a one-minute redeployment for our app, so it certainly pays off.
However, it took a long time (a few months) to get to the point where we were happy with the Chef deployment strategy. In hindsight, we would have kept several spare boxes around to try out a 'from scratch' deployment. I certainly wouldn't advise trying Chef in a production environment without an exact mirror of the setup and lots and lots of tests, nor would I advise using Chef on a setup that hasn't been 'cheffed' from scratch.
Having said that, Chef is far better than the other options we looked at, and now that we are out the other side it's a breeze deploying a new version of the app on multiple servers. In the future I'll be using it for any staging or production environment I have.
In summary, yes, but only if your client/employer is aware that it may take some time before they see the benefits, which will be considerable.
Chef has a steep learning curve, so it will take a while - at least a few weeks - to become familiar with how to use it.
But once you pick up the basics, it is a very handy system, and can simplify any number of tasks - even for the smallest of infrastructures.
A few notes for when you start.
You will be setting up and tearing down cloud servers dozens of times, just to get the hang of it. Experiment.
The standard opscode cookbooks (github.com/opscode/cookbooks) are very useful. But you will need to extend/customize many of them for your particular case. And you will need to search the 'net for cookbooks that are missing from the opscode/cookbooks repository.
Read the opscode cookbooks, and read the 37signals cookbooks too.
The application and database cookbooks are geared towards standard Rails apps with MySQL and Memcached. To the extent that this describes you, you are in luck.
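To give a flavour of what extending a cookbook looks like, here is a minimal recipe sketch in Chef's Ruby DSL; the cookbook name, vhost path and template are invented for the example, so treat it as an illustration rather than a drop-in recipe:

    # cookbooks/myapp/recipes/default.rb (hypothetical cookbook)
    package "nginx"

    service "nginx" do
      action [:enable, :start]
    end

    # Render a vhost from cookbooks/myapp/templates/default/myapp.conf.erb
    template "/etc/nginx/sites-available/myapp" do
      source "myapp.conf.erb"
      owner  "root"
      group  "root"
      mode   "0644"
      notifies :restart, resources(:service => "nginx")
    end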
I'm working on a project that is considering using Cassandra as a database. Given its scalability, we would like to eventually migrate to Cassandra even if we use MySQL to start with. I know that big companies like Facebook, Digg, and more recently Twitter are using Cassandra, but I don't believe any of those sites run on Rails. My question is whether or not it's feasible to use Cassandra with Ruby on Rails. Points to consider:
We heavily rely on the Authlogic gem. Would switching to Cassandra affect how it works?
Are there any mature Ruby clients for Cassandra? Looking on GitHub, it seems that fauna's client (now Twitter's client) is the most mature. Has anyone had production experience with it?
Appreciate any tips.
Twitter is running Rails on most of their front end. Fauna's client is actually built and released by Twitter, so you can be pretty certain that it's up to date and stable under large workloads. Looking at the commit history shows that frequent improvements are being pushed to it, which is great.
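For a feel of the client, here is a tiny sketch of the sort of calls its README shows; the keyspace and column family names are just examples, so check the gem's documentation for the current API:

    require 'rubygems'
    require 'cassandra'

    # Connect to a keyspace on a local Cassandra node (example names).
    client = Cassandra.new('Twitter', '127.0.0.1:9160')

    # Insert a row into a column family, then read it back.
    client.insert(:Statuses, '42', { 'body' => 'hello cassandra', 'user_id' => '5' })
    status = client.get(:Statuses, '42')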
Most likely Authlogic would need to be customized to work properly with Cassandra. In particular, it appears to provide certain methods based on named_scope and relational data.
It does appear that someone has built a plugin for DataMapper support in Authlogic: http://twitter.com/collintmiller/statuses/2064046718. You may be able to use that as a starting point for making it compatible with Cassandra.
Good luck!
I don't think starting with MySQL and then moving to Cassandra is a good idea.
Cassandra is a NoSQL solution, while MySQL is a "classic" SQL-driven database.
This means that your models would be different.
If you start with MySQL, you will have to rely on ActiveRecord for creating your models. If you then change to Cassandra, you will have to change all your models to a NoSQL-compatible middleware (such as BigRecord). This not only means changing your models, but also the controllers that use them (since their interface would be different).
This said, Cassandra and the like are supposed to be used for very demanding applications - like Twitter.
The rest of the web applications out there are orders of magnitude less intense - are you sure you would still need Cassandra?
PostgreSQL, with a well-designed schema, is good enough 98% of the time.
If you then change to Cassandra, you will have to change all your models to a NoSQL
This isn't true at all. If you have programmed in such a way that your MySQL db does loads of joins, then yes, you may have a problem. We avoided joins as much as we could from the beginning when we went the MySQL route. Then, when we started migrating to Cassandra, it was fairly easy: we did so with one model only at first, then with, say, four models in one go, and so on. It works well. In fact, when you read the interview with Twitter you'll notice they ran MySQL and Cassandra in parallel for the same model for a while: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king.
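As an illustration of the "avoid joins" point, the usual trick is to denormalize at write time. A hedged ActiveRecord sketch follows; the Post/User models and the author_name column are invented for the example:

    # Instead of joining users at read time, copy what you need onto the posts row.
    class Post < ActiveRecord::Base
      belongs_to :author, :class_name => 'User', :foreign_key => 'author_id'
      before_save :cache_author_name

      private

      def cache_author_name
        self.author_name = author.name if author
      end
    end

    # Reads stay join-free, which keeps the model easy to port to a column store:
    #   Post.find(id).author_name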
As to Authlogic, you can keep that part in MySQL for as long as you like; just keep it loosely coupled with your Cassandra data.
I'm researching Cassandra, MongoDB, and CouchDB right now.
One way to tell which has the most developer support is by checking the number of watchers on the highest-rated GitHub project for each, at least as a rough estimate.
Right now it's:
852 - MongoDB (http://github.com/jnunemaker/mongomapper)
544 - CouchDB (http://github.com/jchris/couchrest)
178 - Cassandra (http://github.com/fauna/cassandra)
Although I have to say, with a bunch of high-profile sites (Twitter, Digg, Reddit, etc.) recently announcing that they're moving to Cassandra, that is a big vote of confidence for it.
Mongo seems to have the most and best documentation so far. Its auto-sharding is still in alpha, though, so how well it scales remains to be seen, I think.
I'm just starting to learn about all this stuff, so if others have insight please share.
There is also http://github.com/NZKoz/cassandra_object, which IIANM builds on top of the fauna client. "Cassandra Object provides a nice API for working with Cassandra. CassandraObjects are mostly duck-type compatible with ActiveRecord objects so most of your controller code should work ok... Use this in production only if you're looking to help out with the development, there are a bunch of rough edges right now."
What would you suggest as the best server stack for a dedicated server which needs to host a Rails SaaS application (not a lot of traffic, but I need to keep options open for the future)?
Regardless of your application, you're probably going to want certain standard components:
nginx/passenger will work for small apps or large apps. You should use it.
Unless you have a specific reason to use something else, you should use MySQL since the vast majority of the Rails community uses it and you will be able to get better support.
You should have memcached running right away, even if you don't use it for much yet. You're going to want to be able to seamlessly add caching as it's needed.
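As a small illustration of how cheaply caching drops in once memcached is configured; the model and cache key here are examples, not from the question:

    # config/environment.rb (or an environment file): point Rails at memcached.
    #   config.cache_store = :mem_cache_store, 'localhost:11211'

    # Anywhere in the app, wrap an expensive lookup in fetch-or-compute:
    price = Rails.cache.fetch("product_price/#{product.id}", :expires_in => 10.minutes) do
      product.calculate_price   # hypothetical expensive method
    end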
You're going to want to have a process for setting up a new server that is fully automated. That way, if you need to spin up a second server, it's trivial. If you ssh into a box to configure it, this means that if you need another server in a pinch (or the first server gets corrupted), you're going to need to remember all the things you did. Not a good place to be in an emergency.
You should be on the very latest version of Ruby on Rails, and upgrade frequently. Keep an eye on deprecations and changes and make the suggested changes as early as possible. When Rails 3 is released, use it.
Engine Yard, where I work, uses an open-source tool called Chef to manage our automated deployment solution. That's probably a good option.
As ever with a question that broad, it depends. Some things to think about:
What does the application do?
Does the application use any database vendor-specific SQL?
What are the availability requirements?
What are the performance requirements?
How much data will there be?
Which server stacks do you or the person who will be administering it have experience of?
What is your budget?
One thing I can say with complete certainty is that you don't want to be using Windows, because Rails works best on a Linux/UNIX stack.
A lot of it depends on your needs. If the model isn't very complex and/or your traffic is fairly low, you can probably get away with apache, mongrel, and sqlite on some *nix.
If you start seeing performance issues, you can add some memcached into the mix, upgrade (relatively painlessly) to mysql, and use a different server (passenger/nginx).
There are also alternative Ruby implementations that have some performance-boosting changes. Rubinius and JRuby come to mind.
I'm a big fan of ruby on rails, and it seems to incorporate many of the 'greatest hits' of web application programming techniques. Convention over configuration in particular is a big win to my mind.
However, I also have the feeling that some of the convenience I am getting is coming at the expense of technical debt that will need to be repaid down the road. It's not that I think RoR is quick and dirty, as I think it incorporates a lot of best practices and good default options in many cases. However, it seems to me that it just doesn't cover some things yet (in particular there is little direct support for security in the framework, and the plugins that I have seen are variable in quality).
I'm not looking for religious opinions or flamewars here, but I'd be interested to know the community's opinion on what areas Rails needs to improve on, and/or things that users of Rails need to watch out for on their own because the framework won't hold their hand and guide them to do the right thing.
Regardless of the framework, the programmer needs to know what she's doing. I'd say that it's much easier to build a secure web application using something as mature, well designed and widely adopted as Ruby on Rails than to go without framework support.
Take care with plugins and find out how they work (again, know what you are doing).
I love Rails too, but it's important for us to understand the shortcomings of the framework that we use. Though it might be a broad topic, addressing these issues won't hurt anyone.
Aside from security issues, one other big issue is deployment on shared hosts. PHP thrives in shared hosting environments, but Rails is still lagging behind.
Of course most professional Rails developers know that their apps need fine-tuned servers for production, and they will obviously deploy on Rails-specific hosts.
In order for Rails to continue its success, the core team should address this issue, especially with Rails 3.0 (Merb + Rails) coming.
A simple example: I have a Bluehost account, and I noticed the Rails icon in my cPanel. I talked to Bluehost support and they said it's more or less a dummy icon, and that it doesn't function properly.
Having said that, any professional who wanted to deploy a Rails app would not use Bluehost, but it does hurt Rails when hosts say that they support it and then users run into problems which their support knows nothing about.
The article you refer to defines technical debt as the "eventual consequences of slapdash software architecture and hasty software development".
With Rails, any development that is not test-driven incurs technical debt. But that is the case with any platform.
At an architectural level Rails provides some deployment challenges. A busy site must scale with lots of hardware or use intelligent caching strategies.
My advice to anyone adopting Rails would be to:
use TDD for all your development (a minimal test sketch follows this list)
verify the quality of any plugin you use by reading its tests; if they are not clear and complete, avoid the plugin
read "Rails Recipes" and "Advanced Rails Recipes" (Advanced Rails Recipes has a good recipe for adding authentication in a RESTful way)
be prepared to pay for hardware to scale your site (hardware is cheaper than development time)
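Here is the minimal test sketch mentioned above, in the Test::Unit style that Rails generates; the controller name and the assigned @posts variable are just examples:

    require 'test_helper'

    class PostsControllerTest < ActionController::TestCase
      test "should get index" do
        get :index
        assert_response :success
        assert_not_nil assigns(:posts)   # assumes the index action assigns @posts
      end
    end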
From my experience, by far the biggest tolls you end up paying with RoR are:
Pretty big default stack (not counting plugins you might be using)
Updating models tends to be a pain in the ass, at least on production servers.
Updating Rails or Ruby themselves is a bit more complicated than it should be, but this differs depending on your server setup.
As ewalshe mentioned, deployment is sometimes a drag, and further down the road, should you require it, scaling gets a bit iffy, as it does with most development frameworks.
That being said, I'm an avid user of RoR for some projects, and with the current state of hardware, even though you do end up paying some tech debt to use it, it's almost negligible. And one can hope these issues will eventually be reviewed and solved.
With any level of abstraction there is a bit of a toll you pay - generic methods aren't quite as fast as something built just for your purpose. Fortunately, though, it's all right there for you to change. Don't like the query plans that come out of the dynamic find methods? Write your own and you're good to go.
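For example (a hedged sketch; the table and column names are made up), a dynamic finder can be swapped for hand-written SQL without leaving ActiveRecord:

    # Instead of User.find_by_email(params[:email]) you can hand-tune the query:
    user = User.find_by_sql([
      "SELECT id, email, name FROM users WHERE email = ? LIMIT 1",
      params[:email]
    ]).first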
Someone above put it well - hardware is cheaper than developers. I'd add "at a sufficiently low amount of hardware".
I'm reading Deploying Rails Applications and recommend it highly to answer your concerns.
The book is full of suggestions to make life easier, taking a deployment-aware approach to your Rails development from scratch, rather than leaving it to later.
I don't think the choice of RoR implies technical debt, but just reading the first few chapters alerted me to practices I should be following, particularly on shared hosts, such as freezing the core Rails gems so you can't be disrupted by upgrades on the host.
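If my memory of Rails 2.x serves, freezing looks roughly like this; treat the gem name as an example and double-check the rake task names against your Rails version:

    # config/environment.rb -- declare gem dependencies so they can be vendored
    Rails::Initializer.run do |config|
      config.gem 'RedCloth'
    end

    # Then, from the application root (rake tasks as I recall them from Rails 2.x):
    #   rake gems:unpack          # copies declared gems into vendor/gems
    #   rake rails:freeze:gems    # copies the Rails gems themselves into vendor/rails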
The 30-page chapter on shared hosts includes memory-quota tips such as using multiple accounts (if possible) with one Rails app per account. It also warns about popular libraries such as RMagick possibly pushing your memory size to the point where your processes are killed (such as a 100MB limit, which it suggests some hosts periodically apply).