When creating a Saas product, the database is the biggest issue when it comes to scaling.
From my research, it seems Django provides a more built-in robust way to vertically partition the database.
Rails has some gems that you can use, but its not something that ships with the Rails framework and your really at the mercy of the developer who released it (and may not be keeping it up-to-date etc.)
In terms of vertical partitioning, is my observation correct that Django is more robust in this area?
Multi-dbs using django: http://docs.djangoproject.com/en/dev/topics/db/multi-db/
You can do multiple databases in Rails, each model can have a separate connection if you want, this is part of the core functionality, but this is not going to be a very effective way to scale. It is usually much better to focus efforts on tuning your database stack by adjusting settings, clustering, replicating, or by applying more rigorous caching within the application itself.
Splitting tables across databases is really not going to buy you much scalability. A more modern approach is sharding where each table is split into separate instances, though to go down this path requires both significant preparation and a very solid understanding of database architecture. Since you can control the size of each shard, you can scale without limitation in this regard.
Keep in mind that Rails ships with ActiveRecord as the standard Object Relationship Mapper (ORM) but there are two other production-ready packages which offer different features: DataMapper and Sequel
In my experience with developing large-scale applications, the only time you will find the database to be a bottleneck is when you're using a poorly tuned configuration, an under-powered piece of equipment, or your table structure isn't sufficiently indexed or de-normalized. This is common to all database-backed applications and isn't unique to Rails or Django, so your choice of platform is really not relevant here.
Most of the performance gains in a Rails application come from proper data de-normalization, use of Rails.cache backed by memcache, and switching from model-based loads to direct queries where necessary for performance reasons. Rails can scale very well on a single database instance with nothing especially exotic in terms of techniques, just thorough application of basic optimization principles.
Related
I'm dealing with a rails application which is growing too big. It takes long time to start and lots of memory. We are having performance issues. Tests are very slow. Managing the codebase, debugging and introducing new futures becoming harder everyday. We are thinking to split the application into smaller components (Rails engines or different Rails apps) where all components will share a single database.
Considering such scenario we will need to share the models, maybe some libs, tests, and some gems, etc... It doesn't sound right for me!
Is there any pattern can be applied?
There are some great suggestions here about to to split up a Rails code base however I think that before you apply any of then you need to stop and seriously consider how you got into this position. All of these solutions introduce new complexities and challenges. They might be worthwhile trade offs but only if you make sure they also solve a problem you have today.
Take each of the pain points you listed (startup time is slow, memory use if high, performance is poor, rest performance is poor, development speed is slow) and run a "5 whys" exercise on them. Why are these things happening. Why did the app get into this state.
Most importantly before you commit to any plan for splitting up a large app consider if the app should be large in the first place. If your app is more complex than your product demands then switching to an equally complex cluster of services is not an improvement.
More concretely I would recommend against shared database access between apps/services/whatever. A shared database has a shared schema which becomes fragile. It also leads to tightly coupled services which lack the separation of concerns you need to see any improvement in your pace of development. Just as splitting one massive class into several tightly coupled files does not improve it nor does splitting one app into coupled services.
If you must maintain a large app you need to isolate separate concerns. In order to do so you need to break the dependencies between them. Based on the efforts I have seen to repair Rails monoliths you'll have better success creating clean interface within your existing app an then splitting out components than if you split the app apart and then hope the resulting pieces can be improved independently.
Yes, there are several ways to do it:
Micro services Erin Swenson-Healey has a nice post for rails.
Hexagonal Architecture GoRuCo 2012 Hexagonal Rails by Matt Wynne and Refactoring with Hexagonal Rails
Rails Engines Approach. Dealing with Rails Application Complexity - A Report from MWRC
Use other framework more suitable for high complexity and more PORO oriented. http://lotusrb.org/
Also see Ruby Midwest 2011 - Keynote: Architecture the Lost Years by Robert Martin
Read "Practical Object Oriented Design in Ruby" by Sandi Metz. Also, watch every presentation and talk by her on the internet. Youtube is a great place to start. She's a great speaker. Rails is written in Ruby and Ruby is an object oriented language which brings with it massive benefits. But grokking the object oriented part takes a little work. Sandi will get you there.
Is it possible to merge two apps, one e-commerce application based on PrestaShop with second one "ordinary" build on Rails?
Is that even possible? Which issues I would encountered during building that application?
Or maybe I just saying "stupid things" and that is a very bad idea?
-
Second one approach is to use Spree with Rails. However I heard that Spree is much slower than PrestaShop and doesn't have much modules.
Advantages on Spree are that my skills at Ruby are much more superior than in PHP, so I would mastered Spree much faster than PrestaShop.
I have practically zero experience with both sollutions(Spree, PrestaShop).
Priority in choosing option is "how much time it would take" and quality of final application(this order).
If I understand correctly, you are weighing the advantages and disadvantages of several solutions and can't decide which one to pick, so you wonder whether it makes sense to use more than one. The answer is usually no. Even though it may be possible, multiple technologies are much more difficult to set up and maintain than choosing one single technology. I strongly advise against it.
Even if one particular solution doesn't have all the features you need (e.g. specific modules), it's usually much easier to simply write those features yourself than to add a completely different technology to your stack. I say usually because as always, there are exceptions, but only in rare cases where some highly specific feature is needed.
It's true that Rails apps are sometimes (not always) slower than PHP apps, but the speed difference only becomes relevant if you need to scale to a very large number of users (millions). And even at those high scales, Rails will perform well if you're smart about setting up your server infrastructure, make use of caching, etc.
Finally, I would personally recommend Spree over Prestashop, but that's a matter of opinion. If you already have experience with Ruby, I definitely recommend Spree.
I'm looking for tools to monitor/test performance in rails, and I'm not having much luck finding anything particularly effective. I've read the rails 'performance' guide, but I use RSpec instead of Rake:Test, so I'm not particularly keen to use the rake:test framework.
So, what do folks use for performance testing in rails apart from the rake:test benchmarker? Any suggestions appreciated
Performance benchmarking is one of those things that you'll get different opinions about depending on who you ask. One thing I hear over and over is that you shouldn't obsess over performance early on. I'm not sure where you're at with your application, but this could be something to consider. After developing a rather large application, I can honestly say I agree with them. It's better to use good practice when developing and wait to do performance tuning at a later time. Best practices include things like indexing database columns.
For performance monitoring of live Rails applications, New Relic is one of the best tools out there*. The free plan is a little limited as it only provides 30 minutes of historical data, but the information it collects is priceless. Some of the cloud hosts like Heroku and Engine Yard are offering free bronze plan upgrades, which stores a week of data. Once you have information about your application, you can make educated decisions about where to focus your time.
* My opinion
When your app needs some performance testing, the default TestUnit based performance benchmarking tests are a great start. However, you shouldn't stop there, and should consider using a variety of tools based on the nature of your application.
For example, analyzing production logs using a tool like the request-log-analyzer is a great way to identify the real performance bottlenecks. Bullet is another great tool you can run in your development environment to identify performance inefficiencies in your database calls. For low level benchmarking, rails also gives you the benchmark helper methods in models, controllers and views. This can be handy if you are focusing on tuning some specific part of your application.
It is also worth noting that rspec is not the best tool for benchmarking performance (to date). In my opinion, trying to assert things like it should_take_less_than 50 is stretching the idea of performance testing and trying to force it into the concept of BDD. Performance is less often about absolute expectations and more about identifying the slowest parts of your app and making them faster.
There are many online resources on the topic. I've found these railscasts to be a great starting point:
http://railscasts.com/episodes/368-miniprofiler (free)
http://railscasts.com/episodes/411-performance-testing (pro, requires subscription)
I have recently been come to for advise on an idea of rewriting an existing site due to massive maintenance problems in their old design.
Basically, the company is considering a complete rewrite of aprox. 90% of their site which is currently written in PHP using an in-house framework.
The company would like to rebuild the backend and some way down the road the front-end as well in order to minimize their maintenance problems and make it easier to bring in new tallent which doesn't need to spend months learning the architecture before they can become affective developers.
We've come up with several possible architectures, some involving rewriting the whole site using an existing scripting web framework such as Cake, Django or RoR and some compiled language frameworks in Java or even .Net.
In addition we have come up with some cross technology solutions - such as a web application built in Django with a Scala backend.
I was wondering what merit would there be to using a single technology stack (such as RoR) as apposed to using a cross between two (such as RoR with Scala, like Twitter now do) and vise versus.
Take into consideration the fact that this company's site is a high traffic site with over 1 million unique visitors a day, which will be transitioned onto the new architecture slowly over a long period (several month to a year)...
Thanks
Generally speaking, I don't think any particular technology stack is better than any other in terms of performance; Facebook runs on PHP and I know first hand that Java and .Net scale well too. Based on what you've said I'd be worrying more about the maintainability related issues than performance and scalability just now.
Generally speaking, I would keep within one well known technology stack if possible:
It'll be easier to find (good) staff for a well known platform / technology stack; there will be more in the market, and rates will not be as expensive as the skills are too rare.
Splitting your technology means you need a wider range of knowledge; by sticking with a single technology stack you can focus on it, with better / faster results.
People tend to focus on one platform / technology stack, so it'll be easier to find developers for technology X, rather than technologies X, Y and Z.
It's easier for team members to work on different parts of the system as it's all written in the same technology - presumably in a similar way.
In terms of integation, items within the same technology stack play nicer together, crossing into different stacks can quickly become more difficult and harder to support.
Where you do want to use different technology, ensure the boundary is clean - something standards based or technology agnostic like web service / JSON calls.
Rewriting your whole codebase will require significant effort and lots of pressure, and for a start you would be best to start by doubling or maybe tripling the initial time estimate.
You can think about your problem from two perspectives :
Number of platforms. In order to minimize and manage complexity of this task, it is most definitely your imperative to reduce mental strain by using as less new technologies/platforms as possible. For example, an advantage of RoR over PHP+Smarty that has been cited often is that with RoR you don't have to learn a new presentation language.
Team effort required to learn new techs. If your existing team is already versatile with PHP, Django etc, but not RoR, then you might be better off reusing existing skills, since the mental strain for developers will be lesser.
Single technology means less moving targets; simpler is always better as long as it meets the requirements. So, use as many technologies as you need, but not more than that. The technology is not important; the right technology is the one that makes your job easier. So, ask yourself what are your current pain points, and how would each of those technologies help.
Getting the architecture right and the code clean is the easiest with Smalltalk and Seaside, especially when you do the persistence with Gemstone. At this scale, you'll have to talk to them about license costs. You might know them from the Ruby work they do with Maglev.
I have a memory of talking to people who have got so far in using Ruby on Rails and then had to abandon it when they have hit limits, or found it was ultimately too rigid. I forget the details but it may have had to do with using more than one database.
So what I'd like is to know is what features/requirements fall outside of Ruby on Rails, or at least requires such contortions that it is better to use another more flexible framework, even though you may have to lose some elegance or write extra boilerplate code.
Rails (not ruby itself) is proud to be "Opinionated Software".
What this means in practice is that the authors of rails have a certain target audience in mind (themselves basically) and aim rails specifically at that. If X feature isn't needed for that target audience, it doesn't get added.
Off the top of my head, things that rails explicitly doesn't support that people may care about:
Foreign keys in databases
Connections to multiple DB's at once
SOAP web services (since rails 2.0)
Connections to multiple database servers at once
That said, it is very easy to extend rails with plugins, and there are plugins which add all of the above functionality to rails, and a lot more, so I wouldn't really count these as limits.
The only other caveat is that rails is built around the idea of creating CRUD web applications using MVC. If you're trying to do something which is NOT a CRUD web app (like twitter, which is actually a messaging system, or if you are insane and want to use a model like ASP.NET webforms) then you will also encounter problems. In this case you're better off not using rails, as you're essentially trying to build a boat out of bicycle parts.
In all likelihood, the problems you will run into that can't just be fixed with a quick plugin or a day or 2 of coding are all inherent problems with the underlying C Ruby runtime (memory leaks, green threads, crap performance, etc).
Ruby on Rails does not support two-phase commits out of the box, which maybe required if your database-backed application needs to guarantee immediate consistency AND you need to use two or more database schemas.
For many web applications, I would venture that this is not a common use-case. One can perfectly well support eventual consistency with two or more databases. Or one could support immediate consistency with one database schema. The former case is a great problem to have if your app has to support a mondo amount of transactions (note the technical term :). The latter case is more typical, and Rails does just fine.
Frankly, I wouldn't worry about limits to using Ruby on Rails (or any framework) until you hit real scalability problems. Build a killer app first, and then worry about scalability.
CLARIFICATION: I'm thinking of things that Rails would have a hard-time supporting because it might require a fundamental shift in its architecture. I'll be generous and include some things that are part of the gem/plugin ecosystem such as foreign key enforcement or SOAP services.
By two-phase commits, I mean attempting to make two commits to physically distinct servers within one transactional context.
Use case #1 for a two-phase commit: you've clustered your database, so that you have 2 or more database servers and your schema is spread across both servers. You may want to commit to both servers, because you want to allow ActiveRecord think do a "foreign key map" that traverses across the different servers.
Use case #2 for a two-phase commit: you're attempting to implement a messaging solution (sorry, I'm J2EE developer by day). The message producer commits to the messaging broker (one server) and to the database (a different server).
Also found some good discussion about the limits of ActiveRecord.
I think there is a greater “meta-question” here, that could be answered and that is “when is it OK to lean on external libraries to speed up development time?”
Third party libraries are often great and can drastically reduce development time, however there is a major problem, Joel Spolsky calls this “the law of leaky abstractions.” If you look that up on Google his post will come up first. Essentially this means that the trade off in development time means that you have no idea what is going on under the covers. So when something breaks you are completely stuck and have very limited methods of debugging. This also means that if you hit one of the features that are simply unsupported in RAILS, that you really need, you’ll have no next step except to write the feature yourself, if you’re lucky. Many libraries can make this difficult to do.
We’ve been burned badly in my dev shop by this issue. Our solutions worked fine under normal load, but we found that the third party subscription libraries that we were using simply could not stand up to the kind of load that we experienced once our site started to get a large number of concurrent users. This puts us in a very difficult place; essentially we have to rewrite the entire subscription service ourselves, with performance in mind. Doing this means that we’ve wasted all the time that we spent using the library.
Third party libraries can be great for small to medium sized applications; they can drastically reduce development time and hide complexities that aren’t necessary to deal with in the early stages of development. However eventually they will catch up with you and you’ll likely have to rewrite or re-engineer your solution to get past the “law of leaky absctractions”
Ruby don't have a functionality like IsPostBack in ASP.Net
Orion's answer is right on. There are few hard limits to AR/Rails: deploying to Windows, AR connectors that aren't frequently used, e.g. Firebird, ), but even the things he mentioned, multiple databases and DB servers, there are gems and plugins that address those for legacy, sharding, and other reasons.
The real limitation is how time-consuming it is to keep on top of all the things that rails devs are working on, and researching specific issues, given how many blogs, and how much mailing list volume there are.