Can one use Ruby $SAFE level against exploits of Rails vulnerabilities? - ruby-on-rails

Rails vulnerabilities such as CVE-2013-0155 and CVE-2013-0156 might allow a user to run arbitrary code, constructed from an untrusted source (XML/YAML parameters).
Does using $SAFE=4 (say) prevent such exploits or not? If yes, do Rails devs use such security level? If no, why?
Thanks

Tainted Objects and Protected Operations
Basically, $SAFE levels boil down to protecting an application from tainted data, but the devil is in the details. There's a whole chapter in Programming Ruby that addresses the various levels, and it's worth your time to read it over.
Rails Apps Generally Need Tainted Data
Generally speaking, the average Rails application invites tainted input. The params hash is tainted by definition, and most of your user interactions rely on tainted data. Granted that you can sanitize input or use framework features to prevent mass-assignment vulnerabilities, in most cases your application will still need to interact with user-supplied data to be truly useful.
Security Trade-Offs and Other Considerations
It may or may not be possible to run a Rails application usefully when $SAFE = 4. Quite frankly, I've never seen anyone do it with production code "in the wild." Even if you can, you will likely have to jump through so many hoops to untaint user-supplied data, instantiate ActiveRecord objects, and perform filesystem writes (e.g. logging or file uploads) that it may not be worth the security trade-offs.
You may be better served by using a lower $SAFE level and relying on other security best-practices to achieve your goals. It really just depends on what you're trying to accomplish. As with all security controls, your mileage will definitely vary.

When your whole Rails has $SAFE set to 4, nothing can happen to you. Thus said, your app is crippled to deliver static views only. At least that's my experience from some time ago.

Related

Is there a best practice and recommended alternative to Session variables in MVC

Okay, so first off before anyone attempts to make a determination that this is a "duplicate" question; I have reviewed most of the posts on SO regarding similar questions but even in combination of all that has been said I still am somewhat at a dilemma as to the definitive or maybe I should say unanimous agreement on this.
I can however say that I have (based on the posts) conclusively determined that the answer is based on the scope of the requirement. But even with consideration of this, the opinions seem too diverse for me to make a decision as to how I should handle this.
My immediate requirement is that I need to persist variable data from 1 controller across many views. More specifically, I have a controller and corresponding view that handles shopping cart item counts and I would like to persist that data across multiple views. I am thinking that the _layout view is the most logical choice for this.
Now I have successfully accomplished this task by assigning the value to a Session variable which is retrieved from my _layout view; so even when the user were to navigate any where within the site the number of items in the Shopping Cart will persist until either they leave the site or complete the checkout; in which case the variable will be cleared in code.
The posts I've read seemed biased to either staying away from Session variables in favor of Cookies and storing data in a database; or stating that for the intent purpose for which I propose to use them, Session variables are perfectly fine to use.
The other thing I've read suggests that Session variables can potentially impede overall performance if there is high traffic on the site since the information is stored on the server.
I personally cannot justify storing this type of information in a database and subsequently hitting the database as I'd imagine that this could also affect site performance and seems a bit overkill for storage of temporary data. TempData, ViewData and ViewBag do not work in persisting the data so they are not logical choices for the requirement IMO.
If there is another well suited alternative to the Session variable (which is working for me) I would like to know what it is.
2 posts that seem contradictory in effort of providing best recommendations leave me a bit confused.
Cons: Is it a good practice to avoid using Session State in ASP.NET MVC? If yes, why and how?
Pros: Still ok to use Session variables in ASP.NET mvc, or is there a better alternative for some things (like a cart)
Seems that this question (although presented in many different variations) has no definitive answer that I can conclude.
If there is a more preferrable way to accomplish this without overkill then that is the answer I'm in search of.
I read somewhere the use of MVC filters in tandem with the Global.ascx application start section as well, but this does not seem appropriate for variables set at the controller level as much as perhaps, static variables.
Can someone maybe squash (for lack of a better word) the many diverse opinions on the topic and maybe provide a more definitive answer to the question? I'm sure the diverse opinions have their place and I'm not attempting to discredit them. But having a definitive and possibly unanimous answer would be better; then I could sort through the other posts to determine what is best for my application.
Of course, if this question has no definitive answer; just tell me that and I'll attempt to derive my own answer from the other posts.
Thanks
===========================================================
UPDATED RESPONSE TO ANSWERS PROVIDED
Caching and Cookies seem to be a general preference from the responses however I've also noted the statement that caching its not an ideal candidate to use across multiple web server because synchronization can be a potential issue.
Giving credit to Tim, it's stated that Database storage is optimized and users have the option to return at a later time and continue where they left off.
That is an excellent point, but keeping foresight on probabilities; its likely a reasonable given that some users may not return leaving unneccessary data in the database.
So keeping the DB optimized and clean (which "to me" is of equal relevance) would require implementing a maintenance task to automatically expire those records based on a set threshold of time to account for those circumstances. Although a maintenance task is not an unquestionable option, I still think this adds just a bit more work to the task simply for the intent purpose of serving as temporary storage.
Nonetheless, I do respect Tim's recommendation and believe it deserves merit on countering my initial opinion to a degree; that a database would not seem to be a viable option for storing Temporary data; so I think the compromise would be to store the data in a database (given the scenario of a Shopping Cart or similar) perhaps after a checkout. This way as you previously stated, the data may be persistently tracked upon subsequent visits so you have a record of transactions. But more importantly, it would be data of those transactions having real relevance to persist to the database.
It was also stated that although Session is faster than Database; but notwithstanding to have its caveats that can to some degree be mitigated by other mechanisms such as leveraging the SessionStateBehavior attribute, just serving as one example.
BUT... I think Erik kind of drove the point home with the Dunning-Kruger Effect. Although, from the content and explanations for proposed answers given here; I seriously doubt the expertise of any of the individuals who have responded is any way questionable. Nonetheless, I tend to agree on the fact of getting a unanimous opinion may be somewhat of a higher than reasonable expectation on my part.
What I was more specifically looking for was a general consensus for a technique that would comfortably accomodate a diverse number of scenarios. In other words, something that would accomodate not only my particular scenario but also provide the element of scalability to larger environments with potentially heavier traffic. This way a change in the programming would be either alleviated altogether or minimal at best.
==================================================
Summary based on the feedback:
Session variables seem to accomodate smaller case scenarios and when applicable, but they have some potential for persistence concerns among other notable discrepancies as stated very thorougly by Erik. So this option obviously will not fit a scalable model.
Caching is preferable over Session variables but again not neccessarily the "best" scalable option due to among other things to the potential synchronization complexities in web server farm environments as previously pointed out. But an option nonetheless.
Database storage is scalable but for the intent purpose of temporary volatile storage is probably not the most elegant option from a database perspective as it would require periodical cleanup. Personally, having a strong foundation in database concepts earlier in my career this probably is not going to be something that many developers will likely agree with; but using the database for this purpose may suffice for Web Development from a programmers perspective; however from perspective of the DAL and DB development this (to me) has the potential for mandating an additional DB task to enforce an efficient backend.
Cookies seem to be a nice option having the combined "desirable" elements of Session variables and caching.
==================================================
CONCLUSION
Based on the answers; I think COOKIES and CACHING seem to be generally well rounded proposals for best practice across the board in combination with database storage when continued persistence is required after the fact; as potentially good candidates for scalability of the ones presented.
The ultimate choice between the 2 would seem to be based on the amount and type of data requiring storage (e.g. sensitive vs non-sensitive and whether or not there is any concern that the client may alter the data on their end); in addition to special considerations for COOKIES in the fact that they may be disabled by the clients.
Obviously, there is no one size fits all solution as clearly pointed out and concluded from the answers provided but in terms of scalability; I may be wrong but these seem to be the BEST choices available.
Because all the responses are good; I'm fairly going to credit all the posts as useful and going to accept Erik's answer as a well rounded overall scalable solution. I wish I could select more than one accepted answer as I believe Tim's response was also very well layed out and concise.
Gupta's response was good also, but I wanted more elaboration of the proposed answer and not a repeat of previous posts.
Thanks Guys!
You will never get unanimous opinion on anything in any large group of people. That's just human nature. Part of that stems from the Dunning-Kruger Effect which states that the less someone knows about a subject, the more likely they are to over value their expertise in that subject. In other words, lots of people think they know something, but only because they don't know they don't know it. Part of it is simply that people have different experiences, and some have found no problems with session, while others have in various situations, or vice versa...
So, to backup your research, which suggest that the answer depends heavily on the requirements, we need to understand what your requirements are. If this is to be a high traffic site, with load balanced servers in a web farm, then stay as far away from session as you can. Sure, it's possible to share session in various ways in a server farm environment (session server, distribute cache server, etc..), but avoiding session will almost always be faster if you can help it.
If your site is a single server, and unlikely to ever grow beyond that. And your traffic patterns are relatively low, then session may be a useful option. However, you should always be aware that session is unreliable storage, and can disappear on you at any time. If the app pool is recycled, session is gone. If an uncaught exception bubbles up to the worker process, the session may be gone. If IIS thinks there's not enough memory, your session may be gone, regardless of any timeout values configured. You also can't always get reliable notification that a session has ended, since terminated sessions do not fire the Session_End event.
Another issue is that Session is serialized. In other words, IIS prevents more than one thread from writing to the session at a time, and it often does this by locking the session while a thread is running if it has not opted out of writable session locking. This can cause severe problems in some cases, and merely poor performance in others. You can mitigate this by marking various methods with a read-only session attribute if you aren't going to be modifying it in that method.
Ultimately, if you do choose to use session, then try to only use it for small, short lived things if at all possible, and if not possible then build in a way to "regenerate" the data if the session is lost. For instance, using your number of items in cart example, you could write a method that first checks to see if the value is there, and if not it goes out and loads it from the database. Always use this method to access the variable, rather than accessing it directly from session... this way, if the session is lost it will just reload it.
However, having said this... For the number of items in a cart, I would generally prefer to use a cookie for this information, since cookies get passed to the page on every load anyways, and this is a small discrete unit of data. Generally prefer Session for sensitive data that you want to prevent the user from being able to change.. number of items in the cart simply doesn't fit that rule.
When
Databases are highly optimized. A simple value like a shopping cart count is a good candidate for caching by the database and (hopefully) cheap to compute outright. It may be a non-issue.
However, if you have ruled out other mechanisms, small, user-by-user values are viable candidates for session.
Cache is fine for site-wide values, or user-specific values with unique keys. However, synchronizing caches across multiple web servers can be difficult. Out of process session state will stay synchronized because it is stored in a single location (database or a state server).
Of course, there are many 3rd party caching alternatives with various options to keep them synchronized.
Regardless of where the count is temporarily stored, I'm of the opinion that shopping carts themselves should be stored in the database so that users have the option to return later and continue where they left off.
Performance
If you use out of process session state (e.g. in a load balanced environment and/or to make session more durable), it will hit a database or call an out of process service, but the call is relatively cheap unless you are serializing large object graphs.
Session is loaded once per request. Subsequent read access is very fast.
Writing to session can be detrimental to performance, even when there is no load. Why? most modern applications use asynchronous calls, and when multiple async calls hit an HTTP handler (page, controller, etc) that reads/writes session, ASP.Net will lock the session to serialize access. To avoid this, you can decorate your controllers with [SessionState( SessionStateBehavior.ReadOnly )]
Design
Now I have successfully accomplished this task by assigning the value
to a Session variable which is retrieved from my _layout view;
This seems like mixing concerns, i.e. having the view aware of the underlying storage mechanism. From a purist standpoint, I would set this value on a view model or at least put it in the ViewBag. From a practical standpoint, one or two values retrieved in this manner probably won't hurt anything, but beware of letting it grow much further.
I read somewhere the use of MVC filters in tandem with the Global.ascx
application start section as well, but this does not seem appropriate
for variables set at the controller level as much as perhaps, static
variables.
Static variables have perfectly legitimate uses, but you must understand them thoroughly or risk serious problems.
See my answers pertaining to static variables in ASP.Net:
does aspx provide special treatment for c# static variables
Static fields vs Session variables
Session alternative in different prospective :-
When you keep something in session it breaks the primary rule in ASP.NET MVC. You can use these options as an alternative of session.
If your asp.net (MVC) session do boxing unboxing on the object then it makes a little load on the server. Try this idea
Caching :- Storing a List or something like large data in session is better can fit in Caching. You have control on whenever you want it to expire rather than user session.
If your app depends on JSON/Ajax data then you can use some kind of functionality provided in html5 (like WebSQL, IndexDB). it will not use the cookie so you can save some workload on the server.

Is it sensible to run Backbone on both server (Node.js and Rails) and client for complex validations?

This is verbose, I apologise if it’s not in accordance with local custom.
I’m writing a web replacement for a Windows application used to move firefighters around between fire stations to fill skill requirements, enter sick leave, withdraw firetrucks from service, and so on. Rails was the desired back-end, but I quickly realised I needed a client-side framework and chose Backbone.js.
Only one user will be on it at a time, so I don’t have to consider keeping clients in sync.
I’ve implemented most of the application and it’s working well. I’ve been avoiding facing a significant shortcoming, though: server-side validations. I have various client-side procedures ensuring that invalid updates can’t be made through the interface; for instance, the user can’t move someone who isn’t working today to another station. But nothing is stopping a malicious user from creating a record outside of the UI and saving it to the server, hence the need for server-side validation.
The client loads receives all of today’s relevant records and processes them. When a new record is created, it’s sent to the server, and processed on the client if it saved successfully.
The process of determining who is working today is complex: someone could be scheduled to work, but have gone on holidays, but then been called in, but then been sent home sick. Untangling all this on the server (on each load?!) in Ruby/Rails seems an unfortunate duplication of business logic. It would also have significant overhead in a specific case involving calculating who is to be temporarily promoted to a higher rank based on station shortages and union rules, it could mean reloading and processing almost all today’s data, over and over as each promotion is performed.
So, I thought, I have all this Backbone infrastructure that’s building an object model and constraining what models can be created, why not also use it on the server side?
Here is my uncertainty:
Should I abandon Rails and just use Node.js or some other way of running Backbone on the server?
Or can I run Node.js alongside Rails? When a user opens the application, I could feed the same data to the browser and Node, and Rails would check with the server-side Backbone to make sure the proposed new object was valid before saving it and returning it to the browser.
One factory is how deeply Rails is entrenched in this application. There isn’t that much server-side Ruby for creation/deletion of changes, but I made a sort of adaptation layer for loading the data to compensate for the legacy database model. Rails is mostly just serving JSON, and CSS, Javascript, and template assets. I do have a lot of Cucumber features, but maybe only the data-creation ones would need to be updated?
Whew! So, I’m looking for reassurance: is it reasonable, like suggested in this answer, to be running both Rails and Node on the server, with some kind of inter-process communication? Or has Rails’s usefulness shrunk so much (it is pretty much a single-page application like mentioned in that answer) that I should just get rid of it entirely and suffer some rewriting to a Node environment?
Thanks for reading.
It doesn't sound like you're worried a lot about concurrency as much as being able to do push data, which both platforms are completely capable of performing. If you have a big investment in Ruby code now, and no one is complaining about its use then what might be the concern? If you just want to use Node for push and singularity of using javascript through the stack then it might be worth it to move code over to it. From your comments, I really feel it is more about what is interesting to you, but you'll have to support the language(s) of choice. If you're the only one on the team then its pretty easy to just slip into a refactor to node just because it is interesting. Just goes back to who's complaining more, you or the customer. So to summarize: Node lets you move toward a single language in your code base, but you have to worry about what pitfalls javascript on the server has currently. Ruby on Rails is nice because you have all the ability to quickly generate features and proto them out.

Rails: Caching a Tree in memory on the server

I have a postgresql database which contains multidimensional data. What I did was I wrote a data structure that sorts all database rows into a tree format. Now the database is large and so I dont want to generate the tree every time a request comes in from a browser. What Id like to do is construct the tree once in a certain time period and persist it in memory on the server.
The tree is read only by the way. So now each time a request comes in the tree need not be generated new, its already there.
How can I make this happen. Im not an expert programmer, just a beginner and definitely new to web programming. So some of these concepts are new to me.
But if you could please point me in the right direction in terms of the concepts involved here, I can google the rest.
Or if you have actual links or examples that would be fantastic.
Thanks
There are several ways to approach this problem. It depends on just how close to the application you want the variables. If you're really looking to have them right "on top" of the application, for fastest possible use, then you could look at using a global variable "$tree" and hooking in to the application flow. Other options might include memcached, which is still pretty darn close to the application. Redis would be a good option for an in-memory database that could be shared between instances of an application, as it is a NoSQL database that you query. Not quite as close to the application though.
Generally, those are your primary options. In-application variables that survive requests. Application frameworks that will help variables survive requests and provide you a querying mechanism. Or, an In-Memory databases that will allow you to store and query rapidly from multiple instances. Each is a viable option, though I'm pretty sure you'd get a lot of 'community' flack for using a straight up global variable (such practices are considered unclean for their lack of thread-safety and other such concerns).

The Ruby community values simplicity...what's your argument for simplifying a db schema in a new project?

I'm working on a project with developers who have not worked with Ruby OR Rails before.
They have created a schema that is too complicated, in my opinion. The schema has 117 tables, and obtaining the simplest piece of information would require traversing/joining 7 tabels...and of course, there's no "main" table that serves as a sort of key between them. The schema renders many of the rails tools like 'find' method, and many of the has_many/belongs to relationships almost useless. And coding for all of these relationships will likely be more time-consuming than we have the money to code for.
THE QUESTION:
Assuming you are VERY convinced (IMHO...hehe) that the schema is not ideal, and there are multiple ways to represent the domain, how would you argue FOR simplifying the schema (aside from what I've already said)?
I'll stand up in 2 roles here
DBA: Database admin/designer.
Dev: Application developer.
I assume the DBA is a person who really know all the Database tricks. Reaallyy Knows.
DBA:
Database is the key of the application and should have predefined structure in order to serve its purpose well and with best performance.
If you cannot use random schema (which is reasonably normalised and good) then the tools are wrong.
Dev:
The database is just a data store, so we need to keep it simple and concentrate on the application.
DBA:
Database is not a store it is the core of the application. There is no application without database.
Dev:
No. The application is the core. There is no application without the front-end and the business logic applied to it.
And the war begins...
Both points are valid and it is always trade off.
If the database will ONLY be used by RoR, then you can use it more like a simple store.
If the DB can be used by other application OR it will be used with large amount of data and high traffic it must enforce some best practices.
Generally there is no way you can disagree with DBA.
But they can understand your situation and might allow you to loose the standards a bit so you could be more productive.
So you need to work closely, together.
And you need to talk to each other to explain and prove the point why database should be like this or that.
Otherwise, the team is broken and project can be failure with hight probability.
ActiveRecord is a very handy tool. But it cannot do everything for you. It does not provide Database structure by default that you expect exactly. So it should be tuned.
On the other side. If DBA can accept that all PKs are Auto incremented integers that would make Developer's life easier (ActiveRecord does it by default).
On the other side, if developers would accept some of DBA constraints it would make DBA's life easier.
Now to answer your question:
how would you argue FOR simplifying the schema
Do not argue. Meet the team and deliver the message and point on WHY it should be done.
Maybe it really shouldn't and you don't know all the things, maybe they are not aware of something.
You could agree on the general structure of the database AND try to describe it using RoR migrations as a meta language.
This way they would see the general picture, and you would use your great ActiveRecords.
And also everybody would be on the same page.
Your DB schema should reflect the domain and its relationships.
De-normalisation should only be done when you have measured that there is a performance problem.
7 joins is not excessive or bad, provided you have good indexes in place.
The general way to make this argument up the chain is based on cost. If you do things simply, there will be less code and fewer bugs. The system will be able to be built more quickly, or with more features, and thus will create more ROI. If you can get the money manager on board with that approach, he or she may let you dictate terms to the team. There is the counterargument that extreme over-normalization prevents bad data, but I have found that this is not the case, as the complexity it engenders tends to lead to more errors and more database code in general.
The architectural and technical argument here is simple. You have decided to use Ruby on Rails. Therefore you have decided to use the ActiveRecord pattern. The ActiveRecord pattern is driven by having the database tables match the object model. That's the pattern in use here, and in many other places, so the best practices they are trying to apply for extreme data normalization simply do not apply. Buy a copy of Patterns of Enterprise Application Architecture and put the little red bookmark at page 160 so they can understand how the pattern works from the architecture perspective.
What the DBA types tend to be unaware of is how much work ActiveRecord does for you, from query generation, cascading deletes, optimistic locking, auto populated columns, versioning (with acts_as_versioned), soft deletes (with acts_as_paranoid), etc. There is a strong argument to use well tested, community supported library functions to perform these operations versus custom code that must be maintained by a DBA.
The real issue with DBAs is then that they need some work to do. Let them focus on monitoring performance, finding slow queries in the code, creating indexes and doing backups.
If you end up losing the political battle for a sane schema, you may want to consider switching to DataMapper. It's the next pattern in PoEAA. The other thing you may be able to get them to do is to create views in the database that correspond to the object model. This way, you could use many of the finding capabilities in the ActiveRecord model based on the views, but have custom insert, update, and delete methods.

server side db programming: why?

Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!).
Server-side logic is often much faster, even with procedural approach.
You can fine-tune your grant options and hide the data you don't want to show
All queries in one places are more convenient than if they were scattered all around the code.
And here's a (very subjective) article in my blog on the reason I prefer stored procedures:
Schema Junk
BTW, triggers (as opposed to functions / stored procedures / packages) I generally dislike.
They are completely other story.
You're keeping the processing in the database, along with the data.
If you process on the server side, then you have to transfer the data out to a server process across the network, process it, and (optionally) send it back. You have the network bandwidth/latency issues, plus memory overheads.
To clarify - if I have 10m rows of data, my two extreme scenarios are to a) pull those 10m rows across the network and process on the server side, or b) process in place in the database using the server and language (SQL) optimised for this purpose. Note that this is a generalisation and not a hard-and-fast rule, but it's the one I follow for most scenarios.
When many heterogeneous applications and various other systems need to access your single database and be sure through their operations data stays consistent without integrity conflicts. So you put your logic into triggers and stored procedures that will offer an interface to external clients.
Maybe not for most web-based systems, but certainly for enterprise databases. Stored procedures and the like allow you much greater control over security and performance, as well as offering a bit of encapsulation for the database itself. You can change the schema all you want as long as the stored procedure interface remains the same.
In (almost) every situation you would keep the processing that is part of the database in the database. Application code cannot substitute for triggers, you won't get very far before you have updated the database and failed to fire the application's equivalent of the triggers (the first time you use the DBMS's management console, for instance).
Let the database do the database work and let the application to the application's work. If you have a specific performance problem with the database, and that performance problem can be addressed by moving processing from the database, in that case you might want to consider doing so.
But worrying about database performance without a database performance problem existing (which is what you seem to be doing here) is both silly and, sadly, apparently a pre-occupation of many Stackoverlow posters.
Least scalable? SQL???
Look up, "federating."
If the database is shared, having logic in the database is better in order to control everything that happens. If it's not it might just make the system overly complicated.
If you have multiple applications that talk to your database, stored procedures and triggers can enforce correctness more pervasively. Accordingly, if correctness is more important than convenience, putting logic in the database is sensible.
Scalability may be a red herring, though. Sometimes it's easier to express the behavior you want in the domain layer of an OO language, but it can be actually more expensive than doing the idiomatic SQL way.
The security mechanism at a previous company was first built in the service layer, then pushed to the db side. The motivation was actually due to some limitations in a data access framework we were using. The solution turned out to be a bit buggy because our security model was complicated, but the upside was that bugs only had to be fixed in the database; we didn't have to worry about different clients following different rules.
Triggers mean 3rd-party apps can modify the database without creating logical inconsistencies.
If you do that, you are tying your business logic to your model. If you code all your business logic in T-SQL, you aren't going to have a lot of fun if later you need to use Oracle or what have you as your database server. Actually, I'm not sure I understand this question exactly. How do you think this would improve scalability? It really shouldn't.
Personally, I'm really not a fan of triggers, particularly in a database dedicated to a single application. I hate trying to track down why some data is inconsistent, to find it's down to a poorly written trigger (and they can be tricky to get exactly correct).
Security is another advantage of using stored procs. You do not have to set the security at the table level if you don't use dynamic code (Including ithe stored proc). This means your users cannot do anything unless they have a proc to to it. This is one way of reducing the possibility of fraud.
Further procs are easier to performance tune than most application code and even better, when one needs to change, that is all you have to put on production, not recomplie the whole application.
Data integrity must be maintained at the database level. That means constraints, defaults values, foreign keys, possibly triggers (if you have very complex rules or ones involving multiple tables). If you do not do this at the database level, you will eventually have integrity issues. Peolpe will write a quick fix for a problem and run the code in the query window and the required rules are missed creating a larger problem. A millino new records will have to be imported through an ETL program that doesn't access the application because going through the application code would take too long running one record at a time.
If you think you are building an application where scalibility will be an issue, you need to hire a database professional and follow his or her suggestions for design based on performance. Databases can scale to terrabytes of data but only if they are originally designed by someone is a specialist in this kind of thing. When you wait until the while application is runnning slower than dirt and you havea new large client coming on board, it is too late. Database design must consider performance from the beginning as it is very hard to redesign when you already have millions of records.
A good way to reduce scalability of your data tier is to interact with it on a procedural basis. (Fetch row..process... update a row, repeat)
This can be done within a stored procedure by use of cursors or within an application (fetch a row, process, update a row) .. The result (poor performance) is the same.
When people say they want to do processing in their application it sometimes implies a procedural interaction.
Sometimes its necessary to treat data procedurally however from my experience developers with limited database experience will tend to design systems in a way that do not leverage the strenght of the platform because they are not comfortable thinking in terms of set based solutions. This can lead to severe performance issues.
For example to add 1 to a count field of all rows in a table the following is all thats needed:
UPDATE table SET cnt = cnt + 1
The procedural treatment of the same is likely to be orders of magnitude slower in execution and developers can easily overlook concurrency issues that make their process inconsistant. For example this kind of code is inconsistant given the avaliable read isolation levels of many RDMBS platforms.
SELECT id,cnt FROM table
...
foreach row
...
UPDATE table SET cnt = row.cnt+1 WHERE id=row.id
...
I think just in terms of abstraction and ease of servicing a running environment utilizing stored procedures can be a useful tool.
Procedure plan cache and reduced number of network round trips in high latency environments can also have significant performance advantages.
It is also true that trying to be too clever or work very complex problems in the RDBMS's half-baked procedural language can easily become a recipe for disaster.
"Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)."
What makes you think that "scalability" is the only relevant concern in a system design ? I agree with rexem where he commented that it is very obvious that you are "not" biased ...
Databases are sets of assertions of fact. Those sets become more valuable if they can also be guaranteed to conform to certain integrity rules. Those guarantees are not worth a dime if it is the applications that are expected to enforce such integrity. Triggers and sprocs are the only way SQL systems have to allow such guarantees to be offered by the DBMS itself.
That aspect outweighs "scalability" anytime, anywhere, anyhow.

Resources