I'm looking for suggestions on how to store things of value in my db - ruby-on-rails

I have a requirement (pending) that allows the user to buy "credits" and then to trade those credits for real goods or services.
This is the first time I have had to do this in an application and I am concerned it would paint a target on my database's back. A hacker could (in theory) change the amount of credits a user has and then "spend" those credits, or convert them to cash.
I'm building the solution in ruby/rails, but I am not limited to this tech. I can even use an outside provider if it's more practical.
Does anyone have any suggestions on how to do this? Would you encrypt your DB? Would that be enough?

There are a lot of different things to consider. When it comes to matters of security, there is never a silver bullet (anyone who suggests that is likely selling snake oil); rather, security often involves many different steps to mitigate and manage risks and also redundant layers of protection so that one is still protected if some subset of those layers fail for whatever reason.
Storing monetary transactions is often subject to legal regulation, so I suggest consulting the rules that apply to you in addition to the other security measures here.
As for encrypting the database, there are many ways to apply encryption: one can encrypt the database as a whole, or encrypt individual rows using different keys. Encrypting the database as a whole won't provide much protection if an attacker has access to the key (which also raises the question of who has the key, where it is stored, and so on).
What you probably want, in addition to the database itself, is an append-only ("write-only") log of transactions with some sort of checksumming, so that you can be assured of its authenticity. That gives you an audit trail that is independent of the database, and from which the database could be reconstructed in the event of a breach.
In addition, you'll want to ensure that only authorized applications (such as the production instance of your frontend server) can talk to and decrypt the database, and that only a very limited number of people you trust can deploy new versions of those applications, so that no one can deploy a malicious version that abuses that access.
If it's possible to encrypt individual rows with different keys (e.g. to encrypt each per-user row with key material derived from the user's login credential), that is highly advisable, though it is not always possible, for example when you need to process a row while the user is not actively interacting with your application.
I'm sure there are other things I have not thought of, which is why you'll also want to conduct regular penetration testing to check for vulnerabilities (and not only fix anything you discover this way, but also use it to inform projects or processes that can prevent similar vulnerabilities in the future).
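To make the checksummed transaction log mentioned above a bit more concrete, here is a minimal sketch in plain Ruby. Everything in it (the `CreditLedger` class, its method names, the entry fields) is made up for illustration; it only assumes the standard `digest` and `json` libraries. Each entry stores the SHA-256 of the previous entry, so editing or dropping an earlier record breaks the chain.

```ruby
require "digest"
require "json"

# Hypothetical append-only ledger (illustrative names, not a real library).
class CreditLedger
  GENESIS = "0" * 64

  attr_reader :entries

  def initialize
    @entries = []
  end

  # Append a credit change (positive = purchase, negative = spend).
  def append(user_id, delta)
    prev = @entries.empty? ? GENESIS : @entries.last[:checksum]
    payload = { user_id: user_id, delta: delta, prev: prev }
    entry = payload.merge(checksum: digest(payload))
    @entries << entry
    entry
  end

  # Recompute every checksum and confirm each entry links to the one before it.
  def valid?
    prev = GENESIS
    @entries.all? do |e|
      payload = { user_id: e[:user_id], delta: e[:delta], prev: e[:prev] }
      ok = e[:prev] == prev && e[:checksum] == digest(payload)
      prev = e[:checksum]
      ok
    end
  end

  # Rebuild a user's balance from the log alone, independent of the main DB.
  def balance(user_id)
    @entries.select { |e| e[:user_id] == user_id }.sum { |e| e[:delta] }
  end

  private

  def digest(payload)
    Digest::SHA256.hexdigest(JSON.generate(payload))
  end
end
```

Note that a plain hash chain only detects edits by someone who cannot rewrite the whole log. In practice you would probably want a keyed checksum (an HMAC whose key lives outside the database) and storage the application can append to but not modify.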
In addition to the security considerations, monetary transactions are one of the few cases where "eventual consistency" doesn't really work; you'll need to be careful in your programming to make transactions appropriately atomic. That is, you wouldn't want the number of credits to decrease at a separate time step from the disbursement of dollars, as that would allow the same credits to be spent twice. The decrease in credits and the increase in dollars (or vice versa) must happen as a single step. For this and other reasons, thorough testing and good code review practices are a good idea.
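As a rough illustration of the all-or-nothing requirement, here is a small in-memory sketch. The `Wallet` class and its fields are hypothetical, and the `Mutex` merely stands in for what would be a database transaction with a row lock in a real system; the point is only that the check, the debit, and the payout either all happen or none of them do.

```ruby
# Hypothetical in-memory stand-in for an account row.
class InsufficientCredits < StandardError; end

class Wallet
  attr_reader :credits, :cents

  def initialize(credits:, cents: 0)
    @credits = credits
    @cents = cents
    @lock = Mutex.new
  end

  # Convert credits to money as one all-or-nothing step.
  # An optional block represents the payout step and may raise.
  def redeem(amount, cents_per_credit:)
    unless amount.is_a?(Integer) && amount.positive?
      raise ArgumentError, "amount must be a positive integer"
    end

    @lock.synchronize do
      if amount > @credits
        raise InsufficientCredits, "need #{amount}, have #{@credits}"
      end

      credits_before, cents_before = @credits, @cents
      begin
        @credits -= amount
        yield if block_given?
        @cents += amount * cents_per_credit
      rescue StandardError
        # Roll back both fields so no partial update is ever left behind.
        @credits, @cents = credits_before, cents_before
        raise
      end
    end
  end
end
```

In a Rails app the same shape would normally be expressed with `ActiveRecord::Base.transaction` plus a row lock (for example `with_lock`), so that the database, not the application process, enforces the atomicity.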


Ensuring consistency with relational database without foreign keys

I have seen examples like Discourse where tables in a relational database don't have foreign keys. The other tenets of an RDBMS, such as constraints, indexes, and full-text search, are still used, but as per the Rails Active Record guidelines, foreign keys are dropped.
Do we need to periodically check for consistency in such applications? And in that case, should the application layer check on each request/response cycle that there is no invalid foreign key, and correct it at the same time?
Ok so the first thing to understand is why we generally put such constraints in the database. The second point will be why some people don't like this. The third will be what the ramifications of not doing so are.
Why we put RI checks in the database
A relational database is basically a big math engine performing set (well, actually bag, due to concessions with real-world data integrity problems) operations on large sets of data. As the sets grow, the ability to verify integrity of the data reduces until at some point one has trouble verifying the entire validity of the data according to the set model one follows. I have worked with PostgreSQL databases where constraints were not possible, so that in some areas we had to accept that there would be referential integrity violations.
The problem of managing referential integrity where one software project owns the database can be formidable, but it can become far worse when many different programs read or write the same data. This gets worse because normalization and encapsulation concerns increase with the number of pathways for reading (and worse, writing) the data.
Ensuring that one can make sure that referential integrity is not violated on each write is thus an important tool in data management.
Why some people avoid RI checks in the database
Referential integrity constraints, however, have two important drawbacks that sometimes cause developers to decide against them.
First, referential integrity checks are not free. They do impact database performance, and the database is often understood to be the least scalable part of a system.
Second, they divide logic, placing it in different locations and segregating data model logic from application logic. While this separation of concerns is usually desirable, where a single application owns a database it is sometimes (but not always!) considered a less desirable tradeoff.
It is worth noting further that the Rails guidelines don't offer solid guidance on this tradeoff. Like many ORMs, Active Record offers tools for addressing this in the application, but I found plenty of examples of people using foreign keys in the database, and nobody saying "don't use them."
Concerns from avoiding RI checks in the database
The concerns and further mitigating measures of course depend on the importance and further use of the data. A lower-impact data set which is just the private data store of an application (the normal Rails way) doesn't have the same implications as a higher-impact data store that is intended to be used for decision support later. So repeated read-use is an important question in deciding whether you need to periodically re-scan.
The second concern is alternate sources of writes. In general, the most important thing in this model is to prevent writes from any source other than these specific ActiveRecord-using classes.
So in answer to your question, you may or may not need to. But you should probably do a risk assessment and decide what to do. Such a risk assessment will guide this decision not only at the moment but also in the future.
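If the risk assessment does come out in favour of a periodic re-scan, the check itself is simple. Here is a minimal sketch over plain arrays of hashes; `find_orphans` and the table and column names in the usage note are made up for illustration. Against a real database you would more likely run the equivalent anti-join (child LEFT JOIN parent, keeping rows where the parent is NULL) from a scheduled job rather than on every request.

```ruby
require "set"

# Hypothetical helper: returns child rows whose foreign key points at a
# parent row that does not exist. Rows are plain hashes here.
def find_orphans(children, parents, foreign_key:, primary_key: :id)
  parent_ids = parents.map { |row| row[primary_key] }.to_set
  children.reject do |row|
    fk = row[foreign_key]
    # A NULL reference is treated as allowed, as a nullable FK would be.
    fk.nil? || parent_ids.include?(fk)
  end
end
```

For example, with `users = [{ id: 1 }, { id: 2 }]` and a `posts` array whose rows carry a `user_id`, calling `find_orphans(posts, users, foreign_key: :user_id)` returns only the posts whose `user_id` matches no user.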
As a side note
You can use foreign keys to insist on consistency while using the hooks and so forth to ensure that the logic is properly handled in the ActiveRecord component. I.e. instead of using ON DELETE CASCADE have that handled by a hook.

Cache solution for a news feed, based on objective information?

I need some suggestions of what works well for caching an updatable news feed.
Please, no "fanboy" answers. I'm not looking for subjective opinions on what the "best" system is, just suggestions of technologies that will fit the requirements below. So please share what you have used in the real world, even if you prefer some other solution.
I have a rails based news feed (Neo4j database), and while performance is good, I would like to cache it so that servers don't get bogged down serving live feeds.
EASY FRAGMENT UPDATES: I'd like to easily update parts of a user's news feed in the cache based upon specific triggers. For example, when a user edits their status update, I don't want to regenerate the user's entire news feed in the cache; I just want to update that one "fragment", or section if you will, of that particular user's feed. And I don't want to jump through hoops to do so.
DELETION: If someone deletes an activity, I just want to remove that activity from their news feed before the system eventually refreshes the entire feed for that user.
EASY RETRIEVAL: I'd like to retrieve the cache in such a way that the Rails controllers/models can easily read the entries and hand them off to views without any modification of the views.
PERSISTENCE: If I need to reboot the cache, it should load the cache back up from disk, which means it should save cached entries to disk.
SPEED: Given that it must be able to update fragments of cached news feeds, there is going to be a performance hit of some sort. But I need speed.
What cache technologies provide such capabilities? Will Redis, MongoDB, or Memcached fit these requirements? What other options are there (CouchDB, Tokyo Cabinet, etc.)?
In the spirit of Stack Overflow, I'm not asking for subjective opinions on what you like better and why. I'm just asking for possible candidate systems that you may have actually used in production to accomplish caching and updating a cached news feed (or anything similar).
Since it is mainly an opinion-based topic, this answer will be subjective. But I will try anyway to remain factual.
The first point to notice is that your requirements tend to be mutually exclusive. As we say in France, you want the butter, the money for the butter, and the wife of the farmer (ok, this is probably a lousy translation).
For example, to support easy fragment updates and proper deletion, you will need some kind of data structures in the cache. I have zero knowledge about Rails, but I guess it will have impact on the data access patterns, and the definitions of controllers/models. In other words, it will add complexity to data retrieval. You need speed, but at the same time, you also require persistency, and also non-trivial data access patterns. Well, you cannot get everything at the same time, you will have to make choices, and prioritize these requirements.
My second point is that a cache is only useful when there is a significant difference in terms of performance between the cache and the underlying storage engine. Since you already use a NoSQL engine which is rather efficient (Neo4j), you need to consider only engines which are truly designed for raw performance (i.e. low-latency stores): memcached, redis, couchbase, aerospike, to name well-established open-source products. If you feel a bit more adventurous, you can also consider other projects like tarantool or hyperdex.
There are a number of commercial products as well, but I'm not sure they provide a Ruby client (TIBCO ActiveSpaces, Gigaspaces, Red-Hat Infinispan, etc ...)
Other NoSQL engines (MongoDB, Cassandra, CouchDB, etc ...) have other interesting properties, but they will not beat these solutions at raw performance for a mixed r/w workload. Here, I'm only talking about raw performance (i.e. low latency at high throughput), not scalability.
Actually, memcached can be excluded because it does not support persistency. I would say you can probably implement what you want with Redis, Couchbase or Aerospike, but Aerospike 3 does not seem to have yet an officially supported Ruby client.
Supporting multiple data access paths (i.e. a consistent indexing data structure) will be easier with Redis and Aerospike than with Couchbase. High availability will be easier with Couchbase or Aerospike than with Redis. Implementing cache behavior will be easier with Redis and Couchbase than with Aerospike.
Some general advice:
make sure you really have a performance or a scalability issue with Neo4j before adding the complexity of an extra layer. Complexity is like toothpaste: once it is out of the tube, you cannot put it back.
data access patterns should be listed at design time, and must be backed by corresponding data structures in the chosen engine.
the hardware footprint must be considered as well. If you have only a couple of boxes, pick a lightweight solution like Redis.
with persistency, you need to consider also HA. What happens if the caching layer is lost? Actually, I would say that for a cache, HA may be more important than persistency.
Finally, you also need to define the exact cache semantics you want (update behavior, invalidation behavior, cache miss management, TTL policy if any, etc.). The 3 NoSQL engines I have listed provide some tools to help implement the various strategies, but none of them supports an off-the-shelf strategy. You will need to write some code to implement it.
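To illustrate what "some kind of data structure in the cache" might look like for the fragment update and deletion requirements, here is a purely in-memory sketch. The `FeedCache` class and its methods are invented for this example and do not talk to any real cache server; the assumption is that with Redis, say, the same shape could be built from a hash of fragment bodies plus a sorted set for ordering.

```ruby
# Hypothetical in-memory model of a per-user feed cache that stores each
# activity as its own fragment, so one fragment can change without
# regenerating the whole feed.
class FeedCache
  def initialize
    # user_id => { activity_id => { score:, body: } }
    @feeds = Hash.new { |hash, user_id| hash[user_id] = {} }
  end

  # Insert or overwrite one fragment; score decides ordering (e.g. a timestamp).
  def put(user_id, activity_id, body, score:)
    @feeds[user_id][activity_id] = { score: score, body: body }
  end

  # Replace the body of an existing fragment, keeping its position.
  # Returns false if the fragment is not cached.
  def update(user_id, activity_id, body)
    fragment = @feeds[user_id][activity_id]
    return false unless fragment

    fragment[:body] = body
    true
  end

  # Remove a single activity from the cached feed.
  def delete(user_id, activity_id)
    !@feeds[user_id].delete(activity_id).nil?
  end

  # Highest score first, returned as plain bodies ready to hand to a view.
  def feed(user_id, limit: 20)
    @feeds[user_id].values
                   .sort_by { |fragment| -fragment[:score] }
                   .first(limit)
                   .map { |fragment| fragment[:body] }
  end
end
```

Persistence, TTLs, and cache-miss handling are deliberately left out; those are exactly the semantics you would still have to decide on and code yourself.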

Is there a best practice and recommended alternative to Session variables in MVC

Okay, so first off before anyone attempts to make a determination that this is a "duplicate" question; I have reviewed most of the posts on SO regarding similar questions but even in combination of all that has been said I still am somewhat at a dilemma as to the definitive or maybe I should say unanimous agreement on this.
I can however say that I have (based on the posts) conclusively determined that the answer is based on the scope of the requirement. But even with consideration of this, the opinions seem too diverse for me to make a decision as to how I should handle this.
My immediate requirement is that I need to persist variable data from 1 controller across many views. More specifically, I have a controller and corresponding view that handles shopping cart item counts and I would like to persist that data across multiple views. I am thinking that the _layout view is the most logical choice for this.
Now I have successfully accomplished this task by assigning the value to a Session variable which is retrieved from my _layout view; so even when the user were to navigate any where within the site the number of items in the Shopping Cart will persist until either they leave the site or complete the checkout; in which case the variable will be cleared in code.
The posts I've read seemed biased to either staying away from Session variables in favor of Cookies and storing data in a database; or stating that for the intent purpose for which I propose to use them, Session variables are perfectly fine to use.
The other thing I've read suggests that Session variables can potentially impede overall performance if there is high traffic on the site since the information is stored on the server.
I personally cannot justify storing this type of information in a database and subsequently hitting the database as I'd imagine that this could also affect site performance and seems a bit overkill for storage of temporary data. TempData, ViewData and ViewBag do not work in persisting the data so they are not logical choices for the requirement IMO.
If there is another well suited alternative to the Session variable (which is working for me) I would like to know what it is.
2 posts that seem contradictory in effort of providing best recommendations leave me a bit confused.
Cons: Is it a good practice to avoid using Session State in ASP.NET MVC? If yes, why and how?
Pros: Still ok to use Session variables in ASP.NET mvc, or is there a better alternative for some things (like a cart)
Seems that this question (although presented in many different variations) has no definitive answer that I can conclude.
If there is a preferable way to accomplish this without overkill, then that is the answer I'm in search of.
I read somewhere about using MVC filters in tandem with the Global.asax application start section as well, but this does not seem appropriate for variables set at the controller level as much as perhaps, static variables.
Can someone maybe squash (for lack of a better word) the many diverse opinions on the topic and maybe provide a more definitive answer to the question? I'm sure the diverse opinions have their place and I'm not attempting to discredit them. But having a definitive and possibly unanimous answer would be better; then I could sort through the other posts to determine what is best for my application.
Of course, if this question has no definitive answer; just tell me that and I'll attempt to derive my own answer from the other posts.
Caching and Cookies seem to be the general preference in the responses; however, I've also noted the statement that caching is not an ideal candidate for use across multiple web servers because synchronization can be an issue.
Giving credit to Tim, it's stated that Database storage is optimized and users have the option to return at a later time and continue where they left off.
That is an excellent point, but thinking ahead about probabilities, it's reasonable to assume that some users will not return, leaving unnecessary data in the database.
So keeping the DB optimized and clean (which "to me" is of equal relevance) would require implementing a maintenance task to automatically expire those records based on a set threshold of time to account for those circumstances. Although a maintenance task is not an unquestionable option, I still think this adds just a bit more work to the task simply for the intent purpose of serving as temporary storage.
Nonetheless, I do respect Tim's recommendation and believe it deserves merit on countering my initial opinion to a degree; that a database would not seem to be a viable option for storing Temporary data; so I think the compromise would be to store the data in a database (given the scenario of a Shopping Cart or similar) perhaps after a checkout. This way as you previously stated, the data may be persistently tracked upon subsequent visits so you have a record of transactions. But more importantly, it would be data of those transactions having real relevance to persist to the database.
It was also stated that although Session is faster than the database, it has its caveats, some of which can be mitigated by other mechanisms; leveraging the SessionStateBehavior attribute is just one example.
BUT... I think Erik kind of drove the point home with the Dunning-Kruger Effect. Although, from the content and explanations for proposed answers given here; I seriously doubt the expertise of any of the individuals who have responded is any way questionable. Nonetheless, I tend to agree on the fact of getting a unanimous opinion may be somewhat of a higher than reasonable expectation on my part.
What I was more specifically looking for was a general consensus on a technique that would comfortably accommodate a diverse number of scenarios. In other words, something that would accommodate not only my particular scenario but also scale to larger environments with potentially heavier traffic. This way a change in the programming would be either avoided altogether or minimal at best.
Summary based on the feedback:
Session variables seem to accommodate smaller scenarios where applicable, but they have some potential for persistence concerns, among other notable drawbacks, as stated very thoroughly by Erik. So this option obviously will not fit a scalable model.
Caching is preferable to Session variables but, again, not necessarily the "best" scalable option due to, among other things, the potential synchronization complexities in web server farm environments as previously pointed out. But it is an option nonetheless.
Database storage is scalable, but for the purpose of temporary volatile storage it is probably not the most elegant option from a database perspective, as it would require periodic cleanup. Personally, having built a strong foundation in database concepts earlier in my career, I suspect many developers will not agree with me here: using the database for this purpose may suffice from a web programmer's perspective, but from the perspective of DAL and DB development it (to me) has the potential to mandate an additional DB task to keep the backend efficient.
Cookies seem to be a nice option having the combined "desirable" elements of Session variables and caching.
Based on the answers; I think COOKIES and CACHING seem to be generally well rounded proposals for best practice across the board in combination with database storage when continued persistence is required after the fact; as potentially good candidates for scalability of the ones presented.
The ultimate choice between the 2 would seem to be based on the amount and type of data requiring storage (e.g. sensitive vs non-sensitive and whether or not there is any concern that the client may alter the data on their end); in addition to special considerations for COOKIES in the fact that they may be disabled by the clients.
Obviously, there is no one size fits all solution as clearly pointed out and concluded from the answers provided but in terms of scalability; I may be wrong but these seem to be the BEST choices available.
Because all the responses are good, it's only fair that I credit all the posts as useful, and I'm going to accept Erik's answer as a well-rounded, overall scalable solution. I wish I could select more than one accepted answer, as I believe Tim's response was also very well laid out and concise.
Gupta's response was good also, but I wanted more elaboration of the proposed answer and not a repeat of previous posts.
Thanks Guys!
You will never get a unanimous opinion on anything in any large group of people. That's just human nature. Part of that stems from the Dunning-Kruger Effect, which states that the less someone knows about a subject, the more likely they are to overvalue their expertise in that subject. In other words, lots of people think they know something, but only because they don't know they don't know it. Part of it is simply that people have different experiences: some have found no problems with session, while others have in various situations, or vice versa.
So, to back up your research, which suggests that the answer depends heavily on the requirements, we need to understand what your requirements are. If this is to be a high-traffic site with load-balanced servers in a web farm, then stay as far away from session as you can. Sure, it's possible to share session in various ways in a server farm environment (session server, distributed cache server, etc.), but avoiding session will almost always be faster if you can help it.
If your site runs on a single server and is unlikely to ever grow beyond that, and your traffic is relatively low, then session may be a useful option. However, you should always be aware that session is unreliable storage and can disappear on you at any time. If the app pool is recycled, session is gone. If an uncaught exception bubbles up to the worker process, the session may be gone. If IIS thinks there's not enough memory, your session may be gone, regardless of any timeout values configured. You also can't always get reliable notification that a session has ended, since terminated sessions do not fire the Session_End event.
Another issue is that Session is serialized. In other words, IIS prevents more than one thread from writing to the session at a time, and it often does this by locking the session while a thread is running if it has not opted out of writable session locking. This can cause severe problems in some cases, and merely poor performance in others. You can mitigate this by marking various methods with a read-only session attribute if you aren't going to be modifying it in that method.
Ultimately, if you do choose to use session, then try to only use it for small, short lived things if at all possible, and if not possible then build in a way to "regenerate" the data if the session is lost. For instance, using your number of items in cart example, you could write a method that first checks to see if the value is there, and if not it goes out and loads it from the database. Always use this method to access the variable, rather than accessing it directly from session... this way, if the session is lost it will just reload it.
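The "check the session first, otherwise reload" accessor described above is language-agnostic, so here is a minimal sketch of the idea in Ruby rather than C#. The `CartCounter` class, the `:cart_item_count` key, and the loader are all hypothetical; `session` is just a plain Hash standing in for whatever session object your framework provides.

```ruby
# Hypothetical wrapper that always goes through one accessor, so a lost
# session simply triggers a reload from durable storage.
class CartCounter
  SESSION_KEY = :cart_item_count

  # session: any Hash-like store; loader: callable returning the real count.
  def initialize(session, loader)
    @session = session
    @loader = loader
  end

  # Return the cached count, reloading and re-caching it if it is missing.
  def count
    @session.fetch(SESSION_KEY) { @session[SESSION_KEY] = @loader.call }
  end

  # Call after the cart changes so the next read reloads the count.
  def invalidate
    @session.delete(SESSION_KEY)
  end
end
```

The rest of the code only ever calls `count`, never reads the session key directly, which is what makes losing the session harmless.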
However, having said this... For the number of items in a cart, I would generally prefer to use a cookie for this information, since cookies get passed to the page on every load anyways, and this is a small discrete unit of data. Generally prefer Session for sensitive data that you want to prevent the user from being able to change.. number of items in the cart simply doesn't fit that rule.
Databases are highly optimized. A simple value like a shopping cart count is a good candidate for caching by the database and (hopefully) cheap to compute outright. It may be a non-issue.
However, if you have ruled out other mechanisms, small, user-by-user values are viable candidates for session.
Cache is fine for site-wide values, or user-specific values with unique keys. However, synchronizing caches across multiple web servers can be difficult. Out of process session state will stay synchronized because it is stored in a single location (database or a state server).
Of course, there are many 3rd party caching alternatives with various options to keep them synchronized.
Regardless of where the count is temporarily stored, I'm of the opinion that shopping carts themselves should be stored in the database so that users have the option to return later and continue where they left off.
If you use out of process session state (e.g. in a load balanced environment and/or to make session more durable), it will hit a database or call an out of process service, but the call is relatively cheap unless you are serializing large object graphs.
Session is loaded once per request. Subsequent read access is very fast.
Writing to session can be detrimental to performance, even when there is no load. Why? Most modern applications use asynchronous calls, and when multiple async calls hit an HTTP handler (page, controller, etc.) that reads/writes session, ASP.NET will lock the session to serialize access. To avoid this, you can decorate your controllers with [SessionState(SessionStateBehavior.ReadOnly)].
Now I have successfully accomplished this task by assigning the value
to a Session variable which is retrieved from my _layout view;
This seems like mixing concerns, i.e. having the view aware of the underlying storage mechanism. From a purist standpoint, I would set this value on a view model or at least put it in the ViewBag. From a practical standpoint, one or two values retrieved in this manner probably won't hurt anything, but beware of letting it grow much further.
I read somewhere the use of MVC filters in tandem with the Global.asax application start section as well, but this does not seem appropriate for variables set at the controller level as much as perhaps, static variables.
Static variables have perfectly legitimate uses, but you must understand them thoroughly or risk serious problems.
See my answers pertaining to static variables in ASP.Net:
does aspx provide special treatment for c# static variables
Static fields vs Session variables
Session alternatives from a different perspective:
Keeping something in session breaks the primary rule of ASP.NET MVC. You can use the following options as alternatives to session.
If your ASP.NET (MVC) session has to box and unbox objects, that adds some load on the server. Try these ideas:
Caching: a List or other large data fits better in a cache than in session. You control when it expires, rather than tying it to the user's session.
If your app depends on JSON/Ajax data, you can use client-side storage provided by HTML5 (like WebSQL or IndexedDB). It does not use cookies, so you save some workload on the server.

Data distribution for a system with SOA

I have a rails application which manages different types of items and users who own them. Items of different types might have different features. There is a number of sinatra services which have to access items (read-only, every service one specific item type).
Is it a good idea to create separate tables / databases for every service and to keep them in sync with the rails DB? In this case the main DB will hold all items. It's postgres, so hstore could be used for different features. On all updates a sync message will be sent using Redis pub/sub or RabbitMQ messaging. Services will subscribe and update service specific tables.
The system should be really reliable, scalable, and prepared for high-load and new not yet known item categories. What do you think? Does it make sense or are there better approaches for these requirements? Thank you in advance, I really appreciate your help!
There is no one-size-fits-all answer here. The answer depends on your requirements and these will decide which of two approaches you might take.
The first approach is conceptually the simplest, which is to have every service hit the same database. The advantage here is that you can scale up relatively easily, the system is simple and flexible, and you can do a lot with the database to keep things working well. The disadvantage is that db downtime will take down all services at once.
The second approach is to keep every service (or group of closely related services) as a separate, self-contained service, kept in sync with some sort of message passing. This has the advantage of being more robust in terms of delivering basic services, but it is far less robust in terms of everything staying in sync (because the CAP theorem's consistency requirement is sacrificed for availability, and your data is effectively partitioned).
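To give a feel for the second approach, here is a toy sketch with an in-process bus standing in for Redis pub/sub or RabbitMQ. `Bus`, `ItemReplica`, the topic naming, and the `version` field on messages are all assumptions made for this example, not part of the question. The version check is one simple way to keep a replica from going backwards when messages arrive twice or out of order.

```ruby
# Hypothetical in-process stand-in for a message broker.
class Bus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, message)
    @subscribers[topic].each { |handler| handler.call(message) }
  end
end

# Hypothetical read-only copy of one item type, as a service might keep it.
class ItemReplica
  def initialize(bus, item_type)
    @rows = {}
    bus.subscribe("items.#{item_type}") { |message| apply(message) }
  end

  # Upsert the row unless we already hold the same or a newer version.
  def apply(message)
    current = @rows[message[:id]]
    return false if current && current[:version] >= message[:version]

    @rows[message[:id]] = { version: message[:version], attrs: message[:attrs] }
    true
  end

  def find(id)
    @rows[id]
  end
end
```

Even with something like this in place, a lost message leaves the replica stale until the next update, which is the consistency cost described above; a periodic full resync from the main database is one common way to bound that.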
I don't know which one you will want to use. To the extent possible I would usually choose the single db approach but I am a Postgres guy, not a Rails guy. The second approach also works quite well in some cases but it does have a complexity cost.

server side db programming: why?

Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!).
Server-side logic is often much faster, even with a procedural approach.
You can fine-tune your grant options and hide the data you don't want to show.
Having all queries in one place is more convenient than having them scattered all around the code.
And here's a (very subjective) article in my blog on the reason I prefer stored procedures:
Schema Junk
BTW, triggers (as opposed to functions / stored procedures / packages) I generally dislike.
They are a completely different story.
You're keeping the processing in the database, along with the data.
If you process on the server side, then you have to transfer the data out to a server process across the network, process it, and (optionally) send it back. You have the network bandwidth/latency issues, plus memory overheads.
To clarify - if I have 10m rows of data, my two extreme scenarios are to a) pull those 10m rows across the network and process on the server side, or b) process in place in the database using the server and language (SQL) optimised for this purpose. Note that this is a generalisation and not a hard-and-fast rule, but it's the one I follow for most scenarios.
When many heterogeneous applications and various other systems need to access your single database and be sure through their operations data stays consistent without integrity conflicts. So you put your logic into triggers and stored procedures that will offer an interface to external clients.
Maybe not for most web-based systems, but certainly for enterprise databases. Stored procedures and the like allow you much greater control over security and performance, as well as offering a bit of encapsulation for the database itself. You can change the schema all you want as long as the stored procedure interface remains the same.
In (almost) every situation you would keep the processing that is part of the database in the database. Application code cannot substitute for triggers: you won't get very far before you have updated the database and failed to fire the application's equivalent of the triggers (the first time you use the DBMS's management console, for instance).
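As a rough sketch of that point, here is a trigger in SQLite (driven from Python's built-in sqlite3 module) that fires for a statement the application never sees. The `accounts` and `credit_audit` tables and the trigger name are hypothetical, and trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL);
CREATE TABLE credit_audit (
    account_id  INTEGER NOT NULL,
    old_credits INTEGER NOT NULL,
    new_credits INTEGER NOT NULL
);
-- Record every change to credits, whichever client makes it.
CREATE TRIGGER accounts_credits_audit
AFTER UPDATE OF credits ON accounts
BEGIN
    INSERT INTO credit_audit (account_id, old_credits, new_credits)
    VALUES (OLD.id, OLD.credits, NEW.credits);
END;
INSERT INTO accounts (id, credits) VALUES (1, 100);
""")

# An ad hoc statement, such as one typed into a management console,
# bypasses any application-level hooks but still fires the trigger.
conn.execute("UPDATE accounts SET credits = 500 WHERE id = 1")
audit_rows = conn.execute(
    "SELECT account_id, old_credits, new_credits FROM credit_audit"
).fetchall()
```

An equivalent callback in application code would only have run if the update had gone through the application.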
Let the database do the database work and let the application do the application's work. If you have a specific performance problem with the database, and that performance problem can be addressed by moving processing from the database, in that case you might want to consider doing so.
But worrying about database performance without a database performance problem existing (which is what you seem to be doing here) is both silly and, sadly, apparently a preoccupation of many Stack Overflow posters.
Least scalable? SQL???
Look up, "federating."
If the database is shared, having logic in the database is better in order to control everything that happens. If it's not, it might just make the system overly complicated.
If you have multiple applications that talk to your database, stored procedures and triggers can enforce correctness more pervasively. Accordingly, if correctness is more important than convenience, putting logic in the database is sensible.
Scalability may be a red herring, though. Sometimes it's easier to express the behavior you want in the domain layer of an OO language, but it can actually be more expensive than doing it the idiomatic SQL way.
The security mechanism at a previous company was first built in the service layer, then pushed to the db side. The motivation was actually due to some limitations in a data access framework we were using. The solution turned out to be a bit buggy because our security model was complicated, but the upside was that bugs only had to be fixed in the database; we didn't have to worry about different clients following different rules.
Triggers mean 3rd-party apps can modify the database without creating logical inconsistencies.
If you do that, you are tying your business logic to your model. If you code all your business logic in T-SQL, you aren't going to have a lot of fun if later you need to use Oracle or what have you as your database server. Actually, I'm not sure I understand this question exactly. How do you think this would improve scalability? It really shouldn't.
Personally, I'm really not a fan of triggers, particularly in a database dedicated to a single application. I hate trying to track down why some data is inconsistent, to find it's down to a poorly written trigger (and they can be tricky to get exactly correct).
Security is another advantage of using stored procs. You do not have to set the security at the table level if you don't use dynamic code (including in the stored proc). This means your users cannot do anything unless they have a proc to do it. This is one way of reducing the possibility of fraud.
Further, procs are easier to performance tune than most application code and, even better, when one needs to change, that is all you have to put on production; you do not have to recompile the whole application.
Data integrity must be maintained at the database level. That means constraints, default values, foreign keys, and possibly triggers (if you have very complex rules or ones involving multiple tables). If you do not do this at the database level, you will eventually have integrity issues. People will write a quick fix for a problem and run the code in the query window, and the required rules are missed, creating a larger problem. A million new records will have to be imported through an ETL program that doesn't access the application, because going through the application code would take too long running one record at a time.
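A minimal sketch of such database-level rules, using SQLite through Python's built-in sqlite3 module. The `users` and `accounts` tables and the `rejected` helper are hypothetical, and the exact constraint syntax and enforcement defaults vary between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    credits INTEGER NOT NULL DEFAULT 0 CHECK (credits >= 0)
);
INSERT INTO users (id, name) VALUES (1, 'alice');
INSERT INTO accounts (id, user_id) VALUES (1, 1);
""")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# The default fills in credits; the other two statements break a rule.
default_credits = conn.execute(
    "SELECT credits FROM accounts WHERE id = 1"
).fetchone()[0]
negative_rejected = rejected("UPDATE accounts SET credits = -5 WHERE id = 1")
orphan_rejected = rejected("INSERT INTO accounts (id, user_id) VALUES (2, 999)")
```

The rules hold for a quick fix run in a query window or a bulk import just as they do for the application, because the database itself is the one refusing the bad rows.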
If you think you are building an application where scalability will be an issue, you need to hire a database professional and follow his or her suggestions for design based on performance. Databases can scale to terabytes of data, but only if they are originally designed by someone who is a specialist in this kind of thing. When you wait until the whole application is running slower than dirt and you have a new large client coming on board, it is too late. Database design must consider performance from the beginning, as it is very hard to redesign when you already have millions of records.
A good way to reduce scalability of your data tier is to interact with it on a procedural basis. (Fetch row..process... update a row, repeat)
This can be done within a stored procedure by use of cursors, or within an application (fetch a row, process, update a row). The result (poor performance) is the same.
When people say they want to do processing in their application it sometimes implies a procedural interaction.
Sometimes it's necessary to treat data procedurally; however, in my experience developers with limited database experience tend to design systems in a way that does not leverage the strength of the platform, because they are not comfortable thinking in terms of set-based solutions. This can lead to severe performance issues.
For example, to add 1 to a count field of all rows in a table, the following is all that's needed:
UPDATE table SET cnt = cnt + 1
The procedural treatment of the same is likely to be orders of magnitude slower in execution, and developers can easily overlook concurrency issues that make their process inconsistent. For example, this kind of code is inconsistent given the available read isolation levels of many RDBMS platforms.
SELECT id,cnt FROM table
foreach row
UPDATE table SET cnt = row.cnt+1 WHERE id=row.id
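The inconsistency can be reproduced in a small sketch using Python's built-in sqlite3 module. The `counters` table and the `fresh_table` helper are hypothetical, and the two "clients" are simulated by interleaving statements on a single connection, which is an assumption made to keep the example deterministic; real clients would be separate connections racing each other.

```python
import sqlite3

def fresh_table():
    """Create an in-memory table holding one counter row set to 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, cnt INTEGER NOT NULL)")
    conn.execute("INSERT INTO counters (id, cnt) VALUES (1, 0)")
    return conn

# Read-then-write: both clients read cnt before either one writes.
conn = fresh_table()
read_by_a = conn.execute("SELECT cnt FROM counters WHERE id = 1").fetchone()[0]
read_by_b = conn.execute("SELECT cnt FROM counters WHERE id = 1").fetchone()[0]
conn.execute("UPDATE counters SET cnt = ? WHERE id = 1", (read_by_a + 1,))
conn.execute("UPDATE counters SET cnt = ? WHERE id = 1", (read_by_b + 1,))
procedural_result = conn.execute(
    "SELECT cnt FROM counters WHERE id = 1"
).fetchone()[0]

# Set-based: each client asks the database to increment in place.
conn = fresh_table()
conn.execute("UPDATE counters SET cnt = cnt + 1 WHERE id = 1")
conn.execute("UPDATE counters SET cnt = cnt + 1 WHERE id = 1")
set_based_result = conn.execute(
    "SELECT cnt FROM counters WHERE id = 1"
).fetchone()[0]
```

With the read-then-write pattern the second write overwrites the first, so two increments leave the counter at 1; the set-based statements leave it at 2.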
I think that, just in terms of abstraction and ease of servicing a running environment, utilizing stored procedures can be a useful tool.
The procedure plan cache and a reduced number of network round trips in high-latency environments can also bring significant performance advantages.
It is also true that trying to be too clever or work very complex problems in the RDBMS's half-baked procedural language can easily become a recipe for disaster.
"Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)."
What makes you think that "scalability" is the only relevant concern in a system design? I agree with rexem where he commented that it is very obvious that you are "not" biased ...
Databases are sets of assertions of fact. Those sets become more valuable if they can also be guaranteed to conform to certain integrity rules. Those guarantees are not worth a dime if it is the applications that are expected to enforce such integrity. Triggers and sprocs are the only way SQL systems have to allow such guarantees to be offered by the DBMS itself.
That aspect outweighs "scalability" anytime, anywhere, anyhow.
