What's the purpose of stored procedures compared to the use of an ORM (nHibernate, EF, etc) to handle some CRUD operations? To call the stored procedure, we're just passing a few parameters and with an ORM we send the entire SQL query, but is it just a matter of performance and security or are there more advantages?
I'm asking this because I've never used stored procedures (I just write all SQL statements with an ORM and execute them), and a customer told me that I'll have to work with stored procedures in my next project, I'm trying to figure out when to use them.
Stored Procedures are often written in a dialect of SQL (T-SQL for SQL Server, PL-SQL Oracle, and so on). That's because they add extra capabilities to SQL to make it more powerful.
On the other hand, you have a ORM, let say NH that generates SQL.
the SQL statements generated by the ORM doesn't have the same speed or power of writing T-SQL Stored Procedures.
Here is where the dilemma enters: Do I need super fast application tied to a SQL Database vendor, hard to maintain or Do I need to be flexible because I need to target to multiple databases and I prefer cutting development time by writing HQL queries than SQL ones?
Stored Procedure are faster than SQL statements because they are pre-compiled in the Database Engine, with execution plans cached. You can't do that in NH, but you have other alternatives, like using Cache Level 1 or 2.
Also, try to do bulk operations with NH. Stored Procedures works very well in those cases. You need to consider that SP talks to the database in a deeper level.
The choice may not be that obvious because all depends of the scenario you are working on.
The main (and I'm tempted to say "only") reason you should use stored procedures is if you really need the performance.
It might seem tempting to just create "functions" in the database that do complex stuff quickly. But it can quickly spiral out of control.
I've worked with applications that encapsulate so much business logic in SQL, that it becomes virtually impossible to refactor anything. Literally hundreds of stored procedures that are black boxes for devs working with the ORM.
Such applications become brittle, hard to debug and hard to understand. By allowing business logic to live in stored procedures you are allowing SQL developers to make design choices that they shouldn't be making, in a tool that is much harder to work in, log and debug than an ORM. I've seen stored procedures that handle payment processing. Truly core stuff. Stuff that becomes so central to an application that nobody dares to touch it, all because some guy with good SQL skills made a script 5 years to fix something quickly, it was never migrated to the ORM and eventually grew into an unmanageable monster, full of tangled logic nobody understands. Devs end up having to blindly trust whatever it's doing. And what's worse, it's almost always outside test coverage, so you may break everything when you deploy, even if your tests pass with mocked data, but some ancient stored procedure suddenly starts acting up.
Abusing stored procedures is one of the worst forms of technical debt you can accumulate. The database, which is the persistence layer, should not be used for business logic. You should keep that distinction as strict as you can.
Of course, there will be cases where an ORM will have horrible performance or simply won't support a feature you need from SQL. If doing things in raw SQL is truly inevitable, only then should you consider stored procedures.
I've seen Stored Procedure Hell. You don't want that.
There are significant performance advantages to stored procedures in some circumstances. Often the queries generated by Linq and other ORMs can be inefficient, but still good enough for your purposes. Some RBDMS (such as SQL Server) will cache the execution plans for stored procedures, saving on query time. For more complex queries that you use frequently, this savings in performance can be critical.
For most normal CRUD, though, I have found that it is usually better for maintainability just to use the ORM if it is available and if its operations serve your needs. Entity Framework works quite well for me in the .NET world most of the time (in combination with Linq), and I like Propel a lot for PHP.
I stumbled over this pretty old question but I am shocked that the most important benefit of Stored Procedures is not even mentioned.
Security and resource protection
Using SPs you are able to grand execution rights for that SP to a user. The user can execute the SP and only that SP. You do not even have to give the user read or write access to the tables used. The user does not even have to know the tables used.
Using ORM you have to give read or/and write access to the tables used and users. The user can read all data from all the tables you granted the rights and even can combine them in queries, if you want it or not, and also can run queries that creates heavy load on the Database server.
This is especially useful in cases where application development and database development is done by different teams and the database is used by more than one application.
The primary use I find for them is to implement an abstraction layer and encapsulate query logic. In the same way that I write functions in a procedural language.
As le dorfier mentions one of the the primary reasons sprocs (and/or views) should be used is to provide an abstraction layer between a database and its clients (web apps, reports, ETLs etc)
This 'DB API' can make it easier to change/refactor your database without necessarily affecting clients.
See - Why use stored procs - for a more in depth discussion
I mainly stick to linq to sql as an ORM and i think its great, but there is still a place for stored procedures. Mainly i use the when the query i want to run is very complex, with many joins (especially outer joins, which suck in Linq), lots of aggregation in subqueries perhaps, recursive CTE's, and other similar scenarios.
For general crud though, there is no need.
Related
hi all
We do know that CommandType property of a SqlCommand object has 3 options: TableDirect, Text and StoredProcedure or "SP".
Knowing that "SP" has benefits over two other options, my question is do you make lots of SP in your own systems?
Or What solution do you have instead of creating SP?
Thank you
Aside of creating Stored Procedures you can use Object Relational Mapping
Such as:
linq to sql
Nhibernate
Entity Framework
Data Access :SP's vs ORMs
Choose the best way that suits you.
In all production system I used SPs and pure ADO.NET Core to access the data. Systems range from having 100-300 tables and about 500-1000 stored procedures.
Most of the Data Access code is generated using a tool. I've posted the source code and sample application on my blog if you're interested in using/modifying it. The tool can generate over 100,000 lines of code in about 20-25 seconds going against a database with about 750 stored procedures.
Data Access Layer - Code Gen
Of course if you're no familiar with Databases, data modeling/design and stored procedures you're probably better off using Linq to SQL or EF4 (Entity Framework version 4) or similar. If you need brute force performance then ADO.NET core along with Stored procedures is the way to go.
Re: your first question
When you go down the path of stored procedures, the number of stored procedures begins to grow continually for the life of the project. Outside of the basic CRUD operations, each stored procedure tends to be tightly bound to a particular problem and not very re-usable. A rule of thumb is that I can expect 8-12 stored procedures for each data table (excluding reference or code tables, such as the list of states or countries).
The very large number of procs makes naming conventions very important so that you can find anything without constantly visually re-scanning the whole list of 400-500 procs.
Re: your second question
There are a lot of ugly things that happen with sql written inside of strings inside of C# or VB.NET -- it's error prone, ugly, etc.
Linq, nHybernate and many others exist, but the "concept count" (the number of things you need to learn to start being productive), is much higher than learning how to write a good stored procedure executer in C#.
I try to make sure that stored procedures are only created for database functionality - not business logic.
It's Database Functionality when I have some database architecture that's a bit obscure and I want to hide that from callers.
It's Business Logic when it is simply the way in which my application adds or updates or how much validation they do, etc., etc.
I'm developing a Java EE application (JSF + Richfaces + Oracle 10g), and i wanted to use JPA.
But in the end, i didn't see any advantages of using it, because it's going to complexify my code.
I found that calling my procedures (stored procedures in my orale DB) is better than using JPA (because i can, for example, change some lines in those procedures without the need to re'compile my "WAR" project every time i have some error + i can use PL/SQL which helps me a lot)
So, i wanted to ask you people, when to use JPA ?
Isn't making your own queries (you can choose the right ordering for you selects, the columns that you want to select, and not all the columns: because of ORM and the fact that your entities attributes are mapped to the columns of your table, and that oblige you to select all the attributes present in your entity,....)
Is it my method that i used (stored function)
But in the end, i didn't see any advantages of using it, because it's going to complexify my code.
This might be subjective but, personally, I find JDBC typically more verbose, harder to maintain and thus somehow more complex. With an ORM like JPA, you don't have to write all the CRUD queries, you don't have to handle the mapping of query results to objects, you don't have to handle the low level stuff yourself, etc.
I found that calling my procedures (stored procedures in my orale DB) is better than using JPA (because i can, for example, change some lines in those procedures without the need to re'compile my "WAR" project every time i have some error + I can use PL/SQL which helps me a lot)
This is totally dependent on your definition of "better". In your case, you might prefer SP because of your development workflow (I don't run my code in container to setup the persistence part) and because you are comfortable with PL/SQL. But again, personally, I don't find SP "better":
I don't really enjoy hand writing SQL queries for everything
I find that SP are harder to test, debug, maintain
I find that SP are bad for code reuse
I find that SP lock you in (I consider this as a disadvantage)
I tend to find systems build around SP harder to scale
Read also Who Needs Stored Procedures, Anyways? (amongst other resource) for more opinions.
So, I wanted to ask you people, when to use JPA ?
When you want to speed-up development, to focus on implementing business code instead of plumbing. And of course, when appropriate.
Isn't making your own queries (...)
Did you identify a particular performance problem? Or are you just assuming there will be a problem. To my experience, retrieving more columns than required is most of time not an issue. And if it becomes an issue, there are solutions.
You can define what fields are retrieved in a query, using JPA.
Why is the complexity of your code going up with use of JPA ? You state no example. Before you'd have to do messy JDBC, and now you just call "em.persist(obj)" ... is that really more complex.
You ought to be thinking in terms of what relations there are between objects (if any) and that be the determiner on whether you need an ORM
Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!).
Server-side logic is often much faster, even with procedural approach.
You can fine-tune your grant options and hide the data you don't want to show
All queries in one places are more convenient than if they were scattered all around the code.
And here's a (very subjective) article in my blog on the reason I prefer stored procedures:
Schema Junk
BTW, triggers (as opposed to functions / stored procedures / packages) I generally dislike.
They are completely other story.
You're keeping the processing in the database, along with the data.
If you process on the server side, then you have to transfer the data out to a server process across the network, process it, and (optionally) send it back. You have the network bandwidth/latency issues, plus memory overheads.
To clarify - if I have 10m rows of data, my two extreme scenarios are to a) pull those 10m rows across the network and process on the server side, or b) process in place in the database using the server and language (SQL) optimised for this purpose. Note that this is a generalisation and not a hard-and-fast rule, but it's the one I follow for most scenarios.
When many heterogeneous applications and various other systems need to access your single database and be sure through their operations data stays consistent without integrity conflicts. So you put your logic into triggers and stored procedures that will offer an interface to external clients.
Maybe not for most web-based systems, but certainly for enterprise databases. Stored procedures and the like allow you much greater control over security and performance, as well as offering a bit of encapsulation for the database itself. You can change the schema all you want as long as the stored procedure interface remains the same.
In (almost) every situation you would keep the processing that is part of the database in the database. Application code cannot substitute for triggers, you won't get very far before you have updated the database and failed to fire the application's equivalent of the triggers (the first time you use the DBMS's management console, for instance).
Let the database do the database work and let the application to the application's work. If you have a specific performance problem with the database, and that performance problem can be addressed by moving processing from the database, in that case you might want to consider doing so.
But worrying about database performance without a database performance problem existing (which is what you seem to be doing here) is both silly and, sadly, apparently a pre-occupation of many Stackoverlow posters.
Least scalable? SQL???
Look up, "federating."
If the database is shared, having logic in the database is better in order to control everything that happens. If it's not it might just make the system overly complicated.
If you have multiple applications that talk to your database, stored procedures and triggers can enforce correctness more pervasively. Accordingly, if correctness is more important than convenience, putting logic in the database is sensible.
Scalability may be a red herring, though. Sometimes it's easier to express the behavior you want in the domain layer of an OO language, but it can be actually more expensive than doing the idiomatic SQL way.
The security mechanism at a previous company was first built in the service layer, then pushed to the db side. The motivation was actually due to some limitations in a data access framework we were using. The solution turned out to be a bit buggy because our security model was complicated, but the upside was that bugs only had to be fixed in the database; we didn't have to worry about different clients following different rules.
Triggers mean 3rd-party apps can modify the database without creating logical inconsistencies.
If you do that, you are tying your business logic to your model. If you code all your business logic in T-SQL, you aren't going to have a lot of fun if later you need to use Oracle or what have you as your database server. Actually, I'm not sure I understand this question exactly. How do you think this would improve scalability? It really shouldn't.
Personally, I'm really not a fan of triggers, particularly in a database dedicated to a single application. I hate trying to track down why some data is inconsistent, to find it's down to a poorly written trigger (and they can be tricky to get exactly correct).
Security is another advantage of using stored procs. You do not have to set the security at the table level if you don't use dynamic code (Including ithe stored proc). This means your users cannot do anything unless they have a proc to to it. This is one way of reducing the possibility of fraud.
Further procs are easier to performance tune than most application code and even better, when one needs to change, that is all you have to put on production, not recomplie the whole application.
Data integrity must be maintained at the database level. That means constraints, defaults values, foreign keys, possibly triggers (if you have very complex rules or ones involving multiple tables). If you do not do this at the database level, you will eventually have integrity issues. Peolpe will write a quick fix for a problem and run the code in the query window and the required rules are missed creating a larger problem. A millino new records will have to be imported through an ETL program that doesn't access the application because going through the application code would take too long running one record at a time.
If you think you are building an application where scalibility will be an issue, you need to hire a database professional and follow his or her suggestions for design based on performance. Databases can scale to terrabytes of data but only if they are originally designed by someone is a specialist in this kind of thing. When you wait until the while application is runnning slower than dirt and you havea new large client coming on board, it is too late. Database design must consider performance from the beginning as it is very hard to redesign when you already have millions of records.
A good way to reduce scalability of your data tier is to interact with it on a procedural basis. (Fetch row..process... update a row, repeat)
This can be done within a stored procedure by use of cursors or within an application (fetch a row, process, update a row) .. The result (poor performance) is the same.
When people say they want to do processing in their application it sometimes implies a procedural interaction.
Sometimes its necessary to treat data procedurally however from my experience developers with limited database experience will tend to design systems in a way that do not leverage the strenght of the platform because they are not comfortable thinking in terms of set based solutions. This can lead to severe performance issues.
For example to add 1 to a count field of all rows in a table the following is all thats needed:
UPDATE table SET cnt = cnt + 1
The procedural treatment of the same is likely to be orders of magnitude slower in execution and developers can easily overlook concurrency issues that make their process inconsistant. For example this kind of code is inconsistant given the avaliable read isolation levels of many RDMBS platforms.
SELECT id,cnt FROM table
...
foreach row
...
UPDATE table SET cnt = row.cnt+1 WHERE id=row.id
...
I think just in terms of abstraction and ease of servicing a running environment utilizing stored procedures can be a useful tool.
Procedure plan cache and reduced number of network round trips in high latency environments can also have significant performance advantages.
It is also true that trying to be too clever or work very complex problems in the RDBMS's half-baked procedural language can easily become a recipe for disaster.
"Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)."
What makes you think that "scalability" is the only relevant concern in a system design ? I agree with rexem where he commented that it is very obvious that you are "not" biased ...
Databases are sets of assertions of fact. Those sets become more valuable if they can also be guaranteed to conform to certain integrity rules. Those guarantees are not worth a dime if it is the applications that are expected to enforce such integrity. Triggers and sprocs are the only way SQL systems have to allow such guarantees to be offered by the DBMS itself.
That aspect outweighs "scalability" anytime, anywhere, anyhow.
If I use stored procedures, can I use an ORM?
EDIT:
If I can use a ORM, doesn't that defeat part of the database agnosticity reason for using an ORM? In other words, why else would I want to use an ORM, if I am binding myself to a particular database with stored procedures (or is that assumption wrong)?
Using ORM to access stored procedures is one of the best uses of ORM. It'll give you strongly typed objects, while you still have full control over the SQL.
In my experience I would let the ORM handle the 'CRUD' operations, and leave the specialty work to the stored procedures. Generally, using a stored procedure for 'CRUD' operations is overkill, and to let the ORM handle it, could drastically improve your productivity.
Yes, you can, all main ORMs support stored procedures.
As for your assumption, you are particulary right, when you use stored procedures with ORM you are coupling your project to a particular database. But in practice it is 99% that you will not need to change your database provider, so in this case you use ORM not to abstract from concrete DB provider, but to help yourself with object-relational mapping task - which is a main ORM's task and which ORM was originally made for.
It raises an interesting point.
Once you have ORM, and relatively simple queries, why do you need stored procedures? SP's are intimately bound to the database. ORM frees you from having to maintain a lot of DB-specific code. What is DB-specific can be isolated and managed.
I suggest that an ORM is a golden chance to cut the complexity and put all the processing in the code where it belongs.
Use the database for what it does best -- store data.
Use your application for what it does best -- process data.
You can use both ORM features and stored procedures functionality at once. Particularly use ORM until it fits you, but if you have some trouble with performance or need some low level tune - include stored procedures in your business-logic.
Yes you can but you will want to spend some time investigating what capabilities the ORM provides around stored procedures.
Most will allow you to run a stored procedure that returns a strongly typed object / entity. More advanced ORM's will allow you to plug stored procedures in for performing CRUD actions as well (so your generic querying, deleting etc goes via a stored procedure rather than a dynamic query).
Generally ORM's are great for generating ad-hoc queries and getting strongly typed entities but having strong stored procedure support has the benefit of allowing you to (sometimes) more easily access native capability of your RDMS that may not be exposed as first class citizens in the ORM - especially if the ORM supports many database engines.
Following up from your edit:
Often you will want to use the ad-hoc querying engine provided by the ORM however as I alluded to earlier - sometimes you want to query using a capability not exposed from the ORM.
The benefits of strongly typed entities is invaluable as it means you have domain object usually, rather than data readers, data tables etc. You can cleanly encapsulate behaviors and logic within those entities that you have retrieved.
The list of additional benefits is very long indeed - for example, with the LightSpeed ORM (and most others) your entities will support standard binding interfaces, error reporting interfaces, validation etc. On the querying side you will lose out on lazy loading etc unless you write it yourself.
Database "agnosticity" (?) is not the only reason to use an ORM. However, you could take advantage of being DB independent on 99% of your interactions with the DB and in 1% (or 2% or 10% or whatever) you might need stored procedures for speed/clarity/complexity. If you changed DBs, you would need to rewrite those.
I use netTiers a lot at work and we let it generate our stored procedures for us. These only handle the basic CRUD operations, but they are very fast and save me a TON of time. netTiers will also let us create custom stored procedures and generate our data access code with these procedures.
You can, but many of the more advanced ORM features tend to become more cumbersome to use. Something like iBatis is very easy to integrate with stored procedures, while the more sophisticated features of more complex engines like (N?)Hibernate like generation of dynamic SQL and lazy loading of large fields can become more of a hassle than they're worth.
I believe that any tool that frees you from redoing work and concentrate in solving the problems is valid. ORMs appear to be that tool when it come to basic CRUD operations - even if using SPs to better implement a requirement (like using a hammer on a nail, it's just the right tool for task).
The point is: there's no black or white, just a scale of gray. Very inneficient and badly coded applications use the excuse of being 'DB agnostic' to explain the exagerated use of DB resources. In many cases, being very tied to a database is not good too. The objective is: getting maximum 'DB agnosticism' while not wasting customer IT resources without need.
There's no 'old vs new', just people saying that extreme 'pure' approaches are better. I don't really believe so. I believe that, as with any tool, the 'best' (notice the quotes) approach is using ORM until still is the right tool to make your data access. And use SPs inside your ORM when you reach a point where you're wasting resources and reducing scalability and 'worth life' (I forgot the english expression equivalent for the portuguese 'vida Ăștil') of TI resources. Or, in other words, use SP when it's for the processing at hand what a hammer is for the nail.
I do understand SQL querying and syntax because of previous work using ASP.NET web forms and stored procedures, but I would not call myself an "expert" in it.
Since I have been using ASP.NET MVC and LinqToSql it seems that so much of the heavy lifting is done for me and encapsulated away at the SQL end that I'm questioning whether there is any benefit in continuing to top-up my knowledge of SQL queries or whether I'm better off focusing my "learning time" on other things.
Your thoughts?
You should absolutely know SQL and keep your knowledge up-to-date. ORM is designed to ease the pain of doing something tedious that you know how to do, much like a graphing calculator is designed to do something that you can do by hand (and should know how).
The minute you start letting your ORM do things in the database that you don't fully understand is the minute you've lost control over your model.
In my opinion, knowing SQL is more valuable than any vendor specific technology. There will always be cases when those nice prepackaged frameworks will not be able to solve a particular situation and knowledge of advanced SQL will be required.
It is still important to learn SQL queries/syntax. The reason is you need to at least understand how Linq to SQL translate to the database behind the scenes.
This will help you when you find problems, for example something not updating correctly. Or a query performance needs to increase.
It is the same that you need to understand what assembly language is and how it eventually becomes machine language. However in all you don't have to be an expert, but at least be able to write in it and understand it.
It is still important to know SQL and the paradigm (set-based) behind it to be able to create efficient SQL statements, even if your using LinqToSql or any other OR/M.
There will always be situations where you will want to write the query in native SQL because it is not possible to write it in LinqToSql / HQL / whatever, or LinqToSql is just not able to generate a performant query for it.
There will always be situations where you will want to execute an ad-hoc query on a database using native sql, etc...
I think LinqToSQL (or other Linq to SQL providers) should not prevent you of knowing SQL.
When your query is not returning what you expect, or when it takes 30 minutes to run on the production database, you'd better be able to understand what LTS has generated, and why it is failing.
I know, it's a rehashed topic, and it might not be applicable to what you do ("small" database that will never hit that kind of problem etc), but it pays not to get too oblivious of abstraction layers sometimes.
The other reason is, Linq does not the whole range of what you can do in SQL, so you might have to resort to writing "raw" SQL, even if the result is materialised as objects.
It depends what you're working on, and from what you said it might make more sense to focus on other areas.
Having said that I find knowing SQL allows the following:
The ability to write queries to extract data from systems easily.
For adhoc queries, or for checking things.
The ability to write complex stored procedures, which allows me to group complex data processing in one place, where it should be, in the database.
The ability to fine tune LinqToSql by adding indexes, and understanding the SQL/query plan's it procedures.
Most of these are more of a help on more complex systems, so if you're not working on those it might not be as much of a help.
It may help in your situation to list the technologies which might be of use, and then prioritise them.
In order words make a development plan for yourself, which may encompass more then just learning technical knowledge but allow a more broad focus like design patterns, communication skills and other areas.
SQL is a tool. Linq to SQL is also a tool. Having more tools in your belt is a good thing. It'll give you more perspectives when attacking a problem.
Consider a scenario where you may want to do multiple queries or multiple updates to the db in one operation. If you can write TSQL you can potentially save yourself a lot of roundtrips to the database.
I would say you definately need to know your SQL in depth, because you need to know what code your Linq-expression generates and what effects the code will have if you want high performing queries. Sure you might get the job done in most cases, but sometimes there is a huge difference in performance in very subtle difference in Linq-syntax.
I ran into this this morning actually, where I had done .Any(d => d.Id == (...).First().Id) instead of doing where (...).Any(i => i.Id == d.Id). This resulted in the query executing five times slower.
Sometimes you need to analyze the actual Sql-query to realise the mistakes you make.
Its always a good think to learn the underlying language for stuff like Linq To SQL. SQL is pretty much standardized and it will help you understand a new paradigm in programming.
You may not always be working in .NET.
Doesn't hurt to know the underlying concepts.
LINQ to SQL is not being maintained anymore in favor of the Entity Framework
Sooner or later you will run into problems that need at leat a working knowledge of SQL to solve. And sooner or later you will run into requirements that are best realised in the DB (whether in SP-s or in triggers or views or whaterver).
LINQ To SQL will only work with .NET. IF you happen get another job where you are not working with .NET, then you will have to go back to writing Stored Procs.
Knowing SQL will also give you a better understanding of how the server operates as well as possibly making you a better database designer.