Converting from Datasets to stored procs and modern ORM - stored-procedures

How much effort would it take to migrate a large existing codebase from a Strongly Typed Dataset driven data access layer to a DAL driven by stored procs and/or a more modern ORM package? Is there any shiny tool that automates a portion of this process?
The current code base has well over 100 datasets mirroring the SQL database (though they haven't always been 100% in sync with changes in the DB structure). The current stance is that it would be too much time/effort to change now, but I question how much technical debt this is leaving us to pay down every week. This is to say nothing of the backend performance of the datasets' SQL vs. an optimized sproc.
So, is that justified? Would something like that be too much of a monster to tackle in a reasonable time and get a worthwhile payoff? I know I could change the DAO-like classes they use to interfaces (should be already) and develop this on the side while still using the datasets in production until a feasibility test of some sort could be done on a small subset of the whole.
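The "interfaces first, new DAL on the side" idea from the question can be sketched roughly as follows. This is only an illustration in Python rather than .NET; `Customer`, `CustomerDao`, `DatasetCustomerDao`, `OrmCustomerDao` and `results_match` are all hypothetical names, and the two implementations are in-memory stand-ins for the dataset-backed and ORM-backed classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Customer:
    id: int
    name: str


class CustomerDao(Protocol):
    """Hypothetical interface the UI/BLL would depend on instead of a concrete DAL."""

    def get_by_id(self, customer_id: int) -> Optional[Customer]: ...


class DatasetCustomerDao:
    """Stand-in for the existing dataset-backed implementation (still in production)."""

    def __init__(self, rows: dict[int, str]) -> None:
        self._rows = rows

    def get_by_id(self, customer_id: int) -> Optional[Customer]:
        name = self._rows.get(customer_id)
        return Customer(customer_id, name) if name is not None else None


class OrmCustomerDao:
    """Stand-in for the new implementation being trialled on a small subset."""

    def __init__(self, entities: Iterable[Customer]) -> None:
        self._by_id = {entity.id: entity for entity in entities}

    def get_by_id(self, customer_id: int) -> Optional[Customer]:
        return self._by_id.get(customer_id)


def results_match(old: CustomerDao, new: CustomerDao, ids: Iterable[int]) -> bool:
    """Feasibility check: both implementations must agree for every sampled id."""
    return all(old.get_by_id(i) == new.get_by_id(i) for i in ids)
```

Because callers only see `CustomerDao`, the dataset-backed class can stay in production while something like `results_match` is run against a sample of ids to judge whether the new implementation is a safe replacement.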

I would say moving to an ORM like LINQ to SQL would be far less effort than moving to a stored-proc-driven DAL. A few things that come straight to mind:
Situation 1 - if you are using your typed datasets outside your DAL [in the UI, BLL], then the effort is going to be high for sure, because you will need to do an extensive impact analysis of the change and make changes pretty much everywhere you have used your typed datasets.
Situation 2 - if you are using your typed datasets ONLY within your DAL, and your UI and BLL don't care about the internal implementation of the DAL and are oblivious to the typed datasets, then it would be far less effort. You will need to change only the DAL.
If you are in situation 2, then I think it would definitely be worthwhile to move from typed datasets to an ORM.
If you intend to take the stored proc approach for the DAL, then you might look at www.mygenerationsoftware.com to auto-generate your procs and reduce some effort. However, the effort would still be higher compared to the ORM, and another downside may be that you end up with umpteen simple insert / update procs in your DB. We generally use procs primarily for cascaded UPSERTs (update+insert) or complex calculations only, and use LINQ to SQL for basic inserts, updates, and deletes.
Hope this helps in some way!

Related

Stored Procedures and ORM's

What's the purpose of stored procedures compared to the use of an ORM (nHibernate, EF, etc) to handle some CRUD operations? To call the stored procedure, we're just passing a few parameters and with an ORM we send the entire SQL query, but is it just a matter of performance and security or are there more advantages?
I'm asking this because I've never used stored procedures (I just write all SQL statements with an ORM and execute them), and a customer told me that I'll have to work with stored procedures in my next project, I'm trying to figure out when to use them.
Stored procedures are often written in a vendor dialect of SQL (T-SQL for SQL Server, PL/SQL for Oracle, and so on). These dialects add extra capabilities to SQL to make it more powerful.
On the other hand, you have an ORM, let's say NH (NHibernate), that generates SQL.
The SQL statements generated by the ORM don't have the same speed or power as hand-written T-SQL stored procedures.
Here is where the dilemma enters: do I need a super fast application that is tied to one SQL database vendor and hard to maintain, or do I need to be flexible because I have to target multiple databases and I prefer to cut development time by writing HQL queries rather than SQL ones?
Stored procedures are faster than SQL statements because they are pre-compiled in the database engine, with execution plans cached. You can't do that in NH, but you have other alternatives, like using the first-level or second-level cache.
Also, try doing bulk operations with NH. Stored procedures work very well in those cases. You need to consider that an SP talks to the database at a deeper level.
The choice may not be that obvious, because it all depends on the scenario you are working on.
The main (and I'm tempted to say "only") reason you should use stored procedures is if you really need the performance.
It might seem tempting to just create "functions" in the database that do complex stuff quickly. But it can quickly spiral out of control.
I've worked with applications that encapsulate so much business logic in SQL, that it becomes virtually impossible to refactor anything. Literally hundreds of stored procedures that are black boxes for devs working with the ORM.
Such applications become brittle, hard to debug and hard to understand. By allowing business logic to live in stored procedures you are allowing SQL developers to make design choices that they shouldn't be making, in a tool that is much harder to work in, log and debug than an ORM. I've seen stored procedures that handle payment processing. Truly core stuff. Stuff that becomes so central to an application that nobody dares to touch it, all because some guy with good SQL skills wrote a script 5 years ago to fix something quickly; it was never migrated to the ORM and eventually grew into an unmanageable monster, full of tangled logic nobody understands. Devs end up having to blindly trust whatever it's doing. And what's worse, it's almost always outside test coverage, so you may break everything when you deploy: your tests pass with mocked data, but some ancient stored procedure suddenly starts acting up.
Abusing stored procedures is one of the worst forms of technical debt you can accumulate. The database, which is the persistence layer, should not be used for business logic. You should keep that distinction as strict as you can.
Of course, there will be cases where an ORM will have horrible performance or simply won't support a feature you need from SQL. If doing things in raw SQL is truly inevitable, only then should you consider stored procedures.
I've seen Stored Procedure Hell. You don't want that.
There are significant performance advantages to stored procedures in some circumstances. Often the queries generated by Linq and other ORMs can be inefficient, but still good enough for your purposes. Some RDBMSs (such as SQL Server) will cache the execution plans for stored procedures, saving on query time. For more complex queries that you use frequently, these performance savings can be critical.
For most normal CRUD, though, I have found that it is usually better for maintainability just to use the ORM if it is available and if its operations serve your needs. Entity Framework works quite well for me in the .NET world most of the time (in combination with Linq), and I like Propel a lot for PHP.
I stumbled over this pretty old question but I am shocked that the most important benefit of Stored Procedures is not even mentioned.
Security and resource protection
Using SPs, you are able to grant execution rights for an SP to a user. The user can execute that SP and only that SP. You do not even have to give the user read or write access to the tables used. The user does not even have to know which tables are used.
Using an ORM, you have to give users read and/or write access to the tables used. The user can read all data from all the tables you granted rights on and can even combine them in queries, whether you want it or not, and can also run queries that create a heavy load on the database server.
This is especially useful in cases where application development and database development are done by different teams and the database is used by more than one application.
The primary use I find for them is to implement an abstraction layer and encapsulate query logic. In the same way that I write functions in a procedural language.
As le dorfier mentions, one of the primary reasons sprocs (and/or views) should be used is to provide an abstraction layer between a database and its clients (web apps, reports, ETLs etc.).
This 'DB API' can make it easier to change/refactor your database without necessarily affecting clients.
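A small sketch of that 'DB API' idea, using a view (SQLite has no stored procedures) and Python's built-in `sqlite3` module. The table and view names (`customer_v1`, `customer_v2`, `customer_api`) and the `customer_names` helper are made up for illustration; the point is only that the client query stays the same while the underlying table is refactored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original schema, exposed to clients only through the view customer_api.
conn.executescript("""
    CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO customer_v1 VALUES (1, 'Ada Lovelace');
    CREATE VIEW customer_api AS
        SELECT id, full_name AS name FROM customer_v1;
""")


def customer_names(connection):
    """Client code: knows only the view, not the tables behind it."""
    rows = connection.execute("SELECT name FROM customer_api ORDER BY id")
    return [row[0] for row in rows]


before = customer_names(conn)

# Refactor: split the name into two columns and repoint the view.
conn.executescript("""
    CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer_v2 VALUES (1, 'Ada', 'Lovelace');
    DROP VIEW customer_api;
    CREATE VIEW customer_api AS
        SELECT id, first_name || ' ' || last_name AS name FROM customer_v2;
    DROP TABLE customer_v1;
""")

after = customer_names(conn)  # same client query, new underlying schema
```

The client function is never edited; only the view definition changes when the schema does.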
See - Why use stored procs - for a more in depth discussion
I mainly stick to LINQ to SQL as an ORM and I think it's great, but there is still a place for stored procedures. Mainly I use them when the query I want to run is very complex, with many joins (especially outer joins, which suck in Linq), lots of aggregation in subqueries perhaps, recursive CTEs, and other similar scenarios.
For general crud though, there is no need.

Avoid writing SQL queries altogether in SSIS

Working on a Data Warehouse project, the guy that gave us the tutorial advised that we stick to using SQL queries over defining a lot of data flow transformations, citing points like it'll consume a lot of memory on the ETL box so we'd rather leave the processing to the DB box. Is this really advisable? Where's the balance between relying on GUI tools over executing a bunch of SQL scripts on your Integration package?
And honestly, I'd like to avoid writing SQL queries as much as I can. (but that's beside the point. I'd really like to look at this objectively.)
The answer is: it depends, but you want to pick one or the other for any given job and avoid mixing the two where possible.
Generally, it's best to either do everything possible within the tool or do everything possible within stored procedure code. When you have significant amounts of logic split between layers the system becomes harder to trace and debug.
Where the tool can do the transformations without the data flows becoming awkward and convoluted you could use the tool and try to have little or no logic in queries. This means that one single layer has the business logic and it should be fairly obvious where to find it. However, ETL tools tend to handle highly complex transformations relatively poorly. The sweet spot for this type of approach is on systems where you have a large number of data sources but relatively simple transformations.
If you have relatively complex transformations you may be better off putting all the business logic and transformation into a layer of stored procedures. SQL code is better at implementing complex transformations in a maintainable way - I have it on fairly good authority that around half of all data warehouse projects in the banking and insurance sectors use this type of architecture for precisely that reason. In this case the ETL tool can be used to implement relatively dumb data copies. Source data can be copied into staging areas essentially verbatim and then picked up by a body of stored procedure code that does the ETL. The ETL tool can be used for data copies, bulk load operations, logging, scheduling and other framework tasks.
In either case you're best off picking one approach. Otherwise, you can end up with business logic spread across extraction layers, database views, data flows, and stored procedure code. Logic spread across multiple layers is much harder to test.
When all of the logic is (for example) contained within stored procedures or focussed ETL transformation jobs you can unit test a given transformation in isolation. The clarity in design also helps with maintenance and auditing.
I find that SQL code is not only faster to run, but also faster to develop and much, much easier to maintain.
Generally, when you want to process each row individually, use a data flow; otherwise it may be better to use a SQL command.
Personally I'd go with writing the SQL where I can. It's easier to optimise later and (usually) faster as well. Google will give much more detailed answers.
Another factor to think about is the provider you use for your connections.
You need to make the decision based on your needs. We use postgres DB, so we have to create a load of staging tables for some processes, which speeds the whole thing up.
You should also take into consideration the boxes it is running on: if you have an all-powerful DB box and a little ETL box, there'd be no point in running anything on the ETL box.
If you do all your processing on the ETL box you'll be dragging a lot of data across the network as well.
Check out these links to get you started:
ssistalk.com/category/ssis/ssis-advanced-techniques/
msdn.microsoft.com/en-us/library/ms141031.aspx
weblogs.sqlteam.com/jamesn/Default.aspx
I think this is a difficult question; and an interesting one as well.
One reason to use SSIS is to improve maintainability, IMHO. If you pack all the logic in SQL statements (and you sure can!) you tend to spoil this reason of using SSIS in the first place. You cannot really "see the data flow" anymore.
On the other hand I feel there are times when a well placed SQL statement has its value. For example when you read data from a table and for whatever reason already know you will only ever need the rows satisfying condition X I do not see the reason for reading the whole table and in the next step "conditional-splitting most of it away".
What I do not know is what this means in terms of performance, by the way. Is SSIS smart enough to see what is happening and change the "read-whole-table-and-conditional-split-it" into a "select Y from where X" on the fly (or when building/deploying)?
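The difference between "read everything, then split" and "filter at the source" can be illustrated outside SSIS with a small Python sketch using the built-in `sqlite3` module. The `orders` table and its contents are invented for the example, and this says nothing about what SSIS itself does internally; it only shows how many rows each approach pulls out of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",) if i % 10 == 0 else ("closed",) for i in range(1000)],
)

# Option A: read the whole table, then discard most of it downstream
# (the analogue of a conditional split after a full-table source).
all_rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
split_rows = [row for row in all_rows if row[1] == "open"]

# Option B: let the database apply condition X, so only matching rows
# ever leave the DB box.
filtered_rows = conn.execute(
    "SELECT id, status FROM orders WHERE status = ? ORDER BY id", ("open",)
).fetchall()
```

Both options end with the same rows, but option A transfers every row in the table to the client first, while option B transfers only the matching tenth.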
The big question is where to draw the line. And this depends to a certain extent on the people working on your ETL process. If everyone who will ever support the process has known SQL from the start, you can support a higher amount of SQL in your ETL than if you have co-workers (or customers, or successors you care about) who hardly understand what is happening in all your SQL, let alone how to change, improve, or add to it.
So I think the bottom line is that neither avoiding SQL entirely nor doing everything in SQL is better. Try to make up some simple rules that fit your requirements and that everyone can live with, then follow them. This buys you the most value from using SSIS.
SQL Server does some things well and other things not so well. I use SSIS to import data to or export data from SQL Server. During the course of the move I use SSIS where it makes sense. I can easily do work on a per-row basis, which is not very efficient in SQL Server (cursors). To say that you shouldn't use transformations and data flows on an ETL box because it is too expensive on the ETL box is like saying 'don't drive your car too fast, because it causes the engine to work'. The purpose of an ETL tool like SSIS is to take some of the processing that SQL Server does not do well and move it to an engine that does.
Got to use the right tool for the job. Generally, you do most things in SSIS, with certain things done in "pure" SQL.
For instance, in cases where you do a lot of UPDATE (table difference on dimension table in a dimensional model, say), you really don't want to execute an UPDATE for each row. In this scenario, you do a regular insert into a temporary table and then do the UPDATE in SQL, joining on appropriate keys.
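The staging-table pattern described above can be sketched like this, again with Python's built-in `sqlite3` module standing in for the real database. `dim_customer`, `stage_customer` and the sample rows are assumptions made up for the example; a correlated subquery is used for the join so the statement works on older SQLite versions too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO dim_customer VALUES (1, 'Oslo'), (2, 'Lima'), (3, 'Pune');
    CREATE TEMP TABLE stage_customer (customer_key INTEGER PRIMARY KEY, city TEXT);
""")

# Step 1: plain bulk insert of the changed rows into the temporary table.
changed_rows = [(1, "Bergen"), (3, "Mumbai")]
conn.executemany("INSERT INTO stage_customer VALUES (?, ?)", changed_rows)

# Step 2: one set-based UPDATE joined on the key, instead of one UPDATE per row.
cursor = conn.execute("""
    UPDATE dim_customer
    SET city = (SELECT s.city
                FROM stage_customer AS s
                WHERE s.customer_key = dim_customer.customer_key)
    WHERE customer_key IN (SELECT customer_key FROM stage_customer)
""")
updated = cursor.rowcount

result = conn.execute(
    "SELECT customer_key, city FROM dim_customer ORDER BY customer_key"
).fetchall()
```

Only the two staged keys are touched; the row for key 2 is left as it was.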

Do I really need an ORM?

We're about to begin development on a mid-size ASP.Net MVC 2 web site. For a typical page, we grab data and throw it up on the web page, i.e. there is not much pre-processing of the data before it is sent to the UI.
We're now making the decision whether or not to use an ORM and, if yes, which one. We had been looking at EF2 AKA EF4 (ADO.NET Entity Framework in VS 2010) as one possibility.
However, I'm thinking a simple solution in this case may be just to use datatables. The reason being that we don't plan to move the data around or process it a lot once we fetch it, so I'm not sure there is that much value in having strongly-typed objects as DTOs. Also, this way we avoid mapping altogether, thereby I think simplifying the code and allowing for faster development.
I should mention budget is an issue on this project, as well as speed of execution. We are striving for simplicity anywhere we can, to keep the budget smaller, the schedule shorter, and performance fast.
We haven't fully decided this yet, but are currently leaning towards no ORM. Will we be OK with the no ORM approach or is an ORM worth it?
An ORM-tool isn't mandatory!
Jon's advice is sensible, but I think using DataTables isn't ideal.
If you're using an ORM tool, the object model is far simpler than a full-blown OO domain model. Moreover, Linq2Sql and Subsonic, for example, are straightforward to use, allowing very quick code changes when the database changes.
You say you won't move the data around or process it a lot, but any amount of processing will be far easier in ORM objects rather than in DataTables. Again, if the application changes and more processing is required the DataTable solution will be fragile.
If you're not going to practice full-blown Object Oriented Programming (and by that I mean you should have a very deep understanding of OOP, not just the ability to blurt out principles and design pattern names) then NO, it's not worth going for an ORM.
An ORM is only useful if your organization is fully invested in Object Oriented application design, and thus having the problem of having an Object to Relational model mapping. If you're not fully into OO, ORMs will become some sort of nasty roadblock that your organization will then feel it doesn't need.
If your team/organization's programming style has always leaned to keeping business logic in the DB (e.g., stored procs) or sticking to a more or less functional/procedural/static approach at writing objects and methods, do away with ORMs and stick to ADO.NET.
It sounds as if you only need to show data and don't do CRUD.
I have found that an ORM is not the best way to go when displaying lists that consist of data from various tables. You end up loading large object graphs just to get the one field you need.
A SQL-statement is just so much better at this.
I would still return lists of strongly typed objects from the DAL. By that you have a better chance of getting a compile time error when a change in the DAL is not reflected in other layers.
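A minimal sketch of "plain SQL inside the DAL, strongly typed objects outside it", written in Python with the built-in `sqlite3` module. `OrderSummary`, `list_order_summaries` and the two tables are hypothetical; in Python the check comes from a static type checker rather than the compiler, so treat this as an analogy for the .NET case.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderSummary:
    """Flat, typed row for a list page; only the fields the UI needs."""
    order_id: int
    customer_name: str
    total: float


def list_order_summaries(conn: sqlite3.Connection) -> List[OrderSummary]:
    """DAL function: one SQL statement across tables, typed objects out."""
    rows = conn.execute("""
        SELECT o.id, c.name, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
    """)
    return [OrderSummary(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.5);
""")
summaries = list_order_summaries(conn)
```

If a field is renamed or removed from `OrderSummary`, every caller that still uses the old name is flagged by the type checker, which is the benefit the answer is after.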
If you already have the stored procedures you need, then there probably isn't that much to gain from an ORM. If not, though, my experience has been that working with Linq to Entities is much faster than the traditional stored procedure/strongly typed dataset approach, assuming you are comfortable with Linq queries.
If you aren't worried about mapping to an object model then Linq to SQL is even simpler to use. You certainly don't need to be using a full OO model to reap the productivity benefits.
I would disagree with Malcolm's point about having to bring back graphs. If the ORM supports Linq, you can use a projection to return a flat result with just the data you want, with the added advantage that the query is usually simpler than the corresponding SQL, since you can use relationships rather than joins.
Having made the switch and become comfortable with the technology, I can think of almost no good reason not to use one; they all support falling back to SQL stored procedures if you really need to. There will be a learning curve, though, and in this case that may make it not worth your while.
I agree with Joe R's advice - for speed of making changes & also the speed of initial development, LINQ-to-SQL or subsonic will get you up and going in no time.
If your application is really this simple and it's just a straight data out/data in direct mapping to the tables, have you considered looking at ASP.net dynamic data?
I'd also point to a good article about this by Scott Guthrie.
It largely depends on how familiar you are with ORMs.
I personally think that NHibernate, which is the king of ORMs in the .NET world, allows much more rapid development as once set up, you can pretty much forget about how you are getting data out of the database.
However, there is a steep learning curve to it, especially if you try and do things in a non-hacky way (which you should), so if your team doesn't have experience here and time is pressing then it probably won't cut it.
Linq2SQL is way too simple. Don't know about Subsonic, but if you are going to use an ORM it may be a good balance between having rapid development and getting something too powerful and complex.
Ultimately though, as a team, I think you want to learn NHibernate which is not time consuming to set up on a small to medium project once you know what you are doing, but is very powerful.

In terms of performance, which is faster: Linq2SQL or Linq2Entities (in ASP.net MVC)

Comparing the two data models in an ASP.NET MVC app, which provides better performance: LINQ 2 SQL or LINQ 2 Entities?
They don't always perform the same database operations when you use the same methods, or have the same LINQ methods for that matter....
You should benchmark the specific query yourself. However, LINQ to Entities provides an extra abstraction layer in comparison to LINQ to SQL which is likely to impede performance a little.
Performance issues aside, if your application is expected to work with SQL Server backends only and you need a one to one mapping between entities and tables, LINQ to SQL is simpler to use.
Maintainability over performance should be a goal at the outset of most projects even if performance is a requirement. Performance can be tuned later and with good design the ORM/data mapper can usually be swapped out later if it is necessary.
Don't bog yourself down with decisions like this early in a project. Start with what is easier or what you already know. Then tune later.
Benchmark it yourself for good results. Your environment is almost always different from what the developer/manufacturer used for comparisons.
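As a rough idea of what "benchmark it yourself" can look like, here is a small timing helper in Python. `best_of` is a hypothetical helper, and the two queries against an in-memory SQLite table are placeholders for whatever two data-access paths you actually want to compare; it is not a LINQ to SQL vs. LINQ to Entities measurement.

```python
import sqlite3
import time
from typing import Callable


def best_of(action: Callable[[], object], repeats: int = 5) -> float:
    """Run `action` several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return min(timings)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO items (price) VALUES (?)", [(float(i),) for i in range(10_000)]
)


def sum_in_database():
    # Candidate 1: aggregate inside the database.
    return conn.execute("SELECT SUM(price) FROM items").fetchone()[0]


def sum_in_application():
    # Candidate 2: fetch every row and aggregate in application code.
    return sum(row[0] for row in conn.execute("SELECT price FROM items"))


timing_db = best_of(sum_in_database)
timing_app = best_of(sum_in_application)
```

Taking the minimum over several runs reduces noise from warm-up and background activity; the numbers only mean something when measured with your own data on your own hardware.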
It depends very specifically on what you're doing. There is no single answer to this.
If you are willing to take the performance hit involved in using an ORM, both of them should be suitable.
My advice is to always go with an ORM for as much code as you can. If you have something that's running unacceptably slowly, then make a stored procedure to do that specific query.
This article seems to make some comprehensive comparisons:
http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html
My general input though when I hear performance being mentioned is to choose a solution that best solves your problem until you discover that your problem is performance.

Would I use an ORM if I am using Stored Procedures?

If I use stored procedures, can I use an ORM?
EDIT:
If I can use an ORM, doesn't that defeat part of the database agnosticity reason for using an ORM? In other words, why else would I want to use an ORM, if I am binding myself to a particular database with stored procedures (or is that assumption wrong)?
Using ORM to access stored procedures is one of the best uses of ORM. It'll give you strongly typed objects, while you still have full control over the SQL.
In my experience, I would let the ORM handle the 'CRUD' operations and leave the specialty work to the stored procedures. Generally, using a stored procedure for 'CRUD' operations is overkill, and letting the ORM handle it could drastically improve your productivity.
Yes, you can; all the main ORMs support stored procedures.
As for your assumption, you are partly right: when you use stored procedures with an ORM, you are coupling your project to a particular database. But in practice it is 99% likely that you will never need to change your database provider, so in this case you use the ORM not to abstract away the concrete DB provider, but to help with the object-relational mapping task, which is an ORM's main task and what ORMs were originally made for.
It raises an interesting point.
Once you have ORM, and relatively simple queries, why do you need stored procedures? SP's are intimately bound to the database. ORM frees you from having to maintain a lot of DB-specific code. What is DB-specific can be isolated and managed.
I suggest that an ORM is a golden chance to cut the complexity and put all the processing in the code where it belongs.
Use the database for what it does best -- store data.
Use your application for what it does best -- process data.
You can use both ORM features and stored procedure functionality at once. In particular, use the ORM for as long as it fits your needs, but if you have trouble with performance or need some low-level tuning, include stored procedures in your business logic.
Yes you can but you will want to spend some time investigating what capabilities the ORM provides around stored procedures.
Most will allow you to run a stored procedure that returns a strongly typed object / entity. More advanced ORMs will allow you to plug stored procedures in for performing CRUD actions as well (so your generic querying, deleting etc. goes via a stored procedure rather than a dynamic query).
Generally, ORMs are great for generating ad-hoc queries and getting strongly typed entities, but having strong stored procedure support has the benefit of allowing you to (sometimes) more easily access native capabilities of your RDBMS that may not be exposed as first-class citizens in the ORM, especially if the ORM supports many database engines.
Following up from your edit:
Often you will want to use the ad-hoc querying engine provided by the ORM; however, as I alluded to earlier, sometimes you want to query using a capability not exposed by the ORM.
The benefit of strongly typed entities is invaluable, as it means you usually have domain objects rather than data readers, data tables etc. You can cleanly encapsulate behaviors and logic within the entities that you have retrieved.
The list of additional benefits is very long indeed - for example, with the LightSpeed ORM (and most others) your entities will support standard binding interfaces, error reporting interfaces, validation etc. On the querying side you will lose out on lazy loading etc unless you write it yourself.
Database "agnosticity" (?) is not the only reason to use an ORM. However, you could take advantage of being DB independent on 99% of your interactions with the DB and in 1% (or 2% or 10% or whatever) you might need stored procedures for speed/clarity/complexity. If you changed DBs, you would need to rewrite those.
I use netTiers a lot at work and we let it generate our stored procedures for us. These only handle the basic CRUD operations, but they are very fast and save me a TON of time. netTiers will also let us create custom stored procedures and generate our data access code with these procedures.
You can, but many of the more advanced ORM features tend to become more cumbersome to use. Something like iBatis is very easy to integrate with stored procedures, while the more sophisticated features of more complex engines like (N?)Hibernate like generation of dynamic SQL and lazy loading of large fields can become more of a hassle than they're worth.
I believe that any tool that frees you from redoing work and lets you concentrate on solving the problem is valid. ORMs appear to be that tool when it comes to basic CRUD operations, even if you use SPs to better implement a particular requirement (like using a hammer on a nail, it's just the right tool for the task).
The point is: there's no black or white, just a scale of gray. Very inefficient and badly coded applications use the excuse of being 'DB agnostic' to explain the exaggerated use of DB resources. In many cases, being very tied to a database is not good either. The objective is to get maximum 'DB agnosticism' while not needlessly wasting the customer's IT resources.
There's no 'old vs new', just people saying that extreme 'pure' approaches are better. I don't really believe so. I believe that, as with any tool, the 'best' (notice the quotes) approach is to use the ORM while it is still the right tool for your data access, and to use SPs inside your ORM when you reach a point where you're wasting resources and reducing the scalability and 'worth life' (I forgot the English expression equivalent to the Portuguese 'vida útil') of IT resources. Or, in other words, use an SP when it is to the processing at hand what a hammer is to the nail.
