Writing many stored procedures

I need to write a long procedure which generates a report for a company.
Since the report involves fetching many different kinds of data, I have written many small procedures to fetch the different records.
Is it the right approach to write many subprograms and call them from the main program?
Please help, or is there another way to do this?

Unless you really go wild (**) and build a 'tree' of stored procedures, each calling the next, I don't see any problems with this. There might in fact be benefits to this, as
it's easier to maintain smaller pieces of code
(re)compilation of smaller stored procedures is going to be faster
**: SQL Server limits stored procedure nesting to 32 levels. That is, if procedure1 calls procedure1_1, which calls procedure1_1_1, which calls another, and so on, you'll get an error once the chain goes more than 32 calls 'deep'. Calling multiple stored procedures sequentially isn't a problem, though, because each call returns before the next one starts.
The only thing to keep in mind is the scope of the variables and temporary tables you're using. If you want to pass values around, you'll need to use parameters (an OUTPUT parameter can be useful to keep track of a @rowcount variable, for instance).


Optimizing looping through dataset

I have some code like this:
dxMemOrdered : TdxMemData;
while not qrySandbox2.EOF do
begin
dxMemOrdered.append;
dxMemOrderedTotal.asCurrency := qrySandbox2.FieldByName('TOTAL').asCurrency;
dxMemOrdered.post;
qrySandbox2.Next;
end;
This code executes in a thread. When there is a huge number of records, say 400,000, it takes around 25 minutes to get through them. Is there any way I can reduce the time by optimizing the loop? Any help would be appreciated.
Update
Based on the suggestions, I made the following changes:
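var
  dxMemOrdered: TdxMemData;
...
qrySandbox2.DisableControls;
try
  while not qrySandbox2.Recordset.EOF do
  begin
    dxMemOrdered.Append;
    // read the value straight from the underlying ADO recordset
    dxMemOrderedTotal.AsCurrency := qrySandbox2.Recordset.Fields['TOTAL'].Value;
    dxMemOrdered.Post;
    qrySandbox2.Next;
  end;
finally
  qrySandbox2.EnableControls; // re-enable even if an exception is raised
end;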
and my run time has improved from 15 minutes to 2 minutes. Thank you, guys.
Without seeing more code, the only suggestion I can make is to make sure that any visual control using the memory table is disabled while you load it. Suppose you have a cxGrid called Grid that is linked to your dxMemOrdered memory table:
var
dxMemOrdered: TdxMemData;
...
Grid.BeginUpdate;
try
while not qrySandbox2.EOF do
begin
dxMemOrdered.append;
dxMemOrderedTotal.asCurrency := qrySandbox2.FieldByName('TOTAL').asCurrency;
dxMemOrdered.Post;
qrySandbox2.Next;
end;
finally
Grid.EndUpdate;
end;
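Why this helps: every Post notifies the attached controls, which then repaint. The Python sketch below models that with hypothetical `Grid` and `MemTable` classes (not the DevExpress or VCL API). While an update counter is above zero, notifications are only recorded, and a single refresh happens on the final `end_update`. Using a counter rather than a flag lets nested begin/end pairs work, which is the usual way BeginUpdate/EndUpdate-style APIs behave.

```python
class Grid:
    """Hypothetical stand-in for a visual control bound to a dataset."""
    def __init__(self):
        self.refresh_count = 0
        self._update_level = 0
        self._pending = False

    def begin_update(self):
        self._update_level += 1

    def end_update(self):
        self._update_level -= 1
        if self._update_level == 0 and self._pending:
            self._pending = False
            self.refresh()  # one repaint for the whole batch

    def data_changed(self):
        # Called by the dataset after every post.
        if self._update_level > 0:
            self._pending = True  # remember that a repaint is owed
        else:
            self.refresh()

    def refresh(self):
        self.refresh_count += 1  # a real grid would repaint here


class MemTable:
    """Hypothetical in-memory table that notifies its grid on each post."""
    def __init__(self, grid):
        self.rows = []
        self.grid = grid

    def post(self, row):
        self.rows.append(row)
        self.grid.data_changed()


def load(table, totals):
    grid = table.grid
    grid.begin_update()
    try:
        for total in totals:
            table.post({"TOTAL": total})
    finally:
        grid.end_update()  # mirrors the try/finally around Grid.EndUpdate
```

Loading 1,000 rows without `begin_update` triggers 1,000 refreshes; through `load` it triggers one.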
Some ideas, in order of performance gain versus the work required from you:
1) Check whether the SQL dialect you are using lets you copy data directly with a single INSERT ... SELECT query. This depends on the database you're using.
2) If your datasets are coupled to visual controls, make sure you call DisableControls/EnableControls around this loop, so the controls are not refreshed for every record.
3) Does this code have to run in the main program thread? Maybe you can send it off to a separate thread while the user/program continues doing something else.
4) When you have to deal with really large amounts of data, bulk insertion is the way to go. Many databases can bulk-insert data from text files. Writing to a text file first and then bulk-inserting is much faster than individual inserts. Again, this depends on your database type.
[Edit: I just saw that you added the information that it's a TdxMemData, so some of these no longer apply. And you're already threading; I missed that ;-). I'll leave these suggestions in for other readers with similar problems.]
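For database targets, suggestion 4 comes down to sending rows in bulk instead of paying for one statement and one commit per row. As a rough illustration with Python's standard `sqlite3` module and a hypothetical `ordered` table, `executemany` inside a single transaction stands in for a real bulk-load facility. Real bulk loaders, such as importing from a text file, are database-specific.

```python
import sqlite3


def insert_row_by_row(conn, totals):
    # One statement and one commit per row: the slow pattern.
    for total in totals:
        conn.execute("INSERT INTO ordered (total) VALUES (?)", (total,))
        conn.commit()


def insert_in_bulk(conn, totals):
    # One statement executed for every row, inside one transaction.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO ordered (total) VALUES (?)",
                         ((t,) for t in totals))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ordered (total REAL)")
insert_in_bulk(conn, [10.5, 20.0, 30.25])
```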
It's much better to let SQL do the work instead of iterating through a loop in Delphi. Try a query such as
insert into dxMemOrdered (total)
select total from qrySandbox2
Is 'total' the only field in dxMemOrdered? I hope it's not the primary key; otherwise you are likely to have collisions, meaning that some rows will not be added.
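The set-based approach only works when source and destination are both tables the database engine can see. Assuming two hypothetical tables, `sandbox` and `ordered`, the Python `sqlite3` sketch below contrasts a client-side row-by-row copy with a single INSERT ... SELECT, where the engine copies the rows without sending each one through the client. The destination uses a separate `id` key, so duplicate totals cannot collide, which addresses the primary-key concern above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sandbox (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE ordered (id INTEGER PRIMARY KEY, total REAL);
""")
conn.executemany("INSERT INTO sandbox (total) VALUES (?)",
                 [(float(i),) for i in range(1, 1001)])


def copy_row_by_row(conn):
    # Client-side loop: fetch every row, then send it back one INSERT at a time.
    for (total,) in conn.execute("SELECT total FROM sandbox").fetchall():
        conn.execute("INSERT INTO ordered (total) VALUES (?)", (total,))


def copy_set_based(conn):
    # One statement; the engine copies all rows internally.
    conn.execute("INSERT INTO ordered (total) SELECT total FROM sandbox")


with conn:
    copy_set_based(conn)
```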
There's actually a lot you could do to speed up your thread.
The first would be to look at the problem from a broader perspective:
Am I fetching data from a cached or fast disk, or could the data be moved into memory?
Am I doing the right thing when aggregating totals by hand? SQL engines are especially optimized for this; all you'd need to do is define an additional logical field in which to store the result aggregated by SQL.
Another little optimization that may bring an improvement over large amounts of looping is to not use constructs like:
Recordset.Fields['TOTAL'].Value
Recordset.FieldByName('TOTAL').Value
but to add the fields with the Fields Editor and then access the right field directly. This saves a search through the Fields collection, which would otherwise be performed for every field on every record.
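The same principle, shown in Python: look the column up by name once, before the loop, and use the resolved position on every row. In Delphi, a persistent field from the Fields Editor (or a TField variable assigned once with FieldByName before the loop) plays this role. The sketch uses `sqlite3`'s `cursor.description` to find the column index once, with a hypothetical `sandbox` table.

```python
import sqlite3


def totals_by_name_lookup(cursor):
    # Slow pattern: search the column names for 'TOTAL' on every row.
    names = [d[0].upper() for d in cursor.description]
    return [row[names.index("TOTAL")] for row in cursor]


def totals_with_resolved_field(cursor):
    # Resolve the column position once, outside the loop.
    total_idx = [d[0].upper() for d in cursor.description].index("TOTAL")
    return [row[total_idx] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sandbox (id INTEGER, total REAL)")
conn.executemany("INSERT INTO sandbox VALUES (?, ?)", [(1, 2.5), (2, 4.0)])
```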

Why do we use data structures? (when no dynamic allocation is needed)

I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for each object? Wouldn't that speed up access time? E.g., a BST may have to follow several pointers before it gets to the actual data.
Except for when dynamic allocation is needed, is there a reason to use them?
E.g., using a linked list, BST, or std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is being kept in its own variable (or storage location). Data structures apply organization to your data. Imagine if you had 10,000 things you were trying to track. You could store them in 10,000 separate variables. If you did that, then you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to modify the code to change the way in which the calculations are done if the order of the items changes because a new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to be both more organized and extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created, keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help. You could, for example, organize all of your important papers by putting each of them into a separate filing cabinet. If you did that, you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store them all in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimal, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the file folder for the correct subject, then find the item you're looking for in it. Depending on your needs, you can use different filing methods (data structures/algorithms) to better organize your information for its intended use.
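The "organize by subject" version of the analogy maps directly onto a dictionary of lists: the subject is the key (the folder), and each folder holds the papers on that subject. In this Python sketch the documents are made up. Finding a paper means jumping to one folder and scanning only that folder, rather than every paper.

```python
from collections import defaultdict

papers = [
    ("taxes", "2019 return"),
    ("insurance", "car policy"),
    ("taxes", "2020 return"),
    ("medical", "vaccination record"),
    ("insurance", "home policy"),
]

# One "cabinet" organized by subject: folder name -> list of papers.
cabinet = defaultdict(list)
for subject, title in papers:
    cabinet[subject].append(title)


def find(subject, title):
    # Go straight to the right folder, then look only inside it.
    # dict.get does not create empty folders for unknown subjects.
    return title in cabinet.get(subject, [])
```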
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple, though, for data structures not to be appropriate.
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requiring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficient for changes to that size (if the code was written with this in mind). With your approach, we'd have to type hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
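A small sketch of the "single constant" point, in Python for brevity. The capacity lives in one hypothetical constant, `MAX_READINGS`, and every loop is written in terms of it, so resizing means editing one line rather than adding or removing variables along with all the code that touches them.

```python
MAX_READINGS = 5  # change this one line to resize the structure

readings = [0] * MAX_READINGS  # fixed-size storage, like a C array


def record(index, value):
    if not 0 <= index < MAX_READINGS:
        raise IndexError(f"index {index} outside 0..{MAX_READINGS - 1}")
    readings[index] = value


def total():
    # Works unchanged for any MAX_READINGS, unlike reading1 + reading2 + ...
    return sum(readings[i] for i in range(MAX_READINGS))
```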
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still have a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Possibly, although it will be tiny, and it isn't guaranteed to be there. You'd save some space on data, but code size would explode. And as anyone familiar with inlining knows, small code size helps performance because it lets the code stay in the cache. Also, argument passing would result in excessive copying unless you figured out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to be broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit of data structures, in my opinion, is that you are relationally grouping your data. For instance, instead of having 10 separate variables of class MyClass, you can have a data structure that groups them all. This grouping allows certain operations (iterating, sorting, searching) to be performed because the items are structured together.
Not to mention, data structures can potentially enforce type safety, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
Console.WriteLine(s);

UnitTesting a class that returns a complex dataset

After months of frustration, and of time spent inserting needles into voodoo dolls of previous developers, I decided that it is better to try to refactor the legacy code.
I have already ordered Michael Feathers' book, I am reading Fowler's Refactoring, and I have made some sample projects with DUnit.
So even if I don't master the subject I feel it is time to act and put some ideas into practice.
Almost 100% of the code I work on has the business logic trapped in the UI; moreover, it is all procedural programming (with a few exceptions). The application started as quick & dirty and continued as such.
Writing tests for the whole application is a pointless task in my case, but I would like to try to unit test something that I need to refactor.
One of the complex tasks one big "TForm business logic class" does is to read DB data, make some computations and populate a scheduler component. I would like to move the DB reading and the computation into a new class. Of course this is a way to improve the current design, not what I would do if starting from scratch, but I'd like to do this because the data returned by this new class is also useful in other ways. For example, I've now been asked to send e-mail notifications of scheduler data.
So to avoid a massive copy and paste operation I need the new class.
Now the scheduler is populated from a huge dataset (huge both in size and in number of fields), so a first refactoring step could be obtaining the dataset from the new class. But in the future I'd rather use a new class (like TSchedulerData, or some other name less bound to the scheduler) to manage the data, so that instead of a dataset I get a TSchedulerData object as the result.
Since refactoring happens in small steps, and tests are needed to refactor well, I am a little confused about how to proceed.
The following points are not clear to me:
1) how to test a complex dataset? Should I run the working application, save one result set to xml, and write a test where I use a TClientDataSet containing that xml data?
2) How much do I have to care about TSchedulerData? I mean, I am not 100% sure I will use TSchedulerData; maybe I will stick with the dataset. In any case, the thought of creating complex tests that will be discarded in 2 weeks is not appealing for a DUnit newbie. Still, this is probably how it works. I can't imagine the number of bugs I would face without a test.
Final note: I know someone thinks rewriting from scratch is a better option, but this is not an option. "The application is huge and it is sold today and new features are required today not to get out of business". This is what I have been told, anyway refactoring can save my life and extend the application life.
Your eventual goal is to separate the UI, data storage and business logic into distinct layers.
It's very difficult to test a UI with automated testing frameworks. You'll eventually want to separate as much of the business logic from the UI as possible. This can be accomplished using one of the various Model/View/* patterns. I prefer MVP Passive View, which attempts to make the UI nothing more than an interface. If you're using a Dataset, MVP Supervising Controller may be a better fit.
Data storage needs its own suite of tests, but these are different from unit tests (though you can use the same unit testing framework) and there are usually fewer of them. You can get away with this because most of the heavy lifting is done by third-party data components and a DBMS (in your case T*Dataset). These are integration tests: basically, making sure your code plays nicely with the vendor's code. They are also needed if you have any stored procedures defined in the DB. They are much slower than unit tests and don't need to be run as often.
The business logic is what you want to test the most. Every calculation, loop or branch should have at least one test (more is preferable). In legacy code this logic often touches the UI and DB directly and does multiple things in a single function. Here Extract Method is your friend. Good places to extract methods are:
for I:=0 to List.Count - 1 do
begin
//HERE
end;
if {HERE if it's a complex condition} then
begin
//HERE
end
else
begin
//HERE
end
Answer := Var1 / Var2 + Var1 * Var3; //HERE
When you come across one of these extraction points:
Decide what you want the method signature to look like for your new method: Method name, parameters, return value.
Write a test that calls it and checks the expected outcome.
Extract the method.
If all goes well you will have a newly extracted method with at least one passing unit test.
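Here is what those three steps might look like for the `Answer := Var1 / Var2 + Var1 * Var3` line, sketched in Python with `unittest` instead of Delphi and DUnit. The names `compute_answer` and `ScheduleForm` are hypothetical. The signature is chosen first, the tests are written against it, and then the expression moves out of the form's event handler into a plain function that the tests can call without any UI.

```python
import unittest


def compute_answer(var1, var2, var3):
    """Extracted method: pure calculation, no UI or database access."""
    return var1 / var2 + var1 * var3


class ScheduleForm:
    """Hypothetical legacy form; it now delegates the calculation."""
    def __init__(self):
        self.answer_label = ""

    def on_calculate_click(self, var1, var2, var3):
        answer = compute_answer(var1, var2, var3)  # was inline here before
        self.answer_label = f"{answer:.2f}"


class ComputeAnswerTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertAlmostEqual(compute_answer(6, 3, 2), 14.0)

    def test_zero_divisor_raises(self):
        # Pins down existing behavior instead of silently changing it.
        with self.assertRaises(ZeroDivisionError):
            compute_answer(1, 0, 1)
```

The form still works as before, but the logic it used to hide can now be tested in isolation.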
Delphi's built-in Extract Method doesn't give you any way to adjust the signature, so if that's your only option you'll have to make do and fix it after extraction. You'll also want to make the new method public so your test can access it. Some people balk at making a private utility method public, but at this early stage you have little choice. Once you've made sufficient progress, you'll start to see that some utility methods you've extracted belong in their own class (in which case they'd have to be public anyway), while others can be made private/protected and tested indirectly by testing the methods that depend on them.
As your test suite grows you'll want to run them after each change to ensure your latest change hasn't broken something elsewhere.
This topic is much too large to cover completely in an answer. You'll find the vast majority of your questions are covered when that book arrives.
I'd say approach it in focussed baby steps.
Step #1: Always start by getting some tests around the area you're about to change (the TForm): regression tests, a.k.a. a safety net. In your case, figure out what the app is doing. From what I read, it seems to be a data transformer. So spend time understanding all (or the most important, if all is not feasible) combinations of input data and the corresponding output schedules. Write them up as tests. Ensure that all tests pass.
Step #2: Now attempt your refactorings. Move blocks of code into cohesive classes, etc., all under the safety of the regression net.
Testing complex datasets - testing file dumps should be the last resort. But in this case, it seems like a simple option to get started. Maybe you could later make it a first class domain object TSchedule with its own Equals() implementation. Defer design decisions/changes until you have a solid regression test suite around your area of modification.
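A sketch of the "first class domain object" suggestion, in Python. Hypothetical `ScheduleEntry` and `Schedule` classes are dataclasses, which generate field-by-field equality (the counterpart of the `Equals()` implementation mentioned above). A regression test can then compare an expected schedule with the actual one directly, and a failure points at the differing entry rather than at a line in a file dump. `build_schedule` stands in for whatever logic ends up extracted from the form.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ScheduleEntry:
    resource: str
    day: date
    hours: float


@dataclass
class Schedule:
    entries: list = field(default_factory=list)

    def add(self, resource, day, hours):
        self.entries.append(ScheduleEntry(resource, day, hours))


def build_schedule(rows):
    # Hypothetical extracted logic: turn raw DB-style rows into a Schedule,
    # skipping rows with no hours (a rule the legacy form might apply).
    schedule = Schedule()
    for resource, day, hours in rows:
        if hours:
            schedule.add(resource, day, hours)
    return schedule
```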

Difference between cfstoredproc and cfquery

I've found that previous programmers on our existing project used cfstoredproc to insert records into the database.
For example, they used it like this:
<cfstoredproc procedure="myProc" datasource="myDsn">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="ppshein" dbvarname="username">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="28 yrs" dbvarname="age">
</cfstoredproc>
I can't understand why they used the above code instead of:
<cfquery datasource="myDsn">
insert usertb
(name, age)
values
(<cfqueryparam cfsqltype="CF_SQL_CHAR" value="ppshein">, <cfqueryparam cfsqltype="CF_SQL_CHAR" value="28 yrs">)
</cfquery>
I suspect there are hidden performance differences between cfstoredproc and cfquery for complex data manipulation. Please let me know about the performance of using cfstoredproc in ColdFusion instead of cfquery for complex data manipulation. All I know is that stored procedures are reusable.
CFSTOREDPROC should have better performance for the same reason a stored procedure will have better performance at the database level -- when created, the stored procedure is optimized internally by the database.
Whether this is noticeable depends on your database and your query. Use of CFQUERYPARAM inside your CFQUERY (as in your example) also speeds execution (at the db driver level).
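The CFQUERYPARAM point carries over to other environments: bind values as parameters instead of concatenating them into the SQL text. The statement text then stays identical across calls, which lets the driver and database reuse a prepared statement or cached plan, and the values never become part of the SQL, which also guards against injection. A Python sketch with the standard `sqlite3` module, using a table shaped like the question's `usertb`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usertb (name TEXT, age TEXT)")

INSERT_USER = "INSERT INTO usertb (name, age) VALUES (?, ?)"


def add_user(conn, name, age):
    # Values travel separately from the SQL text, like <cfqueryparam>;
    # the statement text is the same on every call.
    with conn:
        conn.execute(INSERT_USER, (name, age))


add_user(conn, "ppshein", "28 yrs")
add_user(conn, "O'Brien", "41 yrs")  # the quote needs no escaping
```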
Unless the application is VERY performance-sensitive, I tend to run my SQL code in a profiler first to optimize it, then put it into my CFQUERY parametrized with CFQUERYPARAM tags, rather than use a storedproc. That way, all the logic is in the CF code. This is a matter of taste, of course, and it's easy to move the SQL into a storedproc later when the application matures.
Some shops prefer to have all data logic controlled by the database, leaving CF to act almost exclusively as the front-end generator. Some places are so controlling that they won't let you write any SQL in your CF code.
Update: There might be more to that Stored Proc than a simple INSERT INTO. There may be some data lookup in another table. There could be validation. There may be conditional logic. There may be multiple transactions going on, like a log. A failure to perform the insert may return a specific status code rather than throwing an error.
Honestly, it's simply a matter of style. There are reasons for and against either way, and I've found that it usually comes down to who has more/more competent coders: The CF guys or the database guys.
Basically, if you use a stored procedure, it will be precompiled by the database and its execution plan stored. This means that subsequent calls to the stored procedure do not incur that overhead. For large, complex queries this can be substantial.
So as a rule of thumb: queries that are...
large and complex or
called very frequently or
both of the above
...are very good candidates for conversion to a stored procedure.
Hope that helps!

Minimizing the memory footprint of an ADO.NET DataSet?

Given a legacy system that is making heavy use of DataSets and little or no possibility of replacing these with business objects or other, more efficient data structures:
Are there any techniques for reducing the memory footprint of a DataSet?
I am thinking about things like setting initial capacity (when known), removing restrictions, etc., but I have little experience with DataSets and do not know which specific options might be available to me or if any of them would matter at all.
Update:
I am aware of the long-term refactoring possibilities, but I am looking for quick fixes given a set of DataTable objects stored in a DataSet, i.e. which properties are known to affect memory overhead.
Due to the way data is stored internally, setting the initial capacity could be one method, as this would prevent the object from allocating an arbitrarily large amount of memory when adding just one more row.
This is unlikely to help you, but it can help greatly in some cases.
If you are storing a lot of identical strings in the dataset, e.g. names of towns, consider using a single string object for each distinct string.
e.g.
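var towns = new Dictionary<string, string>();
foreach (DataRow row in datatable.Rows)
{
    // "town" is the column holding the repeated values; skip DBNull
    var town = row["town"] as string;
    if (town == null)
        continue;
    string shared;
    if (towns.TryGetValue(town, out shared))
        row["town"] = shared;   // point this row at the shared instance
    else
        towns[town] = town;     // first occurrence becomes the shared instance
}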
Then the GC can reclaim most of the duplicate strings; however, this only pays off if the DataSet lives for a long time.
You may wish to do this as each row is loaded rather than afterwards, so that the duplicate string objects don't all pile up in memory first.
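The same pooling idea in Python, as a sketch: equal strings built at runtime are typically separate objects, and a dictionary that maps each value to its first instance lets every row share one object. The rows here are plain dicts with a hypothetical `town` column; `dict.setdefault` stores the first instance and returns it for every later duplicate.

```python
def pool_strings(rows, column):
    """Make equal values in `column` share a single string object."""
    pool = {}
    for row in rows:
        value = row[column]
        if value is None:
            continue
        # First occurrence is stored; later duplicates get that same object.
        row[column] = pool.setdefault(value, value)
    return len(pool)  # number of distinct strings kept


# Build equal strings at runtime so they start out as separate objects.
rows = [{"town": "".join(["Lon", "don"])} for _ in range(3)]
rows.append({"town": "".join(["Par", "is"])})
```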
1) If you're using VS2005+, you can instantiate typed DataTable objects rather than the whole DataSet. In 2003, instantiating the DataTable brings its DataSet along by default; from 2005 on, you get just the DataTable.
2) Look at your data access layer for filling the DataSets or DataTables. Most often, too much data is coming through. Make your queries more specific.
3) Make sure the code you're using does not do goofy things like copying the DataSets when they're passed around. Make sure you're using DataTable.Select or DataViews to filter and sort, rather than making copies.
There aren't a whole lot of quick "optimizations" for DataSets. If you're having trouble with memory, use items 2 and 3. This would be the case regardless of what type of data transport object you'd use.
And get good at DataSets. If you're not familiar with them, you can do silly things, like with anything. Then you'll write articles about how they suck, which are really articles about how little you know about them. They're really quite useful and simple to maintain. A couple tips:
Use typed DataSets. They'll save you gobs of coding and they're typed, which helps with simple validation.
If you're using typed DSs, make sure you don't modify the generated code file. If you're using VS2005+, you can put any custom business object behavior in the partial class for the DS (not the .designer code file).
Use DataView and .Select wherever you find yourself looping through DataRow objects.
Look around for a good code generation tool and build a rational data access framework for filling and updating from the DSs. One of the issues is that sometimes, designers tie the design of the DS directly to tables in the db, making the design brittle to data structure changes. If you -must- do that, build or use a code generator to build your data access layer from the db, like CodeSmith. Start by looking at some of the CodeSmith templates for generating stored procs and data access classes.
Remember when talking to someone about "objects" vs. "DataSets", the object in this case is the DataRow, not the DataSet. And because of the partial classes you can put behavior on the "object", getting you 95% of the benefits of "objects" for those who love writing code.
You could try making your tables and rows implement interfaces in the partial class (code-behind) files. Then, over time, change your code to use these interfaces rather than the tables/rows directly.
Once most of your code just uses the interfaces, you could use a code generator to create C# classes that implement those interfaces without the overhead of rows/tables.
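A Python sketch of that migration path, with `typing.Protocol` standing in for a C# interface. `CustomerRecord`, `DictRowAdapter` and `Customer` are hypothetical. Callers are written against the protocol, so they behave the same whether they receive a wrapper around a generic row (a dict here, playing the role of a DataRow) or a lightweight class that could later replace it.

```python
from dataclasses import dataclass
from typing import Protocol


class CustomerRecord(Protocol):
    @property
    def name(self) -> str: ...

    @property
    def balance(self) -> float: ...


class DictRowAdapter:
    """Wraps a generic row (a dict here, like a DataRow) behind the interface."""
    def __init__(self, row: dict):
        self._row = row

    @property
    def name(self) -> str:
        return self._row["name"]

    @property
    def balance(self) -> float:
        return self._row["balance"]


@dataclass
class Customer:
    """Lightweight replacement class satisfying the same interface."""
    name: str
    balance: float


def total_balance(customers: "list[CustomerRecord]") -> float:
    # Written against the interface only, so either implementation works.
    return sum(c.balance for c in customers)
```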
However, it may be cheaper just to move to 64-bit and buy more RAM...
