Given a legacy system that is making heavy use of DataSets and little or no possibility of replacing these with business objects or other, more efficient data structures:
Are there any techniques for reducing the memory footprint of a DataSet?
I am thinking about things like setting initial capacity (when known), removing restrictions, etc., but I have little experience with DataSets and do not know which specific options might be available to me or if any of them would matter at all.
Update:
I am aware of the long-term refactoring possibilities, but I am looking for quick fixes given a set of DataTable objects stored in a DataSet, i.e. which properties are known to affect memory overhead.
Due to the way data is stored internally, setting the initial capacity could be one method, as this would prevent the object from allocating an arbitrarily large amount of memory when adding just one more row.
This is unluckily to help you, but it can help greatly in same cases.
If you are storing a lot of strings that are the same in the dataset, e.g. names of Towns, look at only using a single string object with each distinct string.
e.g.
Directory <string, string> towns = new Directory <string, string>();
foreach(var row in datatable)
{
if (towns.contains(row.town))
{
row.town = towns[row.town]
}
else
{
towns[row.town] = row.town;
}
}
Then the GC can reclaim most of the duplicate strings, however this only works if the datasets lives for along time.
You may wish to do this in the rowCreated event, so that all the duplicate string objects are not created in first place.
If you're using VS2005+, you can instantiate DataTable objects, rather than the whole DataSet. In 2003, if the DataTable is instantiated, it comes with DataSet by default. 2005 and after, you get just the DataTable.
Look at your Data Access layer for filling the DataSets or DataTables. It's most often the case that there is too much data coming through. Make your queries more specific.
Make sure the code you're using does not do goofy things like copy the DataSets when they're passed around. Make sure you're using .Select statements or DataViews to filter and sort, rather than making copies.
There aren't a whole lot of quick "optimizations" for DataSets. If you're having trouble with memory, use items 2 and 3. This would be the case regardless of what type of data transport object you'd use.
And get good at DataSets. If you're not familiar with them, you can do silly things, like with anything. Then you'll write articles about how they suck, which are really articles about how little you know about them. They're really quite useful and simple to maintain. A couple tips:
Use typed DataSets. They'll save you gobs of coding and they're typed, which helps with simple validation.
If you're using typed DSs, make sure you don't modify the generated code file. If you're using VS2005+, you can put any custom business object behavior in the partial class for the DS (not the .designer code file).
Use DataView and .Select wherever you find yourself looping through DataRow objects.
Look around for a good code generation tool and build a rational data access framework for filling and updating from the DSs. One of the issues is that sometimes, designers tie the design of the DS directly to tables in the db, making the design brittle to data structure changes. If you -must- do that, build or use a code generator to build your data access layer from the db, like CodeSmith. Start by looking at some of the CodeSmith templates for generating stored procs and data access classes.
Remember when talking to someone about "objects" vs. "DataSets", the object in this case is the DataRow, not the DataSet. And because of the partial classes you can put behavior on the "object", getting you 95% of the benefits of "objects" for those who love writing code.
You could try making your tables and rows implement interfaces in the code behind files. Then over time change your code to make use of these interfaces rather then the table/rows directly.
Once most of your code just uses the interfaces, you could use a code generate to create C# class that implement those interfaces without the overhead of rows/tables.
However it may be cheaper just to move to 64 bit and buy more ram...
Related
First, I understand the difference between value and reference types -this isn't that question. I am rewriting some of my code in Swift, and decided to also refactor some of the classes. Therefore, I thought I would see if some of the classes make sense as structs.
Memory: I have some model classes that hold very large arrays, that are constantly growing in size (unknown final size), and could exist for hours. First, are there any guidelines about a suggested or absolute size for a struct, since it lives on the stack?
Refactoring Use: Since I'm refactoring what right now is a mess with too much dependency, I wonder how I could improve on that. The views and view controllers are mostly easily, it's my model, and what it does, that's always left me wishing for better examples to follow.
WorkerManager: Singleton that holds one or two Workers at a time. One will always be recording new data from a sensor, and the other would be reviewing stored data. The view controllers get the Worker reference from the WorkerManager, and ask the Worker for the data to be displayed.
Worker: Does everything on a queue, to prevent memory access issues (C array pointers are constantly changing as they grow). Listening: The listening Worker listens for new data, sends it to a Processor object (that it created) that cleans up the data and stores it in C arrays held by the Worker. Then, if there is valid data, the Worker tells the Analyzer (also owned by the worker) to analyze the data and stores it in other C arrays to be fed to views. Both the Processor and Analyzer need state to know what has happened in the past and what to process and analyze next. The pure raw data is stored in a separate Record NSManaged object. Reviewer Takes a Record and uses the pure raw data to recreate all of the analyzed data so that it can be reviewed. (analyzed data is massive, and I don't want to store it to disk)
Now, my second question is, could/should Processor and Analyzer be replaced with structs? Or maybe protocols for the Worker? They aren't really "objects" in the normal sense, just convenient groups of related methods and the necessary state. And since the code is nearly a thousand lines for each, and I don't want to put it all in one class, or even the same file.
I just don't have a good sense of how to remove all of my state, use pure functions for all of the complex mathematical operations that are performed on the arrays, and where to put them.
While the struct itself lives on the stack, the array data lives on the heap so that array can grow in size dynamically. So even if you have an array with million items in it and pass it somewhere, none of the items are copied until you change the new array due to the copy-on-write implementation. This is described in details in 2015 WWDC Session 414.
As for the second question, I think that 2015 WWDC Session 414 again has the answer. The basic check that Apple engineers recommend for value types are:
Use a value type when:
Comparing instance data with == makes sense
You want copies to have independent state
The data will be used in code across multiple threads
Use a reference type (e.g. use a class) when:
Comparing instance identity with === makes sense
You want to create shared, mutable state
So from what you've described, I think that reference types fit Processor and Analyzer much better. It doesn't seem that copies of Processor and Analyzer are valid objects if you've not created new Producers and Analyzers explicitly. Would you not want the changes to these objects to be shared?
I'm doing scientific research, processing through millions of combinations of multi-megabyte arrays.
For you to be capable of answering this question you will need to have knowledge/experience of all of the following
how HHVM is able to cache data structures in RAM between requests
how to tell HHVM data structures will be constant
how to declare array index and value types
I need to process the entire arrays, so it's a lot of data to be loaded and processed. (millions of requests within minutes on a LAN). The faster I can complete requests the quicker I can complete my work. If HHVM has to do work loading this data on each request, it accounts for a significant fraction of the time to complete the request (sometimes more than half, it depends on the complexity of the analysis I'm doing at the time).
I have found a method that has allowed me to keep these data structures cached in RAM (no loading from files, interpreting code, pushing to the array hundreds of thousands of times for no reason, no pointless repetitive unserialize etc), and thus I have eliminated this massive measurable delay.
I have 3 questions regarding how I can make this even faster:
Is the way I'm doing it now creating a global scope penalty?
How can I declare my arrays as constant and tell HHVM what data types to expect?
If I declare my arrays as constant is it even necessary to declare the types for HHVM?
Instead of using nested arrays, if I use 3 separate data structures ImmVector, PackedArray, or define a class would it be faster?
Keep in mind that anything that prevents HHVM from caching the data structure in RAM between requests should be regarded as unacceptable.
Lookuptable35543.php
<?php
$data = [
["uuid (20 chars)", 5336, 7373],
["uuid (20 chars)", 5336, 7373],
#more lines as above
];
?>
Some of these files are many MB in size and there are a lot of them
Main.php
<?php
function main() {
require /path/to/Lookuptable35543.php;
#(Do stuff with $data)
}
?>
This is working quite well, as Main.php gets thousands of requests, in a short period of time, HHVM keeps Lookuptable.php's data structure in memory. Avoiding pointless processing and IO, as it just sits in RAM, ready for use. (I have more than enough RAM)
Unfortunately, the only way I know how to make HHVM hold the lookup table in RAM is, I set $data in the global scope inside my lookup####.php file (then require the lookup file into a function in the data processing file: Main.php)? This way HHVM doesnt bother loading the file or re executing the code to create $data, because it can see that $data can be determined at compile time, and it will not ever change during runtime. This works but I dont know if there is a penalty from having the $data exist in the lookup###.php file's global scope. (Or maybe its not global at all because it is required into main.php's function?)
What if I return $data from a function inside Lookup.php and call that function from Main.php like this
Main.php
Would the HHVM JIT the result of getData() in RAM?
Somehow I associate functions with unpredictability... but maybe HHVM is clever enough to know that the functions result can be determined at compile time, and never changes?
I can't put the lookup table inside Main.php because I require different lookup tables based on the type of request.
Is there a way I can tell HHVM that my outer array will always have an integer index that never changes, and the values of the outer array will always be an array?
Perhaps I need to use ImmVector?
Then is there a way to tell HHVM that my inner array will always be a fixed length string followed by 2 integers, always, no extra elements, contents never changes?
I'd prefer not to use OO or create a class. How can I declare types, procedural style?
If a class is absolutely necessary can you please give example code suitable for my requirements above?
Will it be faster if I dont nest arrays?
I just realized I could have one array with integer index and values of fixed length string. Then a 2nd array with integer index and integer values, and a 3rd one with integer index and integer values.
If you're not familiar with this HHVM caching technique please do not waste mutual time suggesting a database, redis, APC, unserialize, etc. The fastest is for HHVM to just keep my various $data variables in RAM. Even unserializing $data from a ramdisk file is slow, because then the entire data structure must be parsed as a string and converted into a data structure in memory for every request. APC has the same problem as far as i know. I dont want to even have to copy $data. The lookup tables are immutable, read only. They must just stay fully structured in RAM. My current caching solution (at the top of this question) has already given me huge gains, but as per my 3 questions I think there may be more gains to be had?
Incase you're wondering, I have measured the latency of various data loading or caching methods.
Now I basically want to keep the caching situation I have, but give the HHVM JIT maximum confidence about how to type my data, so it can save time not running type or even bound (array size) checks.
Edit
Ok so nobody has been able to give me any code examples yet, so I'm just trying stuff out.
Here's what I've found out so far.
const arrays don't work yet in HHVM. const foo = ['uuid1',43,43];
throws an error about HHVM only supporting constants with scalar values.
Vector with Array values: I don't know how it will perform yet... I expect it will be better than a normal array. This is valid HH code.
This is progress, because HHVM should be able to cache this in the same way, HHVM knows this whole structure is constant, and HHVM knows the indexes are all integers.
What I'm still not entirely happy about with this structure is this:
Consider this code
for ($n=0;$n<count($iv);++$n) if ($x > $iv[$n][1]) dosomething();
Will HHVM perform a type check on $if[$n][2] on every loop iteration?
In my definition of $iv above, there is nothing that says the 2nd element of the inner array will be an integer.
How can I improve on this?
Can disabling the type checker be of any use? Does this only hide errors from the external type checker, or does it prevent HHVM from constantly doing type checks? (I'm thinking it's the first thing)
Perhaps if I could make my own user-defined type that would solve the problem?
<?hh
#I don't know what mechanisms for UDT's exist, so this code is made-up
CreateUDT foo = <string,int,int>;
$iv = ImmVector<foo> {
['uuid1',425,244],
['uuid2',658,836]
};
print_r($iv);
I found a reference to this at Hack Collections Literal Syntax Vector<Foo> unfortunately it might not be available to use yet.
I'm a software engineer at Facebook working on HHVM.
This entire question reeks of premature optimization to me. Have you done profiling and determined that loading this array is actually a bottleneck for your app? (Not just microbenchmarks, but how it actually affects the performance, latency, RPS, etc of realistic pageloads.) And also isolated from other effects, e.g., if this array is a cache or some sort of precomputed data, you need to isolate the win of precomputing the data from the actual time to load it by caching it in various different ways.
In general, HHVM is very good at dealing with arrays, since they are so hot in nearly every codepath -- and in particular at constant arrays like this one. To your questions about how to inform it of the shape and types of things in the arrays, HHVM can figure that all out for itself, and is very good at doing so on constant arrays composed entirely of constants. (And the ways it thinks about arrays aren't quite the ways you think about arrays, so it can probably do a better job anyway!) Basically, unless profiling says this is actually a hotspot -- which I'm pretty skeptical of -- I wouldn't worry too much about it. A couple general notes to be aware of:
Measure every performance diff. Don't prematurely optimize -- use profiling to guide. The developer productivity lost by premature optimizations getting in the way can be lethal.
Get things out of toplevel ("pseudomains") as much as possible. A function which returns a static or constant array should be just fine, and will in general help HHVM optimize code even better.
Avoid references as much as possible, especially in this array if you care about performance so much.
You probably should look into repo authoritiative mode which can help HHVM optimize lots of things even more -- but in particular for this case, the more aggressive inlining that repo auth mode can do might be a win.
Edit, aside:
because then the entire data structure must be parsed as a string and converted into a data structure in memory for every request. APC has the same problem as far as i know
This is exactly what I mean by premature optimization: you're rejecting APC without even trying it, even if it might be a cleaner way of doing what you want. It turns out that, in most cases, HHVM actually can optimize away the serialization/deserialization of storing arrays in APC, particularly if they are constant arrays that are never modified. As above, HHVM is very good at optimizing lots of common patterns. Just write code that's clean, profile it, and fix the hotspots.
Okay I've solved my first question.
I don't have any global scope issues. My require is being done from inside function main(), so it's as if the code from lookuptable####.php is being inserted into function main().
HHVM docs: "If the include occurs inside a function..."
Basically if you were to open lookuptable####.php it looks like the code is in global scope, but that's not the file that is being requested from hhvm. main.php is the one being requested, thus there is no code in global scope.
I think I've answered my 2nd question, it's currently at the bottom of my question. I'm not 100% convinced, but I'm pretty happy to move ahead and test it.
I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for a single object? Wouldn't that speed up access time? Eg: BST possibly has to run through some pointers first before it gets to the actual data.
Except for when dynamic allocation is needed, is there a reason to use them?
Eg: using linked list/ BST / std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is being kept in it's own variable (or storage location). Data structures apply organization to your data. Imagine if you had 10,000 things you were trying to track. You could store them in 10,000 separate variables. If you did that, then you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to modify the code to change the way in which the calculations are done if the order of the items changes because the new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to both be more organized and extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help to understand. You could, for example, organize all of your important papers by putting each of them into separate filing cabinet. If you did that you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store each in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimum, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the file folder for the correct subject, then find the item you're looking for in it. Depending on your needs you can use different filing methods (data structures/algorithms) to better organize your information for it's intended use.
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple though before the introduction of data structures wouldn't be appropriate.
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficent for changes to that size (if the code was written with this in mind). With your approach, we'd have to type hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still got a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Well, possible, although it will be tiny. But it isn't guaranteed to be there. You'd save some space on data, but code size would explode. And as everyone informed about inlining should know, small code sizes are very useful for performance to allow the code to be in the cache. Also, argument passing would result in excessive copying unless you'd figure out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to being broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit to datastructures, in my opinion, is that you are relationally grouping them. For instance, instead of having 10 separate variables of class MyClass, you can have a datastructure that groups them all. This grouping allows for certain operations to be performed because they are structured together.
Not to mention, having datastructures can potentially enforce type security, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
Console.WriteLine(s);
After months of frustration and of time spent in inserting needles in voodoo dolls of previous developers I decided that it is better try to refactor the legacy code.
I already ordered Micheal Feather's book, I am into Fowler's refactoring and I made some sample projects with DUnit.
So even if I don't master the subject I feel it is time to act and put some ideas into practice.
Almost 100% of the code I work on has the business logic trapped in the UI, moreover all is procedural programming (with some few exceptions). The application started as quick & dirty and continued as such.
Now writing tests for all the application is a meaningless task in my case, but I would like to try to unittest something that I need to refactor.
One of the complex tasks one big "TForm business logic class" does is to read DB data, make some computations and populate a scheduler component. I would like to remove the reading DB data and computation part and assign to a new class this task. Of course this is a way to improve the current design, it is not the best way for starting from scratch, but I'd like to do this because the data returned by this new class is useful also in other ways, for example now I've been ask to send e-mail notifications of scheduler data.
So to avoid a massive copy and paste operation I need the new class.
Now the scheduler is populated from a huge dataset (huge in size and in number of fields), probably a first refactoring step could be obtaining the dataset from the new class. But then in the future I'd better use a new class (like TSchedulerData or some other name less bound to scheduler) to manage the data, and instead of having a dataset as result i can have a TSchedulerData object.
Since refactor occurs at at small steps and tests are needed to refactor better I am a little confused on how to proceed.
The following points are not clear to me:
1) how to test a complex dataset? Should I run the working application, save one result set to xml, and write a test where I use a TClientDataSet containing that xml data?
2) How much do I have to care about TSchedulerData? I mean I am not 100% sure I will use TSchedulerData, may be I will stick with the Dataset, anyway thinking of creating complex tests that will be discarded in 2 weeks is not appealing for a DUnitNewbee. Anyway probably this is how it works. I can't imagine the number of bugs that I would face without a test.
Final note: I know someone thinks rewriting from scratch is a better option, but this is not an option. "The application is huge and it is sold today and new features are required today not to get out of business". This is what I have been told, anyway refactoring can save my life and extend the application life.
Your eventual goal is to separate the UI, data storage and business logic into distinct layers.
Its very difficult to test a UI with automatic testing frameworks. You'll want to eventually separate as much of the business logic from the UI as possible. This can be accomplished using one of the various Model/View/* patterns. I prefer MVP passive view, which attempts to make the UI nothing more than an interface. If you're using a Dataset MVP Supervising Controller may be a better fit.
Data storage needs to have its own suite of tests but these are different from unit tests (though you can use the same unit testing framework) and there are usually fewer of them. You can get away with this because most of the heavy lifting is being done by third party data components and a dbms (in your case T*Dataset). These are integration tests. Basically making sure your code plays nice with the vendor's code. Also needed if you have any stored procedures defined in the DB. They are much slower that unit tests and don't need to be run as often.
The business logic is what you want to test the most. Every calculation, loop or branch should have at least one test(more is preferable). In legacy code this logic often touches the UI and db directly and does multiple things in a single function. Here Extract Method is your friend. Good places to extract methods are:
for I:=0 to List.Count - 1 do
begin
//HERE
end;
if /*HERE if its a complex condition*/ then
begin
//HERE
end
else
begin
//HERE
end
Answer := Var1 / Var2 + Var1 * Var3; //HERE
When you come across one of these extraction points
Decide what you want the method signature to look like for your new method: Method name, parameters, return value.
Write a test that calls it and checks the expected outcome.
Extract the method.
If all goes well you will have a newly extracted method with at least one passing unit test.
Delphi's built in Extract Method doesn't give you any way to adjust the signature so if that's your own option you'll have to make do and fix it after extraction. You'll also want to make the new method public so your test can access it. Some people balk at making a private utility method public but at this early stage you have little choice. Once you've made sufficient progress you'll start to see that some utility methods you've extracted belong in their own class (in which case they'd have to be public anyway) while others can be made private/protected and tested indirectly by testing methods that depend on them.
As your test suite grows you'll want to run them after each change to ensure your latest change hasn't broken something elsewhere.
This topic is much too large to cover completely in an answer. You'll find the vast majority of your questions are covered when that book arrives.
I'd say approach it in focussed baby steps.
Step#1: Should always be to get some tests around your area of invasion TForm - regression tests aka safety net. In your case, sense what the app is doing. From what I read, it seems to be a data transformer. So spend time to understand all (or most important if all is not feasible) combinations of input data and the corresponding output schedules. Write them up as tests. Ensure that all tests pass.
Step#2: Now attempt your refactorings. Move blocks of code into cohesive classes etc all under the safety of the regression net.
Testing complex datasets - testing file dumps should be the last resort. But in this case, it seems like a simple option to get started. Maybe you could later make it a first class domain object TSchedule with its own Equals() implementation. Defer design decisions/changes until you have a solid regression test suite around your area of modification.
I've found previous programmers using cfstoredproc in our existing project about insert records into database.
Just example he/she used like that:
<cfstoredproc procedure="myProc" datasource="myDsn">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="ppshein" dbvarname="username">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="28 yrs" dbvarname="age">
</cfstoredproc>
I can't convince why he/she used above code instead of:
<cfquery datasource="myDsn">
insert usertb
(name, age)
values
(<cfqueryparam cfsqltype="CF_SQL_CHAR" value="ppshein">, <cfqueryparam cfsqltype="CF_SQL_CHAR" value="28 yrs">)
</cfquery>
I feel there will be the hidden performance using cfstoredproc and cfquery for complex data manipulation. Please let me know the performance using cfstoredproc in coldfusion instead of cfquery for complex data manipulation. What I know is reusable.
CFSTOREDPROC should have better performance for the same reason a stored procedure will have better performance at the database level -- when created, the stored procedure is optimized internally by the database.
Whether this is noticeable depends on your database and your query. Use of CFQUERYPARAM inside your CFQUERY (as in your example) also speeds execution (at the db driver level).
Unless the application is VERY performance-sensitive, I tend to run my SQL code in a profiler first to optimize it, then put it into my CFQUERY parametrized with CFQUERYPARAM tags, rather than use a storedproc. That way, all the logic is in the CF code. This is a matter of taste, of course, and it's easy to move the SQL into a storedproc later when the application matures.
Some shops prefer to have all data logic controlled by the database, leaving CF to act almost exclusively as the front-end generator. Some places are so controlling that they won't let you write any SQL in your CF code.
Update: There might be more to that Stored Proc than a simple INSERT INTO. There may be some data lookup in another table. There could be validation. There may be conditional logic. There may be multiple transactions going on, like a log. A failure to perform the insert may return a specific status code rather than throwing an error.
Honestly, it's simply a matter of style. There are reasons for and against either way, and I've found that it usually comes down to who has more/more competent coders: The CF guys or the database guys.
Basically if you use a stored procedure, the stored procedure will be pre-complied by the database and the execution plan stored. This means that subsequent calls to the stored procedure do not incurr the that overhead. For large complex queries this can be substantial.
So as a rule of thumb: queries that are...
large and complex or
called very frequently or
both of the above
...are very good candidates for conversion to a stored procedure.
Hope that helps!