I've found that a previous programmer used cfstoredproc in our existing project to insert records into the database.
For example, he/she used something like this:
<cfstoredproc procedure="myProc" datasource="myDsn">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="ppshein" dbvarname="username">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="28 yrs" dbvarname="age">
</cfstoredproc>
I can't see why he/she used the above code instead of:
<cfquery datasource="myDsn">
insert usertb
(name, age)
values
(<cfqueryparam cfsqltype="CF_SQL_CHAR" value="ppshein">, <cfqueryparam cfsqltype="CF_SQL_CHAR" value="28 yrs">)
</cfquery>
I feel there may be a hidden performance difference between cfstoredproc and cfquery for complex data manipulation. Please let me know how cfstoredproc performs in ColdFusion compared to cfquery for complex data manipulation. The only advantage I know of is reusability.
CFSTOREDPROC should have better performance for the same reason a stored procedure will have better performance at the database level -- when created, the stored procedure is optimized internally by the database.
Whether this is noticeable depends on your database and your query. Use of CFQUERYPARAM inside your CFQUERY (as in your example) also speeds execution (at the db driver level).
Unless the application is VERY performance-sensitive, I tend to run my SQL code in a profiler first to optimize it, then put it into my CFQUERY parametrized with CFQUERYPARAM tags, rather than use a storedproc. That way, all the logic is in the CF code. This is a matter of taste, of course, and it's easy to move the SQL into a storedproc later when the application matures.
Some shops prefer to have all data logic controlled by the database, leaving CF to act almost exclusively as the front-end generator. Some places are so controlling that they won't let you write any SQL in your CF code.
Update: There might be more to that Stored Proc than a simple INSERT INTO. There may be some data lookup in another table. There could be validation. There may be conditional logic. There may be multiple transactions going on, like a log. A failure to perform the insert may return a specific status code rather than throwing an error.
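For illustration, here is a rough sketch of what such a procedure could contain; the usertb table comes from the question, but the user_audit table, the column sizes and the status codes are all assumptions made up for this example:
CREATE PROCEDURE myProc
    @username CHAR(50),
    @age CHAR(10)
AS
BEGIN
    SET NOCOUNT ON;

    -- validation: refuse duplicates with a status code instead of an error
    IF EXISTS (SELECT 1 FROM usertb WHERE name = @username)
        RETURN 1;

    INSERT INTO usertb (name, age) VALUES (@username, @age);

    -- extra work the calling page never sees, e.g. an audit log entry
    INSERT INTO user_audit (name, action, logged_at)
    VALUES (@username, 'insert', GETDATE());

    RETURN 0;
END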
Honestly, it's simply a matter of style. There are reasons for and against either way, and I've found that it usually comes down to who has more (or more competent) coders: the CF guys or the database guys.
Basically, if you use a stored procedure, it will be pre-compiled by the database and the execution plan stored. This means that subsequent calls to the stored procedure do not incur that overhead. For large, complex queries this can be substantial.
So as a rule of thumb: queries that are...
large and complex or
called very frequently or
both of the above
...are very good candidates for conversion to a stored procedure.
Hope that helps!
I need to write a long procedure which generates a report for a company.
Since the report involves fetching multiple sets of data, I have written many small procedures to fetch the different records.
Is it the right approach to write many sub-programs and call them from the main procedure?
Please help, or is there another way to do this?
Unless you really go wild (**) and build a 'tree' of stored procedures, each calling the next, I don't see any problems with this. There might in fact be benefits to this, as:
it's easier to maintain smaller pieces of code
(re)compilation of smaller stored procedures is going to be faster
**: There is a 'limit' in MSSQL in that the stack is limited to 32 levels. That is, if procedure1 calls procedure1_1 and that procedure calls procedure1_1_1 and that one calls another etc... you'll get an error when you get over 32 calls 'deep'. Calling multiple stored procedures sequentially isn't a problem though.
The only thing to keep in mind is the scope of the variables/temporary tables you're using. If you want to pass values around, you'll need to use parameters (using OUTPUT parameters can be useful for keeping track of a row count, for instance).
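As a small sketch of that kind of hand-off (the procedure, table and parameter names here are invented), one sub-procedure can report a row count back to the main report procedure through an OUTPUT parameter:
CREATE PROCEDURE GetCompanyOrders
    @CompanyId INT,
    @RowCount INT OUTPUT
AS
BEGIN
    SELECT * FROM Orders WHERE CompanyId = @CompanyId;
    SET @RowCount = @@ROWCOUNT;
END
GO

CREATE PROCEDURE CompanyReport
    @CompanyId INT
AS
BEGIN
    DECLARE @OrderCount INT;
    EXEC GetCompanyOrders @CompanyId, @OrderCount OUTPUT;
    -- @OrderCount is now available to the rest of the report logic
    SELECT @OrderCount AS OrderCount;
END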
I have some code like this:
var
  dxMemOrdered: TdxMemData;
...
while not qrySandbox2.EOF do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;
This code executes in a thread. When there are a huge number of records, say 400,000, it takes around 25 minutes to loop through them. Is there any way I can reduce the time by optimizing the loop? Any help would be appreciated.
Update:
Based on the suggestions, I made the following changes:
var
  dxMemOrdered: TdxMemData;
...
qrySandbox2.DisableControls;
try
  while not qrySandbox2.Recordset.EOF do
  begin
    dxMemOrdered.Append;
    dxMemOrderedTotal.AsCurrency := qrySandbox2.Recordset.Fields['TOTAL'].Value;
    dxMemOrdered.Post;
    qrySandbox2.Next;
  end;
finally
  // make sure controls are re-enabled even if the loop raises
  qrySandbox2.EnableControls;
end;
and my processing time improved from 15 minutes to 2 minutes. Thank you, guys.
Without seeing more code, the only suggestion I can make is to make sure that any visual control that is using the memory table is disabled. Suppose you have a cxGrid called Grid that is linked to your dxMemOrdered memory table:
var
dxMemOrdered: TdxMemData;
...
Grid.BeginUpdate;
try
  while not qrySandbox2.EOF do
  begin
    dxMemOrdered.Append;
    dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
    dxMemOrdered.Post;
    qrySandbox2.Next;
  end;
finally
  Grid.EndUpdate;
end;
Some ideas, in order of performance gain versus the work required from you:
1) Check whether the SQL dialect you are using lets you do the copy directly with an INSERT INTO ... SELECT statement. This depends on the database you're using.
2) Make sure that your datasets are not coupled to visual controls, or that you call DisableControls/EnableControls around this loop.
3) Does this code have to run in the main program thread? Maybe you can send it off to a separate thread while the user/program continues doing something else.
4) When you have to deal with really large data, bulk insertion is the way to go. Many databases have options to bulk insert data from text files. Writing to a text file first and then bulk inserting is way faster than individual inserts. Again, this depends on your database type.
[Edit: I just saw you added the info that it's a TdxMemData, so some of these no longer apply. And you're already threading; missed that ;-). I'll leave these suggestions in for other readers with similar problems.]
It's much better to let SQL do the work instead of iterating through a loop in Delphi. Try a query such as:
insert into dxMemOrdered (total)
select total from qrySandbox2
Is 'total' the only field in dxMemOrdered? I hope it's not the primary key, otherwise you are likely to have collisions, meaning that rows will not be added.
There's actually a lot you could do to speed up your thread.
The first would be to look at the problem in a broader perspective:
Am I fetching data from a cached/fast disk, possibly already moved into memory?
Am I doing the right thing by aggregating totals by hand? SQL engines are especially optimized for that; all you'd need to do is define an additional logical field in which to store the SQL-aggregated result.
Another little optimization that may bring an improvement when looping over large amounts of data is to avoid constructs like:
Recordset.Fields['TOTAL'].Value
Recordset.FieldByName('TOTAL').Value
but to add the fields with the fields editor and then access the right field directly. You'll save a whole loop through the fields collection, which is otherwise performed for every field access, on every record.
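A minimal sketch of the loop body with a persistent field, assuming the fields editor generated the default component name qrySandbox2TOTAL for the TOTAL column (adjust the name to whatever it generated for you):
while not qrySandbox2.EOF do
begin
  dxMemOrdered.Append;
  // qrySandbox2TOTAL is the persistent TField component created at design
  // time, so no by-name lookup happens here on every record
  dxMemOrderedTotal.AsCurrency := qrySandbox2TOTAL.AsCurrency;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;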
I came across this post while I was looking for ways to improve performance. Currently, in my application we are returning IList<> all over the place. Is it a good idea to change all of these to return AsQueryable()?
Here is what I found:
AsQueryable() - The context needs to be open and you cannot control the lifetime of the database context; it needs to be disposed of properly. It also uses deferred execution ('faster filtering' as compared to lists).
IList<> - This should be preferred over List<> as it provides a bare-bones and lightweight abstraction.
Also, when should one be preferred over the other? I know the basics, but I am still not clear on when and how we should use them correctly in an application. It would be great to know this so that next time I can keep it in mind before returning anything. Thanks a lot.
Basically, you should try to reference the widest type you need. For example, if some variable is declared as List<...>, you put a constraint for the type of the values that can be assigned to it. It may happen that you need only sequential access, so it would be enough to declare the variable as IEnumerable<...> instead. That will allow you to assign the values of other types to the variable, as well as the results of LINQ operations.
If you see that your variable needs access by index, you can again declare it as IList<...> and not just List<...>, allowing other types that implement IList<...> to be assigned to it.
For function return types, it's up to you. If you think it's important that the function returns exactly List<...>, declare it to return exactly List<...>. If the only important thing is access to the result by index, perhaps you don't need to constrain yourself to returning exactly List<...>; you may declare the return type as IList<...> (but actually return an instance of List<...> in this implementation, and possibly an instance of some other type supporting IList<...> later). Again, if you see that the only important thing about the return value of your function is that it can be enumerated (and access by index is not needed), you should change the function return type to IEnumerable<...>, giving yourself more freedom.
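A small illustration of the idea; the Order type and the method names below are invented for this sketch:
using System.Collections.Generic;

public class Order { public decimal Total { get; set; } }

public class OrderRepository
{
    // Callers only need to enumerate the results, so the return type is
    // widened to IEnumerable<Order>, even though a List<Order> is built inside.
    public IEnumerable<Order> GetRecentOrders()
    {
        var orders = new List<Order>();
        // ... fill the list from wherever ...
        return orders;
    }

    // Callers need access by index, so widen only as far as IList<Order>.
    public IList<Order> GetOrdersForPaging()
    {
        return new List<Order>();
    }
}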
Now, about AsQueryable: again, it depends on your logic. If you think that possible delayed evaluation is a good thing in your case, as it may help avoid unneeded calculations, or you intend to use it as part of another query, use it. If you think that the results have to be "materialized", i.e. calculated at this very moment, you would be better off returning a List<...>. You would especially need to materialize your result if evaluating it later could produce a different list!
With a database, a good rule of thumb is to use AsQueryable for short-term intermediate results, but List for the "final" results which will be used over some longer time. Of course, having a non-materialized query hanging around makes closing the database impossible (since at the moment of actual evaluation the database must still be open).
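To make the difference concrete, here is a minimal sketch reusing the Order type from the example above; the orders parameter stands in for an ORM query (or simply someList.AsQueryable()):
using System.Collections.Generic;
using System.Linq;

public static class QueryTiming
{
    public static void Run(IQueryable<Order> orders)
    {
        // Deferred: nothing has executed yet, so whatever sits behind
        // 'orders' (e.g. a database context) must still be alive when
        // this is finally enumerated.
        IQueryable<Order> big = orders.Where(o => o.Total > 100);

        // Materialized: executed right now; the snapshot is safe to use
        // after the context is disposed and won't change if the source does.
        List<Order> snapshot = big.ToList();
    }
}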
If you do not intend to run any further queries against SQL Server, then you should return IList<>, because it produces in-memory data.
If you are concerned about performance, you should also try to run your queries in as few DB requests as possible and cache the most-used queries. It is very common to significantly reduce request processing time using batch approaches.
Which ORM do you use to retrieve data from DB? If you use NHibernate, see this post about how to use Future, Multi Criteria 1, Multi Criteria 2 and Multi Query.
Greetings.
I'm trying to perform a daily operation on a larger than normal dataset (2m+ records). However, Rails seems to take a very long time performing operations on such a dataset. Operations like
Dataset.all.each do |data|
...
end
take a very long time to complete (I assume this is because it can't fit all the items into memory at once, right?).
Does anyone have any strategies on how I could handle this situation? I know SQL would probably speed up the process, but I'm looking to use the Rails environment as I can do many more complicated things to the data than I can with just SQL statements.
You want to use ActiveRecord's find_each for this.
Dataset.find_each do |data|
...
end
When processing a large set of rows, a database is very fast and efficient; it's what they were designed for. I would recommend attempting to do all this processing in SQL if you want maximum performance. If you prefer to use Rails, or it is impossible to do everything you want in SQL, you might attempt to do some pre-processing in SQL and the remainder in Rails. Short of that, 2m+ rows is a lot to loop over; even if each one only takes a fraction of a second, it adds up to a long time.
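For example, a sketch of that split in Rails 3+ syntax (the needs_processing flag and the batch size are made up for illustration): let SQL do the cheap filtering, then walk the remaining rows in fixed-size batches so they never all sit in memory at once.
Dataset.where(needs_processing: true).find_each(batch_size: 1000) do |data|
  # the more complicated Ruby-side work stays here
end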
Given a legacy system that is making heavy use of DataSets and little or no possibility of replacing these with business objects or other, more efficient data structures:
Are there any techniques for reducing the memory footprint of a DataSet?
I am thinking about things like setting initial capacity (when known), removing restrictions, etc., but I have little experience with DataSets and do not know which specific options might be available to me or if any of them would matter at all.
Update:
I am aware of the long-term refactoring possibilities, but I am looking for quick fixes given a set of DataTable objects stored in a DataSet, i.e. which properties are known to affect memory overhead.
Due to the way data is stored internally, setting the initial capacity could be one method, as this would prevent the object from allocating an arbitrarily large new block of memory when adding one more row pushes it past its current capacity.
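For example, a DataTable exposes this through its MinimumCapacity property (the table name and row count below are just illustrative):
using System.Data;

DataTable orders = new DataTable("Orders");
// Reserve storage for the rows you expect up front, so the table does not
// have to keep growing its internal row buffers as rows are added.
orders.MinimumCapacity = 500000;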
This is unlikely to help you, but it can help greatly in some cases.
If you are storing a lot of identical strings in the dataset, e.g. names of towns, look at using only a single string object for each distinct string.
e.g.
// Re-use a single string instance for each distinct town name
// ("Town" is the column that holds the town name).
Dictionary<string, string> towns = new Dictionary<string, string>();
foreach (DataRow row in dataTable.Rows)
{
    string town = (string)row["Town"];
    if (towns.ContainsKey(town))
    {
        row["Town"] = towns[town];   // point the row at the shared string
    }
    else
    {
        towns[town] = town;
    }
}
Then the GC can reclaim most of the duplicate strings; however, this only works if the dataset lives for a long time.
You may wish to do this in the rowCreated event, so that the duplicate string objects are not created in the first place.
If you're using VS2005+, you can instantiate DataTable objects rather than the whole DataSet. In 2003, if a DataTable is instantiated, it comes with a DataSet by default. In 2005 and later, you get just the DataTable.
1) Look at your Data Access layer for filling the DataSets or DataTables. It's most often the case that there is too much data coming through. Make your queries more specific.
2) Make sure the code you're using does not do goofy things like copy the DataSets when they're passed around.
3) Make sure you're using .Select statements or DataViews to filter and sort, rather than making copies (see the sketch after this list).
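For instance, a quick sketch of item 3, assuming ds is the existing DataSet and it has an "Orders" table with Status, OrderDate and Total columns (all names invented here):
// A DataView filters and sorts in place; bind it to a grid instead of
// building a second, filtered DataTable.
DataView openOrders = new DataView(ds.Tables["Orders"],
    "Status = 'Open'", "OrderDate DESC", DataViewRowState.CurrentRows);

// Select likewise returns references to the existing DataRow objects, not copies.
DataRow[] bigOrders = ds.Tables["Orders"].Select("Total > 1000");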
There aren't a whole lot of quick "optimizations" for DataSets. If you're having trouble with memory, use items 2 and 3. This would be the case regardless of what type of data transport object you'd use.
And get good at DataSets. If you're not familiar with them, you can do silly things, like with anything. Then you'll write articles about how they suck, which are really articles about how little you know about them. They're really quite useful and simple to maintain. A couple tips:
Use typed DataSets. They'll save you gobs of coding and they're typed, which helps with simple validation.
If you're using typed DSs, make sure you don't modify the generated code file. If you're using VS2005+, you can put any custom business object behavior in the partial class for the DS (not the .designer code file).
Use DataView and .Select wherever you find yourself looping through DataRow objects.
Look around for a good code generation tool and build a rational data access framework for filling and updating from the DSs. One of the issues is that sometimes, designers tie the design of the DS directly to tables in the db, making the design brittle to data structure changes. If you -must- do that, build or use a code generator to build your data access layer from the db, like CodeSmith. Start by looking at some of the CodeSmith templates for generating stored procs and data access classes.
Remember when talking to someone about "objects" vs. "DataSets", the object in this case is the DataRow, not the DataSet. And because of the partial classes you can put behavior on the "object", getting you 95% of the benefits of "objects" for those who love writing code.
You could try making your tables and rows implement interfaces in the code-behind files, then over time change your code to use these interfaces rather than the tables/rows directly.
Once most of your code just uses the interfaces, you could use a code generator to create C# classes that implement those interfaces without the overhead of rows/tables.
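A rough sketch of the shape that takes with a typed DataSet; all of the names below are invented. The generated row classes are partial, so the interface can be attached in a separate file without touching the designer code:
public interface ICustomer
{
    string Name { get; set; }
    int Age { get; set; }
}

// In your own file, alongside the generated typed DataSet:
public partial class CustomersDataSet
{
    public partial class CustomersRow : ICustomer
    {
        // Name and Age already exist as generated column properties on the
        // row, so they satisfy ICustomer with no extra code needed here.
    }
}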
However it may be cheaper just to move to 64 bit and buy more ram...