OLEDB Transactions

I am trying to use transactions to speed up the insertion of a large number of database entries.
I am using SQL Server Compact 4.0 and the ATL OLEDB API based on C++.
Here is the basic sequence:
sessionobj.StartTransaction();
tableobject.Insert(/* ... */);
tableobject.Insert(/* ... */);
tableobject.Insert(/* ... */);
...
sessionobj.Commit();
NOTE: tableobject is a CTable that is initialized from the sessionobj CSession object.
I should be seeing a sizeable performance increase but I am not. Does anyone know what I am doing wrong?

Related

TClientDataSet and memory limits

We have a system that creates reports from our data, and we can be dealing with a lot of data; over 150,000 rows is not out of the question.
Unfortunately, our experience with TClientDataSet has shown its limitations: it often raises an 'insufficient memory for this operation' error when the data gets too big.
So the question is this: Does there exist a generally available implementation of TDataSet that can handle a large amount of data (such as streaming directly to a file and not keeping the entire dataset in memory)?
I am willing to implement such a class myself, but as far as I understand TClientDataSet, it needs to hold the data itself before it can save it to a file/stream. In addition, loading the data again should also be possible as a stream rather than loading an entire TClientDataSet object, because then we wouldn't have solved the issue.
You can use either FireBird or Interbase in embedded mode.
Is there really any need to cache all the data on the client before reporting? If not, maybe rethink how you're querying and processing data to generate these reports and see if there's a way that involves less client-side data (which comes with a bonus of less data transmitted over the network).
If you've been down that road before and you really do need all this data client side, then you could look at custom data structures. A TList<T> of records (even if you need to build your own indexes) takes a lot less memory than a TClientDataSet does.
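To give an idea of what that looks like, here is a minimal sketch of the record-list approach (Delphi 2009+ generics assumed; TReportRow, SourceQuery and the field names are illustrative, not from the question):
uses
  Generics.Collections, DB;
type
  // Hypothetical flat record holding only the columns the report needs
  TReportRow = record
    CustomerID: Integer;
    OrderDate: TDateTime;
    Amount: Currency;
  end;
procedure LoadRows(SourceQuery: TDataSet; Rows: TList<TReportRow>);
var
  IdField, DateField, AmountField: TField;
  Row: TReportRow;
begin
  // Resolve the fields once; FieldByName inside a big loop is needlessly slow
  IdField := SourceQuery.FieldByName('CUSTOMER_ID');
  DateField := SourceQuery.FieldByName('ORDER_DATE');
  AmountField := SourceQuery.FieldByName('AMOUNT');
  Rows.Capacity := 200000; // pre-allocate to avoid repeated reallocations
  SourceQuery.First;
  while not SourceQuery.Eof do
  begin
    Row.CustomerID := IdField.AsInteger;
    Row.OrderDate := DateField.AsDateTime;
    Row.Amount := AmountField.AsCurrency;
    Rows.Add(Row);
    SourceQuery.Next;
  end;
end;
The caller owns the TList<TReportRow>; any indexes the report needs (for example a TDictionary keyed on CustomerID) can be built on top of it, as suggested above.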
KBMMemTable is a nice alternative to TClientDataset
http://www.components4programmers.com/products/kbmmemtable/
We have been using it for years, and it is very useful and fast.
I wanted to point out that the capacity of TClientDataSet can be bigger than you might expect.
Here is a test of the limits of TClientDataSet: appending xxx,xxx records, posting the same record over and over to get an idea of the size involved.
// Begin Load Record to TClientDataSet for Reverse (on Reverse) Processing
dxMemData1.Append;
dxMemData1['NT_Rec_No'] := 1000;
dxMemData1['NT_User'] := 'DEV\Administrator';
dxMemData1['NT_Type'] := 'Information';
dxMemData1['Ora_Timestamp'] := '20170706033859.000';
dxMemData1['Ora_Host'] := 'DEV0001';
dxMemData1['Ora_SID'] := 'Oracle.orcl';
dxMemData1['Ora_Event_Id'] := '34';
dxMemData1['NT_Message'] := Memo1.Text;
dxMemData1.Post;
// End Load Record to TClientDataSet for Reverse (on Reverse) Processing
The string in Memo1 is about 100 ANSI characters long.
I ran several tests and managed to get somewhere between 600,000 and 900,000 records without crashing.
Making the memo text bigger decreased the maximum number of records before the crash, so my guess is that the limit is not an exact maximum record count but the total size consumed.
I tested the same thing with TdxMemData (DevExpress); this time I could reach almost double the number of records.

Doctrine fetching objects creates memory exhaustion at about 4000 objects

Fatal error: Allowed memory size of 134217728 bytes exhausted.
There are a few cases where I need to create tens of thousands of results, but obviously this is causing huge memory issues. Are there any ways of reducing memory usage on large query sets?
It depends on how you will use the results:
If you don't need the results as objects and an array will suffice, you can change the hydration mode: ->setHydrationMode(Doctrine::HYDRATE_ARRAY) retrieves the data as a multidimensional array (other hydration modes can be found in the Doctrine documentation).
If you need objects as the result (for example in a foreach loop), remember to free them after use: $myobject->free(); /* if using PHP 5.2, also unset($myobject) */
Also look at the Doctrine docs on improving performance.
Disabling the debug toolbar also helps a lot with big Doctrine collections: sfConfig::set('sf_debug', false);

How does AdoQuery handle blobs?

I am testing some database components such as SDAC and others, and I found something interesting:
When I execute a query with TADOQuery that has a lot of blob fields and I fetch all rows (FetchAll), my application's memory gets close to 1.8 GB and everything works fine.
Using other components, the same query executed against the same database throws an Out of Memory exception because it exceeds 1.8 GB of memory usage.
I know I should not return all those rows and should use pagination and so on, but I am curious how ADO manages to get all the rows when other components can't.
I think ADO is compressing the blobs in memory, but this is only a guess.
Does anyone know why memory usage in ADO is so good?
I cannot speak for SDAC, but I can for AnyDAC's TADQuery:
if you exclude fiBlobs from FetchOptions.Items, AnyDAC will not fetch BLOB values immediately, but will defer fetching until the application really needs a BLOB value;
setting FormatOptions.InlineDataSize to a smaller value reduces memory usage when fetching a large result set with many character fields;
by specifying FormatOptions.MapRules, the application can choose a more compact data type representation.
There are also a few other techniques for reducing memory usage when fetching large result sets. To use them properly, a developer should know what kind of data will be returned. The price of some of these options may be slightly reduced fetch performance.
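As a minimal sketch of the first two options (property and constant names are the ones given above; ADQuery1 and my_blob_table are illustrative, and exact unit names differ between AnyDAC versions and its FireDAC successor):
ADQuery1.SQL.Text := 'select * from my_blob_table';
// Defer BLOB fetching: BLOB values are read only when the application touches them
ADQuery1.FetchOptions.Items := ADQuery1.FetchOptions.Items - [fiBlobs];
// Smaller inline buffers reduce per-row memory for character fields
ADQuery1.FormatOptions.InlineDataSize := 1024;
ADQuery1.Open;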

Difference between cfstoredproc and cfquery

I've found that previous programmers used cfstoredproc in our existing project to insert records into the database.
For example, they used something like this:
<cfstoredproc procedure="myProc" datasource="myDsn">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="ppshein" dbvarname="username">
<cfprocparam type="In" cfsqltype="CF_SQL_CHAR" value="28 yrs" dbvarname="age">
</cfstoredproc>
I can't see why they used the above code instead of:
<cfquery datasource="myDsn">
insert usertb
(name, age)
values
(<cfqueryparam cfsqltype="CF_SQL_CHAR" value="ppshein">, <cfqueryparam cfsqltype="CF_SQL_CHAR" value="28 yrs">)
</cfquery>
I feel there may be a hidden performance difference between cfstoredproc and cfquery for complex data manipulation. Please let me know about the performance of using cfstoredproc in ColdFusion instead of cfquery for complex data manipulation. All I know is that stored procedures are reusable.
CFSTOREDPROC should have better performance for the same reason a stored procedure will have better performance at the database level -- when created, the stored procedure is optimized internally by the database.
Whether this is noticeable depends on your database and your query. Use of CFQUERYPARAM inside your CFQUERY (as in your example) also speeds execution (at the db driver level).
Unless the application is VERY performance-sensitive, I tend to run my SQL code in a profiler first to optimize it, then put it into my CFQUERY parametrized with CFQUERYPARAM tags, rather than use a storedproc. That way, all the logic is in the CF code. This is a matter of taste, of course, and it's easy to move the SQL into a storedproc later when the application matures.
Some shops prefer to have all data logic controlled by the database, leaving CF to act almost exclusively as the front-end generator. Some places are so controlling that they won't let you write any SQL in your CF code.
Update: There might be more to that Stored Proc than a simple INSERT INTO. There may be some data lookup in another table. There could be validation. There may be conditional logic. There may be multiple transactions going on, like a log. A failure to perform the insert may return a specific status code rather than throwing an error.
Honestly, it's simply a matter of style. There are reasons for and against either way, and I've found that it usually comes down to who has more/more competent coders: The CF guys or the database guys.
Basically, if you use a stored procedure, it will be pre-compiled by the database and its execution plan stored. This means that subsequent calls to the stored procedure do not incur that overhead. For large, complex queries this can be substantial.
So as a rule of thumb: queries that are...
large and complex or
called very frequently or
both of the above
...are very good candidates for conversion to a stored procedure.
Hope that helps!

Insert many records using ADO

I am looking for the fastest way to insert many records at once (1000+) into a table using ADO.
Options:
using insert commands and parameters
ADODataSet1.CommandText:='INSERT INTO .....';
ADODataSet1.Parameters.CreateParameter('myparam',ftString,pdInput,12,'');
ADODataSet1.Open;
using TAdoTable
AdoTable1.Insert;
AdoTable1.FieldByName('myfield').Value:=myvale;
//..
//..
//..
AdoTable1.FieldByName('myfieldN').value:=myvalueN;
AdoTable1.Post;
I am using Delphi 7, ADO, and Oracle.
Probably your fastest way would be option 2: insert all the records and tell the dataset to send them off to the DB. But FieldByName is slow, and you probably shouldn't use it in a big loop like this. If the fields are already defined at design time, reference them in code by their component names. If not, call FieldByName once for each field, store the results in local variables, and reference the fields through those variables when you insert.
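A rough sketch of that pattern, reusing the field names from the question (MyValues and MyValuesN are hypothetical arrays holding the values to insert):
var
  MyField, MyFieldN: TField;
  i: Integer;
begin
  // Resolve the fields once, outside the loop
  MyField := AdoTable1.FieldByName('myfield');
  MyFieldN := AdoTable1.FieldByName('myfieldN');
  for i := Low(MyValues) to High(MyValues) do
  begin
    AdoTable1.Insert;
    MyField.Value := MyValues[i];
    // .. assign the remaining cached fields here ..
    MyFieldN.Value := MyValuesN[i];
    AdoTable1.Post;
  end;
end;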
Using ADO, I think you may be out of luck. Not all back-ends support bulk insert operations, so ADO implements an abstraction to allow consistent coding of apparent bulk operations (batches) irrespective of back-end support; "under the hood" it merely inserts the "batch" as a huge bunch of parameterised, individual inserts.
The downside of this is that even those back-ends which do support bulk inserts do not always code this into their ADO/OLEDB provider(s) - why bother? (I've seen it mentioned that the Oracle OLEDB provider supports bulk operations and that it is ADO which denies access to it, so it's even possible that the ADO framework simply does not allow a provider to expose this functionality more directly through ADO itself - I'm not sure.)
But you mention Oracle, and this back-end definitely does support bulk insert operations via its native APIs.
There is a commercial Delphi component library - ODAC (Oracle Direct Access Components) for, um, direct access to Oracle (it does not even require the Oracle client software to be installed).
This also directly supports the bulk insert capabilities provided by Oracle and is additionally a highly efficient means for accessing your Oracle data stores.
What you are trying to do is called a bulk insert. Oracle provides the .NET assembly Oracle.DataAccess.dll, which you can use for this purpose. There is no hand-made solution you can come up with that would beat the performance of this vendor library for the Oracle DBMS.
http://download.oracle.com/docs/html/E10927_01/OracleBulkCopyClass.htm#CHDGJBBJ
http://dotnetslackers.com/articles/ado_net/BulkOperationsUsingOracleDataProviderForNETODPNET.aspx
The most common idea is to use arrays of values for each column and apply them to a template SQL. In the example below employeeIds, firstNames, lastNames and dobs are arrays of the same length with the values to insert.
The Array Binding feature in ODP.NET allows you to insert multiple records in one database call. To use Array Binding, you simply set OracleCommand.ArrayBindCount to the number of records to be inserted, and pass arrays of values as parameters instead of single values:
string sql =
    "insert into bulk_test (employee_id, first_name, last_name, dob) " +
    "values (:employee_id, :first_name, :last_name, :dob)";
OracleConnection cnn = new OracleConnection(connectString);
cnn.Open();
OracleCommand cmd = cnn.CreateCommand();
cmd.CommandText = sql;
cmd.CommandType = CommandType.Text;
cmd.BindByName = true;
// To use ArrayBinding, we need to set ArrayBindCount
cmd.ArrayBindCount = numRecords;
// Instead of single values, we pass arrays of values as parameters
cmd.Parameters.Add(":employee_id", OracleDbType.Int32,
    employeeIds, ParameterDirection.Input);
cmd.Parameters.Add(":first_name", OracleDbType.Varchar2,
    firstNames, ParameterDirection.Input);
cmd.Parameters.Add(":last_name", OracleDbType.Varchar2,
    lastNames, ParameterDirection.Input);
cmd.Parameters.Add(":dob", OracleDbType.Date,
    dobs, ParameterDirection.Input);
cmd.ExecuteNonQuery();
cnn.Close();
As you can see, the code does not look that much different from doing a regular single-record insert. However, the performance improvement is quite drastic, depending on the number of records involved. The more records you have to insert, the bigger the performance gain. On my development PC, inserting 1,000 records using Array Binding is 90 times faster than inserting the records one at a time. Yes, you read that right: 90 times faster! Your results will vary, depending on the record size and network speed/bandwidth to the database server.
A bit of investigative work reveals that the SQL is considered to be "executed" multiple times on the server side. The evidence comes from V$SQL (look at the EXECUTIONS column). However, from the .NET point of view, everything was done in one call.
You can really improve the insert performance by using the TADOConnection object directly.
dbConn := TADOConnection.Create(nil); // ... set ConnectionString and open the connection
dbConn.BeginTrans;
try
  dbConn.Execute(cmdText, [eoExecuteNoRecords]); // cmdText holds the INSERT statement
  dbConn.CommitTrans;
except
  on E: Exception do
  begin
    dbConn.RollbackTrans;
    raise;
  end;
end;
Also, the speed can be improved further by inserting more than one record at a time.
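Putting the two points together, a rough sketch (dbConn is the connected TADOConnection from above; mytable and its columns are illustrative, and the values are built inline only to keep the example short):
var
  i: Integer;
begin
  dbConn.BeginTrans;
  try
    // many single-row INSERTs, but only one transaction and one commit
    for i := 1 to 1000 do
      dbConn.Execute(
        'insert into mytable (id, name) values (' +
        IntToStr(i) + ', ' + QuotedStr('name' + IntToStr(i)) + ')',
        [eoExecuteNoRecords]);
    dbConn.CommitTrans;
  except
    dbConn.RollbackTrans;
    raise;
  end;
end;
Several INSERT statements can also be packed into a single Execute call (for Oracle, for example, inside an anonymous begin ... end; PL/SQL block) to cut down on round trips.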
You could also try the BatchOptimistic mode of TADODataSet. I don't have Oracle, so I have no idea whether it is supported for Oracle, but I have used something similar with MS SQL Server.
ADODataSet1.CommandText:='select * from .....';
ADODataSet1.LockType:=ltBatchOptimistic;
ADODataSet1.Open;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue1;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN1;
ADODataSet1.Post;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue2;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN2;
ADODataSet1.Post;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue3;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN3;
ADODataSet1.Post;
// Finally update Oracle with entire dataset in one batch
ADODataSet1.UpdateBatch(arAll);
1000 rows is probably not the point where this approach becomes economical, but consider writing the inserts to a flat file and then running the SQL*Loader command-line utility. That is seriously the fastest way to bulk upload data into Oracle.
http://www.oracleutilities.com/OSUtil/sqlldr.html
I've seen developers spend literally weeks writing (Delphi) loading routines that performed several orders of magnitude slower than SQL*Loader controlled by a control file that took around an hour to write.
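As a rough illustration of that route, the Delphi side only has to dump the rows into a delimited text file (the path, record layout and Rows array below are made up for the example); SQL*Loader then does the actual loading:
var
  F: TextFile;
  i: Integer;
begin
  AssignFile(F, 'c:\export\bulk_data.csv'); // illustrative output path
  Rewrite(F);
  try
    // one comma-separated line per record; Rows is a hypothetical array of records
    for i := Low(Rows) to High(Rows) do
      WriteLn(F, Format('%d,%s,%s',
        [Rows[i].Id, Rows[i].Name, FormatDateTime('yyyymmdd', Rows[i].Dob)]));
  finally
    CloseFile(F);
  end;
end;
A small control file then tells SQL*Loader how the columns map to the target table, and the load itself is a single sqlldr invocation.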
Remember to disable any controls that are linked to the DataSet/Table/Query/...
...
ADOTable.DisableControls;
try
  ...
finally
  ADOTable.EnableControls;
end;
...
You might try Append instead of Insert:
AdoTable1.Append;
AdoTable1.FieldByName('myfield').Value:=myvale;
//..
//..
//..
AdoTable1.FieldByName('myfieldN').value:=myvalueN;
AdoTable1.Post;
With Append, you will save some effort on the client dataset, as the records will get added to the end rather than inserting records and pushing the remainder down.
Also, you can unhook any data aware controls that might be bound to the dataset or lock them using BeginUpdate.
I get pretty decent performance out of the append method, but if you're expecting bulk speeds, you might want to look at inserting multiple rows in a single query by executing the query itself like this:
AdoQuery1.SQL.Text := 'INSERT INTO myTable (myField1, myField2) VALUES (1, 2), (3, 4)';
AdoQuery1.ExecSQL;
You should get some benefits from the database engine when inserting multiple records at once.
