Insert many records using ADO - Delphi

I am looking for the fastest way to insert many records at once (1000+) into a table using ADO.
Options:
Option 1: using an INSERT command with parameters:
ADODataSet1.CommandText := 'INSERT INTO .....';
ADODataSet1.Parameters.CreateParameter('myparam', ftString, pdInput, 12, '');
ADODataSet1.Open; // an INSERT returns no rows; this would normally be executed via TADOQuery.ExecSQL or TADOCommand.Execute
Option 2: using TADOTable:
AdoTable1.Insert;
AdoTable1.FieldByName('myfield').Value := myvalue;
//..
AdoTable1.FieldByName('myfieldN').Value := myvalueN;
AdoTable1.Post;
I am using Delphi 7, ADO, and Oracle.

Your fastest way would probably be option 2: insert all the records, then tell the dataset to send them off to the DB. But FieldByName is slow, and you shouldn't use it in a big loop like this. If you already have the fields (because they're defined at design time), reference them in code by their actual names. If not, call FieldByName once for each field, store the results in local variables, and reference the fields through those variables when inserting.
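A minimal sketch of that caching (the field names, loop bound and values are made up for illustration):
var
  FldA, FldN: TField;
  i: Integer;
begin
  // resolve each field once, before the loop
  FldA := AdoTable1.FieldByName('myfield');
  FldN := AdoTable1.FieldByName('myfieldN');
  for i := 1 to 1000 do
  begin
    AdoTable1.Append;
    FldA.Value := 'value ' + IntToStr(i); // hypothetical values
    FldN.Value := i;
    AdoTable1.Post;
  end;
end;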

Using ADO I think you may be out of luck. Not all back-ends support bulk insert operations, so ADO implements an abstraction that allows consistent coding of apparent bulk operations (batches) irrespective of back-end support; under the hood, the "batch" is merely inserted as a large number of parameterised, individual inserts.
The downside is that even back-ends which do support bulk inserts do not always expose this in their ADO/OLEDB provider(s). (I've seen it mentioned that the Oracle OLEDB provider supports bulk operations and that it is ADO which denies access to them, so it's even possible that the ADO framework simply does not allow a provider to surface this functionality more directly; I'm not sure.)
But you mention Oracle, and this back-end definitely does support bulk insert operations via its native APIs.
There is a commercial Delphi component library, ODAC (Oracle Direct Access Components), for direct access to Oracle (it does not even require the Oracle client software to be installed).
It directly supports the bulk insert capabilities provided by Oracle and is additionally a highly efficient means of accessing your Oracle data stores.

What you are trying to do is called a bulk insert. Oracle provides a .NET assembly, Oracle.DataAccess.dll, that you can use for this purpose. No hand-made solution you can think of will beat the performance of this vendor library for the Oracle DBMS.
http://download.oracle.com/docs/html/E10927_01/OracleBulkCopyClass.htm#CHDGJBBJ
http://dotnetslackers.com/articles/ado_net/BulkOperationsUsingOracleDataProviderForNETODPNET.aspx
The most common idea is to use arrays of values for each column and apply them to a template SQL statement. In the example below, employeeIds, firstNames, lastNames and dobs are arrays of the same length holding the values to insert.
The Array Binding feature in ODP.NET allows you to insert multiple records in one database call. To use Array Binding, you simply set OracleCommand.ArrayBindCount to the number of records to be inserted, and pass arrays of values as parameters instead of single values:
string sql = "insert into bulk_test (employee_id, first_name, last_name, dob) "
    + "values (:employee_id, :first_name, :last_name, :dob)";

OracleConnection cnn = new OracleConnection(connectString);
cnn.Open();
OracleCommand cmd = cnn.CreateCommand();
cmd.CommandText = sql;
cmd.CommandType = CommandType.Text;
cmd.BindByName = true;

// To use ArrayBinding, we need to set ArrayBindCount
cmd.ArrayBindCount = numRecords;

// Instead of single values, we pass arrays of values as parameters
cmd.Parameters.Add(":employee_id", OracleDbType.Int32, employeeIds, ParameterDirection.Input);
cmd.Parameters.Add(":first_name", OracleDbType.Varchar2, firstNames, ParameterDirection.Input);
cmd.Parameters.Add(":last_name", OracleDbType.Varchar2, lastNames, ParameterDirection.Input);
cmd.Parameters.Add(":dob", OracleDbType.Date, dobs, ParameterDirection.Input);
cmd.ExecuteNonQuery();
cnn.Close();
As you can see, the code does not look much different from a regular single-record insert. However, the performance improvement is quite drastic, depending on the number of records involved; the more records you have to insert, the bigger the gain. On my development PC, inserting 1,000 records using Array Binding is 90 times faster than inserting the records one at a time. Yes, you read that right: 90 times faster! Your results will vary depending on the record size and the network speed/bandwidth to the database server.
A bit of investigative work reveals that the SQL is considered to be "executed" multiple times on the server side; the evidence comes from V$SQL (look at the EXECUTIONS column). However, from the .NET point of view, everything was done in one call.

You can really improve insert performance by using the TADOConnection object directly and wrapping the work in a transaction:
dbConn := TADOConnection......
dbConn.BeginTrans;
try
  dbConn.Execute(cmdText, [eoExecuteNoRecords]);
  dbConn.CommitTrans;
except
  on E: Exception do
  begin
    dbConn.RollbackTrans;
    raise;
  end;
end;
Speed can be improved further by inserting more than one record at a time, as sketched below.
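For example, against Oracle you can pack several statements into one anonymous PL/SQL block so that a single Execute call performs several inserts; a rough sketch (the table and column names are made up):
cmdText := 'BEGIN ' +
           '  INSERT INTO mytable (id, name) VALUES (1, ''a''); ' +
           '  INSERT INTO mytable (id, name) VALUES (2, ''b''); ' +
           'END;';
dbConn.Execute(cmdText, [eoExecuteNoRecords]);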

You could also try the BatchOptimistic mode of TADODataSet. I don't have Oracle, so I have no idea whether it is supported there, but I have used something similar with MS SQL Server.
ADODataSet1.CommandText:='select * from .....';
ADODataSet1.LockType:=ltBatchOptimistic;
ADODataSet1.Open;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue1;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN1;
ADODataSet1.Post;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue2;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN2;
ADODataSet1.Post;
ADODataSet1.Insert;
ADODataSet1.FieldByName('myfield').Value:=myvalue3;
//..
ADODataSet1.FieldByName('myfieldN').value:=myvalueN3;
ADODataSet1.Post;
// Finally update Oracle with entire dataset in one batch
ADODataSet1.UpdateBatch(arAll);

1000 rows is probably below the point where this approach becomes economic, but consider writing the inserts to a flat file and then running the SQL*Loader command-line utility. That is seriously the fastest way to bulk-load data into Oracle.
http://www.oracleutilities.com/OSUtil/sqlldr.html
I've seen developers spend literally weeks writing (Delphi) loading routines that performed several orders of magnitude slower than SQL*Loader driven by a control file that took around an hour to write.
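For a rough idea of what's involved, a minimal control file plus invocation could look like this (the file, table and column names are made up):
LOAD DATA
INFILE 'bulk_test.dat'
APPEND
INTO TABLE bulk_test
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(employee_id, first_name, last_name, dob DATE 'YYYY-MM-DD')

sqlldr userid=scott/tiger control=bulk_test.ctl log=bulk_test.log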

Remember to disable any data-aware controls that are linked to the dataset/table/query while you load:
...
ADOTable.DisableControls;
try
  ...
finally
  ADOTable.EnableControls;
end;
...

You might try Append instead of Insert:
AdoTable1.Append;
AdoTable1.FieldByName('myfield').Value := myvalue;
//..
AdoTable1.FieldByName('myfieldN').Value := myvalueN;
AdoTable1.Post;
With Append you save some effort on the client dataset, as records are added at the end rather than inserted with the remainder pushed down.
Also, you can unhook any data-aware controls that might be bound to the dataset, or lock them using BeginUpdate.
I get pretty decent performance out of the Append method, but if you're expecting bulk speeds, you might want to look at inserting multiple rows in a single query:
AdoQuery1.SQL.Text := 'INSERT INTO myTable (myField1, myField2) VALUES (1, 2), (3, 4)';
AdoQuery1.ExecSQL;
You should get some benefit from the database engine when inserting multiple records at once.
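One caveat: Oracle (at least the versions current when this was asked) does not accept that multi-row VALUES syntax; the Oracle equivalent is INSERT ALL. A sketch using the same made-up table:
AdoQuery1.SQL.Text := 'INSERT ALL ' +
                      '  INTO myTable (myField1, myField2) VALUES (1, 2) ' +
                      '  INTO myTable (myField1, myField2) VALUES (3, 4) ' +
                      'SELECT * FROM dual';
AdoQuery1.ExecSQL;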

Related

TClientDataSet and limit by memory

We have a system that creates reports from our data, and we can deal with a lot of data; over 150,000 rows is not out of the question.
Unfortunately, our experience with TClientDataSet is that it has limitations: it often raises an 'insufficient memory for this operation' error when the data gets too big.
So the question is this: is there a generally available implementation of TDataSet that can handle a large amount of data (for example by streaming directly to a file instead of keeping the entire dataset in memory)?
I am willing to implement such a class myself. But as far as I understand TClientDataSet, it needs to hold all the data in memory before it can save it to a file/stream. In addition, loading the data again should also be possible as a stream rather than loading an entire TClientDataSet object, because then we wouldn't have solved the issue.
You can use either Firebird or InterBase in embedded mode.
Is there really any need to cache all the data on the client before reporting? If not, maybe rethink how you're querying and processing data to generate these reports and see if there's a way that involves less client-side data (which comes with a bonus of less data transmitted over the network).
If you've been down that road before and you really do need all this data client side, then you could look at custom data structures. A TList<T> of records (even if you need to build your own indexes) takes a lot less memory than a TClientDataSet does.
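A minimal sketch of that idea (the row layout is made up; this needs a Delphi version with generics support):
uses
  Generics.Collections;

type
  TReportRow = record
    Id: Integer;
    Total: Currency;
    CustomerName: string;
  end;

var
  Rows: TList<TReportRow>;
  Row: TReportRow;
begin
  Rows := TList<TReportRow>.Create;
  try
    Row.Id := 1;
    Row.Total := 42.5;
    Row.CustomerName := 'Example';
    Rows.Add(Row); // each entry costs only its fields, not a full dataset row buffer
  finally
    Rows.Free;
  end;
end;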
kbmMemTable is a nice alternative to TClientDataSet:
http://www.components4programmers.com/products/kbmmemtable/
We have been using it for years, and it is very useful and fast.
I wanted to point out that the capacity of TClientDataSet can be bigger than you might expect.
I tested the limits of TClientDataSet by appending hundreds of thousands of records, posting the same record over and over to get an idea of the size:
// Begin load record into TClientDataSet for reverse (on reverse) processing
dxMemData1.Append;
dxMemData1['NT_Rec_No'] := 1000;
dxMemData1['NT_User'] := 'DEV\Administrator';
dxMemData1['NT_Type'] := 'Information';
dxMemData1['Ora_Timestamp'] := '20170706033859.000';
dxMemData1['Ora_Host'] := 'DEV0001';
dxMemData1['Ora_SID'] := 'Oracle.orcl';
dxMemData1['Ora_Event_Id'] := '34';
dxMemData1['NT_Message'] := Memo1.Text;
dxMemData1.Post;
// End load record
The string in Memo1 is about 100 ANSI characters.
I ran several tests and managed to reach somewhere between 600,000 and 900,000 records without crashing.
Making the memo text bigger decreased the maximum number of records before the crash, which suggests the limit is not an exact record count but the total size consumed; that is my guess.
Testing the same with TdxMemData (DevExpress), I could reach almost double the number of records.

Optimizing looping through dataset

I have code something like this:
dxMemOrdered: TdxMemData;

while not qrySandbox2.EOF do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;
This code executes in a thread. When there are a huge number of records, say 400,000, it takes around 25 minutes to get through them. Is there any way I can reduce the time by optimizing the loop? Any help would be appreciated.
Update
Based on the suggestions, I made the following changes:
dxMemOrdered: TdxMemData;

qrySandbox2.DisableControls;
while not qrySandbox2.Recordset.EOF do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2.Recordset.Fields['TOTAL'].Value;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;
qrySandbox2.EnableControls;
and my run time improved from 15 minutes to 2 minutes. Thank you, guys.
Without seeing more code, the only suggestion I can make is to make sure that any visual control using the memory table is disabled. Suppose you have a cxGrid called Grid that is linked to your dxMemOrdered memory table:
var
  dxMemOrdered: TdxMemData;
...
Grid.BeginUpdate;
try
  while not qrySandbox2.EOF do
  begin
    dxMemOrdered.Append;
    dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
    dxMemOrdered.Post;
    qrySandbox2.Next;
  end;
finally
  Grid.EndUpdate;
end;
Some ideas, in order of performance gain versus work required from you:
1) Check if the SQL dialect you are using lets you write queries that SELECT from / INSERT to directly. This depends on the database you're using.
2) Make sure your datasets are not coupled to visual controls, or that you call DisableControls/EnableControls around this loop.
3) Does this code have to run in the main program thread? Maybe you can send it off to a separate thread while the user/program continues doing something else.
4) When you have to deal with really large data, bulk insertion is the way to go. Many databases have options to bulk-insert data from text files; writing to a text file first and then bulk-inserting is far faster than individual inserts. Again, this depends on your database type.
[Edit: I just saw you add the info that it's TdxMemData, so some of these no longer apply. And you're already threading; missed that ;-). I'll leave these suggestions in for other readers with similar problems.]
It's much better to let SQL do the work instead of iterating through a loop in Delphi. Try a query such as
insert into dxMemOrdered (total)
select total from qrySandbox2
Is 'total' the only field in dxMemOrdered? I hope it's not the primary key, otherwise you are likely to have collisions, meaning that rows will not be added.
There's actually a lot you could do to speed up your thread.
The first step is to look at the problem from a broader perspective:
Am I fetching data from a cached/fast disk, possibly already in memory?
Am I doing the right thing when aggregating totals by hand? SQL engines are especially optimized for such work; all you'd need to do is define an additional logical field in which to store the SQL-aggregated result.
Another small optimization that can bring an improvement over large amounts of looping is to avoid constructs like:
Recordset.Fields['TOTAL'].Value
Recordset.FieldByName('TOTAL').Value
and instead add the fields with the Fields editor and access the right field directly. You save a whole lookup through the fields collection that is otherwise performed for every field, on every record.
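A sketch of the persistent-field version of the loop above (qrySandbox2TOTAL stands in for whatever field component the Fields editor generates):
// qrySandbox2TOTAL: TCurrencyField, created once in the Fields editor
while not qrySandbox2.Eof do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2TOTAL.AsCurrency; // no per-row name lookup
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;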

Trying to hack my way around SQLite3 concurrent writing, any better way to do this?

I use Delphi XE2 along with DISQLite v3 (which is basically a port of SQLite3). I love everything about SQLite3, except the lack of concurrent writing, especially as I rely extensively on multi-threading in this project :(
My profiler made it clear I needed to do something about it, so I decided to use this approach:
Whenever I need to insert a record in the DB, instead of doing an INSERT I write the SQL query to a file in a special folder, e.g.
WriteToFile_Inline(SPECIAL_FOLDER_PATH + '\' + GUID, FileName + '|' + IntToStr(ID) + '|' + Hash + '|' + FloatToStr(ModifDate) + '|' + ...);
I added a timer (in the main app thread) that fires every minute, parses these files and then runs the INSERTs inside a transaction.
It deletes those temporary files at the end.
The result is something like a 500% performance gain. Plus, this technique is ACID, as I can always scan SPECIAL_FOLDER_PATH after a power failure and execute the INSERTs I find.
Despite the good results, I'm not very happy with the method (hackish to say the least); I keep thinking that if I had a thread-safe, ACID list with fast, generics-like lookup access, this would be much cleaner (and possibly faster?).
So my question is: do you know of anything like that for Delphi XE2?
PS. I trust many of you reading the code above will be in shock and will start insulting me at this point! Please be my guest, but if you know a better (i.e. faster) ACID approach, please share your thoughts!
Your idea of sending the inserts to a queue, which rearranges them and joins them via prepared statements, is very good. Whether you use a timer in the main thread or a separate thread is up to you; either way it will avoid any locking.
Do not forget to use a transaction, then commit it every 100/1000 inserts, for instance.
About high performance using SQLite3, see e.g. this blog article (and the graphic it shows); the best performance ("file off") comes from:
PRAGMA synchronous = OFF
Using prepared statements
Inside a transaction
In WAL mode (especially in concurrency mode)
You may also change the page size or the journal size, but the settings above are the most effective. See https://stackoverflow.com/search?q=sqlite3+performance
If you do not want to use a background thread, ensure WAL is ON, prepare your statements, use batches, and regroup your processing to release the SQLite3 lock as soon as possible.
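A minimal SQL sketch of those settings (the table name is made up):
PRAGMA journal_mode = WAL;
PRAGMA synchronous = OFF;  -- fastest; NORMAL is safer if you need more durability
BEGIN TRANSACTION;
-- one prepared INSERT, stepped and reset once per row:
INSERT INTO files (name, hash, modified) VALUES (?, ?, ?);
COMMIT;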
The best performance will be achieved by adding a Client-Server layer, just as we did for mORMot.
With the files, you have organized an asynchronous job queue with persistence. It allows you to move away from one-by-one inserts and use a batch (group of records) approach. Comparing one-by-one and batch:
the first (probably) works in auto-commit mode for each record; the second wraps a batch into a single transaction, which gives the greatest performance gain.
the first (probably) prepares an INSERT command each time you insert a record; the second prepares it once per batch, which gives the second-biggest gain.
I don't think SQLite concurrency is a problem in your case (at least not the main issue), because a single insert in SQLite is comparably fast; concurrency performance issues appear under high workload. You would probably get similar results with another DBMS, like Oracle.
To improve your batch approach, consider the following:
consider setting journal_mode to WAL and disabling shared cache mode.
use a background thread to process your queue. Instead of a fixed time interval (1 min), check SPECIAL_FOLDER_PATH more often, and start processing once the queue holds more than X KB of data. Or keep a count of queued records and use an event to notify the thread that the queue should be processed.
use a multi-record prepared INSERT instead of a single-record INSERT. You can build an INSERT for 100 records and process your queued data in 100-record chunks (see the sketch after this list).
consider writing/reading binary field values instead of text values.
consider using a set of files with preallocated size.
etc.
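The multi-record prepared INSERT mentioned above could look like this (table made up; SQLite accepts multi-row VALUES since 3.7.11):
INSERT INTO files (name, id, hash, modified)
VALUES (?, ?, ?, ?),
       (?, ?, ?, ?),
       -- ... one placeholder row per record, e.g. 100 in total
       (?, ?, ?, ?);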
sqlite3_busy_timeout is pretty inefficient because it doesn't return immediately when the table it's waiting on is unlocked.
I would try creating a critical section (TCriticalSection?) to protect each table. If you enter the critical section before inserting a row and exit it immediately thereafter, you will create better table locks than SQLite provides.
Without knowing your access patterns, though, it's hard to say if this will be faster than batching up a minute's worth of inserts into single transactions.

How should I create unique bill/account numbers?

What do you people use for generating unique account numbers?
Some use an autoinc field, others something else.
What would be the proper way, i.e. to get an account number before I run the insert query?
If you are using a SQL database, use a generator. If you want an independent mechanism, you could consider using a GUID.
You haven't told us which database system you are using, but from the sound of it you're talking about Paradox tables in Delphi. If so, an autoinc column can work, although if I recall correctly you have to be very careful when moving your data around with Paradox autoinc columns, because they regenerate from zero when moved.
As has been mentioned, you can use GUIDs (SysUtils.CreateGUID(out Guid: TGUID): HResult); they will always be unique, but the downside of GUIDs is that ordering by these keys will not be intuitive and will probably be meaningless, so you'll need a timestamp column of some sort to maintain the order of your inserts, which can be important. Also, a GUID is a rather long character string and will not be very efficient for use as an account number, which presumably will be a primary or foreign key in many tables.
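A minimal Delphi sketch of generating one (GUIDToString is also in SysUtils):
uses
  SysUtils;

var
  G: TGUID;
  AccountKey: string;
begin
  CreateGUID(G);                 // returns S_OK on success
  AccountKey := GUIDToString(G); // e.g. '{3F2504E0-4F89-11D3-9A0C-0305E82C3301}'
end;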
So I'd stick with autoinc if you want something automatic, but if you have to move data around and need to maintain your original keys, load the original autoincs as integer columns in their new location, or you could end up corrupting your entire database. (I believe there are other scenarios that also cause autoincs to reset in Paradox tables; research this if it's relevant. It's been a long time since I've used Paradox, and it may not be a problem with other flat-file databases.)
If you are indeed using a database server (SQL Server, Oracle, InterBase, etc.), they all have autoinc/identity or generator functionality, sometimes in conjunction with a trigger; that is your best option.
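With Oracle, for instance, a sequence gives you the number before the INSERT runs; a sketch (the sequence name is made up):
CREATE SEQUENCE account_seq START WITH 1000;
-- fetch the next account number ahead of the insert:
SELECT account_seq.NEXTVAL FROM dual;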
Dorin's answer is also an excellent solution if you want to handle this yourself from within your Delphi code. Create a global, thread-safe function to implement it; that will ensure a very high level of safety.
HTH
Depending on how long you want the number to be, you can go with Jamie's MD5 conversion, or:
var
  LDateTime: TDateTime;
  LBytes: array[0..7] of Byte absolute LDateTime;
  LAccNo: string;
  Index: Integer;
begin
  LDateTime := Now;
  LAccNo := EmptyStr;
  for Index := 0 to 7 do
    LAccNo := LAccNo + IntToHex(LBytes[Index], 2);
  // now you have a code in LAccNo, use it wisely (:
end;
I use this PHP snippet to generate a decent account number:
$account_number = str_replace(array("0","O"),"D",strtoupper(substr(md5(time()),0,7)));
This creates a 7-character string that contains no zeros or O's (to avoid errors on the phone or when transcribing them in e-mails, etc.). You get something like EDB6DA6, 76337D5 or DB2E624.

Fast read of a Nexus database table

I want to read the entire contents of a table into memory as quickly as possible. I am using the NexusDB database, but there might be techniques I could use that are applicable to all database types in Delphi.
The table I am looking at has 60,000 records with 20 columns. So not a huge data set.
From my profiling, I have found the following so far:
Accessing tables directly using TnxTable is no faster or slower than using a SQL query with 'SELECT * FROM TableName'
The simple act of looping through the rows, without actually reading or copying any data, takes the majority of the time.
The performance I am getting is
Looping through all records takes 3.5 seconds
Looping through all the records, reading the values and storing them, takes 3.7 seconds (i.e. only 0.2 seconds more)
A sample of my code:
var
  query: TnxQuery;
begin
  query.SQL.Text := 'SELECT * FROM TableName';
  query.Active := True;
  while not query.Eof do
    query.Next;
end;
This takes 3.5 seconds on a 60,000-row table.
Does this performance sound reasonable? Are there other approaches I can take that would let me read the data faster?
I am currently reading data from a server on the same computer, but eventually this may be from another server on a LAN.
You should be using BlockRead mode with a TnxTable for optimal read speed:
nxTable.BlockReadOptions := [gboBlobs, gboBookmarks];
// leave out gboBlobs if you want to access blobs only as needed
// leave out gboBookmarks if no bookmark support is required
nxTable.BlockReadSize := 1024*1024; // 1 MB
// setting BlockReadSize performs an implicit First;
// while block read mode is active, only Next and First may be used for navigation
try
  while not nxTable.Eof do
  begin
    // do something....
    nxTable.Next;
  end;
finally
  nxTable.BlockReadSize := 0;
end;
Also, if you don't need to set a range on a specific index, make sure to use the sequential access index for the fastest possible access.
