How does the flow of execution work in a bulk update? - stored procedures

I am putting together some code that will copy billions of rows from one table to another, and we don't want the procedure to stop if an exception occurs, so I am structuring the script like this (not 100% compilable syntax):
declare
  dml_errors exception;
  errors number;
  error_count number;
  pragma exception_init(dml_errors, -24381);
begin
  open cur;
  loop
    fetch cur bulk collect into tbl_object limit batch_size;
    exit when tbl_object.count = 0;
    -- perform business logic
    begin
      forall i in 1 .. tbl_object.count save exceptions
        insert into target_table values tbl_object(i);
        tbl_object.delete;
    exception
      when dml_errors then
        errors := sql%bulk_exceptions.count;
        error_count := error_count + errors;
        insert into log_table (tstamp, line)
        values (sysdate, substr('[pr_procedure_name:'||r_guid||'] Batch # '||(batch_number - 1)||' had '||errors||' errors', 1, 300));
    end;
  end loop;
  close cur;
end procedure;
Now, based on this pseudo-code, I have 2 questions:
1. I am deleting my collection inside the forall loop. If there is an exception and I decide to fetch some information from my collection in the dml_errors block, will the collection elements still be there? If yes, would it be safe to delete them after logging?
2. Since I am keeping my forall in a begin-exception-end block, will the loop keep iterating?

Are you sure you need to use PL/SQL here? Unless the business logic you aren't showing us does a lot of processing that can't be done in SQL, I would tend to use DML error logging instead. That will be more efficient, require less code, and give you better logging.
DBMS_ERRLOG.CREATE_ERROR_LOG( 'DESTINATION_TABLE' );
INSERT INTO destination_table( <<columns>> )
SELECT <<columns>>
FROM source_table
LOG ERRORS
REJECT LIMIT UNLIMITED;
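The rejected rows, along with the error for each, land in the log table that CREATE_ERROR_LOG creates -- ERR$_DESTINATION_TABLE by default. A quick way to review them afterwards (<<columns>> again standing in for whichever source columns you care about):
SELECT ora_err_number$,  -- Oracle error code for the rejected row
       ora_err_mesg$,    -- full error message
       <<columns>>
FROM   err$_destination_table;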
I don't see any reason to delete from your tbl_object collection; it doesn't seem to gain you anything and just costs some time. If your indentation is indicative of your expected control flow, you're thinking that the delete is part of the forall loop -- that would be incorrect. Only the insert is part of the forall; the delete is a separate operation that runs once per batch.
If your second question is "If exceptions are raised in iteration N, will the loop still do the N+1th fetch", the answer is yes, it will.
One thing to note -- since error_count is never initialized, it will always be NULL (NULL + anything is NULL). You'd need to initialize it to 0 in order for it to record the total number of errors.
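Putting those points together, here is a schematic sketch of the batch's exception handler (reusing the asker's names; target_table is a stand-in). Note that the collection is untouched by a failed FORALL, so its elements can still be read, via the indexes in sql%bulk_exceptions, until the next fetch replaces them:
error_count := 0;  -- initialize up front, or NULL + n stays NULL
...
begin
  forall i in 1 .. tbl_object.count save exceptions
    insert into target_table values tbl_object(i);
exception
  when dml_errors then
    for j in 1 .. sql%bulk_exceptions.count loop
      -- error_index is the subscript of the failing element, so
      -- tbl_object(sql%bulk_exceptions(j).error_index) is still readable here
      insert into log_table (tstamp, line)
      values (sysdate,
              substr('element '||sql%bulk_exceptions(j).error_index||
                     ' failed with ORA-'||sql%bulk_exceptions(j).error_code, 1, 300));
    end loop;
    error_count := error_count + sql%bulk_exceptions.count;
end;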

Related

How to return Cursor as OUT parameter in DB2 for z/OS

I'm using DB2 for z/OS as my database. I have written a stored procedure in DB2 that returns a result set. Currently I declare a cursor and call OPEN cur at the end of the stored procedure. I'm calling my procedure from Java and getting the result set using ResultSet resultSet = callableStatement.getResultSet();. My SP works for a few hundred records but fails when the table contains millions of rows:
Caused by: com.ibm.db2.jcc.am.SqlException: DB2 SQL Error:
SQLCODE=-904, SQLSTATE=57011, SQLERRMC=00C90084;00000100;DB2-MANAGED
SPACE WITHOUT SECONDARY ALLOCATION OR US, DRIVER=4.24.92
I want to know:
1. Is it possible to return a cursor as an OUT parameter in my SP?
2. What is the difference between fetching data the OPEN cursor way and via a CURSOR as an OUT parameter?
3. How do I solve the issue when the data is huge?
4. Will a CURSOR as an OUT parameter solve the issue?
EDITED (SP detail):
DYNAMIC RESULT SET 1
P1: BEGIN
  -- Declare cursor
  DECLARE cursor1 CURSOR WITH RETURN FOR
    select a.TABLE_A_ID as TABLE_A_ID,
           b.TABLE_B_ID as TABLE_B_ID
    from TABLE_A a
    left join TABLE_C c
      on a.TABLE_A_ID = c.TABLE_A_ID
    inner join TABLE_B b
      on b.CONTXT_ID = a.CONTXT_ID
      and b.CONTXT_POINT_ID = a.CONTXT_POINT_ID
      and b.CONTXT_ART_ID = a.CONTXT_ART_ID
    where c.TABLE_A_ID is null;
  OPEN cursor1;
END P1
Refer to the documentation here for suggestions on handling this specific condition, and consider each suggestion.
Talk with your DBA for z/OS and decide on the best course of action in your specific circumstances.
As we cannot see your stored-procedure source code, more than one option might exist, especially if the queries in the stored procedures are unoptimised.
While it is usually easier to allocate more temporary space for the relevant tablespace(s) at the Db2-server end, that may simply mask the issue temporarily rather than fix it. If the stored procedure has a poor design or unoptimised queries, fix that first.
An SQL PL procedure can return a CURSOR as an output parameter, but that cursor is only usable by calling SQL PL code; it may not be usable from Java.
You ask "how to solve issue when data is huge", but you don't quantify what huge means; it is a relative term. Code your SQL procedure properly, make sure every query in that procedure is properly indexed, and verify the access plans carefully. Return the smallest possible number of rows and columns in the result set.
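As a concrete example of returning fewer rows, DB2 accepts a row limit directly in the cursor's query; a sketch against the cursor above (the 10000 is an arbitrary example value):
DECLARE cursor1 CURSOR WITH RETURN FOR
  select a.TABLE_A_ID, b.TABLE_B_ID
  from ...                        -- same joins and filter as above
  FETCH FIRST 10000 ROWS ONLY;    -- cap what is materialised for the caller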

How to view the content of a resultset in Toad from a stored procedure with unknown number of columns?

In T-SQL I can execute a stored procedure in Query Analyzer and view the contents of its result set right there in the Query Analyzer window, without knowing anything about the query structure (tables, columns, ...):
--Tsql sample
exec myproc parm1, parm2, parm3
Now I am working with PL/SQL and Toad (which I am relatively new to). I need to view the contents of the result set of a convoluted stored procedure, and I don't know the number of columns, let alone their data types (the proc is composed of several freaky subqueries, which I can view individually, but they get pivoted, and the number of columns varies in the final result set). How can I view the contents of this result set in Toad when I execute the procedure, when I don't know how many columns there are or their data types?
Below is code that I have put together for viewing the contents of a result set for stored procedures where I know the number of columns and their data types ahead of time. In the sample below I use a sys_refcursor that I named x_out, and I also create a temporary table to store the contents of the result set for additional viewing. Is there a way I can do this when I don't know how many columns there are in the result set? How can I do this with PL/SQL and Toad?
create global temporary table tmpResult (
  fld1 number,
  fld2 varchar2(50),
  fld3 date
) on commit preserve rows; -- keep rows visible after commit for inspection

declare
  x_out   sys_refcursor;
  tmpfld1 number;
  tmpfld2 varchar2(50);
  tmpfld3 date;
BEGIN
  myschema.mypkg.myproc(parm1, parm2, x_out);
  LOOP
    FETCH x_out INTO tmpfld1, tmpfld2, tmpfld3;
    EXIT WHEN x_out%NOTFOUND; -- test right after the fetch, or the last row is processed twice
    DBMS_OUTPUT.Put_Line('fld1:-- '||tmpfld1||': fld2:-- '||tmpfld2||': fld3:-- '||tmpfld3);
    -- I also insert the result set into a temp table for additional viewing
    Insert Into tmpResult values (tmpfld1, tmpfld2, tmpfld3);
  END LOOP;
  CLOSE x_out;
END;
Toad can automatically retrieve the cursor for you. You have a few options; #3 is perhaps the easiest if you just want to see the data.
1. If you have myschema.mypkg loaded in the Editor, you can hit F11 to execute it. In the dialog that shows, select your package member on the left and select the Output Options tab. Check the option to fetch cursor results, or use the DBMS Output options. Click OK and the package executes. Depending on your Toad version you'll see a grid at the bottom of the Editor for your results, or a PL/SQL results tab; if you see the latter, double-click the (CURSOR) value in the output column for your output argument. I suggest using the fetch option as long as your dataset isn't so large that it causes Out of Memory errors.
2. Locate your package in the Schema Browser, right-click, and choose Execute Package. You'll see the same dialog as mentioned in #1; follow the remaining steps there.
3. Use a bind variable from an anonymous block. Using your example, you'd want something like this...
declare
  x_out sys_refcursor;
begin
  myschema.mypkg.myproc(parm1, parm2, x_out);
  :retval := x_out;
end;
Execute this with F9 in the Editor. In the Bind Variable popup set the datatype of retval to Cursor. Click OK. Your results are then shown in the data grid. Again if your dataset is very large you may run out of memory here.
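If you'd rather discover the result set's shape programmatically instead of through Toad, DBMS_SQL can describe a ref cursor at run time. A minimal sketch, assuming Oracle 11g or later (for TO_CURSOR_NUMBER) and the same hypothetical myproc:
declare
  l_cur     sys_refcursor;
  l_cur_id  integer;
  l_col_cnt integer;
  l_desc    dbms_sql.desc_tab;
begin
  myschema.mypkg.myproc(parm1, parm2, l_cur);
  l_cur_id := dbms_sql.to_cursor_number(l_cur); -- hand the ref cursor to DBMS_SQL
  dbms_sql.describe_columns(l_cur_id, l_col_cnt, l_desc);
  for i in 1 .. l_col_cnt loop
    dbms_output.put_line(l_desc(i).col_name || ' (type code ' || l_desc(i).col_type || ')');
  end loop;
  dbms_sql.close_cursor(l_cur_id);
end;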
Here is another way, posted in two parts; this is the second half:
BEGIN
  myschema.mypkg.myproc(parm1, parm2, x_out);
  FOR rec_ IN get_columns LOOP
    DBMS_OUTPUT.put_line(rec_.name || ': ' || rec_.VALUE);
  END LOOP;
END;
and here is the first half of that approach:
DECLARE
  x_out SYS_REFCURSOR;
  CURSOR get_columns IS
    ...
You should bind the cursor to :data_grid in order to show the SP result in Toad's Data Grid pane.
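For example, a sketch reusing the hypothetical myproc from the answers above (when prompted for the bind, set its type to Cursor):
declare
  x_out sys_refcursor;
begin
  myschema.mypkg.myproc(parm1, parm2, x_out);
  :data_grid := x_out; -- Toad renders this bind in the Data Grid pane
end;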
Call the stored procedure in a PL/SQL script and run it with F9, not F5.
Toad is the best DB IDE I know. Press F9 and bind the variable; that is all.

Informix DELETE query taking long time

In production, I am facing this problem:
There is a DELETE which is taking a long time to execute and finally fails with SQL error -243.
I got the query using onstat -g.
Is there any way to find out what is causing it to take this much time and finally error out?
It uses COMMITTED READ isolation.
This is causing high Informix CPU usage as well.
EDIT
Environment: Informix 9.2 on Solaris.
I do not see any issue related to indexes or application logic, but I suspect some Informix corruption.
The session holds 8 locks on different tables while executing this DELETE query, but I do not see any locks on the table the delete is performed on.
Could it be that Informix is unable to get a lock on the table?
DELETE doesn't care about your isolation level. You are getting -243 because another process is locking the table while you're trying to run your delete operation.
I would put your delete into an SP and commit every N records:
CREATE PROCEDURE tmp_delete_sp (
    p_commit_records INTEGER
  )
  RETURNING
    INTEGER,
    VARCHAR(64);

  DEFINE l_current_count INTEGER;

  SET LOCK MODE TO WAIT 5; -- wait 5 seconds if another process is locking the table
  LET l_current_count = 0; -- must be initialized before it is incremented

  BEGIN WORK;
  FOREACH WITH HOLD
    SELECT ..... -- select the rows to be deleted

    DELETE FROM table WHERE ref = ...; -- ^^ ref from the SELECT above

    LET l_current_count = l_current_count + 1;
    IF (l_current_count >= p_commit_records) THEN
      COMMIT WORK;
      BEGIN WORK;
      LET l_current_count = 0;
    END IF;
  END FOREACH;
  COMMIT WORK;

  RETURN 0, 'Deleted records';
END PROCEDURE;
Some syntax issues there, but it's a good starting block for you. Remember, inserts and updates get incrementally slower as you use more logical logs.
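A call would then look something like this (the batch size of 1000 is just an example value):
EXECUTE PROCEDURE tmp_delete_sp(1000);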
Informix had been restarted ungracefully many times, which led to Informix instability. This turned out to be the root cause.

How to create a record only if it doesn't exist, avoid duplications and don't raise any error?

It's well known that Model.find_or_create_by(X) actually does:
1. select by X
2. if nothing found, create by X
3. return a record (found or created)
and there may be a race condition between steps 1 and 2. To avoid duplicates of X in the database, one should put a unique index on the set of fields of X. But with a unique index in place, one of the competing transactions will fail with an exception when trying to create a copy of X.
How can I implement a 'safe version' of #find_or_create_by which never raises an exception and always works as expected?
The answer is in the docs.
Whether that is a problem or not depends on the logic of the application, but in the particular case in which rows have a UNIQUE constraint, an exception may be raised -- just retry:
begin
  CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
  retry
end
Solution 1
You could implement the following in your model(s), or in a Concern if you need to stay DRY
def self.find_or_create_by(*)
  super
rescue ActiveRecord::RecordNotUnique
  retry
end
Usage: Model.find_or_create_by(X)
Solution 2
Or if you don't want to override find_or_create_by, you can add the following to your model(s):
def self.safe_find_or_create_by(*args, &block)
  find_or_create_by(*args, &block)
rescue ActiveRecord::RecordNotUnique
  retry
end
Usage: Model.safe_find_or_create_by(X)
It's the recurring problem of "SELECT-or-INSERT", closely related to the popular UPSERT problem. The upcoming Postgres 9.5 supplies the new INSERT .. ON CONFLICT DO NOTHING | UPDATE to provide clean solutions for each.
Implementation for Postgres 9.4
For now, I suggest this bullet-proof implementation using two server-side plpgsql functions. Only the helper function for the INSERT implements the more expensive error trapping, and it is only called if the SELECT does not succeed.
This never raises an exception due to a unique violation and always returns a row.
Assumptions:
- A table named tbl with a column x of data type text. Adapt to your case accordingly.
- x is defined UNIQUE or PRIMARY KEY.
- You want to return the whole row from the underlying table (return a record (found or created)).
- In many cases the row is already there. (This does not have to be the majority of cases; a SELECT is a lot cheaper than an INSERT.) Otherwise it may be more efficient to try the INSERT first.
Helper function:
CREATE OR REPLACE FUNCTION f_insert_x(_x text)
  RETURNS SETOF tbl AS
$func$
BEGIN
   RETURN QUERY
   INSERT INTO tbl(x) VALUES (_x) RETURNING *;
EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- catch exception, no row is returned
   -- do nothing
END
$func$ LANGUAGE plpgsql;
Main function:
CREATE OR REPLACE FUNCTION f_x(_x text)
  RETURNS SETOF tbl AS
$func$
BEGIN
   LOOP
      RETURN QUERY
      SELECT * FROM tbl WHERE x = _x
      UNION ALL
      SELECT * FROM f_insert_x(_x)  -- only executed if x is not found
      LIMIT 1;

      EXIT WHEN FOUND;  -- else keep looping
   END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_x('foo');
SQL Fiddle demo.
The function is based on what I have worked out in this related answer:
Is SELECT or INSERT in a function prone to race conditions?
Detailed explanation and links there.
We could also create a generic function with polymorphic return type and dynamic SQL to work for any given column and table (but that's beyond the scope of this question):
Refactor a PL/pgSQL function to return the output of various SELECT queries
Basics for UPSERT in this related answer by Craig Ringer:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
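For completeness, once 9.5 is available the SELECT-or-INSERT collapses into plain SQL; a sketch reusing tbl and x from above (note that ON CONFLICT ... DO NOTHING returns no row when the conflict fires, so the found case still needs its own SELECT):
-- Postgres 9.5+: insert if absent, never error on duplicates
INSERT INTO tbl (x) VALUES ('foo')
ON CONFLICT (x) DO NOTHING
RETURNING *;

-- if the INSERT returned no row, the value already existed:
SELECT * FROM tbl WHERE x = 'foo';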
There is a method called find_or_create_by in Rails; this link will help you understand it better.
But personally I prefer to do a find first and, if nothing is found, then create (I think it gives more control). For example (note that find_by returns nil when nothing matches, whereas find raises ActiveRecord::RecordNotFound):
user = User.find_by(id: params[:id])
user = User.create(attributes) unless user
HTH

Best method of frequently storing, searching and modifying a large data set in Delphi

What would be the best way, in Delphi, to create and store data which will often be searched on and modified?
Basically, I would like to write a function that searches an existing database for telephone numbers and keeps track of how many times each telephone number has been used, the first date used, and the latest date used. The database being searched is basically a log of orders placed, containing the telephone number that was used to place each order. It's not an SQL database or anything that can easily be queried for such things (it's an old Btrieve database), so I need to create a way of gathering this information (to eventually output to a text file).
I am thinking of creating a record containing the phone number, the two dates, and the number of times used, and then adding a record to a dynamic array for each telephone number. I would then search the array, entry by entry, for each record in the database, to see if the phone number for the current record is already in the array. Then updating or creating a record as necessary.
This seems like it would work, but as there are tens of thousands of entries in the database, it may not be the best way, and a rather slow and inefficient way of doing things. Is there a better way, given the limited actions I can perform on the database?
Someone suggested that, rather than using an array, I use a MySQL table to keep track of the numbers, and then query each number for every database record. That seems like even more overhead though!
Thanks a lot for your time.
I would accumulate the aggregates in a totally disconnected TClientDataSet (cds), updating the values as you read them in the loop. If the Btrieve data can be sorted by telephone number, so much the better. Then use the data in the cds to generate the report.
(If you go this way, I suggest getting the Midas SpeedFix from Andreas Hausladen's blog, along with the other fine stuff you can find there.)
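A minimal sketch of that idea (field sizes are arbitrary, and the RegisterCall helper is a hypothetical stand-in for whatever you do inside the Btrieve record loop):
uses DB, DBClient; // Datasnap.DBClient in newer RAD Studio versions

procedure BuildPhoneAggregates(cds: TClientDataSet);
begin
  cds.FieldDefs.Add('Phone', ftString, 20);
  cds.FieldDefs.Add('FirstUsed', ftDateTime);
  cds.FieldDefs.Add('LastUsed', ftDateTime);
  cds.FieldDefs.Add('UseCount', ftInteger);
  cds.CreateDataSet;
  cds.IndexFieldNames := 'Phone'; // keeps the set sorted and makes Locate fast
end;

procedure RegisterCall(cds: TClientDataSet; const phone: string; callDate: TDateTime);
begin
  if cds.Locate('Phone', phone, []) then
  begin
    cds.Edit; // existing number: update the aggregate
    if callDate < cds.FieldByName('FirstUsed').AsDateTime then
      cds.FieldByName('FirstUsed').AsDateTime := callDate;
    if callDate > cds.FieldByName('LastUsed').AsDateTime then
      cds.FieldByName('LastUsed').AsDateTime := callDate;
    cds.FieldByName('UseCount').AsInteger := cds.FieldByName('UseCount').AsInteger + 1;
  end
  else
  begin
    cds.Append; // first sighting of this number
    cds.FieldByName('Phone').AsString := phone;
    cds.FieldByName('FirstUsed').AsDateTime := callDate;
    cds.FieldByName('LastUsed').AsDateTime := callDate;
    cds.FieldByName('UseCount').AsInteger := 1;
  end;
  cds.Post;
end;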
OK, here is a double-pass old-school method that works well and should scale well (I used this approach against a multi-million record database once; it took time but gave accurate results).
1. Download and install Turbo Power SysTools -- the sort engine works very well for this process.
2. Create a sort with a fixed record of the phone number; you will be using this to sort.
3. Loop through your records and, at each order, add the phone number to the sort.
4. Once the first iteration is done, start popping the phone numbers from the sort, incrementing a counter if the phone number is the same as the last one read; otherwise report the number and clear your counter.
This process can also be done with any SQL Database, but my experience has been that the sort method is faster than managing a temporary table and generates the same results.
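For comparison, if the data ever lands in an SQL database, the whole report is a single aggregate query; a sketch with hypothetical table and column names:
SELECT phone_number,
       COUNT(*)        AS times_used,
       MIN(order_date) AS first_used,
       MAX(order_date) AS last_used
FROM   orders
GROUP  BY phone_number;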
EDIT -- You stated that this is a Btrieve database: why not just create a key on the phone number, sort on that key, then apply step 4 over this table (next instead of pop)? Either way you will need to touch every record in your database to get counts; the index/sort just makes your decision process easier.
For example, let's say that you have two tables: the customer table, where the results will be stored, and the orders table. Sort both by the same phone number. Then start a cursor at the top of both lists and apply the following pseudocode:
Count := 0;
while (CustomerTable <> eof) and (OrderTable <> eof) do
begin
  comp := comparetext(customer.phone, order.phone);
  while (comp = 0) and (not OrderTable.eof) do
  begin
    inc(Count);
    order.next;
    comp := comparetext(customer.phone, order.phone);
  end;
  if comp < 0 then
  begin
    Customer.TotalCount := Count;
    save customer;
    Count := 0;
    Customer.next;
  end
  else if (comp > 0) and (not OrderTable.eof) then
  begin
    Order.Next; // order with no customer
  end;
end;

// handle the case where the end of orders is reached
if (OrderTable.eof) and (not CustomerTable.eof) then
begin
  Customer.TotalCount := Count;
  save customer;
end;
This code has the benefit of walking both lists only once. No lookups are necessary: since both lists are sorted the same way, they can be walked top to bottom, taking action only when necessary. The only requirement is that both lists have something in common (in this example the phone number) and that both can be sorted.
I did not handle the case where there is an order and no customer; my assumption was that orders do not exist without customers and would be skipped for counting.
Sorry, I couldn't edit my post (I wasn't registered at the time). The data will be thrown away once all the records in the database have been iterated through. The function won't be called often. It's basically going to be used to determine how often people have ordered over a period of time from records we already have, so really it's just needed to produce a one-off list.
The data needs to persist only for the duration of the creation of the list; that is, all telephone numbers must remain present to be searched on until the very last database record has been read.
If you were going to keep it in memory and don't want anything fancy, you'd be better off using a sorted TStringList so you can use the Find function, which does a binary search on the sorted list (an O(log n) lookup). For instance, define a type:
type
  TPhoneData = class
  private
    fPhone: string;
    fFirstCalledDate: TDateTime;
    fLastCalledDate: TDateTime;
    fCallCount: integer;
  public
    constructor Create(phone: string; firstDate, lastDate: TDateTime);
    procedure updateCallData(date: TDateTime);
    property phoneNumber: string read fPhone write fPhone;
    property firstCalledDate: TDateTime read fFirstCalledDate write fFirstCalledDate;
    property lastCalledDate: TDateTime read fLastCalledDate write fLastCalledDate;
    property callCount: integer read fCallCount write fCallCount;
  end;

{ TPhoneData }

constructor TPhoneData.Create(phone: string; firstDate, lastDate: TDateTime);
begin
  fCallCount := 1;
  fFirstCalledDate := firstDate;
  fLastCalledDate := lastDate;
  fPhone := phone;
end;

procedure TPhoneData.updateCallData(date: TDateTime);
begin
  inc(fCallCount);
  if date < fFirstCalledDate then fFirstCalledDate := date; // an earlier call moves the first-called date back
  if date > fLastCalledDate then fLastCalledDate := date;
end;
and then fill it and report on it:
procedure TForm1.btnSortExampleClick(Sender: TObject);
const
  phoneSeed: array[0..9] of string = ('111-111-1111','222-222-2222','333-333-3333','444-444-4444','555-555-5555','666-666-6666','777-777-7777','888-888-8888','999-999-9999','000-000-0000');
var
  TSL: TStringList;
  TPD: TPhoneData;
  i, index: integer;
  phone: string;
begin
  Randomize; // the original "randseed;" names a variable, not a routine
  TSL := TStringList.Create;
  try
    TSL.Sorted := True; // required for Find's binary search
    TSL.OwnsObjects := True; // Delphi 2009+; on older versions free the objects manually
    for i := 0 to 100 do
    begin
      phone := phoneSeed[random(10)]; // random(9) would never pick the last entry
      if TSL.Find(phone, index) then
        TPhoneData(TSL.Objects[index]).updateCallData(now - random(100))
      else
        TSL.AddObject(phone, TPhoneData.Create(phone, now, now));
    end;
    for i := 0 to 9 do
    begin
      if TSL.Find(phoneSeed[i], index) then
      begin
        TPD := TPhoneData(TSL.Objects[index]);
        ShowMessage(Format('Phone # %s, first called %s, last called %s, num calls %d',
          [TPD.phoneNumber, FormatDateTime('mm-dd-yyyy', TPD.firstCalledDate),
           FormatDateTime('mm-dd-yyyy', TPD.lastCalledDate), TPD.callCount]));
      end;
    end;
  finally
    TSL.Free;
  end;
end;
Instead of a TStringList I would recommend using DeCAL's (on sf.net) DMap to store the items in memory. You could specify that the phone number is the key and store a record/class structure containing the rest of the data.
So your record class will be:
TPhoneData = class
  number: string;
  access_count: integer;
  added: TDateTime;
  ...
end;
Then in code:
procedure TSomeClass.RegisterPhone(number: string; phoneData: TPhoneData);
begin
  // FStore created in constructor as FStore := DMap.Create;
  FStore.putPair([number, phoneData]);
end;
...
procedure TSomeClass.GetPhoneAndIncrement(number: string);
var
  Iter: DIterator;
  lPhoneData: TPhoneData;
begin
  Iter := FStore.locate([number]);
  if atEnd(Iter) then
    raise Exception.CreateFmt('Number %s not found', [number])
  else
  begin
    lPhoneData := getObject(Iter) as TPhoneData;
    lPhoneData.access_count := lPhoneData.access_count + 1;
    // no need to save back to FStore, as it holds a pointer to lPhoneData
  end;
end;
DMap implements a red/black tree, so the data structure sorts the keys for you for free. You can also use a DHashMap for the same effect and (arguably) increased speed.
DeCAL is one of my favourite data structure libraries and would recommend anybody doing in-memory storage operations to have a look.
Hope that helps
