I'm trying to enhance the performance of a SQL Server 2000 job. Here's the scenario:
Table A has a maximum of 300,000 rows. If I update or delete the 100th row (based on insertion time), all the rows added after that row should update their values: row no. 101 should update its value based on row no. 100, and row no. 102 should update its value based on row no. 101's updated value. For example:
Old Table:
ID...........Value
100..........220
101..........(220/2) = 110
102..........(110/2)=55
......................
Row No. 100 updated with new value: 300.
New Table
ID...........Value
100..........300
101..........(300/2) = 150
102..........(150/2)=75
......................
The actual value calculation is more complex; the formula above is simplified.
Right now, a trigger is defined for update/delete statements. When a row is updated or deleted, the trigger adds the row's data to a log table. Also, a SQL Job is created from the code-behind after the update/delete, which fires a stored procedure that iterates through all the subsequent rows of table A and updates their values. The process takes ~10 days to complete for 300,000 rows.
When the SP fires, it updates the next rows' values. I think this causes the trigger to run again for each update the SP performs, adding those rows to the log table too. Also, the task has to be done on the database side, as requested by the customer.
To solve the problem:
Modify the stored procedure and call it directly from the trigger: the stored procedure then drops the trigger, updates the following rows' values, and re-creates the trigger.
There will be multiple instances of the program running simultaneously. If another user modifies a row while the SP is being executed, the trigger will not fire and I'll be in trouble! Is there any workaround for this?
What's your opinion about this solution? Is there any better way to achieve this?
Thank you.
First, about the update process. As I understand it, your procedure is simply calling itself when it comes to updating the next row. With 300K rows this is certainly not going to be very fast, even without logging (though it would most probably take far fewer days to accomplish). But what is absolutely beyond me is how it is possible to update more than 32 rows that way without reaching the maximum nesting level. Maybe I've got the sequence of actions wrong.
Anyway, I would probably do that differently, with just one instruction:
UPDATE yourtable
SET @value = Value = CASE ID
                       WHEN @id THEN @value
                       ELSE @value / 2  /* basically, your formula */
                     END
WHERE ID >= @id
OPTION (MAXDOP 1);
The OPTION (MAXDOP 1) bit of the statement limits the degree of parallelism for the statement to 1, thus making sure the rows are updated sequentially and every value is based on the previous one, i.e. on the value from the row with the preceding ID value. Also, the ID column should be made a clustered index, which it typically is by default, when it's made the primary key.
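If the table doesn't have that yet, here's a minimal sketch of making ID the clustered primary key (the constraint name is made up):

ALTER TABLE yourtable
    ADD CONSTRAINT PK_yourtable PRIMARY KEY CLUSTERED (ID);
    /* a clustered index on ID keeps the rows ordered by ID,
       which the sequential update above relies on */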
The other functionality of the update procedure, i.e. dropping and recreating the trigger, should probably be replaced by disabling and re-enabling it:
ALTER TABLE yourtable DISABLE TRIGGER yourtabletrigger
/* the update part */
ALTER TABLE yourtable ENABLE TRIGGER yourtabletrigger
But then, you are saying the trigger shouldn't actually be dropped/disabled, because several users might update the table at the same time.
All right then, we are not touching the trigger.
Instead I would suggest adding a special column to the table, one the users shouldn't be aware of, or at least shouldn't care much about and should somehow be prevented from ever touching. That column should only be updated by your 'cascading update' process. By checking whether that column was being updated or not, you would know whether you should call the update procedure and the logging.
So, in your trigger there could be something like this:
IF NOT UPDATE(SpecialColumn) BEGIN
    /* assuming that without SpecialColumn only one row can be updated */
    SELECT TOP 1 @id = ID, @value = Value FROM inserted;
    EXEC UpdateProc @id, @value;
    EXEC LogProc ...;
END
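For context, that check would sit inside the full trigger roughly like this; this is only a sketch, with an assumed trigger name and assumed variable types:

CREATE TRIGGER yourtabletrigger ON yourtable
FOR UPDATE  -- AFTER trigger, as in the original setup
AS
BEGIN
    DECLARE @id int, @value decimal(18, 4)  -- types are assumptions; match your columns

    IF NOT UPDATE(SpecialColumn)
    BEGIN
        /* assuming that without SpecialColumn only one row can be updated */
        SELECT TOP 1 @id = ID, @value = Value FROM inserted;

        EXEC UpdateProc @id, @value;
        EXEC LogProc ...;  /* your existing logging call */
    END
END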
In UpdateProc:
UPDATE yourtable
SET @value = Value = @value / 2,
    SpecialColumn = SpecialColumn  /* basically, just anything, since it can
                                      only be updated by this procedure */
WHERE ID > @id
OPTION (MAXDOP 1);
You may have noticed that the UPDATE statement is slightly different this time. As I understand it, your trigger is FOR UPDATE (= AFTER UPDATE), which means that the @id row is already going to be updated by the user. So the procedure should skip it and start from the very next row, and the update expression can now be just the formula.
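Put together, UpdateProc itself could look roughly like this; again just a sketch, with assumed parameter types and the simplified formula standing in for the real one:

CREATE PROCEDURE UpdateProc
    @id int,
    @value decimal(18, 4)  -- types are assumptions; match your real columns
AS
BEGIN
    UPDATE yourtable
    SET @value = Value = @value / 2,   /* your real formula goes here */
        SpecialColumn = SpecialColumn  /* marks this as the cascading update */
    WHERE ID > @id
    OPTION (MAXDOP 1);
END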
In conclusion I'd like to say that my test update involved 299,995 of my table's 300,000 rows and took approximately 3 seconds on my not-so-very-fast system. No logging, of course, but I think that should give you the basic picture of how fast it can be.
Big theoretical problem here. It is always extremely suspicious when updating one row REQUIRES updating 299,900 other rows. It suggests a deep flaw in the data model. Not that it is never appropriate, just that it is required far far less often than people think. When things like this are absolutely necessary, they are usually done as a batch operation.
The best you can hope for, in some miraculous situation, is to turn that 10 days into 10 minutes, but never even 10 seconds. I would suggest explaining thoroughly WHY this seems necessary, so that another approach can be explored.
Good day to everyone! Hope all is well!
I am looking to run an update query, or a group of queries, that looks at my Date_Start and Date_End fields to determine whether the Units (the quantity on the respective record) fall in the current quarter 1/2/3/4 as defined by another table (a master table I'm using to provide the dates that define the quarters).
I've been able to create queries that do this and then join them together to basically display the units by quarter based on their respective start/end dates. The problem I am running into is that this process takes a decent amount of time for the queries to populate, which will drastically affect other processes down the line.
Thus we get to my desire: I am trying, to no avail, to create an update query that will update the quarter fields in my table based on the queries I built to determine whether the records' start/end dates fall in the respective quarter. I figure that running this update when records change will give an acceptable run time, versus paying that cost when I'm running reports or an email script for the reports.
I have tried pulling in the table and the query, joining them as equal on ID (the query pulls in the table's IDs), selecting my field "CQ1" from the table, and setting the Update To row to either the respective field from the table or the one from the query (which is the same as the field in the table).
All I get are the current values of the field in datasheet view and an error of "Operation must use an updateable query."
I have even tried placing a zero to see if that would do it, with no luck. I have verified that all the fields are the same data type.
What am I doing wrong? Thanks!
Apologies to everyone..... I think my conscious brain was trying to overly complicate the process, and while talking to a buddy about my issue I distractedly created a new update query that worked. It all came down to the fact that I had forgotten to put a criterion of Is Not Null on my quarter field, I believe. Thanks to anyone who has read this and was responding while I was typing, or to those of you formulating a response.
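For anyone who hits the same wall: joining an update query to an aggregate or otherwise read-only query is a classic cause of "Operation must use an updateable query" in Access. As a rough illustration only, with hypothetical table and boundary-field names, here is a quarter update written directly against the table, where DLookup pulls the quarter boundary dates from the master table so that no join is needed and the query stays updateable:

UPDATE Tbl_Records
SET CQ1 = Units
WHERE Date_Start <= DLookup("Q1_End", "Tbl_QuarterDates")
  AND Date_End >= DLookup("Q1_Start", "Tbl_QuarterDates");

Whether DLookup fits depends on how the master table is laid out; the point is simply that avoiding the join to a summarized query keeps the UPDATE updateable.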
After committing a row with a SERIAL (autonumber) column, the row is deleted, but when another row is added, the deleted row's sequence ID is not reused.
The only way I have found for reusing the deleted row's sequence ID is to ALTER the SERIAL column to an INTEGER, then change it back to a SERIAL.
Is there an easier quicker way for accomplishing the resetting of the next sequence ID so that there are no gaps in the sequence?
NOTE: This is a single-user application, so no worries about multiple users simultaneously inserting rows.
There isn't a particularly easy way to do that. You can reset the number by inserting … hmmm, once upon a long time ago, there were bugs in this, and you're using ancient enough versions of the software that on occasions the bug might still be relevant, though all current versions of Informix products do not have the bug.
The safe technique is to insert 2^31-2 (note the minus 2; that's +2,147,483,646), then insert a row with 0 (to generate +2,147,483,647), then insert another row with 0 to trip the next sequence number back to 1. That insert operation will fail if there's already a row with 1 in the system and you have a unique constraint on the SERIAL column. You then need to insert the maximum value, or the value before the first gap that you want to fill (another failing insert). However, note that after filling the gap, the inserted values will increase by one, bumping into any still-existing rows and causing insertion failures (because you do have a unique constraint/index on each SERIAL column, don't you — and don't 'fess up if you do not have such indexes; just go and add them!).
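As a sketch of that sequence on a throwaway table (names invented purely to show the mechanics described above):

CREATE TABLE demo_serial (id SERIAL, note CHAR(10), UNIQUE (id));

INSERT INTO demo_serial(id, note) VALUES (2147483646, 'a');  -- 2^31 - 2
INSERT INTO demo_serial(id, note) VALUES (0, 'b');           -- generates +2,147,483,647
INSERT INTO demo_serial(id, note) VALUES (0, 'c');           -- wraps: the next generated id is 1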
If you have a more recent version of Informix, you can insert +2,147,483,647 and then a single row to wrap the value without running into trouble. If you have an old version of Informix with the bug, then inserting +2,147,483,647 directly caused problems. IIRC, the trouble was that you ended up with NULLs being generated, but it was long enough ago now (another millennium) that I'm not absolutely sure of that any more.
None of this is really easy, in case you hadn't noticed.
Generally speaking, it is unwise to fill the gaps; you're better off leaving them and not worrying about them, or inserting some sort of dummy record that says "this isn't really a record — but the serial value is otherwise missing so I'm here to show that we know about it and it isn't missing but isn't really used either".
Let's say I am showing stock prices, or sports scores, or movie attendance or something.
Periodically, I will refresh the grid by Close() and then Open() of a query linked to its associated datasource.
I know how to owner draw a cell with OnDrawCell() - what I can't figure out is how to know if the new value is the same as or different from the previous value for a given cell.
I suppose there are two use cases here, one where the number of rows is fixed and they remain in the same row order and one where rows can change (insert/delete or reorder).
For the former, I can take a snapshot before updating and compare after the update, but that might be a lot of data. I am not sure whether I want to restrict the operation to the currently visible rows; I think a user might want to scroll down and still be notified of any rows which changed during the last update.
For the latter, I am stumped, unless, of course, each row has a unique key.
How can I do this (efficiently)? A solution for TDbGrid would help everyone, a solution with TMS Software's TAdvDbGrid would be fine by me (as would a (preferably free) 3rd party component).
TDBGrid reads the data currently contained in its assigned dataset. It has no capacity to remember prior values, perform calculations, or anything else. If you want to track changes, you have to do it yourself. You can do it by multiple means (a prior value column, a history table, or whatever), but it can't be done by the grid itself. TDBGrid is for presenting data, not analyzing or storing it.
One suggestion would be to track it in the dataset using the BeforePost event, where you can store the field's old value into a LastValue column, and then use that in your TDBGrid.OnDrawColumnCell event to see whether the value has changed and alter the drawing/coloring as needed. Something like if LastValue <> CurrValue then... should work.
I have code first MVC 4 application with SQL Server 2008.
One of my tables is used heavily: a lot of data is stored in it every day, and after some time I delete the old data. That is why the elements' IDs increase quickly.
I defined its ID as int in the model. I am worried that the ID range will be exhausted after some time.
What should I do if the table's ID reaches its maximum value? I have never run into this situation.
My second question: if I change the ID's type from int to long (bigint) and then export/import the database, will the larger type affect (reduce) the speed of the site?
If you use an INT IDENTITY starting at 1, and you insert a row every second, every day of the year, all year long -- then you need roughly 68 years before you hit the 2-billion (2,147,483,647) limit ...
If you're afraid - what does this command tell you?
SELECT IDENT_CURRENT('your-table-name-here')
Are you really getting close to 2,147,483,647 - or are you still a ways away??
If you should be getting "too close": you can always change the column's datatype to BIGINT - if you use a BIGINT IDENTITY starting at 1, and you insert one thousand rows every second, you need a mind-boggling 292 million years before you hit the 9.2-quintillion (9,223,372,036,854,775,807) limit ....
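A rough sketch of that type change, with hypothetical table and constraint names; note that if the column is the primary key, the PK constraint (and any foreign keys referencing it) has to be dropped and re-created around the ALTER:

ALTER TABLE dbo.YourTable DROP CONSTRAINT PK_YourTable;

ALTER TABLE dbo.YourTable ALTER COLUMN ID bigint NOT NULL;

ALTER TABLE dbo.YourTable
    ADD CONSTRAINT PK_YourTable PRIMARY KEY CLUSTERED (ID);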
As far as I know, there isn't any effective method to prevent reaching the limit of an auto-incremented identity. You can choose a data type big enough to last when you create the table. But here's one solution I can think of.
Create a new temp table with the same data structure, with the auto-increment column already included and set as the primary key. Then, inside Management Studio, import the data into the new table from the old table. When you are asked whether to copy the data or write your own query, choose to write the query, and select everything from your old table except the ID. This resets the identity so it starts back from 1. You can delete the old table after that and rename the new temp table. Although you have to right-click the database itself to access the Import and Export command, you can set the source and destination database to be the same in the options.
This method is pretty easy and I've done it myself a couple of times.
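For reference, here is a scripted version of the same idea, with made-up table and column names; it does in T-SQL what the Import and Export wizard steps above do:

-- Hypothetical names throughout.
CREATE TABLE dbo.MyTable_New
(
    ID int IDENTITY(1, 1) PRIMARY KEY,
    SomeColumn varchar(50),
    CreatedAt datetime
);

-- Copy everything except the old ID; new IDs are generated starting at 1.
INSERT INTO dbo.MyTable_New (SomeColumn, CreatedAt)
SELECT SomeColumn, CreatedAt
FROM dbo.MyTable;

DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';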
I have a Python program that calls a stored procedure in a DB2 database. I am using results = cursor.fetchall() to process the results of my stored procedure. However, my stored procedure returns two cursors, and results only contains the first one. I need a way to loop through as many result sets as the procedure returns. I was hoping fetchmany() would be my answer, but it is not.
I need to be able to handle multiple result sets because the program I am writing can only call one stored procedure; it would take a lot of work to go back and make it able to call two. Besides, one of these procedures needs to return 10 cursors. Everything is dynamic, so the application doesn't know what procedure it is running; it just gets the data and spits it into Excel without knowing its meaning. I need one cursor for the data and the other cursors for different kinds of counts and totals.
I am looking for a built-in function to do this, or maybe even a different library, because I have done my share of googling and it looks like pyodbc does not do this for DB2. DB2 is a requirement.
Use the nextset() method of the cursor: https://github.com/mkleehammer/pyodbc/wiki/Cursor#nextset
Sample code:
# fetch rows from the first result set
rows = cursor.fetchall()
# process first set rows here

# advance to the next result set
while cursor.nextset():
    # fetch rows from the next set, replacing the previous set's rows
    rows = cursor.fetchall()
    # process next set rows here
nextset() will return True if additional result sets are available, and subsequent cursor fetch methods will return rows from the next set. The method returns None if no additional sets are available.
Just a small simplification for the record:
while True:
    rows = cursor.fetchall()
    # process all result sets in the same place
    if not cursor.nextset():
        break
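If the result sets need to stay distinguishable, for example the first set is the data and the remaining sets are the counts and totals mentioned in the question, a small variation of the same loop will do; this is just a sketch relying only on fetchall() and nextset():

# Collect every result set the procedure returns, keeping them separate.
result_sets = []
while True:
    result_sets.append(cursor.fetchall())
    if not cursor.nextset():
        break

data_rows = result_sets[0]    # first set: the actual data
extra_sets = result_sets[1:]  # remaining sets: counts, totals, etc.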