Stored Procedure Order of Operations

I have a stored procedure which starts by creating a temp table, then it populates the temp table based on a query.
It then does a Merge statement, basically updating a physical table off of the temp table. Lastly it does an Update on the physical table. Pretty basic stuff.
It takes ~8 sec to run. My question is: at what point does it lock the physical table? Since the whole procedure is compiled before it runs, is the physical table locked during the entire execution of the stored procedure, or does it wait until it reaches the statements that actually work with the physical table?
I'm not necessarily trying to resolve an issue so much as make sure I don't cause one. We have other processes that need to be reworked to alleviate blocking; I don't want to create another.

OK, for SQL Server:
A stored procedure is "compiled" (i.e. an execution plan is determined) before its first use, and that execution plan is cached in the plan cache until it's tossed out, either due to space constraints or due to a restart. At the time the execution plan is determined, nothing happens at the table level - no locks, nothing.
SQL Server will by default use row-level locks, i.e. a row is locked when it's read, inserted, updated or deleted - then and only then. So your procedure will place shared locks on the tables it selects the data from to populate the temp table, it will put exclusive locks on the rows being inserted into the temp table (for the duration of the operation/transaction), and the MERGE will also place locks as and when needed.
Most locks in SQL Server (typically not the shared locks) are by default held until the end of the transaction. Unless you handle transactions explicitly, each operation (MERGE, UPDATE, INSERT) runs inside its own implicit transaction, so any locks held will be released at the end of that transaction (i.e. at the end of that statement).
There are lots more aspects to locks - one could write entire books to cover all the details - but does this help you understand locking in SQL Server at least a little bit?
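If you want to see this for yourself, you can watch the locks a session holds while the procedure is running by querying the lock DMV from a second session. This is just an observation sketch; the session id 57 below is a placeholder for the SPID that is actually executing your procedure.
SELECT resource_type,                  -- OBJECT, PAGE, KEY, ...
       request_mode,                   -- S (shared), X (exclusive), IX (intent exclusive), ...
       request_status,                 -- GRANT or WAIT
       resource_associated_entity_id
FROM sys.dm_tran_locks
WHERE request_session_id = 57          -- replace with the SPID running the procedure
For OBJECT resources, OBJECT_NAME(resource_associated_entity_id) gives the table name; PAGE and KEY resources reference a hobt_id that can be resolved via sys.partitions.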

Related

We know there is a data cache for row data and an index cache for group values. Does the Aggregator process all the data into the cache before it starts its operation?

Can you please help me understand this with the example below?
Group by cust_id, item_id.
Which records are processed into the caches (index/data) in the two scenarios, with sorted input and with unsorted input?
What happens if the cache memory runs out? Which algorithm does it use internally to perform the aggregate calculations?
I don't know about the internal algorithm, but in unsorted mode it's normal for the Aggregator to store all rows in cache and wait for the last row, because that row could be the first that must be returned according to the Aggregator rules! The Aggregator will never complain about the order of incoming rows. When using cache, it stores rows first in memory, then, when the allocated memory is full, it pushes the cache to disk. If it runs out of disk space, the session will fail (and maybe others because of that full disk), and you will have to clean those files manually.
In sorted mode there is no such problem: rows come in groups ready to be aggregated, and the aggregated row goes out as soon as all rows of a group have been received, which is detected when one of the key values changes. The Aggregator will complain and stop if rows are not in the expected order. However, this pushes the problem upstream to the sorting step, which could be a Sorter (which can use a lot of cache itself) or the database with an ORDER BY clause in the SQL query, which could take resources on the database side.
Also be careful that the SQL ORDER BY may use a different locale/collation than Informatica.
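For example, if the sorted input comes from the database side, the source query override might look something like this (the table and column names are made up for illustration; the ORDER BY must list the group-by ports in the same order the Aggregator expects):
-- Hypothetical source query providing sorted input for a Group By on cust_id, item_id.
SELECT cust_id,
       item_id,
       quantity,
       amount
FROM sales
ORDER BY cust_id, item_id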

Getting the number of the issued SQL statements for a specific table

I am using a Firebird database, and for testing reasons I want to know how many times a specific table has been accessed, without manually maintaining a counter in the code.
Firebird does not keep a historical record of table access. You might be able to use the Firebird trace facility to track this yourself, but this requires the trace to be active the whole time (which can have an impact on performance). Alternatively, you can use third-party (paid) tools like FBScanner.
You can also try to use the monitoring tables, specifically MON$RECORD_STATS, but those statistics are only maintained for as long as the database is open (i.e. has active connections); once the last connection is closed (and assuming database linger is off), the database gets closed and those statistics are expunged.
MON$RECORD_STATS does not record table accesses as such, but counters like the number of records read, inserted, deleted, etc. The associated tables can be found through MON$TABLE_STATS:
select t.MON$TABLE_NAME, r.MON$STAT_ID, r.MON$STAT_GROUP, r.MON$RECORD_SEQ_READS,
r.MON$RECORD_IDX_READS, r.MON$RECORD_INSERTS, r.MON$RECORD_UPDATES,
r.MON$RECORD_DELETES, r.MON$RECORD_BACKOUTS, r.MON$RECORD_PURGES,
r.MON$RECORD_EXPUNGES, r.MON$RECORD_LOCKS, r.MON$RECORD_WAITS,
r.MON$RECORD_CONFLICTS, r.MON$BACKVERSION_READS, r.MON$FRAGMENT_READS,
r.MON$RECORD_RPT_READS
from MON$TABLE_STATS t
inner join MON$RECORD_STATS r
on t.MON$STAT_GROUP = r.MON$STAT_GROUP and t.MON$RECORD_STAT_ID = r.MON$STAT_ID
For details see doc/README.monitoring_tables.txt in your Firebird install, or README.monitoring_tables.txt (Firebird 3).
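If a rough per-table total is enough, you could also aggregate those counters, for example like this (a sketch over the same monitoring tables as above; the numbers are only meaningful for as long as the database stays open):
select t.MON$TABLE_NAME,
       sum(r.MON$RECORD_SEQ_READS + r.MON$RECORD_IDX_READS) as total_reads,
       sum(r.MON$RECORD_INSERTS) as total_inserts,
       sum(r.MON$RECORD_UPDATES) as total_updates,
       sum(r.MON$RECORD_DELETES) as total_deletes
from MON$TABLE_STATS t
inner join MON$RECORD_STATS r
  on t.MON$RECORD_STAT_ID = r.MON$STAT_ID
group by t.MON$TABLE_NAME
order by 2 desc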

Erlang ets insert into multiple tables

I'm an Erlang newbie, and I have a question about ETS tables.
I have two ETS tables, and I need to insert or delete values in both.
insert(V) ->
ets:insert(table_test,V),
ets:insert(table_cp,V).
delete(V)->
ets:delete(table_test,V),
ets:delete(table_cp,V).
How can I guarantee that the operation succeeds or fails in both?
For example, for the insert operation: if something goes wrong at ets:insert(table_cp, V), should I delete the value from table_test?
The same goes for delete: if ets:delete(table_cp, V) fails, should I re-insert the value?
Please help.
What you are asking for is a transaction. ETS doesn't support transactions. Even if you skip the insert into the other table when the first insert fails, you can't guarantee that the second insert happens when the first one succeeds, because something may happen between those two writes, e.g. the process may die.
If you need transactions please consider mnesia which is built on top of ETS and provides support for transactions, even across distributed Erlang nodes.
It all depends how much you need to rely on the value being inserted to both or neither. If your application can survive (work correctly) with the value inserted only to one of those tables, or if it is able to correct the value if it's inserted incorrectly, then a programmatic handling of failures as you described may work fine. Otherwise ETS wouldn't be the right data structure.

Is it possible in Ruby to set a specific Active Record call to read dirty

I am looking at a rather large database. Let's say I have an exported flag on the product records.
If I want an estimate of how many products I have with the flag set to false, I can do a call something like this:
Product.where(:exported => false).count
The problem I have is that even the count takes a long time, because the table of 1 million products is being written to. More specifically, exports are happening, and the value I'm interested in counting is ever changing.
So I'd like to do a dirty read on the table... Not a dirty read always. And I 100% don't want all subsequent calls to the database on this connection to be dirty.
But for this one call, dirty is what I'd like.
Oh, I should mention: Ruby 1.9.3, Heroku and PostgreSQL.
Now.. if I'm missing another way to get the count, I'd be excited to try that.
OH SNOT, one last thing: this example is contrived.
PostgreSQL doesn't support dirty reads.
You might want to use triggers to maintain a materialized view of the count - but doing so will mean that only one transaction at a time can insert a product, because they'll contend for the lock on the product count in the summary table.
Alternately, use system statistics to get a fast approximation.
Or, on PostgreSQL 9.2 and above, ensure there's a primary key (and thus a unique index) and make sure vacuum runs regularly. Then you should be able to do quite a fast count, as PostgreSQL should choose an index-only scan on the primary key.
Note that even if Pg did support dirty reads, the read would still not return perfectly up-to-date results, because rows would sometimes be inserted behind the read pointer in a sequential scan. The only way to get a perfectly up-to-date count is to prevent concurrent inserts: LOCK TABLE thetable IN EXCLUSIVE MODE.
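For the statistics-based approximation mentioned above, a minimal sketch (this returns the estimated total row count of the table, not the count with exported = false, and it is only as fresh as the last VACUUM/ANALYZE; the products table name comes from the Rails example):
-- Fast, approximate row count taken from the planner statistics.
SELECT reltuples::bigint AS approximate_rows
FROM pg_class
WHERE relname = 'products';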
As soon as a query begins to execute, it runs against a frozen, read-only snapshot, because that's what MVCC is all about. The values are not changing within that snapshot, only in subsequent amendments to that state. It doesn't matter if your query takes an hour to run; it is operating on data that's locked in time.
If your queries are taking a very long time, it sounds like you need an index on your exported column, or whatever values you use in your conditions, as a COUNT against an indexed column is usually very fast.
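If you go the index route, a partial index keeps the index small because it only covers the rows the count is interested in. A sketch, assuming the products table and a boolean exported column from the example (the index name is made up):
-- Partial index covering only the not-yet-exported rows.
CREATE INDEX index_products_on_not_exported
    ON products (exported)
    WHERE exported = false;

-- The query generated by Product.where(:exported => false).count:
SELECT count(*) FROM products WHERE exported = false;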

Using an SQL Agent Job to call procedures in a loop

I am putting together a job on SQL Enterprise Manager 2000 to copy and delete records in a couple of database tables. We've run a straight-up mass copy and delete stored procedure, but it could be running on millions of rows and therefore hangs the server. I was interested in trying to run the process in 100-ish record chunks at a time, so the server doesn't grind to a halt (this is a live web database). I want this to run once a night, which is why I've put it in an agent job. Is there any way to loop the calls to the stored procedures that actually do the copy and delete, and then "sleep" in between each call to give the server time to catch up? I know there is the WAITFOR command, but I'm unsure whether it will hold the processor or let other queries run in the meantime.
Thanks!
"Chunkifying" your deletes is the preferred way to delete excessive amounts of data without bloating up transaction log files. BradC's post is a reasonable example of this.
Managing such loops is best done within a single stored procedure. To spread the work out over time, I'd still keep it in the procedure. Inserting a WAITFOR in the loop will put a "pause" between each set of deletes, if you deem that necessary to deal with possible concurrency issues. Use a SQL Agent job to determine when the procedure starts - and if you need to make sure it stops by a certain time, work that into the loop as well.
My spin on this code would be:
-- NOTE: This is a code sample, I have not tested it
CREATE PROCEDURE ArchiveData
    @StopBy DATETIME
    -- Pass in a cutoff time. If it runs this long, the procedure will stop.
AS
DECLARE @LastBatch INT
SET @LastBatch = 1
-- Initialized to make sure the loop runs at least once
WHILE @LastBatch > 0
BEGIN
    WAITFOR DELAY '00:00:02'
    -- Set this to your desired delay factor
    DELETE TOP (1000)       -- Or however many per pass are desired
    FROM SourceTable
    -- Be sure to add a WHERE clause if you don't want to delete everything!
    SET @LastBatch = @@ROWCOUNT
    IF GETDATE() > @StopBy
        SET @LastBatch = 0
END
RETURN 0
Hmm. Rereading your post implies that you want to copy the data somewhere first before deleting it. To do that, I'd set up a temp table, and inside the loop first truncate the temp table, then copy in the primary keys of the TOP N items, insert into the "archive" table via a join to the temp table, and finally delete from the source table, also via a join to the temp table. (Just a bit more complex than a straight delete, isn't it?) A sketch of that variant follows.
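Here is a rough, untested sketch of that copy-then-delete variant; the table, key and column names (SourceTable, ArchiveTable, Id, SomeColumn) are placeholders to be adapted to the real schema:
CREATE PROCEDURE ArchiveDataWithCopy
    @StopBy DATETIME
AS
CREATE TABLE #Batch (Id INT PRIMARY KEY)

DECLARE @LastBatch INT
SET @LastBatch = 1

WHILE @LastBatch > 0
BEGIN
    TRUNCATE TABLE #Batch

    -- Grab the keys for this pass
    INSERT INTO #Batch (Id)
    SELECT TOP (1000) Id
    FROM SourceTable
    -- WHERE <your archive condition>

    SET @LastBatch = @@ROWCOUNT

    -- Copy, then delete, via the captured keys; one transaction per batch
    BEGIN TRAN
    INSERT INTO ArchiveTable (Id, SomeColumn)
    SELECT s.Id, s.SomeColumn
    FROM SourceTable s
    INNER JOIN #Batch b ON b.Id = s.Id

    DELETE s
    FROM SourceTable s
    INNER JOIN #Batch b ON b.Id = s.Id
    COMMIT

    IF GETDATE() > @StopBy
        SET @LastBatch = 0

    WAITFOR DELAY '00:00:02'   -- optional pause between batches
END
RETURN 0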
Don't worry about waiting between loops, SQL server should handle the contention between your maintenance job and the regular activity on the server.
What really causes the problem in these types of situations is that the entire delete process happens all at once, inside a single transaction. This blows up the log for the database, and can cause the kinds of problems it sounds like you are experiencing.
Use a loop like this to delete in manageable chunks:
DECLARE @i INT
SET @i = 1
SET ROWCOUNT 10000
WHILE @i > 0
BEGIN
    BEGIN TRAN
    -- SET ROWCOUNT above limits this DELETE to 10000 rows per pass
    DELETE FROM dbo.SuperBigTable
    WHERE RowDate < '2009-01-01'
    -- Capture the row count before COMMIT resets @@ROWCOUNT
    SELECT @i = @@ROWCOUNT
    COMMIT
END
SET ROWCOUNT 0
You can use similar logic for your copy.
WAITFOR will let other processes 'have a go'. I've used this technique to stop large DELETEs from locking up the machine. Create a WHILE loop, delete a block of rows, and then WAITFOR a few seconds (or less, whatever is appropriate).
