Deletion of rows from Informix Database - informix

I have around 3 million rows in a table in an Informix DB.
We have to delete them before loading new data.
The table has a primary key on one of its columns.
To delete the rows, I thought of using the rowid. But when I tried
select rowid from table
it responded with error -857 [Rowid does not exist].
So I am not sure how to proceed with the deletion. I would prefer not to use the primary key, as deleting by primary key is costly compared with deleting by rowid.
Any suggestion on the above would be helpful.

If you get error -857, the chances are that the table is fragmented, and was created without the WITH ROWIDS option.
Which version of Informix are you using, and on which platform?
The chances are high that you have the TRUNCATE TABLE statement, which is designed to drop all the rows from a table very quickly indeed.
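If it is available, the usage is as simple as the following sketch (TableName is a placeholder, matching the DELETE example below):
TRUNCATE TABLE TableName;  -- drops all rows with minimal logging; table name is a placeholder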
Failing that, you can use a straightforward:
DELETE FROM TableName;
as long as you have sufficient logical log space available. If that won't work, then you'll need to do repeated DELETE statements based on ranges of the primary key (or any other convenient column).
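A minimal sketch of that ranged approach, assuming an integer primary key column called id; the column name and the batch boundaries below are placeholders:
DELETE FROM TableName WHERE id >= 0      AND id < 500000;
DELETE FROM TableName WHERE id >= 500000 AND id < 1000000;
-- ...continue in batches sized so that each DELETE fits comfortably in the logical log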
Or you could consider dropping the table and then creating it afresh, possibly with the WITH ROWIDS clause (though I would not particularly recommend WITH ROWIDS - the rowid becomes a physical column with an index instead of the virtual column it is in a non-fragmented table). One downside of dropping and rebuilding a table is that the referential constraints have to be reinstated, and any views built on the table are automatically dropped along with it, so they have to be recreated too.

I'm assuming this is IDS? How many new rows will be loaded, and how often is this process repeated? Despite having to re-establish referential constraints and views, in my opinion it is much better to drop the table, create it from scratch, load the data and then create the indexes. If you just delete all the rows, the deleted rows still remain physically in the table with a NULL \0 flag at the end of the row, so the table will be even larger after the new rows are loaded and performance will suffer. It's also a good opportunity to create fresh indexes, and, if possible, to pre-sort the load data so that it's in the most desirable order (as when creating a CLUSTERED INDEX). If you're going to fragment your tables by expression or any other scheme, then rowids go out the window, but use WITH ROWIDS if you're sure the table will never be fragmented. If your table has a serial column, are there any other tables using the serial column as a foreign key?
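A rough outline of that sequence in Informix SQL - a sketch only, since the real column list, load file and indexes are not shown in the question (every name below is a placeholder):
DROP TABLE TableName;
CREATE TABLE TableName (
id SERIAL,
col1 INTEGER,
col2 VARCHAR(50)
);
-- LOAD is a dbaccess statement, not server SQL; for very large volumes consider external tables or the High-Performance Loader
LOAD FROM 'newdata.unl' INSERT INTO TableName;
CREATE INDEX ix_tablename_col1 ON TableName (col1);
-- afterwards, re-add the primary key and referential constraints and recreate any dependent views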

Related

How can I keep the tblPurchase and tblProductStock tables without dropping them? (I need to keep both tables and their values permanently, without dropping them.)

How can I change this stored procedure so that it does not drop tblPurchase and tblProductStock?
When I run my program with this stored procedure, the tables and data are dropped after adding a new record. I want to keep all tables and data intact. Please help me resolve this.
https://stackoverflow.com/a/65774309/13338320
Indexed View
An entirely new solution based on Indexed Views is possible.
An Indexed View is a view which has a clustered index on it, and the data is actually stored on disk.
As I understand it, you are trying to keep a sum of purchases per product item stored in tblProduct. I have assumed that ItemCode is the PK of tblProduct and that ItemName is also defined there (We cannot use MAX in an indexed view). So we can define a view like this:
CREATE VIEW dbo.vwTotalPurchases
WITH SCHEMABINDING -- must be schema bound, we cannot change underlying columns after creation
AS
SELECT
ItemCode,
SUM(Quantity) QuantityPurchased,
COUNT_BIG(*) CountPurchases -- if we group, must have count also, so that rows can be maintained
FROM dbo.tblPurchase -- must use two-part names
GROUP BY ItemCode;
GO
We can then create a clustered index on it to persist it on disk. SQL Server will maintain the index whenever an update to the base table happens. If there are no more rows in the grouping (identified by count being 0), then the row is deleted:
CREATE UNIQUE CLUSTERED INDEX PK_vwTotalPurchases ON dbo.vwTotalPurchases (ItemCode);
GO
Now if we want to query it, we can left join this view onto tblProducts (left join because there may be no purchases):
SELECT
p.ItemCode,
p.ItemName,
ISNULL(tp.QuantityPurchased, 0) QuantityPurchased,
ISNULL(tp.CountPurchases, 0) CountPurchases
FROM tblProducts p
LEFT JOIN vwTotalPurchases tp WITH (NOEXPAND) ON tp.ItemCode = p.ItemCode;
We can define this as a view also (not an indexed one, but a standard view) so that the definition is usable anywhere.
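For instance, a plain (non-indexed) wrapper view along these lines - the name vwProductPurchaseTotals is only an illustration:
CREATE VIEW dbo.vwProductPurchaseTotals
AS
SELECT
p.ItemCode,
p.ItemName,
ISNULL(tp.QuantityPurchased, 0) QuantityPurchased,
ISNULL(tp.CountPurchases, 0) CountPurchases
FROM dbo.tblProducts p
LEFT JOIN dbo.vwTotalPurchases tp WITH (NOEXPAND) ON tp.ItemCode = p.ItemCode; -- view name is illustrative only
GO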
Note on NOEXPAND:
If you are not on SQL Server Enterprise or Developer Edition, you must use the hint WITH (NOEXPAND) to force it to use the index, otherwise it will query the base tblPurchase instead. And even in those editions, it is best to use NOEXPAND.
See this article by Paul White on this.

Merging without rewriting one table

I'm wondering about something that doesn't seem efficient to me.
I have 2 tables, one very large table DATA (millions of rows and hundreds of cols), with an id as primary key.
I then have another table, NEW_COL, with a variable number of rows (1 to millions) but always 2 cols: id and new_col_name.
I want to update the first table, adding the new column's data to it.
Of course, I know how to do it with a PROC SQL left join, or a data step MERGE.
Yet it seems inefficient: as far as I can tell from the execution times (which may be misleading), both of these approaches rewrite the huge table completely, even when NEW_COL has only 1 row (almost 1 min).
I tried doing it as 2 SQL statements, an ALTER TABLE ADD COLUMN followed by an UPDATE, but it's way too slow, as an UPDATE with a join doesn't seem efficient at all.
So, is there an efficient way to "add a column" to an existing table WITHOUT rewriting this huge table?
Thanks!
SAS datasets are row stores and not columnar stores like tables in other databases. As such, adding rows is far easier and efficient than adding columns. A key joined view could be argued as the most 'efficient' way to add a column to a data rectangle.
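A minimal sketch of such a key-joined view in PROC SQL, assuming both data sets sit in the WORK library and share the key ID (all names here are placeholders, not the poster's actual data sets):
proc sql;
create view work.data_plus_newcol as   /* view name is a placeholder */
select d.*, n.new_col_name
from work.data as d
left join work.new_col as n
on d.id = n.id;
quit;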
If you are adding columns so often that the 1-minute resource cost is a problem, you may need to upgrade hardware with faster drives, a less contentious operating environment, or more memory, and consider SASFILE if the new columns are frequent yet temporary in nature.
@Richard's answer is perfect. If you are adding columns on a regular basis then there is a problem with your design; you need to give more details on what you are doing so that someone can make a suggestion.
I would try a hash join; you can find code for a simple hash join. This is an efficient way of joining because in your case you have one large table and one small table: if the small table fits into memory, it is much better than a left join. I have done various joins this way, and query run times were considerably lower (by an order of 10).
With the ALTER TABLE approach you are rewriting the table, and it also locks the table so nobody else can use it.
You should perform these joins when the workload is lighter, which means outside office hours; you may need to schedule the jobs at night, when more SAS resources are available.
Thanks for your answers guys.
To add information, I don't have any constraints about table locking, load balancing or anything else, as it's a "project tool" script I use.
The goal, in a data prep step ('starting point data generator'), is to recompute an already existing column, or to add a new one (less often, but still quite regularly). Thus, I just don't want to "lose" time waiting for the whole table to be rewritten when I only need to update one column for specific rows.
When I monitor the server, the computation of the data and the joining step are very fast. But when I want to update only 1 row, I see the whole table being rewritten. Seems a waste of resources to me.
But it seems it's a mandatory step, so can't do much about it.
Too bad.

Surrogate keys in fact-less fact tables

Why do you need surrogate keys in fact-less fact tables (or many-to-many dimensional relationship tables)?
A few circumstances in which assigning a surrogate key to the rows in a fact table is beneficial:
Sometimes the business rules of the organization legitimately allow multiple identical rows to exist for a fact table. Normally as a designer, you try to avoid this at all costs by searching the source system for some kind of transaction time stamp to make the rows unique. But occasionally you are forced to accept this undesirable input. In these situations it will be necessary to create a surrogate key for the fact table to allow the identical rows to be loaded.
Certain ETL techniques for updating fact rows are only feasible if a surrogate key is assigned to the fact rows. Specifically, one technique for loading updates to fact rows is to insert the rows to be updated as new rows, then to delete the original rows as a second step as a single transaction. The advantages of this technique from an ETL perspective are improved load performance, improved recovery capability and improved audit capabilities. The surrogate key for the fact table rows is required as multiple identical primary keys will often exist for the old and new versions of the updated fact rows between the time of the insert of the updated row and the delete of the old row.
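A hedged sketch of that insert-then-delete technique, assuming a fact table fact_sales with a surrogate key column fact_key populated by an identity or sequence, and a staging table of corrections (every name and column here is a placeholder; transaction syntax varies by database):
BEGIN TRANSACTION;
-- insert corrected versions of the affected rows as brand-new rows; fact_key is assigned automatically, so they get fresh surrogate keys
INSERT INTO fact_sales (order_id, product_key, quantity)
SELECT order_id, product_key, corrected_quantity
FROM staging_fact_updates;
-- then delete the superseded originals, identified by their old surrogate keys
DELETE FROM fact_sales
WHERE fact_key IN (SELECT old_fact_key FROM staging_fact_updates);
COMMIT;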
A similar ETL requirement is to determine exactly where a load job was suspended, either to resume loading or to back out the job entirely. A sequentially assigned surrogate key makes this task straightforward.
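For example, finding the restart point after a suspended load can be as simple as the following (table and column names are again placeholders):
-- the highest surrogate key committed to the fact table marks where to resume, or what to back out
SELECT MAX(fact_key) AS last_loaded_key FROM fact_sales;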

Table Normalization with no Domain values

There is a debate between our ETL team and a Data Modeler on whether a table should be normalized or not, and I was hoping to get some perspective from the online community.
Currently the tables are set up as such
MainTable            LookupTable
PrimaryKey (PK)      Code (PK)
Code (FK)            Name
OtherColumns
Both tables are only being populated by a periodic file (from a 3rd party) through an ETL job
A single record in the file contains all attributes in both tables for a single row
The file populating these tables is a delta (only rows with some change in them are in the file)
One change to one attribute for one record (again only by the 3rd party) will result in all the data for that record in the file
The Domain Values for Code and Name are not known.
Question: Should the LookupTable be denormalized into MainTable?
ETL team: Yes. With this setup, every row from the file will first have to check the 2nd table to see if their FK is in there (insert if it is not), then add the MainTable row. More Code, Worse Performance, and yes slightly more space. However, regardless of a change to a LookupTable.Name from the 3rd party, the periodic file will reflect every row affected, and we will still have to parse through each row. If lumped into MainTable, all it is is a simple update or insert.
Data Modeler: This is standard good database design.
Any thoughts?
Build prototypes. Make measurements.
You started with this, which your data modeler says is a standard good database design.
MainTable            LookupTable
PrimaryKey (PK)      Code (PK)
Code (FK)            Name
OtherColumns
He's right. But this, too, is a good database design.
MainTable
PrimaryKey (PK)
Name
OtherColumns
If all updates to these tables come only from the ETL job, you don't need to be terribly concerned about enforcing data integrity through foreign keys. The ETL job would add new names to the lookup table anyway, regardless of what their values happen to be. Data integrity depends mainly on the system the data is extracted from. (And the quality of the ETL job.)
With this setup, every row from the file will first have to check the 2nd table to see if their FK is in there (insert if it is not), then add the MainTable row.
If they're doing row-by-row processing, hire new ETL guys. Seriously.
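For comparison, a set-based load needs no per-row lookup at all. A sketch, assuming the delta file has already been bulk-loaded into a staging table called StagingFile (the staging table name and the insert-only handling of the delta are simplifications):
-- add any codes not already in the lookup table
INSERT INTO LookupTable (Code, Name)
SELECT DISTINCT s.Code, s.Name
FROM StagingFile s
WHERE NOT EXISTS (SELECT 1 FROM LookupTable l WHERE l.Code = s.Code);
-- then load the main rows in a single set-based statement (updates to existing rows would be handled similarly, e.g. with a MERGE)
INSERT INTO MainTable (PrimaryKey, Code, OtherColumns)
SELECT s.PrimaryKey, s.Code, s.OtherColumns
FROM StagingFile s;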
More Code, Worse Performance, and yes slightly more space.
They'll need a little more code to update two tables instead of one. How long does it take to write the SQL statements? How long to run them? (How long each way?)
Worse performance? Maybe. Maybe not. If you use a fixed-width code, like an integer or char(3), updates to the codes won't affect the width of the row. And since the codes are shorter than the names, more rows might fit in a page. (It doesn't make any sense to use a code that is longer than the name.) More rows per page usually means less I/O.
Less space, surely. Because you're storing a short code instead of a long name in every row of "MainTable".
For example, the average length of a country name is about 11.4 characters. If you used 3-character ISO country codes, you'd save an average of 8.4 bytes per row in "MainTable". For 100 million rows, you save about 840 million bytes. The size of that lookup table is negligible, about 6k.
And you don't usually need a join to get the full name; country codes are intended to be human-readable without expansion.

Reading from SQLite in Xcode iOS SDK

I have a program with a table in SQLite with about 100 rows and 6 columns of text (not more than a hundred characters each). Each time a button is clicked, the program displays the contents of one row of the table in the view.
My question is: should I copy the contents of the whole table into an array and then read from the array each time the user clicks the button, or should I access the table in the database each time the user clicks the button? Which one is more efficient?
It all depends, but retrieving data from the database as you need it (rather than storing the whole thing in an array) would generally be the most efficient use of memory, which is a pretty precious resource. My most extravagant use of memory would be to store an array of the table's unique identifiers (i.e. a list of primary keys, returned in the correct order); that way I'm not scanning through the database every time, and since my unique identifiers are always numeric, they don't use up too much memory. So, for something of this size, I generally:
open the database;
load an array of solely the table's unique identifiers (which were returned in the right order for my tableview using the SQL ORDER BY clause);
as table cells are built, I'll go back to the database and get the few fields I need for that one row corresponding to the unique identifier that I've kept track of for that one table row;
when I go to the details view, I'll again get that one row from the database; and
when I'm all done, I'll close the database
This yields good performance while not imposing too much of a drain on memory. If the table was different (much larger or much smaller) I might suggest different approaches, but this seems reasonable to me given your description of your data.
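The SQL behind that pattern is minimal. A sketch, assuming a table named items with an integer primary key id and a display-order column sort_order (all names here are placeholders):
-- load only the unique identifiers, in the order the table view will display them
SELECT id FROM items ORDER BY sort_order;
-- later, fetch just the few columns needed for one row on demand, binding the saved id to the ? parameter
SELECT title, subtitle, detail FROM items WHERE id = ?;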
For me, it was much easier to re-read the database on each view load, and drop the contents when done.
The overhead of keeping the contents in memory was just too much to offset the quick read of a small dataset.
100 rows and 6 columns is not a lot of data. iOS is capable of handling much larger data sets than that, and very efficiently. So don't worry about creating a new array; reading from the database should work just fine.
