How to avoid Sybase 15 IDENTITY gaps in a table?

I have a Sybase table with a field (Id) of numeric datatype. The field is also an IDENTITY field.
Rows in the table get deleted from time to time, and all of a sudden the identity field (Id) shows a big gap. Reading through articles, I learned that this happens because Sybase pre-allocates identity values in memory in chunks, and that it is quite common in Sybase.
I need to avoid such large gaps, as they are absurd. Any help please?

You can change the identity gap for a table:
sp_chgattribute 'nameOfYourTable', 'identity_gap', 100
To see the current identity gap setting for a table:
sp_help nameOfYourTable
The value appears in result set 9, in the identity_gap column.
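A sketch of the options side by side ('Orders' and its columns are invented names, and the create-table clause assumes Sybase ASE 12.5.1 or later):
-- Shrink the pre-allocated identity block on an existing table
exec sp_chgattribute 'Orders', 'identity_gap', 100
-- Or declare the gap when the table is created
create table Orders
(
    Id        numeric(10,0) identity,
    OrderDate datetime      not null
)
with identity_gap = 100
-- Confirm the setting afterwards (identity_gap appears in the sp_help output)
exec sp_help Orders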

Related

Does setting column default values hurt in terms of DB space?

This is a quick question. I was curious as to whether setting the default value for a column in Rails+PostgreSQL costs additional space (versus leaving the default as nil).
I am also curious whether an empty record created by Rails (full of nils) takes the same space as a row full of data would.
I suppose this depends heavily on the DB implementation, but I am only interested in PostgreSQL (9.1+), and I got lost searching for implementation details specific to it.
Setting the default is just an entry in the catalog table pg_attrdef; it does not take any space in the table rows themselves.
But actual values in table rows generally cost additional disk space as compared to NULL. NULL values are stored in a special structure called "NULL bitmap", and require a single bit per column. Therefore, a row with NULL values typically takes less space on disk than a row with (dummy) values.
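You can check the per-row effect yourself with pg_column_size(); a small sketch (the table and values are made up):
create table space_test (
    id   int,
    note text
);
insert into space_test (id, note)
values (1, NULL), (2, 'some default text');
-- pg_column_size() on the whole row reports its on-disk size in bytes
select id, pg_column_size(space_test.*) as row_bytes
from   space_test;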
It's actually more complicated. Find details and links in these related answers:
Do nullable columns occupy additional space in PostgreSQL?
Calculating and saving space in PostgreSQL

Table Normalization with no Domain values

There is a debate between our ETL team and a Data Modeler on whether a table should be normalized or not, and I was hoping to get some perspective from the online community.
Currently the tables are set up as such
MainTable: PrimaryKey (PK), Code (FK), OtherColumns
LookupTable: Code (PK), Name
Both tables are only populated from a periodic file (from a 3rd party) through an ETL job.
A single record in the file contains all attributes of both tables for a single row.
The file populating these tables is a delta (only rows with some change in them are in the file).
One change to one attribute of one record (again, only by the 3rd party) will result in all the data for that record appearing in the file.
The domain values for Code and Name are not known.
Question: Should the LookupTable be denormalized into MainTable?
ETL team: Yes. With this setup, every row from the file first has to check the 2nd table to see whether its FK value is there (and insert it if not), then add the MainTable row. More code, worse performance, and yes, slightly more space. However, regardless of a change to LookupTable.Name by the 3rd party, the periodic file will reflect every row affected, so we still have to parse through each row. If everything is lumped into MainTable, it is just a simple update or insert.
Data Modeler: This is standard good database design.
Any thoughts?
Build prototypes. Make measurements.
You started with this, which your data modeler says is a standard good database design.
MainTable: PrimaryKey (PK), Code (FK), OtherColumns
LookupTable: Code (PK), Name
He's right. But this, too, is a good database design.
MainTable: PrimaryKey (PK), Name, OtherColumns
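Spelled out as DDL, the two alternatives might look roughly like this (a sketch only; the column types are invented):
-- Alternative A: normalized, MainTable carries a short code
create table LookupTable
(
    Code char(3)     not null primary key,
    Name varchar(50) not null
);
create table MainTable
(
    PrimaryKey   int          not null primary key,
    Code         char(3)      not null references LookupTable (Code),
    OtherColumns varchar(100) null
);
-- Alternative B: denormalized, the name is repeated in every MainTable row
-- (create one layout or the other, not both)
create table MainTable
(
    PrimaryKey   int          not null primary key,
    Name         varchar(50)  not null,
    OtherColumns varchar(100) null
);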
If all updates to these tables come only from the ETL job, you don't need to be terribly concerned about enforcing data integrity through foreign keys. The ETL job would add new names to the lookup table anyway, regardless of what their values happen to be. Data integrity depends mainly on the system the data is extracted from. (And the quality of the ETL job.)
With this setup, every row from the file will first have to check the 2nd table to see if their FK is in there (insert if it is not), then add the MainTable row.
If they're doing row-by-row processing, hire new ETL guys. Seriously.
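A set-based load avoids per-row lookups entirely. A sketch, assuming the delta file is bulk-loaded into a staging table first (the 'Staging' table and the exact statements are invented for illustration):
-- 1. Add any codes that are not in the lookup table yet
insert into LookupTable (Code, Name)
select distinct s.Code, s.Name
from   Staging s
where  not exists (select 1 from LookupTable l where l.Code = s.Code);

-- 2. Apply the delta to MainTable (shown as delete + insert for portability)
delete from MainTable
where  PrimaryKey in (select PrimaryKey from Staging);

insert into MainTable (PrimaryKey, Code, OtherColumns)
select PrimaryKey, Code, OtherColumns
from   Staging;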
More Code, Worse Performance, and yes slightly more space.
They'll need a little more code to update two tables instead of one. How long does it take to write the SQL statements? How long to run them? (How long each way?)
Worse performance? Maybe. Maybe not. If you use a fixed-width code, like an integer or char(3), updates to the codes won't affect the width of the row. And since the codes are shorter than the names, more rows might fit in a page. (It doesn't make any sense to use a code that is longer than the name.) More rows per page usually means less I/O.
Less space, surely. Because you're storing a short code instead of a long name in every row of "MainTable".
For example, the average length of a country name is about 11.4 characters. If you used 3-character ISO country codes, you'd save an average of 8.4 bytes per row in "MainTable". For 100 million rows, you save about 840 million bytes. The size of that lookup table is negligible, about 6k.
And you don't usually need a join to get the full name; country codes are intended to be human-readable without expansion.

nhibernate guid performance

What would be the overhead of using GUID's instead of an integer identity in nHibernate for table primary keys?
The main reason for use would be the obfuscation of table ids on user facing data.
Impact on day-to-day operations like GetById, find the first 50 filtered records, Delete, Update... zero. There is no measurable difference at all.
What will be affected a lot is the database storage. An integer is stored in 4 bytes, a GUID consumes 16. So for every million rows you will need about 12 MB more storage. If such an entity references other(s), it is another 12 MB per referenced table.
And when such columns are used in indexes... the space can grow more and more.
So on a larger scale, SQL Server will have to scan a larger amount of data.
For example, MS SQL Server is really optimized for int types (including bigint, smallint, tinyint).
BUT, to be honest, we have used both approaches, and the performance issue was always somewhere else...
I would vote for int-based primary keys every time, if possible.
Rather change the MVC routing from controller\action\id to something else, but keep the ids as int. You can introduce a GUID as a property (consuming some space, but only once, not in referencing tables), change the routing to controller\action\uniqueGuid, and in your business layer call GetByGuid if you really need to hide the ids from users.
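A minimal sketch of that layout in SQL Server terms (the table, column, and index names are made up):
create table Orders
(
    Id       int identity not null primary key,   -- internal key, used for joins and FKs
    PublicId uniqueidentifier not null
             default newid()                       -- exposed in URLs instead of Id
);

create unique index IX_Orders_PublicId on Orders (PublicId);
The entity would keep Id as its mapped identifier and expose PublicId as an ordinary property used only by the GetByGuid lookup.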

Assign Key Field Value Only If Corresponding Lookup Result value Exist

I have ten master tables and one Transaction table. In my transaction table (it is a memory table just like ClientDataSet) there are ten lookup fields pointing to my ten master tables.
Now I am trying to dynamically assign key field values to all the lookup key fields of the transaction table from a different server (the data arrives as SOAP XML). Before assigning these values, I need to check whether the corresponding result value is valid in the master table. I am using a filter (e.g. status = 1) to check whether it is valid or not.
Currently, before assigning each key field value, we filter the master table with this filter and use the Locate function to check whether the value is there; if it is found, we assign its key field value.
This works fine if there are only a few records in the master tables. But if the master tables have fifty thousand records each (yes, the customer has that much data), it leads to a big performance issue.
Could you please help me to handle this situation.
Thanks
Basil
The only way to know if it is slow, why, where, and what solution works best is to profile.
Don't make a priori assumptions.
That being said, minimizing round trips to the server and the amount of data transferred is often a good thing to try.
For instance, if your master tables are on the server (not 100% clear from your question), sending only 1 Query (or stored proc call) passing all the values to check at once as parameters and doing a bunch of "IF EXISTS..." and returning all the answers at once (either output params or a 1 record dataset) would be a good start.
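As a sketch of that idea, assuming the master tables live on a SQL server (the table, column, and parameter names are all invented):
select
    (select count(*) from Master1 where KeyField = @Key1 and Status = 1) as Key1IsValid,
    (select count(*) from Master2 where KeyField = @Key2 and Status = 1) as Key2IsValid,
    (select count(*) from Master3 where KeyField = @Key3 and Status = 1) as Key3IsValid
    -- ... one scalar subquery per master table, up to Master10
One round trip returns a single row of 0/1 flags that you can then apply to the lookup key fields on the client side.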
And 50,000 records is not much, so, as I said initially, you may not even have a performance problem. Check it first!

Delphi TClientDataSet, maximum number of fields per index

I have a simple Delphi (2007) procedure that given a TDataSet and a (sub)list of fields returns a new TClientDataSet with the distinct values from the given TDataSet.
This works quite well.
In my proc I used the TClientDataSet index to populate the distinct values.
It was fast and easy.
The problem is that a TClientDataSet index supports at most 16 fields.
If you add more of them they will be silently ignored.
I need more than 16 fields in the dataset (and thus in the index).
Is there any solution? Some hack?
Maybe some open source library to use as workaround?
I'm working offline, so I must do it in memory. The size of the dataset is not huge.
If you need to get distinct occurrences of records across more than 16 fields and you want to use an index to keep things fast, you'll need to consider concatenating some of those fields. For example:
Test Field                   Field 1   Field 2   Field 3   Field 4
Apple~Banana~Carrot~Donut    Apple     Banana    Carrot    Donut
Create your index on the Test Field.
You might need to create multiple test fields if the total length of your other fields exceeds the maximum length of a text field.
You could swap out the TClientDataSet for a TJvCsvDataSet from the JVCL. It can be used as a pure in-memory dataset replacement for client data sets, without any need to read or write CSV files on disk.
It is not quite like TClientDataSet in design. I am not sure what benefit all those indexes in a client data set offer you, other than that you can't have a field without an index definition. But if that is all you need, you can set the TJvCsvDataSet.FieldDef property to 'Field1,Field2,.....FieldN', then open the dataset and add as many rows as you like. It is practically limited only by the amount of memory you can address in a 32-bit process.
