Why are primary key ids not in sequence? - ruby-on-rails

My last record's primary key was 552, but when I added a new record the primary key it was allotted was 584.
I'm surprised and would like to know the possible reasons for this behavior.
Application Details :
Server: Heroku hobby plan - dyno
Database: Heroku Postgres
Framework: Ruby on Rails
Additional info: I'm using the Rails admin panel to add new records

Possible reasons:
Some records were added and then deleted.
The inserting transaction was rolled back for some reason. From the Postgres manual:
Note: Because smallserial, serial and bigserial are implemented using sequences, there may be "holes" or gaps in the sequence of values which appears in the column, even if no rows are ever deleted. A value allocated from the sequence is still "used up" even if a row containing that value is never successfully inserted into the table column. This may happen, for example, if the inserting transaction rolls back.
The corresponding sequence table_name_seq has an increment greater than 1 (probably not your case; this is sometimes useful for sharding). You can check this as shown below.
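If you want to see which of these applies, you can inspect the sequence behind the id column directly. A quick sketch, assuming a Rails-style users table (swap in your own table name):
SELECT pg_get_serial_sequence('users', 'id');  -- usually returns public.users_id_seq
SELECT last_value, increment_by FROM pg_sequences WHERE sequencename = 'users_id_seq';  -- Postgres 10+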

Related

What would cause Postgres to lose track of the next ID, and how could I fix it?

I mysteriously got an error in my Rails app locally:
PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "users_pkey"
DETAIL: Key (id)=(45) already exists.
The strange thing is that I didn't specify 45 as the ID. This number came from Postgres itself, which also then complained about it. I know this because when I tried it again I got the error with 46. The brute-force fix I used was to just repeat the insertion until it worked, therefore bringing Postgres' idea of the table's next available ID into line with reality.
500.times { User.create({employee_id: 1010101010101, blah_blah: "blah"}) rescue nil }
Since the employee_id has a unique constraint, any attempts to create the user after the first successful one would fail. And any attempts before the first successful one failed because Postgres tried to use an id that already existed (the table's primary key).
So the brute-force approach works, but it's inelegant and it leaves me wondering what could have caused the database to get into this state. It also leaves me wondering how to check to see whether the production database is similarly inconsistent, and how to fix it (short of repeating the brute-force "fix").
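One check that comes to mind (a sketch, assuming the Rails-default sequence name users_id_seq for the users table) is to compare the table's maximum id with the sequence's current value:
SELECT (SELECT MAX(id) FROM users) AS max_id,
       (SELECT last_value FROM users_id_seq) AS sequence_value;
-- If max_id is greater than sequence_value, the next insert can collide with an existing row.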
Finding your Sequence
The first step to updating your sequence object is to figure out what the name of your sequence is. To find this, you can use the pg_get_serial_sequence() function.
SELECT pg_get_serial_sequence('table_name','id');
This will output something like public.person_id_seq, which is the relation name (regclass).
In Postgres 10+ there is also a pg_sequences view that you can use to find all sorts of information related to your sequences. The last_value column will show you the current value of the sequence:
SELECT * FROM pg_sequences;
Updating your Sequence
Once you have the sequence name, there are a few ways you can reset the sequence value:
1 - Use setval()
SELECT setval('public.person_id_seq',1020); -- Next value will be 1021
SELECT setval('public.person_id_seq',1020, False); -- Next value will be 1020
2 - Use ALTER SEQUENCE (RESTART WITH)
ALTER SEQUENCE person_id_seq RESTART WITH 1030;
In this case, the value you provide (ex. 1030) will be the next value returned, so technically the sequence is being reset to <YOUR VALUE> - 1.
3 - Use ALTER SEQUENCE (START WITH, RESTART)
ALTER SEQUENCE person_id_seq START WITH 1030;
ALTER SEQUENCE person_id_seq RESTART;
Using this method is preferred if you need to repeatedly restart to a specific value. Subsequent calls to RESTART will reset the sequence to 1030 in this example.
This sort of thing happens when rows with explicitly specified IDs are inserted into the table. Since the IDs are specified, Postgres doesn't advance its sequence for those inserts, and the sequence falls out of date with the data in the table. This can happen when rows are inserted manually, copied in from a CSV file, replicated in from another database, etc.
To avoid the issue, you simply need to let Postgres always handle the IDs, and never specify the ID yourself. However, if you've already messed up and need to fix the sequence, you can do so with the ALTER SEQUENCE command (using RESTART WITH or INCREMENT).
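For example, to realign a sequence with the table contents in one go (a sketch; the users table and id column here are assumptions, substitute your own):
SELECT setval(pg_get_serial_sequence('users', 'id'),
              COALESCE((SELECT MAX(id) FROM users), 1));
-- The next generated id will be MAX(id) + 1 (or 2 if the table was empty).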

Sync SQLite3 database with iCloud

This question has been asked a number of times before, but I have been unable to find a full answer. I need to store data in my app using an SQLite3 database, and Core Data is not an option. I want to synchronise the data across devices using iCloud, and the best approach to this seems to be to send SQL transaction logs to iCloud and use them to keep each device up to date. The process I've come up with so far is as follows:
All database-altering queries (INSERT, UPDATE, DELETE), once executed, are stored in a transactions array, each element of which contains the SQL query and the timestamp at which it was carried out
The database contains a table for logging the point in the transactions array that the app last got to (including the filename of the transactions file stored on iCloud)
Transactions array saved to device-unique file on iCloud
When syncing:
Get array of transactions files from iCloud
Create empty array of transactions to be committed
For each file:
Check database for last position got to in transaction file
If none, start from beginning of file
Add each transaction from that point to the array of transactions to be committed
Update database with new last position of transaction file so the synced transactions are not repeated
Sort the array of transactions to be committed by transaction timestamp
Execute the commands in the array of transactions to be committed
I am confident that I can get this working in terms of pulling the data down to each device and carrying out the commands to update each local copy. The only problem I envisage is if two devices insert a record into the same table while both are offline and then sync. For example:
Device 1 and device 2 both have synchronised copies of the database, with four records each in the table "table1"
Device 1 inserts value "foo" to table "table1" with PK 5
Device 2 inserts value "bar" to table "table1" with PK 5
Device 1 downloads transaction log for device 2 and inserts value "bar" to ID 6
Device 2 downloads transaction log for device 1 and inserts value "foo" to ID 6
We now have a situation where the primary keys for these records are inverted on each device, which will break links to tables which rely on the primary key for linking.
I'm still trying to research a solution to this, but in the meantime if anybody has any suggestions I would be extremely grateful!
I've been thinking about this all day and I think I've come up with a solution. I'm posting it here to see if anyone has any comments, and I'll be having a go at implementing it tomorrow.
My idea is to do away with the auto-incrementing integer primary key and replace it with a string-based key. The key will be generated from the device's UUID and the timestamp, which makes it device-specific and row-unique. So, to rephrase the INSERT example from above using this method (with simplified key strings for ease of reading):
Device 1 and device 2 both have synchronised copies of the database, with four records each in the table "table1"
Device 1 inserts value "foo" to table "table1" with PK "abc123"
Device 2 inserts value "bar" to table "table1" with PK "def456"
Device 1 downloads transaction log for device 2 and inserts value "bar" to ID "def456"
Device 2 downloads transaction log for device 1 and inserts value "foo" to ID "abc123"
This works because the devices know that the transactions will result in the insertion of rows which are keyed using device-specific values. There is therefore no danger of duplicate values in the key column after the insert operation.
Any thoughts on this approach would be welcome!
UPDATE
I'm marking this as the correct answer as it worked. Here's how I modified my existing database (and app) to allow synchronisation across devices. Fortunately this is not a production app yet so the significant changes to the database which were required have not caused a problem.
1. Change all database tables to use a TEXT column as their primary key (see the schema sketch after this list).
2. Add a "last index" table to the database, keyed by table name, with one other column holding the index number of the last row added to that table.
3. Add a method to the app which generates a device- and row-unique TEXT key by retrieving the last inserted index for the table, incrementing it by one, and appending the device's UUID.
4. When inserting any row, invoke the method described in (3) to obtain an appropriate key for the record.
5. All database-modifying functions append the SQL query to a transaction log; each entry is also date- and time-stamped, and the log is saved to a local file named after the device's UUID.
6. The device's local transaction log is then pushed up to iCloud.
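For reference, a minimal sketch of what the reworked SQLite schema might look like (table and column names are illustrative; the key itself is built in app code as described in step 3):
CREATE TABLE table1 (
    id    TEXT PRIMARY KEY,   -- e.g. '<device UUID>-<last index + 1>', generated by the app
    value TEXT
);
CREATE TABLE last_index (
    table_name TEXT PRIMARY KEY,            -- name of the table the index applies to
    last_row   INTEGER NOT NULL DEFAULT 0   -- index number of the last row added to that table
);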
Syncing involves the following:
Download all of the transaction logs from iCloud
Ignoring the device's own transaction log, go through the list of SQL commands in each log and add them to an array of commands, starting from the position reached in that log on the last sync
Store the position reached in each log in the database so the same commands are not re-executed on the next sync
Sort the array of all SQL commands from all logs by date
Execute the commands in order
With regard to implementation of this process, I have got as far as step 5. None of the iCloud-specific stuff has been implemented yet. However, I have tested the process by manually copying the transaction logs between devices running copies of the app and I can confirm that the process works. The TEXT primary key containing the device's UUID ensures that no clashes occur. All devices can insert each other's data and the keys will always be unique.
The downside of this is that the database will be larger (because the keys are longer than an integer), and queries probably take longer as there is lots of string comparison involved. However, the database I am using is relatively small and there are just a few queries per user interaction so I do not anticipate a problem here.
I hope this is useful to others who come along with the same question :)

Most effective, secure way to delete Postgres column data

I have a column in my Postgres table whose data I want to remove for expired rows. What's the best way to do this securely? It's my understanding that simply writing 0s over those values is ineffective, because Postgres creates a new row version on UPDATE and marks the old row as dead.
Is the best way to set the column to null and manually vacuum to clean up the old records?
I will first say that it is bad practice to alter data like this - you are changing history. Also the below is only ONE way to do this (a quick and dirty way and not to be recommended):
1 Backup your database first.
2 Open PgAdmin, select the database, open the Query Editor and run a query.
3 It would be something like this
UPDATE <table_name> SET <column_name>=<new value (eg null)>
WHERE <record is dead>
The WHERE part is for you to figure out based on how you are identifying which rows are expired (e.g. is_removed=true or is_deleted=true are common for identifying soft-deleted records).
Obviously you would have to run this script regularly. The better way would be to update your application to do this job instead.
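As a concrete illustration of the above (table and column names are made up): clear the sensitive column for expired rows, then vacuum so the dead row versions don't keep the old values around.
UPDATE payments SET card_number = NULL WHERE is_expired = true;
VACUUM FULL payments;  -- rewrites the table into a new file; plain VACUUM only marks the old row versions reusable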

Deletion of rows from Informix Database

I have around 3 million rows in a table in an Informix DB.
We have to delete them before loading new data.
It has a primary key on one of its columns.
For deleting the same, I thought of going with rowid usage. But when I tried
select rowid from table
it responded with error -857 [Rowid does not exist].
So I am not sure how to proceed with the deletion. I would prefer not to use the primary key, as deletion by primary key is costly compared with rowid deletion.
Any suggestion on the above would be helpful.
If you get error -857, the chances are that the table is fragmented, and was created without the WITH ROWIDS option.
Which version of Informix are you using, and on which platform?
The chances are high that you have the TRUNCATE TABLE statement, which is designed to drop all the rows from a table very quickly indeed.
Failing that, you can use a straightforward:
DELETE FROM TableName;
as long as you have sufficient logical log space available. If that won't work, then you'll need to do repeated DELETE statements based on ranges of the primary key (or any other convenient column).
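For instance, the range-based variant might look something like this (a sketch; the key column and batch boundaries are assumptions):
DELETE FROM TableName WHERE id >= 0       AND id < 1000000;
DELETE FROM TableName WHERE id >= 1000000 AND id < 2000000;
DELETE FROM TableName WHERE id >= 2000000;
Run outside an explicit BEGIN WORK, each DELETE is its own transaction, so the logical-log usage per statement stays bounded by the batch size.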
Or you could consider dropping the table and then creating it afresh, possibly with the WITH ROWIDS clause (though I would not particularly recommend WITH ROWIDS - the rowid becomes a physical column with an index instead of being a virtual column as it is in a non-fragmented table). One of the downsides of dropping and rebuilding a table is that the referential constraints have to be reinstated, and any views built on the table are automatically dropped when the table is dropped, so they have to be reinstated too.
I'm assuming this is IDS? How many new rows will be loaded, and how often is this process repeated? Despite having to re-establish referential constraints and views, in my opinion it is much better to drop the table, create it from scratch, load the data and then create the indexes, because if you just delete all the rows, the deleted rows still remain physically in the table (with a NULL \0 flag at the end of the row), so the table will be even larger when the new rows are loaded and performance will suffer.
It's also a good opportunity to create fresh indexes and, if possible, to pre-sort the load data so that it's in the most desirable order (as when creating a clustered index). If you're going to fragment your tables on an expression or another scheme, then ROWIDs go out the window, but use WITH ROWIDS if you're sure the table will never be fragmented. If your table has a serial column, are there any other tables using the serial column as a foreign key?
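A rough outline of that drop-and-rebuild approach (purely illustrative names; LOAD is a DB-Access statement, and a loader utility or external table would do the same job):
DROP TABLE customer_orders;
CREATE TABLE customer_orders (
    order_id SERIAL NOT NULL,
    cust_id  INTEGER NOT NULL,
    amount   DECIMAL(12,2),
    PRIMARY KEY (order_id)
);
LOAD FROM 'orders.unl' INSERT INTO customer_orders;  -- load the (ideally pre-sorted) data first
CREATE CLUSTER INDEX ix_orders_cust ON customer_orders (cust_id);  -- then build the indexes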

Can one rely on the auto-incrementing primary key in your database?

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?
The generated identification numbers will be unique, regardless of whether you use sequences, as in PostgreSQL and Oracle, or another mechanism such as MySQL's auto-increment.
However, sequences are most often acquired in batches of, for example, 20 numbers.
So with PostgreSQL you cannot determine which row was inserted first, and there may even be gaps in the ids of inserted records.
Therefore you shouldn't use a generated id field for a task like that, so as not to rely on database implementation details.
Setting a created or updated column when executing the command is much better for sorting by creation or update time later on.
For example:
INSERT INTO A (data, created) VALUES ('something', CURRENT_TIMESTAMP);
UPDATE A SET data = 'something', updated = CURRENT_TIMESTAMP;
That depends on your database vendor.
MySQL, I believe, assigns auto-increment keys in strictly increasing order. I'm not sure about SQL Server, but I believe it does the same.
Where you'll run into problems is with databases that don't guarantee this, most notably Oracle, whose sequences are roughly but not strictly ordered.
An alternative might be to go for created time and then ID.
I believe the answer to your question is yes. Reading between the lines, I think you are concerned that the system may re-use ID numbers that are 'missing' in the sequence. If you had used 1, 2, 3, 5, 6, 7 as ID numbers, then in all the implementations I know of, the next ID will always be 8 (or possibly higher); I don't know of any DB that would figure out that record ID #4 is missing and attempt to re-use that number.
Though I am most familiar with SQL Server, I don't see why any vendor would try to fill the gaps in a sequence - think of the overhead of keeping a list of unused IDs, as opposed to just tracking the last ID used and adding 1.
I'd say you could safely rely on the next ID assigned number always being higher than the last - not just unique.
Yes the id will be unique and no, you can not and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this
CREATE TABLE foo (
  id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated TIMESTAMP NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB; # whatever :P
Now, that takes care of the creation time. For the update time I would suggest a BEFORE UPDATE trigger like this one (MySQL does not allow setting NEW.* in an AFTER trigger; of course you can do it in a separate query, but the trigger, in my opinion, is a better solution - more transparent):
DELIMITER $$
CREATE TRIGGER foo_b_upd BEFORE UPDATE ON foo
FOR EACH ROW BEGIN
  SET NEW.updated = NOW();
END$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly, I didn't specify that this is for MySQL; other databases may differ in function names (notably NOW()) and other subtle details.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an ORDER BY column. E.g. if you delete some early rows and then insert new ones, the new rows may end up occupying the same physical location in the database as the old ones did (albeit with new IDs), and that physical order is what the database may use as its default sort.
FWIW, I typically use order by ID as an effective version of order by created_at. It's cheaper in that it doesn't require adding an index to a datetime field (which is bigger and therefore slower than a simple integer primary key index), guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in some slightly different order.
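In practice that just means making the sort explicit rather than relying on physical row order (illustrative table name):
SELECT * FROM users ORDER BY id;               -- cheap proxy for creation order
SELECT * FROM users ORDER BY created_at, id;   -- when the exact creation time matters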
This is probably DB-engine dependent. I would check how your DB implements sequences, and if there are no documented problems then I would decide to rely on the ID.
E.g. a PostgreSQL sequence is OK unless you play with the sequence cache parameters.
There is a possibility that another programmer will manually create or copy records from a different DB with the wrong ID column. However, I would simplify the problem: do not bother with low-probability cases where someone manually destroys data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory, yes, the highest id number is the last created. Remember, though, that databases have the ability to temporarily turn off the insertion of the autogenerated value, insert some records manually, and then turn it back on. Such inserts are not typically used on a production system, but can happen occasionally when moving a large chunk of data from another system.
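In SQL Server, for example, that looks like this (table and columns are illustrative):
SET IDENTITY_INSERT dbo.Users ON;
INSERT INTO dbo.Users (Id, Name) VALUES (42, 'migrated user');  -- explicit id allowed while the setting is ON
SET IDENTITY_INSERT dbo.Users OFF;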
