Amazon SimpleDB Identity Seed equivalent - amazon-simpledb

Is there an equivalent to an identity Seed in SimpleDB?
If the answer is no, how do you handle creating something like a customer number or order number that will prevent the creation duplicate numbers?
My experience is mainly from SQL Server in which I would either create a primary key with an identity seed or use transactions in a stored procedure to increment the number.
Thanks for your help!

You can create unique keys using conditional writes. Just do a PutAttributes with the next customer number you want to use and the data you want to store. You can't add a condition for the actual item name, but you can use an attribute that always exists, (like creation date or user group).
Set the conditions:
Expected.1.Name=creation_date
Expected.1.Exists=false
The call will succeed only if there is no creation_date in an item with that item name. If you always write the creation_date, then you get the effect of optimistic locking on the new item name. Of course you can use any attribute you want, so long you always include it in that first conditional put.
The performance of the conditional write is the same as a normal write in most situations but when SimpleDB is under heavy load or high internal network latencies, these calls will take longer, compared to normal writes. During rare failure scenarios inside SimpleDB, the conditional writes will fail completely for a period of time.
If you can't tolerate this, you will have to code some sort of alternate way to get your unique keys during outages. A different SimpleDB region could be used for key generation only, since SimpleDB will still accept the normal writes (non-conditional PutAttributes) during outages.

If you don't already have something unique that will work, using a GUID for the Item is probably the typical solution.

Related

Implementing a unique surrogate key in Advantage Database Server

I've recently taken over support of a system which uses Advantage Database Server as its back end. For some background, I have years of database experience but have never used ADS until now, so my question is purely about how to implement a standard pattern in this specific DBMS.
There's a stored procedure which has been previously developed which manages an ID column in this manner:
#ID = (SELECT ISNULL(MAX(ID), 0) FROM ExampleTable);
#ID = #ID + 1;
INSERT INTO Example_Table (ID, OtherStuff)
VALUES (#ID, 'Things');
--Do some other stuff.
UPDATE ExampleTable
SET AnotherColumn = 'FOO'
WHERE ID = #ID;
My problem is that I now need to run this stored procedure multiple times in parallel. As you can imagine, when I do this, the same ID value is getting grabbed multiple times.
What I need is a way to consistently create a unique value which I can be sure will be unique even if I run the stored procedure multiple times at the same moment. In SQL Server I could create an IDENTITY column called ID, and then do the following:
INSERT INTO ExampleTable (OtherStuff)
VALUES ('Things');
SET #ID = SCOPE_IDENTITY();
ADS has autoinc which seems similar, but I can't find anything conclusively telling me how to return the value of the newly created value in a way that I can be 100% sure will be correct under concurrent usage. The ADS Developer's Guide actually warns me against using autoinc, and the online help files offer functions which seem to retrieve the last generated autoinc ID (which isn't what I want - I want the one created by the previous statement, not the last one created across all sessions). The help files also list these functions with a caveat that they might not work correctly in situations involving concurrency.
How can I implement this in ADS? Should I use autoinc, some other built-in method that I'm unaware of, or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place? If I should use autoinc, how can I obtain the value that has just been inserted into the table?
You use LastAutoInc(STATEMENT) with autoinc.
From the documentation (under Advantage SQL->Supported SQL Grammar->Supported Scalar Functions->Miscellaneous):
LASTAUTOINC(CONNECTION|STATEMENT)
Returns the last used autoinc value from an insert or append. Specifying CONNECTION will return the last used value for the entire connection. Specifying STATEMENT returns the last used value for only the current SQL statement. If no autoinc value has been updated yet, a NULL value is returned.
Note: Triggers that operate on tables with autoinc fields may affect the last autoinc value.
Note: SQL script triggers run on their own SQL statement. Therefore, calling LASTAUTOINC(STATEMENT) inside a SQL script trigger would return the lastautoinc value used by the trigger's SQL statement, not the original SQL statement which caused the trigger to fire. To obtain the last original SQL statement's lastautoinc value, use LASTAUTOINC(CONNECTION) instead.
Example: SELECT LASTAUTOINC(STATEMENT) FROM System.Iota
Another option is to use GUIDs.
(I wasn't sure but you may have already been alluding to this when you say "or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place." - apologies if so, but still this info might be useful for others :) )
The use of GUIDs as a surrogate key allows either the application or the database to create a unique identifier, with a guarantee of no clashes.
Advantage 12 has built-in support for a GUID datatype:
GUID and 64-bit Integer Field Types
Advantage server and clients now support GUID and Long Integer (64-bit) data types in all table formats. The 64-bit integer type can be used to store integer values between -9,223,372,036,854,775,807 and 9,223,372,036,854,775,807 with no loss of precision. The GUID (Global Unique Identifier) field type is a 16-byte data structure. A new scalar function NewID() is available in the expression engine and SQL engine to generate new GUID. See ADT Field Types and Specifications and DBF Field Types and Specifications for more information.
http://scn.sap.com/docs/DOC-68484
For earlier versions, you could store the GUIDs as a char(36). (Think about your performance requirements here of course.) You will then need to do some conversion back and forth in your application layer between GUIDs and strings. If you're using some intermediary data access layer, e.g. NHibernate or Entity Framework, you should be able to at least localise the conversions to one place.
If some part of your logic is in a stored procedure, you should be able to use the newid() or newidstring() function, depending on the type of the backing column:
INSERT INTO Example_Table (newid(), OtherStuff)

Is it bad to change _id type in MongoDB to integer?

MongoDB uses ObjectId type for _id.
Will it be bad if I make _id an incrementing integer?
(With this gem, if you're interested)
No it isn't bad at all and in fact the built in ObjectId is quite sizeable within the index so if you believe you have something better then you are more than welcome to change the default value of the _id field to whatever.
But, and this is a big but, there are some considerations when deciding to move away from the default formulated ObjectId, especially when using the auto incrementing _ids as shown here: https://docs.mongodb.com/v3.0/tutorial/create-an-auto-incrementing-field
Multi threading isn't such a big problem because findAndModify and the atomic locks can actually take care of that, but then you just hit into your first problem. findAndModify is not the fastest function nor the lightest and there have been significant performance drops noticed when using it regularly.
You also have to consider the overhead of doing this yourself anyway, even without findAndModify. For every insert you will need an extra query. Imagine having a unique id that you have to query the uniqueness of every time you want to insert. Eventually your insert rate will drop to a crawl and your lock time will build up.
Of course the ObjectId is really good at being unique without having to check or formulate its own uniqueness by touching the database prior to insertion, hence it doesn't have this overhead.
If you still feel an integer _id suites your scenario, then go for it, but bare in mind the overhead described above.
You can do it, but you are responsible to make sure that the integers are unique.
MongoDB doesn't support auto-increment fields like most SQL databases. When you have a distributed or multithreaded application which has multiple processes and/or threads which create new database entries, you have to make sure that they use the same counter. Otherwise it could happen that two threads try to store a document with the same _id in the database.
When that happens, one of them will fail. That means you have to wait for the database to return a success or error (by calling GetLastError or by setting the write concerns to acknowledged), which takes longer than just sending data in a fire-and-forget manner.
I had a use case for this: replacing _id with a 64 bit integer that represented a simhash of a document index for searching.
Since I intended to "Get or create", providing the initial simhash, and creating a new record if one didn't exist was perfect. Also, for anyone Googling, MongoDB support explained to me that simhashes are absolutely perfect for sharding and scaling, and even better than the more generic ObjectId, because they will divide up the data across shards perfectly and intrinsically, and you get the key stored for negative space (a uint64 is much smaller than an objectId and would need to be stored anyway).
Also, for you Googlers, replacing a MongoDB _id with something other than an objectId is absolutely simple: Just create an object with the _id being defined; use an integer if you like. That's it: Mongo will simply use it. If you try to create a document with the same _id you'll get an error (E11000/Duplicate key). So like me, if you're using simhashing, this is ideal in all respects.

Trimming BOLD_CLOCKLOG table

I am doing some maintenance on a database for an application that uses the Bold for Delphi object persistence framework. This database has been been in production for several years and several of the tables have grown quite large. One of them is the BOLD_CLOCKLOG which has something to do with Bold's transaction management.
I want to trim this table (it is up to 1.2GB, with entries from Jan 2006).
Can anyone confirm the system does not need this old information?
From the bolds documentation:
BOLD_CLOCKLOG
To be able to map the transaction numbers used in the TimeStamp columns to the corresponding physical time (such as 2001-01-01 12:34) the persistence mapper will store a log with timestamps and times. Normally, this log is written for each database operation, but if the traffic to the database is very intensive, it is possible to restrict how often this log is written by setting the property ClockLogGranularity. The event OnGetCurrentTime should also be implemented to ensure that all clients have the same time.The usage of this table can be controlled with the tagged value: Model.UseClockLog
So I believe this is used for versioning Boldobjects, see Object Versioning Extension in bolds documentation. If your application don't need this you can drop this in the database.
In our Bold application we don't use that feature. Why don't simply test to turn off Bold_ClockLog in the model, drop that big table and try to use your application. I'm pretty sure if something is wrong then it say so at once.
I can also mention that we have an own custom objecthistoy. It is simply big string (as TStringList.DelimetedText) in a ObjectHistory class that have Time, user and a note about action. This suits our need better that Bolds builtin objecthistory. The disadvantage is of course that we need to add calls in the code when logging to history is done.
Bold_ClockLog is an optional table, it's purpose is to store mapping between integer timestamps and corresponding DateTime values.
This allows you to find out datetime of the last modification to any object.
If you don't need this feature feel free to empty the table, it won't cause any problems.
In addition to Bold_ClockLog, the Bold_XFiles is another optional table that tends to grow large. But unlike the Bold_ClockLog the Bold_XFiles can not be emptied.
Both of these tables can be turned on/off in the model tag values.

Rails 3: Compare unique codes

What's the best way to guarantee that a code is unique? The code is XXX-XXXXX where X is a number only.
What way other than search for the code in a database table there is to make the process faster and cleaner?
Regards.
Normal approach is to use :uniqueness validation. That handles db searching.
More bulletproof is to use 1) + unique index on that field. If the saving fails without validation errors, you could generate a new code and try again.
Since no two times are the same, using some kind of hash based on time is the easiest way to guarantee uniqueness. If you are storing xxx-xxxx though, you are limiting yourself. You may also use a unique auto-incrementing value. Store the value server-side for the next number to be assigned and then increment it whenever you issue a new unique id.
both are acceptable options without knowing additional information
A hash based on time is actually not 'guarateed' to be unique. Using some type of hash is just a way to create a digest from a large source data. Since all data can then be described in 128bits (using md5) then its possible to encounter hash collisions.
The validates :uniquness will do a query to determine if the fields value has been used before. You can use this but it should not be your only solution. If the field is intended to be unique, you should place a unique index on the column in the database. If you only rely on the rails validation, you are running the risk of a race condition on data insertion into the table. I can bypass the validation, but another write could have also passed the validation and both end up getting into the table.
Are you generating the value or is it user input?

Can one rely on the auto-incrementing primary key in your database?

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?
The generated identification numbers will be unique.
Regardless of whether you use Sequences, like in PostgreSQL and Oracle or if you use another mechanism like auto-increment of MySQL.
However, Sequences are most often acquired in bulks of, for example 20 numbers.
So with PostgreSQL you can not determine which field was inserted first. There might even be gaps in the id's of inserted records.
Therefore you shouldn't use a generated id field for a task like that in order to not rely on database implementation details.
Generating a created or updated field during command execution is much better for sorting by creation-, or update-time later on.
For example:
INSERT INTO A (data, created) VALUES (smething, DATE())
UPDATE A SET data=something, updated=DATE()
That depends on your database vendor.
MySQL I believe absolutely orders auto increment keys. SQL Server I don't know for sure that it does or not but I believe that it does.
Where you'll run into problems is with databases that don't support this functionality, most notably Oracle that uses sequences, which are roughly but not absolutely ordered.
An alternative might be to go for created time and then ID.
I believe the answer to your question is yes...if I read between the lines, I think you are concerned that the system may re-use ID's numbers that are 'missing' in the sequence, and therefore if you had used 1,2,3,5,6,7 as ID numbers, in all the implementations I know of, the next ID number will always be 8 (or possibly higher), but I don't know of any DB that would try and figure out that record Id #4 is missing, so attempt to re-use that ID number.
Though I am most familiar with SQL Server, I don't know why any vendor who try and fill the gaps in a sequence - think of the overhead of keeping that list of unused ID's, as opposed to just always keeping track of the last I number used, and adding 1.
I'd say you could safely rely on the next ID assigned number always being higher than the last - not just unique.
Yes the id will be unique and no, you can not and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this
CREATE TABLE foo (
id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL;
created TIMESTAMP NOT NULL DEFAULT NOW();
updated TIMESTAMP;
PRIMARY KEY(id);
) engine=InnoDB; ## whatever :P
Now, that takes care of creation time. with update time I would suggest an AFTER UPDATE trigger like this one (of course you can do it in a separate query, but the trigger, in my opinion, is a better solution - more transparent):
DELIMITER $$
CREATE TRIGGER foo_a_upd AFTER UPDATE ON foo
FOR EACH ROW BEGIN
SET NEW.updated = NOW();
END;
$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly I've not specified, that this is for mysql, there might be some differences in the function names (namely, 'NOW') and other subtle itty-bitty.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an order by column. E.g. if you delete some early rows, then insert 'em, the new ones may end up living in the same place in the db the old ones did (albeit with new IDs), and that's what it may use as its default sort.
FWIW, I typically use order by ID as an effective version of order by created_at. It's cheaper in that it doesn't require adding an index to a datetime field (which is bigger and therefore slower than a simple integer primary key index), guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in some slightly different order.
This is probably DB engine depended. I would check how your DB implements sequences and if there are no documented problems then I would decide to rely on ID.
E.g. Postgresql sequence is OK unless you play with the sequence cache parameters.
There is a possibility that other programmer will manually create or copy records from different DB with wrong ID column. However I would simplify the problem. Do not bother with low probability cases where someone will manually destroy data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory yes the highest id number is the last created. Remember though that databases do have the ability to temporaily turn off the insert of the autogenerated value , insert some records manaully and then turn it back on. These inserts are no typically used on a production system but can happen occasionally when moving a large chunk of data from another system.

Resources