Erlang ets:insert_new for bag - erlang

In my code I want to take advantage of ETS's bag type, which can store multiple values for a single key. However, it would be very useful to know whether an insertion actually adds a new value (i.e. whether the key/value pair being inserted was already present in the bag).
With an ETS table of type set I could use ets:insert_new, but its semantics are different for bag (emphasis mine):
This function works exactly like insert/2, with the exception that instead of overwriting objects with the same key (in the case of set or ordered_set) or adding more objects with keys already existing in the table (in the case of bag and duplicate_bag), it simply returns false.
Is there a way to achieve this with a single call? I understand it can be done with a lookup followed by an optional insert, but I am afraid that might hurt the performance of concurrent access.

Related

Performance of a primary key lookup in Realm?

I've recently done some benchmarking, and it seems like looking up another object by primary key:
let foo = realm.object(ofType: Bar.self, forPrimaryKey: id)
is more efficient (and in this specific case more readable) than accessing the relation property directly, declared as:
class Other: Object {
    @objc dynamic var relation: Bar? = nil
    let list = List<Bar>()
}
My benchmarking wasn't too thorough though (used only one element in the list, etc.) and I'm wondering if this is actually the case.
Intuition makes me think primary key lookup AND using the relation property above would be O(1) or O(log n). With 1,000,000 records and 1,000,000 lookups:
primary key: ~10s
relation property: ~12s
list property: ~14s
In summary: what is the performance of Realm's object(ofType:forPrimaryKey:) lookup?
Extra credit: when is it beneficial to use LinkingObjects, Lists, etc.? Assuming it's just a readability / convenience wrapper of some sort. In my case it has been more messy / bug prone, so I'm assuming I'm not using Realm in the way it was intended.
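For reference, a micro-benchmark along the lines described above might be structured roughly like this (a sketch only; the Bar model, its integer primary key, and the pre-populated Realm are assumptions for illustration):

import Foundation
import RealmSwift

class Bar: Object {
    @objc dynamic var id = 0
    override static func primaryKey() -> String? { return "id" }
}

let realm = try! Realm()   // assumed to already contain 1,000,000 Bar objects

// Time 1,000,000 primary key lookups.
let start = CFAbsoluteTimeGetCurrent()
for id in 0..<1_000_000 {
    _ = realm.object(ofType: Bar.self, forPrimaryKey: id)
}
print("primary key lookups took \(CFAbsoluteTimeGetCurrent() - start)s")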
Realm isn't a relational database like SQLite. Instead, data is stored in B+ trees. All the data for a given property on a given model type is stored within a single tree, and all data retrieval (whether getting a property value or a linked object) involves traversing such a tree.
Furthermore, when a Realm is opened, the contents of the entire database file are mmaped into memory. When you use one of the Realm SDKs, the objects you create (e.g. Object instances) are actually thin wrappers that store a conceptual pointer to a location in the database file and provide methods to directly read from and write to the object at that location. Likewise, relationships (such as object properties on a model) are references to nodes elsewhere in the tree.
This means that retrieving an object requires the time it takes to traverse the database data structures to locate the required information, plus the time it takes to instantiate an object and initialize it. The latter is effectively a constant-time operation, so we want to look primarily at the former.
As for the situations you've outlined...
If you already know your primary key value, getting an object takes O(log n) time, where n is the number of objects of that particular type in the database. (The time it takes to retrieve a Dog does not depend on the number of Cats the database contains.)
If you're naively implementing a relational-style foreign key pattern, where you model a link to an object of type U by storing a primary key value (like a string) on some object of type T, it will take O(log t) time to retrieve the primary key value (where t is the number of Ts), and O(log u) time to look up the destination object (as described in the previous bullet point; u = the number of Us).
If you're using an object property on your model type T to model a link to another object, it takes O(log t) time to retrieve the location of the destination object.
Using a list introduces another level of indirection, so retrieving the single object from a one-object list will be slower than retrieving an object directly from an object property.
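To make those cases concrete, here is a minimal sketch (the Owner/Cat models and their property names are illustrative assumptions, not taken from the question):

import RealmSwift

class Owner: Object {
    @objc dynamic var id = ""
    override static func primaryKey() -> String? { return "id" }
}

class Cat: Object {
    // Relational-style "foreign key": the owner's primary key stored as a string.
    @objc dynamic var ownerId = ""
    // Object property: a direct link to the owner.
    @objc dynamic var owner: Owner? = nil
}

let realm = try! Realm()
let cat = realm.objects(Cat.self).first!   // assumes at least one Cat exists

// 1. Primary key lookup: O(log n) in the number of Owners.
let direct = realm.object(ofType: Owner.self, forPrimaryKey: "owner-1")

// 2. Foreign-key pattern: O(log t) to read ownerId, then O(log u) for the lookup.
let viaKey = realm.object(ofType: Owner.self, forPrimaryKey: cat.ownerId)

// 3. Object property: O(log t) to resolve the link directly.
let viaLink = cat.owner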
Object, list, and linking objects properties are not intended to be an alternative to looking up objects via primary keys. Rather, they are intended to model many-to-one, many-to-many, and inverse relationships, respectively. For example, a Cat may have a single Owner, so it makes sense for a Cat model to have an object property pointing to its Owner. A Person may have multiple friends, so it makes sense for a Person model to have a list property containing all their friends (which may contain zero, one, or many other Persons).
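As a sketch of that modeling guidance (the Person model and its property names are assumed for illustration):

import RealmSwift

class Person: Object {
    @objc dynamic var name = ""
    // Many-to-many: a person may have zero, one, or many friends.
    let friends = List<Person>()
    // Inverse relationship: everyone who lists this person among their friends.
    let friendOf = LinkingObjects(fromType: Person.self, property: "friends")
}

A many-to-one link, such as the Cat's single Owner, would be an object property (as in the earlier sketch), while the list and linking objects properties cover the many-to-many and inverse cases.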
Finally, if you're interested in learning more, the entire database stack is open source (except for the sync component, which is a strictly optional peripheral component). You can find the code for the core database engine here. We also have an older article that discusses the high-level design of the database engine; you can find that here.

Sorting Realm records that are inserted quickly

Sometimes my app will add many Realm records at once. I need to be able to consistently keep them in the same order.
The documentation recommends that I use NSDate:
Another common motivation for auto-incrementing properties is to preserve order of insertion. In some situations, this can be accomplished by appending objects to a List or by using a createdAt property with a default value of NSDate().
However, since records are sometimes added so quickly, the dates are not always unique, especially considering that Realm stores NSDate only to second precision.
Is there something I'm missing about the suggestion in the documentation? Maybe the documentation wasn't considering records added in quick succession? If so, would it be recommended to keep an Int position property and to always query for the last record at the moment when adding a new record, so as to ensure sequential positions? However, querying for the last record in such a case won't return the previous record unless you've also added and finalized a write, which is wasteful if you need to add a lot of records. Then it would require batch create logic, which is unfortunate.
However, since records are sometimes added so quickly, the dates are not always unique, especially considering that Realm stores NSDate only to second precision.
The limitation on date precision was addressed back in Realm v0.101. Realm can now represent dates with greater precision than NSDate.
However, querying for the last record in such a case won't return the previous record unless you've also added and finalized a write, which is wasteful if you need to add a lot of records.
It's not necessary to commit a write transaction for queries on the same thread to see data that you've added during the write transaction.
Is there something I'm missing about the suggestion in the documentation?
You skipped over the first suggestion: appending objects to a List. Lists in Realm are inherently ordered, so you do not need to find a way to create unique, ordered values. Simply append the new object to the list, and rely on the list's order to determine the order in which the objects were added. This also has the advantage of being safe when using Realm Mobile Platform's synchronization features, as incrementing fields can generate duplicates on different devices and timestamps may not be reliable.
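A minimal sketch of that approach, assuming a container object that owns the ordered records (the RecordLog/Record names are made up for illustration):

import RealmSwift

class Record: Object {
    @objc dynamic var payload = ""
}

class RecordLog: Object {
    // A List preserves the order in which objects are appended.
    let records = List<Record>()
}

let realm = try! Realm()

// Fetch or create the container once.
let log: RecordLog
if let existing = realm.objects(RecordLog.self).first {
    log = existing
} else {
    log = RecordLog()
    try! realm.write { realm.add(log) }
}

// Append many records in quick succession; reading log.records later returns
// them in insertion order, with no need for unique timestamps or positions.
try! realm.write {
    for i in 0..<1000 {
        let record = Record()
        record.payload = "record \(i)"
        log.records.append(record)   // also adds the new Record to the Realm
    }
}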

Implementing a unique surrogate key in Advantage Database Server

I've recently taken over support of a system which uses Advantage Database Server as its back end. For some background, I have years of database experience but have never used ADS until now, so my question is purely about how to implement a standard pattern in this specific DBMS.
There's a previously developed stored procedure which manages an ID column in this manner:
@ID = (SELECT ISNULL(MAX(ID), 0) FROM Example_Table);
@ID = @ID + 1;
INSERT INTO Example_Table (ID, OtherStuff)
VALUES (@ID, 'Things');
-- Do some other stuff.
UPDATE Example_Table
SET AnotherColumn = 'FOO'
WHERE ID = @ID;
My problem is that I now need to run this stored procedure multiple times in parallel. As you can imagine, when I do this, the same ID value is getting grabbed multiple times.
What I need is a way to consistently create a unique value which I can be sure will be unique even if I run the stored procedure multiple times at the same moment. In SQL Server I could create an IDENTITY column called ID, and then do the following:
INSERT INTO Example_Table (OtherStuff)
VALUES ('Things');
SET @ID = SCOPE_IDENTITY();
ADS has autoinc, which seems similar, but I can't find anything conclusively telling me how to retrieve the newly created value in a way that I can be 100% sure will be correct under concurrent usage. The ADS Developer's Guide actually warns me against using autoinc, and the online help files offer functions which seem to retrieve the last generated autoinc ID (which isn't what I want - I want the one created by the previous statement, not the last one created across all sessions). The help files also list these functions with a caveat that they might not work correctly in situations involving concurrency.
How can I implement this in ADS? Should I use autoinc, some other built-in method that I'm unaware of, or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place? If I should use autoinc, how can I obtain the value that has just been inserted into the table?
You use LastAutoInc(STATEMENT) with autoinc.
From the documentation (under Advantage SQL->Supported SQL Grammar->Supported Scalar Functions->Miscellaneous):
LASTAUTOINC(CONNECTION|STATEMENT)
Returns the last used autoinc value from an insert or append. Specifying CONNECTION will return the last used value for the entire connection. Specifying STATEMENT returns the last used value for only the current SQL statement. If no autoinc value has been updated yet, a NULL value is returned.
Note: Triggers that operate on tables with autoinc fields may affect the last autoinc value.
Note: SQL script triggers run on their own SQL statement. Therefore, calling LASTAUTOINC(STATEMENT) inside a SQL script trigger would return the lastautoinc value used by the trigger's SQL statement, not the original SQL statement which caused the trigger to fire. To obtain the last original SQL statement's lastautoinc value, use LASTAUTOINC(CONNECTION) instead.
Example: SELECT LASTAUTOINC(STATEMENT) FROM System.Iota
Another option is to use GUIDs.
(I wasn't sure but you may have already been alluding to this when you say "or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place." - apologies if so, but still this info might be useful for others :) )
The use of GUIDs as a surrogate key allows either the application or the database to create a unique identifier, with a guarantee of no clashes.
Advantage 12 has built-in support for a GUID datatype:
GUID and 64-bit Integer Field Types
Advantage server and clients now support GUID and Long Integer (64-bit) data types in all table formats. The 64-bit integer type can be used to store integer values between -9,223,372,036,854,775,807 and 9,223,372,036,854,775,807 with no loss of precision. The GUID (Global Unique Identifier) field type is a 16-byte data structure. A new scalar function NewID() is available in the expression engine and SQL engine to generate new GUID. See ADT Field Types and Specifications and DBF Field Types and Specifications for more information.
http://scn.sap.com/docs/DOC-68484
For earlier versions, you could store the GUIDs as a char(36). (Think about your performance requirements here of course.) You will then need to do some conversion back and forth in your application layer between GUIDs and strings. If you're using some intermediary data access layer, e.g. NHibernate or Entity Framework, you should be able to at least localise the conversions to one place.
If some part of your logic is in a stored procedure, you should be able to use the newid() or newidstring() function, depending on the type of the backing column:
INSERT INTO Example_Table (ID, OtherStuff)
VALUES (newid(), 'Things');

Core Data: best way of checking the uniqueness of an attribute

As far as I know, there is no way of setting an entity's attribute as unique through Core Data, neither programmatically nor in Xcode's editor... I need to make sure that certain managed objects can't be created if there are collisions with the values of the attributes I want to be unique, and I've been reading some posts dealing with that.
I've found a couple of approaches (e.g. Core Data unique attributes):
To use -validateValue:forKey:error:
To write some kind of custom method to check if the attribute's value already exists
Which of these would be the recommended option?
Thanks
You're going to need some kind of custom code, whether you put it in validateValue:forKey:error: or in a custom method or somewhere else.
Whether to use the built-in validation method is really a matter of how you prefer to organize your code. I'd prefer to do something like
Check to see if the value is unique.
If so, then insert a new instance.
That's partly because I find the built-in validation scheme to be a pain, but mostly it's because that code will run every time you save changes to an object. If your check is in validateValue:forKey:error:, you'll run it repeatedly, even after you've verified that the value is unique. Then again maybe you need to do that, so the best answer in your case depends on a bigger picture of how your app needs to work.
The simple way to approach validation is by doing a fetch with a predicate identifying the key and value that you need to check. The one change I'd make to the common fetching approach as described in the other answer is that I'd use countForFetchRequest:error: instead of executeFetchRequest:error:. It doesn't sound like you actually need to fetch existing objects during validation, you just need to know whether any exist, so just check that.
Depending on the type of the unique attribute, you may be able to reduce the performance hit that you're going to take by doing this. For example, if it's a string, checking all existing strings for a match is relatively expensive, while checking a bunch of existing integers is cheap. In that case you might find it worthwhile to add a numeric property to your entity type that stores a numeric hash of the unique string value, and use the hash only when checking uniqueness. It'll be a hell of a lot faster than looking for matching strings, and NSString even has a handy hash method to calculate the value for you.
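A minimal sketch of that check, assuming an entity named "Item" with a "name" attribute that must stay unique (both names are illustrative):

import CoreData

func isNameUnique(_ name: String, in context: NSManagedObjectContext) throws -> Bool {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Item")
    request.predicate = NSPredicate(format: "name == %@", name)
    // count(for:) - countForFetchRequest:error: in Objective-C - reports whether
    // any match exists without materializing the objects themselves.
    return try context.count(for: request) == 0
}

// Usage sketch: only insert when the value is confirmed unique.
// if try isNameUnique("example", in: context) {
//     let item = NSEntityDescription.insertNewObject(forEntityName: "Item", into: context)
//     item.setValue("example", forKey: "name")
// }

If you store a numeric hash of the string (as suggested above), the predicate would compare that hash property instead of the string itself.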

Amazon SimpleDB Identity Seed equivalent

Is there an equivalent to an identity Seed in SimpleDB?
If the answer is no, how do you handle creating something like a customer number or order number that will prevent the creation of duplicate numbers?
My experience is mainly from SQL Server in which I would either create a primary key with an identity seed or use transactions in a stored procedure to increment the number.
Thanks for your help!
You can create unique keys using conditional writes. Just do a PutAttributes with the next customer number you want to use and the data you want to store. You can't add a condition for the actual item name, but you can use an attribute that always exists, (like creation date or user group).
Set the conditions:
Expected.1.Name=creation_date
Expected.1.Exists=false
The call will succeed only if there is no creation_date in an item with that item name. If you always write the creation_date, then you get the effect of optimistic locking on the new item name. Of course you can use any attribute you want, so long as you always include it in that first conditional put.
In most situations the performance of a conditional write is the same as that of a normal write, but when SimpleDB is under heavy load or experiencing high internal network latencies, these calls will take longer compared to normal writes. During rare failure scenarios inside SimpleDB, conditional writes will fail completely for a period of time.
If you can't tolerate this, you will have to code some sort of alternate way to get your unique keys during outages. A different SimpleDB region could be used for key generation only, since SimpleDB will still accept the normal writes (non-conditional PutAttributes) during outages.
If you don't already have something unique that will work, using a GUID for the Item is probably the typical solution.
