I've recently done some benchmarking, and it seems like looking up another object by primary key:
let foo = realm.object(ofType: Bar.self, forPrimaryKey: id)
is more efficient (and in this specific case more readable) than going through a relationship property directly, as in:
class Other: Object {
    @objc dynamic var relation: Bar? = nil
    let list = List<Bar>()
}
My benchmarking wasn't too thorough though (used only one element in the list, etc.) and I'm wondering if this is actually the case.
My intuition is that both the primary key lookup AND the relation property above would be O(1) or O(log n). With 1,000,000 records and 1,000,000 lookups:
primary key: ~10s
relation property: ~12s
list property: ~14s
In summary: what is the performance of Realm's object(ofType:forPrimaryKey:) lookup?
Extra credit: when is it beneficial to use LinkingObjects, Lists, etc.? I'm assuming they're just a readability/convenience wrapper of some sort. In my case they have been messier and more bug-prone, so I suspect I'm not using Realm the way it was intended.
Realm isn't a relational database like SQLite. Instead, data is stored in B+ trees. All the data for a given property on a given model type is stored within a single tree, and all data retrieval (whether getting a property value or a linked object) involves traversing such a tree.
Furthermore, when a Realm is opened, the contents of the entire database file are memory-mapped (via mmap). When you use one of the Realm SDKs, the objects you create (e.g. Object instances) are actually thin wrappers that store a conceptual pointer to a location in the database file and provide methods to directly read from and write to the object at that location. Likewise, relationships (such as object properties on a model) are references to nodes elsewhere in the tree.
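To make the "thin wrapper over a memory-mapped file" idea concrete, here is a minimal Python sketch. It is not Realm's implementation or API: `RecordRef` is a hypothetical class, and the file layout (a flat array of 64-bit integers) is an assumption chosen for brevity. The point is only that the wrapper holds an offset, and reads and writes go straight to the mapped bytes rather than to a copied object.

```python
import mmap
import os
import struct
import tempfile


class RecordRef:
    """Hypothetical thin wrapper (not Realm's API): it stores only an offset
    into a memory-mapped file and reads/writes the value in place."""

    FMT = "<q"                      # assumption: one 64-bit int per record
    SIZE = struct.calcsize(FMT)

    def __init__(self, buf, index):
        self._buf = buf             # the shared mapping, not a copy of the data
        self._offset = index * self.SIZE

    @property
    def value(self):
        return struct.unpack_from(self.FMT, self._buf, self._offset)[0]

    @value.setter
    def value(self, v):
        struct.pack_into(self.FMT, self._buf, self._offset, v)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w+b") as f:
            f.write(struct.pack("<3q", 10, 20, 30))
            f.flush()
            with mmap.mmap(f.fileno(), 0) as buf:
                rec = RecordRef(buf, 1)
                before = rec.value   # read straight from the mapping
                rec.value = 99       # written in place, no object copy
                buf.flush()
            f.seek(0)
            on_disk = struct.unpack("<3q", f.read())
        return before, on_disk
    finally:
        os.remove(path)
```

Calling `demo()` reads record 1 through the wrapper, overwrites it, and then re-reads the file to show the change landed in the underlying bytes.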
This means that retrieving an object requires the time it takes to traverse the database data structures to locate the required information, plus the time it takes to instantiate an object and initialize it. The latter is effectively a constant-time operation, so we want to look primarily at the former.
As for the situations you've outlined...
- If you already know your primary key value, getting an object takes O(log n) time, where n is the number of objects of that particular type in the database. (The time it takes to retrieve a Dog is irrespective of the number of Cats the database contains.)
- If you're naively implementing a relational-style foreign key pattern, where you model a link to an object of type U by storing a primary key value (like a string) on some object of type T, it will take O(log t) time to retrieve the primary key value (where t is the number of Ts), and O(log u) time to look up the destination object (as described in the previous bullet point; u = the number of Us).
- If you're using an object property on your model type T to model a link to another object, it takes O(log t) time to retrieve the location of the destination object.
- Using a list introduces another level of indirection, so retrieving the single object from a one-object list will be slower than retrieving an object directly from an object property.
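The difference between the foreign-key pattern and an object property can be modeled in a few lines. This Python sketch assumes a sorted key array searched with `bisect` as a stand-in for Realm's B+ trees, and plain list indexing for property reads (in Realm those reads also walk a tree, which is where the O(log t) above comes from). `ObjectTable` and the helper functions are hypothetical.

```python
import bisect


class ObjectTable:
    """Toy stand-in for one model type. Primary keys are kept sorted, so
    finding a row by key is a binary search: O(log n), where n counts only
    objects of this type."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.props = {}  # property name -> list of values, one per row

    def row_for_primary_key(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i if i < len(self.keys) and self.keys[i] == key else None


bars = ObjectTable(["b1", "b2", "b3"])   # the "U" type
others = ObjectTable(["o1", "o2"])       # the "T" type

# Foreign-key style: each Other stores a Bar *key*.
others.props["bar_id"] = ["b3", "b1"]

# Link style: each Other stores the Bar's *location* (a row index here).
others.props["bar_link"] = [bars.row_for_primary_key("b3"),
                            bars.row_for_primary_key("b1")]


def bar_row_via_foreign_key(other_row):
    key = others.props["bar_id"][other_row]   # read the property value
    return bars.row_for_primary_key(key)      # extra O(log u) search


def bar_row_via_link(other_row):
    return others.props["bar_link"][other_row]  # value already is a location
```

Both helpers find the same Bar, but the foreign-key version pays for a second search in the Bar table on every access.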
Object, list, and linking objects properties are not intended to be an alternative to looking up objects via primary keys. Rather, they are intended to model many-to-one, many-to-many, and inverse relationships, respectively. For example, a Cat may have a single Owner, so it makes sense for a Cat model to have an object property pointing to its Owner. A Person may have multiple friends, so it makes sense for a Person model to have a list property containing all their friends (which may contain zero, one, or many other Persons).
Finally, if you're interested in learning more, the entire database stack is open source (except for the sync component, which is a strictly optional peripheral component). You can find the code for the core database engine here. We also have an older article that discusses the high-level design of the database engine; you can find that here.
I want to know when to use the properties below. What do they do? Why should we use them?
Transient: According to Apple Docs:
Transient attributes are properties that you define as part of the
model, but which are not saved to the persistent store as part of an
entity instance’s data. Core Data does track changes you make to
transient properties, so they are recorded for undo operations. You
use transient properties for a variety of purposes, including keeping
calculated values and derived values.
I don't understand the part about it not being saved to the persistent store as part of an entity instance's data. Can anyone explain this?
indexed: It increases search speed, but at the cost of more space. So basically, if you run search queries on an attribute and want faster results, mark that property as 'indexed'. If searches are very rare, indexing decreases performance, as it takes more space.
I'm not sure whether this is correct.
Index in Spotlight
Store in External Record File
Consider, for instance, a navigation app. On your map you have your car at the center, updated a few dozen times a second, and an entity of type "GAS STATION". That entity's 'distance' property (its distance from your car) would be a transient property: it is a function of real-time data, so there's no point in storing it.
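A rough Python analogy for a transient attribute, assuming a hypothetical `to_store_record` save step (this is not Core Data's API): the attribute is declared on the model like any other, but the save step skips it, so it lives only in memory and is recomputed from live data.

```python
import math
from dataclasses import dataclass, field, fields


@dataclass
class GasStation:
    name: str
    x: float
    y: float
    # Declared on the model like any attribute, but flagged as transient.
    distance: float = field(default=0.0, metadata={"transient": True})

    def update_distance(self, car_x, car_y):
        # Derived from real-time data; recomputed whenever the car moves.
        self.distance = math.hypot(self.x - car_x, self.y - car_y)


def to_store_record(obj):
    """Hypothetical save step: copies every attribute except transient ones."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)
            if not f.metadata.get("transient")}
```

After `update_distance`, the in-memory object has a distance, but the record that would be written to the store does not.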
An indexed attribute is stored sorted, so it can be searched faster. An explanation can be found on Wikipedia. If your frequent searches take noticeable time, you should probably consider indexing.
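The trade-off described above can be sketched with a sorted secondary index. This is a simplified Python illustration (the `SortedIndex` class is hypothetical, and Core Data's store uses its own index structures): equality lookups become a binary search, at the cost of an extra sorted list that must also be updated on every insert.

```python
import bisect


class SortedIndex:
    """Toy attribute index: (value, row_id) pairs kept sorted, so an equality
    lookup is a binary search instead of a scan over every row."""

    def __init__(self):
        self._entries = []  # the extra space the index costs

    def add(self, value, row_id):
        # Every insert into the data must also update the index.
        bisect.insort(self._entries, (value, row_id))

    def find(self, value):
        # (value,) sorts before every (value, row_id), so this lands on
        # the first matching entry.
        i = bisect.bisect_left(self._entries, (value,))
        rows = []
        while i < len(self._entries) and self._entries[i][0] == value:
            rows.append(self._entries[i][1])
            i += 1
        return rows
```

For example, indexing the names `["alice", "bob", "carol", "bob"]` by row number lets `find("bob")` return rows 1 and 3 without looking at the other rows.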
Consider indexing in Spotlight anything the user is likely to want to search for when not within your application. Documentation is here.
Large binary objects, like images, should be stored externally.
In my code I want to take advantage of ETS's bag table type, which can store multiple values for a single key. However, it would be very useful to know whether an insertion actually inserts a new value (i.e. whether the inserted key with that value was already present in the bag).
With a set table I could use ets:insert_new/2, but its semantics are different for bag (emphasis mine):
This function works exactly like insert/2, with the exception that instead of overwriting objects with the same key (in the case of set or ordered_set) or adding more objects with keys already existing in the table (in the case of bag and duplicate_bag), it simply returns false.
Is there a way to achieve such functionality with one call? I understand it can be achieved by a lookup followed by an optional insert, but I'm afraid that might hurt performance under concurrent access.
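To pin down the semantic gap, here is a toy Python model of a bag table. It is not the ETS API: `BagModel` and `insert_if_absent` are hypothetical names. `insert_new` mirrors the documented bag behavior (refuse if the key exists at all), while `insert_if_absent` is the object-level check the question asks for, which in real ETS would be a lookup followed by an insert, i.e. two calls that are not atomic together.

```python
from collections import defaultdict


class BagModel:
    """Toy model of an ETS bag (NOT the ETS API): each key maps to a list
    of distinct objects; an identical object is stored only once."""

    def __init__(self):
        self._data = defaultdict(list)

    def insert(self, obj):
        key = obj[0]  # as in ETS with the default keypos, the first element
        if obj not in self._data[key]:
            self._data[key].append(obj)
        return True   # ets:insert/2 always returns true

    def insert_new(self, obj):
        # Documented bag semantics: refuse if the KEY already exists,
        # even when this exact object is new.
        if self._data.get(obj[0]):
            return False
        self._data[obj[0]].append(obj)
        return True

    def insert_if_absent(self, obj):
        # What the question wants: refuse only if this exact object exists.
        # Hypothetical; in ETS this is lookup + insert (two separate calls).
        if obj in self._data.get(obj[0], []):
            return False
        self._data[obj[0]].append(obj)
        return True

    def lookup(self, key):
        return list(self._data.get(key, []))
```

With `("k", 1)` already stored, `insert_new(("k", 2))` is refused because the key exists, whereas `insert_if_absent(("k", 2))` succeeds once and is refused the second time.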
I need to store an array of User objects inside a Tile node. Each User object contains three primitive properties: Id (a single alphabetic character string), fName and lName. This list of objects is a property of the Tile node, alongside several other primitive properties. The entire Tile node needs to be serialized to JSON, including the nested User objects.
I understand that Neo4j can't store complex objects as properties. I created the User as a separate node with id, fName and lName as properties, and I can get these returned via Cypher. I can also get JSON output for the parent Tile node. (In this case, Users is just a string of comma-separated alphabetic IDs.) But how do I get the User node output nested inside the parent node?
I have created a list of User objects (userList) by relating User objects to the string of user IDs in the Tile node via a Cypher query. I just need to get from two separate JSON outputs to a single nested output.
I hope this is enough detail. I'm using Neo4j 2.1.6 and Neo4jClient, on .NET 4.0.
You could do something like this in Cypher and have the query return a composite object.
MATCH (t:Tile)-[:CONTAINS_USER]-(u:User)
WHERE t.name =~ 'Tile.*'
WITH t, collect(u) AS users
RETURN collect({name: t.name, users: users}) AS tiles
You shouldn't store another object as a nested property. As you correctly state, Neo4j doesn't support that, but even if it did, you shouldn't do it: you should link the two with a relationship. That's the key strength of a graph database like Neo4j, so you should play to that strength and use relationships.
The server has a default JSON format that tends to output nodes as their own JSON objects. Practically speaking, since you're going to model this as two separate nodes with a relationship, you can't get the server to nest the JSON for one object underneath the other by default. It won't nest the JSON that way because that's not how the data is stored.
In this case, I'd use the REST API to fetch the JSON for each object individually and then do the nesting in your own code. Your code is the only place that knows which property the data should be nested under, and how that should be done.
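A minimal sketch of doing the nesting client-side, in Python rather than .NET for brevity. It assumes the Tile JSON has a `Users` field holding comma-separated IDs and each User JSON has an `Id` field, as described in the question; `nest_users` is a hypothetical helper, and fetching the two documents from the server is left out.

```python
import json


def nest_users(tile_json: str, users_json: str) -> str:
    """Replace the tile's comma-separated 'Users' ID string with the matching
    User objects. Field names are assumptions taken from the question."""
    tile = json.loads(tile_json)
    users_by_id = {u["Id"]: u for u in json.loads(users_json)}
    ids = [i for i in tile.get("Users", "").split(",") if i]
    # Keep the tile's ID order; silently skip IDs with no matching User.
    tile["Users"] = [users_by_id[i] for i in ids if i in users_by_id]
    return json.dumps(tile)
```

The result is a single JSON document with the User objects nested under the Tile, which is the shape the question asks for.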
In addition to these answers, note that if you don't need to involve the subfield properties in any of your queries (e.g. searching for Tiles where a User.name is "X"), you can simply serialise the object's fields to a string before insertion (e.g. with JSON.stringify) and deserialise them when reading from the DB.
This is especially useful when you want to "attach" structured data to a node but don't much care about that data with regard to the relationships in your DB (e.g. user preferences).
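For example, with Python's `json` module standing in for `JSON.stringify`/`JSON.parse` (the property names here are hypothetical, and the actual database write and read are elided):

```python
import json

# Structured data we want to "attach" to a node but never query on.
prefs = {"theme": "dark", "notifications": {"email": True, "sms": False}}

# Before insertion: flatten it to a plain string property.
node_props = {"name": "Ann", "preferences": json.dumps(prefs)}

# ...later, after reading the node back from the database:
restored = json.loads(node_props["preferences"])
```

The database only ever sees a string, so the nested fields can't be used in queries, which is exactly the trade-off described above.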
Having figured out most of my data-model for a new iOS app, I'm now stuck with a problem that I've been thinking about for a while.
An 'Experiment' has a name, description and owner. It also has one 'Action' and one 'Event'.
An 'Event' could be different things: Time, Location or Speed.
Depending on what the 'Event' is, it can have a different 'Type'. For example, Time could be one-off, interval, date-range, repeating or random. Location could be area or exact location.
Each 'Type' then has a value that has a data type unique to itself. The Time One-Off could be a date value of 12:15pm and the Location Exact could be a GeoPoint value of (30.0, -20.0).
The Problem
How do I design the data model so that the database is not riddled with NULL values?
How do I design the data model to be extensible if I add more 'Events' and 'Types'?
Thoughts
As an Experiment only has one Action and one Event, it would be wrong to separate these two into different tables; however, not doing so would leave the Experiment table full of NULL values, as I'd need columns for Event, Event Type and Event Type Value to cover all of the possible data types one could enter for an Event Type Value (date, int, string, geopoint, etc.).
Separating the Event and Event Type into a separate table would probably fix the NULL value issue, but I'd be left with repeated data, especially in the case of Time as the Event with Type One-Off as 12:00pm, as this would exist in other experiments, not just one. (Unless I create EVERY possibility and populate a separate table with these; how could I easily do that, though?)
Maybe I'm overcomplicating things, or maybe I'm missing something so simple that I'm going to kick myself when I see it.
You need to think about your data model in terms of objects, not tables. Core Data works with object graphs, so everything in Core Data is an object. In Objective-C you work with objects. This is why you don't need an ORM tool. If you think in terms of objects, then I think the model below (which obviously needs work, but you should get the point) makes sense.
The advantage of separating your concepts into objects like this is that you can look at your problem from multiple angles. In other words, you can look at it from the Experiment angle or from the Event angle. I suspect you will want to do something with the data, such as use your Time object in your code to show it on a calendar or set a reminder, or fetch all the events for all experiments of a specific type. By encapsulating these data items in Core Data objects, everything is ready for you to leverage, manipulate and modify in your code. It also removes the NULL value issue you identified, because you won't be creating objects for null values, only for values that are relevant to your experiment. That being said, you might want to break the model down even further depending on the specifics of your program.
Also, you would not have the repeating data issue you mention if you design this properly. Again, you're not dealing with rows in a table; you are dealing with objects. If you create an Event Type object with "one-off 12:00pm", you can assign that Event Type object, through its relationship, to as many Event(s) as you wish. You don't create the object again; you simply reference it. When you think of the relationships, think "X can be associated with Y". For example, "An Experiment can be associated with only 1 Event", "An Event Type can be associated with many Events", "An Event can be associated with only 1 Event Type".
Taking this approach sets you up for extensibility down the road. Imagine you want to add a new Event Type. You simply create a new event entity and associate it with your Event Type entity.
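The object-graph idea can be sketched with plain Python dataclasses standing in for Core Data entities (an analogy, not Core Data code; the class and attribute names are illustrative). The key point is that two Events reference the same Event Type instance instead of duplicating it, and no object carries slots for values it doesn't use.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List


@dataclass(eq=False)
class EventType:
    kind: str                 # e.g. "one-off", "interval", "exact"
    value: object             # a time, a (lat, lon) pair, a speed...
    # Inverse to-many relationship; repr=False avoids a recursive repr.
    events: List["Event"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Event:
    category: str             # "Time", "Location" or "Speed"
    event_type: EventType     # to-one: each Event has exactly one Event Type

    def __post_init__(self):
        # Keep the inverse side in sync, as Core Data does for modeled inverses.
        self.event_type.events.append(self)


@dataclass(eq=False)
class Experiment:
    name: str
    description: str
    owner: str
    event: Event              # to-one: each Experiment has one Event


noon = EventType("one-off", time(12, 0))   # created once...
e1 = Event("Time", noon)                   # ...and referenced twice
e2 = Event("Time", noon)
exp_a = Experiment("A", "first", "me", e1)
exp_b = Experiment("B", "second", "me", e2)
```

Adding a new kind of event here means adding a new value for `kind`/`value` rather than new nullable columns on `Experiment`.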
My suggestion is to think about your object model relative to how you anticipate using the objects in your code (and how you anticipate accessing the objects via queries). That should help drive how you construct it (e.g. if you need a time object, then make sure you have that in your object model; if you need an alert object, then make sure you have that in your object model). Let the model do the work for you, and try not to write a lot of code to assemble the equivalent of an object model within Objective-C, or start creating objects in code and populating them with data from your data store.
(EDIT: Replace the "event" relationship in the diagram under time, location & speed with "event types")
Assume we have a very big NSDictionary. When we call the objectForKey: method, will it perform lots of operations internally to get the value, or will it point to the value in memory directly?
How does it work internally?
The CFDictionary section of the Collections Programming Topics for Core Foundation (which you should look into if you want to know more) states:
A dictionary—an object of the CFDictionary type—is a hashing-based
collection whose keys for accessing its values are arbitrary,
program-defined pieces of data (or pointers to data). Although the key
is usually a string (or, in Core Foundation, a CFString object), it
can be anything that can fit into the size of a pointer—an integer, a
reference to a Core Foundation object, even a pointer to a data
structure (unlikely as that might be).
This is what Wikipedia has to say about hash tables:
Ideally, the hash function should map each possible key to a unique
slot index, but this ideal is rarely achievable in practice (unless
the hash keys are fixed; i.e. new entries are never added to the table
after it is created). Instead, most hash table designs assume that
hash collisions—different keys that map to the same hash value—will
occur and must be accommodated in some way. In a well-dimensioned hash
table, the average cost (number of instructions) for each lookup is
independent of the number of elements stored in the table. Many hash
table designs also allow arbitrary insertions and deletions of
key-value pairs, at constant average (indeed, amortized) cost per
operation.
The performance therefore depends on the quality of the hash. If it is good, then accessing an element should be an O(1) operation on average (i.e. independent of the number of elements).
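To show why a good hash gives lookups that don't depend on the number of elements, here is a minimal separate-chaining hash table in Python. It illustrates the general technique only; it is not how CFDictionary is implemented, and its worst case (all keys colliding in one bucket) is O(n), unlike the O(log N) bound in Apple's documentation.

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table (illustration of the technique,
    not CFDictionary's implementation)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def __len__(self):
        return self._size

    def _bucket(self, key):
        # The hash picks one bucket directly; only that short chain is
        # searched, however many elements the table holds.
        return self._buckets[hash(key) % len(self._buckets)]

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update in place
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            # Keep the average chain length bounded (amortized O(1) insert).
            self._resize(2 * len(self._buckets))

    def _resize(self, new_capacity):
        items = [item for b in self._buckets for item in b]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in items:
            self._bucket(k).append((k, v))
```

Passing a larger `capacity` up front avoids the `_resize` rehashes during bulk insertion, which is the motivation for pre-sizing a mutable dictionary.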
EDIT:
In fact, after reading further in the Collections Programming Topics for Core Foundation, I found that Apple gives an answer to your question:
The access time for a value in a CFDictionary object is guaranteed to
be at worst O(log N) for any implementation, but is often O(1)
(constant time). Insertion or deletion operations are typically in
constant time as well, but are O(N*log N) in the worst cases. It is
faster to access values through a key than accessing them directly.
Dictionaries tend to use significantly more memory than an array with
the same number of values.
NSDictionary is essentially a hash table, so average-case lookup is O(1). However, to avoid reallocations (rehashing) as the collection grows, you can create an NSMutableDictionary with dictionaryWithCapacity:, passing a capacity appropriate to the size of your dataset.