What kind of sort does Cocoa use? (iOS)

I'm always amazed by the abstractions our modern languages or frameworks create, even the ones considered relatively low level such as Objective-C/Cocoa.
Here I'm interested in the type of sort executed when one calls sortedArrayUsingComparator: on an NSArray. Is it dynamic, analyzing the current constraints of the environment (particularly free memory) and the attributes of the array (length, unique values) and picking the best sort accordingly, or does it always use the same one, like Quicksort or Merge Sort?
It should be possible to test that by analyzing the running time of the method relative to N; I'm just wondering if anyone has already bothered to.

This has been described at a developers conference. The sort doesn't need any extra memory. It checks whether there is a sorted range of numbers at the start or the end or both and takes advantage of that. You can ask yourself how you would sort a 100,000-entry array if the first 50,000 entries are sorted in descending order.
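For reference, the method under discussion takes a comparator block and returns a new sorted array; here is a minimal usage sketch (the sample data is made up):

NSArray *numbers = @[@5, @1, @4, @2, @3];
// sortedArrayUsingComparator: returns a new, sorted array; the receiver is left untouched.
NSArray *sorted = [numbers sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
    return [(NSNumber *)a compare:(NSNumber *)b];
}];
// sorted is now @[@1, @2, @3, @4, @5]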

Linked lists in practice

I have some questions about the ideas proposed in this video.
The speaker shows an array that holds values and pointers, and he also shows a separate "free" linked list, that is updated whenever an item is added/removed.
Why are these used? Doesn't using an array / limiting yourself to a set of free nodes defeat the purpose of a linked list?
Isn't one of the perks of using a linked list the ability to traverse fragmented data?
Why use these free nodes, when you can dynamically allocate storage?
The proposed structure, to me, doesn't seem dynamic at all, and is in fact a convoluted and inefficient array.
The approach you mention makes sense in certain use cases. For example, if the common case is that the array is 90% full and most of the time is spent iterating over it, you can very quickly loop over an array and just skip the few empty items. This can be much, much faster than the "pointer chasing" that plain linked lists require, because the CPU's hardware prefetcher can predict which memory you will need in advance.
And compared with a plain array and no free list, it has the advantage of O(1) allocation of an element into an empty slot.
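A rough sketch of the idea in C++ (all names here are invented for illustration, not taken from the video): live items stay in a contiguous array, and the empty slots are chained into an intrusive free list, so adding an item is O(1) while iteration remains a linear scan over contiguous memory.

#include <cstddef>
#include <cstdint>
#include <vector>

// One slot of the pool: either a live payload or a link in the free list.
struct Slot {
    int     value;      // payload when the slot is live
    int32_t next_free;  // index of the next free slot, -1 if none
    bool    live;
};

class Pool {
    std::vector<Slot> slots_;
    int32_t free_head_ = -1;
public:
    explicit Pool(std::size_t n) : slots_(n) {
        // Chain every slot onto the free list initially.
        for (std::size_t i = 0; i < n; ++i) {
            slots_[i].live = false;
            slots_[i].next_free = (i + 1 < n) ? int32_t(i + 1) : -1;
        }
        free_head_ = n ? 0 : -1;
    }

    // O(1): pop a slot off the free list instead of calling the allocator.
    int32_t add(int v) {
        if (free_head_ < 0) return -1;          // pool is full
        int32_t idx = free_head_;
        free_head_ = slots_[idx].next_free;
        slots_[idx].value = v;
        slots_[idx].live = true;
        return idx;
    }

    // O(1): push the slot back onto the free list.
    void remove(int32_t idx) {
        slots_[idx].live = false;
        slots_[idx].next_free = free_head_;
        free_head_ = idx;
    }

    // Iteration scans contiguous memory (prefetcher-friendly) and skips dead slots.
    template <typename F>
    void for_each(F f) const {
        for (const Slot& s : slots_)
            if (s.live) f(s.value);
    }
};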

Storing text in Neo4j

We're using Neo4J and liking it. We do all sorts of graphy things in it. However, some of what we do is not graphy. For example, we keep a log of all changes to a certain type of node:
(n)-[:CHANGE]->(c1)-[:CHANGE]->(c2) etc etc
This list of changes can get to be 20 or 30 c1 nodes long. While it looks weird, I don't have a real problem with it. (Of course, I am smarter now, and since each :CHANGE relationship has a date in it, I could porcupine all the c1 nodes right from n. But whatever.)
But what if I wanted to store large amounts of text, or images. Is there a problem with storing large amounts of data in a node? I could use a different database for these things, but this just increases the skill set required to run the business. And of course, joining data in two disparate databases is always a PITA.
So do I need to worry about storing large amounts of text in a single property? Do I need to avoid creating logs the way I did above?
Creating linked lists of events, whatever they may be, is fine. I'd even say it is graphy! We use this approach a lot, for example in ChangeFeed, for something similar to what you're doing.
In Neo4j, such a linked list is actually better than a "porcupine", because you have to traverse all the relationships anyway, if you're looking for all changes. But in the linked list case, you don't have to look at properties to order them. In fact, the best approach is a hybrid approach, like the TimeTree.
As for storing large amounts of text or images, Neo4j is not the best place for it. Another database would be best, especially if the volume of these is large ( > hundreds of thousands).
But if you do want to store them in Neo4j, one thing to keep in mind is that Neo4j will load all the properties of a node/relationship once a single property is accessed. So in order to achieve good performance even with text/images in Neo4j, I would store them in a node of their own. That way, they are only loaded when you really need them, not during regular traversals.
For example:
CREATE (b:BlogPost {title:'Neo4j Rocks', author:'Tony Ennis', date:".."})-[:HAS_BODY]->(:BlogPostBody {content:'..'})
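With that split, listing posts never touches the potentially large body property, and you only follow :HAS_BODY when a single post is actually rendered. For example (hypothetical queries against the labels used above):

// List posts without loading the body nodes at all
MATCH (b:BlogPost) RETURN b.title, b.author, b.date
// Load the body only when one post is displayed
MATCH (b:BlogPost {title:'Neo4j Rocks'})-[:HAS_BODY]->(body:BlogPostBody) RETURN b, body.content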

Mnesia: time and space efficiency of read, match_object, select, and qlc queries

Mnesia has four methods of reading from the database: read, match_object, select, and qlc, besides their dirty counterparts of course. Each of them is more expressive than the previous one.
Which of them use indices?
Given a query in one of these methods, will the same query in a more expressive method be less efficient in time or memory usage? By how much?
Update: As I GIVE CRAP ANSWERS mentioned, read is just a key-value lookup, but after a while of exploration I also found the functions index_read and index_write, which work in the same manner but use indices instead of the primary key.
One at a time, though from memory:
read always uses a key lookup on the keypos. It is basically a key-value lookup.
match_object and select will optimize the query if they can on the keypos key. That is, they only use that key for optimization and never utilize further index types.
qlc has a query-compiler and will attempt to use additional indexes if possible, but it all depends on the query planner and if it triggers. erl -man qlc has the details and you can ask it to output its plan.
Mnesia tables are basically key-value maps from terms to terms. Usually, this means that if the key part is something the query can latch onto and use, then it is used. Otherwise, you will be looking at a full-table scan. This may be expensive, but do note that the scan is in-memory and thus usually fairly fast.
Also, take note of the table type: set is a hash-table and can't utilize a partial key match. ordered_set is a tree and can do a partial match:
Example: if we have a key {Id, Timestamp}, querying on {Id, '_'} as the key is reasonably fast on an ordered_set because the lexicographic ordering means we can utilize the tree for a fast walk. This is the equivalent of specifying a composite INDEX/PRIMARY KEY in a traditional RDBMS.
If you can arrange data such that you can do simple queries without additional indexes, then that representation is preferred. Also note that additional indexes are implemented as bags, so if you have many matches for an index, it is very inefficient. In other words, you should probably not index on a position in the tuples where there are few distinct values. It is better to index on things with many (mostly) distinct values, like an e-mail address in a user column, for instance.
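To make the distinctions concrete, here is a minimal Erlang sketch (the module, record, table, and values are all made up; the table is assumed to be an ordered_set created with {index, [user]}):

%% Sketch of the lookup styles discussed above.
-module(mnesia_lookups).
-export([example/0]).
-include_lib("stdlib/include/qlc.hrl").
-record(event, {key, user, payload}).    %% primary key = {Id, Timestamp}

example() ->
    mnesia:transaction(fun() ->
        %% read/2: plain key-value lookup on the primary key (keypos).
        R1 = mnesia:read(event, {42, 1000}),
        %% index_read/3: lookup via a secondary index on #event.user.
        R2 = mnesia:index_read(event, <<"alice">>, #event.user),
        %% match_object/1: partial key match. On an ordered_set table this
        %% can walk the tree; on a set table it degrades to a full scan.
        R3 = mnesia:match_object(#event{key = {42, '_'}, _ = '_'}),
        %% qlc: list-comprehension queries compiled by the query planner.
        R4 = qlc:e(qlc:q([E || E <- mnesia:table(event),
                               E#event.user =:= <<"alice">>])),
        {R1, R2, R3, R4}
    end).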

Tinkerpop Blueprints Vertex Query

I've been researching the Tinkerpop stack for quite a while. I think I have a good idea of what it can do and what databases it works well with. I've got a couple of different databases in mind right now, but haven't made a definite decision. So I've decided to write my code purely against the interfaces, and not take any implementation into account for now. All of the databases I'm looking at implement TransactionalGraph and KeyIndexableGraph. I think that's good enough for what I need, but I have just one question.
I have different 'classes' of vertices. Using Blueprints, I believe that's best represented by having a field in each vertex containing the class name. Doing that, I can do something like graph.getVertices("classname", "User") and it would give me all of the user vertices. And since the getVertices function specifies that an implementation should make use of indexes, I'm guaranteed to get a fast lookup (if I index that field).
But let's say that I wanted to retrieve a vertex based on two properties. The vertex must have className=Users and username=admin. What's the best way to go about finding that single vertex? And is it possible to index over both of those properties, even though not all vertices will have a username field?
FYI - The databases I'm currently thinking of are OrientDB, Neo4j and Titan, but I haven't decided for sure yet. I'm also currently planning to use Gremlin if that helps at all.
Using a "class" or a "type" for vertices is a good way to segment them. Doing:
graph.createKeyIndex("classname",Vertex.class);
graph.getVertices("classname", "User");
is a pretty common pattern and should generally yield a fast lookup, though iterating an index of tens of millions of users might not be so great (if you intend to grow a particular classname to a very big size). I think that leads to the second part of your question, regarding the two-property lookup.
Taking your example on the surface, the two element lookup would be something like (using Gremlin):
g.V('classname',"User").has('username','admin')
So, you narrow the vertices to just "User" vertices with a key index and then filter those for "admin". But, I'd model this differently. It would be even less expensive to simply do:
graph.createKeyIndex("username",Vertex.class);
graph.getVertices("username", "admin");
or in Gremlin:
g.V('username','admin')
If you know the username you want, there's no better/faster way to model this. You really only need the classname if you want to iterate over all "User" vertices. If you just want to find one (or a set of vertices with that username) then key indexing on that property is the better way.
Even if I don't create a key index on it, I still include a type or classname property on all vertices. I find it helpful in global operations where I may or may not care about speed, but just need an answer.
graph.getVertices() will iterate through all vertices and look for ones with that property if you do not have auto-indexing turned on in your graph implementation. If you already have data and cannot just turn on the auto-indexer, you should use index = indexableGraph.getIndex() and then index.get('classname', 'User').
It's possible to perform a query over multiple properties, but without specifics, it's hard to say. Neo4j uses Lucene, which means that query() will take a Lucene query, such as className:Users AND username:admin, but I cannot speak for the others.
Any of those DBs is good for playing with. I personally found Neo4j to be the easiest, and as long as you understand their licensing structure, you shouldn't have any problems using it.

Delphi array elements alphanumeric sort order?

Is "alphanumeric" the best way to sort an array in Delphi?
I found this comment in some old code of my application:
" The elements of this array must be in ascending, alphanumeric
sort order."
If so, what could be the reason?
-Vas
There's no "best" way as to how to sort the elements of an array (or any collection for that fact). Sort is a humanized characteristic (things are not usually sorted) so I'm guessing the comment has more to do with what your program is expecting.
More concretely, there's probably other section of code elsewhere that expect the array elements to be sorted alphanumerically. It can be something so simple as displaying it into a TreeView already ordered so that the calling code doesn't have to sort the array first.
Arrays are represented as a contiguous memory assignment so that access is fast. Internally the compiler just does a call to GetMem asking for SizeOf(Type) * array size. There's nothing in the way the elements are sorted that affects the performance or memory size of the arrays in general. It MUST be in the program logic.
Most often an array is sorted to provide faster search times. Given a list of length L, I can compare with the midpoint (L DIV 2) and quickly determine if I need to look at the greater half, or the lesser half, and recursively continue using this pattern until I either have nothing to divide by or have found my match. This is what is called a Binary search. If the list is NOT sorted, then this type of operation is not available and instead I must inspect every item in the list until I reach the end.
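In Delphi that search might look something like this (a sketch only, assuming the array of strings is already in ascending order; CompareStr comes from SysUtils):

// Returns the index of Target in Items, or -1 if it is not present.
function BinarySearch(const Items: array of string; const Target: string): Integer;
var
  Lo, Hi, Mid, Cmp: Integer;
begin
  Result := -1;
  Lo := Low(Items);
  Hi := High(Items);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    Cmp := CompareStr(Target, Items[Mid]);
    if Cmp = 0 then
    begin
      Result := Mid;      // found it
      Exit;
    end
    else if Cmp < 0 then
      Hi := Mid - 1       // keep looking in the lower half
    else
      Lo := Mid + 1;      // keep looking in the upper half
  end;
end;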
No, there is no "best way" of sorting. And that's one of the reasons why you have multiple sorting techniques out there.
With QuickSort, you even provide the comparison function where you determine what order you ultimately want.
Sorting an array in some way is useful when you're trying to do a binary search on the array. A binary search can be extremely fast compared to other methods. But if the sort order is wrong, the search will be unable to find the record.
Other reasons to keep arrays sorted are almost always cosmetic: to decide how the array is sent to some output.
The best way to re-order an array depends on the length of the array and the type of data it contains. A QuickSort algorithm would give a fast result in most cases. Delphi uses it internally when you're working with string lists and some other lists. The question is, do you really need to sort it? Does it really need to stay an array at all?
But the best way to keep an array sorted is by keeping it sorted from the first element that you add to it! In general, I write a wrapper around my array types which takes care of keeping the array ordered. The 'Add' method searches for the biggest value in the array that's less than or equal to the value I want to add, and then inserts the new item right after that position. To me, that would be the best solution. (With big arrays you could use the binary search method again to find the location where you need to insert the new record. It's slower than appending records to the end, but you never have to wonder whether it's sorted or not, since it is.)
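A rough sketch of such an Add (the names are invented; a real wrapper would use binary search instead of the linear scan to find Pos for large arrays, and CompareStr again comes from SysUtils):

// Inserts Value into Items while keeping Items in ascending order.
procedure AddSorted(var Items: TArray<string>; const Value: string);
var
  I, Pos: Integer;
begin
  Pos := Length(Items);                    // default: append at the end
  for I := 0 to High(Items) do
    if CompareStr(Value, Items[I]) < 0 then
    begin
      Pos := I;                            // first element bigger than Value
      Break;
    end;
  SetLength(Items, Length(Items) + 1);
  for I := High(Items) downto Pos + 1 do   // shift the tail one slot up
    Items[I] := Items[I - 1];
  Items[Pos] := Value;
end;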
