Trying to replace this block-based NSSortDescriptor with a Core Data-friendly one - ios

I've got an entity type which, as one of its properties, has a single character. I want to retrieve all such entities which match a predicate, and I want them to be sorted first by that character, and then by an index number (which is another of its properties).
This is simple if I just use the built-in sort descriptors. However, the single character can be anything from a letter to a number to punctuation to an emoji, and when I use the built-in sort I get punctuation first, then numbers, and so on. What I want is A-Z first, then numbers, then punctuation, then finally emoji or anything else that is neither alphanumeric nor punctuation (I don't really care about the order of those last ones).
This is easy enough to implement as a block-based NSSortDescriptor, but I can't figure out how to do it in a way that I can send it off to Core Data as part of a fetch request (i.e., no blocks allowed). I'd be fine with breaking it into a couple different requests, if that's the only way to do it, and then joining the resulting arrays afterward; but I'd prefer to do it in one fetch if possible.
Thanks!

When you create the objects in the first place, run your sort logic and save a resulting 'characterType' into another property. Now, on your fetch request, use 3 sort descriptors, with this character type identifier first, then the character and then the other index.
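As a rough illustration of the classification step, here is a minimal Python sketch (not Core Data code). The bucket numbers, the character_type function, and the (character, index) tuples are all hypothetical names chosen for the example. It assumes Unicode general categories are an acceptable way to tell letters, numbers, and punctuation apart; note that symbols such as "$" or "+" fall into the last bucket under this assumption, and that non-Latin letters count as letters.

```python
import unicodedata

# Hypothetical bucket numbers; a lower value sorts first.
LETTER, DIGIT, PUNCTUATION, OTHER = 0, 1, 2, 3

def character_type(ch: str) -> int:
    """Return the sort bucket for a single character."""
    category = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Po", "So"
    if category.startswith("L"):
        return LETTER
    if category.startswith("N"):
        return DIGIT
    if category.startswith("P"):
        return PUNCTUATION
    return OTHER

def sort_key(record):
    """Mirror the three sort descriptors: type, then character, then index."""
    ch, index = record
    return (character_type(ch), ch, index)

# Each record is (character, index), standing in for a stored entity.
records = [("!", 2), ("b", 1), ("7", 0), ("a", 5), ("a", 3), ("😀", 0)]
ordered = sorted(records, key=sort_key)
```

In the real model the value returned by character_type would be computed once, when the object is created, and saved in the extra property so that the fetch request can sort on it directly.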

Related

Optimize finding matches in array of strings using array of regex

I have two arrays: one populated with values pulled from the database, and another that is a user-uploaded array of regex patterns to match:
db_array = ["ABCDEFG", "HIJKLMN", "OPQRSTU", "VWXYZ", etc...]
matching_array = ["(OPQRST)(3)?(.{1,})?", "(WXY)(1)?(.{1,})?", "(HIJKLMN)(3)?(.{1,})?", etc...]
Is there a better way to find any/all the matches in the db_array using the matching_array rather than iterating through the matching array and then the db_array and pulling any matches?
matching_array.map{|regex| db_array.select{|a| /#{regex}/.match(a)}}
The issue is that both of these arrays can hold more than 3000 records, so this takes a substantial amount of time, especially since the matching_array is built up multiple times using different pattern criteria. I'm also trying to limit the number of db calls I make, since I don't want to constantly be hitting the server.
If you can be sure that the regular expressions don't use PCRE extensions (lookbehind, etc.), then you can use a much faster regex library.
Google maintains one called re2 (https://github.com/google/re2).
Ruby bindings are available (https://github.com/stefanor/ruby-re2, among others).
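Whichever engine is used, it usually helps to compile each pattern once instead of rebuilding it for every database value. The following is only a sketch of that idea in Python, with the standard re module standing in for RE2 (it is not the RE2 binding itself), using the sample values from the question:

```python
import re

db_array = ["ABCDEFG", "HIJKLMN", "OPQRSTU", "VWXYZ"]
matching_array = [
    "(OPQRST)(3)?(.{1,})?",
    "(WXY)(1)?(.{1,})?",
    "(HIJKLMN)(3)?(.{1,})?",
]

# Compile every pattern exactly once, outside the inner loop.
compiled = [re.compile(pattern) for pattern in matching_array]

# For each pattern, collect the database values it matches anywhere
# in the string (an unanchored search).
matches = [[value for value in db_array if rx.search(value)]
           for rx in compiled]
```

The work is still proportional to patterns times values; the saving comes only from not recompiling, so a faster engine remains the larger win for big inputs.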

database design for dictionary of words

(my reason for asking this question is based on having read this answer, which made me rethink my current setup)
I am currently developing a Ruby on Rails application in which there are many languages, each of which has a dictionary of base words attached to it, as well as a list of the words that map to each base word. The way I currently have it set up, there is a base_words table that contains the base_word as a string, along with the language_id as a foreign key. There is also a words table, each row of which contains a word string, along with the base_word_id as a foreign key. Each words row also has an indexed language_id, although I'm almost positive that this is superfluous due to the language_id on base_word, so I'm planning to take it off (although this could be a bad assumption on my part).
In sum, contrary to the answer I mentioned at the beginning, the tables are not separated by language, because I've reasoned that I can simply pull out a language's words programmatically when the time comes. However, my application will also have translation(s) associated with each base word (as did the answer I referenced), and so I'm doubting my structure: each translation will itself be a base_word in the same table, which means a translation would really just be the id of another base word in that table. This may be completely fine, or it may not be; I have no clue (this is my first ever programming project).
Is this ok? Do I need to separate my base_words into separate tables for each language, or can I leave it all in one table?
Another example: I also need to store many phrases for each language, along with their translations. Should I have one table where each row has the appropriate translation of the phrase, or one table where each row contains simply one phrase and a language_id, or multiple tables (one for each language)?
Regards,
Michael
As in the other scenario, you'll have a translations table. There is no technical reason it couldn't have multiple foreign keys to base_words (a source_word_id and target_word_id, perhaps). So yes, you can absolutely store all your words in one table. There are some minor side effects involved with translations being directional relationships: it becomes possible to have translations which only work one way, and there will be many pairs of entries with opposite source and target. Neither of these is much of a worry: the first is even potentially desirable in order to represent words with double meanings in one language but not the other, and as for the second, space is cheap and indexing is easy.
You are correct that you do not need words.language_id, so long as you always join base_words when you're querying words and the language matters. This obviously changes if you have a use case where it makes sense to leave base_words out, but that scenario sounds unlikely based on what you describe.
As for phrases: why should they be handled any differently than base_words?
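To make the single-table layout concrete, here is a small sketch using Python's built-in sqlite3 module rather than Rails migrations. The base_words columns and the source_word_id / target_word_id names come from the discussion above; the sample words, the composite primary key, and the translations_of helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_words (
    id          INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL,
    base_word   TEXT    NOT NULL
);
-- Each row is a directional link between two rows of base_words.
CREATE TABLE translations (
    source_word_id INTEGER NOT NULL REFERENCES base_words(id),
    target_word_id INTEGER NOT NULL REFERENCES base_words(id),
    PRIMARY KEY (source_word_id, target_word_id)
);
""")

conn.executemany(
    "INSERT INTO base_words VALUES (?, ?, ?)",
    [(1, 1, "dog"), (2, 2, "perro"), (3, 2, "can")],
)
# dog <-> perro is stored in both directions; dog -> can only one way.
conn.executemany(
    "INSERT INTO translations VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3)],
)

def translations_of(word_id):
    """Return the translations recorded for one base word (hypothetical helper)."""
    rows = conn.execute(
        "SELECT w.base_word FROM translations t "
        "JOIN base_words w ON w.id = t.target_word_id "
        "WHERE t.source_word_id = ? ORDER BY w.base_word",
        (word_id,),
    )
    return [row[0] for row in rows]
```

Here translations_of(1) returns both Spanish words, while translations_of(3) returns nothing, which is the one-way side effect described above.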

CoreData sort by expression or select a column as an expression

I've got a Core Data store with a SQLite backend, holding an entity with two NSDate attributes: eventStart and eventEnd. I would like to perform quite a complicated select and sort on it.
For one of the main displays in the application I'd like to retrieve two things:
10 events whose duration (eventEnd - eventStart) was smaller than a specified value
10 events whose duration was larger than the specified value
The events have to be sorted correctly based on how far from the specified value they are
I've hit two problems straight away. The first is that I can't find a way to select an expression (the date calculation) as a column. The second is that NSSortDescriptor only seems to work on columns, not expressions. This is contrary to how SQLite works, and I'm wondering if it would be easier to just break out the raw SQL.
I should mention that the data I'm going to be working with will be too large to fit into memory for things like sorting, especially because sorting on an expression means the query would have to return all the data.
If you fetch the objects first, you should then be able to sort the result in memory using any key you want, including a method that returns the interval you mentioned. So, you'd create an NSSortDescriptor whose key is the name of the method that returns this time interval, put it in an array, then simply call
[originalFetchedArray sortedArrayUsingDescriptors:sortDescriptors];
which will return a new sorted array. If you're starting with an NSMutableArray, you can sort that in place using a similar method.
When I had a very similar problem, I just created a new column for the calculated value (in your case it will be duration). After that, the retrieval becomes trivial.
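With a stored duration, "nearest to the specified value" on each side is just an ordinary sort on that one column: descending for the shorter events, ascending for the longer ones, each with a limit. The sketch below shows the idea with Python's sqlite3 module as a stand-in for the Core Data store. The table layout, sample durations, and the nearest function are assumptions for illustration; in Core Data the equivalent would be a predicate on duration, a single sort descriptor, and a fetch limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, event_start REAL, event_end REAL, duration REAL)"
)
# duration is computed once, when the row is written.
sample = [(i, 0.0, float(d), float(d))
          for i, d in enumerate([5, 50, 20, 35, 29, 31, 100], start=1)]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", sample)
conn.execute("CREATE INDEX idx_events_duration ON events(duration)")

def nearest(target, limit=10):
    """Return (shorter, longer) durations, each ordered nearest-first."""
    shorter = [row[0] for row in conn.execute(
        "SELECT duration FROM events WHERE duration < ? "
        "ORDER BY duration DESC LIMIT ?", (target, limit))]
    longer = [row[0] for row in conn.execute(
        "SELECT duration FROM events WHERE duration > ? "
        "ORDER BY duration ASC LIMIT ?", (target, limit))]
    return shorter, longer

shorter, longer = nearest(30, limit=2)
```

Because only the limited rows are returned, nothing has to be sorted in memory.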

Sorted TStringList, how does the sorting work?

I'm simply curious as lately I have been seeing the use of Hashmaps in Java and wonder if Delphi's Sorted String list is similar at all.
Does the TStringList object generate a Hash to use as an index for each item? And how does the search string get checked against the list of strings via the Find function?
I make use of Sorted TStringLists very often and I would just like to understand what is going on a little bit more.
Please assume I don't know how a hash map works, because I don't :)
Thanks
I'm interpreting this question, quite generally, as a request for an overview of lists and dictionaries.
A list, as almost everyone knows, is a container that is indexed by contiguous integers.
A hash map, dictionary or associative array is a container whose index can be of any type. Very commonly, a dictionary is indexed with strings.
For sake of argument let us call our lists L and our dictionaries D.
Lists have true random access. An item can be looked up in constant time if you know its index. This is not the case for dictionaries, which usually resort to hash-based algorithms to achieve efficient random access.
A sorted list can perform binary search when you attempt to find a value. Finding a value, V, is the act of obtaining the index, I, such that L[I]=V. Binary search is very efficient. If the list is not sorted then it must perform linear search which is much less efficient. A sorted list can use insertion sort to maintain the order of the list – when a new item is added, it is inserted at the correct location.
You can think of a dictionary as a list of <Key,Value> pairs. You can iterate over all pairs, but more commonly you use index notation to look up a value for a given key: D[Key]. Note that this is not the same operation as finding a value in a list; it is the analogue of reading L[I] when you know the index I.
In older versions of Delphi it was common to coax dictionary behaviour out of string lists. The performance was terrible. There was little flexibility in the contents.
With modern Delphi, there is TDictionary, a generic class that can hold anything. The implementation uses a hash and although I have not personally tested its performance I understand it to be respectable.
There are commonly used algorithms that optimally use all of these containers: unsorted lists, sorted lists, dictionaries. You just need to use the right one for the problem at hand.
TStringList holds the strings in an array.
If you call Sort on an otherwise unsorted (Sorted property = false) string list then a QuickSort is performed to sort the items.
The same happens if you set Sorted to true.
If you call Find (or IndexOf, which calls Find) on an unsorted string list, a linear search is performed, comparing all strings from the start until a match is found. Note that a list whose Sorted property is false is considered unsorted even if you explicitly called Sort on it.
If you call Find on a sorted string list (Sorted property = true) then a binary search is performed (see http://en.wikipedia.org/wiki/Binary_search for details).
If you add a string to a sorted string list, a binary search is performed to determine the correct insertion position, all following elements in the array are shifted by one and the new string is inserted.
Because of this, insertion performance gets a lot worse the larger the string list is. If you want to insert a large number of entries into a sorted string list, it's usually better to turn sorting off, insert the strings, then set Sorted back to true, which performs a quick sort.
The disadvantage of that approach is that you will not be able to prevent the insertion of duplicates.
EDIT: If you want a hash map, use TDictionary from unit Generics.Collections
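The trade-off between the two loading strategies can be sketched in Python (this is not Delphi's TStringList code). The add_sorted and bulk_load names are hypothetical, and Python's built-in sort is not a quicksort, so only the overall shape carries over.

```python
import bisect

def add_sorted(items, value):
    """Insert value at its sorted position; skip it if already present."""
    pos = bisect.bisect_left(items, value)      # binary search for the slot
    if pos < len(items) and items[pos] == value:
        return False                            # duplicate detected for free
    items.insert(pos, value)                    # shifts all later elements
    return True

def bulk_load(values):
    """Append everything unsorted, then sort once at the end."""
    items = list(values)
    items.sort()
    return items

words = ["pear", "apple", "fig", "apple"]

incremental = []
for word in words:
    add_sorted(incremental, word)

bulk = bulk_load(words)
```

The incremental list ends up without the second "apple", while the bulk-loaded list keeps it, which is the duplicate problem mentioned above.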
You could look at the source code, since that comes with Delphi. Ctrl-Click on the "sort" call in your code.
It's a simple alphabetical sort in non-Unicode Delphi, and a slightly more complex Unicode one in later versions. You can supply your own comparison for custom sort orders. Unfortunately I don't have a recent version of Delphi so can't confirm, but I expect that under the hood there's a proper Unicode-aware and locale-aware string comparison routine. Unicode sorting/string comparison is not trivial and a little web searching will point out some of the pitfalls.
Supplying your own comparison routine is often done when you have delimited text in the strings or objects attached to them (the Objects property). In those cases you often want to sort by a property of the object or something other than the first field in the string. Or it might be as simple as wanting a numerical sort on the strings (so "2" comes after "1" rather than after "19").
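A quick Python sketch of both cases, using a key function where Delphi would take a comparison routine. The sample strings and the ";" delimiter are made up for the example.

```python
values = ["19", "2", "1", "10"]

# Plain string comparison: "2" lands after "19".
alphabetical = sorted(values)

# Numerical comparison: each string is compared by its integer value.
numerical = sorted(values, key=int)

# Delimited text: sort by the second field instead of the first.
rows = ["3;pear", "1;fig", "2;apple"]
by_second_field = sorted(rows, key=lambda row: row.split(";")[1])
```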
There is also a THashedStringList, which could be an option (especially in older Delphi versions).
BTW, the Unicode sort routines for TStringList are quite slow. If the strings only contain Ansi characters (which they will if you use English exclusively), you can override the TStringList.CompareStrings method and use customised Ansi string comparisons. I use my own customised TStringList class that does this, and for a sorted list it is 4 times faster than the standard TStringList class for both reading and writing strings from/to the list.
Delphi's dictionary type (in generics-enabled versions of Delphi) is the closest thing to a hash map that ships with Delphi. THashedStringList makes lookups faster than they would be in a sorted string list. You can do lookups using a binary search in a sorted string list, so it's faster than a brute-force search, but not as fast as a hash.
The general theory of a hash is that it is unordered, but very fast on lookup and insertion. A sorted list is reasonably fast on insertion until the size of the list gets large, although it's not as efficient as a dictionary for insertion.
The big benefit of a sorted list is that it is ordered, whereas a hash-lookup dictionary is not.

Delphi array elements alphanumeric sort order?

Is "alphanumeric" the best way to sort an array in Delphi?
I found this comment in some old code in my application:
"The elements of this array must be in ascending, alphanumeric sort order."
If so, what could be the reason?
-Vas
There's no "best" way to sort the elements of an array (or any collection, for that matter). Sort order is a human notion (things are not usually sorted), so I'm guessing the comment has more to do with what your program is expecting.
More concretely, there's probably another section of code elsewhere that expects the array elements to be sorted alphanumerically. It can be something as simple as displaying them in a TreeView already ordered, so that the calling code doesn't have to sort the array first.
Arrays are represented as a contiguous block of memory so that access is fast. Internally the compiler just does a call to GetMem asking for SizeOf(Type) * array size. Nothing about the order of the elements affects the performance or memory size of arrays in general, so the requirement MUST come from the program logic.
Most often an array is sorted to provide faster search times. Given a list of length L, I can compare with the midpoint (L DIV 2) and quickly determine if I need to look at the greater half, or the lesser half, and recursively continue using this pattern until I either have nothing to divide by or have found my match. This is what is called a Binary search. If the list is NOT sorted, then this type of operation is not available and instead I must inspect every item in the list until I reach the end.
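The halving procedure described above can be written out as a short loop. This is a generic Python sketch rather than Delphi code; the binary_search name and the sample data are illustrative.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2          # midpoint of the remaining range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1                # continue in the greater half
        else:
            high = mid - 1               # continue in the lesser half
    return -1                            # nothing left to divide

names = ["alice", "bob", "carol", "dave"]
found = binary_search(names, "carol")
missing = binary_search(names, "zed")
```

Each pass discards half of the remaining range, so the number of comparisons grows with the logarithm of the list length rather than with the length itself.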
No, there is no "best way" of sorting. And that's one of the reasons why you have multiple sorting techniques out there.
With QuickSort, you even provide the comparison function where you determine what order you ultimately want.
Sorting an array in some way is useful when you're trying to do a binary search on the array. A binary search can be extremely fast compared to other methods. But if the sort order is wrong, the search will be unable to find the record.
Other reasons to keep arrays sorted are almost always for cosmetic reasons, to decide how the array is sent to some output.
The best way to re-order an array depends on the length of the array and the type of data it contains. A QuickSort algorithm would give a fast result in most cases. Delphi uses it internally when you're working with string lists and some other lists. The question is, do you really need to sort it? Does it even need to stay an array?
But the best way to keep an array sorted is by keeping it sorted from the first element that you add to it! In general, I write a wrapper around my array types, which will take care of keeping the array ordered. The 'Add' method will search for the biggest value in the array that's less than or equal to the value that I want to add. I then insert the new item right after that position. To me, that would be the best solution. (With big arrays you could use the binary search method again to find the location where you need to insert the new record. It's slower than appending records to the end, but you never have to wonder if it's sorted or not, since it is.)
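A minimal sketch of such a wrapper, written in Python rather than Delphi. The SortedArray class and its method names are hypothetical; bisect_right from the standard library finds the position just after the largest value that is less than or equal to the new one, which matches the 'Add' rule described above.

```python
import bisect

class SortedArray:
    """Hypothetical wrapper that keeps its items in ascending order."""

    def __init__(self):
        self._items = []

    def add(self, value):
        # Position right after the last element <= value (binary search).
        pos = bisect.bisect_right(self._items, value)
        self._items.insert(pos, value)

    def index_of(self, value):
        # Binary search is safe because the order is always maintained.
        pos = bisect.bisect_left(self._items, value)
        if pos < len(self._items) and self._items[pos] == value:
            return pos
        return -1

    def to_list(self):
        return list(self._items)

arr = SortedArray()
for value in [42, 7, 19, 7, 100]:
    arr.add(value)
```

Callers never sort explicitly; every read sees the items in order, at the cost of a shifting insert on each add.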

Resources