Working with large dictionaries in Swift - ios

I have a large set of data (approximately forty thousand value pairs), that ideally fits into [Int:Double] dictionary, but Xcode tries to reindex the file all the time and it just doesn't work.. the environment hangs, crashes..
What is the current best practice to deal with such issue in Swift for iOS?
class R {
let array:[Int:Double] = [
99502:0.1730,
// 40 thousand lines are here
99503:0.1730
]
}

Related

Call Directory Extension CallKit does't recognize numbers with more than 9 digits

I'm working with CallKit and developing an app with a Call Directory Extension. I've followed this tutorial and I'm currently test the capability of identify numbers that the user does't have in his contacts and show an ID from my app, but although is working perfectly with numbers of 1 to 9 digits, for example 123456, when I set numbers with 10 or more digits, iOs doesn't recognize the number. After a day and a half of google it, I've have found no information about that. If anyone can help me I'll appreciate it. Thanks in advance.
The method for set the phone numbers for recognize:
private func addAllIdentificationPhoneNumbers(to context: CXCallDirectoryExtensionContext) {
// Retrieve phone numbers to identify and their identification labels from data store. For optimal performance and memory usage when there are many phone numbers,
// consider only loading a subset of numbers at a given time and using autorelease pool(s) to release objects allocated during each batch of numbers which are loaded.
//
// Numbers must be provided in numerically ascending order.
let allPhoneNumbers: [CXCallDirectoryPhoneNumber] = [ 123456789, 1_888_555_5555 ]
let labels = [ "ID test", "Local business" ]
for (phoneNumber, label) in zip(allPhoneNumbers, labels) {
context.addIdentificationEntry(withNextSequentialPhoneNumber: phoneNumber, label: label)
}
}
With this code, when I simulate a call with the number 123456789, iOS shows the tag "ID test" and that's correct, but if I add any digit, for example 0 at the end: 1234567890, iOS does't show anything when I simulate a call. I don't know if I'm missing something.
Well, after a bunch of tests I could made it work. The point was that the phone must contain the full country code and the area code. So for example 00_52_55_4567_8932 877 or +52_55_4567_8932 both will work. But 55_4567_8932 and 4567_8932 will not work. I hope this can help someone else in the future. Thank you all!

How to merge multiple Arrays without slowing the compiler down?

Adding this line of code causes my compile time to go from 10 seconds to 3 minutes.
var resultsArray = hashTagParticipantCodes + prefixParticipantCodes + asterixParticipantCodes + attPrefixParticipantCodes + attURLParticipantCodes
Changing it to this brings the compile time back down to normal.
var resultsArray = hashTagParticipantCodes
resultsArray += prefixParticipantCodes
resultsArray += asterixParticipantCodes
resultsArray += attPrefixParticipantCodes
resultsArray += attURLParticipantCodes
Why does the first line cause my compile time to slow down so drastically and is there a more elegant way to merge these Arrays than the 5 line solution I've posted?
It's always +. Every time people complain about explosive compile times, I ask "do you have chained +?" And it's always yes. It's because + is so heavily overloaded. That said, I think this is dramatically better in Xcode 8, at least in my quick experiment.
You can dramatically speed this up without requiring a var by joining the arrays rather than adding them:
let resultsArray = [hashTagParticipantCodes,
prefixParticipantCodes,
asterixParticipantCodes,
attPrefixParticipantCodes,
attURLParticipantCodes]
.joinWithSeparator([]).map{$0}
The .map{$0} at the end is to force it back into an Array (if you need that, otherwise you can just use the lazy FlattenCollection). You can also do it this way:
let resultsArray = Array(
[hashTagParticipantCodes,
prefixParticipantCodes,
asterixParticipantCodes,
attPrefixParticipantCodes,
attURLParticipantCodes]
.joinWithSeparator([]))
But check Xcode 8; I believe this is at least partially fixed (but using .joined() is still much faster, even in Swift 3).

Why does Swift 2.2 not support fixed sized arrays, maybe swift 3.0 support?

I'm trying to create a game for iPhone (it's simple, just to sharpen my skills), and I've noticed that Apple has yet to show any support for fixed sized arrays (i had a huge issue with keeping track of my enemies I'm spawning and it would have helped a lot to have a fixed array). Does anyone know why they're not supported? Or if they will be supported in Swift 3.0?
You are not writing C, don't carry the C's baggage with you. Arrays in Swift falls into 2 types: constant array, which is fixed-length and fixed value, you can't modify anything; or variable-length and variable value. Each array knows its length (arr.count), so you don't need a separate variable to keep track of it.
As for for loop, the more I write Swift, the less I used this:
for var i = 0; i < arr.count; i++ { ... }
And the more I used these:
for item in arr { ... }
for (index, item) in arr.enumerate() { ... }
arr.forEach { ... }
arr.map { ... }

Xcode iOS: how to find position of one audio file (snippet) inside another?

I have audio files, with different durations. They have common content and unique content. E.g. two files, 70 seconds each, last 10 seconds of the first file is the same as first two seconds of the second file. How can I find the exact position of common content (e.g. 60.0 of the first file)?
Sounds a little bit messy, hope the following image can help https://drive.google.com/file/d/0BzBE2Kfw8uQoUWNTN1RXOEtLVEk/view?usp=sharing
So, I'm looking for the red mark - common content starts at 60.0 sec of the first file.
The problem is that I have files with different durations. Sometimes it's 70 seconds long, sometimes one file is 70 seconds, the other is 80 seconds long, etc. Most likely they have 60.0 seconds of unique content, but I'm not sure (it could be 59.9 of unique content, etc.).
Thus, I assume I need to get a short snippet of the second file from first 10 seconds and find it in the first file:
For example, output: 2.5 sec of the second file = 62.5 from the first file - works for me, as well.
THE MAIN GOAL IS TO PLAY FILE AFTER FILE GAPLESS. If I get the values, I'll be able to do this. Sometimes the values can be: 2.5 = 63.7, that's why I need the exact match.
Can anybody help with the code or at least some information of how to compare two snippets of audio content? Thanks in advance!
Wow, that is quite a problem to solve. And I must confess that i've not done anything exactly like this or have any code based suggestions.
All I will say is that if I were looking to try and solve this problem, then I would try and save the audio file as some kind of uncompressed and fixed size (as in a known number of bytes per second) format.
Then you could take a section of one file and byte match it with another, then you would know how many bytes inwards that snippet occurred. Then, knowing the bytes per ms (sort of frame size), you could work out the exact time position.
It's a bit hair brained, but i've used that technique with images before but at least audio is linear!
Here is an approximate example of how I would go about doing the comparison of a sample within a sound file.
- (int)positionOf:(NSData*)sample inData:(NSData*)soundfile {
// the block size has to be big enough to find something genuinely unique but small enough to ensure it is still fast.
int blockSize = 128;
int position = 0;
int returnPosition = INT32_MAX;
// check to see if the block size exceeds the sample or data file size
if (soundfile.length < blockSize || sample.length < blockSize) {
return returnPosition;
}
// create a byte array of the sample, ready to use to compare with the shifting buffer
char* sampleByteArray = malloc(sample.length);
memcpy(sampleByteArray, sample.bytes, sample.length);
// now loop through the sound file, shifting the window along.
while (position < (soundfile.length - blockSize)) {
char* window = malloc(blockSize);
memcpy(window, soundfile.bytes + position, blockSize);
// check to see if this is a match
if(!memcmp(sampleByteArray, window, blockSize)) {
// these are the same, now to check if the whole sample is the same
if ((position + sample.length) > soundfile.length) {
// the sample won't fit in the remaining soundfile, so it can't be this!
free(window);
break;
}
if(!memcmp(sampleByteArray, soundfile.bytes + position, sample.length)) {
// this is an entire match, position marks the start in bytes of the sample.
free(window);
returnPosition = position;
break;
}
}
free(window);
position++;
}
free(sampleByteArray);
return returnPosition;
}
It compiles, didn't have time to setup the scenario to check your exact case, but i'm quite confident this may help.

Erlang: What is most-wrong with this trie implementation?

Over the holidays, my family loves to play Boggle. Problem is, I'm terrible at Boggle. So I did what any good programmer would do: wrote a program to play for me.
At the core of the algorithm is a simple prefix trie, where each node is a dict of references to the next letters.
This is the trie:add implementation:
add([], Trie) ->
dict:store(stop, true, Trie);
add([Ch|Rest], Trie) ->
% setdefault(Key, Default, Dict) ->
% case dict:find(Key, Dict) of
% { ok, Val } -> { Dict, Val }
% error -> { dict:new(), Default }
% end.
{ NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie),
NewSubTrie = add(Rest, SubTrie),
dict:store(Ch, NewSubTrie, NewTrie).
And you can see the rest, along with an example of how it's used (at the bottom), here:
http://gist.github.com/263513
Now, this being my first serious program in Erlang, I know there are probably a bunch of things wrong with it… But my immediate concern is that it uses 800 megabytes of RAM.
So, what am I doing most-wrong? And how might I make it a bit less-wrong?
You could implement this functionality by simply storing the words in an ets table:
% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).
> ets:insert(words, [{"zebra"}]).
% check if word exists
> ets:lookup(words, "zed").
[{"zed"}]
% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]
If trie is a must, but you can live with a non-functional approach, then you can try digraphs, as Paul already suggested.
If you want to stay functional, you might save some bytes of memory by using structures using less memory, for example proplists, or records, such as -record(node, {a,b,....,x,y,z}).
I don't remember how much memory a dict takes, but let's estimate. You have 2.5e6 characters and 2e5 words. If your trie had no sharing at all, that would take 2.7e6 associations in the dicts (one for each character and each 'stop' symbol). A simple purely-functional dict representation would maybe 4 words per association -- it could be less, but I'm trying to get an upper bound. On a 64-bit machine, that'd take 8*4*2.7 million bytes, or 86 megabytes. That's only a tenth of your 800M, so something's surely wrong here.
Update: dict.erl represents dicts with a hashtable; this implies lots of overhead when you have a lot of very small dicts, as you do. I'd try changing your code to use the proplists module, which ought to match my calculations above.
An alternative way to solve the problem is going through the word list and see if the word can be constructed from the dice. That way you need very little RAM, and it might be more fun to code. (optimizing and concurrency)
Look into DAWGs. They're much more compact than tries.
I don't know about your algorithm, but if you're storing that much data, maybe you should look into using Erlang's built-in digraph library to represent your trie, instead of so many dicts.
http://www.erlang.org/doc/man/digraph.html
If all words are in English, and the case doesn't matter, all characters can be encoded by numbers from 1 to 26 (and in fact, in Erlang they are numbers from 97 to 122), reserving 0 for stop. So you can use the array module as well.

Resources