Loading a large file into an array of String in iOS

My app needs to load large files (3 to 7 MB of text) into arrays of String, i.e., one string per line.
The simple way to go is to use componentsSeparatedByString or componentsSeparatedByCharactersInSet (or their Swift counterparts) as follows:
array = [fileContents componentsSeparatedByCharactersInSet: [NSCharacterSet newlineCharacterSet]];
Everything works as it should, except that for large enough files and old enough devices, times are unacceptably long. Just to be clear, it takes 2+ seconds to create the array on an iPad 4th gen, and 8+ seconds on an iPhone 4.
Since I can't change the hardware of the customers and I can't change componentsSeparatedBy..., is there a way I could speed this up?
Perhaps I could store on disk, instead of the text file, something closer to an array of String (which might load faster)? If so, how?
(Swift solutions are welcome, too)
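One approach that may help on older devices, offered here only as a minimal Swift sketch, is to skip the NSString splitting entirely and cut the file on raw newline bytes, decoding each line separately. It assumes the file is UTF-8 with "\n" line endings; the function name and the capacity hint are placeholders, and the speedup should be verified by profiling.
import Foundation

// Sketch: split the raw UTF-8 bytes on '\n' instead of going through
// componentsSeparatedByCharactersInSet, then decode each line on its own.
func loadLines(from url: URL) throws -> [String] {
    let data = try Data(contentsOf: url)
    var lines: [String] = []
    lines.reserveCapacity(100_000)                         // rough guess, tune for your files
    var start = data.startIndex
    for index in data.indices where data[index] == 0x0A {  // '\n'
        lines.append(String(decoding: data[start..<index], as: UTF8.self))
        start = data.index(after: index)
    }
    if start < data.endIndex {                             // last line without a trailing newline
        lines.append(String(decoding: data[start...], as: UTF8.self))
    }
    return lines
}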

Related

iOS - Voice Over - Accessibility for large amounts

I am not able to make iOS VoiceOver / Accessibility read large amounts in money format. For example, £782284.00 should be read as seven hundred eighty-two thousand, two hundred and eighty-four, but iOS VoiceOver reads this as seven eight two two eight four.
The best way to achieve this is to format your numbers so that they are vocalized by VoiceOver as desired.
NumberFormatter with the .spellOut style, used to build the accessibilityLabel, is the main tool for adapting the VoiceOver vocalization of large amounts.
I strongly encourage you to vocalize numbers as they should be read: the application content MUST be adapted to the VoiceOver users, not the opposite.
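As a rough illustration of that approach, here is a minimal Swift sketch; the amount, the label and the "pounds" suffix are assumptions for the example, not part of the original question.
import UIKit

// Sketch: keep the visible text in the usual decimal format, but build the
// accessibilityLabel with a spell-out formatter so VoiceOver reads words.
let spellOutFormatter = NumberFormatter()
spellOutFormatter.numberStyle = .spellOut

let amount = 782_284
let amountLabel = UILabel()
amountLabel.text = "£\(amount).00"
if let spoken = spellOutFormatter.string(from: NSNumber(value: amount)) {
    // VoiceOver now reads something like
    // "seven hundred eighty-two thousand two hundred eighty-four pounds"
    amountLabel.accessibilityLabel = "\(spoken) pounds"
}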
It is really important to make sure you do all you can to make the app easier to use for VoiceOver users. I have been running an app for sighted and visually impaired players; you can see an example of this method running in the inventory section of the app:
https://apps.apple.com/us/app/swordy-quest-an-rpg-adventure/id1446641513
The number of requests I got from blind and visually impaired players to read out millions as millions, rather than as individual digits, was huge. Please do take the extra effort to make your app fully VoiceOver compatible; it makes life so much easier for VoiceOver users. Here is a method I created solely for this purpose, which VoiceOver seems to like. It basically adds thousands separators:
// e.g. 1000 -> "1,000"
public static func addCommaSeperatorsToInt(number: Int) -> String {
    let numberFormatter = NumberFormatter()
    numberFormatter.numberStyle = NumberFormatter.Style.decimal
    return numberFormatter.string(from: NSNumber(value: number))!
}
I agree with what #aadrian is suggesting: try not to break the conventions VoiceOver users are used to. Some large numbers take a long time to read out, which leaves users with slow navigation across numbers.
However, if you really do need it, here is a number-to-words converter (I could not find one for Swift/Objective-C, but you will get the idea); set its output as the accessibilityLabel of the UIView (or whatever element you use) and it will then be read as you like.
Also see this

Compression of 2-D array on the fly with iOS

I am currently using Swift to store some data on iOS. The values come as a 2-D integer array, defined as an [[Int]]. I need to save these integer arrays to disk. Currently, I am using the following function to do so:
func writeDataToFile(data: [[Int]], filename: String) {
    let fullfile = NSString(string: self.folderpath).stringByAppendingPathComponent(filename + ".txt")
    var fh = NSFileHandle(forWritingAtPath: fullfile)
    if fh == nil {
        NSFileManager.defaultManager().createFileAtPath(fullfile, contents: nil, attributes: nil)
        fh = NSFileHandle(forWritingAtPath: fullfile)
    }
    fh?.writeData("Time: \(filename)\n".dataUsingEncoding(NSUTF16StringEncoding)!)
    fh?.writeData("\(data)".dataUsingEncoding(NSUTF16StringEncoding)!)
    fh?.closeFile()
}
Currently this function works just fine, but it produces files that are relatively large (1.1 MB each, which, when you are writing them at 1 Hz, adds up fast). The arrays written have a fixed size, and the values fall in the range 20000 < x < 35000. Is there a way to compress this data on the fly such that I can later read it into, say, Python or some other language? Would it just be easier to use a library like Zip to compress the files after writing? Is there some way to transform the data (without loss of data/fidelity) into an image (for compression purposes, not viewing purposes)? There is also some metadata, such as a timestamp, that I would like to store along with the 2-D array.
Since you are currently saving those as string values, the simplest and fastest size reduction would be to save them as binary values (or base64-encoded strings). You could convert each of your int values into 2 bytes (an unsigned 2-byte integer can store values up to 65535) and save the values that way. That would go from 5 bytes per int value down to 2 bytes per int value, an immediate saving of 60%.
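A rough Swift sketch of that packing, assuming every value fits in 0...65535 (which the question's 20000-35000 range does); the function name is made up for the example:
import Foundation

func packedData(from rows: [[Int]]) -> Data {
    var data = Data()
    for row in rows {
        for value in row {
            let v = UInt16(value)                           // traps if a value is out of range
            data.append(UInt8(truncatingIfNeeded: v))       // low byte
            data.append(UInt8(truncatingIfNeeded: v >> 8))  // high byte
        }
    }
    return data
}
// If you still want text output, data.base64EncodedString() turns the blob into a string.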
For the Base64 encoding I use something I found on the internet called NSData+Base64. But while looking that up I just read:
In the iOS 7 and Mac OS 10.9 SDKs, Apple introduced new base64 methods on NSData that make it unnecessary to use a 3rd party base64 decoding library. What's more, they exposed access to private base64 methods that are retrospectively available back as far as iOS 4 and Mac OS 6.
Link.
You could go much further with the compression by realizing that the data will likely not change by the entire range from one element to the next, since heat maps are always gradients. You could then save each array as differences from the previous element and likely get each change down to a single byte (255 values). That may lose precision, though, if you are capturing something with a very steep heat gradient (or using a low-resolution camera).
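A sketch of that delta idea, again only as an illustration: the first value of a row is stored in full and every later value as a signed one-byte difference from its predecessor. It assumes neighbouring values never differ by more than a signed byte can hold, which the question cannot guarantee, so a real implementation would need an escape mechanism for larger jumps.
import Foundation

func deltaEncode(row: [Int]) -> Data {
    var data = Data()
    guard let first = row.first else { return data }
    let v = UInt16(first)
    data.append(UInt8(truncatingIfNeeded: v))        // first value, low byte
    data.append(UInt8(truncatingIfNeeded: v >> 8))   // first value, high byte
    var previous = first
    for value in row.dropFirst() {
        let delta = Int8(value - previous)           // traps if the difference doesn't fit in a signed byte
        data.append(UInt8(bitPattern: delta))
        previous = value
    }
    return data
}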
If you eventually need to get into compression, I use GTMNSData+zlib and decompress it in a C# web service, so with a little bit of work it is cross-platform.
A proper answer for this would require more information about the problem domain. Most likely, 2D arrays are the wrong data structure for this but it's hard to tell without more info.
What's the data stored in these arrays?
Apple has had a compression library since last year:
https://developer.apple.com/library/ios/documentation/Performance/Reference/Compression/index.html
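For completeness, newer SDKs (iOS 13 and later) also expose a Foundation convenience over that framework; a minimal sketch, where packedData stands for whatever binary blob you produced earlier:
import Foundation

func zlibCompress(_ packedData: Data) throws -> Data {
    // Wraps the Compression framework; .lzfse, .lz4 and .lzma are also available.
    let compressed = try (packedData as NSData).compressed(using: .zlib)
    return compressed as Data
}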

Core Data Model design - 8 bools or 1 NSString?

I hope this is the right forum to ask this sort of question. I'm trying to minimize the amount of data synced with iCloud while also ensuring good app speed, so I am trying to use an efficient model. My application (a basic checklist application) will have around 8 variables that can be marked as "owned" for each item.
Would it be better to create 8 Boolean attributes or a single String attribute? With the string attribute, I would simply store 8 characters like "00000000", "10000000" or "10001000", with each character of the string linked to a particular item and retrieved by looking up a particular index of the string.
My initial thought is that the 8 Booleans would allow for faster reading and writing and would have a minimal footprint, but I would appreciate some more intelligent feedback from the experts.
I would not recommend either of these if the goal is to minimize memory usage. The reason is that a Bool costs 1 byte (8 bits), of which only one bit is needed and the other 7 go unused; a string is the same but with characters. If you want to minimize memory usage, use a single byte. Because 1 byte is 8 bits, you can set each bit to 1 or 0 using a bit mask, so all eight of your values fit in 1 byte, using eight times less memory than eight Bools. For how to use a bit mask, you can read this topic:
Declaring and checking/comparing (bitmask-)enums in Objective-C
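In Swift the same single-byte idea is usually expressed with an OptionSet; a minimal sketch with made-up item names:
// Eight "owned" flags packed into one byte via a bit mask.
struct OwnedItems: OptionSet {
    let rawValue: UInt8
    static let sword  = OwnedItems(rawValue: 1 << 0)
    static let shield = OwnedItems(rawValue: 1 << 1)
    static let potion = OwnedItems(rawValue: 1 << 2)
    // ... up to 1 << 7 for the eighth flag
}

var owned: OwnedItems = [.sword, .potion]
owned.insert(.shield)
let storedValue = owned.rawValue            // a single integer attribute to persist
let ownsShield = owned.contains(.shield)    // true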
I would think any difference in speed or memory is likely to be marginal. Design and code it in the most logical way, which at first sight seems to be using 8 booleans. For example, if you need to fetch a subset of the data based on the boolean values, it will be far easier to construct the required predicate.

How does the size of a realm-file develop?

How does the size of a realm-file develop?
To start with, I have a realm-file with several properties, one of them being an array of 860 entries, where each array-entry again consists of a couple of properties.
One array-property states the name of the entry.
I observed the following:
If the name-property says "Criteria_A1" (up to "Criteria_A860"), then the realm-file is 1.6 MB.
If the name-property says "A1" (up to "A860"), then the realm-file is only 786 kB.
Why do the extra letters in the array-name-property make the realm-file this much bigger?
A second observation:
If I add more objects (each again having an array with 860 entries), the file size becomes 1.6 MB again (no matter how many objects I add; I guess until some critical value is reached again, where the size triples... or am I wrong?).
It almost seems that the realm-file at 786 kB doubles in size as soon as something is added (either a property with more letters or an additional object). Why does the realm-file double at a critical value instead of increasing linearly in size as content is added?
Thanks for a clarification on this.
It's pretty well observed. :-) The Realm file starts out at about 4 kB and will double in size once it runs out of free space. It keeps doubling until 128 MB and then grows by a constant 128 MB thereafter.
The reason for doubling the file rather than growing it linearly is purely performance: doubling is a common strategy for dynamic data structures.
You can use the methods below to write a compacted copy that removes all free space in the file. This can be useful if you no longer add new data, want to ship a static database, or want to send the file over the network.
Realm.writeCopyToURL(_:encryptionKey:) in Swift
-[RLMRealm writeCopyToURL:encryptionKey:error:] in Objective-C
Realm.writeCopyTo() in Java
The thresholds and algorithm mentioned are the current ones and may change in future versions, though.
Hope this clarifies?
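A minimal Swift sketch of writing such a compacted copy; the method spelling below is the newer RealmSwift naming of the call listed above (it has changed between Realm versions), and the destination file name is only a placeholder.
import RealmSwift

// Sketch: write a compacted copy of the default Realm, removing the free
// space accumulated by the doubling strategy.
func writeCompactedCopy() throws {
    let realm = try Realm()
    let documents = try FileManager.default
        .url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: false)
    let compactedURL = documents.appendingPathComponent("compacted.realm")
    try realm.writeCopy(toFile: compactedURL, encryptionKey: nil)
}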

Sorting 20GB of data

In the past I had to work with big files, somewhere in the 0.1-3 GB range. Not all the 'columns' were needed, so it was OK to fit the remaining data in RAM.
Now I have to work with files in the 1-20 GB range, and they will probably grow as time passes. That is totally different, because you cannot fit the data in RAM anymore.
My file contains several million 'entries' (I have found one with 30 million entries). An entry consists of about 10 'columns': one string (50-1000 Unicode chars) and several numbers. I have to sort the data by 'column' and show it. For the user, only the top entries (1-30%) are relevant; the rest is low-quality data.
So, I need some suggestions about which direction to head in. I definitely don't want to put the data in a DB, because databases are hard to install and configure for non-computer-savvy persons. I'd like to deliver a monolithic program.
Showing the data is not difficult at all. But sorting... without loading the data into RAM, on regular PCs (2-6 GB RAM)... will take a good few hours.
I was looking a bit into MMF (memory mapped files) but this article from Danny Thorpe shows that it may not be suitable: http://dannythorpe.com/2004/03/19/the-hidden-costs-of-memory-mapped-files/
So, I was thinking about loading into RAM only the data from the column that has to be sorted, plus a pointer to the address (within the disk file) of each 'entry'. I sort the 'column', then use the pointer to find the entry corresponding to each column cell and restore the entry. The 'restoration' will be written directly to disk, so no additional RAM will be required.
PS: I am looking for a solution that will work in both Lazarus and Delphi, because Lazarus (actually FPC) has 64-bit support for Mac. 64-bit means more RAM available = faster sorting.
I think the way to go is merge sort; it's a great algorithm for sorting a large amount of fixed-size records with limited memory (a sketch of the merge step follows the outline below).
General idea:
read N lines from the input file (a value that allows you to keep the lines in memory)
sort these lines and write the sorted lines to file 1
repeat with the next N lines to obtain file 2
...
you reach the end of the input file and you now have M files (each of which is sorted)
merge these files into a single file (you'll have to do this in steps as well)
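As a sketch of the merge step, shown here in Swift on in-memory string arrays for brevity; in the real external sort each array would be a file streamed line by line, so only one line per chunk sits in memory at a time.
func merge(_ a: [String], _ b: [String]) -> [String] {
    var result: [String] = []
    result.reserveCapacity(a.count + b.count)
    var i = a.startIndex, j = b.startIndex
    while i < a.endIndex && j < b.endIndex {
        if a[i] <= b[j] { result.append(a[i]); i += 1 }
        else            { result.append(b[j]); j += 1 }
    }
    result.append(contentsOf: a[i...])   // whatever is left in a
    result.append(contentsOf: b[j...])   // whatever is left in b
    return result
}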
You could also consider a solution based on an embedded database, e.g. Firebird Embedded: it works well with Delphi/Windows and you only have to add some DLLs to your program folder (I'm not sure about Lazarus/OS X).
If you only need a fraction of the whole data, scan the file sequentially and keep only the entries needed for display. For instance, let's say you need only 300 entries out of 1 million. Scan the first 300 entries in the file and sort them in memory. Then, for each remaining entry, check whether it is lower than the lowest one in memory and, if so, skip it. If it is higher than the lowest entry in memory, insert it into the correct place within the 300 and throw away the lowest; this makes the second lowest the new lowest. Repeat until the end of the file.
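A minimal Swift sketch of that selection scan, with a made-up Entry type compared by a numeric key; it keeps a small buffer of the K highest keys seen so far and replaces the current lowest whenever a better entry arrives.
struct Entry {
    let key: Double
    let line: String
}

func highestEntries<S: Sequence>(_ entries: S, keep k: Int) -> [Entry] where S.Element == Entry {
    var best: [Entry] = []                   // kept sorted ascending by key
    for entry in entries {
        if best.count < k || entry.key > best[0].key {
            let insertAt = best.firstIndex { $0.key > entry.key } ?? best.endIndex
            best.insert(entry, at: insertAt)
            if best.count > k { best.removeFirst() }   // drop the current lowest
        }
    }
    return best
}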
Really, there is no sorting algorithm that can make moving 30 GB of randomly ordered data fast.
If you need to sort in multiple ways, the trick is not to move the data itself at all, but instead to create an index for each column that you need to sort.
I do this with files that are also tens of gigabytes in size, and users can sort, scroll and search the data without noticing that they're working with a huge dataset.
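The core of that index idea, sketched in Swift with in-memory keys for illustration; on disk the keys would be read column-wise and the resulting order stored as record offsets.
let keys: [Double] = [3.2, 0.5, 9.1, 4.4]               // one sort key per record
let order = keys.indices.sorted { keys[$0] < keys[$1] } // [1, 0, 3, 2]
// Fetch records lazily in this order for display; the records themselves never move.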
Please find here a class which sorts a file using a slightly optimized merge sort. I wrote it a couple of years ago for fun. It uses a skip list for sorting files in-memory.
Edit: the forum is German and you have to register (for free). It's safe, but requires a bit of knowledge of German.
If you cannot fit the data into main memory then you are in the realm of external sorting. Typically this involves an external merge sort: sort smaller chunks of the data in memory, one by one, write them back to disk, and then merge these chunks.
