I have a large text resource in my bundle. It's a CSV containing lines like
0,1,100,2.2345
It's over 9MB. What's the best way to open and sequentially read it in so I can do something like:
myObject->initData(col0, col1, col2, col3);
(Which just stuffs the float value into one of a number of multidimensional arrays indexed by the integers in the file.)
I tried reading it into a string using [NSString stringWithContentsOfFile:] and using an NSScanner to loop over it, but I don't want to double up my memory usage, even temporarily. It was also quite slow.
What's the best way to do this?
Thanks
You can use NSFileHandle or C APIs to read incrementally, or you can use mmap with NSData.
The remainder is basic C string buffer handling, or you could work with NSString line by line.
Your life may be easier if you can export it as a sequence of binary floats, rather than CSV.
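If the CSV really is a plain stream of lines like the one above, a minimal sketch using C stdio (fgets/sscanf) keeps only one line in memory at a time. The file name and the initData stand-in below are illustrative; in an app you'd get the real path from the bundle:

#include <stdio.h>

/* Hypothetical stand-in for the asker's initData call: the three ints index
   the multidimensional arrays, the float is the value to store. */
static void initData(int c0, int c1, int c2, float value) {
    (void)c0; (void)c1; (void)c2; (void)value;
}

int main(void) {
    /* Illustrative path; in an app you'd obtain it from the bundle. */
    FILE *f = fopen("data.csv", "r");
    if (!f) return 1;

    char line[128];                       /* one CSV line at a time */
    while (fgets(line, sizeof line, f)) {
        int c0, c1, c2;
        float value;
        if (sscanf(line, "%d,%d,%d,%f", &c0, &c1, &c2, &value) == 4)
            initData(c0, c1, c2, value);
    }

    fclose(f);
    return 0;
}

Because only one small buffer is live at a time, memory use stays flat no matter how large the file is.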
I am writing a Huffman coding/decoding algorithm and I am running into the problem that storing the Huffman tree is taking up way too much room. Currently I am converting the tree into a hash map of the form hashMap<Character(s), Huffman Code> and then storing that hash map. The issue is that, while the string itself compresses nicely, adding the Huffman tree data stored in the hash map adds so much overhead that the result actually ends up bigger than the original. Currently I am just naively writing [data, value] pairs to the file, but I imagine there must be some trickier way to do that. Any ideas?
You do not need the tree in order to encode. All you need is the bit lengths for each symbol and a way to order the symbols. See Canonical Huffman Code.
In fact, all you need is the list of coded symbols ordered by bit length (and, within each bit length, sorted by symbol), plus the number of codes of each length. With just those two things you can encode.
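As a rough C illustration of that idea (the example symbols and bit lengths are made up), the canonical codes can be rebuilt from nothing more than the per-symbol code lengths, the same way deflate does it (RFC 1951):

#include <stdio.h>

#define MAX_BITS 15

int main(void) {
    /* Example data: lengths[s] is the code length for symbol s, 0 = unused. */
    int lengths[6] = {2, 1, 3, 3, 0, 0};
    int n = 6;

    /* Count how many codes there are of each length. */
    int bl_count[MAX_BITS + 1] = {0};
    for (int s = 0; s < n; s++)
        bl_count[lengths[s]]++;
    bl_count[0] = 0;

    /* Compute the first canonical code of each length. */
    int next_code[MAX_BITS + 1] = {0};
    int code = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    /* Assign codes to symbols in symbol order within each length. */
    for (int s = 0; s < n; s++) {
        if (lengths[s] != 0) {
            int c = next_code[lengths[s]]++;
            printf("symbol %d: %d bits, code %d\n", s, lengths[s], c);
        }
    }
    return 0;
}

So the only thing that has to be written to the file alongside the compressed data is the list of code lengths (or, equivalently, the ordered symbols plus the count of codes per length).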
I'm currently using NSCoding to serialize a tree of objects, but one of them contains as a data member a native C float array with 1,000,000 entries, so in order to serialize it using encodeFloat:forKey: for each array entry, I would need 1,000,000 useless keys, which might be very slow. What's the preferred way to handle this?
for each array entry, I need to apply 1,000,000 useless keys
No, you definitely do not need separate keys for each element. A C array is a contiguous block of memory, so you can simply create an NSData object from that block and store that, as Hot Licks suggested. Or, since a million floats will require a fair bit of storage, you might compress the data before storing it. And in fact, you don't really even need NSData -- you can encode a range of bytes directly with -encodeBytes:length:forKey:.
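At the C level the point is simply that the whole array is one contiguous run of bytes, so a single pointer plus a byte count describes all of it; those are the two values you would hand to -encodeBytes:length:forKey: (or wrap in NSData). A minimal sketch, with made-up sizes:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    static float samples[1000000];                /* stand-in for the real member */

    /* One pointer + one length cover the entire array; no per-element keys. */
    const uint8_t *bytes = (const uint8_t *)samples;
    size_t length = sizeof samples;               /* 1,000,000 * sizeof(float) */

    printf("encode %zu bytes starting at %p under a single key\n",
           length, (const void *)bytes);
    return 0;
}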
Given this C string:
unsigned char *temp = (unsigned char *)[@"Hey, I am some usual CString" UTF8String];
How can I replace "usual" with "other" to get: "Hey, I am some other CString".
I cannot use NSString methods (replaceCharactersInRange:, replaceOccurrencesOfString:, etc.) for performance reasons. I have to keep it all at a low level, since the strings I'll be dealing with happen to exceed 5MB, and therefore the replacements (there will be a lot of replacements to do) take about 10 minutes on an iOS device.
Objective-C is just a thin layer over C.
If you need to work with native C strings, just go ahead and do it.
This question, "What is the function to replace string in C?", seems to address your problem fairly well.
The C string returned by UTF8String is const. You can't safely change it by casting it to a non-const pointer and mutating the bytes. So the only way to do this is by creating a copy.
If you really have a reason to use an NSString as the source, it might be much faster to do the transformation on the original string.
If you want to get a better answer that helps you to speed up your special case you should provide some more information. How do you create the original string, what's the number and size of search/replacement strings and so on.
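Since the replacement has to go into a fresh buffer anyway, a minimal C sketch of a replace-all that allocates the result once and copies through it in a single pass might look like this (the helper name is made up; it assumes a non-empty search string):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Replace every occurrence of "find" with "repl", returning a newly
   allocated string the caller must free. */
static char *replace_all(const char *src, const char *find, const char *repl) {
    size_t find_len = strlen(find), repl_len = strlen(repl);

    /* First pass: count matches so the output can be sized exactly. */
    size_t count = 0;
    for (const char *p = src; (p = strstr(p, find)) != NULL; p += find_len)
        count++;

    size_t out_len = strlen(src) - count * find_len + count * repl_len;
    char *out = malloc(out_len + 1);
    if (!out) return NULL;

    /* Second pass: copy text and splice in replacements. */
    char *dst = out;
    const char *p = src;
    const char *hit;
    while ((hit = strstr(p, find)) != NULL) {
        memcpy(dst, p, (size_t)(hit - p));  dst += hit - p;
        memcpy(dst, repl, repl_len);        dst += repl_len;
        p = hit + find_len;
    }
    strcpy(dst, p);                         /* copy the remaining tail */
    return out;
}

int main(void) {
    char *s = replace_all("Hey, I am some usual CString", "usual", "other");
    if (s) { puts(s); free(s); }
    return 0;
}

Doing all replacements in one linear sweep over the source is what keeps this fast on multi-megabyte strings, as opposed to repeatedly shifting the remainder of the buffer for each hit.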
I have a lot of float data generated from an image. I want to store it in a file, like XX.dat (as is usual in C), and then read it back in to do further processing.
I have a method that represents each float as an NSString and writes it into a .txt file, but it is too slow. Is there a function equivalent to fwrite(*data, *pfile) and fread(*buf, *pfile) in C? Or some other idea?
Many thanks!
In iOS you can still make use of the standard low-level file (and socket, among other things) APIs, so you can use fopen(), fwrite(), fread(), etc. just as you would in any other C program.
This question has some examples of using the low-level file API on iOS: iPhone Unzip code
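A minimal round-trip sketch, exactly as it would look in any C program (the file name is made up):

#include <stdio.h>

int main(void) {
    float out[4] = {1.0f, 2.5f, -3.25f, 4.75f};

    /* Write the whole buffer as raw binary in one call. */
    FILE *f = fopen("image.dat", "wb");
    if (!f) return 1;
    fwrite(out, sizeof(float), 4, f);
    fclose(f);

    /* Read it back the same way. */
    float in[4];
    f = fopen("image.dat", "rb");
    if (!f) return 1;
    size_t got = fread(in, sizeof(float), 4, f);
    fclose(f);

    for (size_t i = 0; i < got; i++)
        printf("%f\n", in[i]);
    return 0;
}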
Another option to consider is writing your floats into something like an NSMutableData instance, and then writing that to a file. That will be faster than converting everything to strings (you'll get a binary file instead of a text one), though probably still not as fast as using the low-level APIs. And you'd probably have to use something like this to convert between floats and byte arrays.
If you are familiar with lower level access, you could mmap your file, and then access the data directly just as you would any allocated memory.
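A rough sketch of the mmap route (again with a made-up file name); the kernel pages the data in on demand, so nothing is copied up front:

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("image.dat", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return 1; }

    /* Map the file read-only and view it directly as an array of floats. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { close(fd); return 1; }

    const float *values = (const float *)base;
    size_t count = (size_t)st.st_size / sizeof(float);
    if (count > 0)
        printf("first value: %f\n", values[0]);

    munmap(base, (size_t)st.st_size);
    close(fd);
    return 0;
}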
I have an application for storing many strings in a TStringList. The strings will be largely similar to one another, and it occurs to me that one could compress them on the fly - i.e. store a given string in terms of a mixture of unique text fragments plus references to previously stored fragments. StringLists such as lists of fully-qualified paths and filenames should compress very well.
Does anyone know of a TStringList descendant that implements this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire stringlist before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added gives much better performance than having to look up a 'best fit' entry (needed for similarity compression) over the entire set.
Another thing you might want to read up about, are 'ropes' - a conceptually different type than strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word) you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope', representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression - just start each string with an index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not recurse this too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
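Since the question is about Delphi, the following is only a language-agnostic sketch of the prefix-reduction storage scheme, written in C (all names, sizes, and the always-use-the-previous-entry policy are illustrative; a real implementation would add the mod-16 style anchoring mentioned above):

#include <stdio.h>
#include <string.h>

/* Prefix reduction: each entry stores the index of an earlier entry, how many
   leading characters it shares with it, and only the differing tail. */
typedef struct {
    int  prev;           /* index of the reference entry, or -1 for none */
    int  prefix_len;     /* characters borrowed from that entry */
    char tail[64];       /* the part actually stored */
} Entry;

static Entry entries[100];
static int entry_count = 0;

/* Rebuild the full string for entry i. */
static void expand(int i, char *out, size_t cap) {
    if (entries[i].prev >= 0) {
        char prev_full[256];
        expand(entries[i].prev, prev_full, sizeof prev_full);
        snprintf(out, cap, "%.*s%s", entries[i].prefix_len, prev_full, entries[i].tail);
    } else {
        snprintf(out, cap, "%s", entries[i].tail);
    }
}

/* Add a string, storing only the suffix that differs from the previous entry. */
static int add(const char *s) {
    Entry *e = &entries[entry_count];
    e->prev = -1;
    e->prefix_len = 0;

    if (entry_count > 0) {
        char prev_full[256];
        expand(entry_count - 1, prev_full, sizeof prev_full);
        int n = 0;
        while (prev_full[n] && s[n] && prev_full[n] == s[n]) n++;
        e->prev = entry_count - 1;
        e->prefix_len = n;
        s += n;
    }
    snprintf(e->tail, sizeof e->tail, "%s", s);
    return entry_count++;
}

int main(void) {
    add("C:\\Projects\\App\\Main.pas");
    add("C:\\Projects\\App\\Utils.pas");
    int i = add("C:\\Projects\\App\\Utils.dfm");

    char full[256];
    expand(i, full, sizeof full);
    puts(full);    /* reconstructed: C:\Projects\App\Utils.dfm */
    return 0;
}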
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including delete operations); in that case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:

1. Split your strings into words and keep a separate, growing dictionary to reference the words, storing only the lists of word indexes internally.

2. Implement something based on the zlib stream support available in Delphi, but operating on blocks that each contain, for example, 10-100 strings. In this case you still have to decompress/recompress a complete block on every change, but the "price" you pay is much lower (see the sketch below).
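For the second approach, the Zlib unit in Delphi wraps the same zlib library; purely to show the idea, here is roughly what one per-block round trip looks like in plain C (the block contents are made up, and you'd link with -lz):

#include <stdio.h>
#include <zlib.h>

int main(void) {
    /* One "block" of concatenated strings; a real list would keep one
       compressed blob like this per 10-100 strings. */
    const char block[] =
        "C:\\Projects\\App\\Main.pas\n"
        "C:\\Projects\\App\\Utils.pas\n"
        "C:\\Projects\\App\\Utils.dfm\n";
    uLong src_len = (uLong)sizeof block;

    /* Compress the block with zlib's one-shot API. */
    Bytef packed[512];
    uLongf packed_len = sizeof packed;
    if (compress(packed, &packed_len, (const Bytef *)block, src_len) != Z_OK)
        return 1;

    /* Decompress it again when any string in the block is accessed. */
    char unpacked[512];
    uLongf unpacked_len = sizeof unpacked;
    if (uncompress((Bytef *)unpacked, &unpacked_len, packed, packed_len) != Z_OK)
        return 1;

    printf("%lu bytes -> %lu compressed -> %lu restored\n",
           (unsigned long)src_len, (unsigned long)packed_len,
           (unsigned long)unpacked_len);
    fputs(unpacked, stdout);
    return 0;
}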
I don't think you really want to compress TStrings items in memory, because it's terribly inefficient. I suggest you look at the TStream implementations in the Zlib unit. Just wrap the regular stream into TDecompressionStream on load and TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile