Best way to serialize a native C array - iOS

I'm currently using NSCoding to serialize a tree of objects, but one of them contains a native C float array with 1,000,000 entries as a data member, so in order to serialize it using encodeFloat:forKey: for each array entry, I need to apply 1,000,000 useless keys, which might be very slow. What's the preferred way to handle this?

for each array entry, I need to apply 1,000,000 useless keys
No, you definitely do not need separate keys for each element. A C array is a contiguous block of memory, so you can simply create an NSData object from that block and store that, as Hot Licks suggested. Or, since a million floats require a fair bit of storage, you might compress the data before storing it. In fact, you don't even need NSData: you can encode a range of bytes directly with -encodeBytes:length:forKey:.
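For illustration, here is the same idea sketched in Python (the question is about Objective-C, but the principle of writing the array as one contiguous byte blob, optionally compressed, carries over directly):

import array
import zlib

floats = array.array("f", [0.0] * 1_000_000)  # stands in for the native C float array

# One contiguous blob instead of a million keyed values.
blob = floats.tobytes()

# Optionally compress before storing; repetitive float data often shrinks a lot.
compressed = zlib.compress(blob)

# Restoring is the reverse.
restored = array.array("f")
restored.frombytes(zlib.decompress(compressed))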

Related

Nested fixed-size arrays/records

I want to model fixed-size arrays that can contain records and other fixed-size arrays. I then want to model store and select accesses to them.
I currently use ArraySorts for the arrays and Datatypes for the records. However, according to this answer (arrays) and this answer (records), these aren't really intended for my use case. Is there anything else in Z3 I could use instead?
Background: I want to model pointers as they occur in the LLVM IR. For this, each pointer has a data array that represents the memory buffer into which it is pointing and an indices array that represents the indices used in getelementptr calls. Since the memory buffer could contain pointers or structs, I need to be able to nest the arrays (or store the records in arrays).
An example (in z3py) looks like this:
import z3

Vec3 = z3.Datatype("Vec3")
Vec3.declare("Vec3__init",
    ("x", z3.IntSort()),
    ("y", z3.IntSort()),
    ("z", z3.IntSort()))
Vec3 = Vec3.create()

PointerVec3 = z3.Datatype("Pointer__Vec3")
PointerVec3.declare("Pointer__Vec3__init",
    ("data", z3.ArraySort(z3.BitVecSort(32), Vec3)),
    ("nindices", z3.IntSort()),
    ("indices", z3.ArraySort(z3.IntSort(), z3.BitVecSort(32))))
PointerVec3 = PointerVec3.create()
Arrays and records are the only way to model these things in z3/SMT-Lib. And they can be nested as you wish.
It is true that SMTLib arrays are not quite like the arrays you find in regular programming languages. But it is not true that they are always unbounded: their size exactly matches the cardinality of their domain. If you want a bounded array, I recommend using an appropriately sized BitVec sort for the domain. For an n-bit bit vector, your array will have exactly 2^n elements. You can pick a small enough n and treat that as your array, which typically matches what you see in practice: most such internal arrays have a power-of-two size anyhow. (Moreover, provers usually don't do well with large arrays, so sticking to a small power of two is a good idea, at least to start with.)
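For example, in z3py (names here are illustrative), a 3-bit domain gives an array with exactly 2^3 = 8 elements:

import z3

# A bounded "array" of 8 integers: the domain is a 3-bit bit vector.
Arr8 = z3.ArraySort(z3.BitVecSort(3), z3.IntSort())
a = z3.Const("a", Arr8)
i = z3.BitVec("i", 3)

s = z3.Solver()
s.add(z3.Select(z3.Store(a, i, 42), i) == 42)  # store, then select at the same index
print(s.check())  # sat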
So, these are your only options. Stack Overflow works best if you try things out and then ask about the specific problems you run into.

How to write a hashmap to a file in a memory efficient format?

I am writing a Huffman coding/decoding algorithm and I am running into the problem that storing the Huffman tree is taking up way too much room. Currently, I am converting the tree into a hash map, as in hashMap<Character(s), Huffman Code>, and then storing that hash map. The issue is that, while the string itself compresses well, adding the Huffman tree data stored in the hash map introduces so much overhead that the result actually ends up bigger than the original. Currently I am just naively writing [data, value] pairs to the file, but I imagine there must be some trickier way to do that. Any ideas?
You do not need the tree in order to encode. All you need is the bit lengths for each symbol and a way to order the symbols. See Canonical Huffman Code.
In fact, all you need are the coded symbols ordered by bit length (and sorted by symbol within each bit length), together with the number of codes of each length. With just those two things you can encode.
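For illustration, a minimal Python sketch of assigning canonical codes from just the per-symbol bit lengths (no tree needed):

def canonical_codes(lengths):
    # lengths: dict mapping symbol -> code length in bits.
    # Sort by (length, symbol): exactly the order a decoder can reproduce.
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    codes = {}
    code = 0
    prev_len = 0
    for sym in symbols:
        code <<= lengths[sym] - prev_len  # widen the code when the length grows
        codes[sym] = format(code, "0{}b".format(lengths[sym]))
        code += 1
        prev_len = lengths[sym]
    return codes

print(canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

To store the table, the count of codes at each length plus the symbols in that sorted order are enough, exactly as described above.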

Convert a hash value into a string in Objective-C

I have a hash value and I want to convert it into a string, but I do not know how to do that.
Here is the hash value:
7616db6c232292d2e56a2de9da49ea810d5bb80d53c10e7b07d9521dc88b3177
Hashing is a one-way algorithm, so that is not possible, sorry.
You can't. A hash is a one-way function. Plus, two strings could theoretically generate the same hash.
BUT. If you had a large enough precomputed list of as many hashes as you could generate, one for each unique string, you could potentially convert your hash back into a string. It would require thousands of GB of entries and a lot of time to compute each and every hash, though, so in practice it isn't possible.
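For illustration, a toy Python version of that precomputed-table idea (a real table would need astronomically many entries; the candidate strings here are made up):

import hashlib

candidates = ["password", "hello", "secret"]
table = {hashlib.sha256(s.encode()).hexdigest(): s for s in candidates}

h = hashlib.sha256("secret".encode()).hexdigest()  # the hash we want to "invert"
print(table.get(h, "not in table"))  # secret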

Optimal storage for string/integer pairs with fast lookup of strings?

I need to maintain a correspondence between strings and integers, then look up a string value and return the integer. What's the best structure to store this info that meets the following requirements:
Speed and memory size are important, in that order.
I don't want to reinvent the wheel and write my own sorting routine. A call to Sort(CompareFunction) is fine of course.
Conditions:
The integers are not guaranteed to be sequential, nor is there a 'start value' like 0 or 1
Number of data pairs can vary from 100 to 100000
The data are all read in at the beginning; there are no subsequent additions/deletions/modifications
FWIW, the strings are the hex entry IDs that Outlook (MAPI?) uses to identify entries. Example: 00000000FE42AA0A18C71A10E8850B651C24000003000000040000000000000018000000000000001E7FDF4152B0E944BA66DFBF2C6A6416E4F52000487F22
There are so many options (TStringList (with objects or name/value pairs), TObjectList, TDictionary, ...) that I'd better ask for advice first...
I have read How can I search faster for name/value pairs in a Delphi TStringList?, which suggests TDictionary for string/string pairs, and Sorting multidimensional array in Delphi 2007, which suggests TStringList objects for string/integer pairs but where sorting is done on the integers.
The second link that you include in the question is not applicable. That is a question concerning sorting rather than efficient lookup. Although you discuss sorting a number of times in your question, you do not have a requirement to sort. Your requirement is simply a dictionary, also known as an associative array. Of course, you can implement that by sorting an array and using binary search for your lookup, but sorting is not a requirement. You simply need an efficient dictionary.
Out of the box, the most efficient and convenient data structure for your problem is TDictionary<string, Integer>. This has lookup complexity of O(1) and so scales well for large collections. For smaller collections a binary search based lookup with lookup complexity of O(log n) can be competitive and can indeed out-perform a dictionary.
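For illustration, the same tradeoff sketched in Python (a built-in dict standing in for TDictionary, bisect over a sorted array standing in for the binary search; the keys are hypothetical shortened entry IDs):

import bisect

pairs = {"00A3F1": 7, "1B22C0": 42, "FE42AA": 13}

# Hash-based lookup: O(1) expected time.
print(pairs["FE42AA"])  # 13

# Sorted array plus binary search: O(log n) time, minimal memory overhead.
keys = sorted(pairs)
values = [pairs[k] for k in keys]
i = bisect.bisect_left(keys, "FE42AA")
if i < len(keys) and keys[i] == "FE42AA":
    print(values[i])  # 13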
Cosmin Prund wrote an excellent answer here on SO where he compared the performance of dictionary lookup against binary search based lookup. I recommend you have a read. I would say that for small containers, performance is probably not that big a problem for you. So even though binary search may be quicker, it probably does not matter because your performance is good either way. But performance probably becomes an issue for larger containers and that's where the dictionary is always stronger. For large enough containers, the performance of binary search may become unacceptable.
I'm sure that it is possible to produce more efficient implementations of dictionaries than the Embarcadero one, but I'd also say that the Embarcadero implementation is perfectly solid. It uses a decent hash function and does not have any glaring weaknesses.
In terms of memory complexity, there's little to choose between a dictionary and a sorted array. It's not possible to improve on a sorted array for memory use.
I suggest that you start with TDictionary<string, Integer> and only look beyond that if your performance requirements are not met.
It seems you are going to look up long, evenly distributed strings. One of the fastest data structures for this kind of problem is the trie.
But your dataset size is rather small, and ready-to-use Delphi solutions like THashedStringList or TDictionary (more convenient) would provide fairly high speed.
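For illustration, a bare-bones trie in Python (the key shown is a hypothetical, shortened entry ID):

class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.value = None   # the integer, set on the node that ends a key

def trie_insert(root, key, value):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value

def trie_lookup(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value

root = TrieNode()
trie_insert(root, "00000000FE42AA0A", 1)
print(trie_lookup(root, "00000000FE42AA0A"))  # 1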

Delphi TStringList wrapper to implement on-the-fly compression

I have an application that stores many strings in a TStringList. The strings will be largely similar to one another, and it occurs to me that one could compress them on the fly - i.e. store a given string as a mixture of unique text fragments plus references to previously stored fragments. String lists such as lists of fully qualified paths and filenames should compress very well.
Does anyone know of a TStringList descendant that implements this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that a TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire stringlist before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time; having quick access to entries already similar to the one being added gives much better performance than having to look up a 'best fit' entry (needed for similarity compression) over the entire set.
Another thing you might want to read up on is 'ropes' - a conceptually different type from strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word) you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope' representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
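For illustration, a bare-bones rope in Python: leaves hold text, concat nodes join two ropes without copying, so a shared fragment is stored only once:

class Leaf:
    def __init__(self, text):
        self.text = text
    def to_str(self):
        return self.text

class Concat:
    # The 'twine' mentioned above: one extra pair of pointers, no duplicated data.
    def __init__(self, left, right):
        self.left, self.right = left, right
    def to_str(self):
        return self.left.to_str() + self.right.to_str()

base = Leaf("C:\\Projects\\App\\")  # shared fragment, stored once
f1 = Concat(base, Leaf("main.pas"))
f2 = Concat(base, Leaf("utils.pas"))
print(f1.to_str())  # C:\Projects\App\main.pas
print(f2.to_str())  # C:\Projects\App\utils.pas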
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression: just start each string with the index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not recurse this too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
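For illustration, a simple Python sketch of prefix reduction, with every 16th entry stored in full (the mod-16 anchor described above) so reconstruction never follows a long chain:

def prefix_reduce(strings):
    # Every 16th entry is an anchor stored in full; the others store only
    # (shared_prefix_length, suffix) relative to the previous string.
    reduced = []
    for i, s in enumerate(strings):
        if i % 16 == 0:
            reduced.append((0, s))  # anchor: full string
        else:
            prev = strings[i - 1]
            n = 0
            while n < min(len(s), len(prev)) and s[n] == prev[n]:
                n += 1
            reduced.append((n, s[n:]))
    return reduced

def expand(reduced):
    # Rebuild the original list; each entry needs at most the previous one.
    result = []
    for n, suffix in reduced:
        base = result[-1][:n] if n else ""
        result.append(base + suffix)
    return result

paths = ["C:\\data\\a.txt", "C:\\data\\b.txt", "C:\\docs\\c.txt"]
assert expand(prefix_reduce(paths)) == paths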
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including the delete operation); in that case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:
1. Split each string into words and keep a separate, growing dictionary that references the words; each string is then stored internally as a list of word indexes.
2. Implement something on top of the zlib stream support available in Delphi, but operate on blocks that each contain, for example, 10-100 strings. You still have to decompress and recompress a complete block on access, but the "price" you pay is much lower (see the sketch below).
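For illustration, here is a Python sketch of the second, block-oriented approach (the block size, names, and newline separator are arbitrary choices for the sketch):

import zlib

BLOCK_SIZE = 100  # strings per compressed block; tune for your data

class CompressedStringBlocks:
    def __init__(self):
        self.blocks = []   # zlib-compressed blocks of BLOCK_SIZE strings each
        self.pending = []  # uncompressed tail, not yet a full block
    def append(self, s):
        # Assumes the strings themselves contain no newline.
        self.pending.append(s)
        if len(self.pending) == BLOCK_SIZE:
            blob = "\n".join(self.pending).encode("utf-8")
            self.blocks.append(zlib.compress(blob))
            self.pending = []
    def get(self, index):
        block, offset = divmod(index, BLOCK_SIZE)
        if block == len(self.blocks):  # index falls in the uncompressed tail
            return self.pending[offset]
        blob = zlib.decompress(self.blocks[block])
        return blob.decode("utf-8").split("\n")[offset]

Only the one block touched by a read has to be decompressed, which is the whole point of keeping the blocks small.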
I don't think you really want to compress TStrings items in memory, because it is terribly inefficient. I suggest you look at the TStream implementations in the Zlib unit. Just wrap a regular stream in TDecompressionStream on load and TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile.
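For illustration, the save-time-only idea in Python: keep the strings uncompressed in memory and compress transparently on save (and decompress on load), analogous to wrapping the stream in TCompressionStream/TDecompressionStream:

import gzip

def save_strings(path, strings):
    # Compress only at save time; in-memory access stays at full speed.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("\n".join(strings))

def load_strings(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read().split("\n")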
