What's Property List Output Encoding for? If it is set to binary, does it actually compress the plist files?
If set to binary, Xcode appears to take in any XML property lists and to save them into the application bundle as binary property lists. A binary property list has a more compact file format than an XML property list, but I don't know if you'd call it compressed.
From the Property List Programming Guide:
XML property lists are more portable
than the binary alternative and can be
manually edited, but binary property
lists are much more compact; as a
result, they require less memory and
can be read and written much faster
than XML property lists. In general,
if your property list is relatively
small, the benefits of XML property
lists outweigh the I/O speed and
compactness that comes with binary
property lists.
Related
I'm using C SPSS I/O library to write and read sav files.
I need to store my own version number in sav file. The requirements are:
1) That version should not be visible to user when he/she uses regular SPSS programs.
2) Obviously, regular SPSS programs and the I/O module should not overwrite the number.
Please, advice about that place or function.
Regards,
There is a header field in the sav file that identifies the creator. However, that would be overwritten if the file is resaved. It would be visible with commands such as sysfile info.
Another approach would be to create a custom file attribute using a name that is unlikely to be used by anyone else. It would also be visible in a few system status commands such as DISPLAY DICT and I think, CODEBOOK. It could be overwritten, with the DATASET ATTRIBUTE command but would not be changed just by resaving the file.
I am in need of a data format which will allow me to reduce the time needed to parse it to a minimum. In other words I'm looking for a format with as little overhead as possible and being parseable in the shortest amount of time.
I am building an application which will pull a lot of data from an API, parse it and display it to the user. So the format should be as small as possible so that the transmission will be fast and also should be very efficient for parsing. What are my options?
Here are a few formats that pop in in my head:
XML (a lot of overhead and slow parsing IMO)
JSON (still too cumbersome)
MessagePack (looks interesting)
CSV (with a custom parser written in C)
Plist (fast parsing, a lot of overhead)
... any others?
So currently I'm looking at CSV the most. Any other suggestions?
As stated by Apple in Property List Programming Guide binary plist representation should be fastest
Property List Representations
A property list can be stored in one of three different ways: in an
XML representation, in a binary format, or in an “old-style” ASCII
format inherited from OpenStep. You can serialize property lists in
the XML and binary formats. The serialization API with the old-style
format is read-only.
XML property lists are more portable than the binary alternative and
can be manually edited, but binary property lists are much more
compact; as a result, they require less memory and can be read and
written much faster than XML property lists. In general, if your
property list is relatively small, the benefits of XML property lists
outweigh the I/O speed and compactness that comes with binary property
lists. If you have a large data set, binary property lists, keyed
archives, or custom data formats are a better solution.
You just need to set the correct flag while creation or reading NSPropertyListBinaryFormat_v1_0. Just be sure that the data you want to represent in the plist are resented by this format.
I'm looking for an programmatic approach to positively identify fillable PDF form(s) among non-PDF form files.
The options that I believe are available are:
Parse the PDF code and content
Parse the file for signature identification with a hex capable language such as Python
Parse the file with a hex capable language such as Python to flag telltale signs
If the Catalog > AcroForm > Fields Array has at least one Dictionary element, the PDF is a fillable form. For bonus points, you could do some level of validation that the Field Dictionary is legit (validate Type and Subtype for example).
In the Core Programming Data Guide under the Fetched Properties section there is a paragraph that states the following.
The most significant constraint is that you cannot use substitutions to change the structure of the predicate—for example you cannot change a LIKE predicate to a compound predicate, nor can you change the operator (in this example, LIKE [c]). Moreover, in Mac OS X version 10.4, this only works with the XML and Binary stores as the SQLite store will not generate the appropriate SQL.
The last sentence states "this only works in XML and Binary stores". Is this saying that Fetched Properties only work in XML and Binary stores or some other part of the documentation?
Can you use fetched properties with a SQLite store?
Long story short: yes, you can use fetched properties with an SQLite store.
This paragraph refers to "substitution", which is described in the two preceding paragraphs. It basically says that Core Data allows substitutions for predicate expressions, like changing Cambridge to Durham, but disallows changing predicates types. So once you've setup a predicate
A like B
A and B can change, but like can't.
The bit about OS X 10.4 means that expression substitution is available for XML and Binary stores, but not SQL stores. Later versions of the OS support substitution for SQL stores as well.
I have an application for storing many strings in a TStringList. The strings will be largely similar to one another and it occurs to me that one could compress them on the fly - i.e. store a given string in terms of a mixture of unique text fragments plus references to previously stored fragments. StringLists such as lists of fully-qualified path and filenames should be able to be compressed greatly.
Does anyone know of a TStringlist descendant that implement this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that a TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire stringlist before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added, gives a much better performance than having to look up a 'best fit entry' (needed for similarity-compression) over the entire set.
Another thing you might want to read up about, are 'ropes' - a conceptually different type than strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word) you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope', representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression - just start out each string with an index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not recurse this too far back, or access-speed will suffer greatly. In one simple implementation, I did a mod 16 on the index, to establish the entry at which prefix-reduction started, which gave me on average about 40% memory savings (this number is completely data-dependant of course).
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including delete operation), in this case I don't know about any out of the box solution, but I'd suggest one of the two approaches:
You split your string into words and
keep separated growning dictionary
to reference the words and save list of indexes internally
You implement something related to
zlib stream available in Delphi, but operating by the block that
for example can contains 10-100
strings. In this case you still have
to recompress/compress the complete
block, but the "price" you pay is lower.
I dont think you really want to compress TStrings items in memory, because it terribly ineffecient. I suggest you to look at TStream implementation in Zlib unit. Just wrap regular stream into TDecompressionStream on load and TCompressionStream on save (you can even emit gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile