From the Core Data Programming Guide:
iOS: The XML store is not available on iOS.
Why isn't this available? Is it because of the lack of certain XML classes or does it require too much processing power or RAM?
Apple would be the authoritative source for this, so we can only guess.
It’s probably because of two factors: XML stores are slower (as stated in the official documentation, mainly because of the need to parse XML and lack of efficient algorithm/data structures for common database operations) and potentially use more disk space than SQLite stores (since data must be enclosed in tags and XML stores use human-readable representation of data).
Edit: libxml2 is available on iOS so XML parsing functionality (or lack thereof) is certainly not the reason.
Related
I thought this would be covered already, but my search returned nothing of relevance.
I am aware that there is NSUserDefaults, Core Data, object archiving, raw SQLite, plists, and of course, storage by web servers. What is unclear and somewhat hazy to a beginner is when to employ each of these various tools.
The usages of web servers vs Core Data is obvious, but what about NSUserDefaults vs plists?
Core Data vs object archiving? A simple breakdown of use cases would really help me understand why there are so many options for storage in iOS.
I try to write a quick and simple list of common use cases, because as #rmaddy says this answer could fill a book chapter:
NSUserDefaults: stores simple user preferences, nothing too complex or secure. If your app has a setting page with a few switches, you could save the data here.
Keychain (see SSKeychain for a great wrapper): used to store sensitive data, like credentials.
PLists: used to store larger structured data (but not huge): it is a really flexible format and can be used in a great number of scenarios. Some examples are:
User generated content storage: a simple list of Geopoint that will be shown by a map or list.
Provide simple initial data to your app: in this case the plist will be included in the NSBundle, instead of being generated by user and filled by user data.
Separate the data needed for a
particular module of your application from other data. For example,
the data needed to build a step-by-step startup tutorial, where each step is similar to the others but just needs different data. Hard-coding this data would easily fill your code, so you could be a better developer and use plists to store the data and read from them instead.
You are writing a library or framework that could be configured in some
way by the developer that uses it.
Object archiving could be useful to serialize more complex objects, maybe full of binary data, that can't (or that you don't want to) be mapped on simpler structures like plists.
Core Data is powerful, can be backed by different persistent stores (SQLite is just one of them, but you can also choose XML files or you can even write your own format!), and gives relationships between elements. It is complex and provides many features useful for the development, like KVO and contexts. You should use it for large data sets of many correlated records, that could be user generated or provided by a server.
Raw SQLite is useful when you need really, really fast access to a relational
data source (Core Data introduces some overhead), or if you need to support the same SQLite format across multiple platforms (you should never mess with CoreData inner SQLite: it uses its own format, so you can't just "import" an existing SQLite in CoreData). For example, for a project I worked for, a webservice provided me some large SQLite instead of jsons or xmls: some of this SQLite were imported to CoreData (operation that could take a while, depending on the source size), because I needed all the features of it, while other SQLites were read directly for a really fast access.
Webserver storage well it should be obvious: if you need to store data to a server it is because the device shouldn't be the only owner of that data. But if you just need to synchronize the same App across different iOS devices (or even with a Mac-ported version of the App) you could also look at iCloud storage, obviously.
I have a number of records (=< 100) that contain sizeable chunks of text that require marking up (semantically: lists, headings, tables, links, quotations, etc...) before storing in a re-usable file format.
When stored, it is likely to remain more or less unchanged for as many years into the future as possible.
It contains some non-ascii, so UTF-8 is required. I started using HTML, then considered Markdown... but would like to know what people think is the most future-proof markup format for long-term storage? The content is initially for a (mostly static) website, but may be used as content for other outputs.
Finally, opinions on the choice of storage for long-term use - database, separate documents...? Changes to records will be infrequent and edited by only 1-3 people, and read access should increase over time.
Update:
I've finally chosen the common features (e.g. for tables) between MultiMarkdown, PHP Markdown Extra and Kramdown as the text format (Markdown omits too many HTML tags), and am converting the resulting files to html with Kramdown. Now I'm trying out iOS Markdown editors that can handle an extended Markdown and sync via Dropbox to my desk/laptop.
Any storage not designed for long-term archiving will break.
It is not so much a question of database vs. filesystem, but how to ensure that no (silent) data corruption happens, and how to migrate data. I can give you no definitive answers, because it depends on a lot of factors (incl. costs), but here are a few resources:
Building Better Long-Term Archival Storage System, Talk by Miller/Storer at the Library of Congress
The Digital Dilemma, Book aimed at movie archiving, but highlights some of the issues in long term archiving.
Project Honeycomb, a project by SUN for open source long-term archiving, but discontinued.
I have no real answer for the format question, but I think HTML + UTF-8 should be readable even in decades, but document it.
We've got a rails app that processes large amounts of xml data imports. Right now we're storing these ~5MB xml docs in Postgres. This is not ideal given that we use each xml doc once or twice for parsing. We'd like to have an intelligent way of storing and archiving these docs, but not overly complicate the retrieval process for the sake of space. We've considered moving the docs to Mongo (which we're also using), but then aren't we just artificially boosting the memory requirements of our Mongo db servers?
What's the best way for us to deal with this?
I would just store a link to the file in the DB if you use it only for parsing once or twice and then load the file from the given link. Another aproach is to use a XML DB, e.g. eXist.
You could try eXist, an XML database. If you are just archiving them, though, why not just store them in a directory tree?
You may want to look into DB2's PureXML capabilities. To play with it, you can download the free DB2 Express-C version here. For the record, IBM is also the only database provider officially supporting their Ruby driver and Rails adapter, so you wouldn't be on your own.
What harm are they doing where they are? They will take up 'space' wherever you put them.
If are confident you will never need them again then there is a case for archival to less expensive storage (eg tape?) - otherwise whatever you do will 'overly complicate the retrieval process'
You could consider compressing them in-place if you are not already doing so
I have some experience with Pragmatic-Programmer-type code generation: specifying a data structure in a platform-neutral format and writing templates for a code generator that consume these data structure files and produce code that pulls raw bytes into language-specific data structures, does scaling on the numeric data, prints out the data, etc. The nice pragmatic(TM) ideas are that (a) I can change data structures by modifying my specification file and regenerating the source (which is DRY and all that) and (b) I can add additional functions that can be generated for all of my structures just by modifying my templates.
What I had used was a Perl script called Jeeves which worked, but it's general purpose, and any functions I wanted to write to manipulate my data I was writing from the ground up.
Are there any frameworks that are well-suited for creating parsers for structured binary data? What I've read of Antlr suggests that that's overkill. My current target langauges of interest are C#, C++, and Java, if it matters.
Thanks as always.
Edit: I'll put a bounty on this question. If there are any areas that I should be looking it (keywords to search on) or other ways of attacking this problem that you've developed yourself, I'd love to hear about them.
Also you may look to a relatively new project Kaitai Struct, which provides a language for that purpose and also has a good IDE:
Kaitai.io
You might find ASN.1 interesting, as it provide an absract way to describe the data you might be processing. If you use ASN.1 to describe the data abstractly, you need a way to map that abstract data to concrete binary streams, for which ECN (Encoding Control Notation) is likely the right choice.
The New Jersey Machine Toolkit is actually focused on binary data streams corresponding to instruction sets, but I think that's a superset of just binary streams. It has very nice facilities for defining fields in terms of bit strings, and automatically generating accessors and generators of such. This might be particularly useful
if your binary data structures contain pointers to other parts of the data stream.
I'm about to write a small utility to organze and tag my mp3s.
What is the best way to store small amounts of data. More importantly, are there databases which exist where I don't need to install a client/server environment, I just include the library and I'm good?
I could use XML, but I'm afraid that the file size would become large and hard to handle, not to mention keeping the memory footprint small.
Thanks
EDIT: I haven't decided on the language, I wanted to make my decision independent of platform. If I had to choose, most likely .NET, second Java, third C++.
My apologies, this is for a Windows App.
On Windows you can use the built-in esent database engine. There is an API you can use from C++
http://blogs.msdn.com/windowssdk/archive/2008/10/23/esent-extensible-storage-engine-api-in-the-windows-sdk.aspx
There is also a managed interop layer that you can use from C# code:
http://www.codeplex.com/ManagedEsent
Which language/platform are you talking about?
In the Java world I prefer using embedded databases such as HSQLDB, H2 or JavaDB (f.k.a. Derby).
They don't need installing and still provide the simple access you're used to from a "real" DBMS.
In the C/Python/Unixy world SQLite is a hot contender in that area.
Another option is the various forms of the Berkeley database (eg, db3, db4, SleepyCat.)
SQLITE if you want the pain of a relational DB without a server install or hassle.
I would use one of the many text-serialization formats. I personally think that YAML 1.1 is the most powerful (built-in support for referential object graphs) and easiest to read/modify by a human (parsing is a bear, use a library such as PyYAML or JYaml or some .NET libaray).
Otherwise XML or JSON are adequate file formats.
Whichever format you use, just compress the file if you're concerned about disk usage. If you're worried about in-memory usage, then I don't see how your serialization format matters...
Have a look at Prevayler - it's a serialization persistence framework (use xstream etc if you want to human-read your data), which is really fast, does not require annotations and "just works". Some basic info:
It does impose a more rigorous transaction pattern, as it does not give you automatic rollback:
Ensure transaction will succeed (with current state of system) - e.g. does it make sense now?
[transaction is added to queue], and stored (for power reset etc)
transaction is executed and applied to the object structure.
Writes of 1000's of transactions/sec
Reads of 100,000's transactions/sec
I haven't used it much, but it's sooo much nicer to use for small projects (persisting any serializable object is so nice)
Oh - as for every one saying "what platform you running on?", Prevayler (java) has/had ports to quite a few platforms, but I can't find a decent list :(. I remember that there were around 5-7, but can only remember .NET.
If you're planning on storing everything in memory while your program does work on it, then serializing to a file using a basic load() and save() function that you write would be fine, and less pain than a full on DB.
In Java that can be done using standard Serialization (or can serialize to and from XML to make it somewhat human readable and editable).
It shouldn't affect your memory footprint at all as it is merely saving and restoring your objects. You just won't get transactions and random access and queries and all that good stuff.
you could even use xml, json, an .ini file... a text file even
I would advise a SQL like database (such as SQLLite). Today your requirements might make a full SQL database seem silly. But you never know how much this "little project" will grow over the years. When it does grow to the point where you have to have a SQL engine, you will be glad you didn't just serialize some Java objects or store stuff in JSON format.