Tool to manage string literals used for parsing JSON - ios

We parse a lot of JSON in our app - without the back-end, it would be a pretty useless app. I know this goes for a bunch of other apps out there as well. In order to parse JSON, we need a list of keys to get to the data. I'd like to know what is considered 'best practice' or at least 'damn good practice' for managing these paths/string literals. Is there a tool out there that helps manage such keys and reduces duplication?
Hard-coding them is definitely not an option, although to be frank, if our back-end programmers change a key, a simple find/replace in Xcode (or whatever IDE you're using) would in principle suffice. It's ugly and unclean, though, and I just feel dirty putting string literals all over my code.
What I'm currently doing is putting them all into my PCH file, which means I end up with:
#define kBookmarksSearchResultsIDFieldName @"business.id"
#define kBookmarksSearchResultsNameFieldName @"business.name"
#define kBookmarksSearchResultsThumbnailURLFieldName @"business.display_image.images.small_mobile.source"
#define kBookmarksBusinessCategoryArrayFieldName @"business.categories"
This gets unwieldy real fast though since now I have around a thousand lines of these things in my PCH file.
The other option I'm considering is breaking these up into separate .h files - but then if two components of my app end up using the same key (for example, a business object is embedded into the JSON for a bookmark, or for a review of that business) then I have to import the .h that contains the JSON paths for the business object. So in this case I'm still importing all of the same data, it's just the file organization that's cleaner.
My objectives are:
Easy management of string literals used for parsing JSON
Reduce the amount of duplication needed
Easy changing/replacement of JSON paths if/when needed
Is option 3 that I listed above (separate .h files) my best option? What do you guys use, and am I missing an easy tool out there? (And no, JSONModel isn't an option because it requires your JSON keys to match your ivar/property names; our back-end supports a number of platforms, so we can't change the JSON keys just for iOS.)

Look into using a library such as RestKit which allows you to map a JSON document to a set of Objective-C classes. This means you can read the document in and get an array of objects you can manipulate by properties instead of having to keep track of key names. It's much easier, and Xcode will autocomplete your property names as you work with the classes.
It takes some setup, but you only have to do it once. :)
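To make the idea of mapping concrete, here is a minimal, language-neutral sketch in Python. It is not RestKit's API; the Business class, BUSINESS_MAPPING, value_for_key_path and map_object are hypothetical names chosen for illustration. The key paths are the ones from the question. The point is that each path is written exactly once, in a mapping table, and the rest of the code only touches properties.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical mapping table: property name -> JSON key path.
# Each path appears here once and nowhere else in the code base.
BUSINESS_MAPPING = {
    "identifier": "business.id",
    "name": "business.name",
    "thumbnail_url": "business.display_image.images.small_mobile.source",
    "categories": "business.categories",
}


@dataclass
class Business:
    identifier: int
    name: str
    thumbnail_url: str
    categories: list


def value_for_key_path(document, key_path):
    """Walk a dotted key path through nested dicts; None if any step is missing."""
    def step(node, key):
        return node.get(key) if isinstance(node, dict) else None
    return reduce(step, key_path.split("."), document)


def map_object(cls, mapping, document):
    """Build an instance of cls by looking up every mapped key path."""
    return cls(**{prop: value_for_key_path(document, path)
                  for prop, path in mapping.items()})
```

If the back-end renames a key, only the mapping table changes. A bookmark or review that embeds a business can reuse the same table instead of repeating the paths.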

Just to update this answer: there's a very cool library called Mantle. It's not perfect (there are some issues with typecasting), but it's still a very solid effort.

Related

Cache XMLProvider generated model(s)

I'm using XmlProvider from the FSharp.Data package like this:
type internal MyProvider = XmlProvider<Sample = "C:\test.xml">
The test.xml file contains a total of 151,838 lines, which make up 15 types.
Working in the same project as the type declaration MyProvider is a pain, as it seems the XmlProvider is triggered every time I hit CTRL+SPACE (Edit.CompleteWord) and therefore regenerates all the models, which can take up to 10 seconds.
Is there any known workaround, or a setting to cache the generated models from XmlProvider?
I'm afraid F# Data does not currently have any caching mechanism for the inferred schema. It sounds like something that should not be too hard to add - if anyone is interested in contributing, please open an issue on GitHub to start the discussion!
My recommendation for the time being would be to try to simplify the sample XML, so that it is shorter and contains just a few representative records of all the different kinds.
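One way to shrink the sample mechanically is to keep only the first few siblings of each tag under every parent. The following is a rough sketch in Python using the standard library; trim_sample, trim_sample_text and the keep parameter are hypothetical names, not part of F# Data. It assumes the first few records are representative: an optional element or attribute that only shows up in a later record would be lost, and the inferred types would change accordingly, so the result should be checked by hand.

```python
import xml.etree.ElementTree as ET


def trim_sample(element, keep=2):
    """Keep at most `keep` children per tag name under each element, recursively."""
    seen = {}
    for child in list(element):          # iterate over a copy while removing
        seen[child.tag] = seen.get(child.tag, 0) + 1
        if seen[child.tag] > keep:
            element.remove(child)
        else:
            trim_sample(child, keep)
    return element


def trim_sample_text(xml_text, keep=2):
    """Parse XML text, trim it, and return the smaller document as text."""
    root = ET.fromstring(xml_text)
    return ET.tostring(trim_sample(root, keep), encoding="unicode")
```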

Different coders for the same class in dataflow job

I'm trying to use different coders for the same class for two different scenarios:
Reading from JSON input files, using data = TextIO.Read.from(options.getInput()).withCoder(new Coder1())
Elsewhere in the job, I want the class to be persisted with SerializableCoder, using data.setCoder(SerializableCoder.of(MyClass.class))
It works locally, but fails when run in the cloud with
Caused by: java.io.StreamCorruptedException: invalid stream header: 7B227365.
Is it a supported scenario? The reason to do this in the first place is to avoid read/write of JSON format, and on the other hand make reading from input files more efficient (UTF-8 parsing is part of the JSON reader, so it can read from InputStream directly)
Clarifications:
Coder1 is my coder.
The other coder is a SerializableCoder.of(MyClass.class)
How does the system choose which coder to use? The two formats are binary incompatible, and it looks like due to some optimization, the second coder is used for data format which can only be read by the first coder.
Yes, using two different coders like that should work. (With the caveat that the coder in #2 will only be used if the system chooses to persist 'data' instead of optimizing it into the surrounding computations.)
Are you using your own Coders or ones provided by the Dataflow SDK? Quick caveat on TextIO -- because it uses newlines to encode element boundaries, you'll get into trouble if you use a coder that produces encoded values containing something that can be mistaken for a newline. You really should only use textual encodings within TextIO. We're hoping to make that clearer in the future.
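Two details can be checked without running the job. The sketch below, in Python, does not model TextIO or the Dataflow SDK; describe_header and split_records are hypothetical helpers. First, it decodes the four header bytes from the exception: a Java serialization stream starts with the magic bytes AC ED 00 05, while 7B 22 73 65 is plain ASCII text. Second, it shows why newline-delimited framing is unsafe for binary encodings: any encoded value that happens to contain the byte 0x0A gets cut in two.

```python
# STREAM_MAGIC (0xACED) followed by STREAM_VERSION (0x0005)
JAVA_SERIALIZATION_MAGIC = bytes.fromhex("ACED0005")


def describe_header(hex_header):
    """Say whether four header bytes look like a Java serialization stream."""
    raw = bytes.fromhex(hex_header)
    if raw == JAVA_SERIALIZATION_MAGIC:
        return "Java serialization stream"
    return "not Java serialization; bytes decode to " + repr(raw.decode("ascii", errors="replace"))


def split_records(blob):
    """Newline-delimited framing: element boundaries are wherever 0x0A occurs."""
    return blob.split(b"\n")


print(describe_header("7B227365"))
# prints: not Java serialization; bytes decode to '{"se'

binary_record = (10).to_bytes(4, "big")   # 00 00 00 0A, ends in a newline byte
print(len(split_records(binary_record)))
# prints: 2  (one encoded value was split into two pieces)
```

The header therefore looks like the start of a JSON object, which fits the suspicion in the question that the serialization-based coder was applied to data written as JSON text.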

Add new values to XML dynamically

I have an XML file in my app resources folder. I am trying to update that file with new dictionaries dynamically. In other words I am trying to edit an existing XML file to add new keys and values to it.
First of all, can we edit a static XML file and add a new dictionary with keys and values to it? What is the best way to do this?
In general, you can read an XML file into a document object (choose your language), use methods to modify it (add your new dictionary), and (re-)write it back out to either the original XML file, or a new one.
That's straightforward ... just roll up the ol' sleeves and code it up.
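As a rough illustration of the read, modify, write-back cycle, here is a sketch in Python with the standard library. The layout is an assumption, since the question does not show the file: a root element holding dict elements made of key/string pairs. add_dict is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def add_dict(xml_text, entries):
    """Parse the document, append one <dict> built from `entries`, and serialize it again.

    Assumed layout (not shown in the question): the root element contains
    <dict> children, each made of <key>/<string> pairs.
    """
    root = ET.fromstring(xml_text)
    new_dict = ET.SubElement(root, "dict")
    for key, value in entries.items():
        ET.SubElement(new_dict, "key").text = key
        ET.SubElement(new_dict, "string").text = value
    return ET.tostring(root, encoding="unicode")


original = "<array><dict><key>name</key><string>first</string></dict></array>"
updated = add_dict(original, {"name": "second"})
```

On iOS specifically, the app bundle is read-only on a device, so the modified document has to be written to a writable location (such as the app's Documents directory) rather than back into the resources folder.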
The real problem comes in with formatting in the XML file before and after said additions.
If you are going to 'unix diff' the XML file before and after, then order is important. Some standard XML processors do better with order than others.
If the order changes behind the scenes, and is gratuitously propagated into your output file, you lose standard diffing advantages, such as some gui differs, and some scm diffs (svn, cvs, etc.).
For example, browse to:
Order of XML attributes after DOM processing
They discuss that DOM loses order where SAX does not.
You can also write a custom XML 'diff'er (there may be such off-the-shelf ... for example check out 'http://diffxml.sourceforge.net/') that compares 2 XML documents tag-by-tag, attribute-by-attribute, etc.
Perhaps some standard XML-related tool such as XSLT will allow you to keep the formatting constant without changing tag or attribute order. You'd have to research that.
BTW, a related problem is the config (.ini) file problem ... many common processors flippantly announce that the write-order may not agree with the read-order.

How does one use namespaces in iOS objective-c code?

I'm writing an iOS app, "Best Korea". My organization name is "Srsly.co". I'm going to write re-usable "News" libraries that I'll use across my apps.
Each iOS app will have its own app-wide constants in a .h file, and the library code will have its constants as well in header files. I'll also have tests for each of these projects.
Is this the standard way of doing things?
In Ruby, Python, Java, etc., I'd set up namespaces along these lines:
co.srsly.bestkorea
co.srsly.bestkorea.test
co.srsly.newslib
co.srsly.newslib.test
As far as I can see, the Objective-C pattern is for each developer to choose two or three upper-case letters and prefix every class name with them.
So in my case, I'm thinking I'd choose BK as the app's classname prefix and NL for the news lib code? Am I thinking about this the right way?
EDIT: I'm considering not using namespacing at all in my application code as discussed here.
You're correct that Objective-C doesn't have built in support for namespaces, and the common solution is to use uppercase prefixes on each class. Note that Apple has stated that two letter prefixes are reserved for their use, so you should use three letter prefixes for your own classes. Otherwise, your suggested approach is the normal thing to do.
Objective-C has no namespaces of the kind you would expect coming from Java.
Instead, it uses class prefixes such as NS, UI, CG, and CF to avoid namespace collisions.
It would be better to use a three-letter prefix for your own classes.
You should read this: What is the best way to solve an Objective-C namespace collision?

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the objective-c wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and specifying a relatively low specificness indicator (I've used 100 for both -- the lower, the more likely to be split), you can get mecab to tend to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dic/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (it need not exist beforehand).
Both the system dictionary (-t option) and the user dictionary (-f option) are encoded in UTF-8. This may be wrong, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = /home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
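To see why a low value in the user dictionary tips the balance, here is a toy sketch in Python of lowest-total-cost segmentation. It is a simplification, not mecab's algorithm: mecab also adds connection costs between adjacent words, which are ignored here. The value 100 for the two parts comes from the entries above; the costs 3000 and 5000 for the default dictionary are made up for illustration, and segment is a hypothetical function name.

```python
def segment(text, costs):
    """Return the lowest-total-cost split of `text` into words found in `costs`.

    best[i] holds (total cost, words) for the prefix text[:i].
    Connection costs between words are ignored in this toy model.
    """
    best = {0: (0, [])}
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if start in best and word in costs:
                candidate = (best[start][0] + costs[word], best[start][1] + [word])
                if end not in best or candidate[0] < best[end][0]:
                    best[end] = candidate
    return best.get(len(text), (None, []))[1]


# Hypothetical costs standing in for the default dictionary.
default_costs = {"吉本興業": 3000, "吉本": 5000, "興業": 5000}
# User dictionary entries override the parts with the low value 100.
with_user_dic = {**default_costs, "吉本": 100, "興業": 100}

print(segment("吉本興業", default_costs))   # the compound wins: 3000 < 5000 + 5000
print(segment("吉本興業", with_user_dic))   # the parts win: 100 + 100 < 3000
```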
A more thorough fix would be to get the source of the default dictionary, manually remove all the compound nouns you want to get split, and then recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production over a longer period of time usually involves a certain amount of regular dictionary maintenance (adding/removing entries).
