I have a lexer/parser (generated from an ANTLR grammar file) which, for performance reasons, I have compiled to C code that will be included in my ActionScript project using Adobe Alchemy.
The parser generates an abstract syntax tree (in C) from an input string passed in from ActionScript. I want to return that C AST to ActionScript for further processing. How can I convert the tree structure of the AST into a format I can return to ActionScript?
Thanks,
Unfortunately you can't just send a C data structure across. You've got three options, in increasing order of madness:
Serialize the data on the C side and reconstitute it on the AS3 side.
Pack up the data into Objects and return those.
Pass a pointer and size back to AS3 and pull out the data from Alchemy's RAM ByteArray.
I only include #3 for completeness; I think it would be crazy to try it for any kind of complex data structure. The code would be fragile, and following pointers would be clunky. Bleah.
For #2 you could use dynamic Objects (via AS3_Object) or concrete ones (via AS3_Get, AS3_New). This is also fairly complex code, it's not especially fast, and it can be hard to maintain.
For #1, the type of serialization is what matters. You could have your C code render the structures to a binary 'file', return that, and have your AS3 parse the file format via ByteArray. Or you could render the AST to XML and have AS3's XML class parse it. This has the benefit of being fairly fast (since XML is implemented natively), at least on the de-serialization end. If you have a fast XML renderer on the C side (or, ahem, sprintfs), it's not so bad.
So I help write an app for my university, and I'm wondering what the best way is to handle multiple XML feeds (such as sports scores, class information, etc.).
Should I have one XML parser that can handle all feeds, or should I write a parser for each feed? We're having trouble deciding the best way to implement it.
This is on iOS, and we use a mix of Swift 3 and Objective-C.
I think the right strategy is to write a base class that handles common data types like integers, booleans, strings, etc., and then write derived classes for each type of feed. This is the strategy I use in my own XML parser, which is based on the data structures and Apple's XML parser described here:
https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html
Personally, I prefer the XPath data model, where you can query the XML tree for a specific node using a path-like string.
Scenario:
Large (dynamic) XML files being uploaded by users.
We need to map the xml to our own database structure.
We need to use a SAX parser (or something like it) because of memory issues when parsing large XML files.
We currently use https://github.com/craigambrose/sax_stream for parsing XML's that all have the same structure.
For a new feature, we need to parse XML with unknown contents.
How would one use a SAX parser when the XML nodes are different each time?
I've tried using https://github.com/soulcutter/saxerator; in particular, the at_depth() function could come in handy to collect the elements at a certain depth, and after that we could get the elements inside a node using the for_tag() function. Based on this information we could perhaps create a mapping on the fly.
If a SAX parser isn't an option, are there any alternatives for parsing very large (dynamic) XML files?
aeson seems to take a somewhat simple-minded approach to parsing JSON: it parses a top-level JSON value (an object or array) into its own fixed representation and then offers facilities to help users convert that representation to their own types. This approach works pretty well when JSON objects and arrays are small. When they're very large, things start to fall apart, because user code can't do anything until the JSON value has been completely read and parsed. This seems particularly unfortunate since JSON appears to be designed for recursive descent parsers; it should be fairly simple to let user code step in and say how each piece should be parsed. Is there a deep reason aeson and the earlier json library work this way, or should I try to make a new library for more flexible JSON parsing?
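To make the two-phase approach concrete, here is a small sketch (the Person type and the sample input are my own illustration, not from the question): aeson first parses the entire document into its Value representation, and only afterwards does the FromJSON instance convert that Value into user data.

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson
    import qualified Data.ByteString.Lazy as BL

    -- Toy user type; aeson builds it only after the whole input has been
    -- parsed into a Value.
    data Person = Person { name :: String, age :: Int } deriving Show

    instance FromJSON Person where
      parseJSON = withObject "Person" $ \o ->
        Person <$> o .: "name" <*> o .: "age"

    main :: IO ()
    main = do
      let input = "{\"name\":\"Ada\",\"age\":36}" :: BL.ByteString
      -- Phase 1: parse the full document into aeson's Value.
      -- Phase 2: run parseJSON over that Value to get a Person.
      print (eitherDecode input :: Either String Person)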
json-stream is a stream-based parser. This comparison is a bit out of date (2015), but the authors took the benchmarks from aeson and compared the two libraries: aeson and json-stream performance comparison. There is one case where json-stream is significantly worse than aeson.
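For comparison, here is a minimal sketch of the streaming style, assuming json-stream's Data.JsonStream.Parser combinators (arrayOf, .:, string, parseLazyByteString) and a hypothetical big.json input file:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.JsonStream.Parser (Parser, arrayOf, (.:), string, parseLazyByteString)
    import qualified Data.ByteString.Lazy as BL
    import Data.Text (Text)

    -- Extract the "name" field of every element of a (possibly huge)
    -- top-level array; results are produced lazily as input is consumed.
    names :: Parser Text
    names = arrayOf ("name" .: string)

    main :: IO ()
    main = do
      input <- BL.readFile "big.json"   -- hypothetical input file
      mapM_ print (parseLazyByteString names input)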
If you just want a faster aeson (not streaming), haskell-sajson looks interesting. It wraps a performant C++ library in Haskell and returns Value from aeson.
I am working with a huge JSON object and I need to extract a single parameter from it.
Is there a way to query the JSON object for the parameter?
You need a streaming JSON parser for that, i.e. a parser that produces events you listen to as it goes through the JSON input, as opposed to document-based parsers such as NSJSONSerialization in iOS 5+.
One such parser is YAJL. Although it is a C library, you can use it from Objective-C as well: all you need to do is define a yajl_callbacks structure, set pointers to handlers for the types of items you wish to extract, call the parser, and let it do the rest.
Parsec is designed to parse textual information, but it occurs to me that Parsec could also be suitable for parsing binary file formats, including complex formats that involve conditional segments, out-of-order segments, etc.
Is there a way to do this, or is there a similar alternative package that does? If not, what is the best way in Haskell to parse binary file formats?
The key tools for parsing binary files are:
Data.Binary
cereal
attoparsec
Data.Binary is the most general solution, cereal can be great for limited data sizes, and attoparsec is perfectly fine for, e.g., packet parsing. All of these are aimed at very high performance, unlike Parsec. There are many examples on Hackage as well.
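As a small illustration of the attoparsec style for packet-like data, here is a sketch; the packet layout (one-byte tag, two-byte big-endian length, then payload) is made up for the example:

    import qualified Data.Attoparsec.ByteString as A
    import qualified Data.ByteString as BS
    import Data.Bits (shiftL, (.|.))
    import Data.Word (Word8, Word16)

    -- Hypothetical wire format: 1-byte tag, 2-byte big-endian length, payload.
    data Packet = Packet { tag :: !Word8, payload :: !BS.ByteString } deriving Show

    word16be :: A.Parser Word16
    word16be = do
      hi <- A.anyWord8
      lo <- A.anyWord8
      pure (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

    packet :: A.Parser Packet
    packet = do
      t   <- A.anyWord8
      len <- word16be
      Packet t <$> A.take (fromIntegral len)

    main :: IO ()
    main = print (A.parseOnly packet (BS.pack [1, 0, 3, 0x61, 0x62, 0x63]))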
You might be interested in attoparsec, which was designed for this purpose, I think.
I've used Data.Binary successfully.
It works fine, though you might want to use Parsec 3, attoparsec, or iteratees. Parsec's reliance on String as its intermediate representation may bloat your memory footprint quite a bit, whereas the others can be configured to use ByteStrings.
Iteratees are particularly attractive because it is easier to ensure they won't hold onto the beginning of your input, and they can be fed chunks of data incrementally as they become available. This prevents you from having to read the entire input into memory in advance and lets you avoid other nasty workarounds like lazy IO.
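For what it's worth, attoparsec's continuation-based interface gives you this kind of incremental feeding without an iteratee library; a minimal sketch (runChunked and the chunk list are just for illustration):

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Attoparsec.ByteString as A
    import qualified Data.ByteString as BS

    -- Feed a parser chunk by chunk (e.g. as data arrives from a socket)
    -- instead of reading the whole input up front; attoparsec returns a
    -- Partial continuation whenever it needs more bytes.
    runChunked :: A.Parser a -> [BS.ByteString] -> Either String a
    runChunked p chunks = finish (foldl A.feed (A.parse p BS.empty) chunks)
      where
        -- Feeding an empty chunk signals end of input.
        finish r = case A.feed r BS.empty of
          A.Done _ x   -> Right x
          A.Fail _ _ e -> Left e
          A.Partial _  -> Left "input ended too early"

    main :: IO ()
    main = print (runChunked (A.string "hello " *> A.takeByteString)
                             ["he", "llo wor", "ld"])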
The best approach depends on the format of the binary file.
Many binary formats are designed to make parsing easy (unlike text formats that are primarily to be read by humans). So any union data type will be preceded by a discriminator that tells you what type to expect, all fields are either fixed length or preceded by a length field, and so on. For this kind of data I would recommend Data.Binary; typically you create a matching Haskell data type for each type in the file, and then make each of those types an instance of Binary. Define the "get" method for reading; it returns a "Get" monad action which is basically a very simple parser. You will also need to define a "put" method.
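A minimal sketch of that pattern (the Field type and its tag values are invented for the example):

    import Data.Binary (Binary(..), decode, encode)
    import Data.Binary.Get (getWord8, getWord32be)
    import Data.Binary.Put (putWord8, putWord32be)
    import Data.Word (Word32)

    -- Hypothetical union type whose on-disk form starts with a one-byte
    -- discriminator tag.
    data Field = IntField Word32 | FlagField Bool
      deriving (Show, Eq)

    instance Binary Field where
      put (IntField n)  = putWord8 0 >> putWord32be n
      put (FlagField b) = putWord8 1 >> putWord8 (if b then 1 else 0)

      get = do
        tag <- getWord8
        case tag of
          0 -> IntField <$> getWord32be
          1 -> FlagField . (/= 0) <$> getWord8
          _ -> fail ("unknown tag: " ++ show tag)

    main :: IO ()
    main = print (decode (encode (IntField 42)) :: Field)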
On the other hand if your binary data doesn't fit into this kind of world then you will need attoparsec. I've never used that, so I can't comment further, but this blog post is very positive.