Parse Apache Thrift file - parsing

I have a huge Apache Thrift file that I need to parse so I can store the information my application needs.
I could do this manually, reading it line by line, but that approach is tedious and error-prone.
Is there an API I can use to parse the file quickly and reliably? If not, any other suggestions?

Facebook's Swift tool has a Thrift IDL parser implemented in Java, if that fits into your project: https://github.com/facebook/swift/tree/master/swift-idl-parser. If your application is .NET, you might still be able to use this library by translating the parser JAR with IKVM.NET. There is also an ANTLR grammar in there somewhere if you want to develop your own parser.
Alternatively, the Thrift trunk now has a JSON generator that outputs the IDL as a JSON data structure, which should be easy enough to parse in any language. You'll probably need to compile Thrift from source to use that generator, but Thrift picks up new features so quickly that you may want to do that anyway if you are not already.

The Thrift CLI can also help: you can generate JSON from the Thrift file and then parse that JSON to get the structure of the IDL:
thrift --gen json example.thrift
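Once you have the JSON, reading it from Java is straightforward. Below is a rough sketch using Jackson; the output path gen-json/example.json and the keys "structs", "name" and "fields" are assumptions about the generated layout and may differ between Thrift versions, so inspect the actual file first.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

public class ThriftJsonReader {
    public static void main(String[] args) throws Exception {
        // "gen-json/example.json" is where `thrift --gen json example.thrift`
        // is assumed to write its output; adjust the path for your setup.
        JsonNode root = new ObjectMapper().readTree(new File("gen-json/example.json"));

        // Walk the structs and print each struct's name and its field names.
        // The keys "structs", "name" and "fields" are assumptions about the
        // generated layout; confirm them against your Thrift version.
        for (JsonNode struct : root.path("structs")) {
            System.out.println("struct " + struct.path("name").asText());
            for (JsonNode field : struct.path("fields")) {
                System.out.println("  field " + field.path("name").asText());
            }
        }
    }
}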

Related

How to read and write id3v1 and id3v2 tags in Elixir

I would like to scan music files and read/write metadata using Elixir (this whole project is about learning Elixir - so please don't tell me to use Python!). As I understand it, I have two choices: call a system utility or (as no libraries exist in Erlang or Elixir that I am aware of) write an Elixir library. For m4a files, I make a system call to MP4Box and it writes an xml file to disk. I then read in the file, parse it, and load the data into a database.
def parse(file_name) do
  System.cmd("MP4Box", ["-diso", file_name])
  Ainur.XmlParser.parse(xml_file_name(file_name))
  |> get_tags
end
This is very slow, especially for thousands of files, and I want it to run at startup every time to check for changed or new files.
Now I am trying to do the same for mp3s with ID3 tags. I tried libid3-tools on Ubuntu and it only found the ID3v1 tags; eyeD3 only found the ID3v2 tags. My mp3s have both, so I need to make sure they are kept the same (I suppose I could delete the ID3v1 tags, but I have been led to believe that ID3v1 tags are still needed on legacy equipment).
Are there any Erlang or Elixir libraries for music metadata? If not, are system calls to Ubuntu utilities my best choice (any recommendations on which ones)?
Or do I need to write a library to obtain reasonable performance? If so, is there an existing library in a functional language that I could try to port?
Or is it possible to call a library written in another language directly from Elixir (without the system call)?
You can always use Erlang NIFs (http://erlang.org/doc/tutorial/nif.html) to wrap an external library.
In this project we have a module written in Elixir which extracts ID3 tags from MP3 files:
https://github.com/anisiomarxjr/shoutcast_server/blob/master/lib/mp3_file.ex
To use:
id3 = Mp3File.extract_id3("./test/fixtures/nederland.mp3")
I've implemented ID3v2 tag reading (not writing) in Elixir. It's on GitHub and Hex.
Support is very basic; I implemented the bare minimum to support my use case. There are plenty of bugs, but all the building blocks are there to fork, improve, or contribute.
You could also try directly reading the binary of the file to find the tag in question.
Check the File.stream/3 docs to get started.

Xtext parser usage during runtime configuration

I want to use the runtime configuration for running an Xtext parser. In an example Xtext project I get both the standalone and the runtime configuration for using the parser.
Can somebody please outline the steps needed to use the parser at runtime in another Eclipse plug-in project? I have no experience with the plugin.xml file, and I understand I need to define some extension points there.
The Xtext sample project also contains a UI project which uses the generated parser at runtime, but I was not able to work out which parts of that configuration I actually need and which I don't.
Help is highly appreciated.
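For context, the standalone configuration mentioned above is typically driven from plain Java roughly as in the sketch below; inside a running Eclipse plug-in you would normally obtain the injector from the language's UI plug-in instead of creating it yourself. MyDslStandaloneSetup, the .mydsl extension and the file path are placeholders for whatever Xtext generated for your grammar.

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.xtext.resource.XtextResourceSet;
import com.google.inject.Injector;
// also import the generated ...StandaloneSetup class from your language's base package

public class MyDslParserDemo {
    public static void main(String[] args) {
        // MyDslStandaloneSetup is the setup class Xtext generates for the
        // (hypothetical) MyDsl grammar; substitute your own language name.
        Injector injector = new MyDslStandaloneSetup().createInjectorAndDoEMFRegistration();
        XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);

        // Load and parse a model file; the path and extension are placeholders.
        Resource resource = resourceSet.getResource(URI.createFileURI("model/example.mydsl"), true);

        // The root EObject is the parsed model; cast it to your generated root type.
        EObject root = resource.getContents().get(0);
        System.out.println("Parsed root element: " + root.eClass().getName());
    }
}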

example of using Woodstox to parse a simple XML

Can anyone give me an example of using the Woodstox parser to parse an XML file? Or point me to a place where I can look at some examples?
Thanks
The link below is a good resource for comparing the various parsing approaches, including JAXB. The author also mentions the jaxb2 Maven plugin for generating xjc sources as part of your build, in case you are inclined towards Maven.
xml-unmarshalling-benchmark
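Since the question asks for a concrete example: Woodstox is used through the standard StAX API, so a minimal cursor-based read looks roughly like the sketch below. The file name books.xml and the title element are made-up examples; with the Woodstox jar on the classpath, XMLInputFactory.newInstance() normally resolves to the Woodstox implementation.

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class WoodstoxExample {
    public static void main(String[] args) throws Exception {
        // With Woodstox on the classpath this normally resolves to
        // com.ctc.wstx.stax.WstxInputFactory; you can also instantiate that
        // class directly if you want to force the Woodstox implementation.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new FileInputStream("books.xml"));

        // Pull events from the stream and react to the ones we care about.
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "title".equals(reader.getLocalName())) {
                // getElementText() reads the text content and advances to END_ELEMENT.
                System.out.println("title: " + reader.getElementText());
            }
        }
        reader.close();
    }
}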

Any tools for clojure to parse java source code? [duplicate]

I'm trying to analyze Java source files with Clojure but I couldn't find a way to do that.
First, I thought about using the Eclipse AST plugin (by copying the necessary JARs to my Clojure project), but I gave up after seeing Eclipse AST's API (a visitor-based walker).
Then I tried creating a Java parser with ANTLR. I could only find one Java 1.6 grammar for ANTLR (http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g) and it doesn't compile with the latest ANTLR (here are the errors I'm getting).
Now I have no idea how I can do that. At worst I'll try to go with Eclipse AST.
Does anyone know a better way to parse Java files with Clojure?
Thanks.
Edit: To clarify my point:
I need to find some specific method calls in Java projects and inspect their parameters (we have multiple definitions of the method, with different types of parameters). Right now I have a simple solution written in Java (Eclipse AST), but I want to use Clojure in this project as much as possible.
... and it doesn't compile with latest ANTLR ...
I could not reproduce that.
Using ANTLR v3.2, I got some warnings, but no errors. Using both ANTLR v3.3 and v3.4 (latest version), I have no problems generating a parser.
You didn't mention how you're (trying) to generate a lexer/parser, but here's how it works for me:
java -cp antlr-3.4.jar org.antlr.Tool Java.g
EDIT 1
Here's my output when running the commands:
ls
wget http://www.antlr.org/download/antlr-3.4-complete.jar
wget http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
java -cp antlr-3.4-complete.jar org.antlr.Tool Java.g
ls
As you can see, the .java files of the lexer and parser are properly created.
EDIT 2
Instead of generating a parser yourself (from a grammar), you could use an existing parser like this one (Java 1.5 only AFAIK) and call it from your Clojure code.
It depends a bit on what you want to do - what are you hoping to get from the analysis?
If you want to actually compile Java or at least build an AST, then you probably need to go the ANTLR or Eclipse AST route. Java isn't that bad of a language to parse, but you still probably don't want to reinvent too many wheels, so you might as well build on the Eclipse and OpenJDK work (a short JDT sketch follows this answer).
If, however, you are just interested in parsing the basic syntax and analysing certain features, it might be easier to use a simpler general-purpose parser combinator library. Options to explore:
fnparse (Clojure, not sure how well maintained)
jparsec (Java, but can probably be used quite easily from Clojure)
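To make the Eclipse AST route a bit more concrete for the stated goal (finding specific method calls and inspecting their arguments), here is a rough sketch using the Eclipse JDT core library; it could be driven from Clojure via Java interop. The method name findMe and the inline source string are placeholders, and org.eclipse.jdt.core (with its dependencies) has to be on the classpath.

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MethodInvocation;

public class FindCalls {
    public static void main(String[] args) {
        // Placeholder source snippet; in practice read the .java file's contents.
        String source = "class A { void run() { findMe(1, \"x\"); } }";

        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(source.toCharArray());
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        // Visit every method invocation and report the ones with the target name.
        unit.accept(new ASTVisitor() {
            @Override
            public boolean visit(MethodInvocation node) {
                if ("findMe".equals(node.getName().getIdentifier())) {
                    System.out.println("call: " + node + " args: " + node.arguments());
                }
                return true; // keep visiting nested nodes
            }
        });
    }
}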

llvm-clang: incremental or online parser?

Is there any way to use the llvm-clang parser in an incremental/online manner?
Say I'm writing an editor and I want to be able to parse the C++ code I have in front of me.
I don't want to write my own hacked up parser.
I'd like to use something full featured, like llvm-clang.
Is there an easy way to hijack the llvm-clang parser? (And is it fast enough to run it continuously in the background)?
Thanks!
I don't think Clang can incrementally parse C++ files yet, but it is one of the project's goals: http://clang.llvm.org/features.html
I've written something similar for my final-year project. It wasn't a C++ editor but a Visual Studio plugin whose main task was improving C++ IntelliSense (like Visual Assist X).
While writing that project I also thought about a C++ incremental parser, but I didn't find any suitable solution. To solve the C++ IntelliSense problem I used the normal C++ parser from GCC. However, it was too slow to reparse the file after each code-completion request (Ctrl+Space); just try including boost::spirit. To make the project work properly, I parsed files in the background, and after each code-completion request I compared the current file with its previous version (via diff) to detect the changes made since the last parse. With those changes I updated the syntax tree, mostly by adding or removing variables.
Besides incremental parsing, there is another problem with projects like this: mostly you'll be parsing C++ code that is being edited, so it's invalid code. Given the complex C++ grammar, the parser sometimes won't be able to recover from syntax errors and will fail to detect some symbols in the code.
Another issue is the differences between C++ parsers/compilers. Say I'm working in Visual Studio and have used some VC++-specific construct in my code; the Clang parser won't be able to parse it correctly.
For writing something similar to IntelliSense, I would advise you to write your own parser using the LALR parsing algorithm, since you can save its state at each line and thus avoid reparsing the whole file when it is edited, which is very fast.
Note that C++ can't be fully expressed in BNF, but I think you could get pretty far with some adjustments. It is of course a lot more work than using Clang's frontend, but you could still use Clang for analysing header files in cooperation with your own parser.
