Parser for OCaml - parsing

Can anyone recommend me an open-source full OCaml parser?
Essentially, I would like to implement my own type-checker for OCaml. Ideally, the parser is written in OCaml. I would just use it to get the AST of the input program. (it is probably too much to ask for the initial typing environment pre-filled with standard library function signatures)

Use compiler-lib that is distributed with OCaml under QPL license. It has everything needed to create your own compiler (and even has some documentation). compiler-lib is essentially a compiler shipped as library.
Otherwise, you can use camlp4 to get the parsetree, but then you need to reimplement everything else from scratch. But at this case you're not restricted with QPL.

it is probably too much to ask for the initial typing environment pre-filled with standard library function signatures
It is not! See the files typing/predef.ml(i).
As for the stdlib, just use the same one as the compiler, except for pervasive which uses values from Predef, the rest is normal OCaml code without any special cases (except bootstraping, obviously).

Related

Programmatic access to fslex and fsyacc

The fslex and fsyacc tools currently require 2-stage compilation, generating files that are then compiled by fsc. It seems to me that these tools would be much easier to use if the source files were embedded resources, fed to fslex and fsyacc programmatically and the generated code compiled on-the-fly using the CodeDom.
Is this feasible and, if so, what would be required to implement this?
Jon, this is a great question; in fact, one of the design goals I have for fsharp-tools (new lexer- and parser-generator implementations for F#) is for them to be embeddable, specifically to enable scenarios like this.
As of now, I haven't implemented (yet) the functionality which would let you do this easily in fsharplex, but don't let that deter you; I've written fsharplex (and the other tools in fsharp-tools) in a more-or-less purely-functional style, so there shouldn't be any issues with global state or anything like that. It should be relatively straightforward to hack up the compiler code so you can build a regex AST using some combinators, run the compiler to get a compiled DFA, then emit IL for your state machine into a dynamic assembly (which you could then "bake" and execute).
fsharpyacc currently uses an approach where I've put the bulk of the compilation logic into a purely-functional library, Graham; the idea there is that the grammar analysis/manipulation and parser DFA compilation algorithms should be generic, reusable, and easy to test, so anyone else wanting to build language tools with F# will have a common framework on which to build them. Likewise, contributions/improvements to Graham can easily flow back to fsharpyacc. Eventually, I will modify fsharplex to use this same approach, which will allow you to embed the regex compiler in your own code simply by referencing the NuGet package (you'd just need to write the code to generate IL from the DFA).
fsharplex and fsharpyacc use MEF to allow various backends to be plugged in; for now, they're only targetting fslex and fsyacc for compatibility reasons, but I'd like to implement code-based backends (as opposed to the current table-based backends) to get better performance in the future.
Update -- I just re-read your question and noticed you want to embed the *.fsl and *.fsy files themselves and invoke the respective compilers at run-time. You could accomplish this by compiling the tools and referencing the assemblies from your own projects. IIRC, I exposed an entry point in both compilers so they could be called from outside code; the main entry points (e.g., what gets executed when you invoke the tools from a console) simply parse the command-line arguments then pass them into this "external" entry point.
There is one problem with directly embedding the *.fsl and *.fsy files though; if you embed them, then run them through fsharplex and fsharpyacc at run-time, your user-defined actions (e.g., the code executed when a lexer or parser rule is matched) will still be specified as F# source code -- you'd need to decide how you want to compile them into executable code.
It should be feasible to provide a parser combinator-like interface with a backend that uses expression trees (the LISP "eval" of F#) or something similar, for full integration with the language. Or else a TypeProvider. There are many options. If table generation is an expensive computation, it could be cached by providing a Cache, for example a disk cache.
I think nothing except lack of time, dedication and expertise, prevents us from having tools with (non-monadic) parser combinator-like interface, yet efficient compiled implementation.
Sometimes I get back to this pet project of mine, playing with an algebraic approach to optimizing regular expressions (and lexers) specified in source using combinators and then compiled to a state machine. It still lacks a few key pieces for efficiency, but there it is:
https://github.com/toyvo/ocaml-regex-algebraic

Cross-platform parser development - What are the options?

I'm currently working on a project that makes use of a custom language with a simple context-free grammar.
Due to the project's characteristics the same language will have to be used on several platforms, especially mobile ones. Currently, I'm using my small hand-written Java parser (for the Android platform). Soon, I'll have to write basically the same parser for JavaScript and later possibly also for C# (Windows Phone) and Objective C (iOS). There is an additional chance that I'll also have to write it for PHP.
My question is: What options are there to simplify the parser development process? Do I really have to write basically the same parser for each platform or is there a less work-intensive way?
From a development process point of view the best alternative would enable me to write a grammar definition which would then automatically be compiled into a parser.
However, basically the only cross-platform parser generator I've found so far it the GOLD Parser which supports two of my target platforms (Java and C#). It would really be awesome if you could point me to other alternatives.
In case you don't know about other cross-platform compiler-compilers: Do you have hints how to structure the code towards future language extensibility?
I commend https://en.wikipedia.org/wiki/Comparison_of_parser_generators to your attention: if we restrict the domain to Java and C/C++, it suggests APG, GOLD, SableCC, and SLK (amongst others) as being cross-language enough for your stated goals. (I'm also requiring that the action code be separated from the grammar rather than inline, since the latter would defeat the purpose.) If you want JavaScript as well, it looks like your choices are APG (GPL-licensed) and WaxEye (MIT-licensed).
If your language is reasonably simple then I would say to just go with whichever you think will be easiest to integrate into your build environment(s) and has a reasonable match with how you think. Unless parsing time is a huge fraction of your application's total workload, parsing speed should not be an issue -- although table size and memory usage might matter in a mobile context. If your grammar is "simple enough," (i.e. not Perl, for instance) I would expect any of those tools to work.
Have a look in Antlr, I am using it for transforming java code and it is really great. Moreover you can find different grammars here.
REx parser generator supports the required targets, except for Objective C and PHP (code generators for those might be possible). It has not yet been published as open source, though, and there is no decent documentation, just sample grammars. But there are projects that are using it successfully, e.g. xqlint. Here is a paper describing the experience from that project.

Bindings and introspection for OCaml library

I want to write an OCaml library which will be used by other programing languages like C or even python.
I not sure it's even feasible, and i guess i need to drop some type safety and add runtime checks to the interface for dynamically typed language.
Is it doable ? Is there tools to achieve this goal to auto-generate bindings ? I think stuffs like Corba do not fit well with ocaml ABI, but I may be wrong.
EDIT : by dropping the runtime requirement and using only languages having a llvm frontend, I could use llvm as a common ABI I guess, but it seems tricky.
OCaml has a FFI to interact with C code. The code for the binding has to be written in C, not in OCaml (which has no direct representation of C values, while C has representations of OCaml values). My advice would be:
On the C side, decide what would be the best interface to export that C programmers would like (or Python programmers writing Python bindings starting from your C interface)
Define a "low-level layer" on the OCaml side that gets your OCaml value as close as possible from the C representation
Write some C wrappers to convert from this low-level OCaml representation to your optimal C representation
The reason for step (2) is to have the step (3) as small as possible. Manipulating OCaml values from the C side is a bit painful, in particular you risk getting the interaction with the Garbage Collector wrong, which means segfaults -- plus you don't get any type safety. So the less work you have to do on the C side, the better.
There are some projects to do some of the wrapping work for you. CamlIDL for example, and I think Swig has some support for OCaml. I have never used those, though, so I can't comment.
If you know to which high-level language you wish to convert your interface to, there may be specialized bridge that don't need a C step. For example there are libraries to interact directly with Python representations (search for Pycaml, not sure how battle-tested their are) or with the Java runtime (the OCamlJava project). A C interface is still a safe bet that will allow other people to create bridges to their own languages.
It is feasible, but you need to understand involved topics, like how the GC works.
Have a look at this: http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual033.html#toc148
You need to be careful about types in the stub code, but otherwise you can keep type safety.

Creating a Haskell REPL within a Haskell application

I'm trying to embed a Haskell REPL within one of my Haskell applications. The idea would be that only a subset of the Haskell libraries would be loaded by default, plus my own set of functions, and the user would use those in order to interact with the environment.
To solve this problem, I know one way would be to create a (mini-)Haskell parser + evaluator and map my mini-Haskell parser's functions to actual Haskell functions, but I'm sure there is a better way to do this.
Is there a nice and clean way to build a REPL for Haskell using Haskell?
A few things that already exist:
GHCi, of course, both in the sense of being able to look at how it's implemented or being able to use it directly (i.e., have your REPL just talk to GHCi via stdin/stdout).
The full GHC API, which lets you hook into GHC and let it do all the heavy lifting for you--loading files, chasing dependencies, parsing, type checking, etc.
hint, which is a wrapper around a subset of the GHC API, with a focus on interactive interpretation rather than compilation--which seems to fit what you want to do.
mueval, an evaluator with limits on loaded modules, resource use, etc, basically a "safe" interactive mode. It's what lambdabot uses, if you've ever been in the #haskell IRC channel.
All of the above are assuming that you don't want to deal with writing a Haskell interpreter yourself, which is probably the case.

How do you make a language binding?

Although I do more or less understand what a language binding is, I am struggling to understand how they work.
Could anyone explain how do you make a Java binding for WinAPI, for example?
You'll find much better results if you search for Foreign Function Interface or FFI. The FFI is what allows you to call functions that were written in a different language, i.e., foreign ones. Different languages and runtimes have vastly different FFIs and you'll have to learn each one individually. Learning an FFI also forces you to know a little more about the internals of your language and its runtime than you are ordinarily used to. Some FFIs make you write code in the target language, like Haskell (where FFI code must be written in Haskell), and others make you write code in the source language, like Python (where FFI code must be written in C).
Certain languages don't use the term FFI (though it would be nice if they did). For Java, it's called Java Native Interface, or JNI.
Languages (usually) have defined syntax for calling "native" code. So if you have library that exports method foo(), making a biding would mean that you will create, in you example, Java class with method foo(). That way, you can call MyBinding.foo() from the rest of a code, it will make no difference whether it was pure Java method or compiled C code.
Again for Java, you probably want to look at JNI documentation. Other languages have similar mechanisms. There are tools like SIP that will take bunch of C(++) header files, and produce Python bindings for it. I guess other languages could have similar tools as well.

Resources