Creating a Haskell REPL within a Haskell application - parsing

I'm trying to embed a Haskell REPL within one of my Haskell applications. The idea would be that only a subset of the Haskell libraries would be loaded by default, plus my own set of functions, and the user would use those in order to interact with the environment.
To solve this problem, I know one way would be to create a (mini-)Haskell parser + evaluator and map my mini-Haskell parser's functions to actual Haskell functions, but I'm sure there is a better way to do this.
Is there a nice and clean way to build a REPL for Haskell using Haskell?

A few things that already exist:
GHCi, of course, both in the sense of being able to look at how it's implemented or being able to use it directly (i.e., have your REPL just talk to GHCi via stdin/stdout).
The full GHC API, which lets you hook into GHC and let it do all the heavy lifting for you--loading files, chasing dependencies, parsing, type checking, etc.
hint, which is a wrapper around a subset of the GHC API, with a focus on interactive interpretation rather than compilation--which seems to fit what you want to do.
mueval, an evaluator with limits on loaded modules, resource use, etc, basically a "safe" interactive mode. It's what lambdabot uses, if you've ever been in the #haskell IRC channel.
All of the above are assuming that you don't want to deal with writing a Haskell interpreter yourself, which is probably the case.

Related

What are common properties in an Abstract Syntax Tree (AST)?

I'm new to compiler design and have been watching a series of youtube videos by Ravindrababu Ravula.
I am creating my own language for fun and I'm parsing it to an Abstract Syntax Tree (AST). My understanding is that these trees can be portable given they follow the same structure as other languages.
How can I create an AST that will be portable?
Side notes:
My parser is currently written in javascript but I might move it to C#.
I've been looking at SpiderMonkey's specs for guidance. Is that a good approach?
Portability (however defined) is not likely to be your primary goal in building an AST. Few (if any) compiler frameworks provide a clear interface which allows the use of an external AST, and particular AST structures tend to be badly-documented and subject to change without notice. (Even if they are well-documented, the complexity of a typical AST implementation is challenging.)
An AST is very tied to the syntactic details of a language, as well as to the particular parsing strategy being used. While it is useful to be able to repurpose ASTs for multiple tasks -- compiling, linting, pretty-printing, interactive editing, static analysis, etc. -- the conflicting demands of these different use cases tends to increase complexity. Particularly at the beginning stages of language development, you'll want to give yourself a lot of scope for rapid prototyping.
The most tempting reason for portable ASTs would be to use some other language as a target, thereby saving the cost of writing code-generation, etc. However, in practice it is usually easier to generate the textual representation of the other language from your own AST than to force your parser to use a foreign AST. Even better is to target a well-documented virtual machine (LLVM, .Net IL, JVM, etc.), which is often not much more work than generating, say, C code.
You might want to take a look at the LLVM Kaleidoscope tutorial (the second section covers ASTs, although implemented in C++). Also, you might find this question on a sister site interesting reading. And finally, if you are going to do your implementation in Javascript, you should at least take a look at the jison parser generator, which takes a lot of the grunt-work out of maintaining a parser and scanner (and thus allows for easier experimentation.)

Getting Parsed and Type-annotated Haskell Code

I am trying to parse Haskell source code and generate a decision tree to analyze different paths Haskell programs can take.
haskell-src-exts gives a reasonable representation, but does not have any type information associated with it.
Does GHC or some other tool provide that functionality?
There's no tool except GHC that is particularly adept at typechecking Haskell source at the moment. A haskell-type-exts was in development to match src-exts, but it was never completed.
So you can use a reasonable wrapper to the GHC API, such as hint, and invoke it on the subexpressions you want to check using its type inference api.
This is a rather painful approach, but I can't think of much better. If you're only interested in working on haskell-like code as an exercise, you could instead import the PureScript compiler as a library, and then you'll be able to get a fully type annotated syntax tree in a more reasonable way.
Alternately, you can try to navigate the thicket of the GHC api itself to get fully typechecked source...
If you choose to go that route, this answer may get you started.

Parser for OCaml

Can anyone recommend me an open-source full OCaml parser?
Essentially, I would like to implement my own type-checker for OCaml. Ideally, the parser is written in OCaml. I would just use it to get the AST of the input program. (it is probably too much to ask for the initial typing environment pre-filled with standard library function signatures)
Use compiler-lib that is distributed with OCaml under QPL license. It has everything needed to create your own compiler (and even has some documentation). compiler-lib is essentially a compiler shipped as library.
Otherwise, you can use camlp4 to get the parsetree, but then you need to reimplement everything else from scratch. But at this case you're not restricted with QPL.
it is probably too much to ask for the initial typing environment pre-filled with standard library function signatures
It is not! See the files typing/predef.ml(i).
As for the stdlib, just use the same one as the compiler, except for pervasive which uses values from Predef, the rest is normal OCaml code without any special cases (except bootstraping, obviously).

Programmatic access to fslex and fsyacc

The fslex and fsyacc tools currently require 2-stage compilation, generating files that are then compiled by fsc. It seems to me that these tools would be much easier to use if the source files were embedded resources, fed to fslex and fsyacc programmatically and the generated code compiled on-the-fly using the CodeDom.
Is this feasible and, if so, what would be required to implement this?
Jon, this is a great question; in fact, one of the design goals I have for fsharp-tools (new lexer- and parser-generator implementations for F#) is for them to be embeddable, specifically to enable scenarios like this.
As of now, I haven't implemented (yet) the functionality which would let you do this easily in fsharplex, but don't let that deter you; I've written fsharplex (and the other tools in fsharp-tools) in a more-or-less purely-functional style, so there shouldn't be any issues with global state or anything like that. It should be relatively straightforward to hack up the compiler code so you can build a regex AST using some combinators, run the compiler to get a compiled DFA, then emit IL for your state machine into a dynamic assembly (which you could then "bake" and execute).
fsharpyacc currently uses an approach where I've put the bulk of the compilation logic into a purely-functional library, Graham; the idea there is that the grammar analysis/manipulation and parser DFA compilation algorithms should be generic, reusable, and easy to test, so anyone else wanting to build language tools with F# will have a common framework on which to build them. Likewise, contributions/improvements to Graham can easily flow back to fsharpyacc. Eventually, I will modify fsharplex to use this same approach, which will allow you to embed the regex compiler in your own code simply by referencing the NuGet package (you'd just need to write the code to generate IL from the DFA).
fsharplex and fsharpyacc use MEF to allow various backends to be plugged in; for now, they're only targetting fslex and fsyacc for compatibility reasons, but I'd like to implement code-based backends (as opposed to the current table-based backends) to get better performance in the future.
Update -- I just re-read your question and noticed you want to embed the *.fsl and *.fsy files themselves and invoke the respective compilers at run-time. You could accomplish this by compiling the tools and referencing the assemblies from your own projects. IIRC, I exposed an entry point in both compilers so they could be called from outside code; the main entry points (e.g., what gets executed when you invoke the tools from a console) simply parse the command-line arguments then pass them into this "external" entry point.
There is one problem with directly embedding the *.fsl and *.fsy files though; if you embed them, then run them through fsharplex and fsharpyacc at run-time, your user-defined actions (e.g., the code executed when a lexer or parser rule is matched) will still be specified as F# source code -- you'd need to decide how you want to compile them into executable code.
It should be feasible to provide a parser combinator-like interface with a backend that uses expression trees (the LISP "eval" of F#) or something similar, for full integration with the language. Or else a TypeProvider. There are many options. If table generation is an expensive computation, it could be cached by providing a Cache, for example a disk cache.
I think nothing except lack of time, dedication and expertise, prevents us from having tools with (non-monadic) parser combinator-like interface, yet efficient compiled implementation.
Sometimes I get back to this pet project of mine, playing with an algebraic approach to optimizing regular expressions (and lexers) specified in source using combinators and then compiled to a state machine. It still lacks a few key pieces for efficiency, but there it is:
https://github.com/toyvo/ocaml-regex-algebraic

Game Engine Scripting Languages

I am trying to build out a useful 3d game engine out of the Ogre3d rendering engine for mocking up some of the ideas i have come up with and have come to a bit of a crossroads. There are a number of scripting languages that are available and i was wondering if there were one or two that were vetted and had a proper following.
LUA and Squirrel seem to be the more vetted, but im open to any and all.
Optimally it would be best if there were a compiled form for the language for distribution and ease of loading.
One interesting option is stackless-python. This was used in the Eve-Online game.
The syntax is a matter of taste, Lua is like Javascript but with curly braces replaced with Pascal-like keywords. It has the nice syntactic feature that semicolons are never required but whitespace is still not significant, so you can even remove all line breaks and have it still work. As someone who started with C I'd say Python is the one with esoteric syntax compared to all the other languages.
LuaJIT is also around 10 times as fast as Python and the Lua interpreter is much much smaller (150kb or around 15k lines of C which you can actually read through and understand). You can let the user script your game without having to embed a massive language. On the other hand if you rip the parser part out of Lua it becomes even smaller.
The Python/C API manual is longer than the whole Lua manual (including the Lua/C API).
Another reason for Lua is the built-in support for coroutines (co-operative multitasking within the one OS thread). It allows one to have like 1000's of seemingly individual scripts running very fast alongside each other. Like one script per monster/weapon or so.
( Why do people write Lua in upper case so much on SO? It's "Lua" (see here). )
One more vote for Lua. Small, fast, easy to integrate, what's important for modern consoles - you can easily control its memory operations.
I'd go with Lua since writing bindings is extremely easy, the license is very friendly (MIT) and existing libraries also tend to be under said license. Scheme is also nice and easy to bind which is why it was chosen for the Gimp image editor for example. But Lua is simply great. World of Warcraft uses it, as a very high profile example. LuaJIT gives you native-compiled performance. It's less than an order of magnitude from pure C.
I wouldn't recommend LUA, it has a peculiar syntax so takes some time to get used to. Depending on who will be doing the scripting, this may not be a problem, but I would try to use something fairly accessible.
I would probably choose python. It normally compiles to bytecode, so you would need to embed the interpreter. However, if you must you can use PyPy to for example translate the code to C, and then compile it.
Embedding the interpreter is no issue. I am more interested in features and performance at this point in time. LUA and Squirrel are both interpreted, which is nice because one of the games i am building out is to include modifiable code, which has an editor in game.
I would love to hear about python, as i have seen its use within the battlefield series i believe.
python is also nice because it has actual OGRE bindings, just in case you need to modify something lower-level on the fly. I don't know of any equivalent bindings for Lua.
Since it's a C++ library, I would suggest either JavaScript or Squirrel, the latter being my personal favorite of the two for being even closer to C++, in particular to how it handles tables/structs and classes. It would be the easiest to get used to for a C++ coder because of all the similarities.
However, if you go with JavaScript and find an HTML5 version of Ogre3D, you should be able to port your game code directly into the web version with minimal (if any) changes necessary.
Both of these are a good choice, and both have their pros and cons, but both would definitely be the easiest to learn since you're likely already working in C++. If you're working with Java, the same may hold true, and if it's Game Maker, you wouldn't need either one unless you're trying to make an executable that people wouldn't need Game Maker itself to run, in which case, good luck finding an extension to run either of these.

Resources