Removing FParsec-Pipes from an F# project - F#

So I found a somewhat abandoned open source project that uses FParsec to parse GraphQL syntax. I need it on .NET Core, so I tried to port it. The problem is that it uses the FParsec-Pipes library, which has not been ported to .NET Core. So I want to fork the repo and remove the FParsec-Pipes dependency, so I can port the rest to .NET Core, even though I don't know F# and have no experience with FParsec.
This is the F# parser:
https://github.com/Lauchi/graphql-net/blob/master/GraphQL.Parser/Parsing/Parser.fs
I think I managed to translate some of it back to plain FParsec; what I have so far is:
%[...] is choice [...]
%% '#' is pchar '#'
%% 'hello' is pstring 'hello'
%% '[' -..- is something like pchar '[' >>.
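To make that concrete, here is my rough guess at how the plain FParsec versions would look (just a sketch using standard FParsec combinators, not code taken from the linked parser):

open FParsec

// my guesses at plain-FParsec equivalents (probably not exactly what the pipes library generates)
let comment = pchar '#' >>. skipRestOfLine true              // '#' up to end of line
let hello = pstring "hello"                                  // a literal keyword
let bracketed p = pchar '[' >>. spaces >>. many (p .>> spaces) .>> pchar ']'
let value = choice [ hello |>> box; bracketed hello |>> box ]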
But by now you might have figured that I actually have no clue what I am doing, because the syntax is very confusing to me. I understand what parsers and combinators are in theory, and I have some understanding of functional programming and how to pass functions and parameters around, but I just cannot read this syntax and feel very lost.
Can anybody with more experience give me some tips on how I can change this library back to plain FParsec code?

Related

Looking for unused symbols in Lua

I'm trying to embed Lua in my host program but my host program doesn't allow users to type '\', '{' and '}' symbols.
I need to use '{' and '}' for table construction so I'm looking for alternative symbols that are unused in Lua so I can replace these symbols internally before sending the code to an interpreter.
I'm a beginner in Lua and would like to know if there is any symbol which is never used in the Lua programming language (except when it appears inside a string).
I personally am guessing the grave accent symbol (`) is not used in Lua.
I would appreciate if anyone can confirm this.
Thanks!
Specifically, the grave accent doesn't seem to be used, and you can check this yourself by looking at the "Complete Syntax of Lua" section in the manual.
As far as I know it is not currently used in Lua, so you are fine. It might also be useful to know that Lua, like most programming languages, has some special (magic) characters that convey a special meaning when used in a specific context. Take a look at the official documentation.
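The substitution itself is then trivial once you have picked the stand-in characters. As a sketch (F# is used here purely for illustration, and the stand-ins '`' and '@' are arbitrary picks you should check against the Lua grammar yourself):

// Map user-typed stand-ins back to the Lua symbols before handing the
// code to the interpreter. Note that this naive version also rewrites
// the stand-ins when they occur inside string literals.
let toLua (userCode: string) =
    userCode.Replace('`', '{').Replace('@', '}')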

Use FParsec on already tokenized UInt16 stream

I need to parse an already tokenized stream of type UInt16 seq.
How can I do this with FParsec?
All the top-level functions I can find in the reference work on CharStreams.
At the moment I convert the UInt16s to chars which seems silly.
Unfortunately, it is not possible to use FParsec on anything other than a CharStream.
I solved the problem by writing a simple parser combinator myself, using this article.
Surprisingly this was only one day's worth of work.
I learned a lot about parser combinators in the process.
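For anyone in the same situation, the core of such a hand-rolled combinator library is quite small. A minimal sketch over an already tokenized uint16 list (not the article's code, just the general shape) might look like this:

// A tiny parser-combinator sketch over an already tokenized uint16 list
// (this is not FParsec, just the general idea).
type Parser<'a> = uint16 list -> ('a * uint16 list) option

// consume one token satisfying a predicate
let satisfy pred : Parser<uint16> = function
    | t :: rest when pred t -> Some(t, rest)
    | _ -> None

// consume one specific token
let token t : Parser<uint16> = satisfy ((=) t)

// sequence two parsers and pair their results
let (.>>.) (p: Parser<'a>) (q: Parser<'b>) : Parser<'a * 'b> =
    fun input ->
        match p input with
        | Some(a, rest) ->
            match q rest with
            | Some(b, rest') -> Some((a, b), rest')
            | None -> None
        | None -> None

// try the first parser, fall back to the second on failure
let (<|>) (p: Parser<'a>) (q: Parser<'a>) : Parser<'a> =
    fun input -> match p input with Some _ as r -> r | None -> q input

// example: the token 0x2Aus followed by any token
let example = token 0x2Aus .>>. satisfy (fun _ -> true)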

Parsing an OCaml file with OCaml

I want to analyze OCaml files (.ml) using OCaml. I want to break the files into abstract syntax trees for analysis. I have attempted to use camlp4 but have had no luck. Has anyone else successfully done this before? Is this the best way to parse an OCaml file?
(I assume you know basic parts of OCaml already: how to write OCaml code, how to link modules and libraries, how to write build scripts and so on. If you do not, learn them first.)
The best way is to use the genuine OCaml code parser used in OCaml compiler itself, since it is 100% compatible by definition.
Camlp4 also implements an OCaml parser, but it is slightly incompatible with the genuine parser, and its parse tree is somewhat specialized for writing syntax extensions: not very good for other kinds of analysis.
You may want to parse .ml files that use P4 syntax extensions. Even in this case, you should stick to the genuine parser: you can desugar the source code with P4, then send the result to your analyzer, which uses the genuine parser.
To use OCaml compiler's parser, the easiest approach is to use compiler-libs.common OCamlFind package. It contains the parser and type checker of OCaml compiler.
Start by modifying driver/compile.ml in the OCaml compiler source; it implements the major compilation phases: calling the preprocessor, parsing, typing, then code generation. To parse .ml files you should modify (or simplify) Compile.implementation; for .mli files, Compile.interface.
Good luck.
Couldn't you use the -dparsetree option of the OCaml compiler?
hello.ml:
let _ = print_endline "Hello AST"
Now compile it:
$ ocamlc -dparsetree hello.ml
Which results in:
[
  structure_item (hello.ml[1,0+0]..[1,0+33])
    Pstr_eval
    expression (hello.ml[1,0+8]..[1,0+33])
      Pexp_apply
      expression (hello.ml[1,0+8]..[1,0+21])
        Pexp_ident "print_endline" (hello.ml[1,0+8]..[1,0+21])
      [
        <label> ""
        expression (hello.ml[1,0+22]..[1,0+33])
          Pexp_constant Const_string("Hello AST",None)
      ]
]
See also this blog post on -ppx extensions which has some info on extension point syntax extensions (the new way of writing syntax extensions in OCaml 4.02). There is info there on various AST manipulation modules.

F# fslex fsyacc mature for production code?

After reading a two-year-old webpage really ripping into fslex/fsyacc (buggy, slow, stupid, etc. compared to their OCaml counterparts), I wonder what would be one's best bet for lexing/parsing needs.
I've used ANTLR before with C# bindings, but I am currently in the process of learning F# and was excited when I saw it came with a parser generator. F# is now officially released, and it seems like something Microsoft is really aiming to support and develop. Would you say fslex and fsyacc are worth it for production code?
Fslex and fsyacc are used by the F# compiler, so they kind of work. I have used them a few years ago, it was good enough for my needs.
However, my experience is that lex/yacc is much less mature in F# than in OCaml. Many people in the OCaml community have used them for years, including many students (it seems like writing a small interpreter/compiler with them is a common exercise). I don't think many F# developers have used them, and I don't think the F# team has done a lot of work on these tools recently (for instance, VS integration has not been a priority). If you're not very exigent, Fslex and fsyacc could be enough for you.
A solution could be to adapt Menhir (a camlyacc replacement with several nice features) to use it with F#. I have no idea how much work it would be.
Personally, I now use FParsec every time I need to write a parser. It's quite different to use, but it's also much more flexible and it generates good parse error messages. I've been very happy with it and its author has always been very helpful when I had questions.
Fslex and fsyacc are certainly ready for production use. After all, they are used in Microsoft Visual Studio 2010, because the F# lexer and parser are written using them (The F# compiler source code is also a good example that demonstrates how to use them efficiently).
I'm not sure how fslex/fsyacc compare to their OCaml equivalents or to ANTLR. However, Frederik Holmstrom has an article that compares ANTLR with a hand-written F# parser used in IronJS. Unfortunately, he doesn't have an fslex/fsyacc version, so there is no direct comparison.
To answer some specific concerns - you can get MSBuild tasks for running fslex/fsyacc as part of the build, so it integrates quite well. You don't get syntax highlighting, but I don't think that's such a big deal. It may be slower than the OCaml version, but that affects compilation only when you change the parser - I did some modifications to the F# parser and didn't find the compilation time a problem.
The fslex and fsyacc tools were specifically written for the F# compiler and were not intended for wider use. That said, I have managed to get significant code bases ported from OCaml to F# thanks to these tools but it was laborious due to the complete lack of VS integration on the F# side (OCaml has excellent integration with syntax highlighting, jump to definition and error throwback). In particular, I moved as much of the F# code out of the lexer and parser as possible.
We have often needed to write parsers and have asked Microsoft to add official support for fslex and fsyacc but I do not believe this will happen.
My advice would be to use fslex and fsyacc only if you are facing translating a large legacy OCaml code base that uses ocamllex and ocamlyacc. Otherwise, write a parser from scratch.
I am personally not a fan of parser combinator libraries and prefer to write parsers using active patterns that look something like this s-expression parser:
// character sets used by the character-level patterns
let alpha = set ['A'..'Z'] + set ['a'..'z']
let numeric = set ['0'..'9']
let alphanumeric = alpha + numeric

// a "stream" is just a string plus a position into it
let (|Empty|Next|) (s: string, i) =
  if i < s.Length then Next(s.[i], (s, i+1)) else Empty

// match one character from the given alphabet
let (|Char|_|) alphabet = function
  | Empty -> None
  | s, i when Set.contains s.[i] alphabet -> Some(s, i+1)
  | _ -> None

// match zero or more characters from the given alphabet
let rec (|Chars|) alphabet = function
  | Char alphabet (Chars alphabet it)
  | it -> it

// the substring between two positions
let sub (s: string, i0) (_, i1) =
  s.Substring(i0, i1-i0)

let rec (|SExpr|_|) = function
  | Next ((' ' | '\n' | '\t'), SExpr(f, it)) -> Some(f, it)
  | Char alpha (Chars alphanumeric it1) as it0 -> Some(box(sub it0 it1), it1)
  | Next ('(', SExprs(fs, Next(')', it))) -> Some(fs, it)
  | _ -> None
and (|SExprs|) = function
  | SExpr(f, SExprs(fs, it)) -> box(f, fs), it
  | it -> null, it
This approach does not require any VS integration because it is just vanilla F# code. I find it easy to read and maintainable. Performance has been more than adequate in my production code.
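Using it is just pattern matching on a (string, position) pair; for example, a quick usage sketch:

// parse an s-expression starting at position 0 of a string
match ("(foo (bar baz))", 0) with
| SExpr(ast, _) -> printfn "parsed: %A" ast
| _ -> printfn "no parse"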

Parser generator for inline documentation

To have a general-purpose documentation system that can extract inline documentation of multiple languages, a parser for each language is needed. A parser generator (which actually doesn't have to be that complete or efficient) is thus needed.
http://antlr.org/ is a nice parser generator that already has a number of grammars for popular languages. Are there better alternatives, i.e. simpler ones, that support generating parsers for even more languages out of the box?
If you're only looking for "partial parsing", you could use ANTLR's option to partially "lex" a token stream and ignore the rest of the tokens. You can do that by setting filter=true in a lexer grammar. The lexer then tries to match any token you defined in your grammar; when it can't match one of them, it advances a single character (and ignores it) and then tries to match one of your tokens again at the next character:
lexer grammar Foo;
options {filter=true;}
StringLiteral
: ...
;
CharLiteral
: ...
;
SingleLineComment
: ...
;
MultiLineComment
: ...
;
When implemented properly, you can extract the MultiLineComments (/* ... */) from a Java file quite easily without single-line comments or string/char literals messing things up.
Obviously, your source files need to be valid to be able to properly tokenize a file, otherwise you get strange results!
My compiler uses Dypgen. This is a user-extensible GLR parser with lots of enrichments, so it can parse many languages. The bootstrap grammar is EBNF-like (it supports *, + and ? directly in your productions). It is powerful enough to dynamically load extensions, a fact my compiler leverages: the bulk of my programming language has its syntax dynamically loaded at compiler startup.
Dypgen is written in OCaml and generates OCaml code.
There is a C++ GLR parser called Elkhound which is powerful enough to parse most of C++.
However, for your actual requirements, you do not really need to do any serious parsing: a regular-expression matching engine is probably good enough. Google's RE2 may be suitable (it provides most PCRE functionality, is a lot faster, and has a C++ interface).
Although this is less accurate, it is good enough because you can demand that inline documentation adhere to some simple formats. Most existing inline docs already do so for just this reason.
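For instance, a short sketch in F# (the /** ... */ doc-comment convention and the function name are just illustrative assumptions) shows how little is needed:

open System.Text.RegularExpressions

// A minimal sketch, not a full parser: pull /** ... */ doc comments out of
// a source file with a single regex. It deliberately ignores corner cases
// such as comment markers that appear inside string literals.
let docComments (source: string) =
    Regex.Matches(source, @"/\*\*(.*?)\*/", RegexOptions.Singleline)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value.Trim())
    |> Seq.toList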
Where I work we used to use GOLD Parser. This is a lot simpler than ANTLR and supports multiple languages. We have since moved to ANTLR, however, as we needed to do more complex parsing, which we found ANTLR handled better than GOLD.
