Idris parser combinator GADT - parsing

I am currently working on implementing a simple parser combinator library in Idris to learn the language and better understand type systems in general, but am having some trouble wrapping my head around how GADTs are declared and used. The parser data type that I am trying to formulate looks like (in Haskell): type Parser a = String -> [(a,String)], from this paper. Basically, the type is a function that takes a string and returns a list.
In Idris I have:
data Parser : Type -> Type where
  fail : a -> (String -> [])
  pass : a -> (String -> [(a,String)])
where the fail instance is a parser that always fails (i.e. a function that always returns the empty list) and the pass instance is a parser that consumed some symbol. When loading the above into the interpreter, I get an error that there is a type mismatch between List elem and the expected type Type. But when I check the returned type of the parser in the REPL with :t String -> List Type I get Type, which looks like it should work.
I would be very grateful if anyone could give a good explanation of why this data declaration does not work, or a better alternative to representing the parser data type.

In Haskell,
type Parser a = String -> [(a,String)]
doesn't create a new type; it is merely a type synonym. You can do something similar in Idris as
Parser : Type -> Type
Parser a = String -> List (a, String)
and then define your helper definitions as simple functions that return Parser as:
fail : Parser a
fail = const []
pass : a -> Parser a
pass x = \s => [(x, s)]

In defining your datatype Parser, the first line says that this type takes a type and returns a type; in this case it takes an a and returns a Parser a.
So, your constructors should return Parser a.
This is similar to List, which takes an a and returns a List a.
What your constructors currently return is not of type Parser, which is easily seen because the word Parser occurs nowhere on the right-hand side. In addition, [] and [(a,String)] are list values rather than types, which is why Idris reports a mismatch between List elem and Type.
Beyond that though, I'm not sure how you would best represent this.
However, there are some parser libraries already written in Idris; looking at these might help. For example, have a look at the Idris libraries list, where parsers are the first ones mentioned.
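To make the point about constructor result types concrete, here is a minimal sketch in Haskell (not Idris) using GADT syntax. The constructor names Fail and Pass and the interpreter runParser are illustrative choices, not taken from the question or from any library. Each constructor's type ends in Parser a, and a separate function turns the description into the String -> [(a, String)] function the question is after.

```haskell
{-# LANGUAGE GADTs #-}

-- Every constructor's result type is `Parser a`, as the answer recommends.
-- Fail and Pass are illustrative names, not part of any library.
data Parser a where
  Fail :: Parser a        -- a parser that always fails
  Pass :: a -> Parser a   -- a parser that succeeds without consuming input

-- Interpret a parser description as the function type from the question.
runParser :: Parser a -> String -> [(a, String)]
runParser Fail     _ = []
runParser (Pass x) s = [(x, s)]
```

With this encoding the data type only describes a parser; the list-returning function lives in runParser. Whether that separation is preferable to the plain type synonym depends on what you want to do with parsers besides running them.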

Related

F# quotation with spliced parameter of any type

I am trying to develop F# type provider.
It provides some DTOs (with the structure described in some external document) and a set of methods for processing them. The processing algorithm is based on reflection, and I want to have a single quotation representing it.
Generally, this algorithm must pass all method call arguments to the already written function serialize: obj -> MySerializationFormat, storing all results in a list, so that I get a value of type MySerializationFormat list.
The code sample below shows how I first tried to do that:
let serialize (value: obj) = ...
let processingCode: Expr list -> Expr =
    fun args ->
        let serializeArgExpr (arg: Expr) = <@ serialize %%arg @>
        let argsExprs = List.map serializeArgExpr args
        let serializedArgList =
            List.foldBack (fun head tail -> <@ (%head) :: (%tail) @>) argsExprs <@ [] @>
        // further processing
At that point I ran into an exception. In the function serializeArgExpr, the actual type of the value in arg: Expr may vary: it can be some primitive type (e.g. string, int, float) or some provided type. The problem is that the %% operator treats arg as an expression of type obj. The type check is performed on that line in the Microsoft.FSharp.Quotations.Patterns module, in the function fillHolesInRawExpr.
So, as the actual type of my term does not match the type expected for the "hole" in the quotation, it throws invalidArg.
I have tried several techniques to avoid these exceptions with casting operations in my quotation, but they don't work. Then I found the Expr.Coerce(source, target) function, which looks like it solves my problem. I have changed the code of serializeArgExpr to something like this:
let serializeArgExpr (arg: Expr) =
    let value' = Expr.Coerce(arg, typeof<obj>)
    <@ serialize %%value' @>
Then I ran into a new, strange exception:
The design-time type (point to a code line that uses my processingCode) utilized by a type provider was not found in the target reference assembly set
It seems to me that my problem comes down to casting the value in any input Expr to the obj type. Thank you for diving in and trying to help.

How do I extract useful information from the payload of a GADT / existential type?

I'm trying to use Menhir's incremental parsing API and introspection APIs in a generated parser. I want to, say, determine the semantic value associated with a particular LR(1) stack entry; i.e. a token that's been previously consumed by the parser.
Given an abstract parsing checkpoint, encapsulated in Menhir's type 'a env, I can extract a “stack element” from the LR automaton; it looks like this:
type element =
  | Element: 'a lr1state * 'a * position * position -> element
The type element describes one entry in the stack of the LR(1) automaton. In a stack element of the form Element (s, v, startp, endp), s is a (non-initial) state and v is a semantic value. The value v is associated with the incoming symbol A of the state s. In other words, the value v was pushed onto the stack just before the state s was entered. Thus, for some type 'a, the state s has type 'a lr1state and the value v has type 'a ...
In order to do anything useful with the value v, one must gain information about the type 'a, by inspection of the state s. So far, the type 'a lr1state is abstract, so there is no way of inspecting s. The inspection API (§9.3) offers further tools for this purpose.
Okay, cool! So I go and dive into the inspection API:
The type 'a terminal is a generalized algebraic data type (GADT). A value of type 'a terminal represents a terminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ terminal =
  | T_A : unit terminal
  | T_B : int terminal
The type 'a nonterminal is also a GADT. A value of type 'a nonterminal represents a nonterminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol ...
type _ nonterminal =
  | N_main : thing nonterminal
Piecing these together, I get something like the following (where "command" is one of my grammar's nonterminals, and thus N_command is a string nonterminal):
let current_command (env : 'a env) =
  let rec f i =
    match Interpreter.get i env with
    | None -> None
    | Some Interpreter.Element (lr1state, v, _startp, _endp) ->
      match Interpreter.incoming_symbol lr1state with
      | Interpreter.N Interpreter.N_command -> Some v
      | _ -> f (i + 1)
  in
  f 0
Unfortunately, this is puking up very confusing type-errors for me:
File "src/incremental.ml", line 110, characters 52-53:
Error: This expression has type string but an expression was expected of type
string
This instance of string is ambiguous:
it would escape the scope of its equation
This is a bit above my level! I'm pretty sure I understand why I can't do what I tried to do above; but I don't understand what my alternatives are. In fact, the Menhir manual specifically mentions this complexity:
This function can be used to gain access to the semantic value v in a stack element Element (s, v, _, _). Indeed, by case analysis on the symbol incoming_symbol s, one gains information about the type 'a, hence one obtains the ability to do something useful with the value v.
Okay, but that's what I thought I did, above: case-analysis by match'ing on incoming_symbol s, pulling out the case where v is of a single, specific type: string.
tl;dr: how do I extract the string payload from this GADT, and do something useful with it?
If your error sounds like
This instance of string is ambiguous:
it would escape the scope of its equation
it means that the type checker is not really sure if outside of the pattern matching branch the type of v should be a string, or another type that is equal to string but only inside the branch. You just need to add a type annotation when leaving the branch to remove this ambiguity:
| Interpreter.(N N_command) -> Some (v:string)
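The same idea can be sketched in Haskell, which also has GADTs and existential types. This is only an analogue, not Menhir's API: the types Symbol and Element and the function currentCommand are hypothetical stand-ins for the symbol GADT, the stack element, and the lookup function. Matching on the tag refines the hidden type inside that branch, and the declared result type plays the role of the (v : string) annotation.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical symbol tags; the index is the type of the semantic value.
data Symbol a where
  SCommand :: Symbol String
  SCount   :: Symbol Int

-- A stack element hides the index, much like the `element` type above.
data Element where
  Element :: Symbol a -> a -> Element

-- Matching on SCommand brings the equation a ~ String into scope for that
-- branch only. The signature's Maybe String says which type the value has
-- once it leaves the branch.
currentCommand :: [Element] -> Maybe String
currentCommand []                       = Nothing
currentCommand (Element SCommand v : _) = Just v
currentCommand (_ : rest)               = currentCommand rest
```

For example, currentCommand applied to a list holding an SCount element followed by an SCommand element skips the first and returns the string payload of the second.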

Unpacking nested applicative functors f#

Hi, I am attempting to make a combinator parser, and I am currently trying to make it read headers and create parsers based on the header that was parsed. For example, a header of int, float, string will result in Parser<Parser<int>*Parser<float>*Parser<string>>.
I am wondering, however, how you would unpack the "inner" parsers and end up with something like Parser<int*float*string>?
Parser type is: type Parser<'a> = Parser of (string -> Result<'a * string, string>)
I'm not sure that your idea with nested parsers is going to work - if you parse a header dynamically, then you'll need to produce a list of parsers of the same type. The way you wrote this is suggesting that the type of the parser will depend on the input, which is not possible in F#.
So, I'd expect that you will need to define a value like this:
type Value = Int of int | String of string | Float of float
And then your parser that parses a header will produce something like:
let parseHeaders args : Parser<Parser<Value> list> = (...)
The next question is, what do you want to do with the nested parsers? Presumably, you'll need to turn them into a single parser that parses the whole line of data (if this is something like a CSV file). Typically, you'd define a function sequence:
val sequence : sep:Parser<unit> -> parsers:Parser<'a> list -> Parser<'a list>
This takes a separator (say, parser to recognize a comma) and a list of parsers and produces a single parser that runs all the parsers in a sequence with the separator in between.
Then you can do:
parseHeaders input |> map (fun parsers -> sequence (char ',') parsers)
And you get a single nested parser, Parser<Parser<Value list>>. You now want to run the nested parser on the rest of the input that is left after running the outer parser, which recognizes headers. The following function does the trick:
let unwrap (Parser f:Parser<Parser<'a>>) = Parser (fun s ->
    match f s with
    | Result.Ok(Parser nested, rest) -> nested rest
    | Result.Error e -> Result.Error e )
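For comparison, the unwrap function can be sketched in Haskell with the same shape of parser type. Everything besides unwrap here (charP, sameAgain) is a hypothetical helper invented for the demonstration: sameAgain plays the role of the header parser by reading one character and returning a parser that expects that same character again.

```haskell
-- Haskell rendering of the Parser type from the question.
newtype Parser a = Parser { runParser :: String -> Either String (a, String) }

-- Run the outer parser, then run the parser it produced on the remaining input.
unwrap :: Parser (Parser a) -> Parser a
unwrap (Parser f) = Parser $ \s ->
  case f s of
    Right (Parser nested, rest) -> nested rest
    Left e                      -> Left e

-- Hypothetical primitive: accept exactly the given character.
charP :: Char -> Parser Char
charP c = Parser $ \s -> case s of
  (x : xs) | x == c -> Right (c, xs)
  _                 -> Left ("expected " ++ [c])

-- Toy "header" parser: read one character and return a parser
-- that expects the same character again.
sameAgain :: Parser (Parser Char)
sameAgain = Parser $ \s -> case s of
  (x : xs) -> Right (charP x, xs)
  []       -> Left "unexpected end of input"
```

Running runParser (unwrap sameAgain) on "aab" first consumes the leading 'a', then runs the produced parser on "ab", giving Right ('a', "b"). This is the usual monadic join for a parser type.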

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
"However, I already have a very robust parser who does that very thing for me."
It's actually not that robust: your parser has problems with superfluous parentheses, so it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
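One way to avoid that exception, sketched here with only base, is to use readMaybe from Text.Read instead of read; the wrapper name parseInt is just an illustrative choice. On the Parsec side, using many1 digit rather than many digit would require at least one digit before read is ever called.

```haskell
import Text.Read (readMaybe)

-- Total alternative to `read`: yields Nothing instead of throwing
-- when the input (for example the empty string) is not a valid Int.
parseInt :: String -> Maybe Int
parseInt = readMaybe
```

Here parseInt "" is Nothing, while read "" :: Int would throw at runtime.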
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
    = case parse (withRemaining $ needsParens (prec >= threshold) parsecParser) "" input of
        Left _ -> []
        Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.
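As a rough illustration of the readPrec route, here is a hand-written Read instance for the small type C from the earlier example, using only Text.Read from base. It is a sketch of the general pattern, not the instance for the asker's Scheme type: parens accepts any number of enclosing parentheses, prec 10 makes the parser fail in a context whose precedence is above function application, and step raises the precedence for the argument.

```haskell
import Text.Read

data C = C Int deriving (Show, Eq)

instance Read C where
  -- Accept optional surrounding parentheses; without them, only succeed
  -- when the surrounding precedence is at most 10 (function application).
  readPrec = parens $ prec 10 $ do
    Ident "C" <- lexP      -- expect the constructor name
    n <- step readPrec     -- read the Int argument one precedence level higher
    return (C n)
  readListPrec = readListPrecDefault
```

With this instance, read "C 12" and read "((C 12))" both give C 12, while readsPrec 11 "C 12" returns the empty list because parentheses are required at that precedence.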

Underlying Parsec Monad

Many of the Parsec combinators I use are of a type such as:
foo :: CharParser st Foo
CharParser is defined here as:
type CharParser st = GenParser Char st
CharParser is thus a type synonym involving GenParser, itself defined here as:
type GenParser tok st = Parsec [tok] st
GenParser is then another type synonym, assigned using Parsec, defined here as:
type Parsec s u = ParsecT s u Identity
So Parsec is a partial application of ParsecT, itself listed here with type:
data ParsecT s u m a
along with the words:
"ParsecT s u m a is a parser with stream type s, user state type u,
underlying monad m and return type a."
What is the underlying monad? In particular, what is it when I use the CharParser parsers? I can't see where it's inserted in the stack. Is there a relationship to the use of the list monad in Monadic Parsing in Haskell to return multiple successful parses from an ambiguous parser?
In your case the underlying monad is Identity. However ParsecT is different from most monad transformers in that it is an instance of the Monad class even if the type parameter m is not. If you look at the source code you will note the lack of "(Monad m) =>" in the instance declaration.
So then you ask yourself, "If I were to have a non-trivial monad stack, where would it be used?"
There are three answers to that question:
It is used to uncons the next token out of the stream:
class (Monad m) => Stream s m t | s -> t where
    uncons :: s -> m (Maybe (t,s))
Notice that uncons takes an s (the stream of tokens t) and returns its result wrapped in your monad. This allows one to do interesting things during the process of getting the next token.
It is used in the resulting output of each parser. This means you can create parsers that don't touch the input but take action in the underlying monad and use the combinators to bind them to regular parsers. In other words, lift (x :: m a) :: ParsecT s u m a.
Finally, runParserT and friends (until you build up to the point where m is replaced by Identity) return their results wrapped in this monad.
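The second and third points can be sketched with a small example. It assumes the parsec and transformers packages are available and picks State Int as the underlying monad purely for illustration; the names countedDigits and runCounted are made up for this sketch. The parser uses lift to run an action in the underlying monad, and runParserT hands its result back wrapped in that monad.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (State, modify, runState)
import Text.Parsec (ParseError, ParsecT, digit, many1, runParserT)

-- A parser over String whose underlying monad is State Int (an arbitrary
-- choice for this sketch). It parses digits and records how many it saw.
countedDigits :: ParsecT String () (State Int) String
countedDigits = do
  ds <- many1 digit
  lift (modify (+ length ds))  -- an action in the underlying monad
  return ds

-- runParserT returns its result inside the underlying monad,
-- so runState is needed to get at both the parse result and the counter.
runCounted :: String -> (Either ParseError String, Int)
runCounted input = runState (runParserT countedDigits () "" input) 0
```

With m fixed to Identity, as in CharParser, none of this machinery is visible, which is why the underlying monad is easy to miss.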
There is not a relationship between this monad and the one from Monadic Parsing in Haskell. In this case Hutton and Meijer are referring to the monad instance for ParsecT itself. The fact that in Parsec-3.0.0 and beyond ParsecT has become a monad transformer with an underlying monad is not relevant to the paper.
What I think you are looking for however is where the list of possible results went. In Hutton and Meijer the parser returns a list of all possible results while Parsec stubbornly returns only one. I think you are looking at the m in the result and thinking to yourself that the list of results must be hiding in there somewhere. It is not.
Parsec, for reasons of efficiency, made a choice to prefer the first matching result in Hutton and Meijer's list of results. This lets it toss away both the unused results in the tail of Hutton and Meijer's list and also the front of the stream of tokens, because it never backtracks by default. In Parsec, given the combined parser a <|> b, if a fails after consuming any input, b will never be tried. The way around this is try: try a resets the state back to where it was if a fails, so that b is then evaluated.
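A small sketch of this behavior, assuming the parsec package is available (the parser names withoutTry and withTry are illustrative): both parsers try "ab" first and "ac" second, but only the version wrapped in try can recover after the first alternative has consumed the leading 'a'.

```haskell
import Data.Either (isLeft)
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- string "ab" consumes the 'a' of "ac" before failing,
-- so the second alternative is never tried.
withoutTry :: Parser String
withoutTry = string "ab" <|> string "ac"

-- try rewinds the input when string "ab" fails,
-- so string "ac" gets to run from the start.
withTry :: Parser String
withTry = try (string "ab") <|> string "ac"
```

On the input "ac", parse withoutTry "" "ac" is a Left (isLeft returns True), while parse withTry "" "ac" succeeds with "ac".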
You asked in the comments if this was done using Maybe or Either. The answer is "almost but not quite." If you look at the low-level run* functions, you see that they return an algebraic type that tells whether input was consumed, wrapping a second one that gives either the result or an error message. These types work kind of like Either, but even they are not used directly. Rather than stretch this out further, I'll refer you to the post by Antoine Latter that explains how this works and why it is done this way.
GenParser is defined in terms of Parsec, not ParsecT. Parsec in turn is defined as
type Parsec s u = ParsecT s u Identity
So the answer is that when using CharParser the underlying monad is the Identity monad.
