Pattern matching for custom read function - parsing

I am writing a custom read function for one of the data types in my module. For eg, when I do read "(1 + 1)" :: Data, I want it to return Plus 1 1. My data declaration is data Data = Plus Int Int. Thanks

This sounds like something better suited to a parser; Parsec is a powerful Haskell parser combinator library, which I would recommend.

I'd like to second the notion of using a parser. However, if you absolutely have to use a pattern-matching, go like this:
import Data.List
data Expr = Plus Int Int | Minus Int Int deriving Show
test = [ myRead "(1 + 1)", myRead "(2-1)" ]
myRead = match . lexer
where
match ["(",a,"+",b,")"] = Plus (read a) (read b)
match ["(",a,"-",b,")"] = Minus (read a) (read b)
match garbage = error $ "Cannot parse " ++ show garbage
lexer = unfoldr next_lexeme
where
next_lexeme "" = Nothing
next_lexeme str = Just $ head $ lex str

You could use GHC's ReadP.

Related

Parsec sepBy Haskell

I wrote a function and it complies, but I'm not sure if it works the way I intend it to or how to call it in the terminal. Essentially, I want to take a string, like ("age",5),("age",6) and make it into a list of tuples [("age1",5)...]. I am trying to write a function separate the commas and either I am just not sure how to call it in the terminal or I did it wrong.
items :: Parser (String,Integer) -> Parser [(String,Integer)]
items p = do { p <- sepBy strToTup (char ",");
return p }
I'm not sure what you want and I don't know what is Parser.
Starting from such a string:
thestring = "(\"age\",5),(\"age\",6),(\"age\",7)"
I would firstly remove the outer commas with a regular expression method:
import Text.Regex
rgx = mkRegex "\\),\\("
thestring' = subRegex rgx thestring ")("
This gives:
>>> thestring'
"(\"age\",5)(\"age\",6)(\"age\",7)"
Then I would split:
import Data.List.Split
thelist = split (startsWith "(") thestring'
which gives:
>>> thelist
["(\"age\",5)","(\"age\",6)","(\"age\",7)"]
This is what you want, if I correctly understood.
That's probably not the best way. Since all the elements of the final list have form ("age", X) you could extract all numbers (I don't know but it should not be difficult) and then it would be easy to get the final list. Maybe better.
Apologies if this has nothing to do with your question.
Edit
JFF ("just for fun"), another way:
import Data.Char (isDigit)
import Data.List.Split
thestring = "(\"age\",15),(\"age\",6),(\"age\",7)"
ages = (split . dropBlanks . dropDelims . whenElt) (not . isDigit) thestring
map (\age -> "(age," ++ age ++ ")") ages
-- result: ["(age,15)","(age,6)","(age,7)"]
Or rather:
>>> map (\age -> ("age",age)) ages
[("age","15"),("age","6"),("age","7")]
Or if you want integers:
>>> map (\age -> ("age", read age :: Int)) ages
[("age",15),("age",6),("age",7)]
Or if you want age1, age2, ...:
import Data.List.Index
imap (\i age -> ("age" ++ show (i+1), read age :: Int)) ages
-- result: [("age1",15),("age2",6),("age3",7)]

How to add a condition that a parsed number must satisfy in FParsec?

I am trying to parse an int32 with FParsec but have an additional restriction that the number must be less than some maximum value. Is their a way to perform this without writing my own custom parser (as below) and/or is my custom parser (below) the appropriate way of achieving the requirements.
I ask because most of the built-in library functions seem to revolve around a char satisfying certain predicates and not any other type.
let pRow: Parser<int> =
let error = messageError ("int parsed larger than maxRows")
let mutable res = Reply(Error, error)
fun stream ->
let reply = pint32 stream
if reply.Status = Ok && reply.Result <= 1000000 then
res <- reply
res
UPDATE
Below is an attempt at a more fitting FParsec solution based on the direction given in the comment below:
let pRow2: Parser<int> =
pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
Is this the correct way to do it?
You've done an excellent research and almost answered your own question.
Generally, there are two approaches:
Unconditionally parse out an int and let the further code to check it for validity;
Use a guard rule bound to the parser. In this case (>>=) is the right tool;
In order to make a good choice, ask yourself whether an integer that failed to pass the guard rule has to "give another chance" by triggering another parser?
Here's what I mean. Usually, in real-life projects, parsers are combined in some chains. If one parser fails, the following one is attempted. For example, in this question, some programming language is parsed, so it needs something like:
let pContent =
pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
Theoretically, your DSL may need to differentiate a "small int value" from another type:
/// The resulting type, or DSL
type Output =
| SmallValue of int
| LargeValueAndString of int * string
| Comment of string
let pSmallValue =
pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
|>> SmallValue
let pLargeValueAndString =
pint32 .>> ws .>>. (manyTill ws)
|>> LargeValueAndString
let pComment =
manyTill ws
|>> Comment
let pCombined =
[ pSmallValue; pLargeValueAndString; pComment]
|> List.map attempt // each parser is optional
|> choice // on each iteration, one of the parsers must succeed
|> many // a loop
Built this way, pCombined will return:
"42 ABC" gets parsed as [ SmallValue 42 ; Comment "ABC" ]
"1234567 ABC" gets parsed as [ LargeValueAndString(1234567, "ABC") ]
As we see, the guard rule impacts how the parsers are applied, so the guard rule has to be within the parsing process.
If, however, you don't need such complication (e.g., an int is parsed unconditionally), your first snippet is just fine.

Creating Simple Parser in F#

I am currently attempting to create an extremely simple parser in F# using FsLex and FsYacc. At first, the only functionality I am trying to achieve is allowing the program to take in a string that represents addition of integers and output a result. For example, I would want the parser to be able to take in "5 + 2" and output the string "7". I am only interested in string arguments and outputs because I would like to import the parser into Excel using Excel DNA once I extend the functionality to support more operations. However, I am currently struggling to get even this simple integer addition to work correctly.
My lexer.fsl file looks like:
{
module lexer
open System
open Microsoft.FSharp.Text.Lexing
open Parser
let lexeme = LexBuffer<_>.LexemeString
let ops = ["+", PLUS;] |> Map.ofList
}
let digit = ['0'-'9']
let operator = "+"
let integ = digit+
rule lang = parse
| integ
{INT(Int32.Parse(lexeme lexbuf))}
| operator
{ops.[lexeme lexbuf]}
My parser.fsy file looks like:
%{
open Program
%}
%token <int>INT
%token PLUS
%start input
%type <int> input
%%
input:
exp {$1}
;
exp:
| INT { $1 }
| exp exp PLUS { $1 + $2 }
;
Additionally, I have a Program.fs file that acts like an (extremely small) AST:
module Program
type value =
| Int of int
type op = Plus
Finally, I have the file Main.fs that is supposed to test out the functionality of the interpreter (as well as import the function into Excel).
module Main
open ExcelDna.Integration
open System
open Program
open Parser
[<ExcelFunction(Description = "")>]
let main () =
let x = "5 + 2"
let lexbuf = Microsoft.FSharp.Text.Lexing.LexBuffer<_>.FromString x
let y = input lexer.lang lexbuf
y
However, when I run this function, the parser doesn't work at all. When I build the project, the parser.fs and lexer.fs files are created correctly. I feel that there is something simple I am missing, but I have no idea how to make this function correctly.

Tracking locations in Genlex

I'm writing a parser for a language that is sufficiently simple for Genlex + camlp4 stream parsers to take care of it. However, I'd still be interested in having a more or less precise location (i.e. at least a line number) in case of parsing error.
My idea is to use an intermediate stream between the original char Stream and the token Stream of Genlex, that takes care of line counts, like in the code below, but I'm wondering whether there's a simpler solution?
let parse_file s =
let num_lines = ref 1 in
let bol = ref 0 in
let print_pos fmt i =
(* Emacs-friendly location *)
Printf.fprintf fmt "File %S, line %d, characters %d-%d:"
s !num_lines (i - !bol) (i - !bol)
in
(* Normal stream *)
let chan =
try open_in s
with
Sys_error e -> Printf.eprintf "Cannot open %s: %s\n%!" s e; exit 1
in
let chrs = Stream.of_channel chan in
(* Capture newlines and move num_lines and bol accordingly *)
let next i =
try
match Stream.next chrs with
| '\n' -> bol := i; incr num_lines; Some '\n'
| c -> Some c
with Stream.Failure -> None
in
let chrs = Stream.from next in
(* Pass that to the Genlex's lexer *)
let toks = lexer chrs in
let error s =
Printf.eprintf "%a\n%s %a\n%!"
print_pos (Stream.count chrs) s print_top toks;
exit 1
in
try
parse toks
with
| Stream.Failure -> error "Failure"
| Stream.Error e -> error ("Error " ^ e)
| Parsing.Parse_error -> error "Unexpected symbol"
A much simpler solution is to use Camlp4 grammars.
Parsers built this way allow one to get decent error messages "for free", unlike the case with stream parsers (which are a low level tool).
It could be that there is no need to define your own lexer, because OCaml's lexer suits your needs already. But if you really need your own lexer, then you can easily plug in a custom one:
module Camlp4Loc = Camlp4.Struct.Loc
module Lexer = MyLexer.Make(Camlp4Loc)
module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)
open Lexer
let entry = Gram.Entry.mk "entry"
EXTEND Gram
entry: [ [ ... ] ];
END
let parse str =
Gram.parse rule (Loc.mk file) (Stream.of_string str)
If you are new to OCaml, then all this module system trickery might seem at first like black voodoo magic :-) The fact that Camlp4 is a severely underdocumented beast might also contribute to the surreality of experience.
So never hesitate to ask a question (even a stupid one) on the mailing list.

Haskell Parsec items numeration

I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:
- First type A\n
-- First type B\n
- Second type A\n
-- First type B\n
--Second type B\n
And my output should be:
<h1>1 First type A\n</h1>
<h2>1.1 First type B\n</h2>
<h1>2 Second type A\n</h2>
<h2>2.1 First type B\n</h2>
<h2>2.2 Second type B\n</h2>
I have come to this part, but I cannot get any further:
title1= do{
;(count 1 (char '-'))
;s <- many1 anyChar newline
;return (h1 << s)
}
title2= do{
;(count 2 (char '--'))
;s <- many1 anyChar newline
;return (h1 << s)
}
text=do {
;many (choice [try(title1),try(title2)])
}
main :: IO ()
main = do t putStr "Error: " >> print err
Right x -> putStrLn $ prettyHtml x
This is ok, but it does not include the numbering.
Any ideas?
Thanks!
You probably want to use GenParser with a state containing the current section numbers as a list in reverse order, so section 1.2.3 will be represented as [3,2,1], and maybe the length of the list to avoid repeatedly counting it. Something like
data SectionState = SectionState {nums :: [Int], depth :: Int}
Then make your parser actions return type be "GenParser Char SectionState a". You can access the current state in your parser actions using "getState" and "setState". When you get a series of "-" at the start of a line count them and compare it with "depth" in the state, manipulate the "nums" list appropriately, and then emit "nums" in reverse order (I suggest keeping nums in reverse order because most of the time you want to access the least significant item, so putting it at the head of the list is both easier and more efficient).
See Text.ParserCombinators.Parsec.Prim for details of GenParser. The more usual Parser type is just "type Parser a = GenParser Char () a" You probably want to say
type MyParser a = GenParser Char SectionState a
somewhere near the start of your code.

Resources