I was watching a video by Richard Cook on Safari Books Online in which he builds a command-line app in Haskell. In this video, he explains some basic concepts while writing a program to parse command-line arguments.
I am quite new to Haskell, and I can't figure out why this code does not work:
dataPathParser :: Parser FilePath
dataPathParser = strOption $
     value defaultDataPath
  <> long "data-path"
  <> short 'p'
  <> metavar "DATAPATH"
  <> help ("path to data file (default " ++ defaultDataPath ++ ")")
This code does not work either:
itemDescriptionValueParser :: Parser String
itemDescriptionValueParser =
  strOption (long "desc" <> short 'd' <> metavar "DESCRIPTION" <> help "description")
In fact, everywhere I wrote <>, the compiler reports:
    • Variable not in scope:
        (<>) :: Mod f5 a5 -> Mod f4 a4 -> Mod ArgumentFields ItemIndex
    • Perhaps you meant one of these:
        ‘<$>’ (imported from Options.Applicative),
        ‘<*>’ (imported from Options.Applicative),
        ‘<|>’ (imported from Options.Applicative)
The problem is most probably due to differences between the GHC and optparse-applicative versions used in the video and the ones I use, which are the latest (LTS Haskell 9.12: optparse-applicative 0.13.2.0).
But since I am quite new, I can't figure out how to adapt Richard Cook's code.
I would appreciate any help.
Thanks in advance,
Alex
From the optparse-applicative documentation (http://hackage.haskell.org/package/optparse-applicative-0.14.0.0/docs/Options-Applicative.html#t:Parser):
A modifier can be created by composing the basic modifiers provided by here using the Monoid operations mempty and mappend, or their aliases idm and <>.
It looks like it doesn't export <>, though, so you need to get it from Data.Monoid:
import Data.Monoid
... or just:
import Data.Monoid ((<>))
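Putting it together, here is a minimal sketch of the two parsers from the question with the import added. The value of defaultDataPath and the runPure helper are hypothetical (the question doesn't show them); runPure uses execParserPure so a parser can be tried on an argument list without IO. On GHC 8.4 and later, Prelude already exports (<>), so the import is redundant there but harmless.

```haskell
import Data.Monoid ((<>))  -- needed on GHC < 8.4; redundant but harmless later
import Options.Applicative

-- Hypothetical default; the video's actual value isn't shown in the question.
defaultDataPath :: FilePath
defaultDataPath = "data.yaml"

dataPathParser :: Parser FilePath
dataPathParser = strOption $
     value defaultDataPath
  <> long "data-path"
  <> short 'p'
  <> metavar "DATAPATH"
  <> help ("path to data file (default " ++ defaultDataPath ++ ")")

itemDescriptionValueParser :: Parser String
itemDescriptionValueParser =
  strOption (long "desc" <> short 'd' <> metavar "DESCRIPTION" <> help "description")

-- Hypothetical helper: run a parser purely on an argument list.
-- Returns Nothing on a parse failure instead of exiting the program.
runPure :: Parser a -> [String] -> Maybe a
runPure p args = getParseResult (execParserPure defaultPrefs (info p idm) args)
```

For example, runPure dataPathParser [] falls back to the default path, while runPure itemDescriptionValueParser [] is Nothing because --desc has no default and is therefore required.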
Related
I'm making a parser for a DSL in Haskell using Alex + Happy.
My DSL uses dice rolls as part of the possible expressions.
Sometimes I have an expression that I want to parse that looks like:
[some code...] 3D6 [... rest of the code]
Which should translate roughly to:
TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}
My DSL also uses variables (basically, Strings), so I have a special token that handles variable names.
So, with these tokens:
"D" { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']* { \pos str -> TokenName pos str}
$digit+ { \pos str -> TokenInt pos (read str) }
The result I'm getting from my parser now is:
TokenInt {... value = 3}, TokenName { ... , name = "D6"}
Which means that my lexer "reads" an Integer and a Variable named "D6".
I have tried many things; for example, I changed the token D to:
$digit "D" $digit { \pos str -> TokenD pos }
But that just consumes the digits :(
Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?
PS: I'm using PosN as a wrapper, not sure if relevant.
The way I'd go about it would be to extend the TokenD type to TokenD Int Int, so, using the basic wrapper for convenience, I would write:
$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
where ls = split 'D' s
split can be found here.
This is an extra step that would usually be done during syntactic analysis, but it doesn't hurt much here.
Also, I can't make Alex lex input matching $alpha as TokenD instead of TokenName. If we had Di instead of D, that would be no problem. From Alex's docs:
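A self-contained sketch of the dice action, assuming a minimal Token type with just the constructors needed here. Since the link to split didn't survive, a local split built on break stands in for it; both the Token type and split are illustrative, not the asker's real code.

```haskell
-- Hypothetical token type: just enough constructors for this sketch.
data Token
  = TokenInt Int
  | TokenD Int Int   -- dice roll: number of dice, number of sides
  | TokenName String
  deriving (Eq, Show)

-- Local stand-in for the linked `split` helper:
-- split a string on every occurrence of a separator character.
split :: Char -> String -> [String]
split sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : split sep rest

-- Build a dice token from text matched by  $digit+ D $digit+  (e.g. "3D6").
dice :: String -> Token
dice s = TokenD (read (head ls)) (read (last ls))
  where ls = split 'D' s
```

With the basic wrapper an action has type String -> Token, so { dice } can be used directly. With the posn wrapper the action also receives the position as its first argument, so it would need to be wrapped, e.g. { \pos s -> dice s }.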
When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.
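By that rule, though, once the 3 has been lexed, D6 matches your name rule for two characters but the "D" rule for only one, so TokenName wins; that is exactly the behavior you are seeing. It is also why the single $digit+ D $digit+ rule above works: it matches all three characters of 3D6, which is longer than anything $digit+ matches.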
I decided that I could live with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to accept variables only if they start with a lowercase letter.
That also avoided some potential problems with other reserved words.
I'm still curious to know if there were other solutions, but the problem in itself was solved.
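To see why each fix works, here is a toy model (not Alex itself) of the rule quoted from Alex's docs: the longest match wins, and on a tie the earlier rule wins. Each rule reports how many characters it matches at the front of the input. The Tok type and the rule lists are illustrative assumptions, and isAlpha only approximates Alex's $alpha, which is usually [a-zA-Z].

```haskell
import Data.Char (isAlpha, isDigit, isLower)

-- Toy token type for this sketch (not the asker's real Token type).
data Tok = TInt Int | TD | TName String | TDice Int Int
  deriving (Eq, Show)

-- A rule: how many characters it matches at the front of the input
-- (0 = no match), and how to build a token from the matched text.
type Rule = (String -> Int, String -> Tok)

-- Maximal munch: the longest match wins; on a tie, the earlier rule wins.
lexWith :: [Rule] -> String -> [Tok]
lexWith _ [] = []
lexWith rules input =
  case foldl pick Nothing candidates of
    Nothing      -> error ("lexical error at: " ++ take 10 input)
    Just (n, mk) -> mk (take n input) : lexWith rules (drop n input)
  where
    candidates = [(n, mk) | (match, mk) <- rules, let n = match input, n > 0]
    -- keep the current best unless a later rule is strictly longer
    pick Nothing c = Just c
    pick (Just best@(bn, _)) c@(cn, _)
      | cn > bn   = Just c
      | otherwise = Just best

-- "D"
matchD :: String -> Int
matchD s = if take 1 s == "D" then 1 else 0

-- $alpha [$alpha $digit \_ \']*, with a pluggable test for the first char
matchName :: (Char -> Bool) -> String -> Int
matchName first (c : cs) | first c = 1 + length (takeWhile nameChar cs)
  where nameChar x = isAlpha x || isDigit x || x == '_' || x == '\''
matchName _ _ = 0

-- $digit+
matchInt :: String -> Int
matchInt = length . takeWhile isDigit

-- $digit+ D $digit+  (the answer's single dice token)
matchDice :: String -> Int
matchDice s
  | n1 > 0 && take 1 rest == "D" && n2 > 0 = n1 + 1 + n2
  | otherwise = 0
  where
    n1 = matchInt s
    rest = drop n1 s
    n2 = matchInt (drop 1 rest)

mkDice :: String -> Tok
mkDice s = let (a, rest) = break (== 'D') s in TDice (read a) (read (drop 1 rest))

-- The asker's original three rules, in file order.
originalRules :: [Rule]
originalRules = [(matchD, const TD), (matchName isAlpha, TName), (matchInt, TInt . read)]

-- The answer's fix: one extra rule matching the whole "3D6".
diceRules :: [Rule]
diceRules = (matchDice, mkDice) : originalRules

-- The asker's fix: variable names must start with a lowercase letter.
lowercaseRules :: [Rule]
lowercaseRules = [(matchD, const TD), (matchName isLower, TName), (matchInt, TInt . read)]
```

In this model, lexWith originalRules "3D6" reproduces the reported [TInt 3, TName "D6"], diceRules gives [TDice 3 6], and lowercaseRules gives [TInt 3, TD, TInt 6], because an uppercase D can no longer start a name.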
Thank you all!
I mean something like
LISTOF(EL) := "[" EL ("," EL)* "]"
LISTNUM := LISTOF(NUMBER)
LISTID := LISTOF(IDENT)
So, with the definitions
NUMBER := ('0'-'9')*
IDENT := ('a'-'z'|'A'-'Z')*
we have the following:
[435,657,44] is example of LISTNUM,
[dsf,thg,ewre] is example of LISTID.
Or another example (e denotes the empty string):
A(0) := e
A(n) := "a" A(n-1) | e
so A(5) is the set of all strings consisting of 'a' with length at most 5.
Are there any scientific works describing something similar to this? Can we describe our grammars in such a way and still be able to parse them in acceptable time?
The commonly used metasyntaxes such as BNF, ABNF and EBNF don't have parameterized rules. However, ISO EBNF is extensible according to the standard. If I remember correctly, the standard actually shows an example of introducing parameters.
You can get the standard here for free.
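Outside of BNF-style notations, parser-combinator libraries express parameterized rules directly: a rule with a parameter is just a function from parsers (or values) to a parser. A sketch of both examples from the question, using Text.ParserCombinators.ReadP from base (my choice of library, not something the answer mentions). One deliberate change: NUMBER and IDENT use munch1, i.e. at least one character, instead of the question's *.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- LISTOF(EL) := "[" EL ("," EL)* "]"
-- The parameter EL becomes an ordinary function argument.
listOf :: ReadP a -> ReadP [a]
listOf el = between (char '[') (char ']') (sepBy1 el (char ','))

-- NUMBER and IDENT (non-empty here, unlike the question's * versions)
number :: ReadP String
number = munch1 isDigit

ident :: ReadP String
ident = munch1 isAlpha

-- LISTNUM := LISTOF(NUMBER),  LISTID := LISTOF(IDENT)
listNum, listId :: ReadP [String]
listNum = listOf number
listId = listOf ident

-- A(0) := e ;  A(n) := "a" A(n-1) | e   (parameterized by an integer)
aRule :: Int -> ReadP String
aRule 0 = return ""
aRule n = ((:) <$> char 'a' <*> aRule (n - 1)) +++ return ""

-- Run a parser, keeping only results that consumed the entire input.
parseAll :: ReadP r -> String -> Maybe r
parseAll p s = case [r | (r, "") <- readP_to_S (p <* eof) s] of
  (r : _) -> Just r
  []      -> Nothing
```

Here parseAll listNum "[435,657,44]" yields the three numbers, parseAll listNum "[dsf]" fails, and parseAll (aRule 5) accepts up to five a's but rejects six.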
I am working in the command-line REPL environment of Rascal and trying to view things like parse trees and the output of the Ambiguity library. However, these are truncated on the command line. For example:
rascal>diagnose(parse(|cwd:///Core/tests/F0.func|));
list[Message]: [
info(
"Ambiguity cluster with 2 alternatives",
|cwd:///Core/tests/F0.func|(0,0,<1,0>,<1,0>)),
info(
"Production unique to the one alternative: Exp = app: Exp Exp ;",
|cwd:///Core/tests/F0.func|(0,0,<1,0>,<1,0>)),
info(
"Production unique to th...
I'm interested in seeing the rest of this output. Is there a setting I can change, or some way I can view this information? Thanks.
The truncation is done for performance reasons (terminals and shells do not cope well with printing huge strings).
You can import IO and use iprintln to get the indented print without any truncation. For performance reasons you could also use iprintToFile:
import IO;
r = diagnose(parse(|cwd:///Core/tests/F0.func|));
iprintln(r)
Alternatively, you might want to open the value in an editor using util::ValueUI::text (this only works in Eclipse):
import util::ValueUI;
r = diagnose(parse(|cwd:///Core/tests/F0.func|));
text(r, 4); // indentation level is 4
Finally, we sometimes copy values to the clipboard with util::Clipboard:
import util::Clipboard;
r = diagnose(parse(|cwd:///Core/tests/F0.func|));
copy(r)
and then you can paste them anywhere using your OS shortcut.
I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
1. What is the difference between withBlock f a p and
   do aa <- a
      pp <- block p
      return (f aa pp)
   Likewise, what's the difference between withBlock' a p and do {a; block p}?
2. In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
3. Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this), so I'm guessing that all they can do is produce an error or not, and that they should appear in a do block, but I can't figure out the details.
   I did at least find some examples of the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
4. <+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads”, which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.
   Do I need to use these? Can I just ignore them and use do instead?
5. The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct, or throw an error, if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
6. In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult, then it is not something I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive—in particular, sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the position, stored in the parser's inner state, where the current indentation level is believed to start. To make this clearer, let me give some test cases under (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser on a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code:
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position out past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line; sameOrIndented (defined as same <|> indented) fails unless they are on the same line or the current column is strictly greater than the reference column; and checkIndent fails unless the current and reference columns match.
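The four checks can be summarized as a toy model over plain (line, column) pairs. This is not the indents library's code, just a restatement of the behavior described above, with the reference position passed explicitly where the library reads it from the inner State (where withPos puts it).

```haskell
-- Toy model of the checks, NOT the indents library's code.
-- Positions are (line, column); the second argument is the reference
-- position that withPos would have stored in the inner State.
type Pos = (Int, Int)

-- indented: the current column must be strictly past the reference column.
indentedOK :: Pos -> Pos -> Bool
indentedOK (_, col) (_, refCol) = col > refCol

-- same: we must still be on the reference line.
sameOK :: Pos -> Pos -> Bool
sameOK (line, _) (refLine, _) = line == refLine

-- sameOrIndented: either of the two above.
sameOrIndentedOK :: Pos -> Pos -> Bool
sameOrIndentedOK cur ref = sameOK cur ref || indentedOK cur ref

-- checkIndent: the columns must match exactly.
checkIndentOK :: Pos -> Pos -> Bool
checkIndentOK (_, col) (_, refCol) = col == refCol
```

For instance, indentedOK (1, 1) (1, 1) is False, mirroring testParse id indented "" failing with "not indented", and sameOrIndentedOK (2, 1) (1, 1) is False because a new line at the same column is neither on the reference line nor indented past it.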
withPos is slightly different. It's not just an IndentParser; it's an IndentParser combinator: it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to be whatever the current column is and then runs p at least once, repeating as long as each occurrence starts at exactly this newly set "reference column". Now we can take a look at withBlock:
withBlock f a p = withPos $ do
r1 <- a
r2 <- option [] (indented >> block p)
return (f r1 r2)
So, it resets the "reference column" to the current column, parses a single a, tries to parse an indented block of ps, then combines the results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
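A small end-to-end sketch of withBlock, reusing the testParse runner from above. It assumes the indents 0.3.x API used throughout this answer (IndentParser built on State SourcePos); word and entry are hypothetical parsers made up for the example.

```haskell
import Text.Parsec
import Text.Parsec.Pos (initialPos)
import Text.Parsec.Indent
import Control.Monad.State (runState)

-- The answer's test runner, repeated so this block is self-contained.
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src

-- Hypothetical: a word followed by any whitespace, newlines included.
word :: IndentParser String () String
word = many1 letter <* spaces

-- Hypothetical: a header word with an optional indented block of words.
entry :: IndentParser String () (String, [String])
entry = withBlock (,) word word
```

On "top\n  x\n  y\n", entry <* eof should give ("top", ["x", "y"]): both x and y start at column 3, so checkIndent accepts them. If y is moved to column 2, block stops after x and the leftover input makes eof fail. Lines at column 1 are not indented, so each becomes its own entry with an empty block.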
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
(5) So, indented and friends are exactly the tools for this: they cause a parse to fail immediately if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, you can ignore them: don't worry about these until you've learned how to use Applicative-style parsing in plain Parsec. It's often a much cleaner, faster, simpler way of specifying parsers, and if you understand monads, what these combinators do can almost always be written equivalently with do notation.
(6) And this is the crux. The tools mentioned so far can only signal an indentation failure if you can describe your intended indentation using withPos. In brief, I don't think it's possible to choose where withPos applies based on the success or failure of other parses, so you'll have to go one level deeper. Fortunately, the mechanism that makes IndentParsers work is simple: it's just an inner State monad containing a SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
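As an illustration of that last point, a hypothetical helper markReference (not part of indents) can use lift to store the current position as the new reference, like withPos does on entry but without restoring the old reference afterwards. This assumes the same indents 0.3.x setup as testParse; a real grammar would call such a helper only after seeing whichever tokens should decide the indentation.

```haskell
import Text.Parsec
import Text.Parsec.Pos (initialPos)
import Text.Parsec.Indent
import Control.Monad.State (put, runState)
import Control.Monad.Trans.Class (lift)

-- The answer's test runner, repeated so this block is self-contained.
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src

-- Hypothetical helper: make the current position the "reference" by
-- writing it into the inner State SourcePos via lift.
markReference :: IndentParser String () ()
markReference = getPosition >>= lift . put
```

With the initial reference at column 1, char 'x' >> spaces >> indented succeeds on "x\n y". After inserting markReference following the x, the reference becomes column 2, so the same input fails with "not indented" and only "x\n  y" (y at column 3) passes.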
Cheers!
I'm working on file parsing in Haskell, using both Data.Attoparsec.Char8 and Data.ByteString.Char8. I want to parse an expression that can contain symbols like - / [ ] _ . (minus, slashes, brackets, underscores and dots).
I've written the following parser:
import qualified Data.ByteString.Char8 as B
import qualified Data.Attoparsec.Char8 as A
import Data.Attoparsec.Char8 (Parser)
identifier' :: Parser B.ByteString
identifier' = A.takeWhile $ A.inClass "A-Za-z0-9_//- /[/]"
... but it doesn't work as expected:
ghc> A.parse identifier' (B.pack "EMBXSHM-PortClo")
Done "-PortClo" "EMBXSHM"
ghc> A.parse identifier' (B.pack "AU_D[1].PCMPTask")
Done ".PCMPTask" "AU_D[1]"
Can someone help me?
Thanks for your time.
Take a look at the documentation: http://hackage.haskell.org/packages/archive/attoparsec/0.10.1.0/doc/html/Data-Attoparsec-ByteString-Char8.html#g:9
To add a "-" to a set, place it at the beginning or end of the string.
The second example stops at the "." because you don't have dots in your class listing.
You want to allow '-' characters in identifiers, but A.inClass uses '-' for ranges. You have to put it at the start or end of the range string:
To add a literal '-' to a set, place it at the beginning or end of the string.
— attoparsec documentation
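Putting both fixes together, a sketch of a corrected identifier parser: '-' moves to the end of the class, '.' is added, and the stray '/' characters (inClass has no escape character) are reduced to a single literal slash. It uses the current module name Data.Attoparsec.ByteString.Char8, and takeWhile1 instead of takeWhile so that an empty identifier is rejected; both are my choices rather than something from the question. parseIdent is a hypothetical helper.

```haskell
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- '-' is last so inClass treats it as a literal rather than a range;
-- '.' is included so "AU_D[1].PCMPTask" is accepted in full.
identifier' :: A.Parser B.ByteString
identifier' = A.takeWhile1 (A.inClass "A-Za-z0-9_/[].-")

-- Hypothetical helper: parseOnly treats its input as complete.
parseIdent :: String -> Either String B.ByteString
parseIdent = A.parseOnly identifier' . B.pack
```

Using parseOnly here is deliberate: with A.parse, an input that matches all the way to its end gives a Partial result, because takeWhile1 could still consume more input if more were supplied.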