I've written a simple parser that is invoked from the command line as [ps auxww | ./myparser] and parses the output of the ps command so that it can be loaded into the Process data structure I created.
I have managed to parse one line of the resulting String, but now I'm stuck trying to parse the whole string and return a [Process] rather than a single Process. The problem is how to implement parsePS: it has to call myParser repeatedly in order to parse every line, return a list of Process, and print it to the terminal.
Can someone help me?
I'm not sure what's failing for you, but I am guessing the spacing is killing you. If so, I have two ideas that might help.
Modify myParser to consume the trailing spaces at the end of each line, and then the many combinator should work:
myParser = do
  ...
  spaces
  command <- pCommand
  spaces -- CONSUME END OF LINE
  return Entry{ ... }
Then many myParser should work.
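A minimal sketch of that, assuming myParser :: Parser Process (with Parser from Text.Parsec.String and Process being your data type):

parsePS :: Parser [Process]
parsePS = many myParser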
Alternatively, you could split the input into lines first and call parse on each one.
argLines <- fmap lines getContents
(I take it you mean to burn the first line via getLine before the hGetContents?)
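Putting the pieces together, here is a minimal sketch of this approach; it assumes myParser :: Parser Process from your question and that the first line of the ps output is the header:

import Text.Parsec (ParseError, parse)
import Text.Parsec.String (Parser)

parsePS :: IO (Either ParseError [Process])
parsePS = do
  _ <- getLine                        -- burn the ps header line
  argLines <- fmap lines getContents
  return (mapM (parse myParser "ps") argLines)  -- stops at the first line that fails to parse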
It sounds to me like you're looking for a way to parse each line in sequence and return a list of parsed results. How about mapM from the Prelude?
If myParser :: String -> Parser Process, then mapM myParser :: [String] -> Parser [Process], which seems to be what you're looking for (using generic names for Parsec's Parser types). So if you have a list of lines (call it lns) that you want to parse in sequence, you can run parse on mapM myParser lns to get what you want.
I'm using Numeric.readDec to parse numbers and reads to parse Strings. But I also need to know how many characters were read.
For example, readDec "52 rest" returns [(52," rest")], having read 2 characters. But I can't find a good way to know that it read 2 characters.
You could check the string length of show 52, but if the input was 052 that would give the wrong answer (this approach also wouldn't work for the string parsing, which has escape characters). You could also subtract the length of the remaining string from the length of the input string, but that is very inefficient for long strings with many parses.
How can this be done correctly and efficiently (preferably without writing your own parser)?
With just base, instead of readDec, you can use readDecP from Text.Read.Lex, which uses a ReadP parser:
readDecP :: (Eq a, Num a) => ReadP a
The gather combinator in Text.ParserCombinators.ReadP returns the parse result along with the actual characters parsed:
gather :: ReadP a -> ReadP (String, a)
You can run the parser with readP_to_S, which gives back a ReadS parser, which is a function that accepts a string and produces a list of possible parses with the remainder of the string.
readP_to_S :: ReadP a -> ReadS a
type ReadS a = String -> [(a, String)]
An example in GHCi:
> import Text.ParserCombinators.ReadP (gather, readP_to_S)
> import Text.Read.Lex (readDecP)
> readP_to_S (gather readDecP) "52 rest"
[(("52",52)," rest")]
> readP_to_S (gather readDecP) "0644 permissions"
[(("0644",644)," permissions")]
You can simply check that there is only one valid parse if you want the result to be unambiguous, and then take the length of the first component to find the number of Char code points parsed.
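For example, a small helper along those lines might look like this (readDecCounted is a made-up name, not a library function):

import Text.ParserCombinators.ReadP (gather, readP_to_S)
import Text.Read.Lex (readDecP)

-- Parse a leading decimal and report (value, characters consumed, rest of input).
readDecCounted :: String -> Maybe (Integer, Int, String)
readDecCounted s =
  case readP_to_S (gather readDecP) s of
    [((consumed, n), rest)] -> Just (n, length consumed, rest)
    _                       -> Nothing   -- no parse, or ambiguous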
These parsers are fairly limited, however; if you want something easier to use, faster, or able to produce more detailed error messages, then you should check out a more fully featured parsing package such as regex-applicative (regular grammars) or megaparsec (context-sensitive grammars).
I'm working on an EDI file parser, and I'm having considerable difficulty implementing an escape for the 'segment terminator'. For anyone fortunate enough to not work with EDI, the segment terminator (usually an apostrophe) is the delimiter between segments, which are like cells.
The desired behaviour looks something like this:
ABC+123'DEF+567' -> ["ABC+123", "DEF+567"]
ABC+123?'DEF+567' -> ["ABC+123?'DEF+567"]
Using FParsec, without escaping the apostrophe (and, for simplicity, ignoring parameterisation), the parser looks something like this:
let pSegment = //logic to parse the contents of a segment
let pAllSegments = sepEndBy pSegment (str "'")
This approach with the above example would yield ["ABC+123?", "DEF+567"].
My next consideration was to use a regex:
let pAllSegments = sepEndBy pSegment (regex @"[^\?]'")
The problem here is that the character prior to the apostrophe is also consumed, leading to incomplete messages.
I'm fairly certain I just don't understand FParsec well enough here. Does anyone have any pointers?
The issue is in the parse contents step.
The parser is working 'bottom up'. It finds the contents of the segments, which are not permitted to contain the terminator, then finds that all these segments are separated by the terminator, and constructs the list.
My error was in the pSegment step, which was using a parameterised version of (?:[A-Za-z0-9 \\.]|\?[\?\+:\?])*. See the second \? in [\?\+:\?]? It should have been a '.
I am interested in learning how to send data efficiently between Haskell programs using standard input and output. Suppose I want to pipe two programs together: "P1" outputs the number 5 to stdout, and "P2" takes an integer from stdin, adds 1, and outputs it to stdout again. Right now, the best way I know to do this involves outputting the data as text from P1, parsing that text back to an integer in P2, and proceeding from there. For example:
P1.hs:
module Main where

main = do
  print 5
P2.hs:
module Main where

main = fmap manipulateData getLine >>= print
  where
    manipulateData = (+ 1) . (read :: String -> Int)
Output:
$ (stack exec p1) | (stack exec p2)
6
I'd like to use standard i/o to send an integer without treating it as text, if possible. I'm assuming this still requires some sort of parsing to work, but I'm hoping it's possible to parse the data as binary and get a faster program.
Does Haskell have any way to make this straightforward? Since I am going from one fundamental Haskell datatype (Int) to the same type again with a pass through standard i/o in the middle, I'm wondering if there is an easy solution that doesn't require writing a custom binary parser (which I don't know how to do). Can anyone provide such a method?
Here is the code that I ended up with:
module Main where

import qualified Data.ByteString.Lazy as BS
import qualified Data.Binary as B

main :: IO ()
main = do
  dat <- BS.getContents
  print $ (B.decode dat :: Int) + 1
The other program uses similar imports and outputs 5 with the following line:
BS.putStr $ B.encode (5 :: Int)
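Spelled out as a complete module, a minimal sketch of that producer (same imports as above) would be:

module Main where

import qualified Data.ByteString.Lazy as BS
import qualified Data.Binary as B

-- Emit the Int 5 in Data.Binary's wire format on stdout.
main :: IO ()
main = BS.putStr $ B.encode (5 :: Int)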
The resulting programs can be piped together, and the resulting program behaves as required.
I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
1. What is the difference between withBlock f a p and

   do aa <- a
      pp <- block p
      return (f aa pp)

   Likewise, what's the difference between withBlock' a p and do {a; block p}?
2. In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
3. Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this), so I'm guessing that all they can do is either fail or succeed and that they should appear in a do block, but I can't figure out the details.

   I did at least find some examples on the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
4. <+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads”, which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.

   Do I need to use these? Can I just ignore them and use do notation instead?
5. The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct or throw an error if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
6. In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult then it is not something which I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive—in particular, sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the position recorded in the parser's internal state marking where the current indentation level starts. To make this concrete, let me give some test cases in (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser with a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String
          -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position out past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line, sameOrIndented fails if the current column is strictly less than the reference column, unless they are on the same line, and checkIndent fails unless the current and reference columns match.
withPos is slightly different. It's not just a IndentParser, it's an IndentParser-combinator—it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to be whatever the current column is and then parses p at least once, so long as each parse starts exactly at this newly set "reference column". Now we can take a look at withBlock
withBlock f a p = withPos $ do
  r1 <- a
  r2 <- option [] (indented >> block p)
  return (f r1 r2)
So, it resets the "reference column" to the current column, parses a once, optionally parses an indented block of ps, then combines the two results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
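As a usage sketch in the spirit of the indents documentation, run through the testParse helper from above (NamedList, aName and anItem are names made up here for illustration):

data NamedList = NamedList String [String] deriving Show

aList :: IndentParser String () NamedList
aList = withBlock NamedList aName anItem

aName, anItem :: IndentParser String () String
aName  = many1 alphaNum <* char ':' <* spaces   -- e.g. "listName:"
anItem = many1 alphaNum <* spaces               -- one indented item per line

>>> testParse id aList "listName:\n  item1\n  item2\n  item3\n"
Right (NamedList "listName" ["item1","item2","item3"])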
(5) So, indented and friends are exactly the tools for doing this: they'll cause a parse to immediately fail if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, don't worry about these guys until you learn how to use Applicative style parsing in base Parsec. It's often a much cleaner, faster, simpler way of specifying parses. Sometimes they're even more powerful, but if you understand Monads then they're almost always completely equivalent.
(6) And this is the crux. The tools mentioned so far can only trigger indentation failures that you can describe using withPos. Offhand, I don't think it's possible to choose withPos based on the success or failure of other parses... so you'll have to go another level deeper. Fortunately, the mechanism that makes IndentParsers work is plain to see: it's just an inner State monad containing SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
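For instance, a hedged sketch of what that lift trick might look like (setRefColumn is a made-up name, not part of indents):

-- Overwrite the column of the reference position kept in the inner State.
setRefColumn :: Int -> IndentParser String () ()
setRefColumn c = do
  ref <- lift get                      -- the current reference SourcePos
  lift (put (setSourceColumn ref c))   -- keep the line, replace the column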
Cheers!
Hello,
After doing the parsing with a script in Haskell, I got a file whose contents look like lists of strings. However, when I read those contents with getContents or hGetContents, all I get back is a single String containing several lines. Schematically, what I want is to turn the text "[\"aaa\",\"bbb\",\"ccc\"]" into the list ["aaa","bbb","ccc"]. I have tried with the read function but without results. I need to work with these lists of strings so that I can concatenate them all into a single list.
I'm using the lines function, but I think it only 'works' one line at a time, doesn't it?
What I need is a function that verifies whether an element of one line is repeated on another line. If I could get a list of lists of strings it would be easier (but what I have is a single string per line that merely looks like a list of strings).
Regards
Thanks.
I have tried with the read function but without results
Just tested, and it works fine:
Prelude> read "[\"aaa\",\"bbb\",\"ccc\"]" :: [String]
["aaa","bbb","ccc"]
Note that you need to give the return type explicitly, since it can't be determined from the type of the argument.
I think the function you are looking for is the lines function from Data.List (reexported by the Prelude) that breaks up a multi-line string into a list of strings.
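Putting the two together, a minimal sketch (it assumes every line of the input really is a Haskell-style list literal such as ["aaa","bbb","ccc"]):

import Data.List (nub)

main :: IO ()
main = do
  contents <- getContents
  let listsOfStrings = map read (lines contents) :: [[String]]  -- one list per input line
      combined       = concat listsOfStrings                    -- everything in a single list
  print (nub combined)  -- e.g. with duplicates across lines removed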
In my understanding, what you can do is:
create a function that receives a list of lists, where each inner list corresponds to one line of the entire string passed as argument, and checks whether an element of one line occurs in another line;
then pass that function the entire string, split into lines using lines.