Elm Parser loop does not terminate - parsing

I am running into a parser recursion problem I can't figure out. Any advice on what is causing the issue would be appreciated.
The following code works fine when the function rawData is defined with a finite number of elements (as shown in the commented code immediately below), but it does not halt (until the stack overflows) when defined with Parser.loop as shown in the code. The same loop construct works fine with all the other functions (e.g. files and directories).
module Reader exposing (..)

import Parser exposing (..)


type TermCmd
    = CD Argument
    | LS


type Argument
    = Home
    | UpOne
    | DownOne String


type Content
    = Dir String (List Content)
    | File Int String String


type alias RawData =
    List ( List TermCmd, List Content )


rawData : Parser RawData
rawData =
    loop [] <| loopHelper dataChunk -- This never ends...
    -- succeed (\a b c d -> [ a, b, c, d ]) -- but this works
    --     |= dataChunk
    --     |= dataChunk
    --     |= dataChunk
    --     |= dataChunk


dataChunk : Parser ( List TermCmd, List Content )
dataChunk =
    succeed (\cmds ctnt -> ( cmds, ctnt ))
        |= commands
        |= contents


directory : Parser Content
directory =
    succeed Dir
        |. symbol "dir"
        |. spaces
        |= (chompUntilEndOr "\n"
                |> getChompedString
           )
        |= succeed []
        |. spaces


file : Parser Content
file =
    succeed File
        |= int
        |. spaces
        |= (chompWhile (\c -> c /= '.' && c /= '\n')
                |> getChompedString
           )
        |= (chompUntilEndOr "\n"
                |> getChompedString
                |> Parser.map (String.dropLeft 1)
           )
        |. spaces


command : Parser TermCmd
command =
    succeed identity
        |. symbol "$"
        |. spaces
        |= oneOf
            [ succeed CD
                |. symbol "cd"
                |. spaces
                |= argument
            , succeed LS
                |. symbol "ls"
            ]
        |. spaces


argument : Parser Argument
argument =
    oneOf
        [ succeed UpOne |. symbol ".."
        , succeed Home |. symbol "/"
        , succeed DownOne |= (chompUntilEndOr "\n" |> getChompedString)
        , problem "Bad argument"
        ]
        |. spaces


contents : Parser (List Content)
contents =
    let
        contentHelper revContent =
            oneOf
                [ succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= file
                , succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= directory
                , succeed ()
                    |> map (\_ -> Done (List.reverse revContent))
                ]
    in
    loop [] contentHelper


commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelper command


directories : Parser (List Content)
directories =
    loop [] <| loopHelper directory


files : Parser (List Content)
files =
    loop [] <| loopHelper file


loopHelper : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelper parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , succeed ()
            |> map (\_ -> Done (List.reverse revContent))
        ]


sampleInput =
    "$ cd /\n$ ls\ndir a\n14848514 b.txt\n8504156 c.dat\ndir d\n$ cd a\n$ ls\ndir e\n29116 f\n2557 g\n62596 h.lst\n$ cd e\n$ ls\n584 i\n$ cd ..\n$ cd ..\n$ cd d\n$ ls\n4060174 j\n8033020 d.log\n5626152 d.ext\n7214296 k"
The rawData function goes into an infinite loop, but the same construct (loop [] <| loopHelper parser) works fine everywhere else.

You can probably get a hint at what the problem is by running your four-step parser (i.e. the one starting succeed (\a b c d -> [ a, b, c, d ])) on an empty string. If you do this, you get the following result:
Ok [([],[]),([],[]),([],[]),([],[])]
Take a second to think about what you would get for a five-step parser, or a ten-step parser, or even a 100-step parser. loop provides a parser that can run for any number of steps.
The Elm documentation for the loop function hints at your problem:
Parsers like succeed () and chompWhile Char.isAlpha can succeed without consuming any characters. So in some cases you may want to use getOffset to ensure that each step actually consumed characters. Otherwise you could end up in an infinite loop!
Your parser is encountering an infinite loop because it is outputting an infinitely long list of tuples, each of which has an empty list of commands. Your parser consumes no characters as it generates each such tuple, so it will loop forever.
It seems that in your case an empty list of commands makes no sense. So we must ensure that an empty list of commands causes an unsuccessful parse.
One way to do this is to write a variation of loopHelper that fails if the list is empty:
checkNonEmpty : List a -> Parser ()
checkNonEmpty list =
    if List.isEmpty list then
        problem "List is empty"

    else
        succeed ()


loopHelperNonEmpty : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelperNonEmpty parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , checkNonEmpty revContent
            |> map (\_ -> Done (List.reverse revContent))
        ]
(I couldn't find an easy way to introduce getOffset here so I did something different.)
You then change the definition of commands to use this function instead of loopHelper:
commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelperNonEmpty command
I made this change to your code and it generated the following output:
Ok
    [ ( [ CD Home, LS ]
      , [ Dir "a" [], File 14848514 "b" "txt", File 8504156 "c" "dat", Dir "d" [] ]
      )
    , ( [ CD (DownOne "a"), LS ]
      , [ Dir "e" [], File 29116 "f" "", File 2557 "g" "", File 62596 "h" "lst" ]
      )
    , ( [ CD (DownOne "e"), LS ]
      , [ File 584 "i" "" ]
      )
    , ( [ CD UpOne, CD UpOne, CD (DownOne "d"), LS ]
      , [ File 4060174 "j" "", File 8033020 "d" "log", File 5626152 "d" "ext", File 7214296 "k" "" ]
      )
    ]
(I've formatted this for clarity. When investigating your code I just outputted the result of the parser into the browser window using Debug.toString(), but that would come out as one long line. I pasted it into VS Code, added a few linebreaks and got elm-format to format it into something nicer.)

Related

Passing argument to a ReadP Parser in Haskell

I am trying to create a parser from scratch in Haskell. I have problems passing a string as an argument to a function that is already part of a do block in which the parsing occurs. Why does the following minimal viable example return [] and not 4 as expected?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))

type Parser a = ReadP a

token :: Parser a -> Parser a
token combinator = (do spaces
                       combinator)

space :: Parser Char
space = satisfy isSpace

spaces :: Parser String
spaces = many space

parseString input = readP_to_S (do
    e <- pExpr
    token eof
    return e) input

pExpr = (do
    pv <- pOpHelper
    spaces
    str <- string pv
    return str
  )

pOpHelper :: Parser String
pOpHelper = (do
    e1 <- munch isDigit
    return e1
  )
I am of course interested in returning a processed version of whatever string pv returns. However, I can't understand why the current setup returns nothing but [] on parseString "4", since calling just pOpHelper without pExpr seems to work.
Edit
I think I have located the 'bug' to be part of the string function. I had a closer look at it here but I can't see from the documentation why it shouldn't work in the above. But the above code is narrowed down to the parts that produce the unintended outputs as specified.
EDIT EDIT
I have now narrowed the problem down even further. It has to do with how 'consumption' works for the parser. The problem is that if I give it parseString "4", string pv expects the "4" that is returned by pv, but by then the parser has moved on to the following characters, which munch isDigit no longer matches. This means that it will only return [("4","")] rather than [] if the input is parseString "4 4", and only if spaces has been added to the do block in pExpr.
But how can I work around this and avoid 'consuming' the string that I put in as input? Is there a way to use look, for instance, from the above documentation?
As pointed out in the comments below, I am interested in transforming whatever is the input to pOpHelper and then passing its output to functions (in a recursion) that are part of the parent parser function. But how can I do it without consuming the input with pOpHelper first, such that the following example would return str on input of "4":
pExpr = (do
    pv <- pOpHelper
    --spaces
    str <- string pv
    if str == "(4)" then return str -- do stuff!
    else pfail
  )

pOpHelper :: Parser String
pOpHelper = (do
    e1 <- munch isDigit
    return ( "(" ++ e1 ++ ")" )
  )
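One possible way to inspect the input without consuming it is ReadP's look, which returns the remaining input and leaves the parser position untouched. The following is only a sketch under that assumption; the names peekDigits, pExprPeek and parsePeek are made up for illustration and are not part of the question's code.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Return the leading digits of the remaining input WITHOUT consuming them.
peekDigits :: ReadP String
peekDigits = takeWhile isDigit <$> look

-- Peek first, then consume exactly what was peeked, then transform it.
pExprPeek :: ReadP String
pExprPeek = do
  pv  <- peekDigits          -- look ahead; the input is still untouched
  str <- string pv           -- now actually consume those characters
  return ("(" ++ str ++ ")") -- post-process the consumed text

-- Run the parser and require that the whole input is used up.
parsePeek :: String -> [(String, String)]
parsePeek = readP_to_S (pExprPeek <* eof)
```

With this shape, the transformation happens after string has matched the raw digits, so string never has to match text that is not in the input.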

Haskell (Parsing) code can't understand how it works

openBr = char '['
closeBr = char ']'
openPn = char '('
closePn = char ')'
star = char '*'
myParser =
    (many1 star >>=
        \vs -> myParser >>=
        \x -> return (x+length vs)
    ) +++
    (openBr >>
        myParser >>=
        \c -> closeBr >>
        myParser >>=
        \d -> return (c+d)
    ) +++
    (openPn >>
        myParser >>=
        \c -> closePn >>
        myParser >>=
        \d -> return (c+d)
    ) +++
    return 0
parse myParser "*(***[*(**)]*)*"
-- outputs [(9,"")]
parse myParser "*(***[*(**]*)*"
-- outputs [(1,"(***[*(**]*)*")]
When the brackets and parentheses match, it returns the number of stars; otherwise it returns something like the second output. I don't understand how the code works. Can someone explain it for me?
First of all, when writing monadic Haskell code like this, it is idiomatic to use do notation. This reduces noise and makes your intention clearer.
Rewritten using do notation, your parser would look like this:
myParser =
    -- run of stars
    (do
        vs <- many1 star
        x <- myParser
        return (x+length vs)
    ) +++
    -- brackets
    (do
        openBr
        c <- myParser
        closeBr
        d <- myParser
        return (c+d)
    ) +++
    -- parentheses
    (do
        openPn
        c <- myParser
        closePn
        d <- myParser
        return (c+d)
    ) +++
    -- fail
    return 0
Now, when you parse the second example, the parser correctly parses the first star, and tots the total up to 1. It then sees the open bracket, so it chooses the bracket choice.
The brackets are not balanced however, so the bracket parser fails, despite being able to parse some of the characters following the open bracket. This causes the failure to bubble upward and fail the overall parse. When this happens, the parse function returns the return value accumulated to date, i.e. the number of stars before the failure, as well as the remainder of the string which failed to parse.
What you want to happen in this situation is for the parse to fail and return 0. Since your parsing library doesn't automatically backtrack, you'll need to explicitly enable backtracking by wrapping the subparsers in your library's equivalent of the try combinator.
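As a rough illustration of "fail instead of returning a partial count", here is a sketch written against Text.ParserCombinators.ReadP from base rather than the library used in the question (which is not named there). It uses ReadP's left-biased choice <++ and demands eof, so any unparsed remainder turns into a failed parse. The names countStars, bracketed and parseAll are assumptions for this example.

```haskell
import Text.ParserCombinators.ReadP

-- Count stars in a string of stars with balanced [] and () groups.
countStars :: ReadP Int
countStars = stars <++ bracketed '[' ']' <++ bracketed '(' ')' <++ return 0
  where
    -- a run of stars followed by whatever comes next
    stars = do
      vs <- munch1 (== '*')
      x  <- countStars
      return (x + length vs)
    -- an opening delimiter, a body, the matching close, then the rest
    bracketed open close = do
      _ <- char open
      c <- countStars
      _ <- char close
      d <- countStars
      return (c + d)

-- Succeed only if the WHOLE input was consumed.
parseAll :: String -> Maybe Int
parseAll s = case readP_to_S (countStars <* eof) s of
  [(n, "")] -> Just n
  _         -> Nothing
```

Here parseAll "*(***[*(**)]*)*" gives Just 9, while the unbalanced "*(***[*(**]*)*" gives Nothing, because the leftover "(***[*(**]*)*" makes eof fail.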
Happy Haskelling!

Haskell: Parsing a file finishes after first expression despite more input in file

The following is an example program of a language in which I'm writing a parser.
n := 1
Do (1)-> -- The 1 in brackets is a placeholder for a Boolean or relational expression.
n := 1 + 1
Od
When the program looks like this, the parseFile function ends after the assignment on the first line; however, when the assignment is removed, it parses as expected. Below is how it's called in GHCi, first with the first line present and then with it removed:
λ > parseFile "example.hnry"
Assign "n" (HInteger 1)
λ > parseFile "example.hnry"
Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))
The expected output would look similar to this:
λ > parseFile "example.hnry"
Assign "n" (HInteger 1) Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))
I first assumed it was something to do with the assignment parser, but the body of the loop contains an assignment which parses as expected, so I was able to rule that out. I believe that the issue is within the parseFile function itself. The following is the parseFile function and the other functions that make up the parseExpression function that I'm using to parse a program.
I think that the error is within parseFile because it parses an expression only once and doesn't "loop" (for want of a better word) back to itself to check whether there's more input left to parse. I think that's the error but I'm not quite sure.
parseFile :: String -> IO HVal
parseFile file =
  do program <- readFile file
     case parse parseExpression "" program of
       Left err -> fail "Parse Error"
       Right parsed -> return $ parsed

parseExpression :: Parser HVal
parseExpression = parseAExpr <|> parseDo <|> parseAssign

parseDo :: Parser HVal
parseDo = do
  _ <- string "Do "
  _ <- char '('
  x <- parseHVal -- Will be changed to a Boolean expression
  _ <- string ")->"
  spaces
  y <- parseExpression
  spaces
  _ <- string "Od"
  return $ Do x y

parseAExpr :: Parser HVal
parseAExpr = do
  x <- parseInteger
  spaces
  op <- parseOp
  spaces
  y <- parseInteger <|> do
    _ <- char '('
    z <- parseAExpr
    _ <- char ')'
    return $ z
  return $ AExpr x op y

parseAssign :: Parser HVal
parseAssign = do
  var <- oneOf ['a'..'z'] <|> oneOf ['A'..'Z']
  spaces
  _ <- string ":="
  spaces
  val <- parseHVal <|> do
    _ <- char '('
    z <- parseAExpr
    _ <- char ')'
    return $ z
  return $ Assign [var] val
As you note, your parseFile function parses a single expression (though maybe "statement" would be a better name) using the parseExpression parser. You probably want to introduce a new parser for a "program" or sequence of expressions/statements:
parseProgram :: Parser [HVal]
parseProgram = spaces *> many (parseExpression <* spaces)
and then in parseFile, replace parseExpression with parseProgram:
parseFile :: String -> IO [HVal]
parseFile file =
  do program <- readFile file
     case parse parseProgram "" program of
       Left err -> fail "Parse Error"
       Right parsed -> return $ parsed
Note that I've had to change the type here from HVal to [HVal] to reflect the fact that a program, being a sequence of expressions each of type HVal, needs to be represented as some sort of data type capable of combining multiple HVals together, and a list [HVal] is one way of doing so.
If you want a program to be an HVal instead of an [HVal], then you need to introduce a new constructor in your HVal type that's capable of representing programs. One method is to use a constructor to directly represent a block of statements:
data HVal = ... | Block [HVal]
Another is to add a constructor to represent a sequence of two statements:
data HVal = ... | Seq HVal HVal
Both methods are used in real parsers. (Note that you'd normally pick one; you wouldn't use both.) To represent a sequence of three assignment statements, for example, the block method would do it directly as a list:
Block [Assign "a" (HInteger 1), Assign "b" (HInteger 2), Assign "c" (HInteger 3)]
while the two-statement sequence method would build a sort of nested tree:
Seq (Assign "a" (HInteger 1)) (Seq (Assign "b" (HInteger 2))
                                   (Assign "c" (HInteger 3)))
The appropriate parsers for these two alternatives, both of which return a plain HVal, might be:
-- use blocks
parseProgram1 :: Parser HVal
parseProgram1 = do
  spaces
  xs <- many (parseExpression <* spaces)
  return $ Block xs

parseProgram2 :: Parser HVal
parseProgram2 = do
  spaces
  x <- parseExpression
  spaces
  (do xs <- parseProgram2
      return $ Seq x xs)
    <|> return x
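To see how the block and two-statement representations relate, here is a small standalone sketch. It uses a cut-down stand-in type called Stmt, since the full HVal definition is not shown in the question; Stmt, toSeq and fromSeq are hypothetical names for illustration only.

```haskell
-- A cut-down stand-in for HVal, just enough to show both shapes.
data Stmt
  = Assign String Int
  | Block [Stmt]
  | Seq Stmt Stmt
  deriving (Eq, Show)

-- Fold a non-empty list of statements into right-nested Seq nodes,
-- the same shape parseProgram2 builds. An empty list has no Seq form.
toSeq :: [Stmt] -> Maybe Stmt
toSeq [] = Nothing
toSeq xs = Just (foldr1 Seq xs)

-- Flatten nested Seq nodes back into the list a Block would hold.
fromSeq :: Stmt -> [Stmt]
fromSeq (Seq a b) = fromSeq a ++ fromSeq b
fromSeq s         = [s]
```

The Nothing case shows one practical difference between the two designs: a Block can represent an empty program directly, while the Seq form needs at least one statement.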

Haskell read variable name

I need to write code that parses some language. I got stuck on parsing variable names: a name can be anything that is at least 1 char long, starts with a lowercase letter and can contain the underscore '_' character. I think I made a good start with the following code:
identToken :: Parser String
identToken = do
    c <- letter
    cs <- letdigs
    return (c:cs)
  where letter = satisfy isLetter
        letdigs = munch isLetter +++ munch isDigit +++ munch underscore
        num = satisfy isDigit
        underscore = \x -> x == '_'
        lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do
    _ <- skipSpaces
    s <- identToken
    skipSpaces; return $ s

idents :: Parser Command
idents = do
    skipSpaces; ids <- many1 ident
    ...
This function, however, gives me weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
    case readP_to_S prog p of
        [(j, "")] -> Right j
        [] -> Left InvalidParse
        multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
        result <- many ident
        eof
        return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser is just synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a

type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ is symmetric choice: both alternatives are tried and every successful parse is kept, which is where the ambiguous results come from. <++ is left-biased, so only the left-most successful option is used -> this would remove the ambiguity in the parse, but still leaves the next problem.
Your letdigs parser munched only letters, or only digits, or only underscores, never a mixture. Digits after underscores failed, for example. The parser had to be modified to munch characters that were either letters, digits or underscores.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.
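The difference between the two choice operators can be seen on a tiny example. This is only a sketch; the names symmetric and leftBiased are invented here, and the parsers are deliberately smaller than the ones in the question.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice: results from BOTH alternatives are kept.
symmetric :: ReadP String
symmetric = munch isLetter +++ munch isDigit

-- Left-biased choice: the right side is only used if the left side
-- produces no result at all.
leftBiased :: ReadP String
leftBiased = munch isLetter <++ munch isDigit
```

On the input "ab1", readP_to_S symmetric yields two results, ("ab","1") from the letters branch and ("","ab1") from the digits branch, because munch happily succeeds on zero characters. readP_to_S leftBiased yields only ("ab","1"). On "1ab", however, leftBiased returns ("","1ab"): the empty match of munch isLetter still counts as a success, which is why switching operators alone does not fix letdigs.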

Operating on parsed data with attoparsec

Background
I've written a logfile parser using attoparsec. All my smaller parsers succeed, as does the composed final parser. I've confirmed this with tests. But I'm stumbling over performing operations with the parsed stream.
What I've tried
I started by trying to pass the successfully parsed input to a function. But all it seems to get is Done (), which I presume means the logfile has been consumed by this point.
prepareStats :: Result Log -> IO ()
prepareStats r =
  case r of
    Fail _ _ _ -> putStrLn $ "Parsing failed"
    Done _ parsedLog -> putStrLn "Success" -- This now has a [LogEntry] array. Do something with it.

main :: IO ()
main = do
  [f] <- getArgs
  logFile <- B.readFile (f :: FilePath)
  let results = parseOnly parseLog logFile
  putStrLn "TBC"
What I'm trying to do
I want to accumulate some stats from the logfile as I consume the input. For example, I'm parsing response codes and I'd like to count how many 2** responses there were and how many 4/5** ones. I'm parsing the number of bytes each response returned as Ints, and I'd like to efficiently sum these (sounds like a foldl'?). I've defined a data type like this:
data Stats = Stats {
    successfulRequestsPerMinute :: Int
  , failingRequestsPerMinute :: Int
  , meanResponseTime :: Int
  , megabytesPerMinute :: Int
  } deriving Show
And I'd like to constantly update that as I parse the input. But the part of performing operations as I consume is where I got stuck. So far print is the only function I've successfully passed output to and it showed the parsing is succeeding by returning Done before printing the output.
My main parser(s) look like this:
parseLogEntry :: Parser LogEntry
parseLogEntry = do
  ip <- logItem
  _ <- char ' '
  logName <- logItem
  _ <- char ' '
  user <- logItem
  _ <- char ' '
  time <- datetimeLogItem
  _ <- char ' '
  firstLogLine <- quotedLogItem
  _ <- char ' '
  finalRequestStatus <- intLogItem
  _ <- char ' '
  responseSizeB <- intLogItem
  _ <- char ' '
  timeToResponse <- intLogItem
  return $ LogEntry ip logName user time firstLogLine finalRequestStatus responseSizeB timeToResponse

type Log = [LogEntry]

parseLog :: Parser Log
parseLog = many $ parseLogEntry <* endOfLine
Desired outcome
I want to pass each parsed line to a function that will update the above data type. Ideally I want this to be very memory efficient because it'll be operating on large files.
You have to make your unit of parsing a single log entry rather than a list of log entries.
It's not pretty, but here is an example of how to interleave parsing and processing:
(Depends on bytestring, attoparsec and mtl)
{-# LANGUAGE NoMonomorphismRestriction, FlexibleContexts #-}

import qualified Data.ByteString.Char8 as BS
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Attoparsec.ByteString.Char8 hiding (takeWhile)
import Data.Char
import Control.Monad.State.Strict

aWord :: Parser BS.ByteString
aWord = skipSpace >> A.takeWhile isAlphaNum

getNext :: MonadState [a] m => m (Maybe a)
getNext = do
  xs <- get
  case xs of
    [] -> return Nothing
    (y:ys) -> put ys >> return (Just y)

loop iresult =
  case iresult of
    Fail _ _ msg -> error $ "parse failed: " ++ msg
    Done x' aword -> do lift $ process aword; loop (parse aWord x')
    Partial _ -> do
      mx <- getNext
      case mx of
        Just y -> loop (feed iresult y)
        Nothing -> case feed iresult BS.empty of
          Fail _ _ msg -> error $ "parse failed: " ++ msg
          Done x' aword -> do lift $ process aword; return ()
          Partial _ -> error $ "partial returned" -- probably can't happen

process :: Show a => a -> IO ()
process w = putStrLn $ "got a word: " ++ show w

theWords = map BS.pack [ "this is a te", "st of the emergency ", "broadcasting sys", "tem"]

main = runStateT (loop (Partial (parse aWord))) theWords
Notes:
We parse one aWord at a time and call process after each word is recognized.
Use feed to feed the parser more input when it returns a Partial.
Feed the parser an empty string when there is no more input left.
When Done is returned, process the recognized word and continue with parse aWord.
getNext is just an example of a monadic function which gets the next unit of input. Replace it with your own version - i.e. something that reads the next line from a file.
Update
Here is a solution using parseWith as @dfeuer suggested (fromMaybe needs import Data.Maybe):
noMoreInput = fmap null get

loop2 x = do
  iresult <- parseWith (fmap (fromMaybe BS.empty) getNext) aWord x
  case iresult of
    Fail _ _ msg -> error $ "parse failed: " ++ msg
    Done x' aword -> do lift $ process aword
                        if BS.null x'
                          then do b <- noMoreInput
                                  if b then return ()
                                       else loop2 x'
                          else loop2 x'
    Partial _ -> error $ "huh???" -- this really can't happen

main2 = runStateT (loop2 BS.empty) theWords
If each log entry is exactly one line, here's a simpler solution:
do loglines <- fmap BS.lines $ BS.readFile "input-file.log"
   return $! foldl' go initialStats loglines
  where
    go stats logline =
      case parseOnly yourParser logline of
        Left e -> error $ "oops: " ++ e
        Right r -> let stats' = ... combine r with stats ...
                   in stats'
Basically you are just reading the file line-by-line and calling parseOnly on each line and accumulating the results.
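A self-contained sketch of that line-by-line fold, using only base: words and readMaybe stand in for parseOnly yourParser, and the two-field "status bytes" line format, the Tally record, and the names parseLine, step and summarize are all assumptions made for illustration, not the question's actual log format.

```haskell
import Data.List (foldl')
import Text.Read (readMaybe)

-- Running totals; strict fields keep the accumulator from building thunks.
data Tally = Tally
  { okCount    :: !Int  -- 2xx responses
  , errCount   :: !Int  -- 4xx and 5xx responses
  , totalBytes :: !Int  -- sum of response sizes
  } deriving (Eq, Show)

-- Hypothetical line format: "<status> <bytes>", e.g. "200 512".
parseLine :: String -> Maybe (Int, Int)
parseLine l = case words l of
  [s, b] -> (,) <$> readMaybe s <*> readMaybe b
  _      -> Nothing

-- Update the totals with one line; unparseable lines are skipped.
step :: Tally -> String -> Tally
step t l = case parseLine l of
  Nothing -> t
  Just (status, bytes)
    | status >= 200 && status < 300 ->
        t { okCount = okCount t + 1, totalBytes = totalBytes t + bytes }
    | status >= 400 ->
        t { errCount = errCount t + 1, totalBytes = totalBytes t + bytes }
    | otherwise ->
        t { totalBytes = totalBytes t + bytes }

-- Strict left fold over the lines of the input.
summarize :: String -> Tally
summarize = foldl' step (Tally 0 0 0) . lines
```

Skipping bad lines instead of calling error is a design choice made for this sketch; for a real log you may prefer to count or report them.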
This is properly done with a streaming library
main = do
  f:_ <- getArgs
  withFile f ReadMode $ \h -> do
    result <- foldStream $ streamProcess $ streamHandle h
    print result
  where
    streamHandle = undefined
    streamProcess = undefined
    foldStream = undefined
where the blanks can be filled by any streaming library, e.g.
import qualified Pipes.Prelude as P
import Pipes
import qualified Pipes.ByteString as PB
import Pipes.Group (folds)
import qualified Control.Foldl as L
import Control.Lens (view) -- or import Lens.Simple (view), or whatever
streamHandle = PB.fromHandle :: Handle -> Producer ByteString IO ()
in that case we might then divide the labor further thus:
streamProcess :: Producer ByteString m r -> Producer LogEntry m r
streamProcess p = streamLines p >-> lineParser
streamLines :: Producer ByteString m r -> Producer ByteString m r
streamLines p = L.purely fold L.list (view (Pipes.ByteString.lines p)) >-> P.map B.toStrict
lineParser :: Pipe ByteString LogEntry m r
lineParser = P.map (parseOnly line_parser) >-> P.concat -- concat removes lefts
(This is slightly laborious because pipes is sensibly persnickety about accumulating lines, and memory generally: we are just trying to get a producer of individual strict bytestring lines, and then to convert that into a producer of parsed lines, and then to throw out bad parses, if there are any. With io-streams or conduit, things will be basically the same, and that particular step will be easier.)
We are now in a position to fold over our Producer LogEntry IO (). This can be done explicitly using Pipes.Prelude.fold, which makes a strict left fold. Here we will just cop the structure from user5402:
foldStream str = P.fold go initial_stats id str
  where
    go stats_till_now new_entry = undefined
If you get used to the use of the foldl library and the application of a fold to a Producer with L.purely fold some_fold, then you can build Control.Foldl.Folds for your LogEntries out of components and slot in different requests as you please.
If you use pipes-attoparsec and include the newline bit in your parser, then you can just write
handleToLogEntries :: Handle -> Producer LogEntry IO ()
handleToLogEntries h = void $ parsed my_line_parser (fromHandle h) >-> P.concat
and get the Producer LogEntry IO () more directly. (This ultra-simple way of writing it will, however, stop at a bad parse; dividing on lines first will be faster than using attoparsec to recognize newlines.) This is very simple with io-streams too, you would write something like
import qualified System.IO.Streams as Streams

io :: Handle -> IO ()
io h = do
  bytes <- Streams.handleToInputStream h
  log_entries <- Streams.parserToInputStream my_line_parser bytes
  fold_result <- Streams.fold go initial_stats log_entries
  print fold_result
or to keep with the structure above:
  where
    streamHandle = Streams.handleToInputStream
    streamProcess io_bytes =
      io_bytes >>= Streams.parserToInputStream my_line_parser
    foldStream io_logentries =
      io_logentries >>= Streams.fold go initial_stats
Either way, my_line_parser should return a Maybe LogEntry and should recognize the newline.
