strange behavior reading a file

strange behavior reading a file - parsing

I am writing a program in Haskell
here it is the code
module Main
where
import IO
import Maybe
import Control.Monad.Reader
--il mio environment consiste in una lista di tuple/coppie chiave-valore
data Environment = Env {variables::[(String,String)]}deriving (Show)
fromEnvToPair :: Environment-> [(String,String)]
fromEnvToPair (Env e)= e
estrai' x d
|length x==0=[]
|otherwise=estrai x d
estrai (x:xs) d
| (x:xs)=="" =[]
| x== d=[]
| otherwise = x:(estrai xs d)
--estrae da una stringa tutti i caratteri saino a d
conta' x d n
| length x==0 = 0
|otherwise = conta x d n
conta (x:xs) d n
| x== d=n
| otherwise = (conta xs d (n+1))
primo (a,b,c)=a
secondo (a,b,c)=b
terzo (a,b,c)=c
estraifrom x d n
|n>=(length x) =[]
| x!!n==d = []
|otherwise = x!!n:(estraifrom x d (n+1))
readerContent :: Reader Environment Environment
readerContent =do
content <- ask
return ( content)
-- resolve a template into a string
resolve :: [Char]-> Reader Environment (String)
resolve key= do
varValue <- asks (lookupVar key)
return $ maybe "" id varValue
maketuple x =(k,v,l) where
k= (estrai' x ':')--usare estrai'
v=estraifrom x ';' (conta' x ':' 1)
l= (length k)+(length v)+2 --è l'offset dovuto al; e al :
makecontext x
| length x==0 = []
| (elem ':' x)&&(elem ';' x)==False = []
|otherwise= (k,v):makecontext (drop l x) where
t= maketuple x
k= primo t
v= secondo t
l= terzo t
doRead filename = do
bracket(openFile filename ReadMode) hClose(\h -> do
contents <- hGetContents h
return contents
let cont=makecontext contents
putStrLn (take 100 contents)
return (contents))
-- putStrLn (snd (cont!!1)))
-- putStrLn (take 100 contents))
-- estrae i caratteri di una stringa dall'inizio fino al carattere di controllo
-- aggiungere parametri to the environment
-- estrae i caratteri di una stringa dall'inizio fino al carattere di controllo
-- aggiungere parametri to the environment
-- lookup a variable from the environment
lookupVar :: [Char] -> Environment -> Maybe String
lookupVar name env = lookup name (variables env)
lookup' x t=[v| (k,v)<-t,k==x]
fromJust' :: Maybe a -> a
fromJust' (Just x) = x
fromJust' Nothing = error "fromJust: Nothing"
main = do
file<- doRead "context.txt"-- leggo il contesto
let env= Env( makecontext file) -- lo converto in Environment
let c1= fromEnvToPair(runReader readerContent env)
putStrLn(fromJust'(lookupVar "user" env))
--putStrLn ((lookup' "user" (fromEnvToPair env))!!0)-- read the environment
--putStrLn ("user"++ (fst (c1!!1)))
putStrLn ("finito")
--putStrLn("contesto" ++ (snd(context!!1)))
What I want to do is reading a file formating the content and puting it in Environment, well it read the file and does all the other stuff only if in doRead there is the line
putStrLn (take 100 contents)
otherwise I can not take anithing, somebody knows why?
I do not want to leave that line if I do not know why
thanks in advance
thanks in advance

Using one of the Haskell parser libraries can make this kind of thing much less painful and error-prone. Here's an example of how to do this with Attoparsec:
module Main where
import Control.Applicative
import qualified Data.Map as M
import Data.Attoparsec (maybeResult)
import qualified Data.Attoparsec.Char8 as A
import qualified Data.ByteString.Char8 as B
type Environment = M.Map String String
spaces = A.many $ A.char ' '
upTo delimiter = B.unpack <$> A.takeWhile (A.notInClass $ delimiter : " ")
<* (spaces >> A.char delimiter >> spaces)
entry = (,) <$> upTo ':' <*> upTo ';'
environment :: A.Parser Environment
environment = M.fromList <$> A.sepBy entry A.endOfLine
parseEnvironment :: B.ByteString -> Maybe Environment
parseEnvironment = maybeResult . flip A.feed B.empty . A.parse environment
If we have a file context.txt:
user: somebody;
home: somewhere;
x: 1;
y: 2;
z: 3;
We can test the parser as follows:
*Main> Just env <- parseEnvironment <$> B.readFile "context.txt"
*Main> print $ M.lookup "user" env
Just "somebody"
*Main> print env
fromList [("home","somewhere"),("user","somebody"),("x","1"),("y","2"),("z","3")]
Note that I'm using a Map to represent the environment, as camcann suggested in a comment on your previous Reader monad question.

I think the problem is laziness. The file handle is closed before that content is actually read. By taking and printing some of the content before closing the handle, you force it to load it before returning/closing the handle.
I suggest using the readFile function from System.IO.Strict. It loads the content stricly (non-lazy), and it also saves some pains working with file handles. You simply replace the call to doRead with readFile, as it has the same type signature.

Related

Stack overflow with two functions calling each other in Applicative parser

I'm doing data61's course: https://github.com/data61/fp-course. In the parser one, the following implementation will cause parse (list1 (character *> valueParser 'v')) "abc" stack overflow.
Existing code:
data List t =
Nil
| t :. List t
deriving (Eq, Ord)
-- Right-associative
infixr 5 :.
type Input = Chars
data ParseResult a =
UnexpectedEof
| ExpectedEof Input
| UnexpectedChar Char
| UnexpectedString Chars
| Result Input a
deriving Eq
instance Show a => Show (ParseResult a) where
show UnexpectedEof =
"Unexpected end of stream"
show (ExpectedEof i) =
stringconcat ["Expected end of stream, but got >", show i, "<"]
show (UnexpectedChar c) =
stringconcat ["Unexpected character: ", show [c]]
show (UnexpectedString s) =
stringconcat ["Unexpected string: ", show s]
show (Result i a) =
stringconcat ["Result >", hlist i, "< ", show a]
instance Functor ParseResult where
_ <$> UnexpectedEof =
UnexpectedEof
_ <$> ExpectedEof i =
ExpectedEof i
_ <$> UnexpectedChar c =
UnexpectedChar c
_ <$> UnexpectedString s =
UnexpectedString s
f <$> Result i a =
Result i (f a)
-- Function to determine is a parse result is an error.
isErrorResult ::
ParseResult a
-> Bool
isErrorResult (Result _ _) =
False
isErrorResult UnexpectedEof =
True
isErrorResult (ExpectedEof _) =
True
isErrorResult (UnexpectedChar _) =
True
isErrorResult (UnexpectedString _) =
True
-- | Runs the given function on a successful parse result. Otherwise return the same failing parse result.
onResult ::
ParseResult a
-> (Input -> a -> ParseResult b)
-> ParseResult b
onResult UnexpectedEof _ =
UnexpectedEof
onResult (ExpectedEof i) _ =
ExpectedEof i
onResult (UnexpectedChar c) _ =
UnexpectedChar c
onResult (UnexpectedString s) _ =
UnexpectedString s
onResult (Result i a) k =
k i a
data Parser a = P (Input -> ParseResult a)
parse ::
Parser a
-> Input
-> ParseResult a
parse (P p) =
p
-- | Produces a parser that always fails with #UnexpectedChar# using the given character.
unexpectedCharParser ::
Char
-> Parser a
unexpectedCharParser c =
P (\_ -> UnexpectedChar c)
--- | Return a parser that always returns the given parse result.
---
--- >>> isErrorResult (parse (constantParser UnexpectedEof) "abc")
--- True
constantParser ::
ParseResult a
-> Parser a
constantParser =
P . const
-- | Return a parser that succeeds with a character off the input or fails with an error if the input is empty.
--
-- >>> parse character "abc"
-- Result >bc< 'a'
--
-- >>> isErrorResult (parse character "")
-- True
character ::
Parser Char
character = P p
where p Nil = UnexpectedString Nil
p (a :. as) = Result as a
-- | Parsers can map.
-- Write a Functor instance for a #Parser#.
--
-- >>> parse (toUpper <$> character) "amz"
-- Result >mz< 'A'
instance Functor Parser where
(<$>) ::
(a -> b)
-> Parser a
-> Parser b
f <$> P p = P p'
where p' input = f <$> p input
-- | Return a parser that always succeeds with the given value and consumes no input.
--
-- >>> parse (valueParser 3) "abc"
-- Result >abc< 3
valueParser ::
a
-> Parser a
valueParser a = P p
where p input = Result input a
-- | Return a parser that tries the first parser for a successful value.
--
-- * If the first parser succeeds then use this parser.
--
-- * If the first parser fails, try the second parser.
--
-- >>> parse (character ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (character ||| valueParser 'v') "abc"
-- Result >bc< 'a'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') "abc"
-- Result >abc< 'v'
(|||) ::
Parser a
-> Parser a
-> Parser a
P a ||| P b = P c
where c input
| isErrorResult resultA = b input
| otherwise = resultA
where resultA = a input
infixl 3 |||
My code:
instance Monad Parser where
(=<<) ::
(a -> Parser b)
-> Parser a
-> Parser b
f =<< P a = P p
where p input = onResult (a input) (\i r -> parse (f r) i)
instance Applicative Parser where
(<*>) ::
Parser (a -> b)
-> Parser a
-> Parser b
P f <*> P a = P b
where b input = onResult (f input) (\i f' -> f' <$> a i)
list ::
Parser a
-> Parser (List a)
list p = list1 p ||| pure Nil
list1 ::
Parser a
-> Parser (List a)
list1 p = (:.) <$> p <*> list p
However, if I change list to not use list1, or use =<< in list1, it works fine. It also works if <*> uses =<<. I feel like it might be an issue with tail recursion.
UPDATE:
If I use lazy pattern matching here
P f <*> ~(P a) = P b
where b input = onResult (f input) (\i f' -> f' <$> a i)
It works fine. Pattern matching here is the problem. I don't understand this... Please help!

If I use lazy pattern matching P f <*> ~(P a) = ... then it works fine. Why?
This very issue was discussed recently. You could also fix it by using newtype instead of data: newtype Parser a = P (Input -> ParseResult a).(*)
The definition of list1 wants to know both parser arguments to <*>, but actually when the first will fail (when input is exhausted) we don't need to know the second! But since we force it, it will force its second argument, and that one will force its second parser, ad infinitum.(**) That is, p will fail when input is exhausted, but we have list1 p = (:.) <$> p <*> list p which forces list p even though it won't run when the preceding p fails. That's the reason for the infinite looping, and why your fix with the lazy pattern works.
What is the difference between data and newtype in terms of laziness?
(*)newtype'd type always has only one data constructor, and pattern matching on it does not actually force the value, so it is implicitly like a lazy pattern. Try newtype P = P Int, let foo (P i) = 42 in foo undefined and see that it works.
(**) This happens when the parser is still prepared, composed; before the combined, composed parser even gets to run on the actual input. This means there's yet another, third way to fix the problem: define
list1 p = (:.) <$> p <*> P (\s -> parse (list p) s)
This should work regardless of the laziness of <*> and whether data or newtype was used.
Intriguingly, the above definition means that the parser will be actually created during run time, depending on the input, which is the defining characteristic of Monad, not Applicative which is supposed to be known statically, in advance. But the difference here is that the Applicative depends on the hidden state of input, and not on the "returned" value.

Indentation-aware parsing of expression trees

As a follow-up to this question, I am now trying to parse an expression language that has variables and case ... of ... expressions. The syntax should be indentation-based:
Expressions can span multiple lines, as long as every line is indented relative to the first one; i.e. this should be parsed as a single application:
f x y
z
q
Each alternative of a case expression needs to be on its own line, indented relative to the case keyword. Right-hand sides can span multiple lines.
case E of
C -> x
D -> f x
y
should be parsed into a single case with two alternatives, with x and f x y as the right-hand sides
I've simplified my code into the following:
import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative
data Term = Var String
| App [Term]
| Case Term [(String, Term)]
deriving Show
space :: Parser ()
space = L.space (void spaceChar) empty empty
name :: Parser String
name = try $ do
s <- some letterChar
if s `elem` ["case", "of"]
then fail $ unwords ["Unexpected: reserved word", show s]
else return s
term :: Parser () -> Parser Term
term sp = App <$> atom `sepBy1` try sp
where
atom = choice [ caseBlock
, Var <$> L.lexeme sp name
]
caseBlock = L.lineFold sp $ \sp' ->
Case <$>
(L.symbol sp "case" *> L.lexeme sp (term sp) <* L.symbol sp' "of") <*>
alt sp' `sepBy` try sp' <* sp
alt sp' = L.lineFold sp' $ \sp'' ->
(,) <$> L.lexeme sp' name <* L.symbol sp' "->" <*> term sp''
As you can see, I am trying to use the technique from this answer to separate alternatives with sp'aces that are more indented than the case keyword.
Problems
This seems to work for single expressions made up of application only:
λ» parseTest (L.lineFold space term) "x y\n z"
App [Var "x",Var "y",Var "z"]
It doesn't work for list of such expressions using the technique from the linked answer:
λ» parseTest (L.lineFold space $ \sp -> (term sp `sepBy` try sp)) "x\n y\nz"
3:1:
incorrect indentation (got 1, should be greater than 1)
case expressions fail out of the gate when trying to use line-folding:
λ» parseTest (L.lineFold space term) "case x of\n C -> y\n D -> z"
1:5:
Unexpected: reserved word "case"
case works without line folding for the outermost expression, for one alternative only:
λ» parseTest (term space) "case x of\n C -> y\n z"
App [Case (App [Var "x"]) [("C",App [Var "y",Var "z"])]]
But case fails as soon as I have multiple alternatives:
λ» parseTest (term space) "case x of\n C -> y\n D -> z"
3:2:
incorrect indentation (got 2, should be greater than 2)
What am I doing wrong?

I'm answering since I promised to take a look at this. This problem represents a rather difficult problem for Parsec-like parsers in their current state. I probably could make it work after spending much more time that I have available, but in the slot of time I can spend answering this, I only got this far:
module Main (main) where
import Control.Applicative
import Control.Monad (void)
import Text.Megaparsec
import Text.Megaparsec.String
import qualified Data.List.NonEmpty as NE
import qualified Text.Megaparsec.Lexer as L
data Term = Var String
| App [Term]
| Case Term [(String, Term)]
deriving Show
scn :: Parser ()
scn = L.space (void spaceChar) empty empty
sc :: Parser ()
sc = L.space (void $ oneOf " \t") empty empty
name :: Parser String
name = try $ do
s <- some letterChar
if s `elem` ["case", "of"]
then (unexpected . Label . NE.fromList) ("reserved word \"" ++ s ++ "\"")
else return s
manyTerms :: Parser [Term]
manyTerms = many pTerm
pTerm :: Parser Term
pTerm = caseBlock <|> app -- parse a term first
caseBlock :: Parser Term
caseBlock = L.indentBlock scn $ do
void (L.symbol sc "case")
t <- Var <$> L.lexeme sc name -- not sure what sort of syntax case of
-- case expressions should have, so simplified to vars for now
void (L.symbol sc "of")
return (L.IndentSome Nothing (return . Case t) alt)
alt :: Parser (String, Term)
alt = L.lineFold scn $ \sc' ->
(,) <$> L.lexeme sc' name <* L.symbol sc' "->" <*> pTerm -- (1)
app :: Parser Term
app = L.lineFold scn $ \sc' ->
App <$> ((Var <$> name) `sepBy1` try sc' <* scn)
-- simplified here, with some effort should be possible to go from Var to
-- more general Term in applications
Your original grammar is left-recursive because every term can be either a case expression or an application and if it's an application, then the first part of it again can be either case expression or application, etc. You'll need to deal with that somehow.
Here is a session:
λ> parseTest pTerm "x y\n z"
App [Var "x",Var "y",Var "z"]
λ> parseTest pTerm "x\n y\nz"
App [Var "x",Var "y"]
λ> parseTest manyTerms "x\n y\nz"
[App [Var "x",Var "y"],App [Var "z"]]
λ> parseTest pTerm "case x of\n C -> y\n D -> z"
Case (Var "x") [("C",App [Var "y"]),("D",App [Var "z"])]
λ> parseTest pTerm "case x of\n C -> y\n z"
3:3:
incorrect indentation (got 3, should be equal to 2)
This last result is because of (1) in the code. Introducing a parameter to app makes it impossible to use it without thinking of context (it would be no longer stand-alone expression, but factored-out part of something). We can see that if you indent z with respect to start of y application, not the entire alternative, it works:
λ> parseTest pTerm "case x of\n C -> y\n z"
Case (Var "x") [("C",App [Var "y",Var "z"])]
Finally, case expression works:
λ> parseTest pTerm "case x of\n C -> y\n D -> z"
Case (Var "x") [("C",App [Var "y"]),("D",App [Var "z"])]
My advice here would be to take a look at some pre-processor and use Megaparsec on top of that. The tools in Text.Megaparsec.Lexer are not that easy to apply in this case, but they are the best we could come up with and they work fine for simple indentation-sensitive grammars.

Operating on parsed data with attoparsec

Background
I've written a logfile parser using attoparsec. All my smaller parsers succeed, as does the composed final parser. I've confirmed this with tests. But I'm stumbling over performing operations with the parsed stream.
What I've tried
I started by trying to pass the successfully parsed input to a function. But all the seems to get is Done (), which I'm presuming means the logfile has been consumed by this point.
prepareStats :: Result Log -> IO ()
prepareStats r =
case r of
Fail _ _ _ -> putStrLn $ "Parsing failed"
Done _ parsedLog -> putStrLn "Success" -- This now has a [LogEntry] array. Do something with it.
main :: IO ()
main = do
[f] <- getArgs
logFile <- B.readFile (f :: FilePath)
let results = parseOnly parseLog logFile
putStrLn "TBC"
What I'm trying to do
I want to accumulate some stats from the logfile as I consume the input. For example, I'm parsing response codes and I'd like to count how many 2** responses there were and how many 4/5** ones. I'm parsing the number of bytes each response returned as Ints, and I'd like to efficiently sum these (sounds like a foldl'?). I've defined a data type like this:
data Stats = Stats {
successfulRequestsPerMinute :: Int
, failingRequestsPerMinute :: Int
, meanResponseTime :: Int
, megabytesPerMinute :: Int
} deriving Show
And I'd like to constantly update that as I parse the input. But the part of performing operations as I consume is where I got stuck. So far print is the only function I've successfully passed output to and it showed the parsing is succeeding by returning Done before printing the output.
My main parser(s) look like this:
parseLogEntry :: Parser LogEntry
parseLogEntry = do
ip <- logItem
_ <- char ' '
logName <- logItem
_ <- char ' '
user <- logItem
_ <- char ' '
time <- datetimeLogItem
_ <- char ' '
firstLogLine <- quotedLogItem
_ <- char ' '
finalRequestStatus <- intLogItem
_ <- char ' '
responseSizeB <- intLogItem
_ <- char ' '
timeToResponse <- intLogItem
return $ LogEntry ip logName user time firstLogLine finalRequestStatus responseSizeB timeToResponse
type Log = [LogEntry]
parseLog :: Parser Log
parseLog = many $ parseLogEntry <* endOfLine
Desired outcome
I want to pass each parsed line to a function that will update the above data type. Ideally I want this to be very memory efficient because it'll be operating on large files.

You have to make your unit of parsing a single log entry rather than a list of log entries.
It's not pretty, but here is an example of how to interleave parsing and processing:
(Depends on bytestring, attoparsec and mtl)
{-# LANGUAGE NoMonomorphismRestriction, FlexibleContexts #-}
import qualified Data.ByteString.Char8 as BS
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Attoparsec.ByteString.Char8 hiding (takeWhile)
import Data.Char
import Control.Monad.State.Strict
aWord :: Parser BS.ByteString
aWord = skipSpace >> A.takeWhile isAlphaNum
getNext :: MonadState [a] m => m (Maybe a)
getNext = do
xs <- get
case xs of
[] -> return Nothing
(y:ys) -> put ys >> return (Just y)
loop iresult =
case iresult of
Fail _ _ msg -> error $ "parse failed: " ++ msg
Done x' aword -> do lift $ process aword; loop (parse aWord x')
Partial _ -> do
mx <- getNext
case mx of
Just y -> loop (feed iresult y)
Nothing -> case feed iresult BS.empty of
Fail _ _ msg -> error $ "parse failed: " ++ msg
Done x' aword -> do lift $ process aword; return ()
Partial _ -> error $ "partial returned" -- probably can't happen
process :: Show a => a -> IO ()
process w = putStrLn $ "got a word: " ++ show w
theWords = map BS.pack [ "this is a te", "st of the emergency ", "broadcasting sys", "tem"]
main = runStateT (loop (Partial (parse aWord))) theWords
Notes:
We parse a aWord at a time and call process after each word is recognized.
Use feed to feed the parser more input when it returns a Partial.
Feed the parser an empty string when there is no more input left.
When Done is return, process the recognized word and continue with parse aWord.
getNext is just an example of a monadic function which gets the next unit of input. Replace it with your own version - i.e. something that reads the next line from a file.
Update
Here is a solution using parseWith as #dfeuer suggested:
noMoreInput = fmap null get
loop2 x = do
iresult <- parseWith (fmap (fromMaybe BS.empty) getNext) aWord x
case iresult of
Fail _ _ msg -> error $ "parse failed: " ++ msg
Done x' aword -> do lift $ process aword;
if BS.null x'
then do b <- noMoreInput
if b then return ()
else loop2 x'
else loop2 x'
Partial _ -> error $ "huh???" -- this really can't happen
main2 = runStateT (loop2 BS.empty) theWords

If each log entry is exactly one line, here's a simpler solution:
do loglines <- fmap BS.lines $ BS.readfile "input-file.log"
foldl' go initialStats loglines
where
go stats logline =
case parseOnly yourParser logline of
Left e -> error $ "oops: " ++ e
Right r -> let stats' = ... combine r with stats ...
in stats'
Basically you are just reading the file line-by-line and calling parseOnly on each line and accumulating the results.

This is properly done with a streaming library
main = do
f:_ <- getArgs
withFile f ReadMode $ \h -> do
result <- foldStream $ streamProcess $ streamHandle h
print result
where
streamHandle = undefined
streamProcess = undefined
foldStream = undefined
where the blanks can be filled by any streaming library, e.g.
import qualified Pipes.Prelude as P
import Pipes
import qualified Pipes.ByteString as PB
import Pipes.Group (folds)
import qualified Control.Foldl as L
import Control.Lens (view) -- or import Lens.Simple (view), or whatever
streamHandle = Pipes.ByteStream.fromHandle :: Handle -> Producer ByteString IO ()
in that case we might then divide the labor further thus:
streamProcess :: Producer ByteString m r -> Producer LogEntry m r
streamProcess p = streamLines p >-> lineParser
streamLines :: Producer ByteString m r -> Producer ByteString m r
streamLines p = L.purely fold L.list (view (Pipes.ByteString.lines p)) >-> P.map B.toStrict
lineParser :: Pipe ByteString LogEntry m r
lineParser = P.map (parseOnly line_parser) >-> P.concat -- concat removes lefts
(This is slightly laborious because pipes is sensible persnickety about accumulating lines, and memory generally: we are just trying to get a producer of individual strict bytestring lines, and then to convert that into a producer of parsed lines, and then to throw out bad parses, if there are any. With io-streams or conduit, things will be basically the same, and that particular step will be easier.)
We are now in a position to fold over our Producer LogEntry IO (). This can be done explicitly using Pipes.Prelude.fold, which makes a strict left fold. Here we will just cop the structure from user5402
foldStream str = P.fold go initial_stats id
where
go stats_till_now new_entry = undefined
If you get used to the use of the foldl library and the application of a fold to a Producer with L.purely fold some_fold, then you can build Control.Foldl.Folds for your LogEntries out of components and slot in different requests as you please.
If you use pipes-attoparsec and include the newline bit in your parser, then you can just write
handleToLogEntries :: Handle -> Producer LogEntry IO ()
handleToLogEntries h = void $ parsed my_line_parser (fromHandle h) >-> P.concat
and get the Producer LogEntry IO () more directly. (This ultra-simple way of writing it will, however, stop at a bad parse; dividing on lines first will be faster than using attoparsec to recognize newlines.) This is very simple with io-streams too, you would write something like
import qualified System.IO.Streams as Streams
io :: Handle -> IO ()
io h = do
bytes <- Streams.handleToInputStream h
log_entries <- Streams.parserToInputStream my_line_parser bytes
fold_result <- Stream.fold go initial_stats log_entries
print fold_result
or to keep with the structure above:
where
streamHandle = Streams.handleToInputStream
streamProcess io_bytes =
io_bytes >>= Streams.parserToInputStream my_line_parser
foldStream io_logentries =
log_entries >>= Stream.fold go initial_stats
Either way, my_line_parser should return a Maybe LogEntry and should recognize the newline.

Unplanned greedy behaviour in uu-parsinglib

The problem
I came across a problem today and I do not know how to solve it. It is very strange to me, because the code I've written should (according to my current knowledge) is correct.
So below you can find a sample parser combinators. The most important one is pOperator, which in very simple way (only for demonstration purposes) builds an operator AST.
It consumes "x" and can consume multiple "x" separated by whitespaces.
I've got also pParens combinator which is defined like:
pPacked pParenL (pWSpaces *> pParenR)
so it consumes Whitespaces before closing bracket.
Sample input / output
The correct input/output SHOULD be:
in: "(x)"
out: Single "x"
in: "(x )"
out: Single "x"
but I'm getting:
in: "(x)"
out: Single "x"
in: "(x )"
out: Multi (Single "x") (Single "x")
-- Correcting steps:
-- Inserted 'x' at position LineColPos 0 3 3 expecting one of ['\t', ' ', 'x']
but in the second example I'm getting error - and the parser behaves like it greedy eats some tokens (and there is no greedy operation).
I would be thankful for any help with it.
Sample code
import Prelude hiding(lex)
import Data.Char hiding (Space)
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import qualified Text.ParserCombinators.UU.Utils as Utils
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
data El = Multi El El
| Single String
deriving (Show)
---------- Example core grammar ----------
pElement = Single <$> pSyms "x"
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
---------- Basic combinators ----------
applyAll x (f:fs) = applyAll (f x) fs
applyAll x [] = x
pSpace = pSym ' '
pTab = pSym '\t'
pWSpace = pSpace <|> pTab
pWSpaces = pMany pWSpace
pWSpaces1 = pMany1 pWSpace
pMany1 p = (:) <$> p <*> pMany p
pSyms [] = pReturn []
pSyms (x : xs) = (:) <$> pSym x <*> pSyms xs
pParenL = Utils.lexeme $ pSym '('
pParenR = Utils.lexeme $ pSym ')'
pParens = pPacked pParenL (pWSpaces *> pParenR)
---------- Program ----------
pProgram = pParens pOperator
-- if you replace it with following line, it works:
-- pProgram = pParens pElement
-- so it seems like something in pOperator is greedy
tests = [ ("test", "(x)")
, ("test", "(x )")
]
---------- Helpers ----------
type Parser a = P (Str Char String LineColPos) a
parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)
main :: IO ()
main = do
mapM_ (\(desc, p) -> putStrLn ("\n=== " ++ desc ++ " ===") >> run pProgram p) tests
return ()
run :: Show t => Parser t -> String -> IO ()
run p inp = do let (a, errors) = parse p inp
putStrLn ("-- Result: \n" ++ show a)
if null errors then return ()
else do putStr ("-- Correcting steps: \n")
show_errors errors
putStrLn "-- "
where show_errors :: (Show a) => [a] -> IO ()
show_errors = sequence_ . (map (putStrLn . show))
IMPORTANT
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
is equivalent to:
foldr pChainl pElement (Multi <$ pWSpaces1)
according to: Combinator Parsing: A Short Tutorial
And it is used to define operator precedense.

The definition of pMany reads:
pMany :: IsParser p => p a -> p [a]
pMany p = pList p
and this suggest the solution. When seeing the space we should not commit immediately to the choice to continue with more x-es so we define:
pMany :: IsParser p => p a -> p [a]
pMany_ng p = pList_ng p
Of course you may also call pList_ng immediately. Even better would be to write:
pParens (pChainr_ng (pMulti <$ pWSpaces1) px) --
I did not test it since I am not sure whether between x-es there should be at least one space etc.
Doaitse

megaparsec reports incorrect location on parse error

For this project I'm parsing in two stages. The first stage handles include/ifdef/define directives and chunks the input up into [Span] items which define their start/end points in the original inputs along with the body text. This stream is then parsed by the second stage into my AST for subsequent processing.
Each element of the AST carries it's source position and any semantic error caught after parsing prints the correct error position regardless of include depth. This part is crucial since it comes after the stage that has the problem.
The problem is given a parse error in the second stage from an included file it reports a bogus error with a location at the top level rule in the input. A parse error in the initial file works fine. The presence of any directives will divide even the initial file into multiple chunks so it's not a 'single chunk' vs. 'multiple chunks' issue.
Given the fact that the AST is getting the locations correct I'm stumped as to how Megaparsec is reporting bad info when parse errors are encountered.
I'm included my stream instance and (set|get)(Position|Input) code since these seem like the relevant bits. i feel like there must be some bit of megaparsec housekeeping that I'm not doing or that my Stream instance is invalid for some reason.
data Span = Span
{ spanStart :: SourcePos
, spanEnd :: SourcePos
, spanBody :: T.Text
} deriving (Eq, Ord, Show)
instance Stream [Span] where
type Token [Span] = Span
type Tokens [Span] = [Span]
tokenToChunk Proxy = pure
tokensToChunk Proxy = id
chunkToTokens Proxy = id
chunkLength Proxy = foldl1 (+) . map (T.length . spanBody)
chunkEmpty Proxy = all ((== 0) . T.length . spanBody)
positionAt1 Proxy pos (Span start _ _) = trace ("pos1" ++ show start) start
positionAtN Proxy pos [] = pos
positionAtN Proxy _ (Span start _ _:_) = trace ("posN" ++ show start) start
advance1 Proxy _ _ (Span _ end _) = end
advanceN Proxy _ pos [] = pos
advanceN Proxy _ _ ts = let Span _ end _ = last ts in end
take1_ [] = Nothing
take1_ s = case takeN_ 1 s of
Nothing -> Nothing
Just (sp, s') -> Just (head sp, s')
takeN_ _ [] = Nothing
takeN_ n s#(t:ts)
| s == [] = Nothing
| n <= 0 = Just ([t {spanEnd = spanStart t, spanBody = ""}], s)
| n < (T.length . spanBody) t = let (l, r) = T.splitAt n (spanBody t)
sL = spanStart t
eL = foldl (defaultAdvance1 (mkPos 3)) sL (T.unpack (T.tail l))
sR = defaultAdvance1 (mkPos 3) eL (T.last l)
eR = spanEnd t
l' = [Span sL eL l]
r' = (Span sR eR r):ts
in Just (trace (show n) l', r')
| n == (T.length . spanBody) t = Just ([t], ts)
| otherwise = case takeN_ (n - T.length (spanBody t)) ts of
Nothing -> Just ([t], [])
Just (t', ts') -> Just (t:t', ts')
takeWhile_ p s = fromJust $ takeN_ (go 0 s) s
where go n s = case take1_ s of
Nothing -> n
Just (c, s') -> if p c
then go (n + 1) s'
else n
Find include and swap to it:
"include" -> do
file <- between dquote dquote (many (alphaNumChar <|> char '.' <|> char '/' <|> char '_'))
s <- liftIO (Data.Text.IO.readFile file)
p <- getPosition
i <- getInput
pushPosition p
stack %= (:) (p, i)
setPosition (initialPos file)
setInput s
And if we reach the end of input pop stack and continue:
parseStream' :: StreamParser [Span]
parseStream' = concat <$> many p
where p = do
b <- tick <|> block
end <- option False (True <$ hidden eof)
h <- use stack
when (end && (h /= [])) $ do
popPosition
setInput (h ^?! ix 0 . _2)
stack %= tail
return b

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

strange behavior reading a file - parsing

Related

Stack overflow with two functions calling each other in Applicative parser

Indentation-aware parsing of expression trees

Operating on parsed data with attoparsec

Unplanned greedy behaviour in uu-parsinglib

megaparsec reports incorrect location on parse error

Categories

Resources