Parsing columns of data with parsec

I'm writing a parser to scan columns of numbers, like this:
T LIST2 LIST3 LIST4
1 235 623 684
2 871 699 557
3 918 686 49
4 53 564 906
5 246 344 501
6 929 138 474
The first line contains the names of the lists, and I would like my program to parse exactly as many values per row as there are titles (to reject tables whose rows do not match the number of titles).
I wrote this program:
title = do
    tit <- many1 alphaNum
    return tit
digits = do
    dig <- many1 digit
    return dig
parseSeries = do
    spaces
    titles <- title `sepBy` spaces
    let nb = length titles
    dat <- endBy (count (nb-1) (digits `sepBy` spaces)) endOfLine
    spaces
    return (titles,concat dat)
main = do
    fichier <- readFile ("test_list3.txt")
    putStrLn $ fichier
    case parse parseSeries "(stdin)" fichier of
        Left error -> do putStrLn "!!! Error !!!"
                         print error
        Right (tit,resu) -> do
            mapM_ putStrLn tit
            mapM_ putStrLn (concat resu)
But when I try to parse a file with this kind of data, I get the following error:
!!! Error !!!
"(stdin)" (line 26, column 1):
unexpected end of input
expecting space or letter or digit
I'm new to parsing and I don't understand why it fails.
Do you have any idea what is wrong with my parser?

Your program is doing something different than what you expect. The key part is right here:
parseSeries = do
    spaces
    titles <- title `sepBy` spaces
    let nb = length titles
    -- The following is the incorrect part
    dat <- endBy (count (nb-1) (digits `sepBy` spaces)) endOfLine
    spaces
    return (titles,concat dat)
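I believe what you actually wanted was something like the following. Note that spaces also skips newlines, and alphaNum also matches digits, so title `sepBy` spaces would read the numeric rows as titles too; the version below separates cells with blanks only and ends each row with an explicit newline:
parseSeries = do
    spaces
    titles <- title `sepBy1` blanks
    newline
    let nb = length titles
    let parseRow = do
          column <- digits
          columns <- count (nb - 1) (blanks *> digits)
          newline
          return (column:columns)
    dat <- many parseRow
    return (titles, dat)
  where
    -- one or more spaces or tabs, but never a newline
    blanks = many1 (oneOf " \t")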
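A minimal, self-contained sketch of the row-by-row approach follows. It is one possible arrangement rather than the only one: the names hspaces1, headerRow, dataRow, table and parseTable are made up for this example, cells are assumed to be separated by spaces or tabs, and every row (including the last) is assumed to end with a newline.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Spaces or tabs only; unlike 'spaces', this never crosses a newline.
hspaces1 :: Parser ()
hspaces1 = skipMany1 (oneOf " \t")

-- A non-negative integer cell.
number :: Parser Int
number = read <$> many1 digit

-- Header row: column names separated by blanks, ended by a newline.
headerRow :: Parser [String]
headerRow = (many1 alphaNum `sepBy1` hspaces1) <* endOfLine

-- Data row with exactly n numeric cells, ended by a newline.
dataRow :: Int -> Parser [Int]
dataRow n = do
  first <- number
  rest  <- count (n - 1) (hspaces1 *> number)
  _     <- endOfLine
  return (first : rest)

-- Whole table: the header fixes the number of cells every row must have.
table :: Parser ([String], [[Int]])
table = do
  titles <- headerRow
  rows   <- many (dataRow (length titles))
  eof
  return (titles, rows)

parseTable :: String -> Either ParseError ([String], [[Int]])
parseTable = parse table "(input)"
```

Because the row parser is built from the number of titles, a row with too few or too many cells produces a parse error instead of being silently accepted.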

Related

Parsec fails without error if reading from file

I wrote a small parsec parser to read samples from a user supplied input string or an input file. It fails properly on wrong input with a useful error message if the input is provided as a semicolon separated string:
> readUncalC14String "test1,7444,37;6800,36;testA,testB,2000,222;test3,7750,40"
*** Exception: Error in parsing dates from string: (line 1, column 29):
unexpected "t"
expecting digit
But it fails silently for the input file inputFile.txt with identical entries:
test1,7444,37
6800,36
testA,testB,2000,222
test3,7750,40
> readUncalC14FromFile "inputFile.txt"
[UncalC14 "test1" 7444 37,UncalC14 "unknownSampleName" 6800 36]
Why is that and how can I make readUncalC14FromFile fail in a useful manner as well?
Here is a minimal subset of my code:
import qualified Text.Parsec as P
import qualified Text.Parsec.String as P
data UncalC14 = UncalC14 String Int Int deriving Show
readUncalC14FromFile :: FilePath -> IO [UncalC14]
readUncalC14FromFile uncalFile = do
    s <- readFile uncalFile
    case P.runParser uncalC14SepByNewline () "" s of
        Left err -> error $ "Error in parsing dates from file: " ++ show err
        Right x -> return x
  where
    uncalC14SepByNewline :: P.Parser [UncalC14]
    uncalC14SepByNewline = P.endBy parseOneUncalC14 (P.newline <* P.spaces)
readUncalC14String :: String -> Either String [UncalC14]
readUncalC14String s =
    case P.runParser uncalC14SepBySemicolon () "" s of
        Left err -> error $ "Error in parsing dates from string: " ++ show err
        Right x -> Right x
  where
    uncalC14SepBySemicolon :: P.Parser [UncalC14]
    uncalC14SepBySemicolon = P.sepBy parseOneUncalC14 (P.char ';' <* P.spaces)
parseOneUncalC14 :: P.Parser UncalC14
parseOneUncalC14 = do
    P.try long P.<|> short
  where
    long = do
        name <- P.many (P.noneOf ",")
        _ <- P.oneOf ","
        mean <- read <$> P.many1 P.digit
        _ <- P.oneOf ","
        std <- read <$> P.many1 P.digit
        return (UncalC14 name mean std)
    short = do
        mean <- read <$> P.many1 P.digit
        _ <- P.oneOf ","
        std <- read <$> P.many1 P.digit
        return (UncalC14 "unknownSampleName" mean std)

What is happening here is that a prefix of your input is a valid string. To force parsec to use the whole input you can use the eof parser:
uncalC14SepByNewline = P.endBy parseOneUncalC14 (P.newline <* P.spaces) <* P.eof
The reason that one works and the other doesn't is due to the difference between sepBy and endBy. Here is a simpler example:
sepTest, endTest :: String -> Either P.ParseError String
sepTest s = P.runParser (P.sepBy (P.char 'a') (P.char 'b')) () "" s
endTest s = P.runParser (P.endBy (P.char 'a') (P.char 'b')) () "" s
Here are some interesting examples:
ghci> sepTest "abababb"
Left (line 1, column 7):
unexpected "b"
expecting "a"
ghci> endTest "abababb"
Right "aaa"
ghci> sepTest "ababaa"
Right "aaa"
ghci> endTest "ababaa"
Left (line 1, column 6):
unexpected "a"
expecting "b"
As you can see, both sepBy and endBy can stop early without reporting an error: sepBy stops silently when the consumed prefix ends with the main parser a (the next separator b is missing), and endBy stops silently when the consumed prefix ends with the separator b (the next a is missing).
So you should use eof after both parsers if you want to make sure you read the whole file/string.
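As a small sketch of that advice, the two test parsers can be wrapped so that leftover input becomes an error. The helper name parseAll and the names sepAll and endAll are invented for this example; only eof, sepBy and endBy come from Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser and insist that it consumes all input.
parseAll :: Parser a -> String -> Either ParseError a
parseAll p = parse (p <* eof) ""

-- The same toy grammars as above, now followed by eof.
sepAll, endAll :: String -> Either ParseError String
sepAll = parseAll (char 'a' `sepBy` char 'b')
endAll = parseAll (char 'a' `endBy` char 'b')
```

With this wrapper, sepAll "ababaa" and endAll "abababb" both return a Left value, whereas the versions without eof quietly returned Right "aaa".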

How to parse integer with base prefix using parsec in haskell?

I'm trying to parse an input integer string in haskell using parsec. The string might either be in decimal, octal or hexadecimal. The base is specified by a prefix of #d, #o or #x for decimal, octal and hexadecimal respectively, which is then followed by the integer. If no prefix is specified, the base is assumed to be 10. Here's what I've done so far:
parseNumber = do x <- noPrefix <|> withPrefix
              return x
    where noPrefix = many1 digit
          withPrefix = do char '#'
                          prefix <- oneOf "dox"
                          return $ case prefix of
                              'd' -> many1 digit
                              'o' -> fmap (show . fst . head . readOct) (many1 octDigit)
                              'x' -> fmap (show . fst . head . readHex) (many1 hexDigit)
However, this isn't compiling and is failing with type errors. I don't quite really understand the type error and would just like help in general with this problem. Any alternative ways to solve it will also be appreciated.
Thank you for your time and help. :)
EDIT: Here's the error I've been getting.

In Megaparsec, a modern fork of Parsec, this problem is non-existent (from the documentation of hexadecimal):
Parse an integer in hexadecimal representation. Representation of
hexadecimal number is expected to be according to Haskell report except
for the fact that this parser doesn't parse “0x” or “0X” prefix. It is
responsibility of the programmer to parse correct prefix before parsing
the number itself.
For example you can make it conform to Haskell report like this:
hexadecimal = char '0' >> char' 'x' >> L.hexadecimal
So in your case you can just define the following (note how it's more readable). This example uses the prefixes o#, d# and h#; for the #o, #d and #x prefixes from the question, change the strings accordingly:
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void String
parseNumber :: Parser Integer
parseNumber = choice
    [ L.decimal
    , (string "o#" *> L.octal) <?> "octal integer"
    , (string "d#" *> L.decimal) <?> "decimal integer"
    , (string "h#" *> L.hexadecimal) <?> "hexadecimal integer" ]
Let's try the parser (note quality of error messages):
λ> parseTest' (parseNumber <* eof) ""
1:1:
  |
1 | <empty line>
  | ^
unexpected end of input
expecting decimal integer, hexadecimal integer, integer, or octal integer
λ> parseTest' (parseNumber <* eof) "d#3"
3
λ> parseTest' (parseNumber <* eof) "h#ff"
255
λ> parseTest' (parseNumber <* eof) "o#8"
1:3:
  |
1 | o#8
  |   ^
unexpected '8'
expecting octal integer
λ> parseTest' (parseNumber <* eof) "o#77"
63
λ> parseTest' (parseNumber <* eof) "190"
190
Full-disclosure: I'm the author/maintainer of Megaparsec.

You have two slight errors: an indentation error (return x must be aligned with the first statement of the do block), and the parsers in withPrefix must not be wrapped in return, since they already produce their results.
parseNumber = do x <- noPrefix <|> withPrefix
                 return x
  where noPrefix = many1 digit
        withPrefix = do char '#'
                        prefix <- oneOf "dox"
                        case prefix of
                          'd' -> many1 digit
                          'o' -> fmap (show . fst . head . readOct) (many1 octDigit)
                          'x' -> fmap (show . fst . head . readHex) (many1 hexDigit)
This should work, provided readOct and readHex are imported from Numeric.
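For comparison, here is a self-contained sketch of the same idea that returns an Integer instead of a String. It is only one way to arrange it: the helper names decimal and prefixed are made up for this example, and the use of head relies on many1 octDigit and many1 hexDigit only ever handing valid digits to readOct and readHex.

```haskell
import Numeric (readHex, readOct)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Plain decimal digits, used both with and without the #d prefix.
decimal :: Parser Integer
decimal = read <$> many1 digit

-- '#' followed by a base letter, then digits of that base.
prefixed :: Parser Integer
prefixed = do
  _    <- char '#'
  base <- oneOf "dox"
  case base of
    'd' -> decimal
    'o' -> fst . head . readOct <$> many1 octDigit
    _   -> fst . head . readHex <$> many1 hexDigit  -- only 'x' can reach here

-- A number with an optional base prefix; base 10 when there is none.
parseNumber :: Parser Integer
parseNumber = prefixed <|> decimal
```

Since prefixed fails without consuming input when the first character is not '#', no try is needed before falling back to decimal.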

Unexpected end of input with Parsec

I am trying to parse the following text file, which contains a series of data between keywords:
many text many text many text
BEGIN
T LISTE2
1 154
2 321
3 519
4 520
5 529
6 426
END
many text many text many text
I am using the following Haskell program:
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Char
import Text.Parsec.Combinator
endOfLine :: Parser String
endOfLine = try (string "\n")
        <|> try (string "\r\n")
line = many $ noneOf "\n"
parseListing = do
    spaces
    many $ noneOf "\n"
    spaces
    cont <- between (string "BEGIN\n") (string "END\n") $ endBy line endOfLine
    spaces
    many $ noneOf "\n"
    spaces
    eof
    return cont
main :: IO ()
main = do
    file <- readFile ("test_list.txt")
    case parse parseListing "(stdin)" file of
        Left err -> do putStrLn "!!! Error !!!"
                       print err
        Right resu -> do putStrLn $ concat resu
When I parse my text file, I get the following error:
"(stdin)" (line 16, column 1):
unexpected end of input
expecting "\n", "\r\n" or "END\n"
I'm new to parsing and I don't understand why it fails.
My sequence is between BEGIN and END, after all.
Do you know what is wrong with my parser and how to correct it?

Your between will never stop, because endBy line endOfLine accepts any line, including END\n, so it keeps consuming lines until the input runs out.
Then your parser tries to consume string "END\n" and fails too; that's why the error message mentions "END\n".
You must rewrite the line parser so that it fails on END\n. For example:
parseListing :: Parsec String () [String]
parseListing = do
    spaces
    many $ noneOf "\n"
    spaces
    cont <- between begin end $ endBy (notFollowedBy end >> line) endOfLine
    spaces
    many $ noneOf "\n"
    spaces
    eof
    return cont
  where
    begin = string "BEGIN\n"
    end = string "END\n"
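An alternative sketch of the same "stop at END" idea uses manyTill instead of endBy with notFollowedBy. This is a different arrangement from the answer's code, not a restatement of it: the names textLine, block, skipPreamble and listing are invented here, lines are assumed to end in "\n", and any number of lines may precede BEGIN.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One line of text, returned without its terminating newline.
textLine :: Parser String
textLine = many (noneOf "\n") <* newline

-- Lines between BEGIN and END; manyTill tries the END marker before each line.
block :: Parser [String]
block = string "BEGIN\n" *> manyTill textLine (try (string "END\n"))

-- Drop lines until the next line is the BEGIN marker.
skipPreamble :: Parser ()
skipPreamble = skipMany (notFollowedBy (string "BEGIN\n") *> textLine)

-- Skip the leading text, then return the lines of the block.
listing :: Parser [String]
listing = skipPreamble *> block
```

Because manyTill checks for the end marker first, the END line is never handed to textLine, which is the same effect the notFollowedBy guard achieves above.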

Ignoring letters and parsing only numbers using Parsec

This code works only when numerals (eg: "1243\t343\n") are present:
tabFile = endBy line eol
line = sepBy cell (many1 tab)
cell = integer
eol = char '\n'
integer = rd <$> many digit
    where rd = read :: String -> Int
Is there a way to make it parse "abcd\tefg\n1243\t343\n" such that it ignores the "abcd\tefg\n" part ?

You can skip everything except digits using skipMany. Something like the following:
many (skipMany (noneOf ['0'..'9']) >> digit)
or (depending on what you actually need)
skipMany (noneOf ['0'..'9']) >> many digit
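To make the difference between the two variants concrete, here is a small sketch. The names nonDigit, allDigits and firstRun are invented for this example, and the try in allDigits is an addition that is not in the one-liner above: without it, trailing non-digits would be consumed before digit fails, turning the whole many into an error.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Any single character that is not a decimal digit.
nonDigit :: Parser Char
nonDigit = noneOf ['0'..'9']

-- Variant 1: collect every digit in the input, dropping non-digits in between.
allDigits :: Parser String
allDigits = many (try (skipMany nonDigit >> digit))

-- Variant 2: skip leading non-digits, then read a single run of digits.
firstRun :: Parser String
firstRun = skipMany nonDigit >> many digit
```

On an input such as "ab12c3x", allDigits gathers the digits from the whole string, while firstRun stops at the first non-digit after the initial run.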

So the trick is to modify integer to simply skip letters.
integer :: Parser Int
integer =
    many letter *>
    ((read . concat) <$> many digit `sepBy` many1 letter)
    <* many letter
This handles 12a34 correctly. Otherwise, something as simple as the following would do:
many letter *> (read <$> many digit) <* many letter
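If the goal is to drop whole header lines such as "abcd\tefg\n" rather than letters inside a cell, another option is to skip every leading line that does not start with a digit. The following is a sketch under that assumption; headerLine and row are hypothetical names, and many1 digit is used instead of many digit so that read never sees an empty string.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A non-negative integer cell.
integer :: Parser Int
integer = read <$> many1 digit

-- One row of tab-separated integers.
row :: Parser [Int]
row = integer `sepBy1` many1 tab

-- A non-empty line that does not start with a digit is treated as a header.
headerLine :: Parser ()
headerLine = notFollowedBy digit *> skipMany1 (noneOf "\n") *> (() <$ newline)

-- Skip leading header lines, then parse the numeric rows up to end of input.
tabFile :: Parser [[Int]]
tabFile = skipMany headerLine *> (row `endBy` newline) <* eof
```

Header lines are only skipped at the start of the input here; a non-numeric line in the middle of the data is reported as a parse error.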

Correctly parsing line indentations in uu-parsinglib in Haskell

I want to create a parser combinator that collects all lines below the current position whose indentation level is greater than or equal to some i. I think the idea is simple:
Consume a line; if its indentation is:
ok -> do it for next lines
wrong -> fail
Let's consider the following code:
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
-- end of line
pEOL = pSym '\n'
pSpace = pSym ' '
pTab = pSym '\t'
indentOf s = case s of
    ' ' -> 1
    '\t' -> 4
-- return the indentation level (number of spaces at the beginning of the line)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0
-- returns tuple of (indentation level, result of parsing the second argument)
pIndentLine p = (,) <$> pIndent <*> p <* pEOL
-- SHOULD collect all lines below with indentation greater than or equal to i
myParse p i = do
    (lind, expr) <- pIndentLine p
    if lind < i
        then pFail
        else do
            rest <- myParse p i `opt` []
            return $ expr:rest
-- sample inputs
s1 = " a\
     \\n a\
     \\n"
s2 = " a\
     \\na\
     \\n"
-- execution
pProgram = myParse (pSym 'a') 1
parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)
main :: IO ()
main = do
    print $ parse pProgram s1
    print $ parse pProgram s2
    return ()
This gives the following output:
("aa",[])
Test.hs: no correcting alternative found
The result for s1 is correct. The result for s2 should consume the first "a" and stop consuming. Where does this error come from?

The parsers you are constructing will always try to proceed; if necessary, input will be discarded or inserted. However, pFail is a dead end. It acts as a unit element for <|>.
In your parser, however, there is no other alternative available when the input does not conform to the language recognised by the parser. In your specification you say you want the parser to fail on input s2. Now it fails with a message saying that it fails, and you are surprised.
Maybe you do not want it to fail, but rather to stop accepting further input? In that case, replace pFail by return [].
Note that the text:
    do
        rest <- myParse p i `opt` []
        return $ expr:rest
can be replaced by (expr:) <$> (myParse p i `opt` []).
A natural way to solve your problem is probably something like
pIndented p = do i <- pGetIndent
                 (:) <$> p <* pEOL <*> pMany (pToken (take i (repeat ' ')) *> p <* pEOL)
pIndent = length <$> pMany (pSym ' ')
