parsing Float with 'get' from ByteString not behaving as expected - parsing

In the context of creating a simple STL parser, I found some, at least to me, unexpected behavior in get from Data.Binary. It appears to me that it does not stop reading the ByteString after the 32 bits that I would assume it should.
{-# LANGUAGE FlexibleContexts #-}
module STLTransform where
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Get
import Data.Binary.Put
import Data.Word
import Numeric.LinearAlgebra
import GHC.Float
import System.Endian
getTriangle = do
normal <- getR3
vertices <- sequence $ take 3 $ trace (show normal) $ repeat getR3
return (normal, vertices)
where getR3 = fmap (vector.fmap float2Double)
$ sequence $ take 3 $ repeat get
getSTL :: Get ([Word8], Word32, [(Vector R, [Vector R])], Word16)
getSTL = do
header <- sequence $ take 80 $ repeat get
number <- fmap (toBE32) $ get
triangles <- sequence $ take (fromIntegral number) $ repeat getTriangle
attribute <- get
return (header, number, triangles, attribute)
readSTL fn = BL.readFile fn >>= return.runGet getSTL
Attempting to read an STL-file using this code, I would get an error like
*Main STLTransform> readSTL "test.stl"
*** Exception: Data.Binary.Get.runGet at position 1268784: not enough bytes
CallStack (from HasCallStack):
error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.6.0:Data.Binary.Get
After some debugging, I isolated the problem to getTriangle, and more specifically the parsing of Float values. By replacing float2Double with (\a -> fromIntegral (a :: Word32))
getTriangle = do
normal <- getR3
vertices <- sequence $ take 3 $ trace (show normal) $ repeat getR3
return (normal, vertices)
where getR3 = fmap (vector.fmap (\a -> fromIntegral (a :: Word32)))
$ sequence $ take 3 $ repeat get
I get code that runs as I intended (although obviously with incorrect numbers). So the the question is, why are Float and Word32 treated differently? Is not Float also a 32 bit value?
The version of the package used is binary-0.8.6.0
EDIT:
The corrected version of the function is
getTriangle :: Get (Vector R, [Vector R], Word16)
getTriangle = do
normal <- getR3
vertices <- sequence $ take 3 $ repeat getR3
attribute <- get
return (normal, vertices, attribute)
where getR3 = fmap (vector.fmap float2Double)
$ sequence $ take 3 $ repeat getFloatle
Unrelated to the stated issue, but corrected from the original post, the attribute is on the triangle level, not on the file level.

The Binary instance of Float parses a pair
(Integer, Int) and then uses encodeFloat :: Integer -> Int -> Float, which is why it takes more than 32 bits.
This is a known issue left around for backwards compatibility: https://github.com/kolmodin/binary/issues/69
Don't use get blindly, there is more than one way to encode things. In this case, there are other parsers for 32 bit formats of Float: https://hackage.haskell.org/package/binary-0.8.7.0/docs/Data-Binary-Get.html#v:getFloatbe

Related

Haskell - Parse binary stream

I'm new to Haskell, so I'm probably missing something very simple.
I'm having a supprisingly hard time trying to parse a structured binary stream.
The binary stream has varying and conditional parts (like a field which determines how many items follow it (ncount), or what type of message follows it (type)).
To get a simple working example, I'm trying to parse this hypothetical binary structure:
+----------------+---------------+--------------
| Magic (8 bits) | type (3 bits) | type message...
+----------------+---------------+--------------
Type 1:
+----------------+-------------+-------------+-----------------+
|ncount (3 bits) | n1 (3 bits) | n1 (3 bits) | nN (3 bits)... |
+----------------+-------------+-------------+-----------------+
Type 2:
+----------------+---------------+
| num1 (7 bits) | num2 (7 bits) |
+----------------+---------------+
...
My code so far:
{-# LANGUAGE RecordWildCards #-}
module Main where
import Data.Bits
import Data.Binary as B
import qualified Data.Binary.Bits.Get as BG
import qualified Data.ByteString as BS
data Header = Header {
magic :: Word8
,mtype :: Word8
,num1 :: Word8
,num2 :: Word8
} deriving (Show)
--instance Show (Get Header) where
-- show (Header {..}) = show . magic
parse_header :: B.Get Header
parse_header = BG.runBitGet parse_header'
-- Example, assume type 2 for now
parse_header' :: BG.BitGet Header
parse_header' = do
magic <- BG.getWord8 8
mtype <- BG.getWord8 3
num1 <- BG.getWord8 7
num2 <- BG.getWord8 7
return $ Header magic mtype num1 num2
main :: IO ()
main = do
putStrLn "Start"
-- File containing binary stream
fstr <- BS.readFile "data/hbin.bin"
let header = parse_header
in
-- How do I print out Header?
print header
-- * No instance for (Show (Get Header))
-- arising from a use of `print'
-- * In the expression: print header
putStrLn "\nEnd"
In which I get the error:
* No instance for (Show (Get Header)) arising from a use of `print'
* In the expression: print header
Obviously, I plan to parse this recursively, but for now I can't even see a value I've read.
I've followed https://wiki.haskell.org/Dealing_with_binary_data but this uses Data.Binary.Strict (binary-strict) which doesn't compile on Windows (atleast on mine).
I've also followed https://hackage.haskell.org/package/binary-bits-0.5/docs/Data-Binary-Bits-Get.html but it doesn't show how to use values you have gotten with getWord8 (Do I need to put them into an Int to read them as decimal?)
Again, I'm new to Haskell and not familiar with Monads (which I believe Get is).
header = parse_header is only giving a new name to the parser. You need a function to run the parser, there's one here (here choosing runGet for simplicity, but you should prefer the other one, to handle the error case more easily):
runGet :: Get a -> ByteString -> a
Note that it takes a lazy ByteString (Data.ByteString.Lazy) instead of a strict one (Data.ByteString).
...
import Data.ByteString.Lazy (toLazy)
...
main = do
fstr <- BS.readFile "data/hbin.bin"
let header = runGet parse_header (fromStrict fstr)
print header
putStrLn "End"

How can I parse non-contiguous elements of a permutation phrase using Parsec's Perm?

Haskell's Parsec supports parsing of permutation phrases via the Perm module. In the documentation for permute you can see how the various permutations "abc", "aab", "bc", "cbaaaaa", etc. can be parsed. While the example shows support for parting many contiguous instances of the same element like "aaaa", it won't parse non-contiguous instances like "aabca", presumably because each parser is included only once in each permutation (the paper seems to imply this in the tree...)
Besides sorting the input so like instances are contiguous, what options do I have for parsing non-contiguous instance?
Depending on what you actually want, you may be able to use many $ oneOf ['a','b','c'].
If you really need to use the permutation parser, keep in mind that allowing parses of multiple adjacent characters introduces ambiguity. For example, in the string "bacacbbca", it could be parsed as the perms bac, acb bca, or, if you allow repeated characters, bac, acbb, with a leftover non-permutation of ca.
If you allow repeated letters,
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Perm
import Data.Text
import Control.Monad.Identity
perm :: (Stream s Identity Char) => Parsec s u (String,String,String)
perm = permute $ triple <$$>
(many1 $ char 'a') <||>
(many1 $ char 'b') <||>
(many1 $ char 'c')
where triple a b c = (a,b,c)
multiPerm :: (Stream s Identity Char) => Parsec s u [(String,String,String)]
multiPerm = many $ try $ perm
main :: IO ()
main = parseTest multiPerm $ "bacacbbca"
main produces [("a","b","c"),("a","bb","c")].
If not:
perm :: (Stream s Identity Char) => Parsec s u (Char,Char,Char)
perm = permute $ triple <$$>
(char 'a') <||>
(char 'b') <||>
(char 'c')
where triple a b c = (a,b,c)
you get the arguably better: [('a','b','c'),('a','b','c'),('a','b','c')].

How can I parse fixed-length, non-delimited integers with attoparsec?

I'm trying to parse two integers from 3 characters using attoparsec. A sample input might look something like this:
341
... which I would like to parse into:
Constructor 34 1
I have two solutions that work but which are somewhat clunky:
stdK :: P.Parser Packet
stdK = do
P.char '1'
qstr <- P.take 2
let q = rExt $ P.parseOnly P.decimal qstr
n <- P.decimal
return $ Std q n
stdK2 :: P.Parser Packet
stdK2 = do
P.char '1'
qn <- P.decimal
let q = div qn 10
let n = rem qn 10
return $ Std q n
There must be a better way to achieve something as simple as this. Am I missing something?
Your code snippet is far from being self-contained (in particular, imports and the definition of your Packet data type are missing), but you seem to be overcomplicating things.
First, define a parser for one-digit integers. Then, use the latter as a building block for a parser for two-digit integers. After that, use applicative operators to combine those two parsers and define a parser for your custom Packet data type. See below.
Note that you don't need the full power of monads; applicative parsing is sufficient, here.
-- test_attoparsec.hs
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<$>))
import Data.Attoparsec.Text
import Data.Char
data Packet = Std {-# UNPACK #-} !Int
{-# UNPACK #-} !Int
deriving (Show)
stdK :: Parser Packet
stdK = char '1' *> (Std <$> twoDigitInt <*> oneDigitInt)
twoDigitInt :: Parser Int
twoDigitInt = timesTenPlus <$> oneDigitInt <*> oneDigitInt
where
timesTenPlus x y = 10 * x + y
oneDigitInt :: Parser Int
oneDigitInt = digitToInt <$> digit
Tests in GHCi:
λ> :l test_attoparsec.hs
[1 of 1] Compiling Main ( test_attoparsec.hs, interpreted )
Ok, modules loaded: Main.
λ> :set -XOverloadedStrings
λ> parseOnly stdK "1341"
Right (Std 34 1)
λ> parseOnly stdK "212"
Left "1: Failed reading: satisfyWith"

Invertible State monad (and parsers)

Good day, ladies and gentlemen!
I'm constantly writing parsers and codecs. Implementing both parsers and printers seems to be massive code duplication. I wonder whether it is possible to invert a stateful computation, given it is isomorphic by nature.
It is possible to invert pure function composition (Control.Lens.Iso did that by defining a composition operator over isomorphisms). As it can be observed,
Iso bc cb . Iso ab ba = Iso (bc . ab) (ba . cb) -- from Lenses talk
invert (f . g) = (invert g) . (invert f) -- pseudo-code
In other words, to invert a function composition one should compose inverted functions in the opposite order. So, given all primitive isomorphic pairs are defined, one can compose them to get more complicated pairs with no code duplication. Here is an example of pure bidirectional computation (Control.Lens is used, the explanatory video can help you to get the general idea of Lenses, Folds and Traversals):
import Control.Lens
tick :: Num a => Iso' a a
tick = iso (+1) (subtract 1) -- define an isomorphic pair
double :: Num a => Iso' a a
double = iso (+2) (subtract 2) -- and another one
threeTick :: Num a => Iso' a a
-- These are composed via simple function composition!
threeTick = double . tick
main :: IO ()
main = do
print $ (4 :: Int)^.tick -- => 5
print $ (4 :: Int)^.from tick -- => 3
print $ (4 :: Int)^.threeTick -- => 7, Composable
print $ (4 :: Int)^.from threeTick -- => 1, YEAH
As you can see, I didn't need to supply the inverted version of threeTick; it is obtained by backward composition automatically!
Now, let's consider a simple parser.
data FOO = FOO Int Int deriving Show
parseFoo :: Parser FOO
parseFoo = FOO <$> decimal <* char ' '
<*> decimal
parseFoo' :: Parser FOO
parseFoo' = do
first <- decimal
void $ char ' '
second <- decimal
return $ FOO first second
printFoo :: FOO -> BS.ByteString
printFoo (FOO a b) = BS.pack(show a) <>
BS.pack(" ") <>
BS.pack(show b)
main :: IO ()
main = do
print $ parseOnly parseFoo "10 11" -- => Right (FOO 10 11)
print $ parseOnly parseFoo' "10 11" -- => Right (FOO 10 11)
print . printFoo $ FOO 10 11 -- => "10 11"
print . parseOnly parseFoo . printFoo $ FOO 10 11 -- id
You can see that both versions of parseFoo are fairly declarative (thanks to parser combinators). Note the similarity between parseFoo and printFoo. Can I define isomorphisms over primitive parsers (decimal and char) and then just derive the printer (printFoo :: FOO -> String) automatically? Ideally, parser combinators will work as well.
I tried to redefine a monadic >>= operator in order to provide inverted semantics, but I've failed to do so. I feel that one could define inverted Kleisli composition operator (monadic function composition) similarly to composition inversion, but can one use it with ordinary monads?
f :: a -> m b, inverse f :: b -> m a
g :: b -> m c, inverse g :: c -> m b
inverse (f >=> g) = (inverse f) <=< (inverse g)
Why inverse f is of type b -> m a and not m b -> a? The answer is: monadic side effect is an attribute of an arrow, not that of a data type b. The state monad dualization is further discussed in the great Expert to Expert video.
If the solution does exist, could you please supply a working example of printFoo derivation? By the way, here is an interesting paper that could help us find the solution.
You may be interested in digging in further into the lens package for the concept of a Prism.
A Prism can be used as a 'smart constructor' to build something (e.g. a pretty printed string) unconditionally, and match on it (e.g. parse).
You'd have to ignore the laws or treat the laws as holding only up to a quotient though, as the string you get out of pretty printing is very likely not exactly the string you parsed.

Correct ReadP usage in Haskell

I did a very simple parser for lists of numbers in a file, using ReadP in Haskell. It works, but it is very slow... is this normal behavior of this type of parser or am I doing something wrong?
import Text.ParserCombinators.ReadP
import qualified Data.IntSet as IntSet
import Data.Char
setsReader :: ReadP [ IntSet.IntSet ]
setsReader =
setReader `sepBy` ( char '\n' )
innocentWhitespace :: ReadP ()
innocentWhitespace =
skipMany $ (char ' ') <++ (char '\t' )
setReader :: ReadP IntSet.IntSet
setReader = do
innocentWhitespace
int_list <- integerReader `sepBy1` innocentWhitespace
innocentWhitespace
return $ IntSet.fromList int_list
integerReader :: ReadP Int
integerReader = do
digits <- many1 $ satisfy isDigit
return $ read digits
readClusters:: String -> IO [ IntSet.IntSet ]
readClusters filename = do
whole_file <- readFile filename
return $ ( fst . last ) $ readP_to_S setsReader whole_file
setReader has exponential behavior, because it is allowing the whitespace between the numbers to be optional. So for the line:
12 34 56
It is seeing these parses:
[1,2,3,4,5,6]
[12,3,4,5,6]
[1,2,34,5,6]
[12,34,5,6]
[1,2,3,4,56]
[12,3,4,56]
[1,2,34,56]
[12,34,56]
You could see how this could get out of hand for long lines. ReadP returns all valid parses in increasing length order, so to get to the last parse you have to traverse through all these intermediate parses. Change:
int_list <- integerReader `sepBy1` innocentWhitespace
To:
int_list <- integerReader `sepBy1` mandatoryWhitespace
For a suitable definition of mandatoryWhitespace to squash this exponential behavior. The parsing strategy used by parsec is more resistant to this kind of error, because it is greedy -- once it consumes input in a given branch, it is committed to that branch and never goes back (unless you explicitly asked it to). So once it correctly parsed 12, it would never go back to parse 1 2. Of course that means it matters in which order you state your choices, which I always find to be a bit of a pain to think about.
Also I would use:
head [ x | (x,"") <- readP_to_S setsReader whole_file ]
To extract a valid whole-file parse, in case it very quickly consumed all input but there were a hundred bazillion ways to interpret that input. If you don't care about the ambiguity, you would probably rather it return the first one than the last one, because the first one will arrive faster.

Resources