Based on my previous question whose setting I refined gradually (How to create monadic behaviour in reactive-banana):
Suppose there is eKey, an event fired whenever a key is pressed, b of type Buffer which is modified appropriately whenever a supported event occurs in eKey, and finally there are IO actions to be taken on some of the events. These IO actions depend on the state of b (for simplicity assume they print the current state of b to the console).
Currently, I have this to select the action that happens on an event:
getKeyAction :: KeyVal -> Maybe (IO Buffer -> IO Buffer)
getKeyAction 65288 = Just . liftM $ backspace
getKeyAction 65293 = Just $ \mb -> do
b <- mb
print $ "PRESSED ENTER: " ++ toString b
return emptyBuffer
getKeyAction 65360 = Just . liftM $ leftMost
getKeyAction 65361 = Just . liftM $ left
...
getKeyAction _ = Nothing
and then I do:
let
eBufferActions = filterJust $ getKeyAction <$> eKey
bBuffer = accumB (return emptyBuffer) eBufferActions -- model `b`
eBuffer <- changes bBuffer
reactimate' $ fmap displayBuffer <$> eBuffer
for some displayBuffer :: IO Buffer -> IO ().
It does not seem to work as intended. The state of bBuffer seems to be evaluated on every event anew (effectively running all IO actions collected thus far every time an event occurs) which makes sense when I think about it in retrospect.
How should I refactor my code to do this correctly? (i.e. IO actions see current state of the buffer, nothing except the buffer gets accumulated)
If I could construct an Event that bears value of the bBuffer on occasion of an appropriate eKey event, then I could simply map my a dedicated IO action over it and reactimate. What do you think? How to do that? Would <# achieve what I am trying to do? But how would I postpone the current change to b associated with the current key press after I take snapshot of b with <# to map the Buffer -> IO () over it?
OK, so I believe this solves my problem, but I am not sure it is the right thing to do. So please comment.
I factor out those actions that do some non-trivial IO (other than return)
I filter the eKey event into two: eBuffer and eConfirm
eBuffer collects all modifying events (including clearing the buffer on confirmation)
eConfirm collects all confirmation events
I tag eConfirm with values of bBuffer, which captures the evolution of the buffer
Finally, I reactimate separately the IO and the changes of the buffer
The code fragment:
getKeyAction :: KeyVal -> Maybe (Buffer -> Buffer)
getKeyAction 65288 = Just backspace
-- omit action for ENTER
...
getConfirm :: KeyVal -> Maybe (Buffer -> Buffer)
getConfirm 65293 = Just (const mkBuffer) -- Clear buffer on ENTER
getConfirm _ = Nothing
Then in the description of the network:
let
eBuffer = filterJust $ getKeyAction <$> eKey
eConfirm = filterJust $ getConfirm <$> eKey
bBuffer = accumB mkBuffer $ unions [ eBuffer, eConfirm ]
eEval = bBuffer <# eConfirm
eBufferChanges <- changes bBuffer
reactimate $ evalBuffer <$> eEval
reactimate' $ fmap displayBuffer <$> eBufferChanges
for evalBuffer :: Buffer -> IO () and displayBuffer :: Buffer -> IO ().
Related
I am currently writing a basic parser. A parser for type a takes a string in argument and returns either nothing, or an object of type a and the rest of the string.
Here is a simple type satisfying all these features:
type Parser a = String -> Maybe (a, String)
For example, I wrote a function that takes a Char as argument and returns a Parser Char :
parseChar :: Char -> Parser Char
parseChar _ [] = Nothing
parseChar c (x:xs)
| c == x = Just (x, xs)
| otherwise = Nothing
I would like to write a function which takes a parser in argument and tries to apply it zero or more times, returning a list of the parsed elements :
parse :: Parser a -> Parser [a]
Usage example:
> parse (parseChar ' ') " foobar"
Just (" ", "foobar")
I tried to write a recursive function but I can't save the parsed elements in a list.
How can I apply the parsing several times and save the result in a list ?
I tried to write a recursive function but I can't save the parsed elements in a list.
You don't need to "save" anything. You can use pattern matching. Here's a hint. Try to reason about what should happen in each case below. The middle case is a bit subtle, don't worry if you get that wrong at first. Note how s and s' are used below.
parse :: Parser a -> Parser [a]
parse p s = case p s of
Nothing -> ... -- first p failed
Just (x,s') -> case parse p s' of
Nothing -> ... -- subtle case, might not be relevant after all
Just (xs,s'') -> ... -- merge the results
Another hint: note that according to your description parse p should never fail, since it can always return the empty list.
Good day, ladies and gentlemen!
I'm constantly writing parsers and codecs. Implementing both parsers and printers seems to be massive code duplication. I wonder whether it is possible to invert a stateful computation, given it is isomorphic by nature.
It is possible to invert pure function composition (Control.Lens.Iso did that by defining a composition operator over isomorphisms). As it can be observed,
Iso bc cb . Iso ab ba = Iso (bc . ab) (ba . cb) -- from Lenses talk
invert (f . g) = (invert g) . (invert f) -- pseudo-code
In other words, to invert a function composition one should compose inverted functions in the opposite order. So, given all primitive isomorphic pairs are defined, one can compose them to get more complicated pairs with no code duplication. Here is an example of pure bidirectional computation (Control.Lens is used, the explanatory video can help you to get the general idea of Lenses, Folds and Traversals):
import Control.Lens
tick :: Num a => Iso' a a
tick = iso (+1) (subtract 1) -- define an isomorphic pair
double :: Num a => Iso' a a
double = iso (+2) (subtract 2) -- and another one
threeTick :: Num a => Iso' a a
-- These are composed via simple function composition!
threeTick = double . tick
main :: IO ()
main = do
print $ (4 :: Int)^.tick -- => 5
print $ (4 :: Int)^.from tick -- => 3
print $ (4 :: Int)^.threeTick -- => 7, Composable
print $ (4 :: Int)^.from threeTick -- => 1, YEAH
As you can see, I didn't need to supply the inverted version of threeTick; it is obtained by backward composition automatically!
Now, let's consider a simple parser.
data FOO = FOO Int Int deriving Show
parseFoo :: Parser FOO
parseFoo = FOO <$> decimal <* char ' '
<*> decimal
parseFoo' :: Parser FOO
parseFoo' = do
first <- decimal
void $ char ' '
second <- decimal
return $ FOO first second
printFoo :: FOO -> BS.ByteString
printFoo (FOO a b) = BS.pack(show a) <>
BS.pack(" ") <>
BS.pack(show b)
main :: IO ()
main = do
print $ parseOnly parseFoo "10 11" -- => Right (FOO 10 11)
print $ parseOnly parseFoo' "10 11" -- => Right (FOO 10 11)
print . printFoo $ FOO 10 11 -- => "10 11"
print . parseOnly parseFoo . printFoo $ FOO 10 11 -- id
You can see that both versions of parseFoo are fairly declarative (thanks to parser combinators). Note the similarity between parseFoo and printFoo. Can I define isomorphisms over primitive parsers (decimal and char) and then just derive the printer (printFoo :: FOO -> String) automatically? Ideally, parser combinators will work as well.
I tried to redefine a monadic >>= operator in order to provide inverted semantics, but I've failed to do so. I feel that one could define inverted Kleisli composition operator (monadic function composition) similarly to composition inversion, but can one use it with ordinary monads?
f :: a -> m b, inverse f :: b -> m a
g :: b -> m c, inverse g :: c -> m b
inverse (f >=> g) = (inverse f) <=< (inverse g)
Why inverse f is of type b -> m a and not m b -> a? The answer is: monadic side effect is an attribute of an arrow, not that of a data type b. The state monad dualization is further discussed in the great Expert to Expert video.
If the solution does exist, could you please supply a working example of printFoo derivation? By the way, here is an interesting paper that could help us find the solution.
You may be interested in digging in further into the lens package for the concept of a Prism.
A Prism can be used as a 'smart constructor' to build something (e.g. a pretty printed string) unconditionally, and match on it (e.g. parse).
You'd have to ignore the laws or treat the laws as holding only up to a quotient though, as the string you get out of pretty printing is very likely not exactly the string you parsed.
Problem statement
Suppose I have two parsers p and q and I concatenate them like this:
r = try p *> q
In Parsec, the behavior of this is:
If p fails without consuming input, then r fails without consuming input.
If p fails after consuming input, then r fails without consuming input.
if q fails without consuming input, then r fails after consuming p.
if q fails after consuming input, then r fails after consuming p and parts of q.
However, the behavior I'm looking for is a bit unusual:
If p fails without consuming input, then r should fail without consuming input.
If p fails after consuming input, then r should fail without consuming input.
if q fails without consuming input, then r should fail without consuming input.
if q fails after consuming input, then r should fail after consuming some input.
I can't seem to think of a clean way to do this.
Rationale
The reason is that I have a parser like this:
s = (:) <$> q <*> many r
The q parser, embedded inside the r parser, needs a way to signal either: invalid input (which occurs when q consumes input but fails), or end of the many loop (which occurs when q doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p but without any more valid q's to consume, in which case q will fail but without consuming any input.
So I was wondering if anyone have an elegant way to solve this problem? Thanks.
Addendum: Example
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
Test input on (hypothetical) parser s, had it worked the way I wanted:
xyz (accept)
xyzP (accept; P remains unparsed)
xyzPx (accept; Px remains unparsed; q failed but did not consume any input)
xyzPxy (reject; parser q consumed xy but failed)
xyzPxyz (accept)
In the form r = try p *> q, s will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:
r = (++) <$> try (string "P" *> string "xy") <*> string "z"
but this isn't a general solution that works for any parsers p and q. (Perhaps a general solution doesn't exist?)
I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative hiding (many, (<|>))
import Control.Monad (void)
import Control.Monad.Trans (lift)
import Control.Monad.Trans.Maybe
import Text.Parsec hiding (optional)
import Text.Parsec.Char
import Text.Parsec.String
rcomb :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m b
rcomb p q = ((test $ opt p *> opt q) <|> pure (Just ()))
>>= maybe empty (\_ -> p *> q)
where
-- | Converts failure to #MaybeT Nothing#:
opt = MaybeT . optional -- optional from Control.Applicative!
-- | Tests running a parser, returns Nothing if parsers failed consuming no
-- input, Just () otherwise.
test = lookAhead . try . runMaybeT . void
This is the r combinator you're asking for. The idea is that we first execute the parsers in a "test" run (using lookAhead . try) and if any of them fails without consuming input, we record it as Nothing inside MaybeT. This is accomplished by opt, it converts a failure to Nothing and wraps it into MaybeT. Thanks to MaybeT, if opt p returns Nothing, opt q is skipped.
If both p and q succeed, the test .. part returns Just (). And if one of them consumes input, the whole test .. fails. This way, we distinguish the 3 possibilities:
Failure with some input consumed by p or q.
Failure such that the failing part doesn't consume input.
Success.
After <|> pure (Just ()) both 1. and 3. result in Just (), while 2. results in Nothing. Finally, the maybe part converts Nothing into a non-consuming failure, and Just () into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.
Testing:
samples =
[ "xyz" -- (accept)
, "xyzP" -- (accept; P remains unparsed)
, "xyzPz" -- (accept; Pz remains unparsed)
, "xyzPx" -- (accept; Px remains unparsed; q failed but did not consume any input)
, "xyzPxy" -- (reject; parser q consumed xy but failed)
, "xyzPxyz" -- (accept)
]
main = do
-- Runs a parser and then accept anything, which shows what's left in the
-- input buffer:
let run p x = runP ((,) <$> p <*> many anyChar) () x x
let p, q :: Parser String
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
let parser = show <$> ((:) <$> q <*> many (rcomb p q))
mapM_ (print . run parser) samples
I've got a 279MB file that contains ~10 million key/value pairs, with ~500,000 unique keys. It's grouped by key (each key only needs to be written once), so all the values for a given key are together.
What I want to do is transpose the association, create a file where the pairs are grouped by value, and all the keys for a given value are stored together.
Currently, I'm using Parsec to read in the pairs as a list of tuples (K,[V]) (using lazy IO so I can process it as a stream while Parsec is processing the input file), where:
newtype K = K Text deriving (Show, Eq, Ord, Hashable)
newtype V = V Text deriving (Show, Eq, Ord, Hashable)
tupleParser :: Parser (K,[V])
tupleParser = ...
data ErrList e a = Cons a (ErrList e a) | End | Err e
parseAllFromFile :: Parser a -> FilePath-> IO (ErrList ParseError a)
parseAllFromFile parser inputFile = do
contents <- readFile inputFile
let Right initialState = parse getParserState inputFile contents
return $ loop initialState
where loop state = case unconsume $ runParsecT parser' state of
Error err -> Err err
Ok Nothing _ _ -> End
Ok (Just a) state' _ -> a `Cons` loop state'
unconsume v = runIdentity $ case runIdentity v of
Consumed ma -> ma
Empty ma -> ma
parser' = (Just <$> parser) <|> (const Nothing <$> eof)
I've tried to insert the tuples into a Data.HashMap.Map V [K] to transpose the association:
transpose :: ErrList ParseError (K,[V]) -> Either ParseError [(V,[K])]
transpose = transpose' M.empty
where transpose' _ (Err e) = Left e
transpose' m End = Right $ assocs m
transpose' m (Cons (k,vs) xs) = transpose' (L.foldl' (include k) m vs) xs
include k m v = M.insertWith (const (k:)) v [k] m
But when I tried it, I got the error:
memory allocation failed (requested 2097152 bytes)
I could think of a couple things I'm doing wrong:
2MB seems a bit low (considerably less than the 2GB RAM my machine has installed), so maybe I need to tell GHC it's ok to use more?
My problems could be algorithmic/data structure related. Maybe I'm using the wrong tools for the job?
My attempt to use lazy IO could be coming back to bite me.
I'm leaning toward (1) for now, but I'm not sure by any means.
Is there the possibility that the data will increase? If yes then I'd suggest not to read the while file into memory and process the data in another way.
One simple possibility is to use a relational database for that. This'd be fairly easy - just load your data in, create a proper index and get it sorted as you need. The database will do all the work for you. I'd definitely recommend this.
Another option would be to create your own file-based mechanism. For example:
Choose some limit l that is reasonable to load into memory.
Create n = d `div` l files, where d is the total amount of your data. (Hopefully this will not exceed your file descriptor limit. You could also close and reopen files after each operation, but this will make the process very slow.)
Process the input sequentially and place each pair (k,v) into file number hash v `mod` l. This ensures that the pairs with the same value v will end up in the same file.
Process each file separately.
Merge them together.
It is essentially a hash table with file buckets. This solution assumes that each value has roughly the same number of keys (otherwise some files could get exceptionally large).
You could also implement an external sort which would allow you to sort basically any amount of data.
To allow for files that are larger than available memory, it's a good idea to process them in bite-sized chunks at a time.
Here is a solid algorithm to copy file A to a new file B:
Create file B and lock it to your machine
Begin loop
If there isn't a next line in file A then exit loop
Read in the next line of file A
Apply processing to the line
Check if File B contains the line already
If File B does not contain the line already then append the line to file B
Goto beginning of loop
Unlock file B
It can also be worthwhile making a copy of file A into a temp folder and locking it while you work with it so that other people on the network aren't prevented from changing the original, but you have a snapshot of the file as it was at the point the procedure was begun.
I intend to revisit this answer in the future and add code.
I'm trying to wrap the Data.Binary.Put monad into another so that later I can ask it questions like "how many bytes it's going to write" or "what is the current position in file". But even very trivial wraps like:
data Writer1M a = Writer1M { write :: P.PutM a }
or
data Writer2M a = Writer2M { write :: (a, P.Put) }
create a huge space leak and the program usually crashes (after taking up 4GB of RAM). Here is what I've tried so far:
-- This works well and consumes almost no memory.
type Writer = P.Put
writer :: P.Put -> Writer
writer put = put
writeToFile :: String -> Writer -> IO ()
writeToFile path writer = BL.writeFile path (P.runPut writer)
-- This one will cause memory leak.
data Writer1M a = Writer1M { write :: P.PutM a }
instance Monad Writer1M where
return a = Writer1M $ return a
ma >>= f = Writer1M $ (write ma) >>= \a -> write $ f a
type WriterM = Writer1M
type Writer = WriterM ()
writer :: P.Put -> Writer
writer put = Writer1M $ put
writeToFile :: String -> Writer -> IO ()
writeToFile path writer = BL.writeFile path (P.runPut $ write writer)
-- This one will crash as well with exactly the
-- same memory foot print as Writer1M
data Writer2M a = Writer2M { write :: (a, P.Put) }
instance Monad Writer2M where
return a = Writer2M $ (a, return ())
ma >>= f = Writer2M $ (b, p >> p')
where (a,p) = write ma
(b,p') = write $ f a
type WriterM = Writer2M
type Writer = WriterM ()
writer :: P.Put -> Writer
writer put = Writer2M $ ((), put)
writeToFile :: String -> Writer -> IO ()
writeToFile path writer = BL.writeFile path (P.runPut $ snd $ write writer)
I'm new to Haskell and this makes no sence to me, but the wrapper monads seem very trivial so I'm guessing there is something obvious I'm missing.
Thanks for looking.
UPDATE:
Here is a sample code that demonstrates the problem: http://hpaste.org/43400/why_wrapping_the_databinaryp
UPDATE2:
There is also a second part to this question here.
After poking around for a bit, I found that the problem seems to be the usage of binary's (>>=) to implement (>>). The following addition to the Writer1M monad implementation solves the problem:
m >> k = Writer1M $ write m >> write k
Whereas this version still leaks memory:
m >> k = Writer1M $ write m >>= const (write k)
Looking at binary's source, (>>) seems to discard the result of the first monad explicitly. Not sure how exactly this prevents the leak, though. My best theory is that GHC otherwise holds onto the PairS object, and the "a" reference leaks because it never gets looked at.
Did you tried to make the monad more strict? Eg. try to make the constructors of your datatyp strict / replace them with a newtype.
I don't know what's the exact problem here, but this is the usual source of leaks.
PS: And try to remove unnecessary lambdas, for instance:
ma >>= f = Writer1M $ (write ma) >=> write . f