Haskell equivalent of scala collect - parsing

I'm trying to read in a file containing key/value pairs of the form:
#A comment
a=foo
b=bar
c=baz
Some other stuff
With various other lines, as suggested. This wants to go into a map that I can then look up keys from.
My initial approach would be to read in the lines and split on the '=' character to get a [[String]]. In Scala, I would then use collect, which takes a partial function (in this case something like \x -> case x of a :: b :: _ -> (a,b) and applies it where it's defined, throwing away the values where the function is undefined. Does Haskell have any equivalent of this?
Failing that, how would one do this in Haskell, either along my lines or using a better approach?

Typically this is done with the Maybe type and catMaybes:
catMaybes :: [Maybe a] -> [a]
So if your parsing function has type:
parse :: String -> Maybe (a,b)
then you can build the map by parsing the input string into lines, validating each line and returning just the defined values:
Map.fromList . catMaybes . map parse . lines $ s
where s is your input string.

The List Monad provides what you are looking for. This can probably be most easily leveraged via a list comprehension, although, it works with do notation as well.
First, here's the Scala implementation for reference -
// Using .toList for simpler demonstration
scala> val xs = scala.io.Source.fromFile("foo").getLines().toList
List[String] = List(a=1, b=2, sdkfjhsdf, c=3, sdfkjhsdf, d=4)
scala> xs.map(_.split('=')).collect { case Array(k, v) => (k, v) }
List[(String, String)] = List((a,1), (b,2), (c,3), (d,4))
Now the list comprehension version using Haskell -
λ :m + Data.List.Split
λ xs <- lines <$> readFile "foo"
λ xs
["a=1","b=2","sdkfjhsdf","c=3","sdfkjhsdf","d=4"]
-- List comprehension
λ [(k, v) | [k, v] <- map (splitOn "=") xs]
[("a","1"),("b","2"),("c","3"),("d","4")]
-- Do notation
λ do { [k, v] <- map (splitOn "=") xs; return (k, v) }
[("a","1"),("b","2"),("c","3"),("d","4")]
What's happening is that the pattern match condition is filtering out cases that don't match using the fail method from Monad.
λ fail "err" :: [a]
[]
So both the list comprehension and do notation are leveraging fail, which desugars to this -
map (splitOn "=") xs >>= (
\s -> case s of
[k, v] -> return (k, v)
_ -> fail ""
)

Related

Haskell Type error in Double recursion function

I'm trying to define a greedy function
greedy :: ReadP a -> ReadP [a]
that parses a sequence of values, returning only the "maximal" sequences that cannot be extended any further. For example,
> readP_to_S (greedy (string "a" +++ string "ab")) "abaac"
[(["a"],"baac"),(["ab","a","a"],"c")]
I'm using a very simple and probably clumsy way. Just parse the values and see if they can be parsed any further; if so, then reapply the function again to get all the possible values and concat that with the previous ones, or else just return the value itself. However, there seems to be some type problems, below is my code.
import Text.ParserCombinators.ReadP
addpair :: a -> [([a],String)] -> [([a],String)]
addpair a [] = []
addpair a (c:cs) = (a : (fst c), snd c ) : (addpair a cs)
greedy :: ReadP a -> ReadP [a]
greedy ap = readS_to_P (\s ->
let list = readP_to_S ap s in
f list )
where
f :: [(a,String)] -> [([a],String)]
f ((value, str2):cs) =
case readP_to_S ap str2 of
[] -> ([value], str2) : (f cs)
_ -> (addpair value (readP_to_S (greedy ap) str2)) ++ (f cs)
The GHC processes the code and says that function "f" has type [(a1,String)] -> [([a1],String)] but greedy is ReadP a -> ReadP [a]. I wonder why it is so because I think their type should agree. It also really helps if anyone can come up with some clever and more elegant approach to define the function greedy(my approach is definitely way too redundant)
To fix the compilation error, you need to add the language extension
{-# LANGUAGE ScopedTypeVariables #-}
to your source file, or pass the corresponding flag into the compiler. You also need to change the type signature of greedy to
greedy :: forall a. ReadP a -> ReadP [a]
This is because your two a type variables are not actually the same; they're in different scopes. With the extension and the forall, they are treated as being the same variable, and your types unify properly. Even then, the code errors, because you don't have an exhaustive pattern match in your definition of f. If you add
f [] = []
then the code seems to work as intended.
In order to simplify your code, I took a look at the provided function munch, which is defined as:
munch :: (Char -> Bool) -> ReadP String
-- ^ Parses the first zero or more characters satisfying the predicate.
-- Always succeeds, exactly once having consumed all the characters
-- Hence NOT the same as (many (satisfy p))
munch p =
do s <- look
scan s
where
scan (c:cs) | p c = do _ <- get; s <- scan cs; return (c:s)
scan _ = do return ""
In that spirit, your code can be rewritten as:
greedy2 :: forall a. ReadP a -> ReadP [a]
greedy2 ap = do
-- look at the string
s <- look
-- try parsing it, but without do notation
case readP_to_S ap s of
-- if we failed, then return nothing
[] -> return []
-- if we parsed something, ignore it
(_:_) -> do
-- parse it again, but this time inside of the monad
x <- ap
-- recurse, greedily parsing again
xs <- greedy2 ap
-- and return the concatenated values
return (x:xs)
This does have the speed disadvantage of executing ap twice as often as needed; this may be too slow for your use case. I'm sure my code could be further rewritten to avoid that, but I'm not a ReadP expert.

Implement `Applicative Parser`'s Apply Function

From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.
Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Lets break down what this means.
String -> Maybe (a, String)
[1] [2] [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
Parser is a function of text input to Maybe a tuple of a value and the rest of the text. Parser is emphatically not a tuple, otherwise you wouldn't have a parser. Just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.

Parse string with lex in Haskell

I'm following Gentle introduction to Haskell tutorial and the code presented there seems to be broken. I need to understand whether it is so, or my seeing of the concept is wrong.
I am implementing parser for custom type:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
printing function for convenience
showsTree :: Show a => Tree a -> String -> String
showsTree (Leaf x) = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)
instance Show a => Show (Tree a) where
showsPrec _ x = showsTree x
this parser is fine but breaks when there are spaces
readsTree :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) = [(Branch l r, u) | (l, '|':t) <- readsTree s,
(r, '>':u) <- readsTree t ]
readsTree s = [(Leaf x, t) | (x,t) <- reads s]
this one is said to be a better solution, but it does not work without spaces
readsTree_lex :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s = [(Branch l r, x) | ("<", t) <- lex s,
(l, u) <- readsTree_lex t,
("|", v) <- lex u,
(r, w) <- readsTree_lex v,
(">", x) <- lex w ]
++
[(Leaf x, t) | (x, t) <- reads s ]
next I pick one of parsers to use with read
instance Read a => Read (Tree a) where
readsPrec _ s = readsTree s
then I load it in ghci using Leksah debug mode (this is unrelevant, I guess), and try to parse two strings:
read "<1|<2|3>>" :: Tree Int -- succeeds with readsTree
read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex
when lex encounters |<2... part of the former string, it splits onto ("|<", _). That does not match ("|", v) <- lex u part of parser and fails to complete parsing.
There are two questions arising:
how do I define parser that really ignores spaces, not requires them?
how can I define rules for splitting encountered literals with lex
speaking of second question -- it is asked more of curiousity as defining my own lexer seems to be more correct than defining rules of existing one.
lex splits into Haskell lexemes, skipping whitespace.
This means that since Haskell permits |< as a lexeme, lex will not split it into two lexemes, since that's not how it parses in Haskell.
You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.
If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).
The answer to this seems to be a small gap between text of Gentle introduction to Haskell and its code samples, plus an error in sample code.
there should also be one more lexer, but there is no working example (satisfying my need) in codebase, so I written one. Please point out any flaw in it:
lexAll :: ReadS String
lexAll s = case lex s of
[("",_)] -> [] -- nothing to parse.
[(c, r)] -> if length c == 1 then [(c, r)] -- we will try to match
else [(c, r), ([head s], tail s)]-- not only as it was
any_else -> any_else -- parsed but also splitted
author sais:
Finally, the complete reader. This is not sensitive to white space as
were the previous versions. When you derive the Show class for a data
type the reader generated automatically is similar to this in style.
but lexAll should be used instead of lex (which seems to be said error):
readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
(l, u) <- readsTree' t,
("|", v) <- lexAll u,
(r, w) <- readsTree' v,
(">", x) <- lexAll w ]
++
[(Leaf x, t) | (x, t) <- reads s]

Erlang ++ operator. Syntactic sugar, or separate operation?

Is Erlang's ++ operator simply syntactic sugar for lists:concat or is it a different operation altogether? I've tried searching this, but it's impossible to Google for "++" and get anything useful.
This is how the lists:concat/1 is implemented in the stdlib/lists module:
concat(List) ->
flatmap(fun thing_to_list/1, List).
Where:
flatmap(F, [Hd|Tail]) ->
F(Hd) ++ flatmap(F, Tail);
flatmap(F, []) when is_function(F, 1) -> [].
and:
thing_to_list(X) when is_integer(X) -> integer_to_list(X);
thing_to_list(X) when is_float(X) -> float_to_list(X);
thing_to_list(X) when is_atom(X) -> atom_to_list(X);
thing_to_list(X) when is_list(X) -> X. %Assumed to be a string
So, lists:concat/1 actually uses the '++' operator.
X ++ Y is in fact syntactic sugar for lists:append(X, Y).
http://www.erlang.org/doc/man/lists.html#append-2
The function lists:concat/2 is a different beast. The documentation describes it as follows: "Concatenates the text representation of the elements of Things. The elements of Things can be atoms, integers, floats or strings."
One important use of ++ has not been mentioned yet: In pattern matching. For example:
handle_url(URL = "http://" ++ _) ->
http_handler(URL);
handle_url(URL = "ftp://" ++ _) ->
ftp_handler(URL).
It's an entirely different operation. ++ is ordinary list appending. lists:concat takes a single argument (which must be a list), stringifies its elements, and concatenates the strings.
++ : http://www.erlang.org/doc/reference_manual/expressions.html
lists:concat : http://www.erlang.org/doc/man/lists.html
Most list functions actually uses the '++' operator:
for example the list:append/2:
The source code defines it as follows:
-spec append(List1, List2) -> List3 when
List1 :: [T],
List2 :: [T],
List3 :: [T],
T :: term().
append(L1, L2) -> L1 ++ L2.

Haskell: Lifting a reads function to a parsec parser

As part of the 4th exercise here
I would like to use a reads type function such as readHex with a parsec Parser.
To do this I have written a function:
liftReadsToParse :: Parser String -> (String -> [(a, String)]) -> Parser a
liftReadsToParse p f = p >>= \s -> if null (f s) then fail "No parse" else (return . fst . head ) (f s)
Which can be used, for example in GHCI, like this:
*Main Numeric> parse (liftReadsToParse (many1 hexDigit) readHex) "" "a1"
Right 161
Can anyone suggest any improvement to this approach with regard to:
Will the term (f s) be memoised, or evaluated twice in the case of a null (f s) returning False?
Handling multiple successful parses, i.e. when length (f s) is greater than one, I do not know how parsec deals with this.
Handling the remainder of the parse, i.e. (snd . head) (f s).
This is a nice idea. A more natural approach that would make
your ReadS parser fit in better with Parsec would be to
leave off the Parser String at the beginning of the type:
liftReadS :: ReadS a -> String -> Parser a
liftReadS reader = maybe (unexpected "no parse") (return . fst) .
listToMaybe . filter (null . snd) . reader
This "combinator" style is very idiomatic Haskell - once you
get used to it, it makes function definitions much easier
to read and understand.
You would then use liftReadS like this in the simple case:
> parse (many1 hexDigit >>= liftReadS readHex) "" "a1"
(Note that listToMaybe is in the Data.Maybe module.)
In more complex cases, liftReadS is easy to use inside any
Parsec do block.
Regarding some of your other questions:
The function reader is applied only once now, so there is nothing to "memoize".
It is common and accepted practice to ignore all except the first parse in a ReadS parser in most cases, so you're fine.
To answer the first part of your question, no (f s) will not be memoised, you would have to do that manually:
liftReadsToParse p f = p >>= \s -> let fs = f s in if null fs then fail "No parse"
else (return . fst . head ) fs
But I'd use pattern matching instead:
liftReadsToParse p f = p >>= \s -> case f s of
[] -> fail "No parse"
(answer, _) : _ -> return answer

Resources