Erlang ++ operator. Syntactic sugar, or separate operation? - erlang

Is Erlang's ++ operator simply syntactic sugar for lists:concat or is it a different operation altogether? I've tried searching this, but it's impossible to Google for "++" and get anything useful.

This is how the lists:concat/1 is implemented in the stdlib/lists module:
concat(List) ->
flatmap(fun thing_to_list/1, List).
Where:
flatmap(F, [Hd|Tail]) ->
F(Hd) ++ flatmap(F, Tail);
flatmap(F, []) when is_function(F, 1) -> [].
and:
thing_to_list(X) when is_integer(X) -> integer_to_list(X);
thing_to_list(X) when is_float(X) -> float_to_list(X);
thing_to_list(X) when is_atom(X) -> atom_to_list(X);
thing_to_list(X) when is_list(X) -> X. %Assumed to be a string
So, lists:concat/1 actually uses the '++' operator.

X ++ Y is in fact syntactic sugar for lists:append(X, Y).
http://www.erlang.org/doc/man/lists.html#append-2
The function lists:concat/2 is a different beast. The documentation describes it as follows: "Concatenates the text representation of the elements of Things. The elements of Things can be atoms, integers, floats or strings."

One important use of ++ has not been mentioned yet: In pattern matching. For example:
handle_url(URL = "http://" ++ _) ->
http_handler(URL);
handle_url(URL = "ftp://" ++ _) ->
ftp_handler(URL).

It's an entirely different operation. ++ is ordinary list appending. lists:concat takes a single argument (which must be a list), stringifies its elements, and concatenates the strings.
++ : http://www.erlang.org/doc/reference_manual/expressions.html
lists:concat : http://www.erlang.org/doc/man/lists.html

Most list functions actually uses the '++' operator:
for example the list:append/2:
The source code defines it as follows:
-spec append(List1, List2) -> List3 when
List1 :: [T],
List2 :: [T],
List3 :: [T],
T :: term().
append(L1, L2) -> L1 ++ L2.

Related

Haskell Type error in Double recursion function

I'm trying to define a greedy function
greedy :: ReadP a -> ReadP [a]
that parses a sequence of values, returning only the "maximal" sequences that cannot be extended any further. For example,
> readP_to_S (greedy (string "a" +++ string "ab")) "abaac"
[(["a"],"baac"),(["ab","a","a"],"c")]
I'm using a very simple and probably clumsy way. Just parse the values and see if they can be parsed any further; if so, then reapply the function again to get all the possible values and concat that with the previous ones, or else just return the value itself. However, there seems to be some type problems, below is my code.
import Text.ParserCombinators.ReadP
addpair :: a -> [([a],String)] -> [([a],String)]
addpair a [] = []
addpair a (c:cs) = (a : (fst c), snd c ) : (addpair a cs)
greedy :: ReadP a -> ReadP [a]
greedy ap = readS_to_P (\s ->
let list = readP_to_S ap s in
f list )
where
f :: [(a,String)] -> [([a],String)]
f ((value, str2):cs) =
case readP_to_S ap str2 of
[] -> ([value], str2) : (f cs)
_ -> (addpair value (readP_to_S (greedy ap) str2)) ++ (f cs)
The GHC processes the code and says that function "f" has type [(a1,String)] -> [([a1],String)] but greedy is ReadP a -> ReadP [a]. I wonder why it is so because I think their type should agree. It also really helps if anyone can come up with some clever and more elegant approach to define the function greedy(my approach is definitely way too redundant)
To fix the compilation error, you need to add the language extension
{-# LANGUAGE ScopedTypeVariables #-}
to your source file, or pass the corresponding flag into the compiler. You also need to change the type signature of greedy to
greedy :: forall a. ReadP a -> ReadP [a]
This is because your two a type variables are not actually the same; they're in different scopes. With the extension and the forall, they are treated as being the same variable, and your types unify properly. Even then, the code errors, because you don't have an exhaustive pattern match in your definition of f. If you add
f [] = []
then the code seems to work as intended.
In order to simplify your code, I took a look at the provided function munch, which is defined as:
munch :: (Char -> Bool) -> ReadP String
-- ^ Parses the first zero or more characters satisfying the predicate.
-- Always succeeds, exactly once having consumed all the characters
-- Hence NOT the same as (many (satisfy p))
munch p =
do s <- look
scan s
where
scan (c:cs) | p c = do _ <- get; s <- scan cs; return (c:s)
scan _ = do return ""
In that spirit, your code can be rewritten as:
greedy2 :: forall a. ReadP a -> ReadP [a]
greedy2 ap = do
-- look at the string
s <- look
-- try parsing it, but without do notation
case readP_to_S ap s of
-- if we failed, then return nothing
[] -> return []
-- if we parsed something, ignore it
(_:_) -> do
-- parse it again, but this time inside of the monad
x <- ap
-- recurse, greedily parsing again
xs <- greedy2 ap
-- and return the concatenated values
return (x:xs)
This does have the speed disadvantage of executing ap twice as often as needed; this may be too slow for your use case. I'm sure my code could be further rewritten to avoid that, but I'm not a ReadP expert.

Erlang BIFs inside lists module

Inspired from this question, I was curious to see how lists:reverse/2 is implemented in the source code inside lists.erl module.
I found out that there is no implementation for lists:reverse/2 inside lists.erl, but there is an implementation for lists:reverse/1 that uses lists:reverse/2:
reverse([] = L) ->
L;
reverse([_] = L) ->
L;
reverse([A, B]) ->
[B, A];
reverse([A, B | L]) ->
lists:reverse(L, [B, A]).
At the top of the file there are some lines that tells that lists:reverse/2 (and some other functions) are BIFs:
%%% BIFs
-export([keyfind/3, keymember/3, keysearch/3, member/2, reverse/2]).
...
%% Shadowed by erl_bif_types: lists:reverse/2
-spec reverse(List1, Tail) -> List2 when
List1 :: [T],
Tail :: term(),
List2 :: [T],
T :: term().
reverse(_, _) ->
erlang:nif_error(undef).
Question: First, I couldn't find the actual implementations of those BIFs. Where can I find them? Second, If someone also knows to explain why is it organized that way?
The lists BIFs are implemented in erts/emulator/beam/erl_bif_lists.c. Parts of heavily-used standard modules such as lists are implemented as BIFs for efficiency and performance.

Why use lambda instead of pattern matching?

I am watching a tutorial on parsers in haskell https://www.youtube.com/watch?v=9FGThag0Fqs. The lecture starts with defining some really basic parsers. These are to be used together to create more complicated parsers later. One of the basic parsers is item. This is used to extract a character from the string we are parsing.
All parsers have the following type:
type Parser a = String -> [(a, String)]
The parser item is defined like this:
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
I am not so used to this syntax, so it looks strange to me. I would have written it:
item' :: Parser Char
item' [] = []
item' (x:xs) = [(x,xs)]
Testing it in ghci indicates that they are equal:
*Main> item ""
[]
*Main> item "abc"
[('a',"bc")]
*Main> item' ""
[]
*Main> item' "abc"
[('a',"bc")]
The lecturer makes a short comment about thinking it looks clearer, but I disagree. So my questions are:
Are they indeed completely identical?
Why is the lambda version clearer?
I believe this comes from the common practice of writing
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = someResult
where we have exactly n function arrows in the type, and exactly n arguments in the left hand side of the equation. This makes it easy to relate types and formal parameters.
If Result is a type alias for a function, then we may write
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn y = something
or
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = \y -> something
The latter follows the convention above: n arrows, n variables in the left hand side. Also, on the right hand side we have something of type Result, making it easier to spot. The former instead does not, and one might miss the extra argument when reading the code quickly.
Further, this style makes it easy to convert Result to a newtype instead of a type alias:
newtype Result = R (... -> ...)
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = R $ \y -> something
The posted item :: Parser Char code is an instance of this style when n=0.
Why you should avoid equational function definitions (by Roman Cheplyaka):
http://ro-che.info/articles/2014-05-09-clauses
Major Points from the above link:
DRY: Function and argument names are repeated --> harder to refactor
Clearer shows which arguments the function decides upon
Easy to add pragmas (e.g. for profiling)
Syntactically closer to lower level code
This doesn't explain the lambda though..
I think they are absolutely equal. The lambda-style definition puts a name item to an anonymous lambda function which does pattern matching inside. The pattern-matching-style definition defines it directly. But in the end both are functions that do pattern matching. I think it's a matter of personal taste.
Also, the lambda-style definition could be considered to be in pointfree style, i.e. a function defined without explicitly writing down its arguments actually it is not very much pointfree since the argument is still written (but in a different place), but in this case you don't get anything with this.
Here is another possible definion, somewhere in between of those two:
item :: Parser Char
item inp = case inp of
[] -> []
(x:xs) -> [(x, xs)]
It's essentially identical to the lambda-style, but not pointfree.

Haskell equivalent of scala collect

I'm trying to read in a file containing key/value pairs of the form:
#A comment
a=foo
b=bar
c=baz
Some other stuff
With various other lines, as suggested. This wants to go into a map that I can then look up keys from.
My initial approach would be to read in the lines and split on the '=' character to get a [[String]]. In Scala, I would then use collect, which takes a partial function (in this case something like \x -> case x of a :: b :: _ -> (a,b) and applies it where it's defined, throwing away the values where the function is undefined. Does Haskell have any equivalent of this?
Failing that, how would one do this in Haskell, either along my lines or using a better approach?
Typically this is done with the Maybe type and catMaybes:
catMaybes :: [Maybe a] -> [a]
So if your parsing function has type:
parse :: String -> Maybe (a,b)
then you can build the map by parsing the input string into lines, validating each line and returning just the defined values:
Map.fromList . catMaybes . map parse . lines $ s
where s is your input string.
The List Monad provides what you are looking for. This can probably be most easily leveraged via a list comprehension, although, it works with do notation as well.
First, here's the Scala implementation for reference -
// Using .toList for simpler demonstration
scala> val xs = scala.io.Source.fromFile("foo").getLines().toList
List[String] = List(a=1, b=2, sdkfjhsdf, c=3, sdfkjhsdf, d=4)
scala> xs.map(_.split('=')).collect { case Array(k, v) => (k, v) }
List[(String, String)] = List((a,1), (b,2), (c,3), (d,4))
Now the list comprehension version using Haskell -
λ :m + Data.List.Split
λ xs <- lines <$> readFile "foo"
λ xs
["a=1","b=2","sdkfjhsdf","c=3","sdfkjhsdf","d=4"]
-- List comprehension
λ [(k, v) | [k, v] <- map (splitOn "=") xs]
[("a","1"),("b","2"),("c","3"),("d","4")]
-- Do notation
λ do { [k, v] <- map (splitOn "=") xs; return (k, v) }
[("a","1"),("b","2"),("c","3"),("d","4")]
What's happening is that the pattern match condition is filtering out cases that don't match using the fail method from Monad.
λ fail "err" :: [a]
[]
So both the list comprehension and do notation are leveraging fail, which desugars to this -
map (splitOn "=") xs >>= (
\s -> case s of
[k, v] -> return (k, v)
_ -> fail ""
)

What is the difference between the `fun` and `function` keywords?

Sometimes I see code like
let (alt : recognizer -> recognizer -> recognizer) =
fun a b p -> union (a p) (b p)
Or like:
let hd = function
Cons(x,xf) -> x
| Nil -> raise Empty
What is the difference between fun and function?
The semantics for this is the same as in F# (probably because F# is based on OCaml):
function allows the use of pattern matching (i.e. |), but consequently it can be passed only one argument.
function p_1 -> exp_1 | … | p_n -> exp_n
is equivalent to
fun exp -> match exp with p_1 -> exp_1 | … | p_n -> exp_n
fun does not allow pattern matching, but can be passed multiple arguments, e.g.
fun x y -> x + y
When either of the two forms can be used, fun is generally preferred due to its compactness.
See also OCaml documentation on Functions.
The way I think about it
function patterns
is shorthand for
(fun x -> match x with patterns)
where 'patterns' is e.g.
| Some(x) -> yadda | None -> blah
(And
fun args -> expr
is how you define a lambda.)
Russ Cam is correct in his answer.
Here is a posting on the OCaml list talking about it
http://caml.inria.fr/pub/ml-archives/ocaml-beginners/2003/11/b8036b7a0c1d082111d7a83c8f6dbfbb.en.html
function only allows for one argument but allows for pattern matching, while fun is the more general and flexible way to define a function.
I generally use fun unless there is a good reason to use function.
You can see this in the code you posted where the fun declaration takes 3 arguments and the function declaration does pattern matching on it's input
fun x1 ... xn -> e
is an abbreviation for
function x1 -> ... -> function xn -> e

Resources