maps,filter,folds and more? Do we really need these in Erlang? - erlang

Maps, filters, folds and more : http://learnyousomeerlang.com/higher-order-functions#maps-filters-folds
The more I read ,the more i get confused.
Can any body help simplify these concepts?
I am not able to understand the significance of these concepts.In what use cases will these be needed?
I think it is majorly because of the syntax,diff to find the flow.

The concepts of mapping, filtering and folding prevalent in functional programming actually are simplifications - or stereotypes - of different operations you perform on collections of data. In imperative languages you usually do these operations with loops.
Let's take map for an example. These three loops all take a sequence of elements and return a sequence of squares of the elements:
// C - a lot of bookkeeping
int data[] = {1,2,3,4,5};
int squares_1_to_5[sizeof(data) / sizeof(data[0])];
for (int i = 0; i < sizeof(data) / sizeof(data[0]); ++i)
squares_1_to_5[i] = data[i] * data[i];
// C++11 - less bookkeeping, still not obvious
std::vec<int> data{1,2,3,4,5};
std::vec<int> squares_1_to_5;
for (auto i = begin(data); i < end(data); i++)
squares_1_to_5.push_back((*i) * (*i));
// Python - quite readable, though still not obvious
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
squares_1_to_5.append(x * x)
The property of a map is that it takes a collection of elements and returns the same number of somehow modified elements. No more, no less. Is it obvious at first sight in the above snippets? No, at least not until we read loop bodies. What if there were some ifs inside the loops? Let's take the last example and modify it a bit:
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
if x % 2 == 0:
squares_1_to_5.append(x * x)
This is no longer a map, though it's not obvious before reading the body of the loop. It's not clearly visible that the resulting collection might have less elements (maybe none?) than the input collection.
We filtered the input collection, performing the action only on some elements from the input. This loop is actually a map combined with a filter.
Tackling this in C would be even more noisy due to allocation details (how much space to allocate for the output array?) - the core idea of the operation on data would be drowned in all the bookkeeping.
A fold is the most generic one, where the result doesn't have to contain any of the input elements, but somehow depends on (possibly only some of) them.
Let's rewrite the first Python loop in Erlang:
lists:map(fun (E) -> E * E end, [1,2,3,4,5]).
It's explicit. We see a map, so we know that this call will return a list as long as the input.
And the second one:
lists:map(fun (E) -> E * E end,
lists:filter(fun (E) when E rem 2 == 0 -> true;
(_) -> false end,
[1,2,3,4,5])).
Again, filter will return a list at most as long as the input, map will modify each element in some way.
The latter of the Erlang examples also shows another useful property - the ability to compose maps, filters and folds to express more complicated data transformations. It's not possible with imperative loops.

They are used in almost every application, because they abstract different kinds of iteration over lists.
map is used to transform one list into another. Lets say, you have list of key value tuples and you want just the keys. You could write:
keys([]) -> [];
keys([{Key, _Value} | T]) ->
[Key | keys(T)].
Then you want to have values:
values([]) -> [];
values([{_Key, Value} | T}]) ->
[Value | values(T)].
Or list of only third element of tuple:
third([]) -> [];
third([{_First, _Second, Third} | T]) ->
[Third | third(T)].
Can you see the pattern? The only difference is what you take from the element, so instead of repeating the code, you can simply write what you do for one element and use map.
Third = fun({_First, _Second, Third}) -> Third end,
map(Third, List).
This is much shorter and the shorter your code is, the less bugs it has. Simple as that.
You don't have to think about corner cases (what if the list is empty?) and for experienced developer it is much easier to read.
filter searches lists. You give it function, that takes element, if it returns true, the element will be on the returned list, if it returns false, the element will not be there. For example filter logged in users from list.
foldl and foldr are used, when you have to do additional bookkeeping while iterating over the list - for example summing all the elements or counting something.
The best explanations, I've found about those functions are in books about Lisp: "Structure and Interpretation of Computer Programs" and "On Lisp" Chapter 4..

Related

Splitting a list in OCaml

I am wondering whether there is an option for splitting a list in half (or in general at specified element).
To be precise I would like to do something like this:
Having a list (of integers for example):
let ml = [1;2;3;4;5]
I would like to make two lists out of it with lengths specified with an argument of first list.
It would look like something like this:
let msl1, msl2 = split_at_point ml 3
(* msl1 = [1;2;3], msl2=[4;5] *)
To be honest I don't really care if the splitting occurs at the specific point inclusively or not. All I care is it would be fast and memory saving (no copying would be great). And I know that best in terms of efficiency is something like O(n) where n the length of the first list (of the results).
You can't avoid copying the first half of the list. List are immutable, so the only way to get the new list is to copy elements from the original list.
The second half of the list doesn't require a copy, because the tail of a list is a list.
There's no built-in function for doing this (in the OCaml standard library anyway), so I'd suggest you write your own.
This seems very much like something that would come up a homework assignment, so I'll just say that the way to write most list functions in a functional language is to ask yourself how you could use a function that works for a smaller input (generally, the tail of the list) to calculate the value for your original input.
In your case you have something like this:
let rec split_at_point l n =
if n = 0 then
(* Answer is obvious *)
else
match l with
| [] -> (* Answer is obvious *)
| head :: tail ->
(* Call split_at_point on the tail and
* construct your answer
*)

F#: Returning a list with element at index changed

In F# I need to get a list from an existing list with the value at a particular index changed from the original. All I can find on google and here relates to changing something based on value rather than index (fair enough, because lists don't really have a concept of "index", but I bet everyone knew what I meant).
This is obviously easy enough to code. At the moment I have:
// Replace element at index with newElement in seq. Does nothing if index is outside seq.
let updateElement index newElement seq =
let rec updateElementHelper seq count result =
match seq with
| [] -> result |> List.rev
| head::tail ->
if count = index then
updateElementHelper [] (count + 1) (newElement::result)#tail
else
updateElementHelper tail (count + 1) (head::result)
updateElementHelper seq 0 []
which seems to work just fine, but is there a more native way than this?
(F# newbie - or rather, returing after a very long break and never having got all that far the first time around).
The easiest way to implement this is probably to use the List.mapi function - it calls a function you provide for each element of the list and gives you an index, so you can either return the original element or your new element, depending on the index:
let updateElement index element list =
list |> List.mapi (fun i v -> if i = index then element else v)
updateElement 4 40 [ 0 .. 9 ]
As noted by #Jarak, if you need to do this often, then it might be worth thinking whether there is some other more functional approach to your problem where you do not rely on indices - doing something like this would not be very typical thing to do in functional code.
I am assuming that you don't want to allow the list to be mutable. If you did, then you could just index into the list and update the value, e.g. mylist.[index] <- newValue.
I will say right now that any operation on a list that uses any other sort of access than the typical "head + tail -> recurse on tail" style is a strong sign that a list isn't the right data structure for your operation. See e.g. Juliet's answer here. Typically, if you want to be operating on a linear data structure by index, an array is your best bet.
The easiest way I can think of to do this if you still want to do it with a list, would be something like the following:
let newList = oldList.[..index - 1] # (newValue :: oldList.[index + 1..])
(I might possibly have the indices slightly off)
This will probably have very poor performance, however. I think it would be reasonably fair to say that many F#-ers would call any use of # or List slicing a code smell. As a very infrequent operation on small lists it might be alright, but if it will be used frequently, or on large lists, then it would be a good time to start thinking if a list is really the right collection data structure for your task.

Erlang flatten function time complexity

I need a help with following:
flatten ([]) -> [];
flatten([H|T]) -> H ++ flatten(T).
Input List contain other lists with a different length
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]).
What is the time complexity of this function?
And why?
I got it to O(n) where n is a number of elements in the Input list.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]) n=3
flatten([[1,2,3],[4,7],[9,9,9,9,9,9],[3,2,4],[1,4,6]]) n=5
Thanks for help.
First of all I'm not sure your code will work, at least not in the way standard library works. You could compare your function with lists:flatten/1 and maybe improve on your implementation. Try lists such as [a, [b, c]] and [[a], [b, [c]], [d]] as input and verify if you return what you expected.
Regarding complexity it is little tricky due to ++ operator and functional (immutable) nature of the language. All lists in Erlang are linked lists (not arrays like in C++), and you can not just add something to end of one without modifying it; before it was pointing to end of list, now you would like it to link to something else. And again, since it is not mutable language you have to make copy of whole list left of ++ operator, which increases complexity of this operator.
You could say that complexity of A ++ B is length(A), and it makes complexity of your function little bit greater. It would go something like length(FirstElement) + (lenght(FirstElement) + length(SecondElement)) + .... up to (without) last, which after some math magic could be simplified to (n -1) * 1/2 * k * k where n is number of elements, and k is average length of element. Or O(n^3).
If you are new to this it might seem little bit odd, but with some practice you can get hang of it. I would recommend going through few resources:
Good explanation of lists and how they are created
Documentation on list handling with DO and DO NOT parts
Short description of ++ operator myths and best practices
Chapter about recursion and tail-recursion with examples using ++ operator

Find all possible pairs between the subsets of N sets with Erlang

I have a set S. It contains N subsets (which in turn contain some sub-subsets of various lengths):
1. [[a,b],[c,d],[*]]
2. [[c],[d],[e,f],[*]]
3. [[d,e],[f],[f,*]]
N. ...
I also have a list L of 'unique' elements that are contained in the set S:
a, b, c, d, e, f, *
I need to find all possible combinations between each sub-subset from each subset so, that each resulting combination has exactly one element from the list L, but any number of occurrences of the element [*] (it is a wildcard element).
So, the result of the needed function working with the above mentioned set S should be (not 100% accurate):
- [a,b],[c],[d,e],[f];
- [a,b],[c],[*],[d,e],[f];
- [a,b],[c],[d,e],[f],[*];
- [a,b],[c],[d,e],[f,*],[*];
So, basically I need an algorithm that does the following:
take a sub-subset from the subset 1,
add one more sub-subset from the subset 2 maintaining the list of 'unique' elements acquired so far (the check on the 'unique' list is skipped if the sub-subset contains the * element);
Repeat 2 until N is reached.
In other words, I need to generate all possible 'chains' (it is pairs, if N == 2, and triples if N==3), but each 'chain' should contain exactly one element from the list L except the wildcard element * that can occur many times in each generated chain.
I know how to do this with N == 2 (it is a simple pair generation), but I do not know how to enhance the algorithm to work with arbitrary values for N.
Maybe Stirling numbers of the second kind could help here, but I do not know how to apply them to get the desired result.
Note: The type of data structure to be used here is not important for me.
Note: This question has grown out from my previous similar question.
These are some pointers (not a complete code) that can take you to right direction probably:
I don't think you will need some advanced data structures here (make use of erlang list comprehensions). You must also explore erlang sets and lists module. Since you are dealing with sets and list of sub-sets, they seems like an ideal fit.
Here is how things with list comprehensions will get solved easily for you: [{X,Y} || X <- [[c],[d],[e,f]], Y <- [[a,b],[c,d]]]. Here i am simply generating a list of {X,Y} 2-tuples but for your use case you will have to put real logic here (including your star case)
Further note that with list comprehensions, you can use output of one generator as input of a later generator e.g. [{X,Y} || X1 <- [[c],[d],[e,f]], X <- X1, Y1 <- [[a,b],[c,d]], Y <- Y1].
Also for removing duplicates from a list of things L = ["a", "b", "a"]., you can anytime simply do sets:to_list(sets:from_list(L)).
With above tools you can easily generate all possible chains and also enforce your logic as these chains get generated.

Creating a valid function declaration from a complex tuple/list structure

Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and it's proving to be error prone doing it manually.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell. I'll iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an erlang shell inside an emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit alt-p (forget what emacs calls alt, but it's alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]].
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly you want to pattern match some large datastructures of unknown formatting.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that it is undecidable from just data to know how to categorize that data.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
Flat = lists:map(fun flatten_io/1, List),
maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)];
flatten_io(Other) -> Other.
maybe_flatten(L) when is_list(L) ->
case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
(List) when is_list(List) ->
lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
(_) -> false
end, L) of
true -> lists:flatten(L);
false -> L
end.
(Caveat: completely untested and quite inefficient. Will also crash for inproper lists, but you shouldn't have those in your data structures anyway.)
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
Its hard to recommend something for handling this.
Transforming all the structures in a more sane and also more minimal format looks like its worth it. This depends mainly on the similarities in these structures.
Rather than having a special function for each of the 100 there must be some automatic reformatting that can be done, maybe even put the parts in records.
Once you have records its much easier to write functions for it since you don't need to know the actual number of elements in the record. More important: your code won't break when the number of elements changes.
To summarize: make a barrier between your code and the insanity of these structures by somehow sanitizing them by the most generic code possible. It will be probably a mix of generic reformatting with structure speicific stuff.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists) and match for "things" that have a common structure like 'name-addr' and replace these with nice records.
In order to help you eyeballing you can write yourself helper functions along this example:
eyeball(List) when is_list(List) ->
io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
expansion of this in a useful tool for your use is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...

Resources