erlang: Count items in list of tuples - erlang

I have the following list of items
[{id, user1, category1}, {id, user2, category1}, {id, user1, category2}....],
where id is unique, and user/category can be repeated. I am trying to figure out how to get stats from the list, e.g.
[{user1, category1, 20}, {user1, category2, 30}..]

You can do it using the lists:foldl/3 function.
F = fun({_,User,Cat},Accumulator) ->
N = maps:get({User,Cat},Accumulator,0),
maps:put({User,Cat},N+1,Accumulator) end.
CountMap = lists:foldl(F,#{},InputList).
This returns a map of the form #{{user1, category1} => 20, {user1, category2} => 30 ...}.
If you really need a list, you have to transform the map:
CountList = maps:fold(fun({User,Cat}, Count, Acc) -> [{User,Cat,Count}|Acc] end,[],CountMap).
I used an intermediate map because, for a big input list, it gives fast lookups and fast updates compared to a solution that works directly on the output list. Finding an entry in a list is expensive (on average you traverse half of the list), and so is modifying it (on average you copy half of the list).
For an input list of 200 000 elements, it took 94 ms to generate the map and convert it into a list on my laptop, and 219 ms for 500 000 elements.
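As a self-contained variant of the fold above, the sketch below wraps both steps in one module. The module name count_demo and the function name count/1 are made up for illustration, and maps:update_with/4 (available since OTP 19) stands in for the maps:get/maps:put pair. The order of the resulting list is unspecified, so sort it if the order matters.

```erlang
-module(count_demo).
-export([count/1]).

%% Count how often each {User, Category} pair occurs in a list of
%% {Id, User, Category} tuples. Returns [{User, Category, Count}].
count(Items) ->
    Counts = lists:foldl(
               fun({_Id, User, Cat}, Acc) ->
                       %% Increment the counter; a new key starts at 1.
                       maps:update_with({User, Cat}, fun(N) -> N + 1 end, 1, Acc)
               end,
               #{},
               Items),
    [{User, Cat, N} || {{User, Cat}, N} <- maps:to_list(Counts)].
```

For example, count_demo:count([{1,u1,c1},{2,u2,c1},{3,u1,c1}]) yields the entries {u1,c1,2} and {u2,c1,1} in some order.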

Although Pascal's solution is a good universal solution, for small datasets (like up to 15 000) you can use this version using lists:sort/1 which is significantly faster for them.
main(L) ->
count(lists:sort(transform(L))).
count([]) -> [];
count([H|T]) ->
count(H, T, 1, []).
count(H, [H|T], N, Acc) -> count(H, T, N+1, Acc);
count({U, C}, [H|T], N, Acc) -> count(H, T, 1, [{U, C, N}|Acc]);
count({U, C}, [], N, Acc) -> [{U, C, N}|Acc].
transform(L) ->
transform(L, []).
transform([], Acc) -> Acc;
transform([{_, User, Category}|T], Acc) ->
transform(T, [{User, Category}|Acc]).
Edit:
The key factor in deciding which algorithm is faster is the proportion of unique keys. If the dataset is big but has only a small number of unique {User, Category} pairs, the solution using maps will be faster. If it is the other way around, lists:sort/1 will be faster. In other words, what matters is the size of the list relative to the size of the map.

Related

Erlang: serial implementation of accumulator

I am trying to create a method that takes an associative and commutative operator, as well as a list of values, and then returns the answer by applying the operator to the values in the list.
The following two examples represent what the input/output are supposed to look like.
Example 1
Input: sum(fun(A,B) -> A+B end, [2,6,7,10,12]).
Output: 37
Example 2
Input: sum(fun (A,B) -> A++B end , ["C", "D", "E"]).
Output: "CDE"
This is the code I am working with so far.
-module(tester).
-compile(export_all).
sum(Func, Data, Acc) ->
lists:foldr(Func, Acc, Data).
This code produces the correct result, however, there are two problems I am trying to figure out how to approach answering.
(1) In order for this code to work, it requires an empty list to be included at the end of the command line statements. In other words, if I enter the input above (as in the examples), it will err out, because I did not write it in the following way:
12> tester:sum(fun(X, Acc) -> X+Acc end, [2,6,7,10,12], 0).
How would I implement this without an empty list as in the examples above and get the same result?
(2) Also, how would the code be implemented without the list function, or in an even more serial way?
How would I implement this without an empty list as in the examples above and get the same result?
Assuming the list always has at least one element (you can't really do it without this assumption), you can extract the first element from the list and pass that as the initial accumulator. You'll need to switch to foldl to do this efficiently. (With foldr you'll essentially need to make a copy of the list to drop the last element.)
sum(Func, [X | Xs]) ->
lists:foldl(fun (A, B) -> Func(B, A) end, X, Xs).
1> a:sum(fun(A,B) -> A+B end, [2,6,7,10,12]).
37
2> a:sum(fun (A,B) -> A++B end , ["C", "D", "E"]).
"CDE"
Also, how would the code be implemented without the list function, or in an even more serial way?
Here's a simple implementation using recursion and pattern matching:
sum2(Func, [X | Xs]) ->
sum2(Func, Xs, X).
sum2(_Func, [], Acc) ->
Acc;
sum2(Func, [X | Xs], Acc) ->
sum2(Func, Xs, Func(Acc, X)).
We define two versions of the function. The first one extracts the head and uses that as the initial accumulator. The second one, with arity 3, does essentially what the fold functions in lists do.
After working on this for a while, this was my solution. I've left some comments about the general idea of what I did, but there's a lot more to be said.
-module(erlang2).
-compile(export_all).
-export([reduce/2]).
reduce(Func, List) ->
reduce(root, Func, List).
%A single-element list at the top level has no parent process to notify
reduce(root, _, [A]) ->
A;
%When done send results to Parent
reduce(Parent, _, [A]) ->
%send to parent
Parent ! { self(), A};
%I tried this at first to take care of one el in list, but it didn't work
%length ([]) ->
% Parent ! {self(), A};
%get contents of list, apply function and store in Parent
reduce(Parent, Func, List) ->
{ Left, Right } = lists:split(trunc(length(List)/2), List),
Me = self(),
%io:format("Splitting in two~n"),
Pl = spawn(fun() -> reduce(Me, Func, Left) end),
Pr = spawn(fun() -> reduce(Me, Func, Right) end),
%merge results in parent and call Func on final left and right halves
combine(Parent, Func,[Pl, Pr]).
%merge Pl and Pr and combine in parent
combine(Parent, Func, [Pl, Pr]) ->
%wait for processes to complete (using receive) and then send to Parent
receive
{ Pl, Sorted } -> combine(Parent, Func, Pr, Sorted);
{ Pr, Sorted } -> combine(Parent, Func, Pl, Sorted)
end.
combine(Parent, Func, P, List) ->
%wait and store in results and then call ! to send
receive
{ P, Sorted } ->
Results = Func(Sorted, List),
case Parent of
root ->
Results;
%send results to parent
_ -> Parent ! {self(), Results}
end
end.

how to efficiently build erlang lists in natural order?

In the Programming Erlang book, there is some example pseudo code that shows a pattern for efficiently adding elements to the head of a list:
some_function([H|T], ..., Result, ...) ->
H1 = ... H ...,
some_function(T, ..., [H1|Result], ...);
some_function([H|T], ..., Result, ...) ->
{..., Result, ...}.
I'm still getting used to functional programming so the above example is a little too abstract for me to understand at the moment.
I think it will be easier to understand if there is a concrete implementation of the pattern that I could dissect.
Question: Is there a simple concrete implementation of this pattern that someone can provide?
Let's say that we want a function which behaves like the uniq command.
The function takes a list of elements and returns a list with all consecutive occurrences of an element substituted with a single occurrence of that element.
One of the possible approaches is presented below:
uniq(L) ->
uniq(L, []).
uniq([], Acc) ->
lists:reverse(Acc);
uniq([H, H | T], Acc) ->
uniq([H | T], Acc);
uniq([H | T], Acc) ->
uniq(T, [H | Acc]).
We build up an accumulator, by inserting new elements at the head of the Acc list (cheapest insertion cost) and once we're done, we reverse the whole list to get the initial order of elements back.
We "visit" some of the elements of the initial list twice, but the total cost is still linear, i.e. only dependent on the number of elements of the initial list.
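Mapped back onto the book's pseudo code, the same pattern can be sketched with H1 = H * H as the per-element step. This assumes the terminating clause of the pseudo code is meant to match the empty list; the module name acc_demo and the function name squares are made up for illustration.

```erlang
-module(acc_demo).
-export([squares/1]).

%% Public entry point: start with an empty accumulator.
squares(List) ->
    squares(List, []).

%% Each step transforms the head (H1 = H * H) and conses it onto the
%% accumulator, so Result is built in reverse order.
squares([H | T], Result) ->
    H1 = H * H,
    squares(T, [H1 | Result]);
%% Input exhausted: one final reverse restores the natural order.
squares([], Result) ->
    lists:reverse(Result).
```

Prepending is a constant-time operation, so the whole traversal plus the final lists:reverse/1 stays linear in the length of the input.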
This takes a factorized list, i.e.
[[],[2],[3],[2,2],[5],[2,3],[7],[2,2,2],etc...]
and removes all the primes.
remove_primes([HD|TL], Results) ->
case length(HD) of
0 -> % You're at 1
remove_primes (TL , Results);
1 -> % Its a prime, remove it, and keep going
remove_primes( TL , Results) ;
_ -> % its not prime, leave it in and keep going.
remove_primes(TL, [ HD | Results])
end;
remove_primes([], Result) ->
{Result}.
The structure Joe Armstrong was alluding to is the standard structure of walking a list and applying a function to each element of the list. In this case, I wanted to treat each element differently depending on its contents.
In practice, it is much easier to use maps, filters and such, so I believe you will see that much more often. But as you seem to know, understanding the basics is vital to becoming a proficient functional programmer.
In hopes of centralizing information pertaining to 'building lists in natural order': does anyone know why pattern matching at the function level works, but 'unpacking' a variable does not? (Compare with this version, which does not work.)
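As a rough illustration of the "filters" route, the same prime removal can be sketched with lists:filter/2. The module name primes_demo is hypothetical. Unlike the hand-written version above, this returns a plain list in the original order rather than a reversed list wrapped in a tuple.

```erlang
-module(primes_demo).
-export([remove_primes/1]).

%% Keep only entries with two or more prime factors, i.e. composites.
%% [] (the entry for 1) and single-factor entries (primes) are dropped.
remove_primes(Factorized) ->
    lists:filter(fun([_, _ | _]) -> true;
                    (_) -> false
                 end,
                 Factorized).
```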
remove_primes(Factorized_List, Results) ->
[HD|TL] = Factorized_List, % unpack the list <-------------
case length(HD) of
0 -> % You're at 1
remove_primes (TL , Results);
1 -> % Its a prime, remove it, and keep going
remove_primes( TL , Results) ;
_ -> % its not prime, leave it in and keep going.
remove_primes(TL, [HD|Results])
end;
remove_primes([], Result) ->
{Result}.
I believe this leads to more readable code, but it does not seem to work.
-rC
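A likely reason the unpacking version fails: its first clause head, remove_primes(Factorized_List, Results), matches every call, including the one where the list is []. The [] clause below it is therefore never reached, and [HD|TL] = [] raises a badmatch error at the end of the list. One way to keep the unpacking in the body, sketched here under a hypothetical module name, is to put the [] clause first.

```erlang
-module(unpack_demo).
-export([remove_primes/2]).

%% The empty-list clause must come first: the clause below matches any
%% first argument, so it would otherwise shadow this one.
remove_primes([], Results) ->
    lists:reverse(Results);
remove_primes(Factorized_List, Results) ->
    [HD | TL] = Factorized_List,   % safe here: the list is non-empty
    case HD of
        [_, _ | _] ->              % two or more factors: composite, keep it
            remove_primes(TL, [HD | Results]);
        _ ->                       % [] (for 1) or a single factor (prime)
            remove_primes(TL, Results)
    end.
```

This sketch also reverses the accumulator at the end, so the result comes back in the input order.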
Here is the only way I can get your pattern to execute:
some_func([H|T], 4, Result, 4) ->
H1 = H * 2,
some_func(T, 3, [H1|Result], 4);
some_func([H|T], 3, Result, _) ->
{H, Result, T}.
--output:--
25> a:some_func([1, 2, 3], 4, [], 4).
{2,[2],[3]}
...which does nothing useful.
The pattern in the pseudo code makes no sense to me, so I'll join you in your confusion.
Here is another attempt:
some_func([H|T], [_|T2], Result, Y) ->
H1 = H * Y,
some_func(T, T2, [H1|Result], Y);
some_func([H|T], [], Result, _) ->
{H, Result, T}.
--output:--
34> a:some_func([1, 2, 3, 4], [one, two, three], [], 2).
{4,[6,4,2],[]}

Erlang, replacing an atom with another one in a list

I want to write a function to replace a specific atom with the given atom in an input list. But I want to do it using pattern matching and not using conditional statements. Any idea?
And also I want to write a function to return unique atoms in an expression.
e.g.
Input:
[a, b, c, a, b]
Output:
c
Input:
[b, b, b, r, t, y, y]
Output:
[t, r]
Assuming you want to replace all instances and keep the order of the list (works with all terms):
replace(Old, New, List) -> replace(Old, New, List, []).
replace(_Old, _New, [], Acc) -> lists:reverse(Acc);
replace(Old, New, [Old|List], Acc) -> replace(Old, New, List, [New|Acc]);
replace(Old, New, [Other|List], Acc) -> replace(Old, New, List, [Other|Acc]).
For the unique elements filter, you need to keep a state of which elements you have looked at already.
It would be really awkward to implement such a function using only pattern matching in the function heads, and you would not really gain anything (performance) from it. The awkwardness would come from having to loop through both the list in question and the list(s) keeping your state of already parsed elements. You would also lose a lot of readability.
I would recommend going for something simpler (works with all terms, not just atoms):
unique(List) -> unique(List, []).
unique([], Counts) ->
lists:foldl(fun({E, 1}, Acc) -> [E|Acc];
(_, Acc) -> Acc
end, [], Counts);
unique([E|List], Counts) ->
unique(List, count(E, Counts)).
count(E, []) -> [{E, 1}];
count(E, [{E, N}|Rest]) -> [{E, N + 1}|Rest];
count(E, [{X, N}|Rest]) -> [{X, N}|count(E, Rest)].
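For longer lists, the linear scan in count/2 makes the whole function quadratic. A sketch of the same idea with a map as the counting state follows; the module name unique_demo is hypothetical and maps:update_with/4 requires OTP 19 or later. It makes a second pass over the input so that the result keeps the original order.

```erlang
-module(unique_demo).
-export([unique/1]).

%% Return the elements that occur exactly once, in their original order.
unique(List) ->
    Counts = lists:foldl(
               fun(E, Acc) ->
                       maps:update_with(E, fun(N) -> N + 1 end, 1, Acc)
               end,
               #{},
               List),
    [E || E <- List, maps:get(E, Counts) =:= 1].
```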
One way to solve your first question would be to use guards instead of if statements. Using only pattern matching doesn't seem possible (or desirable, even if you can do it).
So, for instance, you could do something like:
my_replace([H|T], ToReplace, Replacement, Accum) when H == ToReplace ->
my_replace(T, ToReplace, Replacement, [Replacement|Accum]);
my_replace([H|T], ToReplace, Replacement, Accum) ->
my_replace(T, ToReplace, Replacement, [H|Accum]);
my_replace([], _ToReplace, _Replacement, Accum) ->
lists:reverse(Accum).
EDIT: Edited for simplicity and style, thanks for the comments. :)
For the second part of your question, what do you consider an "expression"?
EDIT: Nevermind that, usort doesn't completely remove duplicates, sorry.

Finding the largest element in a list with K processes using Erlang?

It is easy to implement the algorithm using a single process, however, how can I use multiple processes to do the job?
Here is what I have done so far.
find_largest([H], _) -> H;
find_largest([H, Q | T], R) ->
if H > Q -> find_largest([H | T], [Q | R]);
true -> find_largest([Q | T], [H | R])
end.
Thanks
Given how Erlang represents lists, this is probably not a good idea to try and do in parallel. Partitioning the list implies a lot of copying (since they are linked lists) and so does sending these partitions to other processes. I expect the comparison to be far cheaper than copying everything twice and then combining the results.
The implementation is also not correct; you can find a good one in lists.erl as max/1:
%% max(L) -> returns the maximum element of the list L
-spec max([T,...]) -> T.
max([H|T]) -> max(T, H).
max([H|T], Max) when H > Max -> max(T, H);
max([_|T], Max) -> max(T, Max);
max([], Max) -> Max.
If by some chance your data are already in separate processes, simply get the lists:max/1 of each of the lists and send them to a single place, and then get the lists:max/1 of the result list. You could also do the comparison as you receive the results to avoid building this intermediate list.
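A minimal sketch of that last idea, assuming the data is already split into non-empty sublists: each worker computes lists:max/1 of its own sublist, and the parent keeps a running maximum instead of collecting the replies into a list. The module name pmax_demo and the function names are made up for illustration. Replies are awaited in spawn order via selective receive, which keeps the code short at the cost of not handling them strictly in arrival order.

```erlang
-module(pmax_demo).
-export([max_of_lists/1]).

%% ListOfLists must be non-empty and contain only non-empty lists.
max_of_lists(ListOfLists) ->
    Parent = self(),
    %% One worker per sublist; each reply is tagged with a unique reference.
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, lists:max(L)} end),
                Ref
            end || L <- ListOfLists],
    collect(Refs).

%% Seed the running maximum with the first reply, then fold in the rest.
collect([Ref | Refs]) ->
    receive {Ref, Max} -> collect(Refs, Max) end.

collect([], Max) ->
    Max;
collect([Ref | Refs], Max) ->
    receive {Ref, M} -> collect(Refs, max(M, Max)) end.
```

Since the sublists are copied into each worker when the closures are spawned, this only pays off when the data already lives in separate processes or the per-element work is much heavier than a comparison.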
The single process version of your code should be replaced by lists:max/1. A useful function for parallelizing code is as follows:
pmap(Fun, List) ->
Parent = self(),
P = fun(Elem) ->
Ref = make_ref(),
spawn_link(fun() -> Parent ! {Ref, Fun(Elem)} end),
Ref
end,
Refs = [P(Elem) || Elem <- List],
lists:map(fun(Ref) -> receive {Ref, Elem} -> Elem end end, Refs).
pmap/2 applies Fun to each member of List in parallel and collects the results in input order. To use pmap with this problem, you would need to segment your original list into a list of lists and pass that to pmap. e.g. lists:max(pmap(fun lists:max/1, ListOfLists)). Of course, the act of segmenting the lists would be more expensive than simply calling lists:max/1, so this solution would require that the list be pre-segmented. Even then, it's likely that the overhead of copying the lists outweighs any benefit of parallelization - especially on a single node.
The inherent problem with your situation is that the computation of each sub-task is tiny when compared with the overhead of managing the data. Tasks which are more computationally intensive, (e.g. factoring a list of large numbers), are more easily parallelized.
This isn't to say that finding a max value can't be parallelized, but I believe it would require that your data be pre-segmented or segmented in a way that didn't require iterating over every value.

right rotate a List in Erlang

I am getting familiar with Sequential Erlang (and the functional programming way of thinking). I want to implement the following two functions without the help of BIFs. One is left_rotate (for which I have come up with a solution) and the other is right_rotate (which I am asking about here).
-export([leftrotate/2, rightrotate/2]).
%%(1) left rotate a list
leftrotate(List, 0) ->
List;
leftrotate([Head | Tail], Times) ->
List = append(Tail, Head),
leftrotate(List, Times -1).
append([], Elem)->
[Elem];
append([H|T], Elem) ->
[H | append(T, Elem)].
%%right rotate a list, how?
%%
I don't want to use BIF in this exercise. How can I achieve the right rotation?
A related and slightly more important question: how can I tell whether one of my implementations is efficient (i.e., whether it avoids unnecessary recursion compared with implementing the same thing with the help of a BIF, etc.)?
I think BIFs exist to provide functions for things that functional programming is not good at (or where doing them in a 'functional way' gives suboptimal performance).
The efficiency problem you mention has nothing to do with excessive recursion (function calls are cheap), and everything to do with walking and rebuilding the list. Every time you add something to the end of a list you have to walk and copy the entire list, as is obvious from your implementation of append. So, to rotate a list N steps requires us to copy the entire list out N times. We can use lists:split (as seen in one of the other answers) to do the entire rotate in one step, but what if we don't know in advance how many steps we need to rotate?
A list really isn't the ideal data structure for this task. Let's say that instead we use a pair of lists, one for the head and one for the tail; then we can rotate easily by moving elements from one list to the other.
So, carefully avoiding calling anything from the standard library, we have:
rotate_right(List, N) ->
to_list(n_times(N, fun rotate_right/1, from_list(List))).
rotate_left(List, N) ->
to_list(n_times(N, fun rotate_left/1, from_list(List))).
from_list(Lst) ->
{Lst, []}.
to_list({Left, Right}) ->
Left ++ reverse(Right).
n_times(0, _, X) -> X;
n_times(N, F, X) -> n_times(N - 1, F, F(X)).
rotate_right({[], []}) ->
{[], []};
rotate_right({[H|T], Right}) ->
{T, [H|Right]};
rotate_right({[], Right}) ->
rotate_right({reverse(Right), []}).
rotate_left({[], []}) ->
{[], []};
rotate_left({Left, [H|T]}) ->
{[H|Left], T};
rotate_left({Left, []}) ->
rotate_left({[], reverse(Left)}).
reverse(Lst) ->
reverse(Lst, []).
reverse([], Acc) ->
Acc;
reverse([H|T], Acc) ->
reverse(T, [H|Acc]).
The module queue provides a data structure something like this. I've written this without reference to that though, so theirs is probably more clever.
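For comparison, a rotation built on the standard queue module might look like the sketch below. The module name queue_rotate is hypothetical; left follows the question's convention (the head moves to the end), and right moves the last element to the front. Only queue:from_list/1, queue:to_list/1, queue:out/1, queue:out_r/1, queue:in/2 and queue:in_r/2 are used.

```erlang
-module(queue_rotate).
-export([left/2, right/2]).

%% Move the front element to the back, N times.
left(List, N) ->
    queue:to_list(left_q(queue:from_list(List), N)).

%% Move the back element to the front, N times.
right(List, N) ->
    queue:to_list(right_q(queue:from_list(List), N)).

left_q(Q, 0) -> Q;
left_q(Q, N) when N > 0 ->
    case queue:out(Q) of
        {{value, X}, Q1} -> left_q(queue:in(X, Q1), N - 1);
        {empty, _} -> Q                 % nothing to rotate
    end.

right_q(Q, 0) -> Q;
right_q(Q, N) when N > 0 ->
    case queue:out_r(Q) of
        {{value, X}, Q1} -> right_q(queue:in_r(X, Q1), N - 1);
        {empty, _} -> Q                 % nothing to rotate
    end.
```

Converting to and from a list on every call only makes sense for a one-off rotation; for something like a round-robin scheduler, the data would stay in the queue between rotations.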
First, your implementation is a bit buggy (try it with the empty list...)
Second, I would suggest you something like:
-module(foo).
-export([left/2, right/2]).
left(List, Times) ->
left(List, Times, []).
left([], Times, Acc) when Times > 0 ->
left(reverse(Acc), Times, []);
left(List, 0, Acc) ->
List ++ reverse(Acc);
left([H|T], Times, Acc) ->
left(T, Times-1, [H|Acc]).
right(List, Times) ->
reverse(foo:left(reverse(List), Times)).
reverse(List) ->
reverse(List, []).
reverse([], Acc) ->
Acc;
reverse([H|T], Acc) ->
reverse(T, [H|Acc]).
Third, for benchmarking your functions, you can do something like:
test(Params) ->
{Time1, _} = timer:tc(?MODULE, function1, Params),
{Time2, _} = timer:tc(?MODULE, function2, Params),
{{solution1, Time1}, {solution2, Time2}}.
I didn't test the code, so look at it critically, just get the idea.
Moreover, you might want to implement your own "reverse" function. It is trivial with tail recursion. Why not try?
If you're trying to think in functional terms then perhaps consider implementing right rotate in terms of your left rotate:
rightrotate( List, 0 ) ->
List;
rightrotate( List, Times ) ->
lists:reverse( leftrotate( lists:reverse( List ), Times ) ).
Not saying this is the best idea or anything :)
Your implementation will not be efficient since the list is not the correct representation to use if you need to change item order, as in a rotation. (Imagine a round-robin scheduler with many thousands of jobs, taking the front job and placing it at the end when done.)
So we're actually just asking ourselves what would be the way with the least overhead to do this on lists anyway. But then what qualifies as overhead that we want to get rid of? One can often save a bit of computation by consing (allocating) more objects, or the other way around. One can also often have a larger than needed live-set during the computation and save allocation that way.
first_last([First|Tail]) ->
put_last(First, Tail).
put_last(Item, []) ->
[Item];
put_last(Item, [H|Tl]) ->
[H|put_last(Item,Tl)].
Ignoring corner cases with empty lists and such, the above code conses the final resulting list directly. Very little garbage is allocated. The final list is built as the stack unwinds. The cost is that we need more memory for the entire input list and the list under construction during this operation, but it is a short transient thing. My damage from Java and Lisp makes me reach for optimizing down excess consing, but in Erlang you don't risk that global full GC that kills every dream of real-time properties. Anyway, I like the above approach generally.
last_first(List) ->
last_first(List, []).
last_first([Last], Rev) ->
[Last|lists:reverse(Rev)];
last_first([H|Tl], Rev) ->
last_first(Tl, [H|Rev]).
This approach uses a temporary list called Rev that is disposed of after we have passed it to lists:reverse/1 (it calls the BIF lists:reverse/2, but it is not doing anything interesting). By creating this temporary reversed list, we avoid having to traverse the list twice: once to build a list containing everything but the last item, and once more to get the last item.
One quick comment on your code: I would change the name of the function you call append. In a functional context, append usually means adding a new list to the end of a list, not just one element. No sense in adding confusion.
As mentioned, lists:split is not a BIF; it is a library function written in Erlang. What a BIF really is is not properly defined.
The split or split-like solutions look quite nice. As someone has already pointed out, a list is not really the best data structure for this type of operation. It depends, of course, on what you are using it for.
Left:
lrl([], _N) ->
[];
lrl(List, N) ->
lrl2(List, List, [], 0, N).
% no more rotation needed, return head + rotated list reversed
lrl2(_List, Head, Tail, _Len, 0) ->
Head ++ lists:reverse(Tail);
% list is apparently shorter than N, start again with N rem Len
lrl2(List, [], _Tail, Len, N) ->
lrl2(List, List, [], 0, N rem Len);
% rotate one
lrl2(List, [H|Head], Tail, Len, N) ->
lrl2(List, Head, [H|Tail], Len+1, N-1).
Right:
lrr([], _N) ->
[];
lrr(List, N) ->
L = erlang:length(List),
R = N rem L, % check if rotation is more than length
{H, T} = lists:split(L - R, List), % cut off the tail of the list
T ++ H. % swap tail and head

Resources