Erlang - list comprehensions - populating records

I have a simple record structure consisting of a header (H) and one or more data lines (D). All header lines start with a digit. All data lines have leading whitespace. There may also be some empty lines (E) in between, which must be ignored.
L = [H, D, D, E, H, D, E, H, D, D, D].
I would like to create a list of records:
-record(posting,{header,data}).
using a list comprehension. What's the best way to do it?

A list comprehension cannot carry state from one element to the next, and here every data line needs the most recent header, so lists:foldl/3 is a better fit. With foldl/3 you can accumulate the current header and the data through the whole list L.

You could do something like this:
make_records(L) when is_list(L) ->
    F = fun([32|_]=D, {#posting{}=H, Acc}) -> {H, [H#posting{data=D}|Acc]};       % data line under the current header
           ([], Acc) -> Acc;                                                     % empty line: keep the state
           ([C|_]=H, {_, Acc}) when C >= $0, C =< $9 -> {#posting{header=H}, Acc} % new header
        end,
    {_, R} = lists:foldl(F, {undefined, []}, L),
    lists:reverse(R).   % the fold builds the list backwards
That said, I think a straightforward recursive Erlang version is not much more complicated and should be a little faster:
make_records2(L) when is_list(L) ->
make_records2(L, undefined, []).
make_records2([], _, R) -> lists:reverse(R);
make_records2([[32|_]=D|T], H, Acc) when is_list(H) ->
make_records2(T, H, [#posting{header=H,data=D}|Acc]);
make_records2([[]|T], H, Acc) ->
make_records2(T, H, Acc);
make_records2([[F|_]=H|T], _, Acc) when F>=$0, F=<$9 ->
make_records2(T, H, Acc).
Edit: If you need better row classification or parsing, it is better to add a separate function, because it improves readability.
parse_row([Digit|_]=R) when Digit >= $0, Digit =< $9 -> {header, R};
parse_row(R) -> try_spaces(R).
try_spaces([]) -> empty;
try_spaces([Sp|R]) when Sp=:=$\s; Sp=:=$\t; Sp=:=$\n ->
try_spaces(R); % skip all white spaces from Data field
try_spaces(Data) -> {data, Data}.
You can use it like this:
make_records(L) when is_list(L) ->
    F = fun(Row, {H, Acc}) ->
            case parse_row(Row) of
                {data, D} when is_record(H, posting) -> {H, [H#posting{data=D}|Acc]};
                empty -> {H, Acc};                              % keep the whole accumulator
                {header, H2} -> {#posting{header=H2}, Acc}      % H2: H is already bound here
            end
        end,
    {_, R} = lists:foldl(F, {undefined, []}, L),
    lists:reverse(R).
A tail-recursive plain Erlang solution:
make_records2(L) when is_list(L) ->
make_records2([parse_row(R) || R<-L], undefined, []).
make_records2([], _, R) -> lists:reverse(R);
make_records2([{data, D}|T], H, Acc) when is_list(H) ->
make_records2(T, H, [#posting{header=H,data=D}|Acc]);
make_records2([empty|T], H, Acc) ->
make_records2(T, H, Acc);
make_records2([{header,H}|T], _, Acc) ->
make_records2(T, H, Acc).
From a performance point of view, I think there is no reason to insist on tail recursion; a body-recursive version works as well:
make_records3(L) when is_list(L) ->
make_records3(L, undefined).
make_records3([], _) -> [];
make_records3([R|T], H) ->
case parse_row(R) of
{data, D} when is_list(H) -> [#posting{header=H,data=D}|make_records3(T, H)];
empty -> make_records3(T, H);
{header, H2} -> make_records3(T, H2)
end.
... and many many other variants.
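To make the fold approach concrete, here is a self-contained sketch that combines parse_row/1 with lists:foldl/3 and runs on a small hand-made input. The module name posting_demo and the sample lines are illustrative assumptions; the result is reversed so the postings come out in input order.

```erlang
-module(posting_demo).
-export([make_records/1]).

%% Same record as in the question.
-record(posting, {header, data}).

%% A header starts with a digit; otherwise strip leading whitespace
%% and call the line data if anything is left, empty if not.
parse_row([Digit | _] = Row) when Digit >= $0, Digit =< $9 -> {header, Row};
parse_row(Row) -> try_spaces(Row).

try_spaces([]) -> empty;
try_spaces([Sp | Rest]) when Sp =:= $\s; Sp =:= $\t; Sp =:= $\n -> try_spaces(Rest);
try_spaces(Data) -> {data, Data}.

%% Fold over the lines, carrying the most recent header in the accumulator.
%% A data line before any header has no matching case clause and crashes.
make_records(Lines) when is_list(Lines) ->
    Step = fun(Row, {Header, Acc}) ->
               case parse_row(Row) of
                   {header, NewHeader} -> {NewHeader, Acc};
                   empty -> {Header, Acc};
                   {data, D} when is_list(Header) ->
                       {Header, [#posting{header = Header, data = D} | Acc]}
               end
           end,
    {_, Rev} = lists:foldl(Step, {undefined, []}, Lines),
    lists:reverse(Rev).
```

For example, posting_demo:make_records(["1 first", "  a", "  b", "", "2 second", "  c"]) yields three postings, two under "1 first" and one under "2 second".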

I needed to collapse all data lines beneath their header, so for the moment here is what I have:
sanitize(S) -> trim:trim(S).
make_records(L) when is_list(L) -> make_records(L, undefined, []).
%% End of input: also emit the posting that is still being collected.
make_records([], undefined, R) -> lists:reverse(R);
make_records([], {Hd, Ds}, R) ->
    lists:reverse([#posting{header=Hd, data=lists:reverse(Ds)}|R]);
make_records([[32|_]=D|T], H, Acc) when is_tuple(H) ->
    make_records(T, {element(1,H), [sanitize(D)|element(2,H)]}, Acc);
%% Skip empty lines, whether they are [] or start with a newline.
make_records([[]|T], H, Acc) ->
    make_records(T, H, Acc);
make_records([[$\n|_]|T], H, Acc) ->
    make_records(T, H, Acc);
make_records([[F|_]=H|T], B, Acc) when F>=$0, F=<$9 ->
    if is_tuple(B) ->
            make_records(T, {sanitize(H),[]}, [#posting{header=element(1,B),
                                                        data=lists:reverse(element(2,B))}|Acc]);
       true ->
            make_records(T, {sanitize(H),[]}, Acc)
    end.
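A different way to collapse the data lines, sketched under a few assumptions: blank lines are dropped up front with a list comprehension (which answers the original wish to use one), and then lists:splitwith/2 pulls all data lines that follow each header. The module name posting_groups is hypothetical, lines are assumed to be plain character lists, and string:trim/1 and string:is_empty/1 need OTP 20 or later.

```erlang
-module(posting_groups).
-export([make_records/1]).

-record(posting, {header, data}).

%% A header line starts with a digit.
is_header([C | _]) -> C >= $0 andalso C =< $9;
is_header(_) -> false.

%% Empty or whitespace-only lines count as blank.
is_blank(Line) -> string:is_empty(string:trim(Line)).

make_records(Lines) ->
    %% A list comprehension is enough to remove blank lines.
    NonBlank = [L || L <- Lines, not is_blank(L)],
    group(NonBlank).

%% Each header takes every following non-header line as its data.
group([]) -> [];
group([Header | Rest]) ->
    true = is_header(Header),   % crashes if data appears before any header
    {Data, Tail} = lists:splitwith(fun(L) -> not is_header(L) end, Rest),
    [#posting{header = Header, data = [string:trim(D) || D <- Data]} | group(Tail)].
```

The design trade-off is two passes (filter, then group) in exchange for not having to thread a "current posting" through the recursion by hand.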

Related

List of tuples [{id, [<List>]}, {id2, [<List>]}] where ids are the second item of the tuple of the original list - Erlang

The title is a bit confusing, so I will illustrate what I want to achieve:
I have:
[{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077790705827">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538078530667847">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077778390908">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5bad45b1e990057961313822">>,<<"1538082492283531">>
}]
I want to convert it to a list like this:
[
{<<"5b3f77502dfe0deeb8912b42">>,
[{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077790705827">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538078530667847">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077778390908">>}
]},
{<<"5bad45b1e990057961313822">>,
[{<<"5b71d7e458c37fa04a7ce768">>,<<"5bad45b1e990057961313822">>,<<"1538082492283531">>}
]}
]
That is, a list of tuples [{Id, [Tuple, ...]}, ...] where each Id is the second element of the tuples in the original list.
For example, in
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077790705827">>}
the id is <<"5b3f77502dfe0deeb8912b42">>.
Erlang newbie here. I created a dict with the second members of the tuples as keys and lists of corresponding tuples as values, then used dict:fold to transform it into the expected output format.
-export([test/0, transform/1]).
transform([H|T]) ->
transform([H|T], dict:new()).
transform([], D) ->
lists:reverse(
dict:fold(fun (Key, Tuples, Acc) ->
lists:append(Acc,[{Key,Tuples}])
end,
[],
D));
transform([Tuple={_S1,S2,_S3}|T], D) ->
transform(T, dict:append_list(S2, [Tuple], D)).
test() ->
Input=[{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077790705827">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538078530667847">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077778390908">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5bad45b1e990057961313822">>,<<"1538082492283531">>}
],
Output=transform(Input),
case Output of
[
{<<"5b3f77502dfe0deeb8912b42">>,
[{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077790705827">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538078530667847">>},
{<<"5b71d7e458c37fa04a7ce768">>,<<"5b3f77502dfe0deeb8912b42">>,<<"1538077778390908">>}
]},
{<<"5bad45b1e990057961313822">>,
[{<<"5b71d7e458c37fa04a7ce768">>,<<"5bad45b1e990057961313822">>,<<"1538082492283531">>}
]}
] -> ok;
_Else -> error
end.
I think I see what you're after... Please correct me if I'm wrong.
There are a number of ways to do this; it really just depends on what sort of data structure you want to use to check for the presence of like keys. I'll show you two fundamentally different ways to do this and a third, hybrid method that has recently become available:
Indexed data types (in this case a map)
List operations with matching
Hybrid matching over map keys
Since you're new I'll use the first case to demonstrate two ways of writing it: explicit recursion and using an actual list function from the lists module.
Indexy Data Types
The first way we'll do this is to use a hash table (aka "dict", "map", "hash", "K/V", etc.) and explicitly recurse through the elements, checking for the presence of each key we encounter: if it is missing we add it, and if it is present we prepend the element to the list of values it points to. We'll use an Erlang map for this. At the end of the function we'll convert the utility map back to a list:
explicit_convert(List) ->
Map = explicit_convert(List, maps:new()),
maps:to_list(Map).
explicit_convert([H | T], A) ->
K = element(2, H),
NewA =
case maps:is_key(K, A) of
true ->
V = maps:get(K, A),
maps:put(K, [H | V], A);
false ->
maps:put(K, [H], A)
end,
explicit_convert(T, NewA);
explicit_convert([], A) ->
A.
There is nothing wrong with explicit recursion (it is particularly good if you're new, because every part of it is left in the open to be examined), but this is a "left fold", and we already have a library function that abstracts a little bit of the plumbing out. So we really only need to write a function that checks for the presence of a key and either adds the key or prepends the value:
fun_convert(List) ->
Map = lists:foldl(fun convert/2, maps:new(), List),
maps:to_list(Map).
convert(H, A) ->
K = element(2, H),
case maps:is_key(K, A) of
true ->
V = maps:get(K, A),
maps:put(K, [H | V], A);
false ->
maps:put(K, [H], A)
end.
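The "check, then get, then put" step in convert/2 can be collapsed into a single call to maps:update_with/4 (available since OTP 19), which applies the fun to an existing value or stores the initial value for a new key. A minimal sketch; the module name group_update_with is hypothetical:

```erlang
-module(group_update_with).
-export([convert/1]).

%% Group tuples by their second element.
%% maps:update_with(Key, Fun, Init, Map) calls Fun on the current value
%% when Key exists, and inserts Init unchanged when it does not.
convert(List) ->
    Map = lists:foldl(
            fun(T, Acc) ->
                K = element(2, T),
                maps:update_with(K, fun(Ts) -> [T | Ts] end, [T], Acc)
            end,
            #{}, List),
    maps:to_list(Map).
```

As with the versions above, elements end up in reverse input order within each group, and the order of the groups themselves follows maps:to_list/1, which is unspecified.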
Listy Conversion
The other major way we could have done this is with listy matching. To do that you need to first guarantee that your elements are sorted on the element you want to use as a key so that you can use it as a sort of "working element" and match on it. The code should be pretty easy to understand once you stare at it for a bit (maybe write out how it will step through your list by hand on paper once if you're totally perplexed):
listy_convert(List) ->
[T = {_, K, _} | Rest] = lists:keysort(2, List),
listy_convert(Rest, {K, [T]}, []).
listy_convert([T = {_, K, _} | Rest], {K, Ts}, Acc) ->
listy_convert(Rest, {K, [T | Ts]}, Acc);
listy_convert([T = {_, K, _} | Rest], Done, Acc) ->
listy_convert(Rest, {K, [T]}, [Done | Acc]);
listy_convert([], Done, Acc) ->
[Done | Acc].
Note that we split the list immediately after sorting it. The reason is that we have to "prime the pump", so to speak, on the first call we make to listy_convert/3. This also means that this function will crash if you pass it an empty list. You can solve that by adding a clause to listy_convert/1 that matches on the empty list [].
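Here is the listy version with that empty-list clause added, as a self-contained module (the module name listy is just for illustration):

```erlang
-module(listy).
-export([listy_convert/1]).

%% Empty input: nothing to group, and no badmatch on the pattern below.
listy_convert([]) ->
    [];
listy_convert(List) ->
    %% Sort on the key so equal keys are adjacent, then prime the pump.
    [T = {_, K, _} | Rest] = lists:keysort(2, List),
    listy_convert(Rest, {K, [T]}, []).

%% Same key as the current group: add the element to it.
listy_convert([T = {_, K, _} | Rest], {K, Ts}, Acc) ->
    listy_convert(Rest, {K, [T | Ts]}, Acc);
%% Different key: close the current group and start a new one.
listy_convert([T = {_, K, _} | Rest], Done, Acc) ->
    listy_convert(Rest, {K, [T]}, [Done | Acc]);
listy_convert([], Done, Acc) ->
    [Done | Acc].
```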
A Final Bit of Magic
With those firmly in mind... consider that we also have a bit of a hybrid option available in newer versions of Erlang due to the magical syntax available to maps. We can match (most values) on map keys inside of a case clause (though we can't unify on a key value provided by other arguments within a function head):
map_convert(List) ->
maps:to_list(map_convert(List, #{})).
map_convert([T = {_, K, _} | Rest], Acc) ->
case Acc of
#{K := Ts} -> map_convert(Rest, Acc#{K := [T | Ts]});
_ -> map_convert(Rest, Acc#{K => [T]})
end;
map_convert([], Acc) ->
Acc.
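On OTP 25 or later, the standard library can do this grouping in one call: maps:groups_from_list/2 returns a map from each key to the list of elements with that key, keeping the original element order inside each group. A minimal sketch, assuming OTP 25+; the module name group_builtin is hypothetical, and the final sort is only there because map iteration order is unspecified:

```erlang
-module(group_builtin).
-export([convert/1]).

%% Requires OTP 25 or later.
convert(List) ->
    Groups = maps:groups_from_list(fun(T) -> element(2, T) end, List),
    %% Sort by key to get a deterministic list of {Key, Elements}.
    lists:sort(maps:to_list(Groups)).
```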
Here is a one-liner that would produce your expected result:
[{K, [E || {_, K2, _} = E <- List, K =:= K2]} || {_, K, _} <- lists:ukeysort(2, List)].
What’s going on here? Let’s do it step by step…
This is your original list
List = […],
lists:ukeysort/2 leaves just one element per key in the list
OnePerKey = lists:ukeysort(2, List),
We then extract the keys with the first list comprehension
Keys = [K || {_, K, _} <- OnePerKey],
With the second list comprehension, we find the elements with the key…
fun Filter(K, List) ->
[E || {_, K2, _} = E <- List, K =:= K2]
end
Keep in mind that we can’t just pattern-match with K in the generator (i.e. [E || {_, K, _} = E <- List]), because variables in a generator pattern are fresh: they shadow the outer K instead of matching against it, so every element would be selected.
Finally, putting all together…
[{K, Filter(K, List)} || K <- Keys]
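Putting the steps together in a module also makes the shadowing pitfall easy to check. In this sketch (module and function names are hypothetical), group/1 is the one-liner and shadowed/2 is the tempting but wrong variant; the compiler warns about the shadowed K.

```erlang
-module(lc_group).
-export([group/1, shadowed/2]).

%% The one-liner: one filtering pass over List per distinct key.
group(List) ->
    [{K, [E || {_, K2, _} = E <- List, K =:= K2]}
     || {_, K, _} <- lists:ukeysort(2, List)].

%% Pitfall: K in the generator pattern is a new variable that shadows
%% the argument, so the pattern matches every element of List.
shadowed(K, List) ->
    [E || {_, K, _} = E <- List].
```

Note that the cost grows with the number of distinct keys times the length of the list, which is why the map-based versions win on larger inputs.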
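It really depends on your dataset. For larger data sets, using maps is more efficient. Here is a quick benchmark comparing the list comprehension one-liner (v1), a map-based version (v2) and the dict-based transform/1 (v3):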
-module(test).
-export([test/3, v1/2, v2/2, v3/2, transform/1, do/2]).
test(N, Keys, Size) ->
List = [{<<"5b71d7e458c37fa04a7ce768">>,rand:uniform(Keys),<<"1538077790705827">>} || _ <- lists:seq(1,Size)],
V1 = timer:tc(test, v1, [N, List]),
V2 = timer:tc(test, v2, [N, List]),
V3 = timer:tc(test, v3, [N, List]),
io:format("V1 took: ~p, V2 took: ~p V3 took: ~p ~n", [V1, V2, V3]).
v1(N, List) when N > 0 ->
[{K, [E || {_, K2, _} = E <- List, K =:= K2]} || {_, K, _} <- lists:ukeysort(2, List)],
v1(N-1, List);
v1(_,_) -> ok.
v2(N, List) when N > 0 ->
do(List,maps:new()),
v2(N-1, List);
v2(_,_) -> ok.
v3(N, List) when N > 0 ->
transform(List),
v3(N-1, List);
v3(_,_) -> ok.
do([], R) -> maps:to_list(R);
do([H={_,K,_}|T], R) ->
case maps:get(K,R,null) of
null -> NewR = maps:put(K, [H], R);
V -> NewR = maps:update(K, [H|V], R)
end,
do(T, NewR).
transform([H|T]) ->
transform([H|T], dict:new()).
transform([], D) ->
lists:reverse(
dict:fold(fun (Key, Tuples, Acc) ->
lists:append(Acc,[{Key,Tuples}])
end,
[],
D));
transform([Tuple={_S1,S2,_S3}|T], D) ->
transform(T, dict:append_list(S2, [Tuple], D)).
Running all three with 100 possible keys and 100,000 records, I get the following (timer:tc/3 returns {Microseconds, Result}):
> test:test(1,100,100000).
V1 took: {75566,ok}, V2 took: {32087,ok} V3 took: {887362,ok}
ok

How to count the occurrence of each character in a list?

How do I count consecutive repeated elements in a list and pack them together with the number of their occurrences as pairs?
Example:
compress("Hello") == [{1,$H},{1,$e},{2,$l},{1,$o}]
I have tried this function, but I get errors. Can someone help me fix it?
compress([])->
[];
compress(L)->
helper(L,0).
helper([], _)->
[];
helper([H|T], Count)->
case H == hd(T) of
true -> helper(T,Count), [{Count+1, H}];
false -> helper(T, Count), [{Count, H}]
end.
This way:
compress(L) ->
helper(L, []).
helper([], Acc) -> lists:reverse(Acc);
helper([H|T], [{Count, H}|Acc]) ->
helper(T, [{Count+1, H}|Acc]);
helper([H|T], Acc) ->
helper(T, [{1, H}|Acc]).
Or a more straightforward version that is faster on some platforms (it generates less garbage):
compress2([]) -> [];
compress2([H|T]) ->
helper2(T, H, 1).
helper2([H|T], H, Count) ->
helper2(T, H, Count+1);
helper2([H|T], C, Count) ->
[{Count, C}|helper2(T, H, 1)];
helper2([], C, Count) ->
[{Count, C}].
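The same run-length encoding can also be written with lists:foldr/3: folding from the right, the head of the accumulator is always the run that immediately follows the current element, so a repeated element just bumps that run's counter. A sketch for comparison; the module name rle is hypothetical, and since foldr is not tail recursive it suits modest list sizes.

```erlang
-module(rle).
-export([compress/1]).

%% X appears twice in the first fun clause, so that clause only matches
%% when the next run is of the same element.
compress(List) ->
    lists:foldr(fun(X, [{N, X} | Rest]) -> [{N + 1, X} | Rest];
                   (X, Acc) -> [{1, X} | Acc]
                end, [], List).
```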

How to divide a string into substrings?

I would like to divide a string into substrings based on a given number, for example:
divide("string",1) = ["s","t","r","i","n","g"].
I have tried this, but without success:
lists:split(1,"string") = {"s", "tring"}
Any idea?
I would calculate the length once (length/1 is O(N), so it's a slow operation) and then recursively use lists:split/2 until the remaining list is no longer than N:
divide(List, N) ->
divide(List, N, length(List)).
divide(List, N, Length) when Length > N ->
{A, B} = lists:split(N, List),
[A | divide(B, N, Length - N)];
divide(List, _, _) ->
[List].
1> c(a).
{ok,a}
2> a:divide("string", 1).
["s","t","r","i","n","g"]
3> a:divide("string", 2).
["st","ri","ng"]
4> a:divide("string", 3).
["str","ing"]
5> a:divide("string", 4).
["stri","ng"]
6> a:divide("string", 5).
["strin","g"]
7> a:divide("string", 6).
["string"]
8> a:divide("string", 7).
["string"]
I think Dogbert's solution is currently the best, but here is another implementation example using a recursive loop:
-include_lib("eunit/include/eunit.hrl").

%% ?assertEqual(Expected, Actual)
divide_test() ->
    [?assertEqual(["s","t","r","i","n","g"], divide("string",1)),
     ?assertEqual(["st","ri","ng"], divide("string",2)),
     ?assertEqual(["str","ing"], divide("string",3)),
     ?assertEqual(["stri","ng"], divide("string",4))
    ].

-spec divide(list(), integer()) -> list(list()).
divide(String, Size)
  when is_list(String), is_integer(Size) ->
    divide(String, Size, 0, [], []).

-spec divide(list(), integer(), integer(), list(), list()) -> list(list()).
%% Input exhausted: flush the buffer and restore chunk order.
divide([], _, _, Buf, Result) ->
    lists:reverse([lists:reverse(Buf) | Result]);
divide([H|T], Size, 0, Buf, Result) ->
    divide(T, Size, 1, [H | Buf], Result);
divide([H|T], Size, Counter, Buf, Result) ->
    case Counter rem Size =:= 0 of
        true ->   % chunk is full: store it and start a new buffer with H
            divide(T, Size, Counter+1, [H], [lists:reverse(Buf) | Result]);
        false ->
            divide(T, Size, Counter+1, [H | Buf], Result)
    end.
You can try this function, provided the number is greater than 0 and the string length is a multiple of it; otherwise the final lists:split/2 call raises badarg because fewer than N characters remain.
first_substring(List, Separator) ->
first_substring_loop(List, Separator, []).
first_substring_loop([], _, Reversed_First) ->
lists:reverse(Reversed_First);
first_substring_loop(List, Separator, Reversed_First) ->
[H|T]= my_tuple_to_list(lists:split(Separator,List)),
first_substring_loop(lists:flatten(T), Separator, [H|Reversed_First]).
my_tuple_to_list(Tuple) -> [element(T, Tuple) || T <- lists:seq(1, tuple_size(Tuple))].
The result is:
1> fact:first_substring("string", 1).
["s","t","r","i","n","g"]
2> fact:first_substring("string", 2).
["st","ri","ng"]
3> fact:first_substring("string", 3).
["str","ing"]
A short simple solution can be:
divide(String, Length) -> divide(String, Length, []).
divide([], _, Acc) -> Acc;
divide(String, Length, Acc) ->
{Res, Rest} = lists:split(min(Length, length(String)), String),
divide(Rest, Length, Acc ++ [Res]).
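One cost in the short solution is Acc ++ [Res], which copies the whole accumulator on every step. A sketch of the usual alternative, prepending each chunk and reversing once at the end; the module name divide_acc is hypothetical, and the added guard is an assumption that turns a non-positive length into a function_clause error instead of an endless loop (lists:split(0, S) never shortens S).

```erlang
-module(divide_acc).
-export([divide/2]).

divide(String, Length) when is_integer(Length), Length > 0 ->
    divide(String, Length, []).

%% Chunks are collected newest first, so reverse once at the end.
divide([], _, Acc) ->
    lists:reverse(Acc);
divide(String, Length, Acc) ->
    {Res, Rest} = lists:split(min(Length, length(String)), String),
    divide(Rest, Length, [Res | Acc]).
```

It still calls length/1 once per chunk, so for long strings the length-once approach shown earlier remains preferable.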
For the specific case of splitting into substrings of length 1, a list comprehension can also be used:
ListOfLetters = [[Letter] || Letter <- String].

How to split a list of strings into given number of lists in erlang

Given a list and an integer, I want to split that list into the specified number of lists (inside a list).
For example:
Input:
[1,2,3,4,5,6,7,8,9], 3
Output:
[[1,2,3],[4,5,6],[7,8,9]]
What is a clean and efficient way to do this?
The solution written by Steve Vinoski calls length/1 in a guard for each partition, which makes it O(N^2). That bothers me, because it can be done in O(N), and I am a performance freak. There are many ways to do it; here is one example:
divide(L, N) when is_integer(N), N > 0 ->
divide(N, 0, L, []).
divide(_, _, [], Acc) ->
[lists:reverse(Acc)];
divide(N, N, L, Acc) ->
[lists:reverse(Acc) | divide(N, 0, L, [])];
divide(N, X, [H|T], Acc) ->
divide(N, X+1, T, [H|Acc]).
Or, as a modification of Steve's solution:
divide(L, N) ->
divide(L, N, []).
divide([], _, Acc) ->
lists:reverse(Acc);
divide(L, N, Acc) ->
try lists:split(N, L) of
{H,T} -> divide(T, N, [H|Acc])
catch
error:badarg ->
lists:reverse([L|Acc])
end.
Or even simpler:
divide([], _) -> [];
divide(L, N) ->
try lists:split(N, L) of
{H,T} -> [H|divide(T, N)]
catch
error:badarg -> [L]
end.
You can use lists:split/2 for this:
divide(L, N) ->
divide(L, N, []).
divide([], _, Acc) ->
lists:reverse(Acc);
divide(L, N, Acc) when length(L) < N ->
lists:reverse([L|Acc]);
divide(L, N, Acc) ->
{H,T} = lists:split(N, L),
divide(T, N, [H|Acc]).
The first function, divide/2, serves as the entry point. It merely calls the helper function divide/3 with an initial accumulator value of an empty list, and divide/3 then does all the work. The first clause of divide/3 matches when the list has been completely processed, so it just reverses the accumulator and returns that value. The second clause handles the case where the length of L is less than the requested N value; it creates a new accumulator by prepending L to Acc and then returns the reverse of that new accumulator. The third clause first calls lists:split/2 to split the incoming list into H, which is a list of N elements, and T, the remainder of the list. It then calls itself recursively, passing T as the new list value, the original N value, and a new accumulator consisting of H as the first element and the original accumulator, Acc, as the tail.
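All the answers above treat the integer as the chunk size; with 9 elements and 3 the two readings coincide. If you literally want N lists, as the title asks, here is a sketch under that assumption (module name split_n is hypothetical). It calls length/1 once and makes the first (length rem N) lists one element longer than the rest, so the chunk sizes differ by at most one.

```erlang
-module(split_n).
-export([split_into/2]).

split_into(L, N) when is_integer(N), N > 0 ->
    Len = length(L),
    split_into(L, N, Len div N, Len rem N).

%% Q = base chunk size, R = how many chunks still get one extra element.
split_into(_, 0, _, _) ->
    [];
split_into(L, N, Q, R) when R > 0 ->
    {H, T} = lists:split(Q + 1, L),
    [H | split_into(T, N - 1, Q, R - 1)];
split_into(L, N, Q, 0) ->
    {H, T} = lists:split(Q, L),
    [H | split_into(T, N - 1, Q, 0)].
```

With fewer elements than N, the trailing lists are empty, so the result always has exactly N lists.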

List to list of tuples conversion

I want to convert [z,z,a,z,z,a,a,z] to [{z,2},{a,1},{z,2},{a,2},{z,1}]. How can I do it?
So I need to accumulate the previous value, its count, and the list of tuples.
I've created a record:
-record(acc, {previous, counter, tuples}).
Then I defined:
listToTuples([]) -> [];
listToTuples([H | Tail]) ->
Acc = #acc{previous=H, counter=1},
listToTuples([Tail], Acc).
But then I got stuck here:
listToTuples([H | Tail], Acc) ->
case H == Acc#acc.previous of
true ->
false ->
end.
If you build up your answer (Acc) in reverse, the previous value will be the head of that list.
Here's how I would do it:
list_pairs(List) -> list_pairs(List, []).
list_pairs([], Acc) -> lists:reverse(Acc);
list_pairs([H|T], [{H, Count}|Acc]) -> list_pairs(T, [{H, Count+1}|Acc]);
list_pairs([H|T], Acc) -> list_pairs(T, [{H, 1}|Acc]).
(I expect someone will now follow with a one-line list comprehension version...)
I would continue on the road building the list in reverse. Notice the pattern matching over X on the first line.
F = fun(X,[{X,N}|Rest]) -> [{X,N+1}|Rest];
(X,Rest) -> [{X,1}|Rest] end.
lists:foldr(F,[],List).
I would personally use lists:foldr/3 or do it by hand with something like:
list_to_tuples([H|T]) -> list_to_tuples(T, H, 1);
list_to_tuples([]) -> [].
list_to_tuples([H|T], H, C) -> list_to_tuples(T, H, C+1);
list_to_tuples([H|T], P, C) -> [{P,C}|list_to_tuples(T, H, 1)];
list_to_tuples([], P, C) -> [{P,C}].
Using two accumulators saves you unnecessarily building and pulling apart a tuple for every element in the list. I find writing it this way clearer.
