Erlang repetition string in string

I have a string:
"abc abc abc abc"
How do I calculate the number of "abc" repetitions?

If you are looking for a practical and efficient implementation that scales well even for longer substrings, you can use binary:matches/2,3, which uses the Boyer–Moore string search algorithm (and Aho-Corasick for multiple substrings). Because the version below goes through list_to_binary/1, it works only for ASCII or Latin-1 strings, and it counts non-overlapping matches.
repeats(L, S) -> length(binary:matches(list_to_binary(L), list_to_binary(S))).
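If the strings may contain code points above 255, list_to_binary/1 raises badarg. A minimal sketch of one way around that, assuming both arguments are valid Unicode character lists and the substring is non-empty (binary:matches/2 rejects an empty pattern), is to encode both as UTF-8 first. The module name repeats_utf8 is only illustrative.

```erlang
-module(repeats_utf8).
-export([repeats/2]).

%% Count non-overlapping occurrences of S in L.
%% Assumption: L and S are valid Unicode character lists and S is non-empty.
repeats(L, S) ->
    %% Encode both strings as UTF-8 binaries so code points > 255 are accepted.
    Haystack = unicode:characters_to_binary(L),
    Needle = unicode:characters_to_binary(S),
    length(binary:matches(Haystack, Needle)).
```

For example, repeats_utf8:repeats("abc abc abc abc", "abc") counts the four occurrences from the question.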
If it is for educational purposes, you can write your own, less efficient version that works for lists of any kind. If you know the substring at compile time, you can use this very simple version, whose performance is not bad:
-define(SUBSTR, "abc").
repeats(L) -> repeats(L, 0).
repeats(?SUBSTR ++ L, N) -> repeats(L, N+1);
repeats([_|L] , N) -> repeats(L, N);
repeats([] , N) -> N.
If you don't know the substring in advance, you can write a slightly more complicated and less efficient version:
repeats(L, S) -> repeats(L, S, 0).
repeats([], _, N) -> N;
repeats(L, S, N) ->
case prefix(L, S) of
{found, L2} -> repeats( L2, S, N+1);
nope -> repeats(tl(L), S, N)
end.
prefix([H|T], [H|S]) -> prefix(T, S);
prefix( L, [ ]) -> {found, L};
prefix( _, _ ) -> nope.
And you can, of course, try writing a more sophisticated variant, such as a simplified Boyer–Moore for lists.

1> F = fun
F([],_,_,N) -> N;
F(L,P,S,N) ->
case string:sub_string(L,1,S) == P of
true -> F(tl(string:sub_string(L,S,length(L))),P,S,N+1);
_ -> F(tl(L),P,S,N)
end
end.
#Fun<erl_eval.28.106461118>
2> Find = fun(L,P) -> F(L,P,length(P),0) end.
#Fun<erl_eval.12.106461118>
3> Find("abc abc abc abc","abc").
4
4>
This works if defined in a module, or in the shell, but in the shell only from R17 onward, because it relies on named funs.

length(lists:filter(fun(X) -> X=="abc" end, string:tokens("abc abc abc abc", " "))).

Related

How to split a list of strings into given number of lists in erlang

Given a list and an integer, I want to split that list into the specified number of lists (inside a list).
For example:
Input:
[1,2,3,4,5,6,7,8,9], 3
Output:
[[1,2,3],[4,5,6],[7,8,9]]
What is a clean and efficient way to do this?

The solution written by Steve Vinoski calls length/1 in a guard for each partition, which makes it O(N^2). That bothers me, because it can be done in O(N) and I am a performance freak. It can be done in many ways; here is one example:
divide(L, N) when is_integer(N), N > 0 ->
divide(N, 0, L, []).
divide(_, _, [], Acc) ->
[lists:reverse(Acc)];
divide(N, N, L, Acc) ->
[lists:reverse(Acc) | divide(N, 0, L, [])];
divide(N, X, [H|T], Acc) ->
divide(N, X+1, T, [H|Acc]).
or as a modification of Steve's solution
divide(L, N) ->
divide(L, N, []).
divide([], _, Acc) ->
lists:reverse(Acc);
divide(L, N, Acc) ->
try lists:split(N, L) of
{H,T} -> divide(T, N, [H|Acc])
catch
error:badarg ->
lists:reverse([L|Acc])
end.
or even simpler:
divide([], _) -> [];
divide(L, N) ->
try lists:split(N, L) of
{H,T} -> [H|divide(T, N)]
catch
error:badarg -> [L]
end.

You can use lists:split/2 for this:
divide(L, N) ->
divide(L, N, []).
divide([], _, Acc) ->
lists:reverse(Acc);
divide(L, N, Acc) when length(L) < N ->
lists:reverse([L|Acc]);
divide(L, N, Acc) ->
{H,T} = lists:split(N, L),
divide(T, N, [H|Acc]).
The first function, divide/2, serves as the entry point. It merely calls the helper function divide/3 with an initial accumulator value of an empty list, and then divide/3 does all the work. The first clause of divide/3 matches when the list has been completely processed, so it just reverses the accumulator and returns that value. The second clause handles the case when the length of L is less than the requested N value; it creates a new accumulator by prepending L to Acc and then returns the reverse of that new accumulator. The third clause first calls lists:split/2 to split the incoming list into H, which is a list of N elements, and T, the remainder of the list. It then calls itself recursively, passing T as the new list value, the original N value, and a new accumulator consisting of H as the first element and the original accumulator, Acc, as the tail.

How to collect frequencies of characters using a list of tuples {char,freq} in Erlang

I am supposed to collect frequencies of characters.
freq(Sample) -> freq(Sample,[]).
freq([],Freq) ->
Freq;
freq([Char|Rest],Freq)->
freq(Rest,[{Char,1}|Freq]).
This function does not work in the right way. If the input is "foo", then the output will be
[{f,1},{o,1},{o,1}].
But I wished to have the output like
[{f,1},{o,2}].
I can't manage to modify an element in a tuple. Can anyone help me out and show me how this can be fixed?

a one line solution :o)
% generate a random list
L = [rand:uniform(26)+$a-1 || _ <- lists:seq(1,1000)].
% collect frequency
lists:foldl(fun(X,[{[X],I}|Q]) -> [{[X],I+1}|Q] ; (X,Acc) -> [{[X],1}|Acc] end , [], lists:sort(L)).
in action
1> lists:foldl(fun(X,[{[X],I}|Q]) -> [{[X],I+1}|Q] ; (X,Acc) -> [{[X],1}|Acc] end , [], lists:sort("foo")).
[{"o",2},{"f",1}]
This is quite fast with a short list, but the execution time increases a lot with a long list (on my PC, it needs 6.5 s for a 1,000,000-character text).
In comparison, with the same 1,000,000-character text, Ricardo's solution needs 5 s.
I will try another version using ets.
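The answer does not include the ets version it mentions. As a rough sketch of what such a variant could look like (not the author's code, and not benchmarked), one can keep one counter per character in a private table and use ets:update_counter/4, which inserts a default object when the key is missing. The module name freq_ets is only illustrative.

```erlang
-module(freq_ets).
-export([freq/1]).

%% Count character frequencies using a temporary ETS table.
%% Sketch only: the order of the returned list is unspecified.
freq(Text) ->
    Tab = ets:new(freq, [set, private]),
    try
        %% {C, 0} is the default object used when C is not yet in the table.
        lists:foreach(fun(C) -> ets:update_counter(Tab, C, 1, {C, 0}) end, Text),
        ets:tab2list(Tab)
    after
        %% Always free the table, even if the input was malformed.
        ets:delete(Tab)
    end.
```

The keys are the character codes themselves, so sort the result with lists:sort/1 if a stable order is needed.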

By far the easiest way is to use an orddict to store the value as it already comes with an update_counter function and returns the value in a (sorted) list.
freq(Text) ->
lists:foldl(fun (C, D) -> orddict:update_counter(C, 1, D) end, orddict:new(), Text).

Try with something like this:
freq(Text) ->
CharsDictionary = lists:foldl(fun(Char, Acc) -> dict:update_counter(Char, 1, Acc) end, dict:new(), Text),
dict:fold(fun(Char, Frequency, Acc) -> [{Char, Frequency} | Acc] end, [], CharsDictionary).
The first line creates a dictionary that uses the char as the key and the frequency as the value (dict:update_counter).
The second line converts the dictionary into the list that you need.

Using pattern matching and proplists.
-module(freq).
-export([char_freq/1]).
-spec char_freq(string()) -> [tuple()].
char_freq(L) -> char_freq(L, []).
char_freq([], PL) -> PL;
char_freq([H|T], PL) ->
case proplists:get_value([H], PL) of
undefined ->
char_freq(T, [{[H],1}|PL]);
N ->
L = proplists:delete([H], PL),
char_freq(T, [{[H],N+1}|L])
end.
Test
1> freq:char_freq("abacabz").
[{"z",1},{"b",2},{"a",3},{"c",1}]

% Str is the input string, e.g. "foo"; each character becomes a one-character atom.
L = [list_to_atom([X]) || X <- Str].
D = lists:foldl(fun(Char, Acc) -> dict:update_counter(Char, 1, Acc) end, dict:new(), L).
dict:to_list(D).

Splitting a list in equal sized chunks in Erlang

I want to split:
[1,2,3,4,5,6,7,8]
into:
[[1,2],[3,4],[5,6],[7,8]]
It generally works great with:
[ lists:sublist(List, X, 2) || X <- lists:seq(1,length(List),2) ] .
But it is really slow this way: 10000 elements take an amazing 2.5 seconds on my netbook. I have also written a really fast recursive function, but I am simply interested: could this list comprehension also be written in a different way, so that it is faster?

Try this:
part(List) ->
part(List, []).
part([], Acc) ->
lists:reverse(Acc);
part([H], Acc) ->
lists:reverse([[H]|Acc]);
part([H1,H2|T], Acc) ->
part(T, [[H1,H2]|Acc]).
Test in erlang-shell (I've declared this function in module part):
2> part:part([1,2,3,4,5,6,7,8]).
[[1,2],[3,4],[5,6],[7,8]]
3>
3> timer:tc(part, part, [lists:seq(1,10000)]).
{774,
[[1,2],
[3,4],
[5,6],
[7,8],
"\t\n","\v\f",
[13,14],
[15,16],
[17,18],
[19,20],
[21,22],
[23,24],
[25,26],
[27,28],
[29,30],
[31,32],
"!\"","#$","%&","'(",")*","+,","-.","/0","12","34",
[...]|...]}
Just 774 microseconds (which is ~0.8 milliseconds).

Here are two quick solutions for you that are both flexible. One is easy to read, but only slightly faster than your proposed solution. The other is quite fast, but is a bit cryptic to read. And note that both of my proposed algorithms will work for lists of anything, not just numeric ordered lists.
Here is the "easy-to-read" one. Call it as n_length_chunks(List,Chunksize). For example, to get a list of chunks 2 long, call n_length_chunks(List,2). This works for chunks of any size, i.e., you could call n_length_chunks(List,4) to get [[1,2,3,4],[5,6,7,8],...]
n_length_chunks([],_) -> [];
n_length_chunks(List,Len) when Len > length(List) ->
[List];
n_length_chunks(List,Len) ->
{Head,Tail} = lists:split(Len,List),
[Head | n_length_chunks(Tail,Len)].
The much faster one is here, but it is definitely harder to read. It is called in the same way: n_length_chunks_fast(List,2). I've made one change compared with the one above: it pads the end of the list with undefined if the length of the list isn't cleanly divisible by the desired chunk length.
n_length_chunks_fast(List,Len) ->
LeaderLength = case length(List) rem Len of
0 -> 0;
N -> Len - N
end,
Leader = lists:duplicate(LeaderLength,undefined),
n_length_chunks_fast(Leader ++ lists:reverse(List),[],0,Len).
n_length_chunks_fast([],Acc,_,_) -> Acc;
n_length_chunks_fast([H|T],Acc,Pos,Max) when Pos==Max ->
n_length_chunks_fast(T,[[H] | Acc],1,Max);
n_length_chunks_fast([H|T],[HAcc | TAcc],Pos,Max) ->
n_length_chunks_fast(T,[[H | HAcc] | TAcc],Pos+1,Max);
n_length_chunks_fast([H|T],[],Pos,Max) ->
n_length_chunks_fast(T,[[H]],Pos+1,Max).
Tested on my (really old) laptop:
Your proposed solution took about 3 seconds.
My slow-but-readable one was slightly faster and took about 1.5 seconds (still quite slow).
My fast version took about 5 milliseconds.
For completeness, Isac's solution took about 180 milliseconds on the same machine.
Edit: wow, I need to read the complete question first. Oh well, I'll keep this here for posterity in case it helps. As far as I can tell, there's no good way to do this using list comprehensions. Your original version is slow because each call to sublist has to traverse the list from the start to reach each successive X, resulting in complexity just under O(N^2).

Or with a fold:
lists:foldr(fun(E, []) -> [[E]];
(E, [H|RAcc]) when length(H) < 2 -> [[E|H]|RAcc] ;
(E, [H|RAcc]) -> [[E],H|RAcc]
end, [], List).

I want to submit a slightly more complicated but more flexible (and mostly faster) variant of the solution proposed by @Tilman:
split_list(List, Max) ->
element(1, lists:foldl(fun
(E, {[Buff|Acc], C}) when C < Max ->
{[[E|Buff]|Acc], C+1};
(E, {[Buff|Acc], _}) ->
{[[E],Buff|Acc], 1};
(E, {[], _}) ->
{[[E]], 1}
end, {[], 0}, List)).
so function part can be implemented as
part(List) ->
RevList = split_list(List, 2),
lists:foldl(fun(E, Acc) ->
[lists:reverse(E)|Acc]
end, [], RevList).
Update:
I've added the reverse in case you want to preserve order; as far as I can see, it adds no more than 20% to the processing time.

You could do it like this:
1> {List1, List2} = lists:partition(fun(X) -> (X rem 2) == 1 end, List).
{[1,3,5|...],[2,4,6|...]}
2> lists:zipwith(fun(X, Y) -> [X, Y] end, List1, List2).
[[1,2],[3,4],[5,6]|...]
This takes ~73 milliseconds with a 10000-element List on my computer. The original solution takes ~900 milliseconds.
But I would go with the recursive function anyway.

I was looking for a partition function that can split a large list among a small number of workers. With lkuty's partition, one worker might get almost double the work of all the others. If that's not what you want, here is a version whose sublist lengths differ by at most 1.
Uses PropEr for testing.
%% @doc Split List into sub-lists so that sub-list lengths differ by at most 1.
%% Does not preserve order.
-spec split_many(pos_integer(), [T]) -> [[T]] when T :: term().
split_many(N, List) ->
PieceLen = length(List) div N,
lists:reverse(split_many(PieceLen, N, List, [])).
-spec split_many(pos_integer(), pos_integer(), [T], [[T]]) ->
[[T]] when T :: term().
split_many(PieceLen, N, List, Acc) when length(Acc) < N ->
{Head, Tail} = lists:split(PieceLen, List),
split_many(PieceLen, N, Tail, [Head|Acc]);
split_many(_PieceLen, _N, List, Acc) ->
% Add an Elem to each list in Acc
{Appendable, LeaveAlone} = lists:split(length(List), Acc),
Appended = [[Elem|XS] || {Elem, XS} <- lists:zip(List, Appendable)],
lists:append(Appended, LeaveAlone).
Tests:
split_many_test_() ->
[
?_assertEqual([[1,2]], elibs_lists:split_many(1, [1,2])),
?_assertEqual([[1], [2]], elibs_lists:split_many(2, [1,2])),
?_assertEqual([[1], [3,2]], elibs_lists:split_many(2, [1,2,3])),
?_assertEqual([[1], [2], [4,3]], elibs_lists:split_many(3, [1,2,3,4])),
?_assertEqual([[1,2], [5,3,4]], elibs_lists:split_many(2, [1,2,3,4,5])),
?_assert(proper:quickcheck(split_many_proper1())),
?_assert(proper:quickcheck(split_many_proper2()))
].
%% @doc Verify all elements are preserved, number of groups is correct,
%% all groups have same number of elements (+-1)
split_many_proper1() ->
?FORALL({List, Groups},
{list(), pos_integer()},
begin
Split = elibs_lists:split_many(Groups, List),
% Lengths of sub-lists
Lengths = lists:usort(lists:map(fun erlang:length/1, Split)),
length(Split) =:= Groups andalso
lists:sort(lists:append(Split)) == lists:sort(List) andalso
length(Lengths) =< 2 andalso
case Lengths of
[Min, Max] -> Max == Min + 1;
[_] -> true
end
end
).
%% @doc If the number of elements is divisible by the number of groups, ordering must
%% stay the same
split_many_proper2() ->
?FORALL({Groups, List},
?LET({A, B},
{integer(1, 20), integer(1, 10)},
{A, vector(A*B, term())}),
List =:= lists:append(elibs_lists:split_many(Groups, List))
).

Here is a more general answer that works with any sublist size.
1> lists:foreach(fun(N) -> io:format("~2.10.0B -> ~w~n",[N, test:partition([1,2,3,4,5,6,7,8,9,10],N)] ) end, [1,2,3,4,5,6,7,8,9,10]).
01 -> [[1],[2],[3],[4],[5],[6],[7],[8],[9],[10]]
02 -> [[1,2],[3,4],[5,6],[7,8],[9,10]]
03 -> [[1,2,3],[4,5,6],[7,8,9],[10]]
04 -> [[1,2,3,4],[5,6,7,8],[9,10]]
05 -> [[1,2,3,4,5],[6,7,8,9,10]]
06 -> [[1,2,3,4,5,6],[7,8,9,10]]
07 -> [[1,2,3,4,5,6,7],[8,9,10]]
08 -> [[1,2,3,4,5,6,7,8],[9,10]]
09 -> [[1,2,3,4,5,6,7,8,9],[10]]
10 -> [[1,2,3,4,5,6,7,8,9,10]]
And the code to achieve this is stored inside a file called test.erl:
-module(test).
-compile(export_all).
partition(List, N) ->
partition(List, 1, N, []).
%% Input exhausted right after a completed sublist (or the list was empty).
partition([], 1, _N, Acc) ->
lists:reverse(Acc) ;
%% Input exhausted with a partial sublist in progress: it is still stored
%% in reverse order, so reverse it before returning.
partition([], _C, _N, [HAcc|TAcc]) ->
lists:reverse([lists:reverse(HAcc)|TAcc]) ;
partition([H|T], 1, N, Acc) ->
partition(T, 2, N, [[H]|Acc]) ;
partition([H|T], C, N, [HAcc|TAcc]) when C < N ->
partition(T, C+1, N, [[H|HAcc]|TAcc]) ;
partition([H|T], C, N, [HAcc|TAcc]) when C == N ->
partition(T, 1, N, [lists:reverse([H|HAcc])|TAcc]) ;
partition(L, C, N, Acc) when C > N ->
partition(L, 1, N, Acc).
It could probably be more elegant regarding the special case where C > N. Note that C is the 1-based position that the next element will take in the sublist currently being built. At the start, it is 1, and then it increments until it reaches the partition size N.
We could also use a modified version of @chops's code to let the last list contain the remaining items even if its size is less than N:
-module(n_length_chunks_fast).
-export([n_length_chunks_fast/2]).
n_length_chunks_fast(List,Len) ->
SkipLength = case length(List) rem Len of
0 -> 0;
N -> Len - N
end,
n_length_chunks_fast(lists:reverse(List),[],SkipLength,Len).
n_length_chunks_fast([],Acc,_Pos,_Max) -> Acc;
n_length_chunks_fast([H|T],Acc,Pos,Max) when Pos==Max ->
n_length_chunks_fast(T,[[H] | Acc],1,Max);
n_length_chunks_fast([H|T],[HAcc | TAcc],Pos,Max) ->
n_length_chunks_fast(T,[[H | HAcc] | TAcc],Pos+1,Max);
n_length_chunks_fast([H|T],[],Pos,Max) ->
n_length_chunks_fast(T,[[H]],Pos+1,Max).

I've slightly altered the implementation from @JLarky to remove the guard expression, which should be slightly faster:
split_list(List, Max) ->
element(1, lists:foldl(fun
(E, {[Buff|Acc], 1}) ->
{[[E],Buff|Acc], Max};
(E, {[Buff|Acc], C}) ->
{[[E|Buff]|Acc], C-1};
(E, {[], _}) ->
{[[E]], Max}
end, {[], Max}, List)).

version compare function, about special character

I am studying the RabbitMQ source code to learn Erlang techniques.
The following is from the rabbit_misc.erl file. Its purpose is to check an application's minimum version.
In the 5th and 7th clauses of version_compare/N there is a special character, $0, but I don't see how those clauses can ever match.
My reasoning is that in the last clause, after lists:splitwith/2, ATl and BTl will start with $. (a dot), not with $0.
version_compare(A, B, lte) ->
case version_compare(A, B) of
eq -> true;
lt -> true;
gt -> false
end;
version_compare(A, B, gte) ->
case version_compare(A, B) of
eq -> true;
gt -> true;
lt -> false
end;
version_compare(A, B, Result) ->
Result =:= version_compare(A, B).
version_compare(A, A) ->
eq;
version_compare([], [$0 | B]) ->
version_compare([], dropdot(B));
version_compare([], _) ->
lt; %% 2.3 < 2.3.1
version_compare([$0 | A], []) ->
version_compare(dropdot(A), []);
version_compare(_, []) ->
gt; %% 2.3.1 > 2.3
version_compare(A, B) ->
{AStr, ATl} = lists:splitwith(fun (X) -> X =/= $. end, A),
{BStr, BTl} = lists:splitwith(fun (X) -> X =/= $. end, B),
ANum = list_to_integer(AStr),
BNum = list_to_integer(BStr),
if ANum =:= BNum -> version_compare(dropdot(ATl), dropdot(BTl));
ANum < BNum -> lt;
ANum > BNum -> gt
end.

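$0 is not a special character; it is the character literal for the digit 0 (the integer 48), so [$0] is the string "0".
Versions may have several components, e.g. 0.1.22.333, and splitwith/2 splits the string into a head and a tail ("0" and ".1.22.333").
I imagine that handling $0 is for cases like "1.0.0" versus "1", where trailing zero components should compare as equal:
{"1",".0.0"} vs {"1",[]}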
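To see how the $0 clauses are reached, here is a small self-contained sketch that copies the version_compare/2 clauses from the question into a demo module. The question does not show dropdot/1, so the definition below (strip leading dots) is an assumption, and the module name version_demo is only illustrative.

```erlang
-module(version_demo).
-export([compare/2]).

%% Assumed helper (not shown in the question): drop leading dots.
dropdot(A) -> lists:dropwhile(fun(X) -> X =:= $. end, A).

compare(A, A) -> eq;
%% One side is exhausted: skip zero components on the other side.
compare([], [$0 | B]) -> compare([], dropdot(B));
compare([], _) -> lt;
compare([$0 | A], []) -> compare(dropdot(A), []);
compare(_, []) -> gt;
compare(A, B) ->
    %% Split off the first numeric component of each version string.
    {AStr, ATl} = lists:splitwith(fun(X) -> X =/= $. end, A),
    {BStr, BTl} = lists:splitwith(fun(X) -> X =/= $. end, B),
    ANum = list_to_integer(AStr),
    BNum = list_to_integer(BStr),
    if ANum =:= BNum -> compare(dropdot(ATl), dropdot(BTl));
       ANum < BNum -> lt;
       ANum > BNum -> gt
    end.
```

With this sketch, compare("1.0.0", "1") first splits into {"1",".0.0"} and {"1",[]}, then recurses with "0.0" and [] after dropdot/1. That call matches the [$0 | A] clause twice and ends in compare([], []), which returns eq. So the tails do start with a dot after splitwith/2, but dropdot/1 removes it before the recursive call, and that is when a leading $0 can appear.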

List to list of tuples conversion

I want to convert [z,z,a,z,z,a,a,z] to [{z,2},{a,1},{z,2},{a,2},{z,1}]. How can I do it?
So, I need to accumulate the previous value, its counter, and the list of tuples.
I've created a record
-record(acc, {previous, counter, tuples}).
Redefined
listToTuples([]) -> [];
listToTuples([H | Tail]) ->
Acc = #acc{previous=H, counter=1},
listToTuples([Tail], Acc).
But then I have some trouble
listToTuples([H | Tail], Acc) ->
case H == Acc#acc.previous of
true ->
false ->
end.

if you build up your answer (Acc) in reverse, the previous will be the head of that list.
here's how i would do it --
list_pairs(List) -> list_pairs(List, []).
list_pairs([], Acc) -> lists:reverse(Acc);
list_pairs([H|T], [{H, Count}|Acc]) -> list_pairs(T, [{H, Count+1}|Acc]);
list_pairs([H|T], Acc) -> list_pairs(T, [{H, 1}|Acc]).
(i expect someone will now follow with a one-line list comprehension version..)

I would continue on the road building the list in reverse. Notice the pattern matching over X on the first line.
F = fun(X,[{X,N}|Rest]) -> [{X,N+1}|Rest];
(X,Rest) -> [{X,1}|Rest] end.
lists:foldr(F,[],List).

I would personally use lists:foldr/3 or do it by hand with something like:
list_to_tuples([H|T]) -> list_to_tuples(T, H, 1);
list_to_tuples([]) -> [].
list_to_tuples([H|T], H, C) -> list_to_tuples(T, H, C+1);
list_to_tuples([H|T], P, C) -> [{P,C}|list_to_tuples(T, H, 1)];
list_to_tuples([], P, C) -> [{P,C}].
Using two accumulators saves you unnecessarily building and pulling apart a tuple for every element in the list. I find writing it this way clearer.
