How to generate integer ranges in Erlang? - erlang

From the other languages I program in, I'm used to having ranges. In Python, if I want all numbers one up to 100, I write range(1, 101). Similarly, in Haskell I'd write [1..100] and in Scala I'd write 1 to 100.
I can't find something similar in Erlang, either in the syntax or the library. I know that this would be fairly simple to implement myself, but I wanted to make sure it doesn't exist elsewhere first (particularly since a standard library or language implementation would be loads more efficient).
Is there a way to do ranges either in the Erlang language or standard library? Or is there some idiom that I'm missing? I just want to know if I should implement it myself.
I'm also open to the possibility that I shouldn't want to use a range in Erlang (I wouldn't want to be coding Python or Haskell in Erlang). Also, if I do need to implement this myself, if you have any good suggestions for improving performance, I'd love to hear them :)

From http://www.erlang.org/doc/man/lists.html it looks like lists:seq(1, 100) does what you want. You can also do things like lists:seq(1, 100, 2) to get all of the odd numbers in that range instead.

You can use lists:seq(From, To), as @bitilly says, and you can also use list comprehensions to add more functionality, for example:
1> [X || X <- lists:seq(1,100), X rem 2 == 0].
[2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
44,46,48,50,52,54,56,58|...]

There is a difference between a range in Ruby and lists:seq in Erlang. Ruby's range doesn't create a list; it relies on the succ method to step through the values, so (1..HugeInteger).each { ... } will not eat up memory. Erlang's lists:seq will create the list (or I believe it will). So when a range is used only for side effects, it does make a difference.
P.S. Not just for side effects:
(1..HugeInteger).inject(0) { |s, v| s + (v % 1000000 == 0 ? 1 : 0) }
will work the same way as each, without creating a list. The Erlang way to do this is to write a recursive function. In fact, it is a concealed loop anyway.
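As a rough sketch of the recursive-function approach mentioned above: the module and function names below (range_fold, fold_range/4, count_multiples/2) are made up for illustration and are not part of any library. The function walks the integers From..To with an accumulator and never builds a list.

```erlang
-module(range_fold).
-export([fold_range/4, count_multiples/2]).

%% Fold Fun over the integers From..To without ever building a list.
%% The call is tail-recursive, so memory use stays constant.
fold_range(_Fun, Acc, From, To) when From > To ->
    Acc;
fold_range(Fun, Acc, From, To) ->
    fold_range(Fun, Fun(From, Acc), From + 1, To).

%% Rough counterpart of the Ruby inject example: count how many
%% integers in 1..Max are divisible by Divisor.
count_multiples(Max, Divisor) ->
    fold_range(fun(V, S) when V rem Divisor =:= 0 -> S + 1;
                  (_V, S) -> S
               end, 0, 1, Max).
```

For instance, range_fold:fold_range(fun(X, Sum) -> X + Sum end, 0, 1, 10) sums the numbers 1 to 10.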

Here is an example of a lazy stream in Erlang. It is not Erlang-specific, though; I guess it can be done in any language with lambdas. A new lambda gets created every time the stream is advanced, so it might put some strain on the garbage collector.
range(From, To, _) when From > To ->
    done;
range(From, To, Step) ->
    {From, fun() -> range(From + Step, To, Step) end}.
list(done) ->
    [];
list({Value, Iterator}) ->
    [Value | list(Iterator())].
% ----- usage example ------
list_odd_numbers(From, To) ->
    list(range(From bor 1, To, 2)).
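The list/1 helper above still materializes the whole list. If the goal is to avoid that, the same stream can be consumed with a fold instead. This is only a sketch: the module name lazy_range and the fold/3 function are assumptions added for illustration, while range/3 is repeated from the answer so the module is self-contained.

```erlang
-module(lazy_range).
-export([range/3, fold/3]).

%% Same stream representation as in the answer: done | {Value, NextFun}.
range(From, To, _Step) when From > To ->
    done;
range(From, To, Step) ->
    {From, fun() -> range(From + Step, To, Step) end}.

%% Consume the stream one element at a time; no list is built.
fold(_Fun, Acc, done) ->
    Acc;
fold(Fun, Acc, {Value, Next}) ->
    fold(Fun, Fun(Value, Acc), Next()).
```

For example, lazy_range:fold(fun(X, Sum) -> X + Sum end, 0, lazy_range:range(1, 100, 1)) adds up 1..100 while holding only one stream cell at a time.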

Related

Erlang flatten function time complexity

I need help with the following:
flatten ([]) -> [];
flatten([H|T]) -> H ++ flatten(T).
The input list contains other lists of different lengths.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]).
What is the time complexity of this function?
And why?
I got O(n), where n is the number of elements in the input list.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]) n=3
flatten([[1,2,3],[4,7],[9,9,9,9,9,9],[3,2,4],[1,4,6]]) n=5
Thanks for help.
First of all, I'm not sure your code will work, at least not in the way the standard library does. You could compare your function with lists:flatten/1 and maybe improve on your implementation. Try lists such as [a, [b, c]] and [[a], [b, [c]], [d]] as input and verify that you get back what you expected.
Regarding complexity, it is a little tricky because of the ++ operator and the functional (immutable) nature of the language. All lists in Erlang are singly linked lists (not arrays as in C++), and you cannot just append something to the end of one without modifying it: its last cell used to point to the end of the list, and now you would like it to point to something else. Since the language has no mutable data, the whole list on the left of the ++ operator has to be copied, which is what makes this operator expensive.
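You could say that the cost of A ++ B is length(A), and this is what determines the complexity of your function. In H ++ flatten(T) the recursive call is evaluated first, so the left operand is always a single sublist H, and every sublist is copied exactly once. The total cost is therefore length(FirstElement) + length(SecondElement) + ..., which is n * k, where n is the number of sublists and k is their average length: linear in the total number of elements, and O(n) only if k is bounded by a constant. The sum length(FirstElement) + (length(FirstElement) + length(SecondElement)) + ..., roughly n * n * k / 2 and hence quadratic in n, arises only when the growing result sits on the left of ++, as in Acc ++ H.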
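To see where the copying happens, it may help to put the question's function next to an accumulator variant. The sketch below is illustrative only: the module name flatten_cost and the function flatten_left are made up for contrast. Since ++ copies its left operand, flatten_right copies each sublist once, while flatten_left copies the ever-growing accumulator on every step.

```erlang
-module(flatten_cost).
-export([flatten_right/1, flatten_left/1]).

%% Variant from the question: the recursive call is the RIGHT operand.
%% H ++ Rest copies only H, so every sublist is copied exactly once.
flatten_right([]) -> [];
flatten_right([H | T]) -> H ++ flatten_right(T).

%% Accumulator variant (hypothetical, for contrast): the growing result
%% is the LEFT operand, so it is copied again on every step.
flatten_left(ListOfLists) -> flatten_left(ListOfLists, []).

flatten_left([], Acc) -> Acc;
flatten_left([H | T], Acc) -> flatten_left(T, Acc ++ H).
```

Both functions return the same list; they differ only in how much copying they do along the way.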
If you are new to this it might seem a little odd, but with some practice you can get the hang of it. I would recommend going through a few resources:
Good explanation of lists and how they are created
Documentation on list handling with DO and DO NOT parts
Short description of ++ operator myths and best practices
Chapter about recursion and tail-recursion with examples using ++ operator

Erlang: split binary on every char

I wrote a function that works, to split a binary to every char, but I have a feeling there is an easier way to do it:
my_binary_to_list(<<H,T/binary>>) ->
%slightly modified version of http://erlang.org/doc/efficiency_guide/binaryhandling.html
[list_to_binary([H])|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
> my_binary_to_list(<<"ABC">>).
[<<"A">>,<<"B">>,<<"C">>]
I think this is probably messy because of the list_to_binary([H]) because H should already be a binary.
I tried using that linked function directly but got "AA" which was not what I wanted. Then I tried just [H] and got ["A","B","C"] which was also not what I wanted.
You can create a binary from a single byte without creating a list and calling list_to_binary like this:
my_binary_to_list(<<H,T/binary>>) ->
    [<<H>>|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
You can also use binary comprehensions here to do the same logic as above in a single line:
1> [<<X>> || <<X>> <= <<"ABC">>].
[<<"A">>,<<"B">>,<<"C">>]
You can also directly extract binaries of size 1 (this is probably not faster than above though):
2> [X || <<X:1/binary>> <= <<"ABC">>].
[<<"A">>,<<"B">>,<<"C">>]
Edit: a quick bench using timer:tc/1 runs the second code in roughly half the time compared to first, but you should benchmark yourself before using either one for performance reasons. Maybe the second one is sharing the large binary by creating sub binaries?
1> Bin = binary:copy(<<"?">>, 1000000).
<<"????????????????????????????????????????????????????????????????????????????????????????????????????????????????????"...>>
2> timer:tc(fun() -> [<<X>> || <<X>> <= Bin] end).
{14345634,
[<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<...>>|...]}
3> timer:tc(fun() -> [X || <<X:1/binary>> <= Bin] end).
{7374003,
[<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<...>>|...]}
You can use a list comprehension with a bit string generator (<= consumes binaries, as opposed to <- which consumes lists):
> [<<A>> || <<A>> <= <<"foo">>].
[<<"f">>,<<"o">>,<<"o">>]
In your version, list_to_binary([H]) can be replaced by <<H>> - both generate a binary containing one byte. Whether using a list comprehension instead of a recursive function qualifies as "easier" might be a matter of taste.

Sieve of Eratosthenes highest prime factor in Erlang

I have edited the program so that it works (with small numbers); however, I do not understand how to implement an accumulator as suggested. The reason is that P changes throughout the process, so I do not know with which granularity I should break up the mother list. The Sieve of Eratosthenes is only efficient for generating smaller primes, so maybe I should have picked a different algorithm. Can anybody recommend a decent algorithm for calculating the highest prime factor of 600851475143? Please do not give me code; I would prefer a Wikipedia article or something of that nature.
-module(sieve).
-export([find/2,mark/2,primes/1]).
primes(N) -> [2|lists:reverse(primes(lists:seq(2,N),2,[]))].
primes(_,bound_reached,[_|T]) -> T;
primes(L,P,Primes) ->
    NewList = mark(L,P),
    NewP = find(NewList,P),
    primes(NewList,NewP,[NewP|Primes]).
find([],_) -> bound_reached;
find([H|_],P) when H > P -> H;
find([_|T],P) -> find(T,P).
mark(L,P) -> lists:reverse(mark(L,P,2,[])).
mark([],_,_,NewList) -> NewList;
mark([_|T],P,Counter,NewList) when Counter rem P =:= 0 -> mark(T,P,Counter+1,[P|NewList]);
mark([H|T],P,Counter,NewList) -> mark(T,P,Counter+1,[H|NewList]).
I found writing this very difficult, and I know there are a few things about it that are not very elegant, such as the way I have 2 hardcoded as a prime number. So I would appreciate any C&C and also advice about how to attack these kinds of problems. I look at other implementations and I have absolutely no idea how the authors think in this way, but it's something I would like to master.
I have worked out that I can forget the list up until the most recent prime number found; however, I have no idea how I am supposed to produce an end bound (subtle humour). I think there is probably something I can use like lists:seq(P,something), and the Counter would be able to handle that as I use modulo rather than resetting it to 0 each time. I've only done AS level maths so I have no idea what this is.
I can't even do that, can I? Because I will have to remove multiples of 2 from the entirety of the list. I'm thinking that this algorithm will not work unless I cache data to the hard drive, so I'm back to looking for a better algorithm.
I'm now considering writing an algorithm that just uses a counter and keeps a list of primes, which are the numbers that are not evenly divisible by any of the previously generated prime numbers. Is this a good way to do it?
This is my new algorithm. I think it should work, but I get the following error: "sieve2.erl:7: call to local/imported function is_prime/2 is illegal in guard". I think this is just an aspect of Erlang that I do not understand. However, I've no idea how I could find the material to read about it. [I'm purposely not using higher order functions etc. as I have only read up to the bit on recursion in learnyousomeerlang.org]
-module(sieve2).
-export([primes/1]).
primes(N) -> primes(2,N,[2]).
primes(Counter,Max,Primes) when Counter =:= Max -> Primes;
primes(Counter,Max,Primes) when is_prime(Counter,Primes) -> primes(Counter+1,Max,[Counter|Primes]);
primes(Counter,Max,Primes) -> primes(Counter+1,Max,Primes).
is_prime(_X, []) -> true;
is_prime(X,[H|_]) when X rem H =:= 0 -> false;
is_prime(X,[_|T]) -> is_prime(X,T).
The 2nd algorithm does not crash but runs too slowly. I'm thinking that I should reimplement the 1st, but this time forget the numbers up until the most recently discovered prime. Does anybody know what I could use as an end bound? After looking at other solutions, it seems people sometimes just set an arbitrary limit, e.g. 2 million (this is something I do not really want to do). Others used "lazy" implementations, which is what I think I am doing.
This:
lists:seq(2,N div 2)
allocates a list, and as the efficiency guide says, a list requires at least two words of memory per element. (A word is 4 or 8 bytes, depending on whether you have a 32-bit or 64-bit Erlang virtual machine.) So if N is 600851475143, the list has about 3 * 10^11 elements and would require roughly 4.8 terabytes of memory on a 64-bit VM, if I count correctly. (Unlike Haskell, Erlang doesn't do lazy evaluation.)
So you'd need to implement this using an accumulator, similar to what you did with Counter in the mark function. For the stop condition of the recursive function, you wouldn't check for the list being empty, but for the accumulator reaching the max value.
By the way you don't need to test all numbers up to N/2. It is enough to test up to sqrt(N).
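For readers who do want to see the accumulator idea and the sqrt(N) bound combined, here is one possible sketch using plain trial division. This is not the asker's sieve, and the module and function names (lpf, largest_prime_factor/1) are made up for illustration. The candidate divisor D plays the role of the accumulator, and the stop condition D * D > N stands in for "D > sqrt(N)" without any floating point.

```erlang
-module(lpf).
-export([largest_prime_factor/1]).

%% Largest prime factor of N (N >= 2) by trial division.
%% The candidate divisor D is the accumulator; no list is allocated.
largest_prime_factor(N) when is_integer(N), N >= 2 ->
    lpf(N, 2).

%% Once D * D > N, whatever is left of N is itself prime.
lpf(N, D) when D * D > N ->
    N;
%% D divides N: strip that factor and try the same D again.
lpf(N, D) when N rem D =:= 0 ->
    lpf(N div D, D);
%% Otherwise move on to the next candidate.
lpf(N, D) ->
    lpf(N, D + 1).
```

For example, lpf:largest_prime_factor(13195) returns 29, since 13195 = 5 * 7 * 13 * 29. Because every found factor is divided out of N immediately, the bound shrinks as the search proceeds.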
Here I wrote a version that takes 20 seconds to find the answer on my machine. It uses a kind of lazy list of primes and folds over them. It was fun, because I solved some Project Euler problems using Haskell quite a long time ago, and using the same approach in Erlang felt a bit strange.
On your update3:
primes(Counter,Max,Primes) when Counter =:= Max -> Primes;
primes(Counter,Max,Primes) when is_prime(Counter,Primes) -> primes(Counter+1,Max,[Counter|Primes]);
primes(Counter,Max,Primes) -> primes(Counter+1,Max,Primes).
Unlike in Haskell, you cannot call your own functions in a guard; Erlang only allows a fixed set of built-in functions and operators there. You have to move the call into the function body, for example into a case expression:
primes(Counter,Max,Primes) when Counter =:= Max ->
    Primes;
primes(Counter,Max,Primes) ->
    case is_prime(Counter,Primes) of
        true ->
            primes(Counter+1,Max,[Counter|Primes]);
        _ ->
            primes(Counter+1,Max,Primes)
    end.

Creating a valid function declaration from a complex tuple/list structure

Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and it's proving to be error prone doing it manually.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell. I'll iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an erlang shell inside an emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit alt-p (forget what emacs calls alt, but it's alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]] = A.
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly you want to pattern match some large datastructures of unknown formatting.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that it is undecidable from just data to know how to categorize that data.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
    Flat = lists:map(fun flatten_io/1, List),
    maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
    list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)]);
flatten_io(Other) -> Other.
maybe_flatten(L) when is_list(L) ->
    case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
                      (List) when is_list(List) ->
                           lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
                      (_) -> false
                   end, L) of
        true -> lists:flatten(L);
        false -> L
    end.
(Caveat: completely untested and quite inefficient. Will also crash for improper lists, but you shouldn't have those in your data structures anyway.)
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
It's hard to recommend something for handling this.
Transforming all the structures into a saner and also more minimal format looks like it's worth it. This depends mainly on the similarities in these structures.
Rather than having a special function for each of the 100, there must be some automatic reformatting that can be done, maybe even putting the parts in records.
Once you have records it's much easier to write functions for them, since you don't need to know the actual number of elements in the record. More important: your code won't break when the number of elements changes.
To summarize: make a barrier between your code and the insanity of these structures by sanitizing them with the most generic code possible. It will probably be a mix of generic reformatting and structure-specific stuff.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists), match "things" that have a common structure like 'name-addr', and replace these with nice records.
To help you with the eyeballing, you can write yourself helper functions along the lines of this example:
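A minimal sketch of that recursion, assuming the 'name-addr' tuples always have exactly three elements as in the sample above. The module name sip_normalize, the record name_addr and its field names are invented for illustration; nothing here comes from the original code base.

```erlang
-module(sip_normalize).
-export([normalize/1]).

%% Hypothetical record; the field names are assumptions.
-record(name_addr, {display, uri}).

%% Walk every list and tuple, replacing {'name-addr', Display, Uri}
%% with a #name_addr{} record and leaving everything else untouched.
%% Note: this crashes on improper lists.
normalize({'name-addr', Display, Uri}) ->
    #name_addr{display = normalize(Display), uri = normalize(Uri)};
normalize(Tuple) when is_tuple(Tuple) ->
    list_to_tuple([normalize(E) || E <- tuple_to_list(Tuple)]);
normalize(List) when is_list(List) ->
    [normalize(E) || E <- List];
normalize(Other) ->
    Other.
```

Strings pass through unchanged because they are just lists of integers. Further clauses for other uniform shapes (for example 'SIP-URI') could be added in the same way.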
eyeball(List) when is_list(List) ->
    io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
    io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
Expanding this into a useful tool for your purposes is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
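One way the multi-level version might look is sketched below. The module name shape and the choice to print printable lists as strings are assumptions made for illustration; io_lib:printable_list/1 is used to tell strings apart from other lists.

```erlang
-module(shape).
-export([eyeball/1]).

%% Print the shape of a nested term, indenting two spaces per level.
eyeball(Term) ->
    eyeball(Term, 0).

eyeball(Tuple, Depth) when is_tuple(Tuple) ->
    io:format("~sTuple with ~b elements~n", [indent(Depth), tuple_size(Tuple)]),
    lists:foreach(fun(E) -> eyeball(E, Depth + 1) end, tuple_to_list(Tuple));
eyeball(List, Depth) when is_list(List) ->
    %% Treat non-empty printable lists as strings instead of descending
    %% into every character.
    case List =/= [] andalso io_lib:printable_list(List) of
        true ->
            io:format("~sString ~p~n", [indent(Depth), List]);
        false ->
            io:format("~sList with length ~b~n", [indent(Depth), length(List)]),
            lists:foreach(fun(E) -> eyeball(E, Depth + 1) end, List)
    end;
eyeball(Other, Depth) ->
    io:format("~s~p~n", [indent(Depth), Other]).

indent(Depth) ->
    lists:duplicate(2 * Depth, $\s).
```

Like the original helper, this will crash on improper lists because of length/1.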
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...
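As an illustration of that suggestion, the sketch below pulls the 'Call-ID' value out of the header list from the question. The module name headers and the function call_id/1 are hypothetical. It assumes Headers is the second element of the top-level message, and relies on lists:keyfind/3 passing over the elements that are not tuples (such as the ["date", ...] list), which as far as I know is how it behaves.

```erlang
-module(headers).
-export([call_id/1]).

%% Headers is assumed to be the list that mixes tagged tuples such as
%% {'Call-ID', Value} with plain lists such as ["date", 'HCOLON', ...].
call_id(Headers) ->
    case lists:keyfind('Call-ID', 1, Headers) of
        {'Call-ID', Value} -> {ok, Value};
        false -> not_found
    end.
```

With the message bound to Msg, something like [_StatusLine, Headers | _] = Msg followed by headers:call_id(Headers) would grab just the top level and the one field of interest.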

How does the Erlang compiler handle pattern matching? What does it output?

I just asked a question about how the Erlang compiler implements pattern matching, and I got some great responses, one of which is the compiled bytecode (obtained with a parameter passed to the c() directive):
{function, match, 1, 2}.
{label,1}.
{func_info,{atom,match},{atom,match},1}.
{label,2}.
{test,is_tuple,{f,3},[{x,0}]}.
{test,test_arity,{f,3},[{x,0},2]}.
{get_tuple_element,{x,0},0,{x,1}}.
{test,is_eq_exact,{f,3},[{x,1},{atom,a}]}.
return.
{label,3}.
{badmatch,{x,0}}
It's all just plain Erlang tuples. I was expecting some cryptic binary thingy, guess not. I am asking this on impulse here (I could look at the compiler source, but asking questions always ends up better, with extra insight): how is this output translated at the binary level?
Say {test,is_tuple,{f,3},[{x,0}]} for example. I am assuming this is one instruction, called 'test'... anyway, so this output would essentially be the AST of the bytecode level language, from which the binary encoding is just a 1-1 translation?
This is all so exciting, I had no idea that I could so easily see what the Erlang compiler breaks things down into.
OK, so I dug into the compiler source code to find the answer, and to my surprise the asm file produced with the 'S' parameter to the compile:file() function is actually consulted in as is (file:consult()), and then the tuples are checked one by one for further action (line 661 - beam_consult_asm(St) -> - compile.erl). Further on there's a generated mapping table in there (the compile folder of the Erlang source) that shows what the serial number of each bytecode label is, and I'm guessing this is used to generate the actual binary signature of the bytecode.
Great stuff. But you just gotta love the consult() function: you can almost have a lispy type of syntax for a random language, avoid the need for a parser/lexer entirely, and just consult source code into the compiler and do stuff with it... code as data, data as code...
The compiler has a so-called pattern match compiler which will take a pattern and compile it down to what is essentially a series of branches, switches and such. The code for Erlang is in v3_kernel.erl in the compiler. It uses Simon Peyton Jones, "The Implementation of Functional Programming Languages", available online at
http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
Another worthy paper is the one by Peter Sestoft,
http://www.itu.dk/~sestoft/papers/match.ps.gz
which derives a pattern match compiler by inspecting partial evaluation of a simpler system. It may be an easier read, especially if you know ML.
The basic idea is that if you have, say:
% 1
f(a, b) ->
% 2
f(a, c) ->
% 3
f(b, b) ->
% 4
f(b, c) ->
Suppose now we have a call f(X, Y). Say X = a. Then only 1 and 2 are applicable. So we check Y = b and then Y = c. If on the other hand X /= a then we know that we can skip 1 and 2 and begin testing 3 and 4. The key is that if something does not match it tells us something about where the match can continue as well as when we do match. It is a set of constraints which we can solve by testing.
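The decision tree described above can be written out by hand as nested case expressions. This is only an illustration of the idea, not what the compiler actually emits; the module name match_tree is made up, and the return values 1 to 4 are simply the clause numbers from the text.

```erlang
-module(match_tree).
-export([f/2]).

%% Hand-written decision tree for the clauses f(a,b), f(a,c), f(b,b)
%% and f(b,c). Each argument is inspected only once: the test on X
%% selects the pair of clauses that can still match, then Y decides.
f(X, Y) ->
    case X of
        a ->
            case Y of
                b -> 1;
                c -> 2;
                _ -> erlang:error(function_clause)
            end;
        b ->
            case Y of
                b -> 3;
                c -> 4;
                _ -> erlang:error(function_clause)
            end;
        _ ->
            erlang:error(function_clause)
    end.
```

A naive clause-by-clause match would test X against a twice and against b twice in the worst case; the tree tests it once.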
Pattern match compilers seek to minimize the number of tests, so that there are as few as possible before we reach a conclusion. Statically typed languages have some advantages here, since they may know that:
-type foo() :: a | b | c.
and then if we have
-spec f(foo()) -> any().
f(a) ->
f(b) ->
f(c) ->
and we did not match f(a), f(b) then f(c) must match. Erlang has to check and then fail if it doesn't match.

Resources