Related
I need a help with following:
flatten ([]) -> [];
flatten([H|T]) -> H ++ flatten(T).
Input List contain other lists with a different length
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]).
What is the time complexity of this function?
And why?
I got it to O(n) where n is a number of elements in the Input list.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]) n=3
flatten([[1,2,3],[4,7],[9,9,9,9,9,9],[3,2,4],[1,4,6]]) n=5
Thanks for help.
First of all I'm not sure your code will work, at least not in the way standard library works. You could compare your function with lists:flatten/1 and maybe improve on your implementation. Try lists such as [a, [b, c]] and [[a], [b, [c]], [d]] as input and verify if you return what you expected.
Regarding complexity it is little tricky due to ++ operator and functional (immutable) nature of the language. All lists in Erlang are linked lists (not arrays like in C++), and you can not just add something to end of one without modifying it; before it was pointing to end of list, now you would like it to link to something else. And again, since it is not mutable language you have to make copy of whole list left of ++ operator, which increases complexity of this operator.
You could say that complexity of A ++ B is length(A), and it makes complexity of your function little bit greater. It would go something like length(FirstElement) + (lenght(FirstElement) + length(SecondElement)) + .... up to (without) last, which after some math magic could be simplified to (n -1) * 1/2 * k * k where n is number of elements, and k is average length of element. Or O(n^3).
If you are new to this it might seem little bit odd, but with some practice you can get hang of it. I would recommend going through few resources:
Good explanation of lists and how they are created
Documentation on list handling with DO and DO NOT parts
Short description of ++ operator myths and best practices
Chapter about recursion and tail-recursion with examples using ++ operator
Why is the following saying variable unbound?
9> {<<A:Length/binary, Rest/binary>>, Length} = {<<1,2,3,4,5>>, 3}.
* 1: variable 'Length' is unbound
It's pretty clear that Length should be 3.
I am trying to have a function with similar pattern matching, ie.:
parse(<<Body:Length/binary, Rest/binary>>, Length) ->
But if fails with the same reason. How can I achieve the pattern matching I want?
What I am really trying to achieve is parse in incoming tcp stream packets as LTV(Length, Type, Value).
At some point after I parse the the Length and the Type, I want to ready only up to Length number of bytes as the value, as the rest will probably be for the next LTV.
So my parse_value function is like this:
parse_value(Value0, Left, Callback = {Module, Function},
{length, Length, type, Type, value, Value1}) when byte_size(Value0) >= Left ->
<<Value2:Left/binary, Rest/binary>> = Value0,
Module:Function({length, Length, type, Type, value, lists:reverse([Value2 | Value1])}),
if
Rest =:= <<>> ->
{?MODULE, parse, {}};
true ->
parse(Rest, Callback, {})
end;
parse_value(Value0, Left, _, {length, Length, type, Type, value, Value1}) ->
{?MODULE, parse_value, Left - byte_size(Value0), {length, Length, type, Type, value, [Value0 | Value1]}}.
If I could do the pattern matching, I could break it up to something more pleasant to the eye.
The rules for pattern matching are that if a variable X occurs in two subpatterns, as in {X, X}, or {X, [X]}, or similar, then they have to have the same value in both positions, but the matching of each subpattern is still done in the same input environment - bindings from one side do not carry over to the other. The equality check is conceptually done afterwards, as if you had matched on {X, X2} and added a guard X =:= X2. This means that your Length field in the tuple cannot be used as input to the binary pattern, not even if you make it the leftmost element.
However, within a binary pattern, variables bound in a field can be used in other fields following it, left-to-right. Therefore, the following works (using a leading 32-bit size field in the binary):
1> <<Length:32, A:Length/binary, Rest/binary>> = <<0,0,0,3,1,2,3,4,5>>.
<<0,0,0,3,1,2,3,4,5>>
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>
I've run into this before. There is some weirdness between what is happening inside binary syntax and what happens during unification (matching). I suspect that it is just that binary syntax and matching occur at different times in the VM somewhere (we don't know which Length is failing to get assigned -- maybe binary matching is always first in evaluation, so Length is still meaningless). I was once going to dig in and find out, but then I realized that I never really needed to solve this problem -- which might be why it was never "solved".
Fortunately, this won't stop you with whatever you are doing.
Unfortunately, we can't really help further unless you explain the context in which you think this kind of a match is a good idea (you are having an X-Y problem).
In binary parsing you can always force the situation to be one of the following:
Have a fixed-sized header at the beginning of the binary message that tells you the next size element you need (and from there that can continue as a chain of associations endlessly)
Inspect the binary once on entry to determine the size you are looking for, pull that one value, and then begin the real parsing task
Have a set of fields, all of predetermined sizes that conform to some a binary schema standard
Convert the binary to a list and iterate through it with any arbitrary amount of look-ahead and backtracking you might need
Quick Solution
Without knowing anything else about your general problem, a typical solution would look like:
parse(Length, Bin) ->
<<Body:Length/binary, Rest/binary>> = Bin,
ok = do_something(Body),
do_other_stuff(Rest).
But I smell something funky here.
Having things like this in your code is almost always a sign that a more fundamental aspect of the code structure is not in agreement with the data that you are handling.
But deadlines.
Erlang is all about practical code that satisfies your goals in the real world. With that in mind, I suggest that you do something like the above for now, and then return to this problem domain and rethink it. Then refactor it. This will gain you three benefits:
Something will work right away.
You will later learn something fundamental about parsing in general.
Your code will almost certainly run faster if it fits your data better.
Example
Here is an example in the shell:
1> Parse =
1> fun
1> (Length, Bin) when Length =< byte_size(Bin) ->
1> <<Body:Length/binary, Rest/binary>> = Bin,
1> ok = io:format("Chopped off ~p bytes: ~p~n", [Length, Body]),
1> Rest;
1> (Length, Bin) ->
1> ok = io:format("Binary shorter than ~p~n", [Length]),
1> Bin
1> end.
#Fun<erl_eval.12.87737649>
2> Parse(3, <<1,2,3,4,5>>).
Chopped off 3 bytes: <<1,2,3>>
<<4,5>>
3> Parse(8, <<1,2,3,4,5>>).
Binary shorter than 8
<<1,2,3,4,5>>
Note that this version is a little safer, in that we avoid a crash in the case that Length is longer than the binary. This is yet another good reason why maybe we can't do that match in the function head.
Try with below code:
{<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
This question is mentioned a bit in EEP-52:
Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
And explained a bit more in the following e-mail: http://erlang.org/pipermail/eeps/2020-January/000636.html
Illegal. With one exception, matching is not done in a left-to-right
order, but all variables in the pattern will be bound at the same
time. That means that the variables must be bound before the match
starts. For maps, that means that the variables referenced in key
expressions must be bound before the case (or receive) that matches
the map. In a function head, all map keys must be literals.
The exception to this general rule is that within a binary pattern,
the segments are matched from left to right, and a variable bound in a
previous segment can be used in the size expression for a segment
later in the binary pattern.
Also one of the members of OTP team mentioned that they made a prototype that can do that, but it was never finished http://erlang.org/pipermail/erlang-questions/2020-May/099538.html
We actually tried to make your example legal. The transformation of
the code that we did was not to rewrite to guards, but to match
arguments or parts of argument in the right order so that variables
that input variables would be bound before being used. (We would do a
topological sort to find the correct order.) For your example, the
transformation would look similar to this:
legal_example(Key, Map) ->
case Map of
#{Key := Value} -> Value;
_ -> error(function_clause, [Key, Map])
end.
In the prototype implementation, the compiler could compile the
following example:
convoluted(Ref,
#{ node(Ref) := NodeId, Loop := universal_answer},
[{NodeId, Size} | T],
<<Int:(Size*8+length(T)),Loop>>) when is_reference(Ref) ->
Int.
Things started to fall apart when variables are repeated. Repeated
variables in patterns already have a meaning in Erlang (they should be
the same), so it become tricky to understand to distinguish between
variables being bound or variables being used a binary size or map
key. Here is an example that the prototype couldn't handle:
foo(#{K := K}, K) -> ok.
A human can see that it should be transformed similar to this:
foo(Map, K) -> case Map of
{K := V} when K =:= V -> ok end.
Here are few other examples that should work but the prototype would
refuse to compile (often emitting an incomprehensible error message):
bin2(<<Sz:8,X:Sz>>, <<Y:Sz>>) -> {X,Y}.
repeated_vars(#{K := #{K := K}}, K) -> K.
match_map_bs(#{K1 := {bin,<<Int:Sz>>}, K2 := <<Sz:8>>}, {K1,K2}) ->
Int.
Another problem was when example was correctly rejected, the error
message would be confusing.
Because much more work would clearly be needed, we have shelved the
idea for now. Personally, I am not sure that the idea is sound in the
first place. But I am sure of one thing: the implementation would be
very complicated.
UPD: latest news from 2020-05-14
Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and it's proving to be error prone doing it manually.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell. I'll iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an erlang shell inside an emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit alt-p (forget what emacs calls alt, but it's alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]].
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly you want to pattern match some large datastructures of unknown formatting.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that it is undecidable from just data to know how to categorize that data.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
Flat = lists:map(fun flatten_io/1, List),
maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)];
flatten_io(Other) -> Other.
maybe_flatten(L) when is_list(L) ->
case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
(List) when is_list(List) ->
lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
(_) -> false
end, L) of
true -> lists:flatten(L);
false -> L
end.
(Caveat: completely untested and quite inefficient. Will also crash for inproper lists, but you shouldn't have those in your data structures anyway.)
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
Its hard to recommend something for handling this.
Transforming all the structures in a more sane and also more minimal format looks like its worth it. This depends mainly on the similarities in these structures.
Rather than having a special function for each of the 100 there must be some automatic reformatting that can be done, maybe even put the parts in records.
Once you have records its much easier to write functions for it since you don't need to know the actual number of elements in the record. More important: your code won't break when the number of elements changes.
To summarize: make a barrier between your code and the insanity of these structures by somehow sanitizing them by the most generic code possible. It will be probably a mix of generic reformatting with structure speicific stuff.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists) and match for "things" that have a common structure like 'name-addr' and replace these with nice records.
In order to help you eyeballing you can write yourself helper functions along this example:
eyeball(List) when is_list(List) ->
io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
expansion of this in a useful tool for your use is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...
From the other languages I program in, I'm used to having ranges. In Python, if I want all numbers one up to 100, I write range(1, 101). Similarly, in Haskell I'd write [1..100] and in Scala I'd write 1 to 100.
I can't find something similar in Erlang, either in the syntax or the library. I know that this would be fairly simple to implement myself, but I wanted to make sure it doesn't exist elsewhere first (particularly since a standard library or language implementation would be loads more efficient).
Is there a way to do ranges either in the Erlang language or standard library? Or is there some idiom that I'm missing? I just want to know if I should implement it myself.
I'm also open to the possibility that I shouldn't want to use a range in Erlang (I wouldn't want to be coding Python or Haskell in Erlang). Also, if I do need to implement this myself, if you have any good suggestions for improving performance, I'd love to hear them :)
From http://www.erlang.org/doc/man/lists.html it looks like lists:seq(1, 100) does what you want. You can also do things like lists:seq(1, 100, 2) to get all of the odd numbers in that range instead.
You can use list:seq(From, TO) that's say #bitilly, and also you can use list comprehensions to add more functionality, for example:
1> [X || X <- lists:seq(1,100), X rem 2 == 0].
[2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
44,46,48,50,52,54,56,58|...]
There is a difference between range in Ruby and list:seq in Erlang. Ruby's range doesn't create list and rely on next method, so (1..HugeInteger).each { ... } will not eat up memory. Erlang lists:seq will create list (or I believe it will). So when range is used for side effects, it does make a difference.
P.S. Not just for side effects:
(1..HugeInteger).inject(0) { |s, v| s + v % 1000000 == 0 ? 1 : 0 }
will work the same way as each, not creating a list. Erlang way for this is to create a recursive function. In fact, it is a concealed loop anyway.
Example of lazy stream in Erlang. Although it is not Erlang specific, I guess it can be done in any language with lambdas. New lambda gets created every time stream is advanced so it might put some strain on garbage collector.
range(From, To, _) when From > To ->
done;
range(From, To, Step) ->
{From, fun() -> range(From + Step, To, Step) end}.
list(done) ->
[];
list({Value, Iterator}) ->
[Value | list(Iterator())].
% ----- usage example ------
list_odd_numbers(From, To) ->
list(range(From bor 1, To, 2)).
As suggested in answers to a previous question, I tried using Erlang proplists to implement a prefix trie.
The code seems to work decently well... But, for some reason, it doesn't play well with the interactive shell. When I try to run it, the shell hangs:
> Trie = trie:from_dict(). % Creates a trie from a dictionary
% ... the trie is printed ...
% Then nothing happens
I see the new trie printed to the screen (ie, the call to trie:from_dict() has returned), then the shell just hangs. No new > prompt comes up and ^g doesn't do anything (but ^c will eventually kill it off).
With a very small dictionary (the first 50 lines of /usr/share/dict/words), the hang only lasts a second or two (and the trie is built almost instantly)... But it seems to grow exponentially with the size of the dictionary (100 words takes 5 or 10 seconds, I haven't had the patience to try larger wordlists). Also, as the shell is hanging, I notice that the beam.smp process starts eating up a lot of memory (somewhere between 1 and 2 gigs).
So, is there anything obvious that could be causing this shell hang and incredible memory usage?
Some various comments:
I have a hunch that the garbage collector is at fault, but I don't know how to profile or create an experiment to test that.
I've tried profiling with eprof and nothing obvious showed up.
Here is my "add string to trie" function:
add([], Trie) ->
[ stop | Trie ];
add([Ch|Rest], Trie) ->
SubTrie = proplists:get_value(Ch, Trie, []),
NewSubTrie = add(Rest, SubTrie),
NewTrie = [ { Ch, NewSubTrie } | Trie ],
% Arbitrarily decide to compress key/value list once it gets
% more than 60 pairs.
if length(NewTrie) > 60 ->
proplists:compact(NewTrie);
true ->
NewTrie
end.
The problem is (amongst others ? -- see my comment) that you are always adding a new {Ch, NewSubTrie} tuple to your proplist Tries, no matter if Ch already existed, or not.
Instead of
NewTrie = [ { Ch, NewSubTrie } | Trie ]
you need something like:
NewTrie = lists:keystore(Ch, 1, Trie, {Ch, NewSubTrie})
You're not really building a trie here. Your end result is effectively a randomly ordered proplist of proplists that requires full scans at each level when walking the list. Tries are typically implied ordering based on position in the array (or list).
Here's an implementation that uses tuples as the storage mechanism. Calling set only rebuilds the root and direct path tuples.
(note: would probably have to make the pair a triple (adding size) make delete work with any efficiency)
I believe erlang tuples are really just arrays (thought I read that somewhere), so lookup should be super fast, and modify is probably straight forward. Maybe this is faster with the array module, but I haven't really played with it much to know.
this version also stores an arbitrary value, so you can do things like:
1> c(trie).
{ok,trie}
2> trie:get("ab",trie:set("aa",bar,trie:new("ab",foo))).
foo
3> trie:get("abc",trie:set("aa",bar,trie:new("ab",foo))).
undefined
4>
code (entire module): note2: assumes lower case non empty string keys
-module(trie).
-compile(export_all).
-define(NEW,{ %% 26 pairs, to avoid cost of calculating a new level at runtime
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth}
}
).
-define(POS(Ch), Ch - $a + 1).
new(Key,V) -> set(Key,V,?NEW).
set([H],V,Trie) ->
Pos = ?POS(H),
{_,SubTrie} = element(Pos,Trie),
setelement(Pos,Trie,{V,SubTrie});
set([H|T],V,Trie) ->
Pos = ?POS(H),
{SubKey,SubTrie} = element(Pos,Trie),
case SubTrie of
nodepth -> setelement(Pos,Trie,{SubKey,set(T,V,?NEW)});
SubTrie -> setelement(Pos,Trie,{SubKey,set(T,V,SubTrie)})
end.
get([H],Trie) ->
{Val,_} = element(?POS(H),Trie),
Val;
get([H|T],Trie) ->
case element(?POS(H),Trie) of
{_,nodepth} -> undefined;
{_,SubTrie} -> get(T,SubTrie)
end.