How can I do dynamic pattern matching in Erlang?
Suppose I have the function filter/2:
filter(Pattern, Array)
where Pattern is a string containing the pattern I want to match (e.g. "{book, _ }" or "{ebook, _ }"), typed by a user, and Array is a list of heterogeneous elements (e.g. {dvd, "The Godfather" }, {book, "The Hitchhiker's Guide to the Galaxy" }, {dvd, "The Lord of Rings"}, etc.). I would like filter/2 to return the list of elements in Array that match Pattern.
I've tried some ideas with erl_eval without any success...
Thanks in advance.
After a little study of the documentation:
Eval = fun(S) -> {ok, T, _} = erl_scan:string(S), {ok,[A]} = erl_parse:parse_exprs(T), {value, V, _} = erl_eval:expr(A,[]), V end,
FilterGen = fun(X) -> Eval(lists:flatten(["fun(",X,")->true;(_)->false end."])) end,
lists:filter(FilterGen("{book, _}"), [{dvd, "The Godfather" } , {book, "The Hitchhiker's Guide to the Galaxy" }, {dvd, "The Lord of Rings"}]).
[{book,"The Hitchhiker's Guide to the Galaxy"}]
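The shell session above can be packaged as a module function. This is only a minimal sketch: the module name pattern_filter is hypothetical, and it assumes the pattern string is trusted, since evaluating user-typed text with erl_eval runs arbitrary code.

```erlang
-module(pattern_filter).
-export([filter/2]).

%% filter(PatternStr, List) -> the elements of List matching the pattern.
%% PatternStr is Erlang pattern source text, e.g. "{book, _}".
filter(PatternStr, List) ->
    %% Wrap the pattern in a fun that returns true on a match, false otherwise.
    Src = lists:flatten(["fun(", PatternStr, ") -> true; (_) -> false end."]),
    {ok, Tokens, _EndLocation} = erl_scan:string(Src),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    {value, Pred, _Bindings} = erl_eval:expr(Expr, erl_eval:new_bindings()),
    lists:filter(Pred, List).
```

A malformed pattern string makes one of the matches above fail with a badmatch error, so a caller taking input from users would probably want to wrap the call in a try.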
Is there any special reason why you want the pattern in a string?
Patterns as such don't exist in Erlang, they can really only occur in code. An alternative is to use the same conventions as with ETS match and select and write your own match function. It is really quite simple. The ETS convention uses a term to represent a pattern where the atoms '$1', '$2', etc are used as variables which can be bound and tested, and '_' is the don't care variable. So your example patterns would become:
{book,'_'}
{ebook,'_'}
{dvd,"The Godfather"}
This is probably the most efficient way of doing it. Match specifications could also be used here, but they would complicate the code. It depends on how complex your matching needs to be.
EDIT:
Here, without further comment, is the code for part of the matcher:
%% match(Pattern, Value) -> {yes,Bindings} | no.
match(Pat, Val) ->
match(Pat, Val, orddict:new()).
match([H|T], [V|Vs], Bs0) ->
case match(H, V, Bs0) of
{yes,Bs1} -> match(T, Vs, Bs1);
no -> no
end;
match('_', _, Bs) -> {yes,Bs}; %Don't care variable
match(P, V, Bs) when is_atom(P) ->
case is_variable(P) of
true -> match_var(P, V, Bs); %Variable atom like '$1'
false ->
%% P just an atom.
if P =:= V -> {yes,Bs};
true -> no
end
end.
match_var(P, V, Bs) ->
case orddict:find(P, Bs) of
{ok,B} when B =:= V -> {yes,Bs};
{ok,_} -> no;
error -> {yes,orddict:store(P, V, Bs)}
end.
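The partial matcher above leaves out tuples, the is_variable/1 helper and the filtering step. One possible way to complete it is sketched below. The module name term_match, the tuple clause, the catch-all clauses and the definition of is_variable/1 (an atom made of '$' followed by digits) are assumptions, not part of the original answer.

```erlang
-module(term_match).
-export([match/2, filter/2]).

%% match(Pattern, Value) -> {yes, Bindings} | no.
match(Pat, Val) ->
    match(Pat, Val, orddict:new()).

match('_', _, Bs) -> {yes, Bs};                  % don't-care variable
match(P, V, Bs) when is_atom(P) ->
    case is_variable(P) of
        true -> match_var(P, V, Bs);             % variable atom like '$1'
        false when P =:= V -> {yes, Bs};         % plain atom, must be equal
        false -> no
    end;
match(P, V, Bs) when is_tuple(P), is_tuple(V),
                     tuple_size(P) =:= tuple_size(V) ->
    %% Assumed extension: match tuples element by element.
    match(tuple_to_list(P), tuple_to_list(V), Bs);
match([P | Ps], [V | Vs], Bs0) ->
    case match(P, V, Bs0) of
        {yes, Bs1} -> match(Ps, Vs, Bs1);
        no -> no
    end;
match(P, V, Bs) when P =:= V -> {yes, Bs};       % numbers, [], binaries, ...
match(_, _, _) -> no.

match_var(P, V, Bs) ->
    case orddict:find(P, Bs) of
        {ok, B} when B =:= V -> {yes, Bs};
        {ok, _} -> no;
        error -> {yes, orddict:store(P, V, Bs)}
    end.

%% Assumed convention: '$' followed by one or more digits.
is_variable(Atom) ->
    case atom_to_list(Atom) of
        [$$ | Digits] when Digits =/= [] ->
            lists:all(fun(C) -> C >= $0 andalso C =< $9 end, Digits);
        _ -> false
    end.

%% filter(Pattern, List) -> the elements of List that match Pattern.
filter(Pat, List) ->
    [V || V <- List, match(Pat, V) =/= no].
```

With this sketch, term_match:filter({book,'_'}, Items) keeps only the book tuples, and a pattern such as {'$1','$1'} matches only pairs whose two elements are equal.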
You can use lists:filter/2 to do the filtering part. Converting the string to code is a different matter. Are all the patterns in the form of {atom, _}? If so, you might be able to store the atom and pass that into the closure argument of lists:filter.
Several possibilities come to mind, depending on how dynamic the patterns are and what features you need in your patterns:
If you need exactly the syntax of Erlang patterns and the pattern doesn't change very often, you could generate the matching source code and write it to a file, then use compile:file to create a binary and load it with code:load_binary.
Advantage: Very fast matching
Disadvantage: overhead when pattern changes
Stuff the data from Array into ETS and use match specifications to get out the data
You might use fun2ms to help create the match specification, but fun2ms is normally used as a parse transform at compile time. There is also a mode used by the shell, which can probably be made to work from strings with the help of the parser. For details see ms_transform.
There might also be some way to use qlc but I didn't look into this in detail.
In any case be careful to sanitize your matching data if it comes from untrusted sources!
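As a rough illustration of the match-specification option, the sketch below runs a match specification directly over a list with ets:match_spec_compile/1 and ets:match_spec_run/2, so no table has to be created. This is a variation on the suggestion above rather than exactly what it describes. The module name ms_filter is hypothetical, and the pattern is assumed to be an ETS-style term such as {book,'_'}, not a string.

```erlang
-module(ms_filter).
-export([filter/2]).

%% filter(Pattern, Items) -> the tuples in Items matching Pattern.
%% Pattern uses the ETS conventions: '_' is don't-care, '$1', '$2', ... are variables.
filter(Pattern, Items) ->
    %% Match head = Pattern, no guards, body '$_' returns the whole object.
    MatchSpec = [{Pattern, [], ['$_']}],
    Compiled = ets:match_spec_compile(MatchSpec),
    ets:match_spec_run(Items, Compiled).
```

For example, ms_filter:filter({book,'_'}, Items) returns only the book tuples from Items.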
I am working on simple list functions in Erlang to learn the syntax.
Everything was looking very similar to code I wrote for the Prolog version of these functions until I got to an implementation of 'intersection'.
The cleanest solution I could come up with:
myIntersection([],_) -> [];
myIntersection([X|Xs],Ys) ->
UseFirst = myMember(X,Ys),
myIntersection(UseFirst,X,Xs,Ys).
myIntersection(true,X,Xs,Ys) ->
[X|myIntersection(Xs,Ys)];
myIntersection(_,_,Xs,Ys) ->
myIntersection(Xs,Ys).
To me, this feels slightly like a hack. Is there a more canonical way to handle this? By 'canonical', I mean an implementation true to the spirit of Erlang's design.
Note: the essence of this question is conditional handling of user-defined predicate functions. I am not asking for someone to point me to a library function. Thanks!
I like this one:
inter(L1,L2) -> inter(lists:sort(L1),lists:sort(L2),[]).
inter([H1|T1],[H1|T2],Acc) -> inter(T1,T2,[H1|Acc]);
inter([H1|T1],[H2|T2],Acc) when H1 < H2 -> inter(T1,[H2|T2],Acc);
inter([H1|T1],[_|T2],Acc) -> inter([H1|T1],T2,Acc);
inter([],_,Acc) -> Acc;
inter(_,_,Acc) -> Acc.
it gives the exact intersection:
inter("abcd","efgh") -> []
inter("abcd","efagh") -> "a"
inter("abcd","efagah") -> "a"
inter("agbacd","eafagha") -> "gaa" (the accumulator is not reversed, so matches come out in descending order)
If you want each value to appear only once, simply replace one of the lists:sort/1 calls with lists:usort/1.
Edit
As @9000 says, one clause is useless:
inter(L1,L2) -> inter(lists:sort(L1),lists:sort(L2),[]).
inter([H1|T1],[H1|T2],Acc) -> inter(T1,T2,[H1|Acc]);
inter([H1|T1],[H2|T2],Acc) when H1 < H2 -> inter(T1,[H2|T2],Acc);
inter([H1|T1],[_|T2],Acc) -> inter([H1|T1],T2,Acc);
inter(_,_,Acc) -> Acc.
gives the same result, and
inter(L1,L2) -> inter(lists:usort(L1),lists:sort(L2),[]).
inter([H1|T1],[H1|T2],Acc) -> inter(T1,T2,[H1|Acc]);
inter([H1|T1],[H2|T2],Acc) when H1 < H2 -> inter(T1,[H2|T2],Acc);
inter([H1|T1],[_|T2],Acc) -> inter([H1|T1],T2,Acc);
inter(_,_,Acc) -> Acc.
removes any duplicate in the output.
If you know that there are no duplicate values in the input list, I think that
inter(L1,L2) -> [X || X <- L1, Y <- L2, X == Y].
is the shortest solution, but it is much slower: about 1 second to intersect two lists of 10 000 elements, compared to 16 ms for the previous solution. Its complexity is O(n²), comparable to @David Varela's proposal; with two lists of 100 000 elements the figures are 70 s versus 280 ms. I also guess there is a very high risk of running out of memory with bigger lists.
The canonical way ("canonical" as in "SICP") is to use an accumulator.
myIntersection(A, B) -> myIntersectionInner(A, B, []).
myIntersectionInner([], _, Acc) -> Acc;
myIntersectionInner(_, [], Acc) -> Acc;
myIntersectionInner([A|As], Bs, Acc) ->
    case myMember(A, Bs) of
        true ->
            myIntersectionInner(As, Bs, [A|Acc]);
        false ->
            myIntersectionInner(As, Bs, Acc)
    end.
This implementation of course produces duplicates if duplicates are present in both inputs. This can be fixed at the expense of calling myMember(A, Acc) and only adding A if the result is negative.
My apologies for the approximate syntax.
Although I appreciate the efficient implementations suggested, my intention was to better understand Erlang's implementation. As a beginner, I think @7stud's comment, particularly http://erlang.org/pipermail/erlang-questions/2009-December/048101.html, was the most illuminating. In essence, 'case' and pattern matching in functions use the same mechanism under the hood, although functions should be preferred for clarity.
In a real system, I would go with one of @Pascal's implementations, depending on whether 'intersect' did any heavy lifting.
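To make the comparison between 'case' and function clauses concrete, here is a small sketch that writes the same intersection both ways. The module and function names are hypothetical, and lists:member/2 stands in for the user-defined myMember/2 only to keep the example self-contained.

```erlang
-module(inter_demo).
-export([inter_case/2, inter_clauses/2]).

%% Variant 1: branch on the predicate result with a case expression.
inter_case([], _) -> [];
inter_case([X | Xs], Ys) ->
    case lists:member(X, Ys) of
        true  -> [X | inter_case(Xs, Ys)];
        false -> inter_case(Xs, Ys)
    end.

%% Variant 2: pass the predicate result to a helper and branch
%% in its function heads, as in the question.
inter_clauses([], _) -> [];
inter_clauses([X | Xs], Ys) ->
    keep(lists:member(X, Ys), X, Xs, Ys).

keep(true, X, Xs, Ys)  -> [X | inter_clauses(Xs, Ys)];
keep(false, _, Xs, Ys) -> inter_clauses(Xs, Ys).
```

Both variants return the same result for the same inputs; the choice between them is mostly a matter of readability.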
Basically I have a structure that includes a Value and a list of Ids.
What I want to do is map over the list of Ids and send a message to each of them, but when I first initialize the list of Ids I use the atom "empty_set" instead of a list. (Maybe I should rename it to empty_list :P)
The problem is that whenever I call the map function I want to check first if the list is an "empty_set" and if not then use the map function in it. Here is the code:
{From, set_value, V} ->
if ViewerSet /= empty_set -> set_viewer_values(V, ViewerSet)
end,
looper(V, ViewerSet)
This is the function that is called:
set_viewer_values(Value, ViewerSet) ->
if ViewerSet /= empty_set ->
lists:map(fun(ViewerPid) ->
ViewerPid ! {self(), set_value, Value} end, ViewerSet)
end.
This is how I initiate the process:
process() ->
C = spawn(fun() -> looper(no_value, empty_set) end),
{ok, C}.
The problem is that when I run it I get this error:
=ERROR REPORT==== 2-Nov-2014::15:03:07 ===
Error in process <0.367.0> with exit value: {function_clause,[{lists,map,
[#Fun<sheet.2.12938396>,empty_set],[{file,"lists.erl"},{line,1223}]},{lists,map,2,
[{file,"lists.erl"},{line,1224}]},{sheet,cell_loop,2,[{file,"sheet.erl"},{line,93}]}]}
From what I understand, despite the if expression that checks whether or not the list is empty, it still tries to map over it.
So what am I doing wrong with the expression?
Thanks
Pattern matching. If you need to check for an empty list in a guard, an if or a cond, it's almost certain that you have a structural problem with the way you are thinking about Erlang.
This will nearly always manifest in confusing code and weird edge cases that make you ask yourself things like "How do I check for an empty list?" without realizing that what you are really asking is "How do I check for an empty list as a procedural condition?" This is the bane of sane functional programming.
Edit: A touch more explanation and an example may be in order
Wherever you would want to inject pattern matching you can either use something like a case or you can break whatever you are doing out into a separate function. Very often what you find is that you've got a semantic ambiguity where things are too tightly coupled on the one hand (you're doing work other than receipt of messages within a receive) and too loosely on the other (you're engaging in a lot of arbitrary procedural checking prior to calling a function, when really matching on parameters is the natural solution).
looper(V, ViewerSet) ->
receive
{From, set_value, V} ->
set_viewer_values(V, ViewerSet),
looper(V, ViewerSet);
% OtherStuff ->
% whatever else looper/2 does...
end.
set_viewer_values(V, []) ->
set_default_values(V);
set_viewer_values(V, ViewerSet) ->
% ... whatever the normal function definition is...
Wherever you are dispatching to from within your receive is what should be doing the actual work, and that is also the place you want to be doing matching. Since that is a function-call away anyway matching here is a good fit and simplifies your code.
If you want to match in looper/2 itself this is certainly possible. I don't know what you want to do when you receive an empty list, so I'll make up something, but you can do whatever you want:
looper(V, []) ->
looper(V, default_set());
looper(V, ViewerSet) ->
% As before, or whatever makes sense.
You could even decide that when you have an empty set you need to operate in a totally different way:
full_looper(V, []) ->
empty_looper(V);
full_looper(V, ViewerSet) ->
receive
{new_set, Set} ->
looper(V, Set);
{From, set_value, V} ->
set_viewer_values(V, ViewerSet),
looper(V, ViewerSet)
end.
empty_looper(V) ->
receive
{new_set, Set} ->
full_looper(V, Set);
{From, set_value, V} ->
set_viewer_values(V, default_set()),
empty_looper(V)
end.
My point above is that there are many ways to handle the case of having an empty set without resorting to arbitrary procedural checking, and all of them read easier once you know your way around (until you get used to doing things this way, though, it can feel pretty weird). As a side note, the last example is actually creating a finite state machine -- and there is an OTP module already to make creating FSMs really easy. (They are easy to write by hand in Erlang, too, but even easier with the gen_fsm module.)
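Putting these ideas together, a runnable sketch might look like the following. It assumes the viewer set starts as the empty list [] rather than the atom empty_set. The module name viewer_demo, the add_viewer message and the notify/2 helper are hypothetical names chosen for illustration.

```erlang
-module(viewer_demo).
-export([start/0, add_viewer/2, set_value/2]).

%% Start the loop with no value and an empty list of viewers.
start() ->
    spawn(fun() -> looper(no_value, []) end).

add_viewer(Cell, Pid) ->
    Cell ! {add_viewer, Pid},
    ok.

set_value(Cell, Value) ->
    Cell ! {self(), set_value, Value},
    ok.

looper(Value, Viewers) ->
    receive
        {add_viewer, Pid} ->
            looper(Value, [Pid | Viewers]);
        {_From, set_value, NewValue} ->
            %% NewValue is a fresh variable, so any value is accepted.
            notify(NewValue, Viewers),
            looper(NewValue, Viewers)
    end.

%% The empty list is handled by a function clause, not by an if.
notify(_Value, []) ->
    ok;
notify(Value, [Pid | Rest]) ->
    Pid ! {self(), set_value, Value},
    notify(Value, Rest).
```

Setting a value before any viewer has been added simply hits the first clause of notify/2 and does nothing.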
Try a case expression to check whether the list is empty, rather than recursion?
On both if expressions, what happens if ViewerSet is empty_set? There's no guard that handles this case.
if expressions in Erlang are not the typical if expressions you see in other languages. From the little experience I have, they are mostly avoided and for a good reason: (as another answer already mentioned) pattern matching can be used to check for equality and other comparison operations (through guards).
The following is taken from here:
If no guard sequence is true, an if_clause run-time error will occur. If necessary, the guard expression true can be used in the last branch, as that guard sequence is always true.
Example:
is_greater_than(X, Y) ->
    if
        X > Y ->
            true;
        true -> % works as an 'else' branch
            false
    end.
So if expressions end up being a sort of case but with boolean values as their clauses, they tend to introduce more confusion than clarity. Some people even avoid any usage of if expression.
My suggestion is that every time you find yourself using an if expression, ask yourself how you can replace it with pattern matching, either with a case or as part of a function clause.
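As a small illustration of that suggestion, the is_greater_than/2 example can be written with a guard on a function clause instead of an if. The module name cmp_demo is hypothetical.

```erlang
-module(cmp_demo).
-export([is_greater_than/2]).

%% The guard selects the first clause when X > Y;
%% every other pair of arguments falls through to the second clause.
is_greater_than(X, Y) when X > Y -> true;
is_greater_than(_, _) -> false.
```

Here the fall-through clause plays the role of the 'true ->' branch in the if version.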
If you have a list of ids in the variable ViewerSet, simply initialize it with the empty list: [].
Then when you receive the message {From, set_value, V} you can execute a function for each element of the list (even if it is empty) using lists:foreach/2 or using list comprehension:
{From, set_value, V} ->
    lists:foreach(fun(ViewerPid) -> ViewerPid ! {self(), set_value, V} end, ViewerSet),
looper(V, ViewerSet);
...
or
{From, set_value, V} ->
    [ViewerPid ! {self(), set_value, V} || ViewerPid <- ViewerSet],
looper(V, ViewerSet);
...
Based on your code, this is what you should get:
(shell@a)8> Val.
myatom
(shell@a)9> if Val /= myatom -> lists:map(fun(X) -> io:format("~p",[X]) end, Val) end.
** exception error: no true branch found when evaluating an if expression
(shell@a)10>
So it seems the problem resides somewhere else.
The bson-erlang module turns BSON-encoded JSON such as this:
{ "salutation" : "hello",
"subject" : "world" }
Into an Erlang tuple like this:
{ salutation, <<"hello">>, subject, <<"world">> }
Now, the server I'm attempting to talk to can put those fields in any order, and there might be extra fields in there that I don't care about, so -- equally validly -- I might see this instead:
{ subject, <<"world">>, salutation, <<"hello">>, reason, <<"nice day">> }
Is there any way that I can specify a function pattern that extracts a particular piece of the tuple, based on the one appearing immediately before it?
If I try the following, it fails with "no function clause matching..." because the arity of the tuple is wrong, and because the fields that I care about aren't in the correct place:
handle({ salutation, Salutation, _, _ }) -> ok.
Is this possible? Is there a better way to do this?
T = { subject, <<"world">>, salutation, <<"hello">>, reason, <<"nice day">> },
L = size(T),
L1 = [{element(I,T),element(I+1,T)} || I <- lists:seq(1,L,2)].
[{subject,<<"world">>},
{salutation,<<"hello">>},
{reason,<<"nice day">>}]
proplists:get_value(salutation,L1).
<<"hello">>
and if you want all in 1:
F = fun(Key,Tup) -> proplists:get_value(Key,[{element(I,Tup),element(I+1,Tup)} || I <- lists:seq(1,size(Tup),2)]) end.
F(reason,T).
<<"nice day">>
F(foo,T).
undefined
There is no pattern that matches values from a variable-length structure after a prefix of unknown length. This is true for tuples, lists and binaries. Such a pattern would require recursing through the structure.
A common approach for a list is to recurse by splitting head and tail, something typical of functional languages.
f_list([salutation, Salutation | _]) -> {value, Salutation};
f_list([_Key, _Value | Tail]) -> f_list(Tail);
f_list([]) -> false.
Please note that this function may fail if the list contains an odd number of elements.
The same approach is possible with tuples, but you need guards instead of patterns, as there is no pattern that extracts the equivalent of the tail of a tuple. Tuples are not linked lists but structures with O(1) access to their elements (and their size).
f_tuple(Tuple) -> f_tuple0(Tuple, 1).
f_tuple0(Tuple, N) when element(N, Tuple) =:= salutation ->
{value, element(N + 1, Tuple)};
f_tuple0(Tuple, N) when tuple_size(Tuple) > N -> f_tuple0(Tuple, N + 2);
f_tuple0(_Tuple, _N) -> false.
Likewise, this function may fail if the tuple contains an odd number of elements.
Based on elements in the question, the advantage of guards over bson:at/2 is unclear, though.
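Another way to combine the two answers is to turn the flat tuple into a list of key/value pairs once and then look keys up in it. The sketch below assumes the document is a flat tuple of alternating keys and values, as in the question. The module name doc_demo and its function names are hypothetical and are not part of bson-erlang.

```erlang
-module(doc_demo).
-export([to_pairs/1, lookup/2]).

%% {k1, v1, k2, v2, ...} -> [{k1, v1}, {k2, v2}, ...]
%% Fails with function_clause if the tuple has an odd number of elements.
to_pairs(Doc) when is_tuple(Doc) ->
    pair_up(tuple_to_list(Doc)).

pair_up([Key, Value | Rest]) -> [{Key, Value} | pair_up(Rest)];
pair_up([]) -> [].

%% lookup(Key, Doc) -> {value, Value} | false
lookup(Key, Doc) ->
    case lists:keyfind(Key, 1, to_pairs(Doc)) of
        {Key, Value} -> {value, Value};
        false -> false
    end.
```

Field order and extra fields no longer matter, because the lookup is by key rather than by position.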
I'm completely new to Erlang. As an exercise to learn the language, I'm trying to implement the function sublist using tail recursion and without using reverse. Here's the function that I took from this site, http://learnyousomeerlang.com/recursion:
tail_sublist(L, N) -> reverse(tail_sublist(L, N, [])).
tail_sublist(_, 0, SubList) -> SubList;
tail_sublist([], _, SubList) -> SubList;
tail_sublist([H|T], N, SubList) when N > 0 ->
tail_sublist(T, N-1, [H|SubList]).
It seems the use of reverse in erlang is very frequent.
In Mozart/Oz, it's very easy to create such a function using unbound variables:
proc {Sublist Xs N R}
if N>0 then
case Xs
of nil then
R = nil
[] X|Xr then
Unbound
in
R = X|Unbound
{Sublist Xr N-1 Unbound}
end
else
R=nil
end
end
Is it possible to write similar code in Erlang? If not, why?
Edit:
I want to clarify something about the question. The function in Oz doesn't use any auxiliary function (no append, no reverse, no anything external or BIF). It's also built using tail recursion.
When I ask if it's possible to create something similar in erlang, I'm asking if it's possible to implement a function or set of functions in erlang using tail recursion, and iterating over the initial list only once.
At this point, after reading your comments and answers, I doubt that it can be done, because Erlang doesn't seem to support unbound variables. It seems that every variable must be bound to a value.
Short Version
No, you can't write similar code in Erlang. The reason is that Erlang variables are single-assignment variables.
Unbound variables are simply not allowed in Erlang.
Long Version
I can't imagine a tail-recursive function similar to the one you present above, due to paradigm-level differences between the two languages you are trying to compare.
Nevertheless, it also depends on what you mean by similar code.
So, correct me if I am wrong, the following
R = X|Unbound
{Sublist Xr N-1 Unbound}
means that the assignment (R=X|Unbound) will not be executed until the recursive call returns the value of Unbound.
This to me looks a lot like the following:
sublist(_,0) -> [];
sublist([],_) -> [];
sublist([H|T],N)
when is_integer(N) ->
NewTail = sublist(T,N-1),
[H|NewTail].
%% or
%%sublist([H|T],N)
%% when is_integer(N) -> [H|sublist(T,N-1)].
But this code isn't tail recursive.
Here's a version that uses appends along the way instead of a reverse at the end.
subl(L, N) -> subl(L, N, []).
subl(_, 0, Accumulator) ->
Accumulator;
subl([], _, Accumulator) ->
Accumulator;
subl([H|T], N, Accumulator) ->
subl(T, N-1, Accumulator ++ [H]).
I would not say that "the use of reverse in Erlang is very frequent". I would say that the use of reverse is very common in toy problems in functional languages where lists are a significant data type.
I'm not sure how close to your Oz code you're trying to get with your "is it possible to create a similar code in Erlang? If not, why?" They are two different languages and have made many different syntax choices.
I'm trying to get around a problem with file:consult/1 not allowing tuples with fun in them like in this example:
{add_one, fun(X) -> X+1 end}.
To get around this I'm considering writing the fun inside a string and evaluating it
{add_one, "fun(X) -> X+1 end"}.
The question is: how do I convert the string into a fun?
parse_fun_expr(S) ->
{ok, Ts, _} = erl_scan:string(S),
{ok, Exprs} = erl_parse:parse_exprs(Ts),
{value, Fun, _} = erl_eval:exprs(Exprs, []),
Fun.
Note that you need a period at the end of your fun expression, e.g. S = "fun(X) -> X + 1 end.".
file:script/1 almost does what you want - it evaluates a series of erlang expressions from a file and returns the last result. You could use it in place of file:consult/1 but you'd need to change the format of the file from "term. term. term." giving [term, term ,term] to "[term, term , term]." giving [term, term, term] - place a single expression in the file instead of a sequence.
I'd like to point out that Zed's answer creates an interpreted fun. When the fun is called, it enters the evaluator, which starts evaluating the captured abstract syntax tree returned by erl_parse:parse_exprs/1. Looking at the fun created:
11> erlang:fun_info(Fun, env).
{env,[[],none,none,
[{clause,1,
[{var,1,'X'}],
[],
[{op,1,'+',{var,1,'X'},{integer,1,1}}]}]]}
12> erlang:fun_info(Fun, module).
{module,erl_eval}
One can see that it has closed over the parsed abstract syntax tree, as seen in the env info, and that it is a fun created inside erl_eval, as seen in the module info.
It is possible to use the Erlang compiler to create a compiled module at runtime; pointers in that direction are compile:forms/2 and code:load_binary/3. But the details of that should probably go into another Stack Overflow question.
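As a rough sketch of the compiled route, the function below parses a list of form strings, compiles them in memory and loads the resulting module. The module name dyn_compile, the function load/1 and the generated module dyn_add are hypothetical. It uses compile:forms/1 (default options) and assumes every string holds exactly one complete form ending in a period.

```erlang
-module(dyn_compile).
-export([load/1]).

%% load(FormStrings) -> Module
%% FormStrings is a list of strings, each holding one form
%% (an attribute or a function definition) ending in a period.
load(FormStrings) ->
    Forms = [parse_form(S) || S <- FormStrings],
    {ok, Module, Binary} = compile:forms(Forms),
    %% "nofile" is just a placeholder file name for the loaded code.
    {module, Module} = code:load_binary(Module, "nofile", Binary),
    Module.

parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

For example, dyn_compile:load(["-module(dyn_add).", "-export([add_one/1]).", "add_one(X) -> X + 1."]) returns dyn_add, after which dyn_add:add_one(1) can be called like any compiled function. As with erl_eval, the source strings must come from a trusted place.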
Maybe by using the erl_eval module?
2> F =fun(Str,Binding) ->
{ok,Ts,_} = erl_scan:string(Str),
Ts1 = case lists:reverse(Ts) of
[{dot,_}|_] -> Ts;
TsR -> lists:reverse([{dot,1} | TsR])
end,
{ok,Expr} = erl_parse:parse_exprs(Ts1),
erl_eval:exprs(Expr, Binding) end.
#Fun<erl_eval.12.111823515>
3> F("A=23.",[]).
{value,23,[{'A',23}]}
5> F("12+B.",[{'B',23}]).
{value,35,[{'B',23}]}