What are the details of the Erlang selective receive mechanism? - erlang

I have read an article about the Erlang selective receive mechanism. At the end of the article there is a conclusion: "messages are moved from the mailbox to the save queue and then back to the mailbox after the matching message arrives". I have tried the example shown in the article, but I couldn't get the same result. Here is my code; my Erlang/OTP version is 21.
shell1:
(aaa#HW0003727)1> register(shell, self()).
true
(aaa#HW0003727)2> shell ! c, shell ! d.
d
(aaa#HW0003727)3> process_info(whereis(shell),messages).
{messages,[c,d]}
(aaa#HW0003727)4> receive a -> 1; b -> 2 end.
shell2:
(aaa#HW0003727)1> process_info(whereis(shell),messages).
{messages,[c,d]}
(aaa#HW0003727)2> process_info(whereis(shell)).
[{registered_name,shell},
{current_function,{prim_eval,'receive',2}},
{initial_call,{erlang,apply,2}},
{status,waiting},
{message_queue_len,2},
{links,[<0.113.0>]},
{dictionary,[]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.112.0>},
{total_heap_size,4212},
{heap_size,1598},
{stack_size,30},
{reductions,13906},
{garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,1}]},
{suspending,[]}]
The article.

This strange behaviour, where the state of a "save queue" was visible, was only true of the interpreted code running in the shell, not of regular compiled modules. In the actual C implementation of receive there is only one queue, with a pointer to keep track of which messages have been scanned so far, and process_info does not show an empty queue during a real receive. The behaviour of the interpreted code was fixed back in R16B01, so nowadays there is no visible difference: https://github.com/erlang/otp/commit/acb8ef5d18cc3976bf580a8e6925cb5641acd401
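For comparison, here is a minimal sketch of the same experiment against compiled code (module and function names are made up for illustration); the mailbox stays visible the whole time:

-module(selrecv).
-export([start/0]).

%% Spawns a process that blocks in a selective receive with two
%% non-matching messages queued, then inspects its mailbox.
start() ->
    Pid = spawn(fun() ->
                    receive
                        a -> 1;
                        b -> 2
                    end
                end),
    Pid ! c,
    Pid ! d,
    timer:sleep(100),              %% give the messages time to arrive
    process_info(Pid, messages).   %% => {messages,[c,d]}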

Related

Erlang: Functions work in shell but not in YAWS

My sole method of debugging (io:format/2) is not working in YAWS. I'm at a loss. My supervisor starts three processes: an ETS manager, YAWS init, and a rate limiter. This is successful. I can play around with the rate limiter in the shell, calling the same functions YAWS should be calling. The difference is that the shell behaves as I would expect, and I have no idea what is happening in YAWS.
I do know that if I spam the command ratelimiter:limit(IP) in the shell, it will eventually return true. I can also execute the following and they will return true: ratelimiter:lockout(IP), ratelimiter:blacklist(IP). The limiter is a gen_server.
The functions do the following:
limit/1: Check ETS table if counter > threshold; update counter. If counter > blacklist threshold make entry in mnesia table
blacklist/1: Check mnesia table if entry exists; Yes: reset timer
lockout/1: Immediately enters ID into mnesia table
In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would be a nice place to also do the rate limiting. Do it as soon as possible in the web server's chain of events.
All the changes I've made to the arg_rewrite module seem to work except the "printf"s and the limiter. I'm new to the language, so I'm not sure whether my mistake is obvious or not.
Skeleton of my arg_rewrite_mod:
-module(arg_preproc).
-export([arg_rewrite/1]).
-include("limiter_def.hrl").
-include_lib("/usr/lib/yaws/include/yaws_api.hrl").

is_blacklisted(ID) ->
    case ratelimiter:blacklist(ID) of
        false -> continue;
        true -> throw(blacklist)
    end.

is_limited(ID) ->
    case ratelimiter:limit(ID) of
        false -> continue;
        true -> throw(limit)
    end.

arg_rewrite(A) ->
    Allow = ['GET','POST','HEAD'],
    try
        {IP, _} = A#arg.client_ip_port,
        ID = IP,
        is_blacklisted(ID),
        io:format("~p ~p ~n", [ID, is_blacklisted(ID)]),
        %% === Allow expected HTTP requests
        HttpReq = (A#arg.req)#http_request.method,
        case lists:member(HttpReq, Allow) of
            true ->
                {_, ReqTgt} = (A#arg.req)#http_request.path,
                PassThru = [".css",".jpg",".jpeg",".png",".js"],
                %% ... much more ...
            false ->
                is_limited(ID),
                throw(http_method_denied)
        end
    catch
        throw:blacklist -> %% Send back a 429;
        throw:limit -> %% Same but no Retry-After;
        throw:http_method_denied ->
            %% Only throw experienced
            AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
            A#arg{state = #rewrite_response{status = 405,
                  headers = [{header, {"Allow", AllowedReq}},
                             {header, {connection, "close"}}]}};
        Type:Reason -> {error, {unhandled, {Type, Reason}}}
    end.
I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get an HTTP 405 as intended. I can call ratelimiter:lockout({MY_IP}) and still get the web page to load in my browser and an HTTP 200 with curl.
I'm confused. Is it the way I started YAWS?
start() ->
    os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
    code:add_patha(?MODPATH_YAWS),
    ok = case (R = application:start(yaws)) of
             {error, {already_started, _}} -> ok;
             _ -> R
         end,
    {ok, self()}. %% Tell supervisor everything okay in a manner it expects.
I did this because I thought it would be "easier."
When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:
application:set_env(yaws, embedded, true),
Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
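For reference, a rough sketch of embedded startup, modeled on the ybed example from the Yaws embedding documentation (the docroot, port, id, and supervisor name are placeholders, and details can differ between Yaws versions):

start_embedded() ->
    Id = "embedded",
    Docroot = "/tmp/www",
    SconfList = [{port, 8000},
                 {servername, "limited"},
                 {listen, {127,0,0,1}},
                 {docroot, Docroot}],
    %% This is the embedding startup code referred to above; it prepares the
    %% configuration and returns child specs for your own supervisor to start.
    {ok, SCList, GC, ChildSpecs} =
        yaws_api:embedded_start_conf(Docroot, SconfList, [{id, Id}], Id),
    [supervisor:start_child(my_yaws_sup, Ch) || Ch <- ChildSpecs],
    yaws_api:setconf(GC, SCList),
    {ok, self()}.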
Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions:
check/1: yaws_shaper calls this to allow the callback module to decide whether to allow the request from the client. It passes client host information as the callback argument. Your shaper callback module returns either the atom allow to allow the request to proceed, or the tuple {deny, Status, Message} where Status is an HTTP status code to return to the client, such as 429 to indicate the client is making too many requests, and Message is any extra HTML to be returned to the client. (It might be nice if Message could include a reply header such as Retry-After as well; this is something I'll consider adding to Yaws.)
update/3: yaws_shaper calls this when the response for a client is ready to be returned. The first argument is the client host information, the second argument is the number of "hits" (the value 1 for each request), and the third argument is the number of bytes being delivered in response to the client's request. Your shaper callback module can return ok from update/3 (Yaws does not use the return value).
A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
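Based on that description, a shaper callback could look roughly like the sketch below (the module name is made up, and it assumes your ratelimiter accepts the client host term that yaws_shaper passes):

-module(my_shaper).
-export([check/1, update/3]).

%% Decide, per client host, whether the request may proceed.
check(Host) ->
    case ratelimiter:limit(Host) of
        true  -> {deny, 429, "<p>Too many requests</p>"};
        false -> allow
    end.

%% Called when the response is ready: Hits is 1 per request and Bytes is the
%% amount of data delivered; Yaws does not use the return value.
update(_Host, _Hits, _Bytes) ->
    ok.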
And finally, while "printf debugging" works, it's less than ideal, especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.
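For example, to watch the limiter calls from inside Yaws, a dbg session could look roughly like this (function names taken from your code; the 'x' match-spec alias also shows return and exception values):

dbg:tracer(),                    %% start the default trace message handler
dbg:p(all, c),                   %% trace function calls in all processes
dbg:tpl(ratelimiter, limit, x).  %% trace ratelimiter:limit (any arity), including local calls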

Triggering the update message in Erlang hot code reload feature

I am trying the hot code reload feature of Erlang, following the guide from LYAE, but I do not understand how to get the update message triggered.
I have a module which runs a function that is upgradeable:
Module
-module(upgrade).
-export([main/1, upgrade/1, init/1, init_link/1]).
-record(state, {version = 0, comments = ""}).

init(State) ->
    spawn(?MODULE, main, [State]).

main(State) ->
    receive
        update ->
            NewState = ?MODULE:upgrade(State),
            if
                NewState#state.version > 3 -> exit("Max Version Reached");
                true -> ok   %% without a true clause this 'if' crashes with if_clause
            end,
            ?MODULE:main(NewState);
        SomeMessage ->
            main(State)
    end.

upgrade(State = #state{version = Version, comments = Comments}) ->
    Comm = case Version rem 2 of
               0 -> "Even version";
               _ -> "Uneven version"
           end,
    #state{version = Version + 1, comments = Comm}.
Shell
>c(upgrade).
>rr(upgrade,state).
>U=upgrade:init(#state{version=0,comments="initial"}).
>Monitor=monitor(process,U).
> ...... do something to trigger the update message
> flush(). % see the exit message reason
I do not understand how I can perform a hot code reload in order to trigger the update message.
I want to see the exit reason from my main function when I call flush().
The process expects to get the atom update as a message. Since you have the pid of the process in the variable U, you can send the message like this:
U ! update.
Note that the strings Even version and Uneven version are only kept in the state, never printed, so you won't see those. The only thing you'll see is the exit message, after sending update four times and calling flush().
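If you also want an actual hot code upgrade to be involved, rather than just the message, the usual sequence is roughly the following sketch (assuming you have edited upgrade.erl in place), mirroring your shell listing:

> c(upgrade).   % recompile and load the new code
> U ! update.   % ?MODULE:upgrade/1 and ?MODULE:main/1 now run in the newly loaded code
> ...... repeat U ! update. until the version exceeds 3 ......
> flush().      % see the 'DOWN' message from the monitor with the exit reason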

Different behaviour of ETS with and without a shell

First, a disclaimer: I am learning Erlang. Not an expert here at all.
While making some examples using ETS I came across something I am not understanding (even after searching).
I have a process where I create a public ETS with
TableID = ets:new(tablename, [public])
I then pass TableID to other processes. When I do this running the modules from the shell, all is ok. When I run exactly the same modules with erl -noshell -s ... or even without the -noshell option, it fails.
I keep getting error:badarg, as if the table does not exist. The ID is properly passed, but the table is actually behaving as if it were private!
Is there a difference between running modules interactively from the shell or without?
Thanks
I am adding an example of the code I am using to try and debug the issue. As it is part of a larger piece of software (basically stripped to the bone to isolate the issue), it might be difficult to understand.
-record(loop_state, {
    commands
}).

start() ->
    LoopState = #loop_state{commands = ets:new(commands, [public])},
    tcpserver_otp_backend:start(?MODULE, 7000, {?MODULE, loop}, LoopState).

loop(Socket, LoopState = #loop_state{commands = Commands}) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            % the call below fails, no error generated, AND only in non-interactive shell
            A = newCommand(Commands, Data),
            gen_tcp:send(Socket, A),
            loop(Socket, LoopState);
        {error, closed} ->
            ok
    end.

newCommand(CommandTableId, Command) ->
    case ets:lookup(CommandTableId, Command) of
        [] ->
            _A = ets:insert(CommandTableId, {Command, 1}),
            <<1, "new", "1">>; % used for testing
        _ ->
            <<1, "old", "1">> % used for testing
    end.
When I remove the "offending command" ets:lookup, everything works again, as in the interactive shell.
The problem seems to be that you create the ets table in your start() function. An ets table has an owner (by default the creating process), and when the owner dies, the table gets deleted. When you run the start/0 function from the command line by passing -s to erl, the owner process will be some internal process in the Erlang kernel that is part of handling the startup sequence. Regardless of whether you pass -noshell or not, that process is probably transient and will die before the lookup function gets time to execute, so the table no longer exists when the lookup finally happens.
The proper place to create the ets table would be in the init() callback function of the gen_server that you start up. If it's supposed to be a public ets table accessed by several processes, then you might want to create a separate server process whose task is to own the table.
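A minimal sketch of that approach (the module name command_server and the get_table call are made up for illustration):

-module(command_server).
-behaviour(gen_server).
-export([start_link/0, get_table/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Other processes can ask the server for the table id.
get_table() ->
    gen_server:call(?MODULE, get_table).

init([]) ->
    %% The table is owned by this long-lived server process, so it stays
    %% alive for as long as the server does.
    Tab = ets:new(commands, [public]),
    {ok, Tab}.

handle_call(get_table, _From, Tab) ->
    {reply, Tab, Tab}.

handle_cast(_Msg, Tab) ->
    {noreply, Tab}.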

Basic Erlang Question

I have been trying to learn Erlang and came across some code written by Joe Armstrong:
start() ->
    F = fun interact/2,
    spawn(fun() -> start(F, 0) end).

interact(Browser, State) ->
    receive
        {browser, Browser, Str} ->
            Str1 = lists:reverse(Str),
            Browser ! {send, "out ! " ++ Str1},
            interact(Browser, State)
    after 100 ->
        Browser ! {send, "clock ! tick " ++ integer_to_list(State)},
        interact(Browser, State+1)
    end.
It is from a blog post about using websockets with Erlang: http://armstrongonsoftware.blogspot.com/2009/12/comet-is-dead-long-live-websockets.html
Could someone please explain why, in the start function, he spawns an anonymous fun that calls start(F, 0), when start is a function that takes zero arguments? I am confused about what he is trying to do here.
Further down in this blog post (Listings) you can see that there is another function (start/2) that takes two arguments:
start(F, State0) ->
    {ok, Listen} = gen_tcp:listen(1234, [{packet, 0},
                                         {reuseaddr, true},
                                         {active, true}]),
    par_connect(Listen, F, State0).
The code sample you quoted was only an excerpt where this function was omitted for simplicity.
The reason for spawning a fun in this way is to avoid having to export a function which is only intended for internal use. One problem with exporting is that all exported functions are available to all callers, even if they are only meant for internal use. One example of this is a callback module for gen_server, which typically contains both the exported API for clients and the callback functions for the gen_server behaviour. The callback functions are only intended to be called by the gen_server behaviour and not by others, but they are visible in the export list and not in any way blocked.
Spawning a fun decreases the number of exported internal functions.
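As a small illustration (the module below is hypothetical), the two ways of spawning differ only in what has to be exported:

-module(spawn_demo).
-export([start_fun/0, start_mfa/0, loop/1]).   %% loop/1 is only exported for start_mfa/0

start_fun() ->
    %% Spawning a fun: loop/1 could stay un-exported.
    spawn(fun() -> loop(0) end).

start_mfa() ->
    %% Spawning by module, function, and arguments: loop/1 must be exported.
    spawn(?MODULE, loop, [0]).

loop(N) ->
    receive
        _Msg -> loop(N + 1)
    end.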
In Erlang, functions are identified by their name and their arity (the number of parameters they take). You can have more than one function with the same name, as long as they all have different numbers of parameters. The two functions you've posted above are start/0 and interact/2. start/0 doesn't call itself; instead it calls start/2, and if you take a look further down the page you linked to, you'll find the definition of start/2.
The point of using spawn in this way is to start a server process in the background and return control to the caller. To play with this code, I guess that you'd start up the Erlang interpreter, load the script and then call the start/0 function. This method would then start a process in the background and return so that you could continue to type into the Erlang interpreter.

Query an Erlang process for its state?

A common pattern in Erlang is the recursive loop that maintains state:
loop(State) ->
    receive
        Msg ->
            NewState = whatever(Msg),
            loop(NewState)
    end.
Is there any way to query the state of a running process with a bif or tracing or something? Since crash messages say "...when state was..." and show the crashed process's state, I thought this would be easy, but I was disappointed that I haven't been able to find a bif to do this.
So, then, I figured using the dbg module's tracing would do it. Unfortunately, I believe that because these loops are tail-call optimized, dbg will only capture the first call to the function.
Any solution?
If your process is using OTP, it is enough to do sys:get_status(Pid).
The error message you mention is displayed by SASL. SASL is an error reporting daemon in OTP.
The state you are referring to in your example code is just an argument of a tail-recursive function. There is no way to extract it using anything except the tracing BIFs. I guess this would not be a proper solution in production code, since tracing is intended to be used only for debugging purposes.
The proper, industry-tested solution would be to make extensive use of OTP in your project. Then you can take full advantage of SASL error reporting, the rb module to collect these reports, sys to inspect the state of a running OTP-compatible process, proc_lib to make short-lived processes OTP-compliant, etc.
It turns out there's a better answer than all of these, if you're using OTP:
sys:get_state/1
Probably it didn't exist at the time.
It looks like you're making a problem out of nothing. erlang:process_info/1 gives enough information for debugging purposes. If you REALLY need the loop function's arguments, why don't you give them back to the caller in response to one of the special messages that you define yourself? See the sketch below.
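A minimal sketch of that approach, reusing the loop from the question (the get_state tuple is just a convention you pick yourself):

loop(State) ->
    receive
        {get_state, From} ->
            From ! {state, State},   %% report the current state on request
            loop(State);
        Msg ->
            loop(whatever(Msg))
    end.

%% Caller side:
%% Pid ! {get_state, self()},
%% receive {state, S} -> S end.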
UPDATE:
Just to clarify terminology: the closest thing to the "state of the process" at the language level is the process dictionary, the use of which is highly discouraged. It can be queried with erlang:process_info/1 or erlang:process_info/2.
What you actually need is to trace the process's local function calls along with their arguments:
-module(ping).
-export([start/0, send/1, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

send(Pid) ->
    Pid ! {self(), ping},
    receive
        pong ->
            pong
    end.

loop(S) ->
    receive
        {Pid, ping} ->
            Pid ! pong,
            loop(S + 1)
    end.
Console:
Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [kernel-poll:false]
Eshell V5.6.5 (abort with ^G)
1> l(ping).
{module,ping}
2> erlang:trace(all, true, [call]).
23
3> erlang:trace_pattern({ping, '_', '_'}, true, [local]).
5
4> Pid = ping:start().
<0.36.0>
5> ping:send(Pid).
pong
6> flush().
Shell got {trace,<0.36.0>,call,{ping,loop,[0]}}
Shell got {trace,<0.36.0>,call,{ping,loop,[1]}}
ok
7>
{status,Pid,_,[_,_,_,_,[_,_,{data,[{_,State}]}]]} = sys:get_status(Pid).
That's what I use to get the state of a gen_server. (Tried to add it as a comment to the reply above, but couldn't get formatting right.)
As far as I know you can't get the arguments passed to a locally called function. I would love for someone to prove me wrong.
-module(loop).
-export([start/0, loop/1]).

start() ->
    spawn_link(fun () -> loop([]) end).

loop(State) ->
    receive
        Msg ->
            loop([Msg|State])
    end.
To trace this module, do the following in the shell.
dbg:tracer().
dbg:p(new,[c]).
dbg:tpl(loop, []).
Using this tracing setting you get to see local calls (the 'l' in tpl means that local calls will be traced as well, not only global ones).
5> Pid = loop:start().
(<0.39.0>) call loop:'-start/0-fun-0-'/0
(<0.39.0>) call loop:loop/1
<0.39.0>
6> Pid ! foo.
(<0.39.0>) call loop:loop/1
foo
As you see, just the calls are included. No arguments in sight.
My recommendation is to base correctness in debugging and testing on the messages sent rather than state kept in processes. I.e. if you send the process a bunch of messages, assert that it does the right thing, not that it has a certain set of values.
But of course, you could also sprinkle some erlang:display(State) calls in your code temporarily. Poor man's debugging.
This is a "oneliner" That can be used in the shell.
sys:get_status(list_to_pid("<0.1012.0>")).
It helps you convert a pid string into a Pid.
