Erlang receive ***WARNING*** - erlang

I have a process that sits in a loop and receives commands.
receive
increase ->
...
decrease ->
...
after 5000 ->
...
end
But when I bombard it with thousands of messages it breaks down and receives these warnings.
Warning Message:
***WARNING*** Unexp msg {<0.106.0>,rec_acked}, info {running,
[{'_UserConnections',20}],
{ieval,3994,34,log,
clientLogging,
[20],
false}}
Is there anyway to handle this? And does it cause any issue?
Thank you for your answer!

This code is good just for example and practice, but don't run in production environment.
You should always receive all messages from process mailbox and select what you want after getting.
handle_message() ->
receive
Msg ->
handle_message(Msg)
after 5000 ->
handle_timeout()
end.
handle_message(increase) ->
...;
handle_message(decrease) ->
...;
handle_message(_) ->
%% Back to receiving loop
handle_message().
You should prevent filling process mailbox.
In production-ready application, often nobody uses receive statement, often they use some standard codes which those codes handle receiving, timeouts, replies, hibernation, etc. we call those codes behavior, for example one of OTP standad behaviors is gen_server behavior
Because OTP behaviors are for general purposes if you need very efficient code for doing some special duty, you have to write something named Special process which should handle your own messages and Erlang system messages.

Related

Erlang: Functions work in shell but not in YAWS

My sole method of debugging (io:format/2) is not working in YAWS. I'm at a loss. My supervisor starts three processes: ETS Manager, YAWS Init, and Ratelimiter. This is successful. I can play around with the rate limiter in the shell... calling the same functions YAWS should be. The difference being the shell behaves as I would expect and I have no idea what is happening in YAWS.
I do know if I spam the command in shell: ratelimiter:limit(IP) it will return true eventually. I can execute the following and it will also return true: ratelimiter:lockout(IP), ratelimiter:blacklist(IP). The limiter is a gen_server.
The functions do the following:
limit/1: Check ETS table if counter > threshold; update counter. If counter > blacklist threshold make entry in mnesia table
blacklist/1: Check mnesia table if entry exists; Yes: reset timer
lockout/1: Immediately enters ID into mnesia table
In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would be a nice place to also do the rate limiting. Do it as soon as possible in the web server's chain of events.
All the changes I've made to the arg_rewrite module seem to work except using "printf"s and the limiter. I'm new to the language so I'm not sure my mistake is obvious or not.
Skeleton of my arg_rewrite_mod:
-module(arg_preproc).
-export([arg_rewrite/1]).
-include("limiter_def.hrl").
-include_lib("/usr/lib/yaws/include/yaws_api.hrl").
is_blacklisted(ID) ->
case ratelimiter:blacklist(ID) of
false -> continue;
true -> throw(blacklist)
end.
is_limited(ID) ->
case ratelimiter:limit(ID) of
false -> continue;
true -> throw(limit)
end.
arg_rewrite(A) ->
Allow = ['GET','POST', 'HEAD'],
try
{IP, _} = A#arg.client_ip_port,
ID = IP,
is_blacklisted(ID),
io:format("~p ~p ~n",[ID, is_blacklisted(ID)]),
%% === Allow expected HTTP requests
HttpReq = (A#arg.req)#http_request.method,
case lists:member(HttpReq, Allow) of
true ->
{_,ReqTgt} = (A#arg.req)#http_request.path,
PassThru = [".css",".jpg",".jpeg",".png",".js"],
%% ... much more ...
false ->
is_limited(ID),
throw(http_method_denied)
end
catch
throw:blacklist -> %% Send back a 429;
throw:limit -> %% Same but no Retry-After;
throw:http_method_denied ->
%%Only thrown experienced
AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
A#arg{state=#rewrite_response{status=405,
headers=[{header, {"Allow", AllowedReq}},{header, {connection, "close"}}]
}};
Type:Reason -> {error, {unhandled,{Type, Reason}}}
end.
I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get a HTTP 405 as intended. I can ratelimiter:lockout({MY_IP}) and get the web page to load in my browser and a HTTP 200 with curl.
I'm confused. Is it the way I started YAWS?
start() ->
os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
code:add_patha(?MODPATH_YAWS),
ok = case (R = application:start(yaws)) of
{error, {already_started, _}} -> ok;
_ -> R
end,
{ok,self()}. %% Tell supervisor everything okay in a manner it expects.
I did this because I thought it would be "easier."
When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:
application:set_env(yaws, embedded, true),
Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions:
check/1: yaws_shaper calls this to allow the callback module to decide whether to allow the request from the client. It passes client host information as the callback argument. Your shaper callback module returns either the atom allow to allow the request to proceed, or the tuple {deny, Status, Message} where Status is an HTTP status code to return to the client, such as 429 to indicate the client is making too many requests, and Message is any extra HTML to be returned to the client. (It might be nice if Message could include a reply header such as Retry-After as well; this is something I'll consider adding to Yaws.)
update/3: yaws_shaper calls this when the response for a client is ready to be returned. The first argument is the client host information, the second argument is the number of "hits" (the value 1 for each request), and the third argument is the number of bytes being delivered in response to the client's request. Your shaper callback module can return ok from update/3 (Yaws does not use the return value).
A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
And finally, while "printf debugging" works, it's less than ideal especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.

Erlang: spawn a process and wait for termination without using `receive`

In Erlang, can I call some function f (BIF or not), whose job is to spawn a process, run the function argf I provided, and doesn't "return" until argf has "returned", and do this without using receive clause (the reason for this is that f will be invoked in a gen_server, I don't want pollute the gen_server's mailbox).
A snippet would look like this:
%% some code omitted ...
F = fun() -> blah, blah, timer:sleep(10000) end,
f(F), %% like `spawn(F), but doesn't return until 10 seconds has passed`
%% ...
The only way to communicate between processes is message passing (of course you can consider to poll for a specific key in an ets or a file but I dont like this).
If you use a spawn_monitor function in f/1 to start the F process and then have a receive block only matching the possible system messages from this monitor:
f(F) ->
{_Pid, MonitorRef} = spawn_monitor(F),
receive
{_Tag, MonitorRef, _Type, _Object, _Info} -> ok
end.
you will not mess your gen_server mailbox. The example is the minimum code, you can add a timeout (fixed or parameter), execute some code on normal or error completion...
You will not "pollute" the gen_servers mailbox if you spawn+wait for message before you return from the call or cast. A more serious problem with this maybe that you will block the gen_server while you are waiting for the other process to terminate. A way around this is to not explicitly wait but return from the call/cast and then when the completion message arrives handle it in handle_info/2 and then do what is necessary.
If the spawning is done in a handle_call and you want to return the "result" of that process then you can delay returning the value to the original call from the handle_info handling the process termination message.
Note that however you do it a gen_server:call has a timeout value, either implicit or explicit, and if no reply is returned it generates an error in the calling process.
Main way to communicate with process in Erlang VM space is message passing with erlang:send/2 or erlang:send/3 functions (alias !). But you can "hack" Erlang and use multiple way for communicating over process.
You can use erlang:link/1 to communicate stat of the process, its mainly used in case of your process is dying or is ended or something is wrong (exception or throw).
You can use erlang:monitor/2, this is similar to erlang:link/1 except the message go directly into process mailbox.
You can also hack Erlang, and use some internal way (shared ETS/DETS/Mnesia tables) or use external methods (database or other things like that). This is clearly not recommended and "destroy" Erlang philosophy... But you can do it.
Its seems your problem can be solved with supervisor behavior. supervisor support many strategies to control supervised process:
one_for_one: If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.
one_for_all: If one child process terminates and is to be restarted, all other child processes are terminated and then all child processes are restarted.
rest_for_one: If one child process terminates and is to be restarted, the 'rest' of the child processes (that is, the child processes after the terminated child process in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.
simple_one_for_one: A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.
You can also modify or create your own supervisor strategy from scratch or base on supervisor_bridge.
So, to summarize, you need a process who wait for one or more terminating process. This behavior is supported natively with OTP, but you can also create your own model. For doing that, you need to share status of every started process, using cache or database, or when your process is spawned. Something like that:
Fun = fun
MyFun (ParentProcess, {result, Data})
when is_pid(ParentProcess) ->
ParentProcess ! {self(), Data};
MyFun (ParentProcess, MyData)
when is_pid(ParentProcess) ->
% do something
MyFun(ParentProcess, MyData2) end.
spawn(fun() -> Fun(self(), InitData) end).
EDIT: forgot to add an example without send/receive. I use an ETS table to store every result from lambda function. This ETS table is set when we spawn this process. To get result, we can select data from this table. Note, the key of the row is the process id of the process.
spawner(Ets, Fun, Args)
when is_integer(Ets),
is_function(Fun) ->
spawn(fun() -> Fun(Ets, Args) end).
Fun = fun
F(Ets, {result, Data}) ->
ets:insert(Ets, {self(), Data});
F(Ets, Data) ->
% do something here
Data2 = Data,
F(Ets, Data2) end.

Can my gen_server become a bottleneck?

I'm currently writing a piece of software in erlang, which is now based on gen_server behaviour. This gen_server should export a function (let's call it update/1) which should connect using ssl to another service online and send to it the value passed as argument to the function.
Currently update/1 is like this:
update(Value) ->
gen_server:call(?SERVER, {update, Value}).
So once it is called, there is a call to ?SERVER which is handled as:
handle_call({update, Value}, _From, State) ->
{ok, Socket} = ssl:connect("remoteserver.com", 5555, [], 3000).
Reply = ssl:send(Socket, Value).
{ok, Reply, State}.
Once the packet is sent to the remote server, the peer should severe the connection.
Now, this works fine with my tests in shell, but what happens if we have to call 1000 times mymod:update(Value) and ssl:connect/4 is not working well (i.e. is reaching its timeout)?
At this point, my gen_server will have a very large amount of values and they can be processed only one by one, leading to the point that the 1000th update will be done only 1000*3000 milliseconds after its value was updated using update/1.
Using a cast instead of a call would leave to the same problem. How can I solve this problem? Should I use a normal function and not a gen_server call?
From personal experience I can say that 1000 messages per gen_server process wont be a problem unless you are queuing big messages.
If from your testing it seems that your gen_server is not able to handle this much load, then you must create multiple instances of your gen_server preferably under a supervisor process at the boot time (or run-time) of your application.
Besides that, I really don't understand the requirement of making a new connection for each update!! you should consider some optimization like cached connections/ pre-connections to the server..no?

How to maintain stateful in yaws

I have some process (spawned) with state.
How to maintain simple stateful service in yaws?
How to implement communication to process in "appmods" erl source file?
update:
let's we have simple process
start() -> loop(0).
loop(C) ->
receive
{inc} -> loop(C + 1);
{get, FromPid} -> FromPid ! C, loop(C)
end.
What is the simplest (trivial: without gen_server, yapp) way to access process from web?
Maybe, I need a minimal example with gen_server+yapp+yaws / appmods+yaws.
The #arg structure is a very important datastructure for the yaws programmer.
In the ARG of Yaws out/1 there is a variable that can save user state.
"state, %% State for use by users of the out/1 callback"
You can get detail info here .
There only 2 ways to access a process in Erlang: Either you know its Pid (and the node where you expect the process to be) or You know its registered Name (and the erlang node its expected to be).
Lets say you have your appmod:
-module(myappmod).
-export([out/1]).
-include("PATH/TO/YAWS_SERVER/include/yaws_api.hrl").
-include("PATH/TO/YAWS_SERVER/include/yaws.hrl").
out(Arg) ->
case check_initial_state(Arg) of
unknown -> create_initial_state();
{ok,Value}->
UserPid = list_to_pid(Value),
UserPid ! extract_request(Arg),
receive
Response -> {html,format_response(Response)}
after ?TIMEOUT -> {html,"request_timedout"}
end
end.
check_initial_state(A)->
CookieObject = (A#arg.headers)#headers.cookie,
case yaws_api:find_cookie_val("InitialState", CookieObject) of
[] -> unkown;
Cookie -> {ok,Cookie}
end.
extract_request(Arg)-> %% get request from POST Data or Get Data
Post__data_proplist = yaws_api:parse_post(Arg),
Get_data_proplist = yaws_api:parse_query(Arg),
%% do many other things....
Request = remove_request(Post__data_proplist,Get_data_proplist),
Request.
That simple set up shows you how you would use processes to keep things about a user. However, the use of processes is not good. Processes do fail, so you need a way of recovering what data they were holding.
A better approach is to have a Data storage about your users and have one gen_server to do the look ups. You could use Mnesia. I do not advise you to use processes on the web to keep user state, no matter what kind of app you are doing, even if its a messaging app. Mnesia or ETS tables can keep state and all you need to do is look up.
Use a better storage mechanism to keep state other than processes. Processes are a point of failure. Others use Cookies (and/or Session cookies), whose value is used in some way to look up something from a database. However, if you insist that you need processes, then, have a way of remembering their Pids or registered names. You could store a user Pid into their session cookie e.t.c.

What are the performance characteristics of sending big messages in Distributed Erlang?

Suppose I create new local process in an Erlang application, and i want to send to it a big message.
-module(chain_hello).
start(N, Some_big_data)->
Pid1 = spawn(chain_hello, some_fun, [N]),
Pid1 ! Some_big_data,
io:format("done \n").
despite Some_big_data is a reference to really big data (eg. content of a file) - is it copied when sending? Are there big penalties for performance?
Normally I would use some thread safe shared object (and/or mutex). Is there any solution in Erlang to avoid copying message content?
ADDED:
Interesting case is when Some_big_data is structured content - to be specific: map, on which I can perform some operations.
ADDED2
Ok, I see there is no such solution for Erlang (sharing some structured data like map in worker process) - because of Erlang design. But I think it is justified by solid work, and easily concurrence management.
From the Erlang Efficiency Guide:
All data in messages between Erlang
processes is copied, with the
exception of refc binaries on the same
Erlang node.
Yes, you should avoid sending big terms between processes. If you need to send a lot of data, send it as a binary.
As a suggestion, you can send only the pid of the current process (self()) to the process that you want to handle that some_big_data. That way, when you want to use that some_big_data, you can reference it back to it from the 2nd process.
For instance:
-module(proc1).
send_big_data() ->
BigData = get_some_big_data(),
Pid = spawn(fun proc2:start/0),
Pid ! {process_big_data, self()},
loop(Pid, BigData).
loop(Pid,BigData) ->
receive
get_first_element ->
[H|T] = BigData,
Pid ! {response, H};
_ ->
Pid ! {nothing}
end,
loop(Pid).
(Sorry for the eventual syntax mistakes).

Resources