Erlang: Functions work in shell but not in YAWS - erlang

My sole method of debugging (io:format/2) is not working in YAWS. I'm at a loss. My supervisor starts three processes: ETS Manager, YAWS Init, and Ratelimiter. This is successful. I can play around with the rate limiter in the shell... calling the same functions YAWS should be. The difference being the shell behaves as I would expect and I have no idea what is happening in YAWS.
I do know if I spam the command in shell: ratelimiter:limit(IP) it will return true eventually. I can execute the following and it will also return true: ratelimiter:lockout(IP), ratelimiter:blacklist(IP). The limiter is a gen_server.
The functions do the following:
limit/1: Check ETS table if counter > threshold; update counter. If counter > blacklist threshold make entry in mnesia table
blacklist/1: Check mnesia table if entry exists; Yes: reset timer
lockout/1: Immediately enters ID into mnesia table
In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would be a nice place to also do the rate limiting. Do it as soon as possible in the web server's chain of events.
All the changes I've made to the arg_rewrite module seem to work except using "printf"s and the limiter. I'm new to the language so I'm not sure my mistake is obvious or not.
Skeleton of my arg_rewrite_mod:
-module(arg_preproc).
-export([arg_rewrite/1]).
-include("limiter_def.hrl").
-include_lib("/usr/lib/yaws/include/yaws_api.hrl").
is_blacklisted(ID) ->
case ratelimiter:blacklist(ID) of
false -> continue;
true -> throw(blacklist)
end.
is_limited(ID) ->
case ratelimiter:limit(ID) of
false -> continue;
true -> throw(limit)
end.
arg_rewrite(A) ->
Allow = ['GET','POST', 'HEAD'],
try
{IP, _} = A#arg.client_ip_port,
ID = IP,
is_blacklisted(ID),
io:format("~p ~p ~n",[ID, is_blacklisted(ID)]),
%% === Allow expected HTTP requests
HttpReq = (A#arg.req)#http_request.method,
case lists:member(HttpReq, Allow) of
true ->
{_,ReqTgt} = (A#arg.req)#http_request.path,
PassThru = [".css",".jpg",".jpeg",".png",".js"],
%% ... much more ...
false ->
is_limited(ID),
throw(http_method_denied)
end
catch
throw:blacklist -> %% Send back a 429;
throw:limit -> %% Same but no Retry-After;
throw:http_method_denied ->
%%Only thrown experienced
AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
A#arg{state=#rewrite_response{status=405,
headers=[{header, {"Allow", AllowedReq}},{header, {connection, "close"}}]
}};
Type:Reason -> {error, {unhandled,{Type, Reason}}}
end.
I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get a HTTP 405 as intended. I can ratelimiter:lockout({MY_IP}) and get the web page to load in my browser and a HTTP 200 with curl.
I'm confused. Is it the way I started YAWS?
start() ->
os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
code:add_patha(?MODPATH_YAWS),
ok = case (R = application:start(yaws)) of
{error, {already_started, _}} -> ok;
_ -> R
end,
{ok,self()}. %% Tell supervisor everything okay in a manner it expects.
I did this because I thought it would be "easier."

When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:
application:set_env(yaws, embedded, true),
Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions:
check/1: yaws_shaper calls this to allow the callback module to decide whether to allow the request from the client. It passes client host information as the callback argument. Your shaper callback module returns either the atom allow to allow the request to proceed, or the tuple {deny, Status, Message} where Status is an HTTP status code to return to the client, such as 429 to indicate the client is making too many requests, and Message is any extra HTML to be returned to the client. (It might be nice if Message could include a reply header such as Retry-After as well; this is something I'll consider adding to Yaws.)
update/3: yaws_shaper calls this when the response for a client is ready to be returned. The first argument is the client host information, the second argument is the number of "hits" (the value 1 for each request), and the third argument is the number of bytes being delivered in response to the client's request. Your shaper callback module can return ok from update/3 (Yaws does not use the return value).
A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
And finally, while "printf debugging" works, it's less than ideal especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.

Related

Pass some arguments to supervisor init function when app starts

I want to pass some arguments to supervisor:init/1 function and it is desirable, that the application's interface was so:
redis_pool:start() % start all instances
redis_pool:start(Names) % start only given instances
Here is the application:
-module(redis_pool).
-behaviour(application).
...
start() -> % start without params
application:ensure_started(?APP_NAME, transient).
start(Names) -> % start with some params
% I want to pass Names to supervisor init function
% in order to do that I have to bypass application:ensure_started
% which is not GOOD :(
application:load(?APP_NAME),
case start(normal, [Names]) of
{ok, _Pid} -> ok;
{error, {already_started, _Pid}} -> ok
end.
start(_StartType, StartArgs) ->
redis_pool_sup:start_link(StartArgs).
Here is the supervisor:
init([]) ->
{ok, Config} = get_config(),
Names = proplists:get_keys(Config),
init([Names]);
init([Names]) ->
{ok, Config} = get_config(),
PoolSpecs = lists:map(fun(Name) ->
PoolName = pool_utils:name_for(Name),
{[Host, Port, Db], PoolSize} = proplists:get_value(Name, Config),
PoolArgs = [{name, {local, PoolName}},
{worker_module, eredis},
{size, PoolSize},
{max_overflow, 0}],
poolboy:child_spec(PoolName, PoolArgs, [Host, Port, Db])
end, Names),
{ok, {{one_for_one, 10000, 1}, PoolSpecs}}.
As you can see, current implementation is ugly and may be buggy. The question is how I can pass some arguments and start application and supervisor (with params who were given to start/1) ?
One option is to start application and run redis pools in two separate phases.
redis_pool:start(),
redis_pool:run([] | Names).
But what if I want to run supervisor children (redis pool) when my app starts?
Thank you.
The application callback Module:start/2 is not an API to call in order to start the application. It is called when the application is started by application:start/1,2. This means that overloading it to provide differing parameters is probably the wrong thing to do.
In particular, application:start will be called directly if someone adds your application as a dependency of theirs (in the foo.app file). At this point, they have no control over the parameters, since they come from your .app file, in the {mod, {Mod, Args}} term.
Some possible solutions:
Application Configuration File
Require that the parameters be in the application configuration file; you can retrieve them with application:get_env/2,3.
Don't start a supervisor
This means one of two things: becoming a library application (removing the {mod, Mod} term from your .app file) -- you don't need an application behaviour; or starting a dummy supervisor that does nothing.
Then, when someone wants to use your library, they can call an API to create the pool supervisor, and graft it into their supervision tree. This is what poolboy does with poolboy:child_spec.
Or, your application-level supervisor can be a normal supervisor, with no children by default, and you can provide an API to start children of that, via supervisor:start_child. This is (more or less) what cowboy does.
You can pass arguments in the AppDescr argument to application:load/1 (though its a mighty big tuple already...) as {mod, {Module, StartArgs}} according to the docs ("according to the docs" as in, I don't recall doing it this way myself, ever: http://www.erlang.org/doc/apps/kernel/application.html#load-1).
application:load({application, some_app, {mod, {Module, [Stuff]}}})
Without knowing anything about the internals of the application you're starting, its hard to say which way is best, but a common way to do this is to start up the application and then send it a message containing the data you want it to know.
You could make receipt of the message form tell the application to go through a configuration assertion procedure, so that the same message you send on startup is also the same sort of thing you would send it to reconfigure it on the fly. I find this more useful than one-shotting arguments on startup.
In any case, it is usually better to think in terms of starting something, then asking it to do something for you, than to try telling it everything in init parameters. This can be as simple as having it start up and wait for some message that will tell the listener to then spin up the supervisor the way you're trying to here -- isolated one step from the application inclusion issues RL mentioned in his answer.

Letting 'every process run until it is blocked' on an Erlang node

(Also available on Erlang's mailing list.)
Is it possible to write a function that waits for every process running on an Erlang node to reach a point where it is blocked, waiting for a message?
The function should return only when every process is waiting for a message that has not yet been sent to it. Assume that no process is in a time-related suspension (receive with an after clause, timer-related operations etc). The process running this function is, of course, excluded.
Obviously wrong answer:
erlang:yield/0: This gives a chance to every other process to run, but not necessarily until it is blocked.
Not 100% correct approach:
only_one_not_waiting() ->
Running =
[P || P <- processes(), process_info(P, status) =/= {status, waiting}],
length(Running) == 1
end.
everyone_blocked() ->
case only_one_not_waiting() of
true -> ok;
false -> everyone_blocked()
end.
Ignoring timers, running only_one_not_waiting/0 repeatedly until it returns true (as everyone_blocked/0 does) should indicate the desired system state, if this state is eventually reached.
However I am not sure how much trust one should put on the return value of process_info(P, status).

Can my gen_server become a bottleneck?

I'm currently writing a piece of software in erlang, which is now based on gen_server behaviour. This gen_server should export a function (let's call it update/1) which should connect using ssl to another service online and send to it the value passed as argument to the function.
Currently update/1 is like this:
update(Value) ->
gen_server:call(?SERVER, {update, Value}).
So once it is called, there is a call to ?SERVER which is handled as:
handle_call({update, Value}, _From, State) ->
{ok, Socket} = ssl:connect("remoteserver.com", 5555, [], 3000).
Reply = ssl:send(Socket, Value).
{ok, Reply, State}.
Once the packet is sent to the remote server, the peer should severe the connection.
Now, this works fine with my tests in shell, but what happens if we have to call 1000 times mymod:update(Value) and ssl:connect/4 is not working well (i.e. is reaching its timeout)?
At this point, my gen_server will have a very large amount of values and they can be processed only one by one, leading to the point that the 1000th update will be done only 1000*3000 milliseconds after its value was updated using update/1.
Using a cast instead of a call would leave to the same problem. How can I solve this problem? Should I use a normal function and not a gen_server call?
From personal experience I can say that 1000 messages per gen_server process wont be a problem unless you are queuing big messages.
If from your testing it seems that your gen_server is not able to handle this much load, then you must create multiple instances of your gen_server preferably under a supervisor process at the boot time (or run-time) of your application.
Besides that, I really don't understand the requirement of making a new connection for each update!! you should consider some optimization like cached connections/ pre-connections to the server..no?

How to maintain stateful in yaws

I have some process (spawned) with state.
How to maintain simple stateful service in yaws?
How to implement communication to process in "appmods" erl source file?
update:
let's we have simple process
start() -> loop(0).
loop(C) ->
receive
{inc} -> loop(C + 1);
{get, FromPid} -> FromPid ! C, loop(C)
end.
What is the simplest (trivial: without gen_server, yapp) way to access process from web?
Maybe, I need a minimal example with gen_server+yapp+yaws / appmods+yaws.
The #arg structure is a very important datastructure for the yaws programmer.
In the ARG of Yaws out/1 there is a variable that can save user state.
"state, %% State for use by users of the out/1 callback"
You can get detail info here .
There only 2 ways to access a process in Erlang: Either you know its Pid (and the node where you expect the process to be) or You know its registered Name (and the erlang node its expected to be).
Lets say you have your appmod:
-module(myappmod).
-export([out/1]).
-include("PATH/TO/YAWS_SERVER/include/yaws_api.hrl").
-include("PATH/TO/YAWS_SERVER/include/yaws.hrl").
out(Arg) ->
case check_initial_state(Arg) of
unknown -> create_initial_state();
{ok,Value}->
UserPid = list_to_pid(Value),
UserPid ! extract_request(Arg),
receive
Response -> {html,format_response(Response)}
after ?TIMEOUT -> {html,"request_timedout"}
end
end.
check_initial_state(A)->
CookieObject = (A#arg.headers)#headers.cookie,
case yaws_api:find_cookie_val("InitialState", CookieObject) of
[] -> unkown;
Cookie -> {ok,Cookie}
end.
extract_request(Arg)-> %% get request from POST Data or Get Data
Post__data_proplist = yaws_api:parse_post(Arg),
Get_data_proplist = yaws_api:parse_query(Arg),
%% do many other things....
Request = remove_request(Post__data_proplist,Get_data_proplist),
Request.
That simple set up shows you how you would use processes to keep things about a user. However, the use of processes is not good. Processes do fail, so you need a way of recovering what data they were holding.
A better approach is to have a Data storage about your users and have one gen_server to do the look ups. You could use Mnesia. I do not advise you to use processes on the web to keep user state, no matter what kind of app you are doing, even if its a messaging app. Mnesia or ETS tables can keep state and all you need to do is look up.
Use a better storage mechanism to keep state other than processes. Processes are a point of failure. Others use Cookies (and/or Session cookies), whose value is used in some way to look up something from a database. However, if you insist that you need processes, then, have a way of remembering their Pids or registered names. You could store a user Pid into their session cookie e.t.c.

Basic Erlang Question

I have been trying to learn Erlang and came across some code written by Joe Armstrong:
start() ->
F = fun interact/2,
spawn(fun() -> start(F, 0) end).
interact(Browser, State) ->
receive
{browser, Browser, Str} ->
Str1 = lists:reverse(Str),
Browser ! {send, "out ! " ++ Str1},
interact(Browser, State);
after 100 ->
Browser ! {send, "clock ! tick " ++ integer_to_list(State)},
interact(Browser, State+1)
end.
It is from a blog post about using websockets with Erlang: http://armstrongonsoftware.blogspot.com/2009/12/comet-is-dead-long-live-websockets.html
Could someone please explain to me why in the start function, he spawns the anonymous function start(F, 0), when start is a function that takes zero arguments. I am confused about what he is trying to do here.
Further down in this blog post (Listings) you can see that there is another function (start/2) that takes two arguments:
start(F, State0) ->
{ok, Listen} = gen_tcp:listen(1234, [{packet,0},
{reuseaddr,true},
{active, true}]),
par_connect(Listen, F, State0).
The code sample you quoted was only an excerpt where this function was omitted for simplicity.
The reason for spawning a fun in this way is to avoid having to export a function which is only intended for internal use. One problem with is that all exported functions are available to all users even if they only meant for internal use. One example of this is a call-back module for gen_server which typically contains both the exported API for clients and the call-back functions for the gen_server behaviour. The call-back functions are only intended to be called by the gen_server behaviour and not by others but they are visible in the export list and not in anyway blocked.
Spawning a fun decreases the number of exported internal functions.
In Erlang, functions are identified by their name and their arity (the number of parameters they take). You can have more than one function with the same name, as long as they all have different numbers of parameters. The two functions you've posted above are start/0 and interact/2. start/0 doesn't call itself; instead it calls start/2, and if you take a look further down the page you linked to, you'll find the definition of start/2.
The point of using spawn in this way is to start a server process in the background and return control to the caller. To play with this code, I guess that you'd start up the Erlang interpreter, load the script and then call the start/0 function. This method would then start a process in the background and return so that you could continue to type into the Erlang interpreter.

Resources