I'm currently writing a piece of software in erlang, which is now based on gen_server behaviour. This gen_server should export a function (let's call it update/1) which should connect using ssl to another service online and send to it the value passed as argument to the function.
Currently update/1 is like this:
update(Value) ->
gen_server:call(?SERVER, {update, Value}).
So once it is called, there is a call to ?SERVER which is handled as:
handle_call({update, Value}, _From, State) ->
{ok, Socket} = ssl:connect("remoteserver.com", 5555, [], 3000).
Reply = ssl:send(Socket, Value).
{ok, Reply, State}.
Once the packet is sent to the remote server, the peer should severe the connection.
Now, this works fine with my tests in shell, but what happens if we have to call 1000 times mymod:update(Value) and ssl:connect/4 is not working well (i.e. is reaching its timeout)?
At this point, my gen_server will have a very large amount of values and they can be processed only one by one, leading to the point that the 1000th update will be done only 1000*3000 milliseconds after its value was updated using update/1.
Using a cast instead of a call would leave to the same problem. How can I solve this problem? Should I use a normal function and not a gen_server call?
From personal experience I can say that 1000 messages per gen_server process wont be a problem unless you are queuing big messages.
If from your testing it seems that your gen_server is not able to handle this much load, then you must create multiple instances of your gen_server preferably under a supervisor process at the boot time (or run-time) of your application.
Besides that, I really don't understand the requirement of making a new connection for each update!! you should consider some optimization like cached connections/ pre-connections to the server..no?
Related
In Erlang, can I call some function f (BIF or not), whose job is to spawn a process, run the function argf I provided, and doesn't "return" until argf has "returned", and do this without using receive clause (the reason for this is that f will be invoked in a gen_server, I don't want pollute the gen_server's mailbox).
A snippet would look like this:
%% some code omitted ...
F = fun() -> blah, blah, timer:sleep(10000) end,
f(F), %% like `spawn(F), but doesn't return until 10 seconds has passed`
%% ...
The only way to communicate between processes is message passing (of course you can consider to poll for a specific key in an ets or a file but I dont like this).
If you use a spawn_monitor function in f/1 to start the F process and then have a receive block only matching the possible system messages from this monitor:
f(F) ->
{_Pid, MonitorRef} = spawn_monitor(F),
receive
{_Tag, MonitorRef, _Type, _Object, _Info} -> ok
end.
you will not mess your gen_server mailbox. The example is the minimum code, you can add a timeout (fixed or parameter), execute some code on normal or error completion...
You will not "pollute" the gen_servers mailbox if you spawn+wait for message before you return from the call or cast. A more serious problem with this maybe that you will block the gen_server while you are waiting for the other process to terminate. A way around this is to not explicitly wait but return from the call/cast and then when the completion message arrives handle it in handle_info/2 and then do what is necessary.
If the spawning is done in a handle_call and you want to return the "result" of that process then you can delay returning the value to the original call from the handle_info handling the process termination message.
Note that however you do it a gen_server:call has a timeout value, either implicit or explicit, and if no reply is returned it generates an error in the calling process.
Main way to communicate with process in Erlang VM space is message passing with erlang:send/2 or erlang:send/3 functions (alias !). But you can "hack" Erlang and use multiple way for communicating over process.
You can use erlang:link/1 to communicate stat of the process, its mainly used in case of your process is dying or is ended or something is wrong (exception or throw).
You can use erlang:monitor/2, this is similar to erlang:link/1 except the message go directly into process mailbox.
You can also hack Erlang, and use some internal way (shared ETS/DETS/Mnesia tables) or use external methods (database or other things like that). This is clearly not recommended and "destroy" Erlang philosophy... But you can do it.
Its seems your problem can be solved with supervisor behavior. supervisor support many strategies to control supervised process:
one_for_one: If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.
one_for_all: If one child process terminates and is to be restarted, all other child processes are terminated and then all child processes are restarted.
rest_for_one: If one child process terminates and is to be restarted, the 'rest' of the child processes (that is, the child processes after the terminated child process in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.
simple_one_for_one: A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.
You can also modify or create your own supervisor strategy from scratch or base on supervisor_bridge.
So, to summarize, you need a process who wait for one or more terminating process. This behavior is supported natively with OTP, but you can also create your own model. For doing that, you need to share status of every started process, using cache or database, or when your process is spawned. Something like that:
Fun = fun
MyFun (ParentProcess, {result, Data})
when is_pid(ParentProcess) ->
ParentProcess ! {self(), Data};
MyFun (ParentProcess, MyData)
when is_pid(ParentProcess) ->
% do something
MyFun(ParentProcess, MyData2) end.
spawn(fun() -> Fun(self(), InitData) end).
EDIT: forgot to add an example without send/receive. I use an ETS table to store every result from lambda function. This ETS table is set when we spawn this process. To get result, we can select data from this table. Note, the key of the row is the process id of the process.
spawner(Ets, Fun, Args)
when is_integer(Ets),
is_function(Fun) ->
spawn(fun() -> Fun(Ets, Args) end).
Fun = fun
F(Ets, {result, Data}) ->
ets:insert(Ets, {self(), Data});
F(Ets, Data) ->
% do something here
Data2 = Data,
F(Ets, Data2) end.
Here is a simple UDP server:
-module(kvstore_udpserver).
-author("mylesmcdonnell").
%% API
-export([start/0]).
start() ->
spawn(fun() -> server(2346) end).
server(Port) ->
{ok, Socket} = gen_udp:open(Port, [binary]),
loop(Socket).
loop(Socket) ->
receive
{udp, Socket, Host, Port, Bin} ->
case binary_to_term(Bin) of
{store, Value} ->
io:format("kvstore_udpserver:{store, Value}~n"),
gen_udp:send(Socket,Host,Port,term_to_binary(kvstore:store(Value)));
{retrieve, Key} ->
io:format("kvstore_udpserver:{retrieve, Value}~n"),
gen_udp:send(Socket,Host,Port,term_to_binary(kvstore:retrieve(Key)))
end,
loop(Socket)
end.
How can I restructure this so that
a) It, or at least the relevant part of it, is a gen_server so that I can add to the supervision tree
b) increase concurrency by handling each message in a separate process.
I have reimplemented the sockserv example from Learn You Some Erlang for my TCP server but I'm struggling to determine a similar model for UDP.
For a):
You need to delcare the gen_server behaviour and implement all the callback functions (this is obvious, but it's worth calling out explicitly). If you have rebar installed, you can use the command rebar create template=simplesrv srvid=your_server_name to add the boilerplate functions.
You'd probably want to move the server starting business logic (the gen_udp:open/2 call) to your server's init/1 function. (The init is required by the gen_server behaviour. You can also start your loop/1 function there.
You'd probably want to make sure the udp server is closed by the module's terminate/2 function.
Move the business logic for handling requests that come in from parsing the messages to your loop/1 function into handle_call/3 or handle_cast/2 in your module (see below).
For b):
You have a few options, but basically, whenever you receive a message, you can use gen_server:cast/2 (if you don't care about the response) or gen_server:call/2,3 if you do. The casts or calls will be handled by the handle_cast/2 or handle_call/3 functions in your module.
Casts are inherently non-blocking and the answers to this question have a good design pattern for handling call operations asynchronously in gen_servers. You can crib from that.
I want to pass some arguments to supervisor:init/1 function and it is desirable, that the application's interface was so:
redis_pool:start() % start all instances
redis_pool:start(Names) % start only given instances
Here is the application:
-module(redis_pool).
-behaviour(application).
...
start() -> % start without params
application:ensure_started(?APP_NAME, transient).
start(Names) -> % start with some params
% I want to pass Names to supervisor init function
% in order to do that I have to bypass application:ensure_started
% which is not GOOD :(
application:load(?APP_NAME),
case start(normal, [Names]) of
{ok, _Pid} -> ok;
{error, {already_started, _Pid}} -> ok
end.
start(_StartType, StartArgs) ->
redis_pool_sup:start_link(StartArgs).
Here is the supervisor:
init([]) ->
{ok, Config} = get_config(),
Names = proplists:get_keys(Config),
init([Names]);
init([Names]) ->
{ok, Config} = get_config(),
PoolSpecs = lists:map(fun(Name) ->
PoolName = pool_utils:name_for(Name),
{[Host, Port, Db], PoolSize} = proplists:get_value(Name, Config),
PoolArgs = [{name, {local, PoolName}},
{worker_module, eredis},
{size, PoolSize},
{max_overflow, 0}],
poolboy:child_spec(PoolName, PoolArgs, [Host, Port, Db])
end, Names),
{ok, {{one_for_one, 10000, 1}, PoolSpecs}}.
As you can see, current implementation is ugly and may be buggy. The question is how I can pass some arguments and start application and supervisor (with params who were given to start/1) ?
One option is to start application and run redis pools in two separate phases.
redis_pool:start(),
redis_pool:run([] | Names).
But what if I want to run supervisor children (redis pool) when my app starts?
Thank you.
The application callback Module:start/2 is not an API to call in order to start the application. It is called when the application is started by application:start/1,2. This means that overloading it to provide differing parameters is probably the wrong thing to do.
In particular, application:start will be called directly if someone adds your application as a dependency of theirs (in the foo.app file). At this point, they have no control over the parameters, since they come from your .app file, in the {mod, {Mod, Args}} term.
Some possible solutions:
Application Configuration File
Require that the parameters be in the application configuration file; you can retrieve them with application:get_env/2,3.
Don't start a supervisor
This means one of two things: becoming a library application (removing the {mod, Mod} term from your .app file) -- you don't need an application behaviour; or starting a dummy supervisor that does nothing.
Then, when someone wants to use your library, they can call an API to create the pool supervisor, and graft it into their supervision tree. This is what poolboy does with poolboy:child_spec.
Or, your application-level supervisor can be a normal supervisor, with no children by default, and you can provide an API to start children of that, via supervisor:start_child. This is (more or less) what cowboy does.
You can pass arguments in the AppDescr argument to application:load/1 (though its a mighty big tuple already...) as {mod, {Module, StartArgs}} according to the docs ("according to the docs" as in, I don't recall doing it this way myself, ever: http://www.erlang.org/doc/apps/kernel/application.html#load-1).
application:load({application, some_app, {mod, {Module, [Stuff]}}})
Without knowing anything about the internals of the application you're starting, its hard to say which way is best, but a common way to do this is to start up the application and then send it a message containing the data you want it to know.
You could make receipt of the message form tell the application to go through a configuration assertion procedure, so that the same message you send on startup is also the same sort of thing you would send it to reconfigure it on the fly. I find this more useful than one-shotting arguments on startup.
In any case, it is usually better to think in terms of starting something, then asking it to do something for you, than to try telling it everything in init parameters. This can be as simple as having it start up and wait for some message that will tell the listener to then spin up the supervisor the way you're trying to here -- isolated one step from the application inclusion issues RL mentioned in his answer.
To exchange data,it becomes important to link the process first.The following code does the job of linking two processes.
start_link(Name) ->
gen_fsm:start_link(?MODULE, [Name], []).
My Question : which are the two processes being linked here?
In your example, the process that called start_link/1 and the process being started as (?MODULE, Name, Args).
It is a mistake to think that two processes need to be linked to exchange data. Data links the fate of the two processes. If one dies, the other dies, unless a system process is the one that starts the link (a "system process" means one that is trapping exits). This probably isn't what you want. If you are trying to avoid a deadlock or do something other than just timeout during synchronous messaging if the process you are sending a message to dies before responding, consider something like this:
ask(Proc, Request, Data, Timeout) ->
Ref = monitor(process, Proc),
Proc ! {self(), Ref, {ask, Request, Data}},
receive
{Ref, Res} ->
demonitor(Ref, [flush]),
Res;
{'DOWN', Ref, process, Proc, Reason} ->
some_cleanup_action(),
{fail, Reason}
after
Timeout ->
{fail, timeout}
end.
If you are just trying to spawn a worker that needs to give you an answer, you might want to consider using spawn_monitor instead and using its {pid(), reference()} return as the message you're listening for in response.
As I mentioned above, the process starting the link won't die if it is trapping exits, but you really want to avoid trapping exits in most cases. As a basic rule, use process_flag(trap_exit, true) as little as possible. Getting trap_exit happy everywhere will have structural effects you won't intend eventually, and its one of the few things in Erlang that is difficult to refactor away from later.
The link is bidirectional, between the process which is calling the function start_link(Name) and the new process created by gen_fsm:start_link(?MODULE, [Name], []).
A called function is executed in the context of the calling process.
A new process is created by a spawn function. You should find it in the gen_fsm:start_link/3 code.
When a link is created, if one process exit for an other reason than normal, the linked process will die also, except if it has set process_flag(trap_exit, true) in which case it will receive the message {'EXIT',FromPid,Reason} where FromPid is the Pid of the process that came to die, and Reason the reason of termination.
I am using yaws (Erlang framework) for socket communication. I can send message back to the user from server using websocket_send however i need to specify the PID of the user, that means that i can send message back to that user. However, i would like to send message to all connected users. Is there any way to do it?
Every time a websocket connection is established a new gen_server process is created for that connection. Hence each of these servers corresponds to one websocket connection. Thus websocket_send requires the PID of the gen_server.
For sending message to all the connected clients you need to maintain the PIDs of all the gen_servers. This can be done by having your own gen_server or using ets.
Similar to sending the Pid to gen_server
you can send the Pid in websocket callback init function
init(Args) ->
gen_server:cast(?YOURSERVER,{connection_open, self()}),
{ok, []}.
During termination
terminate(Reason, State) ->
gen_server:cast(?YOURSERVER,{connection_close, self()}).
Your gen_server handle_cast may look like this
handle_cast({connection_open, Pid}, Pids) ->
{noreply, [Pid | Pids]};
handle_cast({connection_close, Pid}, Pids) ->
{noreply, lists:delete(Pid, Pids)};
handle_cast({send_to_all, Msg}, Pids) ->
[yaws_api:websocket_send(Pid, Msg) || Pid <- Pids, is_process_alive(Pid)],
{noreply, Pids}.
Got it worked !!! Using GProc :)
Gproc is a process dictionary for Erlang, which provides a number of useful features beyond what the built-in dictionary has:
Use any term as a process alias
Register a process under several aliases
Non-unique properties can be registered simultaneously by many processes
QLC and match specification interface for efficient queries on the dictionary
Await registration, let's you wait until a process registers itself
Atomically give away registered names and properties to another process
Counters, and aggregated counters, which automatically maintain the total of all counters with a given name
Global registry, with all the above functions applied to a network of nodes
That will need a comprehensive approach which involves in-memory storage. Forexample, Each user may have a process holding the socket connection and so, you save say, in mnesia, or ets table e.t.c. a record like: #connected_user{pid = Pid,username = Username,other_params = []}. Later after advancing your perception of this problem, you will move onto session management, how to handle offline messages, and most importantly presence. Anyways, when a message comes in, having the destination username, then you will make a lookup from our table and get the corresponding Pid, and then send it this message, which in turn, it will then send it through its live Web Socket.