Reference vs pid? - erlang

I'm not entirely sure of the differences between a PID and a Reference, or when to use which.
If I were to spawn a new process with spawn/1, I get a pid. I can kill it with the PID, no? Why would I need a reference?
Likewise, I see monitor/1 receiving a message with a ref and a pid.
Thanks!

A Pid is a process identifier. You get one when you create a new process with spawn, or you can get your own Pid with self(). It allows you to interact with the given process: in particular, to send messages to it with Pid ! Message, and to do some other things, like killing it explicitly (which you should not do) or obtaining process information with erlang:process_info.
You can also create relations between processes with erlang:link(Pid) and erlang:monitor(process, Pid) (that is, between the Pid process and the process executing this function). In short, this gives you "notifications" when the other process dies.
A Reference is just an almost-unique value (of a different type). One might say that it gives you a reference to "here and now", which you can recognize later. For example, if we are sending a message to another process and we expect a response, we would like to make sure that the message we receive is associated with our request, and not just any message from someone else. The easiest way to do this is to tag the message with a unique value, and wait until a response arrives with exactly the same tag:
Tag = make_ref(),
Pid ! {Tag, Message},
receive
    {Tag, Response} ->
        ...
In this code, with the use of pattern matching, we make sure that (we wait in the receive until) the Response is exactly for the Message we sent, no matter what other messages arrive from other processes. This is the most common use of references you will encounter.
And now back to monitor. When calling Ref = monitor(process, Pid), we make this special connection with the Pid process. The Ref that is returned is just some unique reference that we can use to demonitor this process. That is all.
One might ask: if we are able to create a monitor with a Pid, why do we need the Ref for demonitoring? Couldn't we just use the Pid again? In theory we could, but monitors are implemented in such a way that multiple monitors can be established between the same two processes. So when demonitoring, we have to remove only one of those connections. It is done this way to make monitoring more transparent: if you have a library function that creates and removes one monitor, you do not want to interfere with other libraries and functions and the monitors they might be using.
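To make that concrete, here is a minimal sketch (assuming Pid is already bound to some live process): two monitors on the same process are independent, and demonitoring with one reference leaves the other untouched.
Ref1 = erlang:monitor(process, Pid),
Ref2 = erlang:monitor(process, Pid),
%% Remove only the first monitor; Ref2 still delivers a 'DOWN' message later.
true = erlang:demonitor(Ref1, [flush]).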

According to this page:
References are erlang objects with exactly two properties:
They can be created by a program (using make_ref/0), and,
They can be compared for equality.
You should use one whenever you need to bind a unique identifier to some "object". At any time you can generate a new one using erlang:make_ref/0. The documentation says:
make_ref() -> reference()
Returns an almost unique reference.
The returned reference will re-occur after approximately 2^82 calls;
therefore it is unique enough for practical purposes.
When you call the erlang:monitor/2 function, it returns a reference so that you can cancel the monitor later (with the erlang:demonitor/1 function). This reference only identifies that particular call to erlang:monitor/2. If you need to operate on the process (kill it, for example), you still have to use the process pid.
Likewise, I see monitor/1 receiving a message with a ref and a pid.
Yep, a monitor sends messages like {'DOWN', Ref, process, Pid, Reason}. Which one to use (pid or ref) depends only on your application logic, but (IMO) in the most usual cases it does not matter which one you pick.
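For completeness, a minimal sketch of setting up a monitor and matching that 'DOWN' message; the short-lived worker spawned here is just an assumption for illustration.
Pid = spawn(fun() -> timer:sleep(100) end),
Ref = erlang:monitor(process, Pid),
receive
    {'DOWN', Ref, process, Pid, Reason} ->
        io:format("~p exited with reason ~p~n", [Pid, Reason])
end.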

Related

Erlang: how to deal with long running init callback?

I have a gen_server that, when started, attempts to start a certain number of child processes (usually 10-20) under a supervisor in the supervision tree. The gen_server's init callback invokes supervisor:start_child/2 for each child process needed. The call to supervisor:start_child/2 is synchronous, so it doesn't return until the child process has started. All the child processes are also gen_servers, so the start_link call doesn't return until the child's init callback returns. In that init callback a call is made to a third-party system, which may take a while to respond (I discovered this issue when calls to the third-party system were timing out after 60 seconds). In the meantime the init call has blocked, meaning the supervisor:start_child/2 call is also blocked, so the whole time the gen_server process that invoked supervisor:start_child/2 is unresponsive. Calls to that gen_server time out while it is waiting on the start_child function to return. Since this can easily last for 60 seconds or more, I would like to change this, as my application is suspended in a sort of half-started state while it is waiting.
What is the best way to resolve this issue?
The only solution I can think of is to move the code that interacts with the third-party system out of the init callback and into a handle_cast callback. This would make the init callback faster. The disadvantage is that I would need to call gen_server:cast/2 after all the child processes have been started.
Is there a better way of doing this?
One approach I've seen is the use of a timeout in init/1 together with handle_info/2:
init(Args) ->
    {ok, _State = {timeout_init, Args}, _Timeout = 0}.

...

handle_info(timeout, {timeout_init, Args}) ->
    %% do your initialization here
    {noreply, ActualServerState};   % this time there is no need for a timeout
handle_info( ....
Almost all callback results can be returned with an additional timeout parameter, which is basically the time to wait for another message. If that time passes, handle_info/2 is called with the timeout atom and the server's state. In our case, with a timeout equal to 0, the timeout should occur even before gen_server:start finishes, meaning that handle_info should be called even before we are able to return the pid of our server to anyone else. So this timeout_init should be the first thing handled by our server, which gives us some assurance that we finish initialization before handling anything else.
If you don't like this approach (it is not really readable), you might try sending a message to yourself in init/1:
init(Args) ->
    self() ! {finish_init, Args},
    {ok, no_state_yet}.

...

handle_info({finish_init, Args} = _Message, no_state_yet) ->
    %% finish whatever initialization is left
    {noreply, ActualServerState};
handle_info( ... % other clauses
Again, you are making sure that the message to finish initialization is sent to this server as soon as possible, which is very important in the case of gen_servers that register themselves under some atom.
EDIT: after some more careful study of the OTP source code.
Such an approach is good enough when you communicate with your server through its pid, mainly because the pid is returned only after your init/1 function returns. But it is a little bit different in the case of a gen_* started with start/4 or start_link/4, where the process is automatically registered under a name. There is one race condition you could encounter, which I would like to explain in a little more detail.
If the process is registered, one usually simplifies all calls and casts to the server, like:
count() ->
    gen_server:cast(?SERVER, count).
Here ?SERVER is usually the module name (an atom), and this will work just fine as long as some (alive) process is registered under that name. And of course, under the hood this cast is a standard Erlang message send with !. Nothing magical about it; it is almost the same as what you do in your init with self() ! {finish ....
But in our case we assume one more thing: not just the registration part, but also that our server has finished its initialization. Of course, since we are dealing with a mailbox, it is not really important how long something takes, but it is important which message we receive first. To be exact, we would like to receive the finish_init message before receiving any count message.
Unfortunately, such a scenario could happen. This is due to the fact that gen_* processes in OTP are registered before the init/1 callback is called. So in theory, while one process is inside the start function, which has got as far as the registration part, another process could already find our server and send it a count message, and only after that would init/1 run and send the finish_init message. The chances are small (very, very small), but it could still happen.
There are three solutions to this.
The first would be to do nothing. In the case of such a race condition the handle_cast would fail (with a function_clause error, since our state is still the no_state_yet atom), and the supervisor would just restart the whole thing.
The second would be to ignore this bad message/state incident. This is easily achieved with
... ;
handle_cast(_, State) ->
    {noreply, State}.
as your last clause. Unfortunately, most people using templates use such an (IMHO) unfortunate pattern.
In both of these cases you might lose one count message. If that is really a problem, you could still try to fix it by changing the last clause to
... ;
handle_cast(Message, no_state_yet) ->
    gen_server:cast(?SERVER, Message),
    {noreply, no_state_yet}.
but this has its own obvious drawbacks, and I would prefer the "let it fail" approach.
The third option is registering the process a little bit later. Rather than using start/4 and asking for automatic registration, use start/3, receive the pid, and register it yourself:
start(Args) ->
    {ok, Pid} = gen_server:start(?MODULE, Args, []),
    register(?SERVER, Pid),
    {ok, Pid}.
This way the finish_init message is sent before registration, and before anyone else could send a count message.
But such an approach has its own drawbacks, mainly the registration itself, which could fail in a few different ways. One could always check how OTP handles that and duplicate that code, but that is another story.
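Just to sketch the kind of handling that would be needed (a hedged example, not the actual OTP code): register/2 throws badarg if the name is already taken, and gen_server:stop/1 (available in reasonably recent OTP releases) can be used to get rid of the losing copy.
start(Args) ->
    {ok, Pid} = gen_server:start(?MODULE, Args, []),
    try register(?SERVER, Pid) of
        true ->
            {ok, Pid}
    catch
        error:badarg ->
            %% someone else grabbed the ?SERVER name first; stop our copy
            gen_server:stop(Pid),
            {error, {already_started, whereis(?SERVER)}}
    end.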
So in the end it all depends on what you need, or even on what problems you actually encounter in production. It is important to have some idea of what could go wrong, but I personally wouldn't try to fix any of this until I actually suffered from such a race condition.

How to check if queue with auto-generated name (amq.gen-*) exists?

In the case of non-generated names it's enough to call #'queue.declare' to get a newly created queue, or the existing one with the given name. However, when using auto-generated names (beginning with the amq.gen- prefix) it's not as trivial. First of all, amq. is a restricted prefix, so there is no way to call #'queue.declare'{queue = <<"amq.gen-xxx">>}.
I also tried to play with the passive=true option, and although I may pass a restricted name that way, I get an exit error when the queue does not exist. The following is the error report:
** Handler sse_handler terminating in init/3
for the reason exit:{{shutdown,
{server_initiated_close,404,
<<"NOT_FOUND - no queue 'amq.gen-wzPK0nIBPzr-dwtZ5Jy58V' in vhost '/'">>}},
{gen_server,call,
[<0.62.0>,
{call,
{'queue.declare',0,
<<"amq.gen-wzPK0nIBPzr-dwtZ5Jy58V">>,
true,false,false,false,false,[]},
none,<0.269.0>},
infinity]}}
Is there any way to solve this problem?
EDIT: Here is the short story behind this question. Disclaimer: I'm an Erlang newbie, so maybe there is a better way to make this work :)
I have a gen_server based application holding SSE (server-sent events) connections with web browsers. Each connection is bound to a RabbitMQ queue. An SSE connection, when broken, automatically tries to reconnect after a given timeout; this is something web browsers support out of the box. To reuse the previously created queue, I'm trying to check whether a queue with the given name (taken from a request cookie) already exists. It's all done in the init callback.
You can declare a queue with the amq. prefix if the queue already exists. You would get Declare-Ok if the queue exists, or access-refused if not. (My question is why would you, though? ;)
Furthermore, you can use the passive option to check if it already exists. According to the AMQP reference, the server treats it as a not-found error if the queue doesn't exist. In order to catch this in your Erlang client you could try something along the lines of this:
try
    %% declare the queue with passive = true
    queue_exists
catch
    exit:{{shutdown, {server_initiated_close, 404, _}}, _} ->
        queue_does_not_exist
end
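Filling in that placeholder with the RabbitMQ Erlang client is a hedged sketch; Channel and QueueName are assumed to come from your surrounding code, and the exit pattern mirrors the error report above.
-include_lib("amqp_client/include/amqp_client.hrl").

queue_exists(Channel, QueueName) ->
    try amqp_channel:call(Channel,
                          #'queue.declare'{queue = QueueName, passive = true}) of
        #'queue.declare_ok'{} -> true
    catch
        exit:{{shutdown, {server_initiated_close, 404, _}}, _} ->
            false
    end.
If I remember the AMQP semantics correctly, a failed passive declare closes the channel, so you would want to open a fresh channel before using it for anything else.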

erlang typechecking

From what I understand, there is no way of type-checking the messages sent in Erlang.
Let's say I start a module with the following receive loop:
loop(State) ->
    receive
        {insert, _} ->
            io:fwrite("insert\n", []),
            loop(State);
        {view, _} ->
            io:fwrite("view\n", []),
            loop(State)
    after 10000 ->
            ok
    end.
There is no way for me to check what people are sending to the process, and no way to check that it is type safe?
Are there any easy workarounds?
The one I have come up with is using wrapper functions in the module being called, like:
send_insert(Message) ->
    whereis(my_event_handler) ! {insert, Message},
    ok.
This way at least I can add the -spec send_insert(string()) -> ok. spec to the module. Now at least I have limited the error to my module.
Is there a more standard way of doing type checking on messages?
There is the sheriff project, which solves your problem. You can use it for checking values against their types as defined through typespecs.
I would say that having a function like send_insert in your module, that just sends a message to the process, is good practice not just for type checking. If you need to change the message format some time in the future, you'll know that you only need to change that function and possibly its callers, which is easier to track down than finding all places that send a message of a certain format to some process (which may or may not be the process whose code you're refactoring). Also, since any callers will need to specify the module name, the code becomes a little more self-documenting; you'll know what process that message is supposed to go to.
(BTW, whereis(my_event_handler) ! {insert, Message} can be written as my_event_handler ! {insert, Message}.)
Well, if what you need is just some basic type (and maybe range) checking, you can use guards:
receive
    {insert, Message} when is_list(Message) ->
        io:fwrite("insert\n", []),
        loop(State);
Unfortunately, because of some constraints (guards must be free of side effects, for example), there's no way to write your own guard functions.
AFAIK, -spec is used for documentation and static analysis (e.g. by Dialyzer); it will not check your types at runtime.
As you correctly say, there's no typechecking per se, but you can use a mix of pattern matching and guards to make things fail. Nevertheless, this is all defensive programming; you should just let it crash, and have a supervisor tree restart whatever needs to be restarted. The logs and crash reports should give you enough information to know what went wrong and act accordingly.
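As a rough sketch of that "let it crash" setup (module and child names here are assumptions, using the map-based child specs of newer OTP releases), the handler from the question would simply sit under a supervisor:
%% in my_event_sup.erl
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    ChildSpec = #{id => my_event_handler,
                  start => {my_event_handler, start_link, []},
                  restart => permanent},
    {ok, {SupFlags, [ChildSpec]}}.
A badly typed message then crashes the handler via a failing pattern match or guard, and the supervisor restarts it with a clean state.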

Why are error_logger messages in different order on the console compared to error_logger_mf file

I'm looking at error_logger messages on the console and storing them in a file with error_logger_mf at the same time.
The messages are in a totally different order when I compare the file and the console.
The time stamps all show the same value, so it's going pretty fast, and I do understand that messages can get out of order when they are sent from different processes.
But I always thought that once they reach the error_logger, they are kept in the same order when they are sent to the different event handlers.
What I see is that in the files (when I look at them with rb) the events come out in a saner order than on the console.
Clarification:
It is clear that the order in which messages from different processes arrive at the error_logger is not to be taken too seriously.
What I don't understand is the difference in order when I compare the disk log to the screen log.
Added an answer as community wiki with my partial findings below; please edit if you know additional points.
Update: this is still unresolved; feel free to add to this community wiki if you know something.
Did some digging in the source, but no solution to the riddle so far:
Looked into error_logger_tty_h.erl, which should be responsible for output to the console:
handle_event({_Type, GL, _Msg}, State) when node(GL) =/= node() ->
    {ok, State};
handle_event(Event, State) ->
    write_event(tag_event(Event)),
    {ok, State}.
So events that have a group leader on another node are ignored; everything not ignored is passed through write_event/1, which does some formatting and then outputs the result with:
format(String) -> io:format(user, String, []).
format(String, Args) -> io:format(user, String, Args).
In user.erl, where io:format sends its io_request, we have a single server loop calling a cascade of functions that ultimately send the text to the tty port.
At no point are messages sent from more than one process!
So I can't see any way for the messages to change order while travelling to the tty.
Where else can the order of reports change depending on if the messages are sent to tty or to mf?

Resolving a deadlock between two gen_tcp

While browsing the code of an Erlang application, I came across an interesting design problem. Let me describe the situation; I can't post any code because of a PIA, sorry.
The code is structured as an OTP application in which two gen_server modules are responsible for allocating some kind of resources. The application has run perfectly for some time and we haven't really had big issues.
The tricky part begins when the first gen_server needs to check whether the second has enough resources left. A call is issued to the second gen_server, which itself calls a utility library that (in a very, very special case) issues a call back to the first gen_server.
I'm relatively new to Erlang, but I think this situation is going to make the two gen_servers wait for each other.
This is probably a design problem, but I just wanted to know if there is any special mechanism built into OTP that can prevent this kind of hang.
Any help would be appreciated.
EDIT:
To summarize the answers: if you have a situation where two gen_servers call each other in a cyclic way, you'd better spend some more time on the application design.
Thanks for your help :)
This is called a deadlock and could/should be avoided at the design level. Below is a possible workaround and some subjective points that will hopefully help you avoid making this mistake.
While there are ways to work around your problem, "waiting" is exactly what the call is doing.
One possible workaround would be to spawn a process from inside A which calls B, but does not block A from handling the call from B. This process would reply directly to the caller.
In server A:
handle_call(do_spaghetti_call, From, State) ->
    spawn(fun() -> gen_server:reply(From, call_server_B(more_spaghetti)) end),
    {noreply, State};

handle_call(spaghetti_callback, _From, State) ->
    {reply, foobar, State}.
In server B:
handle_call(more_spaghetti, _From, State) ->
    {reply, gen_server:call(server_a, spaghetti_callback), State}.
For me this is very complex and super hard to reason about. I think you could even call it spaghetti code without offending anyone.
On another note, while the above might solve your problem, you should think hard about what calling like this actually implies. For example, what happens if server A executes this call many times? What happens if at any point there is a timeout? How do you configure the timeouts so they make sense? (The innermost call must have a shorter timeout than the outer calls, etc).
I would change the design, even if it is painful, because when you allow this to exist and work around it, your system becomes very hard to reason about. IMHO, complexity is the root of all evil and should be avoided at all costs.
It is mostly a design issue where you need to make sure that there are no long blocking calls from gen_server1. This can quite easily be done by spawning a small fun which takes care of your call to gen_server2 and then delivers the result to gen_server1 when done.
You would have to keep track of the fact that gen_server1 is waiting for a response from gen_server2. Something like this maybe:
handle_call(Msg, From, S) ->
    Self = self(),
    spawn(fun() ->
                  Res = gen_server:call(gen_server2, Msg),
                  gen_server:cast(Self, {reply, Res})
          end),
    {noreply, S#state{from = From}}.

handle_cast({reply, Res}, S = #state{from = From}) ->
    gen_server:reply(From, Res),
    {noreply, S#state{from = undefined}}.
This way gen_server1 can serve requests from gen_server2 without hanging. You would of course also need to do proper error propagation from the small process, but you get the general idea.
Another way of doing it, which I think is better, is to make this (resource) information passing asynchronous. Each server reacts and does what it is supposed to when it gets an (asynchronous) my_resource_state message from the other server. It can also prompt the other server to send its resource state with an asynchronous send_me_your_resource_state message. As both of these messages are asynchronous, they will never block, and a server can process other requests while it is waiting for a my_resource_state message from the other server after prompting it.
Another benefit of making the messages asynchronous is that servers can send off this information without being prompted, whenever they feel it is necessary, for example "help me, I am running really low!" or "I am overflowing, do you want some?".
The two replies from #Lukas and #knutin actually do do it asynchronously, but they do it by spawning a temporary process, which can then make synchronous calls without blocking the servers. It is easier to use asynchronous messages straight off, and clearer in intent as well.
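A minimal sketch of that fully asynchronous exchange, reusing the message names from above; the #state{} record fields and the peer field are assumptions:
handle_cast(send_me_your_resource_state, S = #state{peer = Peer, free = Free}) ->
    gen_server:cast(Peer, {my_resource_state, self(), Free}),
    {noreply, S};

handle_cast({my_resource_state, _FromPid, PeerFree}, S) ->
    %% react to the peer's resource level; nothing here ever blocks
    {noreply, S#state{peer_free = PeerFree}}.
Since neither clause calls back into the other server synchronously, the cyclic dependency can no longer deadlock.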
