What's the process state when an Erlang process is running `receive after`?

I want to know the state of an Erlang process while it is running a receive with an after clause:
receive
    X ->
        ok
after 1000 ->
    ok
end
1. Is the process state running or waiting?
2. Does this process use CPU scheduler time?
3. Suppose I have 120000 Erlang processes like this, and every process runs code like this:
receive
    X ->
        ok
after 1000 ->
    ok
end
So, will this code be a bottleneck?

The process is just moving along with whatever comes after the receive expression.
For example, let's say we inline a request/response:
ask_foo(SomePID) ->
    Ref = make_ref(),
    SomePID ! {self(), Ref, why},
    receive
        {Ref, Answer} ->
            io:format("The answer: ~tp~n", [Answer])
    after
        1000 ->
            io:format("~p is too slow. Moving on...~n", [SomePID])
    end,
    io:format("I'll print this in any case, and then exit.").
receive blocks until it either receives a message that matches one of its receive clauses, or the timeout occurs -- whichever happens first. Then it continues on doing whatever else is in its code. Very often there is a single receive loop, but it is not uncommon to use a series of inline receive clauses for things that should block, like waiting on a fixed sequence of inputs from a user or something similar.
The "process's state" is not changing in terms of its state data at all. It is blocking -- which means it is suspended until a message arrives or the timeout occurs. But, unlike polling systems, this does not carry an overhead penalty with it, because the VM is managing the scheduling (the process doesn't have to wake itself up; it can safely block on receive).
You asked if this will be a bottleneck: No. No other processes are blocking, only this one. All other processes are executing on their own schedule, and they have nothing to do with this one. So when blocking on a receive you are only holding up the rest of the things this particular process is supposed to do. Whether or not that is a bottleneck becomes, therefore, an architectural question.
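For contrast with the blocking form, a timeout of 0 makes the same construct a non-blocking check. This sketch (the function name is mine, not from the answer above) returns immediately if no matching message is already queued:

```erlang
%% Non-blocking mailbox check: with a timeout of 0 the after clause
%% fires immediately when no already-queued message matches.
check_for_message() ->
    receive
        X ->
            {got, X}
    after 0 ->
        no_message
    end.
```

This "after 0" idiom is commonly used to flush or peek at a mailbox without suspending the process.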

Related

Simple increment in erlang

I am looking to build simple erlang logic where an actor maintains the count of how many times it was invoked.
For example, here is the actor (many actors are possible):
-module(actor).
-export([do_work/0]).
do_work() ->
    increment = increment + 1
Suppose I invoke the actor (with some Pid: XYZ) 5 times; XYZ = 5 because it was executed 5 times. But in Erlang, variables are immutable. So if I do increment++ for the first run, I cannot store the new result in increment or do increment = increment + 1. How do I maintain the count, given that I cannot create new variables dynamically, say increment_iteration_1 = increment + 1 and then increment_iteration_2 = increment_iteration_1 + 1 and so on... in code for 1000 iterations?
Your question touches the main difference between Erlang and other popular languages. In order to maintain and update some state - as the counter in your example - in Erlang you need to do some specific things, which I describe below. (Other solutions, like using a database, I will skip here since I assume you want to know how to do state maintenance directly).
In Erlang what you call actor is called process. And yes, to maintain a state, you need a process. The code you presented, though named actor, is not a process. It is just a module, which means that it is just some code. To use a process you need to start it, keep it running, and communicate with it using messages. It may sound complicated, but Erlang standard library provides gen_server, which does most of the work for you.
The sample code implementing a counter with gen_server would look like that:
-module(actor).
-behaviour(gen_server).
-export([do_work/0, init/1, handle_cast/2, handle_call/3]).
do_work() ->
    gen_server:cast(actor_server, do_work).
init(_Arguments) ->
    {ok, 0}.
handle_cast(do_work, Counter) ->
    {noreply, Counter + 1}.
handle_call(_Msg, _From, Counter) ->
    {reply, Counter, Counter}.
Now somewhere in your code you need to start your process with
gen_server:start_link({local, actor_server}, actor, [], [])
Then, whenever you call actor:do_work(), it will increment the counter.
A few things worth mentioning:
actor is a callback module. gen_server internally does the hard work and calls the callback functions from your module when it needs to.
The init callback function is called once, when your process starts. I used it here to initialise the counter.
The process is registered under the name actor_server. When do_work calls gen_server:cast, it internally sends a message to the process registered under the actor_server name. If you need several processes maintaining separate counters, more housekeeping is required, which I will skip here.
The handle_call part, not used here, provides the functionality for receiving a reply (e.g. the updated counter) from the process.
Calling gen_server:start_link directly is good for testing, but in real code you will have a supervisor starting the process for you.
You will find more information about gen_server in its documentation.
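As a hedged sketch, if you also want to read the counter back, you could add a hypothetical get_count/0 API (not part of the original answer) that goes through handle_call:

```erlang
%% Hypothetical read API for the counter gen_server above.
%% get_count/0 and the get_count clause are assumptions for illustration,
%% meant to be merged into the actor module.
get_count() ->
    gen_server:call(actor_server, get_count).

%% A handle_call clause that replies with the current counter:
handle_call(get_count, _From, Counter) ->
    {reply, Counter, Counter}.
```

Unlike cast, gen_server:call/2 is synchronous: the caller blocks until the reply arrives (or the default 5-second timeout expires).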

Is there a way for a process to find out that it's been orphanned in Erlang?

E.g.
make_orphan() ->
    P = spawn(...),
    ok.
Is there a way for P to receive a message some time after make_orphan returns? Or is P destined to haunt the system (using up precious resources) for all eternity, unless it exits on its own?
A straightforward way to:
receive a message some time after make_orphan returns
is with a monitor.
make_orphan() ->
    Parent = self(),
    P = spawn(fun() -> monitor(process, Parent), ... end),
    ok.
P will then get a {'DOWN', Ref, process, Parent, Reason} message when Parent dies. Even if Parent exits before monitor/2 is called, the message will contain the reason noproc.
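Filling in the spawned fun as a minimal sketch (the cleanup behaviour here is my assumption; the elided worker body stays whatever your process actually does):

```erlang
%% Sketch: a spawned process that detects its parent's death via a monitor.
make_orphan() ->
    Parent = self(),
    _P = spawn(fun() ->
                   Ref = monitor(process, Parent),
                   %% ... the worker's normal setup and work go here ...
                   receive
                       {'DOWN', Ref, process, Parent, Reason} ->
                           %% Parent is gone; release resources and stop.
                           exit({orphaned, Reason})
                   end
               end),
    ok.
```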
Communicate P to some process somewhere, register P in some way (register, global, gproc, pg2, some homebrew solution, etc.), have someone monitor it, etc. So sure, several ways. But a fundamental principle of an OTP program is that every process belongs to a supervision tree somewhere, so this becomes less of a problem.
Unless you are modeling a system that falls way outside the assumptions of OTP (like peer supervision among cellular automata) then you don't ever want to create the opportunity for orphans to exist. Orphan processes are the Erlang equivalent to memory leaks -- and that is never a good thing.
For some background information on some of the implications of writing OTP processes versus raw Erlang stuff where you're much more likely to leak processes, read the documentation for proc_lib and the "Sys and Proc_Lib" chapter of the OTP Design Principles docs.

Could you overflow the message queue of an Erlang process?

I'm still in the learning phase of Erlang, so I might be wrong, but this is how I understood a process' message queue.
A process could be in its main receive loop, receiving certain types of messages, while later it could be put in a waiting loop to deal with a different kind of message in the second loop. If the process received messages intended for the first loop while in the second loop, it would just leave them in the queue, ignoring them for the time being, and only process the messages it can match against in the current loop. Once it entered the first receive loop again, it would start from the beginning and again process the messages it can match against.
Now my question is: if this is how Erlang works and I understood it correctly, what happens when a malicious process sends a bunch of messages that the receiving process will never process? Will the queue eventually overflow, resulting in a crash for the process, or how should I deal with this? I'll type out an example to illustrate what I mean.
Now if a malicious program got hold of the Pid and ran Pid ! {maliciousdata, LotsOfData} repeatedly, would those messages be filtered out, since they can never be processed, or would they just stack up in the queue?
startproc() -> firstloop(InitValues).
firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList=askforinformation(),
            retrieveloop(WaitingList);
        dostuff ->
            NewValues=doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values)
    end.
retrieveloop([],Values) -> firstloop(Values);
retrieveloop(WaitingList,Values) ->
    receive
        {hereismyinformation,Id,MyInfo} ->
            NewValues=dosomethingwithinfo(Id,MyInfo),
            retrieveloop(lists:remove(Id,1,WaitingList),NewValues)
    end.
There is not a hard limit on message counts, and there is not a fixed amount of memory you are limited to, but you can certainly run out of memory if you have billions of messages (or a few super huge ones, maybe).
Long before you OOM because of a huge mailbox, you will notice either selective receives taking a long time (not that "selective receive" is a good pattern to follow much of the time...) or you will innocently peek into a process mail queue and realize you've opened Pandora's Box in your terminal.
This is usually treated as a throttling and monitoring issue in the Erlang world. If you aren't able to keep up and your problem is parallelizable then you need more workers. If you are maxing out your hardware then you need more efficiency. If you are still maxing out your hardware, can't get any more, and you're still overwhelmed then you need to decide how to implement pushback or load shedding.
Unfortunately there is no "message queue overflow"; the queue will grow until the VM crashes due to a memory allocation error.
The solution is to drop any invalid messages in the main loop, because you are not supposed to receive any {hereismyinformation, _, _} messages there, nor any of the messages you get in askforinformation(), due to the blocking nature of your process.
startproc() -> firstloop(InitValues).
firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList=askforinformation(),
            retrieveloop(WaitingList, Values); % I assume you meant that
        dostuff ->
            NewValues=doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values);
        _ ->
            firstloop(Values) % you can't get {hereismyinformation, _,_} here, so we can drop any invalid message
    end.
retrieveloop([],Values) -> firstloop(Values);
retrieveloop(WaitingList,Values) ->
    receive
        {hereismyinformation,Id,MyInfo} ->
            NewValues=dosomethingwithinfo(Id,MyInfo),
            retrieveloop(lists:remove(Id,1,WaitingList),NewValues)
    end.
The real problem is not unexpected messages, which are easily avoidable, but a queue growing faster than it is processed. For that specific problem there is the nice jobs framework for production systems.
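As a monitoring aid, you can inspect a process's current mailbox length with erlang:process_info/2. A small sketch (the threshold and return values are my own assumptions):

```erlang
%% Check a process's mailbox length and flag it when it exceeds a threshold.
check_mailbox(Pid, Threshold) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, N} when N > Threshold ->
            {overloaded, N};
        {message_queue_len, N} ->
            {ok, N};
        undefined ->
            %% process_info returns undefined for a dead process
            dead
    end.
```

A watchdog process could call this periodically and shed load or alert when the queue keeps growing.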

Erlang/OTP pattern to ensure Composite process accepts a message only when children are done

Is there an Erlang/OTP pattern/library for the following problem(before I hack my own):
At the highest level, imagine there are three components(or processes?) such that A->B->C where -> means sends a message to.
B in terms of architecture is a composite process. It is composed of many unit processes. Sometimes the message chain goes B1->B2->B3->C and sometimes it goes B1->B4->B5->B6->B3->C.
What I would like to do is:
B can only accept the next message when all its child processes are done, i.e. B receives a message I1 and, depending on the message, chooses one flow, and finally C gets a message O1. Until that happens, B should not accept the message I2. This ensures ordering of messages, so that O2 of I2 does not reach C before O1 of I1.
This has a few names. One is "dataflow" (as in "reactive programming" -- which is sort of an overblown ball of buzzwords if you look it up) and another is "signal simulation" (as in simulation of electrical signal switches). I am not aware of a framework for this in Erlang, because it is very straightforward to implement directly.
The issue of message ordering can be made to take care of itself, depending on how you want to write things. Erlang guarantees the ordering of messages between two processes, so as long as messages travel in well-defined channels, this system-wide promise can be made to work for you. If you need more interesting signal paths than straight lines, you can force synchronous communication; though all Erlang messages are asynchronous, you can introduce synchronous blocking on receive wherever you want.
If you want the "B constellation" to pass a message to C but only after its signal processing has completely run its route through the B's, you can make a signal manager which sends a message to B1, and blocks until it receives the output from B3, whence it passes the completed message on to C and checks its box for the next thing from A:
a_loop(B) ->
    receive {in, Data} -> B ! Data end,
    a_loop(B).
% Note the two receives here -- we are blocking for the end of processing based
% on the known Ref we send out and expect to receive back in a message match.
b_manager(B1, C) ->
    Ref = make_ref(),
    receive Data -> B1 ! {Ref, Data} end,
    receive {Ref, Result} -> C ! Result end,
    b_manager(B1, C).
b_1(B2) ->
    receive
        {Ref, Data} ->
            Mod1 = do_processing(Data),
            B2 ! {Ref, Mod1}
    end,
    b_1(B2).
% Here you have as many "b_#" processes as you need...
b_2(B) ->
    receive
        {Ref, Data} ->
            Result = do_other_processing(Data),
            B ! {Ref, Result}
    end,
    b_2(B).
c_loop() ->
    receive Result -> stuff(Result) end,
    c_loop().
Obviously I drastically simplified things -- as in this obviously doesn't include any concept of supervision -- I didn't even address how you would want to link these together (and with this little checking for liveness, you would need to spawn_link them so if anything dies they all die -- which is probably exactly what you want with the B subset anyway, so you can treat it as a single unit). Also, you may wind up needing a throttle in there somewhere (like at/before A, or in B). But basically speaking, this is a way of passing messages through in a way that makes B block until its segment of processing is finished.
There are other ways, like gen_event, but I find them to be less flexible than writing an actual simulation of a processing pipeline. As far as how to implement this -- I would make it a combination of OTP supervisors and gen_fsm, as these two components represent a nearly perfect parallel to signal processing components, which your system seems to be aimed at mimicking.
To discover what states you need in your gen_fsms and how you want to clump them together, I would probably prototype in a very simplistic fashion in pure Erlang for a few hours, just to make sure I actually understand the problem, and then write my proper OTP supervisors and gen_fsms. This makes sure I don't get invested in some temple of gen_foo behaviours instead of getting invested in actually solving my problem (you're going to have to write it at least twice before it's right anyway...).
Hopefully this gives you at least a place to start tackling your problem. In any case, this is a very natural sort of thing to do in Erlang -- and is close enough to the way the language and the problem work that it should be pretty fun to work on.

Passing variables through exit signals [Erlang]

I'm wondering if it's possible to send variables from a dying process to its calling process. I have a process A that spawned another process B through spawn_link. B is about to die by calling exit(killed). I can catch this in A through {'EXIT', From, killed}, but I'd like to pass some variables from B to A before it dies. I can do this by sending a message from B to A right before it dies, but I'm wondering if this is a 'bad' thing to do, because technically I'd be sending two messages from B to A. Right now, what I have looks like this:
B sends a message with values to A
A receives values and re-enters receive loop
B calls exit(killed)
A receives EXIT message and spawns another linked process
The idea is that B should always exist and when it gets killed, it should be 'resurrected' immediately. What seems like a better alternative in my opinion is to have something like exit(killed, [Variables]) and to catch it with {'EXIT', From, killed, [Variables]}. Is this possible? And if so, are there any reasons for not doing it? Having A store values for B when B hasn't even died yet seems like a bad move. I'd have to start implementing atomic actions to prevent problems with two linked processes dying at the same time. It also forces me to keep the variables in my receive loop.
What I mean is, if I could send values directly with the EXIT call, my loop would look like this:
loop() ->
    receive
        {'EXIT', From, killed, Variables} ->
            % spawn new linked process with variables
    end.
But if I first need to receive a message, get into the loop again to then receive the exit message, I would get;
loop(Vars) ->
    receive
        {values, Variables} -> loop(Variables);
        {'EXIT', From, killed} ->
            % spawn new linked process with variables
    end.
This means I keep the list of variables long after I don't need them anymore and I need to enter my loop twice for what could be considered one action.
To answer your question directly: the exit reason can be any term, which means it can also be a tuple like exit({killed, Values}), so instead of receiving {'EXIT', From, killed, Values} you would receive {'EXIT', From, {killed, Values}}.
But!
The way you are doing it now is not wrong. It's not particularly ugly, either. Sending a message (especially an asynchronous one) isn't some major operation to be minimized as much as possible, and neither is spawning/killing processes. If your way works for you, fine.
But! (again!)
Why are you doing this in the first place? Consider what it is about state that you need to be shuttling between two processes, one of which you are terminating just then? Should this value be a permanent entity held by the spawning process? Should it die with the worker? Should it be a quantity maintained by a third process and asked for as part of the worker's startup (a more general phrasing of what Łukasz Ptaszyński was getting at)?
I don't know the answers to those questions, because I don't know your program, but they are the things I would think about if I was finding it necessary to do this sort of work. In particular, if there is some base value that process A must seed process B with for it to work, and the next version of the base value is dependent on something process B does, then process B should be returning it as a part of its processing, not as a part of its shutdown.
This seems like a minor semantic difference, but it's important to think about. You may find that you shouldn't be terminating B at all, or that you really need A to manage a directory for several concurrent B's and they should seed themselves as they move along, or whatever. You might even find that this means A should be spawning B as a synchronous, monitored operation, not an asynchronous linked one, and the whole herd of processes should be spawned as a complex of multiple managed A-B pairs! I don't know the answers in your case, but these are the things that come to mind on reading what you are doing.
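A sketch of the monitored alternative mentioned above, using spawn_monitor/1 so the parent collects the worker's result as part of a normal exit reason (function names are mine, for illustration):

```erlang
%% Run Fun in a monitored worker and harvest its result from the
%% 'DOWN' message, so no separate value message is needed.
run_worker(Fun) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The worker crashed before producing a result.
            {error, Reason}
    end.
```

Unlike a link, a monitor delivers the 'DOWN' message without requiring the parent to trap exits.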
I think you can try this method:
main() ->
    process_flag(trap_exit, true), % needed so the exit signal arrives as a message
    ParentPid = self(),
    From = spawn_link(?MODULE, child, [ParentPid]),
    receive
        {'EXIT', From, Reason} ->
            Reason
    end.
child(ParentPid) ->
    Value = 2*2,
    exit(ParentPid, {killed, Value}).
Please read the documentation for erlang:exit/2.
