Is Erlang message distribution broker-aware?

As stated in Erlang's official documentation on Distributed Erlang, Erlang nodes can communicate (send messages) to other nodes in the same Erlang cluster. Therefore it is possible to send messages such as:
Nodes A, B, C and D
A --> B
A --> C
B --> A
C --> B
...
Per my question, by "broker-aware" I mean: can a node send a message to any other available node, chosen according to a load-balancing rule?
A --> [ B or C or D ]
B --> [ A or C or D ]
...
Well, I know it is "possible" to design this, which requires some state management, etc. But are there built-in features for that? If not, is anyone aware of an open-source project that is driven by pure Erlang messages (excluding RabbitMQ and the like, as I want a pure Erlang message broker)?

I don't think there is a library for that, because the problem is very general. Your computation may be CPU bound, memory bound, network bound, or use some other resource. Some tasks should be spawned "close to the data": for example, when there is lots of data on disk, it would otherwise have to be transmitted over the network.
The easiest way is to have a central "job manager" with workers asking it for jobs. Another option is to maintain some kind of load metric per node and update it, as described in this post on the mailing list.
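To make the first option concrete, here is a minimal sketch of mine (not from the answer above): a central manager holds a queue of jobs, and idle workers pull work whenever they are free, which balances load without tracking any metric. The module name, the registered name job_manager, and the message shapes are all assumptions for the example.

-module(job_manager).
-export([start/0, submit/2, worker/1]).

% Start the manager and register it under an assumed name.
start() ->
    register(job_manager, spawn(fun() -> loop(queue:new()) end)).

% Submit a job (a fun) to the manager running on ManagerNode.
submit(ManagerNode, Job) ->
    {job_manager, ManagerNode} ! {submit, Job},
    ok.

loop(Jobs) ->
    receive
        {submit, Job} ->
            loop(queue:in(Job, Jobs));
        {want_job, Worker} ->
            case queue:out(Jobs) of
                {{value, Job}, Rest} -> Worker ! {job, Job}, loop(Rest);
                {empty, _}           -> Worker ! no_job, loop(Jobs)
            end
    end.

% Worker loop: ask for a job, run it, repeat. Workers can run on any node
% in the cluster; only the manager's node name needs to be known.
worker(ManagerNode) ->
    {job_manager, ManagerNode} ! {want_job, self()},
    receive
        {job, Fun} -> Fun();
        no_job     -> timer:sleep(100)
    end,
    worker(ManagerNode).

A worker on any connected node can then be started with job_manager:worker('manager@host'), and jobs submitted with job_manager:submit('manager@host', fun() -> do_work() end).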

Related

Erlang/OTP pattern to ensure Composite process accepts a message only when children are done

Is there an Erlang/OTP pattern/library for the following problem (before I hack my own)?
At the highest level, imagine there are three components (or processes?) such that A -> B -> C, where -> means "sends a message to".
B, in terms of architecture, is a composite process made up of many unit processes. Sometimes the message chain goes B1 -> B2 -> B3 -> C, and sometimes it goes B1 -> B4 -> B5 -> B6 -> B3 -> C.
What I would like to do is:
B can only accept the next message when all its child processes are done. That is, B receives a message I1 and, depending on the message, it will choose one flow, and finally C gets a message O1. Until that happens, B should not accept the message I2. This is to ensure ordering of messages, so that O2 of I2 does not reach C before O1 of I1.
This has a few names. One is "dataflow" (as in "reactive programming" -- which is sort of an overblown ball of buzzwords if you look it up) and another is "signal simulation" (as in simulation of electrical signal switches). I am not aware of a framework for this in Erlang, because it is very straightforward to implement directly.
The issue of message ordering can be made to take care of itself, depending on how you want to write things. Erlang guarantees the ordering of messages between two processes, so as long as messages travel in well-defined channels, this system-wide promise can be made to work for you. If you need more interesting signal paths than straight lines, you can force synchronous communication; though all Erlang messages are asynchronous, you can introduce synchronous blocking on receive wherever you want.
If you want the "B constellation" to pass a message to C, but only after its signal processing has completely run its route through the B's, you can make a signal manager which sends a message to B1 and blocks until it receives the output from B3, whence it passes the completed message on to C and checks its mailbox for the next thing from A:
a_loop(B) ->
    receive
        {in, Data} -> B ! Data
    end,
    a_loop(B).

% Note the two receives here -- we are blocking for the end of processing based
% on the known Ref we send out and expect to receive back in a message match.
b_manager(B1, C) ->
    Ref = make_ref(),
    receive
        Data -> B1 ! {Ref, Data}
    end,
    receive
        {Ref, Result} -> C ! Result
    end,
    b_manager(B1, C).

b_1(B2) ->
    receive
        {Ref, Data} ->
            Mod1 = do_processing(Data),
            B2 ! {Ref, Mod1}
    end,
    b_1(B2).

% Here you have as many "b_#" processes as you need...
b_2(B) ->
    receive
        {Ref, Data} ->
            Result = do_other_processing(Data),
            B ! {Ref, Result}
    end,
    b_2(B).

c_loop() ->
    receive
        Result -> stuff(Result)
    end,
    c_loop().
Obviously I drastically simplified things -- as in this obviously doesn't include any concept of supervision -- I didn't even address how you would want to link these together (and with this little checking for liveness, you would need to spawn_link them so if anything dies they all die -- which is probably exactly what you want with the B subset anyway, so you can treat it as a single unit). Also, you may wind up needing a throttle in there somewhere (like at/before A, or in B). But basically speaking, this is a way of passing messages through in a way that makes B block until its segment of processing is finished.
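Since the answer leaves the wiring out, here is one minimal sketch of it (mine, assuming all the functions above live in the same module): spawn the chain back-to-front so each process can be handed its downstream neighbor's Pid, and spawn_link the B's from the manager so the constellation lives and dies as a unit.

start() ->
    % Spawn back-to-front so each process knows its downstream Pid.
    C = spawn(fun c_loop/0),
    B = spawn(fun() ->
            Manager = self(),
            % Linked: if any B dies, the whole constellation dies together
            % and can be restarted as a single unit.
            B2 = spawn_link(fun() -> b_2(Manager) end),
            B1 = spawn_link(fun() -> b_1(B2) end),
            b_manager(B1, C)
        end),
    spawn(fun() -> a_loop(B) end).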
There are other ways, like gen_event, but I find them to be less flexible than writing an actual simulation of a processing pipeline. As far as how to implement this -- I would make it a combination of OTP supervisors and gen_fsm, as these two components represent a nearly perfect parallel to signal processing components, which your system seems to be aimed at mimicking.
To discover what states you need in your gen_fsms and how you want to clump them together, I would probably prototype in a very simplistic fashion in pure Erlang for a few hours, just to make sure I actually understand the problem, and then write my proper OTP supervisors and gen_fsms. This makes sure I don't get invested in some temple of gen_foo behaviors instead of getting invested in actually solving my problem (you're going to have to write it at least twice before it's right anyway...).
Hopefully this gives you at least a place to start tackling your problem. In any case, this is a very natural sort of thing to do in Erlang -- and is close enough to the way the language and the problem work that it should be pretty fun to work on.

What guarantees does erlang's "monitor" give?

While reading the ERTS user's guide, I found this section:
The only signal ordering guarantee given is the following. If an entity sends multiple signals to the same destination entity, the order will be preserved. That is, if A sends a signal S1 to B, and later sends the signal S2 to B, S1 is guaranteed not to arrive after S2.
I've also happened across this while doing further research:
Erlang Reference Manual, 13.5:
Message sending is asynchronous and safe, the message is guaranteed to eventually reach the recipient, provided that the recipient exists.
That seems very vague and I'd like to know what guarantees I can rely on in the following scenario:
A,B are processes on two different nodes.
Assume A does not crash and B was a valid node at some point.
A and B monitor each other.
A sends messages M1,M2,M3 to B
In the above scenario, is it possible that B receives M1,M3 (M2 is dropped),
without any sort of 'DOWN'/'EXIT'/heartbeat timeout being received at A?
There are no guarantees other than the ordering guarantee. Note that by default you don't even know who the sender is, unless the sender encodes this in the message.
Your example could happen:
A sends M1 and M2
B receives M1
The node on which B resides gets disconnected
The node on which B resides comes up again
A sends M3 to B
B receives M3
M2 can be lost on the network link in this scenario. It is highly unlikely, but it can happen. The usual trick is to have some notion of such errors, either by having a timeout trigger or by monitoring the node or Pid which is the recipient of the message.
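As a small sketch of that trick (my illustration, not part of the answer; the ack message shape is an assumption), a sender can monitor the recipient and race the reply against the 'DOWN' message and a timeout:

% Send Msg to Pid and wait for an ack, a 'DOWN', or a timeout, so a
% silently lost message cannot go unnoticed forever.
send_monitored(Pid, Msg, Timeout) ->
    Ref = monitor(process, Pid),
    Pid ! {self(), Ref, Msg},
    receive
        {Ref, ack} ->
            demonitor(Ref, [flush]),
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        demonitor(Ref, [flush]),
        {error, timeout}
    end.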
Updated scenario:
In the updated scenario, provided I read it correctly, A would get a 'DOWN'-style message at some point and, likewise, a message telling it that the node is up again, if it monitors the node.
Though often, such things are better modeled using an idempotent protocol if at all possible.
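To make the idempotence idea concrete, here is a sketch of mine (not from the answer; handle/1 and the message shape are assumed) in which the receiver remembers the ids it has already processed, so the sender can safely retry after a timeout:

% Deduplicate by message id: a retried send is acknowledged again, but the
% work is not re-run, making retries safe.
idempotent_loop(Seen) ->
    receive
        {From, Ref, Id, Msg} ->
            case sets:is_element(Id, Seen) of
                true ->
                    From ! {Ref, ack},
                    idempotent_loop(Seen);
                false ->
                    handle(Msg),
                    From ! {Ref, ack},
                    idempotent_loop(sets:add_element(Id, Seen))
            end
    end.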
Reading through the Erlang mailing list and the academic FAQ, it seems like there are a few guarantees provided by the ERTS implementation; however, I was not able to determine whether they are guaranteed at a language/specification level, too.
If you assume TCP is "reliable", then the current implementation guarantees that
given that A and B are processes on different nodes (and hosts), A monitors B, and A sends to B, then, assuming A doesn't crash, any message delivery failure* between the two nodes, or any host/node/process failure on B, will lead to A getting a 'DOWN' message (or 'EXIT' in the case of links). [See 1 and 2]
*From what I have read in the mailing-list thread, this property is almost entirely based on the fact that TCP is used, so "message delivery failure" means any situation where TCP decides that a failure has occurred/the connection needs to be closed.
The academic FAQ talks about this as if it's also a language/specification-level guarantee; however, I couldn't yet find anything to back that up.

How do I create an atom dynamically in Erlang?

I am trying to register a couple of processes with atom names created dynamically, like so:
keep_alive(Name, Fun) ->
    register(Name, Pid = spawn(Fun)),
    on_exit(Pid, fun(_Why) -> keep_alive(Name, Fun) end).

monitor_some_processes(N) ->
    %% create N processes that restart automatically when killed
    for(1, N, fun(I) ->
        Mesg = io_lib:format("I'm process ~p~n", [I]),
        Name = list_to_atom(io_lib:format("zombie~p", [I])),
        keep_alive(Name, fun() -> zombie(Mesg) end)
    end).

for(N, N, Fun) -> [Fun(N)];
for(I, N, Fun) -> [Fun(I)|for(I+1, N, Fun)].

zombie(Mesg) ->
    io:format(Mesg),
    timer:sleep(3000),
    zombie(Mesg).
That list_to_atom/1 call though is resulting in an error:
43> list_to_atom(io_lib:format("zombie~p", [1])).
** exception error: bad argument
in function list_to_atom/1
called as list_to_atom([122,111,109,98,105,101,"1"])
What am I doing wrong?
Also, is there a better way of doing this?
TL;DR
You should not dynamically generate atoms. From what your code snippet indicates, you are probably trying to find some way to flexibly name processes, but atoms are not it. Use a K/V store of some type instead of register/2.
Discussion
Atoms are restrictive for a reason. They should represent something about the eternal structure of your program, not the current state of it. Atoms are so restrictive that I imagine what you really want to be able to do is register a process using any arbitrary Erlang value, not just atoms, and reference them more freely.
If that is the case, pick from one of the following four approaches:
Keep Key/Value pairs somewhere to act as your own registry. This could be a separate process or a list/tree/dict/map handler to store key/value pairs of #{Name => Pid} (a minimal sketch follows this list).
Use the global module (which, like gproc below, has features that work across a cluster).
Use a registry solution like Ulf Wiger's nice little project gproc. It is awesome for the times when you actually need it (which are, honestly, not as often as I see it used). Here is a decent blog post about its use and why it works the way it does: http://blog.rusty.io/2009/09/16/g-proc-erlang-global-process-registry/. An added advantage of gproc is that nearly every Erlanger you'll meet is at least passingly familiar with it.
A variant on the first option: structure your program as a tree of service managers and workers (as in the "Service -> Worker Pattern"). A side effect of this pattern is that very often the service manager winds up needing to monitor its processes for one reason or another if you're doing anything non-trivial, and that makes it an ideal candidate for a place to keep a Key/Value registry of Pids. It is quite common for this sort of pattern to wind up emerging naturally as a program matures, especially if that program has high robustness requirements. Structuring it as a set of semi-independent services with an abstract management interface at the top of each from the outset is often a handy evolutionary shortcut.
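As a sketch of the first option (my illustration, not the answerer's code), a tiny registry process keyed by arbitrary terms could look like this:

% Minimal process registry keyed by any Erlang term, held in a map.
start_registry() ->
    spawn(fun() -> registry_loop(#{}) end).

registry_loop(Reg) ->
    receive
        {register, Name, Pid} ->
            registry_loop(Reg#{Name => Pid});
        {whereis, Name, From} ->
            From ! {registry, Name, maps:get(Name, Reg, undefined)},
            registry_loop(Reg)
    end.

Unlike register/2, the keys here can be arbitrary terms such as {zombie, 42}, so no atoms need to be created at runtime.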
io_lib:format/2 returns a potentially "deep list" (i.e. it may contain other lists), while list_to_atom/1 requires a flat list. You can wrap the io_lib:format/2 call in a call to lists:flatten/1:
list_to_atom(lists:flatten(io_lib:format("zombie~p", [1]))).
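If you really must create the atom at runtime (see the TL;DR above for why you usually shouldn't), an alternative I can offer -- not from the original answer -- is to go through a binary, since iolist_to_binary/1 accepts the deep list directly:

binary_to_atom(iolist_to_binary(io_lib:format("zombie~p", [1])), utf8).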

Network Traffic in AMQP QPID

I have a question regarding network traffic. Suppose I have a publisher on machine A. The Qpid broker is running on machine B. We have two subscribers, machine C and machine D (they both subscribe to the same topics). Now imagine a topology where
A --> B --> X --> C
            |
            D
(Publisher A is connected to B, and subscribers C and D are connected to the broker through an intermediate node X.)
A message published by A that matches the topics for C and D will be received by both. What I want to know is: will the edge B->X carry the message twice (once for B->X->C and a second time for B->X->D)? Or is the AMQP/Qpid framework intelligent enough to send the message once from B to X and then send copies to each individual subscriber (hence less network traffic on B->X)?
What I thought was that since X knows nothing, and if we have private subscription queues for each subscriber (or even a shared queue with browsing/copying of messages instead of consuming), the message will travel twice through B->X.
This question is not specific to Qpid; I would like to know the solutions for other broker-based (RabbitMQ) and brokerless messaging frameworks (ZeroMQ, LBM/UMS). I read in an article that ZeroMQ tries to provide a smarter solution: http://www.250bpm.com/pubsub#toc4. But it seems complicated, since how would intermediate hops know when to send multiple copies or not? (I am not a networking expert, so I might be missing something obvious; any help would be really appreciated.)
I'm assuming X is another Qpid broker, connected to B through the 'federation' feature. That being the case, the message will not be transported twice from B to X.
There are different ways you can configure this, depending on other requirements for the scenario.
The first is to statically link X to B: you create a queue on B for X to subscribe to, bind that queue to the exchange in question such that messages for both C and D will match, then use qpid-route to create a bridge from that queue to the exchange on X. C and D now connect and bind to that exchange on X and will receive the messages published by A as expected. In this case the messages will always flow from B to X regardless of whether C or D are active. If you add another consumer, E, you may need to statically add a binding to the bridged queue on B.
The second option is to use dynamic routing. This will automatically handle the propagation of binding information from X to B such that messages will only flow from B to X if they are needed by an active binding on X.
RabbitMQ will also only propagate a message across an intermediate link such as this once (and it will only get sent at all if some downstream consumer will actually end up seeing the message).

How to publish JMX stats on to a single remote server

Let's say I have two applications/Tomcats, T1 and T2, both of which are JMX enabled. Each of them normally would have its own URL <server_X>:<port_X> to which JMX clients would connect. I want to know if it is possible to have a single RMI server S1, running on port P1, which can hold the statistics of both T1 and T2.
If so, how can I figure out the context (as all the stats are now redirected to the same URL)? The closest I could find on the internet is point 7 on this page. The intent is to have a centralized location for JMX services. I am trying to figure out if there is something like a context name (as in servlets) to facilitate this.
One solution that is rather new (compared to when the question was asked) is something like this:
JMX -> Codahale Metrics -> metrics-statsd -> StatsD -> Graphite/Reporting/Monitoring
Basically, you use StatsD to aggregate the stats and the Metrics library to convert JMX into something reasonable.
I think to do this you would need to re-write the object names so that they don't clash.
The server S1 would run on P1 as you say, and for requests coming in, forward them to the respective tomcats T1 and T2.
If you have e.g. tomcat:key1=value1 as an object name, then you could expose it on your proxy server S1 as tomcat:server=T1,key1=value1 for the first real server and tomcat:server=T2,key1=value1 for the second.
If your end goal is ease of monitoring, or the ability to combine stats, then look into Evident ClearStone.
