Erlang pid comparison guarantees

This might be a trivial question for some Erlang veterans, but it would be nice to know since it wasn't clear in the documentation. Many distributed systems algorithms rely on the comparability of unique pids to make decisions. Erlang is kind enough to offer built-in comparison of pids. However, I was wondering whether comparisons stay consistent across multiple machines when they refer to both local and external pids. My guess is that there are no comparison guarantees, but I might be wrong. Am I?

Erlang stores more than just a simple process ID in its PID structures; the data also includes an identifier for the node the process lives on (whether that is the local VM or a remote one).
See Can someone explain the structure of a Pid in Erlang? for details.
Thus, you're guaranteed to not send a message to the wrong PID on the wrong VM (or misinterpret the source of a received message), at least not without making an error somewhere in your code.
Update: It occurs to me that I may well have been answering the wrong question. If you're asking how the comparisons would work (e.g., if Pid1 < Pid2, whether Pid1 is local or remote), all I can state with some confidence is that the ordering will be constant, based on http://learnyousomeerlang.com/starting-out-for-real#bool-and-compare.
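A minimal sketch of what the built-in ordering gives you on a single node, assuming nothing beyond the standard Erlang term order (number < atom < reference < fun < port < pid < tuple and so on). The module name pid_order is made up for illustration. It only checks that sorting local pids is repeatable and strict; it is not evidence of any guarantee about how two different nodes order the same pair of pids.

```erlang
-module(pid_order).
-export([demo/0]).

%% Spawn a few idle local processes, sort their pids with the built-in
%% term order, and check that the result is repeatable and strict.
demo() ->
    Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, 5)],
    Sorted = lists:sort(Pids),
    %% Sorting a reversed copy must match the first result.
    Sorted = lists:sort(lists:reverse(Pids)),
    %% Distinct pids never compare equal, so neighbours are strictly ordered.
    true = strictly_ordered(Sorted),
    %% In the term order a pid is greater than any number or atom.
    true = self() > 1.0e308,
    true = self() > zzz,
    lists:foreach(fun(P) -> P ! stop end, Pids),
    Sorted.

strictly_ordered([A, B | Rest]) -> A < B andalso strictly_ordered([B | Rest]);
strictly_ordered(_) -> true.
```

An algorithm that needs every node to agree on the order of the same pids would have to verify that separately, for example by running a comparison like this against pids received from other nodes.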

Related

Can I call GenServer client functions from a remote node?

I have a GenServer on a remote node with both implementation and client functions in the module. Can I use the GenServer client functions remotely somehow?
Using GenServer.call({RemoteProcessName, :"app@remoteNode"}, :get) works as I expect it to, but is cumbersome.
If I want to clean this up, am I right in thinking that I'd have to write the client functions on the calling (client) node?
You can use the :rpc.call/{4,5} functions.
:rpc.call(:"app@remoteNode", MyModule, :some_func, [arg1, arg2])
For a large number of calls, it's better to use gen_server:call/2,3.
If you want to use rpc:call/4,5, you should know that there is just one process, named rex, on each node handling all requests. So while it is running one Mod:Func(Arg1, Arg2, ..., ArgN), it cannot respond to other requests!
TL;DR
Yes
Discussion
There are PIDs, messages, monitors and links. Nothing more, nothing less. That is your universe. (Unless you get into some rather esoteric aspects of the runtime implementation -- but at the abstraction level represented by EVM languages the previously stated elements (should) constitute your universe.)
Within an Erlang environment (whether local or distributed in a mesh) any PID can send a message addressed to any other PID (no middle-man required), as well as establish monitors and so on.
gen_server:cast sends a gen_server packaged message (so it will arrive in the form handle_cast/2 will be called on). gen_server:call/2 establishes a monitor and a timeout for receiving a labeled reply. Simply doing PID ! SomeMessage does essentially the same thing as gen_server:cast (sends a message) without any of the gen_server machinery behind it (messier to abstract as an interface).
That's all there is to it.
With this in mind, of course you can use gen_server:call/2 across nodes, as long as they are connected into a cluster/mesh via disterl. Two disconnected nodes would have to communicate a different way (network sockets) and wouldn't have any knowledge of each other's internal mapping of PIDs, but as long as disterl is being used they all translate PIDs amongst themselves quite readily. Named processes are where things get a little tricky, but that is the purpose of the global module and utilities such as gproc (though dependence on such facilities beyond a certain point is usually an indication of an architectural problem).
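As a rough sketch of how a client function can hide the node addressing, here is a small gen_server written in Erlang whose client API takes an optional node argument. The module name counter_srv, the value/0,1 functions, and the get request are invented for this example; only gen_server:start_link/4 and gen_server:call/2 with a {Name, Node} destination are standard OTP.

```erlang
-module(counter_srv).
-behaviour(gen_server).

%% Client API (hypothetical names, for illustration only)
-export([start_link/0, value/0, value/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

%% Local call: talks to the server registered as ?MODULE on this node.
value() ->
    gen_server:call(?MODULE, get).

%% Remote call: same request, addressed to {RegisteredName, Node}.
value(Node) when is_atom(Node) ->
    gen_server:call({?MODULE, Node}, get).

init(State) -> {ok, State}.

handle_call(get, _From, State) -> {reply, State, State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

The module only needs to be loaded on the calling node for the client functions to be usable there; a call such as counter_srv:value('app@somehost') (a placeholder node name) then reads like a local call, assuming the two nodes are connected and the server is running on the target node.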
Of course, just because PIDs from any node can communicate with PIDs from another node doesn't always mean they should. The physical topology of the network (bandwidth, latency, jitter) comes into play when you start sending high-frequency or large messages (lots of gen_server:calls), and you have always got to think of partition tolerance -- but for off-loading heavy sorts of work (rare) or physically partitioning sub-systems within a very large system (more common) directly sending messages is a very simple way to take a program coded for a single node and distribute it across a cluster.
(With all that in mind, it is somewhat rare to see the rpc module used.)

Is Erlang a bad language for this app?

I am building a framework for realtime web applications. I started writing it in Elixir, because it is the modern way to develop applications for the Erlang VM. Erlang should be good if you need concurrent, fault-tolerant, scalable apps (something like a web server). That is exactly what I need.
Question: A realtime framework always needs to keep track of, for instance, who is interested in what. This will be accomplished using the publish/subscribe pattern. So I will have 1000 clients subscribing to the topic "newest-message". I need to save those clients (the pid of the process representing each client) somewhere so that I can access them later when content for the topic "newest-message" appears.
This is where I am unsure whether Erlang is really a good fit for my framework.
ETS is probably the only option for storing shared data, but ETS always copies everything when you save or access records. So that means copying 1000 pids every time I need to access them (instead of just iterating over a list, as I would in C, Java, or Python).
This will probably be a big bottleneck if I keep copying many records from ETS (many clients, many subscriptions, etc.). Am I right?
Sharing state may be a sign of bad design. You can, for example, have a process for each queue/topic that stores its own list of subscribers. You send a message to that topic process, and it in turn sends the message to the clients. This way, you don't copy the entire subscriber list.
If you need to process them in parallel, you can split the subscriber list across several processes.
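The process-per-topic idea could look roughly like the following in Erlang. This is a bare sketch rather than a production design: the module name topic, the message shapes ({subscribe, Pid}, {publish, Msg}, {topic_msg, Msg}), and the use of a plain list are all assumptions made for illustration.

```erlang
-module(topic).
-export([start/0, subscribe/2, publish/2]).

%% Start a topic process that owns its subscriber list.
start() ->
    spawn(fun() -> loop([]) end).

subscribe(Topic, Pid) ->
    Topic ! {subscribe, Pid},
    ok.

publish(Topic, Msg) ->
    Topic ! {publish, Msg},
    ok.

loop(Subs) ->
    receive
        {subscribe, Pid} ->
            %% Monitor the subscriber so it can be dropped when it exits.
            erlang:monitor(process, Pid),
            loop([Pid | Subs]);
        {publish, Msg} ->
            %% The list lives in this process's own state; nothing is
            %% fetched from a shared table to fan the message out.
            lists:foreach(fun(P) -> P ! {topic_msg, Msg} end, Subs),
            loop(Subs);
        {'DOWN', _Ref, process, Pid, _Reason} ->
            loop(lists:delete(Pid, Subs))
    end.
```

Note that this sketch does not deduplicate repeated subscriptions from the same pid; a real implementation would likely use a set or map keyed by pid.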
Erlang achieves fault tolerance because it doesn't let you share state, so you have to put more thought into a design that avoids state sharing but is still efficient. This will pay off in the long run, so Erlang/Elixir is definitely a good language for this kind of app. Just look at RabbitMQ.
In my opinion, if you plan to save state like "who is interested in what", Erlang alone may not be a good idea. Of course, sometimes it is very convenient to pass everything in signals (as you would in Erlang), but when there is a lot of content to store, the lack of state in Erlang starts to hinder you rather than help.
On the other hand, you can keep much of the convenience of Erlang and use it alongside a Java application, for example. Erlang's interface for Java lets you connect both technologies quite easily, and at the same time you can use a Java app to store information for you (and save it somewhere, when necessary) and Erlang for the whole concurrent, real-time signaling part. Even better than that: you can still implement OTP with an architecture like that, so you can create quite a lightweight application (because the real-time logic is done by Erlang for you) that can access stored data easily (because Java helps you here).

Erlang Monitor Type

Full disclosure: I've only just started learning Erlang, so forgive me if I'm being naive. In the Erlang manual, the signature for the monitor function is:
monitor(Type, Item) -> MonitorRef
According to the rest of the documentation:
Currently only processes can be monitored, i.e. the only allowed Type
is process, but other types may be allowed in the future.
Monitor semantics seem pretty inherently tied to processes i.e. it doesn't make sense to monitor something other than a process. Having this extra parameter seems to border on paranoia rather than trying to plan for the future. What are these other things that might be allowed to be monitored in the future?
I don't know what the designers may have had in mind, but I'd guess remote nodes.
It may also make sense that a process group (http://www.erlang.org/doc/man/pg2.html) could be monitored.
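For reference, a minimal sketch of the process case discussed above. It shows that the Type argument is echoed back as the third element of the 'DOWN' message, which is where a different monitored kind would show up. The module name monitor_demo and the {down, Reason} return shape are invented for this example.

```erlang
-module(monitor_demo).
-export([run/0]).

%% Monitor a short-lived process and wait for its 'DOWN' message.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive
        %% Shape: {'DOWN', MonitorRef, Type, Item, Reason}
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 ->
        timeout
    end.
```

Here the spawned function returns normally, so the reason reported is normal.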

fault-tolerant counters

I would like to keep a set of counters in a fault-tolerant data store with the following properties:
can communicate to it from erlang
production ready
fault tolerant out of the box (multi-server and no roll-your-own master-slave shenanigans)
the number of counters is dynamic (let's say from 1k to 100k)
I am willing to trade C for AP. You may assume that the counters are only increasing. Things I've already considered:
riak
I assume one could try turning on allow_mult, and aggregating siblings at read time. This probably works great for sets but I'm unsure if it works for counters.
riak_zab
At the time of this writing it's not production ready.
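The "aggregate siblings at read time" idea can work for increasing-only counters if each writer keeps its own slot and siblings are merged by taking the per-writer maximum. The following is a generic grow-only counter sketch under that assumption; it is not Riak's or statebox's API, and the module name gcounter and all function names are made up for illustration.

```erlang
-module(gcounter).
-export([new/0, increment/2, merge/2, value/1]).

%% A counter is a map from writer id to that writer's own count.
new() -> #{}.

%% Each writer only ever bumps its own slot.
increment(Writer, Counter) ->
    maps:update_with(Writer, fun(N) -> N + 1 end, 1, Counter).

%% Merge two siblings by taking the per-writer maximum. Because counts
%% only grow, the merge is commutative, associative, and idempotent.
merge(A, B) ->
    maps:fold(
      fun(Writer, N, Acc) ->
              maps:update_with(Writer, fun(M) -> max(M, N) end, N, Acc)
      end, A, B).

%% The counter's value is the sum over all writers.
value(Counter) ->
    lists:sum(maps:values(Counter)).
```

At read time, all siblings returned for a key would be folded together with merge/2 before calling value/1, so concurrent increments on different servers are not lost.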
There's some counters code in statebox in a branch that I've been prototyping:
https://github.com/mochi/statebox/tree/counters
This can be used in combination with Riak pretty easily.
It should work, but I haven't written an application with it yet so it's not on master yet. Doesn't fit your production ready goal, but nothing will (except maybe Mnesia, but there are other issues with that).
Use ZooKeeper. You will have to use a port to run the ZooKeeper C client from Erlang, but it satisfies the rest of your requirements. A simple solution is to use sequential nodes in ZooKeeper, but there are other possible ways too.

How does Erlang pass messages between processes on the same node?

Between nodes, messages are (must be) passed over TCP/IP. However, by what mechanism are they passed between processes running on the same node? Is TCP/IP used in this case as well? Unix domain sockets? What is the difference in performance between "within node" and "between node" message passing?
by what mechanism are they passed between processes running on the same node?
Because Erlang processes on the same node are all running within a single native process, the BEAM emulator, message structures are simply copied into the receiver's message queue. The message structure is copied, rather than simply referenced, for all the standard no-side-effects functional programming reasons.
See erts_send_message() in erts/emulator/beam/erl_message.c in the Erlang sources for more detail. In R15B01, the bits most relevant to your question start at line 980 or so, with the call to erts_queue_message().
If you did choose to run multiple BEAM emulators on a single physical machine, I would guess messages get sent between them the same way as between different physical machines. There's probably no good reason to do that now that BEAM has good SMP support, though.
What is the difference in performance between "within node" and "between node" message passing?
A simple benchmark on your actual hardware would be more useful to you than anecdotal evidence from others.
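One possible starting point for such a benchmark is a ping-pong loop timed with timer:tc/1. This is only a sketch for the same-node case; the module name pingpong and the message shapes are invented here, and the numbers it produces depend entirely on your hardware and VM settings.

```erlang
-module(pingpong).
-export([bench/1]).

%% Time N round trips between the caller and an echo process on the
%% same node. Returns the average microseconds per round trip.
bench(N) when is_integer(N), N > 0 ->
    Echo = spawn(fun echo/0),
    {Micros, ok} = timer:tc(fun() -> round_trips(Echo, N) end),
    Echo ! stop,
    Micros / N.

echo() ->
    receive
        {ping, From} -> From ! pong, echo();
        stop -> ok
    end.

round_trips(_Echo, 0) -> ok;
round_trips(Echo, N) ->
    Echo ! {ping, self()},
    receive pong -> ok end,
    round_trips(Echo, N - 1).
```

To compare against the between-node case, the echo process could instead be started on another connected node (for example with spawn/4, with the module loaded on both nodes), keeping the rest of the loop unchanged.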
If you want generalities, however, observe that memory bandwidths are around 20 GByte/sec these days, and that you're unlikely to have a network link faster than 10 Gbit/sec between nodes. That means that while there may be many differences between your actual application and any simple benchmark you perform or find, these differences probably cannot swamp an order of magnitude difference in transfer rate.
If you "only" have a 1 Gbit/sec end-to-end network link between nodes, intranode transfers will probably be over two orders of magnitude faster than internode transfers.
"All data in messages between Erlang processes is copied, with the exception of refc binaries on the same Erlang node.":
http://erlang.org/doc/efficiency_guide/processes.html#id2265332

Resources