erlang inter-process lock mechanism (such as flock)

Does Erlang have an inter-process (I mean Linux or Windows process) lock mechanism such as flock?
The usage would be as follows:
an Erlang server starts serving a repository, and puts a file lock (or whatever)
if another OS process (another Erlang server or a command-line Erlang script) interacts with the repo, then the file lock warns about a possible conflict

If you mean between Erlang processes, no, it has no inter-process lock mechanisms. That is not the Erlang way of controlling access to a shared resource. Generally, if you want to control access to a resource, you have an Erlang process which manages the resource, and all access to the resource goes through this process. This means we have no need for inter-process locks or mutexes to control access. It is also safe, as you can't "cheat" and access the resource anyway, and the managing process can detect if clients die in the middle of a transaction.
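For illustration, a minimal sketch of that pattern (module and function names here are made up): one gen_server owns the repository, every operation goes through it, and its mailbox serializes access, so no lock is needed.

    %% Sketch of the "owning process" pattern; repo_owner is a hypothetical module.
    -module(repo_owner).
    -behaviour(gen_server).

    -export([start_link/1, with_repo/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link(RepoPath) ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, RepoPath, []).

    %% Run Fun(RepoPath) inside the owning process; concurrent callers are
    %% queued in this server's mailbox, so access is serialized.
    with_repo(Fun) ->
        gen_server:call(?MODULE, {with_repo, Fun}).

    init(RepoPath) ->
        {ok, RepoPath}.

    handle_call({with_repo, Fun}, _From, RepoPath) ->
        {reply, Fun(RepoPath), RepoPath}.

    handle_cast(_Msg, State) ->
        {noreply, State}.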

In Erlang, you would probably use a different way of solving this. One thing that comes to mind is to keep a single Erlang node() which handles all the repositories. It has a lock_mgr process which does the resource lock management.
When another node or escript wants to run, it can connect to the running Erlang node over distribution and request the locking.
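A rough sketch of what the escript side might look like, assuming the long-running node is called repo_server@host, its manager is registered globally as lock_mgr, and the acquire/release messages are its protocol (all of these names are made up):

    %% Sketch only: node name, lock_mgr and the message protocol are hypothetical,
    %% and RepoId stands for whatever identifies the repository. The escript must
    %% itself run as a distributed node (started with -sname/-name and a matching
    %% cookie) for the connection to work.
    true = net_kernel:connect_node('repo_server@host'),
    ok = gen_server:call({global, lock_mgr}, {acquire, RepoId}, infinity),
    %% ... work on the repository ...
    ok = gen_server:call({global, lock_mgr}, {release, RepoId}).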

There is the global module, which could fit your needs.
global:set_lock/1,2,3
Sets a lock on the specified nodes (or on all nodes if none are specified) on ResourceId for LockRequesterId.
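For example (do_repo_work/0 is a placeholder for whatever touches the repository):

    %% Block (retry forever) until the cluster-wide lock is granted, then make
    %% sure it is released again. self() is used as the LockRequesterId.
    Id = {my_repo, self()},
    true = global:set_lock(Id),
    try
        do_repo_work()
    after
        global:del_lock(Id)
    end.

    %% Or let global:trans/2 do the set_lock/del_lock bookkeeping for you:
    global:trans(Id, fun() -> do_repo_work() end).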

Related

Where to put a global gen_server when dividing system into OTP apps?

TL;DR
If OTP application A makes calls to a globally registered gen_server in application B, and I don't want to install all of app B on nodes that don't run it, how do I handle the gen_server's client code?
Background (slightly simplified)
I have a system using distributed Erlang: 2 nodes with distinct purposes, running mostly different code. So far, I have been using hand-made Makefiles and installed all software on both nodes. Some of the code is run as OTP applications with supervisors, but this is not done systematically, so not all modules are listed in app files or are part of proper supervision trees.
The dependencies of the code running at each node are different enough that I want to divide it into OTP applications (one per node), to build releases and install them separately. I hope this would let me ditch my hand-made Makefiles and switch to rebar3.
One node runs a central server written entirely in Erlang; it has dependencies (cowboy) which are not relevant to the other node. The other node runs a client program that uses the server, but also uses different port programs and GUI libs which are not needed on the server node.
Problem
The way the client interacts with the server is by making regular function calls to the client API of a globally registered gen_server. That is, the gen_server which runs on the server node has its client functions in the same module. This means that this gen_server's beam file needs to be present on both nodes, but it should only be part of a supervision tree in one of the applications.
The server-side code in this gen_server uses other modules that are only needed on the server node, thus there is test code for the gen_server that also depends on those other modules. (I realise this could be solved by proper mocking in the tests.)
What solutions have I considered?
Put it in a library application
I could put the gen_server's code in a library app which both the others depend on. It would be strange for a few reasons.
The gen_server module would not be part of the same app as the other modules it depends on (and the app-level dependency would be reversed compared to the actual dependency in the code).
Test code would either need to stay in the server app (not the same app as the code it tests) or be re-worked to not depend on surrounding modules (which would be good but time consuming).
Include the server app in both releases
I could include the server app on both nodes, and have the supervisor code check whether it should actually start anything based on init arguments or the node name. But it would kind of defeat the purpose of what I'm trying to do.
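For what it's worth, that check could be as small as this fragment of the app's supervisor callback module (the node name and server_child_spec/0 are placeholders):

    %% Only start the server's children when running on the server node.
    init([]) ->
        Children = case node() of
                       'server@myhost' -> [server_child_spec()];
                       _               -> []
                   end,
        {ok, {{one_for_one, 5, 10}, Children}}.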
Include the gen_server module in both apps
I could use a symlink or something to include the gen_server module in the client app as well. I guess it would work but it feels dirty.
Split the gen_server module into a client- and a server-module
Then the client module could be put in the client app (or in a lib if some part of the server also uses it). It would diverge a lot from the way gen_servers are usually written.
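For illustration, the split-module option could be as thin as this (module, registered name and message are all hypothetical):

    %% Client half: lives in the client app and only knows how to address the
    %% globally registered server; none of the server-side code is needed here.
    -module(repo_client).
    -export([get_status/1]).

    get_status(RepoId) ->
        gen_server:call({global, repo_server}, {get_status, RepoId}).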

Can I call GenServer client functions from a remote node?

I have a GenServer on a remote node with both implementation and client functions in the module. Can I use the GenServer client functions remotely somehow?
Using GenServer.call({RemoteProcessName, :"app@remoteNode"}, :get) works as I expect it to, but is cumbersome.
If I want to clean this up, am I right in thinking that I'd have to write the client functions on the calling (client) node?
You can use the :rpc.call/{4,5} functions.
:rpc.call(:"app@remoteNode", MyModule, :some_func, [arg1, arg2])
For a large number of calls, it's better to use gen_server:call/2-3.
If you want to use rpc:call/4-5, you should know that there is just one process, named rex, on each node handling all such requests. So while it is running one Mod:Func(Arg1, Arg2, ArgN), it cannot respond to other requests at that time!
TL;DR
Yes
Discussion
There are PIDs, messages, monitors and links. Nothing more, nothing less. That is your universe. (Unless you get into some rather esoteric aspects of the runtime implementation -- but at the abstraction level represented by EVM languages the previously stated elements (should) constitute your universe.)
Within an Erlang environment (whether local or distributed in a mesh) any PID can send a message addressed to any other PID (no middle-man required), as well as establish monitors and so on.
gen_server:cast sends a gen_server packaged message (so it will arrive in the form handle_cast/2 will be called on). gen_server:call/2 establishes a monitor and a timeout for receiving a labeled reply. Simply doing PID ! SomeMessage does essentially the same thing as gen_server:cast (sends a message) without any of the gen_server machinery behind it (messier to abstract as an interface).
That's all there is to it.
With this in mind, of course you can use gen_server:call/2 across nodes, as long as they are connected into a cluster/mesh via disterl. Two disconnected nodes would have to communicate a different way (network sockets) and wouldn't have any knowledge of each other's internal mapping of PIDs, but as long as disterl is being used they all translate PIDs amongst themselves quite readily. Named processes are where things get a little tricky, but that is the purpose of the global module and utilities such as gproc (though dependence on such facilities beyond a certain point is usually an indication of an architectural problem).
Of course, just because PIDs from any node can communicate with PIDs from another node doesn't always mean they should. The physical topology of the network (bandwidth, latency, jitter) comes into play when you start sending high-frequency or large messages (lots of gen_server:calls), and you always have to think about partition tolerance -- but for off-loading heavy sorts of work (rare) or physically partitioning sub-systems within a very large system (more common), directly sending messages is a very simple way to take a program coded for a single node and distribute it across a cluster.
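Concretely, with two connected nodes, any of these addressing forms works (the process and node names here are made up):

    %% By pid, exactly as on a single node:
    gen_server:call(Pid, get_status),
    %% By locally registered name plus node:
    gen_server:call({my_server, 'node_b@host2'}, get_status),
    %% By globally registered name; the global module finds the node for you:
    gen_server:call({global, my_server}, get_status).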
(With all that in mind, it is somewhat rare to see the rpc module used.)

Sandboxing user code with Erlang

As far as I know Erlang provides advanced features for error handling and isolation of processes.
I'm building a system that allows users to submit their code to be executed in a shared server environment, and I need to make it safe.
Requirements are:
limit CPU and Memory usage individually for each user-process.
forbid user-process to communicate with other processes (except some processes specially designed for such purpose).
forbid access to all system resources (shell, file system, ...).
terminate user-process in case of errors or high resource consumption.
Is it possible to do all this with Erlang and keep it performance-efficient?
In general, Erlang doesn't provide means to sandbox code which a user can inject. You can try writing your own piece of protection code, but it is rather hard.
A better choice would probably be a language like "safe haskell":
http://www.haskell.org/ghc/docs/7.4.2/html/users_guide/safe-haskell.html
which is specifically built to do this kind of thing.
The isolation provided by Erlang is not intended to protect against malicious modules being injected. In fact, there is no such protection in the distributed case either. As soon as two machines are connected, there is no limit to what you can do to the other machine.
There has been work done on Safe Erlang in the past and you can find several papers about it.
The ErlHive project addresses the problem in an interesting way.
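One of the requirements, terminating a process that consumes too much memory, can at least be approximated with the max_heap_size spawn option (available since OTP 19), though this is nowhere near a full sandbox; user_code_entry_point/0 below is a placeholder for the submitted code.

    %% Kill the spawned process if its heap grows beyond ~10 million words,
    %% and log the event. CPU limiting has no equivalent built-in knob.
    spawn_opt(fun user_code_entry_point/0,
              [{max_heap_size, #{size => 10000000,
                                 kill => true,
                                 error_logger => true}}]).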

Erlang: When is it logical to spawn a new process? When not?

If we have a really heavy-processing system where processes are spawned to distribute load, that's clear.
If we are talking about a web server: it's a good idea to spawn a new process for each connection, because they can then be distributed. But what else? A single process for Model, View and Controller? Sounds strange, because they all run in a "linear" way, so they cannot be parallelized well and we only get overhead from swapping. Also, those "Model, View and Controller" parts are so light that they can stay in a single process, can't they?
So, where is it good to spawn a new process, apart from the "new connection" situation?
Thank you in advance.
In general, it's anywhere you have a shared resource to manage. It may be a socket, or a database connection, but it may also be some shared in-memory data, or a state machine of some kind.
You may also want to do parallel processing of a list of values (see pmap).
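pmap is not in the standard library; a minimal version is often written something like this:

    -module(pmap_example).
    -export([pmap/2]).

    %% Spawn one process per list element, then collect the results in the
    %% original order by matching on per-element references.
    pmap(F, List) ->
        Parent = self(),
        Refs = [begin
                    Ref = make_ref(),
                    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                    Ref
                end || X <- List],
        [receive {Ref, Result} -> Result end || Ref <- Refs].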
To your "swapping" point you should know that Erlang processes do no use op-sys facilities for scheduling, and scheduling is all but free.
In the specific case of a web-application server, I understand your question. If you are writing a conventional web application with very little share state. Your web framework probably already handles caching and session state and such (these facilities will spawn process).
We are all highly indoctrinated into this stateless web application model. We have all been told since we were pups the stateful systems are hard to develop and they don't scale. I think you will find that there are those that are challenging that. As browser support for WebSockets improve, and with server-side language like Erlang and Clojure providing scalable platforms with safe state management, there will be those who are able to make more interactive web-applications. As an extreme example, could you image WoW as a web application?
One reason to spawn a new process for each connection is that it makes programming the connections much simpler. As a process only handles one connection doing things like having blocking access to data-bases, long polling or streaming becomes much easier. That this process blocks will not affect any other connections.
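As a sketch of that style (a trivial echo server, all names made up): blocking inside handle/1, for example on a slow database call, only delays this one connection.

    -module(conn_per_process).
    -export([listen/1]).

    listen(Port) ->
        {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false}]),
        accept_loop(LSock).

    accept_loop(LSock) ->
        {ok, Sock} = gen_tcp:accept(LSock),
        %% Hand the socket over to a fresh process before it starts reading.
        Pid = spawn(fun() -> receive go -> handle(Sock) end end),
        ok = gen_tcp:controlling_process(Sock, Pid),
        Pid ! go,
        accept_loop(LSock).

    %% One connection per process: a blocking call here blocks nobody else.
    handle(Sock) ->
        case gen_tcp:recv(Sock, 0) of
            {ok, Data} ->
                gen_tcp:send(Sock, Data),
                handle(Sock);
            {error, _Closed} ->
                ok
        end.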
In Erlang the general "rule" is that you use processes to model concurrent activity and to manage shared resources. Processes are the fundamental way for structuring your system.

Is it better to start multiple erlang nodes per machine, or just one per machine?

Preface: When I say "machine" below, I mean either a physical dedicated server, or a virtual private server. When I say "node" I mean, an instance of the erlang virtual machine, of which there could be multiple running as separate processes under a single unix kernel.
I've got a project that involves multiple erlang/OTP applications. The applications will be running together and talking to each other on the same machine. They will all be hitting the disk, using memory and spawning erlang processes. They will also be using network resources because they will be talking to similar machines with the same set of applications running on them in a cluster.
Almost all of this communication is via HTTP. Thus I could separate each erlang OTP application into a separate instance of the erlang VM on the same machine and they could still talk to each other.
My question is: Is it better to have them running all under one erlang VM so that this erlang VM process can allocate access to resources among them, and schedule the execution of the various erlang processes.
Or is it better to have separate erlang nodes on a given server?
If one is better than the other, why?
I'm assuming that running all of these apps in a single Erlang VM, which is given, essentially, full run of the server, will result in better performance. The OS is just managing the disk and RAM at the low level, and only has one significant process (the Erlang VM) to schedule... and the Erlang VM is probably smarter about allocating resources when it has a holistic view of all the Erlang processes.
This may be something that I need to test, but I'm not in a position to do so effectively in the near term.
The answer is: it depends.
Advantages of using a single node:
Memory is controlled by a single Erlang VM. It is way easier.
Inter-application communication (if using erlang-messaging) is faster.
Fewer operating system context switches happen
Advantages of using multiple nodes:
If the system is linking in C code to the VM, death of one node due to a bug in C will not kill the others.
Agree with #I GIVE CRAP ANSWERS
I would go with one VM. Here is why:
dynamic handling of run-time queues belonging to schedulers (important when the CPU load has varied origins)
fewer VMs to monitor
better understanding of memory allocation, and easier to spot a malicious process (you can compare all of them at once)
much easier inter app supervision
I wouldn't care about a VM crash - you need to be prepared for that anyway. Heart works especially well in a cluster of equal units.
We've always used one VM per application because it's easier to manage.
The scheduler and SMP support in Erlang have come a long way in the past few years, so there isn't as much reason as there used to be to run multiple VMs on the same node.
I agree with the previous answers, but there is a scenario where having multiple nodes per CPU is the answer: when a heavy task hits the node. A task may take multiple minutes to complete, and in such a case a gen_server will hold up all its callers until the task completes.
