Should separate Erlang applications share the same VM on the same machine?

I have a CouchDB instance running on one machine, and thus with its own Erlang VM process. If I have another separate Erlang application running on that machine too, is it better to share the same VM between CouchDB and my application, or is it recommended to start a new Erlang node?

While many would recommend decoupling these subsystems, I would take the opposite approach. Erlang has a built-in strategy for running many applications in the same release. If your applications talk to each other directly, it might make sense to bundle them together into a release. This will make calls between the applications faster. Some will argue that all your applications now share the same fate should you need to take the system down for an upgrade that only one of them needs. This is a moot point with Erlang, where you are distributing your applications across many nodes anyway. Also, most upgrades can be done with hot code loading.
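For illustration, a minimal relx stanza in rebar.config that bundles two applications into one release might look like this (rebar3 assumed; my_app and my_other_app are hypothetical names standing in for your real applications):

```erlang
%% rebar.config (rebar3/relx) -- a sketch; my_app and my_other_app
%% stand in for your real applications.
{relx, [
    {release, {my_system, "0.1.0"},
     [sasl,
      my_app,
      my_other_app]},
    {dev_mode, false},
    {include_erts, true}
]}.
```

When the release boots, both applications are started and supervised inside the same VM.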

There is no problem with running several VMs on the same machine (at least with recent OTP releases), but it is quite handy to have all your applications on one Erlang node: easier communication, dependency management, supervision, and fault tolerance come for free, not to mention maintaining only one 'node' in your source control system.
The problem starts with CouchDB. It does not have a decent build system that lets you use it as one of the independent applications on your Erlang node. So in this case you need at least two VMs (one acting as the CouchDB daemon, the other hosting your application).

Related

Where should I put shared services for multiple kubernetes-clusters?

Our company is developing an application which runs in 3 separate Kubernetes clusters in different versions (production, staging, testing).
We need to monitor our clusters and the applications over time (metrics and logs). We also need to run a mailserver.
So basically we have 3 different environments with different versions of our application. And we have some shared services that just need to run and we do not care much about them:
Monitoring: We need to install InfluxDB and Grafana. In every cluster there's a pre-installed Heapster that needs to send data to our tools.
Logging: We didn't decide yet.
Mailserver (https://github.com/tomav/docker-mailserver)
Independent services: Sentry, GitLab
I am not sure where to run these external shared services. I found these options:
1. Inside each cluster
We need to install the tools 3 times for the 3 environments.
Con:
We don't have one central point to analyze our systems.
If the whole cluster is down, we cannot look at anything.
Installing the same tools multiple times does not feel right.
2. Create an additional cluster
We install the shared tools in an additional kubernetes-cluster.
Con:
Cost for an additional cluster
It's probably harder to send ongoing data to an external cluster (networking, security, firewalls, etc.).
3. Use an additional root-server
We run Docker containers on an old-school root-server.
Con:
Feels contradictory to use a root-server instead of cutting-edge Kubernetes.
Single point of failure.
We need to manage the Docker containers manually (or attach the machine to Rancher).
I tried to google the problem, but I cannot find anything on the topic. Can anyone give me a hint or some links about it?
Or is it just not a relevant problem that a cluster might go down?
To me, the second option sounds less evil, but I cannot yet estimate how hard it is to transfer data from one cluster to another.
The important questions are:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Thanks for your hints,
Marius
That is a very complex and philosophical topic, but I will give you my view on it and some facts to support it.
I think the best way is the second one - create an additional cluster - and here is why:
You need a point that is accessible from any of your environments. With a separate cluster, you can set the same firewall rules, routes, etc. in all your environments without affecting your current workload.
Yes, you need to pay a bit more. However, you need resources to run your shared applications anyway, and the overhead of a Kubernetes infrastructure is low in comparison with the applications themselves.
With a separate cluster, you can set up a real HA solution, which you might not need for the staging and development clusters, so you will not pay for that multiple times.
Technically, it is also fine. You can use Heapster to collect data from multiple clusters, and almost any logging solution can also work with multiple clusters. All the other applications can simply be run on the separate cluster, and that is all you need to do with them.
Now, about your questions:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
No, it is not a problem with a separate cluster.
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
I think so. At least I have done it several times, and I know some other projects with a similar architecture.
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Yes, there is nothing complex there. Usually it does not depend on the platform.

What does 'one application per operating system' mean in hypervisor virtualization?

I was going through the documentation for Docker. It introduced the concept of virtual machines before containers. The author stated that a server can be divided into multiple virtual machines, each with its own operating system, and that this way multiple applications can be run on one physical server by running each of them in a separate virtual machine (one virtual machine per application). I was a little confused here. Can't multiple applications run in one virtual machine (operating system) without the need for another VM? And what do we mean by applications? I am a total beginner in this topic. If anyone could help me understand this terminology, I would be very grateful. Thank you.
An application is a service or a process, such as Nginx, PHP, Redis, Apache, or Memcached.
This approach is recommended because containers are designed to isolate a process by giving it its own userspace and filesystem.
This brings benefits: having just one process per container makes it easily reusable in other projects and easy to scale, and it also separates concerns. For example, if you run two applications inside one container and want to shut one of them down, will that process stop gracefully, or will you have to stop the entire container?

Use Docker to keep track of software versions/installations?

I have a data processing application which is updated on a regular basis. This application has a bunch of dependencies which are also updated every now and then. However, different versions of the software (plus dependencies) might produce different results (this is expected). The application runs on a remote computer and is accessed through a Web page. Every time users do some processing through the Web page, they also choose which version of the software they want to use.
Now I am trying to decide on the best way of keeping track of the different software (plus dependency) versions. The simplest way, of course, is to just compile and install each version of my software and its dependencies in a different folder, and then, based on the request the user sends, the appropriate folder is selected. However, this sounds very clunky to me. So I thought I could use Docker to keep track of the different software versions. Do you think that is a good idea? If yes, what is most appropriate to do every time I have a new version of the software (and/or dependencies): 1) create a new container from scratch with the new version (and end up having multiple containers), or 2) update the existing container and commit the changes? (I suppose I can access the older commits of the container, right?)
PS: Keep in mind that the reason I looked into Docker and not a simple virtual machine solution is that the application I am running is a high-performance GPU-based software.
Docker is a reasonable choice. Your repository would contain all of the app versions you wish to publish. Note, you will only realize savings if you organize the resulting app filesystem into layers, of which the lower layers are the least likely to change between versions. This will keep the storage requirements at a minimum.
Then you have to decide how you will process each job. A robust (but complex) solution would be to have one or more API containers which take in processing jobs from your users and dole them out to worker containers (one or more for each release version). This would provide the lowest response latency and be non-blocking. You can look at different service discovery models to see how your "worker" containers can register with your "manager" containers. This is probably more than you'd like to bite off, but consider using a good key-value database (another container!) like etcd, or a third-party service discovery tool like ZooKeeper/Eureka/Consul.
A much simpler model would have a single API container with one each of the release containers created, but not started. The API container would start, direct, and then stop the appropriate release container. You would incur the startup latency, but this is the least resource intensive... and easiest to manage. But this is a blocking operation.
Somewhere in the middle, but less user-friendly, is to have each release container running but listening on a different host port (the app always sees the same port). The user would connect to the port which services the desired release of the app. You'd have to provide some sort of index to make this useful.

Is it better to start multiple erlang nodes per machine, or just one per machine?

Preface: When I say "machine" below, I mean either a physical dedicated server or a virtual private server. When I say "node", I mean an instance of the Erlang virtual machine, of which there could be multiple running as separate processes under a single Unix kernel.
I've got a project that involves multiple Erlang/OTP applications. The applications will be running together and talking to each other on the same machine. They will all be hitting the disk, using memory, and spawning Erlang processes. They will also be using network resources, because they will be talking to similar machines with the same set of applications running on them in a cluster.
Almost all of this communication is via HTTP, so I could separate each Erlang/OTP application into a separate instance of the Erlang VM on the same machine and they could still talk to each other.
My question is: is it better to have them all running under one Erlang VM, so that this VM process can allocate access to resources among them and schedule the execution of the various Erlang processes?
Or is it better to have separate Erlang nodes on a given server?
If one is better than the other, why?
I'm assuming that running all of these apps in a single Erlang VM which is given, essentially, full run of the server will result in better performance. The OS is just managing the disk and RAM at a low level, and only has one significant process (the Erlang VM) to switch with... and the Erlang VM is probably smarter about allocating resources when it has a holistic view of all the Erlang processes.
This may be something that I need to test, but I'm not in a position to do so effectively in the near term.
The answer is: it depends.
Advantages of using a single node:
Memory is controlled by a single Erlang VM. It is way easier.
Inter-application communication (if using Erlang messaging) is faster; see the sketch after these lists.
Fewer operating system context switches happen.
Advantages of using multiple nodes:
If the system links C code into the VM, the death of one node due to a bug in the C code will not kill the others.
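As a rough sketch of the messaging point above (the registered name my_worker and the node name are hypothetical), a call within one VM is a plain message send, while the same call across nodes goes over the distribution connection and pays for serialization:

```erlang
%% Within one VM: a direct message send to a locally registered process.
Local = gen_server:call(my_worker, get_state),

%% Across nodes: the request is serialized and sent over the
%% distribution connection to the process registered on the other node.
Remote = gen_server:call({my_worker, 'app2@host.example'}, get_state).
```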
Agree with #I GIVE CRAP ANSWERS
I would go with one VM. Here is why:
dynamic handling of the run-time queues belonging to schedulers (important when the CPU load has varied origins)
fewer VMs to monitor
better insight into memory allocation, and it is easier to spot a misbehaving process (you can compare all of them at once)
much easier inter-app supervision
I wouldn't worry about a VM crash - you need to be prepared for that anyway. The heart program works especially well in a cluster of equal units.
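For context, heart here is OTP's heartbeat monitoring of the VM itself: the node is started with the -heart emulator flag, and the command to run if the VM dies can be set at runtime. A small sketch (the restart script path is a placeholder):

```erlang
%% Only works on a node started with: erl -heart ...
%% If the VM stops responding, heart runs this command (placeholder path).
ok = heart:set_cmd("/usr/local/bin/restart_my_node.sh"),
{ok, _Cmd} = heart:get_cmd().
```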
We've always used one VM per application because it's easier to manage.
The scheduler and SMP support in Erlang have come a long way in the past few years, so there isn't as much reason as there used to be to run multiple VMs on the same machine.
I agree with the previous answers, but there is a scenario where having multiple nodes per machine is the answer: when a heavy task hits the node. A task may take several minutes to complete, and in such a case a gen_server will be tied up until the task finishes.
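A minimal sketch of that blocking behaviour (the module and the do_heavy_work/1 stand-in are hypothetical): while handle_call/3 runs, every other caller of this particular server queues up, even though the VM keeps scheduling other processes preemptively:

```erlang
-module(heavy_server).
-behaviour(gen_server).

-export([start_link/0, process/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Callers block here; they queue behind whichever request is in progress.
process(Data) ->
    gen_server:call(?MODULE, {process, Data}, infinity).

init([]) -> {ok, #{}}.

handle_call({process, Data}, _From, State) ->
    %% A minutes-long computation ties up this server for its duration.
    Result = do_heavy_work(Data),
    {reply, Result, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% Stand-in for the real workload.
do_heavy_work(Data) -> Data.
```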

Starting remote Erlang nodes

I want to write a master-slave application in Erlang. I am thinking at the following things I need from the architecture:
the slaves shouldn't die when the master dies, but rather try to reconnect to it while the master is down
the master should automatically start the remote nodes if they don't connect automatically or they are down (probably the supervisor behaviour in OTP)
Is there an OTP behaviour to do this? I know I can start remote nodes with slave:start_link() and I can monitor nodes with erlang:monitor_node(), but I don't know how this can be incorporated into a gen_server behaviour.
I agree with the comments about using erlang:monitor_node/2 and the use of distributed applications.
You cannot just use the slave module to accomplish that; its documentation clearly states: "All slave nodes which are started by a master will terminate automatically when the master terminates."
There is currently no OTP behaviour to do it either. Supervision trees are hierarchical; it seems like you are looking for something where there is a hierarchy in terms of application logic, but spawning is done on a peer-to-peer basis (or an individual basis, depending upon your point of view).
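Still, wiring erlang:monitor_node/2 into a plain gen_server is not much code. A minimal sketch (the module name, the retry interval, and the assumption that the peer's node name is known up front are all mine):

```erlang
-module(node_watcher).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 5000).  %% arbitrary reconnect interval

start_link(Node) ->
    gen_server:start_link(?MODULE, Node, []).

init(Node) ->
    %% Ask the runtime to send us {nodedown, Node} if the connection
    %% drops; monitor_node/2 also tries to connect if we are not yet.
    erlang:monitor_node(Node, true),
    {ok, Node}.

handle_info({nodedown, Node}, Node) ->
    %% Connection lost: keep retrying instead of dying with the peer.
    erlang:send_after(?RETRY_MS, self(), reconnect),
    {noreply, Node};
handle_info(reconnect, Node) ->
    case net_adm:ping(Node) of
        pong -> erlang:monitor_node(Node, true);  %% back up: re-arm
        pang -> erlang:send_after(?RETRY_MS, self(), reconnect)
    end,
    {noreply, Node};
handle_info(_Other, Node) ->
    {noreply, Node}.

handle_call(_Req, _From, Node) -> {reply, ok, Node}.
handle_cast(_Msg, Node) -> {noreply, Node}.
```

Running one of these per peer under a supervisor covers the "reconnect while the master is down" requirement; automatically restarting the remote nodes themselves still has to be built around slave:start_link/1 or external provisioning.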
If you were to use multiple Erlang VMs then you should carefully consider how many you run, as a large number of them may cause performance issues due to the OS swapping OS processes in and out. A rule of thumb for best performance is to aim for having no more than one OS process (i.e. one Erlang VM) per CPU core.
If you're interested in studying other implementations, Basho's riak_core framework has a pretty good take on decentralized distributed applications.
riak_core_node_watcher.erl has most of the interesting node observation code in it.
Search and you'll find there are quite a few talks and presentations about the framework.
