Unique agents across application instances and BEAM nodes (Erlang)

My requirement is to use named agents: basically, one agent per record, named with a custom ID. Can we look up an Agent by name across application instances and BEAM nodes? That is, if we run 2 instances of an app on 2 different BEAM nodes, we need to make sure there is only one agent per record, not more. How can I achieve this?

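Agent is basically a GenServer, so it supports the same three ways of registering its name. The first is a plain atom, which registers the name locally on the current node only. The second is {:global, term}, which registers it cluster-wide through Erlang's :global module. The third is {:via, module, term}, which delegates registration to module and is therefore global only if that module is a distributed registry (:global itself can be used as the module).
Of course, all the nodes must be connected for this to work.
Elixir's built-in Registry will not help here, since it is local to a single node. In this particular case {:global, name} should be good enough.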
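The property the answer relies on is that registering a name that is already taken fails. The sketch below is not Elixir. It is a toy, single-process Python model of that contract, loosely mirroring the yes/no result of Erlang's :global.register_name/2. The classes and the get_or_start_agent helper are hypothetical. The helper plays the role of starting an Agent under {:global, {:record, id}} and falling back to the agent that is already registered. Real cluster-wide uniqueness comes from :global, not from this code.

```python
import threading


class GlobalNameRegistry:
    """Toy, in-process stand-in for a cluster-wide name registry such as :global."""

    def __init__(self):
        self._lock = threading.Lock()
        self._names = {}

    def register_name(self, name, proc):
        # Same yes/no contract as :global.register_name/2: a taken name is refused.
        with self._lock:
            if name in self._names:
                return "no"
            self._names[name] = proc
            return "yes"

    def whereis_name(self, name):
        with self._lock:
            return self._names.get(name)


class RecordAgent:
    """Hypothetical placeholder for the Agent that owns one record's state."""

    def __init__(self, record_id):
        self.record_id = record_id
        self.state = {}


def get_or_start_agent(registry, record_id):
    # Plays the role of Agent.start_link(fun, name: {:global, {:record, id}}):
    # either the new agent wins the name, or we reuse the one already
    # registered (in Elixir, the {:error, {:already_started, pid}} case).
    name = ("record", record_id)
    candidate = RecordAgent(record_id)
    if registry.register_name(name, candidate) == "yes":
        return candidate
    return registry.whereis_name(name)


if __name__ == "__main__":
    registry = GlobalNameRegistry()
    first = get_or_start_agent(registry, 42)   # e.g. the app instance on node A
    second = get_or_start_agent(registry, 42)  # e.g. the app instance on node B
    print(first is second)  # True: one agent per record
```

In Elixir terms, the same get-or-start logic amounts to matching on both {:ok, pid} and {:error, {:already_started, pid}} from Agent.start_link/2 and using the pid in either case.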

Related

Calling specific instances of a docker service

I'm not exactly sure how to ask this question, or whether this is a valid approach. I am learning all about Docker, containers, etc. From what I have read, Docker is great for creating individual microservices that perform different tasks, such as BasketService, CartService, etc. Each of these can live in its own Docker container on a VM. If the UI is hosted on a Linux VM, I think the URLs it calls would be something along the lines of https://MyLinuxVM/BasketService/{controller}.
My Question:
Now let's say I have only one service. We will call it MyService, and it needs to have multiple instances. So I could have 4 instances, e.g. MyService1, MyService2, MyService3, MyService4, all exactly the same. From my client, would the following assumption be correct?
Can I call https://MyLinuxVM/MyService1/{controller} or https://MyLinuxVM/MyService2/{controller} to send a request to a specific container instance?
Why:
I feel this may help explain why I am doing this and possibly help everyone understand my problem in the first place. I have 4 physical devices I need to communicate with. We will call them Device1, Device2, Device3, Device4. Each device has its own IP Address, and its own set of "Tools" connected to it on various ports of the device (10-20 ports per device).
From our UI, users can click a button that sets some torque values for the tool in their hand. The data is sent to the MVC backend, which forwards it to the "correct" background worker/container. That worker then transforms the data into a byte[] and passes it along to its dedicated device. I am not sure whether I need multiple background workers in a single container, or a single configurable container with one background worker that gets deployed multiple times, depending on the number of devices running in the shop.
I have read a lot of things on creating different worker services that do different tasks, but I need multiple instances of a worker service that can be configured (preferably from db tables) to send to a specific device.

Share storage/volume between worker nodes in Kubernetes?

Is it possible to have a centralized storage/volume that can be shared between two pods/instances of an application that exist in different worker nodes in Kubernetes?
So to explain my case:
I have a Kubernetes cluster with 2 worker nodes, and each one runs 1 instance of app X. This means I have 2 instances of app X running at the same time in total.
Both instances subscribe to the topic topicX, which has 2 partitions, and both are part of the Apache Kafka consumer group groupX.
As I understand it the message load will be split among the partitions, but also among the consumers in the consumer group. So far so good, right?
So to my problem:
My whole solution is organized as a hierarchy in which each combination of country and ID is unique. Each combination of country and ID has a pickle model (a Python machine learning model), which is stored in a directory accessed by the application. For each combination of country and ID I receive one message per minute.
At the moment I have 2 countries, so to be able to scale properly I wanted to split the load between two instances of app X, each one handling its own country.
The problem is that Kafka can balance the messages between the different instances, so any instance may receive a message for any country. For each instance to reach the right pickle files without knowing which country a message belongs to, I have to store the pickle files in both instances.
Is there a way to solve this? I would rather keep the setup as simple as possible so it is easy to scale and add a third, fourth and fifth country later.
Keep in mind that this is an overly simplified way of explaining the problem. The number of instances is much higher in reality etc.
Yes, it's possible. According to the access-modes table in the Kubernetes persistent volumes documentation, any PersistentVolume (PV) type that supports ReadWriteMany will let your Kafka workers share the same data store. In summary, these are:
AzureFile
CephFS
Glusterfs
Quobyte
NFS
VsphereVolume (works when pods are collocated)
PortworxVolume
In my opinion, NFS is the easiest to implement. Note that AzureFile, Quobyte, and Portworx are paid solutions.
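Here is a sketch of the claim that app X's pods could share, written as a Python dict dumped to JSON, which kubectl accepts as a manifest. It assumes an RWX-capable StorageClass already exists in the cluster, for example one backed by an NFS provisioner. The claim name pickle-models, the storage class nfs-client, the 5Gi size, and the /models mount path are all hypothetical.

```python
import json


def shared_models_claim(name, storage_class, size="5Gi"):
    """PersistentVolumeClaim requesting ReadWriteMany, so pods on different nodes can mount it."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # RWX: the volume can be mounted read-write by many nodes at once.
            "accessModes": ["ReadWriteMany"],
            # Must name a StorageClass backed by an RWX-capable plugin (e.g. NFS).
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def models_volume(claim_name, mount_path="/models"):
    """Pod-spec fragments that mount the shared claim into app X's container."""
    volume = {"name": "models", "persistentVolumeClaim": {"claimName": claim_name}}
    mount = {"name": "models", "mountPath": mount_path}
    return volume, mount


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(shared_models_claim("pickle-models", "nfs-client"), indent=2))
```

The volume fragment goes under the pod template's spec.volumes and the mount under the container's volumeMounts. Every replica of app X then sees the same pickle files regardless of which node it lands on.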

Restrict service detection in OpenNMS based on "hostname"

I am able to restrict service detection based on the IP address. But if I want to use another parameter, like the hostname or node_label, for service detection, how do I configure that?
I need the exact config snippet for the hostname in default-foreign-source.xml.
P.S.: I am using the Discovery daemon, i.e. auto-discovery of nodes.
Any help would be appreciated.
The OpenNMS model is as follows:
node --> interface --> service
So OpenNMS has no way of associating a node label with a service. There is a BusinessServiceMonitor in development that will help deal with more complicated models, but it isn't in release code at the moment.
This is why you can't make the association you want.
You might work around this by labeling interfaces (via ifAlias) with tags and matching categories to those tags to exclude the service.
Also, you should never edit provisioning XML configuration files directly. OpenNMS utilizes caching for those configs for performance purposes and you can break your system (unlikely but possible).
I would also get away from using discovery. It limits your ability to separate groups of nodes into distinct requisitions, which is what lets you apply different sets of provisioning policies (filters, monitoring or not monitoring services, data collections) to different groups of nodes. Discovery operates only against the default foreign source policy, so you lose that kind of flexibility.

Is this the right way of building an Erlang network server for multi-client apps?

I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So, to allow Client A, who is logged in on Node A, to search (query Mnesia) for Client B, who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B or vice versa.
After Client A queries the PID and NODE of the Client B, then Client A and B can communicate with each other directly.
Is this the right way of establishing a connection between two client apps that are logged in on two different Erlang nodes?
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason for running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant, if you get it right. However, the problem becomes simpler if you know you will only ever run on a single node and need the other node only as a hot standby to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing (e.g. user ID 123 is pid <123.456.0>). If you also care about only one process running for User A at a time, you also need a lock, or you must allow only unique registrations of the address. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
Now, there are already some solutions out there that help solve your problem, with different trade-offs:
gproc in global mode allows registering a process under a given key (which gives you addressing and locking). This is distributed to the entire cluster, with no single point of failure. However, the leader election (at least when I last looked at it) works only for nodes that were available when the system started. Adding new nodes requires an experimental version of gen_leader or stopping the system. Within your own code, if you know two players are only ever going to talk to each other, you could start them on the same node.
riak_core allows you to build on top of the well-tested and proven architecture used in Riak KV and Riak Search. It maps keys into buckets in a way that lets you add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if they communicate heavily with each other, that traffic will go across the network.
Using mnesia with distributed transactions lets you guarantee that every node has the data before the transaction is committed. This would give you distribution of the addressing and locking, but you would have to build everything else on top of it (like releasing the lock). Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, because they are distributed, expect latency. Note 2: You should check exactly how you would add more nodes and have the tables replicated, for example whether that is possible without stopping mnesia.
ZooKeeper, doozer, or rolling your own provides a centralized, highly available database which you may use to store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way for your application to learn about the new nodes and start spawning processes there.
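The riak_core point about adding nodes and having keys redistributed can be made concrete with a generic consistent-hashing ring. This Python sketch shows that general technique only. It is not riak_core's implementation or API, and riak_core's ring has its own partitioning scheme. The node names, the vnodes count, and the use of SHA-1 are illustrative assumptions.

```python
import bisect
import hashlib


def ring_position(key):
    """Map any string onto a ring of 2**160 positions (SHA-1 digest read as an integer)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes; not riak_core's API."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several scattered points so the load spreads evenly.
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring has no nodes")
        # A key belongs to the first node point at or after its own position,
        # wrapping around to the start of the ring.
        point = ring_position(str(key))
        idx = bisect.bisect_left(self._points, point) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node_a", "node_b"])
    before = {user: ring.node_for(user) for user in range(1000)}
    ring.add_node("node_c")
    moved = [u for u in before if ring.node_for(u) != before[u]]
    print(len(moved), "of 1000 keys moved, all of them to node_c")
```

The property to notice is that adding a node only moves keys onto the new node, while every other key stays where it was. That is what lets a system built this way grow without reshuffling all of its processes.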
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized to a binary (as may all data within the VM). Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary, which you may store in anything that accepts binary data without mangling it.

Passing messages between remote MailboxProcessors?

I'm using MailboxProcessor classes in order to keep separate agents that do their own thing. Normally agents can communicate with one another in the same process, but I want agents to talk to one another when they are on separate processes or even different machines. What kind of mechanism is best for implementing communication between them? Is there some standard solution?
Please note that I'm using Ubuntu instances to run the agents.
I think you're going to have to write your own routines to serialize messages, pass them across the process boundaries, and then dispatch them on the other side. This will also require an implementation of an ID system, where each mailbox has an ID and processes can send messages to IDs instead of just calling Post on a MailboxProcessor directly. This is not easy, as local mailboxes will be able to access local memory, but remote mailboxes will not.
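The hand-rolled approach this answer describes is to serialize the message, ship it, and dispatch it by mailbox ID on the other side. Here it is sketched in Python for brevity rather than F#. MailboxRouter, the JSON envelope format, and the transport callable are all hypothetical. In F#, each local queue.Queue would be a MailboxProcessor fed through Post.

```python
import json
import queue


class MailboxRouter:
    """One per process: delivers messages to mailboxes by ID, local or remote.

    Hypothetical design for illustration. `transport(endpoint, data)` is any
    callable that ships bytes to another process (socket, message queue, ...).
    """

    def __init__(self, transport=None):
        self.local = {}    # mailbox ID -> queue.Queue owned by this process
        self.remote = {}   # mailbox ID -> endpoint of the process that owns it
        self.transport = transport

    def register_local(self, mailbox_id):
        inbox = queue.Queue()
        self.local[mailbox_id] = inbox
        return inbox

    def register_remote(self, mailbox_id, endpoint):
        self.remote[mailbox_id] = endpoint

    def send(self, mailbox_id, payload):
        if mailbox_id in self.local:
            # Same process: hand the object over directly, no copy needed.
            self.local[mailbox_id].put(payload)
        elif mailbox_id in self.remote:
            # Crossing a boundary: only serialized data travels, never shared memory.
            envelope = json.dumps({"to": mailbox_id, "payload": payload})
            self.transport(self.remote[mailbox_id], envelope.encode("utf-8"))
        else:
            raise LookupError(f"unknown mailbox ID: {mailbox_id!r}")

    def dispatch(self, data):
        """Called by the receiving side's network layer with raw bytes."""
        envelope = json.loads(data.decode("utf-8"))
        inbox = self.local.get(envelope["to"])
        if inbox is None:
            raise LookupError(f"no local mailbox {envelope['to']!r}")
        inbox.put(envelope["payload"])
```

The split between the local and remote branches is the difficulty the answer points out. Local senders may pass arbitrary objects, while remote ones are limited to whatever the serializer (JSON here) can represent.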
I would look at something like RPyC (http://rpyc.wikidot.com/) as it provides a protocol somewhat like you are looking for.
Basically the answer is 'no' there isn't really a good way to do this.
