I'm using Icinga2 with NSClient++
I have a PowerShell check for certain cluster roles which is installed on every cluster node.
Should a cluster role fail, all cluster nodes send out identical notifications, which results in a lot of spam for just one actual service problem.
Installing the check on only one cluster node is not an option, as that would create a single point of failure for role monitoring: a failing cluster node should not affect the cluster roles themselves (aside from a short failover timeout), but I would no longer be able to check any cluster role while that node is down.
Is it possible to assign a service to a hostgroup in a way that only one notification will be sent if this service fails?
I ended up having the check itself decide whether it should report a problem as critical (the service on the node itself failed) or as warning/ok (the service on another node failed).
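For illustration only, here is a rough sketch of that decision logic in Python (the real check is a PowerShell plugin; the role, state and owner values would come from something like Get-ClusterGroup, and all names here are placeholders). It maps the three cases onto the standard Icinga/Nagios plugin exit codes.

import socket
import sys

OK, WARNING, CRITICAL = 0, 1, 2  # standard Icinga/Nagios plugin exit codes

def evaluate(role, state, owner_node, local_node):
    """Decide the severity for one cluster role based on whether this node owns it."""
    if state == "Online":
        return OK, f"OK: {role} is online on {owner_node}"
    if owner_node.lower() == local_node.lower():
        # The failed role belongs to this node, so only this node alerts critical.
        return CRITICAL, f"CRITICAL: {role} failed on this node"
    # The failed role belongs to another node; report it non-critically here.
    return WARNING, f"WARNING: {role} failed on {owner_node}"

if __name__ == "__main__":
    # Placeholder values; the real check would query the cluster for them.
    code, message = evaluate("SQL-Role", "Failed", "NODE2", socket.gethostname())
    print(message)
    sys.exit(code)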
Has anyone ever had the following error when installing a Neo4j cluster with 3 nodes? I followed the instructions on the site https://neo4j.com/docs/operations-manual/current/clustering/setup/deploy/ and got: Caused by: com.neo4j.causalclustering.seeding.FailedValidationException: The seed validation failed with response [RemoteSeedValidationResponse{status=FAILURE, remote=XXXX:6000
Clustering is a tricky subject. Neo4j uses several ports for clustering; port 6000, specifically, is used for cluster transactions, and in this case it is failing the seed validation process. I can see a couple of potential reasons for this:
- The network's control plane doesn't allow you to establish a connection (check the reachability of the ports; a minimal reachability check is sketched below).
- Your cluster configuration is not compatible with the environment you are running in (check which interface the server listens on, that the cluster DNS resolves your addresses, and that the cluster is not using a loopback address as the interface; setting it to 0.0.0.0 listens on all interfaces).
- You are trying to start a cluster with pre-existing data that is out of sync (term). (Drain your nodes, remove all instances and volumes, and if you can, delete the namespace and try again.)
- Your scheduler does not have enough cluster resources to start all cluster servers with the given constraints, so the minimum number of nodes required to run clustering is not available (there should not be more than one DB instance per node). Scale out your cluster or check your settings for taints, tolerations, or topology configuration.
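For the first point, a quick TCP connect test from one cluster member to the others tells you whether port 6000 is reachable at all. A minimal Python sketch (the host names are placeholders for your servers' advertised addresses):

import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with the advertised addresses of the other cluster members.
for host in ("neo4j-core-1", "neo4j-core-2", "neo4j-core-3"):
    status = "reachable" if port_reachable(host, 6000) else "NOT reachable"
    print(f"{host}:6000 {status}")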
Hope this helps you troubleshoot your issue.
I have 2 Swarm nodes and I wish that, in case one node shuts down, the other one would reschedule all services onto itself.
Right now I have one leader (manager) and one worker, and it works perfectly if the worker goes down, because the leader reschedules all services onto itself.
My problem is when the leader goes down and nothing takes over the services that were running on it.
I already tried with two managers, but that didn't work.
So I am thinking about keeping all my services on the worker node, so that if the leader node goes down there is no problem at all, and if the worker node goes down, the leader node would reschedule all services onto itself.
I tried with
deploy:
placement:
constraints:
- "node.role!=manager"
But that also does not work, because it will never schedule this service on a manager node.
So I would like to ask: is there any way to make each of those two nodes take over all services when the other one goes down?
or
Is there a way to configure a service to "preferably" be deployed on one specific node if that node is available, and otherwise be deployed on any other node?
The rub of it is: you need 3 nodes, all managers. It is not a good idea, even with a 2-node swarm, to make both nodes managers, as Docker Swarm uses the Raft protocol for manager quorum, and this protocol requires a clear majority. With two manager nodes, if either node goes down, the remaining manager only represents 50% of the swarm managers and so will not manage the swarm until quorum is restored.
Once you have 3 nodes - all managers - the swarm will tolerate any single node's failure and move tasks to the other two nodes.
Don't bother with 4 manager nodes: they don't provide extra protection from single-node failures, and they don't protect from two-node failures either, as only 2 out of 4 does not represent more than 50%. To survive 2 node failures you want 5 managers.
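The arithmetic behind those numbers is just the Raft majority rule: with N managers, the swarm stays manageable only while more than half of them are up, so it tolerates floor((N - 1) / 2) manager failures. A quick sketch:

def tolerated_manager_failures(n_managers: int) -> int:
    """Raft requires a strict majority of managers to be alive,
    so N managers tolerate floor((N - 1) / 2) failures."""
    return (n_managers - 1) // 2

for n in range(1, 8):
    print(f"{n} manager(s) -> survives {tolerated_manager_failures(n)} failure(s)")
# 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3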
Currently, I am doing some R&D on the ThingsBoard IoT platform. I am planning to deploy it in cluster mode.
When it is deployed, how do two ThingsBoard servers communicate with each other?
This question came to my mind because a particular device can send a message to one ThingsBoard server (A), but the message might actually need to be transferred to another server (B), since a node on server B is processing that particular device's messages (as far as I know, ThingsBoard nodes use a device hash to route messages).
How does the Kafka stream forward that message to the right node in a cluster?
I read the official documentation and did some googling, but couldn't find exact answers.
ThingsBoard uses ZooKeeper for service discovery.
Each ThingsBoard microservice knows which other services are running somewhere in the cluster.
All communication goes through message queues (Kafka is a good choice).
Each topic has several partitions, and each partition is assigned to a particular node.
Messages for a device are hashed by originator id and always pushed to the same partition number. There is no direct communication between nodes.
In case some nodes crash, or the cluster is simply scaled up/down, ZooKeeper fires a repartition event on each node, and the existing partitions are reassigned according to the live node count. The device service follows the same logic.
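As a rough illustration of that routing rule (this is not ThingsBoard's actual hash function, just the idea): the partition is derived deterministically from the originator id, so messages for one device always land on the same partition, and whichever node currently owns that partition processes them.

import hashlib

def partition_for(originator_id: str, num_partitions: int) -> int:
    """Map an originator id onto a stable partition number (illustrative only)."""
    digest = hashlib.md5(originator_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same device always maps to the same partition for a fixed partition count,
# so exactly one node consumes its messages; after a repartition event the
# partitions themselves get reassigned across the live nodes.
print(partition_for("1ea1f7c0-device-id", 12))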
That is all the magic. Simple and effective. Hope it helps with the ThingsBoard cluster architecture.
I want to set up an environment where I have several VMs, representing several partners, and where each VM hosts one or more nodes. Ideally, I would use Kubernetes to bring my environment up and down. I have understood from the docs that this has to be done as a dev network, not as my own compatibility zone or anything.
However, the steps to follow are not clear (to me). I have tried Dockerform and the provided Docker image, but neither seems to be the way to do what I need.
My current understanding (it changes with the hours) is that:
a) I should create a network between the VMs that will be hosting nodes. To do so, I understand I should use Cordite or the bootstrap JAR. The Cordite documentation seems clearer than the Corda docs, but I haven't been able to try it yet. Should one or the other be my first step? Can anyone shed some light on how?
b) Once I have my network created, I need a certifying entity (thanks @Chris_Chabot for pointing it out!)
c) The next step should be running deployNodes so that I create the config files. Here, I am not sure whether I can indicate in deployNodes at which IPs the nodes should be created, or whether I just need to create the Dockerfiles, certificate folders and so on, and distribute them across the VMs accordingly. I am not sure either about how to point the nodes to the network map service.
Personally, I guess that I will not use the Dockerfiles if I am going to use Kubernetes, and that I only need to distribute the certificates and config files to all the slave VMs so they are available to the nodes when they are launched.
To be clear, and honest :D, this is even before including any CorDapp in the containers; I am just trying to have the environment ready. Basically, starting a process that builds the nodes, distributes the config files among the slave VMs, and runs the Docker containers with the nodes. As explained in a comment, the goal here is not testing CorDapps, it is testing how to deploy an operative distributed dev environment.
ANY help is going to be ABSOLUTELY welcome.
Thanks!
(Developer Relations @ R3 here)
A network of Corda nodes needs three things:
- A notary node, or a pool of multiple notary nodes
- A certification manager
- A network map service
The certification manager is the root of the trust in the network, and, well, manages certificates. These need to be distributed to the nodes to declare and prove their identity.
The nodes connect to the network map service, which checks their certificate to see if they have access to the network, and if so, adds them to the list of nodes that it manages -- and distributes this list of node identities + IP addresses to all the nodes on that network.
Finally the nodes use the notaries to sign the transactions that happen on the network.
Generally we find that most people start developing on the https://testnet.corda.network/ network, and later deploy to the production corda.network.
One benefit of that is that it already comes with all these pieces (certification manager, network map, and a geographically distributed pool of notaries). The other benefit is that it guarantees interoperability with other parties in the future, as everyone uses the same root certificate authority -- with your own network, other 3rd parties couldn't just connect, as they'd be on a different cert chain and couldn't be validated.
If however you have a strong reason to want to build your own network, you can use Cordite to provide the network map and certman services.
In that case step 1 is to go through the setup and configuration instructions on https://gitlab.com/cordite/network-map-service
Once that is fully setup and up and running, https://docs.corda.net/permissioning.html has some more information on how the certificates are setup, and the "Joining an existing Compatibility Zone" section in https://docs.corda.net/docker-image.html has instructions on how to get a Corda docker image / node to join that network by specifying which network map / certman url's to use.
Oh, and on the IP network question: the network map service stores a combination of the X509 identity and the IP address for each node, which it distributes to the network -- this means that every participant, including the notaries, certman, network map and all nodes, needs to be able to connect to that IP address, either by all being on the same network that you created, or by having public IP addresses.
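Purely as a conceptual illustration of that last point (this is not Corda's API, just the shape of the data): the network map is essentially a published list of (X.500 identity, address) entries that every node downloads, which is why every address in it must be reachable by every other participant.

from dataclasses import dataclass

@dataclass
class NodeEntry:
    """Simplified stand-in for one network map entry."""
    x500_name: str   # identity from the node's certificate
    address: str     # advertised host:port that all participants must reach

# Hypothetical entries; in a real network these are published by the network map service.
network_map = [
    NodeEntry("O=PartyA, L=London, C=GB", "10.0.1.10:10002"),
    NodeEntry("O=PartyB, L=Madrid, C=ES", "10.0.1.11:10002"),
    NodeEntry("O=Notary, L=Zurich, C=CH", "10.0.1.12:10002"),
]

def address_of(name: str) -> str:
    return next(entry.address for entry in network_map if entry.x500_name == name)

print(address_of("O=PartyB, L=Madrid, C=ES"))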
I am new to ejabberd cluster setup. I have been trying to set up an ejabberd cluster for the past week, but I still have not got it working.
1. After the clustering setup I got output like running db nodes = ['ejabberd@first.example.com','ejabberd@second.example.com'], which looks fine so far.
After that I log in with the PSI+ client, using the credentials username: one@first.example.com and password: xxxxx.
Then I stopped the ejabberd@first.example.com node, and my PSI+ client connection also went down.
So why does it not automatically connect to my second server, ejabberd@second.example.com?
How can I achieve ejabberd clustering so that, if one node crashes, another node maintains the connection automatically?
Are you trying to set up one cluster, or federate two clusters? If just one cluster, they should share the same domain (either first.example.com or second.example.com).
Also, when there's a node failure, your client must reconnect (not sure what PSI does), and you need to have all nodes in your cluster behind a VIP so the reconnect attempt will find the next available node.
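To illustrate the reconnect side (a conceptual sketch only, not a real XMPP client; the host names are placeholders): whether the client walks through a list of hosts itself or everything sits behind a single VIP/DNS name, the logic is the same -- on failure, try the next reachable endpoint.

import socket
import time

# Placeholders: the VIP first, then the individual cluster nodes as fallbacks.
ENDPOINTS = [
    ("xmpp.example.com", 5222),
    ("first.example.com", 5222),
    ("second.example.com", 5222),
]

def connect_with_failover(endpoints, timeout=5.0, retry_delay=2.0):
    """Try each endpoint in order until one accepts a TCP connection."""
    while True:
        for host, port in endpoints:
            try:
                sock = socket.create_connection((host, port), timeout=timeout)
                print(f"connected to {host}:{port}")
                return sock  # a real client would start the XMPP stream here
            except OSError:
                print(f"{host}:{port} unavailable, trying next")
        time.sleep(retry_delay)

# sock = connect_with_failover(ENDPOINTS)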