Has anyone ever had the following error when installing a Neo4j cluster with 3 nodes? Caused by: com.neo4j.causalclustering.seeding.FailedValidationException: The seed validation failed with response [RemoteSeedValidationResponse{status=FAILURE, remote=XXXX:6000
I followed the instructions on the site https://neo4j.com/docs/operations-manual/current/clustering/setup/deploy/
Clustering is a tricky subject. Neo4j uses several ports for clustering; port 6000, specifically, carries cluster transactions, and in this case it is failing the seed validation process. I can see a couple of potential reasons for this:
- The network's control plane does not allow you to establish a connection (check the reachability of the ports; a small probe is sketched after this list).
- Your cluster configuration is not compatible with the environment you are running in (check which interface the server listens on, check that the cluster DNS resolves your addresses, and check that the cluster is not using a loopback address as the interface; setting it to 0.0.0.0 listens on all interfaces).
- You are trying to start a cluster with pre-existing data that is out of sync (term). (Drain your nodes, remove all instances and volumes, and if you can, delete the namespace and try again.)
- Your scheduler does not have enough cluster resources to start all cluster servers with the given constraints, so the minimum number of nodes required for clustering never becomes available (there should not be more than one DB instance per node). Scale out your cluster, or check your settings for taints, tolerations and topology constraints.
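For the first point, a quick probe run from each server against the other members' cluster ports usually settles whether the control plane is the problem. A minimal sketch, assuming the hostnames below are placeholders and using the Neo4j default ports for discovery, cluster transactions, Raft and Bolt (adjust both to your deployment):

```python
# Minimal TCP reachability probe -- run it from every cluster member against the others.
# Hostnames are placeholders; ports are the Neo4j defaults (discovery, transactions, raft, bolt).
import socket

SERVERS = ["neo4j-core-1", "neo4j-core-2", "neo4j-core-3"]
PORTS = [5000, 6000, 7000, 7687]

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in SERVERS:
    for port in PORTS:
        status = "open" if reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} -> {status}")
```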
Hope this helps you troubleshoot your issue.
I have created a virtual private network across two clusters in different regions. While the masters are able to communicate, I can't reach a worker across the virtual private network. I assume this is due to anti-spoofing measures.
How I tried to fix it
I read (https://www.packetcoders.io/openstack-neutron-port-security-explained/ and https://www.redhat.com/en/blog/whats-coming-openstack-networking-kilo-release) that one needs to add allowed address pairs to the instances so that those addresses can reach the worker. However, I am unsure which IP needs to be allowed. I added the master's local IP in the subnet, its floating IP, and its IP in the virtual private network. None of these fixed my issue; I am still unable to reach the workers.
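For reference, the change I applied looked roughly like the sketch below (written with openstacksdk; the cloud name, port name and addresses are placeholders for my actual values):

```python
# Roughly what I did: add allowed address pairs to an instance's Neutron port.
# "mycloud", the port name and the addresses are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")

# The Neutron port of the worker instance I cannot reach (repeated for the other instances).
port = conn.network.find_port("worker-node-port", ignore_missing=False)

conn.network.update_port(
    port,
    allowed_address_pairs=[
        {"ip_address": "10.0.0.11"},    # master's local IP in the subnet
        {"ip_address": "203.0.113.5"},  # master's floating IP
        {"ip_address": "10.8.0.1"},     # master's address inside the WireGuard network
    ],
)
```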
What I am looking for
A complete answer would be great, but ways to debug this further would be highly appreciated, too.
More info to avoid the XY problem
I have started two clusters in different regions and have connected the masters using WireGuard. They can already ping each other using subnet addresses, which is why I think my problem is not WireGuard-related. If you think otherwise, I am happy to give additional info on this setup to avoid asking the wrong question.
I am new to docker swarm. I have read the documentation and googled the topic, but the results were vague.
Is it possible to add worker or manager nodes from distinct and separate virtual private servers?
The idea is to connect many unrelated hosts into a swarm, which then creates distribution over many systems and resiliency in case of any HW failures.
The only thing you need to watch out for is that the internet connection between the hosts is stable and that all of the ports listed in the official documentation are open. Then you are good to go :) (a rough init/join sketch follows below).
Oh, and between managers you want a VERY stable internet connection without any random ping spikes, or you may encounter weird behaviour (because of Raft consensus and decision making).
Other than that, it is good.
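If it helps, here is a rough sketch of initialising the swarm on one VPS and joining a second, unrelated one over the public internet, written with the Docker SDK for Python. The SSH URLs and public addresses are placeholders, and SSH access to both daemons is an assumption on my part; the ports that must be open between the hosts are 2377/tcp, 7946/tcp+udp and 4789/udp.

```python
# Rough sketch with the Docker SDK for Python (docker-py).
# SSH URLs and public IP addresses are placeholders; SSH access to both daemons is assumed.
import docker

# Daemon on the first VPS -- becomes the first manager, advertising its public address.
manager = docker.DockerClient(base_url="ssh://root@198.51.100.10")
manager.swarm.init(advertise_addr="198.51.100.10", listen_addr="0.0.0.0:2377")
worker_token = manager.swarm.attrs["JoinTokens"]["Worker"]

# Daemon on a second, completely separate VPS -- joins over the public internet.
worker = docker.DockerClient(base_url="ssh://root@203.0.113.20")
worker.swarm.join(
    remote_addrs=["198.51.100.10:2377"],
    join_token=worker_token,
    advertise_addr="203.0.113.20",  # this host's own public address
)
```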
Refer to Administer and maintain a swarm of Docker Engines
In production, the best practice to maximise swarm HA is to spread your swarm managers across multiple availability zones. Availability zones are geographically co-located but distinct sites: instead of having a single London data centre, have three, each connected to a different internet and power utility. That way, if any single ISP or power utility has an outage, you still have two data centres connected to the internet.
Swarm was designed with this kind of highly available topology in mind and can scale to having its managers -- and workers -- distributed across nodes in different data centres.
However, Swarm is sensitive to latency over longer distances, so global distribution is not a good idea. Within a single city, data centre to data centre latencies will be in the low tens of milliseconds, which is fine.
Connecting data centres in different cities or continents moves the latency into the low-to-mid hundreds of milliseconds, which does cause problems and leads to instability.
Otherwise, go ahead. Build your swarm across AZ distributed nodes.
Currently, I am doing some R&D on the Thingsboard IoT platform. I am planning to deploy it in cluster mode.
When it is deployed, how do two Thingsboard servers communicate with each other?
This question came to mind because a particular device can send a message to one Thingsboard server (A), but the message might actually need to be transferred to another server (B), since a node on server B is processing that particular device's messages (as I understand it, Thingsboard nodes use a device hash to handle messages).
How does the Kafka stream forward that message accordingly in a cluster?
I read the official documentation and did some googling, but couldn't find exact answers.
Thingsboard uses Zookeeper for service discovery.
Each Thingsboard microservice knows which other services are running somewhere in the cluster.
All communication goes through message queues (Kafka is a good choice).
Each topic has several partitions, and each partition is assigned to a particular node.
Messages for a device are hashed by the originator id and always pushed to the same partition. There is no direct communication between nodes (a small sketch of this idea follows below).
If some nodes crash, or the cluster is simply scaled up or down, Zookeeper fires a repartition event on each node, and the existing partitions are reassigned according to the live node count. The device service follows the same logic.
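If it helps, the partitioning idea can be shown with a tiny sketch. This is only the concept, not Thingsboard code; the partition count and node names are made up:

```python
# Concept sketch only -- not Thingsboard code. A device's messages always hash to the
# same partition, and partitions are spread over however many nodes are currently alive.
import hashlib

NUM_PARTITIONS = 12  # made-up value; fixed per topic

def partition_for(originator_id: str) -> int:
    """Stable hash of the originator (device) id onto a topic partition."""
    digest = hashlib.sha256(originator_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def owner_node(partition: int, live_nodes: list) -> str:
    """Each partition is owned by exactly one of the currently live nodes."""
    return live_nodes[partition % len(live_nodes)]

device = "device-42"
p = partition_for(device)
print(p, owner_node(p, ["node-a", "node-b", "node-c"]))  # three nodes alive
print(p, owner_node(p, ["node-a", "node-b"]))            # after one node crashes -> repartition
```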
That is all the magic. Simple and effective. Hope it helps with the Thingsboard cluster architecture.
I want to set up an environment where I have several VMs, representing several partners, and where each VM hosts one or more nodes. Ideally, I would use Kubernetes to bring my environment up and down. I have understood from the docs that this has to be done as a dev network, not as my own compatibility zone or anything.
However, the steps to follow are not clear (to me). I have tried Dockerform and the docker image provided, but neither seems to be the way to do what I need.
My current (it changes with the hours) understanding is that:
a) I should create a network between the VMs that will be hosting nodes. To do so, I understand I should use Cordite or the Bootstrap jar. The Cordite documentation seems clearer than the Corda docs, but I haven't been able to try it yet. Should one or the other be my first step? Can anyone shed some light on how?
b) Once I have my network created, I need a certifying entity (thanks @Chris_Chabot for pointing it out!).
c) The next step should be running deployNodes so I create the config files. Here, I am not sure whether I can indicate in deployNodes at which IPs the nodes should be created, or whether I just need to create the dockerfiles, certificate folders and so on, and distribute them across the VMs accordingly. I am not sure either about how to point to the network map service.
Personally, I guess that I will not use the Dockerfiles if I am going to use Kubernetes and that I only need to distribute the certificates and config files to all the slave VMs so they are available to the nodes when they are to be launched.
To be clear (and honest :D), this is even before including any CorDapp in the containers; I am just trying to have the environment ready. Basically: starting a process that builds the nodes, distributes the config files among the slave VMs, and runs the docker containers with the nodes. As explained in a comment, the goal here is not testing CorDapps, it is testing how to deploy an operative, distributed dev environment.
ANY help is going to be ABSOLUTELY welcome.
Thanks!
(Developer Relations @ R3 here)
A network of Corda nodes needs three things:
- A notary node, or a pool of multiple notary nodes
- A certification manager
- A network map service
The certification manager is the root of the trust in the network, and, well, manages certificates. These need to be distributed to the nodes to declare and prove their identity.
The nodes connect to the network map service, which checks their certificate to see if they have access to the network and, if so, adds them to the list of nodes that it manages -- and distributes this list of node identities + IP addresses to all the nodes on that network.
Finally the nodes use the notaries to sign the transactions that happen on the network.
Generally we find that most people start developing on the https://testnet.corda.network/ network, and later deploy to the production corda.network.
One benefit of that is that it already comes with all these pieces (certification manager, network map, and a geographically distributed pool of notaries). The other benefit is that it guarantees interoperability with other parties in the future, as everyone uses the same root certificate authority -- with your own network, other third parties couldn't just connect, as they'd be on a different cert chain and couldn't be validated.
If however you have a strong reason to want to build your own network, you can use Cordite to provide the network map and certman services.
In that case step 1 is to go through the setup and configuration instructions on https://gitlab.com/cordite/network-map-service
Once that is fully set up and running, https://docs.corda.net/permissioning.html has some more information on how the certificates are set up, and the "Joining an existing Compatibility Zone" section in https://docs.corda.net/docker-image.html has instructions on how to get a Corda docker image / node to join that network by specifying which network map / certman URLs to use.
Oh, and on the IP network question: the network map service stores a combination of the X509 identity and the IP address for each node, which it distributes to the network -- this means that every participant (the notaries, certman, network map and all nodes) needs to be able to connect to that IP address, either by all being on the same network that you created, or by having public IP addresses.
I have a minor bosun setup, and it's collecting metrics from numerous services; we are planning to scale these services in the cloud.
This will mean more data coming into bosun, and hence the load/efficiency/scale of bosun is affected.
I am afraid of losing data due to network overhead, and in case of failures.
I am looking for any performance benchmark reports for bosun, or any inputs on benchmarking/testing bosun for scale and HA.
Also, any inputs on good practices to be followed to scale bosun will be helpful.
My current thinking is to run numerous bosun binaries as a cluster, backed by a distributed opentsdb setup.
Also, I am wondering whether it is worthwhile to run some bosun instances as plain 'collectors' of scollector data (with the bosun -n command), and others just to calculate the alerts.
The problem with this approach is that the same alerts might be triggered from multiple bosun instances (those running without the -n option). Is there a better way to de-duplicate the alerts?
The current best practices are:
Use https://godoc.org/bosun.org/cmd/tsdbrelay to forward metrics to opentsdb. This gets the bosun binary out of the "critical path". It should also forward the metrics to bosun for indexing, and can duplicate the metric stream to multiple data centers for DR/Backups.
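For a feel of what that write path looks like from a collector's point of view, here is a minimal sketch: datapoints go to the relay's /api/put endpoint in the standard OpenTSDB JSON format. The relay host and port below are placeholders for your tsdbrelay (or the load balancer in front of it).

```python
# Minimal sketch: push one datapoint at tsdbrelay's /api/put endpoint, which forwards it
# to OpenTSDB and on to bosun for indexing. Host and port are placeholders.
import time
import requests

RELAY = "http://tsdbrelay.example.com:4242"  # placeholder: your relay or its HAProxy VIP

datapoint = {
    "metric": "app.requests.count",
    "timestamp": int(time.time()),
    "value": 1,
    "tags": {"host": "web01", "dc": "primary"},
}

resp = requests.post(f"{RELAY}/api/put", json=[datapoint], timeout=5)
resp.raise_for_status()  # OpenTSDB answers 204 No Content on success
```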
Make sure your hadoop/opentsdb cluster has at least 5 nodes. You can't do live maintenance on a 3 node cluster, and hadoop usually runs on a dozen or more nodes. We use Cloudera Manager to manage the hadoop cluster, and others have recommended Apache Ambari.
Use a load balancer like HAProxy to split the /api/put write traffic across multiple instances of tsdbrelay in an active/passive mode. We run one instance on each node (with tsdbrelay forwarding to the local opentsdb instance) and direct all write traffic at a primary write node (with multiple secondary/backup nodes).
Split the /api/query traffic across the remaining nodes, pointed directly at opentsdb (no need to go through the relay), in an active/active mode (aka round robin or hash-based routing). This improves query performance by balancing queries across the non-write nodes.
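On the read side, queries are just normal OpenTSDB /api/query calls aimed at the read pool (or the VIP in front of it) instead of the relay. A rough sketch, with placeholder host, metric and tag values:

```python
# Rough sketch of the read path: /api/query goes straight at the OpenTSDB read pool,
# not through tsdbrelay. Host, metric and tag values are placeholders.
import requests

READ_POOL = "http://opentsdb-read.example.com:4242"  # placeholder: round-robin VIP of read nodes

query = {
    "start": "1h-ago",
    "queries": [
        {"aggregator": "sum", "metric": "app.requests.count", "tags": {"dc": "primary"}},
    ],
}

resp = requests.post(f"{READ_POOL}/api/query", json=query, timeout=10)
resp.raise_for_status()
for series in resp.json():
    print(series["metric"], series["tags"], len(series["dps"]), "datapoints")
```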
We only run a single bosun instance in each datacenter, with the DR site using the read only flag (any failover would be manual). It really isn't designed for HA yet, but in the future may allow two nodes to share a redis instance and allow active/active or active/passive HA.
By using tsdbrelay to duplicate the metric streams you don't have to deal with opentsdb/hbase replication and instead can setup multiple isolated monitoring systems in each datacenter and duplicate the metrics to whichever sites are appropriate. We have a primary and a DR site, and choose to duplicate all metrics to both data centers. I actually use the DR site daily for Grafana queries since it is closer to where I live.
You can find more details about production setups at http://bosun.org/resources including copies of all of the haproxy/tsdbrelay/etc configuration files we use at Stack Overflow.