The Neo4j documentation states that when running in HA mode, you get eventual consistency. This is a quote from that page:
All updates will however propagate from the master to other slaves
eventually so a write from one slave may not be immediately visible on
all other slaves
My question is: is there a configuration that will give me a cluster with strong consistency, of course at the cost of reduced performance? I'm looking for some sort of active-passive failover cluster configuration.
There is such a config option. ha.tx_push_factor determines how many slaves a transaction should be pushed to synchronously. When setting ha.tx_push_factor=<clustersize>-1 you get immediate full consistency.
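For example, a minimal sketch of the relevant neo4j.properties lines for a 3-instance cluster (the server id and values here are just illustrative):

    # conf/neo4j.properties on each instance (illustrative values)
    ha.server_id=1
    # cluster size 3 => push each commit to 3 - 1 = 2 slaves synchronously
    ha.tx_push_factor=2

With that setting a write is only acknowledged once both slaves have received the transaction, which is what buys the stronger consistency at the cost of write latency.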
Related
I have a Jenkins master and two agents. However, the connectivity to one agent (agentA) is a bit shaky and I want to use the other agent (agentB) when the connectivity to the first one is not available.
I am only using the Jenkins web interface and have not used scripts. I am trying to figure out how it can be done using the "Restrict where this project can be run" option in the job's configuration. I tried using agentA || agentB but when agentA is not available it hangs saying "pending - agentA is offline".
Is it possible to have a configuration to achieve what I need?
I can't leave it blank because I have other agents (agentC, agentD) that I don't want this job to run on.
I am not an admin of the Jenkins server, hence adding new plugins is not my preferred option but it can be done.
As noted in the Least Load plugin documentation,
By default Jenkins tries to allocate a job to the last node it was executed on. This can result in nodes being left idle while other nodes are overloaded.
Since you generalized the example, I'm not 100% sure whether your situation can be solved simply by better labelling of your nodes, or whether you want to look at the Least Load plugin (it is designed for balancing the load across nodes). Your example appears to use node names (i.e. agentA/agentB). If the queue allocation logic is "only A or only B", Jenkins sticks to it, and load balancing may not address that: while a node (a Computer) name is also a label, it may have additional logic tied to it.
If you label the pair of nodes in a pool with a common label, say "CapabilityA", and constrain your jobs to run where "CapabilityA" rather than on the node names, you may find jobs float across the pool (to B if A is not available). That's how we have our nodes labelled - by capability - and we see jobs floating across nodes, though only once the first node is full (4 executors each), so it's not balanced.
Nodes can have many labels and you can use label conditions to have complex constraints.
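For illustration (using the agent names from the question; the label name "CapabilityA" is just an example), the setup could look like this:

    # On each node's configuration page (Manage Jenkins -> Manage Nodes):
    #   agentA  ->  Labels: CapabilityA
    #   agentB  ->  Labels: CapabilityA
    #
    # In the job's "Restrict where this project can be run" field:
    #   CapabilityA
    #
    # Label expressions also support operators if you need tighter constraints:
    #   CapabilityA && windows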
How do people detect and automate the replacement of a dead Swarm manager?
That seems important considering: "If the swarm loses the quorum of managers, the swarm cannot perform management tasks."
You need to implement this with an external monitoring solution. It's not a built-in capability of Docker swarm mode.
Implementing this solution will be non-trivial. First, keep in mind that when you promote a node, you are giving it full administrative access over the swarm, where a normal worker has none of that access, so make sure your security model is okay with this change. You also need to avoid cascading failures, where an overload of one manager causes it to fail, and automatically promoting other nodes causes them to fail in turn as the existing workload is redistributed to fewer and fewer nodes, until there are no workers left. Lastly, when you add a new manager, you'll need to consider what to do with the reference to the currently failed manager. If it recovers, do you want it to continue where it left off, or do you want it completely removed from the swarm to reduce the number of nodes needed for quorum?
One last thing to note is that when you lose quorum, nodes will continue to run the containers they have started. The only thing you lose is the ability to manage and make changes to that infrastructure. Therefore most places I've seen have 3 or 5 managers, depending on the level of fault tolerance needed, and often make the managers virtual so that if a failure occurs, the VM image can easily be restarted elsewhere in the environment.
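A rough sketch of what such an external monitor could do (the script, node name and threshold below are hypothetical; it only illustrates the docker node commands involved, and it must run while the remaining managers still have quorum, since promotion is itself a management operation):

    #!/bin/sh
    # Hypothetical watchdog, run on a healthy manager or against a remote Docker API endpoint.
    WANTED_MANAGERS=3
    STANDBY_NODE=worker-standby-1   # hypothetical name of a pre-approved worker

    # Count managers that are currently Leader or Reachable.
    reachable=$(docker node ls --filter role=manager \
        --format '{{.Hostname}} {{.ManagerStatus}}' | grep -c -E 'Leader|Reachable')

    if [ "$reachable" -lt "$WANTED_MANAGERS" ]; then
        echo "Only $reachable reachable managers, promoting $STANDBY_NODE"
        docker node promote "$STANDBY_NODE"
        # Once you are sure the failed manager is gone for good, clean it up:
        # docker node demote <failed-node> && docker node rm <failed-node>
    fi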
Is there any difference between creating two slaves and creating one slave with two executors, on the same Windows server?
Yes, there is a difference: It's about memory consumption and effort of maintenance/administration.
Starting a slave on a system starts a (main) process. This process costs (private) main memory to run and connects to the master.
Each executor is a sub-process of the main process.
It is therefore apparent that running two executors on one slave costs less memory in total than running two slaves (with one executor each), as the memory consumption of the main process would be incurred twice:
2 * Main Processes + 2 * Executors > 1 * Main Process + 2 * Executors
Moreover, administrating a slave is more effort than administrating just an executor: whilst an executor has virtually nothing to worry about, there are numerous things to configure for a slave. Additionally, the capabilities of the two slaves would be the same anyway (they run on the same OS, as you said), so there is little added value in assigning them different labels.
In short, if there are no other boundary conditions that make me do it differently, I would always prefer running two executors on one slave, as this is easier to administrate and saves some memory.
A slave is a "machine". An executor is an "OS Process" in the slave.
So ideally we always add executors - they do the work and can run in parallel - and the simple theoretical answer to your question is "2 executors on one slave".
In practice we need to add slaves in several use cases:
We need more resources (more cpu, more memory, more "machines")
We need a different setting (Different OSes, Different hardware)
We have global resources that would create a conflict for executors on same machine (shared browser for a UI testing process)
Make the decision based on your use case.
One benefit that immediately comes to my mind for running 1 executor on a given node is preventing conflicts between processes run at the same time.
On the other hand, you could prevent job conflicts using existing Jenkins plugins, e.g. Heavy Job or Build Blocker.
I am planning to configure some sort of 2-node replication for Neo4j, similar to MySQL replication. Since I am a little constrained on resources I don't want to pay for more than two cloud compute instances. Also, I am happy with just one real-time or near-real-time copy of the Neo4j database. So the approaches I can think of are:
Configure HA on the two compute nodes with the help of an arbiter instance: set up one Neo4j instance (master) on the first node, and another Neo4j instance (slave) plus an arbiter instance (arbitration only, no data) on the second node.
OR
Set up a cron job for online backups using the neo4j-backup tool, with incremental backups every hour or so. I'm not sure what load it may put on the prod server; I'm planning to test that out.
I am more inclined towards the first approach since I get a closer-to-real-time copy of the database (I also get HA/load balancing with instant failover, but that is not a priority right now).
Please let me know
which of the two approaches is better,
if there is another way to achieve the same or
if any of the above approaches are not suitable or have some flaws.
I am a little new to Neo4j HA so please pardon me for my ignorance. Thanks !
So, you have already mentioned the available solutions.
TL;DR: I prefer the first option.
Cluster
In general, the recommended layout is 3 nodes (1 master + 2 slaves).
But your layout - 2 machines (1 master + 1 slave + 1 arbiter) - is viable too, especially if one server can handle your workload.
Good things:
Almost "real-time" replica.
Possibility to utilise resources to handle bigger workload.
Better availability.
Notes:
If you have a 10 MB/sec write load on the master, the same load will be applied to the slave node. This shouldn't affect reads from the slave at all (unless the write load is REALLY huge).
Maintenance costs are higher than for a single-instance installation. You should plan how to handle cluster upgrades, configuration updates and plugin updates.
Branched data. In a clustered environment there is a possibility of ending up in a "split-brain" scenario, where the 2 nodes have different data and a decision has to be made about which data should be kept. Neo4j handles such cases quite well, but you should keep in mind that a small data loss can occur in VERY RARE scenarios.
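A hedged sketch of what the HA settings for that layout could look like (hostnames, ports and ids are made up, and exact property names and file locations depend on your Neo4j version, so check the HA configuration chapter of the manual):

    # machine 1 - master candidate (conf/neo4j.properties)
    ha.server_id=1
    ha.initial_hosts=host1:5001,host2:5001,host2:5002

    # machine 2 - slave (conf/neo4j.properties)
    ha.server_id=2
    ha.initial_hosts=host1:5001,host2:5001,host2:5002

    # machine 2 - arbiter (started with the neo4j-arbiter tool from the
    # Enterprise distribution; it only votes in elections and stores no data)
    ha.server_id=3
    ha.cluster_server=host2:5002
    ha.initial_hosts=host1:5001,host2:5001,host2:5002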
Backup
Good things:
Simple. Just take backups of the database.
Consistency check. When a backup is made, the tool runs a consistency check to verify that the database is not damaged. There is no possibility that the backup will screw up the live database. If there are any issues, you will be notified via the backup utility's logs. See below for detailed info on how the backup is performed.
Database. A Neo4j backup is a fully functional database. You can spin up a server that points to the backup database and do everything you want (see the example after this list).
Incremental backups. You can do incremental backups as often as you want.
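As a hedged illustration of the "spin up a server against a backup" point above (the path is hypothetical and the property name assumes a Neo4j 2.x server layout):

    # conf/neo4j-server.properties on a throwaway server instance
    org.neo4j.server.database.location=/backups/neo4j/graph.db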
Notes:
Neo4j scales vertically very well (depending on the size of the database). It can handle a huge load on a single instance (we had up to 3k requests/second on a medium machine). So you can get one bigger machine for the Neo4j server and another, smaller (cheaper) one for backups.
How backup is performed?
One thing that should be kept in mind: the live database stays fully operational. The backup utility doesn't stop or prevent any actions.
When a transaction is committed in the database, all changes are appended to the transaction log.
When there is no previous backup present: copy the whole store.
When there is a previous backup AND transaction logs are available: copy the new transaction logs and replay them onto the backup store.
When there is a previous backup AND transaction logs are NOT available: discard the existing backup and copy the whole store again.
Why might transaction logs not be available? Your configuration may say to keep only the latest transaction logs (e.g. 1 hour worth), or not to keep them at all.
Relevant settings:
keep_logical_logs
logical_log_rotation_threshold
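For example (paths, host and schedule are made up, and the exact neo4j-backup flags differ between Neo4j versions, so check the backup chapter of your manual):

    # conf/neo4j.properties on the production instance:
    # keep enough transaction logs for hourly incremental backups
    keep_logical_logs=2 days
    logical_log_rotation_threshold=250M

    # crontab entry on the backup machine - hourly incremental backup:
    0 * * * * /opt/neo4j/bin/neo4j-backup -host prod-host -to /backups/neo4j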
Other
Anyway, you should consider making backups even in a clustered environment. Everything can fail, at any moment.
In general - everything depends on your load and database size.
If your database is small enough to fit fully in memory and one machine is enough to handle all the load, then one Neo4j instance will be enough. Just do backups.
If you want better scalability/availability and a near-real-time working replica, then the cluster setup is the best choice.
When starting up Enterprise Neo4j in HA mode, the first server starts as the master.
I have a requirement to control which instance is the master in the cluster; is that actually possible in Neo4j?
What would happen if I set all the slaves with 'ha.slave_coordinator_update_mode=none'? Will this permit me to have a single master, so that if it goes down no other instance becomes the master, and when that instance recovers it becomes the master again?
Or, if I didn't use that setting and the master goes down and a slave takes over, will the original master just act as a slave when it comes back up, or will it become the master again?
Is there some configuration that permits control of that? The documentation doesn't cover this very clearly.
Orlok,
You can use ha.slave_only to ensure an instance doesn't ever become master. See http://docs.neo4j.org/chunked/stable/ha-configuration.html
That effectively allows you to add as many read slaves as you wish, but beware that you lose high availability if you only have one instance that can become master. I.e. have a few master-ready instances, set up with ha.slave_only=false, as well as a bunch of read slaves.
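For example, a minimal sketch (the server ids and the split between instances are just illustrative):

    # read slaves - never eligible for master election
    ha.server_id=4
    ha.slave_only=true

    # master-ready instances (this is the default behaviour)
    ha.server_id=1
    ha.slave_only=false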
Regards,
Lasse