From the official Docker docs, there is a statement (quoted below) that looks confusing to me. From my understanding, don't we only need to pick any one of the healthy manager nodes to back up for future restoration purposes?
"You must perform a manual backup on each manager node, because logs contain node IP address information and are not transferable to other nodes. If you do not backup the raft logs, you cannot verify workloads or Swarm resource provisioning after restoring the cluster."
Link: https://docs.docker.com/ee/admin/backup/back-up-swarm/
It depends on how you want to recover. If you want to restore a specific node, you need a backup from that node.
If you are rebuilding your swarm cluster from an old backup, then you only need one healthy node's backup. See the following guide for performing a backup and restore:
https://docs.docker.com/engine/swarm/admin_guide/#back-up-the-swarm
If you restore the cluster from a single node's backup, you will end up running a single-node cluster, so you will need to reset and re-join the swarm on the other managers. What is restored in that scenario are the services, stacks, and other definitions, but not the nodes.
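As a rough sketch of that recovery flow (assuming systemd manages Docker and the backup is a tar of /var/lib/docker/swarm; the archive path, token, and IP are placeholders):

# On the manager you are restoring onto:
systemctl stop docker
rm -rf /var/lib/docker/swarm
tar xzf /tmp/swarm-backup.tar.gz -C /var/lib/docker
systemctl start docker
docker swarm init --force-new-cluster   # brings the swarm back as a single-manager cluster
docker swarm join-token manager         # prints the token the other managers will need

# On each of the other former managers:
docker swarm leave --force
docker swarm join --token <manager-token> <restored-manager-ip>:2377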
Here is the situation: on a CouchDB cluster made of two nodes, each node is a CouchDB Docker instance on a server (ip1 and ip2). I had to reboot one server and restart Docker, and after that both of my CouchDB instances display, for each database: "This database failed to load."
I can connect with Futon and see the full list of databases, but that's all. On "Verify Couchdb Installation" with Futon I get several errors (only 'Create database' is a green check).
The Docker logs for the container give me this error:
"internal_server_error : No DB shards could be opened"
I tried to recover the database locally by copying the .couch and shards/ files to a local instance of CouchDB, but the same problem occurred.
How can I retrieve the data?
PS: I checked the connectivity between my two nodes with erl; no problem there. It looks like Docker messed up some CouchDB config file on the restart.
metadata and cloning a node
Each database has metadata indicating which nodes store its shards; this metadata is built at creation time from the cluster options, so copying the database files alone does not actually move or mirror the database onto the new node. (If you set the metadata correctly, the shards are copied by CouchDB itself, so copying the files manually is only done to speed up the process.)
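As a hedged illustration (the endpoints assume a reasonably recent CouchDB 2.x/3.x; the credentials, host, and database name are placeholders), you can inspect that metadata over HTTP instead of digging through the files on disk:

# which nodes the cluster knows about
curl -s http://admin:password@ip1:5984/_membership
# which nodes hold which shard ranges for a given database
curl -s http://admin:password@ip1:5984/mydb/_shards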
replica count
A 2-node cluster usually does not make sense. As with filesystem RAID, you can stripe for maximum performance at half the reliability, or you can create a mirror, but unless each node's state has perfect consistency detection you cannot automatically decide which of two nodes is incorrect, whereas deciding which of 3 nodes is incorrect is easy enough to do automatically. Consequently, most clusters have 3 or more nodes, and each shard has 3 replicas placed on any 3 of those nodes.
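For illustration only (the n and q query parameters are accepted at database-creation time on CouchDB 2.x/3.x; host, credentials, and values are placeholders), this is how you would ask for 3 replicas of each shard on a 3-node cluster:

# n = replica count per shard, q = number of shard ranges
curl -X PUT 'http://admin:password@ip1:5984/mydb?n=3&q=8'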
Alright, just in case someone makes the same mistake:
When you have a 2-node cluster, couchdb#ip1 and couchdb#ip2, and you created the cluster from couchdb#ip1:
1) If the node couchdb#ip2 stops, the cluster setup is messed up (couchdb#ip1 will no longer work): on restart it appears that the node does not connect correctly, and the databases are listed but are not available.
2) On the other hand, stopping and starting couchdb#ip1 does not cause any problem.
The solution in case 1 is to recreate the cluster with 2 fresh CouchDB instances (couchdb#ip1 and couchdb#ip2), then copy the databases onto one of the CouchDB instances, and all the databases will be back!
Can anyone explain in detail why this happened? It also means that this cluster configuration is absolutely not reliable (if couchdb#ip2 is down then nothing works); I guess it would not be the same with a 3-node cluster?
I'd like to upgrade the Docker engine on my Docker Swarm managed nodes (both manager and worker nodes) from 18.06 to 19.03, without causing any downtime.
I see there are many tutorials online for rolling update of a Dockerized application without downtime, but nothing related to upgrading the Docker engine on all Docker Swarm managed nodes.
Is it really not possible to upgrade the Docker daemon on Docker Swarm managed nodes without a downtime? If true, that would indeed be a pity.
Thanks in advance to the wonderful community at SO!
You can upgrade managers in place, one at a time. During this upgrade process, you drain the node with docker node update, run the engine upgrade with the normal OS commands, and then return the node to active. What will not work is adding or removing nodes while the managers are on mixed versions; this means you cannot completely replace nodes with a from-scratch install at the same time that you upgrade the versions. All managers need to be on the same (upgraded) version first, and only then can you look at rebuilding or replacing the hosts. What I've seen in the past when this is ignored is that new nodes do not fully join the manager quorum, and after losing enough managers you eventually lose quorum.
Once all managers are upgraded, you can upgrade the workers, either with in-place upgrades or by replacing the nodes. Until all of the workers have been upgraded, do not use any new features.
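A minimal sketch of one manager's upgrade cycle, assuming an Ubuntu host using Docker's docker-ce packages (adjust the package commands for your distribution; the node name is a placeholder):

docker node update --availability drain <manager-name>
# upgrade the engine with the normal OS tooling, e.g.:
apt-get update && apt-get install -y docker-ce docker-ce-cli containerd.io
docker node update --availability active <manager-name>
docker node ls   # confirm the node is Ready/Reachable on the new version before moving to the next one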
You can drain your node, then upgrade your Docker version, and then make it ACTIVE again.
Repeat these steps for all the nodes.
DRAIN availability prevents a node from receiving new tasks from the swarm manager. The manager stops the tasks running on the node and launches replica tasks on a node with ACTIVE availability.
For detailed information you can refer to this link: https://docs.docker.com/engine/swarm/swarm-tutorial/drain-node/
Can I somehow configure how the manager node distributes services in Docker Swarm? I thought it would look at the free resources of the worker nodes and schedule each service onto the "freest" node.
Currently I have a problem: services keep landing on one node, which is already full (90% RAM) and is starting to lag, while at the same time the second node runs only a few services and could easily handle another one.
docker node ls
ID                          HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
wdkklpy6065zxckxyuj000ei4 * docker-master   Ready   Drain         Leader          18.09.6
sk45rol2whdr5eh2jqozy0035   docker-node01   Ready   Active        Reachable       18.09.6
o4zwwbwwcrbwo4tsd00pxkfuc   docker-node02   Ready   Active                        18.09.6
Now I have 36 (very similar) services; 28 run on docker-node01 and 8 on docker-node02. I thought the ideal state would be 18 services on each node.
Both docker nodes are identical.
How does Docker Swarm decide where to run a service? What algorithm does it use?
Is it possible to change or update the algorithm used for selecting a node?
According to the swarmkit project README, the only available strategy is spread, so it schedules tasks onto the least loaded nodes.
Note that the swarm won't move running tasks around to maintain this strategy, so if you added node02 after node01 was already full, node02 will remain mostly empty. You could drain both nodes and then activate them again to see if the load is distributed better.
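If you want to try that, here is a sketch of the drain/activate cycle (node names taken from the docker node ls output above; the service name is a placeholder):

docker node update --availability drain docker-node01
# wait for the tasks to be rescheduled onto docker-node02, then:
docker node update --availability active docker-node01
# alternatively, forcing an update reschedules a single service's tasks against the current nodes:
docker service update --force <service-name>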
You can find a more detailed description of the scheduling algorithm in the project documentation: scheduling-algorithm
For the older swarm manager this attribute was configurable:
https://docs.docker.com/swarm/reference/manage/#--strategy--scheduler-placement-strategy
I also found https://docs.docker.com/swarm/scheduler/strategy/, which explains a lot about Docker Swarm strategies.
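For completeness, and only for the legacy standalone Swarm (not swarm mode), the strategy was passed to the manager at startup. A hedged example, with the discovery backend left as a placeholder:

docker run -d -p 4000:4000 swarm manage --strategy binpack -H :4000 <discovery-backend>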
I'm aware that service configs and secrets are stored in the RAFT log and that this log is replicated to the other swarm managers, but what if the entire swarm is stopped? Is the RAFT log persistent, or should you always keep local copies?
I eventually found out that if you back up the swarm, you should be able to recover as detailed in the documentation:
Back up the swarm
Docker manager nodes store the swarm state and manager logs in the /var/lib/docker/swarm/ directory. In 1.13 and higher, this data includes the keys used to encrypt the Raft logs. Without these keys, you will not be able to restore the swarm.
You can back up the swarm using any manager. Use the following procedure.
If the swarm has auto-lock enabled, you will need the unlock key in order to restore the swarm from backup. Retrieve the unlock key if necessary and store it in a safe location. If you are unsure, read Lock your swarm to protect its encryption key.
Stop Docker on the manager before backing up the data, so that no data is being changed during the backup. It is possible to take a backup while the manager is running (a “hot” backup), but this is not recommended and your results will be less predictable when restoring. While the manager is down, other nodes will continue generating swarm data that will not be part of this backup.
Note: Be sure to maintain the quorum of swarm managers. During the time that a manager is shut down, your swarm is more vulnerable to losing the quorum if further nodes are lost. The number of managers you run is a trade-off. If you regularly take down managers to do backups, consider running a 5-manager swarm, so that you can lose an additional manager while the backup is running, without disrupting your services.
Back up the entire /var/lib/docker/swarm directory.
Restart the manager.
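A minimal sketch of that procedure, assuming systemd manages Docker (the archive path is just an example):

docker swarm unlock-key                  # if auto-lock is enabled, record this key somewhere safe first
systemctl stop docker
tar czf /tmp/swarm-backup-$(hostname).tar.gz -C /var/lib/docker swarm
systemctl start docker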
A friend of mine and I are trying to develop a CorDapp for a financial use case. I can run the cordapp-tutorial and the demos; however, they only run on localhost.
We would like to create two "real" nodes. If I understood correctly, we should build two Corda nodes, with my PC as one node server and his PC as another node server, but how can we actually connect them over the internet? On Slack I was told to enable dev mode, but how do you enable it?
We have a corda.jar and the nodea.conf, but the part I don't really understand from the documentation is:
"Each node server by default must have a node.conf file in the current working directory. After first execution of the node server there will be many other configuration and persistence files created in this workspace directory. The directory can be overridden by the --base-directory= command line argument."
What is meant by the working directory?
I've read this documentation: Corda Nodes
Thanks to all; I think I will be asking a lot of questions in the near future :D
In Corda 3.1, you can use the network bootstrapper to create a dev-mode network of nodes running on two separate machines as follows:
Create the nodes by following the instructions here (e.g. by using gradlew deployNodes)
Navigate to the folder where the nodes were created (e.g. build/nodes)
Open the node.conf file of each node and change the localhost part of its p2pAddress to the IP address of the machine where the node will be run (e.g. p2pAddress="10.18.0.166:10007")
After making these changes, we need to redistribute the updated nodeInfo files so that every node has the updated IP addresses of the others. Use the network bootstrapper tool to update the files and distribute them to each node automatically:
java -jar network-bootstrapper.jar kotlin-source/build/nodes
Move the node folders to their individual machines (e.g. using a USB key). It is important that none of the nodes - including the notary - end up on more than one machine. Each computer should also have a copy of runnodes and runnodes.bat.
For example, you may end up with the following layout:
Machine 1: Notary, PartyA, runnodes, runnodes.bat
Machine 2: PartyB, PartyC, runnodes, runnodes.bat
After starting each node, the nodes will be able to see one another and agree ledger updates among themselves.
Warning
The bootstrapper must be run after the node.conf files have been modified, but before the nodes are distributed across machines. Otherwise, the nodes will not have the updated IP addresses for each node and will not be able to communicate.
Each of the nodes will have a node.conf file. To enable devMode, add this line to the node.conf file:
devMode=true
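Putting the two changes together, the relevant part of a node's node.conf might look like the snippet below (the address is just the example value from above; the rest of the generated file stays as deployNodes produced it):

p2pAddress="10.18.0.166:10007"
devMode=true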