Error creating default "bridge" network: cannot create network (docker0): conflicts with network (docker0): networks have same bridge name - docker

After stopping Docker, it refused to start again. It complained that another bridge called docker0 already exists:
level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
level=info msg="Graph migration to content-addressability took 0.00 seconds"
level=info msg="Firewalld running: false"
level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: cannot create network fa74b0de61a17ffe68b9a8f7c1cd698692fb56f6151a7898d66a30350ca0085f (docker0): conflicts with network bb9e0aab24dd1f4e61f8e7a46d4801875ade36af79d7d868c9a6ddf55070d4d7 (docker0): networks have same bridge name"
docker.service: Main process exited, code=exited, status=1/FAILURE
Failed to start Docker Application Container Engine.
docker.service: Unit entered failed state.
docker.service: Failed with result 'exit-code'.
Deleting the bridge with ip link del docker0 and then starting Docker leads to the same result with a different ID.

In my case, I came across this error message after downgrading my OS (CentOS Atomic Host). The older CentOS Atomic release shipped Docker 1.9.1. I did not have any running containers or pulled images before the downgrade.
I simply ran the commands below and Docker was happy again:
sudo rm -rf /var/lib/docker/network
sudo systemctl start docker
More info.
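A more cautious variant of the rm -rf above is to move the directory aside, so the old network state can be restored if something else breaks. This is only a minimal sketch: move_aside is a hypothetical helper name, and it assumes Docker is stopped and that you run it as root.

```shell
#!/bin/sh
# Sketch: rename a directory to "<dir>.bak.<timestamp>" instead of deleting it.
# move_aside is a hypothetical helper, not part of Docker.
move_aside() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    # Print the backup location so it can be restored or removed later.
    mv -- "$dir" "$backup" && echo "$backup"
}

# Example (as root, with Docker stopped):
#   move_aside /var/lib/docker/network && systemctl start docker
```

Once Docker is confirmed to be working again, the backup directory can be deleted.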

The problem seems to be in /var/docker/network/. It holds a lot of sockets that reference the bridge by its old ID. To solve the problem you can delete all the sockets, delete the interface, and then start Docker, but all your containers will refuse to work because their sockets are gone. In my case I did not care about my stateless containers anyway, so this fixed the problem:
ip link del docker0
rm -rf /var/docker/network/*
mkdir /var/docker/network/files
systemctl start docker
# delete all containers (this only prints the commands; drop "echo" to actually run them)
docker ps -aq | xargs -n 1 echo docker rm -f
# recreate all containers

It may sound obvious, but you may want to consider rebooting, especially if there was some major system update recently.
Worked for me, since I didn't reboot my VM after installing some kernel updates, which probably led to many network modules being left in an undefined state.

Related

docker.socket: Failed with result 'service-start-limit-hit' after protecting docker daemon socket

I followed the steps in the documentation to add TLS security for the Docker API. The certificates are located in both ~/.docker/ and /etc/docker/ssl/. I added override.conf to /etc/systemd/system/docker.service.d/ with this content:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 --tlsverify --tlscacert=ca.pem --tlscert=server-cert.pem --tlskey=server-key.pem
Then I ran daemon-reload and started Docker:
$ systemctl daemon-reload
$ service docker start
The errors in journalctl -xe are:
-- Unit docker.socket has finished starting up.
--
-- The start-up result is RESULT.
Jan 15 21:43:24 cynicalplyaground systemd[1]: docker.service: Start request repeated too quickly.
Jan 15 21:43:24 cynicalplyaground systemd[1]: docker.service: Failed with result 'exit-code'.
Jan 15 21:43:24 cynicalplyaground systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has failed.
--
-- The result is RESULT.
Jan 15 21:43:24 cynicalplyaground systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
Jan 15 21:45:01 cynicalplyaground CRON[12768]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 15 21:45:01 cynicalplyaground CRON[12769]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jan 15 21:45:01 cynicalplyaground CRON[12768]: pam_unix(cron:session): session closed for user root
How can I sort this issue?
In my case the same error occurred after the latest Manjaro update (2020-01-20).
I tried changing the systemd Docker service, as advised in other cases, but reverted those changes. In the end this was solved with
a reboot of the system
(as advised here: https://www.reddit.com/r/archlinux/comments/7ya4ug/installing_docker_on_arch_linux/)
Getting to the root of the problem:
systemctl status docker.service
has this:
/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Trying to run that command, it complains about
unable to configure the Docker daemon with file /etc/docker/daemon.json: EOF
ls -l /etc/docker/daemon.json
-rw-r--r-- 1 root root 0 Jul 30 10:32 /etc/docker/daemon.json
NOTE that the JSON file is empty (0 bytes), which is why parsing it fails with EOF. Delete it.
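The check above can be sketched as a small helper that reports whether the file is missing, has no content, or contains something. daemon_json_state is a hypothetical name, and the helper only looks for non-whitespace characters; it does not validate the JSON itself.

```shell
#!/bin/sh
# Sketch: classify a daemon.json file without parsing it.
# daemon_json_state is a hypothetical helper, not part of Docker.
daemon_json_state() {
    file=${1:-/etc/docker/daemon.json}
    if [ ! -e "$file" ]; then
        echo missing
    elif grep -q '[^[:space:]]' "$file"; then
        # At least one non-whitespace character is present.
        echo has-content
    else
        # Zero bytes, or whitespace only.
        echo empty
    fi
}

# Example:
#   [ "$(daemon_json_state)" = empty ] && sudo rm /etc/docker/daemon.json
```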
For me it was because Docker uses iptables for NAT, while Debian uses nftables. You can convert the entries over to nftables or just set Debian up to use the legacy iptables:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
dockerd should start fine after switching to iptables-legacy.
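To see which backend is active before and after the switch, you can look at the version string: iptables 1.8 and later print the backend in parentheses, for example "iptables v1.8.2 (nf_tables)". A minimal sketch, where iptables_backend is a hypothetical helper that only parses that string:

```shell
#!/bin/sh
# Sketch: extract the backend name from the output of `iptables --version`.
# iptables_backend is a hypothetical helper, not part of iptables.
iptables_backend() {
    case $1 in
        *"(nf_tables)"*) echo nf_tables ;;
        *"(legacy)"*)    echo legacy ;;
        *)               echo unknown ;;   # older releases print no backend
    esac
}

# Example:
#   iptables_backend "$(iptables --version)"
```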
I had the same issue and just changed "/usr/bin/dockerd" to "/usr/sbin/dockerd"; then it worked.
You can check the dockerd path first.
In my case the host was part of a Docker swarm, but its IPv6 address was no longer reachable or automatically assigned to the host.
I manually added the old IPv6 address:
ip -6 address add 28xx:xxxx:x:x:xx:ebff:fe14:xxx dev ens3x
The output of journalctl -u docker.service mentioned:
level=fatal msg="Error starting cluster component: could not find local IP address: dial udp [2xxx:xxx:xxxx:xxx]:2377: connect: network is unreachable"
After adding the IPv6 address manually I was able to start Docker, so with Docker running I left the swarm and rebooted:
docker swarm leave --force
After the reboot the Docker services ran as usual.
For me the cause was a lack of disk space. A reboot also helped, but I was still not able to build any container.
After pruning some outdated data from the Docker volumes I was able to continue.
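A quick way to confirm that disk space is the culprit is to check how full the filesystem holding Docker's data is. The sketch below assumes the default data root /var/lib/docker; use_percent is a hypothetical helper built on df -P.

```shell
#!/bin/sh
# Sketch: print the usage percentage (without the % sign) of the filesystem
# that holds the given path. use_percent is a hypothetical helper.
use_percent() {
    # df -P guarantees one line per filesystem; column 5 is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   use_percent /var/lib/docker
```

If the value is close to 100, docker system df shows what is taking up the space, and docker volume prune removes unused volumes.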
I faced a similar issue on Ubuntu because I had added the hosts option to the /etc/docker/daemon.json file. That's OK in itself, but on systems that use systemd it may conflict with the arguments passed to dockerd on start.
The solution was to delete the hosts entry from /etc/docker/daemon.json and put this configuration in the file /etc/systemd/system/docker.service.d/options.conf.
$ cat /etc/systemd/system/docker.service.d/options.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix://
After that, restart the service.
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
You can check that your changes have been applied by running docker info. You can also see in the Docker service status that the Drop-In field lists the options.conf you created, and that dockerd was executed with the specified host list.
$ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset>
Drop-In: /etc/systemd/system/docker.service.d
└─options.conf
Active: active (running) since Fri 2022-11-18 01:02:18 EST; 1h 50min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1111 (dockerd)
Tasks: 18
Memory: 58.5M
CPU: 1.294s
CGroup: /system.slice/docker.service
└─1111 /usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix://
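To check whether a machine is exposed to the conflict described above, you can look for a hosts key in daemon.json. This is a minimal sketch: has_hosts_key is a hypothetical helper, and the match is purely textual rather than a real JSON parse.

```shell
#!/bin/sh
# Sketch: succeed if the file contains a "hosts" key.
# has_hosts_key is a hypothetical helper, not part of Docker.
has_hosts_key() {
    grep -q '"hosts"[[:space:]]*:' "${1:-/etc/docker/daemon.json}"
}

# Example:
#   has_hosts_key && echo "hosts is set in daemon.json; check the -H flags in the unit"
```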
References:
Daemon configuration file
Control Docker with systemd
I had a similar issue on NixOS installed on a btrfs filesystem.
For me the solution was to add virtualisation.docker.storageDriver = "btrfs"; to my /etc/nixos/configuration.nix,
which according to the Docker docs should be equivalent to adding the following to /etc/docker/daemon.json on most other distros:
{
"storage-driver": "btrfs"
}
I was able to solve the problem by disabling firewalld:
systemctl disable firewalld
systemctl stop firewalld

docker - start failed because /etc/fstab not found

I'm using the Windows Subsystem for Linux (Debian stretch). Following the instructions on the Docker website, I installed docker-ce, but it cannot start. Here is the output:
$ sudo service docker start
grep: /etc/fstab: No such file or directory
[ ok ] Starting Docker: docker.
$ sudo service docker status
[FAIL] Docker is not running ... failed!
What should I do with /etc/fstab not found?
To fix the fstab error:
touch /etc/fstab
If you then run dockerd, it fails with this message:
INFO[2022-01-27T17:55:14.100489400+07:00] Loading containers: start.
WARN[2022-01-27T17:55:14.191666800+07:00] Running iptables --wait -t nat -L -n failed with message: `iptables v1.8.2 (nf_tables): CHAIN_ADD failed (No such file or directory): chain PREROUTING`, error: exit status 4
INFO[2022-01-27T17:55:14.493716300+07:00] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2022-01-27T17:55:14.494906600+07:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2022-01-27T17:55:14.495048400+07:00] stopping healthcheck following graceful shutdown module=libcontainerd
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables --wait -t nat -N DOCKER: iptables v1.8.2 (nf_tables): CHAIN_ADD failed (No such file or directory): chain PREROUTING
(exit status 4)
That is the Debian NAT issue; fix it with:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
Now you can start the service again.
You can follow this to make it start on startup: https://askubuntu.com/a/1356147/138352
Edit:
If the iptables issue still persists, try setting the WSL version to 2 by running this command from a Windows shell:
wsl --set-version <distribution name> 2
The distribution list can be found with the command wsl -l
I was getting the same error. Apparently my install of WSL with Debian didn't have an /etc/fstab file. Surprisingly, just creating the file with touch worked:
sudo touch /etc/fstab
Perhaps a good signal https://learn.microsoft.com/en-us/windows/wsl/release-notes#build-17093
WSL now processes the /etc/fstab file during instance start [GH 2636].
For anybody stumbling across this years later like me, Docker doesn't work inside WSL.
But you can use Docker for Windows and WSL2 to run native containers inside your Linux Distro and the install and config is quite painless https://learn.microsoft.com/en-us/windows/wsl/tutorials/wsl-containers

Docker stopped all of a sudden in CentOS 7

I was running docker on my CentOS 7 machine.
Today I was trying to upgrade a container. So I stopped the container and tried to pull new image.
I got the below error
Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I checked the proxy settings for the machine with cat /etc/environment and for Docker with cat /etc/systemd/system/docker.service.d/http-proxy.conf
They are set correctly.
I enabled daemon logs for Docker, and the logs say:
Sep 14 10:43:18 myCentOsServer kernel: [4913751.074277] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084599] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084888] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer NetworkManager[794]: <info> [1505349798.0267] device (veth1e3300a): released from master device docker0
Sep 14 10:44:48 myCentOsServer dockerd[29136]: time="2017-09-14T10:44:48.802236300+10:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I tried the commands below, but they hang.
systemctl daemon-reload
systemctl restart docker
Any idea what might be the issue?
Thanks in advance.
I was finally able to solve this issue.
The issue was with my Docker mount point. Mine was set to /var/lib/docker, and I suspect it got corrupted when I did a data volume export.
Steps I followed:
1) Navigated to /var/lib/docker, took a backup of the images, containers and volumes folders, and deleted them.
2) Reloaded the daemon.
3) Restarted Docker.
Now it is working fine.
The bad news is that I lost the data dump I had taken from one of the containers (using volumes-from).
But it was a dev version of the software, so I reinstalled it and redid the setup.
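The backup in step 1 can be sketched as a tar archive taken before anything is deleted. backup_dirs is a hypothetical helper, the archive path in the example is an assumption, and the directory names are the ones listed in the steps above; adjust them to whatever actually exists under your data root.

```shell
#!/bin/sh
# Sketch: archive selected subdirectories of a root directory.
# backup_dirs is a hypothetical helper.
# Usage: backup_dirs ROOT ARCHIVE DIR...
backup_dirs() {
    root=$1
    archive=$2
    shift 2
    # -C makes the archived paths relative to ROOT.
    tar -czf "$archive" -C "$root" "$@"
}

# Example (as root, with Docker stopped):
#   backup_dirs /var/lib/docker /root/docker-backup.tar.gz images containers volumes
```

Checking the archive with tar -tzf before deleting the originals would have avoided losing the data dump mentioned above.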
This sometimes happens on CentOS. You can simply restart the Docker service with:
systemctl restart docker.service

When using Mesos, Marathon, and ZooKeeper my mesos-slave doesn't start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs. I installed ZooKeeper, Marathon, and Mesos on the master node, and only Mesos on the other 2 VMs. The master node has no mesos-slave running on it. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the mesos-agents starts fine with this configuration and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails when I have this configuration (it works if I take out the containerizers file, but then it doesn't run containers). Here are some of the logs and information that have come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)#172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" 
--version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)#172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master#172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
Docs: docs.docker.com
Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have the answer in your logs:
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Mesos keeps its state/metadata on local disk. When it is restarted, it tries to load this state. If the configuration has changed and is not compatible with the previous state, it won't start.
Just bring Docker back to life by fixing the problems with flannel and etcd, and everything will be fine.
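If the agent still refuses to recover once Docker is running, Step 1 from the log message can be wrapped so that it shows what the latest entry pointed to before removing it. A minimal sketch: reset_agent_recovery is a hypothetical helper, and the default directory is the --work_dir value shown in the startup flags.

```shell
#!/bin/sh
# Sketch: remove <work_dir>/meta/slaves/latest so the agent does not try to
# recover old executors. reset_agent_recovery is a hypothetical helper.
reset_agent_recovery() {
    latest="${1:-/var/lib/mesos}/meta/slaves/latest"
    if [ -L "$latest" ]; then
        # Show the target first, in case it is needed for debugging later.
        echo "removing $latest -> $(readlink "$latest")"
    fi
    rm -f "$latest"
}

# Example (as root, with the agent stopped):
#   reset_agent_recovery && systemctl start mesos-slave
```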
Add the following flag when starting the agent:
--reconfiguration_policy=additive
More details here: http://mesos.apache.org/documentation/latest/agent-recovery/

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that works on CentOS + Vagrant but not on our provider's RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). Running sudo service docker restart gives this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okay, but images cannot run: a "mounting failed" error is shown when I try. The startup log also gives a warning or two. As for the kernel warning, CentOS gives the same one and has no problems, as EPEL should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Anyone had any success overcoming this problem or should I throw in the towel and wait for the provider to update to RHEL 7?
I had the same issue.
(1) Check the cgconfig status:
# /etc/init.d/cgconfig status
If it is stopped, restart it:
# /etc/init.d/cgconfig restart
Then check that cgconfig is running.
(2) Check that cgconfig is enabled:
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
If cgconfig is off, turn it on.
(3) If it still does not work, some cgroup modules may be missing. In the kernel .config file (via make menuconfig), add those modules to the kernel, then recompile and reboot.
After that, it should be OK.
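Step (2) can be sketched as a small check on the chkconfig --list line. enabled_in_runlevel is a hypothetical helper; it assumes the output format shown above, with fields separated by spaces or tabs.

```shell
#!/bin/sh
# Sketch: succeed if a `chkconfig --list <service>` line shows the service
# as "on" in the given runlevel. enabled_in_runlevel is a hypothetical helper.
# Usage: enabled_in_runlevel "LINE" LEVEL
enabled_in_runlevel() {
    # Normalise tabs to spaces so one pattern covers both layouts.
    line=$(printf '%s' "$1" | tr '\t' ' ')
    case " $line " in
        *" $2:on "*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example:
#   enabled_in_runlevel "$(chkconfig --list cgconfig)" 3 || chkconfig cgconfig on
```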
I ended up asking the same question at Google Groups and in the end finding a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of getting Docker to work was put on hold all the same. Later we ran into a network connection problem for the containers. It took too much time to solve, and I had to give up.
So I spent the whole day trying to rig docker to work on my vps. I was running into this same error. Basically what it came down to was the fact that OpenVZ didn't support docker containers up until a couple months ago. Specifically this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT
In my case
/etc/rc.d/rc.cgconfig start
was generating
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to /cgroup/cpu_and_mem: Device or resource busy
/usr/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf
I had to use:
/etc/rc.d/rc.cgconfig restart
and it automagically unmounted and remounted the cgroups:
Stopping cgconfig service: Starting cgconfig service:
It seems the cgconfig service is not running, so check it:
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer /cgroup/net_cls /cgroup/blkio
# cat /etc/cgconfig.conf |tail|grep "="|awk '{print "mount -t cgroup -o",$1,$1,$NF}'>cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart
This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;
This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options or comment out memory = /cgroup/memory; from cgconfig.conf.
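The two conditions above can be checked together before touching either file. This is a minimal sketch: memory_cgroup_conflict is a hypothetical helper, it reads the running kernel's options from /proc/cmdline by default, and it treats lines starting with # in cgconfig.conf as commented out.

```shell
#!/bin/sh
# Sketch: succeed if the kernel was booted with cgroup_disable=memory AND
# cgconfig.conf still has an active "memory = ..." line.
# memory_cgroup_conflict is a hypothetical helper.
# Usage: memory_cgroup_conflict [CMDLINE_FILE] [CGCONFIG_FILE]
memory_cgroup_conflict() {
    cmdline=${1:-/proc/cmdline}
    conf=${2:-/etc/cgconfig.conf}
    grep -q 'cgroup_disable=memory' "$cmdline" &&
        grep -q '^[[:space:]]*memory[[:space:]]*=' "$conf"
}

# Example:
#   memory_cgroup_conflict && echo "memory cgroup is disabled but still listed in cgconfig.conf"
```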
The cgconfig service startup uses mount and umount, which require extra privileges when run inside a Docker container.
See the --privileged=true flag for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image
Tested on CentOS 6 and CentOS 6.5.

Resources