Docker image fails to create netlink handle - docker

Can anyone help me make sense of the below error and others like it? I've Googled around, but nothing makes sense for my context. I download my Docker Image, but the container refuses to start. The namespace referenced is not always 26, but could be anything from 20-29. I am launching my Docker container onto an EC2 instance and pulling the image from AWS ECR. The error is persistent no matter if I re-launch the instance completely or restart docker.
docker: Error response from daemon: oci runtime error:
container_linux.go:247: starting container process caused
"process_linux.go:334: running prestart hook 0 caused \"error running
hook: exit status 1, stdout: , stderr: time=\\\"2017-05-
11T21:00:18Z\\\" level=fatal msg=\\\"failed to create a netlink handle:
failed to set into network namespace 26 while creating netlink socket:
invalid argument\\\" \\n\"".

Update from my Github issue: https://github.com/moby/moby/issues/33656
It seems like the DeepSecurity agent (ds_agent) running on a container with Docker can cause this issue invariably. A number of other users reported this problem, causing me to investigate. I previously installed ds_agent on these boxes, before replacing it with other software as a business decision, which is when the problem went away. If you are having this problem, might be worthwhile to check if you are running the ds_agent process, or other similar services that could be causing a conflict using 'htop' as the user in the issue above did.

Did you try running it with the --privileged option?
If it still doesn't run, try adding --security-opts seccomp=unconfined and either --security-opts apparmor=unconfined or --security-opts selinux=unconfined depending whether you're running Ubuntu or a distribution with SELinux enabled, respectively.
If it works, try substituting the --privileged option with --cap-add=NET_ADMIN` instead, as running containers in privileged mode is discouraged for security reasons.

Related

Rsyslog can't start inside of a docker container

I've got a docker container running a service, and I need that service to send logs to rsyslog. It's an ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container. I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log which simply contains dmesg: klogctl failed: Operation not permitted. which from what I can tell is because of a docker limitation that cannot really be fixed. And it's unknown if this is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So obviously the cause is some difference between the hosts. But I don't know where to begin with that: There are LOTS of differences between the hosts (the working one is my local windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know where to even begin about which differences could cause this issue and which ones couldn't.
I've exhausted everything that I know to check. My only option left is to come to stackoverflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about the container, such as the fact that it's entrypoint is /sbin/init (which is a very bad practice for docker containers, and is the root cause of all of my troubles). This is also causing some issues with the docker logging driver, which is why I'm stuck using syslog as the logging solution instead.

ERROR: traefik, xdbautomationworker, Container is unhealthy

Trying to create sitecore 10 image using Docker on Windows 10 Enterprise locally but getting unhealthy containers. Please help me out as I have tried various steps that was updated in the forums.
Getting below errors:
Creating network "sitecore-xp0_default" with the default driver
Creating sitecore-xp0_solr_1 ... done
Creating sitecore-xp0_mssql_1 ... done
Creating sitecore-xp0_id_1 ... done
Creating sitecore-xp0_solr-init_1 ... done
Creating sitecore-xp0_xconnect_1 ... done
Creating sitecore-xp0_cm_1 ... done
ERROR: for cortexprocessingworker Container "992574e988e3" is unhealthy.
ERROR: for xdbautomationworker Container "992574e988e3" is unhealthy.
ERROR: for xdbsearchworker Container "992574e988e3" is unhealthy.
ERROR: for traefik Container "933b548fc2f9" is unhealthy.
ERROR: Encountered errors while bringing up the project.
Checked the following things:
docker-compose stop on Powershell.
docker-compose down on Powershell.
iisreset /stop on Powershell to make sure that the required ports are free.
docker-compose up -d on Powershell.
Stopped, removed the container and executed the command docker-compose.exe up --detach multiple times but no luck.
Check the .env file and make sure SITECORE_LICENSE has a value.
You may need to run the init.ps1 file.
Based on the logs now provided in the comments above, my suggestion would be to check the collection SQL connection string, to the shardsmanager database.
You can inspect the SQL container in docker for Windows and find the IP address of the SQL server. Connect to that using ssms and try connecting with the creds you have in current string.
Edit: looking again at the exception, it looks like it can't find the SQL server. Yet the CM server appears to not have a problem finding the same server. So compare the web/master/core connection string to the collection one. I'm guessing the SQL server portion will be different?

Docker killing container "permission denied" due to AppArmor. Just why?

I am trying to kill containers I launched through docker-compose. Either by gracefully stopping (Ctrl+C) or by docker-compose down I encounter the following error:
ERROR: for <container-name> cannot stop container: b60c1c4d886899504b...2a022e4d39429dc6ca6e4784afdd: Cannot kill container b60c1c4d886899504b...2a022e4d39429dc6ca6e4784afdd: unknown error after kill: runc did not terminate sucessfully: container_linux.go:388: signaling init process caused "permission denied"
: unknown
I am just looking for the answer WHY. I am trying to understand AppArmor better but understanding WHY I can't stop the containers before everything would help to understand what's going on.
I see that this is an error many people come across. 1 2 3 4
However, most of the answers suggest workarounds and no solutions. Even the explanatory answers like 1 dive directly into AppArmor and profiles. From docker documentation I see docker has a default AppArmor policy docker-default. I partly understand the concept but still don't get WHY I can't stop the containers through the user and the environment I've started them in.
If I try to wrap my questions:
I started some containers as a user, WHY can't I stop them. sudo does not work either. Who can stop them then?
Do I need an AppArmor profile for each container?
I don't feel it's a good idea to restart AppArmor or disable it. Should I do that? What is the ideal solution?
Any feedback or explanations welcome. Thanks.
I couldn't find the exact reason but came close. It seems there were conflicting docker installations on my Ubuntu 19.10 and this was causing this access control issue.
As stated here, I've removed the snap installation. As I already had another installation I didn't have to install it via another way.
sudo snap remove docker

Docker deployments fail on Marathon, work fine otherwise

I have been trying to deploy a docker container web based application on Mesos using Mesosphere Marathon.
I first tried deploying my Play Framework application which works fine when I launch it using the docker container. Then I also tried the example application mention on the Mesosphere website. Both fail inside marathon, but work fine when run as standalone docker images.
The application shows up as "Waiting" or "Deploying" in Marathon web UI while on Mesos it fails. I have made sure that the Mesos slave is running fine.
I believe that because the application fails on Mesos, Marathon tries to restart it which is why I get these status message almost always.
I have previously tried deploying the same application (without wrapping it inside the docker container) on Marathon (same installation) and it has worked fine. However, we really want to use Docker for our applications.
I have gone through plenty of tutorials and everything seems to be following the "rules". I don't understand what could be wrong.
Edit:
E1104 19:29:01.291219 4242 slave.cpp:3342] Container '9dbebe8c-5506-4f70-b560-34be39ecdc96' for executor 'mediator.30dbd1ed-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found
W1104 19:29:01.293334 4244 docker.cpp:1002] Ignoring updating unknown container: 9dbebe8c-5506-4f70-b560-34be39ecdc96
E1104 19:29:06.711524 4241 slave.cpp:3342] Container 'b7f8004a-2759-41ec-8169-61d04a7c4c3d' for executor 'mediator.343b027e-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found
Without an actual error message or the logs, it's hard to guess what your problem could be.
My first thought is that you should check whether your Mesos Slaves are started with the --containerizers=docker,mesos flag at all. If not, it can't work at all.
Also, if you're using a private registry, either make sure that Docker on your Mesos Slaves is either configured to use it, or follow the guidelines in the Marathon docs on how o use a private registry.
Can you do a docker pull devrep/message-mediator:latest on any Mesos Slave?
Also, see
https://github.com/mesosphere/marathon/issues/1781
I know its very late to answer it but might be helpful. Seeing your logs I find
devrep/message-mediator:latest
here latest is the tag name of your image, if you don't provide one in container docker image or leave it blank like below
"container": {
"type": "DOCKER",
"docker": {
"image": "devrep/message-mediator",
},
},
it automatically tries to pull the devrep/message-mediator:latest which I highly doubt will be present so try adding a tag name always e.g in my case it was v1
devrep/message-mediator:v1

Unable to run rabbitmq using marathon mesos

I am unable to run rabbitmq using marathon/mesos framework. I have tried it with rabbitmq images available in docker hub as well as custom build rabbitmmq docker image. In the mesos slave log I see the following error:
E0222 12:38:37.225500 15984 slave.cpp:2344] Failed to update resources for container c02b0067-89c1-4fc1-80b0-0f653b909777 of executor rabbitmq.9ebfc76f-ba61-11e4-85c9-56847afe9799 running task rabbitmq.9ebfc76f-ba61-11e4-85c9-56847afe9799 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/13197/cgroup: Failed to open file '/proc/13197/cgroup': No such file or directory
On googling I could find one hit as follows
https://github.com/mesosphere/marathon/issues/632
Not sure if this is the issue even I am facing. Anyone tried running rabbitmq using marathon/mesos/docker?
Looks like the process went away (likely crashed) before the container was set up. You should check stdout and stderr to see what happened, and fix the root issue.
"cmd": "", is the like'y culprit. I'd look at couchbase docker containers for a few clues on how to get it working.

Resources