Docker deployments fail on Marathon, work fine otherwise

I have been trying to deploy a Docker-based web application on Mesos using Mesosphere Marathon.
I first tried deploying my Play Framework application, which works fine when I launch it directly as a Docker container. Then I also tried the example application mentioned on the Mesosphere website. Both fail inside Marathon but work fine when run as standalone Docker images.
The application shows up as "Waiting" or "Deploying" in the Marathon web UI, while on Mesos it fails. I have made sure that the Mesos slave is running fine.
I believe that because the application fails on Mesos, Marathon keeps trying to restart it, which is why I see these status messages almost all the time.
I have previously deployed the same application (without wrapping it in a Docker container) on the same Marathon installation, and it worked fine. However, we really want to use Docker for our applications.
I have gone through plenty of tutorials, and everything seems to follow the "rules". I don't understand what could be wrong.
Edit:
E1104 19:29:01.291219 4242 slave.cpp:3342] Container '9dbebe8c-5506-4f70-b560-34be39ecdc96' for executor 'mediator.30dbd1ed-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found
W1104 19:29:01.293334 4244 docker.cpp:1002] Ignoring updating unknown container: 9dbebe8c-5506-4f70-b560-34be39ecdc96
E1104 19:29:06.711524 4241 slave.cpp:3342] Container 'b7f8004a-2759-41ec-8169-61d04a7c4c3d' for executor 'mediator.343b027e-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found

Without an actual error message or the logs, it's hard to guess what your problem could be.
My first thought is that you should check whether your Mesos Slaves are started with the --containerizers=docker,mesos flag. If not, Docker deployments can't work at all.
Also, if you're using a private registry, make sure that Docker on your Mesos Slaves is configured to use it, or follow the guidelines in the Marathon docs on how to use a private registry.
Can you do a docker pull devrep/message-mediator:latest on any Mesos Slave?
Also, see
https://github.com/mesosphere/marathon/issues/1781

I know it's very late to answer, but this might be helpful. In your logs I see
devrep/message-mediator:latest
Here latest is the tag of your image. If you don't provide a tag in the container's Docker image field, or leave it blank as below,
"container": {
  "type": "DOCKER",
  "docker": {
    "image": "devrep/message-mediator"
  }
}
Docker automatically tries to pull devrep/message-mediator:latest, which I highly doubt exists. So always add an explicit tag; in my case it was v1:
devrep/message-mediator:v1
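To illustrate the defaulting described above, here is a minimal Python sketch of how an image reference without a tag ends up as :latest. The function name with_default_tag is made up for illustration, and the logic only approximates Docker's reference parsing; it is not Docker's or Marathon's own code.

```python
def with_default_tag(image: str, default_tag: str = "latest") -> str:
    """Return the image reference with an explicit tag (approximation)."""
    # A digest reference (name@sha256:...) is already pinned; leave it alone.
    if "@" in image:
        return image
    # Only the last path component can carry a tag. A colon earlier in the
    # string belongs to a registry host:port such as localhost:5000.
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return f"{image}:{default_tag}"


if __name__ == "__main__":
    for ref in (
        "devrep/message-mediator",
        "devrep/message-mediator:v1",
        "localhost:5000/message-mediator",
    ):
        print(ref, "->", with_default_tag(ref))
```

If the resolved reference is not what was actually pushed to the registry, the pull on the slave fails with "not found", as in the log above.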

Related

Testcontainers do not start after replacing Docker Desktop with minikube

I want to make the Testcontainers in my Java integration tests work with minikube instead of Docker Desktop.
I followed the article below to get started:
https://www.atomicjar.com/2021/10/docker-on-windows-and-macos/#minikube
This is what I've got in testcontainers.properties
docker.client.strategy=org.testcontainers.dockerclient.EnvironmentAndSystemPropertyClientProviderStrategy
docker.host=tcp\://192.168.64.2\:2376
docker.cert.path=/Users/username/.minikube/certs
docker.tls.verify=true
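For reference, the backslashes in docker.host are ordinary .properties escapes for the colon, so the value above should be read as tcp://192.168.64.2:2376. A rough Python sketch of that unescaping follows; the helper name is hypothetical, and it deliberately ignores \uXXXX, \n, \t and line continuations.

```python
import re


def unescape_property_value(raw: str) -> str:
    """Drop the backslash from simple escapes in a .properties value.

    Simplified on purpose: unicode escapes, control-character escapes and
    line continuations are not handled.
    """
    return re.sub(r"\\(.)", r"\1", raw)


if __name__ == "__main__":
    line = r"docker.host=tcp\://192.168.64.2\:2376"
    key, _, value = line.partition("=")
    print(key, "=", unescape_property_value(value))
```

This only confirms that the escaped colons are harmless; whether the daemon is reachable at that address is a separate question.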
Although Docker is up and running, I'm getting the following exception:
Caused by: java.lang.IllegalStateException: Could not find a valid Docker environment. Please see logs and check configuration
Can anybody suggest how to make it work?
TA
If you are using Gradle, try the --no-daemon flag so that the build does not reuse an existing daemon. Your old Gradle daemon is still using your previous Testcontainers properties. Also restart your IDE if you're running your build inside it.
After restarting minikube and the IntelliJ editor, and updating testcontainers-bom from 1.15 to the latest version, 1.16.2, I was able to pull some third-party Docker images. This means Docker is working now.
However, I'm still trying to find a way to work with local images (Other application docker images) for integration testing as it used to work with Docker Desktop.

Running `docker stack deploy` on a local VM results in "No such image" error even though the image is on the public registry

I'm trying to follow the Docker Get Started guide. Currently I'm at part 4. Everything up until the point
docker stack deploy -c docker-compose.yml getstartedlab
worked well. However, after trying to deploy the services, when I run docker stack ps getstartedlab, I see that the swarm manager keeps trying to restart the containers, since every time they get the error "No such image: username/get-st…" and have their state as "Rejected 6 seconds ago" etc.
I searched for solutions a bit, but surprisingly it seems that nobody has encountered this error before. The issue here and a similar section in the Get Started guide talk about situations where one wants to pull from a private registry. However, throughout the tutorial I've been working with the default public registry. All previous steps (e.g. launching the swarm locally, without using VirtualBox) worked fine.
Versions:
Docker version 18.02.0-ce, build fc4de447b5
Virtualbox 5.2.8 r120774
System Kernel: 4.14.25-1-MANJARO
Any idea what might have been the problem?
Surprisingly, passing the --with-registry-auth flag worked, even though my repo is apparently on Docker Hub. I'm not sure what the problem was, but the claim that this flag is only needed with a private registry may be a bit inaccurate.

Docker image fails to create netlink handle

Can anyone help me make sense of the below error and others like it? I've Googled around, but nothing makes sense for my context. I download my Docker Image, but the container refuses to start. The namespace referenced is not always 26, but could be anything from 20-29. I am launching my Docker container onto an EC2 instance and pulling the image from AWS ECR. The error is persistent no matter if I re-launch the instance completely or restart docker.
docker: Error response from daemon: oci runtime error:
container_linux.go:247: starting container process caused
"process_linux.go:334: running prestart hook 0 caused \"error running
hook: exit status 1, stdout: , stderr:
time=\\\"2017-05-11T21:00:18Z\\\" level=fatal msg=\\\"failed to create a netlink handle:
failed to set into network namespace 26 while creating netlink socket:
invalid argument\\\" \\n\"".
Update from my GitHub issue: https://github.com/moby/moby/issues/33656
It seems that the DeepSecurity agent (ds_agent) running on a container with Docker can cause this issue invariably. A number of other users reported this problem, which led me to investigate. I had previously installed ds_agent on these boxes and later replaced it with other software as a business decision, which is when the problem went away. If you are having this problem, it might be worthwhile to check whether you are running the ds_agent process, or similar services that could cause a conflict, using 'htop' as the user in the issue above did.
Did you try running it with the --privileged option?
If it still doesn't run, try adding --security-opt seccomp=unconfined and either --security-opt apparmor=unconfined or --security-opt label=disable, depending on whether you're running Ubuntu or a distribution with SELinux enabled, respectively.
If it works, try replacing the --privileged option with --cap-add=NET_ADMIN, as running containers in privileged mode is discouraged for security reasons.

Docker container stops without any errors while running sbt/play application

I'm running into an issue where my Docker container exits with exit code 137 after about a day of running. The logs for the container contain no information indicating that an error has occurred. Additionally, attempts to restart the container return an error saying that the PID already exists for the application.
The container is built with the sbt Docker plugin (sbt docker:publishLocal) and is then run using
docker run --name=the_app --net=the_app_nw -d the_app:1.0-SNAPSHOT
I'm also running 3 other Docker containers, which together use 90% of the available memory, but it's only ever that particular container that exits.
I'm looking for any advice on where to look next.
Exit code 137 (128+9) means that the process was killed with signal 9 (as with kill -9 yourApp) by something. That something can be a lot of things: Docker may have killed it for using too many resources, the system may have run out of memory, and so on.
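The 128+9 arithmetic can be sketched in a few lines of Python. This is only an illustration of the common "128 + signal number" convention; describe_exit_code is a name invented here, and the signal names come from whatever platform runs the script.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name, e.g. 9 maps to SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name} (128 + {signum})"
    if code == 0:
        return "exited normally"
    return f"exited with application error code {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

If memory is the suspect, docker inspect on the stopped container reports a State.OOMKilled field that may help confirm it.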
Regarding the PID problem, you can add this to your build.sbt:
javaOptions in Universal ++= Seq(
"-Dpidfile.path=/dev/null"
)
Basically this should instruct Play to not create a RUNNING_PID file. If it does not work you can try to pass that option directly in Docker using the JAVA_OPTS env variable.

Unable to run rabbitmq using marathon mesos

I am unable to run rabbitmq using the Marathon/Mesos framework. I have tried it with the rabbitmq images available on Docker Hub as well as a custom-built rabbitmq Docker image. In the Mesos slave log I see the following error:
E0222 12:38:37.225500 15984 slave.cpp:2344] Failed to update resources for container c02b0067-89c1-4fc1-80b0-0f653b909777 of executor rabbitmq.9ebfc76f-ba61-11e4-85c9-56847afe9799 running task rabbitmq.9ebfc76f-ba61-11e4-85c9-56847afe9799 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/13197/cgroup: Failed to open file '/proc/13197/cgroup': No such file or directory
Googling turned up one hit:
https://github.com/mesosphere/marathon/issues/632
I'm not sure whether this is the issue I am facing. Has anyone tried running rabbitmq using Marathon/Mesos/Docker?
Looks like the process went away (likely crashed) before the container was set up. You should check stdout and stderr to see what happened, and fix the root issue.
"cmd": "" is the likely culprit. I'd look at the Couchbase Docker containers for a few clues on how to get it working.

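As a sketch of the empty "cmd" suggestion above, the following Python snippet strips a blank cmd from a Marathon app definition before it is submitted. It assumes that Marathon falls back to the image's own entrypoint or command when cmd is omitted for a Docker container. The helper name and the app definition (id, image tag, resource values) are hypothetical placeholders, not the asker's actual configuration.

```python
import json


def drop_empty_cmd(app: dict) -> dict:
    """Return a copy of an app definition without a blank "cmd" field."""
    cleaned = dict(app)
    cmd = cleaned.get("cmd")
    if isinstance(cmd, str) and not cmd.strip():
        del cleaned["cmd"]
    return cleaned


# Hypothetical app definition; every value here is a placeholder.
app = {
    "id": "rabbitmq",
    "cmd": "",
    "cpus": 0.5,
    "mem": 512,
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "rabbitmq:3", "network": "BRIDGE"},
    },
}

if __name__ == "__main__":
    print(json.dumps(drop_empty_cmd(app), indent=2))
```

The original dictionary is left untouched, so the two versions can be compared before posting the cleaned one to Marathon.
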
Resources