Marathon Docker Tasks Failing - docker

I have set up Marathon and Mesos on two of my machines.
I can successfully schedule commands from the Marathon web console, but when I try to schedule a job involving Docker images, the job fails immediately. I also get no stderr or stdout files.
Example: running a normal command.
Marathon job conf:
{
"id": "testecho",
"cmd": "echo hello; sleep 10",
"cpus": 1,
"mem": 128,
"disk": 0,
"instances": 1
}
On mesos I see that the tasks have succeeded. I have the stderr and stdout files like normal.
But now if I run a simple docker image task:
Marathon job conf:
{
"id": "/ubuntu",
"cmd": "date -u +%T",
"cpus": 0.5,
"mem": 512,
"disk": 0,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "libmesos/ubuntu",
"network": null,
"portMappings": null,
"privileged": false,
"parameters": [],
"forcePullImage": false
}
},
"portDefinitions": [
{
"port": 10001,
"protocol": "tcp",
"labels": {}
}
]
}
On Mesos, I see that it has instantly failed.
And I have no stderr or stdout files.
I also notice that on both my machines, when I run:
docker ps -a
I see nothing on either machine, which suggests the Docker containers were never even launched.
What could be affecting Docker deployment?
The one reason I can think of is that the user Marathon uses to launch tasks does not have access to Docker. How do I test this?
I noticed that when I run the command:
sudo cat /etc/passwd
I see a user zookeeper. Maybe this is the user that doesn't have access to docker?
But when I do:
su zookeeper
the user does not change.

After going through a few tutorials I found the answer from the following tutorial: http://frankhinek.com/deploy-docker-containers-on-mesos-0-20/
I had to enable the Docker containerizer on my Mesos slaves.
Set the --containerizers=docker,mesos command-line parameter:
echo "docker,mesos" | sudo tee /etc/mesos-slave/containerizers
Increase the executor registration timeout to 5 minutes (I guess this is optional):
echo "5mins" | sudo tee /etc/mesos-slave/executor_registration_timeout
Restart the Mesos Slave:
sudo service mesos-slave restart
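Before restarting, it can help to confirm that the file really lists the Docker containerizer. The helper below is only a sketch: the function name has_docker_containerizer is made up for illustration, and it assumes the file holds a comma-separated list as written by the tee command above.

```shell
# Return 0 if the given containerizers file lists "docker",
# 1 if it does not, and 2 if the file cannot be read.
# The function name is hypothetical; the default path is the one used above.
has_docker_containerizer() {
  file=${1:-/etc/mesos-slave/containerizers}
  [ -r "$file" ] || return 2
  # The file holds a comma-separated list such as "docker,mesos".
  tr ',' '\n' < "$file" | tr -d ' \t\r' | grep -qx 'docker'
}
```

Usage would be something like: has_docker_containerizer && echo "docker containerizer is configured"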

Related

how to bring up failed container

I have a container that failed after a long setup, and I want to log in (exec bash) at that point instead of running the slow setup again. Is there any way?
The container is left over from a docker build process; it is still at the FROM ... AS builder stage.
If I try to start it, it fails right away.
$ docker start -ai 3d35a7f7a7b4
/bin/sh: mvn: command not found
Trying to exec anything right away doesn't work either:
$ docker start 3d35a7f7a7b4 & docker exec 3d35a7f7a7b4 -it /bin/sh
[1] 403273
3d35a7f7a7b4
unable to upgrade to tcp, received 500
[1]+ Done docker start 3d35a7f7a7b4
more info:
$ docker inspect 3d35a7f7a7b4
[
{
"Id": "3d35a7f7a7b4018ebbbd9aa59356714d7fed291a43752cbcb86dd852c946cc1e",
"Created": "2022-07-06T23:56:37.001004587Z",
"Path": "/bin/sh",
"Args": [
"-c",
"mvn --version"
],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 127,
"Error": "",
"StartedAt": "2022-07-07T00:02:35.755444447Z",
"FinishedAt": "2022-07-07T00:02:35.75741167Z"
},
"Image": "sha256:4819e2469963fdf531ec5bce5401b7ae7d28cd403528c0109512b5170ef61752",
...
This is not an optimal answer. It is here just for documentation (and for people to vote up if it is the best one can do with Docker).
docker run can be used on the image of the stopped container, and you can pass the CMD parameter right away. But any other peculiarity of the stopped container, e.g. its network, will also have to be repeated.
For the example in the question:
host$ docker run -it sha256:4819e2469963fdf531ec5bce5401b7ae7d28cd403528c0109512b5170ef61752 /bin/bash
container# _
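The same two steps (look up the image of the stopped container, then run it with a different command) could be wrapped in a small helper. This is only a sketch: the function name debug_stopped_container and the DOCKER override variable are made up for illustration. It relies on docker inspect --format '{{.Image}}' to print the image ID shown in the inspect output above.

```shell
# Start a new container from the image of a stopped container,
# replacing its command with an interactive shell.
# The function name and the DOCKER override are hypothetical helpers.
debug_stopped_container() {
  container=$1
  shell=${2:-/bin/sh}
  docker_bin=${DOCKER:-docker}
  # Look up the image ID (sha256:...) the container was created from.
  image=$("$docker_bin" inspect --format '{{.Image}}' "$container") || return 1
  # Run that image with the chosen shell instead of the failing CMD.
  "$docker_bin" run -it "$image" "$shell"
}
```

For the container in the question this would be: debug_stopped_container 3d35a7f7a7b4 /bin/bash. As noted above, settings such as the network are not carried over.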

Marathon With Private Docker Repo

I'm having issues pulling from a private Docker repo when I add a Marathon application. I've tarred my ~/.docker folder (including the docker.config file, which contains my login information) and distributed it to my Mesos slaves as /etc/docker.tar.gz (I'm using Docker 1.6.2).
I've then added a new marathon app with:
dcos marathon add app marathon.json
My marathon.json is as follows:
{
"id": "api",
"cpus": 1,
"mem": 1024,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "company/api"
}
},
"args": ["java", "-jar", "api.jar"],
"uris": [
"file:///etc/docker.tar.gz"
]
}
The marathon app never starts, however. In my slave logs I've found the following line:
Container x for executor y of framework z failed to start: Failed to 'docker pull company/api': exit status = exited with status 1 stderr = time="2015-11-12T00:03:57Z" level=fatal msg="Error: image company/api:latest not found"
How can I get this to pull correctly?

marathon docker jobs hanged in deployment state

Hi, I have been successful so far with simple jobs in Marathon, but I got stuck when I tried deploying a Docker job in Mesos through the Marathon framework.
I am using a json file as below to deploy a docker job:
{
"id": "pga-docker",
"cpus": 0.2,
"mem": 1024.0,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "pga",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 80, "hostPort": 6565, "servicePort": 0, "protocol": "tcp" }
]
}
}
}
My pga Docker image has no problem when run as a container, but through Marathon it just does not work. It stays in the deploying state forever.
I am using the below command line:
curl -X POST http://10.141.141.10:8080/v2/apps -d @basic-3.json -H "Content-type: application/json"
But when I run the same image from the Marathon UI, it works. To run it from Marathon, I used "docker run --publish 6060:80 --name test --rm pga" in the cmd field of the UI's new job page.
Does anyone have an idea why this hangs with the command-line approach?
This is what I found through some trial and error with the JSON file.
I found that when we run a Docker image on a local system, any entry point or cmd defined in the image is executed when the container runs. But this is not the same for Mesos/Marathon. My observation is that if I explicitly specify cmd in the deployment JSON, it works fine.
"cmd":"sh pga-setup.sh"
I would love to know if anyone has faced a similar issue and solved it another way.
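For completeness, posting an app definition file (with cmd included) to the /v2/apps endpoint used in the question might look like the sketch below. The function name post_marathon_app and the CURL override variable are made up for illustration. The part that matters is the @ prefix, which tells curl to read the request body from a file.

```shell
# POST a Marathon app definition stored in a JSON file.
# The function name and the CURL override are hypothetical helpers.
post_marathon_app() {
  marathon_url=$1
  json_file=$2
  curl_bin=${CURL:-curl}
  [ -r "$json_file" ] || { echo "cannot read $json_file" >&2; return 1; }
  # "@file" makes curl send the contents of the file as the request body.
  "$curl_bin" -X POST "$marathon_url/v2/apps" \
    -H "Content-Type: application/json" \
    -d "@$json_file"
}
```

With the values from the question: post_marathon_app http://10.141.141.10:8080 basic-3.json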

marathon does not delete a docker container after destroying the job

When I run a Docker container as a Marathon job, it creates a Docker container on the active Mesos slave. When I suspend or destroy the job, I expect Marathon to delete the Docker container, as it is no longer required. But the container does not get deleted. I have to delete the containers manually every time Marathon restarts a Docker container job.
Is there any way to delete these unwanted containers automatically?
Edit:
Adding json file for initiating a marathon job
{
"id": "pga-docker",
"cmd":"sh pga-setup.sh",
"cpus": 0.5,
"mem": 1024.0,
"container": {
"type": "DOCKER",
"docker": {
"image": "pga:test",
"parameters": [
{ "key": "env", "value": "SERVER_HOST=value" },
{ "key": "env", "value": "SERVER_PORT=value" }
],
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 80, "hostPort": 0}
]
}
}
}
Marathon will restart a Docker container that failed so that you keep the number of instances you requested. It could be that you are seeing stopped or failed containers that were not cleaned up by Mesos. This could be related to the fact that Mesos delays container cleanup until GC.
see https://issues.apache.org/jira/browse/MESOS-1656
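Until Mesos cleans them up, leftover exited containers could be removed from a cron job or by hand. The sketch below is one way to do it, not something Marathon or Mesos provides: the function name remove_exited_containers and the DOCKER override are made up, and the "mesos-" name filter is an assumption about how the containers are named on your slaves, so check docker ps -a first.

```shell
# Remove exited containers whose names contain a given string.
# The function name and DOCKER override are hypothetical; the "mesos-"
# default is an assumption about container naming, not a documented fact here.
remove_exited_containers() {
  name_filter=${1:-mesos-}
  docker_bin=${DOCKER:-docker}
  # List only the IDs of exited containers matching the name filter.
  ids=$("$docker_bin" ps -a -q --filter status=exited --filter "name=$name_filter") || return 1
  [ -n "$ids" ] || return 0
  # Word splitting is intended here: one container ID per argument.
  "$docker_bin" rm $ids
}
```

Running containers are not touched, since the status=exited filter excludes them.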
This is the expected behavior of Marathon, because it is meant for long-running services: as soon as the task completes, Marathon assumes it has terminated on that host and immediately starts a new instance of the application. If you need a one-off task, you can use Chronos, which runs the task only once. I have written a script to delete such apps automatically in Marathon.
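#!/bin/bash
# Deletes the Marathon apps docker-app-$1 through docker-app-$2
# once each of them reports a "startedAt" field.
start=$1
end=$2
for (( c=start; c<=end; c++ ))
do
  echo "deleting:$c"
  sleep 10
  # Fetch the app definition and keep only the part mentioning startedAt.
  var=$(curl -X GET "http://localhost:8080/v2/apps/docker-app-$c" | grep "startedAt")
  echo "$var"
  if [[ $var == *"startedAt"* ]]
  then
    # The task has started, so the app can be removed.
    curl -X DELETE "http://localhost:8080/v2/apps/docker-app-$c"
    echo "going to delete"
  else
    echo "application not started yet"
  fi
  sleep 1
done
echo "Completed!"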

Mesos cannot deploy container from private Docker registry

I have a private Docker registry that is accessible at https://docker.somedomain.com (over the standard port 443, not 5000). My infrastructure includes a Mesosphere setup, which has the Docker containerizer enabled. I am trying to deploy a specific container to a Mesos slave via Marathon; however, Mesos fails the task almost immediately, with no data in stderr or stdout of that sandbox.
I tried deploying an image from the standard Docker Registry and it appears to work fine. I'm having trouble figuring out what is wrong. My private Docker registry does not require password authentication (turned off for debugging this), and if I shell into the Mesos slave instance and sudo su as root, I can run 'docker pull docker.somedomain.com/services/myapp' successfully every time.
Here is my Marathon post data for starting the task:
{
"id": "myapp",
"cpus": 0.5,
"mem": 64.0,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "docker.somedomain.com/services/myapp:2",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 7000, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
]
},
"volumes": [
{
"containerPath": "application.yml",
"hostPath": "/var/myapp/application.yml",
"mode": "RO"
}
]
},
"healthChecks": [
{
"protocol": "HTTP",
"portIndex": 0,
"path": "/",
"gracePeriodSeconds": 5,
"intervalSeconds": 20,
"maxConsecutiveFailures": 3
}
]
}
I've been stuck on this for almost a day now, everything I've tried seems to be yielding the same result. Any insights on this would be much appreciated.
My versions:
Mesos: 0.22.1
Marathon: 0.8.2
Docker: 1.6.2
So this turns out to be an issue with volumes:
"volumes": [
{
"containerPath": "/application.yml",
"hostPath": "/var/myapp/application.yml",
"mode": "RO"
}
]
Using a path at the root of the container may be legal in Docker, but Mesos appears not to handle it. Changing the containerPath to a non-root path resolves this, e.g.:
"volumes": [
{
"containerPath": "/var",
"hostPath": "/var/myapp",
"mode": "RW"
}
]
If it is a problem between Marathon and the registry, the answer should be in the http logs of your registry. If Marathon connects, there will be an entry. And the Mesos master log should contain a clue as well.
It doesn't really sound like a problem between Marathon and Registry though. Are you sure you have 'docker,mesos' in /etc/mesos-slave/containerizers?
Did you, despite having no authentication, try to follow Using a Private Docker Repository?
To supply credentials to pull from a private repository, add a .dockercfg to the uris field of your app. The $HOME environment variable will then be set to the same value as $MESOS_SANDBOX so Docker can automatically pick up the config file.
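Packaging the credentials for the uris field could be sketched as below. The function name pack_docker_credentials is made up for illustration. It assumes the credentials live either in a .docker directory (as in the earlier question that used Docker 1.6.2) or in a single .dockercfg file, so check which one your Docker version writes.

```shell
# Pack Docker registry credentials from a home directory into a tarball
# that can be distributed to the slaves and listed under "uris".
# The function name is hypothetical; both arguments are chosen by the caller.
pack_docker_credentials() {
  home_dir=$1
  out=$2
  if [ -d "$home_dir/.docker" ]; then
    # Newer layout: a .docker directory holding the config file.
    tar -czf "$out" -C "$home_dir" .docker
  elif [ -f "$home_dir/.dockercfg" ]; then
    # Older layout: a single .dockercfg file.
    tar -czf "$out" -C "$home_dir" .dockercfg
  else
    echo "no Docker credentials found in $home_dir" >&2
    return 1
  fi
}
```

For example: pack_docker_credentials "$HOME" docker.tar.gz, then copy the archive to each slave and reference it with a file:// URI as in the earlier marathon.json.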

Resources