Error running a Mesos container in a Mesos cluster using Marathon - Docker

"container": {
  "type": "MESOS",
  "docker": {
    "image": "redis",
    "forcePullImage": false
  }
}
The example above sets the container type to MESOS, yet it still specifies a "docker" image. Do we need to install Docker in order to use the Universal Container Runtime or the Mesos containerizer?
I ask because when I try to run a sample in Mesos with container type "MESOS", I get this error:
unsupported container image:DOCKER.
I have not installed Docker.
I am using Mesos 1.1.

See https://mesosphere.github.io/marathon/docs/native-docker.html#mesos-containerizer-and-universal-container-runtime for a valid example of how to run a Docker image with the Mesos UCR:
{
  "id": "mesos-docker",
  "container": {
    "docker": {
      "image": "mesosphere/inky"
    },
    "type": "MESOS"
  },
  "args": ["hello"],
  "cpus": 0.2,
  "mem": 16.0,
  "instances": 1
}
You'll need Marathon >= 1.3.0 and Mesos >= 1.0 for that.
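The "unsupported container image" error in the question is what one would expect when the agent itself has not been configured for container images, independent of the Marathon JSON. The sketch below is only an illustration, assuming the agent settings described in the Mesos container image documentation (image_providers including docker, and the filesystem/linux and docker/runtime isolators). The function name and the dict input are hypothetical; the dict stands in for however you read the agent's flags. With the UCR, Mesos pulls the image itself, so a Docker daemon should not be required.

```python
# Isolators assumed to be required for Docker images under the UCR.
REQUIRED_ISOLATORS = {"filesystem/linux", "docker/runtime"}


def missing_ucr_settings(flags):
    """Return a list of problems; an empty list means the flags look UCR-ready.

    `flags` is a plain dict of agent flag names to their string values
    (hypothetical input format chosen for this sketch).
    """
    problems = []

    # image_providers is a comma-separated list, e.g. "docker" or "appc,docker".
    providers = {
        p.strip().lower()
        for p in flags.get("image_providers", "").split(",")
        if p.strip()
    }
    if "docker" not in providers:
        problems.append("image_providers does not include docker")

    # isolation is a comma-separated list of isolator names.
    isolators = {
        i.strip() for i in flags.get("isolation", "").split(",") if i.strip()
    }
    for isolator in sorted(REQUIRED_ISOLATORS - isolators):
        problems.append(f"isolation does not include {isolator}")

    return problems
```

For example, calling missing_ucr_settings({"isolation": "posix/cpu,posix/mem"}) reports all three missing settings, which would match an agent started with default flags.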

Related

How can I deal with changing service IPs in a Mesos + Marathon setup?

I am currently deploying Docker services with the Mesos + Marathon combination.
This means that the IP address of a Docker container keeps changing.
For example, to run MongoDB on Marathon, you would use the definition below.
The port mapping specifies which host port is forwarded to the container. After a day or so, the service is automatically stopped and restarted, and its IP changes.
While looking into Mesos-DNS and studying the docker command, I learned that you can look up a service's IP by an alias name if you specify a network alias in Docker.
I thought this method would make the service easier to reach without using Mesos-DNS.
However, in Marathon a Docker service is defined in JSON, as shown below.
I am asking because I do not know which option, keyword, or method specifies the Docker network alias.
{
  "id": "mongodbTest",
  "instances": 1,
  "cpus": 2,
  "mem": 2048.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mongo:latest",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 27017,
          "hostPort": 0,
          "servicePort": 0,
          "protocol": "tcp"
        }
      ]
    },
    "volumes": [
      {
        "containerPath": "/etc/mesos-mg",
        "hostPath": "/var/data/mesos-mg",
        "mode": "RW"
      }
    ]
  }
}

POD Definition - Deploying to DC/OS

I'm new to DC/OS and I have been really struggling to deploy a POD. I have tried the simple examples provided in the documentation, but the deployments remain stuck in the deploying stage. There are plenty of resources available, so that is not the issue.
I have 3 containers that I need to exist within a virtual network (queue, PDI, API). I have included my definition file, which starts with a single-container deployment; once I can successfully deploy it, I will add the 2 additional containers to the definition. I have been looking at this example but have been unsuccessful.
I have successfully deployed the containers one at a time through Jenkins. All 3 images have been published and exist in the Docker registry (JFrog). I have included an example of my marathon.json for one of those successful deployments. I would appreciate any feedback that can help. The service is stuck in the deploying stage, so I'm unable to drill down and see the logs via the command line or UI.
containers.image = pdi-queue
artifactory server = repos.pdi.com:5010/pdi-queue
1 Container POD Definition - (Error: Stuck in Deployment Stage)
{
  "id": "/pdi-queue",
  "containers": [
    {
      "name": "simple-docker",
      "resources": {
        "cpus": 1,
        "mem": 128,
        "disk": 0,
        "gpus": 0
      },
      "image": {
        "kind": "DOCKER",
        "id": "repos.pdi.com:5010/pdi-queue",
        "portMappings": [
          {
            "hostPort": 0,
            "containerPort": 15672,
            "protocol": "tcp",
            "servicePort": 15672
          }
        ]
      },
      "endpoints": [
        {
          "name": "web",
          "containerPort": 80,
          "protocol": [
            "http"
          ]
        }
      ],
      "healthCheck": {
        "http": {
          "endpoint": "web",
          "path": "/"
        }
      }
    }
  ],
  "networks": [
    {
      "mode": "container",
      "name": "dcos"
    }
  ]
}
Marathon.json - (No Error: Successful deployment)
{
  "id": "/pdi-queue",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "container": {
    "portMappings": [
      {"containerPort": 15672, "hostPort": 0, "protocol": "tcp", "servicePort": 15672, "name": "health"},
      {"containerPort": 5672, "hostPort": 0, "protocol": "tcp", "servicePort": 5672, "name": "queue"}
    ],
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "repos.pdi.com:5010/pdi-queue",
      "forcePullImage": true,
      "privileged": false,
      "parameters": []
    }
  },
  "cpus": 0.1,
  "disk": 0,
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "maxConsecutiveFailures": 3,
      "portIndex": 0,
      "timeoutSeconds": 20,
      "delaySeconds": 15,
      "protocol": "MESOS_HTTP",
      "path": "/"
    }
  ],
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 512,
  "gpus": 0,
  "networks": [
    {
      "mode": "container/bridge"
    }
  ],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 300,
    "expungeAfterSeconds": 600
  },
  "fetch": [],
  "constraints": [],
  "labels": {
    "traefik.frontend.redirect.entryPoint": "https",
    "traefik.frontend.redirect.permanent": "true",
    "traefik.enable": "true"
  }
}
I may not know the answer to the issues you are running into, but I can share some pointers to help debug this.
First of all, if you are unable to view logs from the DC/OS UI, you can go to <cluster_url>/mesos and find the simple-docker task under Completed Tasks. It should show up as TASK_FAILED. Click the Sandbox link on the right and check the stderr and stdout files for the task. There might be some clues there as to why it failed.
Another place to look: note, from the Mesos UI, the IP of the agent where the task failed. SSH into that node and run sudo journalctl -u dcos-mesos-slave to see the agent logs, then look for the entries corresponding to the failing task.
One difference between running the application as a Pod and the App definition you shared is that your App definition uses DOCKER as the containerizer for the task, while Pods use the MESOS containerizer.
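To make that difference concrete, here is a small sketch that reports which containerizer a Marathon definition would run under. It is an illustration only: the function name is hypothetical, and it assumes that a pod is recognisable by its top-level "containers" list while an app carries a single "container" object with a "type" field, as in the two definitions in the question.

```python
def containerizer_of(definition):
    """Return "MESOS" or "DOCKER" for a Marathon app or pod definition.

    Assumptions made for this sketch:
    - a pod has a top-level "containers" list and always runs under the
      Mesos containerizer;
    - an app reports container.type, and falls back to MESOS when it has
      no container section at all (a plain command task).
    """
    if "containers" in definition:
        return "MESOS"
    container = definition.get("container") or {}
    return container.get("type", "MESOS")
```

Under the MESOS containerizer the image is pulled by Mesos rather than by the Docker daemon, so registry trust or credentials that were configured only for Docker would not necessarily apply to the pod.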
I noticed that you are using a private Docker registry for your images. One possibility is that your private registry's certificate is already trusted by Docker but not by Mesos. In that case, make the certificate available to Mesos:
<copy the certificate(s) to /var/lib/dcos/pki/tls/certs>
cd /var/lib/dcos/pki/tls/certs
for file in *.crt; do ln -s "$file" "$(openssl x509 -hash -noout -in "$file")".0; done
This would need to be done on each agent node.
If it's not a certificate issue, it could be a Docker registry credential issue. If the registry you are using requires authentication, you can specify Docker credentials at install time (assuming the advanced install method) using: https://docs.mesosphere.com/1.11/installing/production/advanced-configuration/configuration-reference/#cluster-docker-credentials

Marathon Docker Tasks Failing

I have set up Marathon and Mesos on two of my machines.
I can successfully schedule commands from the Marathon web console, but when I try to schedule a job involving Docker images, the job fails immediately. I also get no stderr or stdout files.
Example Running a normal command:
Marathon job conf:
{
  "id": "testecho",
  "cmd": "echo hello; sleep 10",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1
}
On mesos I see that the tasks have succeeded. I have the stderr and stdout files like normal.
But now if I run a simple docker image task:
Marathon job conf:
{
  "id": "/ubuntu",
  "cmd": "date -u +%T",
  "cpus": 0.5,
  "mem": 512,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "libmesos/ubuntu",
      "network": null,
      "portMappings": null,
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10001,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
On Mesos, I see that it has failed instantly.
And I have no stderr or stdout files.
I also notice that on both my machines, when I run:
docker ps -a
I see nothing on either machine, which would mean that the Docker jobs were not even launched.
What could be affecting Docker deployment?
The one reason I can think of is that the user Marathon uses to launch tasks does not have access to Docker. How do I test this?
I noticed that when I run the command:
sudo cat /etc/passwd
I see a user zookeeper. Maybe this is the user that doesn't have access to docker?
But when I run:
su zookeeper
the user does not change.
After going through a few tutorials, I found the answer in this one: http://frankhinek.com/deploy-docker-containers-on-mesos-0-20/
I had to enable the Docker containerizer on my Mesos slaves.
Set the --containerizers=docker,mesos command line parameter:
echo "docker,mesos" | sudo tee /etc/mesos-slave/containerizers
Increase the executor registration timeout to 5 minutes (I guess this is optional):
echo "5mins" | sudo tee /etc/mesos-slave/executor_registration_timeout
Restart the Mesos Slave:
sudo service mesos-slave restart
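A quick way to confirm the setting took effect is to parse the value written to the containerizers file. The following is a minimal sketch, assuming the comma-separated format used in the echo command above; both function names are hypothetical, and reading the file is left to the caller.

```python
def enabled_containerizers(flag_value):
    """Split a --containerizers value such as "docker,mesos" into a list of names."""
    return [name.strip() for name in flag_value.split(",") if name.strip()]


def docker_enabled(flag_value):
    """Return True if the Docker containerizer appears in the flag value."""
    return "docker" in enabled_containerizers(flag_value)
```

On a slave you would pass in the contents of /etc/mesos-slave/containerizers. A value of just "mesos" would be consistent with the symptom in the question, where Docker tasks fail immediately without a sandbox.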

Marathon With Private Docker Repo

I'm having issues pulling from a private Docker repo when I add a Marathon application. I've tarred my ~/.docker folder (including the docker.config file, which contains my login information) and distributed it to my Mesos slaves as /etc/docker.tar.gz (I'm using Docker 1.6.2).
I've then added a new marathon app with:
dcos marathon add app marathon.json
My marathon.json is as follows:
{
  "id": "api",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "company/api"
    }
  },
  "args": ["java", "-jar", "api.jar"],
  "uris": [
    "file:///etc/docker.tar.gz"
  ]
}
The marathon app never starts, however. In my slave logs I've found the following line:
Container x for executor y of framework z failed to start: Failed to 'docker pull company/api': exit status = exited with status 1 stderr = time="2015-11-12T00:03:57Z" level=fatal msg="Error: image company/api:latest not found"
How can I get this to pull correctly?

Mesos cannot deploy container from private Docker registry

I have a private Docker registry that is accessible at https://docker.somedomain.com (over the standard port 443, not 5000). My infrastructure includes a Mesosphere setup, which has the Docker containerizer enabled. I am trying to deploy a specific container to a Mesos slave via Marathon; however, Mesos fails the task almost immediately, with no data in the stderr and stdout of that sandbox.
I tried deploying an image from the standard Docker Registry and it appears to work fine. I'm having trouble figuring out what is wrong. My private Docker registry does not require password authentication (turned off for debugging this), AND if I shell into the Mesos slave instance and sudo su as root, I can run 'docker pull docker.somedomain.com/services/myapp' successfully every time.
Here is my Marathon post data for starting the task:
{
  "id": "myapp",
  "cpus": 0.5,
  "mem": 64.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "docker.somedomain.com/services/myapp:2",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 7000, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
      ]
    },
    "volumes": [
      {
        "containerPath": "application.yml",
        "hostPath": "/var/myapp/application.yml",
        "mode": "RO"
      }
    ]
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 5,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
I've been stuck on this for almost a day now, and everything I've tried yields the same result. Any insights on this would be much appreciated.
My versions:
Mesos: 0.22.1
Marathon: 0.8.2
Docker: 1.6.2
So this turns out to be an issue with volumes:
"volumes": [
  {
    "containerPath": "/application.yml",
    "hostPath": "/var/myapp/application.yml",
    "mode": "RO"
  }
]
Mounting a file at the root path of the container may be legal in Docker, but Mesos appears not to handle this. Changing the containerPath to a non-root path resolves it, e.g.:
"volumes": [
  {
    "containerPath": "/var",
    "hostPath": "/var/myapp",
    "mode": "RW"
  }
]
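The change above amounts to mounting the directory that holds the file instead of the file itself. The sketch below mirrors that rewrite for a single Marathon volume entry. It is an illustration only: the function name is hypothetical, the target container directory is the caller's choice, and the mode is left untouched even though the answer also switched it to RW.

```python
import posixpath


def mount_parent_directory(volume, container_dir):
    """Return a copy of `volume` that mounts the host file's parent directory.

    `volume` is one entry of a Marathon "volumes" list; `container_dir` is
    the directory inside the container where the host directory should
    appear. The input dict is not modified.
    """
    new_volume = dict(volume)
    # /var/myapp/application.yml -> /var/myapp
    new_volume["hostPath"] = posixpath.dirname(volume["hostPath"])
    new_volume["containerPath"] = container_dir
    return new_volume
```

Keep in mind that a directory mounted over an existing container path such as /var hides whatever the image had there, so a dedicated directory may be a safer target.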
If it is a problem between Marathon and the registry, the answer should be in the HTTP logs of your registry. If Marathon connects, there will be an entry. The Mesos master log should contain a clue as well.
It doesn't really sound like a problem between Marathon and the registry, though. Are you sure you have 'docker,mesos' in /etc/mesos-slave/containerizers?
Did you, despite having no authentication, try to follow Using a Private Docker Repository?
To supply credentials to pull from a private repository, add a .dockercfg to the uris field of your app. The $HOME environment variable will then be set to the same value as $MESOS_SANDBOX so Docker can automatically pick up the config file.
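As a rough sketch of that step, the helper below adds a credentials URI to an app definition before it is submitted. The function name and the example URI are hypothetical; the only thing taken from the documentation quoted above is that the .dockercfg goes into the app's uris field.

```python
import copy


def with_docker_credentials(app, dockercfg_uri):
    """Return a copy of a Marathon app definition with the credentials URI added.

    The original dict is left untouched, and the URI is not added twice.
    """
    updated = copy.deepcopy(app)
    uris = updated.setdefault("uris", [])
    if dockercfg_uri not in uris:
        uris.append(dockercfg_uri)
    return updated
```

For instance, with_docker_credentials(app, "file:///etc/.dockercfg") assumes the file has already been placed at that path on every slave, so that it can be fetched into the task sandbox.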

Resources