Background
I need to set up a docker-compose file with a RabbitMQ service and my application. This RabbitMQ service needs three things to work properly:
a user named "user1" with full permissions
a vhost named "vhost1"
inside "vhost1", I need an exchange called "Pizza"
What we tried
To achieve this we tried creating a folder in our project called rabbitmq with the following files:
definitions.json
{
"rabbit_version": "3.6.6",
"users": [
{
"name": "user1",
"password_hash": "pass1",
"hashing_algorithm": "rabbit_password_hashing_sha256",
"tags": "administrator"
}
],
"vhosts": [
{
"name": "\/vhost1"
}
],
"permissions": [
{
"user": "user1",
"vhost": "\/vhost1",
"configure": ".*",
"write": ".*",
"read": ".*"
}
],
"parameters": [],
"policies": [],
"queues": [],
"exchanges": [],
"bindings": []
}
rabbitmq.conf
loopback_users.guest = false
listeners.tcp.default = 5672
We are mounting this folder using the volumes directive in docker-compose, with the following file:
version: '3'
services:
rabbit:
image: rabbitmq:management
ports:
- "8080:15672"
- "5672:5672"
volumes:
- ${PWD}/rabbitmq:/etc/rabbitmq
Problems
We are facing two issues at the moment:
we are not creating the exchange called "Pizza".
we cannot access the RabbitMQ management UI via localhost:8080 even though we specify the mapping of this port in our docker-compose file.
Questions
How do we define an exchange for a vhost in the definitions.json file? (Where can I read about it?)
Why can't we access the UI? What are we doing wrong?
Solutions
1. Exchange creation
The first issue is easily solvable. The reason the exchange is not being created is that the "exchanges" field in the definitions.json file is empty. To fix this you need to add an exchange object to that list:
"exchanges": [
{
"name": "Pizza",
"vhost": "\/vhost1",
"type": "fanout",
"durable": true,
"auto_delete": false,
"internal": false,
"arguments": {}
}
],
One can read more about this in this blog post:
https://devops.datenkollektiv.de/creating-a-custom-rabbitmq-container-with-preconfigured-queues.html
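If you also need a queue bound to the "Pizza" exchange, the same definitions file accepts entries in its "queues" and "bindings" lists. Below is a minimal sketch; the queue name pizza-orders is made up for illustration, and the fields follow the same export format as the exchange object above:
"queues": [
    {
        "name": "pizza-orders",
        "vhost": "\/vhost1",
        "durable": true,
        "auto_delete": false,
        "arguments": {}
    }
],
"bindings": [
    {
        "source": "Pizza",
        "vhost": "\/vhost1",
        "destination": "pizza-orders",
        "destination_type": "queue",
        "routing_key": "",
        "arguments": {}
    }
]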
2. Accessing the management UI
Here there were several problems with the configuration. First, I was clobbering the contents of the original /etc/rabbitmq folder in the container with the contents of my local folder. This was not intended, and the fix for this issue can be found here:
Unknown variable "management.load_definitions" in rabbitmq rabbit.conf file
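In short, instead of mounting the whole folder over /etc/rabbitmq, mount only the two files so the rest of the image's configuration stays intact. A sketch of the adjusted compose file, assuming the same folder layout as above (not necessarily the exact wording of the linked answer):
version: '3'
services:
  rabbit:
    image: rabbitmq:management
    ports:
      - "8080:15672"
      - "5672:5672"
    volumes:
      # mount the individual files read-only instead of replacing /etc/rabbitmq
      - ${PWD}/rabbitmq/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
      - ${PWD}/rabbitmq/definitions.json:/etc/rabbitmq/definitions.json:ro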
The second issue was in the rabbitmq.conf file. We were missing the field that tells the application to load our definitions file. Following is the correct version of the rabbitmq.conf file:
loopback_users.guest = false
listeners.tcp.default = 5672
management.load_definitions = /etc/rabbitmq/definitions.json
The third (and final) issue was with the user's password, specifically the password_hash field, which needs to follow a specific algorithm and be encoded in a specific format. More about this can be read in RabbitMQ's official documentation:
https://www.rabbitmq.com/passwords.html
To skip the pain of dealing with the salting, hashing and encoding: if all you want is to test a setup for integration purposes, as we do, then just go with the password test12 that is given in the example:
"users": [
{
"name": "user1",
"password_hash": "kI3GCqW5JLMJa4iX1lo7X4D6XbYqlLgxIs30+P6tENUV2POR",
"hashing_algorithm": "rabbit_password_hashing_sha256",
"tags": "administrator"
}
]
If, however, it is really important for you to know how to generate user passwords that RabbitMQ will accept, here is a bash script, created with the blood and tears of a colleague:
#!/bin/bash
# Produces a RabbitMQ-compatible password hash (salt + SHA-256, base64 encoded) for the password given as $1
PWD_HEX=$(echo -n "$1" | xxd -p)
SALT="908D C60A"
HEX="$SALT $PWD_HEX"
SHA256=$(echo -n $HEX | xxd -r -p | sha256sum)
# This is the password hash to be inserted in your RabbitMQ load_definitions file
echo "908D C60A $SHA256" | xxd -r -p | base64
Usage: ./my_script userpass1
Conclusion
And with all of this out of the way, one should be able to create users, vhosts and exchanges, while also having access to the management UI, all via a Docker image.
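As a quick sanity check (a sketch that assumes the compose file above is up and the test12 password from the example), the management API on the mapped port can list the exchanges of the vhost, and "Pizza" should appear among them:
curl -u user1:test12 http://localhost:8080/api/exchanges/%2Fvhost1 | jq '.[].name'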
Related
Recently I needed to debug a single-file Go binary application that runs in a Docker container under a k8s environment, with its source code available locally. When I package the Docker image I use
/dlv --listen=:40000 --headless=true --api-version=2 exec /singleExeFile
and expose port 40000 to the outer VM like
ports:
- 40000:40000
When I use my dev environment to connect to the outer VM with the dlv command, it seems to connect, like the following:
foo#foo-vm:~$ dlv connect 110.123.123.123:40000
Type 'help' for list of commands.
(dlv)
But when I use VSCode to attach to the code, it hits two errors (VSCode has the Go extension installed).
When using the legacy connect method, here is my launch.json:
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Connect to server",
"type": "go",
"debugAdapter": "legacy",
"request": "attach",
"mode": "remote",
"port": 40000,
"host": "110.123.123.123",
"substitutePath": [
{
"from": "${workspaceFolder}/cmd/maine.go",
"to": "/singleExeFile"
}
]
}
]
}
But VSCode raises an error, and I haven't found a similar error on Google: Error: Socket connection to remote was closed
Using the dlv-dap method to connect:
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Delve into Docker",
"type": "go",
"debugAdapter": "dlv-dap",
"request": "attach",
"mode": "remote",
"port": 40000,
"host": "110.123.123.123",
"substitutePath": [
{
"from": "${workspaceFolder}/cmd/maine.go",
"to": "/singleExeFile"
}
]
}
]
}
And when I try to connect, no error is raised by VSCode; it just tries to connect and then stops, so I don't even know what the error is. With the verbose param there still isn't any output in the DEBUG CONSOLE for the dlv-dap method, but for the legacy method it does print some detailed messages, shown below. Please check.
AttachRequest
Start remote debugging: connecting 110.123.123.123:40000
To client: {"seq":0,"type":"event","event":"initialized"}
InitializeEvent
To client: {"seq":0,"type":"response","request_seq":2,"command":"attach","success":true}
From client: configurationDone(undefined)
ConfigurationDoneRequest
Socket connection to remote was closed
To client: {"seq":16,"type":"response","request_seq":2,"command":"attach","success":false,"message":"Failed to continue: Check the debug console for details.","body":{"error":{"id":3000,"format":"Failed to continue: Check the debug console for details.","showUser":true}}}
Sending TerminatedEvent as delve is closed
To client: {"seq":0,"type":"event","event":"terminated"}
From client: disconnect({"restart":false})
DisconnectRequest
New update on 9 July
I made another attempt, creating a simple Docker image with the following Dockerfile:
FROM golang:1.16.15
RUN mkdir -p /var/lib/www && mkdir -p /var/lib/temp
WORKDIR /var/lib/temp
COPY . ./
RUN go env -w GOPROXY="https://goproxy.cn,direct"
RUN go install github.com/go-delve/delve/cmd/dlv@latest
RUN go mod tidy
RUN go build
RUN mv ./webproj /var/lib/www/ && rm -rf /var/lib/temp
WORKDIR /var/lib/www
COPY ./build.sh ./
EXPOSE 8080
EXPOSE 2345
RUN chmod 777 ./webproj
RUN chmod 777 ./build.sh
ENTRYPOINT ["/bin/bash","./build.sh"]
And the build.sh code is:
dlv --listen=:2345 --headless=true --api-version=2 --accept-multiclient exec ./webproj
After that, it works with the GoLand debugger: GoLand can debug when I send the designed GET API request. But it still doesn't work with VSCode. When I use VSCode, it does connect to the container, but when I add a breakpoint it shows up as an unverified breakpoint and execution doesn't stop there.
Here is my launch.json
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Connect to server",
"type": "go",
"request": "attach",
"mode": "remote",
"remotePath": "${fileDirname}",
"port": 2345,
"host": "127.0.0.1"
}
]
}
So currently this is blocked. Help is very much needed. Thanks.
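A hedged observation, not part of the original post: substitutePath rewrites the source file paths recorded in the binary's debug information, not the path of the executable, so mapping a local directory to the directory the code was compiled in (assumed here to be /var/lib/temp, as in the Dockerfile above) and building with -gcflags="all=-N -l" is usually what the Go extension expects. A sketch of that mapping:
"substitutePath": [
    {
        // map local sources to the path they were compiled under inside the image
        "from": "${workspaceFolder}",
        "to": "/var/lib/temp"
    }
]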
How can I calculate a deterministic and reproducible checksum of a docker image, locally, without pinging any registry?
The checksum should not depend on the image name or in which registry it lives. It should solely depend on the content of all layers.
For example, assume the following:
a given file a
a dockerfile with the content
FROM scratch
COPY a /a
Then building the image with docker build . --no-cache multiple times should always yield the same checksum.
The regular image ID does not cut it, as it somehow uses content from intermediate containers and hence always changes. I am also aware that since Docker 1.10, images have a "RepoDigest" attribute, which uniquely identifies images based on their layers' content. However, as far as I can tell, that digest is only calculated when pulling or pushing to a registry. Is there a way to get this field without contacting a registry? (and is it actually deterministic, regardless of image name, tag or repo?)
Basically, I'm looking for a way to run a good ol' sha256sum on a docker image. This would help me achieve something similar to what can be done with Bazel: a hermetic build environment, which in turn enables:
declaring dependencies between docker images, and have a CI system only rebuild what is needed without using docker's cache (assuming that I have a build tool which already manages caches)
allow me to "sign" images using the same approach as signing classic tarballs (that is, publish a checksum and somehow sign that)
the big one: enable reproducible builds!
This should be what Sigstore is for. It is made up of three projects:
Cosign, which signs software.
Fulcio, a certificate authority that lets anyone access short-lived certificates via OpenID Connect.
Rekor, a secure log of signing events that allows you to verify the provenance of software artifacts.
You can then follow "Keyless Sign and Verify Your Container Images With Cosign" (Chris Nesbitt-Smith)
Behind the scenes, cosign creates the keypair ephemerally (they last 20 minutes) and gets them signed by Fulcio using your authenticated OIDC identity.
That is OIDC: OpenID Connect 1.0 is a simple identity layer on top of the OAuth 2.0 protocol.
OIDC allows:
Clients to verify the identity of the End-User based on the authentication performed by an Authorization Server,
as well as to obtain basic profile information about the End-User in an interoperable and REST-like manner.
COSIGN_EXPERIMENTAL=1 cosign sign image:tag
COSIGN_EXPERIMENTAL=1 cosign verify image:tag
But you would need to set up your own local OCI registry in order to keep the whole toolchain local, since cosign stores signatures in an OCI registry and uses a naming convention (a tag based on the sha256 of what we're signing) for locating the signature index.
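For a purely local test, a throwaway registry container is usually enough; here is a sketch (the container name and port are arbitrary) that gives cosign a local place to push the image and its signature:
docker run -d --name registry -p 5000:5000 registry:2
docker tag image:tag localhost:5000/image:tag
docker push localhost:5000/image:tag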
It looks like you're working on the same problem that I'm actively solving right now.
The big issue with the question is that container image builds with docker build are not deterministic or reproducible unless it happens to reuse the cache from a previous build. A container image build, even with the same filesystem layers, contains metadata on that build, and the metadata contains timestamps:
$ regctl manifest get localhost:5000/library/alpine --platform linux/amd64 --format body | jq .
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 1472,
"digest": "sha256:0ac33e5f5afa79e084075e8698a22d574816eea8d7b7d480586835657c3e1c8b"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 2814559,
"digest": "sha256:df9b9388f04ad6279a7410b85cedfdcb2208c0a003da7ab5613af71079148139"
}
]
}
$ regctl blob get localhost:5000/library/alpine sha256:0ac33e5f5afa79e084075e8698a22d574816eea8d7b7d480586835657c3e1c8b | jq .
{
"architecture": "amd64",
"config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"/bin/sh"
],
"Image": "sha256:d49869997c508135352366cebd3509ee756bba1ceb8eef708a4c3ff0d481084a",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"container": "b714116bd3f3418e7b61a6d70dd7244382f0844e47a8d1d66dbf61cb1cb02b2b",
"container_config": {
"Hostname": "b714116bd3f3",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"CMD [\"/bin/sh\"]"
],
"Image": "sha256:d49869997c508135352366cebd3509ee756bba1ceb8eef708a4c3ff0d481084a",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": {}
},
"created": "2022-04-05T00:19:59.912662499Z",
"docker_version": "20.10.12",
"history": [
{
"created": "2022-04-05T00:19:59.790636867Z",
"created_by": "/bin/sh -c #(nop) ADD file:5d673d25da3a14ce1f6cf66e4c7fd4f4b85a3759a9d93efb3fd9ff852b5b56e4 in / "
},
{
"created": "2022-04-05T00:19:59.912662499Z",
"created_by": "/bin/sh -c #(nop) CMD [\"/bin/sh\"]",
"empty_layer": true
}
],
"os": "linux",
"rootfs": {
"type": "layers",
"diff_ids": [
"sha256:4fc242d58285699eca05db3cc7c7122a2b8e014d9481f323bd9277baacfa0628"
]
}
}
Both the "created" and "history" steps have timestamps that will be unique to the build. Changing those timestamps changes the digest of the config blob, which changes the digest of the image manifest.
The next issue you'll run into is that the JSON serialization would need to be canonical. Some tools use pretty formatting like jq, others eliminate all unneeded whitespace for compactness, the order of keys in a map doesn't need to be alphabetical, etc. So you need to ensure the same tool is always used for serialization and that it produces canonical output.
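A toy illustration of why this matters (plain shell, not an image config): the same JSON content re-serialized with different whitespace produces different bytes and therefore a different digest:
echo -n '{"a":1,"b":2}' | sha256sum
echo -n '{"a":1,"b":2}' | jq . | sha256sum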
To build without pushing to a registry, you can have docker's buildkit output to an OCI layout tar file:
docker build --output type=oci,dest=/path/to/file.tar .
And in that tar, you will find an index.json with the digest of an image manifest as it was created by buildkit. I've been taking this a step further with regclient's image modification features, changing timestamps (in my case to the git commit time) and stripping other mutable values from the build. Then I verify the result matches a previous build.
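For example (reusing the dest path from the command above; jq is just one way to read it), the buildkit-generated digest can be pulled straight out of that tar:
tar -xOf /path/to/file.tar index.json | jq -r '.manifests[0].digest'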
Tools like cosign will allow you to sign an image using a digest rather than depending on the image in the registry, even before that image has been pushed.
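A hedged example of that, with a made-up repository path and a placeholder for the digest taken from index.json above:
cosign sign --key cosign.key registry.example.com/myapp@sha256:<digest-from-index.json>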
The image mod feature in regclient is still very much a WIP, but you can see the current features here:
$ regctl image mod --help
EXPERIMENTAL: Applies requested modifications to an image
Usage:
regctl image mod <image_ref> [flags]
Flags:
--annotation stringArray set an annotation (name=value) (default )
--annotation-base stringArray set base image annotations (image/name:tag,sha256:digest) (default )
--buildarg-rm string delete a build arg (default "")
--buildarg-rm-regex string delete a build arg with a regex value (default "")
--config-time-max string max timestamp for a config (default "")
--create string Create tag
--data-max stringArray sets or removes descriptor data field (size in bytes) (default )
--expose-add stringArray add an exposed port (default )
--expose-rm stringArray delete an exposed port (default )
--external-urls-rm remove external url references from layers (first copy image with "--include-external") (default )
-h, --help help for mod
--label stringArray set an label (name=value) (default )
--label-to-annotation set annotations from labels (default )
--layer-rm-created-by string delete a layer based on history (created by string is a regex) (default "")
--layer-rm-index uint delete a layer from an image (index begins at 0) (default )
--layer-strip-file string delete a file or directory from all layers (default "")
--layer-time-max string max timestamp for a layer (default "")
--replace Replace tag (ignored when "create" is used)
--time-max string max timestamp for both the config and layers (default "")
--to-oci convert to OCI media types (default )
--volume-add stringArray add a volume definition (default )
--volume-rm stringArray delete a volume definition (default )
Global Flags:
--logopt stringArray Log options
--user-agent string Override user agent
-v, --verbosity string Log level (debug, info, warn, error, fatal, panic) (default "warning")
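As a usage sketch built only from the flags listed above (the image reference is an example and the timestamp format is assumed to be RFC 3339), clamping config and layer timestamps after a build looks roughly like:
regctl image mod registry.example.com/myapp:build --time-max 2022-01-01T00:00:00Z --replace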
The other part of the puzzle is to make RUN steps reproducible. That's less trivial since not only do the files have timestamps, but the contents of the files being created could have timestamps or other mutable content, and the commands could pull from external mutable sources. Solving that part of the problem is still a work in progress for me.
For a Docker image named "hello-world":
docker save --output hello-world.tar hello-world
sha256sum hello-world.tar
It should give you the content SHA of the image.
I am attempting to run a simple daily batch script that can run for some hours, after which it will send the data it generated and shut down the instance. To achieve that, I have put the following into user-data:
users:
- name: cloudservice
uid: 2000
runcmd:
- sudo HOME=/home/root docker-credential-gcr configure-docker
- |
sudo HOME=/home/root docker run \
--rm -u 2000 --name={service_name} {image_name} {command}
- shutdown
final_message: "machine took $UPTIME seconds to start"
I am creating the instance using a python script to generate the configuration for the API like so:
def build_machine_configuration(
compute, name: str, project: str, zone: str, image: str
) -> Dict:
image_response = (
compute.images()
.getFromFamily(project="cos-cloud", family="cos-stable")
.execute()
)
source_disk_image = image_response["selfLink"]
machine_type = f"zones/{zone}/machineTypes/n1-standard-1"
# returns the cloud init from above
cloud_config = build_cloud_config(image)
config = {
"name": f"{name}",
"machineType": machine_type,
# Specify the boot disk and the image to use as a source.
"disks": [
{
"type": "PERSISTENT",
"boot": True,
"autoDelete": True,
"initializeParams": {"sourceImage": source_disk_image},
}
],
# Specify a network interface with NAT to access the public
# internet.
"networkInterfaces": [
{
"network": "global/networks/default",
"accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
}
],
# Allow the instance to access cloud storage and logging.
"serviceAccounts": [
{
"email": "default",
"scopes": [
"https://www.googleapis.com/auth/devstorage.read_write",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/datastore",
"https://www.googleapis.com/auth/bigquery",
],
}
],
# Metadata is readable from the instance and allows you to
# pass configuration from deployment scripts to instances.
"metadata": {
"items": [
{
# Startup script is automatically executed by the
# instance upon startup.
"key": "user-data",
"value": cloud_config,
},
{"key": "google-monitoring-enabled", "value": True},
]
},
}
return config
I am however running out of disk space inside the docker engine.
Any ideas on how to increase the size of the volume available to docker services?
The Docker engine uses the disk space of the instance, so if the container doesn't have space it is because the disk of the instance is full.
The first thing that you can try is to create an instance with a bigger disk. The documentation says:
disks[ ].initializeParams.diskSizeGb string (int64 format)
Specifies the size of the disk in base-2 GB. The size must be at least
10 GB. If you specify a sourceImage, which is required for boot disks,
the default size is the size of the sourceImage. If you do not specify
a sourceImage, the default disk size is 500 GB.
You could increase the size by adding the field diskSizeGb to the deployment:
"disks": [
{
[...]
"initializeParams": {
"diskSizeGb": 50,
[...]
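Tying that back to the Python helper in the question, a hedged sketch of the disks entry with the extra field (50 GB is an arbitrary example size):
"disks": [
    {
        "type": "PERSISTENT",
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": source_disk_image,
            # string in int64 format, per the documentation quoted above
            "diskSizeGb": "50",
        },
    }
],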
Another thing you could try is to execute the following command on the instance to see whether the disk is full and which partition is full:
$ df -h
In the same way you could execute the following command to see the disk usage of the Docker Engine:
$ docker system df
The client and daemon API must both be at least 1.25 to use this command. Use the docker version command on the client to check your client and daemon API versions.
If you want more information, you can use the flag -v:
$ docker system df -v
I've been putting together a POC mesos/marathon system that I am using to launch and control docker images.
I have a Vagrant virtual machine running in VirtualBox on which I run docker, marathon, zookeeper, mesos-master and mesos-slave processes, with everything working as expected.
I decided to add Chronos into the mix and initially I started with it running as a service on the vagrant VM, but then opted to switch to running it in a docker container using the mesosphere/chronos image.
I have found that I can get the container image to start and run successfully when I specify HOST network mode for the container, but when I change to BRIDGE mode I run into problems.
In BRIDGE mode, the chronos framework registers successfully with mesos (I can see the entry on the frameworks page of the mesos UI), but it looks as though the framework itself doesn't know that the registration was successful. The mesos master log is full of messages like:
I1009 09:47:35.876454 3131 master.cpp:2094] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-16d21dac-b6d6-49f9-90a3-bf1ba76b4b0d#172.17.0.59:37318
I1009 09:47:35.876832 3131 master.cpp:2164] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I1009 09:47:35.876924 3131 master.cpp:2174] Framework 20151009-094632-16842879-5050-3113-0001 (chronos-2.4.0) at scheduler-16d21dac-b6d6-49f9-90a3-bf1ba76b4b0d#172.17.0.59:37318 already subscribed, resending acknowledgement
This implies some sort of configuration/communication issue but I have not been able to work out exactly what the root of the problem is. I'm not sure if there is any way to confirm if the acknowledgement from mesos is making it back to chronos or to check the status of the communication channels between the components.
I've done a lot of searching and I can find posts by folk who have encountered the same issue, but I haven't found a detailed explanation of what needs to be done to correct it.
For example, I found the following post, which mentions a problem that was resolved and which implies the user successfully ran their chronos container in bridge mode, but their description of the resolution was vague. There was also this post, but the change suggested did not resolve the issue that I am seeing.
Finally, there was a post by someone at ILM who had what sounds like exactly my problem, and the resolution appeared to involve a fix to Mesos introducing two new environment variables, LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT (on top of LIBPROCESS_IP and LIBPROCESS_PORT), but I can't find a decent explanation of what values should be assigned to any of these variables, so I have yet to work out whether the change will resolve the issue I am having.
It's probably worth mentioning that I've also posted a couple of questions on the chronos-scheduler group, but I haven't had any responses to these.
If it's of any help the versions of software I'm running are as follows (the volume mount allows me to provide values of other parameters [e.g. master, zk_hosts] as files, without having to keep changing the JSON):
Vagrant: 1.7.4
VirtualBox: 5.0.2
Docker: 1.8.1
Marathon: 0.10.1
Mesos: 0.24.1
Zookeeper: 3.4.5
The JSON that I am using to launch the chronos container is as follows:
{
"id": "chronos",
"cpus": 1,
"mem": 1024,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "mesosphere/chronos",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 4400,
"hostPort": 0,
"servicePort": 4400,
"protocol": "tcp"
}
]
},
"volumes": [
{
"containerPath": "/etc/chronos/conf",
"hostPath": "/vagrant/vagrantShared/chronos",
"mode": "RO"
}
]
},
"cmd": "/usr/bin/chronos --http_port 4400",
"ports": [
4400
]
}
If anyone has any experience of using chronos in a configuration like this then I'd appreciate any help that you might be able to provide in resolving this issue.
Regards,
Paul Mateer
I managed to work out the answer to my problem (with a little help from the sample framework here), so I thought I should post a solution to help anyone else who runs into the same issue.
The chronos service (and also the sample framework) were configured to communicate with zookeeper on the IP associated with the docker0 interface on the host (vagrant) VM (in this case 172.17.42.1).
Zookeeper would report the master as being available on 127.0.1.1 which was the IP address of the host VM that the mesos-master process started on, but although this IP address could be pinged from the container any attempt to connect to specific ports would be refused.
The solution was to start the mesos-master with the --advertise_ip parameter and specify the IP of the docker0 interface. This meant that although the service started on the host machine, it would appear as though it had been started on the docker0 interface.
Once this was done, communications between mesos and the chronos framework started completing, and the tasks scheduled in chronos ran successfully.
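For reference, a hedged sketch of what that invocation might look like in this setup (the zk path, work_dir and quorum values are assumptions; 172.17.42.1 is the docker0 address mentioned above):
mesos-master --zk=zk://127.0.0.1:2181/mesos --work_dir=/var/lib/mesos --quorum=1 --advertise_ip=172.17.42.1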
Running Mesos 1.1.0 and Chronos 3.0.1, I was able to successfully configure Chronos in BRIDGE mode by explicitly setting LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and pinning its second port to a hostPort, which isn't ideal but was the only way I could find to make it advertise its port to Mesos properly:
{
"id": "/core/chronos",
"cmd": "LIBPROCESS_ADVERTISE_IP=$(getent hosts $HOST | awk '{ print $1 }') LIBPROCESS_ADVERTISE_PORT=$PORT1 /chronos/bin/start.sh --hostname $HOST --zk_hosts master-1:2181,master-2:2181,master-3:2181 --master zk://master-1:2181,master-2:2181,master-3:2181/mesos --http_credentials ${CHRONOS_USER}:${CHRONOS_PASS}",
"cpus": 0.1,
"mem": 1024,
"disk": 100,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "mesosphere/chronos:v3.0.1",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 9900,
"hostPort": 0,
"servicePort": 0,
"protocol": "tcp",
"labels": {}
},
{
"containerPort": 9901,
"hostPort": 9901,
"servicePort": 0,
"protocol": "tcp",
"labels": {}
}
],
"privileged": true,
"parameters": [],
"forcePullImage": true
}
},
"env": {
"CHRONOS_USER": "admin",
"CHRONOS_PASS": "XXX",
"PORT1": "9901",
"PORT0": "9900"
}
}
I have a private Docker registry that is accessible at https://docker.somedomain.com (over standard port 443, not 5000). My infrastructure includes a Mesosphere setup with the Docker containerizer enabled. I am trying to deploy a specific container to a Mesos slave via Marathon; however, this always fails, with Mesos failing the task almost immediately and no data in the stderr and stdout of that sandbox.
I tried deploying an image from the standard Docker Registry and it appears to work fine. I'm having trouble figuring out what is wrong. My private Docker registry does not require password authentication (turned off for debugging this), and if I shell into the Mesos slave instance and sudo su as root, I can run 'docker pull docker.somedomain.com/services/myapp' successfully every time.
Here is my Marathon post data for starting the task:
{
"id": "myapp",
"cpus": 0.5,
"mem": 64.0,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "docker.somedomain.com/services/myapp:2",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 7000, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
]
},
"volumes": [
{
"containerPath": "application.yml",
"hostPath": "/var/myapp/application.yml",
"mode": "RO"
}
]
},
"healthChecks": [
{
"protocol": "HTTP",
"portIndex": 0,
"path": "/",
"gracePeriodSeconds": 5,
"intervalSeconds": 20,
"maxConsecutiveFailures": 3
}
]
}
I've been stuck on this for almost a day now, everything I've tried seems to be yielding the same result. Any insights on this would be much appreciated.
My versions:
Mesos: 0.22.1
Marathon: 0.8.2
Docker: 1.6.2
So this turns out to be an issue with the volumes:
"volumes": [
{
"containerPath": "/application.yml",
"hostPath": "/var/myapp/application.yml",
"mode": "RO"
}
]
Mounting onto the root path of the container may be legal in docker, but Mesos appears not to handle this behavior. Modifying the containerPath to a non-root path resolves this, i.e.:
"volumes": [
{
"containerPath": "/var",
"hostPath": "/var/myapp",
"mode": "RW"
}
]
If it is a problem between Marathon and the registry, the answer should be in the http logs of your registry. If Marathon connects, there will be an entry. And the Mesos master log should contain a clue as well.
It doesn't really sound like a problem between Marathon and the registry, though. Are you sure you have 'docker,mesos' in /etc/mesos-slave/containerizers?
Did you, despite having no authentication, try to follow Using a Private Docker Repository?
To supply credentials to pull from a private repository, add a .dockercfg to the uris field of your app. The $HOME environment variable will then be set to the same value as $MESOS_SANDBOX so Docker can automatically pick up the config file.
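A hedged illustration of that uris field (the URL is hypothetical; Marathon fetches each URI into the task's sandbox before starting it):
{
  "id": "myapp",
  "uris": [
    "https://example.com/private/.dockercfg"
  ]
}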