I'm trying to figure out how to use nvidia-docker (https://github.com/NVIDIA/nvidia-docker) with Ansible's docker_container module (https://docs.ansible.com/ansible/latest/docker_container_module.html#docker-container).
Problem
My current Ansible playbook starts my container with the "docker" command instead of "nvidia-docker".
What I have done
Based on some reading, I have tried adding my devices, without success:
docker_container:
  name: testgpu
  image: "{{ image }}"
  devices: ['/dev/nvidiactl', '/dev/nvidia-uvm', '/dev/nvidia0', '/dev/nvidia-uvm-tools']
  state: started
Note: I tried different syntaxes for devices (inline, etc.), but I still get the same problem.
This task does not throw any error. As expected, it creates a Docker container from my image and tries to start it.
Looking at my container logs:
terminate called after throwing an instance of 'std::runtime_error'
what(): No CUDA driver found
which is the exact same error I'm getting when running
docker run -it <image>
instead of
nvidia-docker run -it <image>
Any ideas how to override the docker command when using docker_container with Ansible?
I can confirm my CUDA drivers are installed and that all the /dev/nvidia* paths are valid.
Thanks
The docker_container module doesn't use the docker executable; it talks to the Docker daemon API through the docker-py Python library.
Looking at the nvidia-docker wrapper script, it sets --runtime=nvidia and -e NVIDIA_VISIBLE_DEVICES.
To set NVIDIA_VISIBLE_DEVICES you can use the env argument of docker_container.
But I see no way to set the runtime via the docker_container module as of the current Ansible 2.4.
You can try to work around this by setting "default-runtime": "nvidia" in your daemon.json configuration file, so the Docker daemon uses the nvidia runtime by default.
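Putting both suggestions together, a rough sketch could look like the following. Treat it as an assumption-laden example: the runtimes entry is normally installed by the nvidia-docker2 package already, and NVIDIA_VISIBLE_DEVICES=all (expose all GPUs) is just one possible value.
/etc/docker/daemon.json (restart the Docker daemon after editing it):
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
The playbook task then only needs the environment variable:
docker_container:
  name: testgpu
  image: "{{ image }}"
  env:
    NVIDIA_VISIBLE_DEVICES: all
  state: started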
I have a Linux VM on which I installed Docker. I have several Docker containers with the different programs I have to use. Here's my architecture:
Everything is working fine except for the red box.
What I am trying to do is dynamically provision a Jenkins docker-in-docker agent with the Docker cloud functionality, in order to build my Docker images and push them to the Docker registry I set up.
I have been looking for documentation on creating a docker-in-docker container and I found this:
https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
This article states that, in order to avoid problems with my main Docker installation, I have to mount the Docker socket into the container:
-v /var/run/docker.sock:/var/run/docker.sock
I tested my image locally and had no problem running:
docker run -d --name test -v /var/run/docker.sock:/var/run/docker.sock <image>
docker exec -it test /bin/bash
docker run hello-world
The container is using the Linux VM's Docker installation to build and run the Docker images, so everything is fine.
However, I run into problems with the Jenkins Docker cloud configuration.
From what I gather, since build #826 the Jenkins Docker plugin has changed its syntax for volumes.
This is the configuration I tried:
And the error message I get when trying to launch the agent:
Reason: Template provisioning failed.
com.github.dockerjava.api.exception.BadRequestException: {"message":"create
/var/run/docker.sock: \"/var/run/docker.sock\" includes invalid characters for a local
volume name, only \"[a-zA-Z0-9][a-zA-Z0-9_.-]\" are allowed. If you intended to pass a
host directory, use absolute path"}
I also tried this configuration:
Reason: Template provisioning failed.
com.github.dockerjava.api.exception.BadRequestException: {"message":"invalid mount config for type \"volume\": invalid mount path: './var/run/docker.sock' mount path must be absolute"}
I do not get what that means: on my Linux VM the absolute path of docker.sock is /var/run/docker.sock, and it is the same path inside the docker-in-docker container I ran locally...
I tried to check the source code to find out what I did wrong, but it's unclear to me what the code is doing (https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/com/nirima/jenkins/plugins/docker/DockerTemplateBase.java, from line 884 onward). I also tried backslashes, etc. Nothing worked.
Does anyone have an idea what the expected syntax is in that configuration panel for setting up a simple volume?
Change the configuration to this:
type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock
It is not a named volume; it is a bind mount.
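For reference, that mounts string uses the same key=value syntax as the docker CLI's --mount flag, so the agent container ends up being started roughly like this (the agent image name is a placeholder):
docker run --mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock <agent-image>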
This worked for me
type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly
Complete Docker noob here. I installed Docker Desktop on Windows and am trying to follow the commands on this link to set up the OSRM backend on my machine. I've downloaded the dataset for India (india-latest.osm.pbf) to D:/docker
and am running the commands from that location:
docker run -t -v "${PWD}:/data" osrm/osrm-backend osrm-extract -p /opt/car.lua /data/india-latest.osm.pbf
fails with
[error] Input file /data/india-latest.osm.pbf not found!
I just don't understand WHY it doesn't work. According to the OSRM documentation of the docker command:
The file /data/india-latest.osm.pbf inside the container is referring
to "${PWD}/india-latest.osm.pbf" on the host.
But that's not the case: I am running from D:/docker, so it should find india-latest.osm.pbf without a problem. This is really confusing to me, even though it must be something basic.
It was due to a bug in Docker for Windows: https://github.com/docker/for-win/issues/1712
When you change your Windows password, commands that access the host filesystem silently fail until you re-authenticate.
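After re-entering the shared-drive credentials in Docker Desktop's settings, the original command should work again. As a quick sanity check you can also spell out the host path explicitly instead of relying on ${PWD} (a sketch, using the D:/docker location mentioned above):
docker run -t -v "D:/docker:/data" osrm/osrm-backend osrm-extract -p /opt/car.lua /data/india-latest.osm.pbf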
Executing a binary compiled with GCC 7.2.0 + ASan fails in an Ubuntu 17.10 Docker container with the following error:
==5==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
LSan (which performs the leak checks) attaches to the program under test via ptrace. It fails to do so under Docker because it does not have the required permissions.
This can be fixed by running the Docker container with extra privileges, using either of these two options:
docker run .... --privileged
or more specific:
docker run .... --cap-add SYS_PTRACE
--cap-add SYS_PTRACE is the preferred option for CI and automation, as it grants only the ptrace capability instead of full privileges.
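If the container happens to be started through Ansible's docker_container module (as in the other questions on this page), the same capability can be granted with its capabilities option; a minimal sketch, with the name and image as placeholders:
docker_container:
  name: asan-tests
  image: "{{ image }}"
  capabilities:
    - SYS_PTRACE
  state: started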
I'm using the Packer Docker builder with Ansible to create a Docker image (https://www.packer.io/docs/builders/docker.html).
I have a machine (client) which is meant to run build scripts. The Packer Docker builder is executed with Ansible from this machine. The machine has the Docker client and is connected to a remote Docker daemon; the DOCKER_HOST environment variable is set to point to the remote Docker host. I'm able to test the connectivity and everything works fine.
Now the problem is that when I run the Packer Docker builder to build the image, it errors out with:
docker: Run command: docker run -v /root/.packer.d/tmp/packer-docker612435850:/packer-files -d -i -t ubuntu:latest /bin/bash
==> docker: Error running container: Docker exited with a non-zero exit status.
==> docker: Stderr: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
==> docker: See 'docker run --help'.
It seems the Packer Docker builder is stuck talking to the local daemon.
Workaround: I renamed the docker binary and introduced a script called "docker" which sets DOCKER_HOST and invokes the original docker binary with the parameters passed through.
Is there a better way to deal with this?
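For reference, the wrapper described above could look something like this sketch (the renamed binary name and the remote host address are placeholders):
#!/bin/sh
# Installed on PATH as "docker"; the real client binary was renamed to docker-real.
export DOCKER_HOST=tcp://remote-docker-host:2376
exec docker-real "$@"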
Packer's Docker builder doesn't work with remote hosts, since Packer uses the /packer-files volume mount to communicate with the container. This is vaguely expressed in the docs with:
The Docker builder must run on a machine that has Docker installed.
And explained in Overriding the host directory.
I am using the Ansible Docker module and trying to run a container with the "--rm" flag set. However, I do not see an option for the "--rm" flag, or a way to pass arbitrary Docker flags to the Ansible Docker module.
Is there a way to set the "--rm" flag when starting a container with the Ansible Docker module?
Thanks
The Docker module linked by the OP is deprecated, and @Lexandro's answer is outdated.
This is now supported in the newer docker_container module via the auto_remove option (added in Ansible 2.4).
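A minimal sketch of that option (name and image are placeholders), which maps to docker run --rm:
docker_container:
  name: testrm
  image: "{{ image }}"
  auto_remove: yes
  state: started
Note that it needs Ansible 2.4+ and Docker API version 1.25 or newer, which is what the error message quoted further down is about.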
(Deprecated) --rm is implemented only in the docker client itself, combining two operations (run, then remove), and it only works when the container runs in the foreground. So you can't run the container with the -d option or invoke this behaviour via the REST API. You can use --rm only when you call docker run --rm ... directly.
If you have an older Docker API version, you will get:
"msg": "Docker API version is x.xx. Minimum version required is x.xx to set auto_remove option."
Hence maybe you can use
cleanup: yes
It cleans up the container after the command executes, which does the same thing as --rm.
I had a similar issue when I got:
"msg": "Docker API version is 1.23. Minimum version required is 1.25 to set auto_remove option."