Deployment manifest for Azure IoT Edge module not configuring docker build correctly

I am currently trying to get an IoT Edge module built; however, the options I'm using in the deployment manifest aren't showing up in the docker build command that is generated when I right-click the solution and select "Build IoT Edge Solution" in VS Code. In particular, I need one of the modules to use the GPU on the host machine as well as several volumes. I have the following under the module's section of the deployment.json:
"camera-module": {
# Other stuff
"settings": {
"image": "${MODULES.CameraModule}",
"createOptions": {
"HostConfig":{
"Binds":["volume/source:/volume/destination:rw",
"some/more:/volumes:rw",
],
"DeviceRequests": [
{
"Driver": "",
"Count": -1,
"DeviceIDs": null,
"Capabilities": [
[
"gpu"
]
],
"Options": {}
}
],
}
}
},
# More stuff
}
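For reference, these createOptions are applied when the edge runtime creates the container (docker create/run), not during docker build. A rough sketch of the run flags they correspond to (the image tag is taken from the build command quoted further down):

# Sketch: approximately what the createOptions above translate to at
# deploy time. They have no effect on "docker build".
docker run \
  -v volume/source:/volume/destination:rw \
  -v some/more:/volumes:rw \
  --gpus all \
  blah.azurecr.io/cameramodule:0.0.1-amd64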
The GPU is necessary; otherwise I run into the following error when attempting to execute a command while building the container:
Step 4/10 : RUN /isaac-sim/python.sh -m pip install --upgrade pip
---> Running in 0b953f3d327f
running as root
Fatal Error: Can't find libGLX_nvidia.so.0...
Ensure running with NVIDIA runtime. (--gpus all) or (--runtime nvidia)
The command '/bin/sh -c /isaac-sim/python.sh -m pip install --upgrade pip' returned a non-zero code: 1
The actual docker build command generated by the steps described at the top of this post is:
docker build --rm -f "/path/to/modules/CameraModule/Dockerfile.amd64" -t blah.azurecr.io/cameramodule:0.0.1-amd64 "path/to/modules/CameraModule"
The lack of volumes in the above command is what makes me think they aren't being applied, though I can't even build the container to actually test that right now. I could just modify the docker build command by hand, but I'd prefer to have the deployment manifest generate it correctly so that it's usable long term. How can I make sure that the GPU and volumes will be properly set up?
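Note that docker build itself has no --gpus flag. One commonly used workaround, sketched below under the assumption that nvidia-container-runtime is installed on the host (this is not part of the original manifest), is to make the NVIDIA runtime the Docker daemon's default so that RUN steps during builds can also see the GPU:

# Hedged sketch: set the NVIDIA runtime as the daemon-wide default.
# This overwrites /etc/docker/daemon.json, so merge with any existing
# settings, and it assumes nvidia-container-runtime is already installed.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker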

Related

Building devcontainer with --ssh key for GitHub repository in build process fails on VS Code for ARM Mac

We are trying to run a Python application using a devcontainer.json with VS Code.
The Dockerfile installs pip packages from GitHub repositories that require an SSH key. To build the images, we usually use the --ssh flag to pass the required key. We then use this key to run pip inside the Dockerfile as follows:
RUN --mount=type=ssh,id=ssh_key python3.9 -m pip install --no-cache-dir -r pip-requirements.txt
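For context, a sketch of a manual build invocation for such a Dockerfile: the id after --ssh must match the id= in the RUN --mount line, BuildKit is required, and the image tag pip-build is only illustrative:

# Sketch: forward the key into the build; "ssh_key" must match the
# id= used in the RUN --mount=type=ssh line above.
DOCKER_BUILDKIT=1 docker build --ssh ssh_key="$HOME/.ssh/id_rsa" -t pip-build .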
We now want to run a devcontainer.json inside VS Code. We have been trying many different ways.
1. Passing the --ssh key using the build arg variable:
Since you cannot directly pass the --ssh key, we tried a workaround:
"args": {"kek":"kek --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa"}
This produces an OK-looking build command that works in a normal terminal, but inside VS Code the key is not passed and the build fails (both on Windows and Mac).
2. Putting an initial build command into the initializeCommand parameter and then a simple build command that should use the cached results:
We run a first build inside the initializeCommand parameter:
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa ."
and then we have a second build in the regular parameter:
"build": {
"dockerfile": "../Dockerfile",
"context": "..",
"args": {"kek":"kek --platform=linux/amd64"}
}
This solution is a nice workaround and works flawlessly on Windows. On the ARM Mac, however, only the initializeCommand build stage runs well; the actual build fails because it does not use the cached images. The crucial step that uses the --ssh key fails just as described before.
We have no idea why VS Code on the Mac ignores the already-created images. In a regular terminal, again, the second build command generated by VS Code works flawlessly.
The problem is reproducible on different ARM Macs, and on different repositories.
Here is the entire devcontainer:
{
    "name": "Dockername",
    "build": {
        "dockerfile": "../Dockerfile",
        "context": "..",
        "args": {"kek": "kek --platform=linux/amd64"}
    },
    "initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa .",
    "runArgs": ["--env-file", "configuration.env", "-t"],
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ]
        }
    }
}
So, we finally found a workaround:
We add a target to the initializeCommand:
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa -t dev-image ."
We create a new Dockerfile, Dockerfile-devcontainer, that contains only one line:
FROM --platform=linux/amd64 docker.io/library/dev-image:latest
In the build command of the devcontainer, use that Dockerfile:
"name": "Docker",
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa -t dev-image:latest .",
"build": {
"dockerfile": "Dockerfile-devcontainer",
"context": "..",
"args": {"kek":"kek --platform=linux/amd64"}
},
"runArgs": ["--env-file", "configuration.env"],
"customizations": {
"vscode": {
"extensions": [
"ms-python.python"
]
}
}
}
This way we can use the SSH key and the Docker image created in the initializeCommand (tested on macOS and Windows).
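A quick sanity check, sketched here as an assumption rather than part of the original workflow, is to confirm that the image from the initializeCommand exists locally before VS Code builds from Dockerfile-devcontainer:

# Sketch: the tag comes from the initializeCommand above; if this prints
# nothing, the FROM line in Dockerfile-devcontainer has nothing to reuse.
docker image ls dev-image:latest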

Create a custom Docker dev environment

I like the Docker dev environment tool, but I'd also like to have some tools preinstalled when a user clones the repository using it.
I have a .devcontainer folder in the repository with a Dockerfile:
# [Choice] Alpine version: 3.13, 3.12, 3.11, 3.10
ARG VARIANT="3.13"
FROM mcr.microsoft.com/vscode/devcontainers/base:0-alpine-${VARIANT}
# Install Terraform CLI
# Install GCloud SDK
And a devcontainer.json file:
{
    "name": "Alpine",
    "build": {
        "dockerfile": "Dockerfile",
        // Update 'VARIANT' to pick an Alpine version: 3.10, 3.11, 3.12, 3.13
        "args": { "VARIANT": "3.13" }
    },
    // Set *default* container specific settings.json values on container create.
    "settings": {},
    // Add the IDs of extensions you want installed when the container is created.
    // Note that some extensions may not work in Alpine Linux. See https://aka.ms/vscode-remote/linux.
    "extensions": [],
    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // "forwardPorts": [],
    // Use 'postCreateCommand' to run commands after the container is created.
    // "postCreateCommand": "uname -a",
    // Uncomment when using a ptrace-based debugger like C++, Go, and Rust
    // "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ],
    // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
    "remoteUser": "vscode"
}
I've tried to include curl and install commands in the Dockerfile, but the commands just don't seem to take effect. To clarify: once the container is built, I can't access the CLI tools; e.g., terraform --version says terraform not found.
The container launches as a VS Code window running inside it, and I am attempting to use the CLI tools from the VS Code terminal, if that makes a difference.
EDIT: So the issue is that creating an environment from the Docker dashboard doesn't read your .devcontainer folder and files; it just creates a stock basic container. You need to clone the repository, open it in VS Code, and then Reopen in Container, and it will build your environment.
I swapped to Ubuntu as the base image instead of Alpine, and instead of creating the dev environment from the Docker dashboard I opened the project folder locally in VS Code and selected "Reopen in Container". It then installed everything, and I have the CLI tools available now.
The install commands below come from the official documentation of each provider. I'm going to retest pulling the repository down through the Docker dashboard to see if it works.
# [Choice] Ubuntu version: bionic, focal
ARG VARIANT="focal"
FROM mcr.microsoft.com/vscode/devcontainers/base:0-${VARIANT}
# Installs Terragrunt + Terraform
ARG TERRAGRUNT_PATH=/bin/terragrunt
ARG TERRAGRUNT_VERSION=0.31.1
RUN wget https://github.com/gruntwork-io/terragrunt/releases/download/v${TERRAGRUNT_VERSION}/terragrunt_linux_amd64 -O ${TERRAGRUNT_PATH} \
    && chmod 755 ${TERRAGRUNT_PATH}
# Installs GCloud SDK
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - \
    && apt-get update -y \
    && apt-get install -y google-cloud-sdk
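As a rough check (not from the original post), after "Reopen in Container" the tools should be on PATH in the VS Code terminal:

# Sketch: run these in the VS Code terminal inside the rebuilt container;
# both should print version information if the Dockerfile built cleanly.
terragrunt --version
gcloud version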

VS Code: No process running with Id error when attaching to ASP.NET Core process in Docker container

Currently I'm running ASP.NET Core directly on my Linux workstation, but I want to develop directly with Docker, including remote debugging. Following different articles, it seems that vsdbg is required, so I tried the following:
FROM mcr.microsoft.com/dotnet/core/sdk:2.2
RUN apt update && \
apt install unzip && \
curl -sSL https://aka.ms/getvsdbgsh | /bin/sh /dev/stdin -v latest -l /vsdbg
WORKDIR /app
COPY . .
RUN dotnet publish
WORKDIR /app/bin/Debug/netcoreapp2.2/publish/
ENTRYPOINT dotnet MyApp.dll
The docker container is started and I can access my ASP.NET Core app in the browser. VS Code should attach with the following launch.json snippet:
{
    "name": ".NET Core Attach",
    "type": "coreclr",
    "request": "attach",
    "processId": "${command:pickProcess}",
    "sourceFileMap": {
        "/app": "${workspaceRoot}"
    },
    "pipeTransport": {
        "pipeProgram": "docker",
        "pipeArgs": ["exec", "-i", "my-docker-container-name"],
        "debuggerPath": "/vsdbg/vsdbg",
        "pipeCwd": "${workspaceRoot}",
        "quoteArgs": false
    }
}
After running this debug attach entry, it shows a list of process IDs to pick from. The first one contains dotnet MyApp.dll, so this must be my Docker process. But after selecting it, VS Code shows a popup saying that no process is currently running with the selected ID.
The process ID IS correct. I tried running kill <pid> and my dotnet process in the Docker container exited.
So why does VS Code fail with the error that the Docker process is not running?
System information:
Ubuntu 19.04
Docker 19.03.0-rc3
VS Code 1.35.1
Edit
This seems to have something to do with quoteArgs. When I set it to true, another error occurs:
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/vsdbg/vsdbg --interpreter=vscode\": stat /vsdbg/vsdbg --interpreter=vscode: no such file or directory": unknown
Without the quotes, it tries to execute "/usr/bin/docker" exec -i <my-docker-container-name> /vsdbg/vsdbg --interpreter=vscode. I executed this manually, and it starts vsdbg and keeps the terminal open. So quoteArgs: false seems correct for Docker, although I don't understand why this gives the strange "no process running with pid" error in VS Code.
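For reference, a sketch of that effective pipe command; running it by hand should start vsdbg and hold the terminal open, as described above:

# Sketch: the command VS Code effectively runs with quoteArgs: false.
# If this fails, the pipeTransport settings are the first thing to check.
docker exec -i my-docker-container-name /vsdbg/vsdbg --interpreter=vscode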

Packer fails my docker build with error "sudo: not found" despite sudo being present

I'm trying to build a Packer image with Docker on it, and I want Docker to create a Docker image with a custom script. The relevant portion of my code is (note that the shell block at the top double-checks that sudo is installed):
{
    "type": "shell",
    "inline": [
        "apt-get install sudo"
    ]
},
{
    "type": "docker",
    "image": "python:3",
    "commit": true,
    "changes": [
        "RUN pip install Flask",
        "CMD [\"python\", \"echo.py\"]"
    ]
}
The relevant portion of my screen output is:
==> docker: Provisioning with shell script: /var/folders/s8/g1_gobbldygook/T/packer-shell23453453245
docker: /tmp/script_1234.sh: 3: /tmp/script_1234.sh: sudo: not found
==> docker: Killing the container: 34234hashvomit234234
Build 'docker' errored: Script exited with non-zero exit status: 127
The script in question is not one of mine. It's some randomly generated script that has a different series of four numbers every time I build. I'm new to both packer and docker, so maybe it's obvious what the problem is, but it's not to me.
There seem to be a few problems with your packer input. Since you haven't included the complete input file, it's hard to tell, but I notice a couple of things:
You probably need to run apt-get update before calling apt-get install sudo. Without that, any package metadata cached in the image is probably stale. If I try to build an image using your input, it fails with:
E: Unable to locate package sudo
While not a problem in this context, it's good to explicitly include -y on the apt-get command line when you're running it non-interactively:
apt-get -y install sudo
In situations where apt-get is attached to a terminal, this will prevent it from prompting for confirmation. This is not a necessary change to your input, but I figure it's good to be explicit.
Based on the docs and on my testing, you can't include a RUN statement in the changes block of a docker builder. That fails with:
Stderr: Error response from daemon: run is not a valid change command
Fortunately, we can move that pip install command into a shell provisioner.
With those changes, the following input successfully builds an image:
{
    "builders": [{
        "type": "docker",
        "image": "python:3",
        "commit": true
    }],
    "provisioners": [{
        "type": "shell",
        "inline": [
            "apt-get update",
            "apt-get -y install sudo",
            "pip install Flask"
        ]
    }],
    "post-processors": [[{
        "type": "docker-tag",
        "repository": "packer-test",
        "tag": "latest"
    }]]
}
(NB: Tested using Packer v1.3.5)
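As a sketch of how this template might be exercised (the file name packer-docker.json is an assumption, not from the original post):

# Sketch: build with the corrected template, then smoke-test the tagged
# image to confirm Flask was installed by the shell provisioner.
packer build packer-docker.json
docker run --rm packer-test:latest python -c "import flask; print(flask.__version__)"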

ssh-agent does not remember identities when running inside a docker container in DC/OS

I am trying to run a service using DC/OS and Docker. I created my Stack using the template for my region from here. I also created the following Dockerfile:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y expect openssh-client
WORKDIR "/root"
ENTRYPOINT eval "$(ssh-agent -s)" && \
mkdir -p .ssh && \
echo $PRIVATE_KEY > .ssh/id_rsa && \
chmod 600 /root/.ssh/id_rsa && \
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact " && \
while true; do ssh-add -l; sleep 2; done
I have a private repository that I would like to clone/pull from when the docker container starts. This is why I am trying to add the private key to the ssh-agent.
If I run this image as a docker container locally and supply the private key using the PRIVATE_KEY environment variable, everything works fine. I see that the identity is added.
The problem that I have is that when I try to run a service on DC/OS using the docker image, the ssh-agent does not seem to remember the identity that was added using the private key.
I have checked the error log from DC/OS. There are no errors.
Does anyone know why running the docker container on DC/OS is any different compared to running it locally?
EDIT: I have added details of the description of the DC/OS service in case it helps:
{
    "id": "/SOME-ID",
    "instances": 1,
    "cpus": 1,
    "mem": 128,
    "disk": 0,
    "gpus": 0,
    "constraints": [],
    "fetch": [],
    "storeUrls": [],
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
        "type": "DOCKER",
        "volumes": [],
        "docker": {
            "image": "IMAGE NAME FROM DOCKERHUB",
            "network": "BRIDGE",
            "portMappings": [{
                "containerPort": SOME PORT NUMBER,
                "hostPort": SOME PORT NUMBER,
                "servicePort": SERVICE PORT NUMBER,
                "protocol": "tcp",
                "name": "default"
            }],
            "privileged": false,
            "parameters": [],
            "forcePullImage": true
        }
    },
    "healthChecks": [],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
        "minimumHealthCapacity": 1,
        "maximumOverCapacity": 1
    },
    "unreachableStrategy": {
        "inactiveAfterSeconds": 300,
        "expungeAfterSeconds": 600
    },
    "killSelection": "YOUNGEST_FIRST",
    "requirePorts": true,
    "env": {
        "PRIVATE_KEY": "ID_RSA PRIVATE_KEY WITH \n LINE BREAKS"
    }
}
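For completeness, a sketch of deploying this app definition with the DC/OS CLI (the file name app.json is an assumption):

# Sketch: submit the Marathon app definition above via the DC/OS CLI
# (assumes it is saved as app.json and the CLI is already configured).
dcos marathon app add app.json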
Docker Version
Check that your local version of Docker matches the version installed on the DC/OS agents. By default, the DC/OS 1.9.3 AWS CloudFormation templates use CoreOS 1235.12.0, which comes with Docker 1.12.6. It's possible that the entrypoint behavior has changed since then.
Docker Command
Check the Mesos task logs for the Marathon app in question and see what docker run command was executed. You might be passing it slightly different arguments when testing locally.
Script Errors
As mentioned in another answer, the script you provided has several errors that may or may not be related to the failure.
echo $PRIVATE_KEY should be echo "$PRIVATE_KEY" to preserve line breaks. Otherwise key decryption will fail with Bad passphrase, try again for /root/.ssh/id_rsa:.
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact " should be expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\"; send \"\n\"; interact ". It's missing a semi-colon and a line break. Otherwise the expect command fails without executing.
File Based Secrets
Enterprise DC/OS 1.10 (1.10.0-rc1 out now) has a new feature named File Based Secrets which allows for injecting files (like id_rsa files) without including their contents in the Marathon app definition, storing them securely in Vault using DC/OS Secrets.
Creation: https://docs.mesosphere.com/1.10/security/secrets/create-secrets/
Usage: https://docs.mesosphere.com/1.10/security/secrets/use-secrets/
File based secrets won't do the ssh-add for you, but they should make it easier and more secure to get the file into the container.
Mesos Bug
Mesos 1.2.0 switched to using Docker's --env-file instead of -e to pass in environment variables. This triggers a Docker --env-file limitation: it doesn't support line breaks. A workaround was put into Mesos and DC/OS, but the fix may not be in the minor version you are using.
A manual workaround is to convert the id_rsa to base64 for the Marathon definition and convert it back in your entrypoint script.
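A sketch of that base64 round trip, with the variable name PRIVATE_KEY_B64 assumed for illustration:

# Sketch, GNU coreutils shown (on macOS the base64 flags differ).
# On the machine creating the Marathon definition:
base64 -w0 ~/.ssh/id_rsa   # paste the output into PRIVATE_KEY_B64
# In the container entrypoint, decode it back:
echo "$PRIVATE_KEY_B64" | base64 -d > /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa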
The key file contents being passed via PRIVATE_KEY originally contain line breaks. After echoing the unquoted PRIVATE_KEY variable content to ~/.ssh/id_rsa, the line breaks will be gone. You can fix that issue by wrapping the $PRIVATE_KEY variable in double quotes.
Another issue arises when the container is started without an attached TTY (locally you would typically attach one via the -i -t command line parameters to docker run). The password request will then fail and the ssh key won't be added to the ssh-agent. Since that interaction probably won't make sense for a container run in DC/OS, you should change your entrypoint script accordingly. That will require your ssh key to be passwordless.
This changed Dockerfile should work:
ENTRYPOINT eval "$(ssh-agent -s)" && \
mkdir -p .ssh && \
echo "$PRIVATE_KEY" > .ssh/id_rsa && \
chmod 600 /root/.ssh/id_rsa && \
ssh-add /root/.ssh/id_rsa && \
while true; do ssh-add -l; sleep 2; done
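For local testing, a sketch of running the image the way the question describes, with the image name assumed:

# Sketch: pass the key through the PRIVATE_KEY environment variable as in
# the question; "my-dcos-ssh-image" is a placeholder image name.
docker run --rm -e PRIVATE_KEY="$(cat ~/.ssh/id_rsa)" my-dcos-ssh-image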
