I am trying to build a docker image that contains all of the necessary plugins/providers that several source repos need, so that when an automated terraform validate runs, it doesn't have to download gigs of redundant data.
However, I recognize that this provides for a maintenance problem in that someone may update a plugin version, and that would needed to be downloaded, since the docker image would not contain it.
The question
How can I pre-download all providers and plugins
Tell the CLI use those predownloaded plugins AND
also tell it that, if it doesn't find what it needs locally, then it can go to the network
Below are the relevant file:
.terraformrc
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
disable_checkpoint = true
provider_installation {
filesystem_mirror {
path = "$HOME/.terraform/providers"
}
direct {
}
}
tflint (not relevant to this question, but it shows up in the below Dockerfile)
plugin "aws" {
enabled = true
version = "0.21.1"
source = "github.com/terraform-linters/tflint-ruleset-aws"
}
plugin "azurerm" {
enabled = true
version = "0.20.0"
source = "github.com/terraform-linters/tflint-ruleset-azurerm"
}
Dockerfile
FROM ghcr.io/terraform-linters/tflint-bundle AS base
LABEL name=tflint
RUN adduser -h /home/jenkins -s /bin/sh -u 1000 -D jenkins
RUN apk fix && apk --no-cache --update add git terraform openssh
ADD .terraformrc /home/jenkins/.terraformrc
RUN mkdir -p /home/jenkins/.terraform.d/plugin-cache/registry.terraform.io
ADD .tflint.hcl /home/jenkins/.tflint.hcl
WORKDIR /home/jenkins
RUN tflint --init
FROM base AS build
ARG SSH_PRIVATE_KEY
RUN mkdir /root/.ssh && \
echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_ed25519 && \
chmod 400 /root/.ssh/id_ed25519 && \
touch /root/.ssh/known_hosts && \
ssh-keyscan mygitrepo >> /root/.ssh/known_hosts
RUN git clone git#mygitrepo:wrai/tools/g.git
RUN git clone git#mygitrepo:myproject/a.git && \
git clone git#mygitrepo:myproject/b.git && \
git clone git#mygitrepo:myproject/c.git && \
git clone git#mygitrepo:myproject/d.git && \
git clone git#mygitrepo:myproject/e.git && \
git clone git#mygitrepo:myproject/f.git
RUN ls -1d */ | xargs -I {} find {} -name '*.tf' | xargs -n 1 dirname | sort -u | \
xargs -I {} -n 1 -P 20 terraform -chdir={} providers mirror /home/jenkins/.terraform.d
RUN chown -R jenkins:jenkins /home/jenkins
USER jenkins
FROM base AS a
COPY --from=build /home/jenkins/a/ /home/jenkins/a
RUN cd /home/jenkins/a && terraform init
FROM base AS b
COPY --from=build /home/jenkins/b/ /home/jenkins/b
RUN cd /home/jenkins/b && terraform init
FROM base AS c
COPY --from=build /home/jenkins/c/ /home/jenkins/c
RUN cd /home/jenkins/c && terraform init
FROM base AS azure_infrastructure
COPY --from=build /home/jenkins/d/ /home/jenkins/d
RUN cd /home/jenkins/d && terraform init
FROM base AS aws_infrastructure
COPY --from=build /home/jenkins/e/ /home/jenkins/e
RUN cd /home/jenkins/e && terraform init
Staging plugins:
This is most easily accomplished with the plugin cache dir setting in the CLI. This supersedes the old usage with the -plugin-dir=PATH argument for the init command. You could also set a filesystem mirror in each terraform block within the root module config, but this would be cumbersome for your use case. In your situation, you are already configuring this in your .terraformrc, but the filesystem_mirror path conflicts with the plugin_cache_dir. You would want to resolve that conflict, or perhaps remove the mirror block entirely.
Use staged plugins:
Since the setting is captured in the CLI configuration file within the Dockerfile, this would be automatically used in future commands.
Download additional plugins if necessary:
This is default behavior of the init command, and therefore requires no further actions on your part.
Side note:
The jenkins user typically is /sbin/nologin for shell and /var/lib/jenkins for home directory. If the purpose of this Docker image is for a Jenkins build agent, then you may want the jenkins user configuration to be more aligned with the standard.
TL;DR:
Configure the terraform plugin cache directory
Create directory with a single TF file containing required_providers block
Run terraform init from there
...
I've stumbled over this question as I tried to figure out the same thing.
I first tried leveraging an implied filesystem_mirror by running terraform providers mirror /usr/local/share/terraform/plugins in a directory containing only one terraform file containing the required_providers block. This works fine as long as you only use the versions of the providers you mirrored.
However, it's not possible to use a different version of a provider than the one you mirrored, because:
Terraform will scan all of the filesystem mirror directories to see which providers are placed there and automatically exclude all of those providers from the implied direct block.
I've found it to be a better solution to use a plugin cache directory instead. EDIT: You can prefetch the plugins by setting TF_PLUGIN_CACHE_DIR to some directory and then running terraform init in a directory that only declares the required_providers.
Previously overengineered stuff below:
The only hurdle left was that terraform providers mirror downloads the providers in the packed layout:
Packed layout: HOSTNAME/NAMESPACE/TYPE/terraform-provider-TYPE_VERSION_TARGET.zip is the distribution zip file obtained from the provider's origin registry.
while Terraform expects the plugin cache directory to use the unpacked layout:
Unpacked layout: HOSTNAME/NAMESPACE/TYPE/VERSION/TARGET is a directory containing the result of extracting the provider's distribution zip file.
So I converted the packed layout to the unpacked layout with the help of find and parallel:
find path/to/plugin-dir -name index.json -exec rm {} +`
find path/to/plugin-dir -name '*.json' | parallel --will-cite 'mkdir -p {//}/{/.}/linux_amd64; unzip {//}/*.zip -d {//}/{/.}/linux_amd64; rm {}; rm {//}/*.zip'
Related
I'm following installation instructions for RedhawkSDR, which rely on having a Centos7 OS. Since my machine uses Ubuntu 22.04, I'm creating a Docker container to run Centos7 then installing RedhawkSDR in that.
One of the RedhawkSDR installation instructions is to create a file with the following command:
cat<<EOF|sed 's#LDIR#'`pwd`'#g'|sudo tee /etc/yum.repos.d/redhawk.repo
[redhawk]
name=REDHAWK Repository
baseurl=file://LDIR/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
EOF
How do I get a Dockerfile to execute this command when creating an image?
(Also, although I can see that this command creates the file /etc/yum.repos.d/redhawk.repo, which consists of the lines from [redhawk] to gpgkey=...., I have no idea how to parse this command and understand exactly why it does that...)
Using the text editor of your choice, create the file on your local system. Remove the word sudo from it; give it an additional first line #!/bin/sh. Make it executable using chmod +x create-redhawk-repo.
Now it is an ordinary shell script, and in your Dockerfile you can just RUN it.
COPY create-redhawk-repo ./
RUN ./create-redhawk-repo
But! If you look at what the script actually does, it just writes a file into /etc/yum.repos.d with a LDIR placeholder replaced with some other directory. The filesystem layout inside a Docker image is fixed, and there's no particular reason to use environment variables or build arguments to hold filesystem paths most of the time. You could use a fixed path in the file
[redhawk]
name=REDHAWK Repository
baseurl=file:///redhawk-yum/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
and in your Dockerfile, just COPY that file in as-is, and make sure the downloaded package archive is in that directory. Adapting the installation instructions:
ARG redhawk_version=3.0.1
RUN wget https://github.com/RedhawkSDR/redhawk/releases/download/$redhawk_version/\
redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& tar xzf redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& rm redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& mv redhawk-yum-$redhawk_version-el7-x86_64 redhawk-yum \
&& rpm -i redhawk-yum/redhawk-release*.rpm
COPY redhawk.repo /etc/yum.repos.d/
Remember that, in a Dockerfile, you are root unless you've switched to another USER (and in that case you can use USER root to switch back); you do not need generally sudo in Docker at all, and can just delete sudo where it appears in these instructions.
How do I get a Dockerfile to execute this command when creating an image?
Just use printf and run this command as single line:
FROM image_name:image_tag
ARG LDIR="/default/folder/if/argument/not/set"
# if container has sudo command and default user is not root
# you should choose this variant
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" | sudo tee /etc/yum.repos.d/redhawk.repo
# if default container user is root this command without piping may be used
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" > /etc/yum.repos.d/redhawk.repo
Where LDIR is an argument and docker build process should be run like:
docker build ./ --build-arg LDIR=`pwd`
I try to write a Dockerfile that adds a file to the image like this:
ADD https://repository.internal/file.zip /tmp/
The repository.internal host is only reachable through a proxy. I provide the proxy configuraton with the --config option but the ADD command seems not to use the proxy and fails.
I know the proxy configuration is correct because I added the line
RUN curl https://repository.internal/file.zip
which is working fine.
Is there any possibility to run the ADD command also with the proxy config?
As per my comments above, I believe this to be something to do with the internal way the Docker build process handles the ADD and RUN commands... I cant find documentation to back this up - so someone with greater internal knowledge may confirm or deny, but makes sense as a RUN command is done in a layer TO the image being built, where as the ADD command is performed and the results of it are baked into the image.
Whichever way this is being handled, you can achieve what you need by moving to the RUN method as follows:
FROM <your base image>
RUN curl https://repository.internal/file.zip >> /tmp/file.zip \
&& cd /tmp \
&& unzip file.zip \
&& rm file.zip
And you will have the files unzipped.
You may need to check if the rm at the end is required - cant remember off the top of my head if the unzip command removes the original zip file.
As you mentioned, this would rely on the curl and unzip packages being available on the image... however you could potentially avoid having these within your final application image by using Docker Multi Stage Builds
Your Dockerfile would then look something like:
FROM <some useful base image> as collector
RUN apt-get install -y curl unzip
RUN mkdir /tmp/files && \
&& curl https://repository.internal/file.zip >> /tmp/files/file.zip \
&& cd /tmp/files \
&& unzip file.zip \
&& rm file.zip
FROM <your final desired base image>
COPY --from=collector /tmp/files /tmp
This would then utilise an image to have curl and unzip in to collect and deal with the extraction of your files without having to install them on your final application image.
I am new to docker, and am attempting to build an image that involves performing an npm install. Some of our the dependencies are coming from private repos we have, and I am hitting an SSH related issue:
I realised I was not supplying any form of SSH details to my file, and came across various posts online about how to do this using args into the docker build command.
So taken from here, I have added the following to my dockerfile before the npm install command gets run:
ARG ssh_prv_key
ARG ssh_pub_key
RUN apt-get update && \
apt-get install -y \
git \
openssh-server \
libmysqlclient-dev
# Authorize SSH Host
RUN mkdir -p /root/.ssh && \
chmod 0700 /root/.ssh && \
ssh-keyscan github.com > /root/.ssh/known_hosts
# Add the keys and set permissions
RUN echo "$ssh_prv_key" > /root/.ssh/id_rsa && \
echo "$ssh_pub_key" > /root/.ssh/id_rsa.pub && \
chmod 600 /root/.ssh/id_rsa && \
chmod 600 /root/.ssh/id_rsa.pub
So running the docker build command again with the correct args supplied, I do see further activity in the console that suggests my SSH key is being utilised:
But as you can see I am getting no hostkey alg messages and
I still getting the same 'Host key verification failed' error. I was wondering if I could view the log file it references in the error:
Do I need to get the image running in order to be able to connect to it and browse the 'root' folder?
I hope I have made sense, please be gentle I am a docker noob!
Thanks
The lines that start with —-> in the docker build output are valid Docker image IDs. You can pick any of these and docker run them:
docker run --rm -it 59c45dac474a sh
If a step is actually failing, one useful debugging trick is to launch the image built in the step before it and run the command by hand.
Remember that anyone who has your image can do this; the way you’ve built it, if you ever push your image to any repository, your ssh private key is there for the taking, and you should probably consider it compromised. That’s doubly true since it will also be there in plain text in docker history output.
My use case is a little different than others with this problem, so a little up-front description:
I am working on Google Cloud and have a "dockerized" Django app. Part of the app depends on using gsutil for moving files to/from a Google Storage bucket. For various reasons, we do not want to use Google Container Engine to manage our containers. Rather, we would like to scale horizontally by starting additional Google Compute VMs which will, in turn, run this Docker container. Similar to https://cloud.google.com/python/tutorials/bookshelf-on-compute-engine except we will use a container rather than pulling a git repository.
The VM's will be built from a basic Debian image, and the startup and installation of dependencies (e.g. Docker itself) will be orchestrated with a startup script (e.g. gcloud compute instances create some-instance --metadata-from-file startup-script=/path/to/startup.sh).
If I manually create a VM, elevate with sudo -s, run gsutil config -f (which creates a credential file at /root/.boto) and then run my docker container (see Dockerfile below) with
docker run -v /root/.boto:/root/.boto username/gs gsutil ls gs://my-test-bucket
then it works. However, that requires my interaction to create the boto file.
My question is: How can I pass the default service credentials to the Docker container that will be starting in that new VM?
gsutil works out of the box on even a "fresh" Debian VM since it is using the default compute engine credentials that all VMs are loaded with. Is there a way to use those credentials and pass them to the docker container? After the first call to gsutil on a fresh VM, I've noticed that it creates ~/.gsutil and ~/.config folders. Unfortunately, mounting both of those in Docker with
docker run -v ~/.config/:/root/.config -v ~/.gsutil:/root/.gsutil username/gs gsutil ls gs://my-test-bucket
does not fix my problem. It tells me:
ServiceException: 401 Anonymous users does not have storage.objects.list access to bucket my-test-bucket.
A minimal gsutil Dockerfile (not mine):
FROM alpine
#install deps and install gsutil
RUN apk add --update \
python \
py-pip \
py-cffi \
py-cryptography \
&& pip install --upgrade pip \
&& apk add --virtual build-deps \
gcc \
libffi-dev \
python-dev \
linux-headers \
musl-dev \
openssl-dev \
&& pip install gsutil \
&& apk del build-deps \
&& rm -rf /var/cache/apk/*
CMD ["gsutil"]
Addition: a workaround:
I have since solved my issue, but it is quite roundabout so I'm still interested in a simpler way, if possible. All the details are below:
First, a description:
I first created a service account in the web console. I then save the JSON keyfile (call it credentials.json) into a storage bucket. In the startup script for the GCE VM, I copy that keyfile to the local filesystem (gsutil cp gs://<bucket>/credentials.json /gs_credentials/). I then start my docker container, mounting that local directory. Then, as the docker container starts, it runs a script that authenticates the credentials.json (which creates a .boto file inside the docker), export BOTO_PATH=, and finally I can perform gsutil operations in the Docker container.
Here are the files for a small working example:
Dockerfile:
FROM alpine
#install deps and install gsutil
RUN apk add --update \
python \
py-pip \
py-cffi \
py-cryptography \
bash \
curl \
&& pip install --upgrade pip \
&& apk add --virtual build-deps \
gcc \
libffi-dev \
python-dev \
linux-headers \
musl-dev \
openssl-dev \
&& pip install gsutil \
&& apk del build-deps \
&& rm -rf /var/cache/apk/*
# install the gcloud SDK-
# this allows us to use gcloud auth inside the container
RUN curl -sSL https://sdk.cloud.google.com > /tmp/gcl \
&& bash /tmp/gcl --install-dir=~/gcloud --disable-prompts
RUN mkdir /startup
ADD gsutil_docker_startup.sh /startup/gsutil_docker_startup.sh
ADD get_account_name.py /startup/get_account_name.py
ENTRYPOINT ["/startup/gsutil_docker_startup.sh"]
gsutil_docker_startup.sh: Takes a single argument, which is the path to a JSON-format service account credentials file. The file exists because the directory on the host machine was mounted in the container.
#!/bin/bash
CRED_FILE_PATH=$1
mkdir /results
# List the bucket, see that it gives a "ServiceException:401"
gsutil ls gs://<input bucket> > /results/before.txt
# authenticate the credentials- this creates a .boto file:
/root/gcloud/google-cloud-sdk/bin/gcloud auth activate-service-account --key-file=$CRED_FILE_PATH
# need to extract the service account which is like:
# <service acct ID>#<google project>.iam.gserviceaccount.com"
SERVICE_ACCOUNT=$(python /startup/get_account_name.py $CRED_FILE_PATH)
# with that service account, we can locate the .boto file:
export BOTO_PATH=/root/.config/gcloud/legacy_credentials/$SERVICE_ACCOUNT/.boto
# List the bucket and copy the file to an output bucket for good measure
gsutil ls gs://<input bucket> > /results/after.txt
gsutil cp /results/*.txt gs://<output bucket>/
get_account_name.py:
import json
import sys
j = json.load(open(sys.argv[1]))
sys.stdout.write(j['client_email'])
Then, the GCE startup script (executed automatically as the VM is started) is:
#!/bin/bash
# <SNIP>
# Install docker, other dependencies
# </SNIP>
# pull docker image
docker pull userName/containerName
# get credential file:
mkdir /cloud_credentials
gsutil cp gs://<bucket>/credentials.json /cloud_credentials/creds.json
# run container
# mount the host machine directory where the credentials were saved.
# Note that the container expects a single arg,
# which is the path to the credential file IN THE CONTAINER
docker run -v /cloud_credentials:/cloud_credentials \
userName/containerName /cloud_credentials/creds.json
You can assign a Specific Service Account to your instance and then use the Application Default Credential in your code. Please verify these points before testing.
Set the instance access scopes to: "Allow full access to all Cloud APIs" as they are not really a security feature
Set the right role to you service account: "Storage Object Viewer"
Authentication Token are retrieved automatically by Application Default Credential via Google Metadata Server which is available from your instance and your Docker containers as well. There is no need to manage any credentials.
def implicit():
from google.cloud import storage
# If you don't specify credentials when constructing the client, the
# client library will look for credentials in the environment.
storage_client = storage.Client()
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
I also quickly tested with docker and I worked perfectly
yann#test:~$ gsutil cat gs://my-test-bucket/hw.txt
Hello World
yann#test:~$ docker run --rm google/cloud-sdk gsutil cat gs://my-test-bucket/hw.txt
Hello World
I am using automated builds on Docker cloud to compile a C++ app and provide it in an image.
Compilation is quite long (range 2-3 hours) and commits on github are frequent (~10 to 30 per day).
Is there a way to keep the building cache (using ccache) somehow?
As far as I understand it, docker caching is useless since the compilation layer producing the ccache will not be used due to the source code changes.
Or can we tweak to bring some data back to first layer?
Any other solution? Pushing it somewhere?
Here is the Dockerfile:
# CACHE_TAG is provided by Docker cloud
# see https://docs.docker.com/docker-cloud/builds/advanced/
# using ARG in FROM requires min v17.05.0-ce
ARG CACHE_TAG=latest
FROM qgis/qgis3-build-deps:${CACHE_TAG}
MAINTAINER Denis Rouzaud <denis.rouzaud#gmail.com>
ENV CC=/usr/lib/ccache/clang
ENV CXX=/usr/lib/ccache/clang++
ENV QT_SELECT=5
COPY . /usr/src/QGIS
WORKDIR /usr/src/QGIS/build
RUN cmake \
-GNinja \
-DCMAKE_INSTALL_PREFIX=/usr \
-DBINDINGS_GLOBAL_INSTALL=ON \
-DWITH_STAGED_PLUGINS=ON \
-DWITH_GRASS=ON \
-DSUPPRESS_QT_WARNINGS=ON \
-DENABLE_TESTS=OFF \
-DWITH_QSPATIALITE=ON \
-DWITH_QWTPOLAR=OFF \
-DWITH_APIDOC=OFF \
-DWITH_ASTYLE=OFF \
-DWITH_DESKTOP=ON \
-DWITH_BINDINGS=ON \
-DDISABLE_DEPRECATED=ON \
.. \
&& ninja install \
&& rm -rf /usr/src/QGIS
WORKDIR /
You should try saving and restoring your cache data from a third party service:
- an online object storage like Amazon S3
- a simple FTP server
- an Internet available machine with ssh to make a scp
I'm assuming that your cache data is stored inside the ´~/.ccache´ directory
Using Docker multistage build
From some time, Docker supports Multi-stage builds and you can try using it to implement the solution with a single Dockerfile:
Warning: I've not tested it
# STAGE 1 - YOUR ORIGINAL DOCKER FILE CUSTOMIZED
# CACHE_TAG is provided by Docker cloud
# see https://docs.docker.com/docker-cloud/builds/advanced/
# using ARG in FROM requires min v17.05.0-ce
ARG CACHE_TAG=latest
FROM qgis/qgis3-build-deps:${CACHE_TAG} as builder
MAINTAINER Denis Rouzaud <denis.rouzaud#gmail.com>
ENV CC=/usr/lib/ccache/clang
ENV CXX=/usr/lib/ccache/clang++
ENV QT_SELECT=5
COPY . /usr/src/QGIS
WORKDIR /usr/src/QGIS/build
# restore cache
RUN curl -o ccache.tar.bz2 http://my-object-storage/ccache.tar.bz2
RUN tar -xjvf ccache.tar.bz2
COPY --from=downloader /.ccache ~/.ccache
RUN cmake \
-GNinja \
-DCMAKE_INSTALL_PREFIX=/usr \
-DBINDINGS_GLOBAL_INSTALL=ON \
-DWITH_STAGED_PLUGINS=ON \
-DWITH_GRASS=ON \
-DSUPPRESS_QT_WARNINGS=ON \
-DENABLE_TESTS=OFF \
-DWITH_QSPATIALITE=ON \
-DWITH_QWTPOLAR=OFF \
-DWITH_APIDOC=OFF \
-DWITH_ASTYLE=OFF \
-DWITH_DESKTOP=ON \
-DWITH_BINDINGS=ON \
-DDISABLE_DEPRECATED=ON \
.. \
&& ninja install
# save the current cache online
WORKDIR ~/
RUN tar -cvjSf ccache.tar.bz2 .ccache
RUN curl -T ccache.tar.bz2 -X PUT http://my-object-storage/ccache.tar.bz2
# STAGE 2
FROM alpine:latest
# YOUR CUSTOM LOGIC TO CREATE THE FINAL IMAGE WITH ONLY REQUIRED BINARIES
# USE THE FROM IMAGE YOU NEED, this is only an example
# E.g.:
# COPY --from=builder /usr/src/QGIS/build/YOUR_EXECUTABLE /usr/bin
# ...
In the stage 2 you will build the final image that will be pushed to your repository.
Using Docker cloud hooks
Another, but less clear, approach could be using a Docker Cloud pre_build hook file to download cache data:
#!/bin/bash
echo "=> Downloading build cache data"
curl -o ccache.tar.bz2 http://my-object-storage/ccache.tar.bz2 # e.g. Amazon S3 like service
cd /
tar -xjvf ccache.tar.bz2
Obviously you can use dedicate docker images to run curl or tar mounting the local directory as a volume in this script.
Then, copy the .ccache extracted folder inside your container during the build, using a COPY command before your cmake call:
WORKDIR /usr/src/QGIS/build
COPY /.ccache ~/.ccache
RUN cmake ...
In order to make this you should find a way to upload your cache data after the build and you could make this easily using a post_build hook file:
#!/bin/bash
echo "=> Uploading build cache data"
tar -cvjSf ccache.tar.bz2 ~/.ccache
curl -T ccache.tar.bz2 -X PUT http://my-object-storage/ccache.tar.bz2
But your compilation data aren't available from the outside, because they live inside the container. So you should upload the cache after the cmake command inside your main Dockerfile:
RUN cmake...
&& tar ...
&& curl ...
&& ninja ...
&& rm ...
If curl or tar aren't available, just add them to your container using the package manager (qgis/qgis3-build-deps is based on Ubuntu 16.04, so they should be available).