Cloud Composer missing variables file - google-cloud-composer

I've been trying to import a JSON file of variables into a newly created Cloud Composer instance using the Airflow CLI, but when I run the command below I get the error: Missing variables file.
gcloud composer environments run ${COMPOSER_NAME} \
--location=${COMPOSER_LOCATION} \
variables -- \
-i ${VARIABLES_JSON}
From looking at the source, it seems this happens when the variables file doesn't exist at the expected location. Is this because Cloud Composer sets up its variables in a different location, so this CLI won't work? I've noticed that there's an env_var.json file created in the instance's GCS bucket; I realise I could overwrite this file, but that doesn't seem like best practice.

It feels like a hack, but I copied variables.json into the data folder of my Composer environment's GCS bucket and then it worked.
This works because os.path.exists() checks for the file on the container that Airflow is running in, and the bucket's data folder is synced to /home/airflow/gcs/data on that container. I chose this approach over overwriting env_var.json because this way the variables also show up in Airflow's UI.
Script for anyone interested:
# local path on the Airflow workers that is synced with gs://<bucket>/data
COMPOSER_DATA_FOLDER=/home/airflow/gcs/data
# extract the environment's bucket (gs://<bucket>/) from the dagGcsPrefix field
COMPOSER_GCS_BUCKET=$(gcloud composer environments describe ${COMPOSER_NAME} --location ${COMPOSER_LOCATION} | grep 'dagGcsPrefix' | grep -Eo "\S+/")
# copy the variables file (assumed to be named variables.json) into the bucket's data folder
gsutil cp ${ENV_VARIABLES_JSON_FILE} ${COMPOSER_GCS_BUCKET}data
# import the variables from the synced path
gcloud composer environments run ${COMPOSER_NAME} \
--location ${COMPOSER_LOCATION} variables -- \
-i ${COMPOSER_DATA_FOLDER}/variables.json
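To confirm the import worked, you can read one of the variables back (a sketch, assuming the Airflow 1.x CLI that Composer used here and a variable named my_key present in your JSON):
gcloud composer environments run ${COMPOSER_NAME} \
--location ${COMPOSER_LOCATION} variables -- \
--get my_key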

Related

Auto-create Rundeck jobs on startup (Rundeck in Docker container)

I'm trying to set up Rundeck inside a Docker container. I want to use Rundeck to provision and manage my Docker fleet. I found an image that ships an Ansible plugin as well. So far, running simple playbooks and auto-discovering my Pi nodes both work.
Docker script:
echo "[INFO] prepare rundeck-home directory"
mkdir -p ../../target/work/home/rundeck/data
echo -e "[INFO] copy host inventory to rundeck-home"
cp resources/inventory/hosts.ini ../../target/work/home/rundeck/data/inventory.ini
echo -e "[INFO] pull image"
docker pull batix/rundeck-ansible
echo -e "[INFO] start rundeck container"
docker run -d \
--name rundeck-raspi \
-p 4440:4440 \
-v "/home/sebastian/work/workspace/workspace-github/raspi/target/work/home/rundeck/data:/home/rundeck/data" \
batix/rundeck-ansible
Now I want to feed the container with playbooks, which should become jobs to run in Rundeck. Can anyone give me a hint on how I can create Rundeck jobs (which should invoke an Ansible playbook) from the outside? Via the API?
One way I can think of is creating the jobs manually once and exporting them as XML or YAML. When the container and Rundeck are up and running, I could import the jobs automatically. Is there a certain folder in rundeck-home or somewhere else where I can put those files for automatic import? Or is there an API call or something?
Could Jenkins be more suited for this task than Rundeck?
EDIT: I just changed to a Dockerfile:
FROM batix/rundeck-ansible:latest
COPY resources/inventory/hosts.ini /home/rundeck/data/inventory.ini
COPY resources/realms.properties /home/rundeck/etc/realms.properties
COPY resources/tokens.properties /home/rundeck/etc/tokens.properties
# import jobs
ENV RD_URL="http://localhost:4440"
ENV RD_TOKEN="yJhbGciOiJIUzI1NiIs"
ENV rd_api="36"
ENV rd_project="Test-Project"
ENV rd_job_path="/home/rundeck/data/jobs"
ENV rd_job_file="Ping_Nodes.yaml"
# copy job definitions and script
COPY resources/jobs-definitions/Ping_Nodes.yaml /home/rundeck/data/jobs/Ping_Nodes.yaml
RUN curl -kSsv --header "X-Rundeck-Auth-Token:$RD_TOKEN" \
-F yamlBatch=@"$rd_job_path/$rd_job_file" "$RD_URL/api/$rd_api/project/$rd_project/jobs/import?fileformat=yaml&dupeOption=update"
Do you know how I can delay the curl at the end until after the rundeck service is up and running?
That's right: you can write a script that deploys your instance and then imports the jobs with an API call using cURL (pointing to your Docker instance). Here is a basic example (in this example you need the job definition in XML format).
For XML job definition format:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="qNcao2e75iMf1PmxYfUJaGEzuVOIW3Xz"
# specific api call info
rdeck_project="ProjectEXAMPLE"
rdeck_xml_file="HelloWorld.xml"
# api call
curl -kSsv --header "X-Rundeck-Auth-Token:$rdeck_token" \
-F xmlBatch=@"$rdeck_xml_file" "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$rdeck_project/jobs/import?fileformat=xml&dupeOption=update"
For YAML job definition format:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="qNcao2e75iMf1PmxYfUJaGEzuVOIW3Xz"
# specific api call info
rdeck_project="ProjectEXAMPLE"
rdeck_yml_file="HelloWorldYML.yaml"
# api call
curl -kSsv --header "X-Rundeck-Auth-Token:$rdeck_token" \
-F xmlBatch=@"$rdeck_yml_file" "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$rdeck_project/jobs/import?fileformat=yaml&dupeOption=update"
This uses the Rundeck jobs-import API call; see the Rundeck API documentation for details.
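On the timing question from the edit above: a curl in a RUN step executes at build time, before Rundeck is running, so the import has to happen after the container starts, for example from a startup script inside the container or from the host right after docker run. A rough sketch of a host-side wait-then-import loop (the system/info endpoint, retry count, and sleep interval are my own choices, not from the thread):
#!/bin/sh
RD_URL="http://localhost:4440"
RD_TOKEN="yJhbGciOiJIUzI1NiIs"
tries=0
# poll the API until Rundeck answers (or give up after ~5 minutes)
until curl -ksSf --header "X-Rundeck-Auth-Token:$RD_TOKEN" "$RD_URL/api/36/system/info" > /dev/null 2>&1; do
  tries=$((tries + 1))
  [ "$tries" -ge 60 ] && echo "Rundeck did not come up in time" && exit 1
  sleep 5
done
# now run the jobs/import curl shown above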

Running Docker on Google Cloud Instance with data in gcsfuse-mounted Bucket

I am trying to run a Docker container to analyze data in a Google Cloud Bucket.
I have been able to successfully mount the Bucket using gcsfuse, and I tested that I could do things like create and delete files within the Bucket.
In order to be able to install other programs (and mount the bucket), I installed Docker (and didn't use the Docker-optimized instance option). If I run Docker in interactive mode (without mounting a drive), it looks like it is working OK.
However, if I try to run Docker in interactive mode with the mounted drive (which is the gcsfuse-mounted Bucket), I get an error message:
user@instance:~/bucket-name/subfolder$ docker run -it -v /home/user/bucket-name:/mnt/bucket-name gcr.io/deepvariant-docker/deepvariant
docker: Error response from daemon: error while creating mount source path '/home/user/bucket-name': mkdir /home/user/bucket-name: file exists.
I hope that I am close to having this working: does anybody have any ideas about a relatively simple fix for this error message?
BTW, I realize that there are other ways to run DeepVariant on Google Cloud, but I am trying to make things as similar as possible to what I am doing on AWS (plus, I may need to do some extra troubleshooting for the analysis of one of my files).
Thank you very much for your help!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FYI, this is how I mounted the Bucket:
#mount directory: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/installing.md
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install gcsfuse
#restart and mount directory: https://cloud.google.com/storage/docs/gcs-fuse
#NOTE: please make sure you are in your home directory (I encounter issues if I try to mount from /mnt)
mkdir [bucket-name]
gcsfuse -o allow_other --file-mode 777 --dir-mode 777 [bucket-name] ./[bucket-name]
and this is how I installed Docker:
#install Docker for Debian: https://docs.docker.com/install/linux/docker-ce/debian/
sudo apt-get update
sudo apt-get -y install \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/debian \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get -y --allow-unauthenticated install docker-ce docker-ce-cli containerd.io
#fix Docker sock issue: https://stackoverflow.com/questions/47854463/got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket-at-uni
sudo usermod -a -G docker [user]
#have to restart after this
For anyone experiencing a similar error / issue - here is what worked for me. Steps I took:
First unmount the disk if it's already mounted: sudo umount /mounted_folder
Remount the disk using the below command, listing the credentials file to be used explicitly
sudo GOOGLE_APPLICATION_CREDENTIALS=/home/user/credentials/example-asdf21b0af7.json gcsfuse -o allow_other bucket_name /mounted_folder
It should now be connected successfully without further errors :)
NOTE: This command needs to be run every time after restarting the computer / VM. This could probably be turned into an fstab entry so that these steps don't have to be executed manually after each restart (see the sketch below).
EXPLANATION: What I did here was explicitly specify the credentials via a credentials JSON for a user / service account with appropriate access (how to obtain this isn't explained here, but it should be easy to look up) and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at that JSON, as suggested by this answer: https://stackoverflow.com/a/39047673/10002593. The need for this environment variable is likely because gcsfuse does not pick up the same level of access as the activated account in gcloud config, for some reason.
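A sketch of such an /etc/fstab entry, assuming gcsfuse's key_file mount option and the same bucket, mount point, and credentials path as above (adjust the names to your setup):
bucket_name /mounted_folder gcsfuse rw,_netdev,allow_other,key_file=/home/user/credentials/example-asdf21b0af7.json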
I think I figured out at least a partial solution to my problem:
As mentioned in this tutorial, you also need to run gcloud auth configure-docker.
I found I also needed to exit and restart my instance, but that strictly solved the original error message for this post.
I then got a strange message, but perhaps that is more about the specific container. So, I ran another test:
docker run -it -v /home/user/bucket-name:/mnt/bucket-name cwarden45/dnaseq-dependencies
This time, I got an error message about storage space on the instance (to be able to download and run the Docker container). So, I went back and created a new instance with a larger local hard drive:
1) From the Google Cloud Console, I selected "Compute Instance" and "VM instances"
2) I clicked "create instance" (similar to before)
3) I selected "change" under "boot disk"
4) I set the size to 300 GB instead of 10 GB (currently, towards the bottom-right, under "Size (GB)")
Similar to before, I chose 8 vCPUs for the "Machine type", I selected "Allow full access to all Cloud APIs" under "Identity and API access", and I checked the boxes for both "Allow HTTP traffic" and "Allow HTTPS traffic" (under "Firewall").
I did not select "Deploy a container image to this VM instance"; I believe that is how I was able to install Docker with "sudo" and then install gcsfuse.
I also have to call this a "partial" solution because, while it allows me to run the Docker container successfully in interactive mode, the mounted bucket appears empty within Docker.
For another project, I noticed that executables could work if I installed them on the local hard drive under /opt, but not if I tried to install them on my bucket (in order to save the installation time for those programs each time). On AWS, I believe I needed to use EFS storage instead of S3 storage to do something similar, but I will keep learning more about using the Google Cloud Bucket for mounted storage / analysis.
Also, it is a different issue, but I noticed that I could fix a problem with running executable files from the bucket by changing the command from gcsfuse [bucket-name] ./[bucket-name] to gcsfuse --file-mode 777 --dir-mode 777 [bucket-name] ./[bucket-name] (and I changed the example code above accordingly).
I noticed more recently that the set of commands above is no longer sufficient to be able to have a functional directory (I can't add or edit files, for example).
Based upon this discussion, I thought that I needed to add the -o allow_other parameter.
However, if that is all I do, I get the following error message
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
I can resolve that error message if I uncomment the corresponding line in that file. However, that still doesn't resolve having the right file permissions in the mounted directory.
So, I then tried editing my /etc/fstab file, by adding the following entry
[bucket-name] /home/[username]/[bucket-name] gcsfuse rw,allow_other,file_mode=777,dir_mode=777
I am also editing the content at the top of this post accordingly (with whatever seems like it might help).
Also, please note that this was not a Docker-specific issue. This was necessary to essentially do anything within the bucket. Plus, I haven't actually solved this new problem.
For example, I still can't create files as root, after changing to the superuser via sudo su - (as described here)

Docker run: pass raw env variables

I've been banging my head against a wall for a couple hours trying to figure this out.
My node.js code uses an environment variable GAC to authenticate with google cloud.
let { GAC } = process.env
const firestore = new Firestore({ keyFilename: GAC })
GAC is the path to the json file that contains secret keys, ids, etc.
On my local machine, printenv displays:
GAC="${HOME}/project-id.json"
And obviously the file itself is located where it should be.
Now I want to run a docker container, passing it the GAC variable and the json file, so that my code can run as normal.
$ docker run \
-e GAC=$GAC \
-v $GAC:$GAC \
-t image \
printenv
I want to pass the unexpanded GAC variable to docker. So printenv inside the container should show the exact same thing as printenv on my local machine.
GAC="${HOME}/project-id.json"
However the $GAC variable is expanded by the shell before the command is run. So printenv inside the container ends up looking like this
GAC="C:\Users\eric\project-id.json"
which is incorrect.
How do I do this properly? I want my code to retrieve the GAC env variable, no matter if it's being run on my local machine or in a container.
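One common approach (not from this thread; the /secrets path inside the container is an arbitrary choice) is to mount the key file at a fixed container path and point GAC at that path, so the code reads the same variable name in both environments:
docker run \
-e GAC=/secrets/project-id.json \
-v "$HOME/project-id.json:/secrets/project-id.json:ro" \
-t image \
printenv
Inside the container, printenv then shows GAC=/secrets/project-id.json, the file exists at that path, and new Firestore({ keyFilename: GAC }) works unchanged.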

Parse a variable with the result of a command in DockerFile

I need to fill a variable in a Dockerfile with the result of a command, like var=$(date) in bash.
EDIT 1
date is just an example.
In my case I use FROM phusion/baseimage:0.9.17, and I want each build to use the latest version, so I use this:
curl -v --silent api.github.com/repos/phusion/baseimage-docker/tags 2>&1 | grep -oh 'rel-.*",' | head -1 | sed 's/",//' | sed 's/rel-//' ==> 0.9.17.
but I don't know how to get that result into a variable in the Dockerfile, so that I can do something like this:
ENV verbaseimage=curl...
FROM phusion/baseimage:$verbaseimage
RESULT
In my use case:
FROM phusion/baseimage:latest
But the question remains unresolved for other cases.
I had the same issue and found a way to set an environment variable to the result of a command by using a RUN command in the Dockerfile.
For example, I need to set SECRET_KEY_BASE for a Rails app just once, without it changing on every run, as it would if I ran:
docker run -e SECRET_KEY_BASE="$(openssl rand -hex 64)"
Instead, I write a line like this to the Dockerfile:
RUN bash -l -c 'echo export SECRET_KEY_BASE="$(openssl rand -hex 64)" >> /etc/bash.bashrc'
and my env variable is available from root, even after bash login.
Or maybe:
RUN /bin/bash -l -c 'echo export SECRET_KEY_BASE="$(openssl rand -hex 64)" > /etc/profile.d/docker_init.sh'
then the variable is available in CMD and ENTRYPOINT commands.
Docker caches this as a layer, and it changes only if you change something before it in the Dockerfile.
You can also try different ways to set an environment variable.
The old workaround is mentioned here (issue 2637: Feature request: expand Dockerfile ENV $VARIABLES in WORKDIR):
One work around that I've used, is to have a file in my context called "build-env". What I do is source it and run my desired command in the same RUN step. So for example:
build-env:
VERSION=stable
Dockerfile:
FROM radial/axle-base:latest
ADD build-env /build-env
RUN source build-env && mkdir /$VERSION
RUN ls /
But for date, that might not be as precise as you want.
Other workarounds are in issue 2022 "Dockerfile with variable interpolation".
In docker 1.9 (end of October 2015), you will have "support for build-time environment variables to the 'build' API (PR 9176)" and "Support for passing build-time variables in build context (PR 15182)".
docker build --build-arg=[]: Set build-time variables
You can use ENV instructions in a Dockerfile to define variable values. These values persist in the built image. However, often persistence is not what you want. Users want to specify variables differently depending on which host they build an image on.
A good example is http_proxy or source versions for pulling intermediate files. The ARG instruction lets Dockerfile authors define values that users can set at build-time using the --build-arg flag:
$ docker build --build-arg HTTP_PROXY=http://10.20.30.2:1234 .
This flag allows you to pass the build-time variables that are accessed like regular environment variables in the RUN instruction of the Dockerfile.
Also, these values don't persist in the intermediate or final images like ENV values do.
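A minimal sketch of that mechanism (the VERSION name and the echo/mkdir steps are just illustrations, not from the original question; note that using an ARG in the FROM line itself only became possible later, in Docker 17.05+):
FROM phusion/baseimage:latest
ARG VERSION=0.9.17
# available to RUN at build time, but not persisted in the image like ENV
RUN echo "building against baseimage $VERSION" && mkdir -p "/opt/app-$VERSION"
built with, for example:
docker build --build-arg VERSION=0.9.18 .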
so I want each build to use the latest version, so I use this
curl -v --silent api.github.com/repos/phusion/baseimage-docker/tags 2>&1 | grep -oh 'rel-.*",' | head -1 | sed 's/",//' | sed 's/rel-//' ==> 0.9.17.
If you want to use the last version of that image, all you need to do is use the tag 'latest' with the FROM directive:
FROM phusion/baseimage:latest
See also "The misunderstood Docker tag: latest": it doesn't always reference the actual latest build, but in this instance, it should work.
If you really want to use the curl|parse option, use it to generate a Dockerfile with the right value (as in a template processed to generate the right file).
Don't try to use it directly in the Dockerfile.
I wanted to set an ENV or LABEL variable from a computation in the Dockerfile, e.g. to make some computed installation options visible in docker inspect.
There does not seem to be any way to do that, and this issue suggests that it's a security design choice.
A Dockerfile can set an ENV variable to $X, ${X:-default}, or ${X:+substitute} where that $X must be another ENV or ARG variable.
A single RUN command can set and use shell variables, but that goes away at the end of the RUN command when that container layer shuts down.
A RUN command can write computed data into files, but the Dockerfile still can't get that data into an ENV or LABEL even if the file is ~/.bashrc. (File contents can, of course, be used by code running in the Container.)
The build can at least RUN echo $X to record choices to the build log -- unless that step comes from the build cache, in which case the RUN step doesn't run.
Please do correct me if there's a way out.
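For what it's worth, a minimal sketch of the write-to-a-file workaround mentioned above (the debian base image, the /etc/build-info path, and the BUILD_DATE name are arbitrary choices): the value never reaches ENV or LABEL, but code running in the container can read it at startup:
FROM debian:bullseye-slim
# compute a value at build time and persist it in a file
RUN date -u +%Y-%m-%dT%H:%M:%SZ > /etc/build-info
# read the file when the container starts; docker inspect still won't show it
ENTRYPOINT ["/bin/sh", "-c", "export BUILD_DATE=$(cat /etc/build-info) && exec \"$@\"", "sh"]
CMD ["printenv", "BUILD_DATE"]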
Partially connected to the question: if one wants to use the result of some command later on, that is possible within a single RUN statement, as follows:
RUN CUR_DIR=`pwd` && \
echo $CUR_DIR

Google gsutil auth without prompt

I want to use gsutil inside a Docker container. I have created an OAuth2 service account JSON file.
How can I set up gsutil auth to use the JSON config file and execute commands without prompting?
Currently I get something like this:
$ gsutil config -e
It looks like you are trying to run "/.../google-cloud-sdk/bin/bootstrapping/gsutil.py config".
The "config" command is no longer needed with the Cloud SDK.
To authenticate, run: gcloud auth login
Really run this command? (y/N) y
This command will create a boto config file at /.../.boto
containing your credentials, based on your responses to the following questions.
What is the full path to your private key file?
What command/parameters/setup do I have to use to circumvent the prompts?
Solved this issue by executing:
gcloud auth activate-service-account --key-file=/opt/gcloud/auth.json
The whole example and finished container can be found here: blacklabelops/gcloud
If you want gsutil only and bypass the prompt you can do it easily with an expect script:
#!/usr/bin/expect
spawn gsutil config -e
expect "What is the full path to your private key file?" { send "/path/your.key\r" }
expect "Would you like gsutil to change the file permissions for you? (y/N)" { send "y\r" }
expect "What is your project-id?" { send "your-projet-42\r" }
interact
The -o Credentials:gs_service_key_file=<path to JSON file> option does the job, using the boto configuration override parameters documented at https://cloud.google.com/storage/docs/gsutil/addlhelp/TopLevelCommandLineOptions:
$ gsutil -v
gsutil version: 4.57
$ gsutil -o Credentials:gs_service_key_file=key.json ls -al gs://bucket/filename
79948 2021-05-24T02:12:25Z gs://bucket/filename#1111111145678393 metageneration=2
The above solutions don't work for me.
What solved this problem for me was the following:
Set the gs_service_key_file in the [Credentials] section of the boto config file (see here)
Activate your service account with gcloud auth activate-service-account
Set your default project in gcloud config
Dockerfile snippet:
ENV GOOGLE_APPLICATION_CREDENTIALS /.gcp/your_service_account_key.json
ENV GOOGLE_PROJECT_ID your-project-id
RUN echo '[Credentials]\ngs_service_key_file = /.gcp/your_service_account_key.json' \
> /etc/boto.cfg
RUN mkdir /.gcp
COPY your_service_account_key.json $GOOGLE_APPLICATION_CREDENTIALS
RUN gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS --project $GOOGLE_PROJECT_ID
RUN gcloud config set project $GOOGLE_PROJECT_ID
Using only gsutil:
First run this command to configure the authentication manually:
gsutil config -a
This will create a /root/.boto file with the needed credentials.
Copy that file into your Docker image (a sketch follows below).
gsutil will then work with those credentials.
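A rough sketch of that approach (the google/cloud-sdk base image is just one option; the .boto file is assumed to have been generated beforehand with gsutil config -a and to sit next to the Dockerfile):
FROM google/cloud-sdk:slim
# .boto generated earlier with "gsutil config -a"
COPY .boto /root/.boto
# gsutil picks up the credentials from /root/.boto
CMD ["gsutil", "ls"]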
