I'm using Composer version 1.2.0-1.9.0, and I'm trying to use a MsSqlOperator in one of my DAGs. However, when published, Airflow gave me the error: No module named 'pymssql'.
Now, I could install it as a PyPI package, but shouldn't it be supported natively? Even if not, can't I include the mssql subpackage when creating an environment?
If you want to add additional packages which are not part of the base environment, click on your environment's name in the Cloud Composer console; there is a tab named PYPI PACKAGES where you can specify the Python packages and versions you would like to add to your environment.
You can also do it programmatically by creating a requirements.txt file listing the additional packages you want and passing it to your Composer environment with gcloud. The needed command is the one below.
gcloud composer environments update ENVIRONMENT-NAME \
--update-pypi-packages-from-file requirements.txt \
--location LOCATION
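In this case, the requirements.txt only needs to list the missing package, one package per line, optionally pinned with ==version; for the error above it could be as short as:
pymssql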
I am running aws-mwaa-local-runner in order to run a local Apache Airflow environment (in Docker for Windows).
However, after creating the container using ./mwaa-local-env start, I repeatedly get a Broken DAG ModuleNotFoundError. When I check my /docker/config/requirements.txt file (see here, although my file has a few more requirements that I need in it) and compare it with the output of pip freeze run inside the Airflow container, I can see that the requirements I need for my DAGs are missing.
I tried to pip install my other requirements in the Airflow container, but to no avail.
Is there a way to modify the docker-compose-local.yml file so it installs all of my requirements.txt when creating the container (i.e. when running Airflow)?
Is there maybe something I might be missing? Any help or suggestion would be greatly appreciated.
Look at this: https://github.com/aws/aws-mwaa-local-runner. You should install the requirements file located in dags locally:
pip install -r requirements.txt
Add your extra requirements to dags/requirements.txt, not docker/config/requirements.txt. The former is installed every time you start the service, but the latter is only installed when you build or rebuild the image.
Additionally, keeping your added requirements separate is important because you will need to upload the list to your MWAA environment.
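As an example (the package name here is purely illustrative), appending a line to dags/requirements.txt and restarting the local runner should pick it up on the next start:
echo "apache-airflow-providers-amazon" >> dags/requirements.txt
./mwaa-local-env start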
The background is, I'm responsible for maintaining a fancy Docker image that is used by our team for analytics. It uses a Jupyter notebook image as the base, and then adds various customisations, extra packages, etc.
One of the team members recently wanted to run Tensorflow. No problem, I'll just run mamba install and add it to the image. However, this created an issue: Tensorflow 2.4.3 (the latest version) is somehow incompatible with R 4.1.1 (also the latest version) or something else in the ecosystem, causing R to be downgraded to 3.6.3. So I created a new environment and installed TF into that:
FROM hongooi/jupytermodelrisk:1.0.0
RUN mamba create -n tensorflow --clone base
# Make RUN commands use the new environment
RUN echo "conda activate tensorflow" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
RUN mamba install -y 'tensorflow=2.4.3'
But when I rebuilt the image, I found that while the tensorflow env had been created, the Tensorflow package had been installed into the base env, not the tensorflow env. Has anyone else encountered this? I can verify, if I log in to the container, that the tensorflow env has been created: it just doesn't contain the Tensorflow package.
I don't get this problem if I run the create, activate and install commands from inside the container. It's only when I try to do it in the Dockerfile.
I use mamba instead of conda because the latter takes forever to run, given the number of packages installed. In fact, trying to run conda install tensorflow crashes after ~5 hours.
Not an expert on Dockerfiles, but in general you can use the -n flag on the install command to specify the target environment for the installation, like so:
mamba install -n tensorflow -y tensorflow=2.4.3
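The likely reason the original Dockerfile installed into base is that bash started with --login reads the profile files rather than ~/.bashrc, so the conda activate tensorflow line never runs during the RUN steps. A sketch of the question's Dockerfile using the -n flag instead (same base image and environment name as above, untested and shown only to illustrate the idea):
FROM hongooi/jupytermodelrisk:1.0.0
# Clone the base environment, then install into it explicitly by name,
# so the result does not depend on which environment is currently active
RUN mamba create -n tensorflow --clone base
RUN mamba install -n tensorflow -y 'tensorflow=2.4.3'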
Is there a way to install VS Code extensions in a Dockerfile?
Apparently, while most browser-based VS Code forks (including openvscode-server) do not permit headless installation of VS Code extensions (as seen from my other answer), it is possible to do such automated installs using docker build in one of them: Code Server (code-server), like in this sample Dockerfile:
FROM ubuntu:22.04
RUN apt update && apt install -y curl
# install VS Code (code-server)
RUN curl -fsSL https://code-server.dev/install.sh | sh
# install VS Code extensions
RUN code-server --install-extension redhat.vscode-yaml \
--install-extension ms-python.python
Relevant fragment of the docker build log:
[..]
---> Running in 59eea050a2db
[2022-11-13T10:13:58.762Z] info Wrote default config file to ~/.config/code-server/config.yaml
Installing extensions...
Installing extension 'redhat.vscode-yaml'...
Installing extension 'ms-python.python'...
Extension 'redhat.vscode-yaml' v1.10.1 was successfully installed.
Extension 'ms-python.python' v2022.16.1 was successfully installed.
[..]
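For completeness, the fragment above is simply part of the output of building that Dockerfile with docker build (the tag name is arbitrary):
docker build -t code-server-with-extensions .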
Per the VS Code Documentation, the extensions property is specific to VS Code and can only be configured using .devcontainer.
The best you can do is, if the extension has a CLI equivalent, install that instead. For example,
RUN npm install prettier -D --save-exact
Then use npx:
npx prettier --check .
This is sadly disallowed by design, as confirmed by this error message you will see in your docker build log when you attempt to run code --install-extension or openvscode-server --install-extension:
Command is only available in WSL or inside a Visual Studio Code terminal.
This is also confirmed (and tagged) as being as designed by one of VS Code Remote devs in this GitHub issue:
This is correct the 'vs code server CLI' is only available from the integrated terminal.
So VS Code Remote or even openvscode-server make it impossible to automate installs of first- or third-party extensions for this popular Microsoft IDE, unless you run their custom terminal inside their closed-source IDE, which normally entails buying a license for their GUI-based closed-source operating system as well;)
I want to create a Jenkins based image to have some plugins installed as well as npm. To do so I have the following Dockerfile:
FROM jenkins:2.60.3
RUN install-plugins.sh bitbucket
USER root
RUN apt-get update
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash -
RUN apt-get install -y nodejs
RUN npm --version
USER jenkins
That works fine; however, when I run the image I have two problems:
It seems that the plugins I tried to install manually didn't get persisted for some reason.
I get prompted with the list of plugins I want to install, but I don't want to install anything else.
Am I missing anything configuring the Dockerfile or is it that what I want to achieve is simply not possible?
Without seeing the contents of install-plugins.sh, I can't comment as to why the plugins aren't persisting. It is most likely caused by an incorrect installation destination; persistence shouldn't be an issue at this stage, since the plugin installation is built into the image itself.
As for the latter issue, you should be able to skip the installation wizard altogether by adding the following line to your Dockerfile:
ENV JAVA_OPTS=-Djenkins.install.runSetupWizard=false
Please note that this can be a security risk if the Jenkins image is exposed to the world at large, since this option disables the need for authentication.
EDIT: The default plugin directory for the Docker image is /var/jenkins_home/plugins
EDIT 2: According to the README on the Jenkins Docker repo, adding the line
RUN echo 2.0 > /usr/share/jenkins/ref/jenkins.install.UpgradeWizard.state
should accomplish the same thing
Things have changed since 2017, when the last answer was posted, and it no longer works. The current way is shown in the following Dockerfile snippet:
# Prevent setup wizard from running.
# WARNING: Jenkins will start with security disabled, without any password.
ENV JENKINS_OPTS="-Djenkins.install.runSetupWizard=false"
# plugins.txt must contain the list of plugins to be installed
# (One plugin per line, e.g. sidebar-link:1.11.0)
COPY plugins.txt /tmp/plugins.txt
RUN /usr/local/bin/install-plugins.sh < /tmp/plugins.txt
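Tying this back to the question above, a plugins.txt for the Bitbucket plugin could be a single line (append :<version> to pin a specific release):
bitbucket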
I have a requirement that before an application runs, some part of it needs to read an environment variable. For this I have the following Dockerfile:
FROM nodesource/jessie:0.12.7
# install gettext for envsubst
RUN apt-get update
RUN apt-get install -y gettext-base
# cache package.json and node_modules to speed up builds
ADD package.json package.json
RUN npm install
# Add source files
ADD src src
# Substitute value for backend endpoint env var
RUN envsubst < src/js/envapp.js > src/js/app.js
ADD node_modules node_modules
EXPOSE 8000
CMD ["npm","start"]
The above envsubst line reads (should read) an env variable $MYENV and substitutes it. But when I open the file app.js, it's empty.
I checked if the environmental variable exists in the container and it does. Any reason its value is not read and substituted?
I also tried the same command in the container and it works. It only does not work when I run the image.
This is likely because $MYENV is not available for envsubst when you run the image.
Each RUN command runs in its own shell.
From the Docker documentation:
RUN (the command is run in a shell - /bin/sh -c - shell form)
You need to source your profile as well. For example, if the $MYENV environment variable is defined in the .bashrc file, you can modify your Dockerfile like this:
RUN source ~/.bashrc && envsubst < src/js/envapp.js > src/js/app.js
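If the variable does not exist inside the image at all, another option (not part of the answer above, just a sketch) is to pass it in at build time as a build argument and promote it to an environment variable, so envsubst can see it during the RUN step:
ARG MYENV
ENV MYENV=${MYENV}
RUN envsubst < src/js/envapp.js > src/js/app.js
and then build with, for example, docker build --build-arg MYENV=https://backend.example.com .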
I encountered the same issues, and after much research and fishing through the internet, I managed to find a few workarounds. Below I'll list them along with the risks identifiable at the time of this answer.
Solutions:
1.) apt-get install -y gettext - it's a standard GNU localization library; one of the tools it includes is envsubst, and I can confirm that it works for docker UBUNTU:latest and it should work for every flavored version.
2.) npm install envsub - depending on the use case; this approach is better suited to Node-based projects.
3.) The envsubst CLI project - in my opinion it seems a bit overkill to download a custom CLI from a random stranger, but it's also another option.
Risks:
apt-get install -y gettext:
1.) gettext - this approach would NOT be ideal for VMs because, as with any package library, it requires maintenance and updates as time passes. However, this isn't necessary for Docker, because once a container is initialized and up and running we can create a bash script to add the package, substitute the env vars and then remove the package.
2.) It's a bad idea for VMs because it can be used to execute arbitrary code.
npm install envsub:
1.) envsub - it needs to be kept updated like any package, and this approach wouldn't be ideal if you're dealing with a different stack and not using Node.js.
NOTE:
There's also a PHP version for those developing a PHP application, and it seems to work with PHP's CLI if you need a custom environment.
Resources:
GetText package library info: https://www.gnu.org/software/gettext/
GetText risk: https://ubuntu.com/security/notices/USN-3815-2
PHP-GetText: apt-get install -y php-gettext
Custom envsubst CLI: https://github.com/a8m/envsubst
I suggest that since you are using Node, you use the npm envsub module.
This module is well tested and is developed with docker in mind.
It avoids the need for relying on other dependencies when you already have the full Node arsenal at your fingertips.
envsub is described as
envsub is envsubst for NodeJS
NodeJS global CLI module providing file-level environment variable substitution via Handlebars
I am the author of the package. I think you will enjoy it.
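A rough sketch of how it could replace the envsubst step from the question (the exact CLI invocation may differ, so check the envsub documentation):
RUN npm install -g envsub
RUN envsub src/js/envapp.js src/js/app.js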
I had some issues with envsubst in Docker.
For some reason, envsubst doesn't work when I try to write the output back to the same file: the > redirection truncates file.conf before envsubst gets a chance to read it. For example, this is not working:
RUN envsubst < file.conf > file.conf
But when I tried to use a temp file, the issue disappeared:
RUN envsubst < file.conf > file.conf.temp && cp -f file.conf.temp file.conf
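As a side note (not from the original answer), if the image has the moreutils package installed, sponge gives the same in-place effect without a named temp file, because it soaks up all of its input before writing:
RUN envsubst < file.conf | sponge file.conf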