We are currently developing a Python package which is built via an Azure DevOps pipeline, and the resulting package is stored in Azure Artifacts.
In production we install that package on some Databricks clusters directly from Azure Artifacts. The benefit is that whenever a new version of the package is available, it gets installed when a cluster starts.
For development, I want to do something similar in a local Spark environment running in Docker containers. We have already set up Docker containers that work fine, except for one thing.
When I run my docker-compose command, I want to install the latest version of my package from Azure Artifacts.
Because our setup requires an access token to fetch this package, I can't put the token in a Git repo. Therefore I need a way to provide the token safely to a docker-compose command and install the package at startup.
Also, if the installation happens in the Dockerfile, I would have to rebuild the Docker images every time a new version of our package is released.
So, in my mind, these are the tasks the user should have to do (assuming the Docker images are already built):
Have a local file where a token is stored
Run my docker-compose command to start up a local environment (which, by the way, consists of a Spark master, workers, and a Jupyter notebook)
Automatically: read the token from the local file, pass it to a startup script in the Docker container, and install the package from Azure Artifacts (a rough sketch of what I imagine is below).
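To make that concrete, here is a rough sketch of the pieces I have in mind; all file, service, and variable names are just placeholders, and I don't know yet how to wire this up properly:

# .env (local only, never committed)
AZURE_ARTIFACTS_TOKEN=<token>

# docker-compose.yml (excerpt)
services:
  jupyter:
    image: my-jupyter-spark-image
    env_file: .env
    entrypoint: ["/usr/local/bin/entrypoint.sh"]
    command: ["start-notebook.sh"]   # whatever the image normally runs

# entrypoint.sh inside the image
#!/bin/bash
set -e
# install the latest package version at startup, using the token from the env file
pip install --upgrade <my_package_name> --extra-index-url "https://<user>:${AZURE_ARTIFACTS_TOKEN}@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/"
# then hand over to the container's normal command
exec "$@"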
As I am no real Docker expert, I found some topics about ENTRYPOINT and CMD, but I didn't understand them or what exactly I should do.
Does anyone have a hint on which way we could go to easily implement the logic above?
PS: For testing, I tried to install the package via a command: entry in docker-compose with a plaintext token; the installation worked, but the Jupyter notebook was no longer accessible :-(
Hopefully somebody has an idea or a better approach for what I am aiming to do.
Best Regards
You can use build-args:
docker-compose build --build-arg ARTIFACTORY_USERNAME=<your_username> --build-arg ARTIFACTORY_PASSWORD=<your_password> <service_to_build>
then your Dockerfile might look like:
FROM <my_base_image>
ARG ARTIFACTORY_USERNAME
ARG ARTIFACTORY_PASSWORD
RUN pip install <your_package_name> --extra-index-url https://$ARTIFACTORY_USERNAME:$ARTIFACTORY_PASSWORD@pkgs.dev.azure.com/<org>/_packaging/<your_feed_name>/pypi/simple/
...
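If you drive the build through docker-compose.yml instead of the command line, the same arguments can be declared under build.args and picked up from your shell environment or a local .env file (the service name below is only a placeholder):

# docker-compose.yml (excerpt)
services:
  spark-notebook:
    build:
      context: .
      args:
        ARTIFACTORY_USERNAME: ${ARTIFACTORY_USERNAME}
        ARTIFACTORY_PASSWORD: ${ARTIFACTORY_PASSWORD}

A plain docker-compose build (or docker-compose up --build) then substitutes the values without them ever being stored in the repo.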
I am trying to create an image for OpenShift v4 using the Red Hat universal base image (registry.access.redhat.com/ubi8/ubi). Unfortunately, this image comes with some limitations, at least for me, e.g. missing wget, and on top of that I have a corporate proxy messing with the SSL certificates, so I am creating builds from a Dockerfile and running them directly in OpenShift.
So far my Dockerfile looks like:
FROM registry.access.redhat.com/ubi8/ubi
RUN \
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-aarch64/pgdg-redhat-repo-latest.noarch.rpm && \
dnf install -y postgresql13-server
CMD [ "systemctl start postgresql-13" ]
This ends up with "Error: GPG check FAILED". I need some help creating a proper Dockerfile using an image from Red Hat and the PostgreSQL RPM package. Any other ideas are welcome.
Thanks in advance!
"Error: GPG check FAILED" is telling you that your system does not trust that repo. You need to import its key, e.g. rpm --import https://download.postgresql.org/pub/repos/yum/RPM-GPG-KEY-PGDG-AARCH64, or whichever key is right for your version and architecture.
You don't want to start a Postgres server with systemd; that's against the container philosophy of running a single process inside a container. Also, you can't have systemd as a proper PID 1 inside OpenShift without messing with SCCs, since the main idea of OpenShift's restrictions is to run unprivileged containers, so getting systemd to work might be impossible in your environment.
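Putting both points together, the install part of the Dockerfile could look roughly like this (reusing the repository and package from the question, with no systemd CMD; adjust the key URL to your PostgreSQL version and architecture):

FROM registry.access.redhat.com/ubi8/ubi
RUN rpm --import https://download.postgresql.org/pub/repos/yum/RPM-GPG-KEY-PGDG-AARCH64 && \
    dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-aarch64/pgdg-redhat-repo-latest.noarch.rpm && \
    dnf install -y postgresql13-server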
Look at existing Postgres Dockerfiles to gain inspiration, e.g. the very popular Bitnami Postgres image. Notice that there is an entrypoint.sh which checks whether the database is already initialized and creates it if it's not. Only then does it actually launch postgres "-D" "$POSTGRESQL_DATA_DIR" "--config-file=$POSTGRESQL_CONF_FILE" "--external_pid_file=$POSTGRESQL_PID_FILE" "--hba_file=$POSTGRESQL_PGHBA_FILE".
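The core of such an entrypoint is just "initialize the data directory on first start, then exec the server". A stripped-down sketch, borrowing the variable names from the Bitnami-style command above (note that with the PGDG packages the binaries typically live under /usr/pgsql-13/bin), could look like:

#!/bin/sh
set -e
# initialize the database only if the data directory has not been initialized yet
if [ ! -f "$POSTGRESQL_DATA_DIR/PG_VERSION" ]; then
    initdb -D "$POSTGRESQL_DATA_DIR"
fi
# exec so that postgres replaces the shell and becomes the container's main process
exec postgres -D "$POSTGRESQL_DATA_DIR" --config-file="$POSTGRESQL_CONF_FILE"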
Unless you really need Postgres 13 built on the RHEL 8 UBI, I suggest you look at the official Red Hat Docker images; here is the link if you want to build them yourself: https://github.com/sclorg/postgresql-container. As you can see, building a proper PostgreSQL image is quite a task, and without working through all the quirks and knowing everything beforehand, you may end up with an improperly configured or corrupted database.
You may also have Postgres Helm charts, templates, or even operators configured in your cluster, where deploying a database can be as easy as a couple of clicks.
TL;DR: Do not reinvent the wheel and do not create custom database images unless you have to. And if you have to, draw inspiration from existing Dockerfiles from reputable vendors.
I'm in the process of learning about Docker by trying to containerize a Strapi CMS. The default Docker image (https://github.com/strapi/strapi-docker) works well enough as a starting point, but I'm trying to add a couple of packages to the Strapi instance for my needs (adding Azure storage account support using https://www.npmjs.com/package/strapi-provider-upload-azure-storage). As I'm new to Docker, I'm having a hard time figuring out how to make the container install that package as part of the Docker run process.
I see that the strapi/base image Dockerfile contains this line referencing a package.json file:
COPY ./package.json ./
I'm assuming that's where I would add a reference to the packages I'm wanting to install so that later on they are installed by npm, but I'm not sure where that package.json file is located, let alone how to modify it.
Any help on figuring out how to install that package during the Docker run process is greatly appreciated!
I figured out that strapi-docker uses a script to build images (bin/build.js) and not just the Dockerfiles in the repo. I also discovered that docker-entrypoint.sh is where the dependency installation happens, so I added a couple of npm install statements after the check for the node_modules directory. Doing this allowed me to successfully add the desired packages to my Docker container.
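Roughly, the change looks like this (the surrounding check is paraphrased from the script rather than copied verbatim):

# docker-entrypoint.sh (excerpt)
if [ ! -d node_modules ] || [ -z "$(ls -A node_modules)" ]; then
  npm install
fi
# added: extra packages needed by this project
npm install strapi-provider-upload-azure-storage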
I followed some of the advice from the Docker team here:
https://www.docker.com/blog/keep-nodejs-rockin-in-docker/
After performing the initial setup with docker-compose and the strapi/strapi image, I was able to install additional dependencies directly inside the container using docker-compose run <service name> yarn add <package>.
I opted for this route since I was having trouble installing the sharp library, which has different dependencies/binaries for Linux and macOS. This approach worked well for me, but the downside is that you can't mount your node_modules folder as a volume, and it may take a little longer to install packages in the container.
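For example, with the package from the question (the service name depends on what your docker-compose.yml calls the Strapi container):

docker-compose run strapi yarn add strapi-provider-upload-azure-storage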
What are the best practices for getting code into a Docker container?
Here are some possible approaches:
An ADD call in the Dockerfile
A git clone or wget from the repo inside the container
Mounting an external volume
Usually, we use a mount for a dev/local environment with Docker so that code changes are applied instantly.
You can use RUN git clone, but you will need to install Git in the image and have access to the repository.
The easiest and most commonly used/recommended way is to put the Dockerfile inside your repo and use the ADD call; but prefer the COPY instruction, because it's more explicit.
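A minimal example of that layout, assuming a Python app with the Dockerfile sitting next to the code (image, file, and command names are illustrative):

FROM python:3.11-slim
WORKDIR /app
# install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install -r requirements.txt
# copy the application code from the repo into the image
COPY . .
CMD ["python", "main.py"]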
You should always COPY your application code in the Dockerfile. You generally shouldn't mount something over that, and you almost definitely shouldn't run git in your Dockerfile.
Running Git has a number of obvious issues. You can only build whatever specific commit the Dockerfile mentions; in real development it's not unusual to want to build a branch or a past commit. You need to install both git itself and relevant credentials in the image, with corresponding space and security issues. Docker layer caching means that docker build by default won't repeat a git clone step if it thinks it's already done that, which means you need to go out of your way to get a newer build.
It's very common in SO questions to mount code into a container. This very much defeats the point of Docker, IMHO. If you go to run your application in production you want to just distribute the image and not also separately distribute its code, but if you've been developing in an environment where you always hide the code built into the image behind a mount, you've never actually run the image itself. There are also persistent performance and consistency problems with mounts in various environments. I don't really recommend docker run -v or Docker Compose volumes: as a way of getting code into a container.
I've seen enough SO questions that have a Dockerfile that's essentially just FROM python, for example, and then use Docker Compose volumes: and command: options to get a live-development environment on top of that. It's much easier to just install Python, and have the tools you need right in front of you without needing to go through a complex indirection sequence to run them in Docker.
I have an application which can be installed with Ansible. Now I want to create a Docker image that includes the installed application.
My idea is to bring up a Docker container from some base image, then run the installation from an external machine against that container, and finally create an image from the container.
I am just starting with Docker. Could you please advise whether this is a good idea, and how I can do it?
This isn’t the standard way to create a Docker image and it isn’t what I’d do, but it will work. Consider looking at a tool like Hashicorp’s Packer that can automate this sequence.
Ignoring the specific details of the tools, the important thing about the docker build sequence is that you have some file checked into source control that an automated process can use to build a Docker image. An Ansible playbook coupled with a Packer JSON template would meet this same basic requirement.
The important thing here though is that there are some key differences between the Docker runtime environment and a bare-metal system or VM that you’d typically configure with Ansible: it’s unlikely you’ll be able to use your existing playbook unmodified. For example, if your playbook tries to configure system daemons, install a systemd unit file, add ssh users, or other standard system administrative tasks, these generally aren’t relevant or useful in Docker.
I’d suggest making at least one attempt to package your application using a standard Dockerfile to actually understand the ecosystem. Don’t expect to be able to use an Ansible playbook unmodified in a Docker environment; but if your organization has a lot of Ansible experience and you can easily separate “install the application” from “configure the server”, the path you’re suggesting is technically fine.
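For illustration, a minimal Packer template combining the docker builder with the ansible provisioner might look roughly like this (the base image, repository name, and playbook path are placeholders, and as noted above the playbook itself usually needs container-specific adjustments):

{
  "builders": [
    { "type": "docker", "image": "centos:7", "commit": true }
  ],
  "provisioners": [
    { "type": "ansible", "playbook_file": "./playbook.yml" }
  ],
  "post-processors": [
    { "type": "docker-tag", "repository": "myorg/myapp", "tag": "latest" }
  ]
}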
You can use multi-stage builds in Docker, which might be a nice solution:
FROM ansible/centos7-ansible:stable as builder
COPY playbook.yaml .
RUN ansible-playbook playbook.yaml
# Include whatever image you need for your application
FROM alpine:latest
# Add required setup for your app
# Copy the files built in the ansible stage, i.e. your app
COPY --from=builder . .
CMD ["<command to run your app>"]
Hopefully the example is clear enough for you to create your Dockerfile
I'm working on a project that requires me to run Docker within Docker. Currently, I am just relying on the Docker client running inside the container and passing in an environment variable with the TCP address of the Docker daemon I want to communicate with.
The line in the Dockerfile that I use to install the client looks like this:
RUN curl -s https://get.docker.io/builds/Linux/x86_64/docker-latest -o /usr/local/bin/docker
However, the problem is that this will always download the latest Docker version. Ideally, the Docker instance running this container will always be on the latest version, but occasionally it may be a version behind (for example, I haven't yet upgraded from 1.2 to 1.3). What I really want is a way to dynamically get the version of the Docker instance that's building this Dockerfile and then pass that into the URL to download the appropriate version of Docker. Is this at all possible? The only thing I can think of is to have an ENV command at the top of the Dockerfile, which I need to set manually, but ideally I was hoping that it could be set dynamically based on the actual version of the Docker instance.
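For what it's worth, the manual fallback I mention would look roughly like this today using a build argument instead of ENV (assuming the download URL accepts a versioned filename):

ARG DOCKER_VERSION=1.3.0
RUN curl -s https://get.docker.io/builds/Linux/x86_64/docker-$DOCKER_VERSION -o /usr/local/bin/docker && \
    chmod +x /usr/local/bin/docker

built with something like docker build --build-arg DOCKER_VERSION=$(docker version --format '{{.Server.Version}}') ., so the host still has to supply its own version at build time.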
While your question makes sense from an engineering point of view, it is at odds with the intention of the Dockerfile. If the build process depended on the environment, it would not be reproducible elsewhere. There is not a convenient way to achieve what you ask.