Setting up and running docker images: basic questions [closed] - docker

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I'm a bit confused as to how to go ahead with docker.
I can build an image with the following Dockerfile:
FROM condaforge/mambaforge:4.10.1-0
# Use bash as shell
SHELL ["/bin/bash", "-c"]
# Set working directory
WORKDIR /work_dir
# Install vim
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "vim"]
# Start Bash shell by default
CMD /bin/bash
I build it with docker build --rm . -t some_docker, but then I'd like to enter the container and install things interactively, so that later on I can export the whole image with all the additional installations. So I start it interactively with docker run -it some_docker, after which I do my things. I would then like to export it.
So here are my specific questions:
Is there an easier way to build (and keep) the image available so that I can come back to it at another point? When I run docker ps -a I see so many images that I don't know what they do, since many of them don't have any tag.
After building I get the warning Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them. Is this a problem, and if so, how do I solve it?
How can I specify in my Dockerfile (or in docker build?) that ports for rstudio should be open? I saw that docker-compose allows you to specify ports: 8787:8787; how do I do that here?

With docker ps -a, what you're seeing is containers rather than images. To list images, use docker image ls instead. Whether you should delete images depends on what containers you're going to run in the future. Docker uses a layered architecture with a copy-on-write strategy, so, for example, if you later build another image FROM condaforge/mambaforge:4.10.1-0, Docker won't have to download and install that base again. Your example is fairly simple, but with more complicated apps it may take a lot of time to build images and run containers from scratch (the longest I have experienced is about 30 minutes). However, if storage is your concern, go ahead and delete images that you don't use very often. Read more
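For illustration, the housekeeping commands meant here (standard Docker CLI, nothing specific to the image in the question) look like this:
$ docker image ls          # list images (what you actually built)
$ docker ps -a             # list all containers, including stopped ones
$ docker container prune   # remove stopped containers
$ docker image prune       # remove dangling, untagged images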
Yes, it can be dealt with, but how depends on the details you get from docker scan. To see more details, you can run docker scan --file PATH_TO_DOCKERFILE DOCKER_IMAGE. Read more
A Dockerfile is for building images, and a docker-compose file is for orchestrating containers. That's why you cannot publish ports in a Dockerfile; letting an image publish host ports at build time would also create problems such as security issues and port conflicts. All you can do in the Dockerfile is expose container ports, then run docker run -d -P --name app_name app_image_name to publish all the ports exposed in the container.
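As a rough sketch (8787 is the rstudio port from the question, and some_docker is the image name used above), you would add an EXPOSE instruction to the Dockerfile:
EXPOSE 8787
and then publish it at run time, either explicitly or by publishing everything that is exposed:
$ docker run -d -p 8787:8787 some_docker
$ docker run -d -P some_docker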

Related

How to test a Dockerfile with minimal overhead

I'm trying to learn how to write a Dockerfile. Currently my strategy is as follows:
Guess what commands are correct to write based on the documentation.
Run sudo docker-compose up --build -d to build a docker container
Wait ~5 minutes for my anaconda packages to install
Find that I made a mistake on step 15, and go back to step 1.
Is there a way to interactively enter the commands for a Dockerfile, or to cache the first 14 successful steps so I don't need to rebuild the entire file? I saw something about docker exec but it seems that's only for running containers. I also want to try and use the same syntax as I use in the dockerfile (i.e. ENTRYPOINT and ENV) since I'm not sure what the bash equivalent is/if it exists.
You can run docker-compose without the --build flag; that way you don't have to rebuild the image every time, although since you are testing the Dockerfile itself I don't know how many options you have there. Docker should cache builds automatically, but only for the steps that haven't changed since the last build. There is no way to build an image interactively; Docker doesn't work like that. Lastly, docker exec is just for running commands inside a container that was created from the built image.
Some references for you: Docker build cache, Dockerfile best practices.
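To illustrate the caching point with a minimal, hypothetical Dockerfile (not the asker's): every instruction up to the first one that changed is reused from cache, so keeping the slow conda/apt installs above the lines you are still editing means a mistake near the end only rebuilds the final layers.
FROM continuumio/miniconda3
# slow step: cached after the first successful build
RUN conda install -y numpy pandas
# cheap steps that change often go last
COPY . /app
RUN echo "step still being debugged"
Re-running docker build -t test-image . after editing only the last RUN line rebuilds just that layer.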

Running DBT within Airflow through the Docker Operator

Building on my question How to run DBT in airflow without copying our repo, I am currently running Airflow and syncing the DAGs via git. I am considering different options to include DBT within my workflow. One suggestion by louis_guitton is to Dockerize the DBT project, and run it in Airflow via the Docker Operator.
I have no prior experience using the Docker Operator in Airflow or with DBT generally. I am wondering if anyone has tried this or can provide some insights about their experience incorporating that workflow; my main questions are:
Should DBT as a whole project be run as one Docker container, or is it broken down? (for example: are tests run as a separate container from dbt tasks?)
Are logs and the UI from DBT accessible and/or still useful when run via the Docker Operator?
How would partial pipelines be run? (example: wanting to run only a part of the pipeline)
Judging by your questions, you would benefit from trying to dockerise dbt on its own, independently from airflow. A lot of your questions would disappear. But here are my answers anyway.
Should DBT as a whole project be run as one Docker container, or is it broken down? (for example: are tests run as a separate container from dbt tasks?)
I suggest you build one docker image for the entire project. The docker image can be based on the python image since dbt is a python CLI tool. You then use the CMD arguments of the docker image to run any dbt command you would run outside docker.
Please remember the syntax of docker run (which has nothing to do with dbt): you can specify any COMMAND you want to run at invocation time
$ docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]
Also, the first hit on Google for "docker dbt" is this dockerfile that can get you started
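A minimal sketch of what such an image could look like (the python base image, the dbt-postgres adapter, and the project layout are assumptions here, not something from the thread):
FROM python:3.9-slim
# install dbt with the adapter for your warehouse (dbt-postgres is just an example)
RUN pip install --no-cache-dir dbt-postgres
WORKDIR /dbt
# copy the dbt project; how you provide profiles.yml is up to you
COPY . /dbt
# default command, overridden by whatever COMMAND you pass to docker run
CMD ["dbt", "run"]
With an image like this, the COMMAND you pass at invocation time is simply the dbt command you would run locally, as in the run example at the end of this answer.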
Are logs and the UI from DBT accessible and/or still useful when run via the Docker Operator?
Again, it's not a dbt question but rather a docker question or an airflow question.
Can you see the logs in the airflow UI when using a DockerOperator? Yes, see this how-to blog post with screenshots.
Can you access logs from a docker container? Yes, Docker containers emit logs to stdout and stderr output streams (which you can see in airflow, since airflow picks this up). But logs are also stored in JSON files on the host machine in a folder /var/lib/docker/containers/. If you have any advanced needs, you can pick up those logs with a tool (or a simple BashOperator or PythonOperator) and do what you need with it.
How would partial pipelines be run? (example: wanting to run only a part of the pipeline)
See answer 1; you would run your docker dbt image with the command
$ docker run my-dbt-image dbt run -m stg_customers

How do I build docker images without docker?

Is there some lightweight way I can build a docker image within a container without having a working docker machine? Here's what I'm trying to do:
$ docker run -it --rm docker:latest
/ # mkdir test
/ # touch test/Dockerfile
/ # docker build test
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Assuming I had a valid Dockerfile in place, is there some way I could create a docker image from within a container like this?
Part of the problem could be that you're missing the --privileged flag, but in general, your questions can probably be answered here: https://hub.docker.com/_/docker/
And you might take the time to read the blog linked there detailing some of the pitfalls of using docker-in-docker.
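One common alternative (not from the linked page's wording, just a widely used pattern) is to mount the host's Docker socket into the container, so the docker CLI inside talks to the host's daemon rather than needing one of its own:
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker:latest
/ # docker build test    # now reaches the host's Docker daemon
The --privileged route mentioned above instead runs a full daemon inside the container (the docker:dind image); the blog post linked from that page discusses the trade-offs of that approach.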

Updating a container created from a custom dockerfile

Before anything, I have read this question and the related links in it, but I am still confused about how to resolve this on my setup.
I wrote my own docker file to install Archiva, which is very similar to this file. I created an image from the docker file using docker build -t archiva . and have a container which I run using docker run archiva. As seen in the docker file, the user data that I want to preserve is in a volume.
Now I want to upgrade to Archiva 2.2.0. How can I update my container so that the user data that's in the volume is preserved? If I change the docker file by just changing the version number and run the docker build again, it will just create another container.
Best practice
The --volume option of docker run enables sharing files between host and container(s), and in particular preserves consistent [user] data.
The problem is ..
.. it appears that you are not using --volume and that the user data are in the image (and that's a bad practice, because it leads to the situation you are in: unable to upgrade a service easily).
One solution (the best IMO) is
Back-up the user data
Use the command docker cp: "Copy files/folders between a container and the local filesystem."
docker cp [--help] CONTAINER:SRC_PATH DEST_PATH
Upgrade your Dockerfile
By editing your Dockerfile and changing the version.
Use the --volume option
Use docker run -v /host/path/user-data:/container/path/user-data archiva
And you're good!
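Put together, the upgrade flow could look roughly like this (the container name my-archiva and the /archiva-data path are hypothetical placeholders, not paths from the linked Dockerfile):
$ docker cp my-archiva:/archiva-data ./archiva-backup
$ docker build -t archiva .
$ docker run -v "$PWD"/archiva-backup:/archiva-data archiva
First back up the user data out of the old container, then rebuild the image from the edited Dockerfile, then run the new version with the preserved data mounted as a volume.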

is it possible to wrap an entire ubuntu 14 os in a docker image

I have a Ubuntu 14 desktop, on which I do some of my development work.
This work mainly revolves around Django & Flask development using PyCharm
I was wondering if it was possible to wrap the entire OS file system in a Docker container, so my whole development environment, including PyCharm and any other tools, would become portable.
Yes, this is where Docker shines. Once you install Docker you can run:
docker run --name my-dev -it ubuntu:14.04 /bin/bash
and this will put you, as root, at a bash prompt inside a Docker container. It is for all intents and purposes the entire OS without anything extra; you will need to install the extras, like PyCharm, Flask, Django, etc., to build up your entire environment. The environment you start with has nothing, so you will have to add things like pip (apt-get install -y python-pip) and other goodies. Once you have your entire environment, you can exit (with exit, or ^D) and you will be back in your host operating system. Then you can commit:
docker commit -m 'this is my development image' my-dev my-dev
This takes the container you just ran (and modified) and saves it on your machine as an image tagged my-dev. Any time in the future you can run it again using the invocation:
docker run -it my-dev /bin/bash
Building a Docker image interactively like this is the hard way; it is easier once you learn how to write a Dockerfile that describes the base image (ubuntu:14.04) and all of the modifications you want to make to it. I have an example of a Dockerfile here:
https://github.com/tacodata/pythondev
This builds my python development environment, including git, ssh keys, compilers, etc. It does have my name hardcoded in it, so it won't help you much for your own development (I need to fix that). Anyway, you can download the Dockerfile, put your own details in it, and create your own image like this:
docker build -t my-dev - < Dockerfile
There are hundreds of examples on the Docker hub which is where I started with mine.
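A minimal sketch of what such a Dockerfile might contain, just to show the shape (the package list below is an assumption about a typical Flask/Django setup, not the contents of the linked repo):
# start from the same base the interactive session used
FROM ubuntu:14.04
# recreate the interactive installs as build steps
RUN apt-get update && apt-get install -y python-pip git vim
RUN pip install flask django
WORKDIR /home/dev
CMD ["/bin/bash"]
Build it with docker build -t my-dev . and run it with docker run -it my-dev, exactly as before.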
