Is there any way to know what is in a public image other than downloading it and checking it out manually?
E.g. on Docker Hub I can see various Java images and various Ansible images. I would have to download quite a lot of them to determine which one to use and whether any of them has both.
The Dockerfile lists some info, but there is often inheritance, so you can't see all of it.
Is there anything that lists all the contained packages or an online service that lets you try them out without downloading the whole image?
MicroBadger lists the docker history of a Docker image and shows matching base images (with their layers as well). E.g. https://microbadger.com/images/ansible/ansible
Related
I'm new to using Docker and wanted to understand how to add large folders (combined ~1GB) kept elsewhere (such as in SharePoint) to the Docker container using Dockerfile. What is the best way to add the files and can someone explain the commands to be used? For example, one method I have come across is the following:
ADD http://example.com/big.tar.xz /usr/src/things/
Does the /usr/src/things/ specify the location where I want to save the folders (not individual files) with respect to my original repository?
This example is from "Adding large files to docker during build", which covers the question at a high level. Can someone share details/commands for each step involved? One answer mentions not adding the files to the image but mounting them as a volume. Is that a better option than using ADD in the Dockerfile?
Thanks!
I am facing an issue when I try to create a custom image containing TensorFlow, but I do not see the problem when I use the official images. So I am trying to find out which Dockerfile from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/dockerfiles/dockerfiles was used to generate the Docker Hub image. Could you help me, please?
You can always run docker image inspect on a locally available image to get further information about its layers and configuration; docker history shows the commands that created each layer.
On Docker Hub, if you open the Tags tab and click on any image, it shows that information as an annotated view of the image's layers, next to the Dockerfile.
I currently have an application that (for the sake of simplicity) just requires a .csv file. However, this file needs to be constructed with a script, let's call it create_db.py.
I have two images (let's call them API1 and API2) that require the .csv file, so I declared a 2-stage build and both images copy the .csv into their filesystem. This makes the Dockerfiles somewhat ugly, as API1 and API2 share the same first lines of the Dockerfile. There is also no guarantee that both images have the same .csv, because it is constructed "on the fly".
I have two possible solutions to this problem:
First option:
Make a separate docker image that executes create_db.py and then tag it as data:latest. Copy the .csv in API1 and API2 doing
FROM data:latest as datapipeline
FROM continuumio/miniconda3:4.7.12
...
...
COPY --from=datapipeline file.csv .
Then I will need to create a bash file to make sure data:latest is built (and up to date) before building API1 and API2.
Pros: Data can be pulled from a registry if you are on a different machine, so there is no need to rebuild it.
Cons: Every time I build API1 and API2 I need to make sure that data:latest is up to date. API1 and API2 require data:latest to be used.
Second option:
Create a volume data/ and an image that runs create_db.py and mount the volume so the .csv is in data/. Then mount the volume for API1 and API2. I will also need some kind of mechanism that makes sure that data/ contains the required file.
Mounting volumes sounds like the right choice when dealing with shared data, but in this case, I am not sure because my data needs "to be built" before being able to be used. Should I go with the first option then?
Chosen solution, thanks to @David Maze
What I ended up doing is separating the data pipeline in its own Docker image and then COPY from that image in API1 and API2.
To make sure that API1 and API2 always have the latest "data image" version, the data pipeline calculates a hash of all output files and then tries to do docker pull data:<HASH>. If the pull fails, this version of the data is not in the registry yet, so the data image is tagged as both data:<HASH> and data:latest and pushed to the registry. This guarantees that data:latest always points to the last data pushed to the registry, and at the same time I can keep track of all the data:<HASH> versions.
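The hashing step described above could be sketched roughly as follows. This is only an illustration: the function names dataset_tag and image_refs, the 12-character tag length, and the choice of SHA-256 are assumptions, not details taken from the original pipeline.

```python
import hashlib
from pathlib import Path


def dataset_tag(output_dir, length=12):
    """Return a short, deterministic tag derived from every file under output_dir."""
    digest = hashlib.sha256()
    root = Path(output_dir)
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Hash the relative name too, so renaming a file changes the tag.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()[:length]


def image_refs(output_dir, repository="data"):
    """Build the two references the pipeline would tag and push (hypothetical helper)."""
    return [f"{repository}:{dataset_tag(output_dir)}", f"{repository}:latest"]
```

The surrounding script would then try docker pull on the first reference and only tag and push both references when that pull fails.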
If it’s manageable size-wise, I’d prefer baking it into the image. There’s two big reasons for this: it makes it possible to just docker run the image without any external host dependencies, and it works much better in cluster environments (Docker Swarm, Kubernetes) where sharing files can be problematic.
There are two more changes you can make to improve your proposed Dockerfile. You can pass the specific version of the dataset you’re using as an ARG, which helps when you need to build two copies of the image and need them to have the same dataset. You can also directly COPY --from= an image, without needing to declare it as a stage. As far as I know, though, the --from value is not variable-expanded, so a version passed as an ARG has to go through a FROM line:
ARG data_version=latest
FROM data:${data_version} AS datapipeline
FROM continuumio/miniconda3:4.7.12
COPY --from=datapipeline file.csv .
I’d consider the volume approach only if the data file is really big (gigabytes). Docker images start to get unwieldy at that size, so if you have a well-defined auxiliary data set you can break out, that will help things run better. Another workable approach could be to store the datafile somewhere remote like an AWS S3 bucket, and download it at startup time (adds some risk of startup-time failure and increases the startup time, but leaves the image able to start autonomously).
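A minimal sketch of the download-at-startup idea, assuming the data file is reachable through a plain URL (for example a pre-signed S3 link). The function name ensure_data_file and its arguments are made up for illustration; a real setup might use an S3 client library instead.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def ensure_data_file(url, dest):
    """Download url to dest unless dest already exists; return the destination path."""
    target = Path(dest)
    if target.exists():
        # Already present (e.g. from an earlier start), nothing to do.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file first so a failed download never
    # leaves a half-written data file behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent))
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_name, str(target))
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

The container's entrypoint would call this before starting the application, which is where the extra startup time and failure risk mentioned above come from.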
I want to use Yocto to build a customized image for an embedded system, and I want to create a Docker image from this custom image. Usually in Docker one would build an image using a parent image, e.g. FROM ubuntu:xenial. However, in this case there is no official image available, so I need to create a new base image. I looked up the docs for creating a base image, but they don't explain the whole process. I would appreciate it if anyone could give me a hint or a link to a tutorial or something.
Thank you!
Yes, that is possible. A Dockerfile would look like this:
FROM scratch
ADD app-container-image-python3-data-collector-container-x86-64.tar.bz2 /
But I would recommend having a look at meta-virtualization and oci-images, which you can generate directly and e.g. upload to docker.io.
Since v2.4.0 a garbage collector command is included within the registry binary. I read about how it works in the official documentation.
To use the garbage-collection:
bin/registry garbage-collect [--dry-run] /path/to/config.yml
I see the config in /etc/docker/registry/config.yml
When I just perform a dry run, I see a lot of blobs marked, and at the end the blobs which would have been deleted without --dry-run.
But I don't see how I can easily link these blobs to images.
Which images will be deleted? Am I able to tell it which image should be deleted, or do I need to use another command first and run the GC after that?
Can someone maybe provide an example in which case an image/blob will be deleted? Thanks
From your referenced documentation:
In the context of the Docker registry, garbage collection is the process of removing blobs from the filesystem which are no longer referenced by a manifest. Blobs can include both layers and manifests.
Manifests are groups of blobs (layers) used to represent an image tag. The only blobs deleted are those no longer referenced by any image. So to answer your question: if GC is working correctly, no one should be able to give an example of it deleting an image, but any useful GC run will delete unreferenced blobs, including ones in your own registry.
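To make the "referenced by a manifest" rule concrete, here is a simplified mark-and-sweep model. It is not the registry's actual code: the function garbage_collect, the plain dict of manifests, and the short digest strings are all assumptions chosen for illustration.

```python
def garbage_collect(manifests, all_blobs, dry_run=True):
    """Simplified model of registry garbage collection.

    manifests: dict mapping a manifest digest to the layer digests it references.
    all_blobs: set of every blob digest currently stored on disk.
    Returns (marked, eligible): blobs still referenced, and blobs safe to delete.
    """
    marked = set()
    # Mark phase: every known manifest keeps itself and its layers alive.
    for manifest_digest, layers in manifests.items():
        marked.add(manifest_digest)
        marked.update(layers)
    # Sweep phase: anything on disk that was not marked is unreferenced.
    eligible = set(all_blobs) - marked
    if not dry_run:
        all_blobs.difference_update(eligible)
    return marked, eligible


# Two images share layer "a". Nothing is eligible while both manifests exist.
manifests = {"m1": ["a", "b"], "m2": ["a", "c"]}
blobs = {"m1", "m2", "a", "b", "c"}
print(garbage_collect(manifests, blobs)[1])

# After the second manifest is deleted (a separate step from GC),
# its manifest blob and its unshared layer "c" become eligible.
del manifests["m2"]
print(garbage_collect(manifests, blobs)[1])
```

In this model, as in the quoted documentation, an image only disappears because its manifest was deleted beforehand; the collector then removes whatever blobs are left unreferenced, while the shared layer "a" survives.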