Nextflow Does Not Pull "latest" Docker Image - docker

I am running two VMs. One runs Nextflow; the other hosts a Jenkins build server. Jenkins builds new Docker images and pushes them to our private Google Container Registry.
My nextflow.config file looks something like this:
process {
    withLabel: awesome_image {
        container = "eu.gcr.io/best-project-1234/coolest_os:latest"
    }
}
After building a new image on the Jenkins server, I ran a new Nextflow script and noticed that Nextflow was still using the old image. After some research (https://stackoverflow.com/a/58539792/1820480), I realized that this happens because I am using the latest tag: since an image tagged latest already exists on the Nextflow VM, Nextflow uses that one and does not check the registry.
Question: How can I ensure that before every run of nextflow, it checks the registry for newer images? Or, is there a script/program that I can run on the VM that checks the registry (instead of nextflow)?
Thank you.

Nextflow just runs your command(s) in a container using docker run. If you specify an image that you haven't pulled yet, docker run will first do a docker pull to download/localize the image. To check the registry again for newer images, you'll just need to make sure you call docker pull (for each image) before running Nextflow. If you want to instead check the registry for newer images each time a process is spawned, please see below.
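The pre-pull step described above can be sketched as a small shell helper. This is only a sketch under stated assumptions: pull_images is a hypothetical function name (not a Nextflow or Docker command), and the list of images is something you maintain yourself.

```shell
# Pull every image given as an argument so that the local copy of each tag
# matches the registry before the workflow starts.
# pull_images is a hypothetical helper name, not a Nextflow or Docker command.
pull_images() {
    for image in "$@"; do
        # docker pull re-checks the registry even when the tag exists locally
        docker pull "$image" || return 1
    done
}
```

A run could then look like: pull_images eu.gcr.io/best-project-1234/coolest_os:latest && nextflow run main.nf (main.nf is an assumed script name).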
After some research, it looks like Docker CLI v20.10.0 and later has a flag to modify the pull behavior when running containers:
--pull string Pull image before running ("always"|"missing"|"never") (default "missing")
This is nice because it means it should now be possible to pass this through in your nextflow.config:
docker {
    enabled = true
    runOptions = '--pull=always'
}
But this adds the overhead of a docker pull for each process spawned and, depending on when new images are pushed to your registry, may mean that some processes get different containers during a single workflow execution. This may not be a concern if you only need the 'latest' containers and do not care about reproducibility.

Related

Dockerfile FROM command - Does it always download from Docker Hub?

I just started working with docker this week and came across a 'dockerfile'. I was reading up on what this file does, and the official documentation basically mentions that the FROM keyword is needed to build a "base image". These base images are pulled from Docker hub, or downloaded from there.
Silly question - Are base images always pulled from docker hub?
If so, and if I understand correctly, I am assuming that running the dockerfile to create an image is not done very often (only when an image needs to be created), and once the image is created, the image is what's run all the time?
So the dockerfile can then be moved to whichever environment and things can be set up all over again quickly?
Pardon the silly question; I am just trying to understand the overall flow and how the dockerfile fits into things.
If the local (on your host) Docker daemon (already) has a copy of the container image (i.e. it's been docker pull'd) specified by FROM in a Dockerfile then it's cached and won't be repulled.
Container images include a tag (be wary of ever using latest). The image name, e.g. foo, combined with the tag (which defaults to latest if not specified) is the full name of the image that's checked. For example, if you have foo:v0.0.1 locally and your Dockerfile says FROM foo:v0.0.1, the local copy is used, but FROM foo:v0.0.2 will pull foo:v0.0.2.
There's an implicit docker.io prefix, i.e. docker.io/foo:v0.0.1, that references the registry being used (Docker Hub by default).
You could repeatedly docker build container images on the machines where the container is run but this is inefficient and the more common mechanism is that, once a container image is built, it is pushed to a registry (e.g. DockerHub) and then pulled from there by whatever machines need it.
There are many container registries: DockerHub, Google Artifact Registry, Quay etc.
There are tools other than docker that can be used to interact with containers e.g. (Red Hat's) Podman.

How to pull new docker images and restart docker containers after building docker images on gitlab?

There is an ASP.NET Core API project with its sources in GitLab.
I created a GitLab CI/CD pipeline to build a Docker image and push it to the GitLab Docker registry
(thanks to https://medium.com/faun/building-a-docker-image-with-gitlab-ci-and-net-core-8f59681a86c4).
How do I update the Docker containers on my production system after the image has been pushed to the GitLab Docker registry?
*by update I mean:
docker-compose down && docker pull && docker-compose up
The best way to do this is to use an image puller. Many open-source ones are available, or you can write your own as a shell script. There is one here. We use Swarm, and we trigger this hook from our CI/CD pipeline: once the build stage is done, we send an HTTP request to the hook URL, and Docker pulls the updated image. One disadvantage is that you need a daemon to watch the hook task so that it doesn't crash or go down. So my suggestion is to run the hook task as a Docker container with its restart policy set to always.
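Whatever triggers the update (a webhook, a cron job, or a manual SSH session), the update step itself can be sketched as below. This is a minimal sketch, not the hook tool mentioned above: redeploy is a hypothetical function name, and it assumes the Compose v2 plugin (docker compose). With the legacy standalone binary, substitute docker-compose for docker compose.

```shell
# Refresh a Compose deployment: fetch newer images, then recreate only the
# containers whose image or configuration changed.
# redeploy is a hypothetical helper name; the compose file defaults to
# docker-compose.yml in the current directory (an assumption).
redeploy() {
    compose_file="${1:-docker-compose.yml}"
    docker compose -f "$compose_file" pull || return 1
    docker compose -f "$compose_file" up -d
}
```

Unlike a full down followed by up, this leaves services whose image did not change running.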

Some questions on Docker basics?

I'm new to Docker. Most of the tutorials on Docker cover the same thing. I'm afraid I'm just ending up with piles of questions and no real answers. I've come here after my fair share of Googling; kindly help me out with these basic questions.
When we install Docker, where does it get installed? Is it on our local computer, or does it happen in the cloud?
Where do containers get pulled into? Is there a way I can see what is inside the container? (I'm using Ubuntu 18.04)
When we pull a Docker image or clone a repository from Git, where does this data get stored?
Looks like you are confused after reading too many documents. Let me try to put this in simple words. Hope this helps.
When we install Docker, where does it get installed? Is it on our local computer, or does it happen in the cloud?
Docker is installed on whichever machine you choose: an on-prem VM, a cloud VM, or your laptop.
Where do containers get pulled into? Is there a way I can see what is inside the container? (I'm using Ubuntu 18.04)
This question comes down to terminology. We don't pull a container. We pull an image and run a container from it.
Quick terminology summary
Container -> A running instance of an image. The image is the template that packages an application's code, configuration, and dependencies.
Dockerfile -> A text file where you list the instructions (the blueprint) used to build an image.
Image -> Built from a Dockerfile. You use an image to create and run containers.
Yes, you can get a shell inside a running container. Use the command below:
docker exec -it <container-id> /bin/bash
When we pull a Docker image or clone a repository from Git, where does this data get stored?
You can pull open-source images from Docker Hub.
When you clone a Git project that is dockerized, you can look for the Dockerfile in that project and create your own image by building it:
docker build -t <yourimagename:tag> .
When you build or pull an image, it is stored on your local machine.
Use the docker images command to list the stored images.
Refer to the cheat sheet below for more commands to try with Docker.
The docker daemon gets installed on your local machine and everything you do with the docker cli gets executed on your local machine and containers.
(not sure about the 1st part of your question). You can easily access your docker containers by docker exec -it <container name> /bin/bash for that you will need to have the container running. Check running containers with docker ps
(again I do not entirely understand your question) The images that you pull get stored on your local machine as well. You can see all the images present on your machine with docker images
Let me know if this was helpful and if you need any further information.
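The commands mentioned in these answers can be gathered into one small helper that shows where the local daemon stores its data and what it currently holds. This is a sketch only: show_local_docker_state is a hypothetical function name, and the directory it prints depends on how your daemon is configured.

```shell
# Print where the local Docker daemon keeps its data, then list the images
# and running containers it knows about.
# show_local_docker_state is a hypothetical helper name.
show_local_docker_state() {
    # DockerRootDir is the daemon's data directory (commonly /var/lib/docker on Linux)
    docker info --format '{{.DockerRootDir}}' || return 1
    docker images || return 1
    docker ps
}
```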

Syncing docker images

I have 2 machines (separate hosts) running Docker, and I am using the same image on both. How do I keep the two images in sync? For example, suppose I make changes to the image on one host and want them reflected on the other host as well. I can commit the image and copy it over to the other host. Is there a more efficient way of doing this?
Some ways I can think of:
1. with a Docker registry
the workflow here is:
HOST A: docker commit, docker push
HOST B: docker pull
2. by saving the image to a .tar file
the workflow here is:
HOST A: docker save
HOST B: docker load
3. with a Dockerfile and by building the image again
the workflow here is:
provide a Dockerfile together with your code / files required
every time your code has changed and you want to make a release, use docker build to create a new image.
on the hosts that should take the update, get the updated source code (for example with a version control tool like Git) and then docker build the image
4. CI/CD pipeline
you can see a video here: docker.com/use-cases/cicd
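Option 2 above (docker save on host A, docker load on host B) can be done in one step by streaming the archive over SSH instead of writing a .tar file. This is a minimal sketch: copy_image is a hypothetical function name, and it assumes SSH access to the target host and a Docker daemon running there.

```shell
# Stream an image from this host to another one without a registry.
# copy_image is a hypothetical helper name; it assumes SSH access to the
# target host and a Docker daemon running there.
copy_image() {
    image="$1"
    target_host="$2"
    # docker save writes a tar archive to stdout; docker load reads one from stdin
    docker save "$image" | ssh "$target_host" docker load
}
```

For example: copy_image myimage:1.0 user@host-b (both names are placeholders).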
Keep in mind that containers are considered to be ephemeral. This means that updating an image inside another host will then require:
to stop and remove any old container (running with the outdated image)
to run a new one (with the updated image)
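The stop, remove, and run steps above can be sketched as a single helper to run on the host that takes the update. This is only a sketch: replace_container is a hypothetical function name, and the docker run line needs whatever ports, volumes, or environment options your container actually uses.

```shell
# Replace a running container with one started from the freshly pulled image.
# replace_container is a hypothetical helper name; add any ports, volumes
# or environment options your container needs to the docker run line.
replace_container() {
    name="$1"
    image="$2"
    docker pull "$image" || return 1
    # docker rm -f stops (if running) and removes the old container;
    # errors are ignored in case no container with that name exists yet
    docker rm -f "$name" >/dev/null 2>&1 || true
    docker run -d --name "$name" "$image"
}
```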
I quote from: Best practices for writing Dockerfiles
General guidelines and recommendations
Containers should be ephemeral
The container produced by the image your Dockerfile defines should be as ephemeral as possible. By “ephemeral,” we mean that it can be stopped and destroyed and a new one built and put in place with an absolute minimum of set-up and configuration.
You can run docker push to upload your image to a Docker registry and docker pull to get the latest image on another host.
For more information please look at this

Docker - is it necessary to push images to remote server?

I have successfully built some Docker images:
Now I would like to start my microservices with docker-compose. Unfortunately, I am unable to pull those images, i.e. repository callista/discovery-server not found: does not exist or no pull access. I solved this error by logging into my Docker Hub account and pushing those images to the remote server. But it seems like overkill to send such large images (which are likely to change pretty soon) over the Internet twice (push and pull), again and again.
Is it possible to configure Docker to install those images locally and not to pull from remote server?
I use Docker 1.8 and work on Windows 10.
Do you need to run these images on a server different from the one where you build them?
If so, you have some alternatives:
As #engineer-dollery said, you can run a registry inside your network; then you would not need to send the images over the Internet, only across your network. Docs: https://docs.docker.com/registry/deploying/
You could also use docker save and docker load to move them around. Docs: https://docs.docker.com/engine/reference/commandline/save/
But if the server where you run the images is the same one where you build them...
...then you could just add the image key to your docker-compose services and run docker-compose build, as #lauri said. With image set, docker-compose gives the built image that name, and you can then docker run it. Or run docker-compose up --build so that the images are rebuilt whenever something changes in the Dockerfile.
If you define the build option in docker-compose.yml, Docker Compose can build the images locally and then use them without pulling. By default, Docker Compose builds images if they are not found locally. If you want to rebuild the images, add the --build option to the up command: docker-compose up --build
Docker Compose build reference:
https://docs.docker.com/compose/compose-file/#build
