Using docker for only some processes in Nextflow

I am writing a pipeline in Nextflow which contains multiple processes, most of which use Docker. Now I am trying to add a new process that runs only a Python script to preprocess some results, so no Docker image is needed.
However, I get the error: Missing container image for process 'my_python_process'.
I define the docker images in nextflow.config as follows:
process {
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}

docker {
    enabled = true
}
I found a discussion where they suggested using container = null for the process without a container, but it still gives the same error, no matter what the process script contains.
Does anyone know what I'm missing please? Thank you!

With docker.enabled = true, Nextflow will try to run each process in a Docker container created from the specified image. You then get the error you're seeing when the container directive has not been specified for a particular process. The usual way is to just specify a 'base' or 'default' container for your workflow. You may want to choose one that comes with Python; otherwise, Ubuntu would be a good choice in my opinion.
Note that the withName process selector has the highest priority.
process {
    container = 'ubuntu:22.04'

    withName: my_python_process {
        container = 'python:3.9'
    }
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}

docker {
    enabled = true
}
I'm not aware of a way to disable Docker execution for a particular process, but nor would you really want to. The above approach should be preferred:
Containerization allows you to write self-contained and truly reproducible computational pipelines, by packaging the binary dependencies of a script into a standard and portable format that can be executed on any platform that supports a container runtime. Furthermore, the same pipeline can be transparently executed with any of the supported container runtimes, depending on which runtimes are available in the target compute environment.

Related

Adding docker run flags to ECS operator in airflow

I'm using ECSOperator in Airflow and I need to pass flags to the docker run. I searched the internet but I couldn't find a way to give an ECSOperator flags such as -D, --cpus, and more.
Is there a way to pass these flags to a docker run (if a certain condition is true) using the ECSOperator (the same way we can pass tags and network configuration), or can they only be defined in the ECS container running the docker image?
I'm not familiar with ECSOperator, but if I understand correctly it is a Python library, and you can create a new task using Python.
As I can see in this example, it is possible to set task_definition and overrides:
...
ecs_operator_task = ECSOperator(
    task_id="ecs_operator_task",
    dag=dag,
    cluster=CLUSTER_NAME,
    task_definition=service['services'][0]['taskDefinition'],
    launch_type=LAUNCH_TYPE,
    overrides={
        "containerOverrides": [
            {
                "name": CONTAINER_NAME,
                "command": ["ls", "-l", "/"],
            },
        ],
    },
    network_configuration=service['services'][0]['networkConfiguration'],
    awslogs_group="mwaa-ecs-zero",
    awslogs_stream_prefix=f"ecs/{CONTAINER_NAME}",
...
So if you want to set CPU and memory specs for the whole task, you have to update the task_definition dictionary parameters (something like service['services'][0]['taskDefinition']['cpu'] = 2048).
If you want to specify parameters for a specific container, overrides should be the proper way:
overrides={
    "containerOverrides": [
        {
            "cpu": 2048,
            ...
        },
    ],
},
Or the edited containerDefinitions may be set directly inside task_definition, in theory...
Anyway, most Docker parameters should be passed inside the containerDefinitions section.
So about your question:
Is there a way to pass these flags to a docker run
If I understand correctly, you have a JSON TaskDefinition file and want to run it locally using docker?
Then try to check these tools. They allow you to convert a docker-compose.yml into an ECS definition, which is the opposite of what you're looking for, but maybe some of them are able to convert it the other way around?
Otherwise you have to parse the TaskDefinition's JSON manually and convert it to docker command arguments.
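If it helps to picture that manual route, here is a rough sketch (in Go, since there is nothing Airflow-specific about the parsing) of reading a couple of fields out of containerDefinitions and turning them into docker run arguments. The JSON field names follow the ECS task definition format; the sample values and the CPU-unit conversion are only illustrative assumptions, not part of the original answer.

// Illustrative only: parse a (hard-coded) task definition and print a rough
// "docker run" equivalent for each container definition.
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// Field names follow the ECS task definition JSON ("containerDefinitions").
type containerDef struct {
    Name    string   `json:"name"`
    Image   string   `json:"image"`
    Cpu     int      `json:"cpu"`    // ECS CPU units (1024 = one vCPU)
    Memory  int      `json:"memory"` // MiB
    Command []string `json:"command"`
}

type taskDef struct {
    ContainerDefinitions []containerDef `json:"containerDefinitions"`
}

func main() {
    // Sample task definition, purely for illustration.
    raw := `{"containerDefinitions":[{"name":"app","image":"nginx:1.25","cpu":2048,"memory":512,"command":["sleep","3600"]}]}`

    var td taskDef
    if err := json.Unmarshal([]byte(raw), &td); err != nil {
        panic(err)
    }
    for _, c := range td.ContainerDefinitions {
        args := []string{
            "docker", "run",
            fmt.Sprintf("--cpus=%g", float64(c.Cpu)/1024), // translate ECS CPU units to cores
            fmt.Sprintf("--memory=%dm", c.Memory),
            "--name", c.Name,
            c.Image,
        }
        args = append(args, c.Command...)
        fmt.Println(strings.Join(args, " "))
    }
}

In practice you would read the JSON from your actual TaskDefinition file rather than a hard-coded string, and map whichever extra fields you care about onto the corresponding docker run flags.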

Executing command in docker container after compose up

I'm trying to automate the execution of some command(s) in a Docker container after compose up. My first idea was to use a Gradle DockerExecContainer task (provided by the docker-remote-api plugin) and retrieve the target container id from the dockerCompose extension (provided by the docker-compose plugin).
I thought something like this would work:
plugins {
    id 'com.avast.gradle.docker-compose' version '0.12.1'
    id 'com.bmuschko.docker-remote-api' version '6.4.0'
}

import com.bmuschko.gradle.docker.tasks.container.DockerExecContainer

task setupPhabricator(type: DockerExecContainer) {
    containerId.set(dockerCompose.servicesInfos.phabricator.firstContainer.containerId)
    commands.add(['echo', '$PHABRICATOR_HOST'] as String[])
}

dockerCompose {
    isRequiredBy(setupPhabricator)
}
This fails because dockerCompose.servicesInfos.phabricator is null. This makes sense because at configuration time the container is not up yet.
I figured I should set the container ID during the execution phase:
task setupPhabricator(type: DockerExecContainer) {
    commands.add(['echo', '$PHABRICATOR_HOST'] as String[])

    doFirst {
        containerId.set(dockerCompose.servicesInfos.phabricator.firstContainer.containerId)
    }
}
But this fails with:
No value has been specified for property 'containerId'
I assume I'm missing something fundamental here so any idea is welcome.

Explanation of Container From Scratch

I am learning about containers, and Docker in particular. I just watched this Liz Rice video in which she created a container from scratch (the repo is on github.com/lizrice). I wasn't able to follow it completely, as I am new to Docker and containers and I don't know the Go programming language. However, I wanted to see if someone could give me a very quick explanation of what these items in the code are trying to accomplish:
package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "os/exec"
    "path/filepath"
    "strconv"
    "syscall"
)

// go run main.go run <cmd> <args>
func main() {
    switch os.Args[1] {
    case "run":
        run()
    case "child":
        child()
    default:
        panic("help")
    }
}

func run() {
    fmt.Printf("Running %v \n", os.Args[2:])

    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS,
    }

    must(cmd.Run())
}

func child() {
    fmt.Printf("Running %v \n", os.Args[2:])

    cg()

    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(syscall.Sethostname([]byte("container")))
    must(syscall.Chroot("/home/liz/ubuntufs"))
    must(os.Chdir("/"))
    must(syscall.Mount("proc", "proc", "proc", 0, ""))
    must(syscall.Mount("thing", "mytemp", "tmpfs", 0, ""))

    must(cmd.Run())

    must(syscall.Unmount("proc", 0))
    must(syscall.Unmount("thing", 0))
}

func cg() {
    cgroups := "/sys/fs/cgroup/"
    pids := filepath.Join(cgroups, "pids")
    os.Mkdir(filepath.Join(pids, "liz"), 0755)
    must(ioutil.WriteFile(filepath.Join(pids, "liz/pids.max"), []byte("20"), 0700))
    // Removes the new cgroup in place after the container exits
    must(ioutil.WriteFile(filepath.Join(pids, "liz/notify_on_release"), []byte("1"), 0700))
    must(ioutil.WriteFile(filepath.Join(pids, "liz/cgroup.procs"), []byte(strconv.Itoa(os.Getpid())), 0700))
}

func must(err error) {
    if err != nil {
        panic(err)
    }
}
In particular, my understanding of a container is that it is a virtualized run-time environment where users can isolate applications from the underlying system, and that containers are only isolated groups of processes running on a single host which fulfill a set of “common” features. I have a good sense of what a container is and what it is trying to accomplish in a broader sense, but I wanted help understanding a specific example like this. If someone understands this well: what is being imported in the import block; what are the cases for in the main function; what is the use of the statement in the run function; and what is being accomplished by the child and cg functions?
I think that, with my current understanding and going through the Docker tutorial, an explanation of a real from-scratch code example would be extremely beneficial. Just to confirm: this code is not related to Docker itself, other than that the code creates a container, and Docker is a technology that makes creating containers easier.
She is creating a sort of container by doing this:
she will execute main.go and pass a command to be executed in the container
to do this she runs a process that executes the run() function
in the run() function she prepares a process to be forked that will execute the child() function
but before actually forking, via syscall.SysProcAttr, she configures a new namespace for:
"Unix timesharing" (syscall.CLONE_NEWUTS): this essentially allows the child process to have a separate hostname
PIDs (syscall.CLONE_NEWPID): such that the "container" she is creating will have new PIDs starting from 1
mounts (syscall.CLONE_NEWNS): this will enable the "container" to have separate mounts
next she executes the fork (cmd.Run())
in the forked process the child() function is executed, and here:
she prepares a control group via cg() that will limit the resources available to the "container"; this is done by writing the proper files under /sys/fs/cgroup/
next she prepares the command to be executed by using the args passed to main.go
she uses chroot to a new root under /home/liz/ubuntufs
she mounts the special proc filesystem and another temporary fs
finally she executes the command provided as args to main.go
In the video, Containers From Scratch, she presents all of this very well.
There she executes a bash in the container that sees new PIDs, has a new hostname, and is limited to 20 processes.
To make it work she needed a full Ubuntu fs clone under /home/liz/ubuntufs.
The 3 key points to take home are that a container (well, her "container") essentially:
uses namespaces to define what the container will see in terms of PIDs/mounts (she did not handle networking in this container example)
uses chroot to restrict the container to a portion of the filesystem
uses cgroups to limit the resources the container can use
Due to my lack of experience in Go and limited experience with custom Docker containers, I cannot confirm what this code does.
While this is not directly answering the question in the title, I want to provide an answer that helps you learn the basics of Docker to get you started.
Your understanding of containers is correct. Try to find a tutorial that uses a simpler example in a language you're familiar with.
One simple example to get you started would be to create a container of your preferred Linux OS, attach the Docker container to your current terminal, then run a few OS-specific commands within the container (such as installing software inside the container, or any Linux command).

Change log level on runtime for containers

I'm using logrus for logging in our applications, which run on K8s.
We have an env variable with which we can set the log level and change it when we restart our application.
Our applications run in Docker containers on K8s.
Now we want to change the log level at runtime, i.e. without restarting the container, so that while it's running
we can change it from error to debug. I think this
is a legitimate request, but I didn't find any reference or any open source project doing this. Any idea?
package logs

import (
    "fmt"
    "os"

    "github.com/sirupsen/logrus"
)

const (
    AppLogLevel = "APP_LOG_LEVEL"
    DefLvl      = "info"
)

var Logger *logrus.Logger

func NewLogger() *logrus.Logger {
    var level logrus.Level
    lvl := getLogLevel()
    // In case the level isn't set, no message will be printed
    level = logLevel(lvl)
    logger := &logrus.Logger{
        Out:   os.Stdout,
        Level: level,
    }
    Logger = logger
    return Logger
}

// use from env
func getLogLevel() string {
    lvl, _ := os.LookupEnv(AppLogLevel)
    if lvl != "" {
        return lvl
    }
    return DefLvl
}

func logLevel(lvl string) logrus.Level {
    switch lvl {
    case "debug":
        // Used for tracing
        return logrus.DebugLevel
    case "info":
        return logrus.InfoLevel
    case "error":
        return logrus.ErrorLevel
    case "fatal":
        return logrus.FatalLevel
    default:
        panic(fmt.Sprintf("the specified %s log level is not supported", lvl))
    }
}
I know how to change the log level, but I need a way to influence the logger to change the level at runtime.
As a general Un*x statement, you cannot change an environment variable in a process after it has started. (You can setenv(3) your own environment, and you can specify a new process's environment when you execve(2) it, but once it's started, you can't change it again.)
This restriction carries through to higher levels. If you've docker run a container, its -e option to set an environment variable is one of the things you have to delete and recreate a container to change. The env: is one of the many immutable parts of a Kubernetes Pod specification; you also can't change it without deleting and recreating the pod.
If you've deployed the pod via a Deployment (and you really should), you can change the environment variable setting in the Deployment spec (edit the YAML file in source control and kubectl apply -f it, or directly kubectl edit). This will cause Kubernetes to start new pods with the new log value and shut down old ones, in that order, doing a zero-downtime update. Deleting and recreating pods like this is totally normal and happens whenever you want to, for example, change the image inside the deployment to have today's build.
If your application is capable of noticing changes to config files it has loaded (and it would have to be specially coded to do that), one other path that could work for you is to mount a ConfigMap into the container; if you change the ConfigMap contents, the files the container sees will change, but the container will not restart. I wouldn't go out of my way to write this just to avoid restarting a pod, though.
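For what it's worth, a minimal sketch of that file-watching idea, assuming the ConfigMap is mounted so that the level ends up in a file like /etc/config/log-level (the path, the polling approach, and the WatchLevelFile name are illustrative assumptions, not part of the original code):

// Illustrative sketch only: re-read a level file mounted from a ConfigMap and
// apply it to the logrus logger without restarting the pod.
package logs

import (
    "os"
    "strings"
    "time"

    "github.com/sirupsen/logrus"
)

// WatchLevelFile polls the mounted file and updates the logger's level
// whenever the file contents parse to a valid logrus level.
func WatchLevelFile(logger *logrus.Logger, path string, interval time.Duration) {
    go func() {
        for {
            if raw, err := os.ReadFile(path); err == nil {
                if lvl, err := logrus.ParseLevel(strings.TrimSpace(string(raw))); err == nil {
                    logger.SetLevel(lvl)
                }
            }
            time.Sleep(interval)
        }
    }()
}

NewLogger() could then call something like WatchLevelFile(Logger, "/etc/config/log-level", 10*time.Second) once at startup, and editing the ConfigMap would change the level after the kubelet syncs the mounted volume, without a restart.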
You can run the command kubectl exec -it <container_name> bash and use the command line inside the container to change the environment variable.
You can do it by running the command export LOG_LEVEL=debug or export LOG_LEVEL=error inside the container.
First off, understand this should happen on the application level. I.e. it's not something that Kubernetes is supposed to do for you.
That being said, you could have your application checking an environment variable's value (you are already doing this), and depending on what that value is, it can set the application's log-level. In other words, let the application code poll an environment variable to see if it has changed.
You can inject environment variables like Shahaf suggests, but that requires you to exec into the pod, which may not always be possible or good practice.
I would suggest you run kubectl set env rs [REPLICASET_NAME] SOME_ENVIRONMENT_VAR=1.
All of this being said, you need to consider why this is important. Kubernetes is built under the principle that "pods should be treated like cattle, not pets". Meaning when a pod is no longer useful, or out of sync, it should be terminated and a new one, that represents the code's current state, should be booted up in its stead.
Regardless of how you go about doing what you need to do, you REALLY shouldn't be doing this in production, or even in staging.
Instead let your app's underlying environment variables set a log-level that is appropriate for that environment.

How can a Nomad job use local Docker images?

Nomad Docker images will be fetched from Docker Hub, but I want to use some local images. How can I use them? (I don't want to use a private repo.)
For example, I want to use the local image test:
> docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
test         latest   da795ca8a32f   36 minutes ago   567MB
job "test" {
datacenters = ["dc1"]
group "example" {
task "test" {
driver = "docker"
config {
image = "test"
}
resources {
cpu = 500
memory = 256
}
}
}
}
It doesn't work!
I'm not sure if this can be treated as an answer or a "hack".
But if you want Nomad to use a Docker image that is already present on a node, the image MUST NOT be tagged latest.
For testing I tag my images as IMAGE:local. This way Nomad uses it if present, pulls it from remote if not.
Looking at Nomad's source code here and here, it seems that using machine local images is not supported. That would make sense, as in a cluster environment with several nodes, the scheduler needs to be able to get the image irrespective of which machine the job is allocated to.
(One possible workaround would be to run a registry service within the Nomad cluster, and use whichever storage backend is most convenient for you)
Nomad now supports loading Docker images from tar archives.
Here is an example:
artifact {
    source = "http://path.to/redis.tar"
}

config {
    load  = "redis.tar"
    image = "redis"
}
However, the tar may be too large to be resiliently transported and provisioned.
While #Miao1007's answer works, you need to be aware of one thing. It seems that you cannot use the tag latest or omit the tag altogether (see the discussion here). You need to tag your docker build with some version number, like
sudo docker build --tag dokr:1.0.0 .
sudo docker save dokr:1.0.0 > dokr-1.0.0.tar
Then use the following in the job file:
artifact {
    source = "http://localhost:8000/dokr-1.0.0.tar"
}

config {
    load  = "go-docker-dokr-1.0.0.tar"
    image = "go-docker-dokr:1.0.0"
}
Starting from version 0.9.0, Nomad checks whether the image has already been loaded.
Source code
Contributor comment
// We're going to check whether the image is already downloaded. If the tag
// is "latest", or ForcePull is set, we have to check for a new version every time so we don't
// bother to check and cache the id here. We'll download first, then cache.
