Explanation of Container From Scratch - docker

I am learning about containers, and Docker in particular. I just watched this Liz Rice video in which she creates a container from scratch (the repo is on github.com/lizrice). I wasn't able to follow it completely, as I am new to Docker and containers and I don't know the Go programming language. However, I wanted to see if someone could give me a very quick explanation of what the items in this code are trying to accomplish:
package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "os/exec"
    "path/filepath"
    "strconv"
    "syscall"
)

// go run main.go run <cmd> <args>
func main() {
    switch os.Args[1] {
    case "run":
        run()
    case "child":
        child()
    default:
        panic("help")
    }
}

func run() {
    fmt.Printf("Running %v \n", os.Args[2:])

    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS,
    }

    must(cmd.Run())
}

func child() {
    fmt.Printf("Running %v \n", os.Args[2:])

    cg()

    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(syscall.Sethostname([]byte("container")))
    must(syscall.Chroot("/home/liz/ubuntufs"))
    must(os.Chdir("/"))
    must(syscall.Mount("proc", "proc", "proc", 0, ""))
    must(syscall.Mount("thing", "mytemp", "tmpfs", 0, ""))

    must(cmd.Run())

    must(syscall.Unmount("proc", 0))
    must(syscall.Unmount("thing", 0))
}

func cg() {
    cgroups := "/sys/fs/cgroup/"
    pids := filepath.Join(cgroups, "pids")
    os.Mkdir(filepath.Join(pids, "liz"), 0755)
    must(ioutil.WriteFile(filepath.Join(pids, "liz/pids.max"), []byte("20"), 0700))
    // Removes the new cgroup in place after the container exits
    must(ioutil.WriteFile(filepath.Join(pids, "liz/notify_on_release"), []byte("1"), 0700))
    must(ioutil.WriteFile(filepath.Join(pids, "liz/cgroup.procs"), []byte(strconv.Itoa(os.Getpid())), 0700))
}

func must(err error) {
    if err != nil {
        panic(err)
    }
}
In particular, my understanding of a container is that it is a virtualized run-time environment in which users can isolate applications from the underlying system, and that containers are just isolated groups of processes running on a single host which fulfill a set of "common" features. I have a good sense of what a container is and what it tries to accomplish in a broad sense, but I wanted help understanding a specific example like this one. If someone understands this well: what is being imported in the import block; what are the cases for in the main function; what is the purpose of the statements in the run function; and what is being accomplished by the child and cg functions?
I think my current understanding, plus going through the Docker tutorial, plus an explanation of a real from-scratch code example like this would be extremely beneficial. Just to confirm: this code is not related to Docker itself, other than that it creates a container, and Docker is a technology that makes creating containers easier.

She is creating a sort of container by doing this:
she executes main.go and passes it a command to be executed in the container
to do this, she runs a process that executes the run() function
in the run() function she prepares a process to be forked that will execute the child() function
but before actually forking, via syscall.SysProcAttr, she configures new namespaces for:
"Unix time-sharing" (syscall.CLONE_NEWUTS), which essentially allows the child process to have a separate hostname
PIDs (syscall.CLONE_NEWPID), so that in the "container" she is creating there will be new PIDs starting from 1
mounts (syscall.CLONE_NEWNS), which enables the "container" to have separate mounts
next she executes the fork (cmd.Run())
in the forked process the child() function is executed, and here:
she prepares a control group via cg() that will limit the resources available to the "container"; this is done by writing the appropriate files under /sys/fs/cgroup/
next she prepares the command to be executed, using the args passed to main.go
she uses chroot to switch to a new root under /home/liz/ubuntufs
she mounts the special proc filesystem and another temporary filesystem
finally she executes the command provided as args to main.go
In the video Containers From Scratch she presents all of this very well.
There she executes a bash in the container that sees new PIDs, has a new hostname, and is limited to 20 processes.
To make it work she needed a full Ubuntu filesystem clone under /home/liz/ubuntufs.
The 3 key points to take home are that a container (well, her "container") essentially does this:
uses namespaces to define what the container will see in terms of PIDs/mounts (she did not handle networking in this container example)
uses chroot to restrict the container to a portion of the filesystem
uses cgroups to limit the resources the container can use
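To see the namespace piece in isolation, here is a minimal sketch of my own (not from the video; it assumes Linux and root privileges, and leaves out chroot and cgroups). It uses the same /proc/self/exe re-exec trick as run(): the parent clones itself into new UTS and PID namespaces, and the child then sees itself as PID 1:
package main

import (
    "fmt"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    if len(os.Args) > 1 && os.Args[1] == "child" {
        // Inside CLONE_NEWPID the first process sees itself as PID 1.
        fmt.Println("PID inside the namespaces:", os.Getpid())
        return
    }
    // Re-exec this same binary, the /proc/self/exe trick from run().
    cmd := exec.Command("/proc/self/exe", "child")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }
    if err := cmd.Run(); err != nil {
        panic(err)
    }
}
Compiled and run as root, it prints 1, which is exactly the PID isolation her bash session demonstrates.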

Due to my lack of experience in Go and limited experience with custom Docker containers, I cannot confirm what this code does.
While this does not directly answer the question in the title, I want to provide an answer that helps you learn the basics of Docker to get started.
Your understanding of containers is correct. Try to find a tutorial that uses a simpler example in a language you're familiar with.
One simple example to get you started would be to create a container of your preferred Linux OS, attach the Docker container to your current terminal, then run a few OS-specific commands within the container (such as installing software inside the container, or any Linux command).

Related

Using docker for only some processes in Nextflow

I am writing a pipeline in Nextflow that contains multiple processes, most of which use Docker. Now I am trying to add a new process that runs only a Python script to preprocess some results - no Docker image needed.
However, I get the error Missing container image for process 'my_python_process'.
I define the docker images in nextflow.config as follows:
process {
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}
docker {
    enabled = true
}
I found a discussion where using container = null was suggested for the process without a container, but it still gives the same error, no matter what the process script contains.
Does anyone know what I'm missing? Thank you!
With docker.enabled = true, Nextflow will try to run each process in a Docker container created using the specified image. You then get the error you're seeing when the container directive has not been specified for a particular process. The usual way is to just specify a 'base' or 'default' container for your workflow. You may want to choose one that comes with Python. Otherwise, Ubuntu would be a good choice in my opinion.
Note that the withName process selector has the highest priority.
process {
    container = 'ubuntu:22.04'
    withName: my_python_process {
        container = 'python:3.9'
    }
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}
docker {
    enabled = true
}
I'm not aware of a way to disable Docker execution for a particular process, but nor would you really want to. The above approach should be preferred:
Containerization allows you to write self-contained and truly reproducible computational pipelines, by packaging the binary dependencies of a script into a standard and portable format that can be executed on any platform that supports a container runtime. Furthermore, the same pipeline can be transparently executed with any of the supported container runtimes, depending on which runtimes are available in the target compute environment.

`docker run` as Prefect task

My actual workloads that should be run as tasks within a Prefect flow are all packaged as docker images. So a flow is basically just "run this container, then run that container".
However, I'm unable to find any examples of how I can easily start a Docker container as a task. Basically, I just need docker run from a flow.
I'm aware of https://docs.prefect.io/api/latest/tasks/docker.html and tried various combinations of CreateContainer and StartContainer, but without any luck.
Using the Docker tasks from Prefect's Task Library could look something like this for your use case:
from prefect import task, Flow
from prefect.tasks.docker import (
    CreateContainer,
    StartContainer,
    GetContainerLogs,
    WaitOnContainer,
)

create = CreateContainer(image_name="prefecthq/prefect", command="echo 12345")
start = StartContainer()
wait = WaitOnContainer()
logs = GetContainerLogs()

@task
def see_output(out):
    print(out)

with Flow("docker-flow") as flow:
    container_id = create()
    s = start(container_id=container_id)
    w = wait(container_id=container_id)
    l = logs(container_id=container_id)
    l.set_upstream(w)
    see_output(l)

flow.run()
This snippet above will create a container, start it, wait for completion, retrieve logs, and then print the output of echo 12345 to the command line.
Alternatively, you could also use the Docker Python client directly in your own tasks: https://docker-py.readthedocs.io/en/stable/api.html#module-docker.api.container

Change log level on runtime for containers

I'm using logrus for logging in our applications, which run on K8s.
We have an environment variable with which we can set the log level, and we change it when we restart our application.
Our applications run in Docker containers on K8s.
Now we want to change the log level at runtime, i.e. change it while the container is running, without restarting it, so that we can switch from error to debug. I think this
is a legitimate request, but I didn't find any reference or open source project that does this. Any idea?
package logs

import (
    "fmt"
    "os"

    "github.com/sirupsen/logrus"
)

const (
    AppLogLevel = "APP_LOG_LEVEL"
    DefLvl      = "info"
)

var Logger *logrus.Logger

func NewLogger() *logrus.Logger {
    var level logrus.Level
    lvl := getLogLevel()
    // If the level isn't set, no messages will be printed
    level = logLevel(lvl)
    logger := &logrus.Logger{
        Out:   os.Stdout,
        Level: level,
    }
    Logger = logger
    return Logger
}

// use from env
func getLogLevel() string {
    lvl, _ := os.LookupEnv(AppLogLevel)
    if lvl != "" {
        return lvl
    }
    return DefLvl
}

func logLevel(lvl string) logrus.Level {
    switch lvl {
    case "debug":
        // Used for tracing
        return logrus.DebugLevel
    case "info":
        return logrus.InfoLevel
    case "error":
        return logrus.ErrorLevel
    case "fatal":
        return logrus.FatalLevel
    default:
        panic(fmt.Sprintf("the specified %s log level is not supported", lvl))
    }
}
I know how to change the log level, but I need a way to influence the running logger to change its level.
As a general Un*x statement, you cannot change an environment variable in a process after it has started. (You can setenv(3) your own environment, and you can specify a new process's environment when you execve(2) it, but once it's started, you can't change it again.)
This restriction carries through to higher levels. If you've docker run a container, its -e option to set an environment variable is one of the things you have to delete and recreate a container to change. The env: is one of the many immutable parts of a Kubernetes Pod specification; you also can't change it without deleting and recreating the pod.
If you've deployed the pod via a Deployment (and you really should), you can change the environment variable setting in the Deployment spec (edit the YAML file in source control and kubectl apply -f it, or directly kubectl edit). This will cause Kubernetes to start new pods with the new log value and shut down old ones, in that order, doing a zero-downtime update. Deleting and recreating pods like this is totally normal and happens whenever you want to, for example, change the image inside the deployment to have today's build.
If your application is capable of noticing changes to config files it has loaded (it would have to be specially coded to do that), one other path that could work for you is to mount a ConfigMap into the container; if you change the ConfigMap contents, the files the container sees will change, but the container will not restart. I wouldn't go out of my way to write this just to avoid restarting a pod, though.
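For what it's worth, logrus can change its level at runtime via SetLevel, which is safe to call while the logger is in use. Here is a minimal sketch of the ConfigMap-file approach (the path /etc/config/log-level, the WatchLogLevel name, and the poll interval are my assumptions, not something from the question):
package logs

import (
    "os"
    "strings"
    "time"

    "github.com/sirupsen/logrus"
)

// WatchLogLevel polls a file (e.g. one projected from a ConfigMap) and
// applies its contents as the new log level, so the running logger picks
// up changes without a restart.
func WatchLogLevel(logger *logrus.Logger, path string, every time.Duration) {
    go func() {
        for range time.Tick(every) {
            raw, err := os.ReadFile(path)
            if err != nil {
                continue // file absent or unreadable: keep the current level
            }
            lvl, err := logrus.ParseLevel(strings.TrimSpace(string(raw)))
            if err == nil && logger.GetLevel() != lvl {
                logger.SetLevel(lvl)
            }
        }
    }()
}
Calling WatchLogLevel(Logger, "/etc/config/log-level", 10*time.Second) right after NewLogger() would then pick up a kubectl edit configmap change within one poll interval.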
You can run the command kubectl exec -it <pod_name> -- bash and use the command line inside the container to change the environment variable.
You can do this by running export LOG_LEVEL=debug or export LOG_LEVEL=error inside the container.
First off, understand that this should happen at the application level; i.e. it's not something Kubernetes is supposed to do for you.
That being said, you could have your application check an environment variable's value (you are already doing this) and set the application's log level accordingly. In other words, let the application code poll an environment variable to see whether it has changed.
You can inject environment variables as Shahaf suggests, but that requires you to exec into the pod, which may not always be possible or good practice.
I would suggest you run kubectl set env rs [REPLICASET_NAME] SOME_ENVIRONMENT_VAR=1.
All of this being said, you need to consider why this is important. Kubernetes is built under the principle that "pods should be treated like cattle, not pets". Meaning when a pod is no longer useful, or out of sync, it should be terminated and a new one, that represents the code's current state, should be booted up in its stead.
Regardless of how you go about doing what you need to do, you REALLY shouldn't be doing this in production, or even in staging.
Instead let your app's underlying environment variables set a log-level that is appropriate for that environment.

Is there a good, standard way to "bootstrap" a containerized application?

I'm working on an application which needs to be initialized the first time it is run.
Practically, what this will do is initialize a database with some starter values, and save some files in a persistent volume. If I stop the container and then restart it, I don't want to re-run that bootstrapping routine. In other words, if the container is present and populated - skip the initialization routine.
The way I was going to implement this was an entrypoint script that checks whether the configuration files are present and, if so, skips the bootstrapping routine. However, I was wondering if there is a better way to do it?
For example, is there a way to run a script that is specifically triggered by the need to create a volume? If I could do that, the only circumstance under which I'd run the bootstrapper would be when the application initializes for the first time.
Or, is there a better, more Dockerish pattern that defines how I should go about this problem?
"Do the initialization in an entrypoint script if the files don't already exist" seems to be reasonably idiomatic. For example, the standard postgres:9.6 image checks for a $PGDATA/PG_VERSION file.
Hypothetically this can look something like:
#!/bin/sh
if [ ! -f /data/config.ini ]; then
    /opt/myapp/setup-data.sh /data
fi
exec "$@"
Remember that it's very routine to delete and recreate containers for a variety of reasons (IME stop and start as actions are rare, but some of this is habit born of an earlier age of Docker); this ties in well with your intuition to use the entrypoint for this, since it will get launched on every docker run. From within your container you can't really tell whether a directory is or isn't a volume, and there aren't any hooks you can tie into; by the time the entrypoint begins, the container environment is fully set up, with whatever networks and volumes already attached.
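If the entrypoint were a small Go binary instead of a shell script, the same check would translate directly. A sketch under that assumption, reusing the hypothetical paths from the script above:
package main

import (
    "os"
    "os/exec"
    "syscall"
)

func main() {
    // One-time bootstrap: run setup only when the marker file is absent.
    if _, err := os.Stat("/data/config.ini"); os.IsNotExist(err) {
        if err := exec.Command("/opt/myapp/setup-data.sh", "/data").Run(); err != nil {
            panic(err)
        }
    }
    if len(os.Args) < 2 {
        panic("no command given")
    }
    // Replace this process with the real command, the equivalent of exec "$@".
    path, err := exec.LookPath(os.Args[1])
    if err != nil {
        panic(err)
    }
    if err := syscall.Exec(path, os.Args[1:], os.Environ()); err != nil {
        panic(err)
    }
}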

Looking for a convenient way to start and stop applications with docker-compose

For each of my projects, I have configured a docker development environment consisting of several containers. I often switch between projects. That requires stopping one set of containers and starting another. I currently do it like this:
$ cd project1
$ docker-compose stop
$ cd ../project2
$ docker-compose up -d
So I need to remember which application is currently running, cd into the directory where its docker-compose.yml is, stop it, then remember which other project I want to run, cd there, and start it.
Is there a better way? Like a utility that remembers which multi-container applications I have, can stop the currently running one, and can run another one without manual cd-ing and docker-compose-ing?
(By the way, what's the correct term for a set of containers hosting parts of a single application?)
I hope docker-compose-ui will help you manage your applications.
I think the real problem here is this:
That requires stopping one set of containers and starting another.
You shouldn't need to stop one project to start another.
Instead of mapping to the same host ports, I would not map any ports at all. Then use a script to look up the IP of the container and connect directly to it:
#!/bin/bash
cip=$(docker inspect -f '{{range $key, $value := .NetworkSettings.Networks}} {{ $value.IPAddress}} {{end}}' $1)
This will look up the container IP. Combine that with a command to open the URL:
url=http://$cip:8080/
xdg-open $url || open $url
All together this will let you run the application without having to map any host ports. When host ports don't exist, you don't have to stop other projects.
If you are reasonably proficient with Ruby, you can build some scaffolding for this.
Here is a barebones example using threads (to start different docker-compose sessions from one process and then stop them all together):
require 'docker-compose'

threads = []
project_paths = %w(/project/path1 /project/path2 /project/path3 /project/path)

project_paths.each do |path|
  # start each docker-compose session in its own thread
  threads.push Thread.new { Docker::Compose::Session.new(dir: path).up }
end

begin
  threads.each do |thread|
    thread.join
  end
rescue SystemExit, Interrupt
  threads.each do |thread|
    thread.kill
  end
rescue Exception => e
  handle_exception e # assumed to be defined elsewhere
end
source
It uses:
the docker-compose gem
threads
Just set project_paths to the folders of your projects. If you want to end them all, press CTRL+C.
You can of course go beyond that, using a daemon and trying to start/stop some of them by "name" and such, but I guess as a starting point for scaffolding, that should be enough.
