Forking a namespace in a Docker container

I'd like to execute a program in a docker container that calls clone().
This leads to the following error message:
failed to fork initial namespaces process: Operation not permitted
The situation is pretty much the same as in this question: clone: operation not permitted
It's possible to give extended privileges to the container by starting it with --privileged, but that does more than just grant permission to create new namespaces: it also gives access to the host's devices. I want to avoid that.
How can I allow the process executed in the docker container to clone itself, without giving extensive privileges to the docker container?
In principle, I don't fully understand this constraint. The process is already executing; why would it be a security issue to have more processes in the same namespace? I would assume that anything they could do, the original process could do just as well.
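For reference, Docker does expose finer-grained switches than --privileged. Which of them is actually needed depends on which namespace flags the clone() call uses, so treat the following only as a sketch of the options, not a verified fix (my-image is a placeholder):

    # grant just the capability normally required for creating namespaces,
    # rather than full --privileged
    docker run --cap-add SYS_ADMIN my-image

    # or relax only the seccomp filter that blocks clone() with namespace flags
    docker run --security-opt seccomp=unconfined my-image
    # (a custom profile via --security-opt seccomp=/path/to/profile.json is the
    # tighter variant of the same idea)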

Related

Rsyslog can't start inside of a docker container

I've got a Docker container running a service, and I need that service to send logs to rsyslog. It's an Ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container, and I cannot determine why.
Running service rsyslog start (this image uses Upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, plus dmesg.log, which simply contains dmesg: klogctl failed: Operation not permitted. From what I can tell that is due to a Docker limitation that cannot really be fixed, and it's unclear whether it's even related to the issue.
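For what it's worth, a commonly suggested workaround when rsyslog misbehaves inside a container is to disable its kernel-log input (imklog), since containers generally can't read the kernel ring buffer. I have not confirmed that this is actually my problem, but the change would look something like:

    # comment out the kernel log input in /etc/rsyslog.conf
    sed -i 's/^module(load="imklog")/#&/' /etc/rsyslog.conf    # newer config syntax
    sed -i 's/^\$ModLoad imklog/#&/' /etc/rsyslog.conf         # older config syntax
    service rsyslog start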
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue; rsyslog starts and runs in the container just fine there. So obviously the cause is some difference between the hosts. But I don't know where to begin with that: there are LOTS of differences between the hosts (the working one is my local Windows system, the failing one is a virtual machine running in a cloud environment), so I don't know which differences could plausibly cause this issue and which couldn't.
I've exhausted everything that I know to check. My only option left is to come to Stack Overflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
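If it helps narrow things down, the security profile each Docker daemon applies can be compared directly; the klogctl error suggests seccomp/capability restrictions may differ between the hosts, though that is only a guess:

    # run on both hosts and diff the output; the Security Options line is the
    # interesting part
    docker info
    docker version

    # check what the failing container is actually allowed to do
    docker inspect --format \
      '{{.HostConfig.Privileged}} {{.HostConfig.CapAdd}} {{.HostConfig.SecurityOpt}}' \
      <container-name>

Here <container-name> is whatever the container is called on each host.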
Regarding the container itself: unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about it, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for Docker containers, and is the root cause of all of my troubles). This is also causing some issues with the Docker logging driver, which is why I'm stuck using syslog as the logging solution instead.

Periodic cron-like Functions Across Containers in a Docker Project

I have implemented the LAMP stack for a 3rd-party forum application on its own dedicated virtual server. One of my aims here was to use a composed Docker project (under Git) to encapsulate the application fully. I wanted to keep this as simple as possible to understand for the other sysadmins supporting the forum, so this really ruled out using S6 etc., which in turn meant sticking to the standard of one container per daemon service, using the Docker runtime to implement the daemon functionality.
I had one particular design challenge that doesn't seem to be addressed cleanly by the Docker runtime system: I need to run periodic housekeeping activities that interact with various Docker containers, for example:
The forum application requires a per-minute PHP housekeeping task to be run using php-cli, and I only have php-cli and php-fpm (which runs as the foreground daemon process) installed in the php container.
Let's Encrypt certificate renewal needs a weekly certbot script to be run in the apache container's file hierarchy.
I use conventional /var/log based logging for the high-volume Apache access logs, as these generate GB-sized access files that I want to retain for ~7 days in case I need to do hack analysis, but that are otherwise ignored.
Yes, I could use the host's crontab to run docker exec commands, but this involves exposing application internals to the host system, and IMO this breaks one of my design rules. What follows is my approach to addressing this. My question is really to ask for comments and better alternative approaches; if there are none, then this can perhaps serve as a template for others searching for an approach to this challenge.
All of my containers contain two container-specific scripts: docker-entrypoint.sh, which is a well documented convention, and docker-service-callback.sh, which is the action mechanism that implements the tasking system.
I have one application-agnostic systemd host service, docker-callback-reader.service, which uses this bash script, docker-callback-reader. It services requests on a /run pipe that is volume-mapped into any container that needs to raise such events.
In practice I have only one such housekeeping container (see here) that runs Alpine crond and all of the cron-based events. So, for example, the following entry makes the per-minute PHP tasking call:
    * * * * * echo ${VHOST} php task >/run/host-callback.pipe
In this case the environment variable VHOST identifies the relevant Docker stack, as I can have multiple instances (forum and test) running on the server; the next parameter (php in this case) identifies the destination service container; and the final parameter (task), plus any optional parameters, is passed as arguments to a docker exec of the php container's docker-service-callback.sh, where the required magic happens.
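For readers who just want the flavour of the host side without visiting the repo, here is a stripped-down sketch of the reader loop; the real docker-callback-reader is more careful, and the compose-style container naming and absence of error handling here are purely illustrative:

    #!/bin/bash
    # Illustrative only: read lines of the form
    #   <stack> <service> <action> [args...]
    # from the named pipe and dispatch them into the target container.
    PIPE=/run/host-callback.pipe
    [[ -p "$PIPE" ]] || mkfifo "$PIPE"

    while true; do
        # read blocks until a writer sends a line; when the writer closes the
        # pipe we fall out of the inner loop and reopen it
        while read -r stack service action args; do
            # $args is deliberately left unquoted so optional arguments split
            docker exec "${stack}_${service}_1" \
                docker-service-callback.sh "$action" $args
        done < "$PIPE"
    done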
I feel that the strengths of the system are that:
Everything is suitably encapsulated. The host knows nothing of the internals of the app other than that any receiving container must have a docker-service-callback.sh on its execution path. The details of each request are implemented internally in the executing container, and are hidden from the tasking container.
The whole implementation is simple, robust and has minimal overhead.
Anyone is free to browse my Git repo and cherry-pick whatever of this they wish.
Comments?

Architectural question about user-controlled Docker instances

I have a website in Laravel where you can click a button that sends a message to a Python daemon isolated in Docker. This works for an easy MVP to prove the concept, but it's not viable in production, because a user will most likely want to pause, resume and stop that process as well; the service is a looped scanner that is otherwise designed to never stop.
I have thought about a couple of solutions for this, such as handling it in the software layer, but that would add complexity to the program. From googling I found that it is actually possible to do what I want with Docker itself, using the commands pause, unpause, run and kill.
It would be ideal if I had a service which interacts with the Docker instances according to the criteria above and can take commands over HTTP. Is Docker Swarm the right solution for this problem, or is there an easier way?
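For concreteness, these are the Docker CLI commands in question (the image and container names are only examples):

    docker run -d --name scanner-alice my-scanner-image   # start the scanner for one user
    docker pause scanner-alice                             # freeze all of its processes
    docker unpause scanner-alice                           # resume them
    docker kill scanner-alice                              # stop it for good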
There are both significant security and complexity concerns with using Docker this way, and I would not recommend it.
The core rule of Docker security has always been: if you can run any docker command, then you can easily take over the entire host. (You cannot prevent someone who can run docker run from starting a container, as container root, that bind-mounts any part of the host filesystem; they can then reset the host root password in the /etc/shadow file to something they know, allow remote root ssh access, and reboot the host, as one example.) I'd be extremely careful about connecting this ability to my web tier. Strongly coupling your application to Docker will also make it more difficult to develop and test.
Instead of launching a process per crawling job, a better approach might be to set up some sort of job queue (perhaps RabbitMQ), and have a multi-user worker that pulls jobs from the queue to do work. You could have a queue per user, and a separate control queue that receives the stop/start/cancel messages.
If you do this:
You can run your whole application without needing Docker: you need the front-end, the message queue system, and a worker, but these can all run on your local development system
If you need more crawlers, you can launch more workers (works well with Kubernetes deployments)
If you're generating too many crawl requests, you can launch fewer workers
If a worker dies unexpectedly, you can just restart it, and its jobs will still be in the queue
Nothing needs to keep track of which process or container belongs to a specific end user
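To make the shape of that worker concrete, here is a minimal sketch of the consume loop. It uses a Redis list as a stand-in for whichever queue technology you choose (an AMQP consumer against RabbitMQ has the same structure), and the queue name and crawler command are made up:

    #!/bin/bash
    # Minimal worker: block until a job appears on the queue, run the crawl
    # for it, then go back for the next one. Run as many copies as you want
    # workers.
    QUEUE=crawl-jobs        # hypothetical queue name

    while true; do
        # BLPOP blocks until an item is available; with --raw it prints the key
        # and then the value, so the last line is the job payload
        job=$(redis-cli --raw BLPOP "$QUEUE" 0 | tail -n 1)
        [ -n "$job" ] || continue

        # hand the payload to the actual crawler (hypothetical command)
        ./run-crawl.sh "$job"
    done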

What's the purpose of the node-modules container in wolkenkit?

That container is built when deploying the application.
Looks like its purpose is to share dependencies across modules.
It looks like it is started as a container, but nothing appears to be running in it, a bit like an init container.
The console says it starts and stops that component when using the respective wolkenkit start and wolkenkit stop commands, yet when you run docker ps, that container cannot be found.
Can someone explain these components?
When starting a wolkenkit application, the application is boxed in a number of Docker containers, and these containers are then started along with a few other containers that provide the infrastructure, such as databases, a message queue, ...
The reason why the application is split into several Docker containers is because wolkenkit builds upon the CQRS pattern, which suggests separating the read side of an application from the application's write side, and hence there is one container for the read side, and one for the write side (actually there are a few more, but you get the picture).
Now, since you may develop on an operating system other than Linux, the wolkenkit application may run under a different operating system than the one you develop on, as within Docker it's always Linux. This means that the start command cannot simply copy the node_modules folder into the containers, as it may contain binary modules which would then be incompatible (imagine installing on Windows on the host, but running on Linux within Docker).
To avoid issues here, wolkenkit runs an npm install when starting the application inside of the containers. The problem now is that if wolkenkit did this in every single container, the start would be super slow (it's not the fastest thing on earth anyway, due to all the Docker building and starting that's happening under the hood). So wolkenkit tries to optimize this as much as possible.
One concept here is to run npm install only once, inside of a container of its own. This is the node-modules container you encountered. This container is then linked as a volume to all the containers that contain the application's code. This way you only have to run npm install once, but multiple containers can use the outcome of this command.
Since this container contains data but no code, it only has to exist; it doesn't actually do anything. This is why it gets created, but is not run.
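The general pattern, shown here with plain docker commands purely as an illustration of the mechanism (this is not wolkenkit's actual implementation, and the image names are placeholders), looks roughly like this:

    # a data-only container that owns the node_modules directory; it is never
    # started, it just anchors an anonymous volume
    docker create -v /app/node_modules --name node-modules alpine:3 true

    # run npm install once, writing into that shared volume
    docker run --rm --volumes-from node-modules -v "$PWD":/app -w /app node:lts npm install

    # the application containers reuse the same volume instead of reinstalling
    docker run --volumes-from node-modules my-read-side-image
    docker run --volumes-from node-modules my-write-side-image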
I hope this makes it a little bit clearer, and I was able to answer your question :-)
PS: Please note that I am one of the core developers of wolkenkit, so take my answer with a grain of salt.

Safest way to turn docker cli bash commands into an api for external application in production use

I'm running a program that uses several Docker images and containers, all spawned and managed by the code. At the same time, I need to drop into a shell in the containers (docker exec -it <container> bash) and execute some commands. These commands, however, can't be manual and must be exposed as an API. After extensive searching, the closest thing I found is the Docker remote API [https://blog.trifork.com/2013/12/24/docker-from-a-distance-the-remote-api]. However, I'm a bit scared of messing with the internals of Docker: I want the spawning and management to remain controlled by the program, and I only need to run a limited number of commands against the Docker CLI. Is the Docker remote API the right way to go? Will it handle scale? My application may see ~27,000 mobile and web apps use/call the APIs from different parts of the world. Tried and tested solutions would be preferred.
Any advice would be highly appreciated.
There’s not an easy answer to this. Since you include “safest” in the question title, I will suggest you probably need to do some redesign of your application architecture.
The first critical detail is this: being able to run any Docker command, or to access the Docker API, implies unrestricted root access on the host. You can trivially docker run an image with writeable root-level access to the host's filesystem and steal private keys and user passwords, give yourself sudo access, and so on. Using it as a core part of your workflow is incredibly dangerous, and turning on the Docker remote API at all is incredibly dangerous.
As a corollary, while docker exec is handy as a debugging tool, you can't really use it as part of your core workflow. As you note, running commands by hand as a trusted administrator doesn't scale. There are also dangers in shell quoting: you need to make sure an argument doesn't look like foo; docker run -v/:/host ... and doesn't inadvertently hand out access to the host system.
In my mind your only real option here is to do this “properly”. Take whatever administrative commands you need to do and wrap them in some API, probably HTTP-based. Build a new service (or several) and add it to your Docker deployment. Maybe under the hood that launches a shell script as a subprocess, but the API wrapper has control over the arguments and can double-check things. The plus side is that this approach probably won’t be a choke point if your application does need to scale out.
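Purely to illustrate the "wrapper controls the arguments" point, the subprocess behind such an API endpoint might look like this; the action names and container names are made up, and the HTTP layer in front of it is omitted:

    #!/bin/bash
    # run-admin-task.sh -- invoked by the API service, never directly by users.
    # Only whitelisted actions are allowed, and user input is never spliced
    # into a shell command line.
    set -euo pipefail

    action="$1"     # e.g. "reindex" or "flush-cache" (hypothetical)
    case "$action" in
        reindex)
            docker exec app_worker_1 /usr/local/bin/reindex ;;
        flush-cache)
            docker exec app_cache_1 redis-cli FLUSHALL ;;
        *)
            echo "unknown action: $action" >&2
            exit 1 ;;
    esac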

Resources